An intelligent, self-healing platform designed for real-time system reliability and safe deployments. This project leverages AI to monitor, analyze, and automatically respond to system failures, ensuring resilience, stability, and minimal downtime.
Modern distributed systems are complex and prone to failures. This platform combines:

- Intelligent traffic routing
- RAG-based log analysis
- Autonomous AI agents

Together, these components proactively detect issues and take corrective actions, such as instant rollback or recovery, so the system can heal itself.
### Real-Time Traffic Routing

- Dynamically routes traffic between stable and test backends
- Enables safe deployments (canary / blue-green strategies)
### RAG-Based Log Analysis

Uses Retrieval-Augmented Generation (RAG) to:

- Analyze logs in real time
- Identify anomalies and failure patterns
- Provide contextual insights
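The retrieval half of a RAG pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: past incidents are assumed to be plain-text notes held in memory, and simple token overlap (Jaccard similarity) stands in for the embedding similarity a production pipeline would typically use. The function names are hypothetical. The top-ranked notes would then be passed to a language model as context alongside the new log line.

```javascript
// Sketch of the retrieval step in a RAG log-analysis pipeline.
// Assumption: incidents are plain-text notes; Jaccard token overlap
// is a stand-in for real embedding similarity.

function tokenize(text) {
  // Lowercase, split on non-alphanumeric characters, drop empty tokens.
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

function jaccard(a, b) {
  let shared = 0;
  for (const token of a) if (b.has(token)) shared++;
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Return the k incident notes most similar to the new log line.
function retrieveContext(logLine, incidents, k = 2) {
  const query = tokenize(logLine);
  return incidents
    .map((note) => ({ note, score: jaccard(query, tokenize(note)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.note);
}

// Pair the new log line with the retrieved notes in a prompt for the model.
function buildPrompt(logLine, incidents) {
  const context = retrieveContext(logLine, incidents)
    .map((note, i) => `${i + 1}. ${note}`)
    .join("\n");
  return `Past incidents:\n${context}\n\nNew log line:\n${logLine}\n\nWhat is the likely cause?`;
}
```

Swapping `jaccard` for cosine similarity over embeddings would keep the rest of the flow unchanged.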
### Autonomous AI Agents

AI agents:

- Continuously monitor system health
- Detect failures without manual intervention
- Trigger automated recovery workflows
### Instant Rollback

- Automatically rolls back to stable versions on failure detection
- Minimizes downtime and user impact
### Monitoring

Track:

- Request trends
- Error rates
- Backend health
- Traffic distribution
```bash
# Install dependencies in each folder
cd proxy && npm install
cd ../backend-stable && npm install
cd ../backend-test && npm install
```

To run the entire system (all backends, proxy, and dashboard) with a single command:

```bash
docker-compose up --build
```

This starts all 4 services and maps their ports to your host machine.
> [!WARNING]
> **Configuration Note for Docker:**
> Each Docker container has its own network namespace, so `127.0.0.1` inside the proxy container points to the proxy container itself, not to the backend containers. For local testing with Docker, change `proxy/config.json` to point to `http://backend-stable:5001` and `http://backend-test:5002` instead of `http://127.0.0.1:5001` and `http://127.0.0.1:5002`. Docker Compose resolves these service names to the matching containers.
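If you switch between local and Docker runs often, the URL change described in the note can be scripted. The sketch below is a hypothetical helper, not part of the project: `toDockerUrl`, `toDockerConfig`, and the port-to-service table are assumptions based on the service names and ports in the note. It uses only the standard `URL` class.

```javascript
// Hypothetical helper: rewrite loopback backend URLs to Docker Compose service names.
// Assumption: port 5001 is served by "backend-stable" and 5002 by "backend-test".
const SERVICE_BY_PORT = { "5001": "backend-stable", "5002": "backend-test" };
const LOOPBACK_HOSTS = new Set(["127.0.0.1", "localhost"]);

function toDockerUrl(url) {
  const parsed = new URL(url);
  const service = SERVICE_BY_PORT[parsed.port];
  if (service && LOOPBACK_HOSTS.has(parsed.hostname)) {
    parsed.hostname = service;
  }
  // URL#toString() appends "/" for an empty path; drop it to match config.json style.
  return parsed.toString().replace(/\/$/, "");
}

// Return a copy of the config with both backend URLs rewritten; other fields are kept.
function toDockerConfig(config) {
  return {
    ...config,
    stable_url: toDockerUrl(config.stable_url),
    test_url: toDockerUrl(config.test_url),
  };
}
```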
To run the services without Docker, start each one in a separate terminal:

```bash
# Terminal 1 - Proxy
cd proxy
npm start
```

Runs on port 4000.

```bash
# Terminal 2 - Stable Backend
cd backend-stable
npm start
```

Runs on port 5001 (0% failure rate).

```bash
# Terminal 3 - Test Backend
cd backend-test
npm start
```

Runs on port 5002 (40% failure rate).
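The two backends differ mainly in their failure rate. Below is a minimal sketch of how a backend might simulate one; it is not the code in `backend-test/server.js`. The `makeHandler` name, the HTTP 500 status, and the JSON bodies are assumptions. The random source is injectable so the behavior can be tested deterministically.

```javascript
// Sketch of a request handler that fails a fixed fraction of requests.
// Assumption: failures are reported as HTTP 500 with a small JSON body.
function makeHandler(failureRate, random = Math.random) {
  return (req, res) => {
    // random() returns a value in [0, 1), so this is true with probability failureRate.
    const failed = random() < failureRate;
    res.writeHead(failed ? 500 : 200, { "Content-Type": "application/json" });
    res.end(JSON.stringify(failed ? { error: "Simulated failure" } : { status: "ok" }));
  };
}

// To serve it (not run here):
//   require("node:http").createServer(makeHandler(0.4)).listen(5002);
```

With `failureRate` set to `0.4` this approximates the test backend; `0` approximates the stable one.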
Check the current error rate and metrics:

```bash
curl http://127.0.0.1:4000/api/stats
```

Send a few requests through the proxy:

```bash
for i in {1..5}; do
  curl http://127.0.0.1:4000/api
  sleep 0.2
done
```

View today's request logs:

```bash
curl http://127.0.0.1:4000/api/logs
```

To test auto-rollback:

- Edit `proxy/config.json`: change `"mode": "stable"` to `"mode": "test"`.
- Make 50 requests:
  ```bash
  for i in {1..50}; do
    curl http://127.0.0.1:4000/api 2>/dev/null
    sleep 0.1
  done
  ```
- Watch Terminal 1 for the auto-rollback message.
- Check that the config was updated automatically:
  ```bash
  curl http://127.0.0.1:4000/api/config
  curl http://127.0.0.1:4000/api/rollback-history
  ```

Edit `proxy/config.json`:
```json
{
  "mode": "stable",
  "stable_url": "http://127.0.0.1:5001",
  "test_url": "http://127.0.0.1:5002",
  "canary_percent": 10
}
```

- `mode`: `stable`, `test`, or `canary`
- `canary_percent`: percentage of traffic sent to the test backend in canary mode

The proxy exposes the following API endpoints:

| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/stats` | Current error rate and metrics |
| GET | `/api/logs` | Today's request logs |
| GET | `/api/config` | Current configuration |
| POST | `/api/config` | Change mode (send `{"mode":"stable"}`) |
| GET | `/api/health` | System health status |
| GET | `/api/rollback-history` | Past rollback events |
| POST | `/api/rollback` | Trigger a manual rollback |
| POST | `/api/reset-stats` | Reset statistics |
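For scripting the control endpoints from Node.js rather than `curl`, a small client could look like the sketch below. The function names `setMode` and `triggerRollback` are hypothetical. The sketch assumes the proxy listens on `http://127.0.0.1:4000` and answers with JSON bodies, whose exact shape is not documented here. It uses the global `fetch` available in Node.js 18+, with an injectable `fetchImpl` so it can be tested without a running proxy.

```javascript
// Sketch of a client for the proxy's control API (endpoints from the table above).
// Assumptions: base URL http://127.0.0.1:4000, JSON request and response bodies.
const VALID_MODES = new Set(["stable", "test", "canary"]);

async function setMode(mode, { baseUrl = "http://127.0.0.1:4000", fetchImpl = globalThis.fetch } = {}) {
  // Reject unknown modes locally instead of sending them to the proxy.
  if (!VALID_MODES.has(mode)) {
    throw new Error(`Unknown mode "${mode}"; expected stable, test, or canary`);
  }
  const res = await fetchImpl(`${baseUrl}/api/config`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ mode }),
  });
  if (!res.ok) throw new Error(`POST /api/config failed with status ${res.status}`);
  return res.json();
}

async function triggerRollback({ baseUrl = "http://127.0.0.1:4000", fetchImpl = globalThis.fetch } = {}) {
  const res = await fetchImpl(`${baseUrl}/api/rollback`, { method: "POST" });
  if (!res.ok) throw new Error(`POST /api/rollback failed with status ${res.status}`);
  return res.json();
}
```

For example, `await setMode("canary")` is the scripted equivalent of sending `{"mode":"canary"}` to `POST /api/config`.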
The proxy supports three routing modes:

- Stable - All traffic to the production backend (port 5001)
- Test - All traffic to the test backend (port 5002, 40% failure rate)
- Canary - 90% to stable, 10% to test (configurable via `canary_percent`)
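Below is a minimal sketch of how a proxy might pick a backend for each request under these modes. The `pickBackend` name and the independent per-request random split in canary mode are assumptions. The actual logic lives in `proxy/server.js` and may differ, for example by routing each user consistently to one backend.

```javascript
// Choose the backend URL for one request, given a config shaped like proxy/config.json.
// Assumption: in canary mode each request goes to test independently
// with probability canary_percent / 100.
function pickBackend(config, random = Math.random) {
  switch (config.mode) {
    case "stable":
      return config.stable_url;
    case "test":
      return config.test_url;
    case "canary":
      // random() is in [0, 1), so with canary_percent = 10 about 10% of requests go to test.
      return random() * 100 < config.canary_percent ? config.test_url : config.stable_url;
    default:
      throw new Error(`Unknown mode: ${config.mode}`);
  }
}
```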
The system automatically switches to stable mode when the error rate exceeds 20%.

The threshold can be changed in `proxy/server.js`:

```javascript
const autoRollback = new AutoRollback(20); // Change 20 to the desired threshold (percent)
```
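The rollback rule can be sketched as a sliding-window tracker. This covers an error rate over the last 100 requests, as the Features list describes, and a rollback once it exceeds the threshold. The sketch is illustrative, not the code in `proxy/auto-rollback.js` or `proxy/error-tracker.js`. The class name `RollbackMonitor`, its methods, and the `minSamples` guard are hypothetical. The guard keeps a couple of early errors from triggering a rollback.

```javascript
// Sketch: track the outcome of the most recent requests and decide when to roll back.
// Assumptions: window of 100 requests, threshold in percent, and a
// hypothetical minSamples guard before any decision is made.
class RollbackMonitor {
  constructor(thresholdPercent = 20, windowSize = 100, minSamples = 20) {
    this.thresholdPercent = thresholdPercent;
    this.windowSize = windowSize;
    this.minSamples = minSamples;
    this.outcomes = []; // true = request failed
    this.failures = 0;
  }

  record(failed) {
    this.outcomes.push(failed);
    if (failed) this.failures++;
    // Evict the oldest outcome once the window is over capacity.
    if (this.outcomes.length > this.windowSize) {
      if (this.outcomes.shift()) this.failures--;
    }
  }

  errorRatePercent() {
    return this.outcomes.length === 0 ? 0 : (100 * this.failures) / this.outcomes.length;
  }

  // True when the proxy should switch config.mode back to "stable".
  shouldRollback() {
    return this.outcomes.length >= this.minSamples && this.errorRatePercent() > this.thresholdPercent;
  }
}
```

Against a backend that fails about 40% of requests, the 50-request test loop would on average produce about 20 failures. That is roughly a 40% error rate, well above the 20% threshold, so a rollback is expected to fire.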
## Features

- Real-time error rate tracking (last 100 requests)
- Automatic failover when errors exceed threshold
- JSON-based request logging with daily rotation
- RESTful API for monitoring and control
- Professional logging with [INFO], [ERROR], [SUCCESS], [ALERT] tags
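Below is a sketch of tagged console output plus JSON-based request logging with daily rotation. The file layout is an assumption: `logs/YYYY-MM-DD.jsonl`, one JSON object per line, named by UTC date. The function names and record fields are also hypothetical. `proxy/enhanced-logger.js` may organize its files and fields differently.

```javascript
// Sketch: tagged console lines plus one JSON record per request, in a file per UTC day.
const TAGS = new Set(["INFO", "ERROR", "SUCCESS", "ALERT"]);

function formatConsoleLine(tag, message, now = new Date()) {
  if (!TAGS.has(tag)) throw new Error(`Unknown log tag: ${tag}`);
  return `[${tag}] ${now.toISOString()} ${message}`;
}

// Daily rotation: the file name is the UTC date, so a new file starts at midnight UTC.
function dailyLogPath(now = new Date(), dir = "logs") {
  return `${dir}/${now.toISOString().slice(0, 10)}.jsonl`;
}

// One request record serialized as a single line of JSON.
function logEntry(method, path, backend, status, durationMs, now = new Date()) {
  return JSON.stringify({ time: now.toISOString(), method, path, backend, status, durationMs });
}

// Writing would then be, for example (not run here):
//   require("node:fs").appendFileSync(dailyLogPath(), logEntry("GET", "/api", "stable", 200, 12) + "\n");
```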
Key files:

- `proxy/server.js` - Main proxy server
- `proxy/enhanced-logger.js` - Request logging system
- `proxy/error-tracker.js` - Error rate tracking
- `proxy/auto-rollback.js` - Automatic failover logic
- `proxy/config.json` - Configuration file
- `backend-stable/server.js` - Stable backend
- `backend-test/server.js` - Test backend
We welcome contributions from developers of all skill levels! Whether you're fixing bugs, improving documentation, or adding features, your help is appreciated.
To add a feature:

1. Fork the repository
2. Create a new branch:
   ```bash
   git checkout -b feature/your-feature-name
   ```
3. Implement your feature
4. Ensure everything works as expected
5. Commit your changes:
   ```bash
   git commit -m "Add: short description of feature"
   ```
6. Push to your fork and open a Pull Request
To fix a bug:

1. Fork the repository
2. Create a branch:
   ```bash
   git checkout -b fix/issue-name
   ```
3. Fix the issue
4. Test thoroughly
5. Submit a Pull Request with a clear description
You can also contribute by improving the AI capabilities of the platform:

- Enhance RAG pipelines
- Improve log parsing and anomaly detection
- Optimize AI agent decision-making
- Add new recovery or rollback strategies
Good documentation is just as important as code! You can:

- Improve README clarity
- Add architecture explanations
- Fix typos or formatting
- Provide setup or deployment guides
General contribution guidelines:

- Follow the existing project structure
- Write clean, readable, and modular code
- Add comments where necessary
- Keep commits meaningful and concise
- Update documentation when required
Before submitting your PR, make sure:

- The project runs without errors
- Logs and monitoring features work correctly
- AI-based detection behaves as expected
- Rollback/recovery triggers properly
- No breaking changes are introduced
This project includes dashboards and UI components that should work across modern environments.

Recommended browsers:

- Google Chrome
- Mozilla Firefox
- Microsoft Edge
- Safari
Ensure your changes work across:

- Desktop
- Tablet
- Mobile

Helpful tools:

- Chrome DevTools Device Toolbar
- Firefox Responsive Design Mode
Some problems may arise due to:

- Cached assets
- Browser-specific rendering
- Unsupported APIs
- Extension conflicts
If something doesn't work:

- Hard refresh (Ctrl + Shift + R, or Cmd + Shift + R on macOS)
- Clear the cache
- Use Incognito/private browsing mode
- Disable extensions
- Check the browser console for errors
Make sure:

- Code is tested
- UI is responsive
- Features work as intended
- No console errors
- Documentation is updated
If you have questions, ideas, or run into issues, feel free to reach out:

- Discussions: Use GitHub Discussions to ask questions or share ideas
- Bug Reports: Open an Issue to report bugs or request features
- Direct Contact: For any queries, create an issue and we'll respond as soon as possible
- LinkedIn: Kumari Lucky Raj
- GitHub: kumariluckyraj
If this project helped you, please consider:

- Starring this repository
- Forking it to contribute
- Sharing it with others
- Following for more amazing projects