✨ aiNOC

🔭 Overview

AI-based network troubleshooting framework for multi-vendor, multi-protocol, multi-area/multi-AS, L2/L3 enterprise networks.

▫️ Key characteristics:

  • Multi-vendor support
  • Multi-protocol, L2/L3
  • Multi-area/multi-AS
  • SSH/eAPI/REST API
  • 15 MCP tools, 6 skills
  • 32 operational guardrails
  • Jira integration

▫️ Operating modes of aiNOC:

▫️ Important project files:

▫️ Agent guardrails list:

▫️ Supported models:

  • Haiku 4.5 (best for costs)
  • Sonnet 4.6 (best balance)
  • Opus 4.6 (default, best reasoning)

⚠️ NOTE: Because troubleshooting sessions are intermittent, it's worth using an advanced model by default. Costs should stay manageable even if you address and fix several issues per day.

▫️ Set your default model:
Create settings.json under .claude/:

{
  "model":"opus",
  "effortLevel":"medium"
}

▫️ High-level architecture:

arch

Here's a Quick Demo

  • See a DEMO HERE of v3.0.
    • Next video demo coming soon with v5.0

♻️ Repository Lifecycle

New features are being added periodically (vendors, protocols, integrations, etc.).

Stay up-to-date:

  • Watch and Star this repository

Current version:

  • aiNOC v4.5

⭐ What's New in v4.5

βš’οΈ Current Tech Stack

Tool
Claude Code βœ“
MCP (FastMCP) βœ“
ContainerLab βœ“
Python βœ“
Scrapli βœ“
Genie βœ“
REST API βœ“
EOS eAPI βœ“
Jira API βœ“
Vector βœ“
Ubuntu βœ“
VS Code βœ“
VirtualBox/VMware βœ“

📋 Supported Vendors

| Vendor | Platform |
| --- | --- |
| Arista | EOS (cEOS) |
| Cisco | IOS/IOS-XE (IOL) |
| MikroTik | RouterOS |

🚛 Supported Transports

| Vendor | Transport |
| --- | --- |
| Cisco IOS | Scrapli SSH |
| Arista EOS | Arista eAPI |
| MikroTik RouterOS | REST API |
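
The vendor-to-transport pairing listed above can be expressed as a small lookup. This is an illustrative sketch only: `TRANSPORTS` and `transport_for` are hypothetical names and are not taken from the aiNOC codebase.

```python
# Hypothetical lookup mirroring the Supported Transports list.
TRANSPORTS = {
    "cisco": "Scrapli SSH",
    "arista": "Arista eAPI",
    "mikrotik": "REST API",
}


def transport_for(vendor: str) -> str:
    """Return the transport for a vendor name, matched case-insensitively."""
    key = vendor.strip().lower()
    if key not in TRANSPORTS:
        raise ValueError(f"unsupported vendor: {vendor!r}")
    return TRANSPORTS[key]
```

Raising on an unknown vendor, rather than falling back to a default transport, keeps a misnamed device from being contacted over the wrong protocol.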

🎓 Troubleshooting Scope

| Category | Capabilities |
| --- | --- |
| OSPF | Reference bandwidth · Point-to-point links · Passive interfaces · MD5 authentication · External type 1 routes · Default route injection · ABR route summarization · EIGRP ↔ OSPF redistribution · Prefix list filtering · Distribute list filtering · Area types: normal, stubby, totally NSSA |
| EIGRP | Passive interfaces · MD5 authentication · Stub summary · OSPF ↔ EIGRP redistribution · Default metric via route maps |
| BGP | eBGP dual-ISP · Default-originate · Prefix lists and route maps · Route reflectors and clients |
| Others | Policy-Based Routing · IP SLA · MikroTik Netwatch · Arista Connectivity Monitor · NAT/PAT on ASBRs · Management APIs · Static routing · Syslog · NTP |

🛠️ Installation & Usage

▫️ Step 1:

git clone https://github.com/pdudotdev/aiNOC/
cd aiNOC
python3 -m venv mcp
source mcp/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

▫️ Step 2: The included CLAUDE.md and skills/* are templates. Customize them with your own troubleshooting methodology, tool descriptions, and operational guidelines.

▫️ Step 3:

  • Configure IP SLA, Connectivity Monitor, Netwatch etc. paths in your network
  • Make sure they are being tracked and logged remotely to Vector (Syslog)
  • Configure the transforms inside /etc/vector/vector.yaml - example
  • aiNOC monitors Vector's /var/log/network.json file for specific logs and parses them
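
As a rough illustration of the last bullet, the sketch below filters failure events out of a log file. It rests on assumptions that the README does not confirm: that Vector writes one JSON object per line, that the log text lives in a `message` field, and that a failure line contains the word "Down". The names `DOWN_PATTERN` and `failure_events` are hypothetical and do not come from `oncall/watcher.py`.

```python
import json
import re
from typing import Iterable, Iterator

# Assumption: a failure line contains the word "Down". Adjust the pattern to
# whatever your devices and Vector transforms actually emit.
DOWN_PATTERN = re.compile(r"\bdown\b", re.IGNORECASE)


def failure_events(lines: Iterable[str], field: str = "message") -> Iterator[dict]:
    """Yield parsed JSON events whose `field` text matches DOWN_PATTERN."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partially written or malformed lines
        if not isinstance(event, dict):
            continue  # only JSON objects are treated as events
        if DOWN_PATTERN.search(str(event.get(field, ""))):
            yield event
```

A caller could pass an open file handle, for example `failure_events(open("/var/log/network.json"))`, since a text file iterates line by line.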

▫️ Step 4: Run the aiNOC watcher β€” two modes:

⌨️ Interactive (dev/testing): runs in your current terminal, agent sessions open inline.

python3 oncall/watcher.py

♻️ Service (production): install once, runs permanently, survives reboots. Each agent session spawns in a tmux window β€” attach with tmux attach -t <session_name>.

sudo apt install tmux
sudo cp oncall/oncall-watcher.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now oncall-watcher.service

Manage with: systemctl start|stop|restart|status oncall-watcher

▫️ Step 5: Check if Watcher and Vector are running:

sudo systemctl status vector
python3 oncall/watcher.py
ainoc.watcher β€” Watcher started. Monitoring /var/log/network.json for IP SLA Down events.

or (if installed as a systemd service):

sudo systemctl status vector
sudo systemctl status oncall-watcher.service

🔄 Test Network Topology

▫️ Network diagram:

[Image: test network topology diagram]

▫️ Naming conventions:

  • RXY where:
    • R: device type (router)
    • X: device number id
    • Y: vendor (A-Arista, C-Cisco, M-MikroTik, etc.)
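
The RXY convention is regular enough to parse mechanically. The sketch below is a hypothetical helper, not part of aiNOC: `parse_device_name`, `DeviceName`, and `VENDOR_CODES` are illustrative names, and only the three vendor letters listed above are mapped. Any other letter is reported as "unknown".

```python
import re
from typing import NamedTuple

# Vendor letters taken from the naming convention; others are not assumed.
VENDOR_CODES = {"A": "Arista", "C": "Cisco", "M": "MikroTik"}

# R, then one or more digits (device number), then one vendor letter.
NAME_RE = re.compile(r"^R(\d+)([A-Z])$")


class DeviceName(NamedTuple):
    device_type: str
    number: int
    vendor: str


def parse_device_name(name: str) -> DeviceName:
    """Split a name such as 'R3C' into device type, number, and vendor."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"name does not follow the RXY convention: {name!r}")
    number, code = match.groups()
    return DeviceName("router", int(number), VENDOR_CODES.get(code, "unknown"))
```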

▫️ Router configurations:

  • Please find my test lab's config files under the lab_configs directory
  • They are the network's fallback configs for containerlab redeploy -t lab.yml
  • Default credentials: see .env file at .env.example

πŸ“ž aiNOC Operating Modes

aiNOC runs as an On-Call watcher that monitors Vector's /var/log/network.json for SLA path failures and automatically invokes a Claude agent to diagnose the issue and propose a fix.

How It Works

  1. Network devices track connectivity paths (Cisco IP SLA, Arista Connectivity Monitor, MikroTik Netwatch etc.)
  2. Failures are logged to Syslog → Vector parses and writes to /var/log/network.json
  3. oncall/watcher.py detects the failure, opens a Jira ticket, and invokes a Claude agent session
  4. Agent follows structured troubleshooting (CLAUDE.md + /skills + MCP tools) → identifies root cause → proposes fix
  5. Only upon operator approval, the agent applies and verifies the fix
  6. Results are logged to Jira and the watcher resumes monitoring
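
The control flow of steps 3 to 6, in particular the operator approval gate, can be sketched as follows. This is not the watcher's code: `handle_failure`, `Case`, and the three callables are hypothetical stand-ins, and no Jira or Claude API is called. The log list stands in for the ticket history.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Case:
    event: str
    log: List[str] = field(default_factory=list)  # stand-in for ticket comments
    fix_applied: bool = False


def handle_failure(
    event: str,
    diagnose: Callable[[str], str],    # returns a proposed fix
    approve: Callable[[str], bool],    # operator decision on the proposal
    apply_fix: Callable[[str], bool],  # applies the fix, returns True if verified
) -> Case:
    """Run one case: diagnose, ask for approval, then apply only if approved."""
    case = Case(event=event)
    case.log.append(f"ticket opened for: {event}")
    proposal = diagnose(event)
    case.log.append(f"proposed fix: {proposal}")
    if not approve(proposal):
        case.log.append("fix rejected by operator; nothing applied")
        return case
    case.fix_applied = apply_fix(proposal)
    case.log.append("fix verified" if case.fix_applied else "fix failed verification")
    return case
```

Passing the approval step in as a callable makes the gate explicit: `apply_fix` is unreachable unless `approve` returns `True`.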

Deployment Options

| Mode | Command | Agent Sessions | Best For |
| --- | --- | --- | --- |
| Interactive | python3 oncall/watcher.py | Inline (current terminal) | Development, testing |
| Service | systemctl start oncall-watcher | tmux (detached, attach anytime) | Production |

▫️ See Installation & Usage for setup instructions.

Storm Prevention

Only one agent session runs at a time. Concurrent SLA failures during an active session are deferred and presented for review after the current case closes. A drain mechanism ensures no duplicate event processing. A process-level lock file (oncall.lock) with stale-PID detection prevents duplicate watcher instances.
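
The two mechanisms described here, a lock file with stale-PID detection and deferral of events during an active session, might look roughly like the sketch below. It is an assumption-laden illustration, not the watcher's implementation: `pid_alive`, `acquire_lock`, and `SessionGate` are hypothetical names, the lock file is assumed to contain only a PID, and the liveness check is POSIX-only.

```python
import os
from collections import deque
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without signalling
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def acquire_lock(path: Path) -> bool:
    """Claim the lock file, replacing it when the recorded PID is stale."""
    if path.exists():
        try:
            owner = int(path.read_text().strip())
        except ValueError:
            owner = 0  # unreadable content is treated as stale
        if owner > 0 and owner != os.getpid() and pid_alive(owner):
            return False  # another live watcher holds the lock
    path.write_text(str(os.getpid()))
    return True


class SessionGate:
    """Allow one active session and queue events that arrive meanwhile."""

    def __init__(self) -> None:
        self.active = False
        self.deferred: deque = deque()

    def submit(self, event: str) -> bool:
        """Return True if a session may start now, else defer the event."""
        if self.active:
            if event not in self.deferred:  # avoid queuing duplicates
                self.deferred.append(event)
            return False
        self.active = True
        return True

    def close(self) -> list:
        """End the session and return the deferred events for review."""
        self.active = False
        pending = list(self.deferred)
        self.deferred.clear()
        return pending
```

Note that the check-then-write in `acquire_lock` is not atomic, so two processes starting at the same instant could both succeed. A stricter variant could create the file with `os.open` and the `O_CREAT | O_EXCL` flags.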

⬆️ Planned Upgrades

Expected in version v5.0:

  • Fresh, enterprise-focused topology
  • New vendors (Juniper, Aruba, SONiC)
  • New protocols and services
  • NetBox integration

🌱 AI Automation 101

If you're completely new to Network Automation using AI & MCP, then you may want to start here before moving on.

📄 Disclaimer

You are responsible for defining your own troubleshooting methodologies and context files, as well as building your own test environment and meeting the necessary conditions (e.g., RAM/vCPU, router OS images, Claude subscription/API key, etc.).

📜 License

Licensed under the GNU General Public License, Version 3.

📧 Collaborations

Interested in customizing and adapting aiNOC to your own network, or looking to collaborate long-term?

About

aiNOC: Network troubleshooting framework for multi-vendor, multi-protocol, multi-area/multi-AS, OSI L2-L4 enterprise networks based on Claude Code, FastMCP, Python, Scrapli, REST, Containerlab, Jira, etc.
