microsoft/Triangle

Triangle: Empowering Incident Triage with Multi-Agents

Triangle is an end-to-end incident triage system using multiple LLM agents to route incidents to appropriate teams. It addresses challenges in cloud service incident management through semantic distillation and multi-role agent negotiation. Experiments show Triangle improves triage accuracy by over 20% while reducing response time. The system has been successfully deployed at a leading technology company serving millions of users.

🌟 Overview

This project implements an intelligent incident triage system using multiple Large Language Model (LLM) agents to analyze and route incidents to the most suitable teams. The system combines TF-IDF similarity matching with semantic analysis to achieve accurate incident assignment. By leveraging both rule-based and machine-learning components, Triangle ensures adaptability, scalability, and continuous improvement over time.

Key Goals

  1. Efficiency: Reduce incident response time and streamline communication across different teams.
  2. Accuracy: Increase correct team assignment by understanding the semantic context of incoming incidents.
  3. Scalability: Seamlessly integrate new capabilities and scale to handle an increasing number of incidents.
  4. Extensibility: Allow new routing policies, data sources, and integration points to be added with minimal overhead.

🚀 Features

  • Multi-Agent Architecture
    • Triage Decider: Makes final routing decisions.
    • Team Manager: Handles team information and negotiations.
    • Analyzer: Performs semantic analysis and TF-IDF matching.
  • Intelligent Matching
    • TF-IDF based similarity scoring.
    • Semantic analysis of incident descriptions.
    • Multi-hop routing capability for complex incident redirections.
    • Team function phrase matching for increased accuracy.
  • Performance Tracking
    • Real-time accuracy monitoring through dashboards.
    • Detailed logging of decisions for post-mortem analysis.
    • Result analysis and visualization for iterative improvements.
  • Confidence Estimation
    • Confidence scores for each triage decision.
    • Threshold-based auto-assignment or manual review process.
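The threshold-based step listed under Confidence Estimation could look roughly like the sketch below. The names `TriageDecision`, `route`, and `AUTO_ASSIGN_THRESHOLD`, the 0.8 cut-off, and the assumption that confidence lies in [0, 1] are all illustrative and not taken from the Triangle code.

```python
from dataclasses import dataclass

# Hypothetical cut-off; the README does not state the value Triangle uses.
AUTO_ASSIGN_THRESHOLD = 0.8


@dataclass
class TriageDecision:
    """Illustrative container for one triage outcome (not Triangle's own type)."""
    incident_id: str
    team: str
    confidence: float  # assumed to lie in [0, 1]


def route(decision: TriageDecision, threshold: float = AUTO_ASSIGN_THRESHOLD) -> str:
    """Return 'auto_assign' when confidence meets the threshold, else 'manual_review'."""
    if not 0.0 <= decision.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return "auto_assign" if decision.confidence >= threshold else "manual_review"
```

A decision with confidence exactly equal to the threshold is auto-assigned here; whether the boundary should be inclusive is a policy choice.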

⚙️ Architecture

The system consists of three main components, each performing specialized tasks to ensure consistency and efficiency:

  1. Triage Decider (TriageDecider)

    • Gathers incident data from the Analyzer and Team Manager.
    • Matches incidents with relevant teams based on confidence scores.
    • Provides traceable reasoning for each routing decision.
  2. Team Manager (TeamManager)

    • Maintains team capability profiles and summary key phrases.
    • Negotiates and escalates incidents when multiple teams are possible matches.
    • Ensures that team availability and load constraints are respected.
  3. Analyzer (Analyser)

    • Performs TF-IDF analysis to derive initial similarity ranks.
    • Conducts semantic distillation of incident data.
    • Merges findings to produce final similarity and confidence metrics.
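To make the Analyzer's TF-IDF step concrete, here is a minimal, standard-library-only sketch that ranks teams by cosine similarity between an incident message and each team's key phrases. It is an illustration under assumptions, not the repository's implementation: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_teams`), the smoothed IDF formula, and the choice to treat each team's joined key phrases as one document are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (count / len(tokens)) * idf[t] for t, count in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_teams(message, teams):
    """Rank teams (person.json entries) by similarity to the incident message."""
    team_docs = [" ".join(t["summary_key_phrases"]) for t in teams]
    vectors = tfidf_vectors(team_docs + [message])
    incident_vec = vectors[-1]
    # zip stops after the last team, so the incident vector itself is skipped.
    scores = [(t["name"], cosine(incident_vec, v)) for t, v in zip(teams, vectors)]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

In Triangle, a ranking like this would only be the initial signal; the README states that it is merged with semantic distillation before a final confidence is produced.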

⏱ Performance Tracking and Metrics

Triangle continuously monitors performance indicators to evaluate its effectiveness:

| Metric | Description |
| --- | --- |
| Accuracy | Percentage of correct team assignments |
| Response Time | Average triage completion time |
| Escalation Rate | Frequency of manual interventions or reassignments |
| Confidence Score Mean | Average confidence for automated triage decisions |

By measuring these metrics over time, Triangle helps identify improvements and ensure consistent, data-driven enhancements to incident triage workflows.
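As a rough illustration of how these four metrics could be computed from per-incident results, consider the sketch below. The record fields (`predicted_team`, `actual_team`, `triage_seconds`, `escalated`, `confidence`) and the function name `summarize` are assumptions made for this example; the README does not specify the format of Triangle's result records.

```python
from statistics import mean


def summarize(results):
    """Compute the four tracked metrics from a list of per-incident dicts.

    Each dict is assumed (hypothetically) to contain: predicted_team,
    actual_team, triage_seconds, escalated (bool), and confidence.
    """
    if not results:
        raise ValueError("no results to summarize")
    total = len(results)
    return {
        "accuracy": sum(r["predicted_team"] == r["actual_team"] for r in results) / total,
        "avg_response_time_s": mean(r["triage_seconds"] for r in results),
        "escalation_rate": sum(r["escalated"] for r in results) / total,
        "mean_confidence": mean(r["confidence"] for r in results),
    }
```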

📋 Requirements

  • Python 3.8+
  • Azure OpenAI API access
  • Required Python packages (see requirements.txt)

Recommended Environment

  • A stable internet connection for reliable LLM access.
  • Sufficient resource allocation (CPU/Memory) for larger incident volumes.

🛠️ Installation & Setup

  1. Clone the repository

    git clone <url>
    cd Triangle
  2. Setting Up the Virtual Environment

    Follow these steps to create and activate a virtual environment, then install the required packages from requirements.txt.

Prerequisites

  • Python Installation: Ensure Python is installed on your system. You can download it from the official Python website.
  • Pip Verification: Verify pip is installed by running pip --version in your terminal or command prompt.

Steps

  1. Create a Virtual Environment

    Open your terminal (Linux/Mac) or command prompt (Windows) and navigate to your project directory:

    cd /path/to/your/project

    Create a virtual environment named venv:

    python -m venv venv
  2. Activate the Virtual Environment

    • Windows:
      .\venv\Scripts\activate
    • Linux/Mac:
      source venv/bin/activate

    After activation, your terminal prompt will change to indicate that the virtual environment is active.

  3. Install Required Packages

    With the virtual environment activated, install the dependencies listed in requirements.txt:

    pip install -r requirements.txt

    This command will read the requirements.txt file and install all necessary packages.

  4. Configure Azure OpenAI Credentials

    Create a config.json file with your Azure OpenAI credentials:

    {
        "ENDPOINT_URL": "your_azure_endpoint",
        "DEPLOYMENT_NAME": "your_deployment_name",
        "API_VERSION": "your_api_version",
        "API_KEY": "your_api_key"
    }
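Before starting a run, it can help to check that config.json contains all four keys shown above. The following is a minimal sketch using only the standard library; `load_config` and `REQUIRED_KEYS` are hypothetical names for this example, and how triangle.py actually reads the file may differ.

```python
import json
from pathlib import Path

# The four keys shown in the config.json example above.
REQUIRED_KEYS = ("ENDPOINT_URL", "DEPLOYMENT_NAME", "API_VERSION", "API_KEY")


def load_config(path="config.json"):
    """Load the credentials file and fail early if a key is missing or empty."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"{path}: missing or empty keys: {', '.join(missing)}")
    return config
```

Because config.json holds an API key, it is usually best kept out of version control.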

📊 Data Format

Team Data (person.json)

[
    {
        "name": "team_name",
        "summary_key_phrases": ["key_phrase1", "key_phrase2", ...]
    }
]

Incident Data (dataset.json)

[
    {
        "case": "incident_id",
        "message": "incident_description",
        "last_person": "assigned_team"
    }
]
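A quick sanity check of both files before a run might look like the sketch below. The helper names are hypothetical, and the check that `last_person` matches a team name in person.json is an assumption based on the placeholder value `assigned_team`; the repository may treat these fields differently.

```python
import json


def load_json(path):
    """Read a JSON file such as person.json or dataset.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def validate_teams(teams):
    """Check each team entry and return the set of team names."""
    for i, team in enumerate(teams):
        name = team.get("name")
        phrases = team.get("summary_key_phrases")
        if not isinstance(name, str) or not name:
            raise ValueError(f"team #{i}: 'name' must be a non-empty string")
        if not isinstance(phrases, list) or not all(isinstance(p, str) for p in phrases):
            raise ValueError(f"team {name!r}: 'summary_key_phrases' must be a list of strings")
    return {team["name"] for team in teams}


def validate_incidents(incidents, team_names):
    """Check required fields; return case IDs whose label is not a known team."""
    unknown = []
    for i, incident in enumerate(incidents):
        for field in ("case", "message", "last_person"):
            if not isinstance(incident.get(field), str):
                raise ValueError(f"incident #{i}: '{field}' must be a string")
        # Assumption: 'last_person' should name a team defined in person.json.
        if incident["last_person"] not in team_names:
            unknown.append(incident["case"])
    return unknown
```

Typical use would be `names = validate_teams(load_json("person.json"))` followed by `validate_incidents(load_json("dataset.json"), names)`, then inspecting any returned case IDs.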

Advanced Topics for Data Management

  • Data Versioning: Use Git LFS or specialized tools to manage large datasets and historical changes.
  • Privacy & Security: Ensure that only sanitized or anonymized data is shared where needed, and follow your organization's data handling policies.

🎯 Usage

Run the main triage system:

python triangle.py

You can customize parameters in triangle.py to adjust agent behaviors, logging levels, or threshold settings for confidence scores.

📈 Results

When the triage process completes, the system generates detailed results in the results directory, including:

  • Assignment decisions for each incident
  • Confidence scores
  • Performance metrics
  • Routing paths

Review these logs continuously to identify recurring issues and potential improvements in your triage logic.

📖 Contributing

We welcome contributions from the community to make Triangle even better:

  1. Fork the Repo and create your branch from main.
  2. Implement Features or bug fixes in alignment with the project's guidelines.
  3. Open a Pull Request, detailing your changes, improvements, and testing for easy review.

❓ FAQ

| Question | Answer |
| --- | --- |
| How do I add a new team? | Add a new JSON object in person.json and include relevant key phrases that describe the team's domain. |
| How do I retrain or refine the model? | Update your training scripts using new incident data, then adjust the Analyzer module accordingly. |
| Is on-prem deployment supported? | Yes, you can run Triangle self-hosted, but you need a stable internal environment for the LLM API. |
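For the first question, adding a team can also be scripted rather than done by hand. The helper below is a hypothetical convenience, not part of the repository; it appends an entry in the person.json format and rejects duplicate names.

```python
import json
from pathlib import Path


def add_team(path, name, key_phrases):
    """Append a team entry to the team data file, rejecting duplicate names."""
    file = Path(path)
    teams = json.loads(file.read_text(encoding="utf-8"))
    if any(team["name"] == name for team in teams):
        raise ValueError(f"team {name!r} already exists")
    teams.append({"name": name, "summary_key_phrases": list(key_phrases)})
    file.write_text(json.dumps(teams, indent=4, ensure_ascii=False), encoding="utf-8")
    return teams
```

For example, `add_team("person.json", "billing", ["invoice errors", "payment failures"])` would add a team named `billing` (an invented example name) with two key phrases.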

🔎 Limitations & Future Work

  1. Language Coverage: While the system supports English data, non-English data may require additional adjustments in the Analyzer.
  2. Contextual Knowledge: Domain-specific knowledge bases can help enrich the semantic matching but are currently not fully integrated.
  3. LLM Dependence: Triage decisions depend on the accuracy, availability, and cost of LLM services.
  4. Future Enhancements: Plans include adding advanced multi-modal interfaces (voice, images) and incorporating user feedback loops for continuous learning.
