SAFE is a system for contextual security auditing of research artifacts. It takes outputs from static analysis tools (e.g., Semgrep, Trivy) and uses a repository-aware LLM agent to determine whether a finding is truly exploitable in real-world usage.
The system uses a ReAct-style agent powered by LangGraph, equipped with filesystem tools to inspect code, trace dependencies, and reason about execution context.
SAFE/
│── main.py # Entry point
│── config.yaml # Configuration file
│── requirements.txt # Dependencies
│── SAFE_TOOL_OUTPUT.csv # Tool (Semgrep and Trivy) output for SAFE (its own codebase)
│
├── core/ # Core reasoning & pipeline logic
│ ├── auditor.py # Main auditing pipeline
│ ├── validator.py # Validation logic
│ ├── schemas.py # Data schemas
│ └── logger.py # Logging utilities
│
├── tools/ # Static & structural analysis tools
│ ├── repo_parser.py # Repository structure parsing
│ ├── ast_parser.py # Code-level AST analysis
│ ├── dependency_analyzer.py # Analyzes dependencies between modules and files
│ ├── file_reader.py # Handles file loading and content extraction
│ ├── code_search.py # Enables keyword and pattern-based code search
│ └── artifact_resolver.py # Resolves artifacts and links related components
│
├── llm/ # LLM interaction layer
│ ├── provider.py # LLM abstraction
│ └── cost_tracker.py # Token & cost tracking
│
├── outputs/ # Generated outputs from SAFE runs
│ ├── final_results.csv # Final classified findings/results
│ ├── costs/ # Cost tracking outputs
│ │ └── *.json # Token usage and cost logs
│ └── logs/ # Execution logs (generated per run)
│ └── *.log # Detailed execution traces
├── ARTIFACT
│ └── SAFE/ # Self-contained sample artifact used for testing SAFE on its own codebase
- Setup Repository

  ```bash
  git clone https://github.com/nanda-rani/SAFE.git
  cd SAFE
  ```

- Install Virtual Environment

  ```bash
  python -m venv .venv
  source .venv/bin/activate
  pip install -r requirements.txt
  ```

- Configure API Keys
  You must export the relevant API key for the model you intend to use.

  ```bash
  export OPENAI_API_KEY="sk-..."
  ```
- Data Placement
  - Modify `config.yaml` to point to the CSV file containing the findings/flags obtained from the static analysis tools (`SAFE_TOOL_OUTPUT.csv` for the demo). `SAFE_TOOL_OUTPUT.csv` is the collection of flags produced by Semgrep and Trivy.
  - Place the repositories to analyze in the `ARTIFACT/` folder. The structure maps each `artifact_id` to a directory (e.g. `ARTIFACT/<artifact_id>/`). For the demo, the `ARTIFACT/` folder contains a copy of the SAFE codebase.

- Configuration Settings
  Edit `config.yaml` to select the model string, paths, and the maximum retry count for schema validation.
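For reference, a minimal `config.yaml` might look like the sketch below. `input_csv` and `artifact_root` match the demo settings used elsewhere in this README; the remaining key names are illustrative and may differ from SAFE's actual schema:

```yaml
# Path to the static-analysis findings CSV
input_csv: SAFE_TOOL_OUTPUT.csv
# Root folder containing one directory per artifact_id
artifact_root: ARTIFACT/
# Model string passed to the LLM provider (key name illustrative)
model: gpt-4o-mini
# Maximum retries when schema validation fails (key name illustrative)
max_validation_retries: 3
```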
- Run the auditor after completing setup

  ```bash
  python main.py
  ```

- Check logs and outputs

  ```bash
  cat outputs/logs/system.log
  cat outputs/logs/<finding_uid>.json
  cat outputs/logs/repo_<artifact_id>.log
  ```
This repository already includes a ready-to-run demo setup so you can test SAFE without preparing any external data.

- `SAFE_TOOL_OUTPUT.csv` → findings generated from SAFE's own codebase (via Semgrep + Trivy)
- `ARTIFACT/SAFE/` → a copy of the SAFE repository used as the target artifact
- `artifact_id` in the CSV (`SAFE`) resolves to `ARTIFACT/SAFE/`
Run the demo with:

```bash
python main.py
```

To run SAFE on your own data, place the target repository under `ARTIFACT/`:

```
ARTIFACT/
└── my_project/
```

Generate findings with the static analysis tools:

```bash
semgrep scan --config auto --json > semgrep.json
trivy fs --format json --output trivy.json .
```

Example entry in the findings CSV (semicolon-separated):

```
artifact_id;tool;finding_id;category;severity_raw;file;line;message;package;version;cwe;cvss;scanner_applicable
my_project;semgrep;python.lang.security.eval;code;HIGH;app.py;10;Use of eval;;;"CWE-95";;true
```
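The findings CSV is semicolon-delimited and can be read with the standard `csv` module. The `load_findings` helper below is an illustrative sketch, not SAFE's actual loader:

```python
import csv
import io

# Findings CSVs use ';' as the delimiter; empty fields (package, version, ...) are allowed.
SAMPLE = """artifact_id;tool;finding_id;category;severity_raw;file;line;message;package;version;cwe;cvss;scanner_applicable
my_project;semgrep;python.lang.security.eval;code;HIGH;app.py;10;Use of eval;;;"CWE-95";;true
"""

def load_findings(text):
    """Parse semicolon-separated findings into dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))

findings = load_findings(SAMPLE)
print(findings[0]["artifact_id"], findings[0]["severity_raw"], findings[0]["cwe"])
```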
Edit `config.yaml`:

```yaml
input_csv: my_project_findings.csv
artifact_root: ARTIFACT/
```

Then run:

```bash
python main.py
```

Outputs are written to:

- `outputs/final_results.csv`
- `outputs/logs/`
- `outputs/costs/`

Each `artifact_id` in the CSV must map to a directory `ARTIFACT/<artifact_id>/`.
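The `artifact_id` lookup amounts to a directory check under the artifact root. `resolve_artifact` is a hypothetical sketch of what `tools/artifact_resolver.py` does; the real implementation may differ:

```python
import tempfile
from pathlib import Path

def resolve_artifact(artifact_root, artifact_id):
    """Map a CSV artifact_id to its repository directory under the root, or None if absent."""
    candidate = Path(artifact_root) / artifact_id
    return candidate if candidate.is_dir() else None

# Demo with a throwaway ARTIFACT/ layout containing one artifact.
root = Path(tempfile.mkdtemp())
(root / "SAFE").mkdir()
found = resolve_artifact(root, "SAFE")
missing = resolve_artifact(root, "not_there")  # triggers the "missing artifact" skip in SAFE
```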
- Initialization: The script reads findings from `SAFE_TOOL_OUTPUT.csv`.
- Artifact Resolution: For each row, the script resolves the `artifact_id` against the `ARTIFACT/` root to locate the local Git repository matching the finding.
- Agent Delegation: The finding metadata (file, message, severity, line) is passed to the LangGraph ReAct agent.
- Tool Use: The agent dynamically executes local Python functions exposed to it (`get_repo_tree`, `read_snippet`, `search_package_usage`, etc.), inspecting the filesystem for evidence.
- JSON Emission: The agent concludes its analysis loop by emitting a final, strict JSON evaluation that maps the finding into the standard taxonomy (e.g. `CONTEXTUAL_RISK` vs `FALSE_POSITIVE`).
- Validation: Pydantic logic strictly enforces that the final payload complies with the schema. If it does not, the LangGraph loop repeats, asking the LLM to fix the formatting.
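The validate-and-retry loop can be approximated as follows. SAFE uses Pydantic for the actual schema check; this sketch uses only the standard library, and the verdict set and function names are illustrative:

```python
import json

# Illustrative subset of the taxonomy labels; SAFE's full taxonomy may contain more.
ALLOWED_VERDICTS = {"CONTEXTUAL_RISK", "FALSE_POSITIVE"}

def validate_payload(raw):
    """Return (payload, None) if the agent's JSON is valid, else (None, error message)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc}"
    if data.get("verdict") not in ALLOWED_VERDICTS:
        return None, f"verdict must be one of {sorted(ALLOWED_VERDICTS)}"
    return data, None

def audit_with_retries(call_llm, max_retries=3):
    """Re-prompt the model with the validation error until the payload passes or retries run out."""
    feedback = None
    for _ in range(max_retries):
        payload, feedback = validate_payload(call_llm(feedback))
        if payload is not None:
            return payload
    return None

# Fake model: emits an out-of-taxonomy verdict first, then a valid one after feedback.
replies = iter(['{"verdict": "MAYBE"}', '{"verdict": "FALSE_POSITIVE"}'])
result = audit_with_retries(lambda feedback: next(replies))
```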
All outputs from the framework are routed into the `outputs/` directory; logs and cost records are created dynamically at both the global and per-finding level, with concurrent file locks guarding shared files.

- `outputs/logs/system.log`: Standard operational log covering tool run initialization, finding traversal, schema retries, etc.
- `outputs/logs/error.log`: Isolates stack traces and validation aborts.
- `outputs/logs/<finding_uid>.log`: Detailed debug log tracing the exact LangGraph inputs, raw node chains, precise tool parameters, and tool return data for a single finding.
- `outputs/logs/<finding_uid>.json`: The final validated Pydantic payload returned upon successfully analyzing a finding.
- `outputs/costs/global_costs.json`: An append-only aggregate of the combined total cost in dollars, calculated for the configured LLM.
- `outputs/costs/<finding_uid>_cost.json`: The isolated USD cost, measured from the prompt/completion token usage for that single finding.
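A per-finding cost is derived from token counts and per-token pricing. The sketch below uses hypothetical per-million-token prices; real rates depend on the provider and the model selected in `config.yaml`:

```python
# Hypothetical (prompt, completion) USD prices per 1M tokens; check your provider's rate card.
PRICES_PER_M = {"gpt-4o-mini": (0.15, 0.60)}

def finding_cost(model, prompt_tokens, completion_tokens):
    """USD cost of investigating one finding, from its token usage."""
    p_in, p_out = PRICES_PER_M[model]
    return (prompt_tokens * p_in + completion_tokens * p_out) / 1_000_000

# Example: a long repo-inspection loop with a short final verdict.
cost = finding_cost("gpt-4o-mini", prompt_tokens=120_000, completion_tokens=4_000)
```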
- Missing Artifact: If the log states "Skipping due to missing artifact path", ensure the `artifact_id` column in your CSV matches a folder name directly inside `ARTIFACT/`.
- API Timeout/Limits: Switch to a smaller model (e.g. `gpt-4o-mini`) via `config.yaml` if you are hitting aggressive rate limits while the agent queries the model concurrently.
- Missing API Keys: The system fails immediately at startup. Set your keys via `export` before invoking.