ishitharaj/CVE-analyzer

CVE Analyzer

Check whether a CVE or GHSA actually affects your application code — not just whether a dependency appears in a lockfile.

  • Go backend with embedded React UI
  • OSV / NVD for advisory data
  • Static analysis for imports and vulnerable API usage
  • OpenAI-compatible LLM for final impact judgment

Quick start

Prerequisites

  • Go 1.22+
  • Node.js 18+ (build UI once)
  • A local copy of the repo to scan (download or extract yourself; the tool does not clone remotes)

Setup

cp .env.example .env
# Edit REPO_PATH, REPO_BRANCH, LLM_API_KEY, LLM_MODEL
make build
make run

Open http://localhost:8080, enter a CVE or GHSA ID (e.g. CVE-2025-64718), click Analyze.

The UI supports English and Русский (language switcher in the top-right).

Docker

Build and run with your repository mounted at /repo:

cp .env.example .env
# Set LLM_API_KEY, REPO_BRANCH, etc. REPO_PATH is overridden in compose to /repo

export REPO_HOST_PATH=/absolute/path/to/your/project
docker compose up --build

Or build the image only:

docker build -t cve-analyzer .
docker run --rm -p 8080:8080 --env-file .env \
  -e REPO_PATH=/repo \
  -v /absolute/path/to/your/project:/repo:ro \
  cve-analyzer

git is included in the image for optional branch checkout when .git exists in the mounted repo.

Example .env

REPO_PATH=/path/to/your/project
REPO_BRANCH=main
LLM_BASE_URL=https://api.openai.com/v1
LLM_API_KEY=sk-...
LLM_MODEL=gpt-4o-mini

Corporate gateways: set LLM_BASE_URL to your OpenAI-compatible endpoint.

Usage

  1. Point REPO_PATH at the project root, i.e. the directory that holds the manifests (go.mod, package.json, etc.).
  2. Enter CVE-… or GHSA-….
  3. Read the results table: affected?, locations, explanation, fix version.
  4. Progress logs (SSE) show each pipeline step.

CLI test (server running):

./scripts/test-analyze.sh CVE-2025-64718

Environment variables

| Variable | Required | Description |
| --- | --- | --- |
| REPO_PATH | Yes | Absolute path to local repository |
| REPO_BRANCH | Yes | Branch name (git switch if .git exists) |
| LLM_BASE_URL | No | Default https://api.openai.com/v1 |
| LLM_API_KEY | Yes | API key / bearer token |
| LLM_MODEL | No | Default gpt-4o-mini |
| LLM_TIMEOUT_SEC | No | Default 120 |
| EXCLUDE_DIRS | No | Comma-separated dirs to skip when scanning |
| SERVER_ADDR | No | Default :8080 |
| MAX_SNIPPET_FILES | No | Max files sent as snippets to LLM (default 20) |

API

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/v1/config | GET | Repo path, branch, exclude dirs |
| /api/v1/analyze | POST | Body {"alias":"CVE-..."}; returns the JSON result |
| /api/v1/analyze/stream | POST | Same body; returns SSE progress + result |
| /health | GET | Liveness |
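
A small Go client for the non-streaming endpoint might look like the sketch below. The endpoint path and the `alias` field come from the table; the `Content-Type` header, the timeout, and the assumption that the response is a JSON object are guesses, and `newAnalyzeRequest` and `analyze` are hypothetical helper names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// newAnalyzeRequest builds the POST request for /api/v1/analyze.
// Sending Content-Type: application/json is an assumption about the server.
func newAnalyzeRequest(baseURL, alias string) (*http.Request, error) {
	body, err := json.Marshal(map[string]string{"alias": alias})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/v1/analyze", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// analyze sends the request and decodes the response into a generic map,
// because the exact result schema is not documented in this README.
func analyze(baseURL, alias string) (map[string]any, error) {
	req, err := newAnalyzeRequest(baseURL, alias)
	if err != nil {
		return nil, err
	}
	// Generous timeout: the analysis includes LLM calls (illustrative value).
	client := &http.Client{Timeout: 3 * time.Minute}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	var out map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	result, err := analyze("http://localhost:8080", "CVE-2025-64718")
	if err != nil {
		fmt.Println("analyze failed:", err)
		return
	}
	fmt.Println(result)
}
```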

Development

# Backend only (after make build-frontend once)
go run ./cmd/server

# UI hot reload in a second terminal (proxies API to :8080)
make dev-frontend

# Other Makefile targets
make test
make tidy

Architecture

The analyzer is a six-stage pipeline. The LLM is called twice: once to parse the advisory into signals (stage 3) and once to organize the final verdict (stage 6). All code scanning is deterministic. Repository content is not sent to the LLM as raw source code; it reaches the LLM only as a small structured evidence report.

┌─────────────────────────────────────────────────────────────────────────┐
│ 1. FETCH ADVISORY                                                       │
│    OSV  →  fallback OSV aliases  →  NVD (for CVEs)  →  enrich GHSA text │
│    Output: domain.Vulnerability { Summary, Details, Affected[], … }     │
└─────────────────────────────────────────────────────────────────────────┘
                                  │
┌─────────────────────────────────────────────────────────────────────────┐
│ 2. BUILD ADVISORY EXCERPT (structural, library-agnostic)                │
│    • Keeps every H1–H6 section, every fenced code block, every line     │
│      with backticked tokens.                                            │
│    • Drops Credits / References / table separators / URL-only lines.    │
│    • Caps at ~12k runes for cheap, focused prompts.                     │
└─────────────────────────────────────────────────────────────────────────┘
                                  │
┌─────────────────────────────────────────────────────────────────────────┐
│ 3. EXTRACT SIGNALS  (LLM with JSON-Schema → heuristic backstop)         │
│    Schema-constrained output:                                           │
│       signals[]:   { name, kind, scan, confidence }                     │
│       ecosystems[]                                                      │
│       guidance: markdown checklist                                      │
│    Kinds: function | method | property | type | path | concept          │
│    scan=true ONLY for direct-call CVEs with concrete callable names.    │
│    Server-side gate strips: HTTP verbs, config keys, defensive checks,  │
│    runtime built-ins, file-ext / domain fragments, tokens <4 chars.     │
│    If LLM fails twice → regex heuristic produces the SAME shape.        │
└─────────────────────────────────────────────────────────────────────────┘
                                  │
┌─────────────────────────────────────────────────────────────────────────┐
│ 4. RESOLVE DEPENDENCIES                                                 │
│    Parses go.mod, go.sum, package.json, package-lock.json, yarn.lock,   │
│    pnpm-lock.yaml, Pipfile, requirements.txt, …                         │
│    Output: map[ecosystem:name] → installed version.                     │
└─────────────────────────────────────────────────────────────────────────┘
                                  │
┌─────────────────────────────────────────────────────────────────────────┐
│ 5. STATIC SCAN  (binding-aware, ecosystem-aware, no LLM)                │
│    For each affected package present in the repo:                       │
│      a. Walk source tree, skip node_modules / vendor / .venv / …        │
│      b. Parse imports per language:                                     │
│           Go      — AST import paths, alias names                       │
│           JS/TS   — default / named / namespace / require, with binding │
│           Python  — import … as, from … import …                        │
│      c. For each file that imports the package, build local bindings    │
│         (qualified namespace + direct named imports).                   │
│      d. For each scan symbol, match either                              │
│           <binding>.<symbol>(    OR    bare <symbol>(                   │
│         depending on how the file imported the package.                 │
│    Output: EvidencePackage { imports, api_usages, observed_apis }       │
└─────────────────────────────────────────────────────────────────────────┘
                                  │
┌─────────────────────────────────────────────────────────────────────────┐
│ 6. VERDICT (deterministic rule first, LLM organizes the result)         │
│    Decision rules (in priority order):                                  │
│      • package not in repo                       → Not affected (high)  │
│      • version not in advisory range             → Not affected (high)  │
│      • imported but no scan symbols extracted    → Manual review (low)  │
│      • imported AND scan symbols found in source → Likely affected      │
│      • imported AND scan symbols, none found     → Not affected (high)  │
│    Then a schema-constrained LLM call enriches the verdict with         │
│    natural-language summary and per-package recommendation.             │
└─────────────────────────────────────────────────────────────────────────┘
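
The stage 6 decision rules can be read as an ordered switch. The following Go sketch encodes the five rules from the diagram in priority order; the `Evidence` and `Verdict` types and the `decide` function are illustrative assumptions, not the project's domain types. The diagram gives no confidence level for "Likely affected" and does not cover a package that is present and in range but never imported, so both of those spots are marked as assumptions in the code.

```go
package main

import "fmt"

// Evidence is a hypothetical summary of what stages 4 and 5 found for one package.
type Evidence struct {
	PackageInRepo  bool
	VersionInRange bool
	Imported       bool
	ScanSymbols    int // symbols the advisory parse marked scan=true
	SymbolHits     int // how many of those were matched in source
}

// Verdict is a hypothetical result type; Confidence may be empty.
type Verdict struct {
	Status     string
	Confidence string
}

// decide applies the rules in priority order: the first matching case wins.
func decide(e Evidence) Verdict {
	switch {
	case !e.PackageInRepo:
		return Verdict{Status: "Not affected", Confidence: "high"}
	case !e.VersionInRange:
		return Verdict{Status: "Not affected", Confidence: "high"}
	case e.Imported && e.ScanSymbols == 0:
		return Verdict{Status: "Manual review", Confidence: "low"}
	case e.Imported && e.SymbolHits > 0:
		// The diagram states no confidence for this rule, so it stays empty.
		return Verdict{Status: "Likely affected"}
	case e.Imported:
		// Scan symbols were extracted but none were found in source.
		return Verdict{Status: "Not affected", Confidence: "high"}
	default:
		// Assumption: in range but never imported is not covered by the
		// diagram; this sketch falls back to manual review.
		return Verdict{Status: "Manual review", Confidence: "low"}
	}
}

func main() {
	e := Evidence{PackageInRepo: true, VersionInRange: true, Imported: true, ScanSymbols: 2, SymbolHits: 1}
	fmt.Println(decide(e).Status)
}
```

Ordering matters here: the "no scan symbols extracted" rule has to come before the "none found" rule, otherwise an indirect-trigger CVE would be reported as "Not affected".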

Component layout

| Package | Role |
| --- | --- |
| internal/domain | Pure data types and ports (VulnFetcher, CodeScanner, VulnAnalyzer, Signal, EvidencePackage, …). No I/O. |
| internal/usecase | Orchestration. analyze.go runs the six stages. advisory_signals.go drives LLM-with-heuristic-backstop. advisory_excerpt.go, heuristic_extract.go, ecosystem.go are pure logic. |
| internal/adapter/osv | OSV vulnerability API client. |
| internal/adapter/nvd | NVD fallback client for CVEs not in OSV. |
| internal/adapter/vuln | Composite fetcher that chains OSV → NVD and enriches with GHSA. |
| internal/adapter/deps | Manifest / lockfile parsing per ecosystem. |
| internal/adapter/scanner | Binding-aware static scanner. One file per language for import/binding extraction; api_usage.go does the symbol matching. |
| internal/adapter/llm | OpenAI-compatible client. JSON-Schema response_format with json_object fallback. Two prompts: advisory parse + result organizer. |
| internal/adapter/http | HTTP handlers, SSE streaming, terminal logging of progress events. |
| internal/adapter/repo | Optional git switch to the configured branch. |
| internal/web | Embedded React UI. |
| frontend/ | React + TypeScript source; built once into internal/web. |

Library-agnostic by design

  • No package name is hardcoded anywhere in production code (axios, lodash, js-yaml exist only in test fixtures).
  • Per-ecosystem behaviour is dispatched by file extension and OSV ecosystem label.
  • Adding a new ecosystem means adding one file: import detection + binding resolver.
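
One way to picture dispatch by file extension and OSV ecosystem label is a small registry, sketched below in Go. The `language` type, the `registry` contents, and `languageFor` are hypothetical and only illustrate the idea; the extension lists are examples. "Go", "npm", and "PyPI" are ecosystem names as used by OSV.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// language ties an OSV ecosystem label to the file extensions scanned for it.
// Hypothetical type for illustration only.
type language struct {
	name       string
	ecosystem  string // OSV ecosystem label
	extensions []string
}

// registry holds one entry per supported ecosystem. Supporting a new
// ecosystem would mean adding one entry here (plus its import parser).
var registry = []language{
	{name: "go", ecosystem: "Go", extensions: []string{".go"}},
	{name: "javascript", ecosystem: "npm", extensions: []string{".js", ".jsx", ".ts", ".tsx"}},
	{name: "python", ecosystem: "PyPI", extensions: []string{".py"}},
}

// languageFor reports which entry should handle path for a package from the
// given ecosystem, or false when the file is irrelevant to that ecosystem.
func languageFor(path, ecosystem string) (language, bool) {
	ext := strings.ToLower(filepath.Ext(path))
	for _, l := range registry {
		if l.ecosystem != ecosystem {
			continue
		}
		for _, e := range l.extensions {
			if e == ext {
				return l, true
			}
		}
	}
	return language{}, false
}

func main() {
	if l, ok := languageFor("src/app.ts", "npm"); ok {
		fmt.Println("scan with:", l.name)
	}
	if _, ok := languageFor("main.go", "npm"); !ok {
		fmt.Println("main.go is skipped for npm advisories")
	}
}
```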

Failure modes that don't crash the pipeline

  • LLM down / 4xx / 5xx → 2 retries with timing logs, then regex heuristic produces the same AdvisoryParseResult shape. Verdict still runs on static evidence.
  • OSV miss for a CVE → automatic NVD fallback.
  • NVD miss too → analysis fails fast with a clear error.
  • Repo has no manifests → 0 evidence packages, single "advisory-only" row in the result table.
  • Indirect-trigger CVEs (prototype pollution, env injection, race conditions) → zero scan targets is the correct outcome; verdict says "Cannot prove safety statically — manual review required" instead of false-negative "Not affected."

Where logs go

| Destination | Content |
| --- | --- |
| Browser UI (Progress panel) | Full SSE event stream: per-step labels, advisory parse timing, scan counts, verdict. |
| Terminal (server stdout) | Same events, one line each, prefixed with [analyze] and the CVE alias. Useful when the JSON /api/v1/analyze endpoint is used (no SSE) or when running unattended. |

Important rule: an import does not mean affected. The static scanner looks for the specific APIs named in the advisory and flags only files that both import the package and call those APIs. For indirect-trigger CVEs (e.g., prototype pollution), the result is "manual review" rather than a false "Not affected".
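
To make the matching rule concrete, here is a hedged Go sketch of the `<binding>.<symbol>(` versus bare `<symbol>(` check from stage 5. It uses plain regular expressions over the file text, so it will also match inside comments and string literals; `usagePattern` and `countUsages` are hypothetical names and the real scanner may work differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// usagePattern builds a regexp for calls to symbol. With a non-empty binding
// (namespace or default import) it matches "<binding>.<symbol>(". With an
// empty binding (direct named import) it matches a bare "<symbol>(" that is
// not preceded by a dot or another identifier character.
func usagePattern(binding, symbol string) *regexp.Regexp {
	sym := regexp.QuoteMeta(symbol)
	if binding != "" {
		return regexp.MustCompile(`\b` + regexp.QuoteMeta(binding) + `\.` + sym + `\s*\(`)
	}
	return regexp.MustCompile(`(?m)(^|[^.\w$])` + sym + `\s*\(`)
}

// countUsages returns how many call sites of symbol appear in src.
func countUsages(src, binding, symbol string) int {
	return len(usagePattern(binding, symbol).FindAllStringIndex(src, -1))
}

func main() {
	namespaced := "import yaml from 'js-yaml'\nconst d = yaml.merge(a, b)\n"
	named := "import { merge } from 'lodash'\nmerge(a, b)\n"

	fmt.Println(countUsages(namespaced, "yaml", "merge")) // file uses a namespace binding
	fmt.Println(countUsages(named, "", "merge"))          // file imported the symbol directly
	fmt.Println(countUsages(namespaced, "", "merge"))     // bare form must not match yaml.merge(
}
```

Choosing the pattern per file, based on how that file imported the package, is what keeps an unrelated local `merge(` in a file that never imports the package from being counted.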

See docs/DEVELOPERS.md for further pipeline detail.

Limitations and accuracy

Results are guidance, not a guarantee. The tool combines OSV version ranges, LLM-parsed advisory signals (with regex fallback), static scans, and a second LLM pass for impact. Any step can be wrong or incomplete. Always verify critical findings (upgrade paths, pentest scope, compliance) with your own review.

| Topic | What to expect |
| --- | --- |
| Import ≠ affected | A package in package.json and even imported in source does not automatically mean the CVE applies. The tool checks advisory APIs (e.g. js-yaml merge vs your use of safeLoadAll). |
| Version vs exploitability | You may be on a vulnerable version while the UI says Not affected (no matching API calls in your code). You should still upgrade if OSV lists your version as affected. |
| Opposite case | The UI may say Likely affected when only the version matches and the scanner or LLM over-counts usage; confirm before treating it as a confirmed vulnerability in production. |
| Prototype pollution CVEs | Many axios and other JS issues require Object.prototype to have been polluted by another dependency. The tool cannot prove that chain exists in your runtime; it can only show that axios is present and in range. |
| Browser vs Node | Some axios CVEs target the Node HTTP adapter (NO_PROXY, toFormData, http.js). A frontend-only app may have lower real risk but the same vulnerable package version. |
| Advisory parsing | “Vulnerable APIs” are guessed from CVE text. Words like request, headers, or test helpers (it, true) can appear in Locations as noise; prefer the Explanation and version/fix columns. |
| LLM | Needs a working API key and network. On failure, analysis falls back to static rules. Model output can disagree with static evidence or miss nuance. |
| Coverage | No full type-checker or dataflow analysis; reflection, dynamic imports, and generated code may be missed. node_modules / vendor are never scanned. |
| Unknown CVEs | If OSV and NVD have no record, analysis fails. |

For more detail and examples from manual testing, see docs/DEVELOPERS.md#limitations-and-accuracy.

Sample manual tests

| ID | Typical result | Why |
| --- | --- | --- |
| CVE-2026-42043 | Affected | axios in range + used in source |
| CVE-2025-64718 | Not affected | js-yaml imported; uses safeLoadAll, not merge |
| CVE-2021-23337 | Not affected | lodash patched version |
| CVE-2021-44228 | Not affected | Log4j not in repo |

License

Internal / project use — adjust as needed.
