Detect when two functions implement identical logic, across any programming language.
A Java function and a Python function that do the same thing produce the same hash. SIR Engine finds those matches, shows you exactly where the duplicates are, and lets you merge them out of your codebase in one click.
Landing page: sir-engine.com
Web app: app.sir-engine.com
VS Code extension: Download .vsix
Most duplicate detection tools compare tokens: they find copy-paste duplicates but miss functions that were rewritten, renamed, or translated between languages.
SIR Engine compares logical structure:
- Translate: any language gets translated to Python first using an LLM. One parser handles 25+ languages.
- Canonicalize: variable names, function names, and formatting are stripped. Only pure logical structure remains.
- Hash: the canonical structure is hashed with SHA-256. Identical hashes mean identical canonical structure.
- Match: every hash is compared against every other. Matching pairs are structural duplicates regardless of language.
- Merge: remove duplicates in one click. Auto merge or choose manually. Download cleaned files instantly.
This is based on alpha equivalence, a concept from lambda calculus, applied to source code: two functions count as the same when they differ only in the names of their bound variables.
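The canonicalize and hash steps can be sketched in a few lines of Python. This is a minimal illustration, not SIR Engine's actual implementation: the helper names `bound_names` and `structural_hash` are hypothetical, and the `AlphaRenamer` class here is a simplified stand-in. It renames only names the snippet binds itself (function names, arguments, assigned variables), so free names such as builtins stay distinct.

```python
import ast
import hashlib


def bound_names(tree):
    """Collect names the snippet binds itself: functions, arguments, assigned variables."""
    bound = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            bound.add(node.id)
    return bound


class AlphaRenamer(ast.NodeTransformer):
    """Hypothetical sketch: replace bound names with v0, v1, ... in first-seen order."""

    def __init__(self, bound):
        self.bound = bound
        self.mapping = {}

    def _canon(self, name):
        if name not in self.bound:
            return name  # free names (builtins, globals) are left alone
        if name not in self.mapping:
            self.mapping[name] = f"v{len(self.mapping)}"
        return self.mapping[name]

    def visit_FunctionDef(self, node):
        node.name = self._canon(node.name)
        self.generic_visit(node)
        return node

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_arg(self, node):
        node.arg = self._canon(node.arg)
        self.generic_visit(node)
        return node

    def visit_Name(self, node):
        node.id = self._canon(node.id)
        return node


def structural_hash(source):
    """SHA-256 of the renamed AST dump; ast.dump omits line numbers and formatting."""
    tree = ast.parse(source)
    tree = AlphaRenamer(bound_names(tree)).visit(tree)
    return hashlib.sha256(ast.dump(tree).encode("utf-8")).hexdigest()
```

With this sketch, `def add(x, y)` and `def plus(left, right)` with the same body produce the same digest, while changing `+` to `-` produces a different one. Docstrings and comments-as-strings are part of the AST, so a real tool would likely strip them too.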
| Feature | Description |
|---|---|
| Web App | Upload files, scan instantly in the browser. No install required. |
| CLI Tool | `sir scan ./src` from any terminal. CI/CD integration with `--strict` flag. |
| VS Code Extension | Scans your workspace. Merge duplicates with diff preview. |
| AI Translation | Cross-language detection via Ollama (local/free) or Claude API. |
| Pack & Diff | Export semantic fingerprints as JSON. Compare codebases without sharing source. |
| Merge | Auto merge all duplicates or choose manually per cluster. |
| Private by Default | Files processed in memory, never stored. Fully local with Ollama. |
Native (no AI needed): Python, JavaScript, TypeScript
AI-powered (via Ollama or Claude API): Java, Rust, Go, C, C++, C#, Swift, Kotlin, Scala, Ruby, PHP, Haskell, Elixir, Dart, Julia, R, Nim, Zig, and more.
Go to app.sir-engine.com. No install needed.
```shell
# Clone the repo
git clone https://github.com/lflin00/SIR-ENGINE.git
cd SIR-ENGINE

# Add alias
echo 'alias sir="python3 ~/path/to/SIR-ENGINE/sir_cli.py"' >> ~/.zshrc
source ~/.zshrc

# Scan a folder
sir scan ./my_project

# Scan with AI (requires Ollama running locally)
sir ai-scan ./my_project --backend ollama --model codellama:7b

# Check health score only
sir health ./my_project

# Compare two versions of a codebase
sir diff ./v1 ./v2

# CI/CD: exit code 1 if duplicates found
sir scan ./src --strict
```

- Download sir-engine-0.0.2.vsix
- Open VS Code → Extensions → ... menu → Install from VSIX
- Select the downloaded file
- Open any Python or JavaScript project and run SIR: Scan Workspace from the command palette
Option 1: Ollama (free, local)

```shell
# Install Ollama from https://ollama.ai
ollama pull codellama:7b
# Then select "Ollama" as backend in the web app sidebar
```

Option 2: Claude API. Get an API key from console.anthropic.com and enter it in the web app sidebar.
```shell
sir scan <path> [--min N] [--output file.json] [--strict] [--no-recurse]
sir ai-scan <path> [--backend ollama|anthropic] [--model MODEL]
sir health <path>
sir diff <path1> <path2>
```
| Flag | Description |
|---|---|
| `--min N` | Minimum cluster size to report (default: 2) |
| `--output FILE` | Save full report as JSON |
| `--strict` | Exit with code 1 if any duplicates found (for CI/CD) |
| `--no-recurse` | Don't scan subdirectories |
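If you drive the CLI from a script rather than a shell, the flags above map onto an argument list. The sketch below is illustrative only: `build_scan_command` and `run_scan` are hypothetical helpers, and it assumes a `sir` executable is on `PATH` (a shell alias is not visible to `subprocess`, so you may need to pass something like `("python3", "/path/to/sir_cli.py")` as `base`).

```python
import subprocess


def build_scan_command(path, min_size=None, output=None, strict=False,
                       recurse=True, base=("sir",)):
    """Build the argv for `sir scan` using only the documented flags."""
    cmd = [*base, "scan", path]
    if min_size is not None:
        cmd += ["--min", str(min_size)]
    if output is not None:
        cmd += ["--output", output]
    if strict:
        cmd.append("--strict")
    if not recurse:
        cmd.append("--no-recurse")
    return cmd


def run_scan(path, **options):
    """Run the scan and return its exit code (1 with --strict when duplicates exist)."""
    return subprocess.run(build_scan_command(path, **options)).returncode
```

For example, `run_scan("./src", strict=True)` would return a non-zero code when duplicates are found, which a build script can use to stop the pipeline.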
Add SIR Engine as an automatic pull request check in any GitHub repository.
Copy the workflow file into your repo and push:
```shell
mkdir -p .github/workflows
curl -sSL https://raw.githubusercontent.com/lflin00/SIR-ENGINE/main/.github/workflows/sir-scan.yml \
  -o .github/workflows/sir-scan.yml
git add .github/workflows/sir-scan.yml
git commit -m "Add SIR Engine semantic duplicate check"
git push
```

The check runs automatically on every pull request. No secrets are required for native Python / JS / TS scanning.
- Detects changed files: diffs HEAD against the PR base, filters to `.py`, `.js`, `.ts`, `.jsx`, `.tsx`
- Scans the full configured path: runs `sir scan` across the repo (or the directory you specify), so cross-file duplicates are caught even when only one side of the duplicate changed
- Posts a PR comment: lists every duplicate cluster that touches a changed file, with function name, file path, and line number; the comment is updated in place on every new push so it never spams the thread
- Optionally fails the check: set `SIR_STRICT: "true"` to block merges until duplicates are resolved
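The filtering described above (keep only clusters that touch a changed source file) can be sketched as follows. The cluster layout, a list of dicts with `file`, `function`, and `line` keys, is an assumption made for illustration, as are the function names; the real workflow may represent clusters differently.

```python
from pathlib import PurePosixPath

# Extensions the PR check filters to, as listed above.
NATIVE_EXTENSIONS = {".py", ".js", ".ts", ".jsx", ".tsx"}


def changed_source_files(changed_paths):
    """Keep only changed paths whose extension is natively scannable."""
    return {p for p in changed_paths if PurePosixPath(p).suffix in NATIVE_EXTENSIONS}


def clusters_touching(clusters, changed_paths):
    """Return clusters with at least one member in a changed source file.

    Each cluster is assumed to be a list of {"file", "function", "line"} dicts.
    """
    changed = changed_source_files(changed_paths)
    return [c for c in clusters if any(member["file"] in changed for member in c)]
```

Scanning the whole path but reporting only clusters that touch the diff is what lets the check catch a new function that duplicates an untouched one elsewhere in the repo.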
Edit the env block near the top of the workflow file:
| Variable | Default | Description |
|---|---|---|
| `SIR_STRICT` | `"false"` | Set to `"true"` to fail the check if duplicates are found in changed files |
| `SIR_MIN_CLUSTER_SIZE` | `"2"` | Minimum copies to report as a duplicate cluster |
| `SIR_SCAN_PATH` | `"."` | Root directory to scan (relative to repo root) |
| `SIR_AI_BACKEND` | `""` | Set to `"anthropic"` to also scan Java, Go, Rust, C, C#, Swift, Kotlin, and 20+ other languages |
```yaml
# .github/workflows/sir-scan.yml
env:
  SIR_STRICT: "true"
```

The PR check turns red and blocks merging until all duplicate clusters in the changed files are resolved.

```yaml
env:
  SIR_AI_BACKEND: "anthropic"
```

Add your Anthropic API key as a repository secret named `ANTHROPIC_API_KEY` (Settings → Secrets and variables → Actions → New repository secret). The action will translate Java, Go, Rust, C, C#, and other languages to Python before hashing, enabling cross-language duplicate detection across the whole PR.
Instead of copying the file, call the workflow directly from the SIR Engine repository:
```yaml
# .github/workflows/pr-checks.yml (in your repo)
name: PR Checks
on:
  pull_request:
jobs:
  sir-scan:
    uses: lflin00/SIR-ENGINE/.github/workflows/sir-scan.yml@main
    with:
      strict: true
      min_cluster_size: 2
      scan_path: "src"
      ai_backend: ""
      base_sha: ${{ github.event.pull_request.base.sha }}
      head_sha: ${{ github.sha }}
      pr_number: ${{ github.event.pull_request.number }}
    secrets:
      ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```

SIR Engine can export semantic fingerprints of entire codebases as portable `.sir.json` files. This lets you:
- Compare two codebases without sharing source code
- Store a semantic snapshot of your codebase at a point in time
- Merge fingerprints from multiple codebases into a unified index
Use the Pack tab in the web app to create and manage bundles.
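Comparing two bundles comes down to set operations on hashes, which is why no source code needs to change hands. The sketch below shows the idea only. The bundle layout it parses (`{"functions": [{"hash": "..."}]}`) is a made-up assumption, not the documented `.sir.json` schema, and `load_hashes` and `compare_bundles` are hypothetical names.

```python
import json


def load_hashes(bundle_text):
    """Extract the set of structural hashes from a bundle.

    ASSUMED layout (not the real .sir.json schema):
    {"functions": [{"hash": "<sha256 hex>"}, ...]}
    """
    return {entry["hash"] for entry in json.loads(bundle_text)["functions"]}


def compare_bundles(left, right):
    """Report which hashes are shared and which are unique to each side."""
    return {
        "shared": sorted(left & right),
        "only_left": sorted(left - right),
        "only_right": sorted(right - left),
    }
```

Because only digests are exchanged, each party learns which functions overlap structurally without seeing the other's code.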
```
Source code (any language)
        │
        ▼
AI Translation: Ollama / Claude API (for non-Python/JS)
        │
        ▼
Python AST parse
        │
        ▼
AlphaRenamer: strips variable names, function names
        │
        ▼
SHA-256(ast.dump()): deterministic structural hash
        │
        ▼
Hash comparison: same hash = same logic
```
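The final comparison stage can be sketched as grouping by hash. Note the swap: instead of comparing every hash against every other, this illustration buckets locations in a dictionary keyed by digest, which yields the same clusters in a single pass. The function name `cluster_by_hash` and the `(location, hash)` input shape are assumptions for the example.

```python
from collections import defaultdict


def cluster_by_hash(fingerprints, min_size=2):
    """Group function locations that share a structural hash.

    fingerprints: iterable of (location, hash) pairs, e.g. ("a.py:add", "<hex>").
    min_size mirrors the CLI's minimum cluster size (default 2).
    """
    groups = defaultdict(list)
    for location, digest in fingerprints:
        groups[digest].append(location)
    return {digest: locs for digest, locs in groups.items() if len(locs) >= min_size}
```

A hash that appears only once is not a duplicate, so it is dropped from the result.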
Proprietary Source-Available License: the source code is visible but not free to use commercially.
| Use case | Allowed |
|---|---|
| Personal projects | Free |
| Education / academic research | Free |
| Non-commercial open source projects | Free |
| Company internal use | Requires commercial license |
| Scanning production code | Requires commercial license |
| SaaS / hosted offering | Requires commercial license |
| Client / for-profit work | Requires commercial license |
For commercial licensing: sir-engine.com
Open an issue on the Issues tab. Bug reports, feedback, and feature requests are welcome.
Built by Lucas Flinders, biomedical engineering student at Ohio State.