Detect AI-generated (synthetic) code patterns in your repository and automatically open a GitHub Issue with the findings.
120 detection patterns across 14 categories · Severity-weighted scoring · Normalised per 1 000 LOC · Editable Markdown pattern file
-
Patterns are defined in a human-readable Markdown file (patterns/synthetic_patterns.md).
Each pattern is either a plain-text substring (case-insensitive) or a Python regex (prefixed withregex:).
Patterns carry a severity (CRITICAL = 10, HIGH = 5, MEDIUM = 2, LOW = 1 points).
Categories can optionally be scoped to file extensions (e.g.Applies to: .py). -
The scanner walks every source file in the target directory, tests each line against every applicable pattern, and computes:
- a raw score — the sum of severity points for all matches,
- the Synthetic Code Score — the raw score normalised per 1 000 lines of code.
This makes the score comparable across projects of different sizes.
-
A GitHub Issue is created (or updated) with:
- the Synthetic Code Score and severity breakdown,
- every matched snippet grouped by category,
- the file path and line number for each hit.
-
A JSON report is uploaded as a build artifact for programmatic consumption.
The headline metric is score per 1 000 lines of code:
This normalisation prevents large codebases from naturally accumulating higher scores than small ones.
A project with 100 000 LOC and a handful of incidental matches will score near zero,
while a small but fully AI-generated project will score significantly higher.
Reference ranges (from benchmark testing):
| Score range | Interpretation |
|---|---|
| 0 – 5 | Likely human-written |
| 5 – 15 | Low AI signal — review flagged lines |
| 15 – 30 | Moderate AI signal |
| 30+ | Strong AI signal |
Add this workflow to any repo at .github/workflows/synthscan.yml:
name: SynthScan
on:
workflow_dispatch:
inputs:
scan_path:
description: "Path to scan"
default: "."
score_threshold:
description: "Fail when Synthetic Code Score >= value (0 = never)"
default: "0"
permissions:
contents: read
issues: write
jobs:
synthscan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: marcoramilli/SynthScan@v1
with:
scan_path: ${{ github.event.inputs.scan_path || '.' }}
score_threshold: ${{ github.event.inputs.score_threshold || '0' }}
create_issue: "true"# scan the current directory
INPUT_SCAN_PATH=. python3 scanner/synthscan.py
# scan a specific folder, write report to a custom path
INPUT_SCAN_PATH=./src INPUT_REPORT_PATH=report.json python3 scanner/synthscan.pyExample output:
============================================================
Raw score : 277 (162 matches)
Lines scanned : 11187 (32 files)
Synthetic Code Score : 24.8 (per 1k LOC)
============================================================
Matches by category:
- Decorative Section Separators: 97 matches (194 pts)
- Excessive Try-Catch Wrapping: 57 matches (57 pts)
- Cross-Language Confusion: 3 matches (15 pts)
- Synthetic Comment Markers: 1 matches (5 pts)
- Self-Referential Comments: 2 matches (4 pts)
- Verbosity Indicators: 2 matches (2 pts)
| Name | Default | Description |
|---|---|---|
scan_path |
. |
Directory to scan (relative to repo root). |
patterns_file |
patterns/synthetic_patterns.md |
Markdown file with detection patterns. |
score_threshold |
0 |
Fail the step when the Synthetic Code Score (per 1k LOC) ≥ this value. 0 = never fail. |
create_issue |
true |
Open / update a GitHub Issue with the report. |
issue_label |
synthscan |
Label applied to the created issue. |
report_path |
synthscan-report.json |
Path for the JSON artefact. |
| Name | Description |
|---|---|
score |
Synthetic Code Score (normalised per 1k LOC). |
raw_score |
Un-normalised sum of severity points. |
match_count |
Number of pattern hits. |
lines_scanned |
Total lines of source code scanned. |
issue_body |
Full Markdown report. |
| Category | Default Severity | What it detects |
|---|---|---|
| Slop Phrases | MEDIUM | Filler clichés AI injects (Feel free to modify, Here's a simple example) |
| AI Slop Vocabulary | MEDIUM | Overused LLM words (delve, leverage, robust, seamless) |
| Synthetic Comment Markers | HIGH | Direct AI attribution (Generated by GPT, AI-generated) |
| Self-Referential Comments | MEDIUM | Comments narrating structure instead of intent |
| Redundant / Tautological Comments | LOW | Comments restating code verbatim (# Set x to 5) |
| Verbosity Indicators | LOW | Overly explanatory phrases (This line initializes) |
| Example Usage Blocks | LOW | # Example usage: blocks AI always appends |
| Fake / Example Data | MEDIUM | Placeholder data (John Doe, user@example.com) |
| Cross-Language Confusion | HIGH | Wrong-language idioms in Python (.push(), null, &&) |
| Hallucination Indicators | CRITICAL | Phantom imports, hallucinated API chains |
| Overly Generic Function Names | LOW | process_data(), do_something(), helper() |
| Excessive Try-Catch Wrapping | MEDIUM | Bare except Exception, generic error messages |
| Decorative Section Separators | MEDIUM | Unicode box-drawing headers, long ---- lines |
All detection patterns live in patterns/synthetic_patterns.md.
To add a new pattern:
- Open the file and find (or create) a
## Categorysection. - Optionally add
Applies to: .py, .jsto restrict the category to specific file extensions. - Inside the
```patternsblock, add one pattern per line.- Plain text → matched as a case-insensitive substring.
regex:prefix → compiled as a Python regular expression.- Prepend
[CRITICAL],[HIGH],[MEDIUM], or[LOW]to override the category default. - Lines starting with
#are comments and ignored.
- Commit and push. The next scan will pick up the changes automatically.
The scanner writes a JSON report to synthscan-report.json (configurable via report_path):
{
"synthetic_code_score": 24.8,
"raw_score": 277,
"match_count": 162,
"lines_scanned": 11187,
"files_scanned": 32,
"matches": [
{
"file": "app.py",
"line": 31,
"text": "# ── Background task tracker ──────────",
"pattern": "#.*[─━═╌╍┄┅]{5,}",
"category": "Decorative Section Separators",
"severity": "MEDIUM",
"score": 2.0
}
]
}SynthScan/
├── action.yml # GitHub Action definition (Marketplace entry)
├── LICENSE # MIT License
├── scanner/
│ └── synthscan.py # Core scanning engine
├── patterns/
│ └── synthetic_patterns.md # Detection patterns (editable)
└── README.md
MIT — see LICENSE.