Zero-dependency Python tool that detects dangerous invisible Unicode characters in source code and dependencies. Language-agnostic source scanning covers 60+ file extensions and well-known extensionless files (Makefile, Dockerfile, etc.) across all major ecosystems. It covers 416 codepoints across 6 threat categories, including Glassworm supply chain attacks (Variation Selectors), Trojan Source (bidi controls, CVE-2021-42574), zero-width steganography, tag character injection, and exotic whitespace.
pip install heckler

Requires Python 3.9+. No runtime dependencies.
# Scan a directory
heckler .
# CI mode — exit code 1 on findings
heckler --ci .
# Include node_modules / site-packages / vendor
heckler --ci --scan-deps .
# Vet a package before installing
heckler --vet express@4.18.0
heckler --vet requests==2.31.0
# JSON or SARIF output
heckler --format json .
heckler --format sarif .

$ heckler suspect-project/
Found 6 dangerous character(s): 3 CRITICAL, 1 HIGH, 1 MEDIUM, 1 LOW
suspect-project/api.js
12:8 CRITICAL U+FE01 (VARIATION SELECTOR-2) [GLASSWORM]
12:14 CRITICAL U+FE02 (VARIATION SELECTOR-3) [GLASSWORM]
suspect-project/auth.js
4:5 CRITICAL U+202E (RIGHT-TO-LEFT OVERRIDE) [TROJAN-SOURCE]
4:32 HIGH U+202C (POP DIRECTIONAL FORMATTING) [TROJAN-SOURCE]
suspect-project/config.py
9:22 MEDIUM U+200B (ZERO WIDTH SPACE)
18:5 LOW U+00AD (SOFT HYPHEN)
Total: 6 finding(s) across 3 file(s).
| Category | Codepoints | Severity | Example |
|---|---|---|---|
| Variation Selectors (Glassworm) | U+FE00-FE0F, U+E0100-E01EF, U+180B-180D | CRITICAL/HIGH | Invisible payload encoding |
| Bidi controls (Trojan Source) | U+202A-202E, U+2066-2069, U+2028-2029, U+200E-200F, U+061C | CRITICAL/HIGH | Code displays differently than it executes |
| Tag characters | U+E0001, U+E0020-E007F | HIGH | Invisible ASCII mirror used in prompt injection |
| Zero-width characters | U+200B-200D, U+FEFF, U+2060 | MEDIUM | Steganographic encoding, string comparison bypass |
| Invisible identifiers | U+3164, U+FFA0, U+2800, U+115F-1160 | MEDIUM | Invisible variable/function names |
| Invisible format/whitespace | U+00AD, U+2000-200A, U+2061-2064, U+3000, ... | LOW-MEDIUM | String comparison bypass, obfuscation |
416 codepoints total. Severity levels: CRITICAL > HIGH > MEDIUM > LOW > INFO.
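As a rough illustration of how a regex-based detector over these ranges can work, here is a minimal sketch. It covers only a handful of the ranges from the table, and the names `SUSPICIOUS`, `find_invisible`, and `decode_tag_payload` are hypothetical; this is not heckler's actual implementation. The last helper shows why tag characters are described as an ASCII mirror: each one sits at its ASCII counterpart's codepoint plus 0xE0000.

```python
import re
import unicodedata

# Illustrative subset of the ranges in the table above,
# NOT heckler's full 416-codepoint set.
SUSPICIOUS = re.compile(
    "["
    r"\uFE00-\uFE0F"              # variation selectors
    r"\U000E0100-\U000E01EF"      # variation selectors supplement
    r"\u202A-\u202E"              # bidi embeddings and overrides
    r"\u2066-\u2069"              # bidi isolates
    r"\u200B-\u200D\u2060\uFEFF"  # zero-width characters
    r"\U000E0020-\U000E007F"      # tag characters
    "]"
)


def find_invisible(text):
    """Return (line, column, codepoint, name) tuples, 1-based (an assumed convention)."""
    findings = []
    # Split on "\n" only: str.splitlines() would also break on U+2028/U+2029.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for match in SUSPICIOUS.finditer(line):
            ch = match.group()
            findings.append(
                (lineno, match.start() + 1, f"U+{ord(ch):04X}",
                 unicodedata.name(ch, "UNKNOWN"))
            )
    return findings


def decode_tag_payload(text):
    """Recover the ASCII text hidden in tag characters (U+E0020-E007E)."""
    return "".join(
        chr(ord(ch) - 0xE0000) for ch in text if 0xE0020 <= ord(ch) <= 0xE007E
    )
```

Calling `find_invisible` on a string containing a zero-width space reports its line and column, and `decode_tag_payload` returns whatever ASCII message was smuggled in as tag characters.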
heckler [paths...] [options]
heckler --vet PACKAGE [--registry npm|pypi]
| Flag | Description |
|---|---|
| --ci | Exit code 1 if findings detected |
| --format text\|json\|sarif | Output format (default: text) |
| --severity LEVEL | Minimum severity to report (default: low) |
| --scan-deps | Include dependency directories |
| --diff-only | With --scan-deps, only scan packages changed in staged lockfile diffs |
| --vet PACKAGE | Download and scan a package before installing (fetches directly from public registries) |
| --registry npm\|pypi | Package registry for --vet (auto-detected if omitted) |
| --config PATH | Path to .heckler.yml config file (error if not found) |
| --no-color | Disable colored output |
| --quiet | Only output findings, no summary |
| --all-text | Scan all text files regardless of extension |
Exit codes: 0 clean, 1 findings detected (with --ci), 2 error.
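For pipelines that post-process results rather than relying on the exit code alone, the SARIF output can be summarized with a few lines of Python. The sketch below relies only on the generic SARIF 2.1.0 layout (`runs[].results[].level`). How heckler maps its severities onto SARIF levels is not stated here, so treat the level names as an assumption; `summarize_sarif` is a hypothetical helper, not part of heckler.

```python
import json
from collections import Counter


def summarize_sarif(sarif_text):
    """Count SARIF results per level ("error", "warning", "note", ...)."""
    doc = json.loads(sarif_text)
    counts = Counter()
    for run in doc.get("runs", []):
        for result in run.get("results", []):
            # SARIF treats a missing level as "warning".
            counts[result.get("level", "warning")] += 1
    return counts
```

A CI step could read the file written by `heckler --format sarif . > results.sarif`, pass its contents to `summarize_sarif`, and decide for itself which levels should block a merge.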
from pathlib import Path

# Severity is assumed here to be importable from the top-level package
from heckler import scan, Scanner, Finding, Severity

# Simple: scan a path
findings = scan("src/")

# Advanced: configure a scanner
scanner = Scanner(scan_deps=True, severity_threshold=Severity.HIGH)
some_string = "const x = 1;"  # any text to scan
findings = scanner.scan_text(some_string, filename="input.js")
findings = scanner.scan_file(Path("app.js"))
findings = scanner.scan_path(Path("project/"))

Create .heckler.yml in your project root:
severity: medium # Minimum severity to report
allow_bom: true # Treat U+FEFF at file start as INFO (suppressed)
allowlist: # Glob patterns for files to skip
- "**/*.po"
- "**/locale/**"
extra_skip_dirs: # Additional directories to skip
- third_party
extra_extensions: # Additional file extensions to scan
- .custom

Also reads [tool.heckler] from pyproject.toml.
Suppress the next line with a dedicated directive (preferred):
// heckler-ignore-next-line
const emoji = "\uFE0F";
// heckler-ignore-next-line U+FE0F U+FE0E
const selectors = "\uFE0F\uFE0E"; // only listed codepoints suppressed

Or suppress inline (legacy, still supported):

const emoji = "\uFE0F"; // heckler-ignore
emoji = "\uFE0F" # heckler-ignore

Supported comment tokens: //, #, /*, --, ;. Placing heckler-ignore inside a string literal or variable name does not suppress detection. Suppression directives are never honored in dependency code (node_modules, vendor, site-packages, target) to prevent malicious packages from hiding attacks.
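To make the directive rules concrete, here is a simplified sketch of how a `heckler-ignore-next-line` comment might be recognized. It is an assumption about the behavior described above, not heckler's real parser: it only accepts lines that consist of a comment and nothing else, which is one way to keep a directive inside a string literal from counting. The name `parse_ignore_next_line` is hypothetical, and the legacy inline form is not handled.

```python
import re
from typing import FrozenSet, Optional

# Comment token, the directive, then an optional list of U+XXXX codepoints.
_DIRECTIVE = re.compile(
    r"^\s*(?://|#|/\*|--|;)\s*heckler-ignore-next-line"
    r"((?:\s+U\+[0-9A-Fa-f]{4,6})*)\s*(?:\*/)?\s*$"
)


def parse_ignore_next_line(line: str) -> Optional[FrozenSet[int]]:
    """Return None if the line is not a directive.

    An empty frozenset means "suppress everything on the next line";
    a non-empty one lists the only codepoints to suppress.
    """
    match = _DIRECTIVE.match(line)
    if match is None:
        return None
    return frozenset(int(token[2:], 16) for token in match.group(1).split())
```

A scanner built this way would also need to skip the check entirely for files under dependency directories, so that a malicious package cannot silence its own findings.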
Source scanning is language-agnostic — the regex-based detector works on any text file. Files encoded as UTF-16 or UTF-32 (with BOM) are automatically detected and decoded correctly. Out of the box, heckler scans 60+ file extensions:
| Category | Extensions |
|---|---|
| Web / JS / TS | .js, .cjs, .mjs, .ts, .jsx, .tsx, .vue, .svelte |
| Python | .py, .pyi |
| Systems | .c, .cpp, .h, .hpp, .rs, .go, .zig, .nim, .d |
| JVM | .java, .kt, .scala, .groovy, .clj, .cljs, .cljc |
| .NET | .cs, .vb, .vbs |
| Functional | .hs, .lhs, .ml, .mli, .elm, .ex, .exs, .erl, .hrl, .purs, .rkt, .lisp, .cl, .el, .jl |
| Mobile | .swift, .dart, .m, .mm |
| Scripting | .rb, .php, .lua, .pl, .r, .tcl, .cr |
| Shell | .sh, .bash, .zsh, .ps1, .bat, .cmd, .fish |
| Config / Data | .yaml, .yml, .json, .toml, .xml, .sql, .graphql, .gql, .proto, .tf, .hcl |
| Templates | .html, .css, .scss, .ejs, .hbs, .njk, .pug, .jinja |
| Build | .gradle, .rake, .cmake, .mk |
| Docs | .md, .txt |
Well-known extensionless files are also scanned: Makefile, Dockerfile, Gemfile, Rakefile, Vagrantfile, Procfile, Justfile, BUILD, Podfile, .gitignore, .dockerignore, and more.
Use --all-text to scan every text file regardless of extension.
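The BOM-based handling of UTF-16 and UTF-32 files mentioned above can be sketched as follows. This is a minimal illustration using Python's standard `codecs` module, not heckler's actual decoding logic; `decode_with_bom` is a hypothetical name, and the UTF-8 fallback with replacement characters is an assumption.

```python
import codecs

# UTF-32 must be checked before UTF-16: the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def decode_with_bom(data: bytes) -> str:
    """Decode bytes using the encoding announced by a leading BOM, if any."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # These codecs read the BOM to pick endianness and strip it.
            return data.decode(encoding)
    # No BOM: assume UTF-8 and keep going past undecodable bytes.
    return data.decode("utf-8", errors="replace")
```

Decoding first matters because a zero-width space encoded as UTF-16 does not appear as the UTF-8 byte sequence a naive byte-level search would look for.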
| Capability | Supported Ecosystems |
|---|---|
| --vet (pre-install scan) | npm, PyPI |
| --diff-only (lockfile parsing) | npm, yarn, pnpm, pip, poetry |
| --scan-deps (installed deps) | node_modules, vendor, site-packages, target (Cargo) |
Lockfiles for Cargo, Go, Ruby, and Composer are detected but parsers are not yet implemented — a warning is emitted when using --diff-only with these.
Private registries:
--vet fetches packages directly from the public npm and PyPI registries (registry.npmjs.org, pypi.org) using only Python's stdlib. It does not shell out to npm or pip and does not execute any package code during download. This means private or corporate registries are not supported by --vet. If you need to scan packages from a private registry, download them manually and use heckler <path> to scan the extracted source.
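When extracting a manually downloaded package for scanning, it is worth unpacking it defensively, since the archive itself is untrusted. The following is a minimal sketch of one way to do that with the standard `tarfile` module on Python 3.9+. It is not heckler's extraction code, and `safe_extract_tar` is a hypothetical helper: it refuses links and special files, rejects any entry that would land outside the destination, and writes nothing until every entry has passed.

```python
import tarfile
from pathlib import Path


def safe_extract_tar(archive_path, dest):
    """Extract regular files and directories from a tar archive into dest.

    Raises ValueError before writing anything if a member is a link,
    a special file, or resolves to a path outside dest.
    """
    dest = Path(dest).resolve()
    with tarfile.open(archive_path) as archive:
        members = archive.getmembers()
        # Pass 1: validate every entry.
        for member in members:
            if not (member.isfile() or member.isdir()):
                raise ValueError(f"unsupported entry type: {member.name}")
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"path traversal rejected: {member.name}")
        # Pass 2: write the validated entries by hand.
        for member in members:
            target = dest / member.name
            if member.isdir():
                target.mkdir(parents=True, exist_ok=True)
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            with archive.extractfile(member) as src:
                target.write_bytes(src.read())
    return dest
```

After extraction, the returned directory can be passed to `heckler <path>` as described above.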
Available on the GitHub Marketplace. Use as a composite action:
- uses: kholcomb/heckler@v1
  with:
    scan-deps: true
    format: sarif
    upload-sarif: true  # Findings appear in GitHub Security tab

Or invoke directly:

- run: pip install heckler
- run: heckler --ci --format sarif . > results.sarif

Add to .pre-commit-config.yaml:
repos:
  - repo: https://github.com/kholcomb/heckler
    rev: v1.0.0
    hooks:
      - id: heckler  # Scan source files
      - id: heckler-lockfile  # Scan changed dependencies on lockfile change

A separate workflow (dependency-scan.yml) triggers on lockfile changes and weekly, auto-detects your package manager, installs dependencies, scans them, and reports findings in the GitHub Actions job summary. Results are cached by lockfile hash.
| Layer | When | Speed | What it catches |
|---|---|---|---|
| --vet | Before npm add / pip install | ~5s | Malicious packages before they enter your project |
| Pre-commit (source) | git commit | <2s | Invisible chars in your own code |
| Pre-commit (lockfile) | Lockfile change + commit | <2s | Changed deps via diff-based scanning |
| CI source scan | PR / push | <5s | Source scan, enforceable |
| CI dep scan | Lockfile change + weekly | 30-60s (cached: 5s) | Full dependency tree post-install |
For environments without Python, a grep-based fallback is included:
bash scripts/heckler-scan.sh [directory]

Requires GNU grep with PCRE support (grep -P). macOS users: brew install grep.
pip install -e ".[dev]"
pytest

The test suite includes:
- Character detection: verifies the regex matches every dangerous codepoint and rejects safe ones
- Scanner: writes real files with benign planted invisible Unicode to temp directories and scans them
- CLI: calls main() with real argv, validates JSON/SARIF output structure
- Config: writes real .heckler.yml files and loads them through the config pipeline
- Archive safety: builds tar/zip archives with path traversal and symlink style payloads, verifies they're safely rejected
- Vet end-to-end: builds fake .tgz and .whl packages with planted Glassworm signatures, extracts and scans them
- Git integration: stages a real lockfile in the project repo, parses the diff, resolves package directories, and scans planted findings through the full --diff-only chain (non-destructive, cleanup in finally blocks)
- Hardening: tests for bypass resistance including null-byte injection, heckler-ignore abuse, U+2028/2029 detection, UTF-16/32 encoding evasion, missing config errors, multi-language extension coverage, and extensionless file scanning
MIT
