Threat Scanning

nitpik detects potentially harmful code patterns in your diffs — obfuscated payloads, dangerous API calls, supply chain hooks, backdoors, and data exfiltration — before the LLM review even starts. When an LLM provider is configured, flagged patterns are triaged by the model to reduce false positives.

Enabling Threat Scanning

nitpik review --diff-base main --scan-threats

Or enable it permanently in .nitpik.toml:

[threats]
enabled = true

Tip: Combine --scan-threats with --scan-secrets in CI pipelines for both secret redaction and threat detection in a single pass.

What Gets Detected

nitpik ships with 44 built-in rules plus structural heuristics across five threat categories:

Category	Examples
Obfuscation	Base64-encoded blobs, hex-encoded strings, `eval` of decoded content, invisible Unicode characters, Hangul filler payloads (Glassworm), mixed-script homoglyph identifiers
Dangerous APIs	`eval()`, `exec()`, `subprocess` with `shell=True`, `child_process`, `Function()` constructor, `dlopen`/`ctypes`
Supply chain	`postinstall` script hooks, `pip install` from URLs, `curl | bash` patterns, typosquatting signals
Exfiltration	HTTP POSTs to external URLs combined with environment or file reads, credential harvesting patterns
Backdoor	Reverse shells, hardcoded C2 addresses, socket-based command execution, cron/systemd persistence

Unicode and homoglyph attacks

Beyond regex-based rules, nitpik includes dedicated structural checks:

Invisible Unicode characters — zero-width spaces, joiners, fillers, and soft hyphens hidden in source code.
Bidi override characters — Trojan Source attacks (CVE-2021-42574) using Unicode right-to-left overrides to visually reorder code.
Mixed-script homoglyph identifiers — identifiers that mix Latin characters with visually identical Cyrillic or Greek lookalikes (e.g., Cyrillic р in рaypal). This detects supply chain substitution and credential interception attacks that pass visual review.
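
The three structural checks above can be approximated in a few lines of Python. This is a minimal sketch for illustration, not nitpik's implementation: the character sets are a small assumed subset, and `script_of` and `scan_line` are hypothetical helper names. Script detection here relies on the first word of the Unicode character name, which is a rough heuristic.

```python
import re
import unicodedata

# Code points treated as invisible here; an illustrative subset, not nitpik's list.
INVISIBLE = {
    "\u200b",  # zero width space
    "\u200c",  # zero width non-joiner
    "\u200d",  # zero width joiner
    "\u00ad",  # soft hyphen
    "\u3164",  # Hangul filler
}
# Bidirectional embedding, override, and isolate controls (Trojan Source).
BIDI = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")

# A letter or underscore followed by word characters (Unicode-aware in Python 3).
IDENTIFIER = re.compile(r"[^\W\d]\w*")


def script_of(ch):
    """Rough script guess from the first word of the Unicode character name."""
    name = unicodedata.name(ch, "")
    for script in ("LATIN", "CYRILLIC", "GREEK"):
        if name.startswith(script):
            return script
    return None


def scan_line(line):
    """Return a list of (kind, detail) findings for one line of source."""
    findings = []
    for ch in line:
        if ch in INVISIBLE:
            findings.append(("invisible", f"U+{ord(ch):04X}"))
        elif ch in BIDI:
            findings.append(("bidi", f"U+{ord(ch):04X}"))
    for ident in IDENTIFIER.findall(line):
        scripts = {script_of(ch) for ch in ident} - {None}
        # Flag identifiers that combine Latin with a lookalike script.
        if "LATIN" in scripts and len(scripts) > 1:
            findings.append(("mixed-script", ident))
    return findings
```

Calling `scan_line("\u0440aypal = 1")` reports the identifier as mixed-script because its first letter is Cyrillic while the rest is Latin. An identifier written entirely in one script is not flagged.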

How It Works

Threat scanning runs in two phases:

Pattern matching: every added line is checked against the built-in rules (regex + entropy + keyword prefilters). Full file contents are also scanned with proximity weighting: matches near changed lines keep their full severity, while distant matches are downgraded to info.
LLM triage: when an LLM provider is configured, all pattern matches are sent to a single triage call. The model classifies each as confirmed, dismissed, or downgraded. This reduces false positives from legitimate uses of flagged patterns (e.g., a test file that intentionally contains eval).

Note: If the LLM call fails or no provider is configured, all pattern matches pass through as-is. The scanner is fail-open in the sense that a triage failure leaves every finding in the report, untriaged.
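
The two phases and the fallback behavior can be sketched as follows. This is an illustrative model under stated assumptions, not nitpik's code: the `Match` type, the 5-line proximity window, the `classify` callable, and the choice to map a "downgraded" verdict to `info` are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Match:
    rule_id: str
    line: int
    severity: str  # "error", "warning", or "info"


def weight_by_proximity(match, changed_lines, window=5):
    """Keep full severity near a changed line; otherwise downgrade to info.

    The 5-line window is an assumed value, not nitpik's actual threshold.
    """
    near = any(abs(match.line - n) <= window for n in changed_lines)
    return match if near else replace(match, severity="info")


def triage(matches, classify=None):
    """Apply triage verdicts, falling back to the untouched matches on failure.

    `classify` is a hypothetical callable that returns one verdict per match:
    "confirmed", "dismissed", or "downgraded".
    """
    if classify is None:
        return list(matches)  # no provider configured
    try:
        verdicts = classify(matches)
    except Exception:
        return list(matches)  # triage failed: report everything as-is
    kept = []
    for match, verdict in zip(matches, verdicts):
        if verdict == "dismissed":
            continue
        if verdict == "downgraded":
            # Assumption: a downgrade lowers the finding to info.
            match = replace(match, severity="info")
        kept.append(match)
    return kept
```

The point of the sketch is the failure path: when `classify` raises or is absent, `triage` returns every match unchanged, so a triage outage cannot hide a finding.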

Custom Rules

Add your own threat rules:

nitpik review --diff-base main --scan-threats --threat-rules ./custom-threats.toml

Custom rules are loaded in addition to the built-in set. The format:

[[rules]]
id = "internal-backdoor-pattern"
description = "Detects our known-bad internal pattern"
category = "backdoor"
severity = "error"
scope = "line"
regex = '''connect\(\s*["']internal-c2\.example\.com'''
keywords = ["connect"]
languages = ["py", "js"]

# Optional fields:
# entropy_threshold = 4.0
# min_match_length = 20
# allowlist_paths = ["**/test_*"]
# allowlist_regexes = ["// nosec"]

Rule fields

Field	Required	Description
`id`	yes	Unique rule identifier.
`description`	yes	Human-readable description shown in findings.
`category`	yes	One of: `obfuscation`, `dangerous-api`, `supply-chain`, `exfiltration`, `backdoor`.
`severity`	yes	One of: `error`, `warning`, `info`.
`scope`	yes	`line` (match individual added lines) or `file` (match full file contents with proximity weighting).
`regex`	yes	Pattern to match. Uses Rust `regex` crate syntax with full Unicode support.
`keywords`	no	Keyword prefilter — at least one must appear (case-insensitive) before the regex runs. Improves performance on large diffs. Defaults to no prefilter.
`languages`	no	File extension filter (e.g., `["py", "js"]`). Empty means all languages.
`entropy_threshold`	no	Minimum Shannon entropy for the matched text. Useful for filtering low-entropy false positives on base64/hex rules. Defaults to `0.0` (disabled).
`min_match_length`	no	Minimum character length for the matched text. Defaults to `0` (disabled).
`allowlist_paths`	no	Glob patterns for files to skip (e.g., `["**/test_*", "docs/**"]`).
`allowlist_regexes`	no	Regex patterns tested against the full line that contains the match. If any pattern matches, the finding is suppressed (e.g., `["// nosec", "# noqa"]`).
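
To show how these fields might interact, here is a minimal sketch of evaluating one line-scoped rule against one line. It is an assumption-laden model, not nitpik's engine: the order of the checks is a guess, `rule_matches` and `shannon_entropy` are hypothetical names, Python's `re` stands in for the Rust `regex` crate, and `fnmatch` only approximates `**` glob semantics.

```python
import fnmatch
import math
import re
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def rule_matches(rule, path, line):
    """Return the matched text if `rule` fires on `line`, else None."""
    ext = path.rsplit(".", 1)[-1] if "." in path else ""
    languages = rule.get("languages", [])
    if languages and ext not in languages:
        return None  # extension filter; an empty list means all languages
    if any(fnmatch.fnmatch(path, g) for g in rule.get("allowlist_paths", [])):
        return None
    keywords = rule.get("keywords", [])
    if keywords and not any(k.lower() in line.lower() for k in keywords):
        return None  # cheap prefilter: skip the regex entirely
    found = re.search(rule["regex"], line)
    if found is None:
        return None
    text = found.group(0)
    if len(text) < rule.get("min_match_length", 0):
        return None
    if shannon_entropy(text) < rule.get("entropy_threshold", 0.0):
        return None  # low-entropy match, likely not an encoded payload
    if any(re.search(p, line) for p in rule.get("allowlist_regexes", [])):
        return None  # inline suppression marker found on the line
    return text
```

The entropy check is what separates an encoded blob from a repetitive string of the same length: `shannon_entropy("aaaa")` is 0.0 and `shannon_entropy("abcd")` is 2.0, so a threshold between those values would keep the second and drop the first.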

Combining with Secret Scanning

Threat scanning and secret scanning are complementary:

Secret scanning (--scan-secrets) detects and redacts credentials before code is sent to the LLM.
Threat scanning (--scan-threats) detects malicious patterns and reports them as findings.

Enable both for maximum coverage:

nitpik review --diff-base main --scan-secrets --scan-threats

# .nitpik.toml
[secrets]
enabled = true

[threats]
enabled = true

CLI Flags

Flag	Default	Description
`--scan-threats`	`false`	Enable threat pattern detection and LLM triage.
`--threat-rules <PATH>`	—	Additional threat rules file (TOML format). Loaded alongside built-in rules.

Configuration

`.nitpik.toml`

[threats]
enabled = true
# additional_rules = "path/to/custom-threats.toml"

Key	Type	Default	Description
`enabled`	bool	`false`	Enable threat scanning by default. Equivalent to always passing `--scan-threats`.
`additional_rules`	string	(none)	Path to additional threat rules TOML file.

Performance

Built-in rules are compiled once per run using parallel regex compilation. The startup cost is comparable to secret scanning (~3–5 seconds). LLM triage adds one additional API call when findings exist.

Limitations

Threat scanning is a best-effort defense layer, not a guarantee. The built-in rules and heuristics cover common attack patterns, but determined attackers can craft payloads that evade regex-based detection. LLM triage reduces false positives but may also dismiss genuine threats.

nitpik does not guarantee detection of all malicious code. Novel obfuscation techniques, zero-day patterns, and sophisticated supply chain attacks may not be caught.
Threat findings are advisory. Always verify flagged patterns with human judgment before acting on them.
nitpik is not a replacement for dedicated security tooling — use it alongside SAST scanners, dependency audits, and manual security review.

Related Pages

Secret Scanning — credential detection and redaction
How Reviews Work — where threat scanning fits in the pipeline
Configuration — full config reference
CI/CD Integration — enabling threat scanning in pipelines
