Differ Rules

The Differ lives at internal/orchestrator/differ/. It walks every event from a finished run, extracts fingerprints, compares against the package's baseline, writes deviation rows for novel fingerprints, and auto-promotes the run to baseline when no novel fingerprints exist.

Trigger

The orchestrator schedules a per-run debounce: each event-batch arrival on POST /v1/runs/<id>/events resets a 2-second timer. When the timer fires (2s after the last batch), Differ.AnalyzeRun(runID) runs ONCE on the complete event set.

Reasons for debouncing:

A scan streams events as it generates them — running Differ per batch wastes work.
AnalyzeRun may still fire more than once for a run if a late batch arrives after the timer has fired; this is safe because the operation is idempotent (DeleteDeviationsForRun runs before WriteDeviations).

Extraction (`fingerprint.go`)

For each event type, a per-type extractor reads the JSON-encoded payload and emits at most one fingerprint per (category, value) tuple per run:

file_access

func extractFileAccess(e EventRow, add fn, filter *Filter)

Read PathName from payload. Empty → drop.
Normalize the path (see Normalization below).
If filter.SuppressPath(norm) → drop.
Inspect Flags: bits for O_WRONLY (1), O_RDWR (2), O_CREAT (64). Any set → write-intent → CatFSPathWrite. None → CatFSPathRead.
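The write-intent check in the last step can be sketched as a single bitmask test. The flag values are the Linux ones quoted above; the function name `fileAccessCategory` is hypothetical, and the string values of the category constants are assumed to match the category names used elsewhere on this page.

```go
package main

// Linux open(2) flag bits, as listed in the rule above.
const (
	oWronly = 0x1  // O_WRONLY (1)
	oRdwr   = 0x2  // O_RDWR (2)
	oCreat  = 0x40 // O_CREAT (64)
)

// Assumed string values for the two file-system categories.
const (
	CatFSPathRead  = "fs_new_path_read"
	CatFSPathWrite = "fs_new_path_write"
)

// fileAccessCategory (hypothetical name) returns the write category when
// any write-intent bit is set, and the read category otherwise.
func fileAccessCategory(flags uint32) string {
	if flags&(oWronly|oRdwr|oCreat) != 0 {
		return CatFSPathWrite
	}
	return CatFSPathRead
}
```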

exec

func extractExec(e EventRow, add fn)

Read BinaryPathStr from payload.
Normalize binary path.
Emit CatProcExec.

No allowlist for exec today: every exec is a fingerprint. (An operator-configurable proc allowlist is a v2 item.)

net_connect

func extractNetConnect(e EventRow, add fn, filter *Filter)

Read DestIP, DestPort. Empty IP → drop.
Allowlist check — filter.SuppressIP(ip) || IsAllowlistedCDN(ip):
- Hardcoded DefaultCDNAllowlist covers Cloudflare/GitHub/Google/Fastly/CloudFront ranges.
- Operator-added CIDR entries get merged.
- Matching IPs are dropped from net_new_destination. Their identity (when it matters) is carried by net_new_https_host or net_new_dns.
Emit <ip>:<port> after destination normalization.

Important: this is NOT a blanket port-443 suppression. Connections to non-allowlisted IPs on port 443 are still flagged. Malware commonly tunnels to 443 to look legitimate — we want those to deviate.

dns_query

func extractDNS(e EventRow, add fn)

Read QName. Normalize.
Emit CatNetDNS.

No DNS-level allowlist today — every distinct qname is a fingerprint unless covered by the SNI/path allowlist via inference.

tls_sni

func extractTLSSNI(e EventRow, add fn, filter *Filter)

Read parsed SNI (string).
Normalize (lowercase, strip trailing dot).
If filter.SuppressSNI(norm) → drop.
Emit CatNetHTTPSHost.
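All of the extractors above share the rule that a (category, value) pair is emitted at most once per run. A minimal sketch of a collector that enforces this and remembers the first event that produced each fingerprint follows. The type and field names (`collectedFP`, `collector`, `FirstEventID`) are hypothetical, and the real `add` callback may have a different signature.

```go
package main

// collectedFP is a hypothetical fingerprint record for this sketch.
type collectedFP struct {
	Category     string
	Value        string
	FirstEventID string // the first event that produced this fingerprint
}

// collector deduplicates on (category, value) within one run.
type collector struct {
	seen map[[2]string]struct{}
	out  []collectedFP
}

func newCollector() *collector {
	return &collector{seen: make(map[[2]string]struct{})}
}

// add records the fingerprint only the first time the pair is seen;
// later duplicates are ignored, so FirstEventID stays stable.
func (c *collector) add(category, value, eventID string) {
	key := [2]string{category, value}
	if _, dup := c.seen[key]; dup {
		return
	}
	c.seen[key] = struct{}{}
	c.out = append(c.out, collectedFP{Category: category, Value: value, FirstEventID: eventID})
}
```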

Normalization (`normalize.go`)

Per-run noise that shouldn't masquerade as a deviation gets rewritten to a stable form before becoming a fingerprint.

| Pattern | Rewrite |
| --- | --- |
| `/proc/<numeric pid>/foo` | `/proc/<PID>/foo` (literal `<PID>` token) |
| `/tmp/<random>-<digits>...` | `/tmp/<RAND>` |
| npm debug log `YYYY-MM-DDTHH_MM_SS_mmmZ-debug-N.log` | `<TIMESTAMP>-debug-<N>.log` |
| ISO-date filenames `2026-05-22.json` | `<DATE>.ext` |
| npm cacache content paths `_cacache/content-v2/sha256/aa/bb/...` | `<SHA256>` |
| npm cacache tmp `/_cacache/tmp/<hex>` | `<HEX>` |
| Container hostname hex paths | `<HEX>` |
| SNI | lowercased + trimmed |
| DNS qname | lowercased + trailing-dot stripped |

These rules are in source, not config — they're shaped by what npm + node ACTUALLY do in sandboxes. A new noise pattern emerging means a source edit + rebuild.
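To make the rewrite idea concrete, here is a sketch of three of the rules (the `/proc` PID rule, the ISO-date filename rule, and hostname lowercasing). The regular expressions and function names are assumptions written for illustration; the actual patterns in normalize.go may be stricter or broader.

```go
package main

import (
	"regexp"
	"strings"
)

var (
	// Assumed pattern: a numeric PID directly under /proc.
	procPID = regexp.MustCompile(`^/proc/[0-9]+/`)
	// Assumed pattern: a file whose whole base name is YYYY-MM-DD plus an extension.
	isoDateFile = regexp.MustCompile(`(^|/)[0-9]{4}-[0-9]{2}-[0-9]{2}(\.[A-Za-z0-9]+)$`)
)

// normalizePathSketch (hypothetical name) rewrites per-run noise in a path
// to stable tokens so that it compares equal across runs.
func normalizePathSketch(p string) string {
	p = procPID.ReplaceAllString(p, "/proc/<PID>/")
	p = isoDateFile.ReplaceAllString(p, "${1}<DATE>${2}")
	return p
}

// normalizeHostSketch (hypothetical name) lowercases a DNS name or SNI
// and strips one trailing dot.
func normalizeHostSketch(h string) string {
	return strings.TrimSuffix(strings.ToLower(strings.TrimSpace(h)), ".")
}
```

Under these assumed patterns, `/proc/4242/status` and `/proc/977/status` both collapse to `/proc/<PID>/status`, so a changing PID does not produce a new fingerprint on every run.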

Filter (`filter.go`)

The Differ builds a per-run Filter at the start of AnalyzeRun:

allEntries := store.ListAllowEntries(ctx)
applicable := storage.EntriesForPackage(allEntries, run.PackageName)
filter := NewFilter(applicable, logger)

EntriesForPackage returns global entries + entries scoped to this package.

Filter fields:

cidrs []*net.IPNet — CDN allowlist + operator CIDRs
paths []string — operator path prefixes
snis map[string]struct{} — operator SNIs (lowercased)

Methods:

SuppressIP(ipStr) bool — checks every CIDR via Contains
SuppressPath(p) bool — strings.HasPrefix against each entry
SuppressSNI(s) bool — set-membership on lowercased+trimmed value
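The three methods can be sketched as follows. The field and method names come from the lists above; the constructor `newFilterSketch` and the helper `normalizeSNI` are hypothetical (the real NewFilter takes allow entries and a logger), and treating an unparseable IP string as "not suppressed" is an assumption.

```go
package main

import (
	"net"
	"strings"
)

type Filter struct {
	cidrs []*net.IPNet        // CDN allowlist + operator CIDRs
	paths []string            // operator path prefixes
	snis  map[string]struct{} // operator SNIs, stored normalized
}

// newFilterSketch is a hypothetical constructor that takes plain strings.
func newFilterSketch(cidrs, paths, snis []string) (*Filter, error) {
	f := &Filter{paths: paths, snis: make(map[string]struct{}, len(snis))}
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			return nil, err
		}
		f.cidrs = append(f.cidrs, n)
	}
	for _, s := range snis {
		f.snis[normalizeSNI(s)] = struct{}{}
	}
	return f, nil
}

func normalizeSNI(s string) string {
	return strings.TrimSuffix(strings.ToLower(strings.TrimSpace(s)), ".")
}

// SuppressIP reports whether ipStr falls inside any allowlisted CIDR.
func (f *Filter) SuppressIP(ipStr string) bool {
	ip := net.ParseIP(ipStr)
	if ip == nil {
		return false // assumption: unparseable input is never suppressed
	}
	for _, n := range f.cidrs {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

// SuppressPath reports whether p starts with any operator prefix.
func (f *Filter) SuppressPath(p string) bool {
	for _, prefix := range f.paths {
		if strings.HasPrefix(p, prefix) {
			return true
		}
	}
	return false
}

// SuppressSNI reports whether the normalized SNI is in the operator set.
func (f *Filter) SuppressSNI(s string) bool {
	_, ok := f.snis[normalizeSNI(s)]
	return ok
}
```

Note that SuppressIP takes no port: suppression depends only on the destination address, which is why a connection to a non-allowlisted IP on port 443 still produces a fingerprint.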

Diff + write (`analyze.go`)

runFPs := ExtractFingerprintsWith(events, filter)
baseline := store.LoadBaseline(ctx, run.PackageName)

Two paths:

First run (`len(baseline) == 0`)

seedBaseline(runID, pkgName, runFPs)
// Inside:
//   MergeBaseline(runID, every fingerprint as a baseline row)
//   MarkRunBaseline(runID, true)
return 0, nil

The first run ALWAYS becomes baseline regardless of what it observed, so it produces zero deviations. It is the operator's responsibility to establish trust in the package at watch-add time.

Subsequent run

Build a lookup set from baseline:

known := make(map[string]struct{}, len(baseline))
for _, b := range baseline {
    known[b.Category + "|" + b.Value] = struct{}{}
}

Walk runFPs. For each:

In known → bump baseline row's occurrence_count + update last_seen_run_id via MergeBaseline.
NOT in known → emit a DeviationRow with a random ID, evidence pointing to the first event that produced the fingerprint, and severity from defaultSeverity(category).

After the loop:

DeleteDeviationsForRun(runID)       // idempotency
WriteDeviations(deviations)
MergeBaseline(occurrence_bumps)

D38 auto-promotion:

if len(deviations) == 0 {
    MarkRunBaseline(runID, true)
}

Zero-deviation runs join the baseline. The baseline rows' content doesn't change (every fingerprint was already known) but last_seen_run_id updates serve as a recency signal — "this baseline entry is still being observed."
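The subsequent-run path, including the auto-promotion decision, can be sketched as a pure function over two fingerprint slices. This leaves out storage, deviation IDs, and evidence. `Fingerprint`, `diffResult`, and `diffAgainstBaseline` are hypothetical names, and the sketch assumes the first-run case (empty baseline) has already been handled separately.

```go
package main

// Fingerprint is a simplified stand-in for a baseline or run fingerprint.
type Fingerprint struct {
	Category string
	Value    string
}

// diffResult splits a run's fingerprints into known and novel ones.
type diffResult struct {
	Known   []Fingerprint // would bump occurrence_count / last_seen_run_id
	Novel   []Fingerprint // would become deviation rows
	Promote bool          // true when the run has no novel fingerprints
}

// diffAgainstBaseline classifies each run fingerprint by looking up the
// "category|value" key in a set built from the baseline.
func diffAgainstBaseline(baseline, runFPs []Fingerprint) diffResult {
	known := make(map[string]struct{}, len(baseline))
	for _, b := range baseline {
		known[b.Category+"|"+b.Value] = struct{}{}
	}

	var res diffResult
	for _, fp := range runFPs {
		if _, ok := known[fp.Category+"|"+fp.Value]; ok {
			res.Known = append(res.Known, fp)
		} else {
			res.Novel = append(res.Novel, fp)
		}
	}
	res.Promote = len(res.Novel) == 0
	return res
}
```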

Allowlist semantics

The allowlist is applied DURING fingerprint extraction, before the baseline comparison. Consequences:

Allowlisted values never become baseline. A new operator CIDR entry doesn't retroactively pull IPs out of the baseline — the existing baseline rows stay, but new runs no longer add new IPs from that range.
Removing an allowlist entry doesn't immediately surface old noise. Next run will re-observe the previously-suppressed destinations, but they go into the baseline (auto-promote) instead of becoming deviations — assuming the run is otherwise clean.

Per-package entries layer on top of global entries via storage.EntriesForPackage.

Manual baseline promote

fangs baseline promote <run-id-prefix>

Operator-driven path:

Re-extracts the run's fingerprints with the current allowlist filter applied.
Merges all of them into baseline_fingerprints.
Marks the run is_baseline=true.
Deletes any deviation rows for the run.

Same semantics as auto-promote, just human-triggered.

Manual baseline rebuild

Not exposed as a single command today. Workaround:

# 1. Find the runs you want to be the new baseline
fangs run list -package lodash | head -20

# 2. Promote each one; order doesn't matter because promotion merges and is idempotent
for rid in 18b2... 18b3...; do
  fangs baseline promote "$rid"
done

A fangs baseline rebuild -package P -count N subcommand that clears the baseline and re-extracts from the last N runs is on the v2 list.

Severity defaults

defaultSeverity in differ.go:

case CatFSPathWrite, CatProcExec:  high   // write or exec is high signal
case CatNetDestination:             medium // raw IP exfil possible
case CatNetDNS, CatNetHTTPSHost:    medium
case CatFSPathRead:                 medium // bumped to high on cred-tag

Severity is currently per-category, not per-value. A v2 enhancement would let operators set per-package severity overrides (e.g. "any new SNI for lodash is critical").
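Written out as a complete function, the switch above might look like the sketch below. The category constant names come from the snippet; their string values, the `credTagged` parameter, and the fallback for unknown categories are assumptions made for illustration.

```go
package main

// Assumed string values, matching the category names used on this page.
const (
	CatNetDestination = "net_new_destination"
	CatNetDNS         = "net_new_dns"
	CatNetHTTPSHost   = "net_new_https_host"
	CatFSPathRead     = "fs_new_path_read"
	CatFSPathWrite    = "fs_new_path_write"
	CatProcExec       = "proc_new_exec"
)

// defaultSeveritySketch (hypothetical name) maps a category to a severity.
// credTagged models the "bumped to high on cred-tag" note for reads; how
// the real code learns about credential tags is not shown in the source.
func defaultSeveritySketch(category string, credTagged bool) string {
	switch category {
	case CatFSPathWrite, CatProcExec:
		return "high" // write or exec is high signal
	case CatFSPathRead:
		if credTagged {
			return "high"
		}
		return "medium"
	case CatNetDestination, CatNetDNS, CatNetHTTPSHost:
		return "medium"
	default:
		return "medium" // assumption: unknown categories fall back to medium
	}
}
```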

| Category | Value | Source events | Default severity |
| --- | --- | --- | --- |
| `net_new_destination` | `<ip>:<port>` | `net_connect` after allowlist | medium |
| `net_new_dns` | normalized qname | `dns_query` | medium |
| `net_new_https_host` | normalized SNI | `tls_sni` | medium |
| `fs_new_path_read` | normalized path | `file_access` without write flags | medium (high when cred-tagged) |
| `fs_new_path_write` | normalized path | `file_access` with O_WRONLY/O_RDWR/O_CREAT | high |
| `proc_new_exec` | normalized binary path | `exec` | high |
