Differ Rules
The Differ lives at internal/orchestrator/differ/. It walks every
event from a finished run, extracts fingerprints, compares against the
package's baseline, writes deviation rows for novel fingerprints, and
auto-promotes the run to baseline when no novel fingerprints exist.
The orchestrator schedules a per-run debounce: each event-batch arrival
on POST /v1/runs/<id>/events resets a 2-second timer. When the timer
fires (2s after the last batch), Differ.AnalyzeRun(runID) runs ONCE
on the complete event set.
Reasons for debouncing:
- A scan streams events as it generates them — running Differ per batch wastes work.
- The same `AnalyzeRun` may fire repeatedly if a late batch arrives; the operation is idempotent (`DeleteDeviationsForRun` before `WriteDeviations`).
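
The debounce described above can be sketched as a timer per run that is restarted on every batch. This is a minimal illustration, not the orchestrator's code: `Debouncer`, `OnBatch`, and `demo` are hypothetical names, and the demo uses a 100 ms quiet period instead of 2 s so it finishes quickly.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Debouncer runs analyze(runID) once the run has been quiet for `delay`.
// Hypothetical type: the orchestrator's real scheduler is not shown here.
type Debouncer struct {
	mu      sync.Mutex
	delay   time.Duration
	timers  map[string]*time.Timer
	analyze func(runID string)
}

func NewDebouncer(delay time.Duration, analyze func(runID string)) *Debouncer {
	return &Debouncer{
		delay:   delay,
		timers:  make(map[string]*time.Timer),
		analyze: analyze,
	}
}

// OnBatch is called for every event batch and restarts the run's timer.
func (d *Debouncer) OnBatch(runID string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if old, ok := d.timers[runID]; ok {
		old.Stop() // best effort: a timer that already fired still analyzes
	}
	var t *time.Timer
	t = time.AfterFunc(d.delay, func() {
		d.mu.Lock()
		if d.timers[runID] == t {
			delete(d.timers, runID)
		}
		d.mu.Unlock()
		d.analyze(runID) // must be idempotent, as AnalyzeRun is
	})
	d.timers[runID] = t
}

// demo sends three quick batches for one run and reports how often analyze ran.
func demo() int32 {
	var calls int32
	d := NewDebouncer(100*time.Millisecond, func(string) { atomic.AddInt32(&calls, 1) })
	for i := 0; i < 3; i++ {
		d.OnBatch("run-1")
		time.Sleep(5 * time.Millisecond)
	}
	time.Sleep(400 * time.Millisecond) // wait past the quiet period
	return atomic.LoadInt32(&calls)
}

func main() {
	fmt.Println("analyze calls:", demo())
}
```

Because `Stop` cannot cancel a callback that has already started, a late batch can still lead to a second analysis, which is why the idempotency noted above matters.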
Six categories. Each fingerprint is keyed on (category, value):
| Category | Value | Source events | Default severity |
|---|---|---|---|
| `net_new_destination` | `<ip>:<port>` | `net_connect` after allowlist | medium |
| `net_new_dns` | normalized qname | `dns_query` | medium |
| `net_new_https_host` | normalized SNI | `tls_sni` | medium |
| `fs_new_path_read` | normalized path | `file_access` without write flags | medium (high when cred-tagged) |
| `fs_new_path_write` | normalized path | `file_access` with `O_WRONLY`/`O_RDWR`/`O_CREAT` | high |
| `proc_new_exec` | normalized binary path | `exec` | medium |

syscall_rare_category is reserved for future work (rare-syscall
detection) — not active in v1.
Severity bumps to high when an event carries EVENT_TAG_CRED_ACCESS
— set by the BPF probe when the matched watched-path entry was
@cred-tagged.
For each event type, a per-type extractor reads the JSON-encoded payload and emits at most one fingerprint per (category, value) tuple per run:

`func extractFileAccess(e EventRow, add fn, filter *Filter)`

- Read `PathName` from payload. Empty → drop.
- Normalize the path (see the normalization rules below).
- If `filter.SuppressPath(norm)` → drop.
- Inspect `Flags`: bits for `O_WRONLY` (1), `O_RDWR` (2), `O_CREAT` (64). Any set → write-intent → `CatFSPathWrite`. None → `CatFSPathRead`.
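
The flag test and the one-fingerprint-per-(category, value) rule can be sketched as follows. This is an illustration under assumptions: the `access` type and `collect` helper are hypothetical, pairing the constant names with the category strings from the table is a guess, and normalization and allowlist filtering are left out to keep the focus on classification and deduplication.

```go
package main

import "fmt"

// Flag bits as listed above (Linux open(2) values).
const (
	oWRONLY = 1
	oRDWR   = 2
	oCREAT  = 64
)

// Category strings come from the table; pairing them with these constant
// names is an assumption.
const (
	CatFSPathRead  = "fs_new_path_read"
	CatFSPathWrite = "fs_new_path_write"
)

// Fingerprint is keyed on (category, value).
type Fingerprint struct {
	Category string
	Value    string
}

// access is a hypothetical, already-decoded file_access payload.
type access struct {
	Path  string
	Flags uint32
}

// classifyFileAccess maps open() flags to a category: any write-intent bit
// means write, otherwise read.
func classifyFileAccess(flags uint32) string {
	if flags&(oWRONLY|oRDWR|oCREAT) != 0 {
		return CatFSPathWrite
	}
	return CatFSPathRead
}

// collect keeps at most one fingerprint per (category, value) pair.
func collect(events []access) map[Fingerprint]struct{} {
	seen := make(map[Fingerprint]struct{})
	for _, e := range events {
		if e.Path == "" {
			continue // empty path → drop
		}
		fp := Fingerprint{Category: classifyFileAccess(e.Flags), Value: e.Path}
		seen[fp] = struct{}{}
	}
	return seen
}

func main() {
	events := []access{
		{Path: "/etc/hosts", Flags: 0},
		{Path: "/etc/hosts", Flags: 0}, // duplicate read, deduped
		{Path: "/tmp/out", Flags: oWRONLY | oCREAT},
		{Path: "", Flags: 0}, // dropped
	}
	for fp := range collect(events) {
		fmt.Println(fp.Category, fp.Value)
	}
}
```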

`func extractExec(e EventRow, add fn)`

- Read `BinaryPathStr` from payload.
- Normalize binary path.
- Emit `CatProcExec`.

No allowlist for exec today: every exec is a fingerprint. (An operator-configurable proc allowlist is a v2 item.)

`func extractNetConnect(e EventRow, add fn, filter *Filter)`

- Read `DestIP`, `DestPort`. Empty IP → drop.
- Allowlist check, `filter.SuppressIP(ip) || IsAllowlistedCDN(ip)`:
  - Hardcoded `DefaultCDNAllowlist` covers Cloudflare/GitHub/Google/Fastly/CloudFront ranges.
  - Operator-added CIDR entries get merged.
  - Matching IPs are dropped from `net_new_destination`. Their identity (when it matters) is carried by `net_new_https_host` or `net_new_dns`.
- Emit `<ip>:<port>` after destination normalization.

Important: this is NOT a blanket port-443 suppression. Connections to non-allowlisted IPs on port 443 are still flagged. Malware commonly tunnels to 443 to look legitimate, and those connections should deviate.

`func extractDNS(e EventRow, add fn)`

- Read `QName`. Normalize.
- Emit `CatNetDNS`.

No DNS-level allowlist today: every distinct qname is a fingerprint unless covered by the SNI/path allowlist via inference.

`func extractTLSSNI(e EventRow, add fn, filter *Filter)`

- Read parsed `SNI` (string).
- Normalize (lowercase, strip trailing dot).
- If `filter.SuppressSNI(norm)` → drop.
- Emit `CatNetHTTPSHost`.

Per-run noise that shouldn't masquerade as a deviation gets rewritten to a stable form before becoming a fingerprint.

| Pattern | Rewrite |
|---|---|
| `/proc/<PID>/foo` | `/proc/<PID>/foo` (literal `<PID>` token) |
| `/tmp/<random>-<digits>...` | `/tmp/<RAND>` |
| npm debug log `YYYY-MM-DDTHH_MM_SS_mmmZ-debug-N.log` | `<TIMESTAMP>-debug-<N>.log` |
| ISO-date filenames `2026-05-22.json` | `<DATE>.ext` |
| npm cacache content paths `_cacache/content-v2/sha256/aa/bb/...` | `<SHA256>` |
| npm cacache tmp `/_cacache/tmp/<hex>` | `<HEX>` |
| Container hostname hex paths | `<HEX>` |
| SNI | lowercased + trimmed |
| DNS qname | lowercased + trailing-dot stripped |

These rules are in source, not config — they're shaped by what npm + node ACTUALLY do in sandboxes. A new noise pattern emerging means a source edit + rebuild.
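
A few of these rewrites can be sketched with regular expressions. The patterns below are assumptions written for illustration, not the expressions used in the Differ source, and only the rules whose shape is fully spelled out in the table are covered (numeric `/proc` PIDs, ISO-date filenames, hostname normalization). The helper names `normalizePath` and `normalizeHost` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// /proc/<digits> at the start of the path, followed by "/" or end of string.
	procPID = regexp.MustCompile(`^/proc/[0-9]+(/|$)`)
	// A final path component that is exactly YYYY-MM-DD plus an extension.
	isoDate = regexp.MustCompile(`(^|/)[0-9]{4}-[0-9]{2}-[0-9]{2}(\.[A-Za-z0-9]+)$`)
)

// normalizePath rewrites per-run noise into stable tokens.
func normalizePath(p string) string {
	p = procPID.ReplaceAllString(p, "/proc/<PID>${1}")
	p = isoDate.ReplaceAllString(p, "${1}<DATE>${2}")
	return p
}

// normalizeHost lowercases, trims whitespace, and strips one trailing dot.
// It applies to both SNI values and DNS qnames in this sketch.
func normalizeHost(h string) string {
	return strings.TrimSuffix(strings.ToLower(strings.TrimSpace(h)), ".")
}

func main() {
	fmt.Println(normalizePath("/proc/4242/status"))
	fmt.Println(normalizePath("/app/logs/2026-05-22.json"))
	fmt.Println(normalizeHost("Registry.NPMJS.org."))
}
```

Keeping the file extension in the `<DATE>` rewrite is one reading of the `<DATE>.ext` entry in the table.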
The Differ builds a per-run Filter at the start of AnalyzeRun:

    allEntries := store.ListAllowEntries(ctx)
    applicable := storage.EntriesForPackage(allEntries, run.PackageName)
    filter := NewFilter(applicable, logger)

`EntriesForPackage` returns global entries plus entries scoped to this package.
Filter fields:
- `cidrs []*net.IPNet`: CDN allowlist + operator CIDRs
- `paths []string`: operator path prefixes
- `snis map[string]struct{}`: operator SNIs (lowercased)

Methods:
- `SuppressIP(ipStr) bool`: checks every CIDR via `Contains`
- `SuppressPath(p) bool`: `strings.HasPrefix` against each entry
- `SuppressSNI(s) bool`: set-membership on lowercased+trimmed value
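
A minimal sketch of a filter with these fields and methods follows. The field and method names match the lists above, but `newFilter` is a simplified, hypothetical constructor that takes plain strings (the real `NewFilter` takes allowlist entries and a logger), treating an unparseable IP as "not suppressed" is an assumption, and the CIDR is a documentation range rather than a real CDN range.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

type Filter struct {
	cidrs []*net.IPNet
	paths []string
	snis  map[string]struct{}
}

// normSNI lowercases, trims whitespace, and strips one trailing dot.
func normSNI(s string) string {
	return strings.TrimSuffix(strings.ToLower(strings.TrimSpace(s)), ".")
}

// newFilter is a hypothetical constructor working on plain strings.
func newFilter(cidrs, paths, snis []string) (*Filter, error) {
	f := &Filter{snis: make(map[string]struct{})}
	for _, c := range cidrs {
		_, ipnet, err := net.ParseCIDR(c)
		if err != nil {
			return nil, fmt.Errorf("bad CIDR %q: %w", c, err)
		}
		f.cidrs = append(f.cidrs, ipnet)
	}
	f.paths = append(f.paths, paths...)
	for _, s := range snis {
		f.snis[normSNI(s)] = struct{}{}
	}
	return f, nil
}

// SuppressIP reports whether the address falls inside any allowlisted CIDR.
func (f *Filter) SuppressIP(ipStr string) bool {
	ip := net.ParseIP(ipStr)
	if ip == nil {
		return false // assumption: unparseable addresses are not suppressed
	}
	for _, n := range f.cidrs {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

// SuppressPath reports whether the path starts with any allowlisted prefix.
func (f *Filter) SuppressPath(p string) bool {
	for _, prefix := range f.paths {
		if strings.HasPrefix(p, prefix) {
			return true
		}
	}
	return false
}

// SuppressSNI reports whether the normalized SNI is in the allowlist set.
func (f *Filter) SuppressSNI(s string) bool {
	_, ok := f.snis[normSNI(s)]
	return ok
}

func main() {
	f, err := newFilter(
		[]string{"192.0.2.0/24"},
		[]string{"/usr/lib/node_modules/"},
		[]string{"Example.COM"},
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(f.SuppressIP("192.0.2.10"))   // inside the allowlisted range
	fmt.Println(f.SuppressIP("198.51.100.7")) // outside it, so still a fingerprint
	fmt.Println(f.SuppressSNI("example.com."))
}
```

The filter only looks at the address, never the port, so a connection to a non-allowlisted IP on port 443 is not suppressed.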

    runFPs := ExtractFingerprintsWith(events, filter)
    baseline := store.LoadBaseline(ctx, run.PackageName)

Two paths. If the package has no baseline yet (first run), the run seeds it:

    seedBaseline(runID, pkgName, runFPs)
    // Inside:
    //   MergeBaseline(runID, every fingerprint as a baseline row)
    //   MarkRunBaseline(runID, true)
    return 0, nil

The first run ALWAYS becomes baseline regardless of what it observed. Zero deviations. It is the operator's responsibility to establish trust at watch-add time.

Otherwise, build a lookup set from the baseline:

    known := make(map[string]struct{})
    for _, b := range baseline {
        known[b.Category+"|"+b.Value] = struct{}{}
    }

Walk `runFPs`. For each:
- In `known` → bump the baseline row's `occurrence_count` and update `last_seen_run_id` via `MergeBaseline`.
- NOT in `known` → emit a `DeviationRow` with a random ID, evidence pointing to the first event that produced the fingerprint, and severity from `defaultSeverity(category)`.

After the loop:

    DeleteDeviationsForRun(runID) // idempotency
    WriteDeviations(deviations)
    MergeBaseline(occurrence_bumps)

D38 auto-promotion:

    if len(deviations) == 0 {
        MarkRunBaseline(runID, true)
    }

Zero-deviation runs join the baseline. The baseline rows' content doesn't change (every fingerprint was already known), but the `last_seen_run_id` updates serve as a recency signal: "this baseline entry is still being observed."
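
The comparison and the promotion decision can be sketched in memory, without the store. This is an illustration only: `diffAgainstBaseline` is a hypothetical helper, and the real code writes deviation rows and baseline updates through the storage layer instead of returning slices.

```go
package main

import "fmt"

// Fingerprint is keyed on (category, value).
type Fingerprint struct {
	Category string
	Value    string
}

// diffAgainstBaseline splits a run's fingerprints into ones already in the
// baseline (known) and ones that would become deviations (novel).
func diffAgainstBaseline(baseline, run []Fingerprint) (known, novel []Fingerprint) {
	seen := make(map[string]struct{}, len(baseline))
	for _, b := range baseline {
		seen[b.Category+"|"+b.Value] = struct{}{}
	}
	for _, fp := range run {
		if _, ok := seen[fp.Category+"|"+fp.Value]; ok {
			known = append(known, fp)
		} else {
			novel = append(novel, fp)
		}
	}
	return known, novel
}

func main() {
	baseline := []Fingerprint{
		{Category: "net_new_dns", Value: "registry.npmjs.org"},
		{Category: "proc_new_exec", Value: "/usr/bin/node"},
	}
	run := []Fingerprint{
		{Category: "net_new_dns", Value: "registry.npmjs.org"},
		{Category: "net_new_dns", Value: "unexpected.example"},
	}
	known, novel := diffAgainstBaseline(baseline, run)
	fmt.Println("known:", len(known), "novel:", len(novel))
	fmt.Println("auto-promote:", len(novel) == 0)
}
```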
The allowlist is applied DURING fingerprint extraction, before the baseline comparison. Consequences:
- Allowlisted values never become baseline. A new operator CIDR entry doesn't retroactively pull IPs out of the baseline — the existing baseline rows stay, but new runs no longer add new IPs from that range.
- Removing an allowlist entry doesn't immediately surface old noise. Next run will re-observe the previously-suppressed destinations, but they go into the baseline (auto-promote) instead of becoming deviations — assuming the run is otherwise clean.
Per-package entries layer on top of global entries via
storage.EntriesForPackage.

    fangs baseline promote <run-id-prefix>

Operator-driven path:
- Re-extracts the run's fingerprints with the current allowlist filter applied.
- Merges all of them into `baseline_fingerprints`.
- Marks the run `is_baseline=true`.
- Deletes any deviation rows for the run.

Same semantics as auto-promote, just human-triggered.
Rebuilding a baseline from scratch is not exposed as a single command today. Workaround:

    # 1. Find the runs you want to be the new baseline
    fangs run list -package lodash | head -20
    # 2. Promote each one; order doesn't matter because merges are idempotent
    for rid in 18b2... 18b3...; do
      fangs baseline promote "$rid"
    done

A `fangs baseline rebuild -package P -count N` subcommand that clears the baseline and re-extracts from the last N runs is on the v2 list.
defaultSeverity in differ.go:

    case CatFSPathWrite, CatProcExec: high  // write or exec is high signal
    case CatNetDestination: medium          // raw IP exfil possible
    case CatNetDNS, CatNetHTTPSHost: medium
    case CatFSPathRead: medium              // bumped to high on cred-tag

Severity is currently per-category, not per-value. A v2 enhancement would let operators set per-package severity overrides (e.g. "any new SNI for lodash is critical").
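
As a rough sketch, the per-category default and the cred-tag bump could be combined like this. The switch mirrors the case listing above. The string values for the constants, the `medium` fallback for unknown categories, and the `severityFor` helper with its `credTagged` flag (standing in for `EVENT_TAG_CRED_ACCESS`) are assumptions made for illustration.

```go
package main

import "fmt"

// Category strings from the fingerprint table; pairing them with these
// constant names is an assumption.
const (
	CatNetDestination = "net_new_destination"
	CatNetDNS         = "net_new_dns"
	CatNetHTTPSHost   = "net_new_https_host"
	CatFSPathRead     = "fs_new_path_read"
	CatFSPathWrite    = "fs_new_path_write"
	CatProcExec       = "proc_new_exec"
)

// defaultSeverity follows the per-category case listing.
func defaultSeverity(category string) string {
	switch category {
	case CatFSPathWrite, CatProcExec:
		return "high"
	case CatNetDestination, CatNetDNS, CatNetHTTPSHost, CatFSPathRead:
		return "medium"
	default:
		return "medium" // assumption: unknown categories fall back to medium
	}
}

// severityFor applies the cred-tag bump on top of the category default.
func severityFor(category string, credTagged bool) string {
	if credTagged {
		return "high"
	}
	return defaultSeverity(category)
}

func main() {
	fmt.Println(severityFor(CatFSPathRead, false))
	fmt.Println(severityFor(CatFSPathRead, true))
	fmt.Println(severityFor(CatFSPathWrite, false))
}
```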