Processing Lifecycle
You will learn what happens after the downloader stages a feed body, how the processing engine produces published artifacts, and how heavy phases and background work fit in.
When the processing loop wakes, it follows this sequence:
- Claim staged body — rename a `.new` file to `.processing` for each feed in the batch.
- Feed-local processing — analyze the canonical feed body, write the committed feed body, and update per-feed state.
- Heavy phases — run global enrichment and comparison across all relevant feeds.
- Publish — stage public artifacts, publish the staged artifact tree, and save the updated cache state.
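The claim step at the start of this sequence can be sketched as a rename pass over a staging directory. This is a minimal illustration, not the engine's actual code: the function name `claim_staged_bodies`, the flat directory layout, and the use of `os.replace` are all assumptions.

```python
import os
from pathlib import Path


def claim_staged_bodies(staging_dir: Path) -> list[Path]:
    """Rename every *.new file to *.processing and return the claimed paths.

    Hypothetical helper: the directory layout and naming are assumptions.
    """
    claimed = []
    # Materialize the listing first so renames do not disturb iteration.
    for staged in sorted(staging_dir.glob("*.new")):
        target = staged.with_suffix(".processing")
        # os.replace is atomic when source and target share a filesystem.
        os.replace(staged, target)
        claimed.append(target)
    return claimed
```

Because the rename is a single filesystem operation, a feed body is either still staged as `.new` or fully claimed as `.processing`, which keeps a retry after a crash straightforward.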
For each admitted feed, the engine produces:
- Metadata — size, unique IP count, IP family, change rate.
- History — bounded point-in-time snapshots of feed size over time.
- Retention — how long IPs have been listed, and how long removed IPs stayed listed before removal.
- Change rate — rotation percentage, update frequency measurements.
- Provider enrichment — ASN distribution, geographic distribution, bogon overlap.
The engine reads only local canonical feed bodies and local provider databases. It never fetches upstream.
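As a rough illustration of the metadata and history outputs listed above, the sketch below counts unique IPs in a local feed body and keeps a bounded size history. The function names, the dictionary keys, and the history length are hypothetical; the engine's real formats are not described here.

```python
import ipaddress
from collections import deque


def feed_metadata(lines):
    """Summarize a feed body given as an iterable of IP or CIDR strings."""
    nets = [ipaddress.ip_network(s.strip(), strict=False) for s in lines if s.strip()]
    by_version = {4: [], 6: []}
    for net in nets:
        by_version[net.version].append(net)
    # collapse_addresses merges overlapping and adjacent networks, so each
    # address is counted once. It must be called separately per IP version.
    unique = sum(
        merged.num_addresses
        for group in by_version.values()
        for merged in ipaddress.collapse_addresses(group)
    )
    families = [f"ipv{v}" for v, group in by_version.items() if group]
    if len(families) == 1:
        family = families[0]
    else:
        family = "mixed" if families else "empty"
    return {"entries": len(nets), "unique_ips": unique, "family": family}


def record_size(history, timestamp, size):
    """Append a snapshot; a deque with maxlen drops the oldest one when full."""
    history.append((timestamp, size))


# Assumed bound of three snapshots, purely for illustration.
history = deque(maxlen=3)
```

Using `deque(maxlen=...)` is one simple way to keep history bounded without explicit pruning.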
After feed-local work completes for the batch, the engine runs global phases:
| Phase | What it does |
|---|---|
| Pairwise comparison | Compares each updated feed against every other enabled public feed. Updates overlap counts on both sides. |
| GeoIP fan-out | Updates geographic enrichment for affected feeds using the current GeoIP provider. |
| ASN fan-out | Updates ASN enrichment for affected feeds using the current ASN provider. |
| Bogon analysis | Checks affected feeds against bogon reference data. |
| Critical infrastructure | Generates overlap artifacts for critical-infrastructure reference feeds. |
| Insights | Produces deterministic insights from all computed facts. |
Heavy-phase concurrency is independently configurable. The engine stops admitting new heavy work during shutdown and waits for in-flight workers to settle.
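The shutdown behaviour described above (stop admitting, then wait for in-flight workers) can be sketched with a bounded thread pool. `HeavyPhaseRunner` and `overlap_count` are hypothetical names, and a thread pool is only one possible way to bound concurrency; the engine's real mechanism may differ.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def overlap_count(feed_a, feed_b):
    """Toy stand-in for a heavy task: count entries present in both feeds."""
    return len(set(feed_a) & set(feed_b))


class HeavyPhaseRunner:
    """Bounded worker pool that refuses new work once shutdown starts."""

    def __init__(self, max_workers):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self._stopping = False

    def submit(self, fn, *args):
        """Return a Future, or None if shutdown has already begun."""
        # The lock prevents a submit from racing with shutdown.
        with self._lock:
            if self._stopping:
                return None
            return self._pool.submit(fn, *args)

    def shutdown(self):
        """Stop admitting work, then block until in-flight tasks finish."""
        with self._lock:
            self._stopping = True
        self._pool.shutdown(wait=True)
```

A caller would size `max_workers` from whatever configuration value controls heavy-phase concurrency, and treat a `None` return from `submit` as a signal that the daemon is stopping.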
The processing loop makes a successful batch visible in stages:
- During feed-local processing, each successful feed writes its committed canonical body and latest binary set.
- Public artifacts are staged with the modification times (mtimes) required for pipeline integrity.
- Supporting staged downloads, such as provider archives and artifact-parent archives, are promoted before public artifact publication when they belong to the successful batch.
- The staged public artifact tree is published.
- The updated cache state is saved.
If processing fails before publication, the staged or processing input remains available for retry. If publication is interrupted after a feed body was committed, integrity checks detect missing or stale public artifacts and recovery can reprocess from the committed local body.
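One way to picture the integrity check is a comparison between the committed body and its public artifacts. The sketch below assumes that "stale" means an artifact whose mtime is older than the committed body's; that rule and the name `find_stale_artifacts` are assumptions for illustration, not the documented algorithm.

```python
from pathlib import Path


def find_stale_artifacts(committed_body: Path, artifacts: list[Path]) -> list[Path]:
    """Return artifacts that are missing or older than the committed body.

    Assumption: an artifact is stale when its mtime predates the body's mtime.
    """
    body_mtime = committed_body.stat().st_mtime_ns
    problems = []
    for artifact in artifacts:
        if not artifact.exists():
            problems.append(artifact)  # missing: never published
        elif artifact.stat().st_mtime_ns < body_mtime:
            problems.append(artifact)  # stale: published from an older body
    return problems
```

A non-empty result would indicate that the feed should be reprocessed from the committed local body.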
After the main batch commits, some work runs in the background:
- Entity artifact patching — country and ASN detail pages update incrementally.
- Entity sidecar generation — per-feed entity sidecars are precomputed during processing, then consumed by the background patcher.
Background work is visible in the admin UI. It does not block the next processing cycle.
Within one batch, feeds are processed in this order:
- Normal feeds (plain sources, artifact children)
- History derivatives
- Merges (ordered by increasing dependency count)
This ensures deterministic publication order. The engine does not compose history derivatives or merges — that is the downloader's job.
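The ordering rule above maps naturally onto a sort key. In this sketch the `Feed` type, the kind labels, and the tie-break by name are hypothetical; only the three-tier order and the dependency-count ordering for merges come from the text.

```python
from dataclasses import dataclass, field

# Assumed kind labels; lower rank is processed first.
KIND_RANK = {"normal": 0, "history_derivative": 1, "merge": 2}


@dataclass
class Feed:
    name: str
    kind: str
    dependencies: list[str] = field(default_factory=list)


def processing_order(feeds):
    """Sort feeds: normal, then history derivatives, then merges by dependency count."""
    def key(feed):
        # Dependency count only matters within the merge tier.
        dep_count = len(feed.dependencies) if feed.kind == "merge" else 0
        # The name tie-break is an assumption that makes the order fully deterministic.
        return (KIND_RANK[feed.kind], dep_count, feed.name)

    return sorted(feeds, key=key)
```

For example, a merge with one dependency sorts ahead of a merge with three, and both sort after every normal feed and history derivative in the batch.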
- Pipeline Overview — how processing fits into the full pipeline
- Download Lifecycle — what happens before processing
- Triggers and Reprocessing — what causes processing to run