Repair Network Flows documentation by ktsaou · Pull Request #22449 · netdata/netdata

ktsaou · 2026-05-07T21:16:35Z

Summary

Repair Network Flows documentation and generated integration cards after SOW 14 regression review.
Align public docs with supported behavior for retention, IP intelligence defaults, enrichment, visualization limits, troubleshooting, and generated default-behavior text.
Fix timestamp_source persistence so the decoded source timestamp is stored in _SOURCE_REALTIME_TIMESTAMP while journal entry realtime remains receive/write time.
Close SOW 14 by moving it to .agents/sow/done/ with final validation evidence.

Validation

cargo test --manifest-path src/crates/netflow-plugin/Cargo.toml timestamp_source -- --nocapture
python3 integrations/gen_integrations.py
python3 integrations/gen_docs_integrations.py
python3 -c 'import yaml; yaml.safe_load(open("src/crates/netflow-plugin/metadata.yaml")); print("metadata yaml ok")'
Targeted public-doc quality scans over docs/network-flows/, metadata.yaml, and generated integration-card body text.
git diff --check
.agents/sow/audit.sh confirmed SOW 14 status/directory consistency after move; it still exits nonzero on an unrelated pre-existing sensitive-data pattern in .agents/skills/mirror-netdata-repos/SKILL.md and existing root TODO-file warnings.

Summary by cubic

Repairs and restructures Network Flows docs and integration cards to match actual behavior, fixes source‑timestamp persistence in the netflow-plugin, and simplifies retention/query config to per‑tier only. Also ensures Learn renders the section overview, adds Network Flows to the generated “Monitor anything” page, and improves startup by moving the raw rebuild scan off the async startup path.

Bug Fixes
- Persist the decoded source timestamp to _SOURCE_REALTIME_TIMESTAMP when timestamp_source is used; adds an end‑to‑end test.
- Move raw rebuild scanning off the async startup path to avoid blocking plugin start.
- Documentation overhaul: collapsed seven enrichment concept pages into a single Enrichment guide; added “Enrichment Intel Downloader” and “Visualization/Overview”; corrected doubling vs bidirectional symmetry, per‑flow sampling handling, UI navigation (Live → Network Flows), GeoIP internal‑IP behavior, tier auto‑pick rules, rollup‑field survival, and Time‑Series bucket limits; Learn map now renders the Overview at the section root; regenerated and recategorized integration cards under “Flow Protocols” and “Enrichment Methods”, including new Classifiers, Decapsulation, and Static Metadata; updated generators so COLLECTORS.md includes a “Network Flows” section.

Migration
- Move top‑level journal.size_of_journal_files and journal.duration_of_journal_files into per‑tier entries under journal.tiers.*. Each tier requires at least one positive limit; set a field to null to disable that limit on that tier.
- Remove unsupported keys: journal.query_1m_max_window, journal.query_5m_max_window, and journal.query_facet_max_values_per_field. Keep journal.query_max_groups (documented to fold overflow into __overflow__).

Written for commit 165e02c. Summary will update on new commits.

The 2026-05-07 closure of SOW-0014 was premature. The learn netlify deploy preview for PR netdata#2852 surfaced major content errors that the prior validation pass missed: multiple statements that contradict the source code, generic flow-monitoring advice imported from research notes that does not apply to Netdata, several invented behaviours, and structural choices that read as academic / generic guidance rather than as a practical Netdata-specific guide.
Move file from done/ back to current/, set Status: in-progress, and append a `## Regression - 2026-05-07` section with:
- 21 findings transcribed verbatim from the user (F1..F21);
- code-citation verdicts for each (per-flow sampling multiplication at decoder/record/core/record.rs:24-26 confirms F4/F5/F15 wrong; template persistence at decoder/protocol/{v9,ipfix}/templates.rs confirms F14 wrong; etc.);
- root cause analysis (subagents extracted data accurately but missed behavioural framing claims; codex review focused on autocomplete code, not docs prose; validation evidence was structural, not semantic; closure was driven by "all phases done" rather than "all claims true");
- a three-phase repair plan: per-finding fixes one at a time with no batching (R1), per-page audit subagents that re-run until clean (R2), final close with a whole-section review (R3).
The SOW remains in current/ until every F1..F21 has a fix landed with code citation, every page passes a per-page audit, and a whole-section review returns no new findings.

The /docs/network-flows URL rendered as an auto-generated tile grid because the section root meta block carried only `label:` -- no leaf content. Learn's `get_dir_make_file_and_recurse` then synthesised a category index page from the children. Hoist `edit_url:` and `description:` to the section root, pointing at the existing README.md. Drop the redundant child "Overview" entry that pointed to the same file. Pattern reference: every section that renders an Overview at its root URL on learn.netdata.cloud (Collecting Metrics, Dashboards and Charts, Netdata Cloud, Welcome to Netdata, etc.) carries `edit_url:` directly on the section root. Logged as F1 in SOW-0014 regression log with code references.

Two related findings, fixed together because they touched the same paragraphs.
- The "doubling" effect (per-packet ingress+egress accounting on a single router) was conflated with bidirectional traffic symmetry.
- The doubling fix said "filter by one exporter, one interface, in one direction". The "in one direction" is redundant on top of "one interface" and misleads readers into expecting another 50% halving.
- The bidirectional-traffic explanation said "when you see traffic X-to-Y and Y-to-X of similar volume, that's one conversation, not two". Bidirectional conversations are usually asymmetric (downloads vs ACKs), so "similar volume" is wrong as an identification heuristic.
Rewritten:
- Doubling fix is now: one exporter + one interface (Input Interface OR Output Interface, pick one). Each packet crossing that interface produces exactly one record on it.
- The mirror-conversation section is renamed and reframed: separate packets in each direction, separate records, typically asymmetric volumes. Per-direction accounting, not duplication.
Files touched:
- README.md, quick-start.md: paragraphs rewritten.
- summary-sankey.md, anti-patterns.md, validation.md: "in one direction" lines fixed in place. Anti-patterns / validation will be rewritten more broadly under F14-F17 but the wrong claims are removed now.
Logged as F2+F3 in SOW-0014 regression log.

…e myth)
Two related findings, fixed together because they are the same wrong claim repeated across the documentation.
Source-code reality, src/crates/netflow-plugin/src/decoder/record/core/record.rs:24-26:
    let sampling_rate = rec.sampling_rate.max(1);
    rec.bytes = rec.bytes.saturating_mul(sampling_rate);
    rec.packets = rec.packets.saturating_mul(sampling_rate);
`sampling_rate` is set per-record from each protocol's appropriate source (legacy header, v9 IE / Sampling Options Template, IPFIX IE / options, sFlow per-sample rate, or static override). Multiplication runs PER FLOW at decode time. Mixed sampling rates across exporters, interfaces, or time are handled correctly automatically.
Removed false claims wherever they appeared:
- README.md "What sampling does to your numbers" -- "works correctly only if all your exporters use the same sampling rate" and "the clean path: keep sampling rates uniform across your network". Rewrote the paragraph to state per-flow multiplication, explain why the UI does not surface a single rate (mixed rates have no meaningful display value; uniform rates are already known to the operator), and keep the real statistical-floor caveat (sampling can miss small / short flows regardless of rate uniformity).
- field-reference.md and anti-patterns.md -- `RAW_BYTES` no longer framed as "use when sampling is uniform". Now correctly framed as the literal pre-multiplication value the exporter sent.
- troubleshooting.md "Bandwidth doesn't match SNMP" -- "Mixed sampling rates ... isn't comparable to any single SNMP measurement" replaced with the actual mistake (comparing aggregates of many interfaces to a single interface SNMP counter). Per-flow multiplication is correct regardless of rate uniformity.
- validation.md -- "undocumented sampling rate changes" dropped from the silent-failure intro; the "Sampling rate change" monitoring table row removed (per-flow multiplication absorbs rate changes).
- investigation-playbooks.md -- "Sampling rate of the exporter (so the numbers can be interpreted)" deliverable removed and "A change in sampling rate during the analysis window invalidates the trend" caveat removed. Both wrong under per-flow scaling.
- anti-patterns.md cross-protocol-counts section -- "Same goes for sampling-rate differences across exporters" removed; the protocol-counts-not-comparable point stays. The F2/F3 doubling-fix wording ("filter by exporter + interface + direction") was also wrong and got cleaned in the same anti-patterns summary table row.
What stays: NetFlow v7 / v5 with rate=0 / v9 / IPFIX without a Sampling Options Template are real cases where the plugin sees no rate and undercounts. Those remain documented as the actual silent failure mode.
Items deferred to F14 / F15 which rewrite their containing sections:
- validation.md silent-failure items #2, #3, #5 (F14 removes them as a block).
- anti-patterns.md "Ignoring the sampling rate" section + its summary-table row (F15 removes the section entirely).
Logged as F4+F5 in SOW-0014 regression log with code references.
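
The per-flow scaling described in this commit can be illustrated with a minimal Rust sketch. The three arithmetic lines mirror the record.rs excerpt quoted above; the `FlowRecord` struct, its `u64` field types, and the `total_estimated_bytes` helper are assumptions made for illustration, not the plugin's actual types.

```rust
/// Hypothetical, trimmed-down flow record (assumption: the real struct in
/// record.rs carries many more fields and may use different integer types).
#[derive(Debug, Clone, Copy)]
struct FlowRecord {
    bytes: u64,
    packets: u64,
    sampling_rate: u64,
}

/// Scale one record by its own sampling rate, following the quoted excerpt.
fn apply_sampling(rec: &mut FlowRecord) {
    // A rate of 0 (no rate known) is clamped to 1, so counters stay unscaled.
    let sampling_rate = rec.sampling_rate.max(1);
    // Saturating arithmetic clamps at u64::MAX instead of wrapping around.
    rec.bytes = rec.bytes.saturating_mul(sampling_rate);
    rec.packets = rec.packets.saturating_mul(sampling_rate);
}

/// Hypothetical helper: sum estimated bytes over records with mixed rates.
fn total_estimated_bytes(records: &[FlowRecord]) -> u64 {
    records
        .iter()
        .map(|r| {
            let mut scaled = *r;
            apply_sampling(&mut scaled);
            scaled.bytes
        })
        .fold(0u64, |acc, b| acc.saturating_add(b))
}
```

Because each record is scaled before it is summed, a 1:10 record and a 1:1000 record can be aggregated directly; no uniform rate is needed.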

…ment
The globe and city map render the same query response with the same table beneath. The 3D projection is not "less useful for analysis"; it is a different rendering with different strengths. Replaced both occurrences with a neutral framing that the 2D city map is best for in-continent precision and the 3D globe is best when distance and great-circle paths matter (transcontinental traffic, undersea cables, intercontinental CDN routing).
While here, fixed the same page's "Mirroring" subsection to drop the F2 symmetry myth ("25 top-N = 12 conversations" implied a 1:1 pairing of A->B and B->A by volume; bidirectional traffic is usually asymmetric).
Logged as F6 in SOW-0014 regression log.

The docs claimed "the Network Flows tab should appear in the top navigation". The actual Netdata UI exposes Network Flows as a Function under the **Live** tab. Verified against docs/dashboards-and-charts/live-tab.md.
Adopted convention:
- "Open Network Flows" (verb)
- "the Network Flows view" (noun)
- "Click the Live tab in the top navigation; Network Flows appears in the Functions list on the right" (setup context).
Swept every "Network Flows tab" reference in:
- installation.md (the original bad sentence + follow-up).
- troubleshooting.md, investigation-playbooks.md, anti-patterns.md.
- visualization/dashboard-cards.md (4 occurrences).
- visualization/summary-sankey.md.
Logged as F7 in SOW-0014 regression log.

Cleans up an unjustified schema redundancy. Today the plugin accepts both top-level `size_of_journal_files` / `duration_of_journal_files` under `journal` AND per-tier values under `journal.tiers.<tier>`. The runtime already uses only per-tier values; the global was just a default that flowed through `retention_for_tier()`. The two paths gave operators a configuration surface with no underlying behavioural difference.
After:
- `JournalConfig` carries only `tiers: JournalTierRetentionOverrides` (plus journal_dir and the query guardrails). No globals.
- Each `JournalTierRetentionConfig` is `Option<ByteSize> / Option<Duration>` directly. Omitted fields fall back to the built-in tier defaults (uniform 10GB / 7d, preserving today's default behaviour). Explicit `null` disables that limit on that tier; validation still requires at least one positive limit.
- The `RetentionLimitOverride<T>` enum, its serializer, deserializer, and `resolve` helpers are removed -- no longer reachable.
- The orphan `parse_bytesize` helper that fed the removed clap `value_parser` is gone.
Tests rewritten to exercise the per-tier-only schema:
- `journal_tier_retention_uses_built_in_tier_defaults`
- `journal_tier_retention_uses_per_tier_values_when_present`
- `journal_rotation_size_derives_from_tier_size_budget`
- `journal_rotation_size_uses_100mb_for_time_only_retention`
- `journal_validation_rejects_tier_size_below_100mb`
- `journal_validation_allows_time_only_retention_when_size_is_disabled`
- `journal_tier_retention_null_disables_size_limit_for_that_tier_only`
Plus mechanical updates to memory_tests / startup_memory_tests to use the new schema. Full crate: 427 passed, 0 failed.
Documentation:
- configuration.md: rewrote the `## journal` section with a per-tier-only schema. Updated the production retention profile example. Replaced the existing "Top-level retention" subsection with a "Per-tier retention" subsection.
- retention-querying.md: dropped the global-form example; per-tier example only; cross-link to configuration.md.
Breaking change notice: any existing user config using `journal.size_of_journal_files` / `journal.duration_of_journal_files` at the top level will now fail to deserialize (deny_unknown_fields). Migrate by moving those values under `journal.tiers.<tier>.*`. The plugin is recently shipped (PR netdata#22439, 2026-05-07); breaking-change risk is low.
Logged as F8 in SOW-0014 regression log.
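
A minimal sketch of the per-tier validation rules stated above (at least one positive limit per tier, `null` disabling a limit, and the 100 MB size floor implied by the test names). `TierRetention`, `default_tier`, and `validate_tier` are hypothetical names, plain `u64` byte counts stand in for `ByteSize`, and treating MB/GB as decimal units is an assumption; the plugin's real types and error messages may differ.

```rust
use std::time::Duration;

/// Hypothetical stand-in for one tier's retention settings.
/// `None` models an explicit `null`: that limit is disabled for the tier.
#[derive(Debug, Clone, Copy, PartialEq)]
struct TierRetention {
    size_bytes: Option<u64>,
    duration: Option<Duration>,
}

/// Assumed decimal interpretation of the 100 MB floor named by the tests.
const MIN_TIER_SIZE_BYTES: u64 = 100 * 1_000_000;

/// Built-in default described in the commit message: 10GB / 7d.
fn default_tier() -> TierRetention {
    TierRetention {
        size_bytes: Some(10 * 1_000_000_000),
        duration: Some(Duration::from_secs(7 * 24 * 60 * 60)),
    }
}

/// Reject a tier with no positive limit, or with a size below the floor.
fn validate_tier(tier: &TierRetention) -> Result<(), String> {
    let size_positive = matches!(tier.size_bytes, Some(s) if s > 0);
    let duration_positive = matches!(tier.duration, Some(d) if !d.is_zero());
    if !size_positive && !duration_positive {
        return Err("tier needs at least one positive retention limit".to_string());
    }
    if let Some(size) = tier.size_bytes {
        if size < MIN_TIER_SIZE_BYTES {
            return Err("tier size limit is below 100 MB".to_string());
        }
    }
    Ok(())
}
```

Under these assumptions a time-only tier (`size_bytes: None`, `duration: Some(..)`) validates, while a tier with both limits disabled does not.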

Code investigation: these two journal config knobs were declared, validated, and exposed in YAML, but nothing in `src/.../query/` ever read them. The actual tier auto-pick logic in `query/planner/spans.rs::plan_query_tier_spans_recursive` selects the coarser tier strictly from window / bucket-duration alignment math -- it does not consult either knob. Verdict: dead schema.
Removed both:
- field declarations on `JournalConfig`
- non-zero / ordering validation in validate_journal
- two YAML test fixtures that mentioned them
- the two configuration.md table rows + code-block lines + the "query-window limits" explanation
- the retention-querying.md sentence that referenced them
Updated the retention-querying explanation of tier auto-pick to match the actual behaviour: the planner uses bucket alignment, not config-driven window caps.
Build + tests: 427 passed, 0 failed.
Breaking change notice: any user config that set `journal.query_1m_max_window` or `journal.query_5m_max_window` will now fail to deserialize (deny_unknown_fields). The keys had no effect before; migration is delete-only.
Logged as F9 in SOW-0014 regression log.

…values_per_field; document properly
Two journal config knobs, opposite verdicts after code investigation:
- query_max_groups: REAL. Read at query/service.rs:52 and threaded into the projected group accumulator via query/projected/apply.rs:48. When the accumulator's grouped_total() exceeds the limit, additional group keys are folded into a synthetic __overflow__ bucket (query/grouping/labels.rs:17, query/grouping/model/compact.rs:35) and the response carries a warning ("Group accumulator limit reached; additional groups were folded into __overflow__" at query/timeseries.rs:124). Bounds memory on accidentally wide group-by combinations. Keep; document properly.
- query_facet_max_values_per_field: DEAD. Declared, validated for non-zero, but the consumer at query/facets/render.rs:19,27 uses the hardcoded constant DEFAULT_FACET_ACCUMULATOR_MAX_VALUES_PER_FIELD (query/request/constants.rs:17) instead of the config knob. The two coincidentally have the same default value (5000) but the config knob is never threaded to the consumer. Remove.
Code:
- types/journal.rs: removed the dead field; added a doc comment for query_max_groups explaining the __overflow__ bucket and the warning.
- defaults.rs: removed default_query_facet_max_values_per_field().
- validation/journal.rs: removed the non-zero check for the dead knob.
- plugin_config_tests.rs: removed validate_rejects_zero_query_facet_max_values_per_field test entirely; cleaned YAML fixtures.
Stock config + README:
- src/crates/netflow-plugin/configs/netflow.yaml: rewrote the journal block to use the per-tier retention form (carries over the F8 schema in the stock config); dropped both dead knobs; added clear comments for query_max_groups.
- src/crates/netflow-plugin/README.md: example updated, explanatory paragraph rewritten to describe what query_max_groups actually does.
Docs:
- configuration.md: Query guardrails table now lists only query_max_groups, with full description of overflow behaviour.
- retention-querying.md: Group-by limit section consolidated.
- visualization/filters-facets.md: removed the "Facet limits" subsection that documented the dead knob.
Build + tests: 426 passed, 0 failed (one dead-knob validation test removed).
Breaking change notice: any user config setting journal.query_facet_max_values_per_field will now fail to deserialize. The key had no effect before; migration is delete-only.
Logged as F10 in SOW-0014 regression log.
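
The overflow behaviour of query_max_groups described above can be sketched as a bounded accumulator. This is an illustration under assumptions: `GroupAccumulator`, its fields, and its methods are hypothetical, and the real accumulator in the query code is structured differently. Only the `__overflow__` key and the warning text come from the commit message.

```rust
use std::collections::HashMap;

const OVERFLOW_KEY: &str = "__overflow__";

/// Hypothetical accumulator: sums bytes per group key and folds any key
/// beyond `max_groups` distinct groups into the `__overflow__` bucket.
struct GroupAccumulator {
    max_groups: usize,
    groups: HashMap<String, u64>,
    overflowed: bool,
}

impl GroupAccumulator {
    fn new(max_groups: usize) -> Self {
        Self { max_groups, groups: HashMap::new(), overflowed: false }
    }

    fn add(&mut self, key: &str, bytes: u64) {
        // Keys that already have a group keep accumulating in place.
        if let Some(total) = self.groups.get_mut(key) {
            *total = total.saturating_add(bytes);
            return;
        }
        // Count only real groups against the limit, not the overflow bucket.
        let real_groups =
            self.groups.len() - usize::from(self.groups.contains_key(OVERFLOW_KEY));
        let target = if real_groups < self.max_groups {
            key
        } else {
            self.overflowed = true;
            OVERFLOW_KEY
        };
        let slot = self.groups.entry(target.to_string()).or_insert(0);
        *slot = slot.saturating_add(bytes);
    }

    /// Warning text quoted in the commit message, present only after overflow.
    fn warning(&self) -> Option<&'static str> {
        if self.overflowed {
            Some("Group accumulator limit reached; additional groups were folded into __overflow__")
        } else {
            None
        }
    }
}
```

With a limit of 2, feeding keys a, b, c, d keeps a and b as separate groups and sums c and d under `__overflow__`, so memory stays bounded by the limit plus one bucket.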

The file existed as 0 bytes since the original documentation rewrite. The netlify deploy preview rendered it as an empty page. Multiple other pages cross-link to it (asn-resolution, static-metadata, network-identity, README, configuration, the four provider integration cards), so the empty placeholder was both a UX failure and a coverage gap.
Authored from scratch, code-grounded against:
- src/.../plugin_config/types/enrichment/geoip.rs (config schema)
- src/.../plugin_config/runtime.rs (auto-detect path)
- src/.../enrichment/data/geoip/resolver.rs (load + 30s refresh + per-IP composing-multiple-databases lookup)
- src/.../enrichment/data/network/asn.rs (AS-name rendering)
Page covers: the fields IP intelligence populates (with tier-preservation notes), the configuration schema, auto-detection, refresh cadence, lookup order vs the broader ASN provider chain, the four provider integration cards (DB-IP / MaxMind / IPtoASN / Custom), private-IP rendering, IPv6/IPv4 database split behaviour, staleness and accuracy caveats, and a failure-modes table.
Frontmatter `learn_rel_path` matches the bgp-routing / network-identity siblings ("Network Flows/Enrichment Concepts") for now -- the source frontmatter is informational; the actual sidebar position derives from docs/.map/map.yaml, and F20 will rename the section consistently.
Logged as F11 in SOW-0014 regression log.

…hment"
User: "'Encrichement Concepts' is a wrong title. 'Flows Enrichement' is the right one."
Renamed in:
- docs/.map/map.yaml line 499 (the section label that drives the actual sidebar position on Learn).
- All seven `learn_rel_path` frontmatter values across docs/network-flows/enrichment/*.md. Prior state was inconsistent (4 files had "Network Flows/Enrichment", 2 had "Network Flows/Enrichment Concepts", 1 had the F11-introduced "Flows Enrichment"). Settled on the canonical "Network Flows/Flows Enrichment" everywhere.
Logged as F20 in SOW-0014 regression log.

…ols"
User: "'Sources' is too generic. 'Flow Protocols' is the right one."
Renamed in:
- integrations/categories.yaml: flows.sources.name now "Flow Protocols".
- The three protocol-card frontmatter values (`netflow.md`, `ipfix.md`, `sflow.md`) now declare `learn_rel_path: "Network Flows/Flow Protocols"`.
- src/crates/netflow-plugin/metadata.yaml: removed three self-referencing learn URLs that pointed at /docs/network-flows/sources/{netflow,ipfix,sflow}. These links were broken before the rename (no /docs/network-flows/sources directory exists in source) and would stay broken under the new label. Replaced with the surviving "Network Flows Overview" anchor.
- Re-ran integrations/gen_integrations.py + gen_docs_integrations.py to regenerate the three protocol cards. Both exit clean.
Logged as F21 in SOW-0014 regression log.

User: "Netdata logs in namespace 'netdata'. Journalctl needs `--namespace netdata`."
`-u netdata` selects the systemd UNIT and captures only stdout/stderr the unit emits to the journal. Netdata writes structured logs into a journal NAMESPACE called `netdata`. Without `--namespace netdata`, users see at most unit-level startup/shutdown messages -- not the plugin output that helps with debugging.
Swept all `journalctl -u netdata` invocations to `journalctl --namespace netdata` across:
- quick-start.md
- troubleshooting.md (5 occurrences)
- installation.md
- enrichment/network-identity.md
Grep clean afterwards.
Logged as F18 in SOW-0014 regression log.

…tern
User: "How is it possible for users to ignore the sampling rate if we calculate the estimated volume at ingestion? You invented reasons for it. ... section must be removed."
The premise was wrong on multiple counts: per-flow multiplication is always consistent (each record carries its own rate), users CAN'T "ignore" the rate because Netdata applies it automatically at decode time, and the "uniform rates required" framing is exactly the myth F4 + F5 already corrected.
The two real concerns the section conflated remain documented elsewhere:
- small flows missed at high sampling rates -- preserved in the Overview's "What sampling does to your numbers" section and in investigation-playbooks "Caveats".
- exporter sends no rate (v7 / v5 rate=0 / v9-IPFIX without Sampling Options Template) -- preserved in troubleshooting "Bandwidth doesn't match SNMP" and in validation.md.
Removed the entire section. Section renumbering will land with F17 once all three section removals have completed.
Logged as F15 in SOW-0014 regression log.

…i-pattern
User: "Geolocation does not position internal IPs on the map. ... section must be removed."
Code-verified at src/crates/netflow-plugin/src/enrichment/data/geoip/decode.rs:40-72. `apply_geo_record` writes country/state/city/latitude/longitude only when the MMDB record carries non-empty values for those fields. For RFC 1918 / private IPs, the MMDB either has no entry or has one tagged `ip_class: "private"` with no country/city/coords. Internal IPs simply do not appear on geographic maps. The "in random countries" claim was invented.
Removed the entire section. The renumbering lands with F17. The troubleshooting.md "Internal IPs in random countries" subsection carries the same invented claim and will be addressed under F19.
Logged as F16 in SOW-0014 regression log.
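
The "write only non-empty values" rule above can be illustrated with a small sketch. `GeoFields`, `GeoRecord`, `apply_geo_sketch`, and `has_map_position` are hypothetical names and shapes chosen for illustration; they are not the plugin's `apply_geo_record` or its real record types, and the state field is omitted for brevity.

```rust
/// Hypothetical, trimmed-down geo fields on an enriched flow.
#[derive(Debug, Default, Clone, PartialEq)]
struct GeoFields {
    country: String,
    city: String,
    latitude: Option<f64>,
    longitude: Option<f64>,
}

/// Hypothetical decoded database record; empty string / None means "absent".
#[derive(Debug, Default, Clone)]
struct GeoRecord {
    country: String,
    city: String,
    latitude: Option<f64>,
    longitude: Option<f64>,
}

/// Copy a value into the output only when the record actually carries it.
fn apply_geo_sketch(out: &mut GeoFields, rec: &GeoRecord) {
    if !rec.country.is_empty() {
        out.country = rec.country.clone();
    }
    if !rec.city.is_empty() {
        out.city = rec.city.clone();
    }
    if let Some(lat) = rec.latitude {
        out.latitude = Some(lat);
    }
    if let Some(lon) = rec.longitude {
        out.longitude = Some(lon);
    }
}

/// A flow can be drawn on a map only when both coordinates are present.
fn has_map_position(fields: &GeoFields) -> bool {
    fields.latitude.is_some() && fields.longitude.is_some()
}
```

In this model a private IP whose record carries no country, city, or coordinates leaves the output untouched, so it never acquires a map position.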

…ds" + renumber
User: "Netdata does not support alerting of flows yet. Remove this section."
The section's own footnote already acknowledged the issue: "Netdata's alerting on flow data is in development; for now this pattern lives in your monitoring practice, not in the plugin." So it was advice for users to apply outside Netdata, not a Netdata anti-pattern.
Plus: with F15, F16, F17 all deleting sections, this commit renumbers the anti-patterns.md sections to be sequential again (1 ... 9). Dropped three rows from the summary table at the bottom (Ignored sampling, GeoIP for internal IPs, Absolute thresholds) that referred to the removed sections.
Updated the cross-link in visualization/time-series.md that called out "time-shifted comparison beats absolute thresholds" -- replaced with a general anti-patterns reference since the specific section is gone.
Logged as F17 in SOW-0014 regression log.

User: "This page has a mix of all the above issues: sampling, geoip, etc."
Surgical fixes after F2-F18 land:
- Removed "Internal IPs in random countries" subsection -- same invented claim as F16 (GeoIP does not position internal IPs on maps; code-verified at src/.../enrichment/data/geoip/decode.rs:40-72).
- Rewrote "Things that look like bugs but aren't" entries to remove the F2/F3 "filter to one direction" framing and the F16 GeoIP myth. The two doubling-related bullets now point at exporter+interface filtering and at Source/Destination ASN filtering for bidirectional. Renamed "tier-0" -> "raw-tier" for consistency with the field reference.
Items kept (framing already correct after earlier findings):
- "Sampling rate not honoured by the exporter" -- correct (F4/F5).
- Doubling references in the SNMP-mismatch table (F2/F3).
- ASN provider chain debug recipe.
- Decapsulation destructive-on-non-tunnel.
Items deferred to per-page audit (R2):
- "Cisco's default template refresh is 30 minutes" -- vendor-specific claim; verify against current Cisco IOS-XE/XR docs in R2.
Logged as F19 in SOW-0014 regression log.

User: "I think the entire 'Validation and Data Quality' is completely off. It mentions again sampling rates, etc."
Code-verified facts driving the rewrite:
- Per-flow sampling multiplication at decode time (decoder/record/core/record.rs:24-26): users don't need to monitor "sampling rate change" or "sampling rate misinterpretation".
- Template persistence across restarts (decoder/protocol/v9/templates.rs:106 + decoder/protocol/ipfix/templates/data.rs:67): users don't need to monitor "template loss after collector restart".
- UDP buffer overflow alert already exists (src/health/health.d/udp_errors.conf:6-19): kernel-level UDP drops are signalled by an existing system alert, not a silent failure.
Rewrite:
- New "What you actually need to watch" table -- five real failure modes (kernel UDP drops via existing alert, exporter stopped sending, wrong interfaces being exported, exporter sampling without communicating the rate, stale MMDB).
- Removed the three invented silent-failure items (sampling misinterpretation, sampling change, template loss).
- Removed the "Internal IP enrichment validation" section (F16 confirmed GeoIP does not position internal IPs).
- Renamed "Sampling rate sanity check" to "Sampling rate verification" with the uniform-rate myth gone; kept the practical RAW_BYTES vs BYTES comparison.
- Removed the "Template cache health" subsection.
- Renamed the alerting table to "Plugin-side signals worth alerting on" and clarified these are signals the plugin exposes for the operator, not "silent failures" the dashboard hides.
Logged as F14 in SOW-0014 regression log.

…actical guide
User: "People want sizing and planning directions. This is not an academic paper, not a blog."
Rewrote the page from scratch around the user's seven requirements:
- plugin cap (single-thread post-decode; ~25k flows/s sustained; ISP-scale anchor)
- how ingestion rate maps to storage (single table, 4 rows; ~800 bytes/flow empirical)
- raw tier dominates; bound it; example per-tier production config
- fast NVMe is the right call for the raw tier; slow storage means shorter retention
- memory: routing-trie footprint + page-cache headroom
- query speed: indexed fields fast; FTS = full scan of raw tier
- distributed deployment as the scaling answer (one agent per router/site; federated via Netdata Cloud; no central aggregation needed for flow data)
Removed:
- All benchmark tables and methodology. Engineering benchmark numbers remain in src/crates/netflow-plugin/README.md.
- The "Bounding storage for capacity planning" formula derivation (ignored tier rollover and dedup; partly invalid).
Logged as F13 in SOW-0014 regression log.
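
As a rough back-of-the-envelope check on the two figures above (~25k flows/s and ~800 bytes/flow), raw-tier growth can be estimated as rate times record size. This sketch deliberately covers only the raw tier, ignores rollup tiers and any deduplication, and is not the formula the page removed; the function names are hypothetical.

```rust
/// Estimated raw-tier bytes written per day at a steady ingest rate.
fn raw_bytes_per_day(flows_per_sec: u64, bytes_per_flow: u64) -> u64 {
    const SECONDS_PER_DAY: u64 = 86_400;
    flows_per_sec
        .saturating_mul(bytes_per_flow)
        .saturating_mul(SECONDS_PER_DAY)
}

/// Hours until a raw-tier size cap is reached at a steady ingest rate.
/// Returns infinity when nothing is being ingested.
fn hours_until_size_cap(cap_bytes: u64, flows_per_sec: u64, bytes_per_flow: u64) -> f64 {
    let bytes_per_sec = flows_per_sec.saturating_mul(bytes_per_flow);
    if bytes_per_sec == 0 {
        return f64::INFINITY;
    }
    cap_bytes as f64 / bytes_per_sec as f64 / 3600.0
}
```

At 25,000 flows/s and 800 bytes/flow this gives 20 MB/s, or about 1.73 TB per day (decimal units), which is why the raw tier dominates storage planning.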

…ion Overview
User on F12: "Retention is closer to configuration and querying is closer to visualization. ... If you need to put generic visualization rules, these should be a generic 'Visualization/Overview' page, to explain FTS, sharing, grouping, etc."
User on F22: "The 'Section index' in the overview page is not needed. Learn already shows the index as a side bar."
F12:
- New visualization/overview.md page collects "how queries work", "group-by limit and overflow", "full-text search", "URL sharing", filtering pointers, "picking the right view".
- retention-querying.md slimmed to retention-only (tiers, what survives rollup, tier auto-pick, "no data", what forces raw tier, default retention misconfig). Sidebar label renamed to "Retention and Tiers".
- map.yaml: Visualization sub-section root now carries edit_url + description pointing at visualization/overview.md (matches the F1 pattern). Retention sidebar label renamed.
F22:
- Removed the "## Section index" block from README.md. The Learn sidebar already renders the same hierarchy. The "Where to start" role-based pointer block stays (not a sidebar duplicate). The "specific feature in depth" bullet now points readers at the sidebar.
Logged as F12 + F22 in SOW-0014 regression log.

…Overview
Per user: the bullet was redundant with the F22 cleanup. Sidebar guidance is implicit; the four role-based bullets above are the intended "where to start" entry points.

Documentation audits against source code surfaced a small number of inaccuracies and a few low-severity polish items across the most-edited pages. Fixed in-place, with the same surgical-edit policy as Phase R1.
Critical/high severity:
- configuration.md: drop the false claim that listener / protocols / journal keys can also appear at the top level. The flatten attribute is clap-only (CLI flag flatten), not serde; with deny_unknown_fields the YAML schema rejects unknown top-level keys. Stock file uses the nested form.
- validation.md: replace the wrong `dRcv` ss column reference with the actual `d<N>` value inside the `skmem:(...)` line (the sock_drop counter from iproute2 ss output).
- validation.md: replace the unreachable RAW_BYTES vs BYTES dashboard comparison with the supported approach -- group by the Sampling Rate field. RAW_BYTES is filtered from supported_flow_field_names and is not surfaced as a default table column.
Medium severity:
- README.md: rename "Source ASN" to "Source AS Name" in the default group-by description; the actual default uses SRC_AS_NAME, which the dashboard renders as "Source AS Name". Same fix swept through quick-start.md, investigation-playbooks.md, troubleshooting.md, visualization/time-series.md, visualization/summary-sankey.md so the doc text matches the dashboard label.
- README.md, quick-start.md, anti-patterns.md, validation.md, visualization/summary-sankey.md: soften the "doubling by default" framing. Both ingress + egress export is a common configuration but not a property of the protocol, and vendor best practice is ingress-only. Wording now reflects that.
- validation.md: note the udp_errors alert ships as `to: silent` by default; operators must override `to:` to receive notifications.
- validation.md: add a `du -sh` example for cross-checking on-disk tier sizes (cross-link to sizing-capacity.md).
Low-severity polish:
- README.md: classifier expression language is an Akvorado-compatible subset (matches classifiers.md framing).
- README.md: rollup tier note now mentions the dropped fields, so the tier auto-pick claim is not over-broad.
- sizing-capacity.md: drop the "after rotation and compression" qualifier on the 800-bytes/flow figure (the bench window is too short to reflect rotation cycles); rephrase the ingest description and the "spinning rust" sentence; soften the BMP/BioRIS RSS guidance to a rough estimate with bench numbers anchored.
- configuration.md: document the query_max_groups / query-max-groups alias; document enrichment.geoip.optional and the abort-vs-warn semantics; clarify that default_sampling_rate and override_sampling_rate both accept a single integer or a per-prefix map; mention the 100 MB rotation-size fallback when size_of_journal_files is null.
- anti-patterns.md: add the missing "What it costs" line to sections 8 and 9 for shape consistency with sections 1-7.

Second round of per-page audits against source code surfaced several critical inaccuracies plus the usual long tail of low-severity polish.

Critical / high:

- retention-querying.md: rewrite the tier auto-pick rules. The previous thresholds were inverted ("8h20m and longer -> 1-hour") -- the actual planner walks coarsest first and accepts the first tier with at least 100 aligned buckets, so >=100h -> 1h, 8h20m..<100h -> 5m, 100min..<8h20m -> 1m. Verified at src/crates/netflow-plugin/src/query/planner/timeseries.rs:34-46 and TIMESERIES_MIN_BUCKETS=100 at src/crates/netflow-plugin/src/query/request/constants.rs:18.
- retention-querying.md: rewrite the rollup-preserved field list. The previous list claimed AS path, BGP communities, MPLS labels, MACs, and post-NAT addresses survive into rollups -- they do not. The actual rollup tier carries only the fields defined in src/crates/netflow-plugin/src/tiering/rollup/schema/fields/defs/{core,exporter,interface,network,presence}.rs; every other field is raw-only and forces the query to the raw tier.
- retention-querying.md: correct the "no data" / fallback semantics. The planner does NOT fall back to a coarser tier for raw-only queries -- rollups don't carry the field, so the span returns empty. Confirmed at src/crates/netflow-plugin/src/query/planner/prepare.rs:25-28 and src/crates/netflow-plugin/src/query/planner/spans.rs:99-105 (lower_fallback_candidate_tiers returns &[] for Raw).
- validation.md: replace the "group by Sampling Rate field" verification with an SNMP-magnitude cross-check. The SAMPLING_RATE field is filtered out of supported_flow_field_names (src/crates/netflow-plugin/src/query/request/constants.rs:80), excluded from the groupable set (src/crates/netflow-plugin/src/query/fields/rules.rs:33), and not available as a facet (src/crates/netflow-plugin/src/facet_catalog.rs:123). Users cannot pick it from the dashboard. The honest verification path is SNMP magnitude or a per-prefix override.
- validation.md: correct the alert threshold framing. RcvbufErrors is read with RRD_ALGORITHM_INCREMENTAL (src/collectors/proc.plugin/proc_net_netstat.c:400-434), so the value Netdata stores is per-second. The "lookup: average -1m absolute" plus "$this > 10" in src/health/health.d/udp_errors.conf means >10 errors/SECOND averaged over 1 minute, not >10/minute as previously stated.
- quick-start.md: correct the field labels used in the doubling fix step from "Input Interface Name" / "Output Interface Name" to "Ingress Interface Name" / "Egress Interface Name", matching the display labels in src/crates/netflow-plugin/src/presentation/display.rs:39-40. Same fix swept across anti-patterns.md, troubleshooting.md, validation.md, investigation-playbooks.md, and visualization/summary-sankey.md.
- quick-start.md: drop the false "60-second template refresh" claim for softflowd. softflowd's `expint` flag controls expiry-check interval, not template refresh; the NetFlow v9 template interval in softflowd is a compile-time default (NF9_DEFAULT_TEMPLATE_INTERVAL=16 in netflow9.c) with no CLI knob.
- quick-start.md: complete the Juniper J-Flow snippet. The previous example defined a sampling instance but never bound it to a forwarding card and never set a sampling rate, so it would not produce flows. Add `set chassis fpc 0 sampling-instance NETDATA` and `set forwarding-options sampling instance NETDATA input rate 1000`, with a short note explaining the FPC binding requirement.
- quick-start.md: correct the dashboard navigation step from "click the Network Flows tab" to "open the Live tab and select Network Flows from the Functions list".
- ip-intelligence.md: correct the GeoLite2 / DB-IP / IPtoASN cadence claims. DB-IP Lite is monthly. MaxMind GeoLite2 City/Country update on weekdays; GeoLite2 ASN updates daily since June 2024. IPtoASN is not MMDB -- it is a public-domain TSV feed that includes both ASN and country and must be converted to MMDB before the plugin can read it (the plugin only supports MMDB).
- ip-intelligence.md: correct the dual-stack guidance. Most current providers ship a single dual-stack MMDB; the previous "configure both an IPv4 file and a separate IPv6 file" advice was misleading.
- ip-intelligence.md: clarify the asn_providers chain semantics. The `geoip` provider is a terminal "use 0" shortcut -- when reached the AS number is forced to 0 (the AS name still comes from the MMDB lookup independently). Confirmed at src/crates/netflow-plugin/src/enrichment/asn/resolve.rs:75-109.
- ip-intelligence.md: distinguish the database-composition rules. ASN fields use pure last-wins; geo fields are written only when the matching record has a non-empty value, so a later database with an empty city does not overwrite an earlier database's city (src/crates/netflow-plugin/src/enrichment/data/geoip/decode.rs:40-72).

Medium / low:

- sizing-capacity.md: rewrite the example raw-tier YAML so the size cap matches the page's own 25k flows/s framing. The previous example used 200GB / 24h; at 25k flows/s the size cap would fire after ~2.8h, not 24h. Now uses 2TB / 24h with a paragraph on how to scale down for lighter loads, and explains the size-vs-duration relationship.
- validation.md: drop the misleading `decoder_state_dir` config-key reference (it is a derived path, not a user-facing key); add the `-n` flag to the ss command to keep the port numeric in the output.
- anti-patterns.md / troubleshooting.md / validation.md / visualization/summary-sankey.md: consistent doubling-framing hedge ("a common configuration; vendor best practice is ingress-only") across all five docs that mention doubling.
- anti-patterns.md: summary-table row "Doubled aggregate" qualified with "(when ingress + egress are both exported)".
- configuration.md: correct the `override_sampling_rate` default example from `{}` to `~` (the actual schema default is None).
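
The size-vs-duration relationship behind the sizing-capacity.md fix can be checked with a few lines of arithmetic. This is a minimal sketch, assuming roughly 800 bytes stored per flow (the figure quoted for sizing-capacity.md) and decimal units (1 GB = 10^9 bytes); the function name is hypothetical and is not part of the plugin.

```python
# Assumption: ~800 bytes on disk per flow and decimal units (1 GB = 10**9 bytes).
BYTES_PER_FLOW = 800


def hours_until_size_cap(size_cap_bytes: float, flows_per_second: float,
                         bytes_per_flow: float = BYTES_PER_FLOW) -> float:
    """Return how many hours of ingest fit under a size cap at a steady flow rate."""
    bytes_per_second = flows_per_second * bytes_per_flow
    return size_cap_bytes / bytes_per_second / 3600


old_cap_hours = hours_until_size_cap(200e9, 25_000)  # previous example: 200GB
new_cap_hours = hours_until_size_cap(2e12, 25_000)   # new example: 2TB

print(f"200GB cap: {old_cap_hours:.1f}h, 2TB cap: {new_cap_hours:.1f}h")
# 200GB cap: 2.8h, 2TB cap: 27.8h
```

Under these assumptions the 200GB cap fires long before the 24h duration limit, while the 2TB cap leaves the duration limit as the binding constraint.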

retention-querying.md:

- correct the Time-Series sub-100-min fallback. The planner walks coarsest-first and falls back to the 1-minute tier (TierKind::Minute1) when no tier has >=100 aligned buckets, not to raw. Verified at src/crates/netflow-plugin/src/query/planner/timeseries.rs:39-46.
- split the field-eligibility list. The "force raw" set is exactly RAW_ONLY_FIELDS plus V9_*/IPFIX_* prefixes (src/crates/netflow-plugin/src/query/fields/rules.rs:5-11 + src/crates/netflow-plugin/src/query/request/constants.rs:46-57). The previously included AS path / BGP communities / MPLS labels / MAC addresses / NAT addresses do NOT switch tier; they are dropped from rollup output and return null on rollup queries. Page now describes both classes separately.

troubleshooting.md:

- correct Cisco's default template refresh from "30 minutes" to "600 seconds (10 minutes)" -- this is the IOS / IOS-XE Flexible NetFlow `template data timeout` default.
- replace the wrong "/proc/net/udp ... RcvbufErrors column" recipe. The /proc/net/udp file lists open sockets without per-socket drop counters; the kernel-wide RcvbufErrors total lives under the Udp: line of /proc/net/snmp, which is what Netdata's proc.plugin reads (src/collectors/proc.plugin/proc_net_netstat.c:1521). Also dropped the contradictory "30-60 seconds" template-refresh hint.

ip-intelligence.md:

- correct the ASN composition rule. The previous "last-wins for ASN, but geo writes only when non-empty" framing was wrong -- src/crates/netflow-plugin/src/enrichment/data/geoip/decode.rs:3-28 filters empty / zero values for ASN fields just like geo, so both sets follow the same "last database with a non-empty value wins" rule.
- correct the GeoLite2 cadence. MaxMind's documentation publishes City and Country twice weekly (Tuesday and Friday); GeoLite2 ASN moved to every weekday in June 2024.
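
The tier auto-pick rule described for retention-querying.md (walk coarsest first, accept the first tier with at least 100 buckets, otherwise fall back to the 1-minute tier) can be sketched as follows. This is an illustration in Python rather than the planner's Rust code; it ignores bucket alignment, and the names `TIERS` and `pick_timeseries_tier` are hypothetical.

```python
MIN_BUCKETS = 100  # mirrors TIMESERIES_MIN_BUCKETS=100 cited in the commit message

# (tier name, bucket width in seconds), ordered coarsest first.
TIERS = [("1h", 3600), ("5m", 300), ("1m", 60)]


def pick_timeseries_tier(window_seconds: int) -> str:
    """Pick the coarsest tier that yields at least MIN_BUCKETS buckets.

    Simplification: the real planner counts *aligned* buckets; this sketch
    only divides the window length by the bucket width.
    """
    for name, bucket_seconds in TIERS:
        if window_seconds // bucket_seconds >= MIN_BUCKETS:
            return name
    # No tier qualifies (window shorter than 100 minutes):
    # fall back to the 1-minute tier, not to raw.
    return "1m"
```

With these thresholds a 100-hour window selects `1h`, an 8h20m window selects `5m`, a 100-minute window selects `1m`, and anything shorter still lands on `1m` through the fallback.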

…+ Enrichment Methods

The previous category tree split enrichment-method integrations across three separate sub-categories (IP Intelligence, BGP Routing, Network Identity Sources), which made the integrations page navigation harder than it needed to be and did not match the conceptual model: the operator is choosing a *data source* for enrichment, regardless of what kind of data it produces.

Collapsed under flows.enrichment-methods:

- ip-intelligence: dbip, maxmind, iptoasn, custom-mmdb
- bgp-routing: bmp, bioris
- network-identity: aws-ip-ranges, gcp-ip-ranges, azure-ip-ranges, netbox, generic-ipam

flows.sources keeps its existing membership (netflow / ipfix / sflow) and the "Flow Protocols" name unchanged.

Both YAML files validated. Per-card content merges and the new cross-cutting "Enrichment" + "Enrichment Intel Downloader" pages land in subsequent commits.

…t integration cards

Round-by-round, agent-per-card merges. Each card absorbed the durable provider-specific content from its corresponding concept page; cross-cutting content was extracted for the new "Enrichment" page (separate commit). The agents verified every behavioural claim against current source code at file:line and every upstream URL by WebFetch -- not a mechanical sweep.

IP Intelligence:

- dbip: framed as the auto-detected default, monthly Lite cadence, CC-BY-4.0, populated-fields breakdown (geo + ASN), AS0 Private/Unknown labels driven by the DB-IP-built ip_class flag, raw-tier-only city/lat/lon.
- maxmind: GeoLite2 vs commercial GeoIP2 split, account-id + license-key auth, twice-weekly Tuesday/Friday cadence for City/Country, every-weekday for ASN since June 2024, geoipupdate setup. Important correction: the bundled topology-ip-intel-downloader does NOT support MaxMind (only dbip and iptoasn); the previous card's hint at the netdata downloader was misleading.
- iptoasn: PDDL public-domain feed, hourly TSV cadence (the previous card said "daily" -- wrong), bundled topology-ip-intel-downloader natively supports the TSV->MMDB conversion (correcting an earlier prompt assumption to the contrary), three setup examples including ASN-only and combined with DB-IP geo.
- custom-mmdb: reframed as the escape hatch for operators producing custom MMDBs (CIDR overlays, internal AS labels). Lists the field names the decoder reads from any MMDB, with file:line evidence; cites mmdbwriter libraries; recommends `optional: true` during build iteration.

BGP Routing:

- bmp: BMP-v3-only handling -- v1/v2 silently dropped (previously implicit). RFC 8671 cited for JunOS post-policy support since 18.3R1, separately from RFC 7854 which was previously lumped together. Cisco IOS-XE BMP added (was missing). Nokia SR OS added. JunOS minimum 13.3 documented. No IANA-registered port for BMP.
- bioris: corrected the topology -- Netdata connects to a USER-RUN bio-rd cmd/ris/ daemon over user-supplied gRPC, NOT directly to RIPE RIS. The user's bio-rd daemon does the BGP/BMP peering with upstream sources. No shipped collector list (ris_instances is required and operator-supplied). Memory cost (~hundreds of MB per peer for full-table feeds), no eviction, raw-tier-only AS path / communities.

Network Identity:

- aws-ip-ranges: schema reference (top-level + per-entry), live cadence softened ("whenever AWS IP space changes, often several times per day" rather than the folklore "every 15 minutes" -- AWS docs do not promise any fixed schedule). Three jq examples including network_border_group as site. Plugin's actual config key is `transform`, not `jq_program` (was wrong in the migration prompt).
- gcp-ip-ranges: cloud.json vs goog.json comparison; cloud.json today reports `service: "Google Cloud"` uniformly so per-service pivots are not possible from this file. No fixed Google cadence in the docs.
- azure-ip-ranges: URL rotates weekly. Service Tag Discovery REST API as authoritative alternative. API data lags JSON file by up to four weeks; new IPs aren't used for at least one week after publication. Three workaround patterns documented honestly.
- netbox: documented breaking change in NetBox 4.2 -- the `site` foreign key on Prefix was replaced with the generic `scope` field; the previous example used `(.site.name // "")` which silently breaks on 4.2+. New card ships scope-aware (4.x) and legacy (3.x/4.0/4.1) examples plus a fallback `(.scope.name // .site.name // "")`. Two token formats documented (legacy v1 hex and v2 nbt_<key>.<token>).
- generic-ipam: full RemoteNetworkSourceConfig schema (13 options including proxy, tls.enable, tls.verify, tls.skip_verify with explicit "rejected by validation" notes). Honest call-outs: POST is sent without a body (fetch.rs:11-17), interval floored at 60s by service.rs:73, TLS verification cannot be disabled (validation/enrichment.rs:183-192).

Categories: each card now under flows.enrichment-methods (the previous ip-intelligence / bgp-routing / network-identity sub-categories were collapsed in the previous commit). Generated .md files updated via integrations/gen_docs_integrations.py.
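
To make the NetBox fallback `(.scope.name // .site.name // "")` concrete, here is a minimal Python emulation of how jq's alternative operator resolves it. It is only an illustration of the jq semantics (a value is skipped when it is `null` or `false`); the helper names and the sample records are hypothetical, and real jq raises an error when indexing a non-object, which this sketch glosses over by returning `None`.

```python
def jq_path(obj, *keys):
    """Follow a chain of object keys, yielding None (jq null) when a step is missing."""
    for key in keys:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def jq_alternative(*candidates):
    """Emulate `a // b // c`: return the first value that is neither null nor false."""
    for value in candidates:
        if value is not None and value is not False:
            return value
    return candidates[-1]


def site_name(prefix_record: dict) -> str:
    """Python equivalent of the jq expression (.scope.name // .site.name // "")."""
    return jq_alternative(
        jq_path(prefix_record, "scope", "name"),  # scope field (NetBox 4.2+)
        jq_path(prefix_record, "site", "name"),   # legacy site foreign key
        "",
    )


# Hypothetical records shaped like NetBox Prefix objects.
scoped = {"prefix": "10.0.0.0/8", "scope": {"name": "DC1"}}
legacy = {"prefix": "10.1.0.0/16", "site": {"name": "HQ"}}
unscoped = {"prefix": "10.2.0.0/16", "scope": None, "site": None}
```

The same expression therefore works against both record shapes, and a prefix with neither field resolves to the empty string instead of failing.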

…ownloader page

Three new cards under flows.enrichment-methods, completing the "every enrichment method is an integration" model. Each card was authored by an agent that read the corresponding concept page and the source code, then produced a metadata.yaml entry with every claim cited at file:line. The agents flagged real inconsistencies between the concept pages and the code; corrections were absorbed into the cards.

static_metadata:

- Three configuration surfaces: enrichment.metadata_static.exporters, enrichment.networks, and enrichment.override_sampling_rate (plus default_sampling_rate for the distinction).
- Field-population table tied to source at apply/metadata.rs:41-53, data/network/write.rs:93-125, apply/metadata.rs:78-97.
- Five corrections vs the previous concept page: boundary "undefined" vs numeric 0 are byte-identical in output; lookup priority is dominated by prefix specificity, not source-kind; override_sampling_rate matches the UDP datagram source IP; the "networks merges last and wins" framing was overstated.

classifiers:

- Two evaluation surfaces: exporter_classifiers + interface_classifiers, the latter called twice per flow (once per interface side).
- Akvorado-compatible expression-language *subset* -- explicitly only the operators and actions implemented at enrichment/classifiers/parse.rs; every example in the card uses syntax verified against the parser and the existing test suite.
- Output normalisation includes "+" (concept page only listed ". -"); static metadata short-circuits classifier evaluation (enrichment/classify.rs:117-119, :150-154); first-write-wins per slot (runtime/eval/action.rs:43-46); default cache 5m with >=1s validation (defaults.rs:46-48, validation/enrichment.rs:10-12).

decapsulation:

- Two modes (srv6, vxlan) per protocol.rs:50-57; default none.
- Three transport feeders: NetFlow v9 IE 104 (decoder.rs:80), IPFIX IE 315 (decoder.rs:74), sFlow SampledHeader (sflow/record.rs:44-69).
- Inner parsers at decoder/common.rs:3-18 (VXLAN port 4789) and :35-63 (SRv6 SRH walker); merge points at packet/transport.rs:21-33 and record/packet/parse/transport.rs:14-21.
- Vendor-verification finding: Cisco IOS-XE / IOS-XR collect datalink frame-section could NOT be verified (cisco.com 403'd anonymous WebFetch and Akvorado's IOS-XE recipe deliberately omits L2 frame-section export). The card flags Cisco support as unverified and instructs operators to validate by template inspection. Juniper inline-monitoring with datalink-frame-size confirmed via the Akvorado mirror; sFlow header sampling confirmed via the project's decoder.

intel-downloader.md (new operator-tool page):

- Documents /usr/sbin/topology-ip-intel-downloader -- supported sources, CLI flags, atomic replacement, auto-detect integration with the netflow plugin's 30s reload window.
- Findings: no packaged systemd timer or cron file (operators must install their own; page provides a starter unit + timer pair); MaxMind support confirmed absent (no license_key field anywhere in config.go; only iptoasn:combined and dbip:asn-lite/country-lite/city-lite are recognised by builtInSource); MaxMind users directed to geoipupdate as the alternative.
- Hidden capability documented: interesting_cidrs config knob lets operators stamp public CIDRs as netdata.ip_class = "interesting" in both ASN and geo MMDBs (write.go:228-246).

The categories.yaml category for all 14 enrichment methods is now flows.enrichment-methods (previously split across ip-intelligence / bgp-routing / network-identity).
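
The "atomic replacement" property mentioned for the downloader follows a general pattern: write the new file next to the target, then rename it over the old one so a reader never sees a half-written MMDB. The sketch below shows that pattern in Python. It is not the downloader's code (the commit message points at Go sources), and `atomic_write` is a hypothetical helper.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk before the swap
        os.replace(tmp_path, path)  # rename(2) on POSIX
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A consumer that polls the file for changes, as the plugin's 30s reload window is described to do, would then pick up a complete new file on its next check.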

…; update map.yaml

Restructure phase 5+8+9: collapse the docs/network-flows/enrichment/ directory (7 concept pages: asn-resolution, bgp-routing, classifiers, decapsulation, ip-intelligence, network-identity, static-metadata) into ONE consolidated cross-cutting page at docs/network-flows/enrichment.md, plus per-method integration cards (already added in previous commits).

The new Enrichment page (447 lines) is the single home for cross-cutting concepts that span every enrichment method:

- Order of evaluation per flow record (8-step pipeline cited at apply/resolve.rs:5-50 and init.rs:50-64).
- The two provider chains (asn_providers / net_providers, the geoip terminal "use 0" shortcut, the AS-number-vs-AS-name distinction).
- Composition rules: specificity dominates, ties to static, per-field non-empty-wins merge.
- The MMDB shared mechanism (auto-detect path order, last-non-empty-wins composition, 30s signature reload, IPv4/IPv6 dual-stack handling).
- Network sources operational properties (fetch loop, 60s floor, jq schema, deny_unknown_fields, TLS-no-disable, no pagination, no auth helpers, POST-without-body, journal diagnostics).
- Static-metadata-blocks-classifiers semantics.
- Classifier evaluation surfaces and ordering.
- Decapsulation inner-packet override.
- Routing overlay (BMP+BioRIS shared trie).
- Cross-method operational properties: refresh windows, restart behaviour, no in-process freshness signal, empty-tree disables enricher, rollup tier survival table, geographic accuracy, sampling-rate knobs, integration test gap.

The page resolved several discrepancies the original concept pages had against the source code -- e.g. the "static metadata > classifiers > network sources > GeoIP > BGP routing" precedence claim was misleading (actual rule: specificity dominates, ties to static, merge primitive is non-empty-wins); the bmp alias for routing lives at providers.rs:10,12, not validation/enrichment.rs; the GeoIP terminal shortcut sets the AS number to 0 but the AS *name* still comes from the MMDB independently; POST is sent without a body (fetch.rs:11-17); interval is silently floored at 60s (service.rs:73). Each correction is cited at file:line.

map.yaml: removed the "Flows Enrichment" sub-section (which contained the 7 deleted pages); added two new entries between Configuration and Field Reference:

- "Enrichment" -> docs/network-flows/enrichment.md
- "Enrichment Intel Downloader" -> docs/network-flows/intel-downloader.md

Cross-references updated in 8 surviving pages to point at either the new Enrichment page (for cross-cutting concepts) or the relevant integration cards under src/crates/netflow-plugin/integrations/ (for per-method specifics): configuration, validation, intel-downloader, quick-start, installation, troubleshooting, visualization/maps-globe.

The 7 deleted concept pages had their durable content fully absorbed into the Enrichment page (cross-cutting) and the 14 integration cards (per-method); each integration card cites file:line evidence for every behavioural claim and was re-verified against current source code by its merge agent.
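
The "last-non-empty-wins" composition named above can be illustrated with a small merge function. This is a sketch under stated assumptions: the field names and sample records are hypothetical, "empty" is taken to mean an empty string, zero, or a missing value, and the function is not the plugin's decoder.

```python
def compose(records: list[dict]) -> dict:
    """Merge per-database lookup results; the last non-empty value per field wins."""
    merged: dict = {}
    for record in records:  # records are ordered as the databases are configured
        for field, value in record.items():
            if value in ("", 0, None):
                continue  # an empty or zero value never overwrites an earlier one
            merged[field] = value
    return merged


# Hypothetical lookups for one IP address from two MMDB files.
first_db = {"asn": 64500, "city": "Athens"}
second_db = {"asn": 0, "city": "", "country": "GR"}

result = compose([first_db, second_db])
# result == {"asn": 64500, "city": "Athens", "country": "GR"}
```

The second database contributes its country but cannot blank out the ASN or city that the first database supplied.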

The merge agents preserved Learn-URL cross-references from the original concept pages (e.g. https://learn.netdata.cloud/docs/network-flows/enrichment/ip-intelligence). Those URLs now 404 because the seven concept pages were collapsed into one. Sweep them all to point at the new consolidated https://learn.netdata.cloud/docs/network-flows/enrichment page, which covers the cross-cutting concepts (MMDB shared mechanism, asn_providers chain, network-source operational properties, etc.) that the per-method references were calling out.

Generated cards regenerated via gen_docs_integrations.py to flush the new URL into the .md outputs.
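
A sweep of this kind can be expressed as a single regular-expression substitution. The following is a sketch of the idea only; the commit does not say how the sweep was performed, and `sweep_enrichment_urls` is a hypothetical helper.

```python
import re

# Matches any Learn URL that points below the old enrichment/ directory.
OLD_URL = re.compile(
    r"https://learn\.netdata\.cloud/docs/network-flows/enrichment/[a-z0-9-]+"
)
NEW_URL = "https://learn.netdata.cloud/docs/network-flows/enrichment"


def sweep_enrichment_urls(text: str) -> str:
    """Rewrite per-concept enrichment URLs to the consolidated page."""
    return OLD_URL.sub(NEW_URL, text)


before = "See https://learn.netdata.cloud/docs/network-flows/enrichment/ip-intelligence."
after = sweep_enrichment_urls(before)
# after == "See https://learn.netdata.cloud/docs/network-flows/enrichment."
```

URLs that already point at the consolidated page do not match the pattern and are left unchanged.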

Copilot

Pull request overview

This pull request aligns Network Flows documentation and generated integration cards with current supported behavior, while also updating the netflow-plugin to (1) use per-tier retention configuration and (2) correctly persist timestamp_source into _SOURCE_REALTIME_TIMESTAMP.

Changes:

Fix netflow-plugin ingestion so timestamp_source persists the decoded source timestamp as _SOURCE_REALTIME_TIMESTAMP (keeping journal entry realtime as receive/write time) and add an end-to-end test.
Migrate journal retention configuration to per-tier journal.tiers.* and remove unsupported/obsolete journal query knobs.
Restructure/refresh Network Flows docs + integration cards (new categories, new visualization overview, updated enrichment/network-source cards, moved/removed older concept pages).

Reviewed changes

Copilot reviewed 60 out of 62 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
src/crates/netflow-plugin/src/startup_memory_tests.rs	Updates tests to rely on per-tier retention defaults.
src/crates/netflow-plugin/src/query/tests.rs	Minor formatting tweak in request deserialization tests.
src/crates/netflow-plugin/src/plugin_config/validation/journal.rs	Removes validation for deleted knobs; validates per-tier retention limits.
src/crates/netflow-plugin/src/plugin_config/types/journal.rs	Implements per-tier retention schema and removes retired journal query fields.
src/crates/netflow-plugin/src/plugin_config/defaults.rs	Removes now-unused retention override helpers and facet limit default.
src/crates/netflow-plugin/src/plugin_config.rs	Drops re-export of removed `RetentionLimitOverride`.
src/crates/netflow-plugin/src/plugin_config_tests.rs	Updates config tests for per-tier retention and removed knobs.
src/crates/netflow-plugin/src/memory_tests.rs	Updates fixtures to use per-tier retention config.
src/crates/netflow-plugin/src/main_tests.rs	Adds E2E test ensuring `_SOURCE_REALTIME_TIMESTAMP` reflects decoded source timestamp; adds helpers to read first raw journal entry fields.
src/crates/netflow-plugin/src/ingest/service/runtime.rs	Persists decoded source timestamp into journal entry timestamps via `_SOURCE_REALTIME_TIMESTAMP`.
src/crates/netflow-plugin/src/ingest_resource_bench_tests.rs	Minor formatting cleanups in benchmark test code.
src/crates/netflow-plugin/src/facet_runtime.rs	Minor formatting in tests.
src/crates/netflow-plugin/src/api/flows/handler.rs	Minor formatting of spawn_blocking autocomplete path.
src/crates/netflow-plugin/README.md	Updates documented config to per-tier retention and `query_max_groups` semantics.
src/crates/netflow-plugin/integrations/sflow.md	Updates generated integration card content and config options for per-tier retention and docs navigation.
src/crates/netflow-plugin/integrations/netflow.md	Updates generated NetFlow integration card for per-tier retention and revised guidance.
src/crates/netflow-plugin/integrations/maxmind_geoip_-_geolite2.md	Rewrites MaxMind integration card with clarified setup, limits, and composition behavior.
src/crates/netflow-plugin/integrations/iptoasn.md	Rewrites IPtoASN integration card (cadence, downloader behavior, composition).
src/crates/netflow-plugin/integrations/ipfix.md	Updates IPFIX integration card for per-tier retention and revised guidance.
src/crates/netflow-plugin/integrations/gcp_ip_ranges.md	Expands GCP network-source integration card (cloud.json vs goog.json, options, examples).
src/crates/netflow-plugin/integrations/decapsulation.md	Adds new generated decapsulation integration card under enrichment methods.
src/crates/netflow-plugin/integrations/db-ip_ip_intelligence.md	Updates DB-IP integration card with new defaults and operational guidance.
src/crates/netflow-plugin/integrations/custom_mmdb_database.md	Updates custom MMDB integration card (schema expectations, atomic refresh guidance).
src/crates/netflow-plugin/integrations/azure_ip_ranges.md	Expands Azure Service Tags integration card (URL rotation, transform schema, examples).
src/crates/netflow-plugin/integrations/aws_ip_ranges.md	Expands AWS IP ranges integration card (schema, transform requirements, options).
src/crates/netflow-plugin/configs/netflow.yaml	Migrates sample config to per-tier retention and removes retired query knobs.
integrations/categories.yaml	Restructures Network Flows categories (Flow Protocols + Enrichment Methods).
docs/network-flows/visualization/time-series.md	Updates tier terminology and defaults; describes raw-tier forcing conditions.
docs/network-flows/visualization/summary-sankey.md	Updates UI navigation and default group-by language (AS Name).
docs/network-flows/visualization/overview.md	Adds shared visualization/query behavior overview page.
docs/network-flows/visualization/maps-globe.md	Updates tier terminology and troubleshooting guidance for map/globe views.
docs/network-flows/visualization/filters-facets.md	Updates tier terminology and removes docs for deleted facet-limit knob.
docs/network-flows/visualization/dashboard-cards.md	Updates wording and troubleshooting commands for plugin health charts.
docs/network-flows/validation.md	Reworks validation routine and monitoring guidance.
docs/network-flows/troubleshooting.md	Updates troubleshooting steps, commands, and terminology; reflects per-tier retention docs.
docs/network-flows/README.md	Updates overview, defaults, and conceptual guidance (sampling, doubling, bidirectional traffic).
docs/network-flows/quick-start.md	Updates quick-start configs and dashboard navigation guidance.
docs/network-flows/investigation-playbooks.md	Updates playbooks for new terminology/default fields and UI navigation.
docs/network-flows/intel-downloader.md	Adds a new page documenting `topology-ip-intel-downloader` behavior and ops.
docs/network-flows/installation.md	Updates installation guidance and IP intelligence defaults/links.
docs/network-flows/field-reference.md	Updates field semantics text (timestamps, selectivity, observation time mapping).
docs/network-flows/enrichment/static-metadata.md	Removes old enrichment concept page (superseded by new structure/cards).
docs/network-flows/enrichment/network-identity.md	Removes old concept page (superseded by new structure/cards).
docs/network-flows/enrichment/decapsulation.md	Removes old decapsulation doc page (replaced by integration card/updated docs).
docs/network-flows/enrichment/classifiers.md	Removes old classifiers concept page (superseded by new structure/cards).
docs/network-flows/enrichment/bgp-routing.md	Removes old BGP routing concept page (superseded by new structure/cards).
docs/network-flows/enrichment/asn-resolution.md	Removes old ASN resolution concept page (superseded by new structure/cards).
docs/.map/map.yaml	Updates docs sidebar structure for Network Flows section and adds new pages.

ktsaou · 2026-05-07T23:14:02Z

@cubic-dev-ai please review again

cubic-dev-ai · 2026-05-07T23:14:12Z

@ktsaou I have started the AI code review. It will take a few minutes to complete.

cubic-dev-ai

9 issues found across 85 files

Confidence score: 4/5

This PR is likely safe to merge, but there is mild user-facing risk because several documentation updates in src/crates/netflow-plugin/integrations/*.md, docs/network-flows/enrichment.md, and .agents/skills/* are factually inaccurate (mostly severity 4/10, high confidence).
The most impactful issue is misleading setup/troubleshooting guidance in integration docs (for example iptoasn.md, aws_ip_ranges.md, and generic_json-over-http_ipam.md), which could cause incorrect configuration choices or confusion during onboarding rather than runtime breakage.
There is also a small portability concern in .agents/skills/codacy-audit/scripts/pr-issues.sh where bare mktemp may behave inconsistently across GNU/BSD/macOS; this is low severity and straightforward to harden.
Pay close attention to src/crates/netflow-plugin/integrations/iptoasn.md, src/crates/netflow-plugin/integrations/aws_ip_ranges.md, src/crates/netflow-plugin/integrations/generic_json-over-http_ipam.md, and docs/network-flows/enrichment.md - correct inaccurate behavior claims so users are not misled.

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name=".agents/skills/integrations-lifecycle/pipeline.md">

<violation number="1" location=".agents/skills/integrations-lifecycle/pipeline.md:455">
P2: This line overstates CI behavior: `check-markdown.yml` does not enforce that committed `COLLECTORS.md` is fresh, only that regenerated docs pass ingest.</violation>
</file>

<file name=".agents/skills/codacy-audit/scripts/pr-issues.sh">

<violation number="1" location=".agents/skills/codacy-audit/scripts/pr-issues.sh:77">
P3: Use an explicit portable mktemp template instead of bare `mktemp`.

(Based on your team's feedback about portable mktemp usage across GNU/BSD/macOS.) [FEEDBACK_USED]</violation>
</file>

<file name=".agents/skills/learn-pr-preview/SKILL.md">

<violation number="1" location=".agents/skills/learn-pr-preview/SKILL.md:45">
P2: The source-copy command omits intentional untracked docs files, so the documented uncommitted-preview flow is incomplete and can produce a stale preview.</violation>
</file>

<file name="src/crates/netflow-plugin/integrations/iptoasn.md">

<violation number="1" location="src/crates/netflow-plugin/integrations/iptoasn.md:52">
P2: The new wording implies `topology-ip-intel-downloader` is always bundled, but repository docs say packaged 32-bit installs do not include it. Add a platform/installation caveat to avoid broken setup instructions.</violation>
</file>

<file name="src/crates/netflow-plugin/integrations/netbox.md">

<violation number="1" location="src/crates/netflow-plugin/integrations/netbox.md:88">
P3: The performance note claims trie lookup for runtime network-source enrichment, but current code does a linear scan over source records.</violation>

<violation number="2" location="src/crates/netflow-plugin/integrations/netbox.md:309">
P3: The troubleshooting text says HTTP errors are silently consumed, but the refresher logs them as warnings.</violation>
</file>

<file name="src/crates/netflow-plugin/integrations/aws_ip_ranges.md">

<violation number="1" location="src/crates/netflow-plugin/integrations/aws_ip_ranges.md:231">
P2: The new docs incorrectly say `transform: "."` fails with an "empty result" error; with this plugin it typically fails because the emitted object has no required `prefix` field. Update this section to describe schema/row-mapping failure instead of empty output.</violation>
</file>

<file name="src/crates/netflow-plugin/integrations/generic_json-over-http_ipam.md">

<violation number="1" location="src/crates/netflow-plugin/integrations/generic_json-over-http_ipam.md:358">
P2: This new troubleshooting note is factually incorrect: URL-embedded credentials are handled by reqwest as HTTP Basic auth, so saying they "do not work" can mislead users.</violation>
</file>

<file name="docs/network-flows/enrichment.md">

<violation number="1" location="docs/network-flows/enrichment.md:30">
P2: This statement is inaccurate: provider-chain-only configuration can still enable the enricher, so it does not run *only* when an enrichment input source is configured.</violation>
</file>

Architecture diagram

sequenceDiagram
    participant Router as Router/Exporter
    participant UDP as UDP Socket
    participant Parser as Protocol Decoder
    participant Enrich as Enrichment Pipeline
    participant Journal as Journal (4 Tiers)
    participant Planner as Query Tier Planner
    participant Dashboard as Network Flows UI
    participant Intel as Intel Downloader
    participant MMDB as MMDB Files
    participant Config as netflow.yaml
    participant DocGen as Doc Generator
    participant Learn as learn.netdata.cloud

    Note over Router,Learn: PR repairs documentation, persistence, and defaults to match actual behavior

    Router->>UDP: Send NetFlow / IPFIX / sFlow datagrams
    UDP->>Parser: Decode raw protocol fields
    Parser->>Parser: Process timestamp_source config

    alt timestamp_source = netflow_first_switched
        Parser->>Journal: NEW: Write decoded source timestamp as _SOURCE_REALTIME_TIMESTAMP
        Parser->>Journal: Keep entry realtime as receive/write time
    else default (input)
        Parser->>Journal: Write receive time as _SOURCE_REALTIME_TIMESTAMP
    end

    Parser->>Enrich: Decoded flow record
    Enrich->>MMDB: GeoIP / ASN lookup (auto-detect or explicit path)
    MMDB-->>Enrich: Country, city, ASN data
    Enrich->>Enrich: Apply classifier rules, static metadata, provider chains

    Note over Enrich: Order: decode → decap → GeoIP → static → network sources → classifiers → routing overlay

    Enrich->>Journal: Write enriched record to raw tier
    Journal->>Journal: Rollup to 1m / 5m / 1h tiers (drops high-cardinality fields)

    Planner->>Journal: Select tier for query
    alt Raw-only fields (IP/port/city) or FTS
        Planner->>Journal: Force raw tier
        Journal-->>Planner: Raw data (bounded by raw retention)
    else Preserved fields (country/ASN/interface)
        Planner->>Journal: Use coarser tier per window alignment
        Journal-->>Planner: Aggregated data
    end

    Dashboard->>Planner: flows / autocomplete query
    Planner-->>Dashboard: Top-N groups, facet values
    Dashboard->>Dashboard: Render Sankey, Time-Series, Maps

    Intel->>Intel: CHANGED: Fetch DB-IP / IPtoASN data
    Intel->>MMDB: Atomic rename(2) of topology-ip-asn.mmdb, .geo.mmdb
    MMDB-->>Enrich: Reloads on file change within 30s

    Config->>Config: Removed top-level retention knobs and query guardrails
    Config->>Config: Per-tier retention only (raw, minute_1, minute_5, hour_1)
    Config->>Config: Removed query_1m_max_window, query_5m_max_window, query_facet_max_values_per_field

    DocGen->>DocGen: CHANGED: Generate integration cards under Flow Protocols / Enrichment Methods
    DocGen->>DocGen: Collapse 7 enrichment pages into single Enrichment guide
    DocGen->>DocGen: CHANGED: flows category treated as Monitor Anything section
    DocGen->>Learn: Publish regenerated COLLECTORS.md, integration cards, docs
    Learn-->>Dashboard: Ingest for in-app help and search
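
The `timestamp_source` branch in the diagram can be read as a small decision over two timestamps. The sketch below is illustrative Python, not the plugin's Rust code: the function name, the dictionary key `entry_realtime_usec`, the microsecond unit, and the fallback to receive time when no source timestamp was decoded are all assumptions made for this example. Only `timestamp_source`, `netflow_first_switched`, `input`, and `_SOURCE_REALTIME_TIMESTAMP` come from the PR.

```python
from typing import Optional


def journal_timestamps(
    timestamp_source: str,
    receive_usec: int,
    first_switched_usec: Optional[int] = None,
) -> dict:
    """Return the two timestamps written for one flow record.

    Hypothetical helper: the entry's own realtime is always the
    receive/write time; only _SOURCE_REALTIME_TIMESTAMP depends on
    the configured timestamp_source.
    """
    if timestamp_source == "netflow_first_switched" and first_switched_usec is not None:
        # Store the timestamp decoded from the flow record itself.
        source_usec = first_switched_usec
    else:
        # Default ("input"), or assumed fallback when nothing was decoded.
        source_usec = receive_usec
    return {
        "entry_realtime_usec": receive_usec,
        "_SOURCE_REALTIME_TIMESTAMP": source_usec,
    }
```

With `timestamp_source` set to `input`, both values are the receive time. With `netflow_first_switched`, the two values differ whenever the exporter's clock or flow start time differs from the collector's receive time.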

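The planner step in the diagram (force the raw tier for raw-only fields or full-text search, otherwise use a coarser tier according to window alignment) can be sketched as follows. This is a minimal illustration in Python under stated assumptions: the field names, the `Query` type, and the specific alignment rule (pick the coarsest tier whose bucket size divides both window edges) are hypothetical and are not taken from the plugin. Only the tier names `raw`, `minute_1`, `minute_5`, and `hour_1` come from the PR.

```python
from dataclasses import dataclass

# Bucket sizes in seconds, coarsest first. Tier names follow the
# per-tier retention keys named in the diagram.
TIER_SECONDS = {"hour_1": 3600, "minute_5": 300, "minute_1": 60}

# Hypothetical field names standing in for "IP/port/city".
RAW_ONLY_FIELDS = {"src_ip", "dst_ip", "src_port", "dst_port", "city"}


@dataclass(frozen=True)
class Query:
    start: int                      # window start, Unix seconds
    end: int                        # window end, Unix seconds
    fields: frozenset               # fields used to group or filter
    full_text_search: bool = False


def select_tier(query: Query) -> str:
    """Choose the tier to read, mirroring the two diagram branches."""
    # Rollups drop high-cardinality fields, so these queries need raw data.
    if query.full_text_search or query.fields & RAW_ONLY_FIELDS:
        return "raw"
    # Assumed alignment rule: the coarsest tier whose bucket size divides
    # both window edges, so no bucket is only partially covered.
    for tier, seconds in TIER_SECONDS.items():
        if query.start % seconds == 0 and query.end % seconds == 0:
            return tier
    return "raw"
```

For example, a country-level query over a window aligned to whole hours would read `hour_1`, while the same window grouped by a source IP would be forced to `raw` and is therefore bounded by raw retention.
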
Note: This PR contains a large number of files. cubic reviews at most 75 files per PR and prioritizes the most important ones, so some files may not have been reviewed.

Copilot

Pull request overview

Copilot reviewed 82 out of 85 changed files in this pull request and generated 2 comments.

ktsaou · 2026-05-07T23:33:49Z

@cubic-dev-ai please review again

cubic-dev-ai · 2026-05-07T23:34:24Z

@cubic-dev-ai please review again

@ktsaou I have started the AI code review. It will take a few minutes to complete.

Copilot

Pull request overview

Copilot reviewed 82 out of 85 changed files in this pull request and generated 2 comments.

ktsaou · 2026-05-07T23:44:51Z

@cubic-dev-ai please review again

cubic-dev-ai · 2026-05-07T23:45:00Z

@cubic-dev-ai please review again

@ktsaou I have started the AI code review. It will take a few minutes to complete.

sonarqubecloud · 2026-05-07T23:46:00Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

Copilot

Pull request overview

Copilot reviewed 82 out of 85 changed files in this pull request and generated no new comments.

ktsaou added 30 commits May 7, 2026 19:09

docs(network-flows): drop the trailing "use the sidebar" bullet from Overview

36c4ca5

Per user: the bullet was redundant with the F22 cleanup. Sidebar guidance is implicit; the four role-based bullets above are the intended "where to start" entry points.

Repair Network Flows documentation

e7ed808

ktsaou requested a review from Copilot May 7, 2026 21:23

ktsaou marked this pull request as ready for review May 7, 2026 21:23

ktsaou requested review from Ancairon, Ferroin, kanelatechnical and vkalintiris as code owners May 7, 2026 21:23

Copilot started reviewing on behalf of ktsaou May 7, 2026 21:27

Copilot AI reviewed May 7, 2026

Comment thread src/crates/netflow-plugin/src/plugin_config/types/journal.rs Outdated

Comment thread docs/network-flows/visualization/time-series.md Outdated

Comment thread docs/network-flows/visualization/maps-globe.md Outdated

Address Network Flows documentation regressions

0a3eda6

github-actions Bot added the area/collectors Everything related to data collection label May 7, 2026

ktsaou requested a review from Copilot May 7, 2026 23:14

Copilot started reviewing on behalf of ktsaou May 7, 2026 23:14

cubic-dev-ai Bot reviewed May 7, 2026

Copilot AI reviewed May 7, 2026

Comment thread src/crates/netflow-plugin/src/plugin_config/types/journal.rs

Comment thread docs/network-flows/visualization/overview.md Outdated

Fix Network Flows review regressions

6f58059

ktsaou requested a review from Copilot May 7, 2026 23:33

Copilot started reviewing on behalf of ktsaou May 7, 2026 23:34

Copilot AI reviewed May 7, 2026

Comment thread src/crates/netflow-plugin/src/ingest/rebuild.rs Outdated

Comment thread src/crates/netflow-plugin/src/ingest/rebuild.rs Outdated

Move raw rebuild scan off async startup path

165e02c

ktsaou requested a review from Copilot May 7, 2026 23:44

Copilot started reviewing on behalf of ktsaou May 7, 2026 23:45

Copilot AI reviewed May 7, 2026

ktsaou merged commit 3562289 into netdata:master May 8, 2026
130 of 147 checks passed
