Skip to content

Refactor and fix network tracing integration in controllerconn#5648

Merged
eriknordmark merged 1 commit intolf-edge:masterfrom
milan-zededa:fix-nettrace
Mar 6, 2026
Merged

Refactor and fix network tracing integration in controllerconn#5648
eriknordmark merged 1 commit intolf-edge:masterfrom
milan-zededa:fix-nettrace

Conversation

@milan-zededa
Copy link
Copy Markdown
Contributor

@milan-zededa milan-zededa commented Mar 4, 2026

Description

  • Update eve-libs/nettrace to include the fix where HTTPClient.GetTrace() flushes all in-memory data and blocks until all asynchronous batch offloads are completed. This guarantees that exported traces are complete.

  • Extract tracing setup and teardown logic into helper methods (prepareNetworkTracing and processNetworkTraces) to remove duplication and unify trace lifecycle handling.

  • Fix trace handling in handleHTTPReqFailure. After introducing Bolt-based batch offloading, this path was not updated accordingly and continued to append in-memory traces to netdump. Since traces are offloaded to Bolt in batch mode, the in-memory structures were empty, resulting in missing network traces precisely when HTTP requests failed (and when traces are the most needed to troubleshoot).

    The failure path now uses the same processing logic as the success path, ensuring that traces are properly flushed, exported, and attached to SendRetval.

  • Fix netdump tar path to properly use the per-session directory name

How to test and validate this PR

To test and validate this PR, deploy an edge node running an EVE version that includes the fix. SSH into the node and navigate to /persist/netdump, where collected netdump tarballs are stored. Extract one of the tarballs and inspect the requests/<req-name>/ directories. Verify that these directories are not empty and that each contains a valid nettrace.json file with non-empty network trace data, including HTTP, TLS, TCP, and (if applicable) DNS traces. It is preferred to check NIM netdumps for both successful and failed HTTP requests if available (i.e. nim-ok-* and nim-fail-* tarballs), since these were most affected by the previous issues and are typically smaller and easier to review than larger downloader netdumps. Both successful and failed requests should now contain complete and properly exported traces.

Changelog notes

Fix nettrace handling to ensure complete trace export (including failed requests)

PR Backports

Checklist

  • I've provided a proper description
  • I've added the proper documentation
  • I've tested my PR on amd64 device
  • I've tested my PR on arm64 device
  • I've written the test verification instructions
  • I've set the proper labels to this PR
  • I've checked the boxes above, or I've provided a good reason why I didn't check them.

@milan-zededa milan-zededa added bug Something isn't working stable Should be backported to stable release(s) labels Mar 4, 2026
@milan-zededa milan-zededa changed the title Refactor and fix network tracing integration in controllerconn. Refactor and fix network tracing integration in controllerconn Mar 4, 2026
- Update eve-libs/nettrace to include the fix where HTTPClient.GetTrace()
  flushes all in-memory data and blocks until all asynchronous batch
  offloads are completed. This guarantees that exported traces are complete.

- Extract tracing setup and teardown logic into helper methods
  (prepareNetworkTracing and processNetworkTraces) to remove duplication
  and unify trace lifecycle handling.

- Fix trace handling in handleHTTPReqFailure. After introducing Bolt-based
  batch offloading, this path was not updated accordingly and continued to
  append in-memory traces to netdump. Since traces are offloaded to Bolt in
  batch mode, the in-memory structures were empty, resulting in missing
  network traces precisely when HTTP requests failed.

  The failure path now uses the same processing logic as the success path,
  ensuring that traces are properly flushed, exported, and attached to
  SendRetval.

- Fix netdump tar path to properly use the per-session directory name

Signed-off-by: Milan Lenco <milan@zededa.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 27.96%. Comparing base (2281599) to head (22cac5e).
⚠️ Report is 331 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5648      +/-   ##
==========================================
+ Coverage   19.52%   27.96%   +8.43%     
==========================================
  Files          19       18       -1     
  Lines        3021     2417     -604     
==========================================
+ Hits          590      676      +86     
+ Misses       2310     1591     -719     
- Partials      121      150      +29     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Do we have a test where we can verify that the memory usage doesn't increase (and maybe even decreases as you use Bolt differently)?

@milan-zededa
Copy link
Copy Markdown
Contributor Author

LGTM

Do we have a test where we can verify that the memory usage doesn't increase (and maybe even decreases as you use Bolt differently)?

I’m not aware of any test specifically verifying that memory usage remains stable or decreases with the updated Bolt usage. Perhaps @kperakis-zededa has created or experimented with something along those lines.

@eriknordmark eriknordmark merged commit 4166493 into lf-edge:master Mar 6, 2026
47 of 49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working stable Should be backported to stable release(s)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants