Conversation

Member

@Frando Frando commented Nov 17, 2025

Description

Based on #3672

Add back some of the path congestion metrics added in #3491. @Arqu, please check my port of these metrics and whether the calculations are correct. I commented out the metrics from #3491 that I didn't know how to port, but I think I got the important ones.

Breaking Changes

Notes & open questions

Change checklist

  • Self-review.
  • Documentation updates following the style guide, if relevant.
  • Tests if relevant.
  • All breaking changes documented.
    • List all breaking changes in the above "Breaking Changes" section.
    • Open an issue or PR on any number0 repos that are affected by this breaking change. Give guidance on how the updates should be handled or do the actual updates themselves. The major ones are:


github-actions bot commented Nov 17, 2025

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3669/docs/iroh/

Last updated: 2025-11-18T10:49:06Z

@Frando Frando linked an issue Nov 17, 2025 that may be closed by this pull request
@n0bot n0bot bot added this to iroh Nov 17, 2025
@github-project-automation github-project-automation bot moved this to 🏗 In progress in iroh Nov 17, 2025
@@ -220,6 +226,7 @@ impl EndpointStateActor {
trace!("actor started");
let idle_timeout = MaybeFuture::None;
tokio::pin!(idle_timeout);
let mut metrics_interval = time::interval(METRICS_INTERVAL);
Contributor

should this be cfged as well?

Member Author

Failed to get the cfg to be happy with tokio::select!. Maybe the interval should be forever-pending if the cfg is not active?

Contributor

Yeah, I have done something like that in the past.
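
A minimal sketch of the forever-pending idea from this thread, assuming a hypothetical `MetricsTicker` wrapper (not a type in iroh). It keeps the `metrics` cfg inside one type, so the `tokio::select!` arm itself needs no cfg attribute: with the feature enabled it wraps `tokio::time::interval`, and without it the future simply never resolves.

```rust
use std::time::Duration;

/// Hypothetical wrapper (not part of iroh) that hides the `metrics` cfg
/// behind one type, so a `tokio::select!` arm can always be written.
pub struct MetricsTicker {
    #[cfg(feature = "metrics")]
    interval: tokio::time::Interval,
}

impl MetricsTicker {
    /// With the feature enabled this must be called inside a tokio runtime,
    /// because `tokio::time::interval` needs the runtime's timer.
    pub fn new(_period: Duration) -> Self {
        Self {
            #[cfg(feature = "metrics")]
            interval: tokio::time::interval(_period),
        }
    }

    /// Resolves on every tick when metrics are enabled, never otherwise.
    pub async fn tick(&mut self) {
        #[cfg(feature = "metrics")]
        {
            self.interval.tick().await;
        }
        #[cfg(not(feature = "metrics"))]
        std::future::pending::<()>().await;
    }
}
```

In the actor loop this would be used as an ordinary arm, for example `_ = ticker.tick() => self.record_metrics(),`. When the feature is off, that arm is never selected and the other arms behave as before.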

Contributor

@flub flub left a comment

I'd suggest limiting this to figuring out which easy and interesting counters to add. Nothing more. If we need congestion metrics we can do that in a later pass.

@@ -75,6 +78,9 @@ const APPLICATION_ABANDON_PATH: u8 = 30;
/// in a high frequency, and to keep data about previous path around for subsequent connections.
const ACTOR_MAX_IDLE_TIMEOUT: Duration = Duration::from_secs(60);

/// Interval in which connection and path metrics are emitted.
const METRICS_INTERVAL: Duration = Duration::from_secs(10);
Contributor

Soo, I'm going to argue that this is fundamentally doing metrics wrong. Metrics increment counters or record histograms at discrete points. If you are recording something on a timer, you were supposed to provide the metrics from which that thing can be computed, so the metrics backend can compute it on the fly over the range it wants, using the sampling intervals it wants, etc.

Member Author

What would be the discrete points to record connection latency at? Open/close path events?

Contributor

This is metrics backend code. I'll have a much easier time with this PR if we don't try and do anything like that here.

Member Author

This is present on main. It is not new, just ported from main to feat-multipath. See #3606 and #3491.

Member Author

@Frando Frando Nov 17, 2025

Happy to not do this here, but IIRC these are the metrics we are most interested in for comparing feat-multipath to main via n0des? Please correct me if I'm wrong, @Arqu.

Edit: I've split the PR. The simple metrics are now in #3672, and this PR is about adding back the congestion metrics.
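
The counters-versus-timer distinction raised in this thread can be sketched roughly as follows. This is not the iroh or quinn metrics API: `Counter`, `PathCounters`, `Scrape` and `loss_ratio` are hypothetical names, and packet loss is only an assumed example of a congestion signal. The application bumps monotonic counters at the events themselves, and a backend derives the ratio over whatever window it chooses from two scrapes.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical monotonic counter, incremented at the event itself.
#[derive(Debug, Default)]
pub struct Counter(AtomicU64);

impl Counter {
    pub fn inc_by(&self, n: u64) {
        self.0.fetch_add(n, Ordering::Relaxed);
    }

    pub fn get(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Hypothetical per-path counters (names are assumptions for illustration).
#[derive(Debug, Default)]
pub struct PathCounters {
    pub packets_sent: Counter,
    pub packets_lost: Counter,
}

/// The counter values as seen by one scrape of the metrics backend.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Scrape {
    pub sent: u64,
    pub lost: u64,
}

impl PathCounters {
    pub fn scrape(&self) -> Scrape {
        Scrape {
            sent: self.packets_sent.get(),
            lost: self.packets_lost.get(),
        }
    }
}

/// What a backend could compute on its side: the loss ratio between two
/// scrapes. Returns `None` if nothing was sent or a counter went backwards.
pub fn loss_ratio(earlier: Scrape, later: Scrape) -> Option<f64> {
    let sent = later.sent.checked_sub(earlier.sent)?;
    let lost = later.lost.checked_sub(earlier.lost)?;
    if sent == 0 {
        return None;
    }
    Some(lost as f64 / sent as f64)
}
```

With this shape the application never needs its own sampling interval; the window is simply the distance between the two scrapes the backend picks.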

fn record_metrics(&mut self) {
#[cfg(feature = "metrics")]
for state in self.connections.values_mut() {
state.record_metrics_periodic(&self.metrics, self.selected_path.get());
Contributor

Sorry if I am missing something, but shouldn't we pass in the path for the specific connection here? This seems like a generic one.

Member Author

@Frando Frando Nov 17, 2025

We have a single selected path per ~~connection~~ remote endpoint. This is passed through here because the metrics tracking code further down uses the RTT of the selected path as the connection's RTT. (With multipath, there is no connection-level RTT exposed anymore at the quinn level.)

Contributor

but why is it the same for all connections?

Member Author

Currently the selected path logic works on the level of the remote endpoint, and the result is then applied to all connections.
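
To make the answer above concrete, here is a rough sketch under stated assumptions. The types `RemoteState` and `ConnectionState`, the `PathId`/`ConnId` aliases and the `path_rtts` map are all hypothetical and do not mirror iroh's or quinn's actual structures. The point is only that one selected path is chosen per remote endpoint, and each connection then reports the RTT it observes on that path as its connection RTT.

```rust
use std::collections::HashMap;
use std::time::Duration;

// Hypothetical identifiers, chosen for illustration only.
pub type PathId = u32;
pub type ConnId = u64;

/// Hypothetical per-connection state: the RTT observed on each open path.
pub struct ConnectionState {
    pub path_rtts: HashMap<PathId, Duration>,
}

impl ConnectionState {
    /// RTT this connection sees on the endpoint-wide selected path, if any.
    pub fn rtt_on(&self, selected: Option<PathId>) -> Option<Duration> {
        selected.and_then(|id| self.path_rtts.get(&id).copied())
    }
}

/// Hypothetical per-remote-endpoint state: one selected path, many connections.
pub struct RemoteState {
    pub selected_path: Option<PathId>,
    pub connections: HashMap<ConnId, ConnectionState>,
}

impl RemoteState {
    /// The same selected path is applied to every connection, while the RTT
    /// value is still read from each connection's own view of that path.
    pub fn connection_rtts(&self) -> Vec<(ConnId, Option<Duration>)> {
        let mut out: Vec<_> = self
            .connections
            .iter()
            .map(|(id, conn)| (*id, conn.rtt_on(self.selected_path)))
            .collect();
        // Sort for a stable order; HashMap iteration order is unspecified.
        out.sort_by_key(|(id, _)| *id);
        out
    }
}
```

So the path choice is shared across connections to the same remote endpoint, but the recorded RTT can still differ per connection.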

Member Author

Frando commented Nov 17, 2025

> I'd suggest limiting this to figuring out which easy and interesting counters to add. Nothing more. If we need congestion metrics we can do that in a later pass.

This is present on main. It is not new, just ported from main to feat-multipath. See #3606 and #3491.

Happy to not do this here and only add the simple counters initially, but IIRC these are the metrics we are most interested in for comparing feat-multipath to main via n0des? Please correct me if I'm wrong, @Arqu.

@Frando Frando changed the base branch from feat-multipath to Frando/mp-metrics-basics November 17, 2025 15:22
@Frando Frando changed the title feat: add back metrics to multipath feat: add back path congestion metris to multipath Nov 17, 2025
@Frando Frando changed the title feat: add back path congestion metris to multipath feat: add back path congestion metrics to multipath Nov 17, 2025
Member Author

Frando commented Nov 17, 2025

I've split this PR: the basic, simple counter metrics are now in #3672. This PR is now based on #3672 and adds back the congestion metrics.

@Frando Frando changed the title feat: add back path congestion metrics to multipath feat(multipath): add back path congestion metrics Nov 17, 2025
Labels

None yet

Projects

Status: 🏗 In progress

Development

Successfully merging this pull request may close these issues.

Multipath metrics

4 participants