Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Order name servers by SRTT #1784

Merged
merged 7 commits into from
Oct 10, 2022
Merged

Order name servers by SRTT #1784

merged 7 commits into from
Oct 10, 2022

Conversation

nhurley3
Copy link
Contributor

Calculates the smoothed round-trip time (SRTT) using an exponentially weighted moving average and orders the name servers by the value. The implemented algorithm is similar to what was discussed in #1702.

Fixes #1702 (originally filed by @peterthejohnston)

Calculates the smoothed round-trip time (SRTT) using an exponentially weighted moving average and orders the name servers by the value. The implemented algorithm is similar to what was discussed in hickory-dns#1702.

Fixes hickory-dns#1702
@nhurley3
Copy link
Contributor Author

@bluejekyll and @djc, here's a draft PR that implements a version of the algorithm mentioned in #1702 (comment). Let me know what you think. Thanks!

The actual f64::total_cmp imlementation is only stable in Rust 1.62.0
and beyond.
Copy link
Member

@bluejekyll bluejekyll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so with these changes, we’ll be ignoring the success and failures from the algorithm and only paying attention to request time. I think I’m ok with this, but it’s worth a discussion before landing.

crates/resolver/src/name_server/name_server.rs Outdated Show resolved Hide resolved
crates/resolver/src/name_server/name_server_stats.rs Outdated Show resolved Hide resolved
crates/resolver/src/name_server/name_server_stats.rs Outdated Show resolved Hide resolved
crates/resolver/src/name_server/name_server_stats.rs Outdated Show resolved Hide resolved
crates/resolver/src/name_server/name_server_stats.rs Outdated Show resolved Hide resolved
crates/resolver/src/name_server/name_server_stats.rs Outdated Show resolved Hide resolved
crates/resolver/src/name_server/name_server_stats.rs Outdated Show resolved Hide resolved
crates/resolver/src/name_server/name_server_stats.rs Outdated Show resolved Hide resolved
crates/resolver/src/name_server/name_server_stats.rs Outdated Show resolved Hide resolved
crates/resolver/src/name_server/name_server_stats.rs Outdated Show resolved Hide resolved
Additionally, implement a few minor fixes from feedback.
@nhurley3
Copy link
Contributor Author

nhurley3 commented Oct 3, 2022

Ok, so with these changes, we’ll be ignoring the success and failures from the algorithm and only paying attention to request time. I think I’m ok with this, but it’s worth a discussion before landing.

This algorithm doesn't explicitly track the number of successes/failures, but they are still incorporated in some fashion. Successes now correspond to recorded latencies. Similarly, failures result in a penalty being applied to the SRTT value, which serves as a backoff.

Coupling the failure state with the SRTT value is somewhat of an arbitrary decision that simplifies things. One potentially interesting alternative could be to effectively remove failed servers from the rotation and probe them in the background (somewhat similar to what Unbound does). Of course, the tradeoff is increased complexity and additional resource cost. Happy to consider other alternatives as well. There's certainly more than 1 reasonable approach here.

@codecov
Copy link

codecov bot commented Oct 10, 2022

Codecov Report

Merging #1784 (7bf15a6) into main (a88a7a5) will increase coverage by 0.09%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1784      +/-   ##
==========================================
+ Coverage   80.03%   80.12%   +0.09%     
==========================================
  Files         177      177              
  Lines       18108    18196      +88     
==========================================
+ Hits        14492    14580      +88     
  Misses       3616     3616              
Impacted Files Coverage Δ
crates/resolver/src/name_server/name_server.rs 96.89% <100.00%> (-0.80%) ⬇️
...ates/resolver/src/name_server/name_server_stats.rs 99.24% <100.00%> (+3.89%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a88a7a5...7bf15a6. Read the comment docs.

@bluejekyll
Copy link
Member

Thanks for the work and getting this completed. It looks like a really good improvement. Merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Name server sorting is problematic
3 participants