
resolve: Add a Reset state #633

Merged

merged 5 commits into main Aug 24, 2020
Conversation

@olix0r (Member) commented Aug 22, 2020

The recovery layer needs to maintain its own cache for reconciliation (in
addition to the other cache layers we keep). This caching can be
eliminated by introducing a resolve::Update::Reset state that implicitly
removes all endpoints not present in the endpoint set.
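As a sketch of the idea (hypothetical simplified types; the actual definitions live in the proxy's `resolve` and `discover` crates), a `Reset` variant lets a consumer replace its view of the endpoint set wholesale, instead of diffing incoming updates against its own reconciliation cache:

```rust
use std::collections::HashSet;

// Illustrative stand-in for the proxy's resolve::Update; names are assumptions.
#[derive(Debug, Clone, PartialEq)]
enum Update {
    Add(Vec<&'static str>),
    Remove(Vec<&'static str>),
    // Reset replaces the entire endpoint set: anything not listed is removed.
    Reset(Vec<&'static str>),
}

// Apply an update to a cache of known endpoints.
fn apply(cache: &mut HashSet<&'static str>, update: Update) {
    match update {
        Update::Add(eps) => cache.extend(eps),
        Update::Remove(eps) => {
            for ep in eps {
                cache.remove(ep);
            }
        }
        Update::Reset(eps) => {
            // No diffing needed: the new set is authoritative.
            *cache = eps.into_iter().collect();
        }
    }
}

fn main() {
    let mut cache: HashSet<&'static str> = ["a", "b", "c"].into_iter().collect();
    // A Reset carrying {a, c} implicitly removes b.
    apply(&mut cache, Update::Reset(vec!["a", "c"]));
    assert!(!cache.contains("b"));
    assert_eq!(cache.len(), 2);
    println!("{:?}", cache);
}
```

The point of the variant is that `Reset` carries complete state, so the layer applying it never has to remember what it previously sent.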
@olix0r requested a review from a team August 22, 2020 17:04
@hawkw (Member) left a comment:

this seems reasonable to me. i had a couple questions.

Resolved review threads on:
- linkerd/proxy/discover/src/from_resolve.rs
- linkerd/proxy/resolve/src/map_endpoint.rs (two threads)
Comment on lines 169 to 172:

```rust
let update = match initial {
    Update::Add(eps) => Update::Reset(eps),
    up => up,
};
```
Comment from a Member:

just to make sure i'm understanding correctly: if the last update was an Add, we send a Reset after reconnecting to nuke those endpoints. but what if the last update before the connection ended was not an Add? could it have been a Remove that only removed some previously sent endpoints? don't we need to do a complete reset regardless?

@olix0r (Member, Author) replied:

Not quite.

If the first update was an add, we send a reset. I.e., we've been disconnected, so we could have missed removals. The first add, then, has the full state of the service. It doesn't make any sense to receive a Remove as the first update after a connect; we handle that state explicitly now.

I've tried to clarify this in the code.
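The missed-removal scenario can be worked through concretely (a hypothetical simulation with simplified stand-in types, not the proxy's actual code): the client caches {a, b, c}, misses a Remove of "b" while disconnected, and on reconnect the first Add, which carries the full current state {a, c}, is promoted to a Reset, dropping the stale endpoint without any extra reconciliation cache.

```rust
use std::collections::HashSet;

// Illustrative stand-in for the proxy's resolve::Update; names are assumptions.
#[derive(Debug, PartialEq)]
enum Update {
    Add(Vec<&'static str>),
    Reset(Vec<&'static str>),
}

// Promote the first update after a reconnect: a first Add carries the full
// state of the service, so it is treated as a Reset (mirroring the logic
// quoted from map_endpoint.rs above).
fn promote_first(update: Update) -> Update {
    match update {
        Update::Add(eps) => Update::Reset(eps),
        up => up,
    }
}

fn main() {
    // Client state before the disconnect.
    let mut cache: HashSet<&'static str> = ["a", "b", "c"].into_iter().collect();

    // While disconnected, the service removed "b"; that Remove was never seen.
    // On reconnect, the first update carries the full current state {a, c}.
    let first = promote_first(Update::Add(vec!["a", "c"]));

    // Applying the Reset replaces the cache wholesale, dropping the stale "b".
    if let Update::Reset(eps) = first {
        cache = eps.into_iter().collect();
    }
    assert!(!cache.contains("b"));
    println!("{:?}", cache);
}
```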

@zaharidichev (Member) left a comment:

LGTM

@olix0r merged commit 692abb8 into main Aug 24, 2020
@olix0r deleted the ver/resolve-reset branch August 24, 2020 14:29
Comment on lines +257 to +271:

```rust
Some(Ok(Update::Remove(_))) => {
    debug_assert!(false, "First update after connection must not be a Remove");
    // If we see this in production, just ignore and keep polling until an
    // actionable update is received.
    tracing::debug!("Ignoring removal after connect");
    continue;
}
Some(Ok(update)) => {
    tracing::debug!("Connected");
    // If the first update is an Add, convert it to a Reset, indicating all
    // other endpoints should be removed.
    let update = match update {
        Update::Add(eps) => Update::Reset(eps),
        up => up,
    };
```
Comment from a Member:

this is much clearer, thanks!

olix0r added a commit to linkerd/linkerd2 that referenced this pull request Aug 26, 2020
This release includes internal changes to the service discovery system,
especially when discovering control plane components (like the
destination and identity controllers). Now, the proxy attempts to
balance requests across all pods in each control plane service. This
requires control plane changes to use "headless" services so that SRV
records are exposed. When the control plane services have a `clusterIP`
set, the proxy falls back to using normal A-record lookups.

---

* tracing: add richer verbose spans to http clients (linkerd/linkerd2-proxy#622)
* trace: update tracing dependencies (linkerd/linkerd2-proxy#623)
* Remove `Resolution` trait (linkerd/linkerd2-proxy#606)
* Update proxy-identity to edge-20.8.2 (linkerd/linkerd2-proxy#627)
* Add build arg for skipping identity wrapper (linkerd/linkerd2-proxy#624)
* Wait for proxy thread to terminate in integration tests (linkerd/linkerd2-proxy#625)
* Remove scrubbing for unused headers (linkerd/linkerd2-proxy#628)
* Split orig-proto tests out of discovery tests (linkerd/linkerd2-proxy#629)
* Re-enable outbound timeout test (linkerd/linkerd2-proxy#630)
* profiles: perform profile resolution for IP addresses (linkerd/linkerd2-proxy#626)
* Move resolve api to async-stream (linkerd/linkerd2-proxy#599)
* Decouple discovery buffering from endpoint conversion (linkerd/linkerd2-proxy#631)
* resolve: Add a Reset state (linkerd/linkerd2-proxy#633)
* resolve: Eagerly fail resolutions (linkerd/linkerd2-proxy#634)
* test: replace `net2` dependency with `socket2` (linkerd/linkerd2-proxy#635)
* dns: Run DNS resolutions on the main runtime (linkerd/linkerd2-proxy#637)
* Load balance requests to the control plane (linkerd/linkerd2-proxy#594)
* Unify control plane client construction (linkerd/linkerd2-proxy#638)