
Special case the error returned when we have a Raft leader but are not tracking it in the ServerLookup #9487

Merged
mkeeler merged 1 commit into master from bugfix/no-cluster-leader Jan 4, 2021


mkeeler
Member

@mkeeler mkeeler commented Jan 4, 2021

This can happen when another node in the cluster, such as a client, is unable to communicate with the leader server and sees it as failed. When that happens, the failing status is eventually propagated to the other servers in the cluster, which stop tracking the leader in their ServerLookup even though Raft still has a leader. This can result in RPCs returning a “No cluster leader” error.

That error is misleading and unhelpful for determining the root cause, because the problem is not Raft stability but rather client -> server networking. Therefore this commit adds a new error that is returned in that case, to differentiate between the two situations.

@mkeeler mkeeler added type/bug Feature does not function as expected backport/1.7 labels Jan 4, 2021
@mkeeler mkeeler requested a review from a team January 4, 2021 16:38
Contributor

@dnephin dnephin left a comment


Nice! Fix makes sense to me

agent/consul/rpc_test.go (resolved)
agent/structs/errors.go (outdated, resolved)
…t tracking it in the ServerLookup

Contributor

@dnephin dnephin left a comment


LGTM

@mkeeler mkeeler merged commit 85e5da5 into master Jan 4, 2021
@mkeeler mkeeler deleted the bugfix/no-cluster-leader branch January 4, 2021 19:05
@hashicorp-ci
Contributor

🍒 If backport labels were added before merging, cherry-picking will start automatically.

To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/303375.

@hashicorp-ci
Contributor

🍒✅ Cherry pick of commit 85e5da5 onto release/1.9.x succeeded!

hashicorp-ci pushed a commit that referenced this pull request Jan 4, 2021
…t tracking it in the ServerLookup (#9487)

@hashicorp-ci
Contributor

🍒✅ Cherry pick of commit 85e5da5 onto release/1.8.x succeeded!

hashicorp-ci pushed a commit that referenced this pull request Jan 4, 2021
…t tracking it in the ServerLookup (#9487)

@hashicorp-ci
Contributor

🍒❌ Cherry pick of commit 85e5da5 onto release/1.7.x failed!

mkeeler added a commit that referenced this pull request Jan 4, 2021
mkeeler added a commit that referenced this pull request Jan 4, 2021
…t tracking it in the ServerLookup (#9487)

mkeeler added a commit that referenced this pull request Jan 5, 2021
mkeeler added a commit that referenced this pull request Jan 5, 2021
hashicorp-ci pushed a commit that referenced this pull request Jan 5, 2021
hashicorp-ci pushed a commit that referenced this pull request Jan 5, 2021
hashicorp-ci pushed a commit that referenced this pull request Jan 5, 2021
Labels
type/bug Feature does not function as expected