Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: serve member list API with linearizable guarantee #11639

Merged
merged 1 commit into from
Mar 27, 2020

Conversation

jingyih
Copy link
Contributor

@jingyih jingyih commented Feb 19, 2020

Fixes #11198

@@ -131,6 +134,15 @@ func (c *cluster) MemberList(ctx context.Context) (*MemberListResponse, error) {
return nil, toErr(ctx, err)
}

func (c *cluster) MemberListLinearizable(ctx context.Context) (*MemberListResponse, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: do we want to add new API MemberListLinearizable? Or do we want to change the function signature of the existing API MemberList by adding an additional input parameter?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we giving choice to user between current stale information and linearzable guarantee? Per the issue, thought we will always go with linearzable guarantee. I believe when you said changing the current signature with an additional input - you mean a bool for linearzable?

Copy link
Contributor

@gyuho gyuho Feb 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for making it default (update the existing MemberList call)

/cc @xiang90 @jpbetz

Copy link
Contributor Author

@jingyih jingyih Feb 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of making linearizable the new default for MemberList call. Do we want to provide users the option (an input flag to MemberList) to get stale member list? (which is similar to serializable range request). Previously, users could get stale member list from a member that is isolated in the cluster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe when you said changing the current signature with an additional input - you mean a bool for linearzable?

Yes.

@codecov-io
Copy link

codecov-io commented Feb 19, 2020

Codecov Report

Merging #11639 into master will decrease coverage by 0.63%.
The diff coverage is 71.42%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #11639      +/-   ##
==========================================
- Coverage   66.65%   66.01%   -0.64%     
==========================================
  Files         403      403              
  Lines       36744    36768      +24     
==========================================
- Hits        24493    24274     -219     
- Misses      10753    10986     +233     
- Partials     1498     1508      +10     
Impacted Files Coverage Δ
etcdserver/api/v3rpc/member.go 87.09% <50.00%> (-2.74%) ⬇️
clientv3/cluster.go 91.83% <100.00%> (ø)
etcdserver/v3_server.go 73.80% <100.00%> (-0.33%) ⬇️
clientv3/snapshot/util.go 0.00% <0.00%> (-20.00%) ⬇️
auth/store.go 52.05% <0.00%> (-19.42%) ⬇️
client/keys.go 73.86% <0.00%> (-17.59%) ⬇️
auth/range_perm_cache.go 51.42% <0.00%> (-17.15%) ⬇️
proxy/grpcproxy/register.go 72.50% <0.00%> (-10.00%) ⬇️
pkg/tlsutil/tlsutil.go 86.20% <0.00%> (-6.90%) ⬇️
raft/tracker/inflights.go 91.83% <0.00%> (-4.09%) ⬇️
... and 26 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6325754...0344b70. Read the comment docs.

@@ -34,6 +34,12 @@
"schema": {
"$ref": "#/definitions/etcdserverpbAuthenticateResponse"
}
},
"default": {
"description": "An unexpected error response",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, if you update commit for other changes, it would be good to keep the description consistent with rest of the file in this file and other swagger file by ending sentence with a period.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... this is actually auto generated by the genproto.sh script. Not sure why it is not consistent with the existing contents.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! Not a big thing to worry about.

@@ -131,6 +134,15 @@ func (c *cluster) MemberList(ctx context.Context) (*MemberListResponse, error) {
return nil, toErr(ctx, err)
}

func (c *cluster) MemberListLinearizable(ctx context.Context) (*MemberListResponse, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we giving choice to user between current stale information and linearzable guarantee? Per the issue, thought we will always go with linearzable guarantee. I believe when you said changing the current signature with an additional input - you mean a bool for linearzable?

@jingyih
Copy link
Contributor Author

jingyih commented Feb 21, 2020

PR updated. PTAL:)

@spzala
Copy link
Member

spzala commented Feb 21, 2020

@jingyih perfect :) LGTM

@gyuho
Copy link
Contributor

gyuho commented Mar 25, 2020

@jingyih Sorry for delay. Can we rebase

- Add linearizable field to etcdserverpb.MemberListRequest.
- Change behavior of clienv3 MemberList API. Now it is served with
linearizable guarantee.
@jingyih
Copy link
Contributor Author

jingyih commented Mar 26, 2020

@jingyih Sorry for delay. Can we rebase

Done. Thanks!

@chaochn47
Copy link
Member

@ahrtr do you think we can backport this PR to release-3.4 client/server? This helps for autoSync get member list with linearizable guarantee.

@ahrtr
Copy link
Member

ahrtr commented Aug 11, 2022

@ahrtr do you think we can backport this PR to release-3.4 client/server? This helps for autoSync get member list with linearizable guarantee.

I don't think we should backport this PR to 3.4, because it may break the patch upgrade, such as from 3.4.16 to 3.4.21 (assuming this PR to be backported to 3.4.21).

Actually it may break the upgrade from 3.4.x to 3.5.x as well. During rolling upgrade, some etcd member may still on the old version, but when the member with old version receives a request from client with the new version, it may panic because the pb.MemberListRequest on server side doesn't have the field Linearizable yet.

The other breaking case when a member with the new version receives a request from a client with old version.

When all etcd servers and the clients are on the new versions, then it's safe.

The point is that we should be very cautious when adding any new features/fields in the protocol buffer struct. Usually we need three versions (such as adding the field in 3.5, but actually it takes effect in 3.7) to safely rotate to a new version.

@ahrtr
Copy link
Member

ahrtr commented Aug 11, 2022

Note that we have similar issue on the lease list API, and it's in my to do list.

@chaochn47
Copy link
Member

Thanks @ahrtr for the insights!! I have some follow up questions if you don't mind answer.

Actually it may break the upgrade from 3.4.x to 3.5.x as well. During rolling upgrade, some etcd member may still on the old version, but when the member with old version receives a request from client with the new version, it may panic because the pb.MemberListRequest on server side doesn't have the field Linearizable yet.

I thought this field Linearizable will be ignored in 3.4 servers during upgrade when client is 3.5. Otherwise, how come our upgrade succeeds in e2e release test and in production environment?

When all etcd servers and the clients are on the new versions, then it's safe.

Sounds like version skew between client and server is not allowed, do we have a public documentation on this? For reference: https://kubernetes.io/releases/version-skew-policy/

I doubt running 3.5 servers while client is on 3.4 might be safe though.

@ahrtr
Copy link
Member

ahrtr commented Aug 11, 2022

I manually tried to run "etcdctl member list" in the following two cases, and confirmed both of them are working well,

  1. etcdserver 3.5.4 and etcdctl 3.4.20
  2. etcdserver 3.4.20 and etcdctl 3.5.4

The reason should be the new field Linearizable is just a golang build-in type bool instead of a pointer. It means it will never run into nil-pointer panic.

https://kubernetes.io/releases/version-skew-policy/ is good reference. But please note that we are planning to only support two minor versions. FYI. etcd-io/website#601

@chaochn47
Copy link
Member

chaochn47 commented Aug 12, 2022

Thanks for the context that etcd only maintains 2 minor versions for well said good reasons!

I am wondering the version-skew-policy between server and client versions, especially is it safe to run server 3.5 with client 3.4 or server 3.4 with client 3.5?

If that's the case, backporting this CR becomes possible.

/cc @jyotimahapatra

@ahrtr
Copy link
Member

ahrtr commented Aug 12, 2022

If that's the case, backporting this CR becomes possible.

The only concern for now it changes the default behavior, although might be wrong behavior. Please feel free to deliver a backporting PR to 3.4 for this, and we can have more discussion there, and get feeback from other maintainers as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Member list should be served with linearizable guarantee
6 participants