
Update etcd client to 3.3 for 1.13 #69322

Merged
merged 1 commit into from
Oct 11, 2018


jpbetz (Contributor) commented Oct 2, 2018

Vendors in the etcd 3.3 client, targeting the Kubernetes 1.13 release milestone. We previously tested the etcd 3.3 client with Kubernetes in #58551.

Most of the work here was due to the need to bump gogo/protobuf to v0.5: the etcd client includes protobuf bindings generated with that version, and they are not compatible with the v0.4-3-gc0656edd version previously used by Kubernetes. Since we're at the very beginning of the Kubernetes 1.13 release cycle, we are already lagging behind on gogo/protobuf updates, and v0.5 includes noteworthy improvements such as eliminating unsafe code from the generated bindings, this seems like a good time to do the bump and regenerate.

Also, added ':(exclude)**.*.pb.go' to hack/verify-pkg-names.sh to resolve this lint error:

staging/src/k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1beta1/generated.pb.go:import encoding_binary "encoding/binary"
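For readers unfamiliar with the check, here is a rough Go approximation of what hack/verify-pkg-names.sh appears to enforce, inferred only from the lint error above: an import alias containing an underscore or capital letter (such as encoding_binary) is flagged, and generated *.pb.go files are exempted by the new exclude pathspec. The regular expression and the helper names (badAlias, isGeneratedProto, violates) are assumptions for illustration; the real check is a shell script and its exact pattern may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// badAlias is a hypothetical approximation of the rule: an import alias
// that contains an uppercase letter or underscore breaks Go package-naming
// conventions. Matches both `import alias "path"` and grouped `\talias "path"`.
var badAlias = regexp.MustCompile(`^(import |\t)[a-z]+[A-Z_][a-zA-Z_]* "[^"]+"$`)

// isGeneratedProto mirrors the intent of the ':(exclude)' pathspec: generated
// protobuf bindings are skipped because the generator, not a human, picks
// their import aliases.
func isGeneratedProto(path string) bool {
	return strings.HasSuffix(path, ".pb.go")
}

// violates reports whether a single source line from path should be flagged.
func violates(path, line string) bool {
	if isGeneratedProto(path) {
		return false
	}
	return badAlias.MatchString(line)
}

func main() {
	line := `import encoding_binary "encoding/binary"`
	// A hand-written file with this alias is flagged...
	fmt.Println(violates("pkg/foo/foo.go", line))
	// ...but the same line in generated bindings is excluded.
	fmt.Println(violates("staging/src/k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1beta1/generated.pb.go", line))
}
```

Excluding generated files, rather than renaming the alias, keeps the generated bindings byte-for-byte what the generator emits, so they can be regenerated without hand edits.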

/area etcd
/sig api-machinery
/kind feature
/cc @gyuho @xiang90 @wenjiaswe @jingyih

Release note: Upgrade to etcd 3.3 client

@jpbetz added the area/etcd, sig/api-machinery, and kind/feature labels Oct 2, 2018
@jpbetz added this to the v1.13 milestone Oct 2, 2018
@k8s-ci-robot added the release-note label Oct 2, 2018
k8s-ci-robot (Contributor)

@jpbetz: GitHub didn't allow me to request PR reviews from the following users: wenjiaswe, jingyih, gyuho.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.


@k8s-ci-robot added the cncf-cla: yes, size/S, area/apiserver, sig/cluster-lifecycle, and sig/testing labels Oct 2, 2018
@jpbetz mentioned this pull request Oct 2, 2018
jpbetz (Contributor Author) commented Oct 2, 2018

/retest

@k8s-ci-robot added the size/XXL, area/kubelet, sig/architecture, and sig/node labels and removed the size/S label Oct 2, 2018

jpbetz (Contributor Author) commented Oct 2, 2018

Sorting out some gogo/protobuf + generated code issues, hang tight.

@k8s-ci-robot added the area/code-generation and sig/cli labels Oct 2, 2018

gyuho (Member) commented Oct 2, 2018

etcd part LGTM.

Just for reference, this does not change clientv3 behavior, since we backported the health check feature to the v3.2.10 client (https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.2.md#v3210-2017-11-16). In addition, this replaces golang.org/x/net/context with the standard library context package, which should not break anything.

@jpbetz force-pushed the etcd-client-3.3.9 branch 4 times, most recently from 1436e18 to f9354f8, October 2, 2018 22:42

jpbetz (Contributor Author) commented Oct 3, 2018

This is ready for review.

@k8s-ci-robot added the sig/cloud-provider label Oct 8, 2018

jpbetz (Contributor Author) commented Oct 8, 2018

/test pull-kubernetes-e2e-kops-aws

jpbetz (Contributor Author) commented Oct 8, 2018

cc @lavalamp @sttts Could I get a review? This is mostly dependency bumps with only minimal code change. But it touches a lot of files so if it sits for too long I end up having to do more tedious rebases...

lavalamp (Member) commented Oct 8, 2018

I looked at this, that's an enormous vendoring change.

Is there any way to confine it to etcd only?

Changing every protobuf file in the repo is making me very nervous.

jpbetz (Contributor Author) commented Oct 9, 2018

> I looked at this, that's an enormous vendoring change.
>
> Is there any way to confine it to etcd only?

I was careful to limit the transitive dependency bumps to ones that are actually required by the etcd client. Without the gogo/protobuf bump, we had an incompatibility in the etcd client:

vendor/github.com/coreos/etcd/client/keys.generated.go:70:6: r.WriteArrayStart undefined (type codec.encDriver has no field or method WriteArrayStart)

> Changing every protobuf file in the repo is making me very nervous.

Yeah, I wasn't thrilled to find that we needed to regen our protobuf files. But we are lagging behind on gogo/protobuf and this bumps us up from v0.4-3-gc0656edd to v0.5, which is an officially released version, and eliminates unsafe code from the generated code, see https://github.com/gogo/protobuf/releases.

And we're doing this at the very beginning of the 1.13 cycle, which gives us ample time to soak these changes.

jpbetz (Contributor Author) commented Oct 9, 2018

/test pull-kubernetes-local-e2e-containerized

k8s-ci-robot (Contributor)

@jpbetz: The following test failed, say /retest to rerun them all:

Test name: pull-kubernetes-local-e2e-containerized
Commit: 4263c75
Rerun command: /test pull-kubernetes-local-e2e-containerized


timothysc (Member) left a comment

Generally LGTM, but I would split out your v2 compat changes into its own PR, unless this update somehow forces the issue.

@lavalamp This is normal. Every time we level-set protobuf changes, we need to regenerate *, which cascades through machinery. This has happened a number of times.

/assign @liggitt @wojtek-t
/approve

@@ -314,12 +314,33 @@ func toTTLOptions(r *pb.Request) store.TTLOptionSet {
}

func applyRequest(r *pb.Request, applyV2 etcdserver.ApplierV2) {
// TODO: find a sane way to perform this cast or avoid it in the first place
reqV2 := &etcdserver.RequestV2{
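The TODO in this hunk asks for "a sane way to perform this cast". As a hedged sketch of the trade-off, using stand-in types whose names and fields (Request, RequestV2, ID, Method, Path, Val) are hypothetical rather than etcd's actual definitions: if the V2 type is declared with the same underlying struct as the protobuf type, Go permits a direct pointer conversion; otherwise a field-by-field copy like the one in this hunk is required, and it must be updated whenever a field is added.

```go
package main

import "fmt"

// Request stands in for a protobuf-generated request type.
// Hypothetical: only a few fields, for illustration.
type Request struct {
	ID     uint64
	Method string
	Path   string
	Val    string
}

// RequestV2 is declared with Request as its underlying type. This is an
// assumption for the sketch; it is exactly what makes a pointer conversion legal.
type RequestV2 Request

// copyFields builds the V2 value field by field. Every new field must be
// added here by hand, or it is silently dropped.
func copyFields(r *Request) *RequestV2 {
	return &RequestV2{ID: r.ID, Method: r.Method, Path: r.Path, Val: r.Val}
}

// convert reinterprets the same memory as a *RequestV2. It compiles only
// because both types share an identical underlying struct type.
func convert(r *Request) *RequestV2 {
	return (*RequestV2)(r)
}

func main() {
	r := &Request{ID: 1, Method: "PUT", Path: "/foo", Val: "{}"}
	a, b := copyFields(r), convert(r)
	fmt.Println(*a == *b) // both carry the same field values
	r.Val = "changed"
	fmt.Println(a.Val, b.Val) // the copy is independent; the conversion aliases r
}
```

The conversion is cheaper and cannot drift out of sync, but it ties the two types together; the copy is the only option when the types are declared independently.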
Member

This seems orthogonal to the main purpose of this PR, which is to level up the deps to 3.3.

liggitt (Member), Oct 9, 2018

the etcd2-specific tools are being removed in #69577, fyi

edit: actually, I think we need to keep this through 1.13, and will drop it in 1.14

Contributor Author

Unfortunately, this change is required by the version bump due to a change in the etcd codebase. I look forward to stripping out the etcd2 code in 1.14.

Member

> Generally LGTM, but I would split out your v2 compat changes into its own PR, unless this update somehow forces the issue.

^ seems to apply then.

@@ -820,24 +819,6 @@ func (m *DeviceSpec) MarshalTo(dAtA []byte) (int, error) {
return i, nil
}

func encodeFixed64Api(dAtA []byte, offset int, v uint64) int {
Member

do we have any tests that ensure previously serialized data decodes correctly and changes like this stay wire-compatible?

jpbetz (Contributor Author), Oct 9, 2018

Yes, we have https://k8s-testgrid.appspot.com/sig-release-master-upgrade which I believe are all post-commit. We can watch that carefully after committing.

Member

that's good in one direction, but not exhaustive for all fields (we have a few oddities in our proto generation around nullable type aliases), and doesn't test in the opposite direction (new serialized data is readable by old servers). I don't see anything concerning in this PR, but it's pretty impossible to eyeball. @smarterclayton, any specific areas we should look at carefully?

Contributor Author

There are also downgrade tests (https://k8s-testgrid.appspot.com/sig-release-master-upgrade#gce-master-new-downgrade-cluster) that cover the other direction.

But it's certainly not exhaustive. I'm all in favor of capturing the test coverage gap we think we have here and contributing to get it fixed. Should we open a new issue for that? Should we block this PR?

^ @lavalamp

Member

Yeah, that doesn't give me very much additional confidence. It wouldn't catch something that's used even moderately rarely.

lavalamp (Member)

Since it's early in the cycle, and it's pretty difficult to get additional confidence, I'm inclined to take a risk and merge this. Be prepared to roll it back or fix quickly if we find problems :)

/approve

lavalamp (Member)

/lgtm

@k8s-ci-robot added the lgtm label Oct 10, 2018

k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jpbetz, lavalamp, timothysc


@k8s-ci-robot added the approved label Oct 10, 2018

lavalamp (Member)

BTW, what's the performance impact of not using the unsafe package?

jpbetz (Contributor Author) commented Oct 10, 2018

> BTW, what's the performance impact of not using the unsafe package?

@wojtek-t Is there a recommended way to flag this PR for performance testing? It's a strong candidate for having noticeable performance implications.

@k8s-ci-robot merged commit a8c7a3f into kubernetes:master Oct 11, 2018

wojtek-t (Member)

> BTW, what's the performance impact of not using the unsafe package?
>
> @wojtek-t Is there a recommended way to flag this PR for performance testing? It's a strong candidate for having noticeable performance implications.

@jpbetz @lavalamp
I'm jumping in too late, as it's already merged.
Yesterday's run of the 5k-node cluster tests was green, so we will see the results today.

BTW, I didn't look carefully into the whole PR, but where exactly did we stop using the unsafe package?

jpbetz (Contributor Author) commented Oct 11, 2018

> BTW, I didn't look carefully into the whole PR, but where exactly did we stop using the unsafe package?

gogo/protobuf: https://github.com/gogo/protobuf/releases/tag/v0.5 (gogo/protobuf#343 has benchmarks)

wojtek-t (Member)

Thanks - that's useful to know.

FYI: the test from yesterday is green, so it looks promising (I will try to look deeper into individual metrics early next week).

jpbetz (Contributor Author) commented Oct 12, 2018

Thanks for checking @wojtek-t !

Labels: approved, area/apiserver, area/code-generation, area/etcd, area/kubelet, cncf-cla: yes, kind/feature, lgtm, release-note, sig/api-machinery, sig/architecture, sig/cli, sig/cloud-provider, sig/cluster-lifecycle, sig/node, sig/testing, size/XXL

7 participants