
Upgrade to etcd v3 plan #22448

Closed
hongchaodeng opened this issue Mar 3, 2016 · 93 comments
Labels
area/etcd priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability.

@hongchaodeng
Contributor

The etcd team has done a lot of work to make etcd v3 work efficiently at large scale. Recently we built a prototype that successfully ran k8s with etcd v3 :) Now I think it's time for us to start the initial discussion and work on upgrading etcd in Kubernetes.

What’s new in etcd v3

First, I want to point out the major benefits of etcd v3:

  • We would get rid of the relist problem in most cases. etcd3 can keep millions of keys, so we can easily keep the keys from the last hour. etcd3 can also send progress notifications to watchers. If a client disconnects for an hour, we should probably let it die and do a full recovery instead of a relist.
  • One lease per component. We would use a single lease to keep all of a component's TTL keys alive (a minimal client sketch follows this list). The lease has its own ID and is not tied to any client session, so we can migrate it easily. The cost of a lease is also much lower than that of the previous TTL keys.
  • Efficient watch: uses gRPC instead of HTTP + JSON, multiplexes multiple watchers over one TCP connection, and has very efficient encoding.
  • Mini-transaction (Txn): simplifies multiple-key creation logic.
  • Reliability: it has a much lower memory footprint, yet it can run smoothly with GBs of keys in memory.
  • Other improvements, such as consistent range (pagination) over a large number of keys, the ability to watch all deletions when deleting a prefix, etc.
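To make the lease bullet concrete, here is a minimal sketch (not the actual Kubernetes code; the endpoint, TTL, and key names are made up) of the one-lease-per-component pattern with the etcd clientv3 API:

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://127.0.0.1:2379"}, // illustrative endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Grant a single lease for the whole component and keep it alive.
	lease, err := cli.Grant(context.Background(), 30)
	if err != nil {
		log.Fatal(err)
	}
	// In real code you would drain the keep-alive channel; omitted here.
	if _, err := cli.KeepAlive(context.Background(), lease.ID); err != nil {
		log.Fatal(err)
	}

	// Every TTL key the component owns is attached to that one lease,
	// so they all expire together if keep-alives stop.
	if _, err := cli.Put(context.Background(), "/registry/leases/example", "value",
		clientv3.WithLease(lease.ID)); err != nil {
		log.Fatal(err)
	}
}
```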

Attack Plan

@wojtek-t pointed out that the "pkg/storage/etcd" package encapsulates all etcd-related code. We appreciate the community's work on this! I'm thinking about two ways to accomplish the v3 upgrade:

The first way is to reuse the interface and rewrite the implementation on etcd3. This is the easy way: we can get the work done quickly. The disadvantage is that the interface resembles that of etcd2, so we might need workarounds for incompatible changes (TTL to lease). In the long term, we would also need to add methods to take advantage of new features like Txn.

The other way is to design a new interface for etcd3. We can keep the v3 API in mind and make it use etcd efficiently in the first place. However, this requires more work and slows things down initially.

Either way, we can do the work in parallel with the existing storage layer and add a flag to switch to etcd3.

TBH, I'm not leaning toward either approach. I've seen a lot of interest from the community, and it's important for us to coordinate and move as fast as we can. Any thoughts are welcome!

@xiang90
Contributor

xiang90 commented Mar 3, 2016

/cc

@hongchaodeng hongchaodeng changed the title Upgrade to etcd v3 discussion Upgrade to etcd v3 plan Mar 3, 2016
@hongchaodeng
Contributor Author

@wojtek-t @gmarek @timothysc @lavalamp @smarterclayton @xiang90 @HardySimpson

This is a fork from #20504. Besides introducing the benefits of etcd3, I want to discuss specifically how we are going to deal with storage.Interface and the etcdHelper struct.

Please feel free to "at" more people if needed.

@ncdc
Member

ncdc commented Mar 3, 2016

cc @kubernetes/rh-cluster-infra

@krousey
Contributor

krousey commented Mar 3, 2016

cc @kubernetes/sig-api-machinery

@timothysc
Member

@hongchaodeng You can write a totally parallel implementation behind the storage interface and leave the existing one as is. Startup could be conditionalized behind a factory. I think we'd have to migrate the capabilities through the storage interface eventually, but this ripples all through the registry.
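As a rough illustration of that shape (the package, Config type, flag value, and constructors below are hypothetical, not the actual Kubernetes factory code):

```go
package storagefactory

import (
	"fmt"

	"k8s.io/kubernetes/pkg/storage"
)

// Config is a made-up stand-in for the real storage configuration.
type Config struct {
	Backend    string   // "etcd2" (default) or "etcd3"
	ServerList []string // etcd endpoints
}

// New selects the storage.Interface implementation at startup, leaving the
// existing etcd2 code path untouched and adding etcd3 as a parallel option.
func New(cfg Config) (storage.Interface, error) {
	switch cfg.Backend {
	case "", "etcd2":
		return newEtcd2Storage(cfg)
	case "etcd3":
		return newEtcd3Storage(cfg)
	default:
		return nil, fmt.Errorf("unknown storage backend %q", cfg.Backend)
	}
}

// Placeholders standing in for the real constructors.
func newEtcd2Storage(cfg Config) (storage.Interface, error) { return nil, nil }
func newEtcd3Storage(cfg Config) (storage.Interface, error) { return nil, nil }
```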

Lastly, someone is going to need to update sky-dns; last I looked, we were using a really old version.

re: gRPC, that's an orthogonal issue.

@alfred-huangjian
Contributor

cc @kubernetes/huawei

@adohe-zz

adohe-zz commented Mar 4, 2016

/cc

@xiang90
Contributor

xiang90 commented Mar 4, 2016

@timothysc

re: gRPC, that's an orthogonal issue.

etcd uses gRPC for client/server communication, so in the context of etcd + k8s this is relevant. This is not about the API server talking to other components through gRPC.

@wojtek-t
Member

wojtek-t commented Mar 4, 2016

I agree with @timothysc - we should start by implementing the etcd v3 backend behind the currently existing interface. Once we have that working, we can start extending/changing it to take advantage of etcd v3, but that's a second step in my opinion.

Regarding gRPC - I think there are a few points here:

  • Does etcd v3 support only gRPC, or also JSON? I thought both - am I right?
  • Eventually, we definitely want to migrate the etcd <-> apiserver communication to gRPC, but I'm not sure whether starting with gRPC will cause more trouble on the migration path; we need to be careful about it.
  • How much additional work would it be to start with a JSON implementation first and then rewrite it to gRPC?

@wojtek-t wojtek-t added sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Mar 4, 2016
@wojtek-t wojtek-t added this to the next-candidate milestone Mar 4, 2016
@smarterclayton
Contributor

We probably want gRPC early so that we can send binaries with protobuf, rather than the escaped form in JSON. I'd rather move away from assuming JSON now in the generic storage. Is the only concern about gRPC that it's going to complicate our internal code path?


@wojtek-t
Member

wojtek-t commented Mar 4, 2016

I think that in the generic storage we don't have any assumption about JSON - it is only assumed in the etcd implementation of storage.Interface and only at the implementation level.

I'm definitely not concerned about code complication - I'm more worried about whether it will complicate the upgrade path (i.e. from 1.2 to 1.3 supporting etcd v3).
But I'm fine with starting with gRPC too if you think that won't be a problem.

@xiang90
Contributor

xiang90 commented Mar 4, 2016

@wojtek-t @smarterclayton

gRPC won't be a problem. The etcd client does all the work. The Kubernetes storage layer should not even know about it, unless you do not want to introduce any gRPC dependencies at all.

@wojtek-t
Member

wojtek-t commented Mar 4, 2016

I agree that the storage interface shouldn't expose any gRPC-related things.

@lavalamp
Member

lavalamp commented Mar 4, 2016

You can write a totally parallel implementation behind the storage interface and leave the existing one as is. Startup could be conditionalized behind a factory.

Yes. It is an absolute requirement that both etcd 2 and etcd 3 work at the same time. We have etcd options plumbed through to the creation already.

gRPC is a total red herring here. Of course we should talk gRPC to etcd 3. That has zero effect on anything built on top of the storage interface.

@davidopp
Member

davidopp commented Mar 5, 2016

Mini-transaction (Txn): simplify multiple key creation logic.

Can you say more about the semantics of "mini-transaction"? This sounds like it might be different from what we were assuming etcd v3 would provide, but I'm not sure.

@hongchaodeng
Contributor Author

Can you say more about the semantics of "mini-transaction"?

Please see txn.go.

It's a single if-else layer and supports multiple get, put ops in a single operation.
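For illustration, a minimal clientv3 sketch of that if/then/else shape (the helper name and key are made up; `cli` is a *clientv3.Client): create a key only if it has never existed, otherwise read it back.

```go
// createOrGet is illustrative only, not Kubernetes code.
func createOrGet(ctx context.Context, cli *clientv3.Client, key string, data []byte) (*clientv3.TxnResponse, error) {
	return cli.Txn(ctx).
		If(clientv3.Compare(clientv3.CreateRevision(key), "=", 0)). // key does not exist yet
		Then(clientv3.OpPut(key, string(data))).                    // create it
		Else(clientv3.OpGet(key)).                                  // otherwise fetch the current value
		Commit()
}
```

If the If clause holds, resp.Succeeded is true and the Put ran; otherwise the Get result is in resp.Responses[0].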

What semantics were you expecting? Is there any concern about it?

@davidopp
Member

davidopp commented Mar 5, 2016

No, that looks good. Thanks!

@hongchaodeng
Contributor Author

we should start with implementing v3 etcd implementation behind currently existing interface.

I sent PR #22604 as a follow-up to the discussion. The change is simple, but I hope it will enable more folks to start writing code.

I have a rough plan of what we are going to do next:

  1. Implement all KV methods (Get, Set, GuaranteedUpdate, Create, Delete, etc.) without TTL.
  2. TTL is quite tricky in v3. We might need to support per-key TTL for backward compatibility, while v3 encourages a per-client TTL. Once (1) is done, we can implement per-key lease management.
  3. Implement watch (see the sketch below for roughly what this could look like).
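For (3), a rough sketch of what a v3 watch loop could look like with clientv3 (illustrative only, not the real implementation): watch a prefix from a known revision, translate events, and fall back to a full relist when the watch errors out (e.g. the revision was compacted).

```go
func watchPrefix(ctx context.Context, cli *clientv3.Client, prefix string, rev int64) error {
	wch := cli.Watch(ctx, prefix, clientv3.WithPrefix(), clientv3.WithRev(rev))
	for wresp := range wch {
		if err := wresp.Err(); err != nil {
			return err // e.g. required revision has been compacted -> caller relists
		}
		for _, ev := range wresp.Events {
			switch ev.Type {
			case clientv3.EventTypePut:
				// ADDED if ev.Kv.CreateRevision == ev.Kv.ModRevision, else MODIFIED
			case clientv3.EventTypeDelete:
				// DELETED
			}
		}
	}
	return ctx.Err()
}
```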

Once we have an initial implementation, we can further discuss testing and improvement plans. What do you guys think?

@adohe-zz

adohe-zz commented Mar 6, 2016

A really big TODO; I would like to help with the KV methods.

@wojtek-t
Member

wojtek-t commented Mar 7, 2016

@hongchaodeng - yes, that plan SGTM, as long as it is a separate implementation that does not replace the currently existing one (actually, this is a requirement anyway).

@hongchaodeng
Contributor Author

Folks, I'm currently having trouble with #22604.

The test failure is reproducible by running "MetricsGrabber", which seems to grab etcd server metrics as well. From the log:

MetricsGrabber
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/metrics_grabber_test.go:158
  should grab all metrics from API server. [It]
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/metrics_grabber_test.go:100

  Expected
      <sets.String | len:39>: {
          "etcd_storage_events_total": {},
          "etcd_storage_db_compaction_pause_duration_milliseconds_sum": {},
          "etcd_storage_pending_events_total": {},
          "etcd_snapshot_save_marshalling_durations_seconds_sum": {},
          "etcd_storage_index_compaction_pause_duration_milliseconds_bucket": {},
          ....
      }
  to be empty

Does anyone have an idea about this, or can anyone help resolve it? This is currently blocking me from moving forward.

@hongchaodeng
Contributor Author

I have found the problem.

Digging into the apiserver dependencies with the new change reveals that it imports the following:

github.com/coreos/etcd/auth
github.com/coreos/etcd/client
github.com/coreos/etcd/clientv3
github.com/coreos/etcd/compactor
github.com/coreos/etcd/discovery
github.com/coreos/etcd/error
github.com/coreos/etcd/etcdserver
github.com/coreos/etcd/etcdserver/api/v3rpc
github.com/coreos/etcd/etcdserver/etcdhttp/httptypes
github.com/coreos/etcd/etcdserver/etcdserverpb
github.com/coreos/etcd/etcdserver/stats
github.com/coreos/etcd/lease
github.com/coreos/etcd/lease/leasehttp
github.com/coreos/etcd/lease/leasepb
...
github.com/coreos/etcd/raft
github.com/coreos/etcd/raft/raftpb
github.com/coreos/etcd/rafthttp
github.com/coreos/etcd/snap
github.com/coreos/etcd/snap/snappb
github.com/coreos/etcd/storage
github.com/coreos/etcd/storage/backend
github.com/coreos/etcd/storage/storagepb
github.com/coreos/etcd/store
github.com/coreos/etcd/version
github.com/coreos/etcd/wal
github.com/coreos/etcd/wal/walpb

Metrics are registered in init() funcs in those packages.
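To illustrate the mechanism (simplified, not the actual etcd source; the metric name is just borrowed from the log above): any package that registers collectors in init() adds them to the importer's default Prometheus registry, so transitively importing the etcd server packages is enough to make etcd metrics show up on the apiserver's /metrics endpoint.

```go
package examplepkg

import "github.com/prometheus/client_golang/prometheus"

var eventsTotal = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "etcd_storage_events_total",
	Help: "Example counter registered at import time.",
})

func init() {
	// Runs on any (even transitive) import of this package, which is why the
	// apiserver binary suddenly exposes etcd server metrics.
	prometheus.MustRegister(eventsTotal)
}
```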

/cc @xiang90

@hongchaodeng
Contributor Author

I will try to come up with a solution tomorrow. If anyone in other timezone would like to dig into it, that would be great :)

@lavalamp
Member

lavalamp commented Oct 4, 2016

Justin, it's not desirable to do the dump/restore pattern for upgrades if we can possibly avoid it, since it loses ResourceVersion metadata. Since components depend on RV being a logical clock that only ever increments, we'd have to restart everything after doing anything that causes them to go backwards.

kubectl has an export command.

dump/restore doesn't make a whole lot of sense at the kubectl level, IMO. There's too much cluster-specific stuff in objects to make things portable between clusters that way, and it's basically never an operation you'd want to perform on a single cluster.

Honestly dump/restore doesn't make much sense even at the db level, since even minutes or seconds of cluster operation could make dropping back to an earlier state disastrous, and the larger the cluster the more likely some tenant is doing something not very idempotent.

On Tue, Oct 4, 2016 at 12:21 AM, Wojciech Tyczynski <notifications@github.com> wrote:

I definitely agree that we need to document the dump-restore procedure, but this in my mind is a next step, and above I was just describing what to do with the container to enable "rollforward" testing.

@justinsb Regarding online backups - internally in GKE we are doing this (using standard etcd tools). The restore procedure is a bit more complicated, but we will try to work on it and create some script for open-source k8s.



@wojtek-t wojtek-t added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/backlog Higher priority than priority/awaiting-more-evidence. labels Oct 5, 2016
k8s-github-robot pushed a commit that referenced this issue Oct 10, 2016
Automatic merge from submit-queue

Support upgrade/downgrade in etcd image.

Ref #22448 #20504
k8s-github-robot pushed a commit that referenced this issue Oct 12, 2016
Automatic merge from submit-queue

Minor fixes to migrate-if-needed script

Ref #22448 #20504

k8s-github-robot pushed a commit that referenced this issue Oct 27, 2016
Automatic merge from submit-queue

Fix migration script to make it sh-friendly

Ref #22448

There is no bash in the etcd image, so the script needs to be "sh-friendly".

Also, a few fixes to the script.
@timothysc timothysc assigned timothysc and lavalamp and unassigned mml Jan 5, 2017
@timothysc timothysc added this to the v1.6 milestone Jan 5, 2017
@timothysc
Member

Closing - this is now enabled by default.

@wojtek-t
Member

Reopening, since we don't have any documentation on how to do an upgrade in OSS.

@wojtek-t wojtek-t reopened this Jan 16, 2017
@timothysc
Member

@wojtek-t there is a docs ticket here: kubernetes/website#2172

@wojtek-t
Member

OK - sorry. I must have missed it - too many emails when I was OOO.

@teotia39

For Kubernetes 1.6.1 with etcd 3, why are we not able to access the "/registry" key? How can we access this key?

Since this key is not available on the default 2379 port, curl to http://127.0.0.1:2379/v1/keys/registry fails, which worked very well in the alpha version. How can we access those Kubernetes objects now?

@wojtek-t
Member

@teotia39 - with etcd v3, the data is not accessible via the JSON endpoint. You need to use etcd tooling for it (ETCDCTL_API=3 etcdctl).
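For example, from inside the etcd container, something like `ETCDCTL_API=3 etcdctl get /registry --prefix --keys-only` lists the stored keys. Programmatically, a minimal (illustrative) clientv3 equivalent looks like this:

```go
// listRegistryKeys is a sketch, not Kubernetes code; it assumes cli is a
// connected *clientv3.Client (imports: context, fmt, clientv3).
func listRegistryKeys(ctx context.Context, cli *clientv3.Client) error {
	resp, err := cli.Get(ctx, "/registry", clientv3.WithPrefix(), clientv3.WithKeysOnly())
	if err != nil {
		return err
	}
	for _, kv := range resp.Kvs {
		fmt.Println(string(kv.Key))
	}
	return nil
}
```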

@teotia39

So that tool doesn't come by default with kubeadm, and we have to explicitly install etcdctl on top of kubeadm to access the endpoints?

@wojtek-t
Member

This tool is built into the default etcd Docker image. You can just exec it from the container running etcd.

@teotia39

And even if we install an etcd cluster separately and point k8s at it, we still won't be able to access the stored Kubernetes objects via REST calls - we still have to use the etcdctl command line to get those objects?

@smarterclayton
Contributor

smarterclayton commented Apr 21, 2017 via email

perotinus pushed a commit to kubernetes-retired/cluster-registry that referenced this issue Sep 2, 2017
Automatic merge from submit-queue

Update etcd 2.2 references to use 3.0.x

This updates an assortment of etcd 2.2.x references to 3.0.4 in the code base.

/cc @hongchaodeng 

xref: kubernetes/kubernetes#22448

openshift-publish-robot pushed a commit to openshift/kubernetes that referenced this issue Apr 2, 2019

UPSTREAM: <drop>: comment out componentconfig registration for kube controller manager

Origin-commit: 5be504eb63b7fe9ee3b3ba88317544860ef2e95d