Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(no)StoreV2 (Part 3): Applying consistency fix: ClusterVersionSet (and co) might get not applied on v2store #12854

Merged
merged 2 commits into from
Apr 21, 2021

Conversation

ptabor
Copy link
Contributor

@ptabor ptabor commented Apr 12, 2021

ClusterVersionSet, ClusterMemberAttrSet, DowngradeInfoSet functions are writing both to V2store and backend. Prior this CL there were in a branch not executed if shouldApplyV3 was false, e.g. during restore when Backend is up-to-date (has high
consistency-index) while v2store requires replay from WAL log.

The most serious consequence of this bug was that v2store after restore could have different index (revision) than the same exact store before restore, so potentially different content between replicas.

Also this change is supressing double-applying of Membership (ClusterConfig) changes on Backend (store v3) - that lackilly are not part of MVCC/KeyValue store, so they didn't caused Revisions to be bumped.

Inspired by @jingyih comment: #12820 (comment)


For the record, write that was not getting applied and v2store during snapshot restore:

go test ./integration --race -run TestForceNewCluster -v --count=1
```
    logger.go:130: 2021-04-10T10:36:15.771+0200 INFO    m2      !!! APPLYING    {"member": "m2", "e.index": 8, "raftReq": "header:<ID:3092698313349819908 > cluster_version_set:<ver:\"3.5.0\" > "}
    logger.go:130: 2021-04-10T10:36:15.788+0200 INFO    m2      set initial cluster version     {"member": "m2", "cluster-id": "6fb40b365bce42fa", "local-member-id": "868613072d7e2aeb", "cluster-version": "3.5"}
10:36:15.788528 event_history.go:48: adding event to history: {"action":"set","node":{"key":"/0/version","value":"3.5.0","modifiedIndex":7,"createdIndex":7}}
goroutine 386 [running]:
runtime/debug.Stack(0x12707ab, 0xc0000e2050, 0x2)
        /usr/lib/google-golang/src/runtime/debug/stack.go:24 +0xab
runtime/debug.PrintStack()
        /usr/lib/google-golang/src/runtime/debug/stack.go:16 +0x34
go.etcd.io/etcd/server/v3/etcdserver/api/v2store.(*EventHistory).addEvent(0xc0001061e0, 0xc000295980, 0x0)
        /home/ptab/corp/etcd/server/etcdserver/api/v2store/event_history.go:49 +0x10b
go.etcd.io/etcd/server/v3/etcdserver/api/v2store.(*watcherHub).notify(0xc000576020, 0xc000295980)
        /home/ptab/corp/etcd/server/etcdserver/api/v2store/watcher_hub.go:127 +0x7e
go.etcd.io/etcd/server/v3/etcdserver/api/v2store.(*store).Set(0xc000554070, 0xc002fb61c0, 0xa, 0x0, 0xc002fb61ba, 0x5, 0x0, 0x0, 0x0, 0x181f000, ...)
        /home/ptab/corp/etcd/server/etcdserver/api/v2store/store.go:239 +0x3ba
go.etcd.io/etcd/server/v3/etcdserver/api/membership.mustSaveClusterVersionToStore(0xc000484960, 0x1a0a618, 0xc000554070, 0xc002fa03c0)
        /home/ptab/corp/etcd/server/etcdserver/api/membership/store.go:154 +0x1fb
go.etcd.io/etcd/server/v3/etcdserver/api/membership.(*RaftCluster).SetVersion(0xc00048f2d0, 0xc002fa03c0, 0x1854410)
        /home/ptab/corp/etcd/server/etcdserver/api/membership/cluster.go:534 +0xcbd
go.etcd.io/etcd/server/v3/etcdserver.(*applierV3backend).ClusterVersionSet(0xc00000e0c0, 0xc002573200)
        /home/ptab/corp/etcd/server/etcdserver/apply.go:907 +0xd0
go.etcd.io/etcd/server/v3/etcdserver.(*applierV3backend).Apply(0xc00000e150, 0xc0004f1560, 0x0)
        /home/ptab/corp/etcd/server/etcdserver/apply.go:226 +0x2e9e
go.etcd.io/etcd/server/v3/etcdserver.(*authApplierV3).Apply(0xc0003e02d0, 0xc0004f1560, 0x0)
        /home/ptab/corp/etcd/server/etcdserver/apply_auth.go:60 +0x11d
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntryNormal(0xc000154000, 0xc002545268)
        /home/ptab/corp/etcd/server/etcdserver/server.go:2140 +0xe02
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).apply(0xc000154000, 0xc00022ea80, 0x1, 0x8, 0xc000574100, 0xc0025453a8, 0x5c917b, 0xc00041f7b0)
        /home/ptab/corp/etcd/server/etcdserver/server.go:2049 +0x725
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntries(0xc000154000, 0xc000574100, 0xc000171970)
        /home/ptab/corp/etcd/server/etcdserver/server.go:1305 +0x185
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll(0xc000154000, 0xc000574100, 0xc000171970)

        /home/ptab/corp/etcd/server/etcdserver/server.go:1305 +0x185
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll(0xc000154000, 0xc000574100, 0xc000171970)
        /home/ptab/corp/etcd/server/etcdserver/server.go:1131 +0x9b
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).run.func8(0x19f9830, 0xc00011dc80)
        /home/ptab/corp/etcd/server/etcdserver/server.go:1085 +0x54
go.etcd.io/etcd/pkg/v3/schedule.(*fifo).run(0xc00042c960)
        /home/ptab/corp/etcd/pkg/schedule/schedule.go:157 +0x12b
created by go.etcd.io/etcd/pkg/v3/schedule.NewFIFOScheduler
        /home/ptab/corp/etcd/pkg/schedule/schedule.go:70 +0x2d9
    logger.go:130: 2021-04-10T10:36:15.789+0200 INFO    m2      cluster version is updated      {"member": "m2", "cluster-version": "3.5"}

…lied on v2store

ClusterVersionSet, ClusterMemberAttrSet, DowngradeInfoSet functions are
writing both to V2store and backend. Prior this CL there were
in a branch not executed if shouldApplyV3 was false,
e.g. during restore when Backend is up-to-date (has high
consistency-index) while v2store requires replay from WAL log.

The most serious consequence of this bug was that v2store after restore
could have different index (revision) than the same exact store before restore,
so potentially different content between replicas.

Also this change is supressing double-applying of Membership
(ClusterConfig) changes on Backend (store v3) - that lackilly are not
part of MVCC/KeyValue store, so they didn't caused Revisions to be
bumped.

Inspired by jingyih@ comment:
etcd-io#12820 (comment)
@ptabor ptabor requested a review from jingyih April 12, 2021 08:12
@ptabor ptabor changed the title Applying consistency fix: ClusterVersionSet (and co) might get no applied on v2store Applying consistency fix: ClusterVersionSet (and co) might get not applied on v2store Apr 12, 2021
@ptabor ptabor changed the title Applying consistency fix: ClusterVersionSet (and co) might get not applied on v2store (no)StoreV2: Applying consistency fix: ClusterVersionSet (and co) might get not applied on v2store Apr 12, 2021
@ptabor ptabor changed the title (no)StoreV2: Applying consistency fix: ClusterVersionSet (and co) might get not applied on v2store (no)StoreV2 (Part 3): Applying consistency fix: ClusterVersionSet (and co) might get not applied on v2store Apr 12, 2021
Copy link
Contributor

@lilic lilic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a stab at the review, sorry if I am missing any context here! 🙂

Since we only have one call here that sets shouldApplyV3 to false I wonder if we could approach this differently, as right now we are leaking internal details to the caller.

server/etcdserver/apply.go Show resolved Hide resolved
server/etcdserver/apply_v2.go Show resolved Hide resolved
server/etcdserver/server.go Outdated Show resolved Hide resolved
@ptabor
Copy link
Contributor Author

ptabor commented Apr 13, 2021

Since we only have one call here that sets shouldApplyV3 to false I wonder if we could approach this differently, as right now we are leaking internal details to the caller.

Thank you for review. In general I agree, although I doubt its worth taking a risk of bigger refactoring at this point.
In 3.6/3.7 there is a hope to stop writing v2store at all.

@ptabor ptabor added this to the etcd-v3.5 milestone Apr 17, 2021
Copy link
Contributor

@jingyih jingyih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sorry to see that our logic is getting more complicated due to the different nature of v2 and v3 backend. Once v2 store is removed, hopefully we can simplify the logic?

@ptabor
Copy link
Contributor Author

ptabor commented Apr 21, 2021

LGTM

Sorry to see that our logic is getting more complicated due to the different nature of v2 and v3 backend. Once v2 store is removed, hopefully we can simplify the logic?

Definitely. Thank you for review.

@ptabor ptabor merged commit ea287dd into etcd-io:master Apr 21, 2021
@ptabor ptabor deleted the 20210410-shouldApplyV3 branch April 21, 2021 07:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants