pkg/storage/etcd3: correctly validate resourceVersions #108938
I'm unsure about this change ... it seems like something in the etcd3 store unit tests should be checking that the resourceVersion is stable in the absence of other changes. Isn't the unit test starting its own etcd instance? I'm not sure a shared etcd is a risk here
@liggitt it certainly depends on what the point of the test is. If the test only passes on specific etcd fixtures, it's not really testing that the underlying /retest
I don't think anything in k8s requires that this be the case. To my knowledge, it's not an invariant that influences any interaction k8s has with the underlying store. If I'm mistaken here, I'm more than happy to write exhaustive tests to enforce this behavior from etcd. Aside: if a client is sending linearized/quorum reads, those end up in the Raft log and bump the logical clock, so an etcd cluster which sees nothing but quorum reads will nevertheless have a higher revision even if "nothing" happened to mutate the kv store. A k8s API server will expose this higher revision in e.g. the resourceVersion on List calls.
18b906a to 45bf83b
/retest
/triage accepted
@liggitt @wojtek-t @smarterclayton, I would love to circle back to this PR. The long-and-short of it is that
By changing the tests to make the check that I've proposed, we gain readability, document the requirements on
Would
this is in my backlog but I won't be able to dig into it until next week at the earliest; consumed by 1.24 stuff atm
@smarterclayton good questions!
Yes, but nothing here is specific to
I would not necessarily frame this as a weakening of any assumption. This test fails today with any
@liggitt thanks for the confirmation :)
@liggitt added a focused test for the linearized read invariant we spoke about.
staging/src/k8s.io/apiserver/pkg/storage/etcd3/linearized_read_test.go
b1cbd35 to 638e3a2
638e3a2 to dcf8e63
dcf8e63 to 88a796b
In a number of tests, the underlying storage backend interaction will return the revision (logical clock underpinning the MVCC implementation) at the call-time of the RPC. Previously, the tests validated that this returned revision was exactly equal to some previously seen revision. This assertion is only true in systems where no other events are advancing the logical clock. For instance, when using a single etcd cluster as a shared fixture for these tests, the assertion is not valid any longer. By checking that the returned revision is no older than the previously seen revision, the validation logic is correct in all cases. Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
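The relaxed check described in this commit message can be sketched as follows. This is only an illustration, not the code from the PR: the helper name `checkNotOlder` is hypothetical, and it assumes the test has access to resourceVersions as base-10 strings of the backend revision, which is an assumption about the storage-layer tests and not something API clients should rely on.

```go
package main

import (
	"fmt"
	"strconv"
)

// checkNotOlder is a hypothetical helper (not taken from the PR). It returns
// an error if the returned resourceVersion is older than one seen previously.
// It deliberately does not require equality, because other activity on a
// shared backend may have advanced the logical clock in the meantime.
func checkNotOlder(previous, returned string) error {
	prev, err := strconv.ParseUint(previous, 10, 64)
	if err != nil {
		return fmt.Errorf("parsing previous resourceVersion %q: %w", previous, err)
	}
	got, err := strconv.ParseUint(returned, 10, 64)
	if err != nil {
		return fmt.Errorf("parsing returned resourceVersion %q: %w", returned, err)
	}
	if got < prev {
		return fmt.Errorf("resourceVersion went backwards: got %d, previously saw %d", got, prev)
	}
	return nil
}

func main() {
	// Same revision: accepted, as with the old exact-equality check.
	fmt.Println("equal:", checkNotOlder("10", "10"))
	// Newer revision: accepted here, but rejected by an exact-equality check.
	fmt.Println("newer:", checkNotOlder("10", "12"))
	// Older revision: rejected by both checks.
	fmt.Println("older:", checkNotOlder("12", "10"))
}
```

An exact-equality assertion would reject the second case, which is what happens when something else advances the revision between the two observations, for example on a shared etcd fixture.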
We must ensure that we notice if the etcd behavior on linearized reads changes. Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
88a796b to ed5fd90
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: liggitt, stevekuznetsov
Depends on #108936
/kind cleanup
/sig api-machinery
/assign @liggitt @smarterclayton @sttts @deads2k