Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/storage/etcd3: correctly validate resourceVersions #108938

Merged

Conversation

stevekuznetsov
Copy link
Contributor

pkg/storage/etcd3: correctly validate resourceVersions

In a number of tests, the underlying storage backend interaction will
return the revision (logical clock underpinning the MVCC implementation)
at the call-time of the RPC. Previously, the tests validated that this
returned revision was exactly equal to some previously seen revision.
This assertion is only true in systems where no other events are
advancing the logical clock. For instance, when using a single etcd
cluster as a shared fixture for these tests, the assertion is not valid
any longer. By checking that the returned revision is no older than the
previously seen revision, the validation logic is correct in all cases.

Signed-off-by: Steve Kuznetsov skuznets@redhat.com


Depends on #108936

/kind cleanup

NONE

/sig api-machinery
/assign @liggitt @smarterclayton @sttts @deads2k

@k8s-ci-robot k8s-ci-robot added the release-note-none Denotes a PR that doesn't merit a release note. label Mar 23, 2022
@k8s-ci-robot k8s-ci-robot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Mar 23, 2022
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 23, 2022
@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/apiserver labels Mar 23, 2022
@liggitt
Copy link
Member

liggitt commented Mar 23, 2022

I'm unsure about this change ... it seems like something in the etcd3 store unit tests should be checking that the resourceVersion is stable in the absence of other changes. Isn't the unit test starting its own etcd instance? I'm not sure a shared etcd is a risk here

@stevekuznetsov
Copy link
Contributor Author

@liggitt it certainly depends on what the point of the test is. If the test only passes on specific etcd fixtures, it's not really testing that the underling storage.Interface is providing the correct mechanisms as necessary for k8s to function. Given that, the test is less precise than it could be and does a bad job of documenting what, specifically, k8s expects from the underlying storage layer.

/retest

@stevekuznetsov
Copy link
Contributor Author

stevekuznetsov commented Mar 23, 2022

it seems like something in the etcd3 store unit tests should be checking that the resourceVersion is stable in the absence of other changes

I don't think anything in k8s requires that this be the case. It's not an invariant that influences any interaction k8s has with the underlying store to my knowledge. If I'm mistaken here, I'm more than happy to write exhaustive tests to enforce this behavior from etcd.

Aside: if a client is sending linearized/quorum reads, those end up in the Raft log and bump the logical clock, so an etcd cluster which sees nothing but quorum reads will nevertheless have a higher revision even if "nothing" happened to mutate the kv store. A k8s API server will expose this higher revision in e.g. the resourceVersion on List calls.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 24, 2022
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 24, 2022
@stevekuznetsov
Copy link
Contributor Author

/retest

@fedebongio
Copy link
Contributor

/triage accepted

1 similar comment
@cici37
Copy link
Contributor

cici37 commented Mar 29, 2022

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 29, 2022
@cici37
Copy link
Contributor

cici37 commented Mar 29, 2022

/triage accepted

@stevekuznetsov
Copy link
Contributor Author

stevekuznetsov commented Mar 31, 2022

@liggitt @wojtek-t @smarterclayton, I would love to circle back to this PR. The long-and-short of it is that etcd and etcd itself decides when the internal revision has changed, and why. Among the actions that get committed to the Raft log and bump the revision are quorum reads, so even in cases where no state has been mutated, the revision may and will increase. Kubernetes exposes the raw current revision as the resourceVersion for List() calls. The tests today expect the resourceVersion on List() calls to exactly match a previous version. This is true if and only if the etcd cluster in question has no activities going on (or only non-quorum reads). This is, in my opinion:

  • hyper-specific to the point of being misleading about what, specifically, Kubernetes expects from a conformant etcd cluster
  • fragile in that it fails on just about every etcd fixture except for the particular ones used in these tests

By changing the tests to make the check that I've proposed, we gain readability, document the requirements on etcd better and make the tests less fragile.

@smarterclayton
Copy link
Contributor

Would kine be better testable if this was fixed? Is there any real disadvantage to weakening this assumption (such as compaction / future etcd changes that lead to potentially background changes)?

@liggitt
Copy link
Member

liggitt commented Mar 31, 2022

this is in my backlog but I won't be able to dig into it until next week at the earliest; consumed by 1.24 stuff atm

@stevekuznetsov
Copy link
Contributor Author

@smarterclayton good questions!

Would kine be better testable if this was fixed?

Yes, but nothing here is specific to kine. This change makes it easier to test with a shared fixture in general, with no specific benefits to any individual RPC implementation.

Is there any real disadvantage to weakening this assumption ...

I would not necessarily frame this as a weakening of any assumption. This test fails today with any etcd fixture other than the one that is used in the unit tests. This test assumption does not apply to any non-toy etcd. This is why I labelled it as a misleading thing.

@stevekuznetsov
Copy link
Contributor Author

@liggitt thanks for the confirmation :)

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 13, 2022
@stevekuznetsov
Copy link
Contributor Author

@liggitt added a focused test for the linearized read invariant we spoke about.

In a number of tests, the underlying storage backend interaction will
return the revision (logical clock underpinning the MVCC implementation)
at the call-time of the RPC. Previously, the tests validated that this
returned revision was exactly equal to some previously seen revision.
This assertion is only true in systems where no other events are
advancing the logical clock. For instance, when using a single etcd
cluster as a shared fixture for these tests, the assertion is not valid
any longer. By checking that the returned revision is no older than the
previously seen revision, the validation logic is correct in all cases.

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
We must ensure that we notice if the etcd behavior on linearized reads
changes.

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
@liggitt
Copy link
Member

liggitt commented May 3, 2022

/lgtm
/approve
/milestone v1.25

@k8s-ci-robot k8s-ci-robot added this to the v1.25 milestone May 3, 2022
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 3, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, stevekuznetsov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 3, 2022
@k8s-ci-robot k8s-ci-robot merged commit 2845122 into kubernetes:master May 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants