
apiserver/storage: test caching and indexing layers at the unit level #109831

Open · 46 tasks done
stevekuznetsov opened this issue May 5, 2022 · 24 comments
Labels: kind/feature · sig/api-machinery · triage/accepted

@stevekuznetsov (Contributor) commented May 5, 2022

What would you like to be added?

Today, unit-level tests exist only for *etcd3.store{} and cannot be run against any other implementation of storage.Interface. In general, these tests define what it means to be a correct implementation of storage.Interface and use only exported functionality. We aim to refactor the tests to accept any storage.Interface, which would let the entire exported surface be exercised against an etcd storage layer with whatever caching and indexing layers sit on top.
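For illustration, a ported test could take roughly the following shape; this is a minimal sketch, assuming the shared helpers live in a hypothetical storagetesting package and reuse the example.Pod fixture type from the existing etcd3 tests (the key layout is illustrative; only the storage.Interface calls are existing API):

package storagetesting // hypothetical home for the shared tests

import (
	"context"
	"testing"

	"k8s.io/apiserver/pkg/apis/example"
	"k8s.io/apiserver/pkg/storage"
)

// RunTestGet uses only exported storage.Interface behavior, so the same body
// can be driven against *etcd3.store{}, the cacher, or any other
// implementation that the caller constructs and passes in.
func RunTestGet(ctx context.Context, t *testing.T, store storage.Interface) {
	out := &example.Pod{}
	key := "/pods/test-ns/test-pod" // illustrative key
	if err := store.Get(ctx, key, storage.GetOptions{IgnoreNotFound: true}, out); err != nil {
		t.Fatalf("Get failed: %v", err)
	}
	// ... assert on `out` against previously created fixtures ...
}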

The following tests can be ported as-is, since they use only storage.Interface methods:

In #109833:

  • TestCreateWithTTL
  • TestCreateWithKeyExist
  • TestGet
  • TestUnconditionalDelete
  • TestConditionalDelete
  • TestDeleteWithSuggestion
  • TestDeleteWithSuggestionAndConflict
  • TestDeleteWithSuggestionOfDeletedObject
  • TestValidateDeletionWithSuggestion
  • TestPreconditionalDeleteWithSuggestion
  • TestGetListNonRecursive
  • TestGuaranteedUpdateWithTTL
  • TestGuaranteedUpdateWithConflict
  • TestGuaranteedUpdateWithSuggestionAndConflict
  • TestCount
  • TestWatch
  • TestDeleteTriggerWatch
  • TestWatchFromNoneZero
  • TestWatchInitializationSignal

Other:

  • TestProgressNotify (requires that the underlying etcd is configured to send bookmarks)

The following tests currently require the etcd client, and need to be re-thought or adapted:

  • TestCreate uses the etcd client to: storage/testing: move creation test to generic package #109909
    • make sure nothing exists at the key before creation; it is unclear what this enforces, and it could perhaps use storage.Interface.Get() instead (a sketch of this appears after this list)
    • check invariants about the data in the store; this should be relegated to an *etcd3.store{}-specific test only
  • TestGuaranteedUpdate uses the etcd client to:
    • check invariants about the data in the store; this should be relegated to an *etcd3.store{}-specific test only
  • TestGuaranteedUpdateChecksStoredData uses the etcd client to:
    • put data into the store using a different codec
  • TestList uses the etcd client to: storage: split paginated and non-paginated list tests, make them generic #110024
    • share a single underlying storage between two *etcd3.store{} implementations (with and without paging)
  • TestListContinuation uses the etcd client to:
    • track the number of calls made to etcd
  • TestListPaginationRareObject uses the etcd client to:
    • track the number of calls made to etcd
  • TestListContinuationWithFilter uses the etcd client to:
    • track the number of calls made to etcd
  • TestListInconsistentContinuation uses the etcd client to:
    • cause a compaction in order to test clients asking for compacted revisions
  • TestWatchFromZero uses the etcd client to:
    • cause a compaction in order to test clients asking for compacted revisions
  • TestWatchError uses the etcd client to:
    • share a single underlying storage between two *etcd3.store{} implementations (with a valid and invalid codec)
  • TestWatchDeleteEventObjectHaveLatestRV uses the etcd client to:
    • create a raw watch; we check that the mod revision at which the raw watch reports the delete matches the revision the storage.Interface uses to report the deletion ... but if we know nothing else is touching our storage, why not just use storage.Interface.Get() (or the value returned from creating) to check that the last RV we saw is the one it's deleted at?
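As a hedged sketch of the first TestCreate point above (checking that nothing exists at the key via storage.Interface.Get() rather than the raw etcd client): the helper below is illustrative and expectKeyAbsent is a hypothetical name, while Get, GetOptions, and IsNotFound are existing exported API of the storage package.

package storagetesting // hypothetical, as in the sketch above

import (
	"context"
	"testing"

	"k8s.io/apiserver/pkg/apis/example"
	"k8s.io/apiserver/pkg/storage"
)

// expectKeyAbsent asserts pre-Create emptiness through storage.Interface
// rather than through the raw etcd client.
func expectKeyAbsent(ctx context.Context, t *testing.T, store storage.Interface, key string) {
	t.Helper()
	out := &example.Pod{}
	err := store.Get(ctx, key, storage.GetOptions{}, out)
	if err == nil {
		t.Fatalf("expected no object at %q before Create, got: %#v", key, out)
	}
	if !storage.IsNotFound(err) {
		t.Fatalf("expected a not-found error at %q, got: %v", key, err)
	}
}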

The following tests currently touch internals of *etcd3.store{}:

  • TestGuaranteedUpdate touches:
    • store.transformer in order to inject staleness to cause an update when the semantic value of the data does not change (a possible non-invasive hook for this is sketched after this list)
  • TestGuaranteedUpdateChecksStoredData touches:
    • store.transformer in order to inject staleness to cause an update when the semantic value of the data does not change
  • TestTransformationFailure touches:
    • store.transformer in order to test behavior under transformation failure
  • TestListContinuation touches:
    • store.transformer to count the number of items read from etcd
  • TestListPaginationRareObject touches:
    • store.transformer to count the number of items read from etcd
  • TestListContinuationWithFilter touches:
    • store.transformer to count the number of items read from etcd
  • TestConsistentList touches:
    • store.transformer to ... cause side-effect reads during a paginated LIST call? @wojtek-t, you wrote this; perhaps there's a less invasive way to do it
  • TestLeaseMaxObjectCount touches:
    • store.leaseManager - this is etcd-specific, so perhaps it remains in the *etcd3.store{} level?
  • TestWatchFromZero touches:
    • store.versioner, but that is a singleton and there's no real reason it needs to be an instance variable of the store as opposed to an exported package-level function?
  • TestWatchContextCancel touches: storage/testing: move cancelled watch test to generic package #109914
    • store.watcher.Watch(), but we're testing user-facing behavior of the watch method, so it's not clear why calling the watcher specifically is necessary here
  • TestWatchErrResultNotBlockAfterCancel touches:
    • store.watcher.createWatchChan() - this looks like an implementation detail of the watcher?
  • TestWatchDeleteEventObjectHaveLatestRV touches:
    • store.versioner, but that is a singleton and there's no real reason it needs to be an instance variable of the store as opposed to an exported package-level function?
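One possible way to make the transformer-dependent tests above reusable without reaching into *etcd3.store{}: have the test-facing constructor expose an optional hook, which other implementations could satisfy by delegating to the store they wrap. This is purely illustrative, not an existing upstream API; value.Transformer is the real type the etcd3 store already uses.

package storagetesting // hypothetical

import "k8s.io/apiserver/pkg/storage/value"

// TransformerWrapper is an illustrative, optional capability a test-facing
// constructor could expose so generic tests can inject staleness, force
// transformation failures, or count reads without touching store internals.
type TransformerWrapper interface {
	// WrapTransformer replaces the current transformer with wrap(current)
	// and returns a function that restores the original.
	WrapTransformer(wrap func(value.Transformer) value.Transformer) (restore func())
}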

Next steps:

  • migrate all of the tests that can be used as-is into *_tests.go files, change their signatures to accept a storage.Interface, and import them to test the caching and indexing layers (storage/etcd3: factor tests to accept storage.Interface #109833)
  • decide, for each test that touches the etcd client, whether it could instead use the storage.Interface itself, etc.
  • similarly, figure out whether the tests that touch internals of *etcd3.store{} are useful for testing other storage.Interface implementations and, if so, how they can be adapted

/sig api-machinery
/assign @stevekuznetsov @wojtek-t
/cc @liggitt @lavalamp

Why is this needed?

To improve our ability to test the caching and indexing layers that sit on top of the etcd storage implementation.

@stevekuznetsov stevekuznetsov added the kind/feature Categorizes issue or PR as related to a new feature. label May 5, 2022
@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 5, 2022
@leilajal (Contributor) commented May 5, 2022

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 5, 2022
@wojtek-t (Member) commented May 6, 2022

A couple of thoughts from me:

TestCreate uses the etcd client to:

Indeed, I would switch to storage.Interface.Get instead.
For checking invariants about the data, I would make a dedicated etcd3-specific test (that just does Create + check invariants).

TestGuaranteedUpdate uses the etcd client to:

Making a generic test from that seems trivial, but I'm not yet sure how to make a simple etcd-specific additional test...

TestList uses the etcd client to:

Can we just make the test take two implementations of storage.Interface as input? Then it trivially generalizes, no?
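A sketch of that shape, with illustrative names; the assumption is that the caller constructs both stores over the same underlying data (for etcd3, one with and one without paging), so writes through one are visible through the other:

package storagetesting // hypothetical

import (
	"context"
	"testing"

	"k8s.io/apiserver/pkg/storage"
)

// RunTestList is an illustrative signature: the test only talks to the two
// stores through exported calls and compares their LIST behavior, leaving it
// to the caller to decide how the pair is constructed.
func RunTestList(ctx context.Context, t *testing.T, pagingStore, nonPagingStore storage.Interface) {
	// Seed objects through pagingStore, then LIST through both stores and
	// assert that results and continuations are consistent.
}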

TestListContinuation uses the etcd client to:
TestListPaginationRareObject uses the etcd client to:
TestListContinuationWithFilter uses the etcd client to:

Hmm, it's similar to TestGuaranteedUpdate: it seems trivial to generalize (by removing those calls), but I'm not sure how to test that functionality without more or less duplicating the same test...

TestListInconsistentContinuation uses the etcd client to:
TestWatchFromZero uses the etcd client to:

I'm wondering if we would be able to somehow pass a compaction callback or something like that to the test.
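Sketched concretely (names are illustrative): the generic test takes a compaction hook that each caller wires up itself, e.g. real etcd compaction for *etcd3.store{}, and a skip or a suitable equivalent for other implementations.

package storagetesting // hypothetical

import (
	"context"
	"testing"

	"k8s.io/apiserver/pkg/storage"
)

// Compaction is an illustrative hook: compact the backend past the given
// resource version, however the concrete implementation does that.
type Compaction func(ctx context.Context, t *testing.T, resourceVersion string)

// RunTestListInconsistentContinuation relies on the hook instead of the etcd
// client, keeping the compaction mechanism implementation-specific.
func RunTestListInconsistentContinuation(ctx context.Context, t *testing.T, store storage.Interface, compact Compaction) {
	if compact == nil {
		t.Skip("no compaction hook provided; skipping compaction-dependent test")
	}
	// ... LIST with a continue token, compact past it via compact(...), and
	// assert the expired-continuation behavior defined by storage.Interface.
}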

TestWatchError uses the etcd client to:

Can we pass two stores to the test?

TestWatchDeleteEventObjectHaveLatestRV uses the etcd client to:

Roughly what you suggested...

@wojtek-t (Member) commented May 6, 2022

Thinking more about TestGuaranteedUpdate (and similar cases where we just check simple invariants).

Can we pass an additional callback to the test function that will be called at the appropriate point? In the etcd test we can just pass checkStorageInvariants; in the watchcache ones we can basically pass a no-op function...
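Sketched concretely (names are illustrative): the etcd3 caller passes its existing invariant check as the callback, while the watchcache caller passes nil or a no-op.

package storagetesting // hypothetical

import (
	"context"
	"testing"

	"k8s.io/apiserver/pkg/storage"
)

// KeyValidation is an illustrative callback for implementation-specific checks
// on the stored data; the etcd3 caller would implement it with its client and
// transformer, other callers can pass nil.
type KeyValidation func(ctx context.Context, t *testing.T, key string)

func RunTestGuaranteedUpdate(ctx context.Context, t *testing.T, store storage.Interface, validate KeyValidation) {
	key := "/pods/test-ns/test-pod" // illustrative key
	// ... drive store.GuaranteedUpdate through the exported interface ...
	if validate != nil {
		validate(ctx, t, key) // e.g. checkStorageInvariants for the etcd3 run
	}
}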

@liggitt (Member) commented May 6, 2022

TestWatchDeleteEventObjectHaveLatestRV uses the etcd client to:

  • create a raw watch, we check that the mod revision at which the raw watch shows us the delete is the same as the revision the storage.Interface uses to report the deletion ... but if we know nothing else is touching our storage, why not just use storage.Interface.Get() (or the returned value from creating) to check that the last RV we saw is what it's deleted at?

Maybe I'm misunderstanding, but the RV returned from a Get or from the Create is explicitly not what we want to be the RV in a delete watch event or delete response. The RV on a delete response or watch event must be distinct from, and later than, the RV of the last write to the object.

instrumenting the current test:

diff --git a/staging/src/k8s.io/apiserver/pkg/storage/etcd3/watcher_test.go b/staging/src/k8s.io/apiserver/pkg/storage/etcd3/watcher_test.go
index 273bb1c4389..21f42ec1f06 100644
--- a/staging/src/k8s.io/apiserver/pkg/storage/etcd3/watcher_test.go
+++ b/staging/src/k8s.io/apiserver/pkg/storage/etcd3/watcher_test.go
@@ -293,6 +293,7 @@ func TestWatchDeleteEventObjectHaveLatestRV(t *testing.T) {
 		t.Fatalf("Watch failed: %v", err)
 	}
 	rv, err := APIObjectVersioner{}.ObjectResourceVersion(storedObj)
+	t.Log("created",rv)
 	if err != nil {
 		t.Fatalf("failed to parse resourceVersion on stored object: %v", err)
 	}
@@ -310,12 +311,14 @@ func TestWatchDeleteEventObjectHaveLatestRV(t *testing.T) {
 		t.Fatalf("timed out waiting for watch event")
 	}
 	deletedRV, err := deletedRevision(watchCtx, etcdW)
+	t.Log("deleted",deletedRV)
 	if err != nil {
 		t.Fatalf("did not see delete event in raw watch: %v", err)
 	}
 	watchedDeleteObj := e.Object.(*example.Pod)
 
 	watchedDeleteRev, err := store.versioner.ParseResourceVersion(watchedDeleteObj.ResourceVersion)
+	t.Log("watched",watchedDeleteRev)
 	if err != nil {
 		t.Fatalf("ParseWatchResourceVersion failed: %v", err)
 	}

you get

go test k8s.io/apiserver/pkg/storage/etcd3 -run TestWatchDeleteEventObjectHaveLatestRV -v
=== RUN   TestWatchDeleteEventObjectHaveLatestRV
    watcher_test.go:296: created 2
    watcher_test.go:314: deleted 3
    watcher_test.go:321: watched 3

@stevekuznetsov (Contributor, Author) commented

@liggitt thanks. It seems I had a mistaken assumption that etcdWatch.Events[0].Kv.ModRevision would be the revision of the last modification, but etcd sets it to the logical time at which the kv was deleted. This test can then only exist at the etcd unit level unless we make it less strict and allow any revision greater than the previous one.

@stevekuznetsov (Contributor, Author) commented

@liggitt mentioned this test plan in a Slack thread, wanted to add it here for posterity:

  1. create an object, record RV (createdRV)
  2. start a watch on the object
  3. delete the object, record RV (deletedRV)
  4. assert deletedRV != createdRV
  5. wait for watch to observe delete (tolerating any intermediate updates from other actors, I guess)
  6. assert deleted watch event is received
  7. assert deleted watch event RV == deletedRV
  8. assert deleted watch event RV is different from any intermediate update watch event RVs

Unfortunately, we can't do step 3: clients definitely do not get anything back from a DELETE call, and it looks like the current implementation for etcd puts the resourceVersion of the value used for preconditions into the out object passed in. @logicalhan mentioned that it would be useful and more complete to let clients know the resourceVersion at which their DELETE was committed, but this seems like an orthogonal bit of work, and the test should, I guess, remain etcd-specific until that point?

@logicalhan (Member) commented

@liggitt mentioned this test plan in a Slack thread, wanted to add it here for posterity: [...] the test should, I guess, remain etcd-specific until that point?

This is the type of test that should become possible once we introduce the interface layer right above etcdClient, so I think it's okay to omit this for now on the presumption that it will be tested in the interface we're proposing.

@stevekuznetsov (Contributor, Author) commented

@logicalhan would it be sufficient to test that the storage provides this semantic to the storage.Interface implementation? As this is end-client-visible, I would have thought that we want to make sure we continue to forward this property up the stack.

@logicalhan (Member) commented

As this is end-client-visible I would have thought that we want to make sure that we continue to forward this property up the stack

I would be okay with just testing this on the inner layer; it should transitively retain the properties on the outer layer. But that's just me. Porting this logic to the outer layer involves patching storage.Interface and the implementation, which is considerably more involved.

@k8s-triage-robot commented

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 23, 2022
@k8s-triage-robot commented

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 22, 2022
@wojtek-t (Member) commented

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Sep 23, 2022
@MadhavJivrajani (Contributor) commented

reusing those tests for watchcache

I can take a stab at this if no one is already working on it!

@wojtek-t (Member) commented Nov 7, 2022

I can take a stab at this if anyone isn't already working on it!

A bit of that already happened in #113427 and #113588.

I have a bit more started in my local branch (#113666 is a preparation for making that possible), but my work (for now) is limited to lists.

If you want to help with making that for other tests (in particular Create, GuaranteedUpdate and Delete tests), that would definitely be useful.

@wojtek-t (Member) commented

I have a bit more started in my local branch (#113666 is a preparation for making that possible), but my work (for now) is limited to lists.

and watches

@k8s-triage-robot commented

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 8, 2023
@wojtek-t (Member) commented Feb 8, 2023

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 8, 2023
@MadhavJivrajani (Contributor) commented Feb 9, 2023

@wojtek-t @stevekuznetsov I see from the issue description that those tests are taken care of; is there anything else left that I can help with here?

@wojtek-t (Member) commented Feb 9, 2023

@MadhavJivrajani - the next step I would like to take here (which has been started, but not much has happened yet) is to ensure that all the tests we generalized will actually also be run against the watchcache:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/storage/tests/cacher_test.go

There are three of them already there (and one more on its way in #114656), but we need to enable all of the tests.

If you have capacity to help with this, that would be great.

The next bit (or ideally something that can be taken on simultaneously): we need to audit the tests that exist in that file and either:

  • confirm those cases are already covered by the generic tests, or
  • extend the generic tests if the cacher-specific tests cover more cases,

and then get rid of the cacher-specific tests from that file. Everything there should be generic. A sketch of the reuse pattern follows.
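As a rough sketch of that reuse pattern (assuming the generalized helpers take the RunTestX(ctx, t, store) shape introduced by #109833 and live in the storage/testing package, imported here as storagetesting; the wrapper name and package are illustrative):

package cachertesting // illustrative

import (
	"context"
	"testing"

	"k8s.io/apiserver/pkg/storage"
	storagetesting "k8s.io/apiserver/pkg/storage/testing"
)

// runGenericTests shows the intended shape: every test generalized under this
// issue is invoked against a cacher-backed storage.Interface constructed by
// the caller, exactly as the etcd3 package runs it against *etcd3.store{}.
func runGenericTests(ctx context.Context, t *testing.T, cacher storage.Interface) {
	t.Run("Get", func(t *testing.T) { storagetesting.RunTestGet(ctx, t, cacher) })
	t.Run("CreateWithTTL", func(t *testing.T) { storagetesting.RunTestCreateWithTTL(ctx, t, cacher) })
	// ... and so on for each generalized test.
}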

@k8s-triage-robot commented

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Feb 9, 2024
@wojtek-t (Member) commented Feb 9, 2024

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 9, 2024