Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🌱 make: add non-kind shared and sharded e2e #2265

Merged

Conversation

stevekuznetsov
Copy link
Contributor

@stevekuznetsov stevekuznetsov commented Oct 26, 2022

test/e2e: make sure tests actually run in parallel

Signed-off-by: Steve Kuznetsov skuznets@redhat.com


test/e2e: be consistent about timeouts and poll intervals

Signed-off-by: Steve Kuznetsov skuznets@redhat.com


test/e2e/workspacetype: poll, don't blindly wait for 5s

Signed-off-by: Steve Kuznetsov skuznets@redhat.com


test/e2e/syncer: send one request, not many

You can't prove a negative, and eventually consistent systems don't like
to conform to your idea of "long enough" before synchronizing. We can't
assume that "nothing will happen" within 5s here. If we wanted to make
sure that the controller had seen & acknowledged the object, we should
have it post some observed generation.

Since these polls were not effective at best, and flaky at worst, we can
save 10s by just making one call.

Signed-off-by: Steve Kuznetsov skuznets@redhat.com


make: add non-kind variants of shared & sharded e2e

Signed-off-by: Steve Kuznetsov skuznets@redhat.com


make: be more intelligent with parallelism, count

First, we don't want to pass -count 1 to go test unless someone
explicitly opts into this behavior, as this explicitly invalidates the
test cache. When re-running tests locally, being able to run the full
suite every time and use the cache is incredibly useful.

Second, we should only bother with limiting parallelism when we're
starting up a full kcp and etcd server per test case.

Signed-off-by: Steve Kuznetsov skuznets@redhat.com


/cc @jmprusi @davidfestal
/assign @ncdc @sttts

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
You can't prove a negative, and eventually consistent systems don't like
to conform to your idea of "long enough" before synchronizing. We can't
assume that "nothing will happen" within 5s here. If we wanted to make
sure that the controller had seen & acknowledged the object, we should
have it post some observed generation.

Since these polls were not effective at best, and flaky at worst, we can
save 10s by just making one call.

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
@nrb
Copy link
Contributor

nrb commented Oct 26, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 26, 2022
@nrb
Copy link
Contributor

nrb commented Oct 26, 2022

Running make test-e2e-shared-minimal a second time on my machine (macOS 16.1 ARM) results in this error; deleting the whole .kcp directory fixes it.

KCP Error: cannot create the 'admin.kubeconfig` file with an empty token for the shard-admin user
error: kcp shard kcp terminated with exit code 1

@stevekuznetsov
Copy link
Contributor Author

@nrb that's a pre-existing boo-boo for these targets

First, we don't want to pass `-count 1` to `go test` unless someone
explicitly opts into this behavior, as this explicitly invalidates the
test cache. When re-running tests locally, being able to run the full
suite every time and use the cache is incredibly useful.

Second, we should only bother with limiting parallelism when we're
starting up a full `kcp` and `etcd` server per test case.

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 26, 2022
}
require.NoError(t, err)
return false
}, 5*time.Second, time.Second, "upstream Deployment %s/%s got deleted or there was an error", upstreamNamespace.Name, upstreamDeployment.Name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the intent here was to wait a ~short amount of time to make sure there weren't any unintentional/unexpected deletions that trickled in after a period of time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote in the commit why that is not a valid way to test this behavior.

@@ -287,10 +288,16 @@ func TestClusterWorkspaceTypes(t *testing.T) {
})

t.Logf("Expect workspace to be stuck in initializing phase")
time.Sleep(5 * time.Second)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another case where there's a desire to wait a ~short amount of time and make sure an event does not happen. Not sure how we want to approach those.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not - we are waiting to see it be in initializing, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment about proving negatives and assuming time-scales for eventually consistent systems applies to this case as well.

@ncdc
Copy link
Member

ncdc commented Oct 26, 2022

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 26, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 26, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ncdc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 26, 2022
@openshift-merge-robot openshift-merge-robot merged commit 264a6fe into kcp-dev:main Oct 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants