✨ cache: add a synthetic delay to the cache server #2742

stevekuznetsov · 2023-02-02T18:42:32Z

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>

stevekuznetsov · 2023-02-02T19:05:47Z

In the sharded env, this seems to prevent the VWs from ever coming up. They fail to get data from a live read ... I need to improve the logging situation here; the current output is hard to parse.
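For readers following along, a synthetic delay of this kind can be pictured as an HTTP handler wrapper that waits before delegating to the real handler. The sketch below only illustrates that idea. It is not the code from pkg/cache/server/handler.go, and `withSyntheticDelay`, `okHandler`, and `timeRequest` are hypothetical names introduced for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// withSyntheticDelay is a hypothetical helper (not taken from this PR). It
// wraps next so that every request waits for delay before being served, and
// gives up early if the client cancels the request.
func withSyntheticDelay(delay time.Duration, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		timer := time.NewTimer(delay)
		defer timer.Stop()
		select {
		case <-timer.C:
			next.ServeHTTP(w, r)
		case <-r.Context().Done():
			// The client went away; skip serving the request.
		}
	})
}

// okHandler returns a trivial handler that always answers 200 OK.
func okHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
}

// timeRequest serves one in-memory GET through h and reports the status
// code together with how long the handler took.
func timeRequest(h http.Handler) (int, time.Duration) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	start := time.Now()
	h.ServeHTTP(rec, req)
	return rec.Code, time.Since(start)
}

func main() {
	code, elapsed := timeRequest(withSyntheticDelay(50*time.Millisecond, okHandler()))
	fmt.Println(code, elapsed >= 50*time.Millisecond) // prints "200 true"
}
```

A wrapper like this makes every consumer of the cache server observe stale or late data, which is the kind of condition that can expose components that assume an immediate response.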

pkg/cache/server/handler.go

p0lyn0mial · 2023-02-03T08:19:07Z

LGTM, but it looks like you will be changing this PR. I will have another look when it is ready.

p0lyn0mial · 2023-02-13T12:36:10Z

e2e-multiple-runs failed on TestReplicateShard.

Normally this test creates a workspace, adds a new shard resource, and verifies that it was replicated.

In this run the test failed because a workspace wasn't scheduled. The scheduling controller was trying to put the workspace on a fake shard!

In general, TestReplicationDisruptive runs each scenario in a separate private server, and each scenario gets its own set of directories. However, in the faulty run I found a few requests issued by UserAgent=TestReplicateShardNegative, which might indicate that the private instance was shared among the tests. That is very suspicious, because ports for private servers appear to be assigned randomly.

I0202 18:55:46.498784   55516 resource_controller.go:216] "queueing resource" reconciler="kcp-workload-resource-scheduler" key="workspaces.v1alpha1.tenancy.kcp.io::root|e2e-workspace-ntgqm"

E0202 18:55:46.557341   55516 workspace_controller.go:237] "kcp-workspace" controller failed to sync "root|e2e-workspace-ntgqm", err: Post "https://base.kcp.test.dev/clusters/1yo677m4ish61htq/apis/core.kcp.io/v1alpha1/logicalclusters": dial tcp: lookup base.kcp.test.dev on 172.30.0.10:53: no such host

And it looks like the fake shard was added by a different test (TestReplicateShardNegative ?!):

I0202 18:55:46.402018   55516 httplog.go:131] "HTTP" verb="LIST" URI="/clusters/root/apis/core.kcp.io/v1alpha1/shards" latency="1.545268ms" userAgent="cache.test/v0.0.0 (linux/amd64) kubernetes/$Format/TestReplicationDisruptive/TestReplicateShardNegative" audit-ID="9cdc7e0b-5f4e-438c-bb49-bc9c421d274d" srcIP="10.130.28.235:44072" resp=200
I0202 18:55:46.405126   55516 httplog.go:131] "HTTP" verb="POST" URI="/clusters/root/apis/core.kcp.io/v1alpha1/shards" latency="1.62927ms" userAgent="cache.test/v0.0.0 (linux/amd64) kubernetes/$Format/TestReplicationDisruptive" audit-ID="2780431d-ecc0-4ae7-a256-02d76efb0ddc" srcIP="10.130.28.235:44072" resp=201
I0202 18:55:46.405148   55516 shard_controller.go:90] "queueing Shard" reconciler="kcp-shard" key="root|test-shard-7397489778125985083"

Also, it looks like the CI is missing some logs. Since TestReplicationDisruptive runs two scenarios, each with its own server, I would expect to find two separate directories/logs corresponding to the subtests, but I found only one.
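The check described above (spotting which test issued each request to a given server) can be sketched as a small log filter. This is a hypothetical helper, not part of the kcp test suite: `testName` and `countByTest` are names made up for the example, and it assumes the test path appears in the `userAgent` field after the first `/Test` segment, as in the httplog lines quoted above. The sample lines are abbreviated copies of those log lines.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// userAgentRE captures the value of the userAgent field of an httplog line.
var userAgentRE = regexp.MustCompile(`userAgent="([^"]*)"`)

// testName returns the Go test path embedded in a line's userAgent, that is
// everything after the first "/Test" separator, or "" if there is none.
func testName(line string) string {
	m := userAgentRE.FindStringSubmatch(line)
	if m == nil {
		return ""
	}
	i := strings.Index(m[1], "/Test")
	if i < 0 {
		return ""
	}
	return m[1][i+1:]
}

// countByTest counts requests per test path, skipping lines without one.
func countByTest(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		if name := testName(line); name != "" {
			counts[name]++
		}
	}
	return counts
}

func main() {
	// Abbreviated from the log excerpt in this thread.
	lines := []string{
		`I0202 18:55:46.402018 55516 httplog.go:131] "HTTP" verb="LIST" userAgent="cache.test/v0.0.0 (linux/amd64) kubernetes/$Format/TestReplicationDisruptive/TestReplicateShardNegative" resp=200`,
		`I0202 18:55:46.405126 55516 httplog.go:131] "HTTP" verb="POST" userAgent="cache.test/v0.0.0 (linux/amd64) kubernetes/$Format/TestReplicationDisruptive" resp=201`,
		`I0202 18:55:46.405148 55516 shard_controller.go:90] "queueing Shard" reconciler="kcp-shard" key="root|test-shard-7397489778125985083"`,
	}
	counts := countByTest(lines)
	names := make([]string, 0, len(counts))
	for name := range counts {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s: %d\n", name, counts[name])
	}
}
```

If a single server log shows requests from more than one subtest path, that would support the suspicion that the instance was shared.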

ncdc · 2023-02-13T13:36:35Z

@p0lyn0mial I only recently fixed it so each disruptive test runs with its own private server. The test failure might have been from before that fix.

p0lyn0mial · 2023-02-13T13:52:42Z

> @p0lyn0mial I only recently fixed it so each disruptive test runs with its own private server. The test failure might have been from before that fix.

thanks for the info, let me re-run the test then.

/test e2e-multiple-runs

p0lyn0mial · 2023-02-13T14:26:34Z

e2e-multiple-runs is green! it looks like now we are storing artefacts of a private server in a separate dir 👍 - https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/kcp-dev_kcp/27[…]tiple-runs/artifacts/TestReplicationDisruptive/

stevekuznetsov · 2023-02-20T15:45:50Z

/test all

stevekuznetsov · 2023-02-20T15:46:10Z

Huh, something else must have landed in the interim, when I left this was 100% broken :)

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>

openshift-ci · 2023-02-20T16:49:11Z

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot assigned p0lyn0mial Feb 2, 2023

openshift-ci bot requested a review from ncdc February 2, 2023 18:42

p0lyn0mial reviewed Feb 3, 2023

stevekuznetsov mentioned this pull request Feb 3, 2023

feature: Enable random workspace scheduling and work through e2e (without tmc) #2612

Closed

cache: add a synthetic delay to the cache server

0d6a2ae

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>

stevekuznetsov force-pushed the skuznets/add-sythethic-delay branch from 8ce17df to 0d6a2ae February 20, 2023 16:12

stevekuznetsov added the approved and lgtm labels Feb 20, 2023

openshift-merge-robot merged commit 4f71d22 into kcp-dev:main Feb 20, 2023
