Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 add support for standalone virtualworkspace server #2407

Merged
merged 7 commits into from
Jan 19, 2023

Conversation

hardys
Copy link

@hardys hardys commented Nov 24, 2022

Summary

This is another change decoupled from the shard-standalone-vw branch from @ncdc

Related issue(s)

Follow up to ##2297 and #2303 which also merged changes from the WIP branch

Fixes #2254
Fixes #2056

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 24, 2022
@hardys hardys changed the title [WIP] 🐛 sharded-test-server: add support for standalone virtualworkspace server [WIP] 🐛 add support for standalone virtualworkspace server Nov 24, 2022
@hardys
Copy link
Author

hardys commented Nov 24, 2022

Some of the e2e tests are failing hence marking this as WIP until resolved, any help with that or initial review feedback is welcome :)

pkg/server/controllers.go Outdated Show resolved Hide resolved
pkg/server/handler.go Outdated Show resolved Hide resolved
pkg/server/virtual.go Outdated Show resolved Hide resolved
test/e2e/virtual/workspaces/virtual_workspace_test.go Outdated Show resolved Hide resolved
@hardys hardys force-pushed the shard-standalone-vw2 branch 2 times, most recently from 2f0a93a to c60b0e6 Compare November 30, 2022 15:42
@hardys
Copy link
Author

hardys commented Nov 30, 2022

Thanks for the review feedback @ncdc - I pushed an update which fixes most (but not yet all) of the e2e failures, will leave this as WIP until all tests are green and the review feedback is addressed 👍

pkg/server/controllers.go Outdated Show resolved Hide resolved
@hardys hardys force-pushed the shard-standalone-vw2 branch 2 times, most recently from 90e3c6b to 13fd4fd Compare December 5, 2022 18:10
Copy link
Contributor

@stevekuznetsov stevekuznetsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems mostly reasonable, but boy is that an enormous test setup

test/e2e/virtual/workspaces/virtual_workspace_test.go Outdated Show resolved Hide resolved
test/e2e/virtual/workspaces/virtual_workspace_test.go Outdated Show resolved Hide resolved
test/e2e/virtual/workspaces/virtual_workspace_test.go Outdated Show resolved Hide resolved
@hardys hardys force-pushed the shard-standalone-vw2 branch 2 times, most recently from e2c1812 to c4fa992 Compare December 7, 2022 09:24
@hardys
Copy link
Author

hardys commented Dec 13, 2022

@ncdc quick update on this as I'm still struggling somewhat to get the e2e tests all working in standalone-vw mode

AFAICS some tests (for example TestRootWorkspaces/a_user_sees_his_own_workspaces_at_the_root,_but_no_other_workspaces are failing because they hit the ForeverTestTimeout of 30s.

Locally testing (with sharded-test-server configured with --shard-run-virtual-workspaces=false) e.g

GOOS=linux GOARCH=amd64 go test -v -failfast -parallel=1 -count=1 ./test/e2e/virtual/workspaces/... -run TestRootWorkspaces/a_user_sees_his_own_workspaces_at_the_root,_but_no_other_workspaces -args --use-default-kcp-server --root-shard-kubeconfig=$PWD/.kcp-0/admin.kubeconfig

We see that the test sometimes passes, and if I increase the test timeout it can pass reliably - in some test runs locally I'm seeing a delay of around 50 seconds waiting for the user1-workspace to appear (this is on baremetal so not resource constrained, although I do see high CPU usage).

In the shard logs we can see the proxied requests, and there are some bootstrapping authz errors which I'm thinking may be related?

I1213 17:56:41.503077 2707520 handler.go:138] "proxying virtual workspace request" inURL="/services/workspaces/root/apis/tenancy.kcp.dev/v1beta1/workspaces" outURL="https://10.19.133.66:7444/services/workspaces/root/apis/tenancy.kcp.dev/v1beta1/workspaces"
I1213 17:56:41.506245 2707520 httplog.go:131] "HTTP" verb="POST" URI="/clusters/root/apis/authorization.k8s.io/v1/subjectaccessreviews" latency="1.454168ms" userAgent="virtual-workspaces/v0.0.0 (linux/amd64) kubernetes/$Format/workspaces-virtual-workspace" audit-ID="5cd9b73b-6526-4798-ae35-3d85c4231534" srcIP="[::1]:35636" resp=201
I1213 17:56:41.507868 2707520 decorator.go:75] "bootstrap policy reason: "
I1213 17:56:41.507930 2707520 decorator.go:75] "local policy for cluster \"root\" reason: "
I1213 17:56:41.507940 2707520 decorator.go:75] "delegated bootstrap.authorization.kcp.dev authorizer reason: access denied\ndelegated local.authorization.kcp.dev authorizer reason: access denied"
I1213 17:56:41.507947 2707520 decorator.go:75] "delegated maxpermissionpolicy.authorization.kcp.dev authorizer reason: access denied"
I1213 17:56:41.507952 2707520 decorator.go:75] "delegated systemcrd.authorization.kcp.dev authorizer reason: access denied"
I1213 17:56:41.507958 2707520 decorator.go:75] "delegated content.authorization.kcp.dev authorizer reason: access denied"

If you have any suggestions where to look next that would be welcome, currently I'm not very sure why there is such a long delay in the workspace creation.

@ncdc
Copy link
Member

ncdc commented Dec 13, 2022

@hardys this particular test will go away as part of the workspace refactoring. Do you have a list of all the tests you're having problems with, and we can see if any of them still need addressing?

@hardys
Copy link
Author

hardys commented Dec 13, 2022

@hardys this particular test will go away as part of the workspace refactoring. Do you have a list of all the tests you're having problems with, and we can see if any of them still need addressing?

@ncdc I've mostly been testing e2e-sharded-minimal locally so I don't yet have an exhaustive list, but I'm seeing TestInProcessWorkspacesVirtualWorkspaces/In-process_virtual_workspace_apiserver/create_a_sub-workspace_and_verify_that_only_users_with_right_permissions_can_access_workspaces_inside_it also fail (similar to above, sometimes it passes)

I'm interested to understand why the long delay though, with the in-process VW the workspace list succeeds pretty much immediately, any idea why is it taking nearly a minute in standalone VW mode?

@ncdc
Copy link
Member

ncdc commented Dec 13, 2022

Not offhand. Maybe something to do with the workspace authz cache (which will be going away as part of the workspace refactor)

@hardys
Copy link
Author

hardys commented Dec 13, 2022

Not offhand. Maybe something to do with the workspace authz cache (which will be going away as part of the workspace refactor)

Ack thanks, sounds like I should probably pause work on this and rebase after the workspace refactor lands :)

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 23, 2022
@s-urbaniak
Copy link
Contributor

s-urbaniak commented Jan 2, 2023

@hardys let me know if you need help after the rebase, from the logs it looks like the requesting user for the SAR passes all authorizers but the local and bootstrap policy one for the root workspace. Unfortunately kubernetes RBAC does not give any reason in case of denials, so the local and bootstrap policy authorizer return empty string.

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 5, 2023
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 16, 2023
@hardys
Copy link
Author

hardys commented Jan 16, 2023

/test e2e-shared

Looks like #1878

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 18, 2023
Steven Hardy and others added 7 commits January 19, 2023 12:07
Co-authored-by: Andy Goldstein <andy.goldstein@redhat.com>
Co-authored-by: Andy Goldstein <andy.goldstein@redhat.com>
Co-authored-by: Andy Goldstein <andy.goldstein@redhat.com>
Co-authored-by: Andy Goldstein <andy.goldstein@redhat.com>
Align with the shard WaitForReady pattern ref kcp-dev#2303 as a step
towards enabling multiple VW servers when there are multiple shards
Avoid calculating the port in multiple places, and also use
JoinHostByPort instead of Sprintf
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 19, 2023
@hardys
Copy link
Author

hardys commented Jan 19, 2023

/hold cancel

@ncdc @stevekuznetsov this is ready for another review when you get some time, thanks!

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 19, 2023
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 19, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 19, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ncdc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 43b6fc2 into kcp-dev:main Jan 19, 2023
hardys pushed a commit to hardys/kcp that referenced this pull request Jan 20, 2023
In kcp-dev#2407 we introduced this validation, but it doesn't handle the
case where the VW server is not standalone, but you still want to
specify --shard-virtual-workspace-url

This scenario is likely when deploying a sharded environment, since
you'll want the VW traffic to go via the front-proxy regardless
of whether the VW server is standalone or not.
hardys pushed a commit to hardys/kcp that referenced this pull request Jan 23, 2023
Partially reverts logic introduced in kcp-dev#2407 so we don't unconditionally
use the external URL - when non-standalone VW mode is used we still
need the old behavior, or the loopback client can't access the VW
server.

This breaks shard bootstrapping (evidently not in our e2e scenarios,
but in a real deployment where probes mean the proxy isn't ready until
the shard bootstrap is complete)
hardys pushed a commit to hardys/kcp that referenced this pull request Jan 24, 2023
Partially reverts logic introduced in kcp-dev#2407 so we don't unconditionally
use the external URL - when non-standalone VW mode is used we still
need the old behavior, or the loopback client can't access the VW
server.

This breaks shard bootstrapping (evidently not in our e2e scenarios,
but in a real deployment where probes mean the proxy isn't ready until
the shard bootstrap is complete)
hardys pushed a commit to hardys/kcp that referenced this pull request Jan 24, 2023
Partially reverts logic introduced in kcp-dev#2407 so we don't unconditionally
use the external URL - when non-standalone VW mode is used we still
need the old behavior, or the loopback client can't access the VW
server.

This breaks shard bootstrapping (evidently not in our e2e scenarios,
but in a real deployment where probes mean the proxy isn't ready until
the shard bootstrap is complete)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
5 participants