Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HOSTEDCP-1429: hypershift-operator: add a controller for t-shirt sizing #3686

Merged

Conversation

stevekuznetsov
Copy link
Contributor

/assign @csrwng

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 5, 2024
@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 5, 2024

@stevekuznetsov: This pull request references HOSTEDCP-1429 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:

/assign @csrwng

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link

netlify bot commented Mar 5, 2024

Deploy Preview for hypershift-docs ready!

Name Link
🔨 Latest commit 4ff8a15
🔍 Latest deploy log https://app.netlify.com/sites/hypershift-docs/deploys/65e8ce700742430008f3d2ac
😎 Deploy Preview https://deploy-preview-3686--hypershift-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@openshift-ci openshift-ci bot requested review from csrwng and sjenning March 5, 2024 15:17
@openshift-ci openshift-ci bot added area/ci-tooling Indicates the PR includes changes for CI or tooling area/cli Indicates the PR includes changes for CLI area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release and removed do-not-merge/needs-area labels Mar 5, 2024
@stevekuznetsov
Copy link
Contributor Author

Looks like I need to write that integration test :D

@openshift-ci openshift-ci bot added the area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release label Mar 5, 2024
@stevekuznetsov
Copy link
Contributor Author

// TODO: pause controller when reconciliation is paused

@stevekuznetsov stevekuznetsov force-pushed the skuznets/t-shirt-sizing branch 4 times, most recently from 5c4377f to 4ff8a15 Compare March 6, 2024 20:13
@stevekuznetsov
Copy link
Contributor Author

@csrwng integration test:

$ GO_TEST_FLAGS='-run TestHostedSizingController' PATH="${PATH}:$(realpath ./bin)" WORK_DIR=/tmp/integration PULL_SECRET=~/secrets/ocm-pull-secret ./test/integration/run.sh test
Running test...
2024-03-06T13:13:07-07:00	INFO	running tests
=== RUN   TestHostedSizingController
    hosted_cluster_sizing_test.go:33: setting the cluster sizing configuration
    hosted_cluster_sizing_test.go:78: adding a small node count (1) to the hosted control plane hosted-clusters-2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8
    hosted_cluster_sizing_test.go:184: expecting hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 to transition to small size
    hosted_cluster_sizing_test.go:210: waiting for hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 to have status condition ClusterSizeComputed=True: small and match validation func
    hosted_cluster_sizing_test.go:224: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 observed at RV 2147 after 9.721094ms
    hosted_cluster_sizing_test.go:236: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 status: ClusterSizeTransitionPending=True: TransitionDelayNotElapsed (HostedClusters must wait at least 1h0m0s to increase in size after the last transition.)
    hosted_cluster_sizing_test.go:236: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 status: ClusterSizeComputed=True: small (The HostedCluster has transitioned to a new t-shirt size.)
    hosted_cluster_sizing_test.go:254: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 validation: hostedCluster.metadata.labels[io.openshift.hypershift/hosted-cluster-size]=small
    hosted_cluster_sizing_test.go:91: adding a large node count (50) to the hosted control plane hosted-clusters-2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8
    hosted_cluster_sizing_test.go:101: expecting hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 to delay transition
    hosted_cluster_sizing_test.go:210: waiting for hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 to have status condition ClusterSizeTransitionPending=True: TransitionDelayNotElapsed
    hosted_cluster_sizing_test.go:224: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 observed at RV 2149 after 9.176652ms
    hosted_cluster_sizing_test.go:236: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 status: ClusterSizeTransitionPending=True: TransitionDelayNotElapsed (HostedClusters must wait at least 1h0m0s to increase in size after the last transition.)
    hosted_cluster_sizing_test.go:236: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 status: ClusterSizeComputed=True: small (The HostedCluster has transitioned to a new t-shirt size.)
    hosted_cluster_sizing_test.go:107: configure the cluster sizing configuration to have miniscule delays
    hosted_cluster_sizing_test.go:184: expecting hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 to transition to medium size
    hosted_cluster_sizing_test.go:210: waiting for hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 to have status condition ClusterSizeComputed=True: medium and match validation func
    hosted_cluster_sizing_test.go:224: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 observed at RV 2149 after 2.248304ms
    hosted_cluster_sizing_test.go:236: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 status: ClusterSizeTransitionPending=True: TransitionDelayNotElapsed (HostedClusters must wait at least 1h0m0s to increase in size after the last transition.)
    hosted_cluster_sizing_test.go:236: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 status: ClusterSizeComputed=True: small (The HostedCluster has transitioned to a new t-shirt size.)
    hosted_cluster_sizing_test.go:224: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 observed at RV 2153 after 116.688634ms
    hosted_cluster_sizing_test.go:236: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 status: ClusterSizeComputed=True: medium (The HostedCluster has transitioned to a new t-shirt size.)
    hosted_cluster_sizing_test.go:236: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 status: ClusterSizeTransitionPending=False: ClusterSizeTransitioned (The HostedCluster has transitioned to a new t-shirt size.)
    hosted_cluster_sizing_test.go:254: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 validation: hostedCluster.metadata.labels[io.openshift.hypershift/hosted-cluster-size]=medium
    hosted_cluster_sizing_test.go:128: configure the cluster sizing configuration to have no concurrency
    hosted_cluster_sizing_test.go:148: adding a huge node count (500) to the hosted control plane hosted-clusters-2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8
    hosted_cluster_sizing_test.go:158: expecting hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 to delay transition for concurrency
    hosted_cluster_sizing_test.go:210: waiting for hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 to have status condition ClusterSizeTransitionPending=True: ConcurrencyLimitReached
    hosted_cluster_sizing_test.go:224: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 observed at RV 2153 after 2.245286ms
    hosted_cluster_sizing_test.go:236: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 status: ClusterSizeComputed=True: medium (The HostedCluster has transitioned to a new t-shirt size.)
    hosted_cluster_sizing_test.go:236: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 status: ClusterSizeTransitionPending=False: ClusterSizeTransitioned (The HostedCluster has transitioned to a new t-shirt size.)
    hosted_cluster_sizing_test.go:224: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 observed at RV 2157 after 100.333364ms
    hosted_cluster_sizing_test.go:236: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 status: ClusterSizeTransitionPending=True: TransitionDelayNotElapsed (HostedClusters must wait at least 1s to increase in size after the last transition.)
    hosted_cluster_sizing_test.go:236: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 status: ClusterSizeComputed=True: medium (The HostedCluster has transitioned to a new t-shirt size.)
    hosted_cluster_sizing_test.go:224: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 observed at RV 2159 after 513.554479ms
    hosted_cluster_sizing_test.go:236: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 status: ClusterSizeTransitionPending=True: ConcurrencyLimitReached (1 HostedClusters have already transitioned sizes in the last 1h0m0s, more time must elapse before the next transition.)
    hosted_cluster_sizing_test.go:236: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 status: ClusterSizeComputed=True: medium (The HostedCluster has transitioned to a new t-shirt size.)
    hosted_cluster_sizing_test.go:164: configure the cluster sizing configuration to have allow lots of concurrency
    hosted_cluster_sizing_test.go:184: expecting hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 to transition to large size
    hosted_cluster_sizing_test.go:210: waiting for hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 to have status condition ClusterSizeComputed=True: large and match validation func
    hosted_cluster_sizing_test.go:224: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 observed at RV 2159 after 2.33001ms
    hosted_cluster_sizing_test.go:236: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 status: ClusterSizeTransitionPending=True: ConcurrencyLimitReached (1 HostedClusters have already transitioned sizes in the last 1h0m0s, more time must elapse before the next transition.)
    hosted_cluster_sizing_test.go:236: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 status: ClusterSizeComputed=True: medium (The HostedCluster has transitioned to a new t-shirt size.)
    hosted_cluster_sizing_test.go:224: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 observed at RV 2163 after 104.618861ms
    hosted_cluster_sizing_test.go:236: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 status: ClusterSizeComputed=True: large (The HostedCluster has transitioned to a new t-shirt size.)
    hosted_cluster_sizing_test.go:236: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 status: ClusterSizeTransitionPending=False: ClusterSizeTransitioned (The HostedCluster has transitioned to a new t-shirt size.)
    hosted_cluster_sizing_test.go:254: hosted cluster hosted-clusters/2sbjc6akxi6ochgq0u1y0u6gah0ica9ph1s668m92fw8 validation: hostedCluster.metadata.labels[io.openshift.hypershift/hosted-cluster-size]=large
    run.go:79: cleaning up...
--- PASS: TestHostedSizingController (0.98s)
PASS
ok  	github.com/openshift/hypershift/test/integration	0.998s

@openshift-ci openshift-ci bot added the area/testing Indicates the PR includes changes for e2e testing label Mar 6, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 6, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 6, 2024
cmd/install/install.go Outdated Show resolved Hide resolved
There are only two places in k8s where ordering matters for what you're
applying to the server: namespaces and CRDs. When we declare that we
need CRDs, we also need to know that they are established before they
can be used. New options to the install command allow us to wait on this
explicitly, instead of implcitly, and output separate files during
render to allow third parties to do so as well.

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
Copy link
Contributor

@csrwng csrwng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits

Dockerfile Show resolved Hide resolved
@csrwng
Copy link
Contributor

csrwng commented Mar 11, 2024

/approve

Copy link
Contributor

openshift-ci bot commented Mar 11, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: csrwng, stevekuznetsov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 11, 2024
@csrwng
Copy link
Contributor

csrwng commented Mar 12, 2024

/hold
@stevekuznetsov in manually testing this, I don't see the delay taking effect on scale up/down. I changed the config after the hypershift operator was already running, so not sure if the config is stale or it's something else.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 12, 2024
@stevekuznetsov
Copy link
Contributor Author

 Error: failed to create infra: cannot tag NAT gateway EIP: InvalidElasticIpID.NotFound: The elasticIp ID 'eipalloc-02b301c4c6f314eec' does not exist
	status code: 400, request id: c6d39334-9eac-4b65-a468-1971e77a3cfc
failed to create infra: cannot tag NAT gateway EIP: InvalidElasticIpID.NotFound: The elasticIp ID 'eipalloc-02b301c4c6f314eec' does not exist
	status code: 400, request id: c6d39334-9eac-4b65-a468-1971e77a3cfc 

/retest

@csrwng
Copy link
Contributor

csrwng commented Mar 13, 2024

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 13, 2024
@csrwng
Copy link
Contributor

csrwng commented Mar 13, 2024

@stevekuznetsov one thing that occurred to me is that we should skip reconcile of a hostedcluster that either has a deletionTimestamp or is paused. Other than that, lgtm

@csrwng
Copy link
Contributor

csrwng commented Mar 13, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 13, 2024
@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD 8fc4d27 and 2 for PR HEAD d6f9feb in total

Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Mar 13, 2024
@csrwng
Copy link
Contributor

csrwng commented Mar 13, 2024

/lgtm
/retest-required

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 13, 2024
@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD 374e6c0 and 2 for PR HEAD c17b642 in total

Copy link
Contributor

openshift-ci bot commented Mar 13, 2024

@stevekuznetsov: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-ibmcloud-iks c17b642 link false /test e2e-ibmcloud-iks
ci/prow/e2e-ibmcloud-roks c17b642 link false /test e2e-ibmcloud-roks

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit be38396 into openshift:main Mar 14, 2024
12 of 14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/ci-tooling Indicates the PR includes changes for CI or tooling area/cli Indicates the PR includes changes for CLI area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/testing Indicates the PR includes changes for e2e testing jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants