OCPBUGS-5744: dockercfg: use tokenrequest instead of SA secrets #223

Closed
wants to merge 4 commits into from

Conversation

stlaz
Member

@stlaz stlaz commented May 6, 2022

This PR uses the TokenRequest API to get a token for the dockercfg SA secret, rather than creating a secret and polluting etcd for that very same purpose.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 6, 2022
@openshift-ci
Contributor

openshift-ci bot commented May 6, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: stlaz
To complete the pull request process, please assign mfojtik after the PR has been reviewed.
You can assign the PR to them by writing /assign @mfojtik in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@stlaz
Member Author

stlaz commented May 7, 2022

This also needs to implement token refresh.

Possible issue: we need to rotate the token inside the secret, is the change in the secret propagated in the pod?

@s-urbaniak

Possible issue: we need to rotate the token inside the secret, is the change in the secret propagated in the pod?

Generally, yes: https://kubernetes.io/docs/concepts/configuration/secret/#mounted-secrets-are-updated-automatically. But the pod is responsible for re-reading the value, either with a time-based approach or via fsnotify.

@stlaz stlaz force-pushed the token_request_dockercfg branch 3 times, most recently from 4889e36 to f0c1de4 Compare May 24, 2022 14:05
@stlaz
Member Author

stlaz commented Jun 1, 2022

/retest
This should show a better test pass rate, since creating tokens should now be allowed (master is on kube 1.24).

@stlaz
Member Author

stlaz commented Jun 3, 2022

/retest

@stlaz
Member Author

stlaz commented Jun 16, 2022

/retest

@stlaz stlaz force-pushed the token_request_dockercfg branch 2 times, most recently from 61bf205 to 4309267 Compare June 20, 2022 11:45
@stlaz stlaz changed the title wip: dockercfg: use tokenrequest instead of SA secrets dockercfg: use tokenrequest instead of SA secrets Jun 20, 2022
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 20, 2022
}

// NewDockercfgTokenDeletedController returns a new *DockercfgTokenDeletedController.
func NewDockercfgTokenDeletedController(secrets informers.SecretInformer, cl kclientset.Interface, options DockercfgTokenDeletedControllerOptions) *DockercfgTokenDeletedController {
Contributor

@deads2k deads2k Jul 21, 2022

Without this controller, what deletes the token secrets when the dockercfg secret is deleted later on?

Is there an ownerref these days perhaps?

@@ -138,7 +138,7 @@ type DockerRegistryServiceController struct {
}

// Runs controller loops and returns immediately
-func (e *DockerRegistryServiceController) Run(workers int, stopCh <-chan struct{}) {
+func (e *DockerRegistryServiceController) Run(ctx context.Context, workers int) {
Contributor

Pull maintenance refactors like this, which carry no change in intent, into a separate PR for easier review and merge.

ServiceAccountTokenSecretNameKey = "openshift.io/token-secret.name"
MaxRetriesBeforeResync = 5
MaxRetriesBeforeResync = 5
ExpirationCheckPeriod = 10 * time.Minute
Contributor

if you make me look at a constant, I'll ask for godoc ;)

// token data population
-PendingTokenAnnotation = "openshift.io/create-dockercfg-secrets.pending-token"
+PendingTokenAnnotation = "openshift.io/create-dockercfg-secrets.pending-secret"
Contributor

a change here looks suspicious. I would expect a new constant instead.

if !exists {
continue
serviceAccount.Annotations[PendingTokenAnnotation] = pendingTokenName
updatedServiceAccount, err := e.client.CoreV1().ServiceAccounts(serviceAccount.Namespace).Update(ctx, serviceAccount, metav1.UpdateOptions{})
Contributor

I recommend using the apply client for this. You can do it as a separate step, but it will reduce conflicts.

Member

I was considering patch when looking through the code; this helped with other controllers in the past.

klog.V(4).Infof("Creating dockercfg secret %q for service account %s/%s", dockercfgSecret.Name, serviceAccount.Namespace, serviceAccount.Name)

// Save the secret
_, err = e.client.CoreV1().Secrets(serviceAccount.Namespace).Create(ctx, dockercfgSecret, metav1.CreateOptions{})
Contributor

@deads2k deads2k Jul 21, 2022

why create before getting the content? With the tokenrequest API, the call to get the content is synchronous and it would simplify some of the waiting logic, right?

EDIT: OH! It's because you need the secret for the bound token to bind to, isn't it? Still, that's fast (API calls, not controller response time), so you could do a secret create, a tokenrequest, a secret apply (to avoid the conflict), and an SA update.

}

-func (c *DockercfgController) waitForDockerURLs(ready chan<- struct{}, stopCh <-chan struct{}) {
+func (c *DockercfgController) waitForDockerURLs(ctx context.Context, ready chan<- struct{}) {
Contributor

separate from this PR (before or after), I wonder if these are still logically configurable. They used to be cyclical with the rest of the controller-manager so I built this logic, but I wonder if on openshift these are now practically fixed in a way that the names can be predicted during installation and reacted to if they ever change to avoid the interlock and self-discovery logic.

Just imagining future you trying to maintain this area of code.

pendingTokenName := serviceAccount.Annotations[PendingTokenAnnotation]

// If this service account has no record of a pending token name, record one
if len(pendingTokenName) == 0 {
Contributor

If you can build a valid dockerconfig secret synchronously, you'd be able to first build the secret and second directly update the serviceaccount with the pull secret and never need the intermediate annotation, right?

@deads2k
Contributor

deads2k commented Jul 21, 2022

Just so I'm clear, this is about how we handle new dockercfg secrets, but how do we migrate existing ones and how do existing ones delete the old token secret and use a new tokenrequest based token instead?

default:
utilruntime.HandleError(fmt.Errorf("object passed to %T that is not expected: %T", e, obj))
return false
}
},
Handler: cache.ResourceEventHandlerFuncs{
// We don't need to react to secret deletes, the deleted_dockercfg_secrets controller does that
// It also updates the SA so we will eventually get back to creating a new secret
Member

Since you already use FilteringResourceEventHandler, it's sufficient for you to always e.enqueuSecret(secret)

}

-func (e *DockercfgController) enqueueServiceAccountForToken(tokenSecret *v1.Secret) {
+func (e *DockercfgController) enqueueServiceAccountForToken(dockerCfgSecret *v1.Secret) {
Member

This looks like it's not being used anywhere.

dockercfg[dockerURL] = credentialprovider.DockerConfigEntry{
Username: "serviceaccount",
Password: string(saToken),
Email: "serviceaccount@example.org",
Member

Judging by my own .docker/config.json, the email shouldn't be needed, so I'd suggest dropping it.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 20, 2022
@openshift-merge-robot
Contributor

@stlaz: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Contributor

openshift-ci bot commented Oct 20, 2022

@stlaz: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/openshift-e2e-aws-builds-techpreview bcb0404 link false /test openshift-e2e-aws-builds-techpreview
ci/prow/e2e-gcp bcb0404 link true /test e2e-gcp
ci/prow/e2e-aws-proxy bcb0404 link false /test e2e-aws-proxy
ci/prow/e2e-aws bcb0404 link true /test e2e-aws
ci/prow/e2e-gcp-builds bcb0404 link true /test e2e-gcp-builds
ci/prow/e2e-gcp-ovn-builds bcb0404 link true /test e2e-gcp-ovn-builds
ci/prow/e2e-aws-ovn-upgrade bcb0404 link true /test e2e-aws-ovn-upgrade
ci/prow/e2e-gcp-ovn bcb0404 link true /test e2e-gcp-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@mfojtik mfojtik changed the title dockercfg: use tokenrequest instead of SA secrets Bug OCPBUGS-5744: dockercfg: use tokenrequest instead of SA secrets Jan 11, 2023
@stlaz stlaz changed the title Bug OCPBUGS-5744: dockercfg: use tokenrequest instead of SA secrets OCPBUGS-5744: dockercfg: use tokenrequest instead of SA secrets Jan 18, 2023
@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 19, 2023
@stlaz
Member Author

stlaz commented Apr 20, 2023

/remove-lifecycle stale
I might be able to get to this soon-ish

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 20, 2023
@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 20, 2023
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 19, 2023
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Sep 19, 2023
@openshift-ci
Contributor

openshift-ci bot commented Sep 19, 2023

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot
Contributor

@stlaz: An error was encountered updating to the NEW state for bug OCPBUGS-5744 on the Jira server at https://issues.redhat.com/. No known errors were detected, please see the full error message for details.

Full error message. No transition status with name `NEW` could be found. Please select from the following list: [Replan]

Please contact an administrator to resolve this issue, then request a bug refresh with /jira refresh.

In response to this:

this PR makes use of the TokenRequest API to get a token for the dockercfg SA secret rather than creating a secret and polluting etcd for that very same purpose,

Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.

8 participants