2.6 rancher updates #40243
Conversation
Force-pushed from c3ecd54 to 4f68c0f.
This reverts commit 6c477a5.
The "cattle" service account on a downstream cluster is the account that Rancher uses to connect as an admin to the downstream cluster. Without this change, if the cattle service account's token is deleted, the cluster agent will regenerate it identically. This is a problem because it makes rotation of the token nontrivial. We can't craft the JWT ourselves or influence what claims are included in it, that is done within kubernetes. The only way to change the resulting JWT is to change the values kubernetes uses for claims. The only option is to make the secret name unique[1]. All other claims come from the service account, which we do not want to have to change in order to rotate the token. This change addresses the problem by using GenerateName when creating the secret so that it will be unique every time. However, since the name is no longer predictable, this causes problems when Rancher tries to look up the token. We now need to look up the name of the secret from the service account object. A further complication is that for kubernetes 1.24, the secret name is no longer stored on the service account, so now we set it explicitly. An extra benefit of this approach is that we no longer create multiple tokens for service accounts on k8s <1.24, since creating the token is skipped if it is found on the service account. This change refactors any code that was creating a service account token to use the serviceaccounttoken.EnsureSecretForServiceAccount function in order to be consistent everywhere. The function is updated to use a backoff routine instead of an infinite loop to check the state of the secret. It is flexible enough to use controller caches for callers with access to that, and falls back to regular clients for remote callers such as the agent. See also the change that introduced this functionality in Rancher[2]. [1] https://github.com/kubernetes/kubernetes/blob/v1.25.2/pkg/serviceaccount/legacy.go#L39 [2] rancher#38113 (cherry picked from commit 37c7ee1)
(cherry picked from commit d4d9741)
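For illustration, here is a minimal sketch of the GenerateName approach described above, using plain client-go. The ensureSecretForServiceAccount name and the secretAnnotation key are hypothetical stand-ins, not the actual API of Rancher's serviceaccounttoken package:

```go
package example

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// secretAnnotation is a hypothetical annotation used to remember the secret
// name on the service account, since k8s 1.24+ no longer records it there.
const secretAnnotation = "example.com/token-secret-name"

func ensureSecretForServiceAccount(ctx context.Context, clientset kubernetes.Interface, sa *corev1.ServiceAccount) (*corev1.Secret, error) {
	// If a secret was already recorded for this SA, reuse it rather than
	// minting another token (the k8s <1.24 multiple-token fix).
	if name := sa.Annotations[secretAnnotation]; name != "" {
		return clientset.CoreV1().Secrets(sa.Namespace).Get(ctx, name, metav1.GetOptions{})
	}

	// GenerateName yields a unique secret name on every rotation; since the
	// secret name is a claim in the legacy JWT, the resulting token is new too.
	secret, err := clientset.CoreV1().Secrets(sa.Namespace).Create(ctx, &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{
			GenerateName: fmt.Sprintf("%s-token-", sa.Name),
			Namespace:    sa.Namespace,
			Annotations:  map[string]string{corev1.ServiceAccountNameKey: sa.Name},
		},
		Type: corev1.SecretTypeServiceAccountToken,
	}, metav1.CreateOptions{})
	if err != nil {
		return nil, err
	}

	// The generated name is not predictable, so record it on the service
	// account for later lookups.
	saCopy := sa.DeepCopy()
	if saCopy.Annotations == nil {
		saCopy.Annotations = map[string]string{}
	}
	saCopy.Annotations[secretAnnotation] = secret.Name
	if _, err := clientset.CoreV1().ServiceAccounts(sa.Namespace).Update(ctx, saCopy, metav1.UpdateOptions{}); err != nil {
		return nil, err
	}
	return secret, nil
}
```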
Make the setting of conditions consistent. Conditions do not need to be set explicitly when DoUntilTrue is used, but the object returned from DoUntilTrue does need to be explicitly updated. Use copies of the object until the object is ready to be updated. (cherry picked from commit dd1f35b2dcde242280680b99c162be0a2489549f)
(cherry picked from commit ce445013102a708e7a0dc9589162f2d20ae934ac)
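A minimal sketch of that pattern, assuming norman's condition.Cond helper (which Rancher's migrations use); the condition name and the clusterClient interface are illustrative stand-ins:

```go
package example

import (
	"github.com/rancher/norman/condition"
	v3 "github.com/rancher/rancher/pkg/apis/management.cattle.io/v3"
	"k8s.io/apimachinery/pkg/runtime"
)

var migrated = condition.Cond("FieldsMigrated") // illustrative condition name

// clusterClient is a placeholder for the generated Cluster client.
type clusterClient interface {
	Update(*v3.Cluster) (*v3.Cluster, error)
}

func migrate(cluster *v3.Cluster, clusters clusterClient) (*v3.Cluster, error) {
	// Work on a copy: DoUntilTrue sets the condition on the object it is
	// handed once the func succeeds, and the cached original must stay clean.
	clusterCopy := cluster.DeepCopy()
	obj, err := migrated.DoUntilTrue(clusterCopy, func() (runtime.Object, error) {
		// ... perform the migration on clusterCopy here ...
		return clusterCopy, nil
	})
	if err != nil {
		return cluster, err
	}
	// No explicit migrated.True(...) call is needed, but the returned object
	// carries the condition only in memory; persist it explicitly.
	return clusters.Update(obj.(*v3.Cluster))
}
```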
The function comments on the secretmigrator assembler functions indicated they would never change the original object, but this was not true. This change ensures the objects are deepcopied within the functions, making them consistent with their documentation, and removes now-unnecessary deepcopies from the calling functions. It also corrects a badly copy-pasted comment on one assembler function. (cherry picked from commit 3f93f24567320acbcbe91737fc422d3a09cda0a2)
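As a sketch of the convention being enforced, with a hypothetical AssembleCredential assembler and a toy spec type rather than the real secretmigrator signatures:

```go
package example

import corev1 "k8s.io/api/core/v1"

// ClusterSpec is an illustrative stand-in for the spec type the real
// assemblers operate on; only DeepCopy and one credential field matter here.
type ClusterSpec struct {
	Credential string
}

func (s *ClusterSpec) DeepCopy() *ClusterSpec {
	out := *s
	return &out
}

// AssembleCredential is a hypothetical assembler following the convention
// described above: it never mutates the spec it is given.
func AssembleCredential(secret *corev1.Secret, spec *ClusterSpec) *ClusterSpec {
	if secret == nil {
		return spec
	}
	// Deep-copy before writing so the caller's object (often a shared
	// informer cache entry) is never modified, matching the doc comment.
	specCopy := spec.DeepCopy()
	specCopy.Credential = string(secret.Data["credential"])
	return specCopy
}
```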
Without this change, if a user (1) is created, (2) is added to a project or cluster, and then (3) makes a request in rapid succession (as in a CI case), it is highly likely that the PRTB or CRTB controller will not have finished adding the user's rolebindings to the project or cluster between steps 2 and 3.

The cause is the refactoring of the service account token generator in 37c7ee1, which added a wait to the service account creator to ensure the token was populated before returning. The wait loop would check the current secret object, then refresh the secret, then wait 2 seconds before checking again. The original secret object was certainly not populated, and the first refresh happened only a few nanoseconds later, so the first refreshed secret was almost certainly not populated either; it was not until the second refresh that the secret was populated and ready. With the loop period set to 2 seconds, the wait took a full 4 seconds.

This change reorders the wait loop to refresh the secret first, avoiding a wasted iteration, and reduces the step period to 2 milliseconds. This is enough time for the token to populate on the second or third retry, and makes it much more likely that the controller can finish setting up rolebindings before the user needs them.

(cherry picked from commit 700b99c)
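A minimal sketch of the reordered loop, assuming client-go's wait.ExponentialBackoff; getSecret and the exact backoff numbers are illustrative:

```go
package example

import (
	"time"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/util/wait"
)

func waitForToken(getSecret func() (*corev1.Secret, error)) (*corev1.Secret, error) {
	var secret *corev1.Secret
	backoff := wait.Backoff{
		Duration: 2 * time.Millisecond, // short first step: the token usually lands within a few ms
		Factor:   2,
		Steps:    10,
	}
	err := wait.ExponentialBackoff(backoff, func() (bool, error) {
		// Refresh first: the object we were handed was just created and is
		// certainly not populated yet, so checking it first wastes a step.
		s, err := getSecret()
		if apierrors.IsNotFound(err) {
			return false, nil // not visible yet; retry
		}
		if err != nil {
			return false, err
		}
		if len(s.Data[corev1.ServiceAccountTokenKey]) > 0 {
			secret = s
			return true, nil
		}
		return false, nil
	})
	return secret, err
}
```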
Previously, if the cattle token was rotated on an RKE2 downstream cluster, the new token would not propagate to the local cluster. This was not an issue for RKE clusters, because their service account token is updated through the clusterdeploy controller and kontainer-engine driver. RKE2 clusters only ever had the token set in the tunnel authorizer, and only upon cluster creation. This change updates the tunnel authorizer to compare the tokens and update the local one if necessary. The local secret is fully rotated rather than updated in place, to ensure the user context controller is triggered so that user controllers are automatically refreshed.
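A minimal sketch of the compare-and-rotate idea, with a hypothetical secret name and data key; the real tunnel authorizer wiring is more involved:

```go
package example

import (
	"bytes"
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func syncCattleToken(ctx context.Context, clientset kubernetes.Interface, namespace, name string, downstreamToken []byte) error {
	existing, err := clientset.CoreV1().Secrets(namespace).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if bytes.Equal(existing.Data["token"], downstreamToken) {
		return nil // tokens already match; nothing to do
	}
	// Delete and recreate instead of updating in place, so the rotation is
	// visible as a full secret replacement and downstream watchers (e.g. the
	// user context controller) are reliably triggered.
	if err := clientset.CoreV1().Secrets(namespace).Delete(ctx, name, metav1.DeleteOptions{}); err != nil {
		return err
	}
	_, err = clientset.CoreV1().Secrets(namespace).Create(ctx, &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
		Data:       map[string][]byte{"token": downstreamToken},
	}, metav1.CreateOptions{})
	return err
}
```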
Use the correct condition and Do* function for the new RKE field migration. We don't need to explicitly set the condition to True when DoUntilTrue is used. Fix the unit tests to check that the object doesn't change when the migration runs again, showing that the conditions aren't rewritten on every controller sync.
* Bump bci-base to 15.4 (15.3 is reaching EOL)
* Update Dockerfile.agent
* Move trigger refresh logic to token handler

  Before, a userattribute handler would check whether the ExtraByProvider field was null and trigger a refresh. ExtraByProvider contains data that is available only on tokens. When there was no token associated with the user corresponding to the userattribute, the handler would fire endlessly: the field stayed null, but each refresh updated the LastRefreshed field, which caused the userattribute handler to execute again.

  Now the ExtraByProvider check is done in a token handler. The token handler runs when a token exists, which is what makes populating the field possible, and the refresh process makes no changes to the token object. This avoids the nonstop firing of handlers. (See the sketch below.)

* Add tests
* Fix handler returns
* Fix cherry-pick test
(cherry picked from commit 9a016a8)
Signed-off-by: Guilherme Macedo <guilherme.macedo@suse.com>
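A minimal sketch of where the check now lives; the Token and UserAttribute types and the handler wiring below are illustrative stand-ins, not Rancher's actual types:

```go
package example

// Token and UserAttribute stand in for Rancher's types; only the fields the
// check reads are modeled here.
type Token struct {
	Name   string
	UserID string
}

type UserAttribute struct {
	ExtraByProvider map[string]map[string][]string
}

type handler struct {
	getUserAttribute func(user string) (*UserAttribute, error)
	triggerRefresh   func(user string) error
}

// onTokenChange runs on token events rather than userattribute events. A
// token's existence is what makes populating ExtraByProvider possible, and
// the refresh process does not touch the token object, so this handler does
// not retrigger itself the way the old userattribute handler did.
func (h *handler) onTokenChange(key string, token *Token) (*Token, error) {
	if token == nil {
		return nil, nil
	}
	attribs, err := h.getUserAttribute(token.UserID)
	if err != nil {
		return token, err
	}
	if attribs != nil && attribs.ExtraByProvider == nil {
		return token, h.triggerRefresh(token.UserID)
	}
	return token, nil
}
```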
Force-pushed from 7c0beab to 66de2da.
Co-authored-by: Guilherme Macedo <guilherme@gmacedo.com>
Signed-off-by: Guilherme Macedo <guilherme.macedo@suse.com>
Force-pushed from 20cc4d2 to b69dd48.
lgtm
package/Dockerfile (outdated diff):

```diff
@@ -16,7 +16,7 @@ WORKDIR /var/lib/rancher
 ARG ARCH=amd64
 ARG IMAGE_REPO=rancher
 ARG SYSTEM_CHART_DEFAULT_BRANCH=release-v2.6
-ARG CHART_DEFAULT_BRANCH=release-v2.6
+ARG CHART_DEFAULT_BRANCH=dev-v2.6
```
Revert
```diff
-ENV CATTLE_FLEET_MIN_VERSION=100.1.0+up0.4.0
-ENV CATTLE_RANCHER_WEBHOOK_MIN_VERSION=1.0.6+up0.2.7
+ENV CATTLE_FLEET_MIN_VERSION=100.2.0+up0.5.1
+ENV CATTLE_RANCHER_WEBHOOK_MIN_VERSION=1.0.7+up0.2.8
```
confirmed both of these
```diff
 github.com/rancher/gke-operator v1.1.4
 github.com/rancher/norman v0.0.0-20220627222520-b74009fac3ff
 github.com/rancher/rke v1.3.15
-github.com/rancher/wrangler v1.0.1-0.20220520195731-8eeded9bae2a
+github.com/rancher/wrangler v1.0.2
```
confirmed
Mentioned UI changes that don't have a corresponding release in ember and dashboard. Going to move forward for now.
No description provided.