
2.6 rancher updates #40243

Merged (31 commits, Jan 24, 2023)
Conversation

KevinJoiner
Contributor

No description provided.

@rmweir (Contributor) left a comment:

Can you remove 0402678 and 5a56a94, and their corresponding commits in the other two PRs? We do not want these changes in the final release.

MbolotSuse and others added 26 commits January 23, 2023 16:21
The "cattle" service account on a downstream cluster is the account that
Rancher uses to connect as an admin to the downstream cluster. Without
this change, if the cattle service account's token is deleted, the
cluster agent will regenerate it identically. This is a problem because
it makes rotation of the token nontrivial.

We can't craft the JWT ourselves or influence what claims are included
in it; that is done within kubernetes. The only way to change the
resulting JWT is to change the values kubernetes uses for claims. The
only option is to make the secret name unique[1]. All other claims come
from the service account, which we do not want to have to change in
order to rotate the token.

This change addresses the problem by using GenerateName when creating
the secret so that it will be unique every time. However, since the name
is no longer predictable, this causes problems when Rancher tries to
look up the token. We now need to look up the name of the secret from
the service account object. A further complication is that for
kubernetes 1.24, the secret name is no longer stored on the service
account, so now we set it explicitly. An extra benefit of this approach
is that we no longer create multiple tokens for service accounts on k8s
<1.24, since creating the token is skipped if it is found on the service
account.

This change refactors any code that was creating a service account token
to use the serviceaccounttoken.EnsureSecretForServiceAccount function in
order to be consistent everywhere. The function is updated to use a
backoff routine instead of an infinite loop to check the state of the
secret. It is flexible enough to use controller caches for callers with
access to that, and falls back to regular clients for remote callers
such as the agent.

See also the change that introduced this functionality in Rancher[2].

[1] https://github.com/kubernetes/kubernetes/blob/v1.25.2/pkg/serviceaccount/legacy.go#L39
[2] rancher#38113

(cherry picked from commit 37c7ee1)
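A minimal Go sketch of the approach described above, using simplified stand-in types. The annotation key, helper names, and the in-memory lookup are illustrative, not Rancher's or Kubernetes' actual API:

```go
package main

import "fmt"

// Hypothetical, trimmed-down stand-ins for the Kubernetes objects involved.
type Secret struct {
	Name         string
	GenerateName string
	Annotations  map[string]string
	Token        []byte
}

type ServiceAccount struct {
	Name        string
	Annotations map[string]string
}

// Illustrative annotation key, not Rancher's real one.
const secretNameAnnotation = "example.io/secret-name"

var nameCounter int

// createSecret mimics the API server turning GenerateName into a unique Name.
func createSecret(s *Secret) *Secret {
	nameCounter++
	s.Name = fmt.Sprintf("%s%05d", s.GenerateName, nameCounter)
	return s
}

// EnsureSecretForServiceAccount returns the token secret recorded on the
// service account if it still exists, and otherwise creates a new, uniquely
// named secret and records its name back on the service account.
func EnsureSecretForServiceAccount(sa *ServiceAccount, lookup func(string) *Secret) *Secret {
	if name, ok := sa.Annotations[secretNameAnnotation]; ok {
		if existing := lookup(name); existing != nil {
			// Creation is skipped when the secret is found via the service
			// account, so no duplicate tokens accumulate on k8s <1.24.
			return existing
		}
	}
	s := createSecret(&Secret{
		// A unique secret name yields a unique JWT after rotation.
		GenerateName: sa.Name + "-token-",
		Annotations:  map[string]string{"kubernetes.io/service-account.name": sa.Name},
	})
	if sa.Annotations == nil {
		sa.Annotations = map[string]string{}
	}
	sa.Annotations[secretNameAnnotation] = s.Name
	return s
}

func main() {
	store := map[string]*Secret{}
	lookup := func(n string) *Secret { return store[n] }
	sa := &ServiceAccount{Name: "cattle"}

	first := EnsureSecretForServiceAccount(sa, lookup)
	store[first.Name] = first
	second := EnsureSecretForServiceAccount(sa, lookup)
	fmt.Println(first.Name == second.Name) // same secret reused, no duplicate created
}
```

With the recorded name as the source of truth, rotation amounts to deleting the secret: the next Ensure call creates a fresh, uniquely named secret and therefore a different JWT.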
Make the setting of conditions consistent. Conditions do not need to be
set explicitly when DoUntilTrue is used, but the object returned from
DoUntilTrue does need to be explicitly updated. Use copies of the object
until the object is ready to be updated.

(cherry picked from commit dd1f35b2dcde242280680b99c162be0a2489549f)
(cherry picked from commit ce445013102a708e7a0dc9589162f2d20ae934ac)
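The DoUntilTrue semantics described above can be sketched with a hypothetical, self-contained stand-in for the norman-style condition helper; the shape is assumed for illustration and is not the real library API:

```go
package main

import "fmt"

type obj struct {
	conds map[string]bool
	data  string
}

// doUntilTrue runs fn only while the condition is false, sets the condition
// to true on success, and returns the (possibly modified) object. The caller
// still has to persist the returned object itself; the condition does not
// need to be set explicitly.
func doUntilTrue(o *obj, cond string, fn func(*obj) error) (*obj, error) {
	if o.conds[cond] {
		return o, nil // condition already true: no work, no status churn
	}
	copied := *o // work on a copy until the object is ready to be updated
	if err := fn(&copied); err != nil {
		return o, err
	}
	copied.conds = map[string]bool{cond: true} // condition set implicitly on success
	return &copied, nil
}

func main() {
	o := &obj{conds: map[string]bool{}}
	updated, _ := doUntilTrue(o, "Migrated", func(x *obj) error { x.data = "migrated"; return nil })
	fmt.Println(updated.conds["Migrated"], updated.data, o.data == "")

	// With the condition already true, a repeat sync returns the object
	// untouched, so handlers do not dirty it on every controller sync.
	again, _ := doUntilTrue(updated, "Migrated", func(x *obj) error { x.data = "changed"; return nil })
	fmt.Println(again == updated)
}
```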
The function comments on the secretmigrator assembler functions
indicated they would never change the original object, but this was not
true. This change deepcopies the objects within the functions to make
them consistent with their documentation. It also removes
now-unnecessary deepcopies from the calling functions and corrects a
badly copy-pasted comment on one assembler function.

(cherry picked from commit 3f93f24567320acbcbe91737fc422d3a09cda0a2)
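A toy illustration of the deepcopy fix; the types and names are invented, and a flat struct's value copy stands in for a generated DeepCopy:

```go
package main

import "fmt"

// Hypothetical, trimmed-down cluster spec used only for illustration.
type ClusterSpec struct {
	PrivateRegistryPassword string
}

// AssembleSpec documents that it never mutates the input. Copying inside the
// function makes the behavior match that documentation, so callers no longer
// need their own defensive deepcopies.
func AssembleSpec(spec ClusterSpec, secretValue string) ClusterSpec {
	copied := spec // value copy acts as the DeepCopy for this flat struct
	copied.PrivateRegistryPassword = secretValue
	return copied
}

func main() {
	orig := ClusterSpec{}
	assembled := AssembleSpec(orig, "s3cr3t")
	fmt.Println(orig.PrivateRegistryPassword == "") // original left untouched
	fmt.Println(assembled.PrivateRegistryPassword)
}
```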
Without this change, if in rapid succession (as in a CI case) a user
(1) is created, (2) is added to a project or cluster, and (3) makes a
request, it is highly likely that the PRTB or CRTB controller will not
have finished adding the user's rolebindings to the project or cluster
in between steps 2 and 3. The cause is the refactoring of the service
account token generator in 37c7ee1 which added a wait to the service
account creator to ensure the token was populated before returning. The
wait loop would check the current secret object, then refresh the
secret, then wait 2 seconds before checking that secret again. The
original secret object was certainly not populated, and the first
refresh was only a few nanoseconds afterward and so the first-time
refreshed secret was also almost certainly not populated, so it was not
until the second refresh that the secret was populated and ready. With
the loop timing set to 2 seconds, this meant the wait took a full 4
seconds.

This change reorders the wait loop to refresh the secret first thing, to
avoid an extra loop, and reduces the step period to 2 milliseconds. This
is enough time for the token to populate on the 2nd or 3rd retry and
makes it much more likely the controller can finish setting up
rolebindings before the user needs them.

(cherry picked from commit 700b99c)
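A runnable sketch of the reordered wait loop; the step and retry limits echo the description above but are illustrative, not Rancher's exact backoff settings:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForToken polls until the secret's token is populated. Refreshing the
// secret before the first check (rather than checking the stale object that
// was just created) saves one full backoff step, and a millisecond-scale
// step keeps the total wait short.
func waitForToken(refresh func() ([]byte, error)) ([]byte, error) {
	const (
		step     = 2 * time.Millisecond
		maxTries = 100
	)
	for i := 0; i < maxTries; i++ {
		token, err := refresh() // refresh first thing each iteration
		if err != nil {
			return nil, err
		}
		if len(token) > 0 {
			return token, nil
		}
		time.Sleep(step)
	}
	return nil, errors.New("timed out waiting for service account token")
}

func main() {
	tries := 0
	token, err := waitForToken(func() ([]byte, error) {
		tries++
		if tries < 3 {
			return nil, nil // token not populated yet
		}
		return []byte("jwt"), nil
	})
	fmt.Println(string(token), err, tries)
}
```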
Previously, if the cattle token was rotated on an RKE2 downstream, the
token would not propagate to the local cluster. This was not an issue
for RKE clusters because the service account token is updated through
the clusterdeploy controller and kontainer-engine driver. RKE2 clusters
only ever had the token set in the tunnel authorizer, and only upon
cluster creation. This change updates the tunnel authorizer to compare
and update the token if necessary. The local secret is fully rotated
rather than updated in-place in order to ensure the user context
controller is triggered so that user controllers are automatically
refreshed.
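A simplified sketch of the compare-and-rotate behavior; the in-memory store and names are invented for illustration, and a real implementation would go through the Kubernetes API:

```go
package main

import "fmt"

// secretStore is a stand-in for the local cluster's secrets.
type secretStore struct {
	secrets map[string][]byte
}

// syncToken compares the token presented by the downstream cluster with the
// copy stored in the local cluster. On mismatch the local secret is deleted
// and recreated (a full rotation) rather than patched in place, so
// watch-based controllers such as the user context controller see a
// delete/create pair and refresh themselves.
func (s *secretStore) syncToken(name string, downstream []byte) bool {
	if string(s.secrets[name]) == string(downstream) {
		return false // tokens already match: nothing to do
	}
	delete(s.secrets, name)      // delete the stale local secret
	s.secrets[name] = downstream // recreate it with the rotated token
	return true
}

func main() {
	store := &secretStore{secrets: map[string][]byte{"cattle-token": []byte("old")}}
	fmt.Println(store.syncToken("cattle-token", []byte("new"))) // rotated
	fmt.Println(store.syncToken("cattle-token", []byte("new"))) // already in sync
}
```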
Use the correct condition and Do* function for the new RKE field
migration. We don't need to explicitly set the condition to True when
DoUntilTrue is used.

Fix the unit tests to check that the object doesn't change when the
migration is recalled, to show that the conditions aren't changed every
controller sync.
* Bump bci-base to 15.4 - 15.3 is reaching EOL

* Update Dockerfile.agent
* Move trigger refresh logic to token handler

Before, a userattribute handler would check if the ExtraByProvider
field was null and trigger a refresh. ExtraByProvider contains
data that is available only on tokens. When there was no token
associated with the user corresponding to the userattribute, the
handler would fire endlessly. This happened because the field
continued to be null but the refresh process would update the
LastRefreshed field which would cause the userattribute handler
to execute again. Now, the ExtraByProvider check is done in a
token handler. The token handler knows that the token's existence
may make populating the field possible, and the refresh process
makes no changes to the token object. This avoids nonstop firing
of handlers.

* Add tests

* Fix handler returns

* Fix cherry-pick test
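The handler move can be sketched with hypothetical, simplified objects; the field and function names are illustrative, not Rancher's actual types:

```go
package main

import "fmt"

// Hypothetical, trimmed-down objects for illustration only.
type UserAttribute struct {
	ExtraByProvider map[string][]string
	NeedsRefresh    bool
}

type Token struct{ UserID string }

// onTokenChanged performs the ExtraByProvider nil-check in the token handler:
// a token's existence is what makes populating the field possible, and the
// refresh process does not modify token objects, so this handler cannot
// re-trigger itself the way the old userattribute handler did.
func onTokenChanged(t *Token, attrs map[string]*UserAttribute) {
	ua := attrs[t.UserID]
	if ua != nil && ua.ExtraByProvider == nil {
		ua.NeedsRefresh = true // trigger a refresh only while the field is missing
	}
}

func main() {
	attrs := map[string]*UserAttribute{"u-abc": {}}
	onTokenChanged(&Token{UserID: "u-abc"}, attrs)
	fmt.Println(attrs["u-abc"].NeedsRefresh)

	// Once the refresh populates the field, subsequent token syncs are no-ops.
	attrs["u-abc"].ExtraByProvider = map[string][]string{"github": {"principalid"}}
	attrs["u-abc"].NeedsRefresh = false
	onTokenChanged(&Token{UserID: "u-abc"}, attrs)
	fmt.Println(attrs["u-abc"].NeedsRefresh)
}
```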
snasovich and others added 2 commits January 23, 2023 16:22
Signed-off-by: Guilherme Macedo <guilherme.macedo@suse.com>
Review thread on package/Dockerfile (outdated, resolved)
KevinJoiner and others added 2 commits January 23, 2023 20:14
Co-authored-by: Guilherme Macedo <guilherme@gmacedo.com>
Signed-off-by: Guilherme Macedo <guilherme.macedo@suse.com>
@cbron (Contributor) left a comment:
lgtm

@@ -16,7 +16,7 @@ WORKDIR /var/lib/rancher
 ARG ARCH=amd64
 ARG IMAGE_REPO=rancher
 ARG SYSTEM_CHART_DEFAULT_BRANCH=release-v2.6
-ARG CHART_DEFAULT_BRANCH=release-v2.6
+ARG CHART_DEFAULT_BRANCH=dev-v2.6
Review comment:
Revert

-ENV CATTLE_FLEET_MIN_VERSION=100.1.0+up0.4.0
-ENV CATTLE_RANCHER_WEBHOOK_MIN_VERSION=1.0.6+up0.2.7
+ENV CATTLE_FLEET_MIN_VERSION=100.2.0+up0.5.1
+ENV CATTLE_RANCHER_WEBHOOK_MIN_VERSION=1.0.7+up0.2.8
Review comment:

confirmed both of these

 github.com/rancher/gke-operator v1.1.4
 github.com/rancher/norman v0.0.0-20220627222520-b74009fac3ff
 github.com/rancher/rke v1.3.15
-github.com/rancher/wrangler v1.0.1-0.20220520195731-8eeded9bae2a
+github.com/rancher/wrangler v1.0.2
Review comment:

confirmed

@rmweir (Contributor) commented Jan 24, 2023:

Mentioned UI changes that don't have a corresponding release in ember and dashboard. Going to move forward for now.

@KevinJoiner KevinJoiner merged commit c1972e2 into rancher:release/v2.6.10 Jan 24, 2023