Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

azure: don't use managed identity on ARO #4843

Merged

Conversation

m1kola
Copy link
Member

@m1kola m1kola commented Apr 12, 2021

  • Adds a new aro build tag which is going to be used by ARO when vendoring the installer
  • Disables managed identity for ARO

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 12, 2021
@openshift-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign wking after the PR has been reviewed.
You can assign the PR to them by writing /assign @wking in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@staebler
Copy link
Contributor

This comes in timely for us, @m1kola. We need this same logic for Azure Stack, which does not have support for user-provided managed identities.

@m1kola
Copy link
Member Author

m1kola commented Apr 12, 2021

@staebler Is there anything I should take into account while working on this PR? Maybe you want me to change something in it (like naming)?

Or should I go ahead and do what we need for ARO and later you refactor the codebase to accommodate Azure Stack?

@staebler
Copy link
Contributor

@staebler Is there anything I should take into account while working on this PR? Maybe you want me to change something in it (like naming)?

Or should I go ahead and do what we need for ARO and later you refactor the codebase to accommodate Azure Stack?

I haven't dug into the PR yet to offer any opinion on naming and such. My inclination is to have you do what you need to ARO and we based the Azure Stack work around that, making any tweaks that we need.

@patrickdillon
Copy link
Contributor

/assign

@m1kola
Copy link
Member Author

m1kola commented Apr 12, 2021

My inclination is to have you do what you need to ARO and we based the Azure Stack work around that, making any tweaks that we need.

Sounds good to me.


Will look into CI failures next (tomorrow hopefully). Doesn't look like they are related to PR.

/retest

@patrickdillon
Copy link
Contributor

/retest
/test e2e-azure
/test e2e-azure-upi
/test e2e-azure-shared-vpc

Copy link
Contributor

@patrickdillon patrickdillon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took an initial look and left some general comments.

As @staebler already mentioned, we need the logic for providing the clientID & secret in the cloud provider config for Azure Stack Hub support, so this PR is very helpful.

As my comment says, we already have logic for creating the secret and secret reader, so we should try to move the logic in this PR out of the cloud provider.

Have you considered any of the changes that would be required from the Terraform code, for example where we create user-assigned roles: https://github.com/openshift/installer/blob/master/data/data/azure/main.tf#L151-L170?

@@ -0,0 +1,7 @@
// +build aro
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: aro.go seems a better filename than platform_aro.go IMHO

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would break these changes in pkg/types into a separate commit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: aro.go seems a better filename than platform_aro.go IMHO

platform_aro.go is going to be built only with build tag aro and it updates the value of the variable defined in platform.go. I think platform_aro.go makes the link less cryptic.

Also it mimics the way OKD is planning to gate their code in this PR (see pkg/types/installconfig_okd.go).

I would break these changes in pkg/types into a separate commit.

+1. Will do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to split the build tag bit into a separate PR: #4874

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarification: I'm happy for the build tag to be merged as part of this PR. But if there is still feedback related to managed identity - I suggest that we merge the build tag as part of #4874 as I have few more things that depend on it.

pkg/asset/manifests/cloudproviderconfig.go Outdated Show resolved Hide resolved
Copy link
Member Author

@m1kola m1kola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@patrickdillon thanks for the review. I'm looking into updating the PR. I left some responses to your comments below.

Have you considered any of the changes that would be required from the Terraform code, for example where we create user-assigned roles: https://github.com/openshift/installer/blob/master/data/data/azure/main.tf#L151-L170?

On ARO we do not use terraform code, so my plan is to introduce the changes we need for ARO and try not to break rest of the installer in the process :)

@@ -0,0 +1,7 @@
// +build aro
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: aro.go seems a better filename than platform_aro.go IMHO

platform_aro.go is going to be built only with build tag aro and it updates the value of the variable defined in platform.go. I think platform_aro.go makes the link less cryptic.

Also it mimics the way OKD is planning to gate their code in this PR (see pkg/types/installconfig_okd.go).

I would break these changes in pkg/types into a separate commit.

+1. Will do that.

pkg/asset/manifests/cloudproviderconfig.go Outdated Show resolved Hide resolved
@m1kola m1kola force-pushed the aro_disable_managed_identity branch 3 times, most recently from 32aedef to d76ce45 Compare April 19, 2021 08:45
@m1kola m1kola marked this pull request as ready for review April 19, 2021 08:46
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 19, 2021
Copy link
Contributor

@patrickdillon patrickdillon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am on PTO today so just took a quick look. I can try to help with this more tomrrow or later this week.

pkg/asset/manifests/openshift.go Outdated Show resolved Hide resolved
pkg/asset/manifests/openshift.go Outdated Show resolved Hide resolved
@m1kola m1kola force-pushed the aro_disable_managed_identity branch 2 times, most recently from 2f474cb to 90b778a Compare April 20, 2021 14:02
@m1kola
Copy link
Member Author

m1kola commented Apr 20, 2021

@patrickdillon @staebler please take another look.

I updated naming and replaced yaml mrashaling with templates as requested. I left generation in pkg/asset/manifests/openshift.go for now, but I'm happy to move it back to pkg/asset/manifests/cloudproviderconfig.go as in the original version of this PR, if you think it is more appropriate place for it.

@@ -69,6 +69,9 @@ func (o *Openshift) Dependencies() []asset.Asset {
&openshift.PrivateClusterOutbound{},
&openshift.BaremetalConfig{},
new(rhcos.Image),
&openshift.AzureCloudProviderSecret{},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@patrickdillon an issue with the appraoch of using templates is this: we have to declare templates as dependencies and store will call Generate() on them even if we don't need them (e.g. non-ARO builds).

What do you think? Is it something we should worry about?

We can probably use build tags and create two extra files in this package:

  1. One with // +build aro where we include them as dependencies
  2. One with // +build !aro where we do not include them

But I don't want to rely on build tags too much in this case:

  1. It will make the code harder to follow
  2. It will probably be unnecessary once you implement Azure Stack Hub support

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going with the template approach, I was thinking to just add them to the existing templates. Here's a simplified example: master...patrickdillon:azure-merge-cloud-config We would need to add a boolean field to AzureCredsSecretData to switch this behavior on or off depending on ASH/ARO.

Also, not sure if it is valid to have the values of the cloud provider config secret be base64 encoded. I assume for the existing code, the cloud credential operator knows to base64 decode the values, but I assume that is up to the client. I don't see anything in the azure docs about this. If we don't know, we can test and see.

I would like to get @staebler's feedback about whether this is the right direction with templates. To summarize the context: In the case of ARO & ASH we need to create a secret (with the creds) , role, and role binding. I see a few options:

  1. Use the template approach, like in my example above
  2. Generate all 3 resources in the cloud provider config, which was the original state of this PR
  3. Some combination of the above
  4. Generate the resource manifests somewhere else entirely

I'm pretty neutral on these options with a slight leaning toward option 1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@patrickdillon thanks for the feedback. Replied to some of the comments.

If we're going with the template approach, I was thinking to just add them to the existing templates. Here's a simplified example: master...patrickdillon:azure-merge-cloud-config We would need to add a boolean field to AzureCredsSecretData to switch this behavior on or off depending on ASH/ARO.

I was thinking about reusing data/data/manifests/openshift/cloud-creds-secret.yaml.template, but it is already overloaded with logic and I didn't want to make it worse. Happy to do this if you think it is a better appraoch.

Also, not sure if it is valid to have the values of the cloud provider config secret be base64 encoded. I assume for the existing code, the cloud credential operator knows to base64 decode the values, but I assume that is up to the client. I don't see anything in the azure docs about this. If we don't know, we can test and see.

We must encode values into base64 manually here because:

  • We add them into Secret's data:

    The values for all keys in the data field have to be base64-encoded strings.

  • We use templates. Normaly when marshaling into json/yaml from a go struct it will be done implicitly due to the fact that data is defined as follows:

    Data map[string][]byte `json:"data,omitempty" protobuf:"bytes,2,rep,name=data"`

    And []byte in go marshaled as base64-encoded string.

    Array and slice values encode as JSON arrays, except that []byte encodes as a base64-encoded string, and a nil slice encodes as the null JSON value.

On the receiving end, the cloud provider doesn't have to decode the value explicilty here: go will do it behind the scenes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about reusing data/data/manifests/openshift/cloud-creds-secret.yaml.template, but it is already overloaded with logic and I didn't want to make it worse. Happy to do this if you think it is a better appraoch.

As I look at this more closely I do think this current approach is on the right track. I think the approach should very closely follow the pattern here:
https://github.com/openshift/installer/blob/master/pkg/asset/manifests/openshift.go#L253-L259

we have to declare templates as dependencies and store will call Generate() on them even if we don't need them (e.g. non-ARO builds).

So yes this is fine to have a dependency generated even when it is not used. The above example is only for private clusters, but is generated for all clusters. But you should consolidate all of these Azure Cloud Provider dependencies into a single asset, which in the above example would be equivalent to private-cluster-outbound-service.go. Note that the one asset could have many templates; so you could either have a separate template for each resource or combine the three resources in a single template.

If this is unclear or your have questions let me know. I think this is the right drection!

Also, thanks for the detailed, awesome explanation of why secret values are b64 encoded. I didn't know that!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@patrickdillon I left 3 template files, but consolidated them into one assset. Now the naming is little bit tricky: I called it AzureCloudProviderSecret, but it contains the role, the role binding as well as the secret which might be a little bit confusing. Suggestions are welcome :)

I'm also happy to put all yamls into one file, if you like. Having in separate templates/manifests is a tiny bit better for visibility, IMO. But I have no strong prefrences.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is looking good. Have you tested this? I hope to test it soon with Azure Stack.

I called it AzureCloudProviderSecret... Suggestions are welcome :)

Maybe AzureMergedCloudConfig?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@patrickdillon

Have you tested this?

I did openshift-install create manifests with aro build tag and manifests look good. I alos did it with a binary compiled without aro build tag and relevant manifests were not present in the output directory, as expected.

And I'm pretty confident that it will work with ARO because I tested it with older version of this PR (we did not change outputs significantly in recent changes). I'll ARO integration tomorrow, but it should not block this PR.

Maybe AzureMergedCloudConfig?

I was wonder if we can better reflect the fact that it is not only secret, but role and role binding as well.
Let's probably keep it as AzureCloudProviderSecret as it consists of resource name azure-cloud-provider and resource type secret. Resource name is forced on us by cloud provider.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also happy to put all yamls into one file, if you like. Having in separate templates/manifests is a tiny bit better for visibility, IMO. But I have no strong prefrences.

Leave it as one file for each resource. The CVO has (or had) issues with multiple resources in a single manifest when there are problems applying some of the resources.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We must encode values into base64 manually here because:

We add them into Secret's data:

Use stringData instead of data. Then you don't need to do the base-64 encoding. The API server will do the base-64 encoding for you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know that stringData merges into data.

@staebler I updated the PR to use stringData. Note that everything else in pkg/asset/manifests/openshift.go explicitly encodes into base64. I'm happy to revert back to explicit encoding for consistency, if you wish.

* Adds a new aro build tag which is going to be used by ARO when vendoring the installer
* Adds IsARO() method to gate ARO-only modifications
@m1kola m1kola force-pushed the aro_disable_managed_identity branch from 90b778a to 888a21a Compare April 22, 2021 14:06
@m1kola m1kola force-pushed the aro_disable_managed_identity branch from 888a21a to 8b2f210 Compare April 22, 2021 15:06
m1kola pushed a commit to jewzaam/installer-aro that referenced this pull request May 17, 2021
At the moment OCP on Azure uses MSI for kubelets and controllers and one or
more service principals for operators.  For now on ARO, simplify to all
components using the user-provided SP.  Later, we'll reinstate a separate
managed identity at least for worker kubelets.

PR: openshift#4843
m1kola pushed a commit to jewzaam/installer-aro that referenced this pull request May 17, 2021
At the moment OCP on Azure uses MSI for kubelets and controllers and one or
more service principals for operators.  For now on ARO, simplify to all
components using the user-provided SP.  Later, we'll reinstate a separate
managed identity at least for worker kubelets.

PR: openshift#4843
m1kola pushed a commit to jewzaam/installer-aro that referenced this pull request May 17, 2021
At the moment OCP on Azure uses MSI for kubelets and controllers and one or
more service principals for operators.  For now on ARO, simplify to all
components using the user-provided SP.  Later, we'll reinstate a separate
managed identity at least for worker kubelets.

PR: github.com/openshift/pull/4843
m1kola pushed a commit to jewzaam/installer-aro that referenced this pull request May 17, 2021
At the moment OCP on Azure uses MSI for kubelets and controllers and one or
more service principals for operators.  For now on ARO, simplify to all
components using the user-provided SP.  Later, we'll reinstate a separate
managed identity at least for worker kubelets.

PR: openshift#4843
m1kola pushed a commit to jewzaam/installer-aro that referenced this pull request May 21, 2021
At the moment OCP on Azure uses MSI for kubelets and controllers and one or
more service principals for operators.  For now on ARO, simplify to all
components using the user-provided SP.  Later, we'll reinstate a separate
managed identity at least for worker kubelets.

PR: openshift#4843
@patrickdillon
Copy link
Contributor

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 17, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 17, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: patrickdillon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 17, 2021
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

10 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jun 20, 2021

@m1kola: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-azure-upi 1402758 link /test e2e-azure-upi

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 63ac266 into openshift:master Jun 20, 2021
@m1kola m1kola deleted the aro_disable_managed_identity branch June 21, 2021 09:29
m1kola pushed a commit to m1kola/installer that referenced this pull request Aug 3, 2021
At the moment OCP on Azure uses MSI for kubelets and controllers and one or
more service principals for operators.  For now on ARO, simplify to all
components using the user-provided SP.  Later, we'll reinstate a separate
managed identity at least for worker kubelets.

PR: openshift#4843
m1kola pushed a commit to jewzaam/installer-aro that referenced this pull request Aug 25, 2021
At the moment OCP on Azure uses MSI for kubelets and controllers and one or
more service principals for operators.  For now on ARO, simplify to all
components using the user-provided SP.  Later, we'll reinstate a separate
managed identity at least for worker kubelets.

PR: openshift#4843
m1kola pushed a commit to jewzaam/installer-aro that referenced this pull request Sep 6, 2021
At the moment OCP on Azure uses MSI for kubelets and controllers and one or
more service principals for operators.  For now on ARO, simplify to all
components using the user-provided SP.  Later, we'll reinstate a separate
managed identity at least for worker kubelets.

PR: openshift#4843
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants