Bug 2012173: Azure Stack: Add UPI Instructions for internal CA #5573

Merged

patrickdillon
Contributor

Many Azure Stack environments use internal CAs. In these cases
special steps are needed for a UPI install.

```sh
export BOOTSTRAP_URL=$(az storage blob url --account-name "${INFRA_ID}sa" --account-key "$ACCOUNT_KEY" -c "files" -n "bootstrap.ign" -o tsv)
export BOOTSTRAP_IGNITION=$(jq -rcnM --arg v "3.2.0" --arg url "$BOOTSTRAP_URL" '{ignition:{version:$v,config:{replace:{source:$url}}}}' | base64 | tr -d '\n')
```

### Create the Bootstrap Ignition Shim with an Internal Certificate Authority (Optional)

@patrickdillon Is this step specific to a UPI install? Given that users will not have to create ignition config files for the cluster, I wanted to verify creating the bootstrap ignition shim is not required for IPI.

Contributor Author

This is indeed specific to UPI. For IPI, we do this in the installer's code.


nastacio commented Jan 24, 2022

@patrickdillon The Ignition spec says the tls.certificateAuthorities[].source element is a URL. I trust that the example below works, but I am wondering whether there is a chance this is not supported.

source (string): the URL of the contents to append. Supported schemes are http, https, tftp, s3, gs, and data. When using http, it is advisable to use the verification option to ensure the contents haven’t been modified.

Contributor Author

It is valid and supported. The example uses a data URL: https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URIs
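
For readers following along, here is a minimal sketch of how a CA file could be encoded as a data URL and attached to the shim. The helper names `ca_data_url` and `bootstrap_shim` and the file name `CA.pem` are assumptions made for illustration, not taken from the PR; the `security.tls.certificateAuthorities` path is the one described in the Ignition v3 spec quoted above, and `bootstrap_shim` needs `jq`, like the existing shim command.

```sh
# Hypothetical helper: print the contents of a PEM file as a base64 data URL.
ca_data_url() {
    printf 'data:text/plain;charset=utf-8;base64,%s' "$(base64 < "$1" | tr -d '\n')"
}

# Hypothetical helper: print shim JSON that trusts the CA before fetching
# the real bootstrap config. $1 is the bootstrap.ign URL, $2 the CA data URL.
bootstrap_shim() {
    jq -rcnM --arg v "3.2.0" --arg url "$1" --arg cert "$2" \
        '{ignition:{version:$v,security:{tls:{certificateAuthorities:[{source:$cert}]}},config:{replace:{source:$url}}}}'
}

# Possible usage (CA.pem is an assumed file name):
#   export BOOTSTRAP_IGNITION=$(bootstrap_shim "$BOOTSTRAP_URL" "$(ca_data_url CA.pem)" | base64 | tr -d '\n')
```

Because the certificate travels inside the shim itself, the bootstrap node does not need to reach any extra endpoint to learn the CA before it downloads `bootstrap.ign`.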

@patrickdillon
Contributor Author

/close

@openshift-ci
Contributor

openshift-ci bot commented Jan 24, 2022

@patrickdillon: Closed this PR.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot closed this Jan 24, 2022
@patrickdillon
Contributor Author

Whoops. Closed the wrong PR.

@patrickdillon patrickdillon reopened this Jan 25, 2022
@openshift-ci
Contributor

openshift-ci bot commented Jan 26, 2022

@patrickdillon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-workers-rhel8 | 199bfbc | link | false | `/test e2e-aws-workers-rhel8` |
| ci/prow/e2e-aws-single-node | 199bfbc | link | false | `/test e2e-aws-single-node` |
| ci/prow/e2e-alibaba | 199bfbc | link | true | `/test e2e-alibaba` |

@nastacio

nastacio commented Jan 26, 2022

@patrickdillon, I am following these instructions (thanks, by the way, they got me further than the product guide), using a private CA and installing OCP 4.9.15.

I set the name: user-ca-bundle field in "${INSTALL_DIR}/manifests/cluster-proxy-01-config.yaml", as instructed, but I am seeing an x509 error in the csi-driver container of pod azure-disk-csi-driver-node-xxxxx, which prevents the storage operator from starting completely.

(currently double-checking all certs, but thinking if I had them wrong I would not even get past the stage of creating master/worker nodes)

Update on Jan/27: According to this technote, the cluster-wide proxy settings are not applied to user created application pods, and I guess the Azure CSI driver pods count as "user created". Unfortunately, the technote only talks about setting the URL of the proxy, but does not say anything about setting the user CA truststore for the proxy (if it is at all mapped to an environment variable.)

 oc get Proxy cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  creationTimestamp: "2022-01-26T03:10:15Z"
  generation: 1
  name: cluster
  resourceVersion: "613"
  uid: e94f5c62-62f0-48af-bb73-e31fcf651a2b
spec:
  trustedCA:
    name: user-ca-bundle
status: {}

F0126 17:58:32.887515       1 azuredisk.go:192] failed to get Azure Cloud Provider, error: Get "https://management.mtcazs.wwtatc.com/metadata/endpoints?api-version=1.0": x509: certificate signed by unknown authority

Full stretch of logs from that container:

I0126 17:58:32.844757       1 main.go:101] set up prometheus server on [::]:29604
I0126 17:58:32.845141       1 azuredisk.go:189]
DRIVER INFORMATION:
-------------------
Build Date: "2021-12-15T01:32:34Z"
Compiler: gc
Driver Name: disk.csi.azure.com
Driver Version: v1.5.0
Git Commit: 6a6d0e33d844794cffbaf0874ca8c17928775673
Go Version: go1.16.6
Platform: linux/amd64
Topology Key: topology.disk.csi.azure.com/zone

Streaming logs below:
I0126 17:58:32.847726       1 azure.go:62] reading cloud config from secret
E0126 17:58:32.856348       1 azure_config.go:45] Failed to get cloud-config from secret: failed to get secret kube-system/azure-cloud-provider: secrets "azure-cloud-provider" is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:azure-disk-csi-driver-node-sa" cannot get resource "secrets" in API group "" in the namespace "kube-system"
I0126 17:58:32.856369       1 azure.go:65] InitializeCloudFromSecret failed with error: InitializeCloudFromSecret: failed to get cloud config from secret kube-system/azure-cloud-provider: failed to get secret kube-system/azure-cloud-provider: secrets "azure-cloud-provider" is forbidden: User "system:serviceaccount:openshift-cluster-csi-drivers:azure-disk-csi-driver-node-sa" cannot get resource "secrets" in API group "" in the namespace "kube-system"
I0126 17:58:32.856374       1 azure.go:70] could not read cloud config from secret
I0126 17:58:32.856380       1 azure.go:73] AZURE_CREDENTIAL_FILE env var set as /etc/kubernetes/cloud.conf
I0126 17:58:32.856398       1 azure.go:92] read cloud config from file: /etc/kubernetes/cloud.conf successfully
F0126 17:58:32.887515       1 azuredisk.go:192] failed to get Azure Cloud Provider, error: Get "https://management.mtcazs.wwtatc.com/metadata/endpoints?api-version=1.0": x509: certificate signed by unknown authority
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0xc000010001, 0xc000166200, 0xd8, 0x1f5)
	/go/src/github.com/openshift/azure-disk-csi-driver/vendor/k8s.io/klog/v2/klog.go:1021 +0xb9
k8s.io/klog/v2.(*loggingT).output(0x2bca440, 0xc000000003, 0x0, 0x0, 0xc00049e230, 0x23ac9ef, 0xc, 0xc0, 0x0)
	/go/src/github.com/openshift/azure-disk-csi-driver/vendor/k8s.io/klog/v2/klog.go:970 +0x191
k8s.io/klog/v2.(*loggingT).printf(0x2bca440, 0xc000000003, 0x0, 0x0, 0x0, 0x0, 0x1d619c5, 0x2d, 0xc0006c8450, 0x1, ...)
	/go/src/github.com/openshift/azure-disk-csi-driver/vendor/k8s.io/klog/v2/klog.go:751 +0x191
k8s.io/klog/v2.Fatalf(...)
	/go/src/github.com/openshift/azure-disk-csi-driver/vendor/k8s.io/klog/v2/klog.go:1509
sigs.k8s.io/azuredisk-csi-driver/pkg/azuredisk.(*Driver).Run(0xc0002c6700, 0x7ffdb0dc038b, 0x14, 0x0, 0x0, 0x1f80001)
	/go/src/github.com/openshift/azure-disk-csi-driver/pkg/azuredisk/azuredisk.go:192 +0x366
main.handle()
	/go/src/github.com/openshift/azure-disk-csi-driver/pkg/azurediskplugin/main.go:87 +0x130
main.main()
	/go/src/github.com/openshift/azure-disk-csi-driver/pkg/azurediskplugin/main.go:69 +0xae

goroutine 6 [chan receive]:
k8s.io/klog/v2.(*loggingT).flushDaemon(0x2bca440)
	/go/src/github.com/openshift/azure-disk-csi-driver/vendor/k8s.io/klog/v2/klog.go:1164 +0x8b
created by k8s.io/klog/v2.init.0
	/go/src/github.com/openshift/azure-disk-csi-driver/vendor/k8s.io/klog/v2/klog.go:418 +0xdf
	...

@nastacio

nastacio commented Jan 26, 2022

Something else I noticed is that, of the 5 secrets manually created and added to the "openshift/manifests" directory before invoking openshift-install create ignition-configs --dir "${INSTALL_DIR}", only 2 of them seem to have been applied to the cluster. I had to apply them manually after the master/worker nodes came up. See output below as evidence that some of them were not applied.

Note that I read through https://docs.openshift.com/container-platform/4.9/installing/installing_azure/manually-creating-iam-azure.html and did not have credentialsMode: Manual in install-config.yaml.

oc apply -f manifests/credentials-secret.yaml
Warning: resource secrets/azure-cloud-credentials is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by oc apply. oc apply should only be used on resources created declaratively by either oc create --save-config or oc apply. The missing annotation will be patched automatically.
secret/azure-cloud-credentials configured
secret/installer-cloud-credentials configured
secret/cloud-credentials created
secret/azure-cloud-credentials created
secret/azure-disk-credentials created

@patrickdillon
Contributor Author

@nastacio thank you for your thoughtful comments. The azure-disk-csi-driver issue was fixed in 4.9.17. If you switch to the newest release image and repeat the same install, you should not hit this issue.
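
A small guard along these lines could catch an affected release before an install starts. This is only a sketch: `version_at_least` is a hypothetical helper, the version string is assumed to be supplied by the caller, and `sort -V` is a GNU coreutils extension that may not exist on every platform.

```sh
# Hypothetical helper: succeed when version $1 is greater than or equal to
# version $2, comparing dotted version numbers with sort -V.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Possible usage, with OCP_VERSION set by the caller (assumed variable name):
#   version_at_least "$OCP_VERSION" 4.9.17 || echo "release predates the CSI driver fix" >&2
```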

@patrickdillon
Contributor Author

> Something else I noticed is that, of the 5 secrets manually created and added to the "openshift/manifests" directory before invoking openshift-install create ignition-configs --dir "${INSTALL_DIR}", only 2 of them seem to have been applied to the cluster. I had to apply them manually after the master/worker nodes came up. See output below as evidence that some of them were not applied.

Any manifests that are in the dir should be applied "dumbly" on the bootstrap node. You would need to check the logs there to see if they were actually applied. The first bit of troubleshooting I would do is to make sure the secrets you are creating are unique (for example, you might be creating multiple secrets with the same name, ask me how I know). Are you scripting this step? Hopefully we will get Azure support in ccoctl to make this step easier for users.

> Note that I read through https://docs.openshift.com/container-platform/4.9/installing/installing_azure/manually-creating-iam-azure.html and did not have credentialsMode: Manual in install-config.yaml.

This should not make a difference/cause any problems. I believe we are choosing a sane default when Azure Stack is specified. I will admit we are working to improve the credentials usability.

@nastacio

nastacio commented Jan 27, 2022

> The first bit of troubleshooting I would do is to actually make sure the secrets you are creating are unique (for example you might be creating multiple secrets with the same name, ask me how I know). Are you scripting this step?

Yes:

function create_secrets() {
    cat << EOF > "${INSTALL_DIR}/manifests/credentials-secret.yaml"
---
apiVersion: v1
kind: Secret
metadata:
  name: azure-cloud-credentials
  namespace: openshift-cloud-controller-manager
stringData:
  azure_subscription_id: ${ASH_SUBSCRIPTION_ID}
  azure_client_id: ${ASH_USER}
  azure_client_secret: ${ASH_PASSWORD}
  azure_tenant_id: ${ASH_TENANT}
  azure_resource_prefix: ${INFRA_ID}
  azure_resourcegroup: ${RESOURCE_GROUP}
  azure_region: ${AZURE_REGION}
---
apiVersion: v1
kind: Secret
metadata:
  name: installer-cloud-credentials
  namespace: openshift-image-registry
stringData:
  azure_subscription_id: ${ASH_SUBSCRIPTION_ID}
  azure_client_id: ${ASH_USER}
  azure_client_secret: ${ASH_PASSWORD}
  azure_tenant_id: ${ASH_TENANT}
  azure_resource_prefix: ${INFRA_ID}
  azure_resourcegroup: ${RESOURCE_GROUP}
  azure_region: ${AZURE_REGION}
---
apiVersion: v1
kind: Secret
metadata:
  name: cloud-credentials
  namespace: openshift-ingress-operator
stringData:
  azure_subscription_id: ${ASH_SUBSCRIPTION_ID}
  azure_client_id: ${ASH_USER}
  azure_client_secret: ${ASH_PASSWORD}
  azure_tenant_id: ${ASH_TENANT}
  azure_resource_prefix: ${INFRA_ID}
  azure_resourcegroup: ${RESOURCE_GROUP}
  azure_region: ${AZURE_REGION}
---
apiVersion: v1
kind: Secret
metadata:
  name: azure-cloud-credentials
  namespace: openshift-machine-api
stringData:
  azure_subscription_id: ${ASH_SUBSCRIPTION_ID}
  azure_client_id: ${ASH_USER}
  azure_client_secret: ${ASH_PASSWORD}
  azure_tenant_id: ${ASH_TENANT}
  azure_resource_prefix: ${INFRA_ID}
  azure_resourcegroup: ${RESOURCE_GROUP}
  azure_region: ${AZURE_REGION}
---
apiVersion: v1
kind: Secret
metadata:
  name: azure-disk-credentials
  namespace: openshift-cluster-csi-drivers
stringData:
  azure_subscription_id: ${ASH_SUBSCRIPTION_ID}
  azure_client_id: ${ASH_USER}
  azure_client_secret: ${ASH_PASSWORD}
  azure_tenant_id: ${ASH_TENANT}
  azure_resource_prefix: ${INFRA_ID}
  azure_resourcegroup: ${RESOURCE_GROUP}
  azure_region: ${AZURE_REGION}
EOF
}

In fact, the fix for the problem was to run oc apply ... of that same credentials-secret.yaml output.
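
Since the five secrets in the function above differ only in name and namespace, the same file could be generated from a list of pairs, which makes an accidental duplicate easier to spot. This is a sketch under that assumption; `create_secrets_loop` is a hypothetical name, and the variables are the ones the original function already expects.

```sh
# Hypothetical variant of create_secrets: one loop over namespace/name pairs.
create_secrets_loop() {
    out="${INSTALL_DIR}/manifests/credentials-secret.yaml"
    : > "$out"   # start from an empty file
    for pair in \
        openshift-cloud-controller-manager/azure-cloud-credentials \
        openshift-image-registry/installer-cloud-credentials \
        openshift-ingress-operator/cloud-credentials \
        openshift-machine-api/azure-cloud-credentials \
        openshift-cluster-csi-drivers/azure-disk-credentials
    do
        # ${pair#*/} is the secret name, ${pair%%/*} the namespace.
        cat << EOF >> "$out"
---
apiVersion: v1
kind: Secret
metadata:
  name: ${pair#*/}
  namespace: ${pair%%/*}
stringData:
  azure_subscription_id: ${ASH_SUBSCRIPTION_ID}
  azure_client_id: ${ASH_USER}
  azure_client_secret: ${ASH_PASSWORD}
  azure_tenant_id: ${ASH_TENANT}
  azure_resource_prefix: ${INFRA_ID}
  azure_resourcegroup: ${RESOURCE_GROUP}
  azure_region: ${AZURE_REGION}
EOF
    done
}
```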

I double-checked the CredentialsRequest responses coming out of oc adm release extract "${release_image}" --credentials-requests --cloud=azure and these two have the same credential name, albeit in different namespaces:

  name: azure-cloud-credentials
  namespace: openshift-cloud-credential-operator
  name: azure-cloud-credentials
  namespace: openshift-machine-api

@nastacio

> The first bit of troubleshooting I would do is to actually make sure the secrets you are creating are unique (for example you might be creating multiple secrets with the same name, ask me how I know). Are you scripting this step?

I came back to this. It turns out I had left incorrect copies of the secret files in the folder, along with the correct ones.
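
A quick check like the following could flag stale copies before `openshift-install create ignition-configs` is run. It is a rough sketch, not a YAML parser: `find_duplicate_secrets` is a hypothetical helper that assumes top-level `kind:` and `metadata:` keys with `name:` and `namespace:` directly beneath `metadata:`, as in the script above.

```sh
# Hypothetical helper: print each namespace/name pair that is declared by
# more than one Secret across the *.yaml files in the directory given as $1.
find_duplicate_secrets() {
    for f in "$1"/*.yaml; do
        [ -e "$f" ] || continue   # the glob matched nothing
        awk '
            function emit() {
                if (kind == "Secret" && name != "") print ns "/" name
                kind = ""; name = ""; ns = ""; in_meta = 0
            }
            /^---/       { emit(); next }           # document separator
            /^kind:/     { kind = $2; in_meta = 0; next }
            /^metadata:/ { in_meta = 1; next }
            /^[^ ]/      { in_meta = 0 }            # any other top-level key
            in_meta && $1 == "name:"      { name = $2 }
            in_meta && $1 == "namespace:" { ns = $2 }
            END { emit() }
        ' "$f"
    done | sort | uniq -d
}

# Possible usage:
#   find_duplicate_secrets "${INSTALL_DIR}/manifests"
```

An empty result means no Secret is declared twice with the same name in the same namespace; secrets that share a name across different namespaces, such as the two `azure-cloud-credentials` entries, are not reported.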

Contributor

@staebler staebler left a comment

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) on Feb 2, 2022
@openshift-ci
Contributor

openshift-ci bot commented Feb 2, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: staebler

@openshift-ci openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Feb 2, 2022
@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit febfc9d into openshift:master Feb 2, 2022
@patrickdillon
Contributor Author

/retitle Bug 2012173: Azure Stack: Add UPI Instructions for internal CA

@openshift-ci openshift-ci bot changed the title from "Azure Stack: Add UPI Instructions for internal CA" to "Bug 2012173: Azure Stack: Add UPI Instructions for internal CA" on Feb 3, 2022
@openshift-ci
Contributor

openshift-ci bot commented Feb 3, 2022

@patrickdillon: All pull requests linked via external trackers have merged:

Bugzilla bug 2012173 has been moved to the MODIFIED state.

In response to this:

Bug 2012173: Azure Stack: Add UPI Instructions for internal CA
