Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maistra 2.2 istio-csr support #890

Conversation

SpectralHiss
Copy link

@SpectralHiss SpectralHiss commented Mar 21, 2022

This PR is still work in progress! it aims to provide the cert-manager certificate authority method to Maistra through a high level convenient API.

It is essentially simplifying the mesh configuration with istio-csr through exposing few simple parameters like ca address and pilot secret.
The pilot secret mounting for the serving cert has presented many difficulties and was the initial reason this was not possible to do natively in Maistra. Maistra is an operator for the helm chart and doesn't provide the same overlays method that IstioOperator API provides, so we had to supply these parameters ourselves. original issue: https://issues.redhat.com/browse/MAISTRA-2592
While we wait for more native support for the helm chart in istio/istio#37264 , we missed the likely istio version target for this 2.2 release and instead expose the exact same parameters through a sed patch of the helm chart.

Todo:

  • test rebase on 2.2 works as did 2.1 with the basic parameters.
  • add CA configmap (trustbundle) to possible configs to allow more flexibility and avoid Trust On First Use which currently leverages cert-manager secret ca.crt key and is less secure. this should default to the istio-csr default configmap istio-ca-root-configmap
  • add unit tests to CRD to helm conversion package
  • Update this description with final design doc and motivation and website documentation.

@maistra-bot
Copy link
Contributor

Hi @SpectralHiss. Thanks for your PR.

I'm waiting for a maistra member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@SpectralHiss
Copy link
Author

Q for @luksa, i saw that you just recently updated the scaffolding for the 2.2 release, I have rebased on top of that and the makefile build works 🥳
but when deploying the operator and adding a servicemeshcontrol plane the 2.2.0 declared (olm) images are not in the redhat registry yet causing ErrPullImg errors, when I tried to use the 2.1.1 image instead I think I got some crashes which is probably due to the incompatibility of 2.2 chart.

If these tags are cut later in the release process, is there another 2.2 branch "build" version so I can progress with testing the operator through patching the olm related manifests?

Thanks in advance 😄

@SpectralHiss
Copy link
Author

SpectralHiss commented Apr 4, 2022

Never mind i think i can just build my own SNAPSHOT builds and patch until we get to the actual release cutting.. I think I cannot run build the rhel image from the opensource maistra/istio repo, hope the centos 8 image will do 🤔

@SpectralHiss
Copy link
Author

Managed to test ok on 2.2 however there were some hickups with some empty envs for mutatingwebhookconfigname and validatingwebhookconfigname as well as some rbac, I will be checking that closely on integration time

@SpectralHiss SpectralHiss force-pushed the hef/maistra-2.2-cert-manager-istio-csr-support branch from 8f82caa to dae00ba Compare April 5, 2022 13:45
@SpectralHiss
Copy link
Author

So i've added all the required unit tests for conversion from securityconfig -> helm and the reverse path, the particular unit test is passing:

ok      github.com/maistra/istio-operator/pkg/apis/external     (cached)
ok      github.com/maistra/istio-operator/pkg/apis/maistra/conversion   1.252s
ok      github.com/maistra/istio-operator/pkg/apis/maistra/v1   (cached)

I am getting some other go code warnings in zz_generate deepcopy stuff.. i am not sure how relevant those are, it builds end to end on the cluster except for some webhookconfig env vars that are set to nil (which I assume is a WIP on maistra-2.2).
the env vars in question are VALIDATION_WEEBHOOK_CONFIG_NAME and INJECTION_WEBHOOK_CONFIG_NAME.
It looks like there is some maistra specific webhook patches being implemented somewhere, this would also explain why pilot was missing the RBAC to be able to patch webhooks (causing CA unknown authority) when the RBAC is missing.

So assuming these are all upstream wip, I guess this is ready to start having a first past?
Once all points are cleared up there is a few things (like olm related image patches and deploy-prototype folder) I used in development I will need to clean up.. I am keeping them in interim in case i need to investigate something while addressing any other comments.

I will be doing a website PR explaining the new field later, I am off rest of this week and next, but please let me know of any release related tasks I need to do regardless!.

@SpectralHiss SpectralHiss changed the title WIP: maistra 2.2 istio-csr support Maistra 2.2 istio-csr support Apr 6, 2022
@dgn
Copy link
Contributor

dgn commented Apr 8, 2022

/ok-to-test

@@ -0,0 +1,2040 @@
//go:build !ignore_autogenerated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this file about?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry i didn't realise i checked these in

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strange, my CI failure is saying:
"ERROR: Some files need to be updated, please run 'make gen' and include any changed files in your PR"
And make gen creates those files, what's going on here 😕

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cleared what seems like rubish from a stray go build run.. but it's still erroring on deep copy methods derived from new struct fields .. i am missing something about the workflow here I believe.

@dgn
Copy link
Contributor

dgn commented Apr 8, 2022

Only seeing this now. Are you planning on reopening istio/istio#37264?

@SpectralHiss
Copy link
Author

Only seeing this now. Are you planning on reopening istio/istio#37264?

Yeah good point, the PR fell inactive but in principle, all PR comments by the core team were addressed, it's good for us to get it in later k8s so we don't have to maintain the sed patch!
I will poke reviewers on that PR and mention this issue.

@SpectralHiss SpectralHiss force-pushed the hef/maistra-2.2-cert-manager-istio-csr-support branch from 1a53609 to e90e5a2 Compare April 20, 2022 09:42
@maistra-bot
Copy link
Contributor

maistra-bot commented Apr 21, 2022

@SpectralHiss: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
istio-operator_gen-check-2.2 1834fce link true /test istio-operator_gen-check-2.2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@SpectralHiss
Copy link
Author

SpectralHiss commented Apr 21, 2022

@dgn I don't see how the gen_check step can pass since it's checking out maistra-2.2 and doing a diff check after gen which is not going to be the same as I introduces new security certificateauthority type.. is there a step I missed for this build?
You will see on the actual diff on the gen_check that the step failure doesn't make a lot of sense.

Another question I have is about support for 2.1 would I need to backport these changes on 2.1 branch as well? i had a conversation with Jamie Longmuir (paraphrasing slightly)
"I am CCing our Maistra eng lead Rob who mentioned your PR today and can provide more updates on that.

We will be into our QE cycle before the 2.2 release. If this can't make that release, we will likely need to do a 2.2.1 and 2.1.3 minor release not too long afterward."

mentioning @rcernich

@@ -10698,7 +10716,8 @@ spec:
serviceAccountName: istio-operator
containers:
- name: istio-operator
image: quay.io/maistra/istio-ubi8-operator:2.2.0
#image: quay.io/maistra/istio-ubi8-operator:2.2.0
image: maistradev.azurecr.io/istio-operator:latest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be removed before it can be merged.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

extraArgs[2] = "--caCertFile=/etc/cert-manager/ca/root-cert.pem"
} else {
fmt.Printf(`Warning! using istiod-tls ca.crt key as root of trust for mesh,\
this will likely work but is not a recomended Trust On First Use pattern. Set rootCAConfigMapName to 'istio-ca-root-cert' instead`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something that we'd want to validate, i.e. prevent the user from creating an invalid SMCP?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It technically isn't incorrect but it's a less secure configuration, I am okay to enforce this as a hard gate if you'd like

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries then.

// .Values.global.caAddress
Address string `json:"address,omitempty"`
PilotCertSecretName string `json:"pilotSecretName,omitempty"`
RootCAConfigMapName string `json:"rootCAConfigMapName,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be easier to use the default cacerts? Is the issue that the root CA gets configured into a ConfigMap resource instead of a Secret? Seems like if the Certificate were configured to use cacerts as the secret, everything would work without the extra volume mounts.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trouble is that the cacerts configmap is designed to load an external plugin CA root (private key and cert) which is then used by pilot to sign all future csr including it's own serving cert., this is a less secure version of what istio-csr does. I believe It's not possible to have istio-csr work using that parameter

The parameters above are needed by istio-csr, it assumes an existing external CA issuer (handled by cert-manager), but still needs to manually inject the pilot server serving cert. which has already been signed by the external CA (istio-csr handles this automatically on chart install). This is currently the only way to install istio-csr in the upstream istio too.

We do not use any injected CA to sign any new secrets but just as a trust bundle anchor, every new cert when signed goes directly to our istio-csr server for signing where it's paired with a cert-manager issuer.

here's a process diagram of a CSR during execution time:

renewal-istio-csr-certificate (1)

During configuration what happens when you set cacerts is that pilot starts to generate their own signing cert, assuming basically that you're still using citadel for signing.
I've spent a decent amount of time trying to hack around the cacerts configmap along with the existing "custom" method to fit our use case to no avail.
I am pretty confident this is the only way but happy to be wrong on this!

If you still think this is wrong could you care to explain to me at a high level how each component will be configured? ie. what fields will be set in cacerts and roughly how the istio-csr config would work?

For more details on overall design options we explored: https://docs.google.com/document/d/1CwACez6pFPi_z3CFY1bB4HwIbwT3VQOz_NDUn3cQqv0/edit#

@rcernich
Copy link
Contributor

rcernich commented May 2, 2022

Regarding version support, the operator is designed to support multiple install versions, so having this in 2.2 would still support users running 2.1 control planes. That said, the charts are version specific, so there may be some work to do there. I added a comment about the need to have volume mounts for the certs/keys and was wondering if using the default cacerts named secret might make things simpler, at least initially, as it wouldn't require any chart changes.

@SpectralHiss
Copy link
Author

SpectralHiss commented May 3, 2022

Regarding version support, the operator is designed to support multiple install versions, so having this in 2.2 would still support users running 2.1 control planes. That said, the charts are version specific, so there may be some work to do there. I added a comment about the need to have volume mounts for the certs/keys and was wondering if using the default cacerts named secret might make things simpler, at least initially, as it wouldn't require any chart changes.

I had initially also implemented this for 2.1, it should be possible to retrofit the sed hacks to older charts with minimal effort but does that break any release assumptions?

As for the code, I am confident that we can implement pretty much exactly the same code and add this as part of a new 2.1 patch release. I am happy to start porting, although I have been slightly guessing at the release process and would like a bit more guidance if at all possible.
Thanks for the initial review!

@rcernich
Copy link
Contributor

Hey @SpectralHiss, thanks for working on this and for being patient with us. This looks pretty good to me. I only have two (hopefully small) requests, and one question:

  1. Could you please retarget this for maistra-2.3?
  2. Could you please revert the image names for the operator (I assume you modified these for local testing)?
  3. Once Add new well known pilot dns TLS cert loading - making it zero config for istio-csr installation istio/istio#39375 is merged, would the chart changes here become moot? What maintenance would be required after that is merged?

Hopefully, we can get this merged in the next week or so.

Thanks again!

@SpectralHiss
Copy link
Author

Hey @SpectralHiss, thanks for working on this and for being patient with us. This looks pretty good to me. I only have two (hopefully small) requests, and one question:

  1. Could you please retarget this for maistra-2.3?
  2. Could you please revert the image names for the operator (I assume you modified these for local testing)?
  3. Once Add new well known pilot dns TLS cert loading - making it zero config for istio-csr installation istio/istio#39375 is merged, would the chart changes here become moot? What maintenance would be required after that is merged?

Hopefully, we can get this merged in the next week or so.

Thanks again!

Hi ,

great to see you're still keeping a close eye on this, I will look actively at porting the changes to 2-3 next week, when is the code-freeze date for that and which upstream version is it targeting?
I think the code in Istio/istio will land in Istio 1.16 or 1.17 at the most, so the helm changes will only be relevant until that is released, once it is released then the operator code we added will just provide a user-friendly interface to set a couple of env. variables/configurations.

@rcernich
Copy link
Contributor

Hey @SpectralHiss, I've rebased this against 2.3 and have made the changes above. I just need to get some eyes on it and hopefully we can get it merged next week (or sooner). Thanks again for the contribution, and sorry it's taken so long for us to get back to you.

Here's the other PR. We can go with that if you want, or you can update this one. Your call.

@openshift-ci
Copy link

openshift-ci bot commented Sep 27, 2022

@SpectralHiss: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/istio-operator-unit-2-2 1834fce link true /test istio-operator-unit-2-2
ci/prow/istio-operator-gencheck-2-2 1834fce link true /test istio-operator-gencheck-2-2

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@SpectralHiss
Copy link
Author

Hey @SpectralHiss, I've rebased this against 2.3 and have made the changes above. I just need to get some eyes on it and hopefully we can get it merged next week (or sooner). Thanks again for the contribution, and sorry it's taken so long for us to get back to you.

Here's the other PR. We can go with that if you want, or you can update this one. Your call.

Hey again, be advised that the upstream istio pr has been finally merged 🥳 istio/istio#39375 , I will check which version this will be cut in but i think it's 1.17 (maybe 1.16) , once you'll be targeting that version there will be some simplifications to make to the chart overlay code, so we need to monitor closely for that PR!

@SpectralHiss
Copy link
Author

SpectralHiss commented Sep 30, 2022

Happy to close this PR, as the work was ported and merged here #1005,
i think we should have a discussion on how to best coordinate the future releases dependencies as it will get a little bit confusing :)

@rcernich
Copy link
Contributor

Sounds good. We'll make a note to make the necessary changes when we update onto 1.16/17.

Thanks @SpectralHiss for the contribution, and I'm sorry that it took us so long to get it merged. Thanks for sticking with it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants