
Feature/support helm in release manager #916

Conversation

@clemensutschig (Member) commented Jul 1, 2022

First cut of Helm support, which also once and for all eliminates the DeploymentConfig limitations.

This is still dirty! - Helm variables from the component pipeline are NOT in the RM.

@michaelsauter (Member)

Alright ... I have silently watched this for a while ... first off, kudos on the nice work!

I thought about the Helm integration on a more general note, and I'm going to throw a bit of a curve ball to you ;)

My basic question is: do you even want to make the Helm integration work like the Tailor integration?

The Tailor integration is very "Deployment(Config) centric", because:

  • that's what the first version of the shared lib did
  • operators were not a thing
  • it allowed easy implementation of some sanity checks and extraction of pod data to render the GxP docs

However, I think long-term you want to be in a place where you can basically deploy any Helm chart. This means that you do not know which resources are your deployment units (think CRDs from operators, like, for example, a "Cluster" resource for a Postgres cluster). So I am wondering if it wouldn't be nice to have the Helm "branch" somewhat separated from the Tailor part.

In the Helm "branch", I would question which checks are really needed (e.g. the checks that Deployments are controlled by the release manager are probably not needed at all, because the Helm chart cannot be exported from the DEV namespace, so by definition all things are version controlled). Or, is it really needed to collect pod data like IP addresses to put them into the GxP templates when IP addresses may change anyway? Maybe we can use other evidence instead which is better suited and easier to get?

So - just some food for thought. I am aware this would probably blow up the scope of this PR - but it may also make a few things easier that were difficult because of the current constraints?

@serverhorror (Contributor)

I'll add another convenience method to be able to import external images.

Since we still require images to be owned by the OpenShift registry, that should be quite helpful any time people use a chart from the interwebs.
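
For context, a minimal, hedged sketch of what such an import amounts to under the hood, assuming skopeo is available and the standard OpenShift 4 internal registry address; the source image and target project are illustrative placeholders, not what the new convenience method actually does:

```groovy
// Hedged sketch, not the actual convenience method: copy an external image
// into the cluster-internal OpenShift registry so the cluster "owns" it.
// Source image and target project 'my-project' are illustrative placeholders.
sh '''
  skopeo copy \
    --dest-creds "$(oc whoami):$(oc whoami -t)" \
    docker://docker.io/library/nginx:1.23 \
    docker://image-registry.openshift-image-registry.svc:5000/my-project/nginx:1.23
'''
```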

@serverhorror (Contributor)

And, finally!, the tests pass again. Yay, I guess!

@clemensutschig (Member, Author) left a comment

@serverhorror - see the comments

src/org/ods/component/CopyImageOptions.groovy (review comments - outdated, resolved)
src/org/ods/component/CopyImageStage.groovy (review comments - outdated, resolved)
@clemensutschig (Member, Author) left a comment

Martin - one big thing is the tests... we should write a bunch of tests all around this?

@michaelsauter (Member)

From our conversation yesterday @clemensutschig:

  • I really really like that this solution doesn't use the latest tag any longer, but instead the "short Git SHA tag". Awesome! Do you still want to maintain a latest tag at all? BTW - this will mean the spec is updated, which automatically triggers a rollout for deployments. If I am not mistaken, there is some code that does a manual "rollout" when one is not triggered automatically - I think this is not necessary for the Deployment path ... you may want to check that.
  • I really really like that it is not pausing / unpausing any longer, but I think that collides with "Add recommended labels to pods" (#686). I do not really know the consequences of this. May need to document the differences in behaviour?
  • From the current implementation, I do not see how one could supply different values files for different environments. So for example, one might maintain something like a values.dev.yaml, values.test.yaml and values.prod.yaml to configure the application depending on the environment. I'm linking https://github.com/opendevstack/ods-pipeline/blob/master/docs/tasks/ods-deploy-helm.adoc, which describes how this works for ODS Pipeline. The approach we took is quite similar to how Tailor can be configured differently based on environments / namespaces. Keep in mind that you also need the same logic for "secrets files" (see next bullet point), which are just encrypted values files. (A combined sketch follows this list.)
  • It looks like you still need to handle Helm "secrets files" in the "RM flow". In the component pipeline, the Helm upgrade is surrounded by code that exposes a secret key: https://github.com/opendevstack/ods-jenkins-shared-library/blob/master/src/org/ods/component/RolloutOpenShiftDeploymentStage.groovy#L192. As a side note, we switched from GPG keys to AGE keys in ODS Pipeline because the handling / CLI tool is so much easier. But that's probably a topic for another day.
  • I observed that the version / appVersion in the Helm chart is not connected to the "version" set in the release manager (typically from Jira). This may need documentation to avoid confusion? In ODS Pipeline we are overwriting the version of the Helm package with the one defined by the user in ods.yaml, but tbh I am not sure if that is worth it.
  • You should add --three-way-merge to the diff invocation (see https://github.com/opendevstack/ods-pipeline/blob/master/deploy/ods-pipeline/charts/tasks/templates/task-ods-deploy-helm.yaml#L78-L81). While Helm 3 uses a 3-way merge during upgrade, helm-diff only compares previous/new by default. Adding the flag teaches helm-diff to use a 3-way merge as well, which is a good thing for consistency. Otherwise the diff may exit with 0, while the upgrade would have made a change.
  • You also may want to set HELM_DIFF_IGNORE_UNKNOWN_FLAGS=true for the diff command (https://github.com/opendevstack/ods-pipeline/blob/master/cmd/deploy-with-helm/helm.go#L160), as that ensures the diff command also works when someone adds more flags to helm upgrade that helm-diff does not understand. If you add this env var, I believe you do not need https://github.com/opendevstack/ods-jenkins-shared-library/blob/master/src/org/ods/services/OpenShiftService.groovy#L140 anymore. (See the sketch after this list.)
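
Putting the values-files and diff-flag points together, here is a minimal, hedged sketch, assuming the helm-diff plugin is installed; the release name, chart path, and environment variable are illustrative placeholders, not the shared library's actual code:

```groovy
// Hedged sketch, not the shared library's implementation: environment-specific
// values files plus a diff that matches helm upgrade's 3-way merge behaviour.
def environment = 'dev' // e.g. derived from the target namespace

// HELM_DIFF_IGNORE_UNKNOWN_FLAGS makes helm-diff tolerate upgrade-only flags.
withEnv(['HELM_DIFF_IGNORE_UNKNOWN_FLAGS=true']) {
  sh """
    helm diff upgrade my-release ./chart \
      --install --three-way-merge \
      -f values.yaml -f values.${environment}.yaml
    helm upgrade my-release ./chart \
      --install --atomic \
      -f values.yaml -f values.${environment}.yaml
  """
}
```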

I hope this is helpful, let me know if there are other questions that I may help with!

@serverhorror (Contributor)

@clemensutschig

Martin - one big thing is the tests... we should write a bunch of tests all around this?

There is (well: already was) a test to deploy Helm via the component pipeline.

You're right, writing a few more tests is probably useful.

@michaelsauter
Excellent feedback! Thank you, I'll try and work through all these things piece by piece.

@clemensutschig (Member, Author)

@michaelsauter - thanks for the hard challenge - this helps a lot ..

From the current implementation, I do not see how one could supply different values files for different environments. So for example, one might maintain something like a values.dev.yaml, values.test.yaml and values.prod.yaml to configure the application depending on the environment. I'm linking https://github.com/opendevstack/ods-pipeline/blob/master/docs/tasks/ods-deploy-helm.adoc, which describes how this works for ODS Pipeline. The approach we took is quite similar to how Tailor can be configured differently based on environments / namespaces - this is fixed now, through a new param which allows you to pass those env-dependent files :)
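
For illustration, roughly how this might look from a component Jenkinsfile; the option name helmValuesFiles is an assumption on my part - check the stage documentation for the exact parameter this PR introduces:

```groovy
// Hedged sketch: 'helmValuesFiles' is an assumed option name; verify against
// the documentation for the parameter actually added by this PR.
odsComponentStageRolloutOpenShiftDeployment(context, [
  helmValuesFiles: ['values.yaml', "values.${context.environment}.yaml"],
])
```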

```diff
@@ -137,8 +137,11 @@ class OpenShiftService {
     valuesFiles.collect { upgradeFlags << "-f ${it}".toString() }
     values.collect { k, v -> upgradeFlags << "--set ${k}=${v}".toString() }
     if (withDiff) {
-        def diffFlags = upgradeFlags.findAll { it != '--atomic' }
+        def diffFlags = upgradeFlags.findAll { it != '--atomic' || it != '--three-way-merge'}
```

I do not understand this line. You wouldn't need to check here, since you handle the case that --three-way-merge is already given further down, no? Apart from that, --three-way-merge isn't a known flag for helm upgrade.

--atomic is filtered out because it is a known flag for helm upgrade, but it is not handled by helm diff. But as I said, I think it's better to set the env var to ignore unknown flags in diff; otherwise you prevent most upgrade flags from being used, as they would break helm diffing.
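
As a side note on the snippet under discussion: the condition it != '--atomic' || it != '--three-way-merge' is always true (any string differs from at least one of the two), so that filter removes nothing. A hedged sketch of the two directions discussed here, with release name and chart path as placeholders:

```groovy
// Hedged sketch, not the PR's actual code. Direction 1: keep a small deny
// list - drop only --atomic (an upgrade-only flag) for the diff, then
// append the diff-only flag.
def diffFlags = upgradeFlags.findAll { it != '--atomic' } + '--three-way-merge'

// Direction 2 (suggested above): let helm-diff ignore flags it does not know,
// so upgrade flags can be passed through without maintaining a deny list.
withEnv(['HELM_DIFF_IGNORE_UNKNOWN_FLAGS=true']) {
  sh "helm diff upgrade --three-way-merge ${upgradeFlags.join(' ')} my-release ./chart"
}
```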

@serverhorror (Contributor) commented Jul 25, 2022

  • I observed that the version / appVersion in the Helm chart is not connected to the "version" set in the release manager (typically from Jira). This may need documentation to avoid confusion? In ODS Pipeline we are overwriting the version of the Helm package with the one defined by the user in ods.yaml, but tbh I am not sure if that is worth it.

@michaelsauter I'm not sure that this is something the library should do -- I will mention it in the docs. I think that makes sense.

The case I know about is:

  1. you get a chart from a vendor and want to deploy what's in there
  2. the version (the Chart version) is handled by the vendor
  3. the appVersion (the version of the application) is handled by the vendor
  4. we handle config values - does it make sense to change either version or appVersion in that case?
    In my head it doesn't make too much sense, I might be wrong.

OPEN

  • I really really like that this solution doesn't use the latest tag any longer, but instead the "short Git SHA tag". Awesome! Do you still want to maintain a latest tag at all? BTW - this will mean the spec is updated, which automatically triggers a rollout for deployments. If I am not mistaken, there is some code that does a manual "rollout" when one is not triggered automatically - I think this is not necessary for the Deployment path ... you may want to check that.

DONE

  • I really really like that it is not pausing / unpausing any longer, but I think that collides with "Add recommended labels to pods" (#686). I do not really know the consequences of this. May need to document the differences in behaviour?

Added documentation in 88702bc

we picked the same model that was implemented for tailor :)

CHANGELOG.md (review comment - outdated, resolved)
@clemensutschig (Member, Author)

I think this is ready to be merged .. @metmajer @michaelsauter - for a very big first cut, feature-wise it's at what Tailor does / did.

@serverhorror (Contributor)

I think this is ready to be merged .. @metmajer @michaelsauter - for a very big first cut, feature-wise it's at what Tailor does / did.

Thanks for all the help I received here. Should I open a new issue to craft a quickstarter that uses helm so people see how it would work?

@clemensutschig (Member, Author)

I think this is ready to be merged .. @metmajer @michaelsauter - for a very big first cut, feature-wise it's at what Tailor does / did.

Thanks for all the help I received here. Should I open a new issue to craft a quickstarter that uses helm so people see how it would work?

This warrants a bigger discussion - how we want to handle this ... :/ (e.g. do it for all quickstarters in one shot, .. or a flag, .. or)

@clemensutschig clemensutschig added the enhancement New feature or request label Jul 28, 2022
serverhorror and others added 5 commits November 23, 2022 11:17
This fixes calls such as:

```
+ tailor --verbose --non-interactive \
  -n play-test apply --selector app=example-component-name \
  --exclude bc,is \
  --param ODS_OPENSHIFT_APP_DOMAIN=apps.example.invalid \
  --param ODS_OPENSHIFT_APP_DOMAIN=apps.example.invalid \
  --verify --ignore-unknown-parameters
```
@serverhorror (Contributor)

Some "proof" that this does work:

[screenshot]

No clue why the infrastructure broke other builds, I just hope that this is resolved now and we can accept this :)

@serverhorror (Contributor)

There we go:

[screenshot]

@jafarre-bi (Contributor)

IMO this is ready to merge. I see there have been build errors that should be solved first.
I give my approval now, but please, do not merge it before the tests that are under development are complete and pass.
Thank you and congratulations, @serverhorror, for the terrific work!

@jafarre-bi jafarre-bi requested review from jafarre-bi and removed request for serverhorror December 14, 2022 09:08
jafarre-bi previously approved these changes Dec 14, 2022
@serverhorror (Contributor)

IMO this is ready to merge. I see there have been build errors that should be solved first. I give my approval now, but please, do not merge it before the tests that are under development are complete and pass. Thank you and congratulations, @serverhorror, for the terrific work!

I have fixed this with your requests for the SonarQube changes. All checks are passing again, see:

@metmajer (Member) left a comment

Well done, @serverhorror, kudos for the hard work and contribution :) Please wait with the merge until our test cases indicate success. We are working on it and will let you know soon. Then, let's please ensure we squash-merge this thing :)

@serverhorror (Contributor)

@metmajer

Please wait with the merge until our test cases indicate success

I can't even merge, I don't have the technical permissions :)

@serverhorror (Contributor) commented Dec 21, 2022

@metmajer -- I went through the cases with your tester today. I think it looks good. Can we, please, merge this before the year ends?

I would really, really appreciate it if the opening and the merge happened in the same year, and I pinky swear not to open another PR this close to the year's end :)

@metmajer (Member)

@serverhorror appreciate it or not, but this contribution is critical enough to deserve validation through a test. The test case might be partly defined already, but it has not been executed yet. Once we have a green light, your contribution is ready to be merged. Otherwise, we taint master with a broken ability to deploy components. And we don't want that.

@metmajer (Member) commented Dec 22, 2022

To add more context to this conversation and hopefully clarify: the PR by @serverhorror is most welcome and will be released early next year. Therefore, we are not under pressure to merge this code into master. We see benefits in getting contributions, including our own, to "Done" before merging them: it lets us set them aside for now and expect successful test executions during our formal release process, rather than being baffled and potentially blocked later on, when the deep knowledge has already vanished. Quality is of great importance when building a platform, especially when changing something at its core, as is the case here. Overall, this approach has taken us from > 100 bugs down to 25 and then down to 9 over the last 3 releases, and thus helps us keep releases and maintenance manageable. Even if not on this thread, the need for a working test case was discussed internally. Thanks for your understanding.

@jafarre-bi jafarre-bi merged commit 63d3fa0 into opendevstack:master Jan 27, 2023
@metmajer (Member) commented Jan 27, 2023

Over the last weeks, @serverhorror @jafarre-bi @albertpuente and others worked on triaging random errors during pipeline executions using Helm. We didn't see any clear evidence that this was related to Helm directly, but we needed a better understanding of the overall stability to decide on the next steps. When the team was confident about the final result, we decided to consider the implementation Done and to give the merge a go. Thanks for everyone's patience. Some contributions need a close look to ensure the quality of the resulting feature. This was certainly the case here. Well done, everyone!

@serverhorror serverhorror deleted the feature/support-helm-in-release-manager branch January 8, 2024 14:14