
Feature/support helm in release manager #916

Conversation

@clemensutschig (Member) commented Jul 1, 2022

First cut of Helm support, which also once and for all eliminates the DeploymentConfig limitations.

This is still dirty! - Helm variables from the component pipeline are NOT in the RM.

@michaelsauter (Member)

Alright ... I have silently watched this for a while ... first off, kudos on the nice work!

I thought about the Helm integration on a more general note, and I'm going to throw a bit of a curve ball to you ;)

My basic question is: do you even want to make the Helm integration work like the Tailor integration?

The Tailor integration is very "Deployment(Config) centric", because:

  • that's what the first version of the shared lib did
  • operators were not a thing
  • it allowed easy implementation of some sanity checks and extraction of pod data to render the GxP docs

However, I think long-term you want to be in a place where you can basically deploy any Helm chart. This means that you do not know which resources are your deployment units (think CRDs from operators, like, for example, a "Cluster" resource for a Postgres cluster). So I am wondering if it wouldn't be nice to have the Helm "branch" somewhat separated from the Tailor part.

In the Helm "branch", I would question which checks are really needed (e.g. the checks that Deployments are controlled by the release manager are probably not needed at all, because the Helm chart cannot be exported from the DEV namespace, so by definition all things are version controlled). Or, is it really needed to collect pod data like IP addresses to put them into the GxP templates when IP addresses may change anyway? Maybe we can use other evidence instead which is better suited and easier to get?

So - just some food for thought. I am aware this would probably blow up the scope of this PR - but it may also make a few things easier that were difficult because of the current constraints?

@serverhorror (Contributor)

I'll add another convenience method to be able to import external images.

Since we still require images to be owned by the OpenShift registry, that should be quite helpful any time people use a chart from the interwebs.
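
For context, a minimal, hedged sketch of what such an import amounts to under the hood, assuming skopeo is available and the standard OpenShift 4 internal registry address; the source image and target project are illustrative placeholders, not what the new convenience method actually does:

```groovy
// Hedged sketch, not the actual convenience method: copy an external image
// into the cluster-internal OpenShift registry so the cluster "owns" it.
// Source image and target project 'my-project' are illustrative placeholders.
sh '''
  skopeo copy \
    --dest-creds "$(oc whoami):$(oc whoami -t)" \
    docker://docker.io/library/nginx:1.23 \
    docker://image-registry.openshift-image-registry.svc:5000/my-project/nginx:1.23
'''
```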

@serverhorror (Contributor)

And, finally!, the tests pass again. Yay, I guess!

@clemensutschig (Member, Author) left a comment

@serverhorror - see the comments

src/org/ods/component/CopyImageOptions.groovy (review comments - outdated, resolved)
src/org/ods/component/CopyImageStage.groovy (review comments - outdated, resolved)
@clemensutschig (Member, Author) left a comment

Martin - one big thing is the tests... we should write a bunch of tests all around this?

@michaelsauter (Member)

From our conversation yesterday @clemensutschig:

  • I really really like that this solution doesn't use the latest tag any longer, but instead the "short Git SHA tag". Awesome! Do you still want to maintain a latest tag at all? BTW - this will mean the spec is updated, which automatically triggers a rollout for deployments. If I am not mistaken, there is some code that does a manual "rollout" when one is not triggered automatically - I think this is not necessary for the Deployment path ... you may want to check that.
  • I really really like that it is not pausing / unpausing any longer, but I think that collides with "Add recommended labels to pods" (#686). I do not really know the consequences of this. May need to document the differences in behaviour?
  • From the current implementation, I do not see how one could supply different values files for different environments. So for example, one might maintain something like a values.dev.yaml, values.test.yaml and values.prod.yaml to configure the application depending on the environment. I'm linking https://github.com/opendevstack/ods-pipeline/blob/master/docs/tasks/ods-deploy-helm.adoc, which describes how this works for ODS Pipeline. The approach we took is quite similar to how Tailor can be configured differently based on environments / namespaces. Keep in mind that you also need the same logic for "secrets files" (see next bullet point), which are just encrypted values files. (A combined sketch follows this list.)
  • It looks like you still need to handle Helm "secrets files" in the "RM flow". In the component pipeline, the Helm upgrade is surrounded by code that exposes a secret key: https://github.com/opendevstack/ods-jenkins-shared-library/blob/master/src/org/ods/component/RolloutOpenShiftDeploymentStage.groovy#L192. As a side note, we switched from GPG keys to AGE keys in ODS Pipeline because the handling / CLI tool is so much easier. But that's probably a topic for another day.
  • I observed that the version / appVersion in the Helm chart is not connected to the "version" set in the release manager (typically from Jira). This may need documentation to avoid confusion? In ODS Pipeline we are overwriting the version of the Helm package with the one defined by the user in ods.yaml, but tbh I am not sure if that is worth it.
  • You should add --three-way-merge to the diff invocation (see https://github.com/opendevstack/ods-pipeline/blob/master/deploy/ods-pipeline/charts/tasks/templates/task-ods-deploy-helm.yaml#L78-L81). While Helm 3 uses a 3-way merge during upgrade, helm-diff only compares previous/new by default. Adding the flag teaches helm-diff to use a 3-way merge as well, which is a good thing for consistency. Otherwise the diff may exit with 0, while the upgrade would have made a change.
  • You also may want to set HELM_DIFF_IGNORE_UNKNOWN_FLAGS=true for the diff command (https://github.com/opendevstack/ods-pipeline/blob/master/cmd/deploy-with-helm/helm.go#L160), as that ensures the diff command also works when someone adds more flags to helm upgrade that helm-diff does not understand. If you add this env var, I believe you do not need https://github.com/opendevstack/ods-jenkins-shared-library/blob/master/src/org/ods/services/OpenShiftService.groovy#L140 anymore. (See the sketch after this list.)
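
Putting the values-files and diff-flag points together, here is a minimal, hedged sketch, assuming the helm-diff plugin is installed; the release name, chart path, and environment variable are illustrative placeholders, not the shared library's actual code:

```groovy
// Hedged sketch, not the shared library's implementation: environment-specific
// values files plus a diff that matches helm upgrade's 3-way merge behaviour.
def environment = 'dev' // e.g. derived from the target namespace

// HELM_DIFF_IGNORE_UNKNOWN_FLAGS makes helm-diff tolerate upgrade-only flags.
withEnv(['HELM_DIFF_IGNORE_UNKNOWN_FLAGS=true']) {
  sh """
    helm diff upgrade my-release ./chart \
      --install --three-way-merge \
      -f values.yaml -f values.${environment}.yaml
    helm upgrade my-release ./chart \
      --install --atomic \
      -f values.yaml -f values.${environment}.yaml
  """
}
```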

I hope this is helpful, let me know if there are other questions that I may help with!

@serverhorror (Contributor)

@clemensutschig

Martin - one big thing is the tests... we should write a bunch of tests all around this?

There is (well: already was) a test to deploy Helm via the component pipeline.

You're right, writing a few more tests is probably useful.

@michaelsauter
Excellent feedback! Thank you, I'll try and work through all these things piece by piece.

@clemensutschig (Member, Author)

@michaelsauter - thanks for the hard challenge - this helps a lot ..

From the current implementation, I do not see how one could supply different values files for different environments. So for example, one might maintain something like a values.dev.yaml, values.test.yaml and values.prod.yaml to configure the application depending on the environment. I'm linking https://github.com/opendevstack/ods-pipeline/blob/master/docs/tasks/ods-deploy-helm.adoc, which describes how this works for ODS Pipeline. The approach we took is quite similar to how Tailor can be configured differently based on environments / namespaces - this is fixed now, through a new param which allows you to pass those env-dependent files :)
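
For illustration, roughly how this might look from a component Jenkinsfile; the option name helmValuesFiles is an assumption on my part - check the stage documentation for the exact parameter this PR introduces:

```groovy
// Hedged sketch: 'helmValuesFiles' is an assumed option name; verify against
// the documentation for the parameter actually added by this PR.
odsComponentStageRolloutOpenShiftDeployment(context, [
  helmValuesFiles: ['values.yaml', "values.${context.environment}.yaml"],
])
```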

```diff
@@ -137,8 +137,11 @@ class OpenShiftService {
     valuesFiles.collect { upgradeFlags << "-f ${it}".toString() }
     values.collect { k, v -> upgradeFlags << "--set ${k}=${v}".toString() }
     if (withDiff) {
-        def diffFlags = upgradeFlags.findAll { it != '--atomic' }
+        def diffFlags = upgradeFlags.findAll { it != '--atomic' || it != '--three-way-merge'}
```

I do not understand this line. You wouldn't need to check here, since you handle the case that --three-way-merge is already given further down, no? Apart from that, --three-way-merge isn't a known flag for helm upgrade.

--atomic is filtered out because it is a known flag for helm upgrade, but it is not handled by helm diff. But as I said, I think it's better to set the env var to ignore unknown flags in diff; otherwise you prevent most upgrade flags from being used, as they would break helm diffing.
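
As a side note on the snippet under discussion: the condition it != '--atomic' || it != '--three-way-merge' is always true (any string differs from at least one of the two), so that filter removes nothing. A hedged sketch of the two directions discussed here, with release name and chart path as placeholders:

```groovy
// Hedged sketch, not the PR's actual code. Direction 1: keep a small deny
// list - drop only --atomic (an upgrade-only flag) for the diff, then
// append the diff-only flag.
def diffFlags = upgradeFlags.findAll { it != '--atomic' } + '--three-way-merge'

// Direction 2 (suggested above): let helm-diff ignore flags it does not know,
// so upgrade flags can be passed through without maintaining a deny list.
withEnv(['HELM_DIFF_IGNORE_UNKNOWN_FLAGS=true']) {
  sh "helm diff upgrade --three-way-merge ${upgradeFlags.join(' ')} my-release ./chart"
}
```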

@serverhorror (Contributor) commented Jul 25, 2022

  • I observed that the version / appVersion in the Helm chart is not connected to the "version" set in the release manager (typically from Jira). This may need documentation to avoid confusion? In ODS Pipeline we are overwriting the version of the Helm package with the one defined by the user in ods.yaml, but tbh I am not sure if that is worth it.

@michaelsauter I'm not sure that this is something the library should do -- I will mention it in the docs. I think that makes sense.

The case I know about is:

  1. you get a chart from a vendor and want to deploy what's in there
  2. the version (the Chart version) is handled by the vendor
  3. the appVersion (the version of the application) is handled by the vendor
  4. we handle config values - does it make sense to change either version or appVersion in that case?
    In my head it doesn't make too much sense, I might be wrong.

OPEN

  • I really really like that this solution doesn't use the latest tag any longer, but instead the "short Git SHA tag". Awesome! Do you still want to maintain a latest tag at all? BTW - this will mean the spec is updated, which automatically triggers a rollout for deployments. If I am not mistaken, there is some code that does a manual "rollout" when one is not triggered automatically - I think this is not necessary for the Deployment path ... you may want to check that.

DONE

  • I really really like that it is not pausing / unpausing any longer, but I think that collides with "Add recommended labels to pods" (#686). I do not really know the consequences of this. May need to document the differences in behaviour?

Added documentation in 88702bc

we picked the same model that was implemented for tailor :)

CHANGELOG.md (review comment - outdated, resolved)
@clemensutschig (Member, Author)

I think this is ready to be merged .. @metmajer @michaelsauter - for a very big first cut, feature-wise it's at what Tailor does / did.

@serverhorror (Contributor)

I think this is ready to be merged .. @metmajer @michaelsauter - for a very big first cut, feature-wise it's at what Tailor does / did.

Thanks for all the help I received here. Should I open a new issue to craft a quickstarter that uses helm so people see how it would work?

@clemensutschig (Member, Author)

I think this is ready to be merged .. @metmajer @michaelsauter - for a very big first cut, feature-wise it's at what Tailor does / did.

Thanks for all the help I received here. Should I open a new issue to craft a quickstarter that uses helm so people see how it would work?

This warrants a bigger discussion - how we want to handle this ... :/ (e.g. do it for all quickstarters in one shot, .. or a flag, .. or)

@clemensutschig clemensutschig added the enhancement New feature or request label Jul 28, 2022
serverhorror and others added 5 commits November 23, 2022 11:17
This fixes calls such as:

```
+ tailor --verbose --non-interactive \
  -n play-test apply --selector app=example-component-name \
  --exclude bc,is \
  --param ODS_OPENSHIFT_APP_DOMAIN=apps.example.invalid \
  --param ODS_OPENSHIFT_APP_DOMAIN=apps.example.invalid \
  --verify --ignore-unknown-parameters
```
@serverhorror (Contributor)

Some "proof" that this does work:

[screenshot]

No clue why the infrastructure broke other builds, I just hope that this is resolved now and we can accept this :)

@serverhorror (Contributor)

There we go:

[screenshot]

@jafarre-bi (Contributor)

IMO this is ready to merge. I see there have been build errors that should be solved first.
I give my approval now, but please, do not merge it before the tests that are under development are complete and pass.
Thank you and congratulations, @serverhorror, for the terrific work!

@jafarre-bi jafarre-bi requested review from jafarre-bi and removed request for serverhorror December 14, 2022 09:08
jafarre-bi previously approved these changes Dec 14, 2022
@serverhorror (Contributor)

IMO this is ready to merge. I see there have been build errors that should be solved first. I give my approval now, but please, do not merge it before the tests that are under development are complete and pass. Thank you and congratulations, @serverhorror, for the terrific work!

I have fixed this with your requests for the SonarQube changes. All checks are passing again, see:

@metmajer (Member) left a comment

Well done, @serverhorror, kudos for the hard work and contribution :) Please wait with the merge until our test cases indicate success. We are working on it and will let you know soon. Then, let's please ensure we squash-merge this thing :)

@serverhorror (Contributor)

@metmajer

Please wait with the merge until our test cases indicate success

I can't even merge, I don't have the technical permissions :)

@serverhorror (Contributor) commented Dec 21, 2022

@metmajer -- I went through the cases with your tester today. I think it looks good. Can we, please, merge this before the year ends?

I would really, really appreciate it if the opening and the merge happened in the same year, and I pinky swear not to open another PR this close to the year's end :)

@metmajer (Member)

@serverhorror appreciate it or not, but this contribution is critical enough to deserve validation through a test. The test case might be partly defined already, but it has not been executed yet. Once we have a green light, your contribution is ready to be merged. Otherwise, we taint master with a broken ability to deploy components. And we don't want that.

@metmajer (Member) commented Dec 22, 2022

To add more context to this conversation and hopefully clarify: the PR by @serverhorror is most welcome and will be released early next year. Therefore, we are not under pressure to merge this code into master. We see benefits in getting contributions, including our own, to "Done" before merging them: it lets us set them aside for now and expect successful test executions during our formal release process, rather than being baffled and potentially blocked later on, when the deep knowledge has already vanished. Quality is of great importance when building a platform, especially when changing something at its core, as is the case here. Overall, this approach has taken us from > 100 bugs down to 25 and then down to 9 over the last 3 releases, and thus helps us keep releases and maintenance manageable. Even if not on this thread, the need for a working test case was discussed internally. Thanks for your understanding.

@jafarre-bi jafarre-bi merged commit 63d3fa0 into opendevstack:master Jan 27, 2023
@metmajer (Member) commented Jan 27, 2023

Over the last weeks, @serverhorror @jafarre-bi @albertpuente and others worked on triaging random errors during pipeline executions using Helm. We didn't see any clear evidence that this was related to Helm directly, but we needed a better understanding of the overall stability to decide on the next steps. When the team was confident about the final result, we decided to consider the implementation Done and to give the merge a go. Thanks for everyone's patience. Some contributions need a close look to ensure the quality of the resulting feature. This was certainly the case here. Well done, everyone!

@serverhorror serverhorror deleted the feature/support-helm-in-release-manager branch January 8, 2024 14:14