
Add a Strimzi Helm Chart #565

Merged: 1 commit merged into strimzi:master on Jul 30, 2018

Conversation (4 participants)

@seglo (Contributor) commented Jul 10, 2018

Type of change

Enhancement / new feature

Description

Support Helm Charts as a means to install Strimzi into a Kubernetes cluster. #539

Checklist

Please go through this checklist and make sure all applicable tasks have been done

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Check RBAC rights for Kubernetes / OpenShift roles
  • Try your changes from a Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
@seglo (Author, Contributor) commented Jul 10, 2018

Hello Strimzi committers. I've been working on adding this feature, but I have a number of outstanding questions that should be addressed before I proceed. I've summarized these questions and other comments below.


Outstanding questions:

  1. Should we make this the primary way to install the operator?

Making this the primary installation method would let us remove the duplicate install examples from /examples and /book/book/examples. If we don't want to make this the default, another option would be to use helm template to generate the rendered templates and copy them into those directories. This has the benefit of providing a consistent way to template installation files and removes the need to substitute values with shell commands during release (the release_version make target). See the sketch after this list.

  2. Is the topic operator ever installed standalone? If so then we should create a chart for it too.

  3. Adding to the central Kubeapps chart repository?

The current implementation simply bundles the chart into the release process by calling helm package to create a tarball that's included in the ./charts/ dir of the release (see the sketch after this list). Eventually we could publish this to the central kubeapps chart repository. I've started implementing the requirements listed in the Review Guidelines and Contribution Guidelines in preparation.
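For illustration, a minimal hedged sketch of the two workflows described in questions 1 and 3 above (the chart path and output file are assumptions, not the actual Makefile recipes):

    # Question 1: render the chart client-side, then copy the output into the
    # existing install directories instead of hand-maintaining the YAML there.
    helm template helm-charts/strimzi-kafka-operator > rendered-cluster-operator.yaml

    # Question 3 (current approach): package the chart as a tarball that gets
    # bundled into the ./charts/ dir of the release artifact.
    helm package helm-charts/strimzi-kafka-operator -d charts/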

TODO for adding to Kubeapps

  • Implement names and labels to Review Guidelines spec for each template
  • Add README.md according to Review Guidelines
  • Support application upgrades according to Contribution Guidelines. I assume this means the operator itself must support upgrades, but obviously it also raises the issue of performing a rolling upgrade of brokers themselves. I don't know if Strimzi currently supports this.

Other TODO

  • Test with Travis. This is only necessary if we choose to make helm the default deployment mechanism or use helm in some other capacity during the build.

@seglo seglo force-pushed the seglo:helm-chart branch 2 times, most recently from b7abeee to 578486f Jul 10, 2018

.travis.yml Outdated
- wget -q https://storage.googleapis.com/kubernetes-helm/${HELM_TGZ}
- tar xzfv ${HELM_TGZ}
- PATH=`pwd`/linux-amd64/:$PATH
- helm init --client-only

@ppatierno (Member) commented Jul 11, 2018

Not a Helm expert, but doesn't this install just the client and not the Tiller server? Should we test in some way that the chart works, in which case the server would be needed?

@seglo (Author, Contributor) commented Jul 11, 2018

@ppatierno Yes, the helm init --client-only is there to prepare the CLI for chart release management activities like those described in the Makefile. We can add a system test which tests that the generated chart works (which would require us to initialize Tiller on the test cluster).

I haven't looked at the Strimzi systemtests project in detail, but we could also think about making helm the default deployment mechanism (see question 1 in my last comment). I'll get the systemtests running locally so I can test the feasibility of this. Do you think that is worth exploring?

@ppatierno (Member) commented Jul 11, 2018

I'm not sure we want Helm Charts as the default deployment mechanism, but it should certainly be one of the available mechanisms (without removing what we have in place today). Regarding the systemtests, I really think it's worth exploring having them for the Helm deployment. @scholzj @tombentley wdyt?

@tombentley (Member) commented Jul 11, 2018

I agree about having an automated test that deploying through helm results in a functioning Strimzi deployment.

@seglo (Author, Contributor) commented Jul 11, 2018

Ok, I'll add a system test.

appVersion: "0.1.0"
description: "Strimzi: Kafka as a Service"
name: strimzi-kafka-operator
version: 0.1.0

@ppatierno (Member) commented Jul 11, 2018

Is this the chart version, or should it be the "software" version (in this case Strimzi)? Should they be aligned? Just asking...

@seglo (Author, Contributor) commented Jul 11, 2018

appVersion is the "software" version. version is the version of the actual chart.

Often the charts in the kubeapps repo are managed by different people from the upstream project. This is because the upstream project committers may not be concerned with how their software is deployed, so somebody volunteers to do this on their behalf. This can lead to an inefficient release process whenever you want to publish a new version of the chart, because the chart on kubeapps may significantly lag the latest release of the software.

IMO, since Strimzi is a K8s-specific project, it makes sense to integrate the chart release into the overall release process. A chart artifact could even be hosted by you immediately on a GitHub Pages site (a Strimzi chart repo, basically). Once the chart gets approved by kubeapps, they have processes that let chart authors fast-track new versions of the chart based on the GitHub user details in the OWNERS file. This essentially allows chart authors to create a PR that gets automatically merged once an owner approves it. See this documentation on Owning and Maintaining a Chart for more details. I think these steps could be added to your release process too after initial approval of the chart.

@seglo (Author, Contributor) commented Jul 11, 2018

Also, appVersion and version are hardcoded in Chart.yaml, but they're substituted with the actual RELEASE_VERSION when the chart is packaged during release (see the release_helm_pkg Make target). Because version must comply with semantic versioning, I used sed to extract the version component from RELEASE_VERSION and use that. For example, if RELEASE_VERSION is 0.5.0-SNAPSHOT we use 0.5.0 for the semantic version of the chart; a sketch follows below.
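A minimal hedged sketch of that kind of substitution (the Chart.yaml path and variable names are illustrative, not the exact release_helm_pkg recipe):

    # Strip any pre-release suffix so the chart version is plain semver,
    # e.g. RELEASE_VERSION=0.5.0-SNAPSHOT -> CHART_VERSION=0.5.0.
    RELEASE_VERSION=0.5.0-SNAPSHOT
    CHART_VERSION=$(echo "$RELEASE_VERSION" | sed 's/-.*$//')
    sed -i "s/^version: .*/version: $CHART_VERSION/" helm-charts/strimzi-kafka-operator/Chart.yaml
    sed -i "s/^appVersion: .*/appVersion: $RELEASE_VERSION/" helm-charts/strimzi-kafka-operator/Chart.yaml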

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: strimzi-role

@ppatierno (Member) commented Jul 11, 2018

Why did you change the metadata.name? I see that you have done something similar for all the other templates; is there any specific reason?

@seglo (Author, Contributor) commented Jul 11, 2018

That's a good question. In the Review Guidelines they state that charts should have the following metadata for every resource.

Resources and labels should follow some conventions. The standard resource metadata should be this:

name: {{ template "myapp.fullname" . }}
labels:
  app: {{ template "myapp.name" . }}
  chart: {{ template "myapp.chart" . }}
  release: {{ .Release.Name }}
  heritage: {{ .Release.Service }}

In this case myapp.fullname (strimzi.fullname in this PR) refers to a templating function in _helpers.tpl which concatenates the Release.name and Chart.name. I think the reasoning for this is so that you can install multiple instances of the same chart in the same namespace. I initially applied this convention to all metadata.names, but then I discovered that the operator depends on certain names being constant, such as the service account created for the Kafka broker pods. In this PR I only use the templated name for the deployment resource.

I took a look at other charts in kubeapps and noticed a mix of charts enforcing and not enforcing this requirement. I haven't received an answer yet to my questions about when it's appropriate (i.e. what type of resource in what context) to use the templated name, but I'll follow up by asking on the helm mailing list and get back to you.
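For illustration, a hedged sketch of how the release name ends up prefixing resource names when the conventional fullname helper described above is used (Helm 2 CLI syntax; the grep just highlights the names):

    # Render the chart locally with an explicit release name (Helm 2).
    helm template --name my-release helm-charts/strimzi-kafka-operator | grep 'name:'
    # With the fullname convention the Deployment comes out as something like
    # "my-release-strimzi-kafka-operator", which is what allows several installs
    # of the same chart to coexist in one namespace.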


- stream
- event
- messaging
- datastore

@tombentley (Member) commented Jul 11, 2018

topic


To create a Kafka cluster refer to the following documentation.

http://strimzi.io/docs/master/#kafka_broker

@tombentley (Member) commented Jul 11, 2018

Can we parameterize this so it points people to the right version of the docs?

kind: CustomResourceDefinition
metadata:
  name: kafkaconnects2is.kafka.strimzi.io
  labels:

@tombentley (Member) commented Jul 11, 2018

This is tricky from the build perspective because the CRD resources are themselves generated from the Java model in api. We have to avoid having screeds of YAML that need to be hand-maintained. We could perhaps generate the example resources from the helm charts, but to do that for the CRD resources we'd need to be generating the resources with these extra labels. Which is doable. I assume that, apart from these Helm-specific labels, the rest of the CRD templates are just a verbatim copy of what's currently in examples?

@seglo (Author, Contributor) commented Jul 11, 2018

@tombentley Yes I just copied the latest from ./examples/install/cluster-operator/. The main differences are the additions to metadata and Deployment in 08-deployment.yaml.

Can you point me to the process in the build that generates the CRD? I can take a crack at making these steps of the build process work together.

@tombentley (Member) commented Jul 11, 2018

Is the topic operator ever installed standalone? If so then we should create a chart for it too.

We're aware of users who do this, but I wouldn't consider it necessary to have a chart for the topic operator in this PR.

@seglo (Author, Contributor) commented Jul 11, 2018

We're aware of users who do this, but I wouldn't consider it necessary to have a chart for the topic operator in this PR.

@tombentley The reason I suggested creating the chart for the topic-operator as well was to remain consistent with how the resources are templated during the build process under ./examples/install. If we start using helm to manage templating for the cluster-operator we could do the same with the topic-operator, and then not have to manage either in ./examples/ any more. Both charts could be used to generate the contents of ./examples/install/ during the release process.

@seglo seglo force-pushed the seglo:helm-chart branch 3 times, most recently from e3650ef to c06e5e4 Jul 13, 2018

@seglo (Author, Contributor) commented Jul 16, 2018

Hi @ppatierno @tombentley. I added a system test, but I have some new questions. Another review would be good too.

  1. I updated the ClusterOperator annotation used in systemtest to accept a parameter to use the Helm Chart for install. This allows you to use it instead of the resources found in the ./examples/install/cluster-operator/ directory. I was hoping to add the testDeployKafkaClusterViaHelmChart test to KafkaClusterIT, but ATM you cannot override the ClusterOperator attribute at the method level if it's already defined at the test class level. Well, you can, but it looks like it will deploy the cluster using the test class config first and then again when it evaluates the method attribute. If you like, I could try refactoring it so the method attribute would always take precedence.

  2. @ppatierno asked why it was necessary to overload metadata.name in all the resources. See this earlier comment for my original motivation. I didn't get any response from the helm mailing list, and the only response came from the #charts channel in the Kubernetes Slack, which basically confirmed what I had assumed: it's to support multiple installs of the same chart. However, since I don't have a definitive answer I've rolled back this change. All metadata.names are now what they were originally.

  3. @tombentley @ppatierno I took a look at the CrdGenerator. I would like to propose modifying the release build based on what @tombentley suggested in an earlier comment. The steps would be:

    1. Generate CRDs to ./helm-charts/strimzi-kafka-operator/templates/
    2. Generate and output resource templates to ./examples/install/cluster-operator/
    3. Bundle release

It looks like some tests in systemtest failed during the Travis build. I'm checking to see if they're a result of my changes.

api/pom.xml Outdated
@@ -106,6 +106,25 @@
</arguments>
</configuration>
</execution>
<execution>

@seglo (Author, Contributor) commented Jul 16, 2018

We can consolidate this later. I just added the 2nd execution to get fresh CRD's in the helm chart.

@seglo seglo force-pushed the seglo:helm-chart branch 2 times, most recently from b3c57c3 to db71fcc Jul 16, 2018

@tombentley (Member) commented Jul 16, 2018

Hi @seglo,

I'll do a proper review tomorrow, but I'll answer this one now:

I updated the ClusterOperator annotation used in systemtest to accept a parameter to use the Helm Chart for install. This allows you to use it instead of the resources found in the ./examples/install/cluster-operator/ directory. I was hoping to add the testDeployKafkaClusterViaHelmChart test to KafkaClusterIT, but ATM you cannot override the ClusterOperator attribute at the method level if it's already defined at the test class level. Well, you can, but it looks like it will deploy the cluster using the test class config first and then again when it evals the method attribute. If you like I could try re-factoring it so the method attribute would always take precedent.

Setting up a CO instance takes quite a while, so the idea with putting the annotation at the class level is that all the test methods in the class share a CO instance, thus sharing the cost of setting it up. This is important as we've found over time that we can easily hit the 50 minute limit on Travis. If you want to set up the CO in a different way then maybe you can achieve the same thing by subclassing the existing test, but changing the class annotation in the subclass. It's not ideal, but it might be enough for the time being.

@seglo (Author, Contributor) commented Jul 16, 2018

@tombentley Got it. That makes sense.

@tombentley (Member) commented Jul 18, 2018

We're aware of users who do this, but I wouldn't consider it necessary to have a chart for the topic operator in this PR.

@tombentley The reason I suggested creating the chart for the topic-operator as well was to remain consistent with how the resources are templated during the build process under ./examples/install. If we start using helm to manage templating for the cluster-operator we could do the same with the topic-operator, and then not have to manage either in ./examples/ any more. Both charts could be used to generate the contents of ./examples/install/ during the release process.

Yes, that would be a good place to end up, but I think it will be simpler to get there in steps.

@tombentley (Member) left a comment

One thing we need to avoid is duplicating the installation resources (currently in examples and helm-charts/strimzi-kafka-operator/templates). If you tweak the CrdGenerator to be able to supply arbitrary metadata.labels to be included in the generated output, you could then generate the template CRDs. Then we could see whether we can use helm template to generate the examples. Wdyt?

@@ -112,6 +113,14 @@ you can push the images to OpenShift's Docker repo like this:

oc create -f examples/configmaps/cluster-operator/kafka-ephemeral.yaml

## Helm Chart

@tombentley (Member) commented Jul 18, 2018

Since helm is now used by the build process, that's something we should document. I know we've not documented the existing build requirements here until now, but it would be good to add a section for that to this HACKING.md: make, mvn, helm, and asciidoctor for the docs, I suppose.

@seglo (Author, Contributor) commented Jul 18, 2018

I'll add a section for prerequisites.
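For illustration, a hedged sketch of the kind of checks such a prerequisites section in HACKING.md might list (tool names taken from the comment above; exact versions are not specified here):

    # Build prerequisites: make, Maven, the Helm client, and Asciidoctor for the docs.
    make --version
    mvn -version
    helm version --client
    asciidoctor --version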

entry("imageRepositoryOverride", dockerOrg),
entry("imageTagOverride", dockerTag),
entry("clusterOperator.image.pullPolicy", "Always"),
entry("clusterOperator.resources.requests.memory", "512Mi"),

@tombentley (Member) commented Jul 18, 2018

"512Mi" etc should probably be constants, so they get modified in tandem between the helm and examples installations

@seglo (Author, Contributor) commented Jul 18, 2018

I'll make them constants.

@seglo seglo force-pushed the seglo:helm-chart branch 2 times, most recently from d35a395 to 1ca710c Jul 18, 2018

@seglo (Author, Contributor) commented Jul 23, 2018

@tombentley @ppatierno I think I have all the pieces in place now and ready for a full review. Some notes.

  • Some acceptance tests in KafkaClusterIT are failing, I think due to the addition of the TLS sidecar images to the ZK, Kafka, and Topic Operator pods (see the kubectl note after this list):
...
        Suppressed: io.strimzi.test.k8s.KubeClusterException: `kubectl --namespace kafka-cluster-test logs my-cluster-kafka-0` got status code 1 and stderr:
------
Error from server (BadRequest): a container name must be specified for pod my-cluster-kafka-0, choose one of: [kafka tls-sidecar]
  • I've integrated the helm packaging into the release as discussed with @tombentley (updated CRD generator, generate examples install resources, etc.)
  • I subclass KafkaClusterIT with a new systemtest called HelmChartClusterIT, which runs all the parent's tests, but using the chart as the deployment mechanism.
  • I'm going to do some smoke testing with GKE, but I don't have an OpenShift cluster available.
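As referenced in the first bullet above: since the pod now runs more than one container, fetching its logs needs an explicit container name (namespace and container names are taken from the error message quoted above):

    # Pick one of the containers listed in the error message.
    kubectl --namespace kafka-cluster-test logs my-cluster-kafka-0 -c kafka
    kubectl --namespace kafka-cluster-test logs my-cluster-kafka-0 -c tls-sidecar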

@seglo seglo force-pushed the seglo:helm-chart branch from 1ca710c to ec640d6 Jul 23, 2018

@seglo seglo changed the title from "Add a Strimzi Helm Chart -WIP" to "Add a Strimzi Helm Chart" Jul 23, 2018

@seglo seglo force-pushed the seglo:helm-chart branch from ec640d6 to b6cd09b Jul 23, 2018

@ppatierno (Member) commented Jul 24, 2018

@seglo thanks! Regarding the message you got for the failing tests: that line comes from the logger method, which is not able to print the pod log when an error occurs because of the additional container in the pod (as you said); btw, it's not the cause of the test failure.
You get this message once a test has already failed and the logger tries to print the pod log. Of course we have to fix that, but in any case you should check the real reason why the test failed (which is also why the logger then tries to print the pod log ... and fails as well :-)).

@ppatierno (Member) commented Jul 24, 2018

@seglo actually the problem with the log was already fixed by @tombentley a few days ago; maybe you should rebase against the latest master.

@seglo seglo force-pushed the seglo:helm-chart branch 2 times, most recently from 6402000 to 893d781 Jul 24, 2018

@seglo seglo force-pushed the seglo:helm-chart branch 2 times, most recently from 7ca79b5 to c0f92df Jul 24, 2018

@seglo (Author, Contributor) commented Jul 25, 2018

@ppatierno The systemtest build is passing again. It's ready for a review.

@seglo seglo force-pushed the seglo:helm-chart branch from c0f92df to 265dd39 Jul 25, 2018

@ppatierno (Member) commented Jul 27, 2018

@seglo I'm sorry, but due to new changes in master there are new conflicts :(
Can you fix them? Anyway, I'm going to review it.

@seglo seglo force-pushed the seglo:helm-chart branch 2 times, most recently from 438e60e to 97157bc Jul 27, 2018

@seglo (Author, Contributor) commented Jul 28, 2018

@ppatierno I rebased and resolved conflicts, but I'll need to do more troubleshooting to figure out the new TopicOperatorIT systemtest failure. Will look into it Monday morning.

@scholzj (Member) commented Jul 28, 2018

@seglo The TopicOperatorIT is IMHO a flaky test. I restarted your PR build and it failed on the System tests.

I would like to merge this quickly to avoid more conflicts and rebasing :-). So I had a look at this and noticed two things:

  • The HelmChartClusterIT class should probably inherit only from AbstractClusterIT
  • When I tried to run the HelmChartClusterIT test it failed with some RBAC errors. I think the problem is that Tiller needs a service account and cluster role binding to deploy the Strimzi RBAC resources (see the sketch below).

I think I fixed both of these in my branch. Maybe you can have a look at it and use it for your PR.
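For context, a minimal hedged sketch of what granting Tiller those rights typically looks like on a test cluster (cluster-admin is a blunt but common choice for CI; this is not necessarily what the helm-service-account.yaml mentioned later contains):

    # Helm 2 only: give Tiller a service account and bind it so it can create
    # the Strimzi RBAC resources, then point Helm at that account.
    kubectl -n kube-system create serviceaccount tiller
    kubectl create clusterrolebinding tiller \
      --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
    helm init --service-account tiller --upgrade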

@seglo seglo force-pushed the seglo:helm-chart branch from 97157bc to dcb649f Jul 29, 2018

@seglo (Author, Contributor) commented Jul 29, 2018

@scholzj Thanks a lot for taking the time to get the tests passing. It helped me fix the underlying issue I had. I would also like to get this merged in ASAP because it touches on a big chunk of your release process and management of the cluster operator resource files that see a lot of changes.

Regarding the two things you noticed:

The HelmChartClusterIT class should probably inherit only from AbstractClusterIT

I had originally subclassed AbstractClusterIT, but switched to KafkaClusterIT based on a suggestion from @tombentley . By subclassing KafkaClusterIT we can run all the same tests using the Helm Chart as the deployment mechanism, which was useful to validate the chart. This does come with the consequence of increasing the time of the build.

When I tried to run the HelmChartClusterIT test it failed with some RBAC errors. I think the problem is that Tiller needs a service account and cluster role binding to deploy the Strimzi RBAC resources.

It looks like the underlying issue was related to not deploying the new KafkaTopic CRD with the Helm Chart. In the api project's CRD generation config I didn't include the new argument to generate it, so when the topic operator was deployed it was unable to find the correct CRD. After seeing that your branch included this argument, I added it as well and the tests worked again.

The Tiller service account isn't necessary to run the tests on Minikube, but I think it would be useful to include (though it won't be required once Helm 3 is released, since Tiller will no longer exist). The reason I didn't keep it was an issue with loading the resource file properly (helm-service-account.yaml). It worked when I ran a test in IntelliJ, but it would crash when I ran a test using Maven with the system test helper script. I'm not sure why.

@scholzj (Member) commented Jul 29, 2018

Thanks for the explanations. We will have a look at it right in the morning to get it merged without any further rebasing.

However, what makes me curious is why the service account should not be needed. Without it, only the default account is used, which IMHO should have no rights by default. However, you are right that the RBAC error I saw could also have been caused by the missing CRD, since I noticed it only after adding the service account. So as long as the tests pass it should be fine.

Thanks for the PR.

@ppatierno (Member) left a comment

LGTM. @seglo thanks for this PR!

@tombentley (Member) left a comment

LGTM. Let's not delay merging this any longer. @seglo if you could open a PR to fix the typo and possibly move the KafkaUser to the 'normal' CrdGenerator execution, that would be great.

<argument>io.strimzi.api.kafka.model.KafkaConnectAssembly=${pom.basedir}${file.separator}..${file.separator}helm-charts${file.separator}strimzi-kafka-operator${file.separator}templates${file.separator}04-Crd-kafkaconnect.yaml</argument>
<argument>io.strimzi.api.kafka.model.KafkaConnectS2IAssembly=${pom.basedir}${file.separator}..${file.separator}helm-charts${file.separator}strimzi-kafka-operator${file.separator}templates${file.separator}04-Crd-kafkaconnects2i.yaml</argument>
<argument>io.strimzi.api.kafka.model.KafkaTopic=${pom.basedir}${file.separator}..${file.separator}helm-charts${file.separator}strimzi-kafka-operator${file.separator}templates${file.separator}04-Crd-kafkatopic.yaml</argument>
<argument>io.strimzi.api.kafka.model.KafkaUser=${pom.basedir}${file.separator}..${file.separator}helm-charts${file.separator}strimzi-kafka-operator${file.separator}templates${file.separator}04-Crd-kafkauser.yaml</argument>

@tombentley (Member) commented Jul 30, 2018

I don't think KafkaUser needs to be here (@scholzj?)

| `image.repository` | Cluster Operator image repository | `strimzi` |
| `image.name` | Cluster Operator image name | `cluster-operator` |
| `image.tag` | Cluster Operator image tag | `latest` |
| `image.imagePullPolicy` | Cluster Operator image pull policy | `IfNotPrsent` |

@tombentley (Member) commented Jul 30, 2018

IfNotPresent

@tombentley tombentley merged commit 49e7bd3 into strimzi:master Jul 30, 2018

1 check passed

continuous-integration/travis-ci/pr: The Travis CI build passed
@seglo (Author, Contributor) commented Jul 30, 2018

Great. Thanks!

@tombentley I'll create a PR to fix that typo.

There are some follow up tasks I would like to suggest.

  • Create a helm repo on the strimzi website. You can generate a Helm repository with the helm CLI tool and host it using GitHub Pages; see the repo-generation sketch after this list. This will let you release the chart in a more formal way. The way the chart is bundled now, users will have to download your release artifact to install it, or install it as a standalone artifact via a URL.

    helm install https://github.com/strimzi/strimzi-kafka-operator/releases/download/0.6.0/strimzi-kafka-operator-0.6.0.tgz
    

    If we create the repo then the user could add it and install the latest version via repo/package coordinates.

    helm repo add strimzi http://strimzi.io/helm/
    helm install strimzi/strimzi-kafka-operator
    
  • Submit the project to the central kubeapps incubator repository. I think most of the requirements have been satisfied already. After the next release I suggest submitting it.

  • Create a standalone strimzi-topic-operator chart. This allows users to install the topic operator standalone without having to use the examples resource files directly. It will also bring consistency with how the resource files are templated and managed as part of the release workflow.
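A minimal hedged sketch of generating such a repo with the Helm CLI, as mentioned in the first bullet above (the output directory is an assumption; the URL comes from the commands above):

    # Package the chart and build/refresh the repository index that a static
    # site (e.g. GitHub Pages behind http://strimzi.io/helm/) can serve.
    helm package helm-charts/strimzi-kafka-operator -d helm-repo/
    helm repo index helm-repo/ --url http://strimzi.io/helm/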

WDYT? @ppatierno @scholzj @tombentley

@seglo seglo deleted the seglo:helm-chart branch Jul 30, 2018

@tombentley (Member) commented Jul 30, 2018

+1 on hosting a helm repo on the strimzi website.
+1 on having a topic operator chart too.

I'm just reading about kubeapps contribution guidelines to understand that better.
