Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate Helm operator into SDK #776

Merged
merged 14 commits into from
Dec 3, 2018

Conversation

joelanford
Copy link
Member

Description of the change:

Adding support to the SDK for creating and running a Helm-based operator: This PR adds:

  • pkg/helm, which contains Go code for running a Helm operator
    • NOTE: The first commit in the PR is a direct copy from github.com/operator-framework/helm-app-operator-kit/helm-app-operator/pkg/helm.
  • CLI support for creating a Helm operator (operator-sdk new) and running it locally (operator-sdk up local).
  • An end-to-end test that builds a helm-operator base image, scaffolds a new helm-based memcached operator, deploys it to a cluster, and verifies it correctly installs and uninstalls an example memcached instance.

Motivation for the change:
See the Helm operator proposal

@openshift-ci-robot openshift-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Nov 26, 2018
Copy link
Member

@estroz estroz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, is there a reason why we're copying code verbatim from helm-app-operator-kit? Are we getting rid of that repo?

#
# See: http://stackoverflow.com/questions/3338030/multiple-bash-traps-for-the-same-signal
#===================================================================
trap_add() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should throw this into hack/lib/test_lib.sh since hack/ci/e2e-ansible.sh uses it as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. There's duplication that could be reduced in other areas as well:

  • pkg/helm/internal/util/diff.go and pkg/scaffold/internal/testutil/test_util.go have identical Diff functions
  • pkg/scaffold/ansible and pkg/scaffold/helm are very similar, especially operator.go and watches.go.

I'm fine with taking a stab at reducing that in this PR, but maybe it makes more sense to take care of those things in a follow-up PR since this is already huge as it is?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to make those consolidations in a follow on, I think it would be easier for reviews and we can then take the time to potentially design a nice API interface to the watches file that we can share. I think there is more to this then just sharing a template IMO. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

@joelanford
Copy link
Member Author

Out of curiosity, is there a reason why we're copying code verbatim from helm-app-operator-kit? Are we getting rid of that repo?

That hasn't been explicitly decided, but my assumption is that at the very least we'll follow up in that repo by removing pkg/helm and updating imports to use github.com/operator-framework/operator-sdk/pkg/helm, which is similar to the path the ansible operator integration took, if I'm not mistaken.

Copy link
Member

@lilic lilic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall 🎉 :) Some nits, suggestions and questions.

commands/operator-sdk/cmd/new.go Outdated Show resolved Hide resolved
commands/operator-sdk/cmd/new.go Outdated Show resolved Hide resolved
pkg/helm/internal/util/doc.go Show resolved Hide resolved
pkg/helm/release/doc.go Show resolved Hide resolved
@@ -171,6 +181,55 @@ func upLocalAnsible() {
log.Info("Exiting.")
}

func upLocalHelm() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a suggestion, but maybe it would make sense to split these functions into separate files. For example, it's all already part of the up pkg, so we can just place this in the helm.go file. Same would be nice to do with upLocalAnsible, to just leave logic in this file that is shared amongst the different types. More of a nit really. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file contains what are effectively the main functions for the ansible and helm operators that we would use to build the binaries for the ansible and helm base images.

Maybe we should extract these into a common package that can be used by the ansible and helm main packages and by operator-sdk up local. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that makes sense, but doing that in a follow up PR is fine as well. 👍

hack/tests/e2e-helm.sh Outdated Show resolved Hide resolved
hack/tests/e2e-helm.sh Outdated Show resolved Hide resolved

# create CR
kubectl create -f deploy/crds/helm_v1alpha1_memcached_cr.yaml
trap_add 'kubectl delete --ignore-not-found -f ${DIR2}/deploy/crds/helm_v1alpha1_memcached_cr.yaml' EXIT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL: --ignore-not-found Thanks! :)

pkg/scaffold/helm/chart.go Show resolved Hide resolved
@lilic
Copy link
Member

lilic commented Nov 27, 2018

I do have one question: I tried to test this locally by running operator-sdk up local and it gives me this error:

➜  helmy git:(master) operator-sdk up local
INFO[0000] Running the operator locally.
INFO[0000] Go Version: go1.11.2
INFO[0000] Go OS/Arch: darwin/amd64
INFO[0000] operator-sdk Version: v0.1.1+git
FATA[0000] invalid chart directory /opt/helm/helm-charts/memcached: stat /opt/helm/helm-charts/memcached: no such file or directory

Is there something I was missing while creating the project?

@joelanford
Copy link
Member Author

Is there something I was missing while creating the project?

@lilic Yes, this was something @shawn-hurley @hasbro17 and I discussed. The problem is that the watches.yaml file can't easily serve double duty for both operator-sdk up local and operator-sdk build.

The decision was that we'd make it work for operator-sdk build but that users would need to make manual changes on their filesystem to make up local work (a common change would be to create a symlink at /opt/helm/helm-charts/<kind> pointing to $PWD/helm-charts/<kind>). I'll make sure this is covered and highlighted in the docs.

We tossed around some ideas about how we might be able to make this work "magically":

  • Put relative paths in watches.yaml and rely on setting the current working directory correctly in the built docker image
  • Parse and regenerate a temporary watches.yaml for up local, substituting in the local helm-charts path.

In the end, we decided against any of them for the sake of explicitness and simplicity, at least initially.

@openshift-bot
Copy link

@joelanford: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@lilic
Copy link
Member

lilic commented Nov 28, 2018

The decision was that we'd make it work for operator-sdk build but that users would need to make manual changes on their filesystem to make up local work (a common change would be to create a symlink at /opt/helm/helm-charts/ pointing to $PWD/helm-charts/). I'll make sure this is covered and highlighted in the docs.

Makes sense, is it possible we might want to follow up on this to add some logs or warnings to be printed out when users run up local and we know it's helm operator type. To convey to the user this is still experimental and they need to perform additional steps.

@joelanford
Copy link
Member Author

is it possible we might want to follow up on this to add some logs or warnings to be printed out when users run up local and we know it's helm operator type.

I think so, and this also applies to the ansible operator, so whatever we do needs to apply to both. Though I think we probably only want to do that when an error occurs loading the role/playbook/chart.

Copy link
Member

@lilic lilic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question, otherwise 👍

version = "2.11.0"

[[override]]
name = "github.com/russross/blackfriday"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to override this? Just curious about the reason behind this. Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

github.com/russross/blackfriday is depended on by k8s.io/kubernetes, which doesn't have dep files. I'm not totally familiar with how dep chooses dependencies, but in this case it chooses the latest tag (v2.0.1) which is incompatible with k8s.io/kubernetes usage of it. I'll update this to specify the latest v1 tag (1.5.2), which does work.

github.com/docker/distribution is also depended on by k8s.io/kubernetes. The latest stable tag in that repo is 2.6.2, which is what dep chooses without an override. That version is not compatible. There is a 2.7.0 pre-release tag that would work. I'm not sure if 2.7.0-rc.0 or master would be better.

github.com/docker/docker is also depended on by k8s.io/kubernetes. Without the override, dep chooses 1.6.0-rc5, which is incompatible with k8s.io/kubernetes usage of it. The main problem with more recent tagged versions is that they use github.com/Sirupsen/logrus and we use github.com/sirupsen/logrus, so dep refuses to pull a more recent version. The latest master switches to github.com/sirupsen/logrus, but for some reason, docker devs don't seem to be tagging that repo anymore.

I also noticed that I need to use "=1.11.2" instead of "1.11.2" for the kubernetes override to force it to pick 1.11.2 exactly, so I'll make that change as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some docker GitHub issues which describe the problems with the docker deps:

Perhaps we should take this approach? moby/moby#33989 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a shame we need to import k8s.io/kubernetes in the first place just to satisfy tillers need for the different clients it uses.

But that made me look at https://github.com/helm/helm/blob/2e55dbe1fdb5fdb96b75ff144a339489417b146b/glide.lock in case you find some inspiration there.

All makes sense, thanks for the explanation! Maybe we could add that to the commit msg or a short comment in the Gopkg.toml? WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I like the approach of taking helm's locked dependencies. I'll update Gopkg.toml to use the revision hashes directly and add a comment in there about why were doing it that way.

@@ -171,6 +181,55 @@ func upLocalAnsible() {
log.Info("Exiting.")
}

func upLocalHelm() {
// Set the kubeconfig that the manager will be able to grab
os.Setenv(k8sutil.KubeConfigEnvVar, kubeConfig)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check return error?

}

func main() {
flag.Parse()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any defined flags that we're parsing?
The ansible-operator uses this for setting the reconcile/resync period.
https://github.com/water-hole/ansible-operator/blob/master/cmd/manager/main.go#L21

For helm we have that set to 5 seconds below.
Slight tangent but I forget if we could configure that per CR.
https://github.com/water-hole/ansible-operator/blob/master/vendor/github.com/operator-framework/operator-sdk/pkg/ansible/controller/reconcile.go#L44-L47

Doesn't seem like we do that right now for helm, but I forgot if we're planning to do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are not flags defined by the helm operator itself, but there are flags defined in github.com/golang/glog and sigs.k8s.io/controller-ruintime

We don't have a command-line configurable or watches.yaml configurable reconcile periods, and I don't think we've talked about it for the Helm operator. We could plan to add that in a follow-on PR.

@shawn-hurley also mentioned dynamically adding watches for created objects. Not sure if we're thinking that will replace or augment the periodic reconciliation though.

Copy link
Contributor

@hasbro17 hasbro17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after the error check nit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants