Add Istio to our nightly tests #240
Thanks for opening the issue, 100% agree, we should fix this stuff. Istio has been on our radar for a while anyway, and (1) and (3) are problems we've encountered from another big, important, complex chart as well; (2) is one I've not heard before, but if Helm does it we should probably support it too, even though I think it's not a great decision in isolation. :) Even though this issue is really a meta-issue, I'm going to keep it open for now, because I suspect that once we get through these issues, we'll find even more. My experience is that large Helm charts usually have subtle and unnoticed bugs that need to be fixed, and our pattern has been to contribute fixes upstream, so I expect we'll likely do that here, too.
I'm moving the umbrella issue to M19. We'll try to get a quick fix in for most Charts in the next day or two (I'm guessing we will do infinite parallelism), and then follow up with a series of fixes that solve the problems of the Istio Chart generally. I also suspect we will discover more issues than those listed here -- so we'll want to budget time for that.
I have updated the title. We should not only fix the underlying set of issues, but also aim to close out M19 with Istio actually harnessed and running in our nightly tests, so that we lock in the goodness. |
This commit will introduce a simple retry mechanism that makes the Kubernetes provider more tolerant of misordered resource creation, which is particularly common in Helm and Kubernetes YAML. For example, if Kubernetes YAML specifies that a `Pod` should be created before the `Namespace` it resides in, this will likely cause `pulumi up` to fail. Another example is creating `CustomResource`s before an operator or controller has had the opportunity to register the relevant `CustomResourceDefinition`s. We ameliorate this bug by retrying creation REST calls that fail, with some backoff and some maximum retry threshold. The effects of this are:

* Makes the Kubernetes provider somewhat tolerant of undeclared dependencies between resources.
* Makes the Kubernetes provider somewhat tolerant of under-specified CRDs, custom controllers, and custom operators, since the retry logic can stand in for all but the most complex of these (which would still require await logic).

This commit represents major progress towards #240, and should (we believe) fix #239.
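The retry-with-backoff mechanism described in the commit message can be sketched as follows. This is a minimal, hypothetical Python illustration (the actual provider is implemented in Go, and `create_with_retry` is an illustrative name, not the provider's API): retry a failed creation call with exponential backoff, up to a maximum number of attempts.

```python
import random
import time

def create_with_retry(create, max_attempts=5, base_delay=0.5):
    """Call `create` (standing in for a Kubernetes creation REST call),
    retrying failures with exponential backoff until it succeeds or
    `max_attempts` attempts have been made."""
    for attempt in range(1, max_attempts + 1):
        try:
            return create()
        except Exception:
            if attempt == max_attempts:
                raise  # retry threshold reached; surface the error
            # Exponential backoff with a little jitter before retrying.
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))
```

For example, wrapping the creation of a `Pod` this way gives its `Namespace` (created concurrently) time to appear before the retries are exhausted.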
Ok, we're getting close, I think. Here are the outstanding issues that we're in the process of resolving:
Ok, I'm marking the two items from my last comment off -- with #271, Istio does not (on my machine) error out during install or uninstall. I'm guessing we'll run into more bugs as I have time to explore the entire parameter space of the Istio helm chart, but for now I think we've made significant progress on @geekflyer's point (3). For (2), I haven't decided if we should allow duplicate keys for Helm 2 -- I'm thinking yes, because Helm itself supports this, and I'm a big believer in duplicating bugs when you're mirroring an API. I would appreciate thoughts on this. For (1), we will finish this work before the sprint ends. Over the next few days, I'm going to be:
Ok, we're now making significant progress. For @geekflyer's (2), we've decided to go ahead and conform to Helm's YAML parsing semantics. This is the final provider-level blocker, so in tandem I've put up a WIP example for Istio here. We've tested this with a variety of settings, so we have some confidence that it will work for normal users.
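Conforming to Helm's YAML parsing semantics here means "last key wins" for duplicate keys -- the same convention most JSON parsers follow, and what js-yaml's `json: true` option enables for YAML. As a quick illustration of the semantics (using Python's stdlib `json` module as the stand-in parser, since the behavior is the same):

```python
import json

# Most JSON parsers silently let the last duplicate key win; Helm's
# YAML handling (and js-yaml loaded with `json: true`) tolerates
# duplicates the same way instead of rejecting the document.
doc = '{"name": "http", "name": "http-kiali"}'
parsed = json.loads(doc)
print(parsed["name"])  # the last occurrence, "http-kiali", wins
```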
I've worked with several users (including @geekflyer) to make sure this works for them, so we have some confidence we're getting to the right place. Let's move the "write custom CRDs" bit out to a new issue -- it is really outside the testing umbrella. For the last remaining issue of writing tests:
Alright, we have a good start on the tests, but I realized yesterday that, because we can't have multiple instances of Istio running on a cluster, we will actually have to boot an entire GKE cluster every time we deploy it. This shouldn't be hard, but so far it's pretty fiddly, and I don't know what I don't know, so I'm not sure how to make an accurate time estimate.
As a first step, could we just run this test in a nightly run? That should be easy to set up, avoid the parallelism issue, and still get us some coverage in the near term until we can do the work to create an entire cluster as part of each test. |
Success. We're building a GKE cluster and deploying Istio to it. The test conditions need to be fixed, but the hard part seems to be over: https://travis-ci.com/pulumi/pulumi-kubernetes/builds/93263424 The PR will be up shortly. |
Closed with #102. |
This is mostly related to #239, but I wanted to create a separate issue for istio in general, due to the severity of the problems I ran into, and to discuss whether there are other issues to create out of this.
In general, I think istio is an excellent example of a non-trivial k8s application that pulumi should adopt as part of its test suite. Why? Istio uses helm, deploys more than 100 resources, and uses a lot of advanced resource types, like CRDs.
Currently, the experience when attempting to deploy istio via pulumi is very bad.
This is the code that can be used for testing:

```shell
helm repo add istio https://istio.io/charts
```
This is the list of issues I ran into, in order:

**1. Getting the chart involves manual steps**

I found no way to install a chart from a local directory. Then I found that istio actually has an undocumented custom helm repo endpoint, but using it in pulumi requires a manual `helm repo add` step. Those 2 issues are already addressed in #229 and #238.

**2. Pulumi crashes because of a duplicate yaml key in the kiali definitions**
This is because kiali duplicated the `name` key in its service definition here: https://github.com/istio/istio/blob/369bf50f45d9e2748b59726a466b840312633f2b/install/kubernetes/helm/istio/charts/kiali/templates/service.yaml#L10-L13. This mistake was already fixed on `master` of istio, but the latest official chart still contains the duplicate key. The reason I'm mentioning it is that helm itself can deal with those duplicate keys and simply takes the value from the last key (similar to JSON). A short google for the documentation of the `js-yaml` library that pulumi uses reveals that this behaviour can be enabled via the `json` param on the `safeLoad` function: https://github.com/nodeca/js-yaml#safeload-string---options- . It is certainly debatable whether pulumi should allow duplicate keys in yaml, but since helm itself allows it, I wanted to at least bring it up for discussion. There also seem to be a bunch of other projects which intentionally allow duplicate keys in yaml, for compatibility with earlier yaml specs etc.

**3. Resources are not deployed in the right order, and pulumi fails early when one resource fails to deploy**
This is basically #239, however there are a few additional adverse behaviors that surfaced while attempting to work around it. In general, for the istio deployment to succeed, the CRDs and the Deployments have to be deployed first. When running `pulumi up`, pulumi currently starts 10 concurrent resource deployments by default, and once one of them fails it stops spawning further resource deployments. For istio, that means it takes a lot of `pulumi up` "attempts" until all of the CRDs and Deployments randomly end up in the resource deployment queue. A workaround which actually helped to reduce the number of necessary `pulumi up` attempts was to increase the concurrency with `pulumi up -p 50`, which makes pulumi start 50 concurrent deployments right from the start and increases the likelihood of the CRD and Deployment resources getting picked up. Based on this workaround, I was wondering if pulumi could change, or introduce a param to control, the behaviour that decides when pulumi stops an update and considers it a failure. In the case of istio, I would like pulumi to continue deploying resources even if the ones that were picked up first have failed. I'm not 100% sure whether what I'm suggesting is clear from the description above, so please let me know if this needs further clarification.