Testing Improvement/Feature Request: Testing during install to allow fail faster #8861
Comments
@huang-jy Helm has a proposal process called the Helm Improvement Proposal (HIP). Do you mind creating a HIP for this proposal?
I will take a look at the proposal doc, thanks @hickeyma
@hickeyma should I go through the cncf mailing list first ("Start with an idea" section), or do I go straight to the fork-and-PR step ("Submitting an HIP" section)?
It’s worth starting a conversation in helm-users (Slack) or on the mailing list before submitting a proposal. If you’re working on something that would affect a large part of the Helm user base, gaining some form of consensus and working with others on the proposal is beneficial both to yourself and to other third parties that may be interested in the idea.
Thanks, I'll post into that channel 👍
Do I need to tag anything on the message in Slack, or just post it a couple of times over a few days for visibility?
To quote a paragraph from HIP 1:
In other words, this process is optional, and there is no hard rule about having to post X number of times a day until you gain enough consensus. When we're looking at proposals, we're evaluating which ones will make an impact on users, and we focus on the ones that most users have a vested interest in. Proposals that have been vetted by multiple parties and refined through collaborative discussion usually result in a better proposal, and one that will solve a problem for a larger subset of the community. If nobody is responding to your proposal, it could be a sign that there isn't enough interest in it. In which case, the proposal may not be helpful to the broader community, which makes it harder to justify putting in the time to implement work that only one community member would benefit from. It also makes it less likely that we will schedule a time to review the proposal if we have to choose between a proposal that has a large subset of the community's interest and one that has a single contributor's interest. Hope that helps clarify our stance.
I understand, but I'm not sure whether it's because the message scrolls off the top of the channel or because of a lack of interest. I'll post a few more times over the coming week. If there still isn't enough consensus by then, we'll call it a rejected idea.
You can also try posting on the helm-users mailing list. Slack is not great for gathering mass consensus, as there are many users running in different timezones. I was just making a suggestion to try and get the ball rolling, since a large number of the Helm community is logged into Slack (16,700 members in that channel as of this morning). But it isn't the greatest tool for keeping a long-running discussion going. Email is better suited to that type of conversation.
Great idea, thank you. I'll look at emailing the mailing list next.
This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs. |
Output of `helm version`:

Helm 2:
Client: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.1", GitCommit:"5270352a09c7e8b6e8c9593002a73535276507c0", GitTreeState:"clean"}

Helm 3:
version.BuildInfo{Version:"v3.3.4", GitCommit:"a61ce5633af99708171414353ed49547cf05013d", GitTreeState:"clean", GoVersion:"go1.14.9"}
Output of `kubectl version`:

Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.12", GitCommit:"e2a822d9f3c2fdb5c9bfbe64313cf9f657f0a725", GitTreeState:"clean", BuildDate:"2020-05-06T05:17:59Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.12-gke.20", GitCommit:"0ac5f81eecab42bff5ef74f18b99d8896ba7b89b", GitTreeState:"clean", BuildDate:"2020-09-09T00:48:20Z", GoVersion:"go1.12.17b4", Compiler:"gc", Platform:"linux/amd64"}
Cloud Provider/Platform (AKS, GKE, Minikube etc.): GKE
This feature request is to allow testing DURING an install. To explain why, let me provide some backing information.
When installing a chart using Helm, you can run `helm test` on the release afterwards to run the tests specified within the chart. However, you first need to ensure the deployment has finished its rollout, e.g. with `kubectl rollout status`. And herein lies the problem.

If your rollout takes a long time to apply -- for example, if you have a large number of replicas, a low surge value, a StatefulSet, or pods that take time to initialise (e.g. they have to sync with other replicas in a cluster) -- the rollout could take anywhere from a few minutes to a few hours, during which time you can't run `helm test` because you may hit a split between the old and new versions of your deployment. And when you do run `helm test`, you may find a problem, have to roll back, and then do the whole deployment again.

This proposed change would allow Helm, during a deployment update, to generate a "testing pod" for each new pod that comes up and reports ready, and run the chart's tests against that pod. Helm would continue creating the new pods from the deployment, and would also generate new testing pods as those pods come up and report ready.
Using a `--parallel` switch for the test would allow Helm to generate multiple testing pods, so every new pod is tested immediately upon its creation.
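As a rough illustration of what `--parallel` would buy (a toy model with made-up durations, not Helm behaviour): serial testing accumulates the per-pod test times, while parallel testing is bounded by the slowest single test.

```python
# Toy model (hypothetical durations, not Helm code): compare total test time
# when testing pods one at a time versus all at once.

def serial_test_time(durations):
    """Each test pod waits for the previous one to exit."""
    return sum(durations)

def parallel_test_time(durations):
    """All test pods run at once; total time is the slowest test."""
    return max(durations)

per_pod = [30, 45, 30, 60]  # assumed per-pod test durations, in seconds
print(serial_test_time(per_pod))    # 165
print(parallel_test_time(per_pod))  # 60
```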
The benefit of this proposal is that if one (or more) of the new pods fails its test, Helm can cancel the rollout and roll back, even before all the pods have been replaced. This could save a lot of time and prevent broken infrastructure.
Here's an example.

Let's assume we have a deployment of four replicas, represented by "O"s:
OOOO
We do a deployment update with Helm. Helm generates two new pods due to surge, these are represented by "o"s
OOOOoo
These pods come ready. Two of the old pods terminate, represented by "X"s
OOXXoo
Helm starts the test process and generates a test pod tied to the first new pod -- represented by "T". We didn't specify `--parallel` here, so it won't generate a second test pod until the first test exits.

The test passes on the first new pod -- represented by "t". Helm generates a new test pod and ties it to the second new pod. Meanwhile, it creates two more pods for the deployment, and the old pods that were terminating have now finished terminating and disappear.
Now, let's assume the test on the second pod fails, represented by the "-"
Helm would, at this point, cancel the rollout and roll back. Since two of the old pods are still running, it would just delete the new pods and re-create the two old pods it had replaced.
The current test method requires the whole rollout to complete before testing. If the new version was totally broken, that would mean downtime between the broken update taking effect and the revert completing. The proposed method would let us find breakages before the whole deployment is affected by them.
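The diagram states above can be reproduced with a tiny renderer (illustration only, using the notation from this issue: O = old pod, X = old pod terminating, o = new pod):

```python
# Render the pod diagrams used in the example above.

def render(old, terminating, new):
    """Return the pod-state string: O = old, X = terminating, o = new."""
    return "O" * old + "X" * terminating + "o" * new

print(render(4, 0, 0))  # OOOO   - initial deployment, four replicas
print(render(4, 0, 2))  # OOOOoo - surge creates two new pods
print(render(2, 2, 2))  # OOXXoo - two old pods begin terminating

# After the failed test, the proposal deletes the new pods and re-creates
# the replaced old ones, so the cluster returns to the initial state.
print(render(4, 0, 0))  # OOOO
```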