Testing Improvement/Feature Request: Testing during install to allow fail faster #8861

Closed
huang-jy opened this issue Oct 8, 2020 · 11 comments


@huang-jy

huang-jy commented Oct 8, 2020

Output of helm version:
Helm2:
Client: &version.Version{SemVer:"v2.14.3", GitCommit:"0e7f3b6637f7af8fcfddb3d2941fcc7cbebb0085", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.1", GitCommit:"5270352a09c7e8b6e8c9593002a73535276507c0", GitTreeState:"clean"}

Helm3:
version.BuildInfo{Version:"v3.3.4", GitCommit:"a61ce5633af99708171414353ed49547cf05013d", GitTreeState:"clean", GoVersion:"go1.14.9"}

Output of kubectl version:
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.12", GitCommit:"e2a822d9f3c2fdb5c9bfbe64313cf9f657f0a725", GitTreeState:"clean", BuildDate:"2020-05-06T05:17:59Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.12-gke.20", GitCommit:"0ac5f81eecab42bff5ef74f18b99d8896ba7b89b", GitTreeState:"clean", BuildDate:"2020-09-09T00:48:20Z", GoVersion:"go1.12.17b4", Compiler:"gc", Platform:"linux/amd64"}

Cloud Provider/Platform (AKS, GKE, Minikube etc.):

GKE

This feature request is to allow testing DURING an install. To explain why, let me provide some backing information.

When installing a chart using Helm, you can run helm test afterwards to run the tests specified within the chart. However, you first need to make sure the deployment has finished its rollout, for example by waiting on kubectl rollout status. And herein lies the problem.

If your rollout takes a long time to apply (for example, because you have a large number of replicas, a low surge value, a StatefulSet, or pods that take time to initialise, e.g. because they have to sync with other replicas in a cluster), the rollout could take anywhere from a few minutes to a few hours. During that time you can't run helm test, because you may hit a mix of the old and new versions of your deployment.

And when you do run helm test, you may find a problem, have to roll back, and then go through the whole deployment again.
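To make the current sequence concrete, here is a minimal Python sketch of the workflow described above: upgrade, wait for the rollout, test, and roll back if the test fails. The release, chart, and deployment names are hypothetical placeholders, and the rollback call without a revision number assumes Helm 3 behaviour. The `run` parameter exists only so the sequence can be exercised without a cluster.

```python
import subprocess
from typing import Callable, List


def current_workflow(release: str, chart: str, deployment: str) -> List[List[str]]:
    """Commands for the existing install-then-test sequence."""
    return [
        ["helm", "upgrade", release, chart],
        # Blocks until the whole rollout has finished (minutes to hours).
        ["kubectl", "rollout", "status", f"deployment/{deployment}"],
        ["helm", "test", release],
    ]


def rollback_command(release: str) -> List[str]:
    # Assumption: Helm 3, where omitting the revision targets the previous release.
    return ["helm", "rollback", release]


def run_workflow(release: str, chart: str, deployment: str,
                 run: Callable = subprocess.run) -> bool:
    """Run the sequence; roll back only if the tests fail. Returns True on success."""
    steps = current_workflow(release, chart, deployment)
    for cmd in steps[:-1]:
        if run(cmd).returncode != 0:
            return False  # upgrade or rollout failed before tests could run
    if run(steps[-1]).returncode != 0:
        # The tests only ran after every pod was replaced, so everything is reverted.
        run(rollback_command(release))
        return False
    return True
```

Calling `run_workflow("my-release", "./my-chart", "my-app")` would execute the real commands. The point of the sketch is that `helm test` cannot start until `kubectl rollout status` returns, so a failure is only discovered after the full rollout.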

The proposed change is to let Helm, during the deployment update, generate a "testing pod" for each new pod that comes up and reports ready, and run the test against that new pod. Helm would continue rolling out the new pods from the deployment, but would also generate new "testing pods" as those pods come up and report ready.

Using the "--parallel" switch for the test would allow Helm to generate multiple "testing pods" at once, so every new pod is tested as soon as it is created.

The benefit of this proposal is that if one (or more) of the new pods fails the test, Helm can cancel and roll back, even if it has not finished rolling out the pods. This could save a lot of time and prevent broken infrastructure.

Here's an example:

Let's assume we have a deployment of four replicas, represented by "O"s

OOOO

We do a deployment update with Helm. Helm generates two new pods due to surge, these are represented by "o"s

OOOOoo

These pods come ready. Two of the old pods terminate, represented by "X"s

OOXXoo

Helm starts the test process, generates a test pod, and ties it to the first new pod -- represented by "T". We didn't specify --parallel here, so it won't generate a second test pod until the first test exits.

OOXXoo
    T

The test passes on the first new pod -- represented by "t". Helm generates a new test pod and ties it to the second new pod. Meanwhile, it generates two new pods for the deployment. The old pods that were terminating disappear as they have finished terminating.

OOoooo
  tT

Now, let's assume the test on the second pod fails, represented by the "-"

OOoooo
  t-

At this point, Helm would cancel the rollout and roll back. Since we still have two of the old pods, it would just delete the new pods and recreate the two old pods that were removed.

The current test method requires the whole rollout to finish first, so all the pods would have to be reverted. If the new version was totally broken, that would mean downtime between the broken update taking effect and the revert completing. The proposed method would let us find breakages before the whole deployment is broken by them.
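The walk-through above can be approximated with a small simulation. This is only an illustrative sketch, not how Helm works today: `simulate_rollout` and `test_passes` are hypothetical names, and for simplicity it tests each surge batch before starting the next one, whereas the example lets the rollout continue while a test is running. Old pods are drawn as "O" and new pods as "o", as in the diagrams.

```python
from typing import Callable, List, Tuple


def simulate_rollout(replicas: int, surge: int,
                     test_passes: Callable[[int], bool]) -> Tuple[str, List[str]]:
    """Simulate a rolling update that tests each new pod as soon as it is ready.

    test_passes(i) says whether the test against the i-th new pod succeeds.
    Returns the outcome and the pod layout after each step.
    """
    old, new, tested = replicas, 0, 0
    history = ["O" * old]
    while old > 0:
        batch = min(surge, old)
        new += batch  # surge pods come up and report ready
        old -= batch  # the same number of old pods terminate
        history.append("O" * old + "o" * new)
        # Without --parallel, new pods are tested one at a time.
        while tested < new:
            if not test_passes(tested):
                # Fail fast: drop the new pods and restore the old ones.
                history.append("O" * replicas)
                return "rolled back", history
            tested += 1
    return "completed", history


# The second new pod (index 1) fails its test, as in the example.
outcome, history = simulate_rollout(4, 2, lambda i: i != 1)
print(outcome)  # rolled back
print(history)  # ['OOOO', 'OOoo', 'OOOO']
```

In this run the failure is caught while two old pods are still serving, so only half of the deployment ever ran the broken version.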

@hickeyma
Contributor

hickeyma commented Oct 8, 2020

@huang-jy Helm has a proposal process called Helm Improvement Proposals (HIPs). Do you mind creating a HIP for this proposal?

@huang-jy
Author

huang-jy commented Oct 8, 2020

I will take a look at the proposal doc, thanks @hickeyma

@huang-jy
Author

@hickeyma should I go through the CNCF mailing list first ("Start with an idea" section), or do I go straight to the fork-and-PR step ("Submitting an HIP" section)?

@bacongobbler
Member

bacongobbler commented Oct 14, 2020

It’s worth starting a conversation in helm-users (Slack) or in the mailing list before submitting a proposal. If you’re working on something that would affect a large number of the helm user base, gaining some form of consensus and working with others on a proposal is beneficial to both yourself and other third parties that may be interested in the idea.

@huang-jy
Author

Thanks, I'll post into that channel 👍

@huang-jy
Author

Do I need to tag anything on the message in Slack, or just post it a couple of times over a few days for visibility?

@bacongobbler
Member

bacongobbler commented Oct 16, 2020

To quote a paragraph from HIP 1:

Vetting an idea publicly before going as far as writing a proposal is meant to save the potential author time. Many ideas have been brought forward for changing Helm that have been rejected for various reasons. Asking the Helm community first if an idea is original helps prevent wasting time on something that is guaranteed to be rejected based on prior discussions (searching the internet does not always do the trick). It also helps to make sure the idea is applicable to the entire community and not just the author. Just because an idea sounds good to the author does not mean it will work for most Helm users in most areas where Helm is used.

In other words, this process is optional, and there is no hard rule about having to post X number of times a day until you gain enough consensus.

When we're looking at proposals, we're evaluating which ones will make an impact on users, and we focus on the ones that most users have a vested interest in. Proposals that have been vetted by multiple parties and refined through collaborative discussion usually result in a better proposal, and one that will solve a problem for a larger subset of the community.

If nobody is responding to your proposal, it could be a sign that there is not enough interest in it. In that case, the proposal may not be helpful to the broader community, which makes it harder to justify putting in the time to implement work that only one community member would benefit from. It also makes it less likely that we will schedule time to review the proposal if we have to choose between a proposal that has the interest of a large subset of the community and one that has a single contributor's interest.

Hope that helps clarify our stance.

@huang-jy
Author

huang-jy commented Oct 16, 2020

I understand, but I'm not sure whether it's because the message scrolls off the top of the channel or because of a lack of interest. I'll post a few more times over the coming week. If not enough consensus is reached by then, we'll call it a rejected idea.

@bacongobbler
Member

bacongobbler commented Oct 16, 2020

You can also try posting in the helm-users mailing list. Slack is not great for gathering mass consensus, as there are many users in different timezones. I was just making a suggestion to try and get the ball rolling, since a large part of the Helm community is logged into Slack (16,700 members in that channel as of this morning). But it isn't the greatest tool for keeping a long-running discussion going. Email is better suited to those types of conversations.

@huang-jy
Author

Great idea, thank you. I will look at emailing the mailing list next.

@github-actions

This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.
