
Watch hooks logs and log release-checking status in helm install/upgrade #3481

Open
distorhead opened this Issue Feb 9, 2018 · 10 comments

distorhead commented Feb 9, 2018

There are two issues related to log streaming that we are experiencing while using Helm.

Watch hooks logs

Generally, we have a project that uses Helm for deployment (https://github.com/flant/dapp). In our scheme, a deploy runs as follows:

  • run helm install/upgrade in the main thread (blocking);
  • parse the hook-job templates and show logs for jobs with the annotation dapp/watch-logs = true in a separate thread (roughly as sketched below).
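For illustration, here is a minimal Go sketch of this external scheme (the release, chart, and job names are placeholders; the real implementation is in Ruby inside dapp and parses the hook templates to find the annotated jobs):

```go
package main

import (
	"os"
	"os/exec"
)

func main() {
	// Stream logs of an annotated hook job in a separate goroutine while
	// helm blocks in the main goroutine. In practice the log command would
	// be retried until the job's pod actually exists.
	go func() {
		logs := exec.Command("kubectl", "logs", "-f", "job/my-hook-job")
		logs.Stdout, logs.Stderr = os.Stdout, os.Stderr
		_ = logs.Run()
	}()

	install := exec.Command("helm", "upgrade", "--install", "my-release", "./chart")
	install.Stdout, install.Stderr = os.Stdout, os.Stderr
	if err := install.Run(); err != nil {
		os.Exit(1)
	}
}
```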

It works OK, but we strongly believe that showing these logs is Helm's job.
We have done some work on this in Ruby (https://github.com/flant/dapp/blob/master/lib/dapp/kube/dapp/command/deploy.rb#L119), and now we can use that experience to improve Helm.

So the idea is to use a hook-job annotation like watch-logs = true|false.
Say, by default no hooks are logged, i.e. watch-logs = false is assumed for every hook. In that case Tiller will stream only the logs of hooks annotated with watch-logs = true.
As a next step, add an option to the Helm client and Tiller to log all hooks by default; with that option Tiller will assume watch-logs = true for all hooks.
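A minimal sketch, assuming the annotation and defaulting rule described above, of how Tiller could decide whether to stream a given hook's logs (the function and parameter names are hypothetical):

```go
// shouldWatchHookLogs reports whether Tiller should stream logs for a hook,
// based on the proposed watch-logs annotation taken from the rendered hook
// manifest. watchAllByDefault corresponds to the proposed option that flips
// the default for all hooks.
func shouldWatchHookLogs(annotations map[string]string, watchAllByDefault bool) bool {
	value, ok := annotations["watch-logs"]
	if !ok {
		return watchAllByDefault
	}
	return value == "true"
}
```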

This log streaming could happen directly in the install/upgrade release gRPC action using server-side streaming.

Check release state and show status during checking

Related to the logging topic, there is also the question of checking the release state (--wait in Helm).

Release checking could also log some status info about what is being checked and what we are waiting for. This is not necessarily pods' logs; it could be container statuses or, for a Deployment, the target replica count versus the current replica count.
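As an illustration of the kind of status line meant here, a rough sketch for Deployments (the Kubernetes field names are real; the output format is just an example):

```go
import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
)

// deploymentWaitStatus builds a human-readable line describing what a --wait-style
// check is still waiting for on a Deployment: target replicas vs. current state.
func deploymentWaitStatus(d *appsv1.Deployment) string {
	desired := int32(1) // Kubernetes defaults spec.replicas to 1 when unset
	if d.Spec.Replicas != nil {
		desired = *d.Spec.Replicas
	}
	return fmt.Sprintf("deployment/%s: %d/%d replicas ready (updated %d, available %d)",
		d.Name, d.Status.ReadyReplicas, desired,
		d.Status.UpdatedReplicas, d.Status.AvailableReplicas)
}
```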

Again, in our project we have done some work on Deployment status watching (https://github.com/flant/dapp/blob/master/lib/dapp/kube/kubernetes/manager/deployment.rb#L33). After helm install/upgrade is done and the thread that watches hook logs has finished, we run a deployment-watching procedure.

It is similar to what --wait does, but it gives interactive info about what is being checked and shows, among other things, pods' logs. That logging is not ideal, but the idea is that the release readiness check could work this way: poll the release checks and, at the same time, give some info to the user.
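A rough sketch of that polling idea, assuming a generic readiness check that also reports what it is still waiting for (all names are illustrative):

```go
import (
	"context"
	"fmt"
	"time"
)

// waitAndReport polls a readiness check and prints its status on every tick,
// instead of waiting silently. The check function stands in for the actual
// per-resource checks (Deployments, hook Jobs, and so on).
func waitAndReport(ctx context.Context, interval time.Duration,
	check func() (ready bool, status string)) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		ready, status := check()
		fmt.Println(status) // e.g. "deployment/web: 1/3 replicas ready (updated 3, available 1)"
		if ready {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}
```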

The main purpose of this logging is to show enough info to the end user during an install/upgrade operation, so that the user can diagnose a problem and fix the templates without running kubectl to dig into what went wrong.

And again, we strongly believe that Helm in --wait mode could do something like this to become a truly end-user Kubernetes deployment tool.


There is also the related issue #2298, but a separate helm command to show logs would not make Helm a more user-friendly deployment tool that works out of the box.

@distorhead distorhead changed the title Watch hooks logs and print release checking status Watch hooks logs and log release checking status Feb 9, 2018

@distorhead distorhead changed the title Watch hooks logs and log release checking status Watch hooks logs and log release-checking status in helm install/upgrade Feb 9, 2018

distorhead commented Feb 16, 2018

An update on the related logging work:

Here are my team's experiments with a first implementation: https://github.com/kubernetes/helm/pull/3479/files.

It contains all the layers needed to pass information through the chain: kube-client - KubeClient interface - Tiller ReleaseServer - Helm client. The job-watching implementation in kube-client is the most complex part and contains experimental code for now.

This feature aims to bring the following possibilities to Helm users:

  • Give the user enough info for debugging templates without additional kubectl invocations.
  • Let the user enable streaming of information about hooks with the new annotation helm/watch-logs. Tiller will stream events and logs only for annotated hooks during helm install/helm upgrade; no unnecessary data is shown, and none is sent over the network.

This feature is useful for CI/CD, where Helm is used as an end-user deployment tool. Streaming all logs and Kubernetes system events could also be useful for some purposes, but that scope of release monitoring is not addressed in this PR.

Implementation details:

  • The Helm client will receive events such as LogChunk or a userspace error and will build its log output from these events.
  • The stream is oriented not toward giving only raw logs or low-level server info, but toward giving userspace-level events about the resources being deployed, including log chunks for containers, and delivering these events not randomly but in a logically ordered manner (see the sketch below).
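For example, such a userspace-level event might look roughly like this (type and field names are illustrative, not the actual types from PR #3479):

```go
// watchEvent is a sketch of one logically ordered, userspace-level event about a
// resource being deployed; the Helm client would turn a stream of these into output.
type watchEvent struct {
	JobName       string
	PodName       string
	ContainerName string
	LogChunk      string // a piece of container output, when the event carries logs
	Error         string // a userspace-level error (e.g. image pull failure), if any
}
```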

The way it works is the same as in #3263: streaming the response to the install/upgrade request. The main problem with this way of streaming is backward compatibility between different versions of the Helm client and Tiller (https://github.com/kubernetes/helm/pull/3263/files#diff-e337e5a949100848dad15963f6fc8b02R64). I actually began experimenting with getting info from Tiller the same way, and then received a link to that issue from people who had already worked on the problem.

Here is my team's proposal for that problem:

  • First, make API version negotiation between the Helm client and Tiller work in a backward-compatible manner. Somehow.
  • Then add new extended InstallReleaseV2 and UpgradeReleaseV2 gRPC methods with response streaming.
  • Make the new Helm client use InstallReleaseV2/UpgradeReleaseV2 instead of InstallRelease/UpgradeRelease when available (a client-side sketch follows).
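A client-side sketch of how such a streamed response could be consumed; the InstallReleaseV2 name comes from this proposal, and the message and interface types here are placeholders rather than generated gRPC code:

```go
import (
	"fmt"
	"io"
)

// releaseEvent stands in for whatever streamed message InstallReleaseV2 would
// return; the method and message names come from this proposal, not Tiller's API.
type releaseEvent struct {
	Source  string // e.g. "job/migrate pod/migrate-x1 container/main"
	Message string // a log chunk or status line
}

// installV2Stream mimics the client-side handle a gRPC server-streaming method returns.
type installV2Stream interface {
	Recv() (*releaseEvent, error)
}

// drainInstallStream renders streamed events until the server closes the stream,
// which is when the install (or upgrade) has finished on the Tiller side.
func drainInstallStream(stream installV2Stream) error {
	for {
		event, err := stream.Recv()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		fmt.Printf("%s: %s\n", event.Source, event.Message)
	}
}
```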
distorhead commented Feb 16, 2018

Another alternative could be an api-version parameter for the Helm CLI that enables an API with a given feature set. In that case the helm/watch annotation would simply be ignored until helm install/upgrade is called with --api-version SomeVersionId.

distorhead commented Feb 21, 2018

I have implemented the complex part related to job/pod/container watching. The result is the JobWatchMonitor and PodWatchMonitor primitives.

On the client side, Helm prints logs and errors the same way tail -f does on multiple files. If a hook job spawns multiple pods, or multiple containers within pods, the Helm client toggles the log header and prints the chunk of information related to each (job, pod, container) in real time. This is acceptable logging behavior for a hook job, because a hook job typically does something sequential. But if a hook job does something in parallel, that case is supported too: the output is given as-is.
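A minimal sketch of that tail -f-style multiplexing (the header format is only an illustration):

```go
import "fmt"

// logMux prints container log chunks and emits a tail -f-style header only when
// the (job, pod, container) source of the next chunk differs from the previous one.
type logMux struct {
	lastSource string
}

func (m *logMux) Write(job, pod, container, chunk string) {
	source := fmt.Sprintf("job/%s pod/%s container/%s", job, pod, container)
	if source != m.lastSource {
		fmt.Printf("==> %s <==\n", source)
		m.lastSource = source
	}
	fmt.Print(chunk)
}
```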

The overall job-watch procedure is separated from WatchUntilReady into WatchJobsUntilReady (https://github.com/kubernetes/helm/pull/3479/files#diff-73ee28a4d39ab9ab84ba5a6f9dee3867R365), because the JobWatchMonitor primitive makes the code more specific and focused on the end goal of sequentially watching hook jobs, and this primitive should watch job-related events by itself.

Passing logs and events from kube-client to the release server is implemented through a WatchFeed interface. It is really just a simple set of callbacks, because there is no need to create Go channels at this level.
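Roughly, the feed is a callback set of this shape (the method names here are illustrative, not the exact interface from PR #3479):

```go
// WatchFeed is how kube-client reports watch results upward; plain callbacks are
// enough at this level, so no channels are involved.
type WatchFeed interface {
	OnContainerLog(job, pod, container, chunk string)
	OnContainerError(job, pod, container string, err error)
	OnJobSucceeded(job string)
	OnJobFailed(job string, err error)
}
```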

For now this feature works only for the install request. Adding streaming to upgrade and rollback is mostly "copy-paste", so that is not the main problem. The helm/watch=true annotation is not implemented yet either, but that is not a problem too. There also needs to be more debugging of the cleanup of all spawned monitor-primitive goroutines (WatchContainerLogs can currently hang when a container remains in the Waiting state).

The main problem for now is API compatibility, and I think I need some feedback to integrate this work properly. What do you think about this feature, generally speaking? It is about 80% done and needs some adjustments; maybe something has been done wrong.

What about extending the install/upgrade/rollback gRPC procedures into server-side streamed versions under different names, so that the Helm client uses the extended versions only when they are available on Tiller? What is wrong with that idea?

Also, I think some parts of the log streaming can be reused for PR #2342 too.

distorhead commented Mar 8, 2018

@bacongobbler Hey! You set a proposal tag on this; could you answer me? Half a month has passed with no reaction.

If this work goes totally against your plans, could you say so directly? And are there any plans at all?

lawrencejones commented Mar 22, 2018

Really keen on seeing this; it's pretty essential for us to have this debug output.

unicell commented Apr 27, 2018

@distorhead I just cherry-picked your patches onto the latest 2.9 release and added pre/post-upgrade hook support[1]. It works like a charm! Really looking forward to seeing the feature incorporated into master.

[1] https://github.com/unicell/helm/tree/upgrade-hook-log-streaming

distorhead commented Apr 28, 2018

Cool, thanks for trying it out!

Actually, job watching is only the first step. The next step could be support for watching all other resources besides Job. This watch code should also be written so that it can be used not only for watching hook logs, but also for checking the release state and showing status during the check.

But as a first step, the current implementation can already be useful for users who want to watch job hooks only. The main problems to solve for this step:

  • more debugging of the cleanup of all spawned monitor-primitive goroutines (WatchContainerLogs can currently hang when a container remains in the Waiting state);
  • watch only when the helm/watch=true annotation is set; this experimental code should be optional and non-default;
  • make this change API-compatible, maybe via extended install/upgrade/delete gRPC requests with server-side streaming (?).

My team is here and we are ready to solve all of that, but for now any work on this is frozen until we receive feedback, because someone responsible among the Helm maintainers should agree, disagree, or correct this direction of work.

billylindeman commented Aug 3, 2018

Any progress on this? I'd love to see it make it into a public release.

bacongobbler commented Aug 14, 2018

Apologies for the delay in responding. The core team (including myself) has been focusing solely on getting an alpha release of Helm 3 out the door, so it has been much harder to dedicate resources to peer review and design work for significant contributions to Helm 2 at this time. We highly suggest bringing proposals/discussions to the weekly public dev calls if you need feedback: https://github.com/helm/community/blob/master/communication.md

Firstly make api version negotiation between helm-client and tiller in backward-compatible manner. Somehow.

If this endpoint would break existing gRPC endpoints, what about holding off until we have an alpha release of Helm 3 available and rebasing this work against Helm 3? That way we won't have to write an API negotiation layer between helm and tiller... especially since tiller is being removed.

distorhead commented Nov 28, 2018

Hi, people!

The code from PR #3479 related to this issue has been ported to a separate project: https://github.com/flant/kubedog.

We need this functionality, so we had to create a separate project and start using it. The code has been debugged further, tested, and improved a lot (for example, it now uses Kubernetes informers, a reliable primitive from the Kubernetes library, instead of the raw Kubernetes watch API).

The kubedog project is mainly a library that provides an API to enable resource tracking in an application; there is also a CLI interface to these tracking functions. The library can track Jobs, Deployments, StatefulSets, and DaemonSets, and we have plans to support tracking of other resources as well.

So if someone needs these functions right now, try it out and give us some feedback ;)

As for the future of this issue: when Helm 3 is released, we can try to reintegrate these functions.
