This repository has been archived by the owner on Dec 1, 2018. It is now read-only.

Mark Heapster as Deprecated #2022

Merged

Conversation

DirectXMan12
Contributor

This marks Heapster as deprecated, adding a deprecation timeline that
culminates in the retirement of the Heapster project for the Kubernetes
1.13 release.

As per the discussion in SIG Instrumentation.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 24, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: DirectXMan12

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Apr 24, 2018
@DirectXMan12
Contributor Author

cc @kubernetes/heapster-maintainers @kubernetes/sig-instrumentation-misc @brancz @piosz

@k8s-ci-robot k8s-ci-robot added the sig/instrumentation Categorizes an issue or PR as relevant to sig-instrumentation. label Apr 24, 2018
@DirectXMan12
Contributor Author

DirectXMan12 commented Apr 24, 2018

/hold

to make sure we don't accidentally merge this before people have a chance to voice concerns.

@DirectXMan12
Contributor Author

P.S. I've written up the deprecation timeline as a draft. I'd like it to be similarly aggressive to what's listed, if possible, but I'm open to suggestions.

@DirectXMan12 DirectXMan12 added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 24, 2018
@jsoriano
Contributor

Is metrics-server going to be promoted from incubator?

@kawych
Contributor

kawych commented Apr 24, 2018

/cc @piosz
I don't think we should fully deprecate Heapster, at least not at this point. For instance, Heapster is currently a go-to monitoring agent for Stackdriver.

I suggest deprecating only the Heapster API and working on replacing it with the metrics API served by Metrics Server. My understanding of the current monitoring architecture for k8s (https://github.com/kubernetes/community/blob/master/contributors/design-proposals/instrumentation/monitoring_architecture.md) is that Heapster is going to become just another third-party monitoring agent.

@k8s-ci-robot k8s-ci-robot requested a review from piosz April 24, 2018 19:39
@brancz

brancz commented Apr 24, 2018

@kawych can we find a way to have Heapster transition from this org to a Stackdriver organization? From the Kubernetes (sig-instrumentation) project point of view, we agreed that Heapster is to be deprecated. That doesn't mean Stackdriver can't continue to support it, but as the Kubernetes project as a whole, it's not a project we want to continue supporting. Can we agree on deprecating it here, and having Stackdriver fork and support it as necessary for your organization?

@DirectXMan12
Contributor Author

For instance, Heapster is currently a go-to monitoring agent for Stackdriver.

@brancz beat me to the punch here, but I'll follow up with extended thoughts.

Stackdriver using Heapster is all well and good, but, IMO, that's not a good reason to keep Heapster around as a maintained Kubernetes project -- it's a good reason for GCE to keep around a Heapster fork until they can migrate to a different monitoring solution (or really as long as they want).

One of the key observations of the new monitoring plan was that the Heapster model of development (vendor dump) is unsustainable. Even apart from that, the Heapster codebase needs significant love. It has serious design issues (ask me about these later, if you're interested), years of accumulated cruft with no real policy around what we can remove, and serious usability issues (I'd be happy to expand on these later as well).

Furthermore, non-deprecated Heapster poses problems from an organization perspective. People continue to request that we add new sinks, and with the current policies, we don't have a good reason to say no. It's unclear if this is because they believe that Heapster is still the recommended solution for monitoring Kubernetes (it's not), because they want their monitoring solution to reach a larger audience (which is, in-and-of-itself, also a problem -- see below), or because they don't want to maintain collection code.

On that subject, from a Kubernetes perspective, we want people to invest in a healthy monitoring ecosystem with solutions designed to work with their tools from the beginning, as opposed to shoehorning the tools into Heapster's architecture -- we have a number of PRs like "this specific sink has issues with the way the metrics are structured in Heapster, so we need to work around it". I'm not trying to cast a bad light on our sink maintainers -- they do a wonderful job with the resources they have, but it's tough trying to keep things working, between the core Heapster code and the sink-specific hacks.

As an added benefit, this'll help ensure that, as we move forward with features that require use of the new monitoring architecture, we can be more certain that distributions have either switched over to using it, or made a conscious decision to maintain their own copy of Heapster.

To summarize: one of the main reasons for the new monitoring architecture was to remove Heapster as a bottleneck to development, and to ensure more maintainable code. Keeping Heapster around as a maintained Kubernetes project hinders that effort.

@brancz

brancz commented Apr 24, 2018

Thanks for taking the time to express yourself @DirectXMan12! You have clearly laid out exactly what my thoughts are (I had expressed them poorly myself).

@andyxning
Contributor

andyxning commented Apr 25, 2018

Is metrics-server going to be promoted from incubator?

Agreed that we also need to promote metrics-server to the kubernetes organization alongside the deprecation of Heapster, if we do need to deprecate Heapster.

I suggest deprecating only the Heapster API and working on replacing it with the metrics API served by Metrics Server. My understanding of the current monitoring architecture for k8s (https://github.com/kubernetes/community/blob/master/contributors/design-proposals/instrumentation/monitoring_architecture.md) is that Heapster is going to become just another third-party monitoring agent.

It also seems to me that keeping the scrape and sink abilities of Heapster should be OK. IMHO, moving to the metrics API and metrics-server is good for HPA, to better support custom metrics, which is currently the big pain point for instrumentation. So I think we should move the API functionality from Heapster to the metrics API.

IMHO, Heapster deprecation is too sudden. There are no simple and easy setup guides for the metrics API, metrics-server, and third-party TSDB server pipeline available for end users to learn and try right now, which is a big gap compared to Heapster. Actually, even as a developer, I have no clear idea how to set up a monitoring system with the metrics API, metrics-server, and a third-party TSDB to replace an existing Heapster-backed one without big changes. If we want end users to move to the new monitoring pipeline, we should add docs or guides to help them migrate from a Heapster-backed monitoring system to the new one.

Also, some existing add-ons or functionalities depend on data from Heapster. What is the procedure to help those components migrate to the new pipeline smoothly?

BTW, what is the replacement for eventer? People may still be using it.

one of the main reasons for the new monitoring architecture was to remove Heapster as a bottleneck to development, and to ensure more maintainable code. Keeping Heapster around as a maintained Kubernetes project hinders that effort.

Agreed that maintaining Heapster with multiple sinks is painful. But maybe we should think of a way for Heapster to support a plugin mechanism, just like out-of-tree cloud provider support or device support with device plugins, instead of kicking it out and setting up a new one. :)

@m1093782566

m1093782566 commented Apr 25, 2018

It's really very sudden news to me! I was thinking of contributing a new sink (Poseidon, which is led by sig-scheduling) to Heapster.

@deepak-vij

deepak-vij commented Apr 25, 2018

Yeah, our current ongoing Poseidon/Firmament scheduler effort leverages Heapster real-time metrics for workload scheduling. We created a new sink for pushing metrics information out of heapster to Firmament scheduler.

As Dujun mentioned, we were about to initiate new Poseidon sink upstreaming process in the next couple of days.

@DirectXMan12
Contributor Author

Let me try and address a few of the above comments, mainly about the "suddenness" of this move:

Agreed that we also need to promote metrics-server to the kubernetes organization alongside the deprecation of Heapster, if we do need to deprecate Heapster.

It's reasonable to want to promote metrics-server from kubernetes-incubator to somewhere (most likely the kubernetes organization, but we'll have to figure out what the current policy is on that, and whether the kubernetes organization is the right place).

It also seems to me that keeping the scrape and sink abilities of Heapster should be OK.

One of the key reasons that we want to deprecate Heapster is precisely that the sink functionality is unmaintainable in its current form. Keeping the scrape and sink functionality would defeat one of the major purposes of deprecating Heapster, as it would continue to support that unmaintainable code, and continue to perpetuate the idea that adding new sinks to Heapster is the recommended way of getting metrics to your sink.

IMHO, Heapster deprecation is too sudden

Please let me know how this could have been communicated better.

We've been discussing the possibility of deprecating Heapster basically since we merged the new monitoring vision (a few releases ago), we've been encouraging people to move to metrics-server with steadily increasing intensity over the past few releases, and this monitoring timeline delays the actual removal. Even conservatively assuming that nobody realized that metrics-server was the future until we made metrics-server the default for the HPA in Kubernetes (in 1.9), that still gives 1.9, 1.10, 1.11, and 1.12, plus the development cycle for 1.13, to figure out what each cluster/distribution/admin wants to do instead.

BTW, what is the replacement for eventer. People may still be using it.

I don't recall the name at the moment, but last time I checked, there was a recommended event exporter that was more stable and robust than eventer. I'll try to find it later. We've been unofficially recommending against using eventer for a while now.

But maybe we should think a way for Heapster to support a plugin mechanism just like out-of-tree cloud provider support or different device support with device plugin instead of kicking it off and setup a new one.

But, that doesn't really buy us much. If, for instance, we agree that either way we should deprecate serving APIs, we get:

  • the basic discovery of nodes and node IPs. This logic isn't actually super-complicated to start out with, but could be pulled out into a library.

  • some scrape logic that translates the relatively straightforward structure of the summary API into a format which vaguely makes sense internally to Heapster, but is fairly complicated.

  • a bunch of processors that transform data. Most of this transformation is either a) no longer needed (it augments with additional data in ways that aren't necessary any more, or aren't recommended) or b) performs ahead-of-time aggregation, which goes against best practices for many modern monitoring pipelines (you can always aggregate later, but you can't "unaggregate" preprocessed data)

  • some rather complicated interval logic that doesn't quite do what it says on the tin, and has been the source of some subtle bugs

  • some boilerplate setup code which is way more complicated than it needs to be

Of those bullets, I think the only one that's actually worth keeping around is the first bullet point, and that could just be abstracted out into a small one-file library. Ergo, I don't think that just trying to move all sinks out of tree would be a useful exercise.

There are no simple and easy setup guides for the metrics API, metrics-server, and third-party TSDB server pipeline available for end users to learn and try right now, which is a big gap compared to Heapster.

Putting custom-metrics-based autoscaling aside (since Heapster doesn't do that either), I'd dispute this assertion. For metrics-server, there are the in-repo deployment manifests, linked from the official documentation (https://kubernetes.io/docs/tasks/debug-application-cluster/core-metrics-pipeline/), plus the manifests in the cluster addons directory, both of which are fairly straightforward to use (kubectl create -f /path/to/manifests).
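To make "fairly straightforward to use" concrete, a minimal sketch of that deployment path (the clone URL and manifest directory are illustrative -- check the metrics-server repo for the current layout before relying on them):

```shell
# Fetch and apply the in-repo metrics-server manifests (paths illustrative).
git clone https://github.com/kubernetes-incubator/metrics-server
kubectl create -f metrics-server/deploy/

# Once the metrics.k8s.io APIService is registered and serving, the
# resource metrics API backs `kubectl top` directly -- no Heapster needed:
kubectl top nodes
kubectl top pods --all-namespaces
```

These commands assume a running cluster and a kubeconfig with sufficient permissions.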

For third-party monitoring solutions, there are extensive guides on how to set up Prometheus on Kubernetes, from a myriad of sources, with a myriad of perspectives. There's even the Prometheus Operator, to make it even simpler. Prometheus aside, there are other open-source and closed-source solutions for cluster monitoring.

Also, some existing add-ons or functionalities depend on data from Heapster. What is the procedure to help those components migrate to the new pipeline smoothly?

Please be more specific. kubectl top and HPA no longer need to use Heapster. Last I heard, the dashboard team was at least aware that Heapster was not a long term solution, and was thinking about alternatives. I don't know of any official kubernetes-organization projects that are using Heapster. Other projects that use Heapster will have to consider if the existing metrics APIs are sufficient, or if they need to talk directly to a monitoring pipeline, or propose a new metrics API, or an extension to an existing one. I can't make suggestions on those, though, without a concrete example.
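For a component evaluating whether the existing metrics APIs are sufficient, the resource metrics API can be queried directly through the apiserver's aggregation layer, with no Heapster in the path. A minimal sketch (the namespace is just an example):

```shell
# Node-level CPU/memory usage from the resource metrics API:
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"

# Pod-level usage, scoped to a namespace (kube-system as an example):
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods"
```

Both calls require metrics-server (or another metrics.k8s.io implementation) to be deployed and its APIService registered; this is the same API `kubectl top` consumes.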

Yeah, our current ongoing Poseidon/Firmament scheduler effort leverages Heapster real-time metrics for workload scheduling. We created a new sink for pushing metrics information out of heapster to Firmament scheduler.

As Dujun mentioned, we were about to initiate new Poseidon sink upstreaming process in the next couple of days.

That's unfortunate, but we've been mentioning that Heapster was not a reasonable path forward for a while now, and this very incident underscores the need to officially deprecate Heapster to communicate that it should not be counted on as a path forward.

@andyxning
Contributor

@DirectXMan12 Thanks for the detailed and excellent explanation.

Sorry for not catching up with the latest usage of HPA/kubectl top and Heapster. :)

Maybe we could do this better in the following aspects:

  • Add the deprecation summary for Heapster to the Kubernetes v1.11 release notes, and update the deprecation info in the v1.12 and v1.13 release notes. Most people read the release notes to learn the trends in Kubernetes, so I think it is also a good place to make the Heapster deprecation widely known.
  • Add detailed setup docs for the metrics API, metrics-server, and third-party monitoring systems.
  • Add the replacement binary for eventer in case some people are still using it.

@kawych
Contributor

kawych commented Apr 29, 2018

@DirectXMan12 @brancz
Thank you for detailed explanations. I've also discussed with @piosz the Stackdriver case and once he approves, I have no objections. Removing myself from reviewers.

@kawych kawych removed their request for review April 29, 2018 09:49
@acobaugh
Contributor

acobaugh commented May 1, 2018

As the very short-term co-maintainer of the influxdb sink, I fully support this effort. For the influxdb case, I believe telegraf with its prometheus input plugin ought to be able to replace the functionality of heapster by pulling metrics straight from metrics-server. I haven't dug into this yet, but I wouldn't be surprised if someone in the TICK community is already doing just that, and we just need to document those methods and configurations within the kubernetes project as an example to get metrics into influxdb, and as a migration path for the heapster+influxdb users out there.

@DirectXMan12
Contributor Author

Add the deprecation summary for Heapster to the Kubernetes v1.11 release notes, and update the deprecation info in the v1.12 and v1.13 release notes. Most people read the release notes to learn the trends in Kubernetes, so I think it is also a good place to make the Heapster deprecation widely known.

Yes, definitely!

Add detailed setup docs for the metrics API, metrics-server, and third-party monitoring systems.

I'd love to improve the docs. Can you put together a specific list of complaints you have with the official setup docs in the Kubernetes documentation?

Add the replacement binary for eventer in case some people are still using it.

We should definitely link to that somewhere. Does anybody remember exactly what it was? I can't recall. I'll need to track it down otherwise.

For the influxdb case, I believe telegraf with its prometheus input plugin ought to be able to replace the functionality of heapster by pulling metrics straight from metrics-server.

It shouldn't be pulling from metrics-server. It should query the nodes instead. At any rate, we should have follow-up PRs to build up a list of alternatives for the third-party monitoring solutions.
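To illustrate "query the nodes instead": each kubelet exposes the summary API, which can be reached through the apiserver's node proxy subresource -- roughly the data a node-level scraper (e.g. telegraf) would collect. A sketch, assuming a reachable cluster and RBAC permission for the nodes/proxy subresource:

```shell
# Pick an arbitrary node name from the cluster.
NODE=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')

# Fetch that kubelet's summary API stats via the apiserver proxy.
kubectl get --raw "/api/v1/nodes/${NODE}/proxy/stats/summary"
```

This avoids treating metrics-server as a general-purpose metrics source; metrics-server itself scrapes the kubelets and serves only the trimmed-down resource metrics API.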

@piosz any objections to this timeline? Otherwise, I'll plan on merging by the end of the week.

@DirectXMan12
Contributor Author

Since there are no further maintainer objections lodged, I'm going to merge this.

@DirectXMan12 DirectXMan12 merged commit d1f30e8 into kubernetes-retired:master May 15, 2018
@DirectXMan12 DirectXMan12 deleted the docs/deprecate-heapster branch May 15, 2018 17:52
@piosz
Contributor

piosz commented May 22, 2018

/lgtm
(apparently I hadn't clicked send before)

@eherot

eherot commented Aug 14, 2018

Hey guys, were any decisions ever made W/R/T the replacement for eventer? I see a bunch of discussion about linking to a replacement, but no follow-up...

@DirectXMan12
Contributor Author

It differs based on sink. I believe there was a fluentd event exporter at one point, and there's now a stackdriver event exporter, plus what appears to be a more generic one at https://github.com/alauda/event-exporter (not endorsing that one, but it does appear to be a potential solution).

@DirectXMan12
Contributor Author

There's also Heptio's eventrouter, it appears: https://github.com/heptiolabs/eventrouter
