
Set DefaultHeapsterPort to a more sensible default #58289

Closed
wants to merge 2 commits into from

Conversation

@itskingori (Member) commented Jan 15, 2018

What this PR does / why we need it:

Because it seems kubectl top is broken. The code comment (removed by this PR) indicates that it should "use the first exposed port on the service", but this does not seem to be the case.

Below is what I get when trying to use it on a node:

$ kubectl top node ip-xx-xx-xxx-xxx.ec2.internal
Error from server (InternalError): an error on the server ("unknown") has prevented the request from succeeding (get services http:heapster:)

$ kubectl top node ip-xx-xx-xxx-xxx.ec2.internal --heapster-port=80
NAME                            CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%
ip-xx-xx-xxx-xxx.ec2.internal   3105m        77%       21849Mi         71%

Below is what I get when trying to use it on a pod:

$ kubectl top pod kubernetes-dashboard-344955092-cd9ln -n kube-system
W0115 15:03:09.478178   58206 top_pod.go:192] Metrics not available for pod kube-system/kubernetes-dashboard-344955092-cd9ln, age: 3h5m27.47814s
error: Metrics not available for pod kube-system/kubernetes-dashboard-344955092-cd9ln, age: 3h5m27.47814s

$ kubectl top pod kubernetes-dashboard-344955092-cd9ln -n kube-system --heapster-port=80
NAME                                   CPU(cores)   MEMORY(bytes)
kubernetes-dashboard-344955092-cd9ln   0m           24Mi

Poking around the API server logs (every time the error happens), I find a trace that looks like this:

I0115 11:12:55.338202       7 wrap.go:42] GET /api/v1/namespaces/kube-system/services/http:heapster:/proxy/apis/metrics/v1alpha1/nodes?labelSelector=: (1.624838ms) 503
goroutine 85287885 [running]:
[... TRACE HERE ...]
logging error output: "{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"no endpoints available for service \\\"http:heapster:\\\"\",\"reason\":\"ServiceUnavailable\",\"code\":503}\n"
 [[kubectl/v1.9.1 (darwin/amd64) kubernetes/3a1c944] 10.83.6.153:28306]

This led me to believe that the port was missing (notice the http:heapster:), and it does seem like the default is to use DefaultHeapsterPort, which is set to an empty string. As you've seen, adding the port fixes the issue.
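
For illustration, here is a minimal Go sketch (not the actual kubectl source; the path format is copied from the log line above) of how an empty DefaultHeapsterPort produces that http:heapster: segment:

package main

import "fmt"

func main() {
	// The service-proxy segment has the form scheme:serviceName:port. With
	// DefaultHeapsterPort == "" the last field is empty, which is exactly the
	// "http:heapster:" seen in the 503 above.
	scheme, service, port := "http", "heapster", ""
	path := fmt.Sprintf(
		"/api/v1/namespaces/kube-system/services/%s:%s:%s/proxy/apis/metrics/v1alpha1/nodes",
		scheme, service, port)
	fmt.Println(path)
	// /api/v1/namespaces/kube-system/services/http:heapster:/proxy/apis/metrics/v1alpha1/nodes
}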

Which issue(s) this PR fixes

This improves defaults so that kubectl top works out of the box. No more need to set the --heapster-port flag whenever you're using it (assuming one has Heapster running on port 80).

Might be related to these issues:

Special notes for your reviewer:

Official Heapster documentation/examples all use port 80 for Heapster, so it seems like a good default to me. See:

Even other examples that aren't in the official Heapster repo (but are still in the kubernetes org) set it up on port 80:

I also noticed that the legacy metrics client picks up this value.

Release note:

The `DefaultHeapsterPort` on the metrics client has been explicitly set to 80, which means that your Heapster service should expose your Heapster deployment on port 80.
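
For reference, a sketch of roughly what the change amounts to (only DefaultHeapsterPort is named in this PR; the neighboring constant names are assumptions about the metrics client, not quoted from the diff):

package metricsutil

// Sketch only: the default port moves from "" (which relies on the service
// having an unnamed port, as discussed below) to an explicit "80".
const (
	DefaultHeapsterNamespace = "kube-system" // assumed
	DefaultHeapsterScheme    = "http"        // assumed
	DefaultHeapsterService   = "heapster"    // assumed
	DefaultHeapsterPort      = "80"          // previously ""
)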

@k8s-ci-robot added the release-note-action-required, size/XS, and cncf-cla: yes labels on Jan 15, 2018
@k8s-ci-robot added the release-note label and removed release-note-action-required on Jan 15, 2018
@k8s-ci-robot added the size/S label and removed size/XS on Jan 15, 2018
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: itskingori
We suggest the following additional approver: smarterclayton

Assign the PR to them by writing /assign @smarterclayton in a comment when ready.

No associated issue. Update pull-request body to add a reference to an issue, or get approval with /approve no-issue

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@smarterclayton (Contributor)

Before we change this, we need to know whether this used to work (empty port defaulted to the first port). If it did, this was a regression in the service proxy, and we need to either fix that or document that it was deliberately changed.

@itskingori (Member, Author) commented Jan 16, 2018

Before we change this, we need to know whether this used to work (empty port defaulted to first port) ...

@smarterclayton Sure. Based on my examples in the description it wasn't/isn't working for me. I tried to look into the commit/PR (5df9fe6/#28844) that introduced that code for more context on the "use the first exposed port on the service" comment but couldn't find anything useful. Doesn't seem like it's been changed or reviewed since.

Admittedly, I'm also not sure hard-coding the port is the right solution. Just figured a PR with an idea is better than filing an issue. I might be hiding the symptoms of a larger issue.


@DirectXMan12 I saw you in some heapster/kubectl-top issues ... and you're in podautoscaling OWNERS. I hope you don't mind weighing in on this. And maybe on any possible effects on this line in cmd/kube-controller-manager/app/autoscaling.go.

@itskingori (Member, Author)

/assign @smarterclayton

/cc @DirectXMan12

@DirectXMan12 (Contributor) commented Jan 16, 2018

indicate that this should "use the first exposed port on the service"

ok, so. This is a common misconception compounded by the fact that the default behavior only makes sense if you tilt your head ever-so-slightly to the left.

What the service proxy actually does is use the first port with no name on the service, and it has done that for as long as I can remember (I've hit this before when attempting to actually give my Heapster ports names). Now you might think to yourself:

self, how does that make any sense?

To which one of your other selves might reply:

it's a corner case of the name checking, to wit, in the absence of a port number, the code assumes that you're using a named port, and matches as such (https://github.com/kubernetes/kubernetes/blob/master/pkg/registry/core/service/rest.go#L434); a rough sketch of that matching appears at the end of this comment

Upon viewing that, your first and second selves are satisfied, but a third self questions:

self, how did this misconception occur in the first place?

Alas, nobody quite knows that answer, but a fourth self conjectures:

Perhaps it was because the default Heapster manifest has no port names (https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/cluster-monitoring/standalone/heapster-service.yaml#L12) and someone left a comment as such in the original source of most of the kubectl top code, the HPA metrics client (https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/podautoscaler/metrics/legacy_metrics_client.go#L42).

Now, having enlightened several of your selves, you may lament aloud in frustration

there must be a better solution to the port problem than hoping that there's an unnamed port or making a good educated guess about a port number!

At which point your eyes shall fall upon PR #56206, and behold API discovery mechanisms, and see that they solve many such problems.
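
A minimal sketch of that name-checking corner case (simplified types; findPort is a stand-in, not the actual pkg/registry code): the requested port string is compared against port names, so an empty request only resolves against a port whose name is also empty, i.e. the first unnamed port.

package main

import "fmt"

// ServicePort is a simplified stand-in for the real API type.
type ServicePort struct {
	Name string
	Port int32
}

// findPort matches the requested string against port *names*, so when the
// proxy path carries no port (requested == ""), only an unnamed port matches.
func findPort(ports []ServicePort, requested string) (ServicePort, error) {
	for _, p := range ports {
		if p.Name == requested {
			return p, nil
		}
	}
	return ServicePort{}, fmt.Errorf("no port %q found", requested)
}

func main() {
	named := []ServicePort{{Name: "http", Port: 80}} // every port named
	unnamed := []ServicePort{{Name: "", Port: 80}}   // first port unnamed

	if _, err := findPort(named, ""); err != nil {
		fmt.Println("named-only service:", err) // this is the failing case described above
	}
	if p, err := findPort(unnamed, ""); err == nil {
		fmt.Println("unnamed port resolves to:", p.Port)
	}
}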

@DirectXMan12 (Contributor) commented Jan 16, 2018

So, if I had to hazard a guess, you've set a port name on your Heapster service, and that's what's causing the issue. I'm also guessing that your HPA doesn't work. If either of those isn't true, then we have an actual bug. Otherwise, this is more-or-less known behavior (not amazing behavior, but somewhat known nonetheless).

We can't really change the default behavior, because we can't assume that people weren't relying on using an unnamed port with a different port number, and if we did change it, we'd need to change the legacy HPA client at the same time.

(EDIT P.S. the above comment is not intended to be snarky, I'm just in a strange mood)

@itskingori (Member, Author)

@DirectXMan12 amazing explanation in #58289 (comment)! Thanks a lot!

So, if I had to hazard a guess, you've set a port name on your Heapster service, and that's what's causing the issue.

Yes! And I couldn't figure out why 😓 ... setting the flag did make it work so I poked around the code to understand why. The comment in the code doesn't explain the full story. It would have if it were "use the first exposed port on the service that's not named" 😅

I'm also guessing that your HPA doesn't work.

Yes!! And I don't quite know why (the dashboard works, though). I suspected that it might be because of this, i.e. DefaultHeapsterPort is used in the autoscaling code. However, this theory didn't quite make sense because I'm not using the legacy metrics client (it picks the value up here) ... and I couldn't find anywhere that the new metrics client does.

I'll remove the named ports on heapster and see if my problems go away.

We can't really change the default behavior, because we can't assume that people weren't relying on using an unnamed port with a different port number ...

I 💯% agree.

... and if we did change it, we'd need to change the legacy HPA client at the same time.

Actually, I wasn't quite sold on this solution because it hard-codes a port, which forces everyone to put Heapster on port 80. My goal was to explain the issue better and spur a discussion (a PR is better than filing an issue) ... to which I'd say this was a success.

(EDIT P.S. the above comment is not intended to be snarky, I'm just in a strange mood)

It didn't come off snarky. Makes sense. I ... thought my PR title was snarky 🙈 ... implicitly calling the code-author not sensible-ish, so I understand. 😅

@itskingori (Member, Author)

I'll remove the named ports on heapster and see if my problems go away.

@DirectXMan12 so, I'm glad to report that kubectl top now works out of the box 🎉 ... which is progress! HPA still doesn't work 😓 ... so my theory that they were related is probably wrong, but I don't want to conflate the issues.

$ kubectl autoscale deployment ingress-nginx-controller --cpu-percent=50 --min=3 --max=12 -n kube-system
$ kubectl describe hpa/ingress-nginx-controller -n kube-system
Name:                                                  ingress-nginx-controller
Namespace:                                             kube-system
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Wed, 17 Jan 2018 08:59:50 +0300
Reference:                                             Deployment/ingress-nginx-controller
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 50%
Min replicas:                                          3
Max replicas:                                          12
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: unable to get metrics for resource cpu: unable to fetch metrics from API: the server could not find the requested resource (get pods.metrics)
Events:
  Type     Reason                        Age               From                       Message
  ----     ------                        ----              ----                       -------
  Warning  FailedGetResourceMetric       1s (x2 over 31s)  horizontal-pod-autoscaler  unable to get metrics for resource cpu: unable to fetch metrics from API: the server could not find the requested resource (get pods.metrics)
  Warning  FailedComputeMetricsReplicas  1s (x2 over 31s)  horizontal-pod-autoscaler  failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from API: the server could not find the requested resource (get pods.metrics)

@itskingori (Member, Author) commented Jan 17, 2018

@DirectXMan12 HPA issue is unrelated and resolved. PR submitted in kubernetes-sigs/metrics-server#32. Thanks for your guidance, you set me on the right path.

@itskingori itskingori closed this Jan 17, 2018
@itskingori itskingori deleted the fix_default_heapster_port branch January 17, 2018 08:51