Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLI command to fetch control plane metrics #3579

Open
wants to merge 3 commits into
base: master
from

Conversation

@srv-twry
Copy link
Contributor

srv-twry commented Oct 15, 2019

The CLI command diagnostics will fetch the control plane metrics directly from containers.

Sample output: https://gist.github.com/srv-twry/8e5f649edfa8bd34dcee286cbf913d5d

Fixes #3116

Signed-off-by: Saurav Tiwary srv.twry@gmail.com

cli/cmd/diagnostics.go Outdated Show resolved Hide resolved
cli/cmd/diagnostics.go Outdated Show resolved Hide resolved
@srv-twry

This comment has been minimized.

Copy link
Contributor Author

srv-twry commented Oct 15, 2019

@grampelberg @adleong
I have done some work on this and wanted to know if the PR is heading in the right direction or I am totally in the wrong direction.

Also, I have added some inline questions. Hoping to get some suggestions for them.

@adleong

This comment has been minimized.

Copy link
Member

adleong commented Oct 15, 2019

Thanks @srv-twry! The overall approach here looks good! I was thinking that this could be part of the linkerd metrics command rather than a brand new command. For example, linkerd -n linkerd metrics deploy/linkerd-controller gets the proxy metrics for the linkerd controller. What do you think about a --controller flag which changes this to get the controller metrics instead? E.g. linkerd -n linkerd metrics deploy/linkerd-controller --controller

@grampelberg wdyt?

By making this part of the metrics command, you can share some code with the existing metrics code and avoid duplication, especially around things like timeouts.

I like your design of iterating through the list of containers and port looking for ports named admin-http.

@srv-twry

This comment has been minimized.

Copy link
Contributor Author

srv-twry commented Oct 16, 2019

@adleong Thanks for the suggestions.

I was thinking that this could be part of the linkerd metrics command rather than a brand new command.

My initial thought was to do the same but @grampelberg suggested to make it a separate command(check his proposal on the linked issue).

@grampelberg

This comment has been minimized.

Copy link
Member

grampelberg commented Oct 16, 2019

@adleong every time I played sticking it with linkerd metrics it got funky as that's a more general command. The edge cases of something like linkerd metrics -n emojivoto deploy --controller or linkerd metrics -A -l app=foo --controller didn't feel quite right as I was going through it.

@adleong

This comment has been minimized.

Copy link
Member

adleong commented Oct 16, 2019

Ahh, my apologies for joining the discussion now rather than during the initial design on the issue. My expectation was that the --controller flag would look for all control plane containers within the specified resource. That means that linkerd metrics -n emojivoto deploy --controller would return zero results and linkerd metrics -A -l app=foo --controller would also return zero results.

My motivations for the suggestion are:

  • Simplify the implementation by sharing common code with the existing linkerd metrics command
  • Allow easier selection of specific control plane components or containers without needing to post process the metrics dump. e.g. linkerd metrics -n linkerd deploy/linkerd-destination --controller
@grampelberg

This comment has been minimized.

Copy link
Member

grampelberg commented Oct 17, 2019

Simplify the implementation by sharing common code with the existing linkerd metrics command

Common code should be shared no matter whether it is a separate command or not.

without needing to post process the metrics dump

I don't understand this one. Why don't we do some brainstorming today, I'm still not a fan but it sounds like I'm missing some things.

@adleong

This comment has been minimized.

Copy link
Member

adleong commented Oct 17, 2019

@grampelberg that's a good point. I was optimizing for the case when you know exactly which component you want to query, but you're right that it's more common to want to dump everything from the control plane.

@srv-twry sorry for all the back and forth! I'm on board with the diagnostics command as you have it here. However, I think that metrics.go and diagnostics.go can be refactored to share a lot of common code.

@srv-twry

This comment has been minimized.

Copy link
Contributor Author

srv-twry commented Oct 17, 2019

@adleong No problem!

However, I think that metrics.go and diagnostics.go can be refactored to share a lot of common code.

Yeah sure. In fact, I was planning to do it after receiving confirmation that I was on the right track :)

@grampelberg

This comment has been minimized.

Copy link
Member

grampelberg commented Oct 17, 2019

Thanks @srv-twry !

@srv-twry srv-twry force-pushed the srv-twry:control-plane-diagnostics branch 2 times, most recently from 973d93b to 4d0bb4d Oct 29, 2019
srv-twry added 3 commits Oct 20, 2019
Fixes #3116

Signed-off-by: Saurav Tiwary <srv.twry@gmail.com>
Signed-off-by: Saurav Tiwary <srv.twry@gmail.com>
Signed-off-by: Saurav Tiwary <srv.twry@gmail.com>
@srv-twry srv-twry force-pushed the srv-twry:control-plane-diagnostics branch from 4d0bb4d to b5c228c Nov 5, 2019
@srv-twry srv-twry marked this pull request as ready for review Nov 5, 2019
@srv-twry

This comment has been minimized.

Copy link
Contributor Author

srv-twry commented Nov 5, 2019

@adleong Thanks for being patient with this. I was caught up in other stuff and hence wasn't able to devote much time on this. I am back on this now.

The command works now. I have also incorporated your suggestion to use wait groups in order to implement timeouts. Please make sure to check out the linked Github gist for a sample output.

However, I need some pointers regarding code duplication between the metrics and the diagnostics command. I am relatively new to Go and hence am facing some difficulties. I am not able to figure out where and how I could extract code out from both the commands to a separate file in order to remove code duplication. For eg., the handling of timeouts is done in different ways(using waitgroups in diagnostics and length of expected slice in metrics) in the commands and hence no scope for extraction of common code unless you want me to change the way it is being done in the metrics command as well.

@adleong

This comment has been minimized.

Copy link
Member

adleong commented Nov 7, 2019

No problem, @srv-twry, I'm happy to give you a few pointers.

A first step might be to update the NewProxyMetricsForward function in pkg/k8s/portforward.go to accept the port name as an argument rather than hardcoding it to ProxyAdminPortName.

You could then update the getMetrics function in metrics.go to also accept a port name.

Finally, you could create a function which, given a list of pods and a port name, returns all the metrics for those pods. All the waitgroup logic would be encapsulated here.

There's a bit more to figure out regarding containers: you'll actually either need to pass around a list of containers rather than pods, or you can make the assumption that port names between proxy and non-proxy containers will never conflict. I'll leave it to you to figure out what makes sense there.

@adleong

This comment has been minimized.

Copy link
Member

adleong commented Dec 10, 2019

@srv-twry are you still interested in working on this?

@srv-twry

This comment has been minimized.

Copy link
Contributor Author

srv-twry commented Dec 10, 2019

@adleong I am but unfortunately, I am very busy for the next month or so. If it's urgent, then it can be taken up by anyone else.

Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.