Support 3rd party device monitoring plugins #606

Open
dashpole opened this Issue Jul 31, 2018 · 32 comments

Contributor

dashpole commented Jul 31, 2018

Feature Description

  1. As a cluster admin, I want container-level metrics for devices provided by device plugins.
  2. As a device vendor, I want to be able to provide device-specific metrics without contributing to core Kubernetes.
  • One-line feature description (can be used as a release note): Support 3rd party device monitoring plugins
  • Primary contact (assignee): @dashpole
  • Responsible SIGs: sig-node
  • Design proposal link (community repo): kubernetes/community#2454
  • Link to e2e and/or unit tests: Coming Soon
  • Reviewer(s) - (for LGTM) recommend having 2+ reviewers (at least one from code-area OWNERS file) agreed to review. Reviewers from multiple companies preferred: @vikaschoudhary16 @jiayingz
  • Approver (likely from SIG/area to which feature belongs): @derekwaynecarr
  • Feature target (which target equals to which milestone):
    • Alpha release target (1.13)
    • Beta release target (1.14)
    • Stable release target (TBD)

/kind feature
/sig node
/stage alpha

Contributor

kacole2 commented Jul 31, 2018

@dashpole this has been added to the 1.12 tracking sheet. Thank you

@justaugustus please add relevant tags to this issue

justaugustus added this to the v1.12 milestone Jul 31, 2018

Member

justaugustus commented Jul 31, 2018

@kacole2 all set!

Member

vikaschoudhary16 commented Aug 1, 2018

Contributor

dashpole commented Aug 1, 2018

Agreed, that makes more sense

Member

justaugustus commented Aug 15, 2018

/assign @dashpole

Member

zparnold commented Aug 20, 2018

Hey there! @dashpole I'm the wrangler for the Docs this release. Is there any chance I could have you open up a docs PR against the release-1.12 branch as a placeholder? That gives us more confidence in the feature shipping in this release and gives me something to work with when we start doing reviews/edits. Thanks! If this feature does not require docs, could you please update the features tracking spreadsheet to reflect it?

Contributor

dashpole commented Aug 20, 2018

@zparnold here is the docs pr placeholder: kubernetes/website#9945

Member

zparnold commented Aug 25, 2018

Thank you @dashpole!

Member

justaugustus commented Sep 5, 2018

@dashpole --
Any update on docs status for this feature? Are we still planning to land it for 1.12?
At this point, code freeze is upon us, and docs are due on 9/7 (2 days).
If we don't hear anything back regarding this feature ASAP, we'll need to remove it from the milestone.

cc: @zparnold @jimangel @tfogo

Contributor

dashpole commented Sep 5, 2018

It can be removed from the milestone

Member

justaugustus commented Sep 5, 2018

Got it. Thanks for the update!

justaugustus added tracked/no and removed tracked/yes labels Sep 5, 2018

justaugustus removed this from the v1.12 milestone Sep 5, 2018

Contributor

kacole2 commented Oct 8, 2018

Hi @dashpole
This enhancement has been tracked before, so we'd like to check in and see if there are any plans for this to graduate stages in Kubernetes 1.13. I can see that your original post says Alpha for 1.13. This release is targeted to be more ‘stable’ and will have an aggressive timeline. Please only include this enhancement if there is a high level of confidence it will meet the following deadlines:
Docs (open placeholder PRs): 11/8
Code Slush: 11/9
Code Freeze Begins: 11/15
Docs Complete and Reviewed: 11/27

Please take a moment to ping @kacole2 so it can be included in the 1.13 Enhancements Tracking Sheet if it's going to make it.

Thanks!

Contributor

kacole2 commented Oct 8, 2018

/milestone v1.13
/tracked yes

k8s-ci-robot added this to the v1.13 milestone Oct 8, 2018

kacole2 added tracked/yes and removed tracked/no labels Oct 8, 2018

AishSundar commented Oct 17, 2018

@dashpole could you please let us know what's pending for this feature to go to Alpha in 1.13? Do you have a list of pending PRs or issues you can point us to? Thanks

claurence commented Oct 22, 2018

@dashpole there has been no communication on the status. Are we confident this is going to make the v1.13 milestone? Enhancement freeze is tomorrow COB. If there is no communication or update on the PR, this is going to be pulled from the milestone as it doesn't fit with our "stability" theme. If there is no communication after COB tomorrow, an exception will be required to add it back to the milestone. Please let me know where we stand. Thanks!

Contributor

dashpole commented Oct 22, 2018

KEP: kubernetes/community#2454 should be merged soon, as it is approved by the required people. Once that happens, I'll open the implementation PR and get working on docs. I still expect this to land in 1.13.

Member

tfogo commented Nov 1, 2018

Hi @dashpole , I'm the docs wrangler for the 1.13 release. Could you please open a placeholder PR for the docs for this enhancement against the dev-1.13 branch of k/website and send me a link? I see you're waiting for a PR to be merged, but a placeholder is all we need at this point.

The deadline for placeholder PRs for the 1.13 release is November 8. So it's important to make a docs PR as soon as possible.

If you have any questions about any of this, I'm happy to help. You can also message me on Slack (I'm tfogo there too). 😀

Thanks!

Contributor

dashpole commented Nov 1, 2018

@tfogo, does this one still work? kubernetes/website#9945

Member

tfogo commented Nov 1, 2018

@dashpole Oh that's perfect, thanks. I missed that, sorry.

Contributor

kacole2 commented Nov 8, 2018

@dashpole I see the KEP is still open and there are no code PRs currently in flight. Code Slush begins tomorrow. Are we sure we don't want to punt this to 1.14?

AishSundar commented Nov 9, 2018

@dashpole we see the KEP kubernetes/community#2454 is still open for this enhancement and we don't see an open implementation PR. With Code Slush starting today and Code Freeze next week, I feel it's too late to wrangle all this in and stabilize in time for Code Freeze in the 1.13 timeframe. At this point the Release team is strongly leaning towards moving this to 1.14 to give it more time to stabilize.

Let us know what you think. If we don't see an update, by default this will be untracked for 1.13 starting Monday 11/12.

@kacole2

Contributor

dashpole commented Nov 9, 2018

@AishSundar ack. I'll see if we can resolve the last outstanding issue today. The implementation PR is actually out: kubernetes/kubernetes#70508, but there is still one point of discussion to be worked out before that can go in. If it isn't resolved by EOD Monday, we can move this out of the milestone.

AishSundar commented Nov 9, 2018

Thanks so much @dashpole for helping us timebox this. If the KEP is resolved by Monday, do you think you can merge the PR by Wednesday, giving us a couple of days to watch CI?

Member

RenaudWasTaken commented Nov 10, 2018

If this can help build confidence in this feature for 1.13: overall, for GPUs (since this is the main accelerator that will be enabled by this), there are no major issues with the KEP and we support the availability of this feature in 1.13.

POCs have been built and show that this KEP enables monitoring agents to expose Pod-level metrics to end users (e.g., using the NVIDIA monitoring tools adapted for this KEP, we expose GPU consumption per pod).

Finally, reviews of the code have started. Building gRPC services is something that has been done multiple times (device plugin, plugin watcher), and the current implementation builds on the lessons learned from these experiences.

AishSundar commented Nov 11, 2018

@RenaudWasTaken thanks for the clarification. I see the KEP is approved now (pending merge though). As mentioned earlier, as long as the open PR kubernetes/kubernetes#70508 is merged early this week, so that we can watch CI stability for a few runs, we should be good for 1.13.

claurence commented Jan 14, 2019

@dashpole Hello - I’m the enhancements lead for 1.14 and I’m checking in on this issue to see what work (if any) is being planned for the 1.14 release. Enhancements freeze is Jan 29th, and I want to remind you that all enhancements must have a KEP. I can't find a KEP for this issue; can you please link one if it exists? Thanks.

Contributor

dashpole commented Jan 14, 2019

@claurence I don't have any work planned on this for the 1.14 release. The next steps for this feature are to validate the design by using it out-of-tree.

claurence commented Jan 14, 2019

Thanks @dashpole so is it then not targeting beta for 1.14? Or is the work for beta already completed?

Contributor

dashpole commented Jan 14, 2019

@claurence it is not targeting beta for 1.14

dchen1107 modified the milestones: v1.13, v1.14 Jan 15, 2019

Member

dchen1107 commented Jan 15, 2019

Re-target this enhancement alpha to v1.14. We need some user input before promoting it to beta.

claurence commented Jan 15, 2019

@dchen1107 thanks! To clarify does that mean this did not ship as alpha in 1.13 or does that mean no status change and we don't need to track anything for it in 1.14?

Contributor

dashpole commented Jan 15, 2019

@claurence this shipped as alpha in 1.13. No status change or tracking required for 1.14
