New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic Maximum volume count #554

Open
gnufied opened this Issue Mar 29, 2018 · 28 comments

Comments

Projects
None yet
@gnufied
Member

gnufied commented Mar 29, 2018

Feature Description

  • Add support for dynamic and generic mechanism of maximum volume per node.
  • Primary contact (assignee): gnufied
  • Responsible SIGs: sig-storage
  • Design proposal link (community repo): kubernetes/community#2051
  • Link to e2e and/or unit tests:
  • Reviewer(s) - (for LGTM) @saad-ali @jsafrane
  • Approver (likely from SIG/area to which feature belongs): @saad-ali @childsb
  • Feature target (which target equals to which milestone):
    • Alpha release target (x.y) 1.11
    • Beta release target (x.y) 1.12
    • Stable release target (x.y)

@gnufied gnufied changed the title from Dynamic Max. volume count to Dynamic Maximum volume count Mar 29, 2018

@gnufied

This comment has been minimized.

Member

gnufied commented Mar 29, 2018

/assign

@klausenbusk

This comment has been minimized.

klausenbusk commented Apr 6, 2018

Is it intended for this to work with flexvolume plugins? We still lack a solution for that use-case.

@gnufied

This comment has been minimized.

Member

gnufied commented Apr 6, 2018

It is intended to work with all volume types. Including Flexvolume and CSI.

@idvoretskyi

This comment has been minimized.

Member

idvoretskyi commented Apr 12, 2018

@idvoretskyi

This comment has been minimized.

Member

idvoretskyi commented Apr 12, 2018

/assign @gnufied

@mistyhacks

This comment has been minimized.

mistyhacks commented May 24, 2018

@gnufied please fill out the appropriate line item of the
1.11 feature tracking spreadsheet
and open a placeholder docs PR against the
release-1.11 branch
by 5/25/2018 (tomorrow as I write this) if new docs or docs changes are
needed and a relevant PR has not yet been opened.

@justaugustus

This comment has been minimized.

Member

justaugustus commented Jun 1, 2018

@gnufied -- What's the current status of this feature?
As we haven't heard from you with regards to some items, this feature has been moved to the Milestone risks sheet within the 1.11 Features tracking spreadsheet.

Please update the line item for this feature on the Milestone risks sheet ASAP AND ping myself and @idvoretskyi, so we can assess the feature status or we will need to officially remove it from the milestone.

@gnufied

This comment has been minimized.

Member

gnufied commented Jun 1, 2018

The PR is still on track for 1.11. The implementation PR is here - kubernetes/kubernetes#64154

We have approval from @saad-ali . We are waiting on approvals from @liggitt and @bsalamat . I just had to rebase the PR because the upstream changed and jordan requested some naming changes.

@justaugustus

This comment has been minimized.

Member

justaugustus commented Jun 1, 2018

@gnufied -- there needs to be a Docs PR issued as well, as Misty mentioned above.
Please update the Features tracking sheet with that information, so that we can remove this feature from the Milestone risks tab.

@gnufied

This comment has been minimized.

Member

gnufied commented Jun 1, 2018

Added docs PR - kubernetes/website#8871

@justaugustus

This comment has been minimized.

Member

justaugustus commented Jun 2, 2018

@gnufied thanks for the update! I've moved this feature back into the main sheet.

k8s-merge-robot added a commit to kubernetes/kubernetes that referenced this issue Jul 21, 2018

Merge pull request #66397 from gnufied/fix-default-max-volume-ebs
Automatic merge from submit-queue (batch tested with PRs 66410, 66398, 66061, 66397, 65558). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix volume limit for EBS on m5 and c5 instances

This is a fix for lower volume limits on m5 and c5 instance types while we wait for kubernetes/enhancements#554 to land GA.

This problem became urgent because many of our users are trying to migrate to those instance types in light of spectre/meltdown vulnerability but  lower volume limit on those instance types often causes cluster instability. Yes they can workaround by configuring the scheduler with lower limit but often this becomes somewhat difficult to do when cluster is mixed. 

The newer default limits were picked from https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/volume_limits.html

Text about spectre/meltdown is available on - https://community.bitnami.com/t/spectre-variant-2/54961/5

/sig storage
/sig scheduling

```release-note
Fix volume limit for EBS on m5 and c5 instance types
```
@kacole2

This comment has been minimized.

Contributor

kacole2 commented Jul 23, 2018

@gnufied This feature was worked on in the previous milestone, so we'd like to check in and see if there are any plans for this to graduate stages in Kubernetes 1.12 as mentioned in your original post. This still has the alpha tag as well so we need to update it accordingly.

If there are any updates, please explicitly ping @justaugustus, @kacole2, @robertsandoval, @rajendar38 to note that it is ready to be included in the Features Tracking Spreadsheet for Kubernetes 1.12.


Please note that the Features Freeze is July 31st, after which any incomplete Feature issues will require an Exception request to be accepted into the milestone.

In addition, please be aware of the following relevant deadlines:

  • Docs deadline (open placeholder PRs): 8/21
  • Test case freeze: 8/28

Please make sure all PRs for features have relevant release notes included as well.

Happy shipping!

@gnufied

This comment has been minimized.

Member

gnufied commented Jul 23, 2018

@kacole2 For 1.12 the plan is to further expand this feature to cover more volume types. Add CSI support. The decision to whether to move to beta in 1.12 or not will be taken in a day or two.

@justaugustus

This comment has been minimized.

Member

justaugustus commented Jul 28, 2018

@gnufied -- just following up... are we graduating this one to Beta in 1.12?

@justaugustus

This comment has been minimized.

Member

justaugustus commented Jul 31, 2018

@gnufied @saad-ali --
Feature Freeze is today. Are we planning on graduating this to Beta in Kubernetes 1.12?
If so, can you make sure everything is up-to-date, so I can include it on the 1.12 Feature tracking spreadsheet?

@gnufied

This comment has been minimized.

Member

gnufied commented Jul 31, 2018

Yeah we are targetting this feature to move to beta for 1.12

@justaugustus

This comment has been minimized.

Member

justaugustus commented Aug 4, 2018

Thanks for the update. I've added this to the 1.12 tracking sheet.

cc: @kacole2 @wadadli @robertsandoval @rajendar38

@zparnold

This comment has been minimized.

Member

zparnold commented Aug 20, 2018

Hey there! @gnufied I'm the wrangler for the Docs this release. Is there any chance I could have you open up a docs PR against the release-1.12 branch as a placeholder? That gives us more confidence in the feature shipping in this release and gives me something to work with when we start doing reviews/edits. Thanks! If this feature does not require docs, could you please update the features tracking spreadsheet to reflect it?

hh pushed a commit to ii/kubernetes that referenced this issue Aug 27, 2018

Merge pull request kubernetes#67772 from andyzhangx/azuredisk-volumel…
…imits2

Automatic merge from submit-queue (batch tested with PRs 67766, 67642, 67772). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Enable dynamic azure disk volume limits

**What this PR does / why we need it**:
Enable dynamic azure disk volume limits,
This is an azure cloud provider implementation related to feature: [Dynamic Maximum volume count](kubernetes/enhancements#554)

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes#66269

**Special notes for your reviewer**:
This PR use `az.VirtualMachineSizesClient.List` to list all vm sizes under region, match vm size with current node size, and then got `MaxDataDiskCount`, the `GetVolumeLimits` happens in kubelet and will return `attachable-volumes-azure-disk` in node status as following example:
```
agentpool-22082114-0
    ...
    allocatable:
      attachable-volumes-azure-disk: "8"
      cpu: "2"
      ephemeral-storage: "28043041951"
      hugepages-1Gi: "0"
      hugepages-2Mi: "0"
      memory: 7034772Ki
      pods: "30"
```

**Release note**:

```
Enable dynamic azure disk volume limits
```

/sig azure
/kind feature
@justaugustus

This comment has been minimized.

Member

justaugustus commented Sep 5, 2018

@gnufied --
Any update on docs status for this feature? Are we still planning to land it for 1.12?
At this point, code freeze is upon us, and docs are due on 9/7 (2 days).
If we don't here anything back regarding this feature ASAP, we'll need to remove it from the milestone.

cc: @zparnold @jimangel @tfogo

@gnufied

This comment has been minimized.

Member

gnufied commented Sep 6, 2018

We are moving this feature to beta. Adding support for CSI and Azure. Will open docs PR soonish.

@andyzhangx I may need your help in documenting stuff from Azure side of things.

@andyzhangx

This comment has been minimized.

Member

andyzhangx commented Sep 6, 2018

@gnufied no problem, just assign the task to me.

@justaugustus

This comment has been minimized.

Member

justaugustus commented Sep 6, 2018

@gnufied @andyzhangx -- please keep us posted. Docs PR needs to be opened ASAP. It's overdue at this point.

@tfogo

This comment has been minimized.

Member

tfogo commented Sep 6, 2018

Hi @gnufied and @andyzhangx, do you have an update on the Docs PR? Please let us know as soon as you have a PR open.

@gnufied

This comment has been minimized.

Member

gnufied commented Sep 6, 2018

Placeholder docs PR - kubernetes/website#10211

@ameukam

This comment has been minimized.

Contributor

ameukam commented Oct 5, 2018

Hi folks,
Kubernetes 1.13 is going to be a 'stable' release since the cycle is only 10 weeks. We encourage no big alpha features and only consider adding this feature if you have a high level of confidence it will make code slush by 11/09. Are there plans for this enhancement to graduate to beta/stable within the 1.13 release cycle? If not, can you please remove it from the 1.12 milestone or add it to 1.13?

We are also now encouraging that every new enhancement aligns with a KEP. If a KEP has been created, please link to it in the original post. Please take the opportunity to develop a KEP.

@kacole2

This comment has been minimized.

Contributor

kacole2 commented Oct 8, 2018

hi @gnufied. I'm following up on @ameukam's comment to see if there are any plans for this to graduate stages for 1.13.

This release is targeted to be more ‘stable’ and will have an aggressive timeline. Please only include this enhancement if there is a high level of confidence it will meet the following deadlines:
Docs (open placeholder PRs): 11/8
Code Slush: 11/9
Code Freeze Begins: 11/15
Docs Complete and Reviewed: 11/27

@kacole2 kacole2 added tracked/no and removed tracked/yes labels Oct 8, 2018

@kacole2

This comment has been minimized.

Contributor

kacole2 commented Oct 15, 2018

/milestone clear

@k8s-ci-robot k8s-ci-robot removed this from the v1.12 milestone Oct 15, 2018

@gnufied

This comment has been minimized.

Member

gnufied commented Oct 15, 2018

err - I deleted my earlier comment. For 1.13 - we are going to keep this feature in beta and apart from bug fixes, there won't be any change in feature itself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment