Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ceph: support topologySpreadConstraints #4387

Open
satoru-takeuchi opened this issue Nov 28, 2019 · 7 comments
Open

Ceph: support topologySpreadConstraints #4387

satoru-takeuchi opened this issue Nov 28, 2019 · 7 comments
Labels
Projects
Milestone

Comments

@satoru-takeuchi
Copy link
Contributor

@satoru-takeuchi satoru-takeuchi commented Nov 28, 2019

Is this a bug report or feature request?

  • Feature Request

What should the feature do:

It's necessary to support topologySpreadConstraints to spread OSDs among failure domains as much as possible. The detail is written in the following document.

# choose to schedule many of them on the same node. What we need is the Pod Topology

In addition, the meaning of topologySpreadConstraints is written in the following document.
https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/

What is use case behind this feature:

Spread OSDs among failure domains and permit multiple OSDs per one failure domain.

@satoru-takeuchi

This comment has been minimized.

Copy link
Contributor Author

@satoru-takeuchi satoru-takeuchi commented Dec 2, 2019

I have a question about how to implement topologySpreadConstraints support.

I found that Rook and operator-kit depend on kubernetes 1.14. However, topologySpreadConstraints is defined in kubernetes 1.16 or later. So, could you give me advice about what is the preferred way to support this feature?

a. Update dependency of the kubernetes version to 1.16 and implement this feature's support.
b. Write a document of a workaround for now, and support this feature when Rook starts to use kubernetes 1.16 . I'm now preparing a mutating admission webhook that injects TopologySpreadConstraints resource to OSD pods. I can share how to implement it in Rook's document.
c. Any other ways.

@travisn

This comment has been minimized.

Copy link
Member

@travisn travisn commented Dec 2, 2019

@satoru-takeuchi We can update the client to 1.16. I'll look at this update in the next day or two so we can have topology constraints in 1.2. thanks!

@travisn

This comment has been minimized.

Copy link
Member

@travisn travisn commented Dec 3, 2019

@satoru-takeuchi Updating to the 1.16 client is going to take some time to work through the dependencies as mentioned in #4408. If you want to continue looking at this feature, it would be great to get some early testing even before it's available through rook directly.

@satoru-takeuchi

This comment has been minimized.

Copy link
Contributor Author

@satoru-takeuchi satoru-takeuchi commented Dec 4, 2019

@travisn Yes, of course. I'll test PR#4408 and make my PR on top of your PR soon.

@travisn travisn removed this from To do in v1.2 Dec 5, 2019
@travisn travisn modified the milestones: 1.2, 1.3 Dec 5, 2019
@BlaineEXE

This comment has been minimized.

Copy link
Member

@BlaineEXE BlaineEXE commented Dec 5, 2019

I think we should consider waiting to implement this until the feature goes into beta in k8s so we don't have to deal with any instabilities between alpha and beta.

@satoru-takeuchi

This comment has been minimized.

Copy link
Contributor Author

@satoru-takeuchi satoru-takeuchi commented Dec 5, 2019

I'll implement this feature soon in any case. It's because we're constructing a Ceph cluster and this feature is mandatory here. So sharing a draft PR here is nice, I think.

I expect the following scenario.

  1. Make a PR on top of Travis's PR and use it for our cluster as a temporary solution.
  2. After this feature goes beta, change my PR to conform to beta.
  3. Try to merge the new version to upstream Rook.
@travisn travisn added this to To do in v1.3 Dec 17, 2019
@satoru-takeuchi

This comment has been minimized.

Copy link
Contributor Author

@satoru-takeuchi satoru-takeuchi commented Dec 26, 2019

One of my co-workers @tapih made a branch that enables topologySpreadConstraints.

cybozu-go@b4bd2ee

This branch is on top of the following branch.

#4408

When topologySpreadConstraints goes into beta, I'll make a PR based on this branch.
Any comments are welcome.

NOTE:
Unfortunately, this feature and all other features in StorageClassDeviceSet.Placement doesn't work correctly without fixing the following issue.

#4582

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
v1.3
  
To do
3 participants
You can’t perform that action at this time.