Independent persistent storage for replicated pods #4211

Open
rhcarvalho opened this Issue Aug 17, 2015 · 44 comments

Comments

@rhcarvalho
Contributor

rhcarvalho commented Aug 17, 2015

While trying to add persistent storage to our MongoDB replication example template, we hit a showstopper: if a DeploymentConfig's pod template references a PersistentVolumeClaim, the same claim is reused for every replica deployed.

We need a way to provide persistent storage to a pod that scales with oc scale, so that new replicas get new PVCs and claim different PVs.

The workaround for now is to manually define:

  • N pods or N replication controllers with 1 replica (cannot have #replicas > 1)
  • N PVCs

... and forget about oc scale.
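
For illustration, a minimal sketch of that workaround (resource names, image, and sizes are hypothetical): one claim plus one single-replica DeploymentConfig per MongoDB member, repeated N times.

```yaml
# Hypothetical sketch of the manual workaround: one claim plus one
# single-replica DeploymentConfig per MongoDB member; repeat with -2, -3, ...
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-data-1
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: DeploymentConfig
metadata:
  name: mongodb-1
spec:
  replicas: 1                  # must stay at 1; scaling would share the claim
  selector:
    name: mongodb-1
  template:
    metadata:
      labels:
        name: mongodb-1
    spec:
      containers:
      - name: mongodb
        image: openshift/mongodb-24-centos7   # assumed image
        volumeMounts:
        - name: data
          mountPath: /var/lib/mongodb/data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: mongodb-data-1
```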

/cc @bparees

@bparees

Contributor

bparees commented Aug 17, 2015

@pmorie @markturansky I think you've already been involved in this discussion; can you tell us the current plans around this?

@markturansky

Member

markturansky commented Aug 17, 2015

kubernetes/kubernetes#260

Scaling storage is on the radar and the above issue talks through many of the problems and difficulties. This feature is growing in importance and we're starting to dissect the requirements on the linked issue.

No official design or implementation plan yet, but it is in the works.

@pmorie

Member

pmorie commented Aug 17, 2015

@rhcarvalho Can you say a little more about your use-case so that we're all on the same page? It's not clear to me currently whether you want:

  1. Each replica to use the same storage
  2. Each replica to make a new claim
@rhcarvalho

Contributor

rhcarvalho commented Aug 17, 2015

@pmorie in the case of a MongoDB replica set, each member (a container in a pod) should have its own independent storage.

@deanpeterson

deanpeterson commented Oct 13, 2015

+1

@ntquyen

ntquyen commented Oct 22, 2015

+1

@wattsteve

wattsteve commented Nov 23, 2015

Yes, but it's independent LOCAL storage, not network storage.

To the best of my knowledge, in a non-containerized (traditional) MongoDB environment, you don't use network storage for individual MongoDB instances. Why are we trying to do so just because Mongo is running in containers? Shouldn't you just be specifying a HostPath (Host/Local Storage) Volume Plugin in your MongoDB RC and problem solved?

Kubernetes/OpenShift exposes two methods for containers to access persistent storage:

  • Volume Plugins (Direct)
  • Persistent Volumes (Abstracted)

Generally, the usage of Persistent Volumes means you don't care where the pod/container/app runs, but you want to be able to re-connect it to the same network storage device regardless of which host it gets moved to. Most scale-out persistence platforms (GlusterFS, HDFS, Mongo, Cassandra) are designed to use local direct-attached storage for performance reasons, so I think for these types of platforms you always want to be using HostPath (or some future incarnation of the same feature) rather than Persistent Volumes.
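
To make the two methods concrete, a minimal sketch of the same pod written both ways (names, image, and paths are assumptions, not taken from the example template):

```yaml
# (a) Direct volume plugin: the pod spec names the storage technology itself.
apiVersion: v1
kind: Pod
metadata:
  name: mongodb-direct
spec:
  containers:
  - name: mongodb
    image: openshift/mongodb-24-centos7   # assumed image
    volumeMounts:
    - name: data
      mountPath: /var/lib/mongodb/data
  volumes:
  - name: data
    hostPath:
      path: /var/lib/mongodb-data   # data lives on whichever node runs the pod
---
# (b) Abstracted: the pod only references a claim; an admin or a provisioner
#     decides which PersistentVolume ends up backing it.
apiVersion: v1
kind: Pod
metadata:
  name: mongodb-abstracted
spec:
  containers:
  - name: mongodb
    image: openshift/mongodb-24-centos7
    volumeMounts:
    - name: data
      mountPath: /var/lib/mongodb/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: mongodb-data
```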

@rhcarvalho

Contributor

rhcarvalho commented Nov 23, 2015

@wattsteve thanks for weighing in.

Why are we trying to do so just because Mongo is running in containers? Shouldn't you just be specifying a HostPath (Host/Local Storage) Volume Plugin in your MongoDB RC and problem solved?

Relying on HostPath has several limitations as well. The obvious one: if your pod gets scheduled to a different node than the one it resided on before, you lose access to existing data. HostPath is not a production-oriented feature, as warned in the documentation of OpenShift and Kubernetes, and therefore does not solve the problem.

When running a MongoDB cluster in a distributed, containerized fashion, the containers are ephemeral and treated as cattle, as opposed to a traditional setup in which you use the local disks (best performance) and make sure to run data backups and keep those specific disks healthy.

Perhaps we can find a solution somewhere between the extremes. What can be done today is to run primarily on faster, ephemeral storage, with an automatic live backup to persistent storage.

That could work for a MongoDB setup, but doesn't invalidate the general need for being able to request independent PVs stated in the issue.

@wattsteve

wattsteve commented Nov 23, 2015

"if your pod gets scheduled to a different node than the one it resided before, you lose access to existing data. HostPath is not a production-oriented feature, as warned in the documentation of OpenShift and Kubernetes, and therefore does not solve the problem."

I contend this is actually not an issue, as scale-out software-defined storage platforms are designed from the ground up to expect this failure domain, which is why they offer data replication policies. A pod being moved from one server to another is the same scenario as losing a server in a non-containerized solution. When the original pod goes offline, the storage platform (Mongo) identifies that the number of Mongo replicas is affected. When the new pod is started up on a different host to replace the pod that went down, it is perceived as a new addition to the Mongo cluster, the data replicas are rebalanced, and the new replicas are stored on the new host system's local storage. This is exactly how people run HDFS in EC2 with ephemeral local disks, although that is more like using EmptyDir than HostPath.

Another point I want to make is that Storage/Persistence Platforms are Pets, not Cattle. Once you stick some data in something, you care about it and generally want to manage it carefully. To this point, I'd contend that not using Replication Controllers for deploying MongoDB and instead using individual Mongo Pods with NodeSelectors and HostPath (or EmptyDir) volumes is a reasonable approach - @deanpeterson is this something you've explored?
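
A rough sketch of that curated-pod approach, assuming hypothetical names, image, paths, and hostname (one such pod per chosen host, no RC involved):

```yaml
# Hypothetical curated pod: pinned to a specific host via its hostname label,
# using that host's local disk directly. Create mongodb-2, mongodb-3, ... the
# same way, one per host, to avoid two members colliding on one node.
apiVersion: v1
kind: Pod
metadata:
  name: mongodb-1
  labels:
    name: mongodb
spec:
  nodeSelector:
    kubernetes.io/hostname: node-1.example.com   # assumed node name
  containers:
  - name: mongodb
    image: openshift/mongodb-24-centos7          # assumed image
    volumeMounts:
    - name: data
      mountPath: /var/lib/mongodb/data
  volumes:
  - name: data
    hostPath:
      path: /var/lib/mongodb-1   # per-member path on the host's local disk
```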

@bparees

Contributor

bparees commented Nov 23, 2015

Shouldn't you just be specifying a HostPath (Host/Local Storage) Volume Plugin in your MongoDB RC and problem solved?

@wattsteve Aside from the valid issues @rhcarvalho listed, how does that work if two pods from the same RC get scheduled on the same node? They're going to use the same hostPath and clobber each other. So I need to manually define a unique hostPath for each pod I create? That seems very error-prone.

Aside from this specific use case, it's the responsibility of the PaaS to provide storage to pods. Using EmptyDir or HostPath handcuffs the admin in terms of where that storage can come from. Even if I don't want the data to persist or follow my pods, I should be able to have a pod dynamically pick up network storage from a pool, and each pod in an RC should be able to get its own storage.

To this point, I'd contend that not using Replication Controllers for deploying MongoDB and instead using individual Mongo Pods with NodeSelectors and HostPath (or EmptyDir) volumes is a reasonable approach

That removes the entire value of having an RC, which makes it easy for me to scale my Mongo cluster up and down... now I have to manually create/destroy pods to scale my cluster?

@markturansky

Member

markturansky commented Nov 23, 2015

The upstream suggestion for this is to include a PersistentVolumeClaimTemplate on the RC side-by-side with the PodTemplate. This way, each pod replica gets its own volume.

The details are not more fully formed than that. I don't know, for example, what happens if a pod goes away (e.g., the replica count is decreased). Is the claim cascade-deleted? If it sticks around, does it get re-used when the RC replica count increases again? These might just be policy issues with a toggle on the RC/claim WRT behavior.

It seems relatively easy to give volumes in the same cardinal order to pods created in that order. I.e., an RC has created 7 pods, then decreases replicas to 3 (assume no PVC deletes). As the replica count returns to 7, each index (5, 6, and 7) could use the same claim it used previously.
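
Purely to illustrate the shape of that suggestion, a hypothetical sketch only: ReplicationController has no such field today, and the names and sizes below are made up.

```yaml
# HYPOTHETICAL: sketch of the upstream suggestion, not working config.
# ReplicationController has no persistentVolumeClaimTemplate field; this is
# only the shape the proposal describes.
apiVersion: v1
kind: ReplicationController
metadata:
  name: mongodb
spec:
  replicas: 3
  persistentVolumeClaimTemplate:   # proposed: stamped out once per replica
    metadata:
      name: mongodb-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
  template:
    spec:
      containers:
      - name: mongodb
        image: openshift/mongodb-24-centos7   # assumed image
        volumeMounts:
        - name: mongodb-data      # would resolve to this replica's own claim
          mountPath: /var/lib/mongodb/data
```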

@wattsteve

wattsteve commented Nov 23, 2015

@bparees I think I addressed @rhcarvalho's concerns in my response to his comments.

WRT the additional questions you brought up: just so you know where I am coming from, first, I acknowledge that this is an issue if you want to use RCs and have each new pod that gets spun up get assigned a new NETWORK block device. What I'm disagreeing with is that people who want to run Mongo & Cassandra should be using Network Block Devices for their node-level persistence. They normally use local disks when not using containers, for performance reasons, so now that we are using containers, I'm trying to avoid making every single I/O go over the network instead of local disk for those same performance reasons.

I think what we really need to do is to augment this feature request with an RC that can schedule Pods on Kube/Origin Nodes that have available Local Storage on them. We will also need an RC that can stamp out new pods that each are attached to a new block device for the scenarios where there is no local storage available and so they are forced to use network storage (and suffer the performance consequences).

To address the issue about having 2 MongoDB pods on the same host colliding on the HostPath path: I agree, that's a real issue when using RCs with the existing scheduling abilities. As such, it's looking like the best way to do this for now is to not use RCs, and to have curated pods that use NodeSelectors so they are scheduled on hosts that have the right storage available. This would avoid the 2-pods-on-one-node scenario, which is also bad because of shard replication failure domains. I also contend that this approach is a reasonable workaround until we have this issue resolved, as I suspect that the majority of our community have relatively small MongoDB clusters. RCs don't add a whole lot of value when you're only scaling up incrementally (and not ever scaling down) by adding one pod per host every 3 months or so. This is an example of an implementation of what I am proposing for GlusterFS, which suffers from the same issues with RCs that MongoDB does - https://github.com/wattsteve/glusterfs-kubernetes

@bparees

Contributor

bparees commented Nov 23, 2015

@wattsteve and I spoke offline about this, and I think we've come to an agreement that while this feature request makes sense and fills a valid need, we should proceed with the replicated Mongo sample using emptyDir storage, which is generally more suitable since it will likely be more performant and aligns with the expectations of a Mongo deployer that each cluster member is basically expendable.

Obviously this means the Mongo replica example needs to set up replicated shards to ensure that data is not lost if a single pod fails.

It also likely means the Mongo image needs to wipe out the volume contents on startup, for the same reason we need to do it in MySQL: the emptyDir may not be empty if the container has restarted "in place".

@deanpeterson what are your thoughts on a clustered Mongo offering based on ephemeral storage? (i.e. you are on your own to either back up the cluster or ensure you have sufficient redundant replicas configured)
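
A minimal sketch of that direction (hypothetical names and image, not the actual sample): each pod stamped out by the config gets its own emptyDir, so replicas no longer share a claim, and the data lives only as long as the pod.

```yaml
# Sketch: every pod created from this config gets its own emptyDir, so the
# replicas no longer fight over one shared claim; data is ephemeral and the
# image is expected to clear the data directory on startup.
apiVersion: v1
kind: DeploymentConfig
metadata:
  name: mongodb-replica
spec:
  replicas: 3
  selector:
    name: mongodb-replica
  template:
    metadata:
      labels:
        name: mongodb-replica
    spec:
      containers:
      - name: mongodb
        image: openshift/mongodb-24-centos7   # assumed image
        volumeMounts:
        - name: data
          mountPath: /var/lib/mongodb/data
      volumes:
      - name: data
        emptyDir: {}
```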

@rhcarvalho

Contributor

rhcarvalho commented Nov 24, 2015

Since this issue is for tracking "Independent persistent storage for replicated pods", I would like to move the conversation about how to implement MongoDB replication to sclorg/mongodb-container#114.

@smarterclayton

Member

smarterclayton commented Jan 13, 2016

The specific implementation of this is covered by the PetSet proposal in Kubernetes.

@rhcarvalho

Contributor

rhcarvalho commented Jan 13, 2016

The specific implementation of this is covered by the PetSet proposal in Kubernetes.

link: kubernetes/kubernetes#18016

@wallnerryan

wallnerryan commented Jan 26, 2016

+1

@rashtao

rashtao commented Apr 11, 2016

+1

@markturansky

Member

markturansky commented May 12, 2016

Reassigning to @childsb

@markturansky markturansky assigned childsb and unassigned markturansky May 12, 2016

@2012summerain

2012summerain commented Jun 9, 2016

+1

@speedplane

speedplane commented Oct 6, 2016

I was trying to get a large elasticsearch cluster up and running and ran into this issue.

I want to create a large number of replicated elasticsearch data nodes, each of which uses its own persistent storage. The idea that a cluster or even instance restart could lead to permanent data loss is too scary... not going to go there. Also, in our particular case we are not heavily I/O bound, so the case for using local disks is not quite as persuasive.

I took a look at PetSets and they look like they could solve the problem, but as an alpha feature, they don't seem quite ready for a production environment.

@speedplane

speedplane commented Oct 6, 2016

@rhcarvalho Yes, I figured that. Just wrote a script that generates 12 nearly identical Deployment yaml files. It works, but definitely takes the elegance out of Kubernetes.

@smarterclayton

Member

smarterclayton commented Oct 6, 2016

Yeah, PetSets are intended to support the "less ugly", but not quite there yet.

@rhcarvalho

Contributor

rhcarvalho commented Oct 6, 2016

@smarterclayton where should we send feedback on PetSets?

In particular, for a production-grade MongoDB deployment we are lacking the ability to schedule each pet on a separate node. Adding a node selector to the pod template would give the same selector to every pod/pet.

@smarterclayton

Member

smarterclayton commented Oct 7, 2016

That's what service affinity spreading (on by default) and pod affinity are for.

@liggitt

Contributor

liggitt commented Oct 7, 2016

Looks like those don't take PetSets into account yet, just replicasets, replication controllers, and services (https://github.com/kubernetes/kubernetes/blob/master/plugin/pkg/scheduler/algorithm/priorities/selector_spreading.go#L41)

@smarterclayton

Member

smarterclayton commented Oct 7, 2016

Service affinity and pod affinity should not require any of those.

@smarterclayton

Member

smarterclayton commented Oct 7, 2016

That link is for the old spreaders - pod affinity / anti-affinity is the new thing.

@davidopp

davidopp commented Oct 7, 2016

The tried-and-true hack of assigning a host port in the pod template should still work to guarantee a max of one replica per node.

But as @smarterclayton said, you can also use pod affinity/anti-affinity for this (and to get much finer-grained control); see http://kubernetes.io/docs/user-guide/node-selection/
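
For reference, a sketch of both options as a pod template fragment; the labels, image, and port are assumptions, and the anti-affinity shown uses the later field syntax (Kubernetes 1.6+) rather than the alpha annotation form that existed when this was written.

```yaml
# Sketch of two ways to keep replicas on separate nodes (values are assumptions).
spec:
  # (a) the hostPort hack: two pods requesting the same host port can never be
  #     scheduled onto the same node.
  containers:
  - name: mongodb
    image: openshift/mongodb-24-centos7   # assumed image
    ports:
    - containerPort: 27017
      hostPort: 27017
  # (b) pod anti-affinity (field syntax, 1.6+): do not schedule next to another
  #     pod carrying the same label.
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            name: mongodb
        topologyKey: kubernetes.io/hostname
```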

@Globegitter

Globegitter commented May 16, 2017

I have just been trying out StatefulSets (following http://blog.kubernetes.io/2017/01/running-mongodb-on-kubernetes-with-statefulsets.html). While each replica gets its own blob storage through dynamic provisioning, I would like the replicas to use multiple storage accounts (this is on Azure): if there is an issue specific to the storage account itself, it affects all of the replicas. That would not happen if they were able to somehow select multiple storage classes/storage accounts.

@weinong

weinong commented May 16, 2017

@Globegitter Azure managed disk is what you need: kubernetes/kubernetes#41950

@Globegitter

Globegitter commented May 16, 2017

This is amazing - thanks @weinong

@MarkRx

MarkRx commented Sep 22, 2017

I would be interested in seeing this. Has any progress been made on it in OpenShift or Kubernetes?

@cloudbow

cloudbow commented Nov 24, 2017

This will be amazing if it's done. It's true that we should not use this for the Mongo use case, but it would be great for someone who doesn't care about network I/O throughput.

I see that this is possible with StatefulSets. I have seen the ZooKeeper example where the number of replicas is 3 and the volume is defined once; I got 3 different EBS volumes for the 3 replicas in AWS. Does this mean we always need to move to StatefulSets?
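
A minimal sketch of that mechanism, with assumed names, image, storage class, and sizes (apps/v1 syntax): the StatefulSet controller creates one PVC per ordinal from volumeClaimTemplates, so each replica binds its own dynamically provisioned volume.

```yaml
# Sketch: the controller creates data-mongodb-0, data-mongodb-1, data-mongodb-2,
# i.e. one PVC (and one dynamically provisioned volume) per replica.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongodb
spec:
  serviceName: mongodb          # assumes a matching headless Service exists
  replicas: 3
  selector:
    matchLabels:
      name: mongodb
  template:
    metadata:
      labels:
        name: mongodb
    spec:
      containers:
      - name: mongodb
        image: openshift/mongodb-24-centos7   # assumed image
        volumeMounts:
        - name: data
          mountPath: /var/lib/mongodb/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: gp2      # assumed storage class (e.g. AWS EBS)
      resources:
        requests:
          storage: 1Gi
```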

@openshift-bot

Member

openshift-bot commented Feb 26, 2018

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-bot

Member

openshift-bot commented Mar 28, 2018

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@MarkRx

MarkRx commented Mar 28, 2018

/remove-lifecycle rotten

@MarkRx

MarkRx commented Mar 28, 2018

/remove-lifecycle stale

@jsight

jsight commented Apr 2, 2018

I have a case where I would like for worker pods to be spooled up with a user-configurable amount of storage available. The storage would not necessarily have to be persistent and could start as emptyDir, but it would need to be high capacity.

I can probably do this with emptyDir, but unfortunately, there is no way to guarantee that the size meets the needs. hostPath might work as well, but isn't always suitable for all deployment environments.

Basically, I want something like a volumeClaimTemplate, but without necessarily being tied to having a stateful set, as these aren't really stateful pods.

I wonder whether any progress is being made on this upstream?

@openshift-bot

Member

openshift-bot commented Jul 2, 2018

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@srossross-tableau

srossross-tableau commented Jul 10, 2018

I'm interested in an answer to @jsight's question as well.

@srossross-tableau

srossross-tableau commented Jul 10, 2018

/remove-lifecycle stale

@openshift-bot

Member

openshift-bot commented Oct 9, 2018

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale
