Persistent data on nodes #150

Open
F21 opened this Issue Sep 29, 2015 · 114 comments

Comments


F21 commented Sep 29, 2015

Nomad should have some way for tasks to acquire persistent storage on nodes. In a lot of cases, we might want to run our own HDFS or Ceph cluster on Nomad.

That means things like HDFS datanodes need to be able to reserve persistent storage on the node they are launched on. If the whole cluster goes down, then once it is brought back up, the appropriate tasks should be launched on their original nodes (where possible), so that they can regain access to the data they previously wrote.

zrml commented Sep 30, 2015

+1
we also want to mount a specific FS of a shared storage volume...
we need an API/flag to be able to specify this affinity

F21 commented Oct 12, 2015

It would be awesome if direct-attached storage could be implemented as a preview of some sort.

One of the things that came to my mind is the idea of updating containers while keeping the storage:

For example, let's say we have a MySQL 5.6.16 container running and it was allocated 20GB of storage on a client to store its data. If there's a new version of the MySQL container (5.6.17), we want to be able to swap out the container but still keep the storage (persistent data) and have it mounted into the updated container. This way, we can upgrade the infrastructure containers without data loss or having a complicated upgrade process that requires backing up the data, upgrading then restoring it.

zrml commented Oct 13, 2015

@F21 good use case too.

Contributor

cbednarski commented Oct 15, 2015

Just to add some details to this, we need two overarching features: one is the notion of persistent storage that is either mounted into or otherwise persisted on the node, and is accounted for in terms of total disk space available for allocations.

The second is global tracking of where state is located in the cluster so jobs can be rescheduled with hard (mysql) / soft (riak) affinity for nodes that already have their data, and possibly a mechanism to reserve the resources even if the job fails or is not running.

Since these features are quite large, we would implement them gradually: for example, floating persistence (a la EBS), node-based persistence, global storage state tracking, soft affinity, hard affinity, and offline reservations as independent milestones. I can't yet speak to when or if we will implement these features.

zrml commented Oct 15, 2015

@cbednarski sounds like you guys are going in the right direction! Cool!

melo commented Oct 18, 2015

Hi @cbednarski, thanks for the explanation of your goals.

This is something that we would also like to see. Other services that we would like to manage that require data storage are Redis Sentinel, Consul, and NSQ.

Currently we dedicate nodes to these tasks, so affinity is something that we manage manually. I understand that other use cases might need/want some more magical targeting, but it would be interesting to see some way of manually deciding this before deciding on how this fits into the overall model.

My point is that if nomad provides some way of manually assigning data volumes to containers, and leave the logic of making sure the containers only start on the correct hosts to manual configuration, then we could start to get a feeling of how it all works, and with that experience, design better models afterwards.

I found this in the code:

func (d *DockerDriver) containerBinds(alloc *allocdir.AllocDir, task *structs.Task) ([]string, error) {
I'm guessing SharedDir is a step in that direction, right?

Thank you,

Collaborator

diptanu commented Oct 18, 2015

+1 to @cbednarski's thoughts.

We might also need to think about the identity of data volumes. Data volumes are slightly different from other compute resources like CPU, memory, and disk, which are scalar in nature: those resources can be loaned to any process that needs compute, whereas a volume is usually used by the same type of process that created it in the first place, and users might need to refer to a volume while specifying a task definition.

For ex -

resources {
   ....
   volumes = [
     {
        name = "db_personalization_001",
        size = 100000,
      },
   ]
}

In this example we are asking Nomad to place the task on a machine where the volume named db_personalization_001 exists; if it doesn't exist, Nomad can create the volume on a machine that can provide 100GB of disk space and that matches the other constraints the user might have specified. When creating a new volume (because a volume with that identity wasn't already present in the cluster), we would also need to persist the identity of the volume in a manner that can be restored during a disaster recovery operation.

F21 commented Nov 18, 2015

Maybe it's possible to lean on https://github.com/emccode/rexray for non-local storage such as Amazon EBS. It doesn't manage persistence on local disks though, so that portion would still need to be implemented.

gourao commented Nov 19, 2015

I am one of the maintainers of https://github.com/libopenstorage/openstorage. The goal of this project is to provide persistent, cluster-aware storage to Linux containers (Docker in particular). It supports both data volumes and the Graph driver interface, so your images and data are persisted in a multi-node, scheduler-aware manner. I hope this project can help with what Nomad would like to achieve.

The open storage daemon (OSD) itself runs on every node as a Docker container. The daemons discover other OSD nodes in the cluster via a KV DB. A container run via the Docker remote API can leverage volume and graph support from OSD. OSD in turn can support multiple persistent backends. Ideally this would work for Nomad without doing much.

The specs are also available at openstorage.org

F21 commented Nov 19, 2015

@gourao That sounds really exciting! Are there any plans to support things beyond Docker: qemu, rkt, raw exec, etc.?

gourao commented Nov 19, 2015

Yes @F21, that's the plan. There are a few folks looking at rkt support, and as the OCI spec becomes more concrete, this will hopefully be a solved problem.

erSitzt commented Dec 7, 2015

+1
Access to persistent storage mounted or available directly on the node would be great.

While testing Nomad with simple containers, I did not realize that there was no option in the job syntax for the bind mounts I used when dealing with Docker directly. :(

I like @diptanu's proposal.
But wouldn't it be easier to just let users specify volumes to mount into a container the way we do it with Docker directly? Nomad could check the existence of the path and the free space for that mountpoint.

As @melo mentioned nomad is already doing something like this

func (d *DockerDriver) containerBinds(alloc *allocdir.AllocDir, task *structs.Task) ([]string, error) {

Most other tools for managing Docker containers allow users to specify volumes on container creation (Marathon and Shipyard, for example... Kubernetes too, I think?)

I'm a novice in Go, so I haven't tried anything myself yet. :)

Contributor

ketzacoatl commented Dec 28, 2015

Both #62 and #630 are tracking this simpler use case of mounting a path from the host as a volume mount for the docker container.

bscott commented Dec 28, 2015

+1

Any timeframe on this? Volumes not being supported in Nomad is a huge deal breaker for us.


jefflaplante Dec 28, 2015

+1
I agree with Brian on this.


Contributor

ketzacoatl commented Dec 28, 2015

Yea, for my initial use it is acceptable to enable raw_exec and work around this issue, but that is only because this is not yet truly production use. I too could not put nomad in production without the most basic docker volume mount to the host being supported by the docker driver.

Contributor

adrianlop commented Dec 29, 2015

+1 need docker volumes too for production.


wyattanderson Dec 29, 2015

I'm interested in this not just for Docker, but also for qemu and an in-house Xen implementation. That is, it would be nice if the solution was generic enough to be useful for all task drivers.


Contributor

dkerwin commented Jan 7, 2016

+1 no way to use in production without docker volumes

Member

calvn commented Jan 8, 2016

👍


supernomad Jan 18, 2016

So I have been running into this issue myself, as it's a pretty fundamental idea to use volumes in conjunction with Docker.

I understand there is a much larger architecture and design discussion to have around how to manage storage using Nomad in general. However, when I was thinking about the issue, I came to the idea of specifying arbitrary arguments to pass down to docker.

Something like this:

            config {
                image = "registry.your.domain/awesome_image:latest"
                command = "/bin/bash"
                args = ["-c", "/usr/bin/start_awesome_image.sh"]
                docker_args = ["-v", "/host/path:/container/path", "--volume-driver=vDriver"]
            }

This would be entirely unmonitored by Nomad, and placing the container so that its volumes work would be up to the end user, i.e. they would specify the necessary constraints on the job.

No idea if this is even possible, but figured I would voice the idea at the very least.



jhartman86 Jan 29, 2016

👍 for a --volumes flag. Another great use case: running a cAdvisor container as a system service on all nodes that can pipe stats to, oh, say, InfluxDB. In this sense, it has less to do with persistent storage than with providing volume mounts to the container so it can monitor the underlying host. Per the cAdvisor docs on getting it running:

docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  google/cadvisor:latest

let4be commented Feb 5, 2016

Any way I could use
https://docs.docker.com/engine/extend/plugins_volume/
and
https://github.com/ClusterHQ/flocker
with docker and nomad today?

This seems like the ultimate solution for my needs.

let4be commented Feb 5, 2016

Or maybe I should use something simpler, like https://github.com/leg100/docker-ebs-attach.
Hmm...

Contributor

dadgar commented Feb 5, 2016

@let4be: Not currently. There is no support for persistent volumes in Nomad currently


valibud commented Feb 6, 2016

+1 for using 'currently' twice. I take it as 'it's coming' :).


let4be commented Feb 7, 2016

Is there any chance we could at least get persistent node storage for Docker in the near term?
Currently you can specify VOLUME in the Dockerfile, but it seems Docker recreates such a volume on each container run unless you specify a mapping of a host folder to that volume, which is not possible to do with Nomad.

P.S. @supernomad's idea about arbitrary docker arguments would be extremely helpful, especially considering the early stage of Nomad; currently it's a pain to work with Docker and Nomad, and there is nothing we can do about it as users (logging and volumes, mostly).


mainframe Feb 7, 2016

+1 for mounting host volumes (-v)



lukemarsden Feb 9, 2016

Hey folks! (disclaimer: I'm the CTO at ClusterHQ).

For the floating persistence use case with Docker (you have a cluster in one EC2 zone, they can all access the same set of EBS volumes) you could get this working seamlessly with tools like Flocker by supporting Docker volume plugins explicitly in the Nomad manifest format. Or we could do a direct integration between Nomad and the Flocker control service if you wanted to avoid tying the implementation to Docker as a containerizer.

Flocker gives you the notion of a cluster-global volume name: so naming that volume in the manifest will
a) create it if it doesn't exist
b) allow the container to start immediately if it's already on the right host (where the container got scheduled)
c) move (aka detach it and attach it to the right host) if the volume is not being used on another host.

For all the other use cases discussed (node-based persistence, global storage state tracking, soft affinity, hard affinity, and offline reservations) I'd be really interested in discussing how we could add capabilities to Flocker and its control service to support these sort of more advanced scheduler-related interaction. Maybe a google hangout some time? Come find me (lewq on #clusterhq on Freenode)!

cc @cbednarski & @dadgar


Contributor

ketzacoatl commented Feb 9, 2016

While Flocker looks nice, it does not seem to displace the need for Nomad to have better access to persistent storage on each node in a Nomad cluster.

Contributor

ranjib commented Feb 9, 2016

I'd love to see a direct Flocker/Nomad integration that I can use across lxc, lxd, rkt, and Docker, maybe via some plugin interface, so that we can add other persistent storage providers as and when they appear.
@ketzacoatl I think it would be possible to configure Nomad to use dedicated Flocker volumes for the alloc dir, etc., transparently (like having a systemd mount unit do the Flocker -> alloc dir mounting and the Nomad service using that mount unit), or via some deeper integration.

F21 commented Feb 9, 2016

@lukemarsden Is local storage on ZFS still supported in the recent releases of Flocker? Most of the info on ZFS seems to have been removed from the recent docs.

Contributor

ketzacoatl commented Feb 9, 2016

maybe we could create a separate issue focused on flocker/nomad integration?


mainframe Feb 16, 2016

While it would be beneficial to have shared volume/filesystem support across the cluster in the future, please implement at least very simple docker -v argument passing for the Nomad docker driver. We can handle storage volumes on the host separately from Nomad, but if there is no way to do a simple bind mount of host volumes into Docker containers, then Nomad is pretty useless for us. For future reference, I think Docker has a volume plugins concept, which is a common interface to various storage backends (like Ceph, Flocker, etc.), making shared volumes across the cluster possible. But this is somewhat Docker specific.


ryansch commented Feb 16, 2016

I agree. I just want to be able to add arbitrary docker args to the run command. Even if I can use that power to break nomad's integration.


jovandeginste Jul 31, 2016

Contributor

ketzacoatl commented Jul 31, 2016

@jovandeginste, I posted a simplified example in #150 (comment) - this is in python, and uses the docker-py library, but any language and bindings to interact with docker could be used.

faddat commented Aug 1, 2016

@ketzacoatl

Strongly agreed re: "Hashicorp will do it right" in the end. Also, you get an emoji on your post for suggesting a really good workaround!

So how about that Twitter explosion this week starring Mr. Docker himself and Mr. Cloud Stuff Teach You Good himself?


far-blue Aug 16, 2016

As my question issue (#1592) was closed as a duplicate of this discussion issue, I'll just ask the question again in here: when can we expect volume support for Docker?

Why am I asking? Because despite all the discussion in this issue and others, there doesn't appear to be a clear roadmap or timeline for delivery of volume support, and while I am a huge fan of Hashicorp technology and think Nomad has huge potential, I simply can't hang around waiting indefinitely for a future promise; I have a cluster to upgrade.

I completely understand the suggestion that Hashicorp wants to do things right, but since when has modern software development revolved around doing everything perfectly right the first time? Look back at the list of backwards-incompatible changes in Nomad in the last year and you will see this simply isn't a reasonable answer. Everyone accepts that architectures evolve.

At the moment the lack of volume support is impacting the use of containers in Nomad much more than the other task drivers, because with the other drivers you can much more easily use shell scripts etc. to work around the issue. Given that a) Docker now has good support for volume management, b) the more people using Nomad the better the traction and the feedback, and c) the Docker community has no lack of other options if Nomad lags behind, I really think it would be in the best interest of the Nomad community to provide access to Docker's native volume support as an interim measure until a more generic solution is available.


zrml commented Aug 16, 2016

@far-blue: agree 100%. I'll be testing Nomad very soon. This issue is what keeps me from putting it on the list of products to evaluate.
Thank you guys @hashicorp for listening.


mainframe Aug 16, 2016

I still can't understand why enabling simple -v docker option passthrough is ruled out. It could solve the problem at hand and would not really block future development towards unified volume support in Nomad.



murphy7801 Aug 17, 2016

Yeah guys, I know you're busy, but attachable volumes please. Hell, since AWS just released EFS, just attaching that to containers would be enough.


Contributor

ketzacoatl commented Aug 17, 2016

I would not consider EFS usable for most container-driven services.


Ghostium Aug 17, 2016

Last statement from the Hashicorp team was on 17 Jun.

I'm not trying to be mean; we get such a great product for free, so we cannot expect anything in return from you. But is it so hard to add a short statement like "We're working on it and will release a standard/model later"?

@dadgar @diptanu


Contributor

camerondavison commented Aug 17, 2016

I was thinking about this issue. Why don't people want to use the workaround of the raw_exec driver and calling the docker client from the command line? I know for me it is because the raw_exec driver makes service discovery, port allocation, and memory management more painful and adds a lot of boilerplate.

Maybe instead of starting the service task using volume configuration, the job could have 2 tasks in one group (so they are put on the same machine) and have task 1 be the service, and then task 2 be a raw_exec driver that mounts the volume into the alloc directory that is shared between tasks in the same group. I feel like this would kind of be best of both worlds. You can use all of the docker plugins or other things that mount storage locally, and the docker driver for the service task which handles service discovery and memory/cpu management.

The only problem with this solution is that there could be a race condition between the two tasks starting up, such that task 1 has a brief period where the volume is not yet available in the alloc directory (it could be easy to just wait for the directory to exist, or for the raw_exec task to touch some success file or something).

I know it's not the most awesome solution, but maybe it's a helpful workaround? I personally use https://github.com/novilabs/bifurcate to wait for a file to exist before starting up a program.
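
A minimal sketch of the two-task-group idea described above, assuming a hypothetical host path /data/mysql and the shared alloc directory Nomad exposes to tasks in the same group via NOMAD_ALLOC_DIR; the mount task, marker file, and image are illustrative only, not a tested recipe:

group "db" {
  # Workaround task from the comment above: bind-mount the host volume into the
  # shared alloc dir, drop a marker file, then stay alive so the mount persists.
  # Bind-mounting requires the Nomad client (and thus raw_exec) to run as root.
  task "mount-volume" {
    driver = "raw_exec"
    config {
      command = "/bin/sh"
      args = ["-c", "mkdir -p $NOMAD_ALLOC_DIR/data && mount --bind /data/mysql $NOMAD_ALLOC_DIR/data && touch $NOMAD_ALLOC_DIR/data/.ready && sleep infinity"]
    }
  }

  # The service task, still run through the docker driver so service discovery
  # and resource accounting keep working. It should wait for the .ready marker
  # before touching the data, to cover the startup race mentioned above.
  task "mysql" {
    driver = "docker"
    config {
      image = "mysql:5.6"
    }
  }
}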


far-blue Aug 17, 2016

From my perspective, it goes like this:

  • I can't use the docker taskdriver for volumes so I'll use raw_exec
  • I'll raw_exec docker-compose so I can easily manage a group of containers rather than raw_exec them separately via Nomad
  • I can't scale very easily in Nomad if I'm raw_exec'ing with docker-compose, so maybe I should just use Swarm instead as it will be much easier
  • I'll not bother with Nomad

Contributor

dadgar commented Aug 17, 2016

Hey folks,

Persistent volumes are tentatively on our roadmap for 0.6, and we will have another novel disk feature coming in 0.5. Thanks for your patience. As mentioned before, the scope of a project like this is fairly huge, and given limited development effort we are trying to solve stateless applications first and solve them very well!

Post 0.5, I believe we will be in a very good place to start tackling stateful services as well.

Thanks,
Alex


far-blue Aug 17, 2016

@dadgar - thank you for the update :) While it may sound like moaning, I'm genuinely very keen on Nomad and really want to both use it myself and see it succeed. My concerns and frustrations just come out as irritation and moans!


Contributor

dvusboy commented Aug 17, 2016

@far-blue @a86c6f7964: I too am using raw_exec + docker-compose as a workaround. The trouble with that is clean-up when one kills a job. When the Nomad executor sends a SIGINT to docker-compose, it does not clean up the containers and volumes by default; you have to explicitly do docker-compose down. For that and other reasons, we have a wrapper shell script to trap SIGINT. There is an outstanding feature request for 'pre-' and 'post-' task hooks. That should help as long as the post-task hooks get run even when it's triggered by nomad stop.

faddat commented Aug 22, 2016

@dvusboy @far-blue @a86c6f7964

THE SOLUTION WAS SO BLINDINGLY OBVIOUS! (yet I couldn't see it)
:).

Thanks!

@dadgar
The Hashicorp suite of tools is fan-freaking-tastic: You guys just keep doing what ya do :).

cetex commented Aug 28, 2016

@dadgar The most important and very simple feature I'd like to see is that we can do a simple bind-mount into the containers (docker's -v option, and something similar for rkt and whatever else there is).
This makes it so that we can run stuff in containers, keep control of the data, and actually dare to run more important stateful services like databases inside containers; since all data is stored outside of the container environment, there's much less risk of data loss because of screwups in the container service. (Docker in our environment has had its fair share of those.)

Other features like integrations with Docker's "storage" containers won't get near our persistent data, since those introduce quite a bit of complexity (and dependencies on the container service to make sure data is migrated whenever we update the container service, be it Docker, rkt, or anything similar).

We run services like ZooKeeper, Kafka, Mesos, Cassandra, HAProxy, docker-registry, nginx, and similar inside the containers we manage with our service, but we'd like to manage those services/containers fully through Nomad instead, which means "system" jobs for most deploys.
Mesos is then used to manage our api/web and similar services, at least for now. To do this, the mesos-slave container needs to mount the docker socket and a couple of other paths from the host OS into the container as well. Support for things like this is a requirement, and it works quite well with our Docker setup today.

Since services have quite varying requirements, we define roles for hosts with different specs; the requirements services place on the infrastructure vary so much that it's not really useful to try to launch stuff fully dynamically on random nodes.
For example, it's not the right thing to run something CPU-intensive like a compute task on a node specced to run Kafka (not much CPU or memory, but lots of not-so-fast disk), and it's not really the best option to allocate all storage on a compute node (a small amount of not-so-fast disk) for a Cassandra node that won't make use of all the CPU but will choke on disk throughput and allocate all available storage, making the node and most of its CPU unusable for other services.

In our case we don't need or even want any magic for finding or managing storage; we want to tell Nomad which servers to run which task on (through system tasks in this case). All nodes with role/class "cassandra" run the Cassandra container/service, and all those nodes have decently specced storage that we guarantee will be available at the same place on the host. This is also a requirement for being able to monitor disk space and disk utilization for each class/role of service properly.

Regarding security:
Each cluster in our environment only has one "customer": us. There is no requirement or need to limit access to the host OS from inside these containers for us; containers are just a compatibility layer in our case. (Mesos requires Java version X, a service we run requires version Y, Aurora's thermos-executor doesn't work with anything other than Python 2.7 while we require Python 3.5 for some services, and the host runs Ubuntu 14.04 while we want 16.04 to be able to compile some libraries properly.)
If a person has access to deploy through Nomad, they most likely also have access to become root on the hosts.
The paths that are allowed to be mounted into containers can in our case be limited by the Nomad client through a whitelist or similar (can, but doesn't have to), but it's important that we can set up a relatively relaxed whitelist like '/data/*' and not have to explicitly specify every allowed path, since this is subject to change and being too strict would slow down management, development, and the like.

Contributor

csawyerYumaed commented Aug 30, 2016

I've made a generic workaround to handle docker volumes using the raw_exec driver, available on github here: https://github.com/csawyerYumaed/nomad-docker
It handles cleaning up after itself (stopping container/etc). It's not perfect, but it seems to do the trick for now.


carlanton Sep 12, 2016

If you want to use Docker bind mounts in Nomad but still want to use the docker driver, you should totally check out this new as-good-as-ready-for-production tool I just made: https://github.com/carlanton/nomad-docker-wrapper
It wraps the Docker socket with a new socket that allows you to specify bind mounts as environment variables. Still hacky, but just a bit less hacky than using raw_exec :)


Collaborator

diptanu commented Sep 22, 2016

We are going to start working on volume plugins in the next Nomad release. But in the interim (in the upcoming 0.5 release), we will enable users to pass the volume configuration option in the docker driver configuration.

Also, operators will have to explicitly opt in to allowing users to pass the volume / volume driver related configuration options in their jobs by enabling it in the Nomad client config.

Users should keep in mind that Nomad won't be responsible for cleaning up things behind the scenes with respect to network-based file systems until support for Nomad's own volume plugins comes out.
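
For reference, a minimal sketch of what that interim opt-in looks like, assuming the docker.volumes.enabled client option and the volumes list in the docker driver config that shipped around 0.5; the paths and image are illustrative, so check the driver docs for your Nomad version before relying on the exact names:

# Nomad client config: the operator explicitly allows volumes for the docker driver.
client {
  options {
    "docker.volumes.enabled" = "true"
  }
}

# Job file: docker driver task config, same host:container format as `docker run -v`.
task "mysql" {
  driver = "docker"
  config {
    image   = "mysql:5.6"
    volumes = ["/srv/mysql-data:/var/lib/mysql"]
    # volume_driver = "flocker"  # optional, for named volumes via a Docker volume plugin
  }
}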


far-blue Sep 23, 2016

That's great news and a sensible intermediate step


tlvenn commented Sep 29, 2016

@diptanu is there any chance to bring that to rkt as well?

w-p commented Oct 26, 2016

Is there a schedule attached to that release by chance? Hard to sell people on Nomad without mounts.

Collaborator

diptanu commented Oct 26, 2016

@w-p We are trying to do the RC release this week, and main release next week.

w-p commented Nov 2, 2016

Thanks for getting the RC out.

ekarlso commented Nov 11, 2016

Is there any support now for doing stuff like MySQL with persistent data volumes?

@donovanmuller

donovanmuller commented Nov 16, 2016

@ekarlso It looks like 0.5 (currently at 0.5.0-rc2) supports both Docker (driver/docker: Support Docker volumes [GH-1767]) and rkt volumes.


@kaskavalci

Contributor

kaskavalci commented Jan 13, 2017

@diptanu, do you have any milestone or ETA for volume drivers?

@far-blue

far-blue commented Jan 13, 2017

I believe they are now supported. You can pass an array of strings in the docker config section of the job spec, using the same format you would use with the docker run -v command.

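As a minimal illustration of that format (the host path here is hypothetical, and the client must have Docker volumes enabled):

config {
  image = "mysql:5.6"
  # Same "host_path:container_path[:options]" strings as docker run -v
  volumes = [
    "/opt/mysql-data:/var/lib/mysql"
  ]
}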

@erickhan

erickhan commented Mar 2, 2017

If the crux of this issue is Docker volume driver, I think you guys addressed it with the recent PR*.

If it's about extending the resource model, I'd suggest that'll take quite some time and maybe become its own design. I'll defer to others, as I'm just learning about Nomad myself. Thanks!

*#2351

@c4milo

Contributor

c4milo commented May 22, 2017

@dadgar is this one on track for 0.6.0?

@dadgar

Contributor

dadgar commented May 24, 2017

@c4milo No, this isn't being tackled in 0.6.0

@maticmeznar

maticmeznar commented Nov 14, 2017

Since Nomad 0.7.0, what is the recommended best practice for running a database Docker container that requires a persistent data volume? ephemeral_disk does not offer any guarantees and only works if the database is clustered. Should a constraint be used to lock the job to a specific node, combined with the volumes Docker driver option?

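For concreteness, the approach described above (pinning the job to one node with a constraint and bind-mounting a host directory via the volumes option) might look roughly like the following sketch; the node name and host path are hypothetical, and this of course ties the data to that single machine:

job "mysql" {
  datacenters = ["dc1"]

  group "db" {
    # Pin the group to the node that holds the data directory
    constraint {
      attribute = "${node.unique.name}"
      value     = "db-node-01"
    }

    task "mysql" {
      driver = "docker"
      config {
        image = "mysql:5.7"
        volumes = [
          # Host directory that survives container and allocation restarts
          "/opt/mysql-data:/var/lib/mysql"
        ]
      }
    }
  }
}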

@alexey-guryanov

alexey-guryanov commented Nov 14, 2017

@maticmeznar, I cannot speak for "recommended best practice", and there is more than one way to achieve it, but I can share the approach we are using at the moment.
When we want persistent storage for anything running in a Docker container managed by Nomad, we decided that the storage should be redundant on its own (regardless of the content we put there) and available on all Nomad nodes, so a particular Docker container can be rescheduled to another node in the Nomad cluster and still access the same data.

That can be achieved in more than one way; for instance, there is REX-Ray and similar solutions, which look attractive for using cloud provider storage (like AWS S3, Google Cloud Storage, etc.), but we haven't tried them.

What we are using at the moment is a separate distributed, replicated storage cluster (GlusterFS in our case, though there are alternatives), mounting GlusterFS volume(s) on each node in the Nomad cluster, and mapping an appropriate folder from the mounted volume into the Docker container.
For instance:

  • mount some GlusterFS volume as /shared_data on all nodes in the Nomad cluster
  • create a folder in there for a particular application, say /shared_data/some_app_postgresql
  • define a volume in the Nomad job specification:
job "some_app" {
    group "some_app_db" {
        task "some_app_db" {
            driver = "docker"
            config {
                image = "some-postgresql-image"
                volumes = [
                    "/shared_data/some_app_postgresql:/var/lib/postgresql/data/pgdata"
                ]
            }
        }
    }
}

Again, there are multiple ways to go about data persistence with managed Docker containers; I hope our perspective may be helpful to somebody.


@moritzheiber

Contributor

moritzheiber commented Feb 3, 2018

The absence of a proper solution to volume management with Nomad is literally the only reason I cannot recommend it to our clients and/or use it instead of Kubernetes. Its Vault and Consul integration, ease of use, minimal installation overhead, and workload support are intriguing, but none of it matters because it cannot be trusted with persistent data 😞

I wish this was higher up the product backlog.

@ketzacoatl

Contributor

ketzacoatl commented Feb 5, 2018

@jsilberm

Contributor

jsilberm commented Feb 5, 2018

If you look at this thread's origin --- "Nomad should have some way for tasks to acquire persistent storage on nodes." --- it doesn't say that Nomad itself should procure/acquire the persistent storage, only that the task should have a way.

One way is through "container storage on demand". Assuming use of the Nomad 'docker' driver, if the volume-driver plugin can present relevant meta-data at run-time, then it's possible for the storage to be provisioned on-demand when the task starts.

Here's what this might look like:

task "my-app" {
      driver = "docker"
      config {
        image = "myapp/my-image:latest"
        volumes = [
          "name=myvol,size=10,repl=3:/mnt/myapp",
        ]
        volume_driver = "pxd"
    }

In this case, a 10GB volume named "myvol" gets created, with synchronous replication on 3 nodes and is mapped into the container at "/mnt/myapp". The task acquires the persistent storage.

This capability is available today through the Portworx volume-driver plugin, as documented here: https://docs.portworx.com/scheduler/nomad/install.html

(*) disclaimer: I work at Portworx.

@iwvelando

iwvelando commented Apr 5, 2018

Hello, I've seen a lot of discussion about persistent storage with Docker containers, which I've been using effectively. However, I'm also keenly interested in persistent storage for qemu VMs scheduled through Nomad. I may have overlooked something, but I don't see this as an option.

Is there any expectation of adding this? Or is there any path with the existing configuration to achieve some form of persistent storage?
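
One possible (untested) workaround with the existing qemu driver: the driver accepts extra command-line arguments, so you could pass a -drive argument that points at a disk image kept on the host outside the allocation directory, and pin the job to that node with a constraint. The paths below are hypothetical and this is only a sketch:

task "my-vm" {
  driver = "qemu"
  config {
    image_path  = "local/my-vm.img"
    accelerator = "kvm"
    args = [
      # Hypothetical persistent disk living on the host, outside the alloc dir
      "-drive", "file=/opt/vm-data/my-vm-data.qcow2,format=qcow2,if=virtio",
    ]
  }
}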
