
Persistent data on nodes #150

Closed
F21 opened this issue Sep 29, 2015 · 117 comments

Comments

@F21 commented Sep 29, 2015

Nomad should have some way for tasks to acquire persistent storage on nodes. In a lot of cases, we might want to run our own HDFS or Ceph cluster on Nomad.

That means things like HDFS datanodes need to be able to reserve persistent storage on the node they are launched on. If the whole cluster goes down, once it's brought back up, the appropriate tasks should be relaunched on their original nodes (where possible), so that they can regain access to data they previously wrote.

@zrml commented Sep 30, 2015

+1
We also want to mount a specific FS from a shared storage volume...
We need an API/flag to be able to specify this affinity.

@F21 (Author) commented Oct 12, 2015

It would be awesome if direct-attached storage could be implemented as a preview of some sort.

One of the things that came to mind is the idea of updating containers while keeping the storage:

For example, let's say we have a MySQL 5.6.16 container running, and it was allocated 20GB of storage on a client to store its data. If there's a new version of the MySQL container (5.6.17), we want to be able to swap out the container but keep the storage (persistent data) and have it mounted into the updated container. This way, we can upgrade infrastructure containers without data loss or a complicated upgrade process that requires backing up the data, upgrading, then restoring it.
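
To make the idea concrete, here is a hypothetical job fragment; the volumes option sketched here did not exist in Nomad at the time, and the host path and image tag are made up:

task "mysql" {
  driver = "docker"
  config {
    # Bumping only the image tag (5.6.16 -> 5.6.17) swaps the container;
    # the data directory lives on the host and survives the upgrade.
    image = "mysql:5.6.17"
    volumes = [
      "/data/mysql:/var/lib/mysql"
    ]
  }
}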

@zrml commented Oct 13, 2015

@F21 good use case too.

@cbednarski (Contributor) commented:

Just to add some details to this, we need two overarching features: one is the notion of persistent storage that is either mounted into or otherwise persisted on the node, and is accounted for in terms of total disk space available for allocations.

The second is global tracking of where state is located in the cluster so jobs can be rescheduled with hard (mysql) / soft (riak) affinity for nodes that already have their data, and possibly a mechanism to reserve the resources even if the job fails or is not running.

Since these features are quite large, we would implement them gradually: for example, floating persistence (a la EBS), node-based persistence, global storage state tracking, soft affinity, hard affinity, and offline reservations as independent milestones. I can't yet speak to when / if we will implement these features.

@zrml commented Oct 15, 2015

@cbednarski sounds like you guys are going in the right direction! Cool!

@melo commented Oct 18, 2015

Hi @cbednarski, thanks for the explanation of your goals.

This is something that we would also like to see. Other services that we would like to manage that require data storage are Redis Sentinel, Consul, and NSQ.

Currently we dedicate nodes to these tasks, so affinity is something we manage manually. I understand that other use cases might need/want more magical targeting, but it would be interesting to see some way of deciding this manually before settling on how it fits into the overall model.

My point is that if Nomad provided some way of manually assigning data volumes to containers, and left the logic of making sure containers only start on the correct hosts to manual configuration, then we could start to get a feel for how it all works and, with that experience, design better models afterwards.

I found this in the code:

func (d *DockerDriver) containerBinds(alloc *allocdir.AllocDir, task *structs.Task) ([]string, error) {
I'm guessing SharedDir is a step in that direction, right?

Thank you,

@diptanu (Contributor) commented Oct 18, 2015

+1 to @cbednarski's thoughts.

We might also need to think about the identity of data volumes. Data volumes are slightly different from other compute resources like CPU/memory/disk, which are scalar in nature: those resources can be loaned to any process that needs them, whereas a volume is usually used by the same type of process that created it in the first place, and users might need to refer to a volume when specifying a task definition.

For example:

resources {
  # ...
  volumes = [
    {
      name = "db_personalization_001",
      size = 100000,
    },
  ]
}

In this example we are asking Nomad to place the task on a machine where a volume named db_personalization_001 exists; if it doesn't exist, Nomad can create the volume on a machine that can provide 100GB of disk space and that matches the other constraints the user might have specified. When creating a new volume whose identity wasn't already present in the cluster, we would also need to persist the volume's identity in a manner that can be restored during a disaster-recovery operation.

@F21 (Author) commented Nov 18, 2015

Maybe it's possible to lean on https://github.com/emccode/rexray for non-local storage such as Amazon EBS. It doesn't manage persistence on local disks, though, so that portion would still need to be implemented.

@gourao commented Nov 19, 2015

I am one of the maintainers of https://github.com/libopenstorage/openstorage. The goal of this project is to provide persistent, cluster-aware storage to Linux containers (Docker in particular). It supports both data volumes and the graph driver interface, so your images and data are persisted in a multi-node, scheduler-aware manner. I hope this project can help with what Nomad would like to achieve.

The open storage daemon (OSD) itself runs on every node as a Docker container. The daemons discover other OSD nodes in the cluster via a KV database. A container run via the Docker remote API can leverage volume and graph support from OSD, and OSD in turn can support multiple persistent backends. Ideally this would work for Nomad without doing much.

The specs are also available at openstorage.org

@F21 (Author) commented Nov 19, 2015

@gourao That sounds really exciting! Are there any plans to support things beyond Docker: qemu, rkt, raw exec, etc.?

@gourao commented Nov 19, 2015

Yes @F21, that's the plan. There are a few folks looking at rkt support, and as the OCI spec becomes more concrete, this will hopefully be a solved problem.

@erSitzt commented Dec 7, 2015

+1
Access to persistent storage mounted or available directly on the node would be great.

While testing Nomad with simple containers, I didn't realize there was no option in the job syntax for the bind mounts I used when dealing with Docker directly. :(

I like @diptanu's proposal. But wouldn't it be easier to just let users specify volumes to mount into a container the way we do with Docker directly? Nomad could check the existence of the path and the free space for that mountpoint.

As @melo mentioned, Nomad is already doing something like this:

func (d *DockerDriver) containerBinds(alloc *allocdir.AllocDir, task *structs.Task) ([]string, error) {

Most other tools for managing Docker containers allow users to specify volumes on container creation (Marathon and Shipyard, for example... Kubernetes too, I think?).

I'm a novice in Go, so I haven't tried anything myself yet. :)

@ketzacoatl (Contributor) commented:

Both #62 and #630 are tracking this simpler use case of mounting a path from the host as a volume mount for the docker container.

@bscott commented Dec 28, 2015

+1

Any timeframe on this? Volumes not being supported in Nomad is a huge deal-breaker for us.

@jefflaplante commented:

+1
I agree with Brian on this.

@ketzacoatl (Contributor) commented:

Yeah, for my initial use it's acceptable to enable raw_exec and work around this issue, but only because this is not yet truly production use. I too could not put Nomad in production without the docker driver supporting even the most basic Docker volume mount to the host.

@adrianlop (Contributor) commented:

+1, we need Docker volumes for production too.

@wyattanderson commented:

I'm interested in this not just for Docker, but also for qemu and an in-house Xen implementation. That is, it would be nice if the solution were generic enough to be useful for all task drivers.

@dkerwin commented Jan 7, 2016

+1, no way to use this in production without Docker volumes.

@calvn (Member) commented Jan 8, 2016

👍

@supernomad commented:

So I have been running into this issue myself, as it's a pretty fundamental idea to use volumes in conjunction with Docker.

I understand there is a much larger architecture and design discussion to be had around how to manage storage with Nomad in general. However, when I was thinking about the issue, I came up with the idea of specifying arbitrary arguments to pass down to Docker.

Something like this:

config {
  image = "registry.your.domain/awesome_image:latest"
  command = "/bin/bash"
  args = ["-c", "/usr/bin/start_awesome_image.sh"]
  docker_args = ["-v", "/host/path:/container/path", "--volume-driver=vDriver"]
}

This would be entirely unmonitored by Nomad, and placing the container so that its volumes work would be up to the end user, i.e. they would specify the necessary constraints on the job.

No idea if this is even possible, but I figured I would voice the idea at the very least.

@jhartman86 commented:

👍 for a --volumes flag. Another great use case: running a cadvisor container as a system service on all nodes that pipes stats to, oh say, InfluxDB. In this sense, it has less to do with persistent storage than with providing volume mounts so the container can monitor the underlying host. Per the cadvisor docs on getting it running:

docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  google/cadvisor:latest
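
For comparison, a sketch of what that might look like as a Nomad system job, assuming the docker driver accepted a volumes list (which it did not at the time of this comment; port publishing is omitted for brevity):

job "cadvisor" {
  type = "system" # run one instance on every node

  group "monitoring" {
    task "cadvisor" {
      driver = "docker"
      config {
        image = "google/cadvisor:latest"
        # Host paths cadvisor needs in order to observe the node,
        # read-only wherever possible.
        volumes = [
          "/:/rootfs:ro",
          "/var/run:/var/run:rw",
          "/sys:/sys:ro",
          "/var/lib/docker/:/var/lib/docker:ro"
        ]
      }
    }
  }
}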

@let4be commented Feb 5, 2016

Any way I could use
https://docs.docker.com/engine/extend/plugins_volume/
and
https://github.com/ClusterHQ/flocker
with Docker and Nomad today?

This seems like the ultimate solution for my needs.

@let4be commented Feb 5, 2016

Or maybe I should use something simpler, like https://github.com/leg100/docker-ebs-attach?
Hm...

@dadgar (Contributor) commented Feb 5, 2016

@let4be: Not currently; there is no support for persistent volumes in Nomad yet.

@faddat commented Aug 22, 2016

@dvusboy @far-blue @a86c6f7964

THE SOLUTION WAS SO BLINDINGLY OBVIOUS! (yet I couldn't see it) :)

Thanks!

@dadgar
The Hashicorp suite of tools is fan-freaking-tastic: You guys just keep doing what ya do :).

@cetex commented Aug 28, 2016

@dadgar The most important and very simple feature I'd like to see is the ability to do a simple bind mount into containers (Docker's -v option, and something similar for rkt and whatever else there is). That way we can run things in containers while keeping control of the data, and actually dare to run more important stateful services like databases inside containers: since all data is stored outside the container environment, there's much less risk of data loss from screwups in the container service. (Docker in our environment has had its fair share of those.)

Other features like integration with Docker's "storage" containers won't get near our persistent data, since they introduce quite a bit of complexity (and dependencies on the container service to make sure data is migrated whenever we update the container service, be it Docker, rkt, or anything similar).

We run services like ZooKeeper, Kafka, Mesos, Cassandra, HAProxy, docker-registry, nginx, and similar inside the containers we manage with our service, but we'd like to manage those services/containers fully through Nomad instead, which means "system" jobs for most deploys. Mesos is then used to manage our API/web and similar services, at least for now. To do this, the mesos-slave container needs to mount the Docker socket and a couple of other paths from the host OS into the container as well. Support for things like this is a requirement, and it works quite well with our Docker setup today.

Since services have quite varying requirements, we define roles for hosts with different specs; the infrastructure-level requirements of services vary so much that it's not really useful to try to launch things fully dynamically on random nodes. For example, it's not right to run something CPU-intensive like a compute task on a node specced for Kafka (not much CPU or memory, but lots of not-so-fast disk), and it's not the best option to allocate all the storage on a compute node (a small amount of not-so-fast disk) to a Cassandra node that won't use all the CPU but will choke on disk throughput and, by consuming all available storage, make the node and most of its CPU unusable for other services.

In our case we don't need or even want any magic for finding or managing storage; we want to tell Nomad which servers to run each task on (through system tasks in this case). All nodes with role/class "cassandra" run the Cassandra container/service, and all those nodes have decently specced storage that we guarantee will be available at the same place on the host. This is also a requirement for properly monitoring disk space and disk utilization for each class/role of service.

Regarding security:
Each cluster in our environment has only one "customer": us. There is no requirement to limit access to the host OS from inside these containers; containers are just a compatibility layer in our case. (Mesos requires Java version X while a service we run requires version Y; Aurora's thermos-executor doesn't work with anything other than Python 2.7 while we require Python 3.5 for some services; the host runs Ubuntu 14.04 while we want 16.04 to be able to compile some libraries properly.) If someone has access to deploy through Nomad, they most likely also have access to become root on the hosts.
The paths that may be mounted into containers can, in our case, be limited by the Nomad client through a whitelist or similar (can, but doesn't have to be), but it's important that we can set up a relatively relaxed whitelist like '/data/*' and not have to explicitly specify every allowed path, since this is subject to change and being too strict would slow down management and development.

@csawyerYumaed (Contributor) commented:

I've made a generic workaround to handle Docker volumes using the raw_exec driver, available on GitHub here: https://github.com/csawyerYumaed/nomad-docker
It handles cleaning up after itself (stopping the container, etc.). It's not perfect, but it seems to do the trick for now.

@carlanton commented:

If you want to use Docker bind mounts in Nomad but still want to use the docker driver, you should totally check out this new as-good-as-ready-for-production tool I just made: https://github.com/carlanton/nomad-docker-wrapper
It wraps the Docker socket with a new socket that allows you to specify bind mounts as environment variables. Still hacky, but just a bit less hacky than using raw_exec :)

@diptanu (Contributor) commented Sep 22, 2016

We are going to start working on volume plugins in the next Nomad release, but in the interim (in the upcoming 0.5 release) we will enable users to pass the volume configuration options in the docker driver configuration.

Also, operators will have to explicitly opt in to letting users pass the volume/volume-driver configuration options in their jobs by enabling it in the Nomad client config.

Users should keep in mind that Nomad won't be responsible for cleaning up things behind the scenes with respect to network-based file systems until support for Nomad's own volume plugins comes out.
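
For readers arriving later: when 0.5 shipped, the opt-in looked roughly like this in the client configuration (check the current docker driver docs for the authoritative option names):

client {
  options {
    # Allow jobs to request host bind mounts and volume drivers.
    "docker.volumes.enabled" = "true"
  }
}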

@far-blue commented:

That's great news and a sensible intermediate step.

@tlvenn commented Sep 29, 2016

@diptanu is there any chance of bringing that to rkt as well?

@w-p commented Oct 26, 2016

Is there a schedule attached to that release, by chance? It's hard to sell people on Nomad without mounts.

@diptanu (Contributor) commented Oct 26, 2016

@w-p We are trying to do the RC release this week, and the main release next week.

@w-p commented Nov 2, 2016

Thanks for getting the RC out.

@ekarlso commented Nov 11, 2016

Is there any support now for running something like MySQL with persistent data volumes?

@donovanmuller commented:

@ekarlso It looks like 0.5 (currently at 0.5.0-rc2) supports both Docker (driver/docker: Support Docker volumes [GH-1767]) and rkt volumes.

@kaskavalci (Contributor) commented:

@diptanu, do you have any milestone or ETA for volume drivers?

@far-blue commented:

I believe they are now supported. You can pass, in the docker config section of the job spec, an array of strings in the same format you would use with the docker run -v command.
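
A minimal example of that, with made-up paths:

task "redis" {
  driver = "docker"
  config {
    image = "redis:3.2"
    # Same host:container[:mode] format as docker run -v.
    volumes = [
      "/opt/redis-data:/data"
    ]
  }
}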

@erickhan commented Mar 2, 2017

If the crux of this issue is the Docker volume driver, I think you guys addressed it with the recent PR*.

If it's about extending the resource model, I'd suggest that'll take quite some time and maybe become its own design. I defer to others, as I'm just learning about Nomad myself. Thanks!

*#2351

@c4milo (Contributor) commented May 22, 2017

@dadgar is this one on track for 0.6.0?

@dadgar (Contributor) commented May 24, 2017

@c4milo No, this isn't being tackled in 0.6.0.

@maticmeznar commented:

Since Nomad 0.7.0, what is the recommended best practice for running a database Docker container that requires a persistent data volume? ephemeral_disk does not offer any guarantees and only works if the database is clustered. Should a constraint be used to lock the job to a specific node, combined with the volumes Docker driver option?
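
For what it's worth, a sketch of that pinning approach (node name, image, and paths are hypothetical; it trades scheduling flexibility for data locality):

job "db" {
  group "db" {
    # Pin the group to the one node that holds the data directory.
    constraint {
      attribute = "${node.unique.name}"
      value     = "db-node-01"
    }

    task "postgres" {
      driver = "docker"
      config {
        image = "postgres:10"
        volumes = [
          "/data/postgres:/var/lib/postgresql/data"
        ]
      }
    }
  }
}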

@alexey-guryanov commented:

@maticmeznar, I cannot speak to "recommended best practice", and there is more than one way to achieve it, but I can share the approach we are using at the moment.
When we want persistent storage for anything running in a Docker container managed by Nomad, we want that storage to be redundant on its own (regardless of the content we put there) and available on all Nomad nodes, so a particular Docker container can be rescheduled to another node in the Nomad cluster and still access the same data.

That can be achieved in more than one way; for instance, there is REX-Ray and similar solutions, which look attractive for using cloud-provider storage (like AWS S3, Google Cloud Storage, etc.), but we haven't tried them.

What we are using at the moment is a separate distributed replicated storage cluster (GlusterFS, though there are alternatives), mounting GlusterFS volume(s) on each node in the Nomad cluster and mapping an appropriate folder from the mounted volume into the Docker container.
For instance:

  • mount a GlusterFS volume as /shared_data on all nodes in the Nomad cluster
  • create a folder in there for a particular application, say /shared_data/some_app_postgresql
  • define a volume in the Nomad job specification:
job "some_app" {
    group "some_app_db" {
        task "some_app_db" {
            driver = "docker"
            config {
                image = "some-postgresql-image"
                volumes = [
                    "/shared_data/some_app_postgresql:/var/lib/postgresql/data/pgdata"
                ]
            }
        }
    }
}

Again, there are multiple ways to go about data persistence with managed Docker containers; hope our perspective is helpful to somebody.

@moritzheiber (Contributor) commented:

The absence of a proper solution to volume management with Nomad is literally the only reason I cannot recommend it to our clients and/or use it instead of Kubernetes. Its Vault and Consul integration, ease of use, minimal installation overhead, and workload support are intriguing, but none of that matters because it cannot be trusted with persistent data 😞

I wish this were higher up the product backlog.


@jsilberm (Contributor) commented Feb 5, 2018

If you look at this thread's origin --- "Nomad should have some way for tasks to acquire persistent storage on nodes." --- it doesn't say that Nomad itself should procure/acquire the persistent storage, only that the task should have a way.

One way is through "container storage on demand". Assuming use of the Nomad docker driver, if the volume-driver plugin can present the relevant metadata at run time, then it's possible for the storage to be provisioned on demand when the task starts.

Here's what this might look like:

task "my-app" {
      driver = "docker"
      config {
        image = "myapp/my-image:latest"
        volumes = [
          "name=myvol,size=10,repl=3:/mnt/myapp",
        ]
        volume_driver = "pxd"
    }

In this case, a 10GB volume named "myvol" gets created with synchronous replication across 3 nodes and is mapped into the container at /mnt/myapp. The task acquires the persistent storage.

This capability is available today through the Portworx volume-driver plugin, as documented here: https://docs.portworx.com/scheduler/nomad/install.html

(*) disclaimer: I work at Portworx.

@iwvelando commented Apr 5, 2018

Hello, I've seen a lot of discussion about persistent storage with Docker containers, which I've been using effectively. However, I'm also keenly interested in persistent storage for qemu VMs scheduled through Nomad. I may have overlooked something, but I don't see this as an option.

Is there any expectation of adding this? Or is there any path with the existing configuration to achieve some form of persistent storage?

@endocrimes (Contributor) commented:

👋 Hey Folks,

We're currently planning on implementing support for persistent storage across various task drivers via support for Host Volume Mounts (#5377), and the Container Storage Interface (#5378).

Please follow along with the respective issues for updates as they're available 😄.

@akamac commented Mar 26, 2019

> @far-blue @a86c6f7964: I too am using raw_exec + docker-compose as a workaround. The trouble with that is clean-up when one kills a job. When the Nomad executor sends a SIGINT to docker-compose, it does not clean up the containers and volumes by default; you have to explicitly do docker-compose down. For that and other reasons, we have a wrapper shell script to trap SIGINT. There is an outstanding feature request for 'pre-' and 'post-' task hooks. That should help as long as the post-task hooks get run even when triggered by nomad stop.

@dvusboy Could you share your wrapper code please?
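
For illustration, a minimal wrapper along the lines described above (not @dvusboy's actual script; it assumes docker-compose is on PATH and a compose file sits in the working directory):

#!/usr/bin/env bash
# Run docker-compose in the foreground and make sure "down" runs
# when Nomad stops the task (SIGINT/SIGTERM).
set -euo pipefail

cleanup() {
  # Tear down the containers and networks the job created.
  docker-compose down
}
trap cleanup INT TERM

docker-compose up &
wait $!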

@github-actions (bot) commented:

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 25, 2022