Persistent data volumes #111

Closed
JeremyGrosser opened this issue Mar 20, 2013 · 43 comments

@JeremyGrosser
Contributor

commented Mar 20, 2013

A lot of applications store nontrivial amounts of data on disk and need it to be persisted outside the scope of the aufs filesystem.

Proposal:

docker run base --bind /path/outside/container:/mnt/path/inside/container /path/to/crazydb

Bonus points if you can use variable substitution in the bind path names, e.g.:

docker run base --bind '/mnt/$Id/mail:/var/spool/mail' /usr/sbin/postfix

Presumably this feature would manifest itself in config.json somewhat like this:

"Mountpoint": {
  "Bind": [
    {"OutsidePath": "/path/outside/container",
     "InsidePath": "/path/inside/container"}
  ]
}
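
A rough illustration of what the variable-substitution example would boil down to on the host; the container ID and the CONTAINER_ROOTFS variable below are made up for the example:

# For a container whose ID happened to be b7f1d2c3, '--bind /mnt/$Id/mail:/var/spool/mail'
# would amount to roughly this (CONTAINER_ROOTFS stands in for wherever docker keeps the rootfs):
$ mkdir -p /mnt/b7f1d2c3/mail
$ mount --bind /mnt/b7f1d2c3/mail "$CONTAINER_ROOTFS/var/spool/mail"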

@shykes closed this Mar 21, 2013

@shykes reopened this Mar 26, 2013

@shykes referenced this issue Mar 26, 2013

Closed

Data volumes #3

@sa2ajj

Contributor

commented Mar 26, 2013

Would it be possible to have some sort of --fstab option that would result in adding lxc.mount.entry entries to the container's config file?

@sa2ajj

Contributor

commented Mar 26, 2013

actually, there are two options here:

  • copy the given fstab verbatim and use the lxc.mount = option
  • translate the content of the file into the corresponding lxc.mount.entry lines (see the sketch below)
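
A minimal sketch of the second option, assuming the standard lxc.mount.entry syntax (source, target relative to the rootfs, fstype, options, dump, pass); the per-container config file path is hypothetical:

# fstab line "/srv/data /mnt/data none bind 0 0" translated into the container's LXC config
# (the config.lxc path below is an assumption, not a documented docker location):
$ echo 'lxc.mount.entry = /srv/data mnt/data none bind,rw 0 0' >> /var/lib/docker/containers/$CONTAINER_ID/config.lxc
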
@shykes

Collaborator

commented Mar 26, 2013

The key principle to keep in mind is that we want to minimize how much the container's execution environment depends on the host's.


@jpetazzo

Contributor

commented Mar 27, 2013

  1. Short term.
    The command-line binding proposed by @synack would work great. That would be an easy way to persist data, with minimal "out of docker" instrumentation. My personal taste would be to reverse the args, e.g. dotcloud run base -volume /path/in/container=/path/on/host (see the sketch after this list), but that's just me.
  2. Mid term.
    I don't know what we want in the config.json file. FTR, on the current dotCloud platform (which uses the cloudlets format), this is split into two parts: manifest and config. The manifest is the conceptual equivalent of a class definition. It says "to run this, I need one TCP port for SSH, and another for SQL; and also, /var/lib/mysql should be a persistent volume". The config is the instantiated version, so it tells exactly which port was allocated, which volume was bound, etc.
    It looks like we might have port information in the image json file (to mention "hey, that image exposes a service on port 5432, so by default, dotcloud run should automatically add -p 5432 unless overridden").
    If that's so, it would make sense to also mention which paths are supposed to be volumes, if only for introspection purposes.
    Then, if we implement container tagging, it would integrate very neatly to provide persistent data storage, i.e. by default you get a tmpfs on each volume, but if the container is tagged, then volume foo is bound from e.g. /var/lib/docker/volumes/<containertag>/foo.
  3. Long term.
    I believe that storage providers will be an important feature. It's probably too early to discuss that in detail, but the idea would be to allow docker to interface with storage systems like LVM, btrfs, iSCSI, NFS, glusterfs, ceph... The scheme used by Xen 3 for network and block devices is not perfect, but it's a good source of inspiration (TL;DR: it lets you specify that e.g. /dev/xvdk should be myiscsi:foobar, and it will offload to a myiscsi script the task of locating foobar and making it available, whatever that means; so it is fairly extensible without touching the core). Of course docker wouldn't implement all those interfaces, but would provide something that makes it easy for everyone to hook up whatever they need in the system.
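
A hedged sketch of points 1 and 2 above; the dotcloud run syntax and the volumes layout are taken from the comment, while the ROOTFS variable and the 'mydb' tag are assumptions:

# Point 1: bind a host path on the command line (proposed syntax, not implemented):
$ dotcloud run base -volume /var/lib/mysql=/srv/mysql-data /usr/bin/mysqld

# Point 2: by default each declared volume would get a tmpfs...
$ mount -t tmpfs tmpfs "$ROOTFS/var/lib/mysql"
# ...but a container tagged 'mydb' would get a stable bind mount instead:
$ mount --bind /var/lib/docker/volumes/mydb/var/lib/mysql "$ROOTFS/var/lib/mysql"
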
@sa2ajj

Contributor

commented Mar 27, 2013

(Just for the record) I realized one thing: the bound directory should somehow be excluded from what is being tracked as "changes". I am not sure whether a straightforward implementation would work right away.

@jpetazzo

Contributor

commented Mar 27, 2013

That will actually work out of the box—because docker tracks changes by checking the AUFS layer, and a bind mount wouldn't show up in the layer.

@tadev


commented Mar 29, 2013

+1 want

@titanous

Contributor

commented Mar 29, 2013

👍 I want to see if I can get Ceph running in docker so that I can get docker running on Ceph.

@sa2ajj referenced this issue Apr 8, 2013

Closed

riak use case #351

@ghost assigned creack Apr 8, 2013

@shykes

Collaborator

commented Apr 8, 2013

Updated the title for clarity.

@shykes

Collaborator

commented Apr 8, 2013

So, here's the current plan.

1. Creating data volumes

At container creation, parts of a container's filesystem can be mounted as separate data volumes. Volumes are defined with the -v flag.

For example:

$ docker run -v /var/lib/postgres -v /var/log postgres /usr/bin/postgres

In this example, a new container is created from the 'postgres' image. At the same time, docker creates 2 new data volumes: one will be mapped to the container at /var/lib/postgres, the other at /var/log.

2 important notes:

  1. Volumes don't have top-level names. At no point does the user provide a name, nor is one given to them. Volumes are identified by the path at which they are mounted inside their container.

  2. The user doesn't choose the source of the volume. Docker only mounts volumes it created itself, in the same way that it only runs containers that it created itself. That is by design.

2. Sharing data volumes

Instead of creating its own volumes, a container can share another container's volumes. For example:

$ docker run --volumes-from $OTHER_CONTAINER_ID postgres /usr/local/bin/postgres-backup

In this example, a new container is created from the 'postgres' image. At the same time, docker will re-use the 2 data volumes created in the previous example. One volume will be mounted at /var/lib/postgres in both containers, and the other at /var/log in both containers.

3. Under the hood

Docker stores volumes in /var/lib/docker/volumes. Each volume receives a globally unique ID at creation, and is stored at /var/lib/docker/volumes/ID.

At creation, volumes are attached to a single container - the source of truth for this mapping will be the container's configuration.

Mounting a volume consists of calling "mount --bind" from the volume's directory to the appropriate sub-directory of the container mountpoint. This may be done by Docker itself, or farmed out to lxc (which supports mount-binding) if possible.
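
A minimal sketch of that mechanism; the container rootfs path and the use of uuidgen for the ID are assumptions:

$ VOLUME_ID=$(uuidgen)                                        # globally unique ID
$ ROOTFS=/var/lib/docker/containers/$CONTAINER_ID/rootfs      # hypothetical layout
$ mkdir -p /var/lib/docker/volumes/$VOLUME_ID "$ROOTFS/var/lib/postgres"
$ mount --bind /var/lib/docker/volumes/$VOLUME_ID "$ROOTFS/var/lib/postgres"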

4. Backups, transfers and other volume operations

Volumes sometimes need to be backed up, transferred between hosts, synchronized, etc. These operations are typically application-specific or site-specific, e.g. rsync vs. S3 upload vs. replication vs...

Rather than attempting to implement all these scenarios directly, Docker will allow for custom implementations using an extension mechanism.

5. Custom volume handlers

Docker allows for arbitrary code to be executed against a container's volumes, to implement any custom action: backup, transfer, synchronization across hosts, etc.

Here's an example:

$ DB=$(docker run -d -v /var/lib/postgres -v /var/log postgres /usr/bin/postgres)

$ BACKUP_JOB=$(docker run -d --volumes-from $DB shykes/backuper /usr/local/bin/backup-postgres --s3creds=$S3CREDS)

$ docker wait $BACKUP_JOB

Congratulations, you just implemented a custom volume handler, using Docker's built-in ability to 1) execute arbitrary code and 2) share volumes between containers.
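
For illustration only, the backup-postgres script inside that shykes/backuper image might look something like this; the s3cmd client and the bucket name are assumptions:

#!/bin/sh
# Runs inside the backup container, which sees the database container's
# /var/lib/postgres volume via --volumes-from.
set -e
tar czf /tmp/postgres-data.tar.gz -C /var/lib/postgres .
# Upload with whatever client the image ships; s3cmd and the bucket are assumptions here.
s3cmd put /tmp/postgres-data.tar.gz "s3://my-backups/postgres-$(date +%F).tar.gz"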

@shykes

Collaborator

commented Apr 8, 2013

One aspect of the spec which is not yet determined: specifying read-only mounts. Any preference on the best way to extend the syntax?
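
Purely as an illustration of one possible extension (not a decision), a per-volume suffix could mark a mount read-only:

$ docker run -v /var/lib/postgres -v /var/log:ro postgres /usr/bin/postgres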

@glasser

Contributor

commented Apr 8, 2013

Can you specify --volumes-from more than once?

@shykes

Collaborator

commented Apr 8, 2013

@glasser I didn't consider it. One obvious problem is that 2 containers might each have a volume mounted on the same path - in which case the 2 volumes would conflict.

I'm guessing you have a specific use case in mind? :)

@glasser

Contributor

commented Apr 8, 2013

Sure, but that should be something that can be statically checked by docker while building the container, right?

And yes :)

@DanielVF

Contributor

commented Apr 18, 2013

Thanks @shykes. I will play with that branch.

@neomantra

Contributor

commented Apr 24, 2013

Another use case for exposing the host file system is for communication via Unix Domain Sockets and mqueues, which use files as the connection point. Or maybe also exposing serial ports in /dev?

The original proposal at the top of this thread would allow this; however, I don't think the "data volumes" spec covers it, since it only deals with container-to-container bridging and extraction.

This concept is definitely opposed to @shykes' comment regarding repeatability on different hosts. Similarly, what I tried to do here with pinning CPUs (#439) is host-specific, albeit repeatable only if different hosts support all the same key/values. But people who do this would be using some way of configuring/maintaining it all (like picking service endpoint filepaths/host:port, or making sure all processes aren't pinned to the same CPU).
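
Illustrative only, reusing the --bind syntax proposed at the top of this thread (never an implemented flag) to share a host Unix socket and a serial port; myapp is a placeholder:

$ docker run base --bind /dev/log:/dev/log --bind /dev/ttyS0:/dev/ttyS0 /usr/bin/myapp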

@jpetazzo

Contributor

commented Apr 24, 2013

I had an interesting discussion with @mpetazzoni yesterday about a rather contrived use case: using containerization to provide isolated environments for remote shell access on a multi-user server.

Specifically, the concern was to operate the local MTA in a separate container, but still deliver mail to e.g. ~/Maildir (per user).

The challenge here is "partial volume sharing", i.e. the "shell" containers should be given access only to /home/$USERNAME (for a unique $USERNAME), while the MTA should be able to access /home/$USERNAME/Maildir (for all users of the system).
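
The "partial volume sharing" above, expressed with plain bind mounts for illustration; the rootfs variables are hypothetical, and docker had no syntax for this at the time:

$ mount --bind /home/alice         "$SHELL_ROOTFS/home/alice"        # shell container: one user's home
$ mount --bind /home/alice/Maildir "$MTA_ROOTFS/home/alice/Maildir"  # MTA container: only the Maildir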


@shykes

Collaborator

commented Apr 24, 2013

So, I now have enough data points (i.e. people yelling at me) to acknowledge the need for "choosing your own adventure" when it comes to custom runtime configuration: CPU pinning, external mountpoints, etc.

So, to quote @brianm, we need "escape hatches" for the experts without ruining the experience for everyone.

To get back to external mountpoints for volumes: I am convinced, and I have a design in mind for the escape hatch. Stay tuned.


@shykes

Collaborator

commented Apr 24, 2013

... and the design will solve your MTA example, Jérôme, as well as all the others listed in this issue.


@creack

Contributor

commented May 7, 2013

Closed by #376

@creack closed this May 7, 2013

@niclashoyer


commented May 7, 2013

Does pull request #376 solve the use cases mentioned here? If so, is there any documentation on how to use it?

@jpetazzo referenced this issue Jul 4, 2014

Closed

Collected issues with Volumes #6496


erikh added a commit to erikh/docker that referenced this issue Dec 4, 2014

runcom pushed a commit to runcom/docker that referenced this issue Apr 20, 2016

ijc added a commit to ijc/moby that referenced this issue Jan 31, 2017

Revendor github.com/spf13/pflag to 9ff6c6923cfffbcd502984b8e0c80539a94968b7

This pulls in the change made in my fork as 9ff6c6923cff:

$ git log --oneline ca3be74b3ca72..9ff6c6923cff
9ff6c69 Add FlagSet.FlagUsagesWrapped(cols) which wraps to the given column (moby#105)
a9a634f Add BoolSlice and UintSlice flag types. (moby#111)
a232f6d Merge pull request moby#102 from bogem/redundant
5126803 Merge pull request moby#110 from hardikbagdi/master
230dccf add badges to README.md
c431975 Merge pull request moby#107 from xilabao/add-user-supplied-func-when-parse
271ea0e Make command line parsing available outside pflag
25f8b5b Merge pull request moby#109 from SinghamXiao/master
1fcda0c too many arguments
86d3545 Clean up code

Signed-off-by: Ian Campbell <ian.campbell@docker.com>


ijc added a commit to ijc/moby that referenced this issue Feb 1, 2017

Revendor github.com/spf13/pflag to 9ff6c6923cfffbcd502984b8e0c80539a94968b7

$ git log --oneline dabebe21bf79..9ff6c6923cff
9ff6c69 Add FlagSet.FlagUsagesWrapped(cols) which wraps to the given column (moby#105)
a9a634f Add BoolSlice and UintSlice flag types. (moby#111)
a232f6d Merge pull request moby#102 from bogem/redundant
5126803 Merge pull request moby#110 from hardikbagdi/master
230dccf add badges to README.md
c431975 Merge pull request moby#107 from xilabao/add-user-supplied-func-when-parse
271ea0e Make command line parsing available outside pflag
25f8b5b Merge pull request moby#109 from SinghamXiao/master
1fcda0c too many arguments
5ccb023 Remove Go 1.5 from Travis
86d3545 Clean up code

I am interested in 9ff6c69 for a followup.

Signed-off-by: Ian Campbell <ian.campbell@docker.com>

ijc added a commit to ijc/moby that referenced this issue Feb 3, 2017


srust added a commit to srust/moby that referenced this issue Nov 30, 2017
