New implementation of /run support #8478

Closed
wants to merge 4 commits into base: master

Contributor

rhatdan commented Oct 8, 2014

This mounts a /run tmpfs into the container, with the initial contents
copied from the /run in the base image, unless NoRunFs is set in the
HostConfig.

Additionally NoRunFs is always set during a docker build, which means
any setup of /run in a Dockerfile is saved in the image to be copied
into the final /run tmpfs when a container is started.

Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)

Docker-DCO-1.1-Signed-off-by: Dan Walsh dwalsh@redhat.com (github: rhatdan)

Docker-DCO-1.1-Signed-off-by: Dan Walsh dwalsh@redhat.com (github: rhatdan)
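
As an illustration of the copy step described above, here is a minimal sketch in Python. It is not the code from this patch (Docker itself is written in Go); it only shows the idea of seeding a freshly mounted directory with the contents the image shipped under /run. The function name `populate_mount` and both directory arguments are hypothetical, and plain directories stand in for the image layer and the tmpfs mount, since mounting a real tmpfs needs root.

```python
import os
import shutil
import tempfile


def populate_mount(image_dir: str, mount_dir: str) -> None:
    """Copy everything under image_dir into the existing mount_dir.

    symlinks=True keeps symlinks as symlinks instead of following them;
    dirs_exist_ok=True (Python 3.8+) allows mount_dir to exist already,
    which is the case for a mount point.
    """
    shutil.copytree(image_dir, mount_dir, symlinks=True, dirs_exist_ok=True)


def demo() -> str:
    """Build a fake image /run, copy it, and return the target directory."""
    image_run = tempfile.mkdtemp(prefix="image-run-")
    fake_tmpfs = tempfile.mkdtemp(prefix="tmpfs-run-")
    os.makedirs(os.path.join(image_run, "httpd"))
    populate_mount(image_run, fake_tmpfs)
    return fake_tmpfs
```

Calling `demo()` returns a directory that contains an `httpd` subdirectory, mirroring what a service would expect to find under /run after the copy.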

Contributor

rhatdan commented Oct 8, 2014

This patch was previously written by Alex Larsson, and I have updated it to work with current docker.
There was a previous /run patch which was rejected because it did not copy /run off the image onto /run in a tmpfs. This patch fixes this issue.

We need this in order to get systemd to run properly within a docker container. systemd insists on /run being mounted on a tmpfs, and the systemd developers refuse to change this, since they claim this is a standard now.

Member

thaJeztah commented Oct 8, 2014

And it's signed-off in threefold! This must be good 😸

(Just a funny note)

haraldh commented Oct 10, 2014

My 2 cents:
"NoRunFs=no" sounds awful. It's already a boolean. Why does it have to have "No" in front?
"MountRunFS=yes" as a default sounds more sane.

Contributor

rhatdan commented Oct 10, 2014

How about TmpfsRun? Or MountRunFs

Contributor

crosbymichael commented Oct 10, 2014

What about specifying a generic flag like
docker run --tmpfs /run --tmpfs /tmp ubuntu bash?

I think this would solve your issue and allow generic tmpfs mounts.

Contributor

rhatdan commented Oct 10, 2014

Well no, unless these fixes also copied the content off of the image.

If we could make this a daemon option as well then I would be fine, so that customers using systemd-based images would not blow up when they forgot to add the flag.

We would ship our distributions with

docker -d --tmpfs /run --tmpfs /tmp

Member

tianon commented Oct 10, 2014

If it's not enabled manually during "docker run" (and is thus a conscious
choice) or unilaterally for all containers on every engine, I think this
would have some huge potential to cause issues with image portability.

Member

tianon commented Oct 10, 2014

(I mean specifically if it's a new flag on the daemon -- I'm actually
+1 to having it enabled everywhere, but having it as a "docker run"
flag would be very interesting as well.)

Contributor

rhatdan commented Oct 10, 2014

Well, that is one of the reasons to copy the image's /run or /tmp onto the tmpfs before the container runs.
Then have it saved on docker commit.

The patch disables tmpfs for docker build.

Contributor

rhatdan commented Oct 23, 2014

@crosbymichael @tianon @shykes Any additional comment on this one?

Contributor

rhatdan commented Oct 30, 2014

After meeting with @crosbymichael and working on a Read/Only image format, I believe this pull request becomes more important. For Read/Only images you still need to be able to mount tmpfs on /run, /tmp and /var/tmp.
I think we could extend this patch so that /tmp and /var/tmp work the same way. But /run should be tmpfs by default, just like it is on Ubuntu, RHEL7, Fedora, and CentOS.

Contributor

rhatdan commented Nov 11, 2014

@crosbymichael Any movement? If we could get to the point of /tmp, /var/tmp, /run using similar technology to the /run patch we could make them all tmpfs and then could run the underlying file systems as Read/Only. Then processes could write to these directories as well as /dev.

This would more closely match the defaults in RHEL and Fedora and, I think, Ubuntu, for tmpfs on /run and /tmp.

I would be willing to make this a daemon option, if necessary.

vpavlin commented Nov 12, 2014

Hi, we have systemd-container for RHEL7 base images, which is a patched version of systemd that works well in Docker containers. There have been a lot of discussions with Lennart Poettering (and other systemd developers) about modifying systemd to work within Docker containers, and they provided strong arguments against doing so. This patch is the last thing that prevents us from running systemd in a Fedora Docker container. This also applies to CentOS containers and indeed to RHEL7 containers, where we wouldn't need patches to work around the missing /run mount...

That said, could this please get more attention?

Member

tianon commented Nov 13, 2014

/run as tmpfs is definitely very standard at this point (across almost all distributions), and /tmp as tmpfs is very, very common, but /var/tmp is defined as "Temporary files preserved between system reboots".

So while I'm +1 for sure on /run and could personally be persuaded that /tmp by default makes sense, I'm strongly -1 on /var/tmp. 👍

(but, standard disclaimer that I'm not the maintainer here)

Contributor

rhatdan commented Nov 13, 2014

@tianon I agree somewhat; the only question was for @crosbymichael's use case of a Read/Only /. But I guess we could just tell apps that use Read/Only to only use /tmp, /run and /dev/shm for temporary content. /var/tmp would be read-only.

vpavlin commented Nov 24, 2014

/var/tmp should probably stay untouched, but I'd like to see /tmp and /run be tmpfs mounts. Could we proceed with this and push this change further? (@tianon @crosbymichael?)

Contributor

rhatdan commented Nov 24, 2014

I updated the patch and added a second patch to support /tmp on tmpfs.

This second patch would allow @crosbymichael to do his Read/Only file systems patch.

The question I have is whether we should allow users to specify this support in the docker run/create commands or in docker -d. Either is easy enough to do, since we have the MountRun and MountTmp hostconfig options.

Contributor

rhatdan commented Dec 2, 2014

@crosbymichael @shykes @tianon Any update on this? Comment?

Member

tianon commented Dec 3, 2014

I think @crosbymichael is still on the fence, but it's ultimately up to him and @shykes.

I'm still +1 on the general idea, FWIW.

Contributor

icecrime commented Jan 6, 2015

Review session with @tiborvass @unclejack @crosbymichael

@rhatdan Can you please explain why --tmpfs /run (as implemented by #9586) isn't sufficient? It seems that populating the content of /run could be achieved through a container entrypoint script.

Contributor

rhatdan commented Jan 6, 2015

/run on TMPFS is a standard. Is there a distribution where this is not true?

How do I see what is under the /run or the /tmp if something is mounted over them?

Forcing everyone that uses applications that expect /run to be tmpfs to satisfy this seems ass backwards. We want to ship systemd-based images, which expect /run to be on tmpfs by default, as is the standard.

Let's not make running docker containers harder and more fragile by not accepting this patch. At least allow us to customize the daemon to make this the default behaviour. I see no reason to require /run to not be a tmpfs.

Contributor

crosbymichael commented Jan 6, 2015

@rhatdan If you are using systemd images, shouldn't the setup scripts populate /run for the daemons that you are executing?

Contributor

rhatdan commented Jan 6, 2015

I want to allow users to write a Dockerfile that does the following:

FROM rhel7
RUN yum install -y httpd && systemctl enable httpd
CMD /sbin/init

This will run the httpd service the way it was designed, no special hacking of init scripts, as simple as can be. You just install the service the way you would on bare metal and use its service script to manage it.

Contributor

rhatdan commented Jan 6, 2015

In this case systemd will populate the tmpfs if you run with --tmpfs /run, but will blow up if the user forgets the flag.
systemd-tmpfiles will take care of populating /run.

In other cases people using --tmpfs /run will blow up apps, since there is no way to prepopulate /run and certain images expect it. Having /run on tmpfs as the default, or allowing users to configure it to be the default, solves all problems.
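
To make the systemd-tmpfiles point concrete, here is a rough sketch in Python of what populating /run from tmpfiles.d-style entries amounts to. It is a deliberately tiny simplification, not how systemd-tmpfiles is implemented: it only understands `d` (create directory) lines, ignores the user, group and age fields, and creates everything under a `root` prefix so it can run unprivileged. The function name `apply_tmpfiles` and the sample entry are hypothetical.

```python
import os
import tempfile


def apply_tmpfiles(config_text: str, root: str) -> list:
    """Create the directories named by 'd' lines, relative to root.

    Each line follows the tmpfiles.d field order:
    Type Path Mode User Group Age Argument
    Returns the list of directories that were created or updated.
    """
    created = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2 or fields[0] != "d":
            continue  # this sketch handles only plain directory entries
        has_mode = len(fields) > 2 and fields[2] != "-"
        mode = int(fields[2], 8) if has_mode else 0o755
        target = os.path.join(root, fields[1].lstrip("/"))
        os.makedirs(target, exist_ok=True)
        os.chmod(target, mode)  # set explicitly so the umask does not matter
        created.append(target)
    return created


SAMPLE = """\
# illustrative entry only
d /run/httpd 0710 root apache -
"""
```

Running `apply_tmpfiles(SAMPLE, tempfile.mkdtemp())` creates `run/httpd` with mode 0710 under the temporary root. An image that does not ship such entries has nothing to recreate its /run contents at startup, which is the case the copy in this patch is meant to cover.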

Contributor

rhatdan commented Jan 6, 2015

I would actually like to know what having /run as tmpfs actually breaks, considering the patch copies the contents of /run off the image.

Contributor

crosbymichael commented Jan 6, 2015

It would not break anything with the copy, but I'm not a fan of the mount outside of the container's mount namespace, and libcontainer is not the place to be copying files.

I don't think it's too much to ask a user to add --tmpfs /run in order to run an advanced piece of software like systemd in their container. For httpd they are still going to have to map ports or the like.

Maybe we can make the argument that TMPFS /run should be allowed in the Dockerfile because it is portable and not machine specific.

Contributor

cgwalters commented Jan 6, 2015

WRT /var/run and read-only - for what it's worth, modern ostree will automatically make all of /var a tmpfs if / is readonly. See https://git.gnome.org/browse/ostree/commit/?id=ff6883ca0655ac8844cd783caf6a7d8815515ba3

I think it's a pretty good default, as if you boot a read-only image, you wouldn't expect persistence, but it's nice if the system does boot out of the box.

Contributor

crosbymichael commented Jan 6, 2015

I'm not disagreeing that /run should be a tmpfs by default. The problem is that for docker's entire lifetime it hasn't been, and so changing that now is a little harder. Copying is the solution to that problem, but it comes at a cost in performance and complexity.

Contributor

rhatdan commented Jan 6, 2015

httpd was just an example. Any system service provided by the distribution would work. Is there another way we could copy the files off of /run to preserve them? Your new patch is going to have the same problem. If we had a standard way of preserving the underlying image content on top of the tmpfs, we could solve the problem.

Contributor

rhatdan commented Jan 6, 2015

TMPFS in a Dockerfile has the same problem:

FROM rhel7
TMPFS /run
RUN touch /run/dan

Would not work.

Contributor

rhatdan commented Apr 20, 2015

On 04/20/2015 01:30 PM, Darren Shepherd wrote:

Something seems a bit off to me. In general I don't like to assume
anything about what is running in the container. The default mounts
that docker sets up today are all /proc and /sys, IIRC, and those are
essential to linux. Are we saying that /run as tmpfs is just
absolutely fundamental to the behavior of Linux? I don't know
@crosbymichael's latest opinion, but
--tmpfs /run seems a bit more reasonable to me.

As a side question, is something broken if /run isn't a tmpfs? Is
there a basic assumption somewhere that /run must be ephemeral?

How come I don't see this on github? The biggest problem we have with
/run not being on a tmpfs is that systemd will not run without
it on a tmpfs. Actually it will start, but it will attempt to mount /run
as a tmpfs and will blow up.

Secondly, I want to get both /run and /tmp as tmpfs by default so that
it is simple to do a

docker run --read-only fedora /bin/sh

And most apps will run. Apps expect to be able to write to /tmp and /run.

/run is a tmpfs on every modern Linux distribution and /tmp is also on
most. Having these be the same in containers makes sense.
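
For anyone who wants to check what a given container actually has mounted on /run, a small sketch in Python follows. It assumes the standard /proc/mounts line layout (device, mount point, filesystem type, options, dump, pass); the function name `fs_type` is hypothetical, and the mounts text is passed in as a string so the logic can be tried without a container.

```python
def fs_type(mount_point: str, mounts_text: str):
    """Return the filesystem type mounted at mount_point, or None.

    mounts_text is the content of /proc/mounts. Later entries shadow
    earlier ones for the same mount point, so the last match wins.
    """
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            result = fields[2]
    return result


# Illustrative /proc/mounts content, not captured from a real container.
SAMPLE_MOUNTS = """\
overlay / overlay rw,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec 0 0
tmpfs /run tmpfs rw,nosuid,nodev,mode=755 0 0
"""
```

Inside a container, `fs_type("/run", open("/proc/mounts").read())` would return `"tmpfs"` only when /run is its own tmpfs mount, and `None` when /run is just a directory on the root filesystem.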

Member

thaJeztah commented Apr 20, 2015

How come I don't see this on github?

Interesting question, I just trashed the email, luckily you noticed it

Contributor

ibuildthecloud commented Apr 20, 2015

@rhatdan after reading over the thread, I deleted my comment; I didn't think it was valid. I just struggle with how much we are assuming about the target env. There's probably not a big harm in doing /run and /tmp but then I've seen other PRs that talk about putting things in /etc and /var/run. Where do we draw the line? We draw this fine line between what systemd requires and what is "standard" in Linux because what major distro isn't influenced by systemd now. Wasn't the notion of /run really introduced by systemd to begin with? Not really important...

I feel the image should be able to specify this requirement and we shouldn't make this global. So the TMPFS Dockerfile command seems much more like the right way to go.

If somebody used TMPFS /run then they are deciding to make /run ephemeral and then there is no need to prepopulate /run, right? I'm in favor of making this a Dockerfile command and adding --tmpfs to docker run. Then we could back out the change in libcontainer.

Contributor

rhatdan commented Apr 21, 2015

/var/run and /run are basically the same thing. On modern systems /var/run is a symlink to /run. I believe Ubuntu was the first to standardize on /run as a tmpfs and Fedora/systemd followed.

I actually like the idea of specifying in the base image or individual images where the TMPFS directories are. So I have no problem moving to this model, but you still need to be able to store files/directories in the underlying directory and copy them into the TMPFS.

For example, apache expects there to be a /run/httpd directory and will not work without it. Putting this in the base image and allowing libcontainer/docker to preserve it onto the tmpfs is necessary.

Where do we draw the line?

I draw the line at matching what standard distros define. /run and /tmp as tmpfs. But if we go with putting it in the base images, I can go along with that. The goal with /tmp is for --read-only, otherwise not that important.

We draw this fine line between what systemd requires and what is "standard" in Linux because what major distro isn't influenced by systemd now.

Not sure of the answer to this question. But trust me, I am trying hard to get systemd to change some of its decisions. Sometimes I feel like I am between a rock (systemd) and a hard place (docker). :^)

Wasn't the notion of /run really introduced by systemd to begin with?

No I think it pre-existed systemd.

If somebody used TMPFS /run then they are deciding to make /run ephemeral and then there is no need to prepopulate /run, right? I'm in favor of making this a Dockerfile command and adding --tmpfs to docker run.

I would prefer a syntax like -v tmpfs:/tmp or -v /tmp:T

Then we could back out the change in libcontainer.

No, we need the Premount/Postmount behaviour in libcontainer to be able to get the content off of the image into the tmpfs.

I could see use cases where I would want to run different containers with different /etc/httpd or /var

docker run --read-only -v tmpfs:/var httpd

Here /var would be mounted as a tmpfs, but the contents off of the image would be copied into it.

Contributor

ibuildthecloud commented Apr 21, 2015

@rhatdan How does httpd work on a normal Linux distro if /run is empty. Prepopulating a tmpfs seems a bit dirty to me because I can't really find an analogy to a typical linux distro. If I create a tmpfs mount then the folder is empty and if I want to populate it I must write script that runs on startup, or use systemd-tmpfiles(?).

Your use case of docker run --read-only -v tmpfs:/var httpd while compelling and cool, is like a poor man's COW. I'd personally opt to not support prepopulating a tmpfs.

The red flag to me is that when I see the libcontainer changes the hacker in me gets excited because you just gave me the ability to run arbitrary commands through libcontainer. This is a blocking hook that I could abuse like crazy (granted it's not exposed through Docker, but don't worry I'll get creative :) ) I prefer a bit more of a focused implementation in libcontainer to support prepopulating a tmpfs, but then I can't think of a clean design.

I think Docker should be as opinionated about user space as Linux is. This means that while every major distro supports /run as tmpfs, it is still just a de facto standard among distros and init systems. But Docker should support the ability to do such a thing, because if you are running an Ubuntu or Fedora based image you would expect /run to be tmpfs. This reinforces in my mind that the right approach is to support it in the image.

I'd prefer TMPFS/--tmpfs over -v. The reason being that if it's a volume then the Dockerfile would have to be something like VOLUME tmpfs://:/run. That sort of breaks the design of VOLUME in Dockerfiles because current VOLUME does not allow specifying the left side of : (well you're not supposed to). We would have to make an exception for tmpfs:// which seems hacky.



tianon Apr 21, 2015

Member

How does httpd work on a normal Linux distro if /run is empty.

Because it has an init script that does the mkdir (or the equivalent configuration for systemd).

In Docker, the Dockerfile is as close to an "init script" as we have, save entrypoint workarounds.
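One way to read the "entrypoint workaround" mentioned above is a small wrapper that recreates the needed directories at start-up, the way an init script would. A minimal sketch, assuming a hypothetical helper name `prepare_run_dirs` and a scratch directory standing in for an empty /run:

```shell
#!/bin/sh
set -eu

# prepare_run_dirs BASE DIR...
# Create each DIR under BASE, as an init script would for an empty /run.
# The function name is hypothetical; it is not part of Docker or httpd.
prepare_run_dirs() {
    base=$1
    shift
    for d in "$@"; do
        mkdir -p "$base/$d"
    done
}

# Demo against a scratch directory instead of the real /run.
fake_run=$(mktemp -d)
prepare_run_dirs "$fake_run" httpd
ls "$fake_run"

# In a real entrypoint the last lines would look more like:
#   prepare_run_dirs /run httpd
#   exec "$@"
```

This keeps the directory creation at container start rather than at image build time, so it works whether or not /run is later mounted as a tmpfs.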



rhatdan Apr 21, 2015

Contributor

Content in /run is populated via the init script or systemd-tmpfiles.

We have gone back and forth on the PreMount/PostMount command. This is what upstream wanted, so we gave it to them. Previous patches had the "tar" happening on the libcontainer side.

I think having premount/postmount could be handy for other functions, like notifications of mounts. I am not looking at this from a malicious point of view. I guess if you think of libcontainer from a privilege separation point of view, where libcontainer would have more privileges than the caller, the execution of these commands could be risky. But I don't believe that is the way it is used currently.


@icecrime icecrime removed the dco/yes label Apr 23, 2015


calavera Apr 29, 2015

Contributor

I agree with @crosbymichael that we should look for a more general solution here. I see the benefit of being able to mount tmpfs directories, and I understand that systemd requires /run to be mounted that way.

I don't think making this specific to /run is the way to go, though. A more general solution like the --tmpfs flag Mike proposes would fulfill systemd's requirements without being coupled to it. Besides, this implementation is hard-wired to Linux; using --tmpfs would allow us to abstract the implementation by OS. Windows is coming.

So, I'm 👍 on allowing to mount tmpfs directories and I'll be glad to push it forward with a general solution.



rhatdan Apr 29, 2015

Contributor

Well, I am now thinking of moving the systemd requirements into a different patch set and adding a --systemd switch. There are lots of requirements that systemd and docker do not agree on, and I don't see either upstream agreeing. Therefore I think we need docker run to support a --systemd flag, which would tell it that the container will run with systemd as its init program and set up the container accordingly.
Containers that do not want to run in systemd mode can then continue to run as they currently do. The only problem with doing this would be user confusion when systemd-based containers fail because the user forgot to run with the --systemd flag.



calavera Apr 29, 2015

Contributor

@rhatdan docker already knows how to detect that systemd is running via libcontainer, see https://github.com/docker/docker/blob/53bef64804c6dae6662a7d55c3bb3e48b3e5dfdf/daemon/execdriver/native/driver.go#L62 for instance.

If we add --tmpfs as a flag, there is nothing stopping us from setting --tmpfs /run by default when systemd is running. That way we get both: a generic implementation and a sane default for systemd.



rhatdan Apr 29, 2015

Contributor

It can tell if systemd is running on the host, but not if it will be running as PID 1 in the container. systemd expects things like SIGTERM to have a different meaning than what docker expects. It expects to have /run and /sys/fs/cgroup mounted in the container. I also want it to be able to write journal data to the host OS. There are a few other features that are also required to fully support systemd as PID 1 in a container.



LK4D4 May 12, 2015

Contributor

The implementation makes sense to me, but the tests are failing hard.
Also, this does it by default, so I am moving this to design-review.
ping @docker/core-maintainers


rhatdan added some commits May 11, 2015

Tar up contents of child directory onto tmpfs if mounted over
This patch will use the new PreMount and PostMount hooks to "tar"
up the contents of the base image on top of tmpfs mount points.

Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)

Conflicts:
	daemon/execdriver/native/create.go

Add buildflag to differentiate when container is being used in build
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)

RunPatch
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)

Merge branch 'master' of github.com:docker/docker into run
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)

duglin May 25, 2015

Contributor

Just coming up to speed on this issue, but I'm not following something. Based on what I've read so far (in this PR and http://thread.gmane.org/gmane.linux.redhat.fedora.devel/146976) it appears that /run mounted as tmpfs is standard, right? If so, why wouldn't Docker do the same thing by default? Why would we want to require people to add a flag to enable something that would be enabled by default if they were running natively, outside of Docker?

Also, if we did end up with a flag, I'm not in favor of one on the daemon because if we think some people may not always want it then I'd prefer for it to be on a per-container basis. Or at least allow it for both so that a container can override what the daemon's default is.


@icecrime icecrime removed this from the 1.7.0 milestone May 26, 2015


rhatdan May 27, 2015

Contributor

After playing with and shipping this patch for a while, we are seeing other problems with it. The biggest one is that docker commit does not work the way one would expect. docker commit only saves the underlying image, not anything mounted on top of the container image. This means someone doing a

docker run -ti --name myhttpd image /bin/sh; yum install httpd; mkdir /run/httpd; ^d
docker commit myhttpd httpd

would not end up getting /run/httpd by default.

I am now thinking the best way to handle this is just to have a big --systemd flag.

docker run --systemd ...

This would set the container up in the mode that systemd expects: it would mount /run and /sys/fs/cgroup the way systemd wants, generate journald content on the host, and send the proper signals to systemd when users do a docker stop ID.

But for rank and file containers, we leave /run on the image.



alberts May 27, 2015

We've been running systemd in Docker in production for a few months and it mostly works and has helped us do many things where multiple Dockers wouldn't have made sense. Big +1 on --systemd. We still need --tmpfs to be able to use --read-only more widely, without breaking apps that need small scratch directories where -v or VOLUME would be overkill (or having the scratch disk on physical disk would be bad for performance).



rhatdan May 27, 2015

Contributor

I agree. I like the idea of --tmpfs, although I think -v tmpfs:/PATH would be more consistent. If docker upstream would decide which to do, I could have a patch available in a couple of hours.

I will push to get a --systemd pull request together by next week.



rhatdan May 27, 2015

Contributor

But remember, with --systemd or --tmpfs we need to document that docker commit will not save onto the new image any content that is stored on a tmpfs or, for that matter, any volume-mounted content, which might surprise some users.



rhatdan May 28, 2015

Contributor

Closing this and replacing it with

#13525


@rhatdan rhatdan closed this May 28, 2015
