Patch set to support systemd as PID1 in container #13525

Closed
wants to merge 5 commits into
from

Conversation

Projects
None yet
@rhatdan
Contributor

rhatdan commented May 28, 2015

This patch set combines a few patches to allow systemd to fully function in a non privileged container

I am closeing down some pull requests for /run and journald and putting them all behind a

docker run --systemd option.

This patch set includes a couple of patches that should get merged without using the systemd pull.

@jessfraz

This comment has been minimized.

Show comment
Hide comment
@jessfraz

jessfraz May 28, 2015

Contributor

i just can't condone a --systemd flag

Contributor

jessfraz commented May 28, 2015

i just can't condone a --systemd flag

@jessfraz

This comment has been minimized.

Show comment
Hide comment
@jessfraz

jessfraz May 28, 2015

Contributor

whats next --sysvinit, --upstart, --i-just-wrote-my-own-init-system-in-a-weekend-give-me-a-flag

Contributor

jessfraz commented May 28, 2015

whats next --sysvinit, --upstart, --i-just-wrote-my-own-init-system-in-a-weekend-give-me-a-flag

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan May 29, 2015

Contributor

--systemd is the primary init system on almost all linux distributions and ubunto has said they will adopt it. Alot of the changes are defaults for standard init systems including the proper signals to send to an init daemon.

Contributor

rhatdan commented May 29, 2015

--systemd is the primary init system on almost all linux distributions and ubunto has said they will adopt it. Alot of the changes are defaults for standard init systems including the proper signals to send to an init daemon.

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan May 29, 2015

Contributor

Ubuntu (sic) has already replaced Upstream with systemd:

https://wiki.ubuntu.com/VividVervet/ReleaseNotes/#Boot_and_service_management

It's just not in a long-term release yet.

Contributor

rhatdan commented May 29, 2015

Ubuntu (sic) has already replaced Upstream with systemd:

https://wiki.ubuntu.com/VividVervet/ReleaseNotes/#Boot_and_service_management

It's just not in a long-term release yet.

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan May 29, 2015

Contributor

I have no problem with changing the option to something like --os or --init, but this patch set is basically following these standards defined by systemd maintainers.

https://wiki.freedesktop.org/www/Software/systemd/ContainerInterface/

Contributor

rhatdan commented May 29, 2015

I have no problem with changing the option to something like --os or --init, but this patch set is basically following these standards defined by systemd maintainers.

https://wiki.freedesktop.org/www/Software/systemd/ContainerInterface/

@jessfraz

This comment has been minimized.

Show comment
Hide comment
@jessfraz

jessfraz May 29, 2015

Contributor

"Do not drop CAP_SYS_ADMIN from the container. "

Seems secure....

On Friday, May 29, 2015, Daniel J Walsh notifications@github.com wrote:

I have no problem with changing the option to something like --os or
--init, but this patch set is basically following these standards defined
by systemd maintainers.

https://wiki.freedesktop.org/www/Software/systemd/ContainerInterface/


Reply to this email directly or view it on GitHub
#13525 (comment).

Contributor

jessfraz commented May 29, 2015

"Do not drop CAP_SYS_ADMIN from the container. "

Seems secure....

On Friday, May 29, 2015, Daniel J Walsh notifications@github.com wrote:

I have no problem with changing the option to something like --os or
--init, but this patch set is basically following these standards defined
by systemd maintainers.

https://wiki.freedesktop.org/www/Software/systemd/ContainerInterface/


Reply to this email directly or view it on GitHub
#13525 (comment).

@shaded-enmity

This comment has been minimized.

Show comment
Hide comment
@shaded-enmity

shaded-enmity May 29, 2015

@jfrazelle why the straw man?

Or do you think that integrating one process manager (which the Docker daemon is) with system-wide service/process manager is bad? I don't get it. Especially in the context of this:
https://github.com/docker/docker/tree/master/contrib/init

So yeah, support natively both systemd and upstart and let users choose?
I'd just change the option to --init=[none|systemd|upstart] and all is well, isn't it?

@jfrazelle why the straw man?

Or do you think that integrating one process manager (which the Docker daemon is) with system-wide service/process manager is bad? I don't get it. Especially in the context of this:
https://github.com/docker/docker/tree/master/contrib/init

So yeah, support natively both systemd and upstart and let users choose?
I'd just change the option to --init=[none|systemd|upstart] and all is well, isn't it?

@jessfraz

This comment has been minimized.

Show comment
Hide comment
@jessfraz

jessfraz May 29, 2015

Contributor

I think there is another way we can make everyone happy, and the Init flags
were a joke chill out

On Friday, May 29, 2015, Causa Sui notifications@github.com wrote:

@jfrazelle https://github.com/jfrazelle why the straw man?

Or do you think that integrating one process manager (which the Docker
daemon is) with system-wide service/process manager is bad? I don't get
it. Especially in the context of this:
https://github.com/docker/docker/tree/master/contrib/init

So yeah, support natively both systemd and upstart and let users choose?
I'd just change the option to --init=[none|systemd|upstart] and all is
well, isn't it?


Reply to this email directly or view it on GitHub
#13525 (comment).

Contributor

jessfraz commented May 29, 2015

I think there is another way we can make everyone happy, and the Init flags
were a joke chill out

On Friday, May 29, 2015, Causa Sui notifications@github.com wrote:

@jfrazelle https://github.com/jfrazelle why the straw man?

Or do you think that integrating one process manager (which the Docker
daemon is) with system-wide service/process manager is bad? I don't get
it. Especially in the context of this:
https://github.com/docker/docker/tree/master/contrib/init

So yeah, support natively both systemd and upstart and let users choose?
I'd just change the option to --init=[none|systemd|upstart] and all is
well, isn't it?


Reply to this email directly or view it on GitHub
#13525 (comment).

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan May 29, 2015

Contributor

"Do not drop CAP_SYS_ADMIN from the container. "

Not sure what you mean, I don't want CAP_SYS_ADMIN in the container.

Contributor

rhatdan commented May 29, 2015

"Do not drop CAP_SYS_ADMIN from the container. "

Not sure what you mean, I don't want CAP_SYS_ADMIN in the container.

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan May 29, 2015

Contributor

How about we go with

--subsystem=systemd
or
--boot=systemd

Then if some other init system comes along, they could implement it without haveing to add options.

Contributor

rhatdan commented May 29, 2015

How about we go with

--subsystem=systemd
or
--boot=systemd

Then if some other init system comes along, they could implement it without haveing to add options.

@jessfraz

This comment has been minimized.

Show comment
Hide comment
@jessfraz

jessfraz May 29, 2015

Contributor

If you don't want it in the container then you should drop it

On Friday, May 29, 2015, Daniel J Walsh <notifications@github.com
javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:

"Do not drop CAP_SYS_ADMIN from the container. "

Not sure what you mean, I don't want CAP_SYS_ADMIN in the container.


Reply to this email directly or view it on GitHub
#13525 (comment).

Contributor

jessfraz commented May 29, 2015

If you don't want it in the container then you should drop it

On Friday, May 29, 2015, Daniel J Walsh <notifications@github.com
javascript:_e(%7B%7D,'cvml','notifications@github.com');> wrote:

"Do not drop CAP_SYS_ADMIN from the container. "

Not sure what you mean, I don't want CAP_SYS_ADMIN in the container.


Reply to this email directly or view it on GitHub
#13525 (comment).

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan May 29, 2015

Contributor

It is dropped by default. I don't think we are communicating well. Maybe we should go to IRC. I don't understand what you are trying to say.

Contributor

rhatdan commented May 29, 2015

It is dropped by default. I don't think we are communicating well. Maybe we should go to IRC. I don't understand what you are trying to say.

@jessfraz

This comment has been minimized.

Show comment
Hide comment
@jessfraz

jessfraz May 29, 2015

Contributor

I think it's fine don't worry it's just interesting wording to warn users
not to drop a cap that isn't even added

On Friday, May 29, 2015, Daniel J Walsh notifications@github.com wrote:

It is dropped by default. I don't think we are communicating well. Maybe
we should go to IRC. I don't understand what you are trying to say.


Reply to this email directly or view it on GitHub
#13525 (comment).

Contributor

jessfraz commented May 29, 2015

I think it's fine don't worry it's just interesting wording to warn users
not to drop a cap that isn't even added

On Friday, May 29, 2015, Daniel J Walsh notifications@github.com wrote:

It is dropped by default. I don't think we are communicating well. Maybe
we should go to IRC. I don't understand what you are trying to say.


Reply to this email directly or view it on GitHub
#13525 (comment).

@shaded-enmity

This comment has been minimized.

Show comment
Hide comment
@shaded-enmity

shaded-enmity May 29, 2015

@jfrazelle Well, making everyone happy is utopic thought :)

I think that a program that does OS level virtualization should be aware about the innards of the OS it is virtualizing to simply provide the most seamless experience possible.

BTW: By default, when you create user namespace you get a full capability set, therefore it is suggested not to drop that capability. The fact that Docker drops it by default is nice, yet completely unrelated to the underlying kernel technologies, since Docker doesn't even use user namespace in the first place and therefore "relies" on the fact that CAP_SYS_ADMIN is inherited from the root account the Daemon is running as.

@jfrazelle Well, making everyone happy is utopic thought :)

I think that a program that does OS level virtualization should be aware about the innards of the OS it is virtualizing to simply provide the most seamless experience possible.

BTW: By default, when you create user namespace you get a full capability set, therefore it is suggested not to drop that capability. The fact that Docker drops it by default is nice, yet completely unrelated to the underlying kernel technologies, since Docker doesn't even use user namespace in the first place and therefore "relies" on the fact that CAP_SYS_ADMIN is inherited from the root account the Daemon is running as.

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan May 29, 2015

Contributor

Ok, you were reading lennarts article. I was wondering where these quotes were coming from. I want to run a non-privileged container, so we will not follow that part of his description as well as allowing the container to do mknod. I believe they currently disable Pr

Contributor

rhatdan commented May 29, 2015

Ok, you were reading lennarts article. I was wondering where these quotes were coming from. I want to run a non-privileged container, so we will not follow that part of his description as well as allowing the container to do mknod. I believe they currently disable Pr

@LalatenduMohanty

This comment has been minimized.

Show comment
Hide comment
@LalatenduMohanty

LalatenduMohanty May 29, 2015

Contributor

Honestly I don't see this as an issue (i.e. the idea) i.e. init subsystem specific flags . Init systems are critical piece for os/processes. I don't think anyone writes init system just like that. For example if we look at history of Linux(may be Unix too), how many times new init systems came to major Linux/GNU distributions.
So an optional flag makes the Docker experience better , then why not.

Contributor

LalatenduMohanty commented May 29, 2015

Honestly I don't see this as an issue (i.e. the idea) i.e. init subsystem specific flags . Init systems are critical piece for os/processes. I don't think anyone writes init system just like that. For example if we look at history of Linux(may be Unix too), how many times new init systems came to major Linux/GNU distributions.
So an optional flag makes the Docker experience better , then why not.

@ncoghlan

This comment has been minimized.

Show comment
Hide comment
@ncoghlan

ncoghlan Jun 2, 2015

I don't think it makes sense for Docker to go down the path of supporting arbitrary init systems running inside containers, which I believe is the implication @jfrazelle was reacting to. That expectation of having to eventually adjust Docker to handle the different init systems is implied by the "--systemd" name for the flag, as well as by the alternative suggestions of making Docker aware of multiple init systems and configuring the container differently based on which init system is being selected.

However, I do think it makes sense for Docker to better support the "multi-service container" use case, where an existing application that isn't particularly amenable to containerisation can be run inside a multi-service container, allow it to be managed and deployed using Docker based container management and orchestration tools, rather than having to have separate tooling to manage full VMs for such applications.

With a flag name like "--multiservice", this feature would let Docker define for init system developers what they need to do to integrate with Docker as the init system running inside a multi-service container. The fact that systemd already supports working this way shows that the defined interface is sufficient to get a multi-service container working correctly. Other init systems are likely to work better in a container configured this way, and if they still don't work completely, then it will be up to the developers of those systems to either update their init system to support Docker properly (as the systemd developers have), or else to propose changes to the way Docker configures multi-service containers.

ncoghlan commented Jun 2, 2015

I don't think it makes sense for Docker to go down the path of supporting arbitrary init systems running inside containers, which I believe is the implication @jfrazelle was reacting to. That expectation of having to eventually adjust Docker to handle the different init systems is implied by the "--systemd" name for the flag, as well as by the alternative suggestions of making Docker aware of multiple init systems and configuring the container differently based on which init system is being selected.

However, I do think it makes sense for Docker to better support the "multi-service container" use case, where an existing application that isn't particularly amenable to containerisation can be run inside a multi-service container, allow it to be managed and deployed using Docker based container management and orchestration tools, rather than having to have separate tooling to manage full VMs for such applications.

With a flag name like "--multiservice", this feature would let Docker define for init system developers what they need to do to integrate with Docker as the init system running inside a multi-service container. The fact that systemd already supports working this way shows that the defined interface is sufficient to get a multi-service container working correctly. Other init systems are likely to work better in a container configured this way, and if they still don't work completely, then it will be up to the developers of those systems to either update their init system to support Docker properly (as the systemd developers have), or else to propose changes to the way Docker configures multi-service containers.

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan Jun 10, 2015

Contributor

Reworked this patch set to use --init=systemd rather then --systemd. Also added some more man page changes and completion code. This would allow additional init systems should they come along to add patches to run in their defined containerized world.

I would like to at least get the ok on the CLI change, so we could announce this to the Fedora world.

Contributor

rhatdan commented Jun 10, 2015

Reworked this patch set to use --init=systemd rather then --systemd. Also added some more man page changes and completion code. This would allow additional init systems should they come along to add patches to run in their defined containerized world.

I would like to at least get the ok on the CLI change, so we could announce this to the Fedora world.

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@arthurlogilab

This comment has been minimized.

Show comment
Hide comment
@arthurlogilab

arthurlogilab Jun 23, 2015

we're interrested in this too, thanks @rhatdan for this pull request.

we're interrested in this too, thanks @rhatdan for this pull request.

@gw0

This comment has been minimized.

Show comment
Hide comment
@gw0

gw0 Jul 3, 2015

The main thing this patch does is sets the default graceful stop signal at create/run time. So why not name the parameter --signal?

As far as I can see not just process managers, but many daemons use different signals for graceful shutdown, so it doesn't make sense to enforce SIGTERM (check out Handling Signals section). Also check out systemd and docker discussion of this bug:

"When you send PID 1 a SIGTERM, then this means that PID 1 shall reexecute. It always has meant that, regardless if you look at sysvinit, upstart or systemd. It's one of the special semantics that PID 1 has."

Otherwise with the official docker-upstart image graceful shutdown can be achieved with SIGPWR:

# docker kill --signal="PWR" xxx1 && sleep 10 && docker kill xxx1

Systemd and others also use SIGPWR for graceful shutdown (if prepared correctly, see configure_debian_systemd()). Also newer versions of lxc-stop have SIGPWR as halt signal:

"By default, it will request a clean shutdown of the container by sending lxc.haltsignal (defaults to SIGPWR) to the container's init process, waiting up to 60 seconds for the container to exit, and then returning. If the container fails to cleanly exit in 60 seconds, it will be sent the lxc.stopsignal (defaults to SIGKILL) to force it to shut down. A request to reboot will send the lxc.rebootsignal (defaults to SIGINT) to the container's init process."

To me it seems that it would make most sense to at least put the default graceful stop (now SIGTERM) and default graceful stop timeout before killing (now only in stop command) as settings into Dockerfile and at create/run time (eg. docker create --stop-signal SIGPWR --stop-timeout 60 ...). But also overridable at stop time as timeout already is. (At stop time there could also be a --ask-before-kill option to ask the user what to do.)

gw0 commented Jul 3, 2015

The main thing this patch does is sets the default graceful stop signal at create/run time. So why not name the parameter --signal?

As far as I can see not just process managers, but many daemons use different signals for graceful shutdown, so it doesn't make sense to enforce SIGTERM (check out Handling Signals section). Also check out systemd and docker discussion of this bug:

"When you send PID 1 a SIGTERM, then this means that PID 1 shall reexecute. It always has meant that, regardless if you look at sysvinit, upstart or systemd. It's one of the special semantics that PID 1 has."

Otherwise with the official docker-upstart image graceful shutdown can be achieved with SIGPWR:

# docker kill --signal="PWR" xxx1 && sleep 10 && docker kill xxx1

Systemd and others also use SIGPWR for graceful shutdown (if prepared correctly, see configure_debian_systemd()). Also newer versions of lxc-stop have SIGPWR as halt signal:

"By default, it will request a clean shutdown of the container by sending lxc.haltsignal (defaults to SIGPWR) to the container's init process, waiting up to 60 seconds for the container to exit, and then returning. If the container fails to cleanly exit in 60 seconds, it will be sent the lxc.stopsignal (defaults to SIGKILL) to force it to shut down. A request to reboot will send the lxc.rebootsignal (defaults to SIGINT) to the container's init process."

To me it seems that it would make most sense to at least put the default graceful stop (now SIGTERM) and default graceful stop timeout before killing (now only in stop command) as settings into Dockerfile and at create/run time (eg. docker create --stop-signal SIGPWR --stop-timeout 60 ...). But also overridable at stop time as timeout already is. (At stop time there could also be a --ask-before-kill option to ask the user what to do.)

@mitar

This comment has been minimized.

Show comment
Hide comment
@mitar

mitar Jul 3, 2015

--stop-signal is a great idea!

mitar commented Jul 3, 2015

--stop-signal is a great idea!

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan Jul 3, 2015

Contributor

That is fine, but lets make this a second pull request, not a replacement for this. Their are several things that need to be done to run in systemd mode correctly in addition to the signal. Setting up journald correctly being the biggest, mounting /run as tmpfs and mounting /sys/fs/cgroup into the container as Read Only.

Contributor

rhatdan commented Jul 3, 2015

That is fine, but lets make this a second pull request, not a replacement for this. Their are several things that need to be done to run in systemd mode correctly in addition to the signal. Setting up journald correctly being the biggest, mounting /run as tmpfs and mounting /sys/fs/cgroup into the container as Read Only.

+ Data: "mode=755,size=65536k",
+ Flags: flags,
+ PremountCmds: d.genPremountCmd(c, dest, m.Destination),
+ PostmountCmds: d.genPostmountCmd(c, dest, m.Destination),

This comment has been minimized.

@gw0

gw0 Jul 3, 2015

In what situation would this make sense? How can anything useful and worth preserving be in /run at mount time?

@gw0

gw0 Jul 3, 2015

In what situation would this make sense? How can anything useful and worth preserving be in /run at mount time?

This comment has been minimized.

@rhatdan

rhatdan Jul 7, 2015

Contributor

This would just allow the image developer to prepopulate the /run directory so that when the container starts up the content is still in the /run directory. If that is what you are asking.

@rhatdan

rhatdan Jul 7, 2015

Contributor

This would just allow the image developer to prepopulate the /run directory so that when the container starts up the content is still in the /run directory. If that is what you are asking.

@icecrime

This comment has been minimized.

Show comment
Hide comment
@icecrime

icecrime Jul 13, 2015

Contributor

I am sorry, I don't get it.

  • Mounts "/run" as a tmpfs,
  • mounts /sys/fs/cgroup into the container as a read/only volume
  • Adds container_uuid environment variable.
  • Sets up volume mount /var/log/journald/UUID. Allowing journald data within the container to be seen by the host journalctl.

All things you can do today with Docker, without having to make it all systemd specific. I don't want to have to know about "the systemd container specification". I'm working on Docker, not systemd.

Contributor

icecrime commented Jul 13, 2015

I am sorry, I don't get it.

  • Mounts "/run" as a tmpfs,
  • mounts /sys/fs/cgroup into the container as a read/only volume
  • Adds container_uuid environment variable.
  • Sets up volume mount /var/log/journald/UUID. Allowing journald data within the container to be seen by the host journalctl.

All things you can do today with Docker, without having to make it all systemd specific. I don't want to have to know about "the systemd container specification". I'm working on Docker, not systemd.

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan Jul 13, 2015

Contributor

@icecrime You can not do this now. container_uuid can not be known before creating the container., which is required for journald to work. We also have no way of mounting /run as tmpfs yet. Finally it also changes the signals that are sent to terminate the container.

Contributor

rhatdan commented Jul 13, 2015

@icecrime You can not do this now. container_uuid can not be known before creating the container., which is required for journald to work. We also have no way of mounting /run as tmpfs yet. Finally it also changes the signals that are sent to terminate the container.

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan Jul 13, 2015

Contributor

Sometimes I think people are clouding their dislike of systemd rather then doing what is best for the users. systemd is the default init system, which will be running on the majority of systems that docker will be running on going forward.

Having a systemd running properly within the container, without having everyone need to learn some fancy recipe is the goal.

Perhaps this should be moved to runc going forward, but we still would prefer docker to tell the container it will be running a traditional init system as pid1.

Contributor

rhatdan commented Jul 13, 2015

Sometimes I think people are clouding their dislike of systemd rather then doing what is best for the users. systemd is the default init system, which will be running on the majority of systems that docker will be running on going forward.

Having a systemd running properly within the container, without having everyone need to learn some fancy recipe is the goal.

Perhaps this should be moved to runc going forward, but we still would prefer docker to tell the container it will be running a traditional init system as pid1.

docs/reference/run.md
@@ -201,6 +202,14 @@ more advanced use case would be changing the host's hostname from a container.
> **Note**: `--uts="host"` gives the container full access to change the
> hostname of the host and is therefore considered insecure.
+## INIT settings (--init)
+ Default to docker standard method of running containers.

This comment has been minimized.

@timthelion

timthelion Jul 27, 2015

Contributor

Defaults to Docker's standard method of running contianers.

Though this is still confusing.

@timthelion

timthelion Jul 27, 2015

Contributor

Defaults to Docker's standard method of running contianers.

Though this is still confusing.

This comment has been minimized.

@rhatdan

rhatdan Jul 27, 2015

Contributor

Suggest some other wording?

## INIT settings (--init)
Specify alternate method of running containers.
@rhatdan

rhatdan Jul 27, 2015

Contributor

Suggest some other wording?

## INIT settings (--init)
Specify alternate method of running containers.

This comment has been minimized.

@timthelion

timthelion Jul 27, 2015

Contributor

I would write:

INIT settings (--init)

Enable a pre-configured profile for running init systems within containers.
Default: No profile is enabled.

@timthelion

timthelion Jul 27, 2015

Contributor

I would write:

INIT settings (--init)

Enable a pre-configured profile for running init systems within containers.
Default: No profile is enabled.

This comment has been minimized.

@thaJeztah

thaJeztah Jul 27, 2015

Member

While you're working on that, please add a blank line after the heading (## INIT..), otherwise it may not render properly when generating the docs

@thaJeztah

thaJeztah Jul 27, 2015

Member

While you're working on that, please add a blank line after the heading (## INIT..), otherwise it may not render properly when generating the docs

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan Jul 27, 2015

Contributor

Changed the docs

Contributor

rhatdan commented Jul 27, 2015

Changed the docs

rhatdan added some commits May 27, 2015

Set container_uuid environment variable.
This tells systemd what to name the /etc/machine-id file
which links the journald within the container to to external log

Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
Tar up contents of child directory onto tmpfs if mounted over
This patch will use the new PreMount and PostMount hooks to "tar"
up the contents of the base image on top of tmpfs mount points.

We want to add support for /run and /tmp on tmpfs and Michael Chrosby idea to
add --tmpfs PATH would require a patch like this.

Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
RegisterMachine truncate container names to first 64 chars
Signed-off-by: Sally O'Malley <somalley@redhat.com>

Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
Add --init=INITSYSTEM option to run the container in different init s…
…ystem mode

If you are running a container with an init system like systemd as pid 1 this i
option will modify the default container environment to follow the standard
defined for that init system.

Specifying --init=systemd will run the container with the following changes:

 /run mounted as a tmpfs
 /sys/fs/cgroup volume mounted from the host into the container read-only
 /var/log/journal/UUID volume mounted from the host into the container to allow
journald data from the container to be seen on the host.
 Changes the default signals send from docker stop to the init process (systemd)

Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)

Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
Merge branch 'master' of github.com:docker/docker into systemd
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
@calavera

This comment has been minimized.

Show comment
Hide comment
@calavera

calavera Aug 7, 2015

Contributor

We've discussed this change with several maintainers and we reached the agreement that we do NOT want to add a flag for a specific init system to docker.

We're going to provider the proper primitives to make this work, like allowing to stop a container with a specific signal and allowing to mount tmpfs.

Closing this PR. Thank you all for your feedback.

Contributor

calavera commented Aug 7, 2015

We've discussed this change with several maintainers and we reached the agreement that we do NOT want to add a flag for a specific init system to docker.

We're going to provider the proper primitives to make this work, like allowing to stop a container with a specific signal and allowing to mount tmpfs.

Closing this PR. Thank you all for your feedback.

@calavera calavera closed this Aug 7, 2015

@timthelion

This comment has been minimized.

Show comment
Hide comment
@timthelion

timthelion Aug 7, 2015

Contributor

@calavera It will take longer to add each of these capabilities in a general way, but I agree that it will be worth the effort.

Contributor

timthelion commented Aug 7, 2015

@calavera It will take longer to add each of these capabilities in a general way, but I agree that it will be worth the effort.

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan Aug 8, 2015

Contributor

I totally disagree with this decision. I don't see a clean way to do this without at least getting the container ID information into the container. The complexity of doing this correctly is going to be a huge headache to users, and going to cause lots of bugs when systemd/docker does not work correctly.

I responded to emails discussing this issue with my concerns and then never heard back on them.

Contributor

rhatdan commented Aug 8, 2015

I totally disagree with this decision. I don't see a clean way to do this without at least getting the container ID information into the container. The complexity of doing this correctly is going to be a huge headache to users, and going to cause lots of bugs when systemd/docker does not work correctly.

I responded to emails discussing this issue with my concerns and then never heard back on them.

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Aug 8, 2015

Member

@icecrime @calavera would it be a option to schedule a hangout with @rhatdan ? I've seen this issue, and several related proposals/attempts (e.g. the container_ uuid) appear for over a year; is there a ways to reach some consensus, or better discuss the approach @calavera proposes?

Member

thaJeztah commented Aug 8, 2015

@icecrime @calavera would it be a option to schedule a hangout with @rhatdan ? I've seen this issue, and several related proposals/attempts (e.g. the container_ uuid) appear for over a year; is there a ways to reach some consensus, or better discuss the approach @calavera proposes?

@timthelion

This comment has been minimized.

Show comment
Hide comment
@timthelion

timthelion Aug 8, 2015

Contributor

@thaJeztah would you mind linkng me to @calavera's approach? I am not in the in club I guess ;)

Contributor

timthelion commented Aug 8, 2015

@thaJeztah would you mind linkng me to @calavera's approach? I am not in the in club I guess ;)

@timthelion

This comment has been minimized.

Show comment
Hide comment
@timthelion

timthelion Aug 8, 2015

Contributor

@thaJeztah Or do you simply refer to his closing comments?

Contributor

timthelion commented Aug 8, 2015

@thaJeztah Or do you simply refer to his closing comments?

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Aug 8, 2015

Member

@timthelion sorry, yes, just meant the approach in his comment (not "in the loop" either, if there's any "loop", lol)

Just hoping we can find a solution that's satisfactory to all involved here ❤️

Member

thaJeztah commented Aug 8, 2015

@timthelion sorry, yes, just meant the approach in his comment (not "in the loop" either, if there's any "loop", lol)

Just hoping we can find a solution that's satisfactory to all involved here ❤️

@calavera

This comment has been minimized.

Show comment
Hide comment
@calavera

calavera Aug 10, 2015

Contributor

@timthelion There are two PRs open already to address some of the issues that this change tried to solve:

Stop signaling: #15307
Tmpfs: #13587

We already mount the cgroups directory inside the container as read-only, and other issues raised here can be work around with options that we already have.

Contributor

calavera commented Aug 10, 2015

@timthelion There are two PRs open already to address some of the issues that this change tried to solve:

Stop signaling: #15307
Tmpfs: #13587

We already mount the cgroups directory inside the container as read-only, and other issues raised here can be work around with options that we already have.

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan Aug 10, 2015

Contributor

The unification of the UUID is the one that can not be fixed. Having separate UUID to refer to the docker container, is a user nightmare.

This patch along with machinectl patches basically allow a user to refer to a container based on the UUID.

The end goal is to be able to do

journalctl -M UUID

And then would see the journal from inside of the container.

This requieres RegisterMachine to register the UUID, /var/log/journal/UUID from the host mounted into the container, /etc/machine-id generated by systemd to match the UUID, which can only happen if $container_uuid=UUID before systemd starts.

Contributor

rhatdan commented Aug 10, 2015

The unification of the UUID is the one that can not be fixed. Having separate UUID to refer to the docker container, is a user nightmare.

This patch along with machinectl patches basically allow a user to refer to a container based on the UUID.

The end goal is to be able to do

journalctl -M UUID

And then would see the journal from inside of the container.

This requieres RegisterMachine to register the UUID, /var/log/journal/UUID from the host mounted into the container, /etc/machine-id generated by systemd to match the UUID, which can only happen if $container_uuid=UUID before systemd starts.

@timthelion

This comment has been minimized.

Show comment
Hide comment
@timthelion

timthelion Aug 10, 2015

Contributor

@rhatdan Then can you make a pull request that would create a --expose-container-uuid-to-container CLI option to docker run which would set $DOCKER_CONTAINER_UUID at container creation time? That should be pretty strait forward, no?

Contributor

timthelion commented Aug 10, 2015

@rhatdan Then can you make a pull request that would create a --expose-container-uuid-to-container CLI option to docker run which would set $DOCKER_CONTAINER_UUID at container creation time? That should be pretty strait forward, no?

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan Aug 10, 2015

Contributor

I have had a pull request to do this for over a year

#7685

but it is not that simple.

Since we need to know the name before the container starts to create the /var/log/docker/UUID directory to mount into the container. We also need the RegisterMachine patch to be added to to docker to register the UUID before the container starts.

#13526

Contributor

rhatdan commented Aug 10, 2015

I have had a pull request to do this for over a year

#7685

but it is not that simple.

Since we need to know the name before the container starts to create the /var/log/docker/UUID directory to mount into the container. We also need the RegisterMachine patch to be added to to docker to register the UUID before the container starts.

#13526

@jeffmccune

This comment has been minimized.

Show comment
Hide comment
@jeffmccune

jeffmccune Nov 20, 2015

@calavera

We've discussed this change with several maintainers and we reached the agreement that we do NOT want to add a flag for a specific init system to docker.

Could you explain why please? I'd like to see this added for more efficient system level testing using containers running systemd rather than full VM's running systemd. I've read this comment thread a few times and it's not clear why the patch has been rejected.

@calavera

We've discussed this change with several maintainers and we reached the agreement that we do NOT want to add a flag for a specific init system to docker.

Could you explain why please? I'd like to see this added for more efficient system level testing using containers running systemd rather than full VM's running systemd. I've read this comment thread a few times and it's not clear why the patch has been rejected.

@immarvin immarvin referenced this pull request in xcat2/xcat-core Mar 7, 2016

Closed

[FVT]xCAT in docker :genimage failed #663

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment