Add note about recovery in docker containers #6
Comments
Yeah that's a bummer, just found that out. I think there is no easy solution to that. When the Mesos slave is run the normal way (without a Docker container), the executor processes are switched to the init process (pid 1). To my understanding this is not possible when a containerized process dies, because it would violate the whole container idea. However, it may be possible to create a Mesos executor image (basically the Mesos slave image but without the mesos-slave ENTRYPOINT) and replace the mesos-executor binary/script of the Mesos slave image with a script that runs the Mesos executor image. Since Mesos slave is already spawning Docker containers on the host system ( At least the Task containers launched by the Mesos slave container don't die - I've tested this. They just get killed when the Slave container is restarted, because it cannot find a running executor. The Executor containers probably need to be started with Sounds hacky, but maybe it will work. I'll try that. |
@bobrik I did it. Basically, with the method I mentioned in my last post. Roughly explained: Now it is possible to restart the mesos-slave Service without losing/killing the executor (it's a Docker container on the host machine!), and on restart mesos-slave finds the executor and reconnects it. Yay! I'll post the code here ASAP. |
The solution I came up with: https://github.com/ederst/docker-containers/tree/master/mesos/dockerfile-templates |
Is this still an issue with 0.25.0? As I understand, this was fixed in 0.23.0? |
Good question. AFAIK we never implemented my somewhat hacky solution. |
Well, I was asking because #20 was closed, which references this issue. I'm not really sure if using the |
@tobilg I did mention this issue in #20. Unfortunately it still isn't working for me even with 0.25.0. Did you manage to run mesos-slave with Docker container recovery (e.g. after a slave restart/failure)? If yes, I'd really appreciate it if you could share how you start Mesos on CoreOS. |
@gregory90 to run the slave so that it can recover correctly, you need to both 1) set the right flags when you run the Mesos slave Docker container and 2) set the right flags when you start the slave.
Basically, the flag allows the Mesos slave to launch all executors in Docker containers as well, so when mesos-slave restarts it can find the running executors in their containers and reattach to them. |
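The two sets of flags described above can be sketched as a small Python helper that assembles the `docker run` command line. The comment does not name the flags, so everything here is an assumption: the agent flag `--docker_mesos_image` is taken from the Mesos agent configuration documentation, the two volume mounts are the ones mentioned elsewhere in this thread, and the image name and master address are hypothetical placeholders. It also assumes the image's ENTRYPOINT is `mesos-slave`, so arguments after the image name go to the slave.

```python
import shlex


def agent_docker_run_args(agent_image, master):
    """Build an argv list for running a Mesos slave inside Docker (sketch)."""
    # 1) Flags for `docker run` itself: give the slave container access
    #    to the host's Docker daemon and Docker binary (assumed paths).
    docker_flags = [
        "--net=host",
        "-v", "/var/run/docker.sock:/var/run/docker.sock",
        "-v", "/usr/bin/docker:/usr/bin/docker",
    ]
    # 2) Flags for mesos-slave: --docker_mesos_image (assumed to be the
    #    flag meant above) tells the Docker containerizer that the slave
    #    itself runs in a container, so executors get their own containers.
    agent_flags = [
        "--master=" + master,
        "--containerizers=docker,mesos",
        "--docker_mesos_image=" + agent_image,
    ]
    return ["docker", "run", "-d"] + docker_flags + [agent_image] + agent_flags


if __name__ == "__main__":
    # Hypothetical image name and ZooKeeper address, for illustration only.
    args = agent_docker_run_args(
        "example/mesos-slave:0.25.0", "zk://zk.example:2181/mesos"
    )
    print(" ".join(shlex.quote(a) for a in args))
```

Building the command as a list keeps the two groups of flags visibly separate, which is the distinction the comment draws; whether these exact flags are sufficient for recovery on a given setup is not confirmed by this thread.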
@tnachen what I'm trying to say is that I'm still having problems with container recovery on Mesos with those settings. I'm still not sure if it's a misconfiguration on my side. I've provided all the information I could think of in https://www.mail-archive.com/user@mesos.apache.org/msg04983.html . Is there something suspicious? |
@gregory90 so looks like the container was sigtermed? what shows up on the host when you do docker ps -a, and also docker inspect on the finished docker container? |
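As a rough aid for reading the `docker inspect` output asked for above, here is a minimal Python sketch that classifies how a container ended. It assumes the usual `docker inspect` JSON layout (a list with one object holding a `State` entry with `Running`, `ExitCode` and `OOMKilled`) and the common shell convention that a process killed by signal N reports exit code 128 + N; a process that handles SIGTERM itself may exit with any code, so this is a hint, not proof. The function name is made up for illustration.

```python
import json

SIGTERM_EXIT = 128 + 15  # conventional exit code after an unhandled SIGTERM
SIGKILL_EXIT = 128 + 9   # conventional exit code after SIGKILL


def summarize_exit(inspect_json):
    """Return (exit_code, reason) from the JSON text of `docker inspect <id>`."""
    state = json.loads(inspect_json)[0]["State"]
    code = state.get("ExitCode")
    if state.get("Running"):
        return code, "still running"
    if state.get("OOMKilled"):
        return code, "oom-killed"
    if code == SIGTERM_EXIT:
        return code, "sigterm"
    if code == SIGKILL_EXIT:
        return code, "sigkill"
    if code == 0:
        return code, "clean exit"
    return code, "exited with error"


if __name__ == "__main__":
    # Hand-written sample, not output from the setup discussed here.
    sample = '[{"State": {"Running": false, "ExitCode": 143, "OOMKilled": false}}]'
    print(summarize_exit(sample))
```

In practice the input would come from something like `docker inspect <container-id>` run on the host against the finished container.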
@tnachen docker ps -a: https://gist.github.com/gregory90/3aa141ed6acb56f02ee1 |
I just tested it with the official Removing this option enables the task deployment again. Current config:
|
@tnachen I just tried it on boot2docker on OSX (thinking the problem was CoreOS specific) and can't get Docker recovery to work.
|
The problem with the Docker executor container is that it doesn't have the appropriate mounts for Docker files, like /var/run/docker.sock:/var/run/docker.sock and /usr/bin/docker:/usr/bin/docker, and is thus not able to launch Docker tasks.
I couldn't find a way to specify appropriate mounts for executor containers so far. |
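One way to confirm the missing-mounts diagnosis is to check the executor container's `docker inspect` output for the two bind mounts named above. The sketch below is an illustration only: it assumes the inspect JSON exposes a `Mounts` list with `Source` and `Destination` keys, which holds for newer Docker releases (older ones reported volumes in a different layout, so the key name should be verified), and the function name is hypothetical.

```python
import json

# The two host paths named in the comment above, mapped to the same path
# inside the container.
REQUIRED_MOUNTS = {
    "/var/run/docker.sock": "/var/run/docker.sock",
    "/usr/bin/docker": "/usr/bin/docker",
}


def missing_docker_mounts(inspect_json, required=REQUIRED_MOUNTS):
    """Return the required host paths that are not mounted as expected."""
    container = json.loads(inspect_json)[0]
    # Map host source path -> container destination for every mount found.
    present = {
        m.get("Source"): m.get("Destination")
        for m in container.get("Mounts") or []
    }
    return sorted(src for src, dst in required.items() if present.get(src) != dst)


if __name__ == "__main__":
    # Hand-written sample: only the socket is mounted.
    sample = json.dumps([{
        "Mounts": [
            {"Source": "/var/run/docker.sock",
             "Destination": "/var/run/docker.sock"},
        ]
    }])
    print(missing_docker_mounts(sample))
```

An empty result would mean both mounts are present; anything else lists what the executor container lacks.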
@bobrik I don't think that this solves the problem. If I understand correctly, Mesos doesn't provide a way yet to even specify additional options together with the |
Ah, right, you are trying to make the executor work in a separate container. I'd prefer having MESOS-3573 resolved. |
@tnachen Great to see a flag like
According to the logs it seems that the This is because the Mesos slave Docker image does not include a Docker binary (my original slave container mounts that binary to Without the Edit: aesthetic changes... Edit2: Adding something like this to the code of docker.cpp (where the slave starts the executor in the container) would probably help:
(flags.docker == Edit3: It probably would work with the Images of mesoscloud/mesos-slave since a docker binary is included in those. However there are no images for Mesos >=0.25* and having a different version of a Docker binary accessing a registry on the host via a mounted socket does not always work, as far as I remember (some versions of Docker requiring a certain version of the registry, etc.). |
I have implemented my change (adding the binary volume mount to the executor) and compiled it to test it. With this change, the executor container is able to find the docker binary (duh), but another problem arises: it seems that fetching is not triggered anymore. What this means is: we are using Mesos in combination with the Jenkins Mesos Plugin, and the fetcher should download the slave.jar (and also some other stuff defined in "Additional URIs"), but this does not happen. When looking at the stdout/stderr outputs in the sandbox, it seems that the fetching stuff is never called:
Without the
It looks like the executor containerization does not work as expected. |
From the comments seems like the |
@asridharan Seems to me that the skipped fetcher step is resolved by issue https://issues.apache.org/jira/browse/MESOS-4249 (version 0.28.0). I have not tested it, but it looks good. Still, the only issue which seems to remain is that the Docker container running a Mesos agent would need a Docker executable, either installed in the corresponding Docker image (which could lead to incompatibilities with the mounted-in Docker socket) or mounted in from the host. The latter would require a code change in
Which would mount the Docker executable from the host to the Mesos agent container by setting the path to the Docker container with Maybe I should open this issue at https://issues.apache.org since this seems to be the wrong place for this. |
The Docker container running the Mesos executor needs not only the docker binary; other host libraries might be needed too, such as:
|
Since all executors run in the same container as mesos-slave, they all die during an upgrade. This results in an inability to reconnect to executors during upgrades, which leads to all running tasks being wiped.
This should be mentioned on the Docker Hub page.
If there is a solution, I'd love to know about it.