Docker Swarm update killing old containers too early before container is fully initialized #35881
Have you tried the `--update-delay` option?
@chanwit your suggestion is based on a time delay. I need Docker Swarm to be notified that the container is "initialised and ready" only after my `myinit.sh` script has finished.
@thaJeztah Is there another way to achieve this? Perhaps "the Docker" way? Notifying that a container is ready and initialised seems like something that should go without saying.
So, first of all: scaling your service would resolve the primary issue. If you only have a single instance of a service, you have no redundancy, so when that one instance goes down, your service is down.
This really sounds like the wrong way of doing things: how are you able to tell if your container will work at all? Fetching/updating the code for your service at runtime is risky, and throws away one of the major advantages of using containers: providing a reproducible environment for your service. PHP is an interpreted language, but doing a `git pull` at runtime is not that different from compiling your application at runtime.

Containers should be treated as immutable (where possible), and instead of fetching the source code at runtime, this should be done beforehand, during `docker build`. Building an image from source (in your case: adding PHP code, fetching dependencies, etc.) allows you to (build,) test, and run your image before you deploy it: you can verify the image, have it scanned for vulnerabilities, and know exactly what code will be deployed. It also allows you to revert to a previous version of your code by using a previous version of the image. And if you use multi-stage builds, you can keep your images minimal, and keep tools that are only needed during build out of the final image (reducing the risk of deploying an image that contains vulnerabilities), for example:

# the build-stage
FROM some-image AS build-stage
RUN apt-get update && apt-get install <some build tools>
# copy your site's source files to the image (assuming the Dockerfile
# is kept in source control together with your PHP source - doing so
# doesn't require you to `git clone` your source code, and makes
# the build predictable.)
COPY . /src
RUN build your site, cleanup things, optimize, etc
# start with a clean php/apache image for the final image
FROM php:7-apache
# add the site to this image
COPY --from=build-stage /site /var/www/html
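With an image built that way, a deploy is just a matter of pushing a new image and updating the service; roughly (the registry, image name, and tag below are placeholders, not taken from the thread):

docker build -t registry.example.com/mysite:1.0.1 .
docker push registry.example.com/mysite:1.0.1
docker service update --image registry.example.com/mysite:1.0.1 mysite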
I have a build script that installs all the required software.
That's what the healthcheck is for. If, after the init script is complete (and it signals to Docker that the container is ready and initialised), the healthcheck fails, then you know the container failed. Quite simple.
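A minimal sketch of that signalling pattern (the flag-file path and the `apache2-foreground` hand-off are assumptions for illustration; `myinit.sh` is the script named in the original issue): the healthcheck simply looks for a file that the init script writes as its last step.

FROM php:7-apache
COPY myinit.sh /usr/local/bin/myinit.sh
# only report healthy once the init script has written its "done" flag
HEALTHCHECK --interval=10s --timeout=3s --retries=3 --start-period=120s \
  CMD test -f /tmp/init-done || exit 1
CMD ["/usr/local/bin/myinit.sh"]

# at the end of myinit.sh (sketch):
#   touch /tmp/init-done
#   exec apache2-foreground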
You mentioned increasing the replica count. That doesn't do anything, because Docker Swarm kills off all the old containers and starts new containers before they are fully initialised.
Your analogy comparing PHP and a compiled binary is also not quite the same, because I need to pull a different PHP codebase based on the environment variable. If I could docker-build based on environment variables, that would be amazing, but despite numerous complaints such as #6822 (comment), I accept Docker's reasoning as sound. So your suggestion of doing everything at the Docker build stage is not adequate.
Why is it different? You're building an application (website) from a different code base; the only difference with a compiled binary is that it doesn't "compile" a binary. If your code uses (e.g.) composer, or (idk) generates stylesheets, you're building an application.
Swarm does not help you there; it will create a new instance of the service if the service becomes unhealthy, but during that time, the service will be down (because there's no redundancy)
A health check checks if the service is "healthy", but it's not a replacement for testing that the application works without issues; a file may be missing, a bug may be in one of your .php files, and the only instance of exactly that version of the code is now in that container, not in any image.
Have you tried the `--update-order=start-first` option?
hm, yes, looks like documentation is sparse on that; I opened an issue for that: docker/cli#795
Isn't that the default? In any case, that's of no use to me.
No, it's not the default: not all use-cases can handle multiple instances being started; the default is "stop-first"; see the output of `docker service create --help`.
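One way to check which order an existing service is configured with (a sketch; the service name is a placeholder, and it assumes an update config has been set on the service):

docker service inspect --format '{{.Spec.UpdateConfig.Order}}' my-service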
Can you elaborate why it's of no use to you? Because I think this is what you're asking for. Here's a simple example.

Build version 1 of our image:

docker build -t myapp -<<EOF
FROM nginx:alpine
RUN echo 'VERSION 1' > /usr/share/nginx/html/index.html
EOF

Deploy the service, with a health-check (this healthcheck will only become "healthy" after a number of tries, illustrating your "installation" steps of the container):

docker service create --name=example-start-first \
--health-cmd='if [ ! -f "/count" ] ; then ctr=0; else ctr=`cat /count`; fi; ctr=`expr ${ctr} + 1`; echo "${ctr}" > /count; if [ "$ctr" -gt 2 ] ; then exit 0; else exit 1; fi' \
--health-interval=10s \
--health-timeout=3s \
--health-retries=3 \
--health-start-period=60s \
--update-order=start-first \
--update-parallelism=1 \
-p8080:80 \
myapp:latest

Meanwhile, in another shell, try connecting to the service (using `watch` to repeat the `curl` request):

Every 2.0s: curl localhost:8080
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (7) Failed to connect to localhost port 8080: Connection refused

After 40 seconds or so, the service becomes healthy, and the connection succeeds, showing VERSION 1.
Now, build a new version of the image (to illustrate the new version being deployed):

docker build -t myapp -<<EOF
FROM nginx:alpine
RUN echo 'VERSION 2' > /usr/share/nginx/html/index.html
EOF

And update the service (I tagged both images `myapp:latest`, so `--force` is needed to redeploy the same image reference):

docker service update --force example-start-first
In the second shell (which is still running the `watch` + `curl` loop), the old instance keeps serving traffic:

Every 2.0s: curl localhost:8080
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0100 10 100 10 0 0 1381 0 --:--:-- --:--:-- --:--:-- 1428
VERSION 1

Until (after 40 seconds) the updated instance becomes healthy, at which point traffic is no longer routed to the old instance, but cut over to the new one:

Every 2.0s: curl localhost:8080
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0100 10 100 10 0 0 1410 0 --:--:-- --:--:-- --:--:-- 1428
VERSION 2

At that point, the original instance is stopped:

docker service ps example-start-first
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
kxdtb4p9lzfp example-start-first.1 myapp:latest linuxkit-025000000001 Running Running about a minute ago
3a6448vf2ybg \_ example-start-first.1 myapp:latest linuxkit-025000000001 Shutdown Shutdown 58 seconds ago
Thank you for your detailed assistance, but unfortunately it still isn't applicable to me, because you did a second docker build to represent the updated code base. My repo path for the code base changes entirely, hence it's defined by an environment variable. The only way to reconcile your method is to:
Correction:
The different images are just for illustration; updating an environment variable is even easier, as it wouldn't require a new image to be built. Take this image instead:

docker build -t myapp -<<EOF
FROM nginx:alpine
ENV VERSION=default
CMD echo hello \$VERSION > /usr/share/nginx/html/index.html; exec nginx -g 'daemon off;'
EOF

Deploy the service, using `--env` to pass the version:

docker service create --name=example-start-first \
--env=VERSION=1 \
--health-cmd='if [ ! -f "/count" ] ; then ctr=0; else ctr=`cat /count`; fi; ctr=`expr ${ctr} + 1`; echo "${ctr}" > /count; if [ "$ctr" -gt 2 ] ; then exit 0; else exit 1; fi' \
--health-interval=10s \
--health-timeout=3s \
--health-retries=3 \
--health-start-period=60s \
--update-order=start-first \
--update-parallelism=1 \
-p8080:80 \
myapp:latest

Verify that the service is running (i.e. `curl localhost:8080` prints "hello 1").
Update the service, updating the `VERSION` environment variable:

docker service update --env-add=VERSION=2 --force example-start-first

And, after 40 seconds, see that it prints "hello 2":

Every 2.0s: curl localhost:8080
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0100 8 100 8 0 0 1309 0 --:--:-- --:--:-- --:--:-- 1333
hello 2
Yea, maybe said a different way on your desire to have zero downtime of a service update @pjebs: if you had a healthcheck defined, Swarm would only consider a new task ready once that healthcheck reports healthy, i.e. once your init script has finished.
To ensure that when you do a service update the old container isn't stopped before the new one is healthy, set the update order to start-first.
If you have those two things set in the stack file (or on the service), the update should happen without downtime.
Does this make sense? Have you tried it this way?
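A minimal stack-file sketch of that combination, assuming compose file format 3.4+ (needed for start_period and order); the image name and healthcheck command are placeholders:

version: "3.4"
services:
  web:
    image: myapp:latest
    healthcheck:
      # placeholder check; use something that only succeeds once your init script has finished
      test: ["CMD-SHELL", "curl -f http://localhost/ || exit 1"]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 60s
    deploy:
      replicas: 2
      update_config:
        parallelism: 1
        order: start-first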
I'll be looking into this issue and the submitted solutions in a few weeks. Got eye strain issues and can't use a computer much.
@thaJeztah I tried out your solution. It's definitely the correct approach. What's your suggestion on how to automatically delete the old image? After rebuilding the image with a later version of the PHP application, the old image needs to be deleted.
@pjebs you'd need to use the prune command on each node, which you can do from a service once a day with something like:
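As a sketch of the idea (the service name, image, schedule, and filter below are assumptions, not necessarily what was originally posted), a global service that mounts the Docker socket and re-runs a prune roughly once a day could look like:

docker service create \
  --mode global \
  --name image-prune \
  --restart-delay 24h \
  --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
  docker:cli docker image prune -af --filter "until=24h"

Because the container exits after pruning, the default restart policy starts a new task after the configured delay, so each node gets cleaned up about once a day.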
This seems relevant: |
Is there a reason build ARGs aren't used in this scenario? They seem like the exact use case. |
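For reference, a rough sketch of the build-arg approach being suggested, assuming the thing that varies per environment is the branch (or repository) to clone; all names below are placeholders:

# Dockerfile (sketch)
FROM php:7-apache
RUN apt-get update && apt-get install -y git
ARG CODE_BRANCH=master
RUN git clone --branch "$CODE_BRANCH" https://example.com/your-repo.git /var/www/html

# build one image per environment:
#   docker build --build-arg CODE_BRANCH=staging    -t myapp:staging .
#   docker build --build-arg CODE_BRANCH=production -t myapp:production .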
This is a great way to do a docker prune! @BretFisher I'm using a standard cron at the moment.
If you don't want to update the container's command, or want to create a "cron" for minimal containers that don't have a shell, you can use the service's restart policy as a simple scheduler.

For example, the example below creates a service for which Docker spins up a new task every 30 seconds. The container's command is not a long-running process, so the container will exit directly after. Docker (SwarmKit) will notice the container exits, and because of that, retries after 30 seconds (the restart delay). When deploying the service from the command line, use the `--restart-delay` option:

docker service create \
--detach \
--restart-delay=30s \
--name cronnie \
busybox date '+%Y-%m-%d %H:%M:%S doing my thing'

Checking the logs for the service above shows something like:
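To watch those repeated runs, follow the service logs; each completed task prints one timestamped "doing my thing" line:

docker service logs --follow cronnie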
Hi,
I'm updating a web service behind Traefik. If I hit the webpage during this step ... I get one bad request (404), and then it's all good. I feel the issue is that my reverse-proxy (Traefik 1.6.2) doesn't have the time to catch up: when the request hits the previous service (404), Traefik refreshes all services and finds the newest VIP address, and the next request is properly served. Is there a way to force Traefik to update the VIP "table" at this very moment?
My service has a DOCKERFILE which looks like this at the end:
The webserver is initialized at the end of the initialization script, after many other things are done beforehand (such as git-pulling the new code for my PHP project).
I have noticed that
docker swarm update --force XXX
seems to kill the old container and assume the new container is good to go BEFORE the myinit.sh
script is finished. This means my website has a blackout for a minute or two until the init script finishes loading everything.
How can I solve this?
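For context, a hypothetical sketch of the pattern described above (file names and paths are illustrative; only myinit.sh is taken from the issue text):

# Dockerfile tail (hypothetical reconstruction, not the author's actual file)
FROM php:7-apache
COPY myinit.sh /usr/local/bin/myinit.sh
RUN chmod +x /usr/local/bin/myinit.sh
CMD ["/usr/local/bin/myinit.sh"]

# myinit.sh (sketch): fetch and prepare the code at runtime, start the webserver last
#   git clone "$REPO_URL" /var/www/html    # repository selected via an environment variable
#   ...other setup steps...
#   exec apache2-foreground                # the site only starts serving here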