
[WIP] Docker Compose Update #57

Closed · seoester wants to merge 5 commits

Conversation

@seoester (Contributor) commented Feb 3, 2018

This PR contains two new docker-compose.yml files, one for the backend (mariadb and redis) and one for Promgen, Prometheus, etc.

The docker docs are also updated to describe how to use Docker Compose to quickly get started with promgen.

I split the docker-compose.yml file into two so that the backend can be started separately, and Promgen is only started once the backend is ready. A better solution would be to integrate wait-for-db functionality into the promgen image; however, I have not looked into this yet.
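
As a rough sketch of what such a wait-for-db step could look like (an illustration only; the host name, port, and the assumption that the mysql client tools and credentials are available in the image are not part of this PR):

    # Sketch: block until MySQL accepts connections, then exec the requested command.
    # "mysql" and "3306" are placeholders matching the compose service defaults;
    # credentials are omitted for brevity.
    until mysqladmin ping --host=mysql --port=3306 --silent; do
        echo "waiting for database..."
        sleep 2
    done
    exec "$@"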

The old compose file has been split into two: one manages only
the backend (databases: mysql (mariadb) and redis); the other
contains promgen (web and worker) as well as prometheus, the blackbox
exporter, and alertmanager.

This split makes it possible to start the backends first and wait
until their setup finishes; only then is promgen started.
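
For illustration, the startup sequence could then look like this (the compose file names here are assumptions, not necessarily the ones introduced by this PR):

    # 1. Bring up the backend stack (mariadb and redis) and let it initialize.
    docker-compose -f docker-compose-backend.yml up -d

    # 2. Once the databases are ready, bring up promgen, prometheus, etc.
    docker-compose -f docker-compose.yml up -d
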
@codecov-io commented Feb 3, 2018

Codecov Report

Merging #57 into master will increase coverage by 0.66%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master      #57      +/-   ##
==========================================
+ Coverage   47.14%   47.81%   +0.66%     
==========================================
  Files          76       72       -4     
  Lines        2507     2405     -102     
==========================================
- Hits         1182     1150      -32     
+ Misses       1325     1255      -70
Impacted Files Coverage Δ
views.py
notification/user.py
notification/linenotify.py
migrations/0013_auto_20161027_0307.py
celery.py
discovery/default.py
util.py
shortcuts.py
migrations/0007_auto_20161020_0253.py
migrations/0029_auto_20170406_0236.py
... and 138 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fef9b9c...2a9e3da.

@seoester seoester mentioned this pull request Feb 4, 2018
@kfdm (Member) commented Feb 5, 2018

I suppose my biggest question (again due to my lack of knowledge regarding docker best practices) is why to split things up into two docker-compose files. My initial impression was that the docker-compose file gave a base starting point, as a demo or to get started, but for most production deployments it would always require some adjustments. It's unfortunate that docker does not have a built-in way to wait for a container, and Django also does not have a better way to reconnect, so perhaps keeping it as a single docker-compose file and including a wait-for-it type check may be the way to go.

Previously I used the sentry onpremise script as a base, since Sentry is also a django application
https://github.com/getsentry/onpremise/blob/master/docker-compose.yml

Looks like there may be a default way built into newer docker versions but I have not tested it myself
https://stackoverflow.com/questions/31746182/docker-compose-wait-for-container-x-before-starting-y/41854997#41854997

@seoester (Contributor Author) commented Feb 5, 2018

I agree that applications (and docker containers especially, as examples of microservices) should be resilient against failures of external dependencies / services.

I've read both that django supports database restarts and that django stalls when the database is shut down and then started again.
I would like to test this out, but might not get around to it this week. Until then, I'd like to keep this PR open.

By splitting into two files, the backend can be treated as an external service which is assumed to always be available.

HEALTHCHECK is not an adequate method for waiting on other containers; it is meant for monitoring and automated life-cycle management.

@seoester (Contributor Author) commented

Hey, long time without any progress... But I finally got around to testing out some scenarios of db unavailability :) and mostly, it seemed to work out of the box. Combined with some additional wait-for-db functionality and the new docker-compose-bootstrap command from #92, we get a decent docker-compose configuration!

Together with some further changes, I'd like to finally remove the WIP label from this PR next week.

I'll paste my raw notes on different scenarios:

Experiments with promgen and db unavailability

Tested using the docker compose file with a recent version of the Dockerfile and

Context: Set-up and web app and worker running

  • mysql
    • When taken down
      • Web app returns error on request.
      • Worker is not impacted directly.
    • When coming back up
      • Web app works again (serves requests).
      • Worker is not impacted directly.
    • Unclear: Effect of CONN_MAX_AGE (django docs) but does not seem to affect the behaviour described above.
  • redis
    • When taken down
      • Requests requiring queue interaction hang forever
      • Celery enters retry loop (with backoff), worker exits (code 1) after a while
    • When coming back up
      • Hanging requests recover
      • If worker process has not exited yet, celery recovers

Context: Set-up and web app and worker not running

  • mysql
    • When down
      • Web app does not start properly, OperationalError is raised during system checks
      • HTTP listener is not started
      • Exits after a while? (not within 10 min)
      • Note: What check tries to connect to the database? Seems to be a generic field check: source code django.db.models.fields method _check_backend_specific_checks (Error message indicates self.features.supports_microsecond_precision)
      • Worker starts up, no errors logged. Presumably it would have problems performing configuration write tasks, but those are unlikely to be issued as the web app is unavailable. (Task retry mechanism?)
    • When coming back up
      • Web app does not re-check connection. Doesn't recover.
      • No direct impact on worker.
  • redis
    • When down
      • Web app starts up correctly, see above section (Context: Set-up and web app and worker running)
      • Worker starts but does not start up celery worker
        • No retrying visible in the logs
    • When coming back up
      • For web app, see above section (Context: Set-up and web app and worker running)
      • Worker starts celery as soon as redis is available

@seoester (Contributor Author) commented

This would be much improved by pending PR #95.

@@ -16,6 +16,17 @@ web)
shift
set -- gunicorn "promgen.wsgi:application" "$@"
;;
wait-runserver)
Member

Instead of using a separate wait-runserver block, I wonder if this would be better added to the existing 'web' block above here. Runserver is sufficient for development but perhaps the docker container should use gunicorn.

Member

I realized that my Dockerfile entrypoint uses web though my original docker-compose uses runserver. It's probably best to make both use web, since runserver is mostly for development use. What do you think?

I also thought a bit about having the promgen check ; sleep check apply to all of the commands by putting it at the top, but there are also likely valid cases for not forcing that check while debugging the system. Do you have any thoughts about this as well?

Contributor Author

gunicorn is running a WSGI server, so this would require an additional container running nginx (or similar) to be able to serve HTTP requests.

I think it is sensible to have two different entrypoints, both with a wait mechanism:

  • An entrypoint starting up an HTTP server using the runserver command.
  • An entrypoint starting up a WSGI server using gunicorn.

I think commands should be considered individually when deciding whether to wait for the db. With one-off commands run manually, it is in the user's interest to get an immediate response. For example, worker does not really need to wait for the db, as its basic functionality does not depend on the availability of mysql, and resilience to redis outages is already built into celery.

Perhaps for some one-off commands it would be useful to have optional wait-for-db functionality, for example when running them from a docker-compose file. In this case either a new wait-… command could be introduced using case fallthrough, or an env var could be checked to decide whether to wait for the db or try immediately.
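
A minimal sketch of the env var variant in the entrypoint (PROMGEN_WAIT_FOR_DB is a hypothetical variable name and the retry interval is arbitrary; only promgen check comes from the existing discussion):

    # Sketch: optionally wait for the database before handling the requested command.
    # PROMGEN_WAIT_FOR_DB is a hypothetical opt-in variable, not part of this PR.
    if [ -n "$PROMGEN_WAIT_FOR_DB" ]; then
        until promgen check; do
            echo "waiting for database..."
            sleep 5
        done
    fi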

Member

gunicorn can speak HTTP, so nginx is not required, though it is often used in a production environment. I do not have a strong opinion, since I tend to consider docker-compose a bit more of a demo than a fully production-ready environment. When deploying for real, it's likely that most would want to tweak their setup a bit.

Maybe best to keep things simple in this pass. Can always add additional wait- commands or make it global if required in the future.

Contributor Author

Hm, what do you think about these entrypoints, both protected by promgen check:

  • web - gunicorn running the application. Used as the default for docker-compose.
  • web-dev - django runserver. Meant to be used for testing and development.

web could then be used for a more production-like setup together with nginx. Additional documentation could be compiled on how to achieve this (e.g. using two containers sharing a volume for hosting the static files).
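
As a sketch, the two entrypoints could slot into the existing case statement roughly like this (only an illustration of the proposal; the exact runserver invocation and retry interval are assumptions):

    web)
        # Wait for the database, then run gunicorn (production-style default).
        until promgen check; do sleep 5; done
        shift
        set -- gunicorn "promgen.wsgi:application" "$@"
        ;;
    web-dev)
        # Wait for the database, then use Django's runserver for development.
        # "promgen runserver" is assumed here to mirror the existing CLI wrapper.
        until promgen check; do sleep 5; done
        shift
        set -- promgen runserver "$@"
        ;;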

Member

Sounds reasonable. I think those names are more descriptive than wait-runserver for those who may not be as familiar with django directly.

PROMGEN_REGISTER_HOST: prometheus
PROMGEN_REGISTER_PORT: 9090
DATABASE_URL: mysql://root:secretrootpassword@mysql:3306/promgen
SECRET_KEY: some-secret-key-AhvSh+xGkm9kA,ghfi4?w8K
Contributor Author

I'm not familiar with django commands and best practices: should this stanza also include DEBUG: 1 and the CELERY_BROKER_URL? Is SECRET_KEY even required?

Member

DEBUG would only be necessary for times when you want extra debugging information, typically just for development. Could drop it from the docker-compose file and be fine.

SECRET_KEY in theory is only required for http but I believe the django startup code errors out if it is not set somewhere. Could set a default SECRET_KEY again in settings.py so that we could remove it from the docker-compose file.

The easiest way to get started is by using
`docker-compose <https://docs.docker.com/compose/>`__.

Promgen ships with two ``docker-compose.yml`` files:
Member

Since the two files were combined, I think these docs need to also be updated :)

Contributor Author

Yeah, true 😄 I'll update the documentation when this PR is about to settle.

@github-actions bot commented Nov 8, 2019

This pr is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days

@github-actions github-actions bot added the Stale label Nov 8, 2019
@seoester (Contributor Author) commented

I'm interested in revisiting this, as we are finally deploying Promgen in a Docker Swarm.
I'm closing this now as the numerous changes to Promgen and new Docker versions require a re-write of the compose file.

@seoester seoester closed this Nov 11, 2019