
Use local DB and clock process in every container #5236

Closed
wants to merge 6 commits into
base: master
from

Conversation

@pmac
Member

pmac commented Nov 1, 2017

Removes our reliance on an external database server. A.K.A:

THE SQLITENING

@pmac
Member

pmac commented Nov 2, 2017

@jgmize I've made some changes that might seem unrelated, but this effort has uncovered some things. For example, the way we handle Git repos for management commands has changed: since the initial population of the DB doesn't have the app config, it could set the wrong remote. We now skip git remotes entirely, since remotes are really just conveniences, and use the git repo URL directly.
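Fetching by URL instead of by remote name could look something like this sketch (the function name, repo paths, and ref handling are illustrative, not bedrock's actual management-command code):

```python
import subprocess

def update_repo(repo_dir, repo_url, ref="master"):
    """Update a checkout by fetching from the repo URL directly,
    ignoring whatever remotes the clone happens to have configured."""
    # Fetching by URL works even when the clone was created under a
    # different config and its named remote points at the wrong place.
    subprocess.run(["git", "fetch", repo_url, ref], cwd=repo_dir, check=True)
    subprocess.run(["git", "checkout", "-f", "FETCH_HEAD"], cwd=repo_dir, check=True)
```

Since FETCH_HEAD is checked out directly, the clone never needs a correctly named remote at all.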

I also further improved the cron health check: it now gets the task names and expected run intervals from the job configs themselves. It's possibly a tad hacky, but it has the advantage that the first run of the health check view sets the last-run file mtimes, so it won't fail immediately on deploy if the app starts before the cron process.

tl;dr: coming along nicely


@pmac
Member

pmac commented Nov 2, 2017

You can see the new fancy task health check on my demo:

https://bedrock-demo-pmac.us-west.moz.works/healthz-cron/


@pmac
Member

pmac commented Nov 3, 2017

Fixes issue #5235


@pmac
Member

pmac commented Nov 3, 2017

Did some testing with a local container and ab. I had the cron process constantly reloading the security advisories data from disk into the db, and ran the following while that was happening:

$ ab -t 120 -c 4 http://localhost:8000/en-US/security/advisories/
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Finished 809 requests


Server Software:        meinheld/0.6.1
Server Hostname:        localhost
Server Port:            8000

Document Path:          /en-US/security/advisories/
Document Length:        302890 bytes

Concurrency Level:      4
Time taken for tests:   120.097 seconds
Complete requests:      809
Failed requests:        0
Total transferred:      246914078 bytes
HTML transferred:       245643298 bytes
Requests per second:    6.74 [#/sec] (mean)
Time per request:       593.807 [ms] (mean)
Time per request:       148.452 [ms] (mean, across all concurrent requests)
Transfer rate:          2007.76 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   119  588 549.5    441    3759
Waiting:      119  412 405.2    301    3758
Total:        119  589 549.5    441    3759

Percentage of the requests served within a certain time (ms)
  50%    441
  66%    564
  75%    671
  80%    751
  90%   1122
  95%   1594
  98%   2471
  99%   3288
 100%   3759 (longest request)

Looks pretty good to me.


@pmac
Member

pmac commented Nov 15, 2017

After talking with @jgmize, we'd like to try another direction. The current implementation has every bedrock container fetching data from all of the sources. This is pretty inefficient, as well as potentially more error-prone, since they're all fetching data across the internet. We should be able to scale down the number of running containers with this change, but it's still not great.

New Proposal

We should have a single container (or Jenkins job) running in a single region that does the DB updates. This container will upload the db file to S3 whenever it changes, naming it after the git-sha of the running bedrock code. Naming it after the git-sha ensures that we don't run into any schema-change issues.

The bedrock containers, meanwhile, will have a process that checks for updated database files on a schedule. When the check returns a new file, it will save it and swap the new db file for the old one via symlinks. It should work something like:

  1. The bedrock SQLite db file is generated during deployment and included in the built image. This db file is named after a hash of its contents and symlinked to bedrock.db, which bedrock is configured to use.
  2. The db-updater process will check for a new db file at the S3 URL, using ETags to fetch it only when it has changed. If the response is a 404, it will do nothing, assuming the updater for this particular git-sha build of bedrock hasn't found any data updates yet.
  3. When it gets a new file, it will name it with the new hash of the database file, check the integrity of the new file, and swap the bedrock.db symlink to the new file.
  4. It will then delete the old database file.
  5. Profit.
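Step 2's conditional fetch can be sketched as a plain conditional GET; the function name and ETag bookkeeping here are assumptions, not the actual updater:

```python
import urllib.error
import urllib.request

def fetch_if_changed(url, last_etag=None):
    """Return (body, etag) when the object changed, else (None, last_etag).
    A 404 means no update has been published for this build yet."""
    req = urllib.request.Request(url)
    if last_etag:
        # S3 answers 304 Not Modified when the ETag still matches.
        req.add_header("If-None-Match", last_etag)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read(), resp.headers.get("ETag")
    except urllib.error.HTTPError as e:
        if e.code in (304, 404):  # unchanged, or no update published yet
            return None, last_etag
        raise
```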

This will mean that we have only a single updater process to monitor, rather than n = len(bedrock_containers). And it will mean that a good, up-to-date database file is publicly available for everyone, which should make our dev environment much easier to set up.
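The hash-naming, integrity check, and symlink swap in steps 3 and 4 might look roughly like this (a sketch; the bedrock-&lt;hash&gt;.db naming and function shape are illustrative):

```python
import hashlib
import os
import sqlite3

def swap_db(db_bytes, data_dir, link_name="bedrock.db"):
    """Write the new DB under a content-hash name, verify it,
    atomically repoint the symlink, then drop the old file."""
    digest = hashlib.sha256(db_bytes).hexdigest()[:16]
    new_path = os.path.join(data_dir, "bedrock-%s.db" % digest)
    with open(new_path, "wb") as fh:
        fh.write(db_bytes)
    # Step 3: verify the download before swapping it in.
    con = sqlite3.connect(new_path)
    ok = con.execute("PRAGMA integrity_check").fetchone()[0]
    con.close()
    if ok != "ok":
        os.remove(new_path)
        raise ValueError("downloaded db failed integrity check")
    link = os.path.join(data_dir, link_name)
    old_target = os.path.realpath(link) if os.path.islink(link) else None
    # Swap via a temporary symlink + rename so readers never see a
    # missing or half-written bedrock.db.
    tmp_link = link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(new_path, tmp_link)
    os.replace(tmp_link, link)
    # Step 4: delete the superseded database file.
    if old_target and old_target != os.path.realpath(new_path) and os.path.exists(old_target):
        os.remove(old_target)
    return new_path
```

The rename-over-symlink is what makes the swap safe for a running app: bedrock's open connections keep their old file handle, and any new connection sees a complete database.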


pmac added some commits Oct 30, 2017

* Use local DB and clock process in every container
  Removes our reliance on an external database server. Adds a /healthz-cron/ URL that will 500 if no data updates have happened in more than 10 minutes.
* Make cron health check more comprehensive
  Will now check and report on each cron task individually. Also includes the data git repos for prod-details, security advisories, and release notes in the docker image.
* Git repos avoid using remotes
  Git repos could end up in a strange state when their initial setup happened under a different config than the running container. This fixes that by using repo URLs directly.
* Get cron config for health check from cron file
  cron.py will now print a CSV of its config to a tmp file at startup, which the health check view will read.
* Move to a multistage dockerfile and build
  Moves us from 4 dockerfiles to 1 \o/
@pmac
Member

pmac commented Dec 6, 2017

Closing in favor of #5334


@pmac pmac closed this Dec 6, 2017
