
Room for improvement on handling failures in migrations when deploying Pycon #55

Closed
dpoirier opened this issue Aug 13, 2015 · 2 comments

Comments

@dpoirier
Contributor

I noticed this yesterday: running migrations during a deploy (highstate) is conditional on the code having changed, which makes a lot of sense given that the highstate runs many, many times a day. However, if the migration fails, that highstate run fails, but subsequent runs don't retry the migration (the code was already updated in the previous run, so it no longer registers as changed), and so they appear to succeed even though things are actually no longer in the proper state.
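For context, the deploy state is set up roughly along these lines (a hypothetical sketch in Salt SLS, not our actual state file; the repo URL, checkout path, and state IDs are placeholders): the migrate command is gated on the git checkout reporting a change, so once the checkout has gone through, a failed migration never gets retried.

pycon_code:
  git.latest:
    - name: https://example.com/pycon.git     # placeholder repo URL
    - rev: staging
    - target: /srv/pycon/pycon                # assumed checkout path

run_migrations:
  cmd.run:
    - name: /srv/pycon/env/bin/python manage.py migrate --noinput
    - cwd: /srv/pycon/pycon
    - onchanges:
      - git: pycon_code                       # only runs when the checkout actually changed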

I'm not sure what the best fix is, though. We could hack up the way we run migrations so they run when the code has changed or when the previous migration failed (keeping track of that somehow), but that's pretty kludgey. And anyway, unless someone has fixed something manually, migrations aren't suddenly going to start working without a code change.

Or we could just bite the bullet and remove the condition, so that Django checks whether any migrations need to run on each highstate. Maybe we should also consider whether deploys should run in a frequent periodic highstate... but at least this way, if something was wrong, each highstate would fail until it was fixed.
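Concretely, that would mean something like the sketch below (same hypothetical state IDs as above): keep the ordering on the checkout, but drop the onchanges gate so manage.py migrate runs on every highstate. Django's migrate is a no-op when nothing is pending, so the extra runs are cheap, and a broken migration keeps failing the highstate until it's actually fixed.

run_migrations:
  cmd.run:
    - name: /srv/pycon/env/bin/python manage.py migrate --noinput
    - cwd: /srv/pycon/pycon       # assumed checkout path
    - require:
      - git: pycon_code           # still ordered after the checkout, but no longer gated on it changing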

Ideally, all the changes in a deploy would happen in a transaction (somehow), so that if anything failed, no changes would take effect. The previous system with Chef was set up that way with regard to the source code, but not the database or the virtualenv, so things could still get out of sync when there was a failure. And I don't think anyone has a great solution for that.

@dpoirier
Contributor Author

We deployed a fix to our staging server on Friday afternoon, but this morning we were still getting errors with tracebacks from the previous code. I verified that the code checked out on the server was the updated code, so apparently the server processes (or at least one of them) were still running the old code.

Here's a bit of the minion log from Friday:

2015-09-18 21:32:09,753 [salt.loaded.int.module.cmdmod][ERROR   ] Command '/srv/pycon/env/bin/python manage.py migrate --noinput && /srv/pycon/env/bin/python manage.py compress --force && /srv/pycon/env/bin/python manage.py collectstatic -v0 --noinput' failed with return code: 139
2015-09-18 21:32:09,757 [salt.loaded.int.module.cmdmod][ERROR   ] stderr: /bin/bash: line 1:  9661 Segmentation fault      /srv/pycon/env/bin/python manage.py migrate --noinput
2015-09-18 21:32:09,757 [salt.loaded.int.module.cmdmod][ERROR   ] retcode: 139
2015-09-18 21:32:09,760 [salt.state       ][ERROR   ] {'pid': 9660, 'retcode': 139, 'stderr': '/bin/bash: line 1:  9661 Segmentation fault      /srv/pycon/env/bin/python manage.py migrate --noinput', 'stdout': ''}

Here's what I think happened:

  • We updated the staging branch
  • The next deploy checked out the updated code
  • A later state segfaulted while trying to run the migrations (I don't know why it segfaulted), so Salt never got to the server-restart step
  • The next time the deploy ran, the code didn't need to be updated, so none of the remaining steps, including the server restart, ever ran (see the sketch after this list)
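To illustrate that last point, the restart is presumably wired up along these lines (again a hypothetical sketch; the service name is a guess), so once the checkout had succeeded in the run that later segfaulted, subsequent runs saw no change and skipped the restart along with everything else:

restart_pycon:
  service.running:
    - name: gunicorn              # assumed service name
    - watch:
      - git: pycon_code           # restart only fires when the checkout changes
    - require:
      - cmd: run_migrations       # and only after the migrate command succeeds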

I still don't have any wonderful ideas for fixing this kind of problem. See my previous comments in this issue.

@ewdurbin
Member

We now deploy the pycon site via Heroku, so this is no longer relevant.
