Horizon: does not exit if database connection fails #898

Closed
andrenarchy opened this issue Feb 18, 2019 · 10 comments

@andrenarchy

In satoshipay/docker-stellar-horizon#10 it has been observed that Horizon does not exit (with a non-zero exit code) if the database connection fails when it's starting up. Instead the HTTP server starts up but requests fail.
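Until Horizon itself exits on a failed connection, an image could guard against this before launching the server. The following is only a sketch of that idea, not something taken from Horizon or the image: `require_db` and `DB_CHECK` are hypothetical names, and the default probe assumes PostgreSQL's `pg_isready` client tool is installed in the container and picks up the usual `PGHOST`/`PGPORT` environment variables.

```shell
# Hypothetical pre-start guard; DB_CHECK and require_db are illustrative names.
# Assumption: pg_isready is available and configured through PGHOST/PGPORT.
DB_CHECK="${DB_CHECK:-pg_isready -q}"

require_db() {
    attempts="${1:-5}"   # how many times to probe before giving up
    delay="${2:-1}"      # seconds to wait after a failed probe
    i=1
    while [ "$i" -le "$attempts" ]; do
        # DB_CHECK is left unquoted on purpose so it splits into command + args.
        if $DB_CHECK; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "database not reachable after $attempts attempts" >&2
    return 1
}

# Usage in an entrypoint script: require_db 10 2 || exit 1
```

With a guard like this the container exits with a non-zero code when the database never becomes reachable, instead of leaving an HTTP server up whose requests all fail.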

@howardtw
Contributor

howardtw commented Mar 8, 2019

I'm not sure how this can happen, because we do check for necessary migrations before starting the server. Could you tell me how to reproduce it with the CLI? @gracenoah

@gracenoah

This happened on horizon 0.16.0. It looks like the code you are pointing to is way newer than that... I tried to dig farther back into the history of this code, and it seems like the migration check was added before 0.16.0 even though the code looked pretty different: f931bc7

I guess I'll try upgrading and let you know if CI starts failing intermittently with the same issue again.

@howardtw
Contributor

@gracenoah that sounds good to me! Do let me know once you upgrade it and test it out so I can do what's needed to close this issue 😃

@gracenoah

gracenoah commented Mar 18, 2019

Well, I'm running into a new failure mode with the latest version of the horizon image in CI: https://gist.github.com/gracenoah/c1ac539daf724b03554e5f491e022ff6 I'm not sure whether this is an issue with the image, an issue with horizon, or an issue with the way docker compose starts the db in parallel with horizon and lets horizon keep crashing until it manages to start up.

It looks like the postgres image starts postgres briefly to do some sort of initialization. I think the issue is that the gorp_migrations table is created outside of a transaction during the init process, and then init fails but migrate up never gets run.

  1. The short-lived postgres initialization process starts
  2. horizon db init starts and creates gorp_migrations
  3. horizon db init starts performing migrations
  4. postgres finishes and shuts down, aborting the transaction that performed the migrations
  5. horizon tries to start and crashes because the database is not migrated
  6. horizon db init runs again and tries to create gorp_migrations, but fails, presumably because the table already exists
  7. the initialization script assumes that init failed because the migrations had already been performed, but they weren't, so horizon keeps failing to start

I don't know if this is the same error or a new one, but it's way clearer what goes wrong. A workaround can be done in the image, but I do think that horizon db init is too fragile. @andrenarchy let me know if you are interested in merging the workaround I proposed before that covers all of the migration issues. @howardtw please help make this process more reliable.

@howardtw
Contributor

@bartekn does this make sense to you? I don't understand why this can happen: "postgres finishes and shuts down, aborting the transaction that performed the migrations"

@gracenoah

I think this is it: https://github.com/docker-library/postgres/blob/master/docker-entrypoint.sh#L124
It's trying to start postgres without it listening for external connections but somehow the horizon binary is succeeding at connecting to it while it's doing that.

@gracenoah

Hmm.. there was an issue about this not that long ago: docker-library/postgres#440. We use postgres:9.6.11-alpine@sha256:9ca98c730b23ecf4e0f89c4acc070ece194f43032f88e4b89a2bf942cb281b9e and I don't think that's too old, but I'll try upgrading the docker image to try to mitigate this issue for now.

@gracenoah

Oops, I was able to verify that my current version of the image already has this fix:

    pg_ctl -D "$PGDATA" \
        -o "-c listen_addresses=''" \
        -w start

I don't think it'll be easy to fix this by changing the postgres docker image. I would really prefer to have a fix in horizon itself.

@gracenoah

I found a workaround that seems to work. Running db migrate up every time before starting horizon ensures that the initialization process gets completed even if it fails part way through.
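A minimal sketch of what that workaround might look like in an image entrypoint. The `db migrate up` subcommand is the one named above; everything else is an assumption for illustration: `start_horizon` and `HORIZON_BIN` are hypothetical names, and the sketch assumes that running the binary with the remaining arguments starts the server.

```shell
#!/bin/sh
# Hypothetical entrypoint sketch. HORIZON_BIN is an assumed variable so the
# binary can be swapped out (for example in tests); it defaults to "horizon".
HORIZON_BIN="${HORIZON_BIN:-horizon}"

start_horizon() {
    # Run the migrations on every start, so an initialization that was
    # interrupted part way through gets completed before the server runs.
    if ! "$HORIZON_BIN" db migrate up; then
        echo "db migrate up failed; not starting horizon" >&2
        return 1
    fi
    # Assumption: passing the remaining arguments to the binary starts the server.
    "$HORIZON_BIN" "$@"
}

# Usage in a container entrypoint: start_horizon "$@"
```

The point of the design is that the script no longer infers the migration state from whether `horizon db init` failed; it always attempts the migration and refuses to start the server if that step fails.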

@bartekn
Contributor

bartekn commented May 12, 2020

This is a very old issue that seems to be resolved. If not, let me know!

@bartekn bartekn closed this as completed May 12, 2020

4 participants