Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fail fast during sourced initialization if a component breaks #147

Closed
dpordomingo opened this issue Jul 16, 2019 · 9 comments · Fixed by #186

Comments

@dpordomingo
Copy link
Member

commented Jul 16, 2019

If a component fails during source{d} CE initialization (before opening the browser, during the spinner spins), the process should fail

@dpordomingo

This comment has been minimized.

Copy link
Member Author

commented Jul 16, 2019

I'd say that the process should fail if any of the source{d} CE components is not up, but ghsync and gitcollector components, that can be either up or Exited(0)

@dpordomingo

This comment has been minimized.

Copy link
Member Author

commented Jul 21, 2019

a normal sourced status output would be like:

Name                   Command                    State         Ports
-------------------------------------------------------------------------------
srcd-x_bblfsh-web_1    /bin/bblfsh-web -addr ...  Up            9999->8080/tcp
srcd-x_bblfsh_1        /tini -- bblfshd           Up            9432->9432/tcp
srcd-x_ghsync_1        /bin/sh -c sleep 10s  ...  Exit 0
srcd-x_gitbase_1       ./init.sh                  Up            3306->3306/tcp
srcd-x_gitcollector_1  /bin/dumb-init -- /bi ...  Exit 0
srcd-x_metadatadb_1    docker-entrypoint.sh  ...  Up            5433->5432/tcp
srcd-x_postgres_1      docker-entrypoint.sh  ...  Up            5432->5432/tcp
srcd-x_redis_1         docker-entrypoint.sh  ...  Up            6379->6379/tcp
srcd-x_sourced-ui_1    /entrypoint.sh             Up (healthy)  8088->8088/tcp

If any component (not being ghsync nor gitcollector) appears as Exit, it will be a reason to fail fast.
But what about if a component takes too much time as Restarting (what could mean the component is in a restart boot loop)? Or what about if when checking the status of the components, it is detected an Exit 1 in one component? Should it fail fast? or should it wait till docker tries to restart the component?
Should we implement a grace period, and fail not fast, but when the grace period ends?

I'm not sure if there is a clean way to proceed in the examples above.
I'd vote to just show the sourced status till the UI is available, and let the user decide how to proceed in any of the situations proposed above.

@se7entyse7en se7entyse7en self-assigned this Jul 30, 2019

@se7entyse7en

This comment has been minimized.

Copy link
Member

commented Jul 30, 2019

Ideally, I'd say that I'd give to each component a maximum number of possible restarts and/or amount of time before considering it as Failed. Then I'd exit at first component failure. Regarding ghsync and gitcollector we can avoid checking them if sourced-ce has been initialized with local.

But I also don't mind displaying the status as an initial step to have better monitoring. WDYT?

@carlosms

This comment has been minimized.

Copy link
Contributor

commented Jul 30, 2019

Why not treat a container that is restarting directly as a failure? If it's restarting right after we initialize it, before the user has a change to interact, something is very likely broken and we should tell the user so he can inspect the logs, or stop/prune...

@dpordomingo

This comment has been minimized.

Copy link
Member Author

commented Jul 30, 2019

then... are we superseding docker-compose behavior on failures, applying our own logic?

  • e.g if one component fails, let's fail-fast with an error, no matter its defined behavior for restart.

Or are we only checking the status of the started components until web.OpenUI succeeds?

  • e.g. the user run init local → we'll watch (every certain amount of time) the status of all the components but ghsync and gitcollector; if anyone appears as Exit before web.OpenUI finishes, the init will be canceled with an error message.
  • e.g. the user run init orgs → same as above, but also failing if ghsync or gitcollector appears as Exit >0
@se7entyse7en

This comment has been minimized.

Copy link
Member

commented Jul 30, 2019

I have the same concerns. If we decide to fail fast when a container is restarting, we're ignoring the restart policy defined in the compose file. On the other hand, it's true that if a component is restarting without the user had any chance to interact, then there's probably something wrong. But actually, I don't know if even before the interaction of the user there could be some temporary condition that may incur in having a container restarting at least once.

WRT restart policy after initialization I don't see any reason to supersede docker-compose behavior.

@carlosms

This comment has been minimized.

Copy link
Contributor

commented Jul 30, 2019

But what are we planning to do exactly by "fail fast"? If we are talking about killing all the containers and exit, yes it can be risky to be too aggressive and consider anything a failure.

But fail fast could also be just stopping the spinner, exit init with a -1, but leave the containers as they are. It's like saying to the user: I found something bad happened, so instead of keeping you waiting here, I'm just finishing to let you know that you can now read the logs, run web if things are fixed, stop/prune, etc...
In this case a container with exit != 0 or a container that is restarting could be safely treated the same way.
This way we are not preventing the restart policy. But we are acknowledging this restart, and letting the user know that something is wrong.

@dpordomingo

This comment has been minimized.

Copy link
Member Author

commented Jul 30, 2019

Your proposal about killing web.OpenUI, logging the failuer, and then keeping docker-compose running, LGTM @carlosms

@se7entyse7en

This comment has been minimized.

Copy link
Member

commented Jul 31, 2019

Agree! Yup, I implicitly thought that we were talking about killing the containers, but you're right, that's too aggressive and unnecessary. I like the proposal of just exiting from init 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.