Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not die after failing to load app.psgi #117

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

kappa
Copy link

@kappa kappa commented Sep 27, 2015

This stops the error flood in logs when app.psgi is broken
due to endless forking of immediately dying children.

Fixes #94 and #106.

This stops the error flood in logs when app.psgi is broken
due to endless forking of immediately dying children.
As suggested by Aristotle Pagaltzis.
@maros
Copy link

maros commented Sep 9, 2020

Could we get a release with this patch included, since this is what effectively prevents us from using starman in production, and also what has lead to the fork of starwoman?! If you need some more testing and/or an further improvements, please let me know if i can help.

@miyagawa
Copy link
Owner

miyagawa commented Sep 9, 2020

I understand the issue but am not fully convinced that this PR is the right fix because if the .psgi has an initialization code that can die (such as opening a configuration file, or opening a database connection), then it will still repeatedly die and cause the same flood of repeated execution errors until it succeeds. You can see that with starman -e 'die if rand > 0.01; sub {}' and it will eventually stop showing them when all workers have initialized the app successfully. I'm not going to claim that's the best and most correct behavior, but at least this is a consistent behavior and it might be a breaking change to the users who do expect it.

Special casing the syntax error, although it might be the most common and makes it handy to fix in dev, might actually lead to be more inconsistent behaviors.

Maybe the right behavior is either:

  1. handle all exceptions, and retry on demand upon requests, or
  2. handle all exceptions, retry with an exponential backoff, or
  3. handle all exceptions, add a sleep and exit immediately

3 is essentially what we're doing, and adding a sleep will at least slow down the amount of errors you get. However, adding a sleep in init, or lazy-loading the app upon request, makes the server process "half up" without the full capacity, and i'm not super in favor of that personally.

It is really tricky to tell which one is the best behavior at this moment, but we need to be careful about the changing behavior, and if we choose an option that is a breaking change, there might have to be an option to turn on that change.

since this is what effectively prevents us from using starman in production

my company has been using starman in production for years and has no issues around this, but I guess mainly because we use it with --preload-app.

@miyagawa
Copy link
Owner

miyagawa commented Sep 9, 2020

i've been putting more thoughts into this, and simply delaying the compilation of the app until the request time doesn't really feel like something we need/want for the "production usage" -- I don't expect most users to modify the code on the production server once it's deployed, so simply delaying it won't solve that particular problem. Granted, this will definitely fix the issue of a potential disk full situation because of the flood of endless recompilation of the app.

I lean towards thinking that the right fix would be to add some interval/backoff for retrying the compilation of the app, or somehow signal to the parent that the compilation failed, and if all children failed compilation, just let the parent exit, like you do with the --preload-app option.

Maybe this behavior should be customizable with an option. I know it might sound too much, but it's an important behavior change that might be potentially breaking existing users.

@ap
Copy link
Contributor

ap commented Sep 15, 2020

my company has been using starman in production for years and has no issues around this, but I guess mainly because we use it with --preload-app.

Same here. I switched from Starman to Starlet back to Starman because even --preload-app cannot prevent these problems if you have to have a Server::Starter process in front. If the server is standalone and uses --preload-app, no problems.

I lean towards thinking that the right fix would be to add some interval/backoff for retrying the compilation of the app or somehow signal to the parent that the compilation failed, and if all children failed compilation, just let the parent exit, like you do with the --preload-app option.

I think you can’t get around having both. If the app fails to compile, then exiting ASAP is arguably the only correct response, since no success is possible. If the app fails to start due to things like required services going missing (infamous “database connection failed” etc.), then a backoff is… essentially mandatory. And I can think of use cases for any combination of these, so ideally they should be separate options that are not mutually exclusive.

@joshnatis
Copy link

+1 for the idea of turning this behavior on with an option. That way we can bypass any breaking changes but still please people (like me :P) who think this option would be useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error log is spammed if app can't be loaded
5 participants