Do not die after failing to load app.psgi #117

kappa · 2015-09-27T20:20:42Z

This stops the error flood in logs when app.psgi is broken
due to endless forking of immediately dying children.

Fixes #94 and #106.

As suggested by Aristotle Pagaltzis.

maros · 2020-09-09T07:04:28Z

Could we get a release with this patch included, since this is what effectively prevents us from using starman in production, and also what has led to the fork of starwoman?! If you need some more testing and/or any further improvements, please let me know if I can help.

miyagawa · 2020-09-09T09:56:06Z

I understand the issue but am not fully convinced that this PR is the right fix, because if the .psgi has initialization code that can die (such as opening a configuration file or a database connection), then it will still repeatedly die and cause the same flood of repeated execution errors until it succeeds. You can see that with starman -e 'die if rand > 0.01; sub {}': it will eventually stop showing them when all workers have initialized the app successfully. I'm not going to claim that's the best and most correct behavior, but at least it is consistent behavior, and changing it might be a breaking change for users who do expect it.
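To make the failure mode concrete, here is a minimal sketch of load-time initialization that can die for reasons unrelated to syntax. The `build_app` function and the one-line config file it reads are hypothetical; they only stand in for the top-level code of an app.psgi.

```perl
use strict;
use warnings;

# Hypothetical builder standing in for the body of an app.psgi.
# It dies at load time if the config file cannot be opened.
sub build_app {
    my ($config_path) = @_;

    open my $fh, '<', $config_path
        or die "cannot open $config_path: $!\n";
    my $greeting = <$fh> // '';
    chomp $greeting;
    close $fh;

    # The PSGI application itself: a code ref returning [status, headers, body].
    return sub {
        my ($env) = @_;
        return [200, ['Content-Type' => 'text/plain'], ["$greeting\n"]];
    };
}
```

A worker that runs this at startup fails every time until the file appears, which is the same repeated-death pattern a syntax error produces.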

Special-casing the syntax error, although it might be the most common case and is handy to fix in dev, might actually lead to more inconsistent behavior.

Maybe the right behavior is either:

1. handle all exceptions, and retry on demand upon requests, or
2. handle all exceptions, retry with an exponential backoff, or
3. handle all exceptions, add a sleep and exit immediately

Option 3 is essentially what we're doing now, and adding a sleep will at least slow down the rate of errors you get. However, adding a sleep in init, or lazy-loading the app upon request, makes the server process "half up" without the full capacity, and I'm not super in favor of that personally.
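As an illustration of the first option (retry on demand upon requests), the sketch below wraps a builder so that the app is loaded lazily and a failed load is retried on the next request. The `lazy_app` helper is hypothetical and not part of Starman; it only shows the shape of the idea, including the "half up" state where a worker answers with 503 until loading succeeds.

```perl
use strict;
use warnings;

# Hypothetical helper: defer building the PSGI app until the first request.
# If the builder dies, answer 503 and try again on the next request.
sub lazy_app {
    my ($builder) = @_;
    my $app;    # stays undef until a build succeeds

    return sub {
        my ($env) = @_;

        $app ||= eval { $builder->() };
        unless ($app) {
            my $err = $@ || "unknown error\n";
            warn "app failed to load: $err";
            return [503, ['Content-Type' => 'text/plain'],
                    ["Service Unavailable\n"]];
        }
        return $app->($env);
    };
}
```

With this shape, errors are logged at most once per request rather than once per fork, at the cost of workers that accept connections they cannot serve.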

It is really tricky to tell which one is the best behavior at this moment, but we need to be careful about changing behavior, and if we choose an option that is a breaking change, there might have to be an option to turn on that change.

> since this is what effectively prevents us from using starman in production

My company has been using starman in production for years and has had no issues around this, but I guess that's mainly because we use it with --preload-app.

miyagawa · 2020-09-09T19:59:21Z

I've been giving this more thought, and simply delaying the compilation of the app until request time doesn't really feel like something we need or want for production usage. I don't expect most users to modify the code on the production server once it's deployed, so simply delaying it won't solve that particular problem. Granted, it would definitely fix the potential disk-full situation caused by the flood of errors from endless recompilation of the app.

I lean towards thinking that the right fix would be to add some interval/backoff for retrying the compilation of the app, or somehow signal to the parent that the compilation failed, and if all children failed compilation, just let the parent exit, like you do with the --preload-app option.
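One way the "signal to the parent" idea could look is sketched below: each child reports a load failure through a dedicated exit status, and the parent counts those reports. The `LOAD_FAILED` status value and the `count_load_failures` function are assumptions made for this sketch, not Starman internals, and a real worker would enter its accept loop instead of exiting after a successful load.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical exit status meaning "the app failed to load in this child".
use constant LOAD_FAILED => 3;

# Fork $n children; each calls $builder and exits with LOAD_FAILED if it dies.
# Returns how many children reported a load failure.
sub count_load_failures {
    my ($n, $builder) = @_;
    my @pids;

    for (1 .. $n) {
        my $pid = fork();
        die "fork failed: $!\n" unless defined $pid;
        if ($pid == 0) {
            # Child: try to load the app, then report the outcome.
            my $ok = eval { $builder->(); 1 };
            POSIX::_exit($ok ? 0 : LOAD_FAILED);
        }
        push @pids, $pid;
    }

    my $failed = 0;
    for my $pid (@pids) {
        waitpid($pid, 0);
        my $signaled = $? & 127;    # non-zero if killed by a signal
        my $status   = $? >> 8;     # exit status otherwise
        $failed++ if !$signaled && $status == LOAD_FAILED;
    }
    return $failed;
}
```

A parent could then exit when `count_load_failures($workers, $builder) == $workers`, which mirrors how a failed load ends the whole server under --preload-app.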

Maybe this behavior should be customizable with an option. I know it might sound like too much, but it's an important behavior change that could break things for existing users.

ap · 2020-09-15T16:05:52Z

> my company has been using starman in production for years and has no issues around this, but I guess mainly because we use it with --preload-app.

Same here. I switched from Starman to Starlet, then back to Starman, because even --preload-app cannot prevent these problems if you have to have a Server::Starter process in front. If the server is standalone and uses --preload-app, no problems.

> I lean towards thinking that the right fix would be to add some interval/backoff for retrying the compilation of the app or somehow signal to the parent that the compilation failed, and if all children failed compilation, just let the parent exit, like you do with the --preload-app option.

I think you can’t get around having both. If the app fails to compile, then exiting ASAP is arguably the only correct response, since no success is possible. If the app fails to start due to things like required services going missing (infamous “database connection failed” etc.), then a backoff is… essentially mandatory. And I can think of use cases for any combination of these, so ideally they should be separate options that are not mutually exclusive.
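The two policies described here can be kept separate if loading is split into a compile phase and an initialization phase. The sketch below assumes such a split; `load_app` and its `compile`, `sleep`, `max_attempts`, `base_delay` and `max_delay` arguments are all hypothetical names chosen for illustration. A compile failure is rethrown immediately, while an initialization failure is retried with capped exponential backoff.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical two-phase loader. $args{compile} returns a builder code ref;
# calling the builder performs initialization and returns the PSGI app.
sub load_app {
    my (%args) = @_;
    my $max   = $args{max_attempts} // 5;
    my $base  = $args{base_delay}   // 1;     # seconds
    my $cap   = $args{max_delay}    // 60;    # seconds
    my $sleep = $args{sleep}        // sub { sleep $_[0] };

    # Phase 1: compilation. Retrying cannot help, so fail right away.
    my $builder = eval { $args{compile}->() }
        or die "compile failed, not retrying: $@";

    # Phase 2: initialization. Failures may be transient (database down,
    # config not yet deployed), so retry with exponential backoff.
    for my $attempt (0 .. $max - 1) {
        my $app = eval { $builder->() };
        return $app if $app;

        warn "init attempt @{[ $attempt + 1 ]} failed: $@";
        last if $attempt == $max - 1;
        $sleep->(min($cap, $base * 2 ** $attempt));
    }
    die "init failed after $max attempts\n";
}
```

The `sleep` argument is injectable so the retry schedule can be checked without waiting; with the defaults the delays are 1, 2, 4 and 8 seconds before giving up.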

joshnatis · 2022-09-12T19:21:10Z

+1 for the idea of turning this behavior on with an option. That way we can bypass any breaking changes but still please people (like me :P) who think this option would be useful.

kappa added 2 commits September 27, 2015 23:10

Do not die after failing to load app.psgi

6e63c3d

This stops the error flood in logs when app.psgi is broken due to endless forking of immediately dying children.

Catch more properly

c193861

As suggested by Aristotle Pagaltzis.
