
HTTPServer infinite loop on "too many open files" #2

Open
jemc opened this issue Dec 26, 2021 · 3 comments

Comments


jemc commented Dec 26, 2021

On Linux, when using the examples/http server and testing it with a large number of simultaneous connections, I commonly see one thread get stuck in an infinite loop, and the server fails to accept more connections.

After stepping through the issue in lldb, I can see that the TCPListener actor gets stuck in an infinite loop when it calls pony_os_accept to accept a connection and that call fails with errno set to EMFILE ("too many open files"). The TCPListener is currently written to naively treat any failure as spurious and immediately retry the accept. In this failure mode, retrying will not help - the call will keep failing with the same error until a file descriptor is freed - so the actor spins in an infinite retry loop.
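To illustrate the distinction the listener would need to make, here is a sketch in C (not the actual Savi or Pony runtime code; the helper names are hypothetical). An immediate retry only makes sense for transient errno values like EINTR or ECONNABORTED, while resource-exhaustion errors such as EMFILE and ENFILE will keep failing until some descriptor is closed, so the loop must stop (or back off) instead of retrying:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: decide whether a failed accept() is worth retrying
 * immediately. Transient errors can be retried; descriptor-exhaustion
 * errors will fail again until some open socket is closed. */
static bool accept_error_is_transient(int err) {
  switch (err) {
    case EINTR:        /* interrupted by a signal */
    case ECONNABORTED: /* connection aborted before accept completed */
    case EAGAIN:       /* no pending connection (non-blocking socket) */
      return true;
    case EMFILE:       /* per-process fd limit reached */
    case ENFILE:       /* system-wide fd limit reached */
    default:
      return false;
  }
}

/* Sketch of an accept loop using the helper: instead of spinning on a
 * persistent error, return and let the caller back off or pause accepting. */
int accept_one(int listen_fd) {
  for (;;) {
    int fd = accept(listen_fd, NULL, NULL);
    if (fd >= 0)
      return fd;
    if (!accept_error_is_transient(errno))
      return -1; /* caller should stop accepting and retry later */
  }
}
```

The current Savi (and Pony) code effectively treats every error as falling in the transient branch, which is what produces the infinite loop under EMFILE.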


jemc commented Dec 26, 2021

Interestingly, the naive code that causes this infinite loop was copied directly from Pony: https://github.com/ponylang/ponyc/blob/725fac75381c35da5803c83bd00fbd552261f099/packages/net/tcp_listener.pony#L209-L222

Our version of that same code is here: https://github.com/savi-lang/savi/blob/739b8abb2ab393218778066db775ed1124a74cfe/packages/src/TCP/Listener.savi#L78-L85

So I'd be surprised if the Pony server didn't have a similar bug/flaw in it. Unless there is some other way of addressing this case in the Pony server that I'm not noticing.

@jemc jemc transferred this issue from savi-lang/savi Mar 5, 2022

jemc commented Mar 10, 2022

I convinced myself that Pony has the same bug, so I opened ponylang/ponyc#4052 to discuss with the rest of the Pony team and decide whether this fix should be handled within the runtime C function (pony_os_accept) or in the Savi/Pony code that calls it.


jemc commented Feb 17, 2023

One thing I noticed while refactoring the current code, which may explain how we get to the "too many open files" condition in the first place:

I think the code that uses io_deferred_action to let the listener know when an accepted child has been closed is not having the desired effect. The intent is for the IO.Action.ClosedChild to reach the TCP.Listen.Engine so it can clean up the accepted child socket, but as currently written the action only reaches the top-level IO.Actor - the engine(s) inside never get a chance to react to it.

I likely need to come up with a different design for deferred actions that lets the inner engine(s) react to the deferred action from the inside out, like the normal reaction process. There are two ways to look at going about this:

  • 🅰️ Use pony_asio_event_send to send a "true" event which gets reacted to in the normal way (but how do we attach the specific IO.Action? do we even need to attach an action? or can we repurpose some existing field as the action code?).
  • 🅱️ Separate the method for an engine "categorizing" an event as an IO.Action from the code that reacts to it, so that each engine gets a chance to categorize first in the normal flow, and in the deferred flow we skip the categorization step.

It's also possible these two ideas could/should be used in conjunction with one another to achieve the right design. Or perhaps there is another way of doing it.
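As a language-agnostic sketch of option 🅱️ (written in C with entirely hypothetical names - this is not the actual Savi engine API): the key move is separating "turn a raw event into an IO.Action" from "react to an IO.Action", so the normal flow runs both steps while the deferred flow injects a ready-made action and runs only the reaction step:

```c
#include <stdio.h>

/* Hypothetical action codes, standing in for IO.Action values. */
typedef enum { ACTION_NONE, ACTION_READ, ACTION_CLOSED_CHILD } io_action_t;

/* Hypothetical raw-event codes from the ASIO layer. */
typedef enum { EVENT_READABLE, EVENT_CHILD_CLOSED } io_event_t;

/* Step 1: an engine categorizes a raw event into an action. */
static io_action_t categorize(io_event_t ev) {
  switch (ev) {
    case EVENT_READABLE:     return ACTION_READ;
    case EVENT_CHILD_CLOSED: return ACTION_CLOSED_CHILD;
    default:                 return ACTION_NONE;
  }
}

/* Step 2: an engine reacts to an action - this is the step that inner
 * engines (like TCP.Listen.Engine) need to see, from the inside out. */
static void react(io_action_t action) {
  if (action == ACTION_CLOSED_CHILD)
    puts("engine: cleaning up accepted child socket");
}

/* Normal flow: categorize the event, then react to the result. */
static void on_event(io_event_t ev) { react(categorize(ev)); }

/* Deferred flow: the action is already known, so skip categorization
 * and hand it straight to the reaction step. */
static void on_deferred_action(io_action_t action) { react(action); }
```

Under this split, the deferred ClosedChild action would flow through the same reaction path as ordinary events, so the inner engine(s) would get a chance to clean up rather than the action stopping at the top-level IO.Actor.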

Those are just my current thoughts about the issue.
