-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
socket loop thread abort_on_exception (was: Thread Safety) #4
Comments
I found why it wasn't thread safe. The first call to a server will start the underlying socket thread, responsible to listen on a socket. However, due to concurrency, two socket threads were actually started. Solved by adding a mutex on thread creation. |
i believe there are more thread safety issues. unfortunately it's not so simple to reproduce them... if you look into the current implementation of def ensure_open
@mutex.synchronize do
return if @socket_thread
main_thread = Thread.current
@socket = new_socket
@socket_thread = Thread.new do
begin
socket_loop
rescue EOFError
# socket closed
init_vars
main_thread.raise SocketError.new 'Socket was remotely closed'
rescue Exception => e
# error on socket
init_vars
main_thread.raise e
end
end
end
end this method creates a new watch-dog-thread to check if the connection is alive. when something fails an error is raised on the main thread. as far as i can see this can result in the following scenario:
in this scenario the main thread, or say the application will exit with an unhandled timeout error. i would argue that |
Yes, this is expected behavior. It is up to the main application to isolate errors if needed, but NetHttp2 will raise them when encountered. |
@ostinelli well, isolation is the actual problem here. how would you go about isolation? processor B has no knowledge of any watch-dog-thread and just dies when the background thread raises an error. |
This is in the nature of bi-directional async sockets though. How would you go about it? BTW main usage of this gem is https://github.com/ostinelli/apnotic, example usage can be seen there. |
yeah, and my comment is referring to this issue. it might be more reasonable to use set abort_on_exception via a configuration option per thread or even just use report_on_exception. |
i've replied in there with how Apnotic has been designed and the solution works. If your concern is related to NetHttp2 then we can talk about it.
It is not enough to just make it configurable. If an error happens in the socket thread and it is not raised in the main one, then the library needs to automatically act upon the error. This raises another series of questions discussed elsewhere (i.e. here), for instance what do we do with in-the-midst operations (should they be retried? but what if the server actually got them and they are non-idempotent operations? etc). |
i understand that the issue is not trivial. all i want is a way to make sure that the background thread is not allowed to crash the application randomly. if you can think of another way... |
I understand your concern, that's why I've reopened this issue. If you have suggestions please let me know. |
just came across this blog post https://jvns.ca/blog/2015/11/27/why-rubys-timeout-is-dangerous-and-thread-dot-raise-is-terrifying/ |
Thank you @phoet, I'm pretty familiar with this. The idea though is how to enable async operations. I've come to the conclusion that the best option might be to refactor to have an |
I think something like the client = NetHttp2::Client.new("http://nghttp2.org")
client.on(:error) do |exception|
# Do something
end Is there a huge amount of refactoring involved here? I don't see any other callback architecture on the |
Something like this, perhaps? https://github.com/adamcooke/net-http2/commit/bf3738a204fbe42031084d20c593d385cffdf3ad |
i'd do a minor release and not do a backwards compatible change. the code could issue a warning when no callback is given. imo thread raise should be gone |
@phoet this is a 0.x version. I won't care about backwards compatibility, that's what 0.x stands for. If I can easily make it, I will. @adamcooke will look at your suggestion asap. Thank you. |
@adamcooke I saw your suggestion, couple of issues there:
Any suggestions welcome. |
You don't know the request with the current situation either, though, do you? It's entirely possible that no request caused the error too. If the socket disconnects (my actual case) when no request is in progress. |
Indeed. Will have something hopefully by tomorrow or next week. |
@adamcooke and @phoet can you try out the branch https://github.com/ostinelli/net-http2/tree/error_callback and see if that covers your needs? |
This looks good to me and solves my issue (although I will need to a way to access the |
No worries I will add this callback on Apnotic too. But if you can try it out (even with a monkey patch for now) to see if this is ok for you before I merge it in, it would be great. |
All looked good in development. I'm now running this live and will report back in a few hours 👍 |
All running fine for me now. I'd say it was good to merge. |
Merged and published |
Hello guys. I think I am being bit by this issue too. My current setup is apnotic 1.0.1 and net_http2 0.14.1
and my sidekiq process comes crashing down, all threads at a time. Your discussion seems to be something of a similar issue. I would appreciate it if you can confirm that this is fixed by upgrading to net-http2 0.15.0 version or if I have to add some extra code (error callback?). You mention adding the "Apnotic part of it" too. |
Hello @augustosamame, you are commenting on a resolved issue asking if it would resolve your issue. I'd recommend that you follow the readme instructions and report if the latest version does not solve what you are experiencing. |
@augustosamame i think you can simply upgrade, the additional errorhandling in the callback is optional. on top of that there is not much that you can do to recover from such errors except for maybe logging them. |
It appears that NetHttp2 might not be thread-safe.
To try it out, run this concurrent test multiple times, you will probably encounter the error
HTTP2::Error::HandshakeError
raised inside of the dummy server.Full failure looks like:
Investigation is needed, to see if this is an issue on the dummy server (who cares then) or on the client.
The text was updated successfully, but these errors were encountered: