Resolv.rb issues - threads hanging #747
Comments
Thanks for the detailed report, and sorry to hear this impacted you (I had hoped the change would have zero impact). I wonder whether newer Ruby and Resolv versions include fixes, since 2.6.0 is a bit on the older side. I suspect updating Ruby might be difficult, but newer versions of Resolv are also released as a gem. So one (hopefully easier) thing to try would be to install a newer Resolv gem to replace the stdlib version and see whether you still get the same behavior. Would one or both of these be plausible with your setup? Beyond that I'm not sure. I'll keep an eye out for additional reports or ideas, but it's hard to dig deeper on this kind of thing without a clear reproducible case, and with concurrency issues like this, getting a repro case can be really challenging. Otherwise, I'm happy to keep chatting and working on this to figure out what is wrong, or whether we need to revert or take a different track.
We've seen similar hung threads, resulting in timeouts, in the same code:
We are running Ruby
OK, digging into this, it looks like the timeout is coming from a mutex around access to (and updates of) the local hosts file. By default, it looks like Resolv consults the hosts file first and DNS second. Maybe we can reverse the ordering to alleviate this? Are any of you in a position to add the following to your setup and see if it alleviates the timeouts?
I think this and/or updating resolv (via the gem) are good next steps.
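One way to try the reversed ordering, as a minimal sketch: it assumes the `Resolv::DefaultResolver.replace_resolvers` hook that resolv.rb provides for `resolv-replace`, and that your Excon version looks addresses up through Resolv's default resolver.

```ruby
require "resolv"

# Sketch: try DNS before the hosts-file resolver for the process-wide
# default resolver. Run once at boot (e.g. in an initializer), before
# worker threads start resolving names.
Resolv::DefaultResolver.replace_resolvers([Resolv::DNS.new, Resolv::Hosts.new])
```

Because `replace_resolvers` swaps the resolver list on the shared `Resolv::DefaultResolver`, it affects every caller of `Resolv.getaddress` and friends in the process, not only Excon.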
Note that you could also omit the Hosts portion of that, so that it would always use nameserver-based resolution. The nameserver-based resolver still has a mutex, but the window for contention looks much smaller, so it seems unlikely you would run into the same issue. It's plausible that we should consider setting this for Excon usage in general, but it would be good to know whether it solves your problem first.
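The nameserver-only variant, as a sketch under the same assumptions, simply drops the hosts-file resolver from the list. Names that exist only in the local hosts file would then stop resolving through Resolv, so it is worth checking that nothing depends on such entries.

```ruby
require "resolv"

# Sketch: nameserver-only resolution for the process-wide default resolver.
# The hosts-file resolver (and its mutex) is skipped entirely; entries that
# exist only in the hosts file will no longer resolve via Resolv.
Resolv::DefaultResolver.replace_resolvers([Resolv::DNS.new])
```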
I’m not sure this will fix it. I was digging around the resolv code and it looks like there is potential for either a deadlock or unbounded growth around the `RequestID` Hash.
Steven Harman (reply by email to Wesley Beary's comment of Apr 30, 2021, 15:07)
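The `RequestID` concern can be illustrated with a simplified, hypothetical model. As far as I can tell, resolv.rb keeps a class-level `RequestID` hash of in-flight DNS message IDs per `[host, port]`, guarded by a mutex, and picks random 16-bit IDs until it finds a free one. If IDs are not released on every code path, the hash keeps growing, and once the ID space for a key is exhausted that retry loop would spin while holding the mutex, blocking every other resolving thread. `RequestIdPool` and its methods are illustrative names, not resolv's API; the sketch adds an exhaustion guard and an `ensure`-based release.

```ruby
require "securerandom"

# Hypothetical, simplified model of per-(host, port) request-ID bookkeeping.
# This is NOT resolv's actual implementation.
class RequestIdPool
  def initialize(id_space = 0x10000) # DNS message IDs are 16 bits wide
    @id_space = id_space
    @in_use = {} # [host, port] => { id => true }
    @mutex = Mutex.new
  end

  def allocate(host, port)
    @mutex.synchronize do
      ids = (@in_use[[host, port]] ||= {})
      # Without this guard, the retry loop below would spin forever, while
      # holding the mutex, once every ID for this key is taken.
      raise "request IDs exhausted for #{host}:#{port}" if ids.size >= @id_space

      id = SecureRandom.random_number(@id_space)
      id = SecureRandom.random_number(@id_space) while ids.key?(id)
      ids[id] = true
      id
    end
  end

  def release(host, port, id)
    @mutex.synchronize do
      key = [host, port]
      ids = @in_use[key]
      next unless ids

      ids.delete(id)
      @in_use.delete(key) if ids.empty?
    end
  end

  def in_use_count(host, port)
    @mutex.synchronize { @in_use.fetch([host, port], {}).size }
  end

  # Allocates an ID and releases it even if the block raises (e.g. on a
  # timeout), so failed lookups cannot leak IDs.
  def with_id(host, port)
    id = allocate(host, port)
    begin
      yield id
    ensure
      release(host, port, id)
    end
  end
end
```

With a tiny ID space (say `RequestIdPool.new(4)`), four `allocate` calls without a matching `release` exhaust the key, and the fifth raises instead of hanging; wrapping lookups in `with_id` keeps the count at zero even when the block raises.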
Just saw this, which confirms it is a bug in the Resolv library: https://bugs.ruby-lang.org/issues/17781
Thanks for letting me know. Unless anyone here needs the other changes that have happened in between, I'm hoping the related fixes get merged, rather than temporarily reverting this change.
The issue has been fixed upstream, but we're still waiting for a new version of the resolv gem to be cut. Once a version > Good luck, everyone!
@stevenharman Thanks for closing the loop. I had also been monitoring the upstream issue, but it unfortunately slipped my mind to note the change here for everyone. I'm looking forward to this (hopefully) fixing the issue for us, so that we can keep rolling forward and take advantage of the other benefits we had originally hoped to gain.
Since June 1 there is a new version of the resolv gem: https://rubygems.org/gems/resolv/versions/0.2.1
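A Gemfile sketch for picking up the standalone gem, assuming Bundler manages the app's dependencies. Bundler puts gem `lib` directories ahead of the standard library on the load path, so `require "resolv"` should then load the gem's copy rather than the one bundled with Ruby, as long as nothing requires resolv before Bundler sets up the load path.

```ruby
# Gemfile (sketch)
source "https://rubygems.org"

gem "excon"
# Standalone resolv gem; 0.2.1 is the release mentioned in this thread.
gem "resolv", ">= 0.2.1"
```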
@dentarg Thanks for the update. Hopefully that will get things working for those who had issues previously.
I am not using the resolv gem, just the version built into Ruby. I think adding the resolv gem to my Gemfile fixes my issue, but I expect the issue will also be fixed in plain Ruby, without the gem, at some point, right?
@nathansamson Good question. I believe that updates such as this will eventually be released in supported versions of Ruby; it will just take longer, since Ruby releases are less frequent than gem releases.
This has now been backported to Ruby 3.0.2, 2.7.4, and 2.6.8. Yay! ruby/ruby@v2_7_3...v2_7_4#diff-73ed31bea018f3a994e757faebca1dbf4edb3027ca4a1ecdd623abc955b8e2f6
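To tell whether a given Ruby's bundled resolv should include the backport, a small check against the releases listed above might look like this. `bundled_resolv_fixed?` is a hypothetical helper; it assumes every later patch release in those series also carries the fix, that Ruby 3.1 and newer include it, and that older series (2.5 and earlier) never received it. It does not look at whether the resolv gem is loaded.

```ruby
require "rubygems" # for Gem::Version (already loaded by default in modern Ruby)

# Hypothetical helper. First release in each series that ships the backport,
# per this thread; later patch releases are assumed to include it too.
FIXED_RUBY_RELEASES = {
  "2.6" => "2.6.8",
  "2.7" => "2.7.4",
  "3.0" => "3.0.2",
}.freeze

def bundled_resolv_fixed?(ruby_version = RUBY_VERSION)
  version = Gem::Version.new(ruby_version)
  series = version.segments.first(2).join(".")
  fixed_in = FIXED_RUBY_RELEASES[series]
  if fixed_in
    version >= Gem::Version.new(fixed_in)
  else
    # Assumption: series newer than 3.0 already include the fix, and older
    # series never received the backport.
    version > Gem::Version.new("3.0")
  end
end

unless bundled_resolv_fixed?
  warn "Bundled resolv may predate the fix; consider adding the resolv gem."
end
```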
@stevenharman Awesome, thanks for the update! I'm going to go ahead and close this now since, to the best of my knowledge, these upstream fixes should remove the problem.
We're noticing issues with our highly concurrent Sidekiq jobs after the bump to Excon v0.80.1 this weekend. Threads are hanging in the `resolv.rb` code. We're still investigating, but wanted to post a notice here in case others experience a similar issue. Below is an example of the stack trace associated with one of our "hung" threads... Any thoughts or suggestions on how or why this might be happening would be appreciated. Current workaround: downgrade back to v0.79.0. Thanks!