Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve how we acquire ThreadContext to eliminate null refs. #5940

Merged
merged 2 commits into from Oct 27, 2019

Conversation

@headius
Copy link
Member

headius commented Oct 23, 2019

For #5936 we saw some cases where the SoftReference created here
came back as null, even after the "adopt" logic had run. While I
have not isolated an exact cause, there are thread locals, soft
references, and GC interacting here. Since the original logic did
not guarantee non-null references throughout the code, I've
modified it to always check and only proceed once it truly has
non-null references and eventually context in hand.

@headius headius added this to the JRuby 9.2.9.0 milestone Oct 23, 2019
@headius headius requested review from kares and enebo Oct 23, 2019
@headius

This comment has been minimized.

Copy link
Member Author

headius commented Oct 23, 2019

@enebo @kares Take a quick peek at this. Critical functionality, I just wanted more eyes on it before shipping.

@kares

This comment has been minimized.

Copy link
Member

kares commented Oct 23, 2019

previous version ended up recursing instead of always reaching a loop - otherwise seems the same?
not sure what changed here - we never ended up with null on production apps for a long time.
really lack an understanding what precisely is fixed here ...

@enebo
enebo approved these changes Oct 23, 2019
Copy link
Member

enebo left a comment

The actual scenario where you adopt and it returns null is pretty mysterious to me (very low memory?). If it does that once it feels like it could happen endlessly (and if so then perhaps we will get an issue showing just that). None the less this definitely gives it more of a chance to eventually adopt a TC.

@headius

This comment has been minimized.

Copy link
Member Author

headius commented Oct 23, 2019

@kares In the later comments of #5936 both @ahorek and I saw NPE when attempting to get the ThreadContext out of the SoftReference we acquired from the ThreadLocal. Somehow, that SoftReference was coming out as null even after calling adoptCurrentThread. I could not suss out how that would be possible, exactly, but there's a complicated dance of references and threads and GC in play.

The new logic basically just adds in extra null checking, in case the stars align to allow such to happen, and avoids recursing to re-attempt the acquisition of the context. It eliminates the NPE we saw.

@headius

This comment has been minimized.

Copy link
Member Author

headius commented Oct 23, 2019

@enebo Yeah I couldn't trace a path that would result in getting a null SoftReference after adoption, but I could not exactly rule it out.

There's two situations where that ThreadLocal might contain null:

  • The current thread has never been adopted
  • The current thread was adopted but the reference went away (soft ref, memory pressure trigger)

In both cases the current thread should go into the logic that assigns a new SoftReference before proceeding.

@kares
kares approved these changes Oct 24, 2019
Copy link
Member

kares left a comment

minor improvement: do { ... } while since we always execute the loop at least once
that is also the way this method used to be before (the loop elimination).

@headius

This comment has been minimized.

Copy link
Member Author

headius commented Oct 27, 2019

@kares I will make that change, thank you.

I want to understand if there was a reason you preferred recursion...something I'm missing in my version?

My only concern with the recursion was that under some heavy GC scenario it may recurse many times, failing to acquire a context. It is not likely to happen or to cause stack issues, since we're talking soft references instead of weak references, but it nagged at the back of my brain a bit. The change was not altogether related to this patch, but it made sense to me.

I don't like the overall complexity of the adoption logic, so I want to be absolutely sure I understand your version of this method.

headius added 2 commits Oct 23, 2019
For #5936 we saw some cases where the SoftReference created here
came back as null, even after the "adopt" logic had run. While I
have not isolated an exact cause, there are thread locals, soft
references, and GC interacting here. Since the original logic did
not guarantee non-null references throughout the code, I've
modified it to always check and only proceed once it truly has
non-null references and eventually context in hand.
@headius headius force-pushed the headius:improved_threadcontext_acquisition branch from 8d37390 to b529298 Oct 27, 2019
@headius

This comment has been minimized.

Copy link
Member Author

headius commented Oct 27, 2019

I've made the change and rebased to pick up fixes from master.

@kares

This comment has been minimized.

Copy link
Member

kares commented Oct 27, 2019

I want to understand if there was a reason you preferred recursion...something I'm missing in my version?

think I did a micro-benchmark where it showed up that always entering a loop was 'slower'.
was looking for that benchmark but did not find (not sure what I actually did - whether trying to stress this from a known getCurrentContext .rb site or simply a JMH).

anyway, its more important to have this working (even in edge cases) rather than 'optimized' to the ms which won't matter much in the end.

@headius headius merged commit 76ceb12 into jruby:master Oct 27, 2019
5 checks passed
5 checks passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
jruby.jruby Build #20191027.3 succeeded
Details
jruby.jruby (Job linux) Job linux succeeded
Details
jruby.jruby (Job mac) Job mac succeeded
Details
jruby.jruby (Job windows) Job windows succeeded
Details
@headius headius deleted the headius:improved_threadcontext_acquisition branch Oct 27, 2019
@headius

This comment has been minimized.

Copy link
Member Author

headius commented Oct 27, 2019

@kares That seems plausible, but I think it shouldn't make much difference these days. It boils down to a branch versus a recursive invocation, and in both cases the rare path that fails to acquire a context should end up being profiled out. If you do see a difference, though, we can revisit it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.