JDK-8282405: Make thread resource areas signal safe #7624
Conversation
👋 Welcome back stuefe! A progress list of the required criteria for merging this PR will be added to the body of your pull request.
ffa5d85 to 19b550a
Gentle ping. All tests are green in GHA and at SAP. This patch increases stability for the async profiler and for hs_err reporting.
Hi Thomas,
Seems a reasonable idea in principle, but I'm unsure of some details.
At the entrance of the hotspot signal handler (which everyone goes through, even in chain scenarios like with AsyncGetCallTrace)
That is not my understanding. For AGCT the signal handler is installed directly for, e.g. SIGPROF, and we don't intercept or chain it in the VM AFAIK. Also UserHandler is separate from JVM_HANDLE_XXX_SIGNAL.
Further comments below.
Thanks,
David
// When entering signal handling with a valid current Thread, we switch to the
// Thread's secondary resource area. That lets us safely use resource area
Style nit: no need to indent paragraph continuation lines.
// memory during signal handling even if the signal left the primary area in
// an inconsistent state. We then swap back to the primary area upon leaving
// the signal handler.
// Note that signal handlers can nest, and in theory we should do this on each |
So what happens if you get a secondary signal whilst in the midst of trying to switch to the secondary resource area? Seems we need to ensure signals are deferred whilst this happens so that it is atomic with respect to subsequent signals.
I don't think this is necessary.
We already block asynchronous signals (see set_signal_handler()). We explicitly allow synchronous error signals though since blocking them would just mean the process terminates right away when receiving them while blocked.
Thinking this through more, I actually don't see a way signals could destroy the secondary RA or interrupt the switchover:
- we don't allow asynchronous signals for the duration of the signal handling
- we allow error signals, but an error signal during the switch would be a programming error that should be fixed. It is preventable: if we fault while accessing the secondary resource area or while switching over to it, that code should be written more defensively.
This is all assuming that the same rules hold for AGCT, and that whoever uses AGCT sets up the signal handler with async signals blocked. But we could just block them ourselves inside AGCT if we think this is necessary.
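For illustration, a minimal standalone example of this kind of blocking (the signal choices and the empty handler body are placeholders for illustration; this is not HotSpot's actual set_signal_handler()): filling sa_mask when installing a handler defers the listed signals for the duration of the handler, while synchronous error signals such as SIGSEGV are deliberately left unblocked.

#include <csignal>
#include <cstdio>

static void handler(int, siginfo_t*, void*) {
  // While we are here, SIGQUIT/SIGTERM (per sa_mask below) stay pending
  // instead of interrupting us; error signals like SIGSEGV are
  // intentionally not blocked, matching the discussion above.
}

int main() {
  struct sigaction sa = {};
  sa.sa_sigaction = handler;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  sigaddset(&sa.sa_mask, SIGQUIT);   // deferred during handler execution
  sigaddset(&sa.sa_mask, SIGTERM);
  if (sigaction(SIGPROF, &sa, nullptr) != 0) {
    perror("sigaction");
    return 1;
  }
  raise(SIGPROF);                    // handler runs with sa_mask in effect
  return 0;
}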
Okay that sounds reasonable - we can't get nested non-error signals.
Hi David,
Thank you for taking the time to look at this!
Oh, it is? Thanks for the catch; it seems my assumption was false. How embarrassing :) But it should be easy enough to fix by guarding AGCT directly.
Thanks, Thomas
And AGCT is not always called from a signal handler, e.g. we have a test, ./test/hotspot/jtreg/serviceability/AsyncGetCallTrace/, which seems to use it synchronously. Maybe only error reporting (known to use the RA) should actually do this? It isn't obvious to me that UserHandler, SR_handler, or even AGCT are affected by this potential problem.
AGCT was affected by this; that was the original motivation for the patch. We did it downstream a while ago because we had sporadic crashes with corrupted RAs on the PPC port of the async profiler. The original variant of this patch patched up AGCT directly, not the signal handler, and the crashes disappeared. So I would do this at least for AGCT, for error handling, and preferably for the regular "big" signal handler too.

About UserHandler and SR_handler, I agree; those seem simple enough. Then again, switching the resource area does not really cost much, and we also unblock error signals at each signal handler entry, which feels like the same kind of precaution.

Alternatively, if we think a signal handler should not use the RA at all, we could add checks for that assumption, temporarily forbidding RA access. But since that would incur a cost on every RA allocation (you'd have to check every time), I would only do this in debug builds. Do you think this makes sense? I would do it in a separate RFE, if at all.
Out of curiosity, where does AGCT perform RA allocation? That seems like a bug ... but then AGCT is really a fingers-crossed mechanism anyway, so there's not much point trying to make it too bullet-proof. A debug mechanism to disallow RA allocation might be interesting ... but might just show we have a lot more RA allocation than we thought. :) Up to you if you want to investigate that further.
I agree that this would be a bug. I'll try to find out what the call stacks were. AGCT seems like a best-effort mechanism, but the async profiler has become a lot more popular recently, so it would be good to make it safer. And the RA double-buffering mechanism I propose is quite simple, and we need it anyway for error handling.
I will try, but no promises :) This feels like it could be a whack-a-mole game.
Changes:
Test errors unrelated:
Tests at SAP went through. Will rerun them tonight with the new patch version.
Hi Thomas,

I have a remaining concern. It seemed to me that after we switch the RA we should always have a ResourceMark, to ensure that the secondary RA is cleared by the time we switch back. Or else we should be able to check it is "empty" before switching back. But that led to a second concern, which really relates to how RA allocation works in general. Consider this hypothetical scenario:
Now suppose a signal hits before …

In short, you can't know that RA allocation in the secondary area is really safe without knowing what code is going to do the allocation and how that interacts with existing ResourceMarks on the stack. If you add a ResourceMark when switching to the secondary RA then you may cause a "dangling pointer", and if you don't, then you may cause a leak in the secondary RA.
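A self-contained toy model of that dilemma (a bump arena with watermark-style marks standing in for ResourceArea/ResourceMark; nothing here is HotSpot code):

#include <cstdio>
#include <cstddef>

struct Arena {                      // crude stand-in for a ResourceArea
  char buf[4096];
  size_t top = 0;
  void* alloc(size_t n) { void* p = buf + top; top += n; return p; }
};

struct Mark {                       // crude stand-in for a ResourceMark
  Arena& arena;
  size_t saved;
  explicit Mark(Arena& a) : arena(a), saved(a.top) {}
  ~Mark() { arena.top = saved; }    // rolls the arena back on scope exit
};

int main() {
  Arena secondary;

  // Case 1: place a Mark when switching to the secondary arena.
  char* escaped;
  {
    Mark m(secondary);
    escaped = static_cast<char*>(secondary.alloc(16));  // e.g. structure X grows
  }                                 // mark unwinds: 'escaped' now dangles
  (void)escaped;                    // dereferencing it would be use-after-rollback
  std::printf("with mark:    top = %zu (rolled back, escaped pointer dangles)\n",
              secondary.top);

  // Case 2: no Mark when switching. The allocation stays valid...
  secondary.alloc(16);
  // ...but nothing ever rolls it back: each handler entry leaks a little.
  std::printf("without mark: top = %zu (accumulates across signals)\n",
              secondary.top);
  return 0;
}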
Hi David,
I don't understand. The signal handler, if it calls foo(), executes foo() from start to finish, then returns. With RA1 as primary and RA2 as secondary area, I see it like this:
In other words, entering the handler causes us to switch stacks, very similar to what one could do with … I do agree though that when switching to the secondary RA we need a Mark; otherwise subsequent invocations of a signal handler would just accumulate memory in the secondary RA.
BTW, I was sure I had seen problems with AsyncGetCallTrace and ResourceMarks recently, so I looked and found: https://bugs.openjdk.java.net/browse/JDK-8265150
The signal handler doesn't call foo(), it just happens to call some code that also uses the data structure X. If X expands while on the alt-RA and doesn't contract again, then X will have a chunk that still refers to the alt-RA when we have switched back to the primary RA. Ideally such things can never happen and the alt-RA is always unused by the time the signal handler returns, but without looking at the kind of RA usage that might occur from the signal handler, I can't say whether it will be clean this way or not.

JDK-8265150 is a great example, as it shows that allocating from the RA in a signal handler via AGCT is fundamentally broken. Even with the alt-RA, do we still not have the ThreadCritical problem?

I don't want the perfect to be the enemy of the "good enough" here, but I do want to be sure the cost:benefit ratio is sufficiently small to make this worthwhile.
Okay, I think I understand. Things like the former implementation of stringStream, which used an internal buffer allocated from the RA that could grow on demand. We switched stringStream to C-heap allocation for the very reason that this is broken by design.

Pointers allocated from the RA should not escape the boundary of the ResourceMark they were allocated in, because the ResourceMark could unwind the RA and deallocate the chunk the pointer points to. That also means that you should not hand an on-demand-growable structure up and down the stack unless you are very sure that you don't cross an RM boundary. That excludes using RA memory as backing buffers in utility objects that are naturally handed down the stack, like outputStream.

My argument is that this is a broken pattern, and it was one before my patch too. Switching the underlying RA itself is really no different from crossing an RM boundary. Look at it like this: if your allocation escapes an RM, it does not matter whether the chunk it lives in is part of the secondary RA or the primary RA; it will get cleaned out and your allocation becomes invalid either way.
Definitely. It's interesting that such a shaky concept became the cornerstone of a popular product like the async profiler.
Hmm, I think it is. I know our sporadic crashes on PPC in the async profiler disappeared with the use of secondary RAs.
Okay. I think RAs have become much maligned because they have been used incorrectly in conjunction with RMs. It should be okay to pass an RA-allocated object around in a well-defined way, but we've become overzealous in the use of RMs at the immediate use-site of an RA object, making it impossible to actually establish well-defined scopes in which that can happen. But I digress. :) Applying one more band-aid to AGCT doesn't seem too harmful here.
I'm really not getting the nested entry story here - what scenarios do you envisage and how are they safe? If you can get nested entry then the counter itself can cause breakage as the switch between areas needs to be essentially an atomic operation and the counter prevents that.
// an inconsistent state.

void Thread::switch_to_primary_resource_area() {
  _secondary_resource_area_switch_count --;
Style nit: no spaces with unary operators (here and elsewhere)
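For context, a self-contained sketch of the counter-based switching this hunk appears to implement (member names modeled on the diff; everything else is guessed). The depth counter lets nested handler entries share the secondary area, so only the outermost entry and exit actually swap:

class ResourceAreaModel {};  // stand-in for HotSpot's ResourceArea

class ThreadModel {
  ResourceAreaModel  _primary_ra;
  ResourceAreaModel  _secondary_ra;
  ResourceAreaModel* _current_ra = &_primary_ra;
  int _secondary_resource_area_switch_count = 0;

public:
  void switch_to_secondary_resource_area() {
    if (_secondary_resource_area_switch_count++ == 0) {
      _current_ra = &_secondary_ra;   // outermost entry swaps
    }
  }
  void switch_to_primary_resource_area() {
    if (--_secondary_resource_area_switch_count == 0) {
      _current_ra = &_primary_ra;     // outermost exit swaps back
    }
  }
  ResourceAreaModel* resource_area() { return _current_ra; }
};

Note that the counter update and the pointer swap are two separate steps, so the pair is not atomic with respect to a signal arriving in between; that is exactly the nested-entry concern David raises above.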
It seems like this perhaps narrows but does not fix the problem.
What if we're in the Chunk allocator when a signal is received, and the signal
handler allocates enough that it needs another chunk? The Chunk allocator uses
ThreadCritical to protect its critical regions, which does not make it
re-entrant either.
And if the Chunk allocator finds the associated ChunkPool free-list to be
empty, then we need to go to malloc, which has the same problem of not being
async-signal safe.
I'm also not thrilled by the idea of devoting 1K (or more, if we discover 1K
isn't enough for all the allocations we might need) for each thread for this
purpose. That seems like quite a bit of memory being (mostly) wasted.
So basically I'm not convinced the proposed design is right yet.
One thing I was wondering (because I don't know the answer) is whether there
is any bound on the number of in-flight signals that might need to be handled
at the same time by different threads.
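Kim's re-entrancy point in miniature: if the interrupted thread already holds the allocator lock, a signal handler that needs the same lock cannot make progress. A standalone demo (the lock here merely stands in for ThreadCritical; trylock is used so the demo reports the conflict instead of actually deadlocking):

#include <pthread.h>
#include <csignal>
#include <cerrno>
#include <unistd.h>

static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;

static void handler(int) {
  // A real handler would need the allocator lock; with a non-reentrant
  // lock held by the interrupted code below, pthread_mutex_lock() here
  // would deadlock. trylock just reports the conflict.
  if (pthread_mutex_trylock(&alloc_lock) == EBUSY) {
    const char msg[] = "handler: allocator lock already held -> would deadlock\n";
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);  // async-signal-safe output
  } else {
    pthread_mutex_unlock(&alloc_lock);
  }
}

int main() {
  signal(SIGPROF, handler);
  pthread_mutex_lock(&alloc_lock);   // simulate: interrupted mid-chunk-allocation
  raise(SIGPROF);                    // handler runs on this thread
  pthread_mutex_unlock(&alloc_lock);
  return 0;
}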
Hi Kim, thanks for looking at this. Answers below.
There are some ideas I have to address your concerns:

(1) By delaying the first chunk allocation, we prevent memory from being wasted if the secondary RA is not being used. All we would pay is the base cost of sizeof(ResourceArea) in Thread, 40 bytes. And that could be reduced too if we were to groom Arena a bit, but that's a different story.

(2) That would mean we don't need to pull a central lock. We would still use malloc() if someone uses the RA inside signal handling - it is used for the first allocation, and if subsequent allocations expand the secondary RA. Side note, I am curious whether ChunkPool is even needed nowadays. I often find that the libc keeps returned memory close by; re-acquisition via malloc is fast and often handled thread-locally without synchronization. Especially compared to ChunkPool, where we (a) always pull a central lock and (b) free in a different thread from where the memory was malloced, which may not be ideal. So maybe removing ChunkPool would be okay in general.

(3) That reduces follow-up mallocs. We now only call malloc for the first dive down the stack.

(4) That reduces the number of dangerous malloc calls to exactly one: the initial large chunk. We keep that in the secondary RA and never release it. The idea behind (4) is that we don't expect a ton of RA memory to be allocated in signal handling or in AGCT; we only expect small utility functions to be called that don't need much space. A variant of (4) could be to place the one big chunk we allow the secondary RA to have on the thread stack instead of using malloc(). Basically, we would use the ResourceArea as an interface to normal stack memory allocated in the frame above us where we switched to the secondary RA. Unfortunately, this would reduce its usefulness for error reporting, which has to work in low-stack scenarios. But whoever calls AGCT could at least make sure it gets called on a large enough stack.

(5) The idea behind this is that we don't need this for every thread; we don't expect a ton of threads to use the RA during signal handling. We may be able to just use this for a handful of threads (the one (?) that does AGCT sampling, and maybe the one that crashes and writes an hs-err file). I admit (5) is not really fleshed out yet in my head.

But even with (1)..(4) we could have a solution that is a lot more signal-safe than what we do today.

Cheers, Thomas
I put this back into draft for further refinement. At the moment I tend toward a solution as laid out in my comment #7624 (comment). Condensed: the secondary "safe" ResourceArea would be one that has a single chunk and cannot grow more chunks. That chunk would either (a) be allocated with raw ::malloc(), on demand, only on first RA use, or (b) just be an array placed on the stack in the frame that entered AGCT. If I go with (b), that solution may be restricted to AGCT only, since with signal handling we cannot be sure how much stack we have left. If that sounds okay to you both, I'll go with this. I only want to spend time on this if we agree on the general direction. ..Thomas
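A sketch of option (a) under stated assumptions (the class name, chunk size, and failure behavior are all invented for illustration): the single chunk is malloc'd lazily on first use and never grown or released, so beyond that one call, allocation from signal context touches no lock, no ChunkPool, and no malloc:

#include <cstdlib>
#include <cstddef>

class SafeResourceArea {
  static constexpr size_t CHUNK_SIZE = 64 * 1024;  // "64k or 128k" per discussion
  char*  _chunk = nullptr;
  size_t _top   = 0;

public:
  void* allocate(size_t n) {
    if (_chunk == nullptr) {
      _chunk = static_cast<char*>(::malloc(CHUNK_SIZE)); // the one risky call
      if (_chunk == nullptr) return nullptr;
    }
    if (_top + n > CHUNK_SIZE) return nullptr; // cannot grow: fail, don't malloc
    void* p = _chunk + _top;
    _top += n;
    return p;
  }
  void reset() { _top = 0; }   // ResourceMark analog: the chunk itself is kept
};

Whether simply failing the allocation (returning nullptr here) is acceptable is exactly the kind of question David raises next.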
Mailing list message from David Holmes on hotspot-runtime-dev:

Hi Thomas,

On 9/03/2022 7:15 pm, Thomas Stuefe wrote:
Sorry but how can you constrain the new RA to only a single chunk? What …

And apologies but I'm about to disappear for a week on vacation.

Cheers,
David
The idea is that even though we don't know exactly who or what uses the RA, we can assume that it will not need a ton of memory. That holds definitely true for error handling, and probably for AGCT too. The culprits are usually small utility functions, tightly guarded by ResourceMarks. So we should be able to safely get by with a single 64k or 128k chunk. And if we still see problems, those can be solved by enlarging the chunk or placing tighter resource marks. It's a pragmatic way to defuse the problem. We'd go from "forbid RA under any circumstances" to covering ~95% of use cases.
No problem, have a good time!
In the context of signal handlers, we may allocate RA memory. That is not ideal, but it happens. One example is error reporting: even if we are careful, some code down the stack may use the RA. Another example is code running in the context of AsyncGetCallTrace. I'm sure there are more examples.
The problem is that the signal may (rarely) leave the current thread's RA in an inconsistent state, especially if it interrupted the thread in the middle of a chunk turnover. Subsequent allocations from it inside the signal handler would then malfunction.
A simple solution is double buffering: let each thread have a second resource area, to be used only during signal handling. At the entrance of the hotspot signal handler (which everyone goes through, even in chain scenarios like with AsyncGetCallTrace) we switch over to the secondary resource area, and we switch back when leaving the hotspot signal handler.
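A sketch of what that entry/exit switch could look like (all names here are hypothetical illustrations, not the patch's actual code). An RAII guard swaps the thread's active area on handler entry and restores it on exit, so in-handler allocations never observe a primary area left mid-chunk-turnover:

struct ThreadState {
  void* primary_ra;
  void* secondary_ra;
  void* current_ra;    // what resource_area() would hand out
};

class SignalAreaGuard {
  ThreadState& _t;
  void* _saved;
public:
  explicit SignalAreaGuard(ThreadState& t)
      : _t(t), _saved(t.current_ra) {
    _t.current_ra = _t.secondary_ra;   // all RA use now hits the safe area
  }
  ~SignalAreaGuard() {
    _t.current_ra = _saved;            // back to the (possibly mid-update) primary
  }
};

// Usage at the top of the shared handler entry, roughly (per the
// JVM_HANDLE_XXX_SIGNAL discussion above):
//   SignalAreaGuard g(current_thread_state);
//   ... handler body may allocate resource memory safely ...

The RAII form also guarantees the switch-back happens on every exit path from the handler, including early returns.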
Note that I proposed this on hs-runtime-dev [1] but I am actually not sure if the mailing lists work, since I did not see that mail delivered to subscribers. Therefore I went ahead and implemented a simple prototype.
The prototype keeps matters simple:
Tests:
[1] https://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2022-February/054126.html
Issue
JDK-8282405: Make thread resource areas signal safe
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/7624/head:pull/7624
$ git checkout pull/7624
Update a local copy of the PR:
$ git checkout pull/7624
$ git pull https://git.openjdk.java.net/jdk pull/7624/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 7624
View PR using the GUI difftool:
$ git pr show -t 7624
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/7624.diff