-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8325095: C2: bailout message broken: ResourceArea allocated string used after free #17710
Conversation
👋 Welcome back epeter! A progress list of the required criteria for merging this PR into |
Hi, I don't agree with the way this patch works and I think we're bound for trouble. Why not store an instance of this class inside of the task instead? It takes a bit more space (136 bytes) but it actually handles its own lifetime. class CompilationFailure {
stringStream failure_reason;
CompilationFailure() : failure_reason() {}
void replace_failure(const char* reason) {
failure_reason.reset();
failure_reason.print_raw(reason);
}
const stringStream& reason() const {
return failure_reason;
}
} |
Why? Essencially, your solution would just be a better version of what we have with Your suggestion would require that there is only a single failure_reason around at any given time. But I'm not sure if there is really only a single reason around, at least they are set at different places, and can be replaced... That is why I was looking for something that can keep the lifetime of multiple strings. I'm not saying I'm proud of the draft-solution yet 🤣 These strings are stored at the level of the compilation, ciEnv (outlives compilation), and CompileTask (outlives ciEnv). So I need to allocate the string from somewhere that persists when I format the string, and then pass it into Maybe I can refactor it all and make sure there is only a single reason at any time. Not sure if that is feasible or how much work that would be. One issue is that failure_reasons, and especially dynamic failure_reasons are rare, and not really tested. So it is hard to see if I would break things. |
I don't like it because you're leaking memory over the lifetime of the
How would multiple strings be accessed through one const pointer :-)?
Yes, the lifetime is managed correctly and automatically by the RAII object. Plus, the bool can be removed. One issue is that you will probably have to add a bool |
Well, there are multiple pointers 🙈 I know that the strings are passed between What about having a new |
You don't need a new string class, you can use |
@jdksjolen I think I prefer a new string-class, that also allows for null and possibly static strings (no allocation). Otherwise, if I use Thanks for chiming in here, Johan :) |
Depending on the usage you'll just have to use |
@jdksjolen I tried it with such a string class. And then I tried to use it for
Any suggestions with that? Update: I think it is actually not used by the servicability agent, and I can just delete that reference. |
@@ -405,7 +405,7 @@ void CompilationMemoryStatistic::on_end_compilation() { | |||
if (env) { | |||
const char* const failure_reason = env->failure_reason(); | |||
if (failure_reason != nullptr) { | |||
result = (failure_reason == failure_reason_memlimit()) ? "oom" : "err"; | |||
result = (strcmp(failure_reason, failure_reason_memlimit()) == 0) ? "oom" : "err"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: Now that ciEnv
makes a local copy, we cannot just compare pointers, we need to compare the content.
Thanks @jdksjolen for your suggestions! I now found a solution I'm fine with, and that seems to work. |
Webrevs
|
Back when I filed JDK-8132354, I proposed that both C1 and C2 delegate to ciEnv. Now there is a 3rd layer/scope of error reporting. I don't think the solution needs to be so complicated. Even with the new formatted strings, the maximum possible length can be determined (I don't see it using klass or method names). So why can't the storage simply be |
@@ -1265,7 +1268,7 @@ void ciEnv::record_method_not_compilable(const char* reason, bool all_tiers) { | |||
_compilable = new_compilable; | |||
|
|||
// Reset failure reason; this one is more important. | |||
_failure_reason = nullptr; | |||
_failure_reason.clear(); | |||
record_failure(reason); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dean-long the tricky part is here, I think:
We are in ciEnv::record_method_not_compilable
. Depending on above condition, we either update the failure reason, or don't. So there may already be an old failure reason (the first one) that we overwrite (with a second one) or leave.
This method is called from Compile::record_method_not_compilable
. There, we also call Compile::record_failure
, which sets Compile::_failure_reason
, with no such condition.
We might for example decide to keep the old failure reason at ciEnv, but update the failure reason at Compile, at least that is how I read the code.
Hence, there can be multiple failure reasons around, I think. And I cannot fit multiple messages in a single buffer. Thus, I need some way to dynamically allocate.
Now: is this good or desirable? I don't know. Potentially having different failure messages at Compile and ciEnv - not sure if that is reasonable. We should probably discuss that. But for now I'm just trying to fix the code the way it works.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think initially we care only about "most" important failure because implementation allowed to report only one failure.
If you can keep all failures AND report all of them it would be nice, I think. May be in separate RFE.
We should have only one place where we keep failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reporting all failures might make sense for logging, but we can only report one reason, unless we want to do something like chained exceptions, which I don't think we need. The different layers is already looking like a custom exception handling mechanism, unfortunately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Piling on... I opt for keeping this code simple. A list of subsequent failure states could easily be part of logging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I am voting for UL to print all failures we hit during compilation and keep reporting only one reason at the end.
log_error(compilation) (failiure_reason);
Or log_warning()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vnkozlov @dean-long @tstuefe
Ok, I mean logging of additional failure reasons might even be done in a separate RFE, right?
But don't we already log the failures? I mean there is the log()->elem(
code.
And should I still keep the "most important" failure reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought after our offline discussion that you will keep failure reasons in CompileTask.
What happened to that approach?
@vnkozlov yes, that was the plan. But then @jdksjolen objected, see his comments above. The new approach would be a bit more "clean", as the ownership is now clear. Any yes, there should probably be some refactoring. I'm not sure why there are Would it be ok to push this BugFix (for backport), and then come up with a refactoring RFE? We probably also want to refactor the strange |
// Make sure it is written before the pointer is used again | ||
OrderAccess::storestore(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this internal string being exposed to other threads ??? That seems dangerous in general given this is a stackobj.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dholmes-ora I don't think it is exposed to other threads. It only goes as far up as CompileTask, so I think it always stays in the compile thread. I saw OrderAccess::storestore()
in the stringStream
code. But maybe I don't need it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OrderAccess
is only needed for exchanging data across threads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dholmes-ora ok, thanks for the comment! I removed it now.
@@ -1265,7 +1268,7 @@ void ciEnv::record_method_not_compilable(const char* reason, bool all_tiers) { | |||
_compilable = new_compilable; | |||
|
|||
// Reset failure reason; this one is more important. | |||
_failure_reason = nullptr; | |||
_failure_reason.clear(); | |||
record_failure(reason); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Piling on... I opt for keeping this code simple. A list of subsequent failure states could easily be part of logging.
CHeapStringHolder(MEMFLAGS memflags) : | ||
_memflags(memflags), _string(nullptr) {} | ||
~CHeapStringHolder() { clear(); }; | ||
NONCOPYABLE(CHeapStringHolder); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd opt for a use-case-specific class inside compiler. It would allow you to hard-code mtCompiler, simplifying the code, and to log with UL into the compiler log when the failure reason is changed.
A new generic string class opens a whole can of bikeshedding worms, e.g. whether to make it mutable, how best to specify MEMFLAGS etc... I would avoid that, especially since the added value fo this helper class is slim.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this should be in compiler/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I can do that. Just wanted it to be shared code in case someone needs it in the future. But maybe that is too hard to do, due to bikeshedding... :/
Not sure I want to UL here, since I hold the string in multiple places, that would lead to multiple loggings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few comments.
failure_reason = ci_env.failure_reason(); | ||
// Duplicate the failure reason string, so that it outlives ciEnv | ||
failure_reason = os::strdup(ci_env.failure_reason(), mtCompiler); | ||
bool reason_on_C_heap = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean: failure_reason_on_C_heap = true;
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. Fixing it.
Ha, copied this from above lines:
failure_reason = os::strdup(err_msg("Error attaching to libjvmci (err: %d, %s)",
env.init_error(), msg == nullptr ? "unknown" : msg), mtJVMCI);
bool reason_on_C_heap = true;
// In case of JNI_ENOMEM, there's a good chance a subsequent attempt to create libjvmci or attach to it
// might succeed. Other errors most likely indicate a non-recoverable error in the JVMCI runtime.
bool retryable = env.init_error() == JNI_ENOMEM;
compile_state.set_failure(retryable, failure_reason, reason_on_C_heap);
I guess here the variable is local, and compile_state.set_failure(
"eats" up the reference (i.e. becomes the owner). But the reference to the string is also held by failure_reason
. This is so nasty 😅 :
src/hotspot/share/opto/compile.hpp
Outdated
@@ -363,7 +364,7 @@ class Compile : public Phase { | |||
ciEnv* _env; // CI interface | |||
DirectiveSet* _directive; // Compiler directive | |||
CompileLog* _log; // from CompilerThread | |||
const char* _failure_reason; // for record_failure/failing pattern | |||
CHeapStringHolder _failure_reason; // for record_failure/failing pattern |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrong comment's indent
CHeapStringHolder(MEMFLAGS memflags) : | ||
_memflags(memflags), _string(nullptr) {} | ||
~CHeapStringHolder() { clear(); }; | ||
NONCOPYABLE(CHeapStringHolder); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this should be in compiler/
@@ -1265,7 +1268,7 @@ void ciEnv::record_method_not_compilable(const char* reason, bool all_tiers) { | |||
_compilable = new_compilable; | |||
|
|||
// Reset failure reason; this one is more important. | |||
_failure_reason = nullptr; | |||
_failure_reason.clear(); | |||
record_failure(reason); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I am voting for UL to print all failures we hit during compilation and keep reporting only one reason at the end.
log_error(compilation) (failiure_reason);
Or log_warning()
UL should be done in separate RFE. Current logging ( Different failures are recorded in ciEnv and Compiler, only after compilation is finish we record compiler's failure in ciEnv. |
@vnkozlov @tstuefe thanks for the comments and suggestions! What I did:
What I did not yet do: About Logging, in a future RFE: |
I suggest to leave the normal report code as it is now (one failure). And if we need additional information we will use UL when we implement it. Most ciEnv failures are recorded ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This version looks good to me.
@eme64 This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 77 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
@dean-long @tstuefe @dholmes-ora can anybody give me a second review, please? |
*/ | ||
|
||
#include "precompiled.hpp" | ||
#include "runtime/orderAccess.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need this include
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your are right! Removed it.
Thanks @vnkozlov @dlunde @tstuefe @dholmes-ora for reviewing / your suggestions! /integrate |
Going to push as commit c589555.
Your commit was automatically rebased without conflicts. |
Problem
Bailout
failure_reason
strings used to be eithernullptr
or a static string. But with JDK-8303951 I had introduced dynamic strings (e.g. formatted usingstringStream::as_string()
). These dynamic strings were ResourceArea allocated, and have a very limited life-span, i.e. withing the most recent ResourceMark scope. The pointer is passed toCompile::_failure_reason
, which already might leave that ResourceMark scope, and the pointer is invalid. And then the pointer is further passed tociEnv::_failure_reason
. And finally, it even gets passed up toCompileBroker::invoke_compiler_on_method
. When the string is finally printed for the bailout message, we already left originalResourceMark
scope, and also theCompile
andciEnv
scopes. The pointer now points to basically random data.Solution
Whenever a string is passed to an outer scope, I make a CHeap copy, and the outer scope is the owner of that copy.
This way every scope is the owner of its own copy, and can allocate and deallocate according to its own strategy safely.
I introduced a utility class
CHeapStringHolder
:set
: make local copy on CHeap.clear
: free local copy. We beforeset
, and in the destructor. Thus, when the holder goes out of scope, the memory is automatically freed.We have these 4 scopes:
ResourceMark
: It allocates fromResourcArea
, and deallocation is taken care at the end of the ResourceMark scope.Compile
: We turn the_failure_reason
from achar*
into aCHeapStringHolder
. We set_failure_reason
while theResourceMark
is live.ciEnv
: We turn the_failure_reason
from achar*
into aCHeapStringHolder
. We set_failure_reason
whileCompile
is live.CompileTask
: We used to just setfailure_reason
, which assumes that the string is static or elsewhere managed. Now I use pattern available in other places ofCompileBroker::invoke_compiler_on_method
: duplicate the string withos::strdup
from some other scope, and setreason_on_C_heap = true
, which means that we are the owner of that copy, and are responsible for freeing it later. Not sure if that is a nice pattern, but I don't want to refactor that code and it does what I need it to do.Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/17710/head:pull/17710
$ git checkout pull/17710
Update a local copy of the PR:
$ git checkout pull/17710
$ git pull https://git.openjdk.org/jdk.git pull/17710/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 17710
View PR using the GUI difftool:
$ git pr show -t 17710
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/17710.diff
Webrev
Link to Webrev Comment