Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer #19424

Closed
wants to merge 3 commits into from

Conversation

MBaesken
Copy link
Member

@MBaesken MBaesken commented May 28, 2024

When running on macOS with ubsan enabled, we see some issues in relocInfo (hpp and cpp); those already occur in the build quite early.

/jdk/src/hotspot/share/code/relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer

Similar happens when we add to the _current pointer
_current++;
this gives :
relocInfo.hpp:606:13: runtime error: applying non-zero offset to non-null pointer 0xfffffffffffffffe produced null pointer

Seems the pointer subtraction/addition worked so far, so it might be an option to disable ubsan for those 2 functions.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Warning

 ⚠️ Found leading lowercase letter in issue title for 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer

Issue

  • JDK-8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer (Bug - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19424/head:pull/19424
$ git checkout pull/19424

Update a local copy of the PR:
$ git checkout pull/19424
$ git pull https://git.openjdk.org/jdk.git pull/19424/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 19424

View PR using the GUI difftool:
$ git pr show -t 19424

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19424.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented May 28, 2024

👋 Welcome back mbaesken! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented May 28, 2024

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot changed the title JDK-8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer 8331731: ubsan: relocInfo.cpp:155:30: runtime error: applying non-zero offset 18446744073709551614 to null pointer May 28, 2024
@openjdk openjdk bot added the rfr Pull request is ready for review label May 28, 2024
@openjdk
Copy link

openjdk bot commented May 28, 2024

@MBaesken The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label May 28, 2024
@mlbridge
Copy link

mlbridge bot commented May 28, 2024

Webrevs

Copy link
Contributor

@TheRealMDoerr TheRealMDoerr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An idea to fix it is to avoid pointer arithmetic. E.g.

diff --git a/src/hotspot/share/code/relocInfo.cpp b/src/hotspot/share/code/relocInfo.cpp
index d0f732edac4..b6e1517aefd 100644
--- a/src/hotspot/share/code/relocInfo.cpp
+++ b/src/hotspot/share/code/relocInfo.cpp
@@ -152,7 +152,7 @@ RelocIterator::RelocIterator(CodeSection* cs, address begin, address limit) {
   initialize_misc();
   assert(((cs->locs_start() != nullptr) && (cs->locs_end() != nullptr)) ||
          ((cs->locs_start() == nullptr) && (cs->locs_end() == nullptr)), "valid start and end pointer");
-  _current = cs->locs_start()-1;
+  _current = (relocInfo*)((uintptr_t)cs->locs_start() - sizeof(relocInfo));
   _end     = cs->locs_end();
   _addr    = cs->start();
   _code    = nullptr; // Not cb->blob();
diff --git a/src/hotspot/share/code/relocInfo.hpp b/src/hotspot/share/code/relocInfo.hpp
index 6d0907d97de..1774c8ac62a 100644
--- a/src/hotspot/share/code/relocInfo.hpp
+++ b/src/hotspot/share/code/relocInfo.hpp
@@ -603,7 +603,7 @@ class RelocIterator : public StackObj {
 
   // get next reloc info, return !eos
   bool next() {
-    _current++;
+    _current = (relocInfo*)((uintptr_t)_current + sizeof(relocInfo));
     assert(_current <= _end, "must not overrun relocInfo");
     if (_current == _end) {
       set_has_current(false);

Doesn't look very nice, but should work.

@MBaesken
Copy link
Member Author

Doesn't look very nice, but should work.

I agree, it does not look very nice. Not sure what is better, disabling ubsan for the methods or use the code you suggested.
Or maybe add some helper template/macro for pointer additions that covers those cases and handles nullptr nicely ?

@vnkozlov
Copy link
Contributor

Doesn't look very nice, but should work.

I agree, it does not look very nice. Not sure what is better, disabling ubsan for the methods or use the code you suggested. Or maybe add some helper template/macro for pointer additions that covers those cases and handles nullptr nicely ?

I prefer @TheRealMDoerr suggestion vs disabling ubsan check for this code. It should be compiled to the same assembler.
I would only add comment to explain why we don't do simple pointer arithmetic here.

cs->locs_start() == nullptr is common case and I don't want to complicate code with additional nullptr checks.

@dean-long
Copy link
Member

We could consider using a sentinel value instead of nullptr, but it would require more changes.

@MBaesken
Copy link
Member Author

What you think about using some helper templates or macros like this, doing what Martin suggested ?

// helper templates to avoid undefined addition/subtraction from nullptr
template<typename T>
T* add_to_ptr(T* ptr, int val) {
  return (T*)((uintptr_t)ptr + val * sizeof(T));
}

template<typename T>
T* sub_from_ptr(T* ptr, int val) {
  return (T*)((uintptr_t)ptr - val * sizeof(T));
}

@TheRealMDoerr
Copy link
Contributor

val needs an unsigned type to avoid undefined behavior because of signed integer overflow. I'd use uintptr_t.

@MBaesken
Copy link
Member Author

val needs an unsigned type to avoid undefined behavior because of signed integer overflow. I'd use uintptr_t.

Makes sense to use something unsigned.
Any good place(s) where to put those templates? For now I would just simply put them into relocInfo.hpp (we can used them if we need to reuse them somewhere else) .

@vnkozlov
Copy link
Contributor

val needs an unsigned type to avoid undefined behavior because of signed integer overflow. I'd use uintptr_t.

Makes sense to use something unsigned. Any good place(s) where to put those templates? For now I would just simply put them into relocInfo.hpp (we can used them if we need to reuse them somewhere else) .

I would suggest utilities/globalDefinitions.hpp somewhere near pointer_delta*()

@stefank
Copy link
Member

stefank commented May 30, 2024

val needs an unsigned type to avoid undefined behavior because of signed integer overflow. I'd use uintptr_t.

Makes sense to use something unsigned. Any good place(s) where to put those templates? For now I would just simply put them into relocInfo.hpp (we can used them if we need to reuse them somewhere else) .

I would suggest utilities/globalDefinitions.hpp somewhere near pointer_delta*()

I'm not fully convinced that this is good idea.

While reading this patch, it is not clear to me that it is correct to hide the warning that ubsan has found. Maybe it is, but I don't see any explanation here showing why it is OK to subtract or add against null here.

This is one reason why I'm reluctant to see these functions getting put into globalDefinitions.hpp. I think that there's a risk that people will start to use these functions without making a full analysis to see if there really is a bug that needs to be solved, if there are some code quality improvements that could be done to get rid of the null pointer, or if it is in fact something that we just want to silence the warning for.

If you really want to go ahead and add these functions, I would like to see them get more descriptive names that explain why they are used instead of plain ++ and --. For example: add/sub_to_ptr_maybe_null.

@TheRealMDoerr
Copy link
Contributor

We're not hiding a warning. Using pointer addition with nullptr or result nullptr is undefined behavior. So, the current implementation is not guaranteed to do what we expect. It only works because compilers seem to be merciful.
However, casting nullptr to uintptr_t is guaranteed to be 0. So, switching to unsigned integer arithmetics avoids this problem. The RelocIterator code is designed to work with such values.
Nevertheless, I like your proposal to call them add/sub_to_ptr_maybe_null.

@xmas92
Copy link
Member

xmas92 commented May 30, 2024

This seems to be the same as JDK-8300821. (Changeset 01312a0)

The miss here seems to be that has_loc does not mean "This CodeSection has relocatations". But means "This CodeSection has allocated a relocations buffer". I believe the correct check would be cs->locs_count() == 0

--- a/src/hotspot/share/asm/codeBuffer.cpp
+++ b/src/hotspot/share/asm/codeBuffer.cpp
@@ -525,7 +525,7 @@ void CodeBuffer::finalize_oop_references(const methodHandle& mh) {
   for (int n = (int) SECT_FIRST; n < (int) SECT_LIMIT; n++) {
     // pull code out of each section
     CodeSection* cs = code_section(n);
-    if (cs->is_empty() || !cs->has_locs()) continue;  // skip trivial section
+    if (cs->is_empty() || cs->locs_count() == 0) continue;  // skip trivial section
     RelocIterator iter(cs);
     while (iter.next()) {
       if (iter.type() == relocInfo::metadata_type) {
@@ -793,7 +793,7 @@ void CodeBuffer::relocate_code_to(CodeBuffer* dest) const {
   for (int n = (int) SECT_FIRST; n < (int)SECT_LIMIT; n++) {
     // pull code out of each section
     const CodeSection* cs = code_section(n);
-    if (cs->is_empty() || !cs->has_locs()) continue;  // skip trivial section
+    if (cs->is_empty() || cs->locs_count() == 0) continue;  // skip trivial section
     CodeSection* dest_cs = dest->code_section(n);
     { // Repair the pc relative information in the code after the move
       RelocIterator iter(dest_cs);
@@ -1057,7 +1057,7 @@ void CodeSection::print(const char* name) {
                 name, p2i(start()), p2i(end()), p2i(limit()), size(), capacity());
   tty->print_cr(" %7s.locs = " PTR_FORMAT " : " PTR_FORMAT " : " PTR_FORMAT " (%d of %d) point=%d",
                 name, p2i(locs_start()), p2i(locs_end()), p2i(locs_limit()), locs_size, locs_capacity(), locs_point_off());
-  if (PrintRelocations) {
+  if (PrintRelocations && locs_size != 0) {
     RelocIterator iter(this);
     iter.print();
   }

There is also the following which perhaps should assert that there is a relocation at the call pc. As it makes some assumptions about being able to patch its type. (I am not familiar with this code.)

RelocIterator iter(cs, (address)_pc_start, (address)(_pc_start + 1));

As for the RelocIterator::RelocIterator(nmethod* nm, address begin, address limit) iterator it is less clear to me if it should be guarded agains nullptr from outside.

So something would still have to be done about the ubsan.

Much of this seems to be about optimising the RelocIterator. But it does not seem worth either the cpu cycles to execute the constructor (nor the cognitive brain cycles to reason about why the constructor is valid when current == -sizeof(relocInfo*)), when we can easily check from the callsite that there are no relocations in our nmethod or code section.

Unsure if solving this by type casting is just hiding underlying design issues.

@stefank
Copy link
Member

stefank commented May 30, 2024

We're not hiding a warning. Using pointer addition with nullptr or result nullptr is undefined behavior. So, the current implementation is not guaranteed to do what we expect. It only works because compilers seem to be merciful.

I think you are reading too much into the words I used. Ubsan is warning/complaining/error that the code is problematic.

However, casting nullptr to uintptr_t is guaranteed to be 0. So, switching to unsigned integer arithmetics avoids this problem.

I fully understand the suggested patch.

The RelocIterator code is designed to work with such values.

This is the crux of my complaint. Is this a good design given that the usage of null isn't apparent on first read? Or was this only something that grew into existence after a while? And would it make more sense for maintainability to not be doing this? I think it is important to think about those questions when trying to handle all these ubsan failures.

FWIW, I (and Axel) took a step back and asked why do we think it is necessary to support null pointers here and I think you can see the musing around in Axel's response above.

@stefank
Copy link
Member

stefank commented May 30, 2024

Oh, and as it doesn't seem to have been clear from my earlier comments: I don't strongly oppose that you fix it this way you do in the RelocIterator, since I have very little interaction with that code.

The comment was more that I would prefer if we take a case-by-case approach when we look at other parts of HotSpot with similar problems and really think what the correct solution would be, and that we don't too quickly start to grab for the add/sub_to_ptr solution. Putting these functions in globalDefinitions makes it all too easy to just grab for these functions when we try to solve similar problems, IMHO. That's my 2c. I'm not blocking this patch, as long as we get somewhat decent names.

@vnkozlov
Copy link
Contributor

RelocIterator is used in a lot of places and not all are guarded by has_locs(). The code assumes that RelocIterator::next() will return false if no relocations are present.

We have to use pre-increment in next() with check after it because in following code there are current() accessing _current. I don't want to touch this code.

I really don't want to add nullptr check into this hot code which may affect performance. That is why I agreed with latest changes. Based on this discussion I am fine to keep them locally in relocInfo.hpp with more descriptive names. We can also add comment explaining why we need them.

@vnkozlov
Copy link
Contributor

The miss here seems to be that has_loc does not mean "This CodeSection has relocatations". But means "This CodeSection has allocated a relocations buffer". I believe the correct check would be cs->locs_count() == 0

This suggestion seems correct because we may allocate relocation buffer in section which does not have relocations codeBuffer.cpp#L169.

But this is different issue for different RFE.

@xmas92
Copy link
Member

xmas92 commented May 30, 2024

My stance is the same as @stefank that I do not oppose this change to fix the immediate issue.

Looking closer at how the RelocIterator is created from a nmethod it would never end up with a nullptr - 1. Because relocation_begin(), which is used to initialize _current, would never produce a nullptr. So there is no issue with the other constructor. So plugging the three holes above would remove the ub. (Along with introducing the invariant that you are not allowed to construct from a CodeSection with no relocations).

But this is different issue for different RFE.

It may be a different RFE, but it is the same issue (unless I am misunderstanding you are referring to). The !has_loc() was specifically introduced to solve this exact ub bug. However it was the wrong property to check. Reading #12854 gives me this impression as well. (Given that the logic around has_loc does not seem to have changed since 8153779 )

@vnkozlov
Copy link
Contributor

vnkozlov commented May 30, 2024

So you want to patch the path which introduces nullptr. And in addition to your suggested fix we need to adjust assert:

 RelocIterator::RelocIterator(CodeSection* cs, address begin, address limit) {
   initialize_misc();
-  assert(((cs->locs_start() != nullptr) && (cs->locs_end() != nullptr)) ||
-         ((cs->locs_start() == nullptr) && (cs->locs_end() == nullptr)), "valid start and end pointer");
+  assert(((cs->locs_start() != nullptr) && (cs->locs_end() != nullptr)), "valid start and end pointer");
   _current = cs->locs_start()-1;

This seems reasonable.

@dean-long
Copy link
Member

I believe using has_locs() is fine in relocate_code_to(). We just need to call it against dest_cs, which the iterator is using.

@MBaesken
Copy link
Member Author

I renamed the templates to sub / add_to_ptr_maybe_null .

Maybe other changes could be done in a separate RFE .

@stefank
Copy link
Member

stefank commented May 31, 2024

I renamed the templates to sub / add_to_ptr_maybe_null .

Maybe other changes could be done in a separate RFE .

The other changes would render this RFE unnecessary because _current would then never contain a nullptr.

@TheRealMDoerr
Copy link
Contributor

I guess Matthias only wanted to fix UB in hotspot ASAP and doesn't have the bandwidth to change the design everywhere. Sounds like you guys already have an alternative solution which already works. Maybe you would like to put it into a PR and we continue the discussion there?
Nevertheless, having sub / add_to_ptr_maybe_null available in hotspot may be a good thing. There are some places where we really use additions with nullptr (e.g. index_oop_from_field_offset_long in unsafe.cpp).

@MBaesken
Copy link
Member Author

MBaesken commented May 31, 2024

I guess Matthias only wanted to fix UB in hotspot ASAP and doesn't have the bandwidth to change the design everywhere.

Yes .
The first goal to make the '--enable-ubsan' configure flag useful; currently we have the configure flag but still fail already in the OpenJDK build (because of a number of ubsan related issues in HS).

@stefank
Copy link
Member

stefank commented May 31, 2024

I guess Matthias only wanted to fix UB in hotspot ASAP and doesn't have the bandwidth to change the design everywhere.

The proposal solves the UB but (IMHO) adds a wart to the code instead of taking a small step back and fixing the root cause of the UB. This then leaves the wart for other maintainers fix with their own bandwidth.

Sounds like you guys already have an alternative solution which already works. Maybe you would like to put it into a PR and we continue the discussion there?

I would prefer if we did the right fix here in this PR.

Nevertheless, having sub / add_to_ptr_maybe_null available in hotspot may be a good thing. There are some places where we really use additions with nullptr (e.g. index_oop_from_field_offset_long in unsafe.cpp).

My previous arguments have been that I don't think it is a good thing, so our opinion here differs.

How many places are there that really should be doing add/sub with nullptr? Are most places just like this PR, where using add_to_ptr_maybe_null is a convenient way to get away from the UB, but the "real" fix would be to remove the null pointer? Given that we have different opinions about this, could we at least wait with adding add_to_ptr_maybe_null until we have assessed the other instances of this UB issue? Do you have a list of other places that have this issue?

@TheRealMDoerr
Copy link
Contributor

TheRealMDoerr commented May 31, 2024

I would prefer if we did the right fix here in this PR.

Well, you could ask Matthias. It's his PR. However, your new proposal doesn't have any code in common with the existing PR. So, what's the benefit of using this one?

Do you have a list of other places that have this issue?

No. They typically come up after fixing other ones.

@vnkozlov
Copy link
Contributor

@MBaesken can close this PR and re-assign this bug to me if he don't have time to do proposed changes to code.

@dean-long
Copy link
Member

I agree with @xmas92, and I propose a slight change to his fix:

@@ -792,9 +792,8 @@ void CodeBuffer::relocate_code_to(CodeBuffer* dest) const {
   // section, so that section has to be copied before relocating.
   for (int n = (int) SECT_FIRST; n < (int)SECT_LIMIT; n++) {
     // pull code out of each section
-    const CodeSection* cs = code_section(n);
-    if (cs->is_empty() || !cs->has_locs()) continue;  // skip trivial section
     CodeSection* dest_cs = dest->code_section(n);
+    if (dest_cs->is_empty() || dest_cs->locs_count() == 0) continue;  // skip trivial section
     { // Repair the pc relative information in the code after the move
       RelocIterator iter(dest_cs);
       while (iter.next()) {

@vnkozlov
Copy link
Contributor

I agree with @xmas92, and I propose a slight change to his fix:

It does not matter. Few lines above we copied relocation info to new buffer and there is assert in initialize_locs_from():

assert(this->locs_count() == source_cs->locs_count(), "sanity");

@MBaesken
Copy link
Member Author

MBaesken commented Jun 1, 2024

@MBaesken can close this PR and re-assign this bug to me if he don't have time to do proposed changes to code.

Sounds good to me.

@MBaesken MBaesken closed this Jun 1, 2024
@TheRealMDoerr
Copy link
Contributor

Thank you, Vladimir! I appreciate it.

@vnkozlov
Copy link
Contributor

vnkozlov commented Jun 3, 2024

New PR: #19525

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

6 participants