8281213: Unsafe uses of long and size_t in MemReporterBase::diff_in_current_scale #11514

Closed
wants to merge 10 commits into from

Conversation

afshin-zafari
Contributor

@afshin-zafari afshin-zafari commented Dec 5, 2022

Description

MemReporterBase::diff_in_current_scale is defined as follows:

inline long diff_in_current_scale(size_t s1, size_t s2) const {
  long amount = (long)(s1 - s2);
  long scale = (long)_scale;
  amount = (amount > 0) ? (amount + scale / 2) : (amount - scale / 2);
  return amount / scale;
}

long and size_t can have different sizes: on LLP64 platforms such as 64-bit Windows, long is 4 bytes while size_t is 8 bytes. The result of 's1 - s2' might not fit into long. It might not even fit into int64_t, for example when s1 is SIZE_MAX and s2 is SIZE_MAX - INT64_MAX - 1.

size_t should be used instead of long. Assertions must be added to check:
s1 >= s2, (amount - scale/2) >= 0, and (amount + scale/2) <= SIZE_MAX.
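The truncation described above can be reproduced on any platform by simulating the LLP64 type widths. The following is a minimal sketch, not HotSpot code: `diff_with_32bit_long` and `diff_with_64bit` are hypothetical names, `uint64_t` stands in for `size_t`, `int32_t` stands in for a 4-byte `long`, and the scale is passed as a parameter instead of being read from `_scale`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the original function on an LLP64 platform:
// uint64_t plays the role of size_t, int32_t the role of a 4-byte long.
inline int32_t diff_with_32bit_long(uint64_t s1, uint64_t s2, uint64_t scale_in) {
  assert(scale_in != 0);
  // Narrowing keeps only the low 32 bits of the difference.
  int32_t amount = static_cast<int32_t>(static_cast<uint32_t>(s1 - s2));
  int32_t scale = static_cast<int32_t>(scale_in);
  amount = (amount > 0) ? (amount + scale / 2) : (amount - scale / 2);
  return amount / scale;
}

// Same arithmetic with a 64-bit signed type, for comparison.
// Only meaningful here for s1 >= s2 and differences up to INT64_MAX.
inline int64_t diff_with_64bit(uint64_t s1, uint64_t s2, uint64_t scale_in) {
  assert(scale_in != 0);
  int64_t amount = static_cast<int64_t>(s1 - s2);
  int64_t scale = static_cast<int64_t>(scale_in);
  amount = (amount > 0) ? (amount + scale / 2) : (amount - scale / 2);
  return amount / scale;
}
```

With a 5 GiB difference and a scale of 1024, the 32-bit variant reports 1048576 KB (the low 32 bits, i.e. 1 GiB) while the 64-bit variant reports 5242880 KB.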

Patch

long is replaced by size_t. The comparison to 0 is adjusted accordingly, since a size_t value is always >= 0.
Since s1 can be less than s2 in some invocations of this method, no assert is added for the (s1 >= s2) case.

Test

local: runtime/NMT/Jcmd*
mach5: tier1


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8281213: Unsafe uses of long and size_t in MemReporterBase::diff_in_current_scale

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/11514/head:pull/11514
$ git checkout pull/11514

Update a local copy of the PR:
$ git checkout pull/11514
$ git pull https://git.openjdk.org/jdk pull/11514/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 11514

View PR using the GUI difftool:
$ git pr show -t 11514

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/11514.diff

@bridgekeeper

bridgekeeper bot commented Dec 5, 2022

👋 Welcome back afshin-zafari! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 5, 2022
@openjdk

openjdk bot commented Dec 5, 2022

@afshin-zafari The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Dec 5, 2022
@mlbridge

mlbridge bot commented Dec 5, 2022

@tstuefe
Member

tstuefe commented Dec 7, 2022

I would return ssize_t instead.

For values >SSIZE_MAX and <SSIZE_MIN I would assert in debug (because we should never see such high numbers) and cap in release builds.

And of course, the print format has to be adapted to use ssize_t format

@tstuefe
Member

tstuefe commented Dec 7, 2022

I would return ssize_t instead.

For values >SSIZE_MAX and <SSIZE_MIN I would assert in debug (because we should never see such high numbers) and cap in release builds.

And of course, the print format has to be adapted to use ssize_t format

Correcting myself:

I would return int64_t (signed 64-bit).

On 32-bit platforms, where we could conceivably surpass SSIZE_MAX and SSIZE_MIN, that is large enough to hold positive and negative deltas.

On 64-bit, int64_t is the same as ssize_t. There, as I proposed, I would consider any delta < SSIZE_MIN or > SSIZE_MAX to be an error, because that indicates a negative overflow in a malloc counter.

I would actually consider any input value > 1000 TB an error as well, certainly any input > SSIZE_MAX.
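The "assert in debug, cap in release" policy suggested here could look roughly like the following. This is only a sketch under stated assumptions: `capped_delta` is a hypothetical helper, it uses the standard `assert` macro (disabled by `NDEBUG`) rather than HotSpot's own `assert`, and it caps at `INT64_MAX` in magnitude.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not HotSpot code: signed delta between two sizes.
// Debug builds assert when the magnitude exceeds INT64_MAX;
// release builds (NDEBUG) cap the magnitude at INT64_MAX instead.
inline int64_t capped_delta(size_t s1, size_t s2) {
  const bool is_negative = s1 < s2;
  // Subtract the smaller from the larger so the unsigned result cannot wrap.
  const uint64_t magnitude = is_negative ? uint64_t(s2) - s1 : uint64_t(s1) - s2;
  const uint64_t limit = static_cast<uint64_t>(INT64_MAX);
  assert(magnitude <= limit && "delta does not fit into int64_t");
  const int64_t capped = static_cast<int64_t>(magnitude <= limit ? magnitude : limit);
  return is_negative ? -capped : capped;
}
```

Computing the magnitude in an unsigned type first and applying the sign afterwards keeps every intermediate step well defined.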

@afshin-zafari
Contributor Author

int64_t is typedef'ed as long. To be portable to Windows, it should be defined as __int64.
What to do?

@tstuefe
Member

tstuefe commented Dec 8, 2022

int64_t is typedef'ed as long. To be portable to Windows, it should be defined as __int64. What to do?

It's not. HotSpot defines its own variants. int64_t and uint64_t are always 64-bit, on all platforms.

@afshin-zafari
Contributor Author

I would return ssize_t instead.
For values >SSIZE_MAX and <SSIZE_MIN I would assert in debug (because we should never see such high numbers) and cap in release builds.
And of course, the print format has to be adapted to use ssize_t format

Correcting myself:

I would return int64_t (signed 64-bit).

On 32-bit platforms, where we could conceivably surpass SSIZE_MAX and SSIZE_MIN, that is large enough to hold positive and negative deltas.

On 64-bit, int64_t is the same as ssize_t. There, as I proposed, I would consider any delta < SSIZE_MIN or > SSIZE_MAX to be an error, because that indicates a negative overflow in a malloc counter.

I would actually consider any input value > 1000 TB an error as well, certainly any input > SSIZE_MAX.

So I will do this: the return value will be of type int64_t. Then there is no need to change the print format for the returned value at the call sites. Correct?
What other changes need to be made?

@tstuefe
Member

tstuefe commented Dec 9, 2022

So I will do this: the return value will be of type int64_t. Then there is no need to change the print format for the returned value at the call sites. Correct? What other changes need to be made?

See this PR: #11568 - solves the same issue as yours, but for the counters. The only difference to your patch is that with counters, I can use ssize_t, since I know that on 32-bit these counters cannot overflow SSIZE_MAX (2g, -2g). But with memory sizes this can happen (think: a 32-bit VM mallocing 2.1GB, then freeing it again), therefore we need to use int64_t.

You need to change the print formats too. Currently, the printout uses %+ld, which on Windows and on 32-bit platforms in general prints only a 32-bit value. You need to use INT64_FORMAT, but since the printout wants to have a leading '+' for positive numbers, you need something like INT64_PLUS_FORMAT.

INT64_PLUS_FORMAT does not exist yet; you need to add it. Just do this:

--- a/src/hotspot/share/utilities/globalDefinitions.hpp
+++ b/src/hotspot/share/utilities/globalDefinitions.hpp
@@ -120,6 +120,7 @@ class oopDesc;
 
 // Format 64-bit quantities.
 #define INT64_FORMAT             "%"          PRId64
+#define INT64_PLUS_FORMAT        "%+"         PRId64
 #define INT64_FORMAT_X           "0x%"        PRIx64
 #define INT64_FORMAT_X_0         "0x%016"     PRIx64
 #define INT64_FORMAT_W(width)    "%"   #width PRId64
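How such a macro behaves can be checked in isolation. The snippet below is a standalone sketch: it redefines `INT64_PLUS_FORMAT` locally exactly as in the diff above, and `format_delta` is a hypothetical helper built on `std::snprintf`, not a HotSpot function.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Local copy of the proposed macro; PRId64 expands to the correct
// printf length modifier for int64_t on the current platform.
#define INT64_PLUS_FORMAT "%+" PRId64

// Hypothetical helper: format a signed 64-bit delta with an explicit sign.
inline std::string format_delta(int64_t delta) {
  char buf[32];  // enough for the sign, 19 digits, and the terminator
  std::snprintf(buf, sizeof(buf), INT64_PLUS_FORMAT, delta);
  return std::string(buf);
}
```

Because the `+` flag is part of the format string, positive values and zero get a leading `+`, and values beyond the 32-bit range are printed in full regardless of the width of `long`.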

Cheers, Thomas

@afshin-zafari
Contributor Author

Thanks Thomas. I implemented the changes as you guided.
Tests:
runtime/NMT/* , tier1-5 passed.

size_t amount = s1 - s2;
assert(amount <= SIZE_MAX - _scale / 2, "size_t overflow");
amount = (amount + _scale / 2) / _scale;
// We assume the valid range for deltas [INT64_MIN, INT64_MAX] to simplify the code.
Member

@eastig eastig Dec 19, 2022

In the assert the comment should be:

We assume the valid range for deltas [-INT64_MAX, INT64_MAX] to simplify the code.

I excluded the INT64_MIN case to simplify the code.
The INT64_MIN case is when sizeof(size_t) == sizeof(int64_t) && amount == INT64_MAX + 1 && is_negative == true.
For this case -(int64_t)amount is UB which needs proper handling.
The code instead of

return (is_negative) ? -(int64_t)amount : (int64_t)amount;

would be more complex:

if (!is_negative) {
  return (int64_t)amount;
} else {
  return (amount != INT64_MAX+1) ? -(int64_t)amount : INT64_MIN;
}

This is why the assert part is sizeof(size_t) == sizeof(int64_t) && amount <= INT64_MAX)

Maybe the better comment is

// We assume the valid range for deltas [-INT64_MAX, INT64_MAX].
// We excluded the `INT64_MIN` case to simplify the code.

Contributor Author

I couldn't follow your idea.
INT64_MAX + 1 results in a compiler overflow error. Isn't it equal to INT64_MIN?

Member

I did not know there was a defect report #1313 fixed in C++11.
Now the standard prohibits using constant expressions like INT64_MAX + 1 in code.

I couldn't follow your idea.

The idea is to show that if you want to handle -(2^63), you need more code. Your comment is not correct. The code does not handle INT64_MIN.

If we limit cases to [-(2^63 - 1), 2^63 - 1] the code would be simpler. IMHO one byte is not worth increasing the code complexity. In most cases -(2^63) would be the result of some wrong calculation.

Contributor Author

What is the replacement for INT64_MAX + 1 in our code?

Member

Ok, if we want to support [INT64_MIN, INT64_MAX] here is the solution:

inline int64_t diff_in_current_scale(size_t s1, size_t s2) const {
  // _scale must not be 0, otherwise the division below divides by zero.
  assert(_scale != 0, "wrong scale");

  bool is_negative = false;
  if (s1 < s2) {
    is_negative = true;
    swap(s1, s2);
  }

  size_t amount = s1 - s2;
  assert(amount <= SIZE_MAX - _scale / 2, "size_t overflow");
  amount = (amount + _scale / 2) / _scale;
  // We assume the valid range for deltas [INT64_MIN, INT64_MAX].
  assert(sizeof(size_t) <= sizeof(int64_t) && amount - (int)is_negative <= INT64_MAX, "cannot fit scaled diff into int64_t");
  if (is_negative) {
    return (sizeof(size_t) == sizeof(int64_t) && amount - 1 == INT64_MAX)
               ? INT64_MIN : -(int64_t)amount;
  } else {
    return (int64_t)amount;
  }
}

If sizeof(size_t) < sizeof(int64_t), a compiler will optimise (sizeof(size_t) == sizeof(int64_t) && amount - 1 == INT64_MAX) ? INT64_MIN : -(int64_t)amount into -(int64_t)amount.

If sizeof(size_t) == sizeof(int64_t), a compiler will optimise (sizeof(size_t) == sizeof(int64_t) && amount - 1 == INT64_MAX) ? INT64_MIN : -(int64_t)amount into (amount - 1 == INT64_MAX) ? INT64_MIN : -(int64_t)amount.

The C++ standard guarantees (int)is_negative to be either 0 or 1. So the check amount - (int)is_negative <= INT64_MAX will be amount <= INT64_MAX for positive diffs and amount - 1 <= INT64_MAX for negative diffs.
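For experimenting outside the JDK, the full-range idea can be written as a free function. The sketch below is not the code from this PR: `diff_in_scale` is a hypothetical name, the scale is a parameter instead of the `_scale` member, and the standard `assert` replaces HotSpot's two-argument `assert`. It also deviates in one respect: instead of testing `amount - 1`, which wraps around when a negative difference rounds to 0, it compares against 2^63 computed in unsigned arithmetic, which also sidesteps the `INT64_MAX + 1` constant-expression problem.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Standalone sketch covering the full range [INT64_MIN, INT64_MAX].
inline int64_t diff_in_scale(size_t s1, size_t s2, size_t scale) {
  assert(scale != 0 && "scale must not be zero");
  const bool is_negative = s1 < s2;
  // Magnitude of the difference; larger minus smaller never wraps.
  uint64_t amount = is_negative ? uint64_t(s2) - s1 : uint64_t(s1) - s2;
  assert(amount <= UINT64_MAX - scale / 2 && "rounding would overflow");
  amount = (amount + scale / 2) / scale;  // round to nearest unit of scale

  // 2^63, the magnitude of INT64_MIN, computed without signed overflow.
  const uint64_t int64_min_magnitude = uint64_t(INT64_MAX) + 1;
  if (is_negative) {
    assert(amount <= int64_min_magnitude && "scaled diff below INT64_MIN");
    // Negating 2^63 as int64_t would overflow, so return INT64_MIN directly.
    return (amount == int64_min_magnitude) ? INT64_MIN
                                           : -static_cast<int64_t>(amount);
  }
  assert(amount <= uint64_t(INT64_MAX) && "scaled diff above INT64_MAX");
  return static_cast<int64_t>(amount);
}
```

A call such as `diff_in_scale(0, 100, 1024)` exercises the rounds-to-zero case and returns 0.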

assert(amount <= SIZE_MAX - _scale / 2, "size_t overflow");
amount = (amount + _scale / 2) / _scale;
// We assume the valid range for deltas [INT64_MIN, INT64_MAX] to simplify the code.
assert((sizeof(size_t) < sizeof(int64_t)) || (sizeof(size_t) == sizeof(int64_t) && amount <= INT64_MAX), "cannot fit scaled diff into size_t");

The sizeof comparisons are pointless or even wrong. The only thing we care
about here is that the scaled value <= INT64_MAX.
If sizeof(size_t) < sizeof(int64_t) then amount <= INT64_MAX is always true,
so there's no need to do the size comparison. Otherwise, we don't care if
size_t is bigger than int64_t, we only care about the value range for the
result. So just

assert(amount <= INT64_MAX, "overflow");

is sufficient.

amount = (amount + _scale / 2) / _scale;
// We assume the valid range for deltas [INT64_MIN, INT64_MAX] to simplify the code.
assert((sizeof(size_t) < sizeof(int64_t)) || (sizeof(size_t) == sizeof(int64_t) && amount <= INT64_MAX), "cannot fit scaled diff into size_t");
return (is_negative) ? -(int64_t)amount : (int64_t)amount;

I'd prefer C++-style casts, so

int64_t result = static_cast<int64_t>(amount);
return is_negative ? -result : result;
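Putting the two review suggestions together (a value-only range check and C++-style casts) gives a shorter function for the symmetric range [-INT64_MAX, INT64_MAX]. This is a sketch of how that might look, not the merged code: `diff_in_scale_simple` is a hypothetical free function, the scale is a parameter, and the standard `assert` and `std::swap` are used in place of HotSpot's own utilities.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Simplified sketch for deltas in [-INT64_MAX, INT64_MAX].
inline int64_t diff_in_scale_simple(size_t s1, size_t s2, size_t scale) {
  assert(scale != 0 && "wrong scale");
  bool is_negative = false;
  if (s1 < s2) {
    is_negative = true;
    std::swap(s1, s2);  // ensure s1 >= s2 so the subtraction cannot wrap
  }
  size_t amount = s1 - s2;
  assert(amount <= SIZE_MAX - scale / 2 && "size_t overflow");
  amount = (amount + scale / 2) / scale;
  // Value-only check: no sizeof comparison is needed, because a size_t
  // narrower than int64_t can never exceed INT64_MAX anyway.
  assert(static_cast<uint64_t>(amount) <= static_cast<uint64_t>(INT64_MAX) && "overflow");
  const int64_t result = static_cast<int64_t>(amount);
  return is_negative ? -result : result;
}
```

Since `result` is at most `INT64_MAX`, negating it is always well defined, which is what makes the symmetric range simpler to support.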

Member

@tstuefe tstuefe left a comment

@afshin-zafari, thanks a lot for taking my suggestion about using int64_t. Some minor nits remain; mainly, the diff function can be simplified.

Thanks, Thomas

Member

@eastig eastig left a comment

LGTM, just minor suggestions.


@kimbarrett kimbarrett left a comment

Just one tiny nit remaining. Otherwise, looks good.

@openjdk

openjdk bot commented Jan 16, 2023

@afshin-zafari This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8281213: Unsafe uses of long and size_t in MemReporterBase::diff_in_current_scale

Reviewed-by: eastigeevich, kbarrett

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 334 new commits pushed to the master branch.

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@dholmes-ora, @eastig, @kimbarrett, @tstuefe) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 16, 2023
@afshin-zafari
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Jan 20, 2023
@openjdk

openjdk bot commented Jan 20, 2023

@afshin-zafari
Your change (at version a1f9930) is now ready to be sponsored by a Committer.

@eastig
Member

eastig commented Jan 20, 2023

/sponsor

@openjdk

openjdk bot commented Jan 20, 2023

Going to push as commit 26410c1.
Since your change was applied there have been 334 commits pushed to the master branch.

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jan 20, 2023
@openjdk openjdk bot closed this Jan 20, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Jan 20, 2023
@openjdk

openjdk bot commented Jan 20, 2023

@eastig @afshin-zafari Pushed as commit 26410c1.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
