Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8276563: Undefined Behaviour in class Assembler #6280

Closed
wants to merge 14 commits into from

Conversation

theRealAph
Copy link
Contributor

@theRealAph theRealAph commented Nov 5, 2021

The HotSpot code base contains a number of instances of Undefined Behavior, which can cause all manner of unpleasant surprises.
The UB to which this patch relates is in class Assembler, in which instances are pointers to (nonexistent) objects defined as, for example,

  typedef RegisterImpl *Register;
  const Register r10 = ((Register)10);

Registers have accessors, e.g.:

int RegisterImpl::encoding() const { return (intptr_t)this; }

This works by an accident of implementation: it is not legal C++.

The most obvious way to this UB bug is to make instances of Register point to something, and to use pointer subtraction to find the encoding: (simplified for clarity)

  extern RegisterImpl all_Registers[num_Registers];
  int RegisterImpl::encoding() const   { return this - all_Registers; }

After this patch there is slightly more work to be done when assembling code but it's merely the subtraction of a constant in encoding() and the difference in execution time is so small (and the startup variance so large) that I have been unable to measure it, even after averaging 100 runs. It does lead to an increase of about 1% in the size of the stripped libjvm.so, but I think that can be recovered by a subsequent patch.

An alternative way to implement this would be to make the encoding a byte-wide field in RegisterImpl and define encoding() this way:

int RegisterImpl::encoding() const { return _encoding; }

This would result in smaller code, but I suspect slower.

If this change is accepted, I propose that all instances of this pattern in HotSpot be treated similarly.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6280/head:pull/6280
$ git checkout pull/6280

Update a local copy of the PR:
$ git checkout pull/6280
$ git pull https://git.openjdk.java.net/jdk pull/6280/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 6280

View PR using the GUI difftool:
$ git pr show -t 6280

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6280.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Nov 5, 2021

👋 Welcome back aph! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 5, 2021
@openjdk
Copy link

openjdk bot commented Nov 5, 2021

@theRealAph The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Nov 5, 2021
@mlbridge
Copy link

mlbridge bot commented Nov 5, 2021

@tstuefe
Copy link
Member

tstuefe commented Nov 6, 2021

Just an idea, but could you make the register number a constant template argument?

struct Register { virtual int encoding() const = 0; };

template <int regnum>
struct RegisterImpl : public Register {
        int encoding() const override { return regnum; }
};

static const RegisterImpl<0> r0;
static const RegisterImpl<1> r1;

Compiled with gcc using -O3 on x64 it seems to use the constants directly as long as both Register and RegisterImpl are fully visible: https://gist.github.com/tstuefe/cd98163226b6cbaef44ea75d97c732b1.

@theRealAph
Copy link
Contributor Author

Just an idea, but could you make the register number a constant template argument?

OK, I'll kick that around.

@JornVernee
Copy link
Member

JornVernee commented Nov 6, 2021

It's not clear to me why Register is implemented as a pointer in the first place, instead of a class with a single int or intptr_t field for the encoding. There is a comment about that in register.hpp, but it doesn't offer an explanation:

// The super class for platform specific registers. Instead of using value objects,
// registers are implemented as pointers. Subclassing is used so all registers can
// use the debugging suport below. No virtual functions are used for efficiency.
// They are canonicalized; i.e., registers are equal if their pointers are equal,
// and vice versa. A concrete implementation may just map the register onto 'this'.

I wonder if changing the implementation to define AbstractRegister as:

class AbstractRegister {
  int _value;
protected:
  int value() { return _value; }
};

have a class Register that extend from AbstractRegister, and remove all the typedef RegisterImpl* Register and similar, wouldn't be a viable solution as well? The implementations of encoding() could simply call value() after doing a validity check (they already do in most cases).

To save us from having to change -> to . everywhere, an overloaded operator-> could be added to each implementation class that just returns this (Godbolt).

@tstuefe
Copy link
Member

tstuefe commented Nov 6, 2021

Just an idea, but could you make the register number a constant template argument?

OK, I'll kick that around.

Hmm, I realize my proposal works well if a specific register is used, but not if it is hidden behind a generic Register* pointer. In the former case, the compiler inlines the constant into the code. In the latter case, it calls virtual RegisterImpl::encoding(), so you get one vtable access and one subroutine call.

@tstuefe
Copy link
Member

tstuefe commented Nov 6, 2021

It's not clear to me why Register is implemented as a pointer in the first place, instead of a class with a single int or intptr_t field for the encoding. There is a comment about that in register.hpp, but it doesn't offer an explanation:

I believe the point is to encode the register number in the this pointer. If it were a member, you'd have to dereference the pointer to access the member value, so one more load instruction.

@JornVernee
Copy link
Member

JornVernee commented Nov 6, 2021

It's not clear to me why Register is implemented as a pointer in the first place, instead of a class with a single int or intptr_t field for the encoding. There is a comment about that in register.hpp, but it doesn't offer an explanation:

I believe the point is to encode the register number in the this pointer. If it were a member, you'd have to dereference the pointer to access the member value, so one more load instruction.

Yes, if we would keep using pointers we'd get a dereference, but I'm saying: let's get rid of the pointers and use straight up values. No dereferences. AFAIU compilers would treat this the same as the current scheme.

Or, perhaps even better: I've seen compiler constant folding be perturbed in the past by integer <-> pointer casts (I guess it just bails out in some cases), and also, using an int would halve the footprint of arrays of registers on platforms where pointers are 64-bits.

@JornVernee
Copy link
Member

JornVernee commented Nov 6, 2021

In other words: in the current scheme we pass around an integer disguised as a pointer, and then have to cast it to an integer to use it. That seems silly. Let's just pass around the integer instead, wrapped in a value-object, which the compiler should treat the same as if it were just the integer.

@tstuefe
Copy link
Member

tstuefe commented Nov 7, 2021

Or, perhaps even better: I've seen compiler constant folding be perturbed in the past by integer <-> pointer casts (I guess it just bails out in some cases), and also, using an int would halve the footprint of arrays of registers on platforms where pointers are 64-bits.

Now I get it. I like your proposal, it's way simpler. I did a small test on x64:

https://gist.github.com/tstuefe/c97dff7624a1469a7295dda51ed9a265

As you say, compiler passes encoding value directly, using - in my example - just the 16-bit portion of RDI since I used a short. The only odd thing was that when using a constant object know at the use site (case B), I would have expected an immediate, but the constant encoding gets loaded from the text segment instead.

Wrt -> to ., I dislike operator overloading so I would just do the changes instead. But that's up to Andrew.

Cheers, Thomas

@merykitty
Copy link
Member

merykitty commented Nov 7, 2021

The only odd thing was that when using a constant object know at the use site (case B), I would have expected an immediate, but the constant encoding gets loaded from the text segment instead.

I believe to achieve constant folding here you should mark the constructor of the Register class as constexpr, and maybe the rxxx variable too just to make sure.

Actually, it seems that while Clang is able to fold the memory load, GCC and MSVC both emit a load instruction, which is another unfortunate circumstance.

Cheers.

@merykitty
Copy link
Member

Hi,

I have investigated Jorn proposal and found out that in order to achieve the desired performance we will need to rely on the compiler to inline the methods of the Register class themselves, and the failure to do so would lead to the caller materialising the object on the stack to retrieve the this pointer, as well as the callee needing to pop the object from the stack to process. (Godbolt).

Cheers.

@theRealAph
Copy link
Contributor Author

Just an idea, but could you make the register number a constant template argument?

OK, I'll kick that around.

Hmm, I realize my proposal works well if a specific register is used, but not if it is hidden behind a generic Register* pointer. In the former case, the compiler inlines the constant into the code. In the latter case, it calls virtual RegisterImpl::encoding(), so you get one vtable access and one subroutine call.

Ah. Oh well.

@theRealAph
Copy link
Contributor Author

theRealAph commented Nov 7, 2021

I believe the point is to encode the register number in the this pointer. If it were a member, you'd have to dereference the pointer to access the member value, so one more load instruction.

That's exactly the point. We have two alternatives, either a subtraction or a load, and it's hard to do much better than that. However, there is one idea, but it's rather hacky: 64-align the array of Registers then use an AND operation to get the encoding. The trouble with that is that it's not really portable, just portable-ish.

@theRealAph
Copy link
Contributor Author

In other words: in the current scheme we pass around an integer disguised as a pointer, and then have to cast it to an integer to use it. That seems silly. Let's just pass around the integer instead, wrapped in a value-object, which the compiler should treat the same as if it were just the integer.

That's worth exploring, but I'm reluctant to start overloading ->. I'll have a look.

@JornVernee
Copy link
Member

JornVernee commented Nov 7, 2021

The only odd thing was that when using a constant object know at the use site (case B), I would have expected an immediate, but the constant encoding gets loaded from the text segment instead.

Yeah... I found that the constructor has to be marked constexpr (and the global value at least const) to make it work with GCC (as @merykitty also found). The value also gets emitted into .rodata in that case. It looks like MSVC also has that requirement for constant folding.

@theRealAph
Copy link
Contributor Author

theRealAph commented Nov 8, 2021

In other words: in the current scheme we pass around an integer disguised as a pointer, and then have to cast it to an integer to use it. That seems silly. Let's just pass around the integer instead, wrapped in a value-object, which the compiler should treat the same as if it were just the integer.

That's worth exploring, but I'm reluctant to start overloading ->. I'll have a look.

So I had a good look, and I think this idea is not going to fly without substantial changes elsewhere. None of these problems is insurmountable, of course, but it does mean that a change like this spills into many more places in shared code.

Oh, the other problem is that variables of type Register may be assigned, which means that the field containing the encoding can no longer be const, and there are const problems elsewhere too.

@JornVernee
Copy link
Member

In other words: in the current scheme we pass around an integer disguised as a pointer, and then have to cast it to an integer to use it. That seems silly. Let's just pass around the integer instead, wrapped in a value-object, which the compiler should treat the same as if it were just the integer.

That's worth exploring, but I'm reluctant to start overloading ->. I'll have a look.

So I had a good look, and I think this idea is not going to fly without substantial changes elsewhere. The trickiest problem I've found so far is GrowableArray<Register>, which assumes that Register is a pointer type. None of these problems is insurmountable, of course, but it does mean that a change like this spills into many more places in shared code.

I ran into problems before where GrowableArray::print was assuming the values it held were pointer sized. Is that what you're talking about as well? I think a long term fix there might be to implement some kind of print_on template function with specializations for different types, which GrowableArray then uses instead of trying to cast each element to an intptr_t.

Oh, the other problem is that variables of type Register may be assigned, which means that the field containing the encoding can no longer be const, and there are const problems elsewhere too.

Okay, I'd assume not having the encoding be const would be fine, as long as Register is passed around by-value. i.e. only some code's private copy of a Register instance would be overwritten (same as if it was a pointer).


Any ways, thanks for trying this out! If the required changes are too much (making backporting harder as well) I think going with a more focused fix is the way to go, as well. Maybe we can come back for a broader refactoring another day.

@theRealAph
Copy link
Contributor Author

Okay, I'd assume not having the encoding be const would be fine, as long as Register is passed around by-value. i.e. only some code's private copy of a Register instance would be overwritten (same as if it was a pointer).

Any ways, thanks for trying this out! If the required changes are too much (making backporting harder as well) I think going with a more focused fix is the way to go, as well. Maybe we can come back for a broader refactoring another day.

Sorry, my last comment was a bit misleading. I should not have posted so soon. I have something that works now with a bunch of overloads and constexpr references, etc., but it's all a bit too much.

@tstuefe
Copy link
Member

tstuefe commented Nov 9, 2021

Okay, I'd assume not having the encoding be const would be fine, as long as Register is passed around by-value. i.e. only some code's private copy of a Register instance would be overwritten (same as if it was a pointer).
Any ways, thanks for trying this out! If the required changes are too much (making backporting harder as well) I think going with a more focused fix is the way to go, as well. Maybe we can come back for a broader refactoring another day.

Sorry, my last comment was a bit misleading. I should not have posted so soon. I have something that works now with a bunch of overloads and constexpr references, etc., but it's all a bit too much.

I thought some more about this, and so far like your first approach (using an array, and maybe not subtracting but ANDing) best.

Its a pity that piggybacking on "this" is UB since it is exactly what one wants here.

I know this is probably not what you want, but I just mention it: how about making Register a plain type (e.g. int)? We could replace calls to r->encoding and r->value() with just r. Then, to add functionality and to group it nicely, make todays Register subclasses utility wrappers which wrap around such an Register. You'd have to create those things on the fly when you need them, but at least the compiler would optimze them away.

I know that would mean a ton of changes. But it would be exactly what you want to express, would not be UB, would be portable and easy to understand.

@theRealAph
Copy link
Contributor Author

In other words: in the current scheme we pass around an integer disguised as a pointer, and then have to cast it to an integer to use it. That seems silly. Let's just pass around the integer instead, wrapped in a value-object, which the compiler should treat the same as if it were just the integer.

That's worth exploring, but I'm reluctant to start overloading ->. I'll have a look.

So I implemented this in a proof-of-concept way, and unfortunately GCC does a lot of forcing Register instances into memory, even though it's a value-only class with no virtual members.

libjvm size:

Before patch:          18667808
Using array in memory: 18864416 101.05%
Pass by value:         18930040 101.40%

This was definitely worthy of investigation, and I am grateful for the suggestion, but I don't think it's worth pursuing any further. The important thing is to get rid of the UB, and I intend to concentrate on cleaning up Plan A.

@theRealAph
Copy link
Contributor Author

I know this is probably not what you want, but I just mention it: how about making Register a plain type (e.g. int)? We could replace calls to r->encoding and r->value() with just r. Then, to add functionality and to group it nicely, make todays Register subclasses utility wrappers which wrap around such an Register. You'd have to create those things on the fly when you need them, but at least the compiler would optimze them away.

I know that would mean a ton of changes. But it would be exactly what you want to express, would not be UB, would be portable and easy to understand.

Oh totally, but it's a pretty extreme case of spec inflation. My goal right now is to make the UB go away without causing extra work for anyone else. And I have some other ideas that would improve efficiency, probably more beneficial than this, I want to get on with. (I hope - more later.)

Copy link
Member

@JornVernee JornVernee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I noticed this is only fixed on AArch64, but AFAICS we could have the same problem on x86.

src/hotspot/cpu/aarch64/register_aarch64.hpp Outdated Show resolved Hide resolved
src/hotspot/cpu/aarch64/register_aarch64.hpp Outdated Show resolved Hide resolved
src/hotspot/cpu/aarch64/register_aarch64.hpp Outdated Show resolved Hide resolved
src/hotspot/cpu/aarch64/register_aarch64.hpp Outdated Show resolved Hide resolved
@openjdk
Copy link

openjdk bot commented Nov 9, 2021

@theRealAph This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8276563: Undefined Behaviour in class Assembler

Reviewed-by: jvernee, stuefe

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

Copy link
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now.

Cheers, Thomas

@theRealAph
Copy link
Contributor Author

This has been a big cleanup, prompted by porting x86 to this scheme.
There are fewer unnecessary changes, making this patch easier to review, and it's a bit more efficient too.

Enormous thanks to my very patient reviewers. I think this is done now.

Copy link
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was confused that there were no x86 changes, or is that part of a future RFE?

Mostly nits and questions remain.

Cheers, Thomas

src/hotspot/cpu/aarch64/register_aarch64.hpp Outdated Show resolved Hide resolved
src/hotspot/cpu/aarch64/register_aarch64.hpp Show resolved Hide resolved
src/hotspot/cpu/aarch64/register_aarch64.hpp Show resolved Hide resolved
Copy link
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still good.

@bridgekeeper
Copy link

bridgekeeper bot commented Dec 17, 2021

@theRealAph This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper
Copy link

bridgekeeper bot commented Jan 14, 2022

@theRealAph This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this Jan 14, 2022
@theRealAph
Copy link
Contributor Author

/open

@openjdk openjdk bot reopened this Jan 14, 2022
@openjdk
Copy link

openjdk bot commented Jan 14, 2022

@theRealAph This pull request is now open

@theRealAph
Copy link
Contributor Author

/integrate

@openjdk
Copy link

openjdk bot commented Jan 18, 2022

Going to push as commit 64c0c0e.
Since your change was applied there have been 21 commits pushed to the master branch:

  • d175d33: 8280079: Serial: Remove empty Generation::prepare_for_verify
  • 1725f77: 8280029: G1: "Overflow during reference processing, can not continue" on x86_32
  • 645b38d: 8280089: compiler/c2/irTests/TestIRAbs.java fails on some arches
  • eb94995: 8280070: G1: Fix template parameters in G1SegmentedArraySegment
  • 9452262: 8278892: java.naming module description is missing @uses tags to document the services that it uses
  • 48c5f3c: 8280026: Cleanup of IGV printing
  • 39f140a: Merge
  • 4d9b3f4: 8279998: PPC64 debug builds fail with "untested: RangeCheckStub: predicate_failed_trap_id"
  • 09d61b6: 8280034: ProblemList jdk/jfr/api/consumer/recordingstream/TestOnEvent.java on linux-x64
  • c809d34: 8279924: [PPC64, s390] implement frame::is_interpreted_frame_valid checks
  • ... and 11 more: https://git.openjdk.java.net/jdk/compare/5d52bf9987445b3a6033d66e8644ed77c4d761bd...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jan 18, 2022
@openjdk openjdk bot closed this Jan 18, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jan 18, 2022
@openjdk
Copy link

openjdk bot commented Jan 18, 2022

@theRealAph Pushed as commit 64c0c0e.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
4 participants