JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base #2595
/label s390x-port
  bind(ok3);
}
#endif
I do not like the cross-dependency to metaspace.hpp just for the sake of checking an artificial restriction on Klass pointers. And by the way, you could do the check with one test:
z_oihf(current, 0);
z_brc(Assembler::bcondZero, ok);
z_oihf() does modify the contents of register current, but it writes back the same value.
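The single-instruction check can be modelled in plain C++. This is a minimal sketch, assuming OIHF ORs its 32-bit immediate into the high word of the register and sets condition code 0 exactly when the resulting high word is zero. `OihfResult`, `model_oihf` and `fits_in_32_bits` are hypothetical names for illustration and are not part of HotSpot.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of OIHF (OR immediate, high word); not HotSpot code.
struct OihfResult {
  uint64_t reg;  // register contents after the instruction
  int cc;        // condition code: 0 if the high word is zero, 1 otherwise
};

OihfResult model_oihf(uint64_t reg, uint32_t imm) {
  uint64_t result = reg | (static_cast<uint64_t>(imm) << 32);
  int cc = ((result >> 32) == 0) ? 0 : 1;
  return {result, cc};
}

// With an immediate of 0 the register value is written back unchanged, so
// the instruction degenerates into the test "does the value fit into 32 bits?".
bool fits_in_32_bits(uint64_t klass_ptr) {
  return model_oihf(klass_ptr, 0).cc == 0;
}
```

In this model, `fits_in_32_bits(0x88041040)` holds while `fits_in_32_bits(0x100041040)` does not, which corresponds to the branch on `bcondZero` being taken or not.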
> I do not like the cross-dependency to metaspace.hpp just for the sake of checking an artificial restriction on Klass pointers.
It is not just for the assertion, it is for limiting the 32bit add to situations where we know Klass pointers cannot exceed 32bit. That was the main reason. As I wrote, I was not sure about the assertion myself and am happy to drop it.
> And by the way, you could do the check with one test:
> z_oihf(current, 0); z_brc(Assembler::bcondZero, ok);
> z_oihf() does modify the contents of register current, but it writes back the same value.
Thank you. Unfortunately, information about z assembly was hard to come by. The only public information I found had hardly more than the instruction names, the rest was trial and error.
I admit. To find System z information, you need to know the "magic keywords" to search for. In this case, it would be "Principles of Operation". The third or so Google hit would lead you to the System z architecture document. With 2000+ pages to read, you would be lost anyway. :-)
   lgr_if_needed(dst, current);
-  z_agfi(dst, -(int)base_l);
+  z_afi(dst, -(int)base_l); // Note: 32bit add
 } else {
   load_const(Z_R0, base);
   lgr_if_needed(dst, current);
What would you think of a more general rework like this? The comments in the code should explain the intentions/assumptions/conclusions.
// Klass oop manipulations if compressed.
void MacroAssembler::encode_klass_not_null(Register dst, Register src) {
  Register current = (src != noreg) ? src : dst; // Klass is in dst if no src provided. (dst == src) also possible.
  address base = CompressedKlassPointers::base();
  int shift = CompressedKlassPointers::shift();
  bool need_zero_extend = false;
  assert(UseCompressedClassPointers, "only for compressed klass ptrs");

  BLOCK_COMMENT("cKlass encoder {");

#ifdef ASSERT
  Label ok;
  z_tmll(current, KlassAlignmentInBytes-1); // Check alignment.
  z_brc(Assembler::bcondAllZero, ok);
  // The plain disassembler does not recognize illtrap. It instead displays
  // a 32-bit value. Issuing two illtraps assures the disassembler finds
  // the proper beginning of the next instruction.
  z_illtrap(0xee);
  z_illtrap(0xee);
  bind(ok);
#endif

  // Scale down the incoming klass pointer first.
  // We then can be sure we calculate an offset that fits into 32 bit.
  // More generally speaking: all subsequent calculations are purely 32-bit.
  if (shift != 0) {
    assert (LogKlassAlignmentInBytes == shift, "decode alg wrong");
    z_srlg(dst, current, shift);
    need_zero_extend = true;
    current = dst;
  }

  if (base != NULL) {
    // Use scaled-down base address parts to match scaled-down klass pointer.
    unsigned int base_h = ((unsigned long)base)>>(32+shift);
    unsigned int base_l = (unsigned int)(((unsigned long)base)>>shift);

    // General considerations:
    //  - when calculating (current_h - base_h), all digits must cancel (become 0).
    //    Otherwise, we would end up with a compressed klass pointer which doesn't
    //    fit into 32-bit.
    //  - Only bit#33 of the difference could potentially be non-zero. For that
    //    to happen, (current_l < base_l) must hold. In this case, the subtraction
    //    will create a borrow out of bit#32, nicely killing bit#33.
    //  - With the above, we only need to consider current_l and base_l to
    //    calculate the result.
    //  - Both values are treated as unsigned. The unsigned subtraction is
    //    replaced by adding (unsigned) the 2's complement of the subtrahend.

    if (base_l == 0) {
      //  - By theory, the calculation to be performed here (current_h - base_h) MUST
      //    cancel all high-word bits. Otherwise, we would end up with an offset
      //    (i.e. compressed klass pointer) that does not fit into 32 bit.
      //  - current_l remains unchanged.
      //  - Therefore, we can replace all calculation with just a
      //    zero-extending load 32 to 64 bit.
      //  - Even that can be replaced with a conditional load if dst != current.
      //    (this is a local view. The shift step may have requested zero-extension).
    } else {
      // To begin with, we may need to copy and/or zero-extend the register operand.
      // We have to calculate (current_l - base_l). Because there is no unsigned
      // subtract instruction with immediate operand, we add the 2's complement of base_l.
      if (need_zero_extend) {
        z_llgfr(dst, current);
        need_zero_extend = false;
      } else {
        llgfr_if_needed(dst, current); // zero-extension while copying comes at no extra cost.
      }
      current = dst;
      z_alfi(dst, -(int)base_l);
    }
  }

  if (need_zero_extend) {
    // We must zero-extend the calculated result. It may have some leftover bits in
    // the hi-word because we only did optimized calculations.
    z_llgfr(dst, current);
  } else {
    llgfr_if_needed(dst, current); // zero-extension while copying comes at no extra cost.
  }

  BLOCK_COMMENT("} cKlass encoder");
}
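The arithmetic behind the shift-first scheme can be checked outside the VM. The following is a minimal sketch, not the assembler code itself. It assumes that both the Klass pointer and the base are aligned to `1 << shift` (otherwise shifting before subtracting would not be equivalent), and that the encoded offset fits into 32 bits. `encode_reference` and `encode_shift_first` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Reference result: subtract the base, then scale down.
uint32_t encode_reference(uint64_t klass, uint64_t base, int shift) {
  return static_cast<uint32_t>((klass - base) >> shift);
}

// Hypothetical model of the shift-first scheme: scale down first, then
// subtract using only the low 32 bits, then zero-extend the result.
uint64_t encode_shift_first(uint64_t klass, uint64_t base, int shift) {
  uint64_t current = klass >> shift;                       // models z_srlg
  uint32_t base_l = static_cast<uint32_t>(base >> shift);  // low word of the scaled base
  uint32_t low = static_cast<uint32_t>(current);           // models z_llgfr: keep the low word
  low += static_cast<uint32_t>(0) - base_l;                // models z_alfi with the 2's complement
  return static_cast<uint64_t>(low);                       // high word is zero by construction
}
```

For example, with `base = 0x800000000`, `shift = 3` and a Klass pointer at `base + 0x41040`, both functions yield `0x8208`. Any borrow out of the low word is simply discarded, which is the point made in the "General considerations" comment.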
Looks nice and elegant.
But as said offlist, I dislike the fact that this hard codes the limitation to 32bit for the narrow klass pointer range.
That restriction is artificial and we may just want to drop it. E.g. one recurring idea I have is to drop the duality in metaspace between non-class- and class-metaspace, and just store everything in class space. That would save quite a bit of memory (less overhead) and make the metaspace coding quite a bit simpler. However, in that case it could be that we exceed the current 3G limit and may even exceed 32bit. Since add+shift for decoding is universally done on all platforms at least if CDS is on, this should work out of the box. Unless of course the platforms hard-code the 32bit limitation into their encoding schemes.
I don't see how you want to overcome the 32-bit limit for compressed pointers. This whole "compression" thing is based on the "trick" to store an offset instead of the full address. Depending on the object alignment requirement, this affords you 32 GB (8-byte alignment) or 64 GB (16-byte alignment) of addressable (or should I say offset-able) space. That's quite a bit.
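The range figures follow directly from a 32-bit offset scaled by the alignment shift. A small sketch of that arithmetic (`encodable_range_bytes` and `GB` are names chosen for illustration only):

```cpp
#include <cassert>
#include <cstdint>

// Bytes reachable from the base with a 32-bit offset scaled by `shift`.
constexpr uint64_t encodable_range_bytes(int shift) {
  return (uint64_t{1} << 32) << shift;
}

constexpr uint64_t GB = uint64_t{1} << 30;

static_assert(encodable_range_bytes(3) == 32 * GB, "8-byte alignment gives 32 GB");
static_assert(encodable_range_bytes(4) == 64 * GB, "16-byte alignment gives 64 GB");
```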
You use pointer compression to save space, and for nothing else. Space savings have to be so significant that they outweigh the added effort for encoding and decoding. With just some shift and add, the effort is limited, though noticeable. If you would make compressed pointers 40 bits wide (5 bytes), encoding and decoding would impose more effort. What's even worse, you then would have entities with a size not native to any processor. Just imagine you have to atomically store such a value.
In my opinion, wider compressed pointers will have to wait until we have 128-bit pointers.
Back to code:
In the code suggested above, you could make use of the Metaspace::class_space_end() function. If the class space end address, shifted right, fits into 32 bit, need_zero_extend may remain false. Your choice.
You misunderstand me. My point was not to make narrow pointers larger than 32bit, but to use the full encodable range. The encodable range is 32G at the moment. But we artificially limit the range to 3G (CompressedClassSpaceSize is capped at that value).
I thought your proposal was based upon the assumption that the highest uncompressed offset into class space can be not larger than 4G. But looking at your proposal again, I see you moved the shift up before the add, so it should probably work.
So it was mutual misunderstanding. Good to have that resolved.
/contributor add @RealLucy
Thanks for fixing. Looks correct, but I have one minor finding.
if (base != NULL) {
  // Use scaled-down base address parts to match scaled-down klass pointer.
  unsigned int base_h = ((unsigned long)base)>>(32+shift);
base_h is unused, but referred to in the comments
There was a comment crossing...
With my latest suggestion, base_h is now used.
The changes look good to me now.
Including the additional optimisation is optional.
Thanks for debugging, finding and fixing!
// calculate the result.
// - Both values are treated as unsigned. The unsigned subtraction is
// replaced by adding (unsigned) the 2's complement of the subtrahend.
There is a further tiny optimisation you may want to include in the final version:
// If we happen to see (base_h == 0), we are sure there
// is no borrow from bit#33. No zero-extension is needed.
if (base_h == 0) {
  need_zero_extend = false;
}
@tstuefe This change now passes all automated pre-integration checks.
As discussed offline, I think we need to support encoding of Class Pointer 0x100000000 (and above) with e.g. base = 0x0C0000000 and shift = 0. need_zero_extend is false in this example which possibly leaves a 1 in the higher 32 bit. Lower 32 bit are correct in your current version, but some code may rely on zero extension to 64 bit.
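The example values can be traced with a small C++ model. It is a sketch under stated assumptions: ALFI is modelled as a 32-bit logical add on the low word that leaves the high word untouched, and LLGFR as clearing the high word. All function names below are hypothetical and only illustrate the arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of ALFI: add a 32-bit immediate to the low word only,
// leaving the high word of the 64-bit register untouched.
uint64_t model_alfi(uint64_t reg, uint32_t imm) {
  uint32_t low = static_cast<uint32_t>(reg) + imm;
  return (reg & 0xFFFFFFFF00000000ULL) | low;
}

// Hypothetical model of LLGFR: keep the low word, clear the high word.
uint64_t model_llgfr(uint64_t reg) {
  return reg & 0x00000000FFFFFFFFULL;
}

// Subtract base_l by adding its 32-bit two's complement, without zero-extension.
uint64_t encode_without_zero_extend(uint64_t klass, uint32_t base_l) {
  return model_alfi(klass, static_cast<uint32_t>(0) - base_l);
}

// Same subtraction, but with the register zero-extended first.
uint64_t encode_with_zero_extend(uint64_t klass, uint32_t base_l) {
  return model_alfi(model_llgfr(klass), static_cast<uint32_t>(0) - base_l);
}
```

With Klass pointer `0x100000000`, `base = 0xC0000000` and `shift = 0`, the variant without zero-extension yields `0x140000000`: the low word `0x40000000` is correct, but a 1 remains in the high word. The zero-extending variant yields `0x40000000`.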
LGTM, still.
Thanks @TheRealMDoerr and @RealLucy for advice and reviews! /integrate
@tstuefe Your commit was automatically rebased without conflicts. Pushed as commit fdd1093.
If Compressed class pointer base has a non-zero value it may cause MacroAssembler::encode_klass_not_null() to encode a Klass pointer to a wrong narrow pointer.
This can be reproduced by starting the VM with
but CDS is not involved. It is only relevant insofar as this is the only way to get the following combination:
In MacroAssembler::encode_klass_not_null(), there is the following section:
We enter the condition at (A) if the narrow klass pointer base is non-zero but fits into 32bit. At (B), we want to subtract the base from the Klass pointer; we do this by calculating the 32bit two's complement of the base and adding it with AGFI. AGFI adds a sign-extended 32bit immediate to a 64bit register. In this case, it produces the wrong result if the base is >0x8000_0000:
In the case of the crash, we have:
base: 8800_0000
klass pointer: 8804_1040
32bit two's complement of base: 7800_0000
added to the klass pointer: 1_0004_1040
So the result of the "subtraction" is 1_0004_1040; it should be 4_1040, which would be the correct offset of the Klass* pointer within the ccs.
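The numbers from the crash can be reproduced with a small C++ model of the two instructions. This is a sketch for illustration only: it assumes AGFI sign-extends its 32-bit immediate and performs a 64-bit add, and that AFI adds to the low word only. `model_agfi`, `model_afi` and `negated_base` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of AGFI: sign-extend the 32-bit immediate, then 64-bit add.
uint64_t model_agfi(uint64_t reg, int32_t imm) {
  return reg + static_cast<uint64_t>(static_cast<int64_t>(imm));
}

// Hypothetical model of AFI: 32-bit add on the low word; high word unchanged.
uint64_t model_afi(uint64_t reg, int32_t imm) {
  uint32_t low = static_cast<uint32_t>(reg) + static_cast<uint32_t>(imm);
  return (reg & 0xFFFFFFFF00000000ULL) | low;
}

// The immediate passed to the instruction: the 32-bit two's complement of the base.
int32_t negated_base(uint32_t base_l) {
  return static_cast<int32_t>(static_cast<uint32_t>(0) - base_l);
}
```

For `base = 0x88000000` the immediate is `0x78000000`, which is positive as a signed 32-bit value. The modelled AGFI therefore turns `0x88041040` into `0x100041040`, while the modelled AFI produces the expected `0x41040`. For a base below `0x80000000` the immediate is negative, sign extension makes the 64-bit add a true subtraction, and both variants agree.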
This bug had been dormant; it was activated by JDK-8250989, which changed the way class space reservation happens at CDS dump time. It first surfaced as a crash in a CDS-specific jtreg test (JDK-8261552).
================
Fix:
I changed the AGFI instruction to a pure 32bit add (AFI). That works as long as the Klass pointer also fits into 32bit. So I narrowed the condition at (A) to only fire if it can be ensured that both narrow base and Klass* pointers fit into 32bit.
I also added a runtime verification in that case that any Klass pointer passed down is indeed a 32bit pointer. However, I am not really sure this is useful, or that this is the best way to do this (using TMHH and TMHL). I was looking for something like TMH or TML to check whole 32bit words but could not find any.
Tests:
I manually tested that the crash disappears, which it does. I stepped through the encoding code and the values now look right.
I also built a VM with the ability to override both class space start address and the narrow klass pointer base to exact values (see master...tstuefe:override-ccs-start-and-base).
I used this method to test various combinations:
(would this override-feature be useful? We could do better testing).
Thanks, Thomas
Contributors
<lucy@openjdk.org>
Download
$ git fetch https://git.openjdk.java.net/jdk pull/2595/head:pull/2595
$ git checkout pull/2595