JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base #2595

Closed

@tstuefe (Member) commented Feb 16, 2021

If the compressed class pointer base has a non-zero value, MacroAssembler::encode_klass_not_null() may encode a Klass pointer into a wrong narrow pointer.

This can be reproduced by starting the VM with

-Xshare:dump -XX:HeapBaseMinAddress=2g -Xmx128m

CDS itself is not involved. It is only relevant insofar as this is the only way to get the following combination:

  • the heap is allocated at 0x8000_0000. It is small (128m) and ends at 0x8800_0000.
  • class space follows at 0x8800_0000
  • the narrow klass pointer base points to the start of the class space at 0x8800_0000.

In MacroAssembler::encode_klass_not_null(), there is the following section:

  if (base != NULL) {
    unsigned int base_h = ((unsigned long)base)>>32;
    unsigned int base_l = (unsigned int)((unsigned long)base);
    if ((base_h != 0) && (base_l == 0) && VM_Version::has_HighWordInstr()) {
      lgr_if_needed(dst, current);
      z_aih(dst, -((int)base_h));     // Base has no set bits in lower half.
    } else if ((base_h == 0) && (base_l != 0)) {   // (A)
      lgr_if_needed(dst, current);
      z_agfi(dst, -(int)base_l);                    // (B)
    } else {
      load_const(Z_R0, base);
      lgr_if_needed(dst, current);
      z_sgr(dst, Z_R0);
    }
    current = dst;
  }

We enter the branch at (A) if the narrow klass pointer base is non-zero but fits into 32 bits. At (B), we want to subtract the base from the Klass pointer; we do this by calculating the 32-bit two's complement of the base and adding it with AGFI. AGFI sign-extends its 32-bit immediate and adds it to a 64-bit register. Therefore it produces the wrong result if the base is larger than 0x8000_0000: the two's complement of such a base is a positive 32-bit value, so AGFI effectively adds 2^32 - base instead of subtracting the base.

In the case of the crash, we have:

  • base: 8800_0000
  • klass pointer: 8804_1040
  • 32-bit two's complement of base: 7800_0000
  • added to the klass pointer: 1_0004_1040

So the result of the "subtraction" is 1_0004_1040. It should be 4_1040, which is the correct offset of the Klass* pointer within the compressed class space (ccs).
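To make the arithmetic easy to check, here is a small C++ model of the step at (B). It is a sketch, not HotSpot code: `negated_low_word()` and `emulate_agfi()` are hypothetical helpers that mimic `-(int)base_l` and AGFI (add a sign-extended 32-bit immediate to a 64-bit register), and the `static_assert`s replay the crash values at compile time.

```cpp
#include <cstdint>

// Hypothetical model of the immediate emitted by the old code, -(int)base_l:
// the 32-bit two's complement of the low word of the narrow klass base.
constexpr int32_t negated_low_word(uint64_t base) {
  // Unsigned wrap-around avoids signed overflow; the conversion to int32_t
  // is modular on the usual two's-complement compilers.
  return static_cast<int32_t>(0u - static_cast<uint32_t>(base));
}

// Hypothetical model of AGFI: the 32-bit immediate is sign-extended to
// 64 bits and added to the full 64-bit register.
constexpr uint64_t emulate_agfi(uint64_t reg, int32_t imm) {
  return reg + static_cast<uint64_t>(static_cast<int64_t>(imm));
}

// Crash values: base 0x8800_0000, Klass* 0x8804_1040. The immediate is the
// positive value 0x7800_0000, so AGFI adds 2^32 - base instead of
// subtracting the base.
static_assert(negated_low_word(0x88000000ULL) == 0x78000000, "immediate");
static_assert(emulate_agfi(0x88041040ULL, negated_low_word(0x88000000ULL)) ==
                  0x100041040ULL,
              "AGFI leaves a stray 2^32 in the result");

// With a base below 2 GB the immediate is negative and the same code is correct.
static_assert(emulate_agfi(0x40041040ULL, negated_low_word(0x40000000ULL)) ==
                  0x41040ULL,
              "correct for small bases");
```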

This bug had been dormant; it was activated by JDK-8250989, which changed the way class space is reserved at CDS dump time. It first surfaced as a crash in a CDS-specific jtreg test (JDK-8261552).

================

Fix:

I replaced the AGFI instruction with a pure 32-bit add (AFI). That works as long as the Klass pointer also fits into 32 bits, so I narrowed the condition at (A) to fire only if it can be ensured that both the narrow klass pointer base and all Klass* pointers fit into 32 bits.
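The effect of the switch to AFI can be sketched the same way. `emulate_afi()` is a hypothetical model that assumes AFI adds its immediate to the low 32-bit word only and leaves the high word unchanged (the overflow condition code is ignored). The second check uses a made-up Klass* above 4 GB to show why the 32-bit add is only safe when both the base and the Klass* fit into 32 bits.

```cpp
#include <cstdint>

// Hypothetical helper: the immediate -(int)base_l, as emitted at (B).
constexpr int32_t negated_low_word(uint64_t base) {
  return static_cast<int32_t>(0u - static_cast<uint32_t>(base));
}

// Hypothetical model of AFI: add the immediate to the low 32-bit word only;
// the high word of the register is left unchanged.
constexpr uint64_t emulate_afi(uint64_t reg, int32_t imm) {
  const uint32_t low = static_cast<uint32_t>(static_cast<uint32_t>(reg) +
                                             static_cast<uint32_t>(imm));
  return (reg & 0xFFFFFFFF00000000ULL) | low;
}

// Crash values: base and Klass* both fit into 32 bits, so the result is right.
static_assert(emulate_afi(0x88041040ULL, negated_low_word(0x88000000ULL)) ==
                  0x41040ULL,
              "AFI gives the correct offset");

// A Klass* above 4 GB (same base): the low word holds the correct offset
// 0x8000_0000, but the untouched high word still contains 1.
static_assert(emulate_afi(0x108000000ULL, negated_low_word(0x88000000ULL)) ==
                  0x180000000ULL,
              "AFI alone is not enough once Klass* exceeds 32 bits");
```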

In that case, I also added a runtime check that any Klass pointer passed in is indeed a 32-bit pointer. However, I am not sure this is useful, or that this is the best way to do it (using TMHH and TMHL). I was looking for something like TMH or TML to check whole 32-bit words but could not find any.


Tests:

I manually tested that the crash disappears, which it does. I stepped through the encoding code and the values now look right.

I also built a VM with the ability to override both the class space start address and the narrow klass pointer base with exact values (see master...tstuefe:override-ccs-start-and-base).

I used this method to test various combinations:

  • 0 < narrow klass pointer base < 4g, ccs end < 4g (we hit our branch doing AFI)
  • 0 < narrow klass pointer base < 4g, ccs end > 4g (we hit the fallback doing SGR with r0)
  • narrow klass pointer base = 0 (we don't do anything)
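A rough model of which path the encoder takes for these combinations might look like the following. The enum and `select_path()` are hypothetical, and expressing "every Klass* fits into 32 bits" as `ccs_end <= 4 GB` is an assumption about how the narrowed condition at (A) is checked; the first branch mirrors the AIH path in the original code. The addresses in the checks are made up.

```cpp
#include <cstdint>

// Hypothetical labels for the code paths of the (A)/(B) section.
enum class EncodePath {
  None,          // base == 0: nothing to subtract
  HighWordAdd,   // AIH on the high word (base has no bits in the low word)
  LowWordAdd,    // AFI on the low word (narrowed branch at (A))
  FullSubtract   // load_const into Z_R0, then SGR
};

// ccs_end is an assumed parameter: the end address of the compressed class
// space, used to decide whether every Klass* fits into 32 bits.
constexpr EncodePath select_path(uint64_t base, uint64_t ccs_end,
                                 bool has_high_word_instr) {
  const uint32_t base_h = static_cast<uint32_t>(base >> 32);
  const uint32_t base_l = static_cast<uint32_t>(base);
  if (base == 0) {
    return EncodePath::None;
  }
  if (base_h != 0 && base_l == 0 && has_high_word_instr) {
    return EncodePath::HighWordAdd;
  }
  if (base_h == 0 && base_l != 0 && ccs_end <= (uint64_t{1} << 32)) {
    return EncodePath::LowWordAdd;
  }
  return EncodePath::FullSubtract;
}

// The three tested combinations.
static_assert(select_path(0x88000000ULL, 0xC8000000ULL, true) ==
                  EncodePath::LowWordAdd, "ccs end < 4g");
static_assert(select_path(0x88000000ULL, 0x148000000ULL, true) ==
                  EncodePath::FullSubtract, "ccs end > 4g");
static_assert(select_path(0, 0x40000000ULL, true) == EncodePath::None,
              "zero base");
```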

(Would this override feature be useful? It would allow better testing.)

Thanks, Thomas


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base


Contributors

  • Lutz Schmidt <lucy@openjdk.org>

Download

$ git fetch https://git.openjdk.java.net/jdk pull/2595/head:pull/2595
$ git checkout pull/2595

bridgekeeper bot commented Feb 16, 2021

👋 Welcome back stuefe! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Feb 16, 2021

@tstuefe The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Feb 16, 2021
@tstuefe (Member Author) commented Feb 16, 2021

/label s390x-port

openjdk bot commented Feb 16, 2021

@tstuefe The label s390x-port is not a valid label. These labels are valid:

  • serviceability
  • hotspot
  • sound
  • hotspot-compiler
  • kulla
  • i18n
  • shenandoah
  • jdk
  • javadoc
  • 2d
  • security
  • swing
  • hotspot-runtime
  • jmx
  • build
  • nio
  • beans
  • core-libs
  • compiler
  • net
  • hotspot-gc
  • hotspot-jfr
  • awt

@tstuefe tstuefe marked this pull request as ready for review February 16, 2021 20:56
@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 16, 2021
mlbridge bot commented Feb 16, 2021

Webrevs

bind(ok3);
}
#endif

Contributor

I do not like the cross-dependency to metaspace.hpp just for the sake of checking an artificial restriction on Klass pointers. And by the way, you could do the check with one test:

  z_oihf(current, 0);
  z_brc(Assembler::bcondZero, ok);

z_oihf() does modify the contents of register current, but it writes back the same value.
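The suggested one-instruction check can be modeled in C++ as follows. This is a sketch based on the description above, assuming OIHF ORs its immediate into the high 32-bit word of the register and sets condition code 0 when that word is zero; `emulate_oihf()` and `klass_fits_in_32bit()` are hypothetical names.

```cpp
#include <cstdint>

// Result of the hypothetical OIHF model: new register value and condition code.
struct OihfResult {
  uint64_t reg;  // register contents after the instruction
  int cc;        // 0: resulting high word is zero, 1: it is non-zero
};

// Hypothetical model of OIHF: OR a 32-bit immediate into the high word.
constexpr OihfResult emulate_oihf(uint64_t reg, uint32_t imm) {
  const uint64_t high = (reg >> 32) | imm;
  return OihfResult{(high << 32) | (reg & 0xFFFFFFFFULL), high == 0 ? 0 : 1};
}

// z_oihf(current, 0) followed by a branch on "zero": the register keeps its
// value, and the branch is taken exactly when the Klass* fits into 32 bits.
constexpr bool klass_fits_in_32bit(uint64_t klass) {
  return emulate_oihf(klass, 0).cc == 0;
}

static_assert(klass_fits_in_32bit(0x88041040ULL), "32-bit Klass*");
static_assert(!klass_fits_in_32bit(0x100041040ULL), "Klass* above 4 GB");
static_assert(emulate_oihf(0x100041040ULL, 0).reg == 0x100041040ULL,
              "OR with 0 writes back the same value");
```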

Member Author

> I do not like the cross-dependency to metaspace.hpp just for the sake of checking an artificial restriction on Klass pointers.

It is not just for the assertion; it is for limiting the 32-bit add to situations where we know Klass pointers cannot exceed 32 bits. That was the main reason. As I wrote, I was not sure about the assertion myself and am happy to drop it.

> And by the way, you could do the check with one test:
>
>   z_oihf(current, 0);
>   z_brc(Assembler::bcondZero, ok);
>
> z_oihf() does modify the contents of register current, but it writes back the same value.

Thank you. Unfortunately, information about z assembly was hard to come by. The only public information I found had little more than the instruction names; the rest was trial and error.

Contributor

I admit it. To find System z information, you need to know the "magic keywords" to search for; in this case, "Principles of Operation". The third or so Google hit leads you to the System z architecture document. With 2000+ pages to read, you would be lost anyway. :-)

 lgr_if_needed(dst, current);
-z_agfi(dst, -(int)base_l);
+z_afi(dst, -(int)base_l); // Note: 32bit add
 } else {
 load_const(Z_R0, base);
 lgr_if_needed(dst, current);
Contributor

What would you think of a more general rework like this? The comments in the code should explain the intentions/assumptions/conclusions.

// Klass oop manipulations if compressed.
void MacroAssembler::encode_klass_not_null(Register dst, Register src) {
  Register current = (src != noreg) ? src : dst; // Klass is in dst if no src provided. (dst == src) also possible.
  address  base    = CompressedKlassPointers::base();
  int      shift   = CompressedKlassPointers::shift();
  bool     need_zero_extend = false;
  assert(UseCompressedClassPointers, "only for compressed klass ptrs");

  BLOCK_COMMENT("cKlass encoder {");

#ifdef ASSERT
  Label ok;
  z_tmll(current, KlassAlignmentInBytes-1); // Check alignment.
  z_brc(Assembler::bcondAllZero, ok);
  // The plain disassembler does not recognize illtrap. It instead displays
  // a 32-bit value. Issuing two illtraps ensures that the disassembler finds
  // the proper beginning of the next instruction.
  z_illtrap(0xee);
  z_illtrap(0xee);
  bind(ok);
#endif

  // Scale down the incoming klass pointer first.
  // We then can be sure we calculate an offset that fits into 32 bit.
  // More generally speaking: all subsequent calculations are purely 32-bit.
  if (shift != 0) {
    assert (LogKlassAlignmentInBytes == shift, "decode alg wrong");
    z_srlg(dst, current, shift);
    need_zero_extend = true;
    current = dst;
  }

  if (base != NULL) {
    // Use scaled-down base address parts to match scaled-down klass pointer.
    unsigned int base_h = ((unsigned long)base)>>(32+shift);
    unsigned int base_l = (unsigned int)(((unsigned long)base)>>shift);

    // General considerations:
    //  - when calculating (current_h - base_h), all digits must cancel (become 0).
    //    Otherwise, we would end up with a compressed klass pointer which doesn't
    //    fit into 32-bit.
    //  - Only bit#33 of the difference could potentially be non-zero. For that
    //    to happen, (current_l < base_l) must hold. In this case, the subtraction
    //    will create a borrow out of bit#32, nicely killing bit#33.
    //  - With the above, we only need to consider current_l and base_l to
    //    calculate the result.
    //  - Both values are treated as unsigned. The unsigned subtraction is
    //    replaced by adding (unsigned) the 2's complement of the subtrahend.

    if (base_l == 0) {
      //  - In theory, the calculation to be performed here (current_h - base_h) MUST
      //    cancel all high-word bits. Otherwise, we would end up with an offset
      //    (i.e. compressed klass pointer) that does not fit into 32 bit.
      //  - current_l remains unchanged.
      //  - Therefore, we can replace all calculation with just a
      //    zero-extending load 32 to 64 bit.
      //  - Even that can be replaced with a conditional load if dst != current.
      //    (this is a local view. The shift step may have requested zero-extension).
    } else {
      // To begin with, we may need to copy and/or zero-extend the register operand.
      // We have to calculate (current_l - base_l). Because there is no unsigned
      // subtract instruction with immediate operand, we add the 2's complement of base_l.
      if (need_zero_extend) {
        z_llgfr(dst, current);
        need_zero_extend = false;
      } else {
        llgfr_if_needed(dst, current); // zero-extension while copying comes at no extra cost.
      }
      current = dst;
      z_alfi(dst, -(int)base_l);
    }
  }

  if (need_zero_extend) {
    // We must zero-extend the calculated result. It may have some leftover bits in
    // the hi-word because we only did optimized calculations.
    z_llgfr(dst, current);
  } else {
    llgfr_if_needed(dst, current); // zero-extension while copying comes at no extra cost.
  }

  BLOCK_COMMENT("} cKlass encoder");
}
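For readers who want to check the reasoning in these comments, the arithmetic of the proposed encoder can be modeled in a few lines of C++. This is a hypothetical sketch, not the HotSpot function: `encode_klass_model()` is an illustrative name, and it assumes the base is aligned to `1 << shift`, that `base <= klass`, and that the scaled offset fits into 32 bits. Unlike the proposal, it always zero-extends the result.

```cpp
#include <cstdint>

// Hypothetical model of the proposed encoder's arithmetic.
// Assumes: base is aligned to (1 << shift), base <= klass, and
// (klass - base) >> shift fits into 32 bits.
constexpr uint64_t encode_klass_model(uint64_t klass, uint64_t base,
                                      unsigned shift) {
  // Scale down first (z_srlg); all later arithmetic is 32-bit.
  uint64_t current = klass >> shift;
  // Low word of the scaled-down base (base_l in the proposal).
  const uint32_t base_l = static_cast<uint32_t>(base >> shift);
  if (base_l != 0) {
    // ALFI: unsigned 32-bit add of the two's complement of base_l
    // to the low word; the high word is not touched.
    const uint32_t low =
        static_cast<uint32_t>(static_cast<uint32_t>(current) + (0u - base_l));
    current = (current & 0xFFFFFFFF00000000ULL) | low;
  }
  // Zero-extend (LLGFR). This sketch always does it.
  return current & 0xFFFFFFFFULL;
}

// Crash values, unscaled: offset 0x4_1040.
static_assert(encode_klass_model(0x88041040ULL, 0x88000000ULL, 0) == 0x41040ULL,
              "shift 0");
// With shift 3, an offset above 4 GB still yields a 32-bit narrow value.
static_assert(encode_klass_model(0x188000008ULL, 0x88000000ULL, 3) ==
                  0x20000001ULL,
              "shift 3");
```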

Member Author

Looks nice and elegant.

But as said off-list, I dislike the fact that this hard-codes a 32-bit limit on the narrow klass pointer range.

That restriction is artificial, and we may want to drop it. For example, one recurring idea of mine is to drop the duality in metaspace between non-class and class metaspace and just store everything in class space. That would save quite a bit of memory (less overhead) and make the metaspace code quite a bit simpler. However, in that case we could exceed the current 3g limit and might even exceed 32 bits. Since add+shift decoding is done universally on all platforms, at least if CDS is on, this should work out of the box, unless, of course, the platforms hard-code the 32-bit limitation into their encoding schemes.

Contributor

I don't see how you want to overcome the 32-bit limit for compressed pointers. This whole "compression" thing is based on the "trick" of storing an offset instead of the full address. Depending on the object alignment requirement, this affords you 32 GB (8-byte alignment) or 64 GB (16-byte alignment) of addressable (or should I say offset-able) space. That's quite a bit.
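The range figures follow from decoding as `base + (narrow << shift)` with 2^32 possible narrow values. A minimal C++ sketch; `decode_narrow()` and `encodable_range()` are hypothetical helper names:

```cpp
#include <cstdint>

// Hypothetical helper: decode a 32-bit narrow value as base + (narrow << shift).
constexpr uint64_t decode_narrow(uint64_t base, uint32_t narrow, unsigned shift) {
  return base + (static_cast<uint64_t>(narrow) << shift);
}

// Size of the range reachable from one base: 2^32 alignment-sized slots.
constexpr uint64_t encodable_range(unsigned shift) {
  return (uint64_t{1} << 32) << shift;
}

static_assert(encodable_range(3) == 32ULL * 1024 * 1024 * 1024,
              "8-byte alignment: 32 GB");
static_assert(encodable_range(4) == 64ULL * 1024 * 1024 * 1024,
              "16-byte alignment: 64 GB");
// The highest narrow value with 8-byte alignment reaches base + 32 GB - 8.
static_assert(decode_narrow(0, 0xFFFFFFFFu, 3) == encodable_range(3) - 8,
              "last slot");
```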

You use pointer compression to save space, and for nothing else. The space savings have to be significant enough to outweigh the added effort for encoding and decoding. With just a shift and an add, the effort is limited, though noticeable. If you made compressed pointers 40 bits wide (5 bytes), encoding and decoding would take more effort. What's even worse, you would then have entities with a size that is not native to any processor. Just imagine having to store such a value atomically.

In my opinion, wider compressed pointers will have to wait until we have 128-bit pointers.

Back to code:
In the code suggested above, you could make use of the Metaspace::class_space_end() function. If the class space end address, shifted right, fits into 32 bit, need_zero_extend may remain false. Your choice.

Member Author

You misunderstand me. My point was not to make narrow pointers larger than 32 bits, but to use the full encodable range. The encodable range is currently 32g, but we artificially limit it to 3g (CompressedClassSpaceSize is capped at that value).

I thought your proposal was based on the assumption that the highest uncompressed offset into class space cannot be larger than 4G. But looking at your proposal again, I see you moved the shift before the add, so it should probably work.

Contributor

So it was mutual misunderstanding. Good to have that resolved.

@tstuefe (Member Author) commented Feb 22, 2021

/contributor add @RealLucy

openjdk bot commented Feb 22, 2021

@tstuefe
Contributor Lutz Schmidt <lucy@openjdk.org> successfully added.

Contributor

@TheRealMDoerr TheRealMDoerr left a comment

Thanks for fixing. Looks correct, but I have one minor finding.


if (base != NULL) {
// Use scaled-down base address parts to match scaled-down klass pointer.
unsigned int base_h = ((unsigned long)base)>>(32+shift);
Contributor

base_h is unused, but it is referred to in the comments.

Contributor

Our comments crossed. With my latest suggestion, base_h is now used.

Contributor

@RealLucy RealLucy left a comment

The changes look good to me now.
Including the additional optimisation is optional.
Thanks for debugging, finding and fixing!

// calculate the result.
// - Both values are treated as unsigned. The unsigned subtraction is
// replaced by adding (unsigned) the 2's complement of the subtrahend.

Contributor

There is a further tiny optimisation you may want to include in the final version:

  // If we happen to see (base_h == 0), we are sure there 
  // is no borrow from bit#33. No zero-extension is needed. 
  if (base_h == 0) {
    need_zero_extend = false;
  }

openjdk bot commented Feb 22, 2021

@tstuefe This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base

Co-authored-by: Lutz Schmidt <lucy@openjdk.org>
Reviewed-by: mdoerr, lucy

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 160 new commits pushed to the master branch:

  • fe8e370: 8262188: Add test to verify trace page sizes logging on Linux
  • 0a7fff4: 8261636: The test mapping in hugetlbfs_sanity_check should consider LargePageSizeInBytes
  • 702ca62: 8262185: G1: Prune collection set candidates early
  • 8bc8542: 8262195: Harden tests that use the HostsFileNameService (jdk.net.hosts.file property)
  • 20c93b3: 8261914: IfNode::fold_compares_helper faces non-canonicalized bool when running JRuby JSON workload
  • ddd550a: 8261308: C2: assert(inner->is_valid_counted_loop(T_INT) && inner->is_strip_mined()) failed: OuterStripMinedLoop should have been removed
  • 03d888f: 8261804: Remove field _processing_is_mt, calculate it instead
  • 6800ba4: 8257500: Drawing MultiResolutionImage with ImageObserver "leaks" memory
  • 65a245e: 8262329: Fix JFR parser exception messages
  • a4c2496: 8259535: ECDSA SignatureValue do not always have the specified length
  • ... and 150 more: https://git.openjdk.java.net/jdk/compare/3cbd16de3d0b53c001c3fe6f4f3c723e04b0dfa6...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 22, 2021
Contributor

@TheRealMDoerr TheRealMDoerr left a comment

As discussed offline, I think we need to support encoding a class pointer of 0x100000000 (and above), e.g. with base = 0x0C0000000 and shift = 0. In this example, need_zero_extend is false, which may leave a 1 in the upper 32 bits. The lower 32 bits are correct in your current version, but some code may rely on the result being zero-extended to 64 bits.
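The example can be replayed with a small C++ model. `emulate_alfi()` and `emulate_llgfr()` are hypothetical helpers that assume ALFI adds an unsigned 32-bit immediate to the low word only and LLGFR clears the high word. They show that the low word is already correct, while the stray 1 in the high word only disappears after zero-extension.

```cpp
#include <cstdint>

// Hypothetical model of ALFI: unsigned 32-bit add to the low word only.
constexpr uint64_t emulate_alfi(uint64_t reg, uint32_t imm) {
  return (reg & 0xFFFFFFFF00000000ULL) |
         static_cast<uint32_t>(static_cast<uint32_t>(reg) + imm);
}

// Hypothetical model of LLGFR: keep the low word, clear the high word.
constexpr uint64_t emulate_llgfr(uint64_t reg) { return reg & 0xFFFFFFFFULL; }

constexpr uint64_t kKlass = 0x100000000ULL;  // Klass* from the example
constexpr uint64_t kBase = 0x0C0000000ULL;   // base_h == 0, shift == 0

// Subtracting base_l via ALFI gets the low word right but keeps the 1 in
// the high word.
constexpr uint64_t kWithoutExtend =
    emulate_alfi(kKlass, 0u - static_cast<uint32_t>(kBase));
static_assert(kWithoutExtend == 0x140000000ULL, "stray high bit");
static_assert(emulate_llgfr(kWithoutExtend) == 0x40000000ULL,
              "correct narrow klass pointer after zero-extension");
```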

Contributor

@RealLucy RealLucy left a comment

LGTM, still.

@tstuefe (Member Author) commented Mar 2, 2021

Thanks @TheRealMDoerr and @RealLucy for advice and reviews!

/integrate

@openjdk openjdk bot closed this Mar 2, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Mar 2, 2021
openjdk bot commented Mar 2, 2021

@tstuefe Since your change was applied there have been 173 commits pushed to the master branch:

  • f5ab7f6: 8262472: Buffer overflow in UNICODE::as_utf8 for zero length output buffer
  • 6635d7a: 8261670: Add javadoc for the XML processing limits
  • 85b774a: 8255859: Incorrect comments in log.hpp
  • c3eb80e: 8262500: HostName entry in VM.info should be a new line
  • 9f0f0c9: 8260933: runtime/cds/serviceability/ReplaceCriticalClassesForSubgraphs.java fails without CompactStrings
  • d339832: 8257414: Drag n Drop target area is wrong on high DPI systems
  • 353416f: 8262509: JSSE Server should check the legacy version in TLSv1.3 ClientHello
  • 642f45f: 8261839: Error creating runtime package on macos without mac-package-identifier
  • 682e120: 8262497: Delete unused utility methods in ICC_Profile class
  • 4c9adce: 8262379: Add regression test for JDK-8257746
  • ... and 163 more: https://git.openjdk.java.net/jdk/compare/3cbd16de3d0b53c001c3fe6f4f3c723e04b0dfa6...master

Your commit was automatically rebased without conflicts.

Pushed as commit fdd1093.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@tstuefe tstuefe deleted the JDK-8261552-s390-templ-intr-bug branch March 6, 2021 05:37
Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated

3 participants