Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8263164: assert(_base >= VectorA && _base <= VectorZ) failed: Not a Vector while calling StoreVectorNode::memory_size() #2867

Closed
wants to merge 7 commits into from

Conversation

DamonFool
Copy link
Member

@DamonFool DamonFool commented Mar 8, 2021

Hi all,

Several Vector API tests failed intermittently due to this assert.

#
#  Internal Error (/home/jvm/jdk/src/hotspot/share/opto/type.hpp:1686), pid=10561, tid=10577
#  assert(_base >= VectorA && _base <= VectorZ) failed: Not a Vector
#
# JRE version: OpenJDK Runtime Environment (17.0) (fastdebug build 17-internal+0-adhoc..jdk)
# Java VM: OpenJDK 64-Bit Server VM (fastdebug 17-internal+0-adhoc..jdk, compiled mode, sharing, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x918f44]  StoreVectorNode::memory_size() const+0xb4

The reason is that StoreVectorNode::memory_size() will fail if StoreVectorNode->in(MemNode::ValueIn)->bottom_type() == Type::TOP.
However, this case seems possible during C2's optimization.
And the same bug exists for LoadVectorNode::memory_size() as well.

The fix returns 0 if StoreVectorNode->in(MemNode::ValueIn)->bottom_type() is not a vector type.

Testing:

  • jdk/incubator/vector
  • tier1~tier3

Any comments?

Thanks.
Best regards,
Jie


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8263164: assert(_base >= VectorA && _base <= VectorZ) failed: Not a Vector while calling StoreVectorNode::memory_size()

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/2867/head:pull/2867
$ git checkout pull/2867

@DamonFool
Copy link
Member Author

/issue add JDK-8263164
/test
/label add hotspot-compiler
/cc hotspot-compiler

@bridgekeeper
Copy link

bridgekeeper bot commented Mar 8, 2021

👋 Welcome back jiefu! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 8, 2021
…ector while calling StoreVectorNode::memory_size()
@openjdk
Copy link

openjdk bot commented Mar 8, 2021

@DamonFool This issue is referenced in the PR title - it will now be updated.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Mar 8, 2021
@openjdk
Copy link

openjdk bot commented Mar 8, 2021

@DamonFool
The hotspot-compiler label was successfully added.

@openjdk
Copy link

openjdk bot commented Mar 8, 2021

@DamonFool The hotspot-compiler label was already applied.

@mlbridge
Copy link

mlbridge bot commented Mar 8, 2021

Webrevs

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is correct place to fix it. Why checks in StoreNode(and LoadNode)::Value() does not work for vectors?:
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/memnode.cpp#L2702

Did we forgot to put this nodes on IGVN worklist somewhere?

@DamonFool
Copy link
Member Author

I don't think it is correct place to fix it. Why checks in StoreNode(and LoadNode)::Value() does not work for vectors?:
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/memnode.cpp#L2702

Did we forgot to put this nodes on IGVN worklist somewhere?

Hi @vnkozlov ,

Thanks for your review.

After more investigation, I found the same issue for non-vector store nodes.
For example, StoreN was also observed with in(MemNode::ValueIn)->bottom_type() == Type::TOP during PhaseIterGVN::optimize().

The nodes should have been pushed on the IGVN worklist since it happened when StoreN was popped from the IGVN worklist.
So I think StoreVectorNode::memory_size() should also work when in(MemNode::ValueIn)->bottom_type() == Type::TOP during PhaseIterGVN::optimize().

By the way, I'm not clear why you say the checks in StoreNode(and LoadNode)::Value() does not work for vectors?
Could you make it more clearer?
Thanks.

Best regards,
Jie

@vnkozlov
Copy link
Contributor

What I mean is IGVN should replace nodes which have TOP input with TOP and remove them from graph as dead nodes.
Value() methods do replacement after checking node's inputs for TOP. And Ideal() method should check for TOP and immediately return to avoid unneeded transformations (or guard transformation code with != TOP check) and let Value() do replacement with TOP.

In case of memory nodes MemNode::Ideal_common() do such checks (but only for ctrl, mem and adr inputs):
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/memnode.cpp#L286

To my surprise StoreNode::Ideal() does not check in(MemNode::ValueIn) input for TOP at all. May be because in normal case we don't do much for in(MemNode::ValueIn) when processing Store nodes and eventually StoreNode::Value() will replace node with TOP.

Yes, your fix works for Vector nodes. But as you saw normal Store nodes also can have TOP value input and we do useless transformations until StoreNode::Value() calls. I think the better fix is to check in(MemNode::ValueIn) for TOP in StoreNode::Ideal().

And we need to test it hard because it is very old code.

@DamonFool
Copy link
Member Author

What I mean is IGVN should replace nodes which have TOP input with TOP and remove them from graph as dead nodes.
Value() methods do replacement after checking node's inputs for TOP. And Ideal() method should check for TOP and immediately return to avoid unneeded transformations (or guard transformation code with != TOP check) and let Value() do replacement with TOP.

In case of memory nodes MemNode::Ideal_common() do such checks (but only for ctrl, mem and adr inputs):
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/memnode.cpp#L286

To my surprise StoreNode::Ideal() does not check in(MemNode::ValueIn) input for TOP at all. May be because in normal case we don't do much for in(MemNode::ValueIn) when processing Store nodes and eventually StoreNode::Value() will replace node with TOP.

Yes, your fix works for Vector nodes. But as you saw normal Store nodes also can have TOP value input and we do useless transformations until StoreNode::Value() calls. I think the better fix is to check in(MemNode::ValueIn) for TOP in StoreNode::Ideal().

And we need to test it hard because it is very old code.

Thanks @vnkozlov for your kind explanation.
I'll try to fix it in StoreNode::Ideal() and test it as much as we can.
Thanks.

@DamonFool
Copy link
Member Author

Hi @vnkozlov ,

Unfortunately, it doesn't work to avoid store->in(MemNode::ValueIn)->bottom_type() == Type::TOP by just improving StoreNode::Ideal().

TOP value seems hard to be avoided for store nodes during C2's optimization phase.
So I think we should support StoreVectorNode::memory_size() for that case, which has already been supported by non-vector stores.

And I suggest to keep StoreNode::Ideal() as it is to lower the risk.

What do you think?

Thanks.
Best regards,
Jie


To show you how this can happen, please run vmTestbase/nsk/jdi/StepEvent/itself/stepEvent004/stepEvent004.java several times with the following change:

diff --git a/src/hotspot/share/opto/memnode.cpp b/src/hotspot/share/opto/memnode.cpp
index 440e29f..28debe1 100644
--- a/src/hotspot/share/opto/memnode.cpp
+++ b/src/hotspot/share/opto/memnode.cpp
@@ -2617,6 +2617,9 @@ Node *StoreNode::Ideal(PhaseGVN *phase, bool can_reshape) {
   Node* mem     = in(MemNode::Memory);
   Node* address = in(MemNode::Address);
   Node* value   = in(MemNode::ValueIn);
+
+  if (value && value->is_top()) return NULL;
+
   // Back-to-back stores to same address?  Fold em up.  Generally
   // unsafe if I have intervening uses...  Also disallowed for StoreCM
   // since they must follow each StoreP operation.  Redundant StoreCMs
diff --git a/src/hotspot/share/opto/memnode.hpp b/src/hotspot/share/opto/memnode.hpp
index b9ee30f..ea878f9 100644
--- a/src/hotspot/share/opto/memnode.hpp
+++ b/src/hotspot/share/opto/memnode.hpp
@@ -603,6 +603,19 @@ public:
 #endif
   }

+  virtual int memory_size() const {
+    if (in(MemNode::ValueIn)->bottom_type() == Type::TOP) {
+      tty->print_cr("jiefu catch it ...");
+      this->dump();
+      assert(false, "Find top in StoreNode");
+    }
+#ifdef ASSERT
+    return type2aelembytes(memory_type(), true);
+#else
+    return type2aelembytes(memory_type());
+#endif
+  }
+
   // Polymorphic factory method
   //
   // We must ensure that stores of object references will be visible

In my experiment, I got the failure like this

jiefu catch it ...
 21550  StoreI  ===  15446  15451  21551  1  [[ 15454 ]]

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (/home/jvm/jdk/src/hotspot/share/opto/memnode.hpp:610), pid=20324, tid=20395
#  assert(false) failed: Find top in StoreNode
#

Current thread (0x00007f8900308b40):  JavaThread "C2 CompilerThread1" daemon [_thread_in_native, id=20395, stack(0x00007f891a5ac000,0x00007f891a6ad000)]


Current CompileTask:
C2:   2238 1321 %     4       nsk.share.jdi.sde.SDEDebugger::compareLocations @ 66 (381 bytes)

Stack: [0x00007f891a5ac000,0x00007f891a6ad000],  sp=0x00007f891a6a8280,  free space=1008k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x91a452]  StoreNode::memory_size() const+0xe2
V  [libjvm.so+0x1282223]  InitializeNode::stores_are_sane(PhaseTransform*) [clone .part.212]+0x123
V  [libjvm.so+0x128fef8]  InitializeNode::capture_store(StoreNode*, long, PhaseGVN*, bool)+0x268
V  [libjvm.so+0x12988a3]  StoreNode::Ideal(PhaseGVN*, bool)+0x533
V  [libjvm.so+0x144564a]  PhaseIterGVN::transform_old(Node*)+0xba
V  [libjvm.so+0x143f72d]  PhaseIterGVN::optimize()+0x7d
V  [libjvm.so+0x979834]  Compile::Optimize()+0x204
V  [libjvm.so+0x97c134]  Compile::Compile(ciEnv*, ciMethod*, int, bool, bool, bool, bool, DirectiveSet*)+0x1864
V  [libjvm.so+0x7e253a]  C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x10a
V  [libjvm.so+0x98a4c6]  CompileBroker::invoke_compiler_on_method(CompileTask*)+0xbf6
V  [libjvm.so+0x98b058]  CompileBroker::compiler_thread_loop()+0x4b8
V  [libjvm.so+0x17d120a]  JavaThread::thread_main_inner()+0x2fa
V  [libjvm.so+0x17d1527]  JavaThread::run()+0x2b7
V  [libjvm.so+0x17d5fe8]  Thread::call_run()+0xf8
V  [libjvm.so+0x13c5866]  thread_native_entry(Thread*)+0x116

virtual int memory_size() const { return vect_type()->length_in_bytes(); }
virtual int memory_size() const {
if (in(MemNode::ValueIn)->bottom_type()->isa_vect() != NULL) {
return vect_type()->length_in_bytes();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same issue also exists when method "vect_type()" (line 757) is called directly, doesn't it?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All codes that calles vect_type() in StoreVectorNode met the same issue. And I think it's better to make sure the in(MemNode::ValueIn)->bottom_type() is a TypeVect. We need to fix the issues that make it not the right type, doesn't it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!
Thanks @XiaohongGong for your review.

Yes, in theory, there may be the same bug when vect_type() is called since it's based on Type::is_vect().

However, by this reasoning, there might be the same bug in non-vector types such as Type::is_int()/Type::is_long()/Type::is_ptr()/... too.

Will do more investigation.
Thanks.

@vnkozlov
Copy link
Contributor

Would be nice to see where TOP value is coming from in stepEvent004.java test.

@e1iu
Copy link
Member

e1iu commented Mar 11, 2021

To show you how this can happen, please run vmTestbase/nsk/jdi/StepEvent/itself/stepEvent004/stepEvent004.java several times with the following change:

diff --git a/src/hotspot/share/opto/memnode.cpp b/src/hotspot/share/opto/memnode.cpp
index 440e29f..28debe1 100644
--- a/src/hotspot/share/opto/memnode.cpp
+++ b/src/hotspot/share/opto/memnode.cpp
@@ -2617,6 +2617,9 @@ Node *StoreNode::Ideal(PhaseGVN *phase, bool can_reshape) {
   Node* mem     = in(MemNode::Memory);
   Node* address = in(MemNode::Address);
   Node* value   = in(MemNode::ValueIn);
+
+  if (value && value->is_top()) return NULL;
+

Have you tried phase->type(value) ? I suppose the type maybe TOP but value itself may not.

@DamonFool
Copy link
Member Author

Would be nice to see where TOP value is coming from in stepEvent004.java test.

Okay, will do it.
Thanks

@DamonFool
Copy link
Member Author

Have you tried phase->type(value) ? I suppose the type maybe TOP but value itself may not.

Thanks @theRealELiu for your review.

I've tried this

diff --git a/src/hotspot/share/opto/memnode.cpp b/src/hotspot/share/opto/memnode.cpp
index 440e29f..c1a8634 100644
--- a/src/hotspot/share/opto/memnode.cpp
+++ b/src/hotspot/share/opto/memnode.cpp
@@ -2617,6 +2617,10 @@ Node *StoreNode::Ideal(PhaseGVN *phase, bool can_reshape) {
   Node* mem     = in(MemNode::Memory);
   Node* address = in(MemNode::Address);
   Node* value   = in(MemNode::ValueIn);
+
+  if (value && value->is_top()) return NULL;
+  if (value && phase->type(value) == Type::TOP) return NULL;
+
   // Back-to-back stores to same address?  Fold em up.  Generally
   // unsafe if I have intervening uses...  Also disallowed for StoreCM
   // since they must follow each StoreP operation.  Redundant StoreCMs

But there is still no good news.

@DamonFool
Copy link
Member Author

/test

@openjdk
Copy link

openjdk bot commented Mar 11, 2021

@DamonFool you need to get approval to run the tests in tier1 for commits up until 9166bd1

@openjdk openjdk bot added the test-request label Mar 11, 2021
@DamonFool
Copy link
Member Author

/test

@openjdk
Copy link

openjdk bot commented Mar 11, 2021

@DamonFool you need to get approval to run the tests in tier1 for commits up until 311c976

@DamonFool
Copy link
Member Author

Hi @vnkozlov , @XiaohongGong and @theRealELiu ,

If I read the code correctly, StoreNode::Idea can't fix this kind of bugs.

This is because we may see other store nodes with value=top on the IGVN list [1]

StoreNode::Ideal
  |
  `InitializeNode::capture_store
     |
     `InitializeNode::stores_are_sane
        |
        `st->as_Store()->memory_size()  <-- st's value may be TOP (e.g., when it is on the IGVN list waiting bo be transformed)

Fixing in StoreNode::Idea can just bypass the top value for the current store node during IGVN optimization.
But it may still fail when other nodes with top value queuing on the IGVN list.

So we should support vector's memory_size()/element_size()/length()/length_in_bytes() for that case.

vect_type() has been improved based on @XiaohongGong 's comments.
And since StoreNode::Idea won't fix the problem, I suggest to keep StoreNode::Ideal() as it is to lower the risk.
What do you think?

Testing:

  • tier1~tier3 on Linux/x64, no regression
  • nightly testing of jdk/incubator/vector

Thanks.
Best regards,
Jie

[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/memnode.cpp#L4467

@iwanowww
Copy link
Contributor

Why not simply cache the vector type of value being stored during StoreVectorNode construction and use it for subsequent queries? Most of the vector nodes already do that since VectorNode subclasses TypeNode, but StoreVectorNode is a MemNode.

And that's the reason I don't understand why non-StoreVectorNode-related changes are needed in your patch.

Nevertheless, there's another case that seems to suffer from the very same problem - ReductionNode.

@DamonFool
Copy link
Member Author

Why not simply cache the vector type of value being stored during StoreVectorNode construction and use it for subsequent queries? Most of the vector nodes already do that since VectorNode subclasses TypeNode, but StoreVectorNode is a MemNode.

And that's the reason I don't understand why non-StoreVectorNode-related changes are needed in your patch.

Nevertheless, there's another case that seems to suffer from the very same problem - ReductionNode.

Thanks @iwanowww for your review.

Good suggestion!
Will re-do it based on your comments.
Thanks.

@DamonFool
Copy link
Member Author

Why not simply cache the vector type of value being stored during StoreVectorNode construction and use it for subsequent queries? Most of the vector nodes already do that since VectorNode subclasses TypeNode, but StoreVectorNode is a MemNode.

And that's the reason I don't understand why non-StoreVectorNode-related changes are needed in your patch.

Nevertheless, there's another case that seems to suffer from the very same problem - ReductionNode.

Hi @iwanowww ,

_vect_type has been added into StoreVectorNode to cache the vector type of value being stored during StoreVectorNode construction.

I didn't get your point why ReductionNode also suffers from the very same problem.
Could you please make it more clearer?
Thanks.

Best regards,
Jie

@DamonFool
Copy link
Member Author

/test

@openjdk
Copy link

openjdk bot commented Mar 13, 2021

@DamonFool you need to get approval to run the tests in tier1 for commits up until fc9bdaf

@iwanowww
Copy link
Contributor

I didn't get your point why ReductionNode also suffers from the very same problem.
Could you please make it more clearer?

The following code experiences the very same problem, doesn't it?

class ReductionNode : public Node {
...
  virtual const Type* bottom_type() const {
    BasicType vbt = in(1)->bottom_type()->basic_type();
    return Type::get_const_basic_type(vbt);
  }

@DamonFool
Copy link
Member Author

I didn't get your point why ReductionNode also suffers from the very same problem.
Could you please make it more clearer?

The following code experiences the very same problem, doesn't it?

class ReductionNode : public Node {
...
  virtual const Type* bottom_type() const {
    BasicType vbt = in(1)->bottom_type()->basic_type();
    return Type::get_const_basic_type(vbt);
  }

Hi @iwanowww ,

Di you mean the following assert may fail for ReductionNode?

  static const Type* get_const_basic_type(BasicType type) {
    assert((uint)type <= T_CONFLICT && _const_basic_type[type] != NULL, "bad type");
    return _const_basic_type[type];
  }

But it seems fine to pass a Type::TOP to it.

(gdb) call Type::TOP->basic_type()
$1 = T_VOID
(gdb) call Type::get_const_basic_type(Type::TOP->basic_type())
$2 = (const Type *) 0x7fff6402bc68

Am I missing something?

Thanks.
Best regards,
Jie

@iwanowww
Copy link
Contributor

I thought that get_const_basic_type() would assert on T_VOID, but still having ReductionNode::bottom_type() returning a NULL pointer (when in(1) is TOP) may cause problems in surprising places.

  _const_basic_type[T_VOID]        = TypePtr::NULL_PTR;   // reflection represents void this way

@DamonFool
Copy link
Member Author

but still having ReductionNode::bottom_type() returning a NULL pointer (when in(1) is TOP) may cause problems in surprising places.

OK, I got it.

Fixed.
Thanks.

@DamonFool
Copy link
Member Author

/test

@openjdk
Copy link

openjdk bot commented Mar 15, 2021

@DamonFool you need to get approval to run the tests in tier1 for commits up until e2f70ac

Copy link
Contributor

@iwanowww iwanowww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Jie. Looks good.

I'll submit it for testing.

Copy link
Contributor

@iwanowww iwanowww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hs-precheckin-comp, hs-tier1, and hs-tier2 are clean.

@openjdk
Copy link

openjdk bot commented Mar 16, 2021

@DamonFool This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8263164: assert(_base >= VectorA && _base <= VectorZ) failed: Not a Vector while calling StoreVectorNode::memory_size()

Reviewed-by: kvn, vlivanov

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 133 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 16, 2021
@openjdk
Copy link

openjdk bot commented Mar 16, 2021

@DamonFool This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8263164: assert(_base >= VectorA && _base <= VectorZ) failed: Not a Vector while calling StoreVectorNode::memory_size()

Reviewed-by: vlivanov

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 127 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@DamonFool
Copy link
Member Author

Thanks, Jie. Looks good.

I'll submit it for testing.

Thanks @iwanowww .

@vnkozlov , are you fine with this change?
Thanks.

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree with this version of fix.

@DamonFool
Copy link
Member Author

Yes, I agree with this version of fix.

Thanks @vnkozlov .

And also thanks @XiaohongGong and @theRealELiu for your help.
/integrate

@openjdk openjdk bot closed this Mar 16, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Mar 16, 2021
@openjdk
Copy link

openjdk bot commented Mar 16, 2021

@DamonFool Since your change was applied there have been 133 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

Pushed as commit 5069796.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated test-request
Development

Successfully merging this pull request may close these issues.

5 participants