Standalone/stdx-ops.mlir crash #45

Closed
chelini opened this issue Oct 7, 2022 · 19 comments · Fixed by #73 or #531
Labels
bug Something isn't working

Comments

@chelini
Contributor

chelini commented Oct 7, 2022

FAIL: STANDALONE_OPT :: Standalone/stdx-ops.mlir (64 of 64)
******************** TEST 'STANDALONE_OPT :: Standalone/stdx-ops.mlir' FAILED ********************
Script:
--
: 'RUN: at line 1';   /home/lorenzo/tpp-sandbox/build/bin/standalone-opt /home/lorenzo/tpp-sandbox/test/Standalone/stdx-ops.mlir | /home/lorenzo/llvm-project/build/bin/FileCheck /home/lorenzo/tpp-sandbox/test/Standalone/stdx-ops.mlir
--
Exit Code: 2

Command Output (stderr):
--
realloc(): invalid pointer
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /home/lorenzo/tpp-sandbox/build/bin/standalone-opt /home/lorenzo/tpp-sandbox/test/Standalone/stdx-ops.mlir
1.	MLIR Parser: custom op parser 'func.func'
2.	MLIR Parser: custom op parser 'stdx.closure'
3.	MLIR Parser: custom op parser 'stdx.yield'
 #0 0x000055cb0c56a79a llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/lorenzo/llvm-project/llvm/lib/Support/Unix/Signals.inc:569:11
 #1 0x000055cb0c56a94b PrintStackTraceSignalHandler(void*) /home/lorenzo/llvm-project/llvm/lib/Support/Unix/Signals.inc:636:1
 #2 0x000055cb0c568f96 llvm::sys::RunSignalHandlers() /home/lorenzo/llvm-project/llvm/lib/Support/Signals.cpp:104:5
 #3 0x000055cb0c56b075 SignalHandler(int) /home/lorenzo/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1
 #4 0x00007f44396af520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #5 0x00007f4439703a7c __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #6 0x00007f4439703a7c __pthread_kill_internal ./nptl/pthread_kill.c:78:10
 #7 0x00007f4439703a7c pthread_kill ./nptl/pthread_kill.c:89:10
 #8 0x00007f44396af476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #9 0x00007f44396957f3 abort ./stdlib/abort.c:81:7
#10 0x00007f44396f66f6 __libc_message ./libio/../sysdeps/posix/libc_fatal.c:155:5
#11 0x00007f443970dd7c ./malloc/malloc.c:5668:3
#12 0x00007f4439712b2c __libc_realloc ./malloc/malloc.c:3444:7
#13 0x000055cb0c4b075d llvm::safe_realloc(void*, unsigned long) /home/lorenzo/llvm-project/llvm/include/llvm/Support/MemAlloc.h:53:9
#14 0x000055cb0c4b0e32 llvm::SmallVectorBase<unsigned int>::grow_pod(void*, unsigned long, unsigned long) /home/lorenzo/llvm-project/llvm/lib/Support/SmallVector.cpp:151:13
#15 0x000055cb06504368 llvm::SmallVectorTemplateCommon<mlir::Type, void>::grow_pod(unsigned long, unsigned long) /home/lorenzo/llvm-project/llvm/include/llvm/ADT/SmallVector.h:142:3
#16 0x000055cb06504322 llvm::SmallVectorTemplateBase<mlir::Type, true>::grow(unsigned long) /home/lorenzo/llvm-project/llvm/include/llvm/ADT/SmallVector.h:529:71
#17 0x000055cb0650fabc mlir::Type const* llvm::SmallVectorTemplateCommon<mlir::Type, void>::reserveForParamAndGetAddressImpl<llvm::SmallVectorTemplateBase<mlir::Type, true>>(llvm::SmallVectorTemplateBase<mlir::Type, true>*, mlir::Type const&, unsigned long) /home/lorenzo/llvm-project/llvm/include/llvm/ADT/SmallVector.h:247:12
#18 0x000055cb0650fa45 llvm::SmallVectorTemplateBase<mlir::Type, true>::reserveForParamAndGetAddress(mlir::Type&, unsigned long) /home/lorenzo/llvm-project/llvm/include/llvm/ADT/SmallVector.h:540:5
#19 0x000055cb064f73a6 llvm::SmallVectorTemplateBase<mlir::Type, true>::push_back(mlir::Type) /home/lorenzo/llvm-project/llvm/include/llvm/ADT/SmallVector.h:566:23
#20 0x000055cb0650e0ea mlir::Type& llvm::SmallVectorTemplateBase<mlir::Type, true>::growAndEmplaceBack<>() /home/lorenzo/llvm-project/llvm/include/llvm/ADT/SmallVector.h:560:5
#21 0x000055cb0650e06f mlir::Type& llvm::SmallVectorImpl<mlir::Type>::emplace_back<>() /home/lorenzo/llvm-project/llvm/include/llvm/ADT/SmallVector.h:943:7
#22 0x000055cb0650e010 mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&)::'lambda'()::operator()() const /home/lorenzo/llvm-project/mlir/include/mlir/IR/OpImplementation.h:1123:41
#23 0x000055cb0650dfd5 mlir::ParseResult llvm::function_ref<mlir::ParseResult ()>::callback_fn<mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&)::'lambda'()>(long) /home/lorenzo/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#24 0x000055cb0c1846f9 llvm::function_ref<mlir::ParseResult ()>::operator()() const /home/lorenzo/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#25 0x000055cb0c17125e mlir::detail::Parser::parseCommaSeparatedList(mlir::AsmParser::Delimiter, llvm::function_ref<mlir::ParseResult ()>, llvm::StringRef) /home/lorenzo/llvm-project/mlir/lib/AsmParser/Parser.cpp:103:7
#26 0x000055cb0c1890cf mlir::detail::AsmParserImpl<mlir::OpAsmParser>::parseCommaSeparatedList(mlir::AsmParser::Delimiter, llvm::function_ref<mlir::ParseResult ()>, llvm::StringRef) /home/lorenzo/llvm-project/mlir/lib/AsmParser/AsmParserImpl.h:291:19
#27 0x000055cb084c460c mlir::stdx::YieldOp::parse(mlir::OpAsmParser&, mlir::OperationState&) /home/lorenzo/tpp-sandbox/build/include/Standalone/Dialect/Stdx/StdxOps.cpp.inc:346:3
@rengolin
Contributor

This does not seem to be a random crash. I have executed that line a dozen thousand times (on both my desktop and the cluster - head node and compute node) and I can never reproduce it. We're seeing this error on our CI, correct? Can you reproduce it locally, on your machine?

A quick look at the error messages shows the pattern:

  • YieldOp tries to parse the operand types (there's one)
  • It does find it, and that's why it tries to emplace_back
  • Which then triggers growth, and calls safe_realloc, which fails.

realloc doesn't crash when allocation fails, it returns a null pointer. It only fails (UB) if the pointer being passed hasn't been allocated with malloc or if it has been freed already. So, this is not a memory problem, it seems to be a code problem.
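To make the point about realloc concrete, here is a minimal sketch of the two well-defined cases: an impossible request returns a null pointer instead of aborting, and a null input pointer behaves like malloc. The function names are hypothetical helpers, and the sketch assumes a standard allocator without sanitizer interposition (some sanitizers abort on oversized requests). The undefined case, passing a pointer that did not come from malloc, is deliberately not exercised.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: returns true if realloc reports an impossible request
// by returning nullptr (rather than aborting the process).
bool reallocFailureReturnsNull() {
  void *p = std::malloc(16);
  if (!p)
    return false;
  // volatile keeps the compiler from diagnosing the oversized constant.
  volatile std::size_t huge = SIZE_MAX;
  void *q = std::realloc(p, huge);
  bool failedCleanly = (q == nullptr);
  // On failure the original block is untouched and still owned by the caller.
  std::free(failedCleanly ? p : q);
  return failedCleanly;
}

// Hypothetical helper: realloc(nullptr, n) is specified to behave like malloc(n).
bool reallocOfNullActsLikeMalloc() {
  void *p = std::realloc(nullptr, 32);
  bool ok = (p != nullptr);
  std::free(p);
  return ok;
}
```

Neither case produces the "realloc(): invalid pointer" abort seen in the trace, which is consistent with the crash coming from the pointer value itself rather than from memory exhaustion.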

But we are not calling safe_realloc directly, nor even manipulating the SmallVector; we're using bog-standard assemblyFormat, so this is probably an issue with MLIR parsing libraries, or worse, SmallVector. :(

@rengolin
Contributor

On YieldOp::parse, this is the effective code:

  ::llvm::SmallVector<::mlir::Type, 1> operandsTypes;
  if (parser.parseTypeList(operandsTypes))
    return ::mlir::failure();

with parseTypeList as:

  ParseResult parseTypeList(SmallVectorImpl<Type> &result) {
      return parseCommaSeparatedList(
          [&]() { return parseType(result.emplace_back()); });
  }

So, emplace_back() operates on a SmallVector passed by reference to the lambda, which is called by parseCommaSeparatedList, so the vector should still be valid in memory when the lambda runs parseType() and calls emplace_back().

I can see no fault in this code. :(
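For readers who want to experiment with the pattern outside MLIR, the following is a rough stand-in using only the standard library. The names parseCommaSeparated and parseList are hypothetical, std::vector and std::function replace SmallVector and function_ref, and strings replace mlir::Type, so this only mirrors the shape of the code (a callback that appends a slot to a by-reference container and then fills it), not MLIR's actual implementation.

```cpp
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for parseCommaSeparatedList: invokes parseElement once
// per comma-separated token and stops at the first failure.
bool parseCommaSeparated(
    std::istringstream &in,
    const std::function<bool(const std::string &)> &parseElement) {
  std::string token;
  while (std::getline(in, token, ',')) {
    if (!parseElement(token))
      return false;
  }
  return true;
}

// Hypothetical stand-in for parseTypeList: the lambda captures `result` by
// reference, appends an empty slot, then fills that slot.
bool parseList(const std::string &text, std::vector<std::string> &result) {
  std::istringstream in(text);
  return parseCommaSeparated(in, [&](const std::string &token) {
    result.emplace_back();             // may reallocate the container
    std::string &slot = result.back(); // reference taken after the growth
    slot = token;
    return !token.empty();
  });
}
```

The container outlives every call of the lambda, and the reference to the new slot is only taken after the growth, so the pattern itself is sound in this model.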

@rengolin
Contributor

Turns out it was miscompilation with ccache on @chelini's own machine. We can re-enable the tests.

@hfp
Contributor

hfp commented Oct 17, 2022

> Turns out it was miscompilation with ccache on @chelini's own machine. We can re-enable the tests.

Side-note: we do not use ccache for our tests since we manually control reusing data based on an NFS-distributed FS. For PlaidML on the cluster, I specifically turned off ccache with ./configure --launcher="" [...]. Sharing configuration data from CMake based on NFS breaks compilation (let alone reuse based on ccache).

rengolin added a commit to rengolin/tpp-mlir that referenced this issue Oct 17, 2022
Problem was ccache on a local machine.

Fixes plaidml#45
rengolin added a commit that referenced this issue Oct 17, 2022
Problem was ccache on a local machine.

Fixes #45
@rengolin
Contributor

rengolin commented Nov 9, 2022

This has happened on the cluster a while ago and is now happening on my machine as well, always when using GCC 12.2.

I don't use ccache, so this has nothing to do with that. I'm guessing either we or LLVM's SmallVector invoke some undefined behaviour that UBSAN doesn't catch. Or there's a bug in GCC 12.2.

We need to understand what the problem is, because most people will just try and build with the system compiler, which on Linux happens to be GCC.

@rengolin rengolin reopened this Nov 9, 2022
@rengolin rengolin added the bug Something isn't working label Nov 9, 2022
@rengolin
Contributor

rengolin commented Nov 9, 2022

Spoke too soon, now I get the same problem with clang... I'll try to reduce some test.

@rengolin
Contributor

rengolin commented Nov 9, 2022

Before, it was crashing when parsing the yield op, now it also crashes when parsing the pack and unpack ops:

realloc(): invalid pointer
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt /home/rengolin/devel/intel/tpp-sandbox/test/TPP/bufferize-pack-unpack.mlir "-one-shot-bufferize=bufferize-function-boundaries allow-return-allocs function-boundary-type-conversion=identity-layout-map" -canonicalize -drop-equivalent-buffer-results -finalizing-bufferize
1.	MLIR Parser: custom op parser 'func.func'
2.	MLIR Parser: custom op parser 'linalgx.pack'
 #0 0x0000562a60c22194 PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
 #1 0x0000562a60c1f85b SignalHandler(int) Signals.cpp:0:0
 #2 0x00007f0af9251a00 (/usr/lib/libc.so.6+0x38a00)
 #3 0x00007f0af92a164c (/usr/lib/libc.so.6+0x8864c)
 #4 0x00007f0af9251958 raise (/usr/lib/libc.so.6+0x38958)
 #5 0x00007f0af923b53d abort (/usr/lib/libc.so.6+0x2253d)
 #6 0x00007f0af92957ee (/usr/lib/libc.so.6+0x7c7ee)
 #7 0x00007f0af92ab3dc (/usr/lib/libc.so.6+0x923dc)
 #8 0x00007f0af92b0094 __libc_realloc (/usr/lib/libc.so.6+0x97094)
 #9 0x0000562a60be4714 llvm::SmallVectorBase<unsigned int>::grow_pod(void*, unsigned long, unsigned long) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0x1852714)
#10 0x0000562a5f509443 llvm::SmallVectorTemplateBase<mlir::Type, true>::push_back(mlir::Type) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0x177443)
#11 0x0000562a5f5094cf mlir::ParseResult llvm::function_ref<mlir::ParseResult ()>::callback_fn<mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&)::'lambda'()>(long) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0x1774cf)
#12 0x0000562a60a498d8 mlir::detail::Parser::parseCommaSeparatedList(mlir::AsmParser::Delimiter, llvm::function_ref<mlir::ParseResult ()>, llvm::StringRef) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0x16b78d8)
#13 0x0000562a603308e3 mlir::AsmParser::parseCommaSeparatedList(llvm::function_ref<mlir::ParseResult ()>) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0xf9e8e3)
#14 0x0000562a60324c56 mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0xf92c56)
#15 0x0000562a6032464a mlir::linalgx::PackOp::parse(mlir::OpAsmParser&, mlir::OperationState&) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0xf9264a)
realloc(): invalid pointer
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt /home/rengolin/devel/intel/tpp-sandbox/test/TPP/stdx/stdx-ops.mlir
1.	MLIR Parser: custom op parser 'func.func'
2.	MLIR Parser: custom op parser 'stdx.closure'
3.	MLIR Parser: custom op parser 'stdx.yield'
 #0 0x0000558b1e415194 PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
 #1 0x0000558b1e41285b SignalHandler(int) Signals.cpp:0:0
 #2 0x00007fb423051a00 (/usr/lib/libc.so.6+0x38a00)
 #3 0x00007fb4230a164c (/usr/lib/libc.so.6+0x8864c)
 #4 0x00007fb423051958 raise (/usr/lib/libc.so.6+0x38958)
 #5 0x00007fb42303b53d abort (/usr/lib/libc.so.6+0x2253d)
 #6 0x00007fb4230957ee (/usr/lib/libc.so.6+0x7c7ee)
 #7 0x00007fb4230ab3dc (/usr/lib/libc.so.6+0x923dc)
 #8 0x00007fb4230b0094 __libc_realloc (/usr/lib/libc.so.6+0x97094)
 #9 0x0000558b1e3d7714 llvm::SmallVectorBase<unsigned int>::grow_pod(void*, unsigned long, unsigned long) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0x1852714)
#10 0x0000558b1ccfc443 llvm::SmallVectorTemplateBase<mlir::Type, true>::push_back(mlir::Type) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0x177443)
#11 0x0000558b1ccfc4cf mlir::ParseResult llvm::function_ref<mlir::ParseResult ()>::callback_fn<mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&)::'lambda'()>(long) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0x1774cf)
#12 0x0000558b1e23c8d8 mlir::detail::Parser::parseCommaSeparatedList(mlir::AsmParser::Delimiter, llvm::function_ref<mlir::ParseResult ()>, llvm::StringRef) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0x16b78d8)
#13 0x0000558b1db238e3 mlir::AsmParser::parseCommaSeparatedList(llvm::function_ref<mlir::ParseResult ()>) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0xf9e8e3)
#14 0x0000558b1db17c56 mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0xf92c56)
#15 0x0000558b1db412cb mlir::stdx::YieldOp::parse(mlir::OpAsmParser&, mlir::OperationState&) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0xfbc2cb)
realloc(): invalid pointer
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt -transform-dialect-interpreter -verify-diagnostics -split-input-file /home/rengolin/devel/intel/tpp-sandbox/test/TPP/transform/transform-propagation.mlir
1.	MLIR Parser: custom op parser 'func.func'
2.	MLIR Parser: custom op parser 'linalgx.unpack'
 #0 0x000056196ab53194 PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
 #1 0x000056196ab5085b SignalHandler(int) Signals.cpp:0:0
 #2 0x00007fcbb4e51a00 (/usr/lib/libc.so.6+0x38a00)
 #3 0x00007fcbb4ea164c (/usr/lib/libc.so.6+0x8864c)
 #4 0x00007fcbb4e51958 raise (/usr/lib/libc.so.6+0x38958)
 #5 0x00007fcbb4e3b53d abort (/usr/lib/libc.so.6+0x2253d)
 #6 0x00007fcbb4e957ee (/usr/lib/libc.so.6+0x7c7ee)
 #7 0x00007fcbb4eab3dc (/usr/lib/libc.so.6+0x923dc)
 #8 0x00007fcbb4eb0094 __libc_realloc (/usr/lib/libc.so.6+0x97094)
 #9 0x000056196ab15714 llvm::SmallVectorBase<unsigned int>::grow_pod(void*, unsigned long, unsigned long) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0x1852714)
#10 0x000056196943a443 llvm::SmallVectorTemplateBase<mlir::Type, true>::push_back(mlir::Type) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0x177443)
#11 0x000056196943a4cf mlir::ParseResult llvm::function_ref<mlir::ParseResult ()>::callback_fn<mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&)::'lambda'()>(long) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0x1774cf)
#12 0x000056196a97a8d8 mlir::detail::Parser::parseCommaSeparatedList(mlir::AsmParser::Delimiter, llvm::function_ref<mlir::ParseResult ()>, llvm::StringRef) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0x16b78d8)
#13 0x000056196a2618e3 mlir::AsmParser::parseCommaSeparatedList(llvm::function_ref<mlir::ParseResult ()>) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0xf9e8e3)
#14 0x000056196a255c56 mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0xf92c56)
#15 0x000056196a259347 mlir::linalgx::UnPackOp::parse(mlir::OpAsmParser&, mlir::OperationState&) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0xf96347)

@rengolin
Contributor

The problem is still the same. In addition, the generated parser for the assembly format parses everything and only crashes when it gets to parsing the return type.

The reduced logic in it is:

SmallVector<Type> resultType;
...
parseResultType(resultType); // by reference
...
parseType(resultType.emplace_back()); // create and pass reference
...
SmallVectorBase::grow_pod(...)
...
NewElts = llvm::safe_realloc(this->BeginX, NewCapacity * TSize); <- fails

It did not reach parseType, as it failed on emplace_back. It seems that the only way this can fail (as explained earlier, realloc doesn't crash on allocation failure) is if the original pointer (this->BeginX) was invalid to begin with.

Need to build the whole of LLVM in Debug mode (not just line info) to inspect the objects...

@adam-smnk
Collaborator

For completeness, just adding the error I have encountered.

My setup: WSL2 Ubuntu-20.04, LLVM compiled with the default Ubuntu clang 10.0.0, tpp-sandbox compiled with the default Ubuntu gcc 9.4.0.

The following four tests consistently end up throwing the above realloc(): invalid pointer errors:

  TPP/bufferize-pack-unpack.mlir
  TPP/pack-unpack-canonicalize.mlir
  TPP/stdx/stdx-ops.mlir
  TPP/transform/transform-propagation.mlir

Also, in my case all benchmarks produce stack dumps. For example, Benchmarks/simple-gemm.mlir dump:

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60..
FAIL: TPP_OPT :: Benchmarks/simple-gemm.mlir (69 of 71)
******************** TEST 'TPP_OPT :: Benchmarks/simple-gemm.mlir' FAILED ********************
Script:
--
: 'RUN: at line 1';   /home/asiemien/tpp-sandbox/build/bin/tpp-opt /home/asiemien/tpp-sandbox/test/Benchmarks/simple-gemm.mlir -map-linalg-to-tpp  -one-shot-bufferize="bufferize-function-boundaries allow-return-allocs function-boundary-type-conversion=identity-layout-map"  -drop-equivalent-buffer-results -finalizing-bufferize -canonicalize  -convert-linalg-to-tpp -convert-tpp-to-xsmm  -convert-xsmm-to-func |  /home/asiemien/tpp-sandbox/build/bin/tpp-run   -e entry -entry-point-result=void   -shared-libs=/home/asiemien/llvm-project/build/./lib/libmlir_c_runner_utils.so,/home/asiemien/tpp-sandbox/build/lib//libtpp_c_runner_utils.so |  /home/asiemien/llvm-project/build/bin/FileCheck /home/asiemien/tpp-sandbox/test/Benchmarks/simple-gemm.mlir
: 'RUN: at line 12';   /home/asiemien/tpp-sandbox/build/bin/tpp-opt /home/asiemien/tpp-sandbox/test/Benchmarks/simple-gemm.mlir -map-linalg-to-tpp  -one-shot-bufferize="bufferize-function-boundaries allow-return-allocs function-boundary-type-conversion=identity-layout-map"  -drop-equivalent-buffer-results -finalizing-bufferize -canonicalize  -convert-linalg-to-tpp | /home/asiemien/llvm-project/build/bin/FileCheck -check-prefix=TPP /home/asiemien/tpp-sandbox/test/Benchmarks/simple-gemm.mlir
--
Exit Code: 2

Command Output (stderr):
--
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /home/asiemien/tpp-sandbox/build/bin/tpp-run -e entry -entry-point-result=void -shared-libs=/home/asiemien/llvm-project/build/./lib/libmlir_c_runner_utils.so,/home/asiemien/tpp-sandbox/build/lib//libtpp_c_runner_utils.so
 #0 0x0000559cc3da24f3 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/asiemien/llvm-project/llvm/lib/Support/Unix/Signals.inc:569:13
 #1 0x0000559cc3da0790 llvm::sys::RunSignalHandlers() /home/asiemien/llvm-project/llvm/lib/Support/Signals.cpp:105:18
 #2 0x0000559cc3da2b3f SignalHandler(int) /home/asiemien/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1
 #3 0x00007f84ce8f5420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
 #4 0x0000559cc4da54d7 std::__uniq_ptr_impl<mlir::MLIRContextImpl, std::default_delete<mlir::MLIRContextImpl>>::_M_ptr() const /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_ptr.h:154:42
 #5 0x0000559cc4da54d7 std::unique_ptr<mlir::MLIRContextImpl, std::default_delete<mlir::MLIRContextImpl>>::get() const /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_ptr.h:361:21
 #6 0x0000559cc4da54d7 std::unique_ptr<mlir::MLIRContextImpl, std::default_delete<mlir::MLIRContextImpl>>::operator*() const /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_ptr.h:347:10
 #7 0x0000559cc4da54d7 mlir::MLIRContext::getImpl() /home/asiemien/llvm-project/mlir/include/mlir/IR/MLIRContext.h:197:39
 #8 0x0000559cc4da54d7 mlir::RegisteredOperationName::lookup(llvm::StringRef, mlir::MLIRContext*) /home/asiemien/llvm-project/mlir/lib/IR/MLIRContext.cpp:750:21
 #9 0x0000559cc3d69bca mlir::RegisteredOperationName mlir::OpBuilder::getCheckRegisteredInfo<mlir::vector::PrintOp>(mlir::MLIRContext*) /home/asiemien/llvm-project/mlir/include/mlir/IR/Builders.h:443:5
#10 0x0000559cc3d69bca mlir::vector::PrintOp mlir::OpBuilder::create<mlir::vector::PrintOp, mlir::vector::TransferReadOp&>(mlir::Location, mlir::vector::TransferReadOp&) /home/asiemien/llvm-project/mlir/include/mlir/IR/Builders.h:458:20
#11 0x0000559cc3d69bca prepareMLIRKernel(mlir::Operation*) /home/asiemien/tpp-sandbox/build/../tpp-run/tpp-run.cpp:254:48
#12 0x0000559cc4dcda9d mlir::LogicalResult::failed() const /home/asiemien/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#13 0x0000559cc4dcda9d mlir::failed(mlir::LogicalResult) /home/asiemien/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#14 0x0000559cc4dcda9d mlir::JitRunnerMain(int, char**, mlir::DialectRegistry const&, mlir::JitRunnerConfig) /home/asiemien/llvm-project/mlir/lib/ExecutionEngine/JitRunner.cpp:368:9
#15 0x0000559cc3d133c7 std::vector<std::unique_ptr<mlir::DialectExtensionBase, std::default_delete<mlir::DialectExtensionBase>>, std::allocator<std::unique_ptr<mlir::DialectExtensionBase, std::default_delete<mlir::DialectExtensionBase>>>>::~vector() /usr/include/c++/9/bits/stl_vector.h:677:15
#16 0x0000559cc3d133c7 mlir::DialectRegistry::~DialectRegistry() /home/asiemien/llvm-project/mlir/include/mlir/IR/DialectRegistry.h:109:7
#17 0x0000559cc3d133c7 main /home/asiemien/tpp-sandbox/build/../tpp-run/tpp-run.cpp:287:19
#18 0x00007f84ce38f083 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24083)
#19 0x0000559cc3d64f1e _start (/home/asiemien/tpp-sandbox/build/bin/tpp-run+0x2dff1e)

Compiling tpp-sandbox with the same default clang 10.0.0 appears to address these problems, as the errors are no longer present when running --target check-tpp-opt.
So, sticking to an LLVM-only toolchain might be a good workaround for now.

@rengolin
Contributor

Ok, the benchmark one is a slightly different error, it's not in the parsing of the instruction, but in the tpp-run IR builder. Perhaps we should open a new issue for that one, as it might just be a bug in the code.

@rengolin
Contributor

As a test of patience, when LLVM is built in Debug mode, the error doesn't happen, so I can't step through the code and will have to resort to printfs inside LLVM. Yay!

But this also tells me the bug is more likely in LLVM (as UB) or GCC (as an optimisation bug).

@rengolin rengolin removed their assignment Nov 15, 2022
@adam-smnk adam-smnk self-assigned this Nov 15, 2022
@nhasabni
Contributor

Just to report my findings, I see the following tests fail consistently in my setup:

Failed Tests (7):
  TPP_OPT :: Benchmarks/matmul_kernel_12x6x9.mlir
  TPP_OPT :: Benchmarks/matmul_kernel_48x64x96.mlir
  TPP_OPT :: Benchmarks/matmul_kernel_64x48x96.mlir
  TPP_OPT :: Benchmarks/matmul_kernel_64x64x64.mlir
  TPP_OPT :: Benchmarks/mlp_kernel.mlir
  TPP_OPT :: Benchmarks/simple-gemm.mlir
  TPP_OPT :: Benchmarks/simple_copy.mlir

The call stack looks the same as what Adam shared.

I am using gcc-9.4.0 to compile LLVM and tpp-sandbox.

@chelini
Contributor Author

chelini commented Nov 16, 2022

I think these failures are because we do not properly register the operations in the jitter.

@rengolin
Copy link
Contributor

If you mean the benchmark failures, that's tracked by #158. Unfortunately, this issue got mixed up with those failures, but this one is about the SmallVector crashing on emplace_back.

@hfp
Contributor

hfp commented Nov 30, 2022

Building tpp-mlir with GCC goes smoothly. However, running the tests crashes with realloc claiming an invalid pointer. This can have two causes:

  1. The pointer attempted to be reallocated does not originate from a compatible function like malloc.
  2. The pointer which was reallocated remains in use (inside of another data structure), and is from there on invalid. If that pointer is reallocated once more, the reallocation fails.

For the record, the following command debugs the issue (enable-tpp, enable-all, or source enable-gdb):

gdb-oneapi --args /path/to/tpp-mlir/build/bin/tpp-opt -transform-dialect-interpreter -verify-diagnostics -split-input-file /path/to/tpp-mlir/test/TPP/transform/transform-propagation.mlir

Note: our standard environment needs a newer gdb, e.g., gdb-oneapi comes in handy, and is aliased by the above-mentioned environment scripts, i.e., "gdb" refers to the improved version.

@rengolin
Contributor

rengolin commented Jan 18, 2023

Just did a new test and here are the ops failing in the same way (parseCommaSeparatedList):

  • mlir::vnni::MatmulOp
  • mlir::perf::BenchOp

which have the following Variadic results (respectively):

  • let results = (outs Variadic<VNNIOperand>:$dest);
  • let results = (outs Variadic<AnyType>:$bodyResults);

Which is exactly the same problem as our pack and unpack, with:

  • let results = (outs Variadic<AnyShaped>:$results);

But it also fails on variadic arguments, which is the case for Perf but not VNNI.

However, the failure isn't when parsing the result types, but when parsing the argument types. And both argTypes and resultTypes are SmallVector<.., 1> which clearly has size = 1.

But if SmallVector is a stack object, and grow_pod uses realloc, then obviously the "pointer" wasn't allocated by malloc, since it's in the stack. Why is it using realloc in the first place?

@rengolin
Contributor

Looking at grow_pod, it seems the behaviour is to use llvm::safe_malloc when it's the first allocation out of the stack, but llvm::safe_realloc otherwise (which makes sense).

The possible undefined behaviour here is that in the case of the emplace_back() inside a lambda (like parseCommaSeparatedList), it may get the condition (BeginX == FirstEl) wrong?
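The malloc-versus-realloc choice described above can be sketched with a much simplified model. TinySmallVec is a hypothetical class written for illustration, restricted to int elements; it is not LLVM's SmallVector and only mirrors the idea that the first growth out of inline storage must copy into a fresh malloc block, while later growths may realloc the heap block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical, simplified model of a small vector of ints with N inline slots.
template <std::size_t N> class TinySmallVec {
  static_assert(N > 0, "model needs at least one inline slot");
  int *Begin; // points at Inline until the first heap growth
  std::size_t Size = 0;
  std::size_t Capacity = N;
  int Inline[N];

public:
  TinySmallVec() : Begin(Inline) {}
  TinySmallVec(const TinySmallVec &) = delete;
  TinySmallVec &operator=(const TinySmallVec &) = delete;
  ~TinySmallVec() {
    if (!isSmall())
      std::free(Begin);
  }

  // True while the elements still live in the inline buffer.
  bool isSmall() const { return Begin == Inline; }
  std::size_t size() const { return Size; }
  int operator[](std::size_t i) const {
    assert(i < Size && "index out of range");
    return Begin[i];
  }

  void push_back(int v) {
    if (Size == Capacity)
      grow();
    Begin[Size++] = v;
  }

private:
  void grow() {
    std::size_t newCap = Capacity * 2 + 1;
    int *newElts;
    if (isSmall()) {
      // Inline storage never came from malloc: allocate and copy instead.
      newElts = static_cast<int *>(std::malloc(newCap * sizeof(int)));
      if (!newElts)
        throw std::bad_alloc();
      std::memcpy(newElts, Begin, Size * sizeof(int));
    } else {
      // Already on the heap: realloc is valid for this pointer.
      newElts = static_cast<int *>(std::realloc(Begin, newCap * sizeof(int)));
      if (!newElts)
        throw std::bad_alloc();
    }
    Begin = newElts;
    Capacity = newCap;
  }
};
```

In this model, if the isSmall() check were ever evaluated incorrectly, the inline buffer would be handed to realloc, which is exactly the kind of pointer realloc is not allowed to receive.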

@rengolin
Contributor

rengolin commented Mar 15, 2023

The only two ops that have this problem as of today are perf::BenchOp and vnni::MatMulOp, which are the only ones with variadic results, except for vnni::BRGEMM, so I'm curious as to why the latter doesn't fail.

The problem seems to be when there actually is a return value.

This works:

func.func @vnni_dialect(%arg0: memref<4x256x512xbf16>,
                  %arg1: memref<4x256x1024x2xbf16>,
                  %arg2: memref<256x1024xbf16>,
                  %arg3: memref<512x2048x2xbf16>,
                  %arg4: memref<256x2048xbf16>)  {
  vnni.brgemm ins(%arg0 : memref<4x256x512xbf16>, %arg1 : memref<4x256x1024x2xbf16>) out(%arg2 : memref<256x1024xbf16>)
  vnni.matmul ins(%arg2: memref<256x1024xbf16>, %arg3: memref<512x2048x2xbf16>) out(%arg4: memref<256x2048xbf16>)

  return
}

While this crashes:

func.func @vnni_dialect(%arg0: memref<4x256x512xbf16>,
                  %arg1: memref<4x256x1024x2xbf16>,
                  %arg2: memref<256x1024xbf16>,
                  %arg3: memref<512x2048x2xbf16>,
                  %arg4: memref<256x2048xbf16>) -> memref<256x2048xbf16> {
  vnni.brgemm ins(%arg0 : memref<4x256x512xbf16>, %arg1 : memref<4x256x1024x2xbf16>) out(%arg2 : memref<256x1024xbf16>)
  %ret = vnni.matmul ins(%arg2: memref<256x1024xbf16>, %arg3: memref<512x2048x2xbf16>) out(%arg4: memref<256x2048xbf16>) -> memref<256x2048xbf16>

  return %ret :  memref<256x2048xbf16>
}

Same thing on tensors, so this is not a tensor vs memref issue, it's a parsing issue only.

This was referenced Mar 17, 2023
@adam-smnk adam-smnk removed their assignment Mar 17, 2023
@rengolin
Contributor

Just tried on the VNNI dialect, and using Optional<> instead of Variadic<> as a result type works with GCC.

This should be fine for most of our ops (tensor return vs memref outs), but it doesn't work for the current perf dialect's perf.bench op, which returns multiple values. There, we need a custom parser.

rengolin added a commit to rengolin/tpp-mlir that referenced this issue May 3, 2023
Adds a parser and printer for both bench and yield ops as parseTypeList
was crashing the parser with `grow_pod` on `emplace_back`. This is
likely an upstream problem that isn't being hit by upstream tests
because no one uses `Variadic<>` quite like we do with the inline
assembly (and TableGen code).

So we add a custom parser/printer for both (mainly stolen from TableGen
implementation, replacing parseTypeList with parseCommaSeparatedList and
a similar lambda). This takes care of the GCC code generation crash.

Fixes plaidml#45
rengolin added a commit that referenced this issue May 4, 2023
Adds a parser and printer for both bench and yield ops as parseTypeList
was crashing the parser with `grow_pod` on `emplace_back`. This is
likely an upstream problem that isn't being hit by upstream tests
because no one uses `Variadic<>` quite like we do with the inline
assembly (and TableGen code).

So we add a custom parser/printer for both (mainly stolen from TableGen
implementation, replacing parseTypeList with parseCommaSeparatedList and
a similar lambda). This takes care of the GCC code generation crash.

Fixes #45
rengolin added a commit that referenced this issue May 4, 2023
5 participants