Standalone/stdx-ops.mlir crash #45

chelini · 2022-10-07T16:54:56Z

FAIL: STANDALONE_OPT :: Standalone/stdx-ops.mlir (64 of 64)
******************** TEST 'STANDALONE_OPT :: Standalone/stdx-ops.mlir' FAILED ********************
Script:
--
: 'RUN: at line 1';   /home/lorenzo/tpp-sandbox/build/bin/standalone-opt /home/lorenzo/tpp-sandbox/test/Standalone/stdx-ops.mlir | /home/lorenzo/llvm-project/build/bin/FileCheck /home/lorenzo/tpp-sandbox/test/Standalone/stdx-ops.mlir
--
Exit Code: 2

Command Output (stderr):
--
realloc(): invalid pointer
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /home/lorenzo/tpp-sandbox/build/bin/standalone-opt /home/lorenzo/tpp-sandbox/test/Standalone/stdx-ops.mlir
1.	MLIR Parser: custom op parser 'func.func'
2.	MLIR Parser: custom op parser 'stdx.closure'
3.	MLIR Parser: custom op parser 'stdx.yield'
 #0 0x000055cb0c56a79a llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/lorenzo/llvm-project/llvm/lib/Support/Unix/Signals.inc:569:11
 #1 0x000055cb0c56a94b PrintStackTraceSignalHandler(void*) /home/lorenzo/llvm-project/llvm/lib/Support/Unix/Signals.inc:636:1
 #2 0x000055cb0c568f96 llvm::sys::RunSignalHandlers() /home/lorenzo/llvm-project/llvm/lib/Support/Signals.cpp:104:5
 #3 0x000055cb0c56b075 SignalHandler(int) /home/lorenzo/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1
 #4 0x00007f44396af520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #5 0x00007f4439703a7c __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #6 0x00007f4439703a7c __pthread_kill_internal ./nptl/pthread_kill.c:78:10
 #7 0x00007f4439703a7c pthread_kill ./nptl/pthread_kill.c:89:10
 #8 0x00007f44396af476 gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #9 0x00007f44396957f3 abort ./stdlib/abort.c:81:7
#10 0x00007f44396f66f6 __libc_message ./libio/../sysdeps/posix/libc_fatal.c:155:5
#11 0x00007f443970dd7c ./malloc/malloc.c:5668:3
#12 0x00007f4439712b2c __libc_realloc ./malloc/malloc.c:3444:7
#13 0x000055cb0c4b075d llvm::safe_realloc(void*, unsigned long) /home/lorenzo/llvm-project/llvm/include/llvm/Support/MemAlloc.h:53:9
#14 0x000055cb0c4b0e32 llvm::SmallVectorBase<unsigned int>::grow_pod(void*, unsigned long, unsigned long) /home/lorenzo/llvm-project/llvm/lib/Support/SmallVector.cpp:151:13
#15 0x000055cb06504368 llvm::SmallVectorTemplateCommon<mlir::Type, void>::grow_pod(unsigned long, unsigned long) /home/lorenzo/llvm-project/llvm/include/llvm/ADT/SmallVector.h:142:3
#16 0x000055cb06504322 llvm::SmallVectorTemplateBase<mlir::Type, true>::grow(unsigned long) /home/lorenzo/llvm-project/llvm/include/llvm/ADT/SmallVector.h:529:71
#17 0x000055cb0650fabc mlir::Type const* llvm::SmallVectorTemplateCommon<mlir::Type, void>::reserveForParamAndGetAddressImpl<llvm::SmallVectorTemplateBase<mlir::Type, true>>(llvm::SmallVectorTemplateBase<mlir::Type, true>*, mlir::Type const&, unsigned long) /home/lorenzo/llvm-project/llvm/include/llvm/ADT/SmallVector.h:247:12
#18 0x000055cb0650fa45 llvm::SmallVectorTemplateBase<mlir::Type, true>::reserveForParamAndGetAddress(mlir::Type&, unsigned long) /home/lorenzo/llvm-project/llvm/include/llvm/ADT/SmallVector.h:540:5
#19 0x000055cb064f73a6 llvm::SmallVectorTemplateBase<mlir::Type, true>::push_back(mlir::Type) /home/lorenzo/llvm-project/llvm/include/llvm/ADT/SmallVector.h:566:23
#20 0x000055cb0650e0ea mlir::Type& llvm::SmallVectorTemplateBase<mlir::Type, true>::growAndEmplaceBack<>() /home/lorenzo/llvm-project/llvm/include/llvm/ADT/SmallVector.h:560:5
#21 0x000055cb0650e06f mlir::Type& llvm::SmallVectorImpl<mlir::Type>::emplace_back<>() /home/lorenzo/llvm-project/llvm/include/llvm/ADT/SmallVector.h:943:7
#22 0x000055cb0650e010 mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&)::'lambda'()::operator()() const /home/lorenzo/llvm-project/mlir/include/mlir/IR/OpImplementation.h:1123:41
#23 0x000055cb0650dfd5 mlir::ParseResult llvm::function_ref<mlir::ParseResult ()>::callback_fn<mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&)::'lambda'()>(long) /home/lorenzo/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
#24 0x000055cb0c1846f9 llvm::function_ref<mlir::ParseResult ()>::operator()() const /home/lorenzo/llvm-project/llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
#25 0x000055cb0c17125e mlir::detail::Parser::parseCommaSeparatedList(mlir::AsmParser::Delimiter, llvm::function_ref<mlir::ParseResult ()>, llvm::StringRef) /home/lorenzo/llvm-project/mlir/lib/AsmParser/Parser.cpp:103:7
#26 0x000055cb0c1890cf mlir::detail::AsmParserImpl<mlir::OpAsmParser>::parseCommaSeparatedList(mlir::AsmParser::Delimiter, llvm::function_ref<mlir::ParseResult ()>, llvm::StringRef) /home/lorenzo/llvm-project/mlir/lib/AsmParser/AsmParserImpl.h:291:19
#27 0x000055cb084c460c mlir::stdx::YieldOp::parse(mlir::OpAsmParser&, mlir::OperationState&) /home/lorenzo/tpp-sandbox/build/include/Standalone/Dialect/Stdx/StdxOps.cpp.inc:346:3

rengolin · 2022-10-13T10:10:06Z

This does not seem to be a random crash. I have executed that line a dozen thousand times (on both my desktop and the cluster - head node and compute node) and I can never reproduce it. We're seeing this error on our CI, correct? Can you reproduce it locally, on your machine?

A quick look at the error messages shows the pattern:

YieldOp tries to parse the operand types (there's one)
It does find it, and that's why it tries to emplace_back
Which then triggers growth, and calls safe_realloc, which fails.

realloc doesn't crash when allocation fails, it returns a null pointer. It only fails (UB) if the pointer being passed hasn't been allocated with malloc or if it has been freed already. So, this is not a memory problem, it seems to be a code problem.
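
The realloc behaviour described above can be illustrated with a small sketch. The helper names growBuffer and contentsSurviveGrowth are hypothetical and not part of LLVM; the point is only that a failed allocation is reported through a null return value, while an invalid input pointer is undefined behaviour that glibc usually turns into an abort.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper, not part of LLVM: grows a heap buffer with realloc.
// On allocation failure realloc returns nullptr and leaves the old block
// valid, so the caller still owns `buffer` and must free it.
char *growBuffer(char *buffer, std::size_t newSize) {
  assert(newSize > 0 && "zero-sized realloc is best avoided");
  return static_cast<char *>(std::realloc(buffer, newSize));
}

// Copies `text` into a fresh malloc'ed buffer, grows it, and returns true
// if the contents survived the reallocation.
bool contentsSurviveGrowth(const char *text) {
  std::size_t length = std::strlen(text) + 1;
  char *buffer = static_cast<char *>(std::malloc(length));
  if (!buffer)
    return false;
  std::memcpy(buffer, text, length);

  char *grown = growBuffer(buffer, length * 8);
  if (!grown) {
    std::free(buffer); // the original block is still ours on failure
    return false;
  }
  bool same = std::strcmp(grown, text) == 0;
  std::free(grown);
  return same;
}

// NOT executed: handing realloc a pointer that malloc never returned is
// undefined behaviour; glibc typically aborts with
// "realloc(): invalid pointer".
//   char onStack[8];
//   std::realloc(onStack, 64);
```

Note that passing nullptr to realloc is well defined and behaves like malloc, so a null input pointer alone would not produce this abort.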

But we are not calling safe_realloc directly, not even manipulating the SmallVector, we're using bog-standard assemblyFormat, so this is probably an issue with MLIR parsing libraries, or worse, SmallVector. :(

rengolin · 2022-10-13T10:36:49Z

On YieldOp::parse, this is the effective code:

  ::llvm::SmallVector<::mlir::Type, 1> operandsTypes;
  if (parser.parseTypeList(operandsTypes))
    return ::mlir::failure();

with parseTypeList as:

  ParseResult parseTypeList(SmallVectorImpl<Type> &result) {
      return parseCommaSeparatedList(
          [&]() { return parseType(result.emplace_back()); });
  }

So, emplace_back() operates on a SmallVector that the lambda captures by reference. The lambda is called by parseCommaSeparatedList, and the vector should still be valid in memory when the lambda calls emplace_back() and then parseType().

I can see no fault in this code. :(
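
To make the call pattern concrete, here is a minimal stand-alone sketch that mimics it with standard-library types instead of MLIR ones. MockParser, its pre-split token list, and the use of std::string as a "type" are all assumptions for illustration; the real parseCommaSeparatedList takes an llvm::function_ref and the real result is a SmallVectorImpl<Type>.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the MLIR parser; a "type" is just a string here.
struct MockParser {
  std::vector<std::string> tokens; // pre-split input, e.g. {"i32", ",", "f32"}
  std::size_t position = 0;

  explicit MockParser(std::vector<std::string> input)
      : tokens(std::move(input)) {}

  // Consumes one token into `type`. Returns true on failure, mirroring the
  // "failure is truthy" convention of ParseResult.
  bool parseType(std::string &type) {
    if (position >= tokens.size() || tokens[position] == ",")
      return true;
    type = tokens[position++];
    return false;
  }

  // Calls parseElement once per comma-separated element.
  bool parseCommaSeparatedList(const std::function<bool()> &parseElement) {
    while (true) {
      if (parseElement())
        return true;
      if (position >= tokens.size() || tokens[position] != ",")
        return false;
      ++position; // consume the comma
    }
  }

  // Same shape as the parseTypeList quoted above: the lambda captures
  // `result` by reference and appends a slot before parsing into it.
  bool parseTypeList(std::vector<std::string> &result) {
    return parseCommaSeparatedList([&]() {
      result.emplace_back();           // may reallocate the storage
      return parseType(result.back()); // parse into the new slot
    });
  }
};
```

With std::vector this pattern is well defined, because the new slot is created before parseType receives a reference to it. That supports the reading that the source-level logic is sound and the fault lies elsewhere.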

rengolin · 2022-10-17T08:06:45Z

Turns out it was miscompilation with ccache on @chelini's own machine. We can re-enable the tests.

hfp · 2022-10-17T14:28:36Z

Turns out it was miscompilation with ccache on @chelini's own machine. We can re-enable the tests.

Side-note: we do not use ccache for our tests since we manually control reusing data based on NFS-distributed FS. For PlaidML on the cluster, I specifically turned off ccache with ./configure --launcher="" [...]. Sharing configuration data from CMake based on NFS breaks compilation (let alone reuse based on ccache).

Problem was ccache on a local machine. Fixes plaidml#45

rengolin · 2022-11-09T12:30:21Z

This happened on the cluster a while ago and is now happening on my machine as well, always when using GCC 12.2.

I don't use ccache, so this has nothing to do with that. I'm guessing either our code or LLVM's SmallVector triggers some undefined behaviour that UBSan doesn't catch. Or there's a bug in GCC 12.2.

We need to understand what the problem is, because most people will just try to build with the system compiler, which on Linux happens to be GCC.

rengolin · 2022-11-09T12:35:07Z

Spoke too soon, now I get the same problem with clang... I'll try to reduce some test.

rengolin · 2022-11-09T12:44:52Z

Before, it was crashing when parsing the yield op, now it also crashes when parsing the pack and unpack ops:

realloc(): invalid pointer
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt /home/rengolin/devel/intel/tpp-sandbox/test/TPP/bufferize-pack-unpack.mlir "-one-shot-bufferize=bufferize-function-boundaries allow-return-allocs function-boundary-type-conversion=identity-layout-map" -canonicalize -drop-equivalent-buffer-results -finalizing-bufferize
1.	MLIR Parser: custom op parser 'func.func'
2.	MLIR Parser: custom op parser 'linalgx.pack'
 #0 0x0000562a60c22194 PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
 #1 0x0000562a60c1f85b SignalHandler(int) Signals.cpp:0:0
 #2 0x00007f0af9251a00 (/usr/lib/libc.so.6+0x38a00)
 #3 0x00007f0af92a164c (/usr/lib/libc.so.6+0x8864c)
 #4 0x00007f0af9251958 raise (/usr/lib/libc.so.6+0x38958)
 #5 0x00007f0af923b53d abort (/usr/lib/libc.so.6+0x2253d)
 #6 0x00007f0af92957ee (/usr/lib/libc.so.6+0x7c7ee)
 #7 0x00007f0af92ab3dc (/usr/lib/libc.so.6+0x923dc)
 #8 0x00007f0af92b0094 __libc_realloc (/usr/lib/libc.so.6+0x97094)
 #9 0x0000562a60be4714 llvm::SmallVectorBase<unsigned int>::grow_pod(void*, unsigned long, unsigned long) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0x1852714)
#10 0x0000562a5f509443 llvm::SmallVectorTemplateBase<mlir::Type, true>::push_back(mlir::Type) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0x177443)
#11 0x0000562a5f5094cf mlir::ParseResult llvm::function_ref<mlir::ParseResult ()>::callback_fn<mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&)::'lambda'()>(long) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0x1774cf)
#12 0x0000562a60a498d8 mlir::detail::Parser::parseCommaSeparatedList(mlir::AsmParser::Delimiter, llvm::function_ref<mlir::ParseResult ()>, llvm::StringRef) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0x16b78d8)
#13 0x0000562a603308e3 mlir::AsmParser::parseCommaSeparatedList(llvm::function_ref<mlir::ParseResult ()>) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0xf9e8e3)
#14 0x0000562a60324c56 mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0xf92c56)
#15 0x0000562a6032464a mlir::linalgx::PackOp::parse(mlir::OpAsmParser&, mlir::OperationState&) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0xf9264a)

realloc(): invalid pointer
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt /home/rengolin/devel/intel/tpp-sandbox/test/TPP/stdx/stdx-ops.mlir
1.	MLIR Parser: custom op parser 'func.func'
2.	MLIR Parser: custom op parser 'stdx.closure'
3.	MLIR Parser: custom op parser 'stdx.yield'
 #0 0x0000558b1e415194 PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
 #1 0x0000558b1e41285b SignalHandler(int) Signals.cpp:0:0
 #2 0x00007fb423051a00 (/usr/lib/libc.so.6+0x38a00)
 #3 0x00007fb4230a164c (/usr/lib/libc.so.6+0x8864c)
 #4 0x00007fb423051958 raise (/usr/lib/libc.so.6+0x38958)
 #5 0x00007fb42303b53d abort (/usr/lib/libc.so.6+0x2253d)
 #6 0x00007fb4230957ee (/usr/lib/libc.so.6+0x7c7ee)
 #7 0x00007fb4230ab3dc (/usr/lib/libc.so.6+0x923dc)
 #8 0x00007fb4230b0094 __libc_realloc (/usr/lib/libc.so.6+0x97094)
 #9 0x0000558b1e3d7714 llvm::SmallVectorBase<unsigned int>::grow_pod(void*, unsigned long, unsigned long) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0x1852714)
#10 0x0000558b1ccfc443 llvm::SmallVectorTemplateBase<mlir::Type, true>::push_back(mlir::Type) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0x177443)
#11 0x0000558b1ccfc4cf mlir::ParseResult llvm::function_ref<mlir::ParseResult ()>::callback_fn<mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&)::'lambda'()>(long) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0x1774cf)
#12 0x0000558b1e23c8d8 mlir::detail::Parser::parseCommaSeparatedList(mlir::AsmParser::Delimiter, llvm::function_ref<mlir::ParseResult ()>, llvm::StringRef) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0x16b78d8)
#13 0x0000558b1db238e3 mlir::AsmParser::parseCommaSeparatedList(llvm::function_ref<mlir::ParseResult ()>) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0xf9e8e3)
#14 0x0000558b1db17c56 mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0xf92c56)
#15 0x0000558b1db412cb mlir::stdx::YieldOp::parse(mlir::OpAsmParser&, mlir::OperationState&) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0xfbc2cb)

realloc(): invalid pointer
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt -transform-dialect-interpreter -verify-diagnostics -split-input-file /home/rengolin/devel/intel/tpp-sandbox/test/TPP/transform/transform-propagation.mlir
1.	MLIR Parser: custom op parser 'func.func'
2.	MLIR Parser: custom op parser 'linalgx.unpack'
 #0 0x000056196ab53194 PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
 #1 0x000056196ab5085b SignalHandler(int) Signals.cpp:0:0
 #2 0x00007fcbb4e51a00 (/usr/lib/libc.so.6+0x38a00)
 #3 0x00007fcbb4ea164c (/usr/lib/libc.so.6+0x8864c)
 #4 0x00007fcbb4e51958 raise (/usr/lib/libc.so.6+0x38958)
 #5 0x00007fcbb4e3b53d abort (/usr/lib/libc.so.6+0x2253d)
 #6 0x00007fcbb4e957ee (/usr/lib/libc.so.6+0x7c7ee)
 #7 0x00007fcbb4eab3dc (/usr/lib/libc.so.6+0x923dc)
 #8 0x00007fcbb4eb0094 __libc_realloc (/usr/lib/libc.so.6+0x97094)
 #9 0x000056196ab15714 llvm::SmallVectorBase<unsigned int>::grow_pod(void*, unsigned long, unsigned long) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0x1852714)
#10 0x000056196943a443 llvm::SmallVectorTemplateBase<mlir::Type, true>::push_back(mlir::Type) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0x177443)
#11 0x000056196943a4cf mlir::ParseResult llvm::function_ref<mlir::ParseResult ()>::callback_fn<mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&)::'lambda'()>(long) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0x1774cf)
#12 0x000056196a97a8d8 mlir::detail::Parser::parseCommaSeparatedList(mlir::AsmParser::Delimiter, llvm::function_ref<mlir::ParseResult ()>, llvm::StringRef) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0x16b78d8)
#13 0x000056196a2618e3 mlir::AsmParser::parseCommaSeparatedList(llvm::function_ref<mlir::ParseResult ()>) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0xf9e8e3)
#14 0x000056196a255c56 mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0xf92c56)
#15 0x000056196a259347 mlir::linalgx::UnPackOp::parse(mlir::OpAsmParser&, mlir::OperationState&) (/home/rengolin/devel/intel/tpp-sandbox/build/bin/tpp-opt+0xf96347)

rengolin · 2022-11-11T12:48:29Z

The problem is still the same. In addition, the generated parser for the assembly format parses everything and only crashes when it gets to parsing the return type.

The reduced logic in it is:

SmallVector<Type> resultType;
...
parseResultType(resultType); // by reference
...
parseType(resultType.emplace_back()); // create and pass reference
...
SmallVectorBase::grow_pod(...)
...
NewElts = llvm::safe_realloc(this->BeginX, NewCapacity * TSize); <- fails

It did not reach parseType, as it failed on emplace_back. It seems that the only way this can fail (as explained earlier, realloc doesn't crash on allocation failure) is if the original pointer (this->BeginX) was not a valid heap pointer to begin with. A null pointer alone would not explain it, since realloc on nullptr behaves like malloc.

Need to build the whole of LLVM in Debug mode (not just line info) to inspect the objects...

adam-smnk · 2022-11-14T09:32:44Z

For completeness, adding the errors I have encountered.

My setup: WSL2 Ubuntu-20.04, LLVM compiled with the default Ubuntu clang 10.0.0, tpp-sandbox compiled with the default Ubuntu gcc 9.4.0.

The following four tests consistently end up throwing the above realloc(): invalid pointer errors:

  TPP/bufferize-pack-unpack.mlir
  TPP/pack-unpack-canonicalize.mlir
  TPP/stdx/stdx-ops.mlir
  TPP/transform/transform-propagation.mlir

Also, in my case all benchmarks produce stack dumps. For example, Benchmarks/simple-gemm.mlir dump:

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60..
FAIL: TPP_OPT :: Benchmarks/simple-gemm.mlir (69 of 71)
******************** TEST 'TPP_OPT :: Benchmarks/simple-gemm.mlir' FAILED ********************
Script:
--
: 'RUN: at line 1';   /home/asiemien/tpp-sandbox/build/bin/tpp-opt /home/asiemien/tpp-sandbox/test/Benchmarks/simple-gemm.mlir -map-linalg-to-tpp  -one-shot-bufferize="bufferize-function-boundaries allow-return-allocs function-boundary-type-conversion=identity-layout-map"  -drop-equivalent-buffer-results -finalizing-bufferize -canonicalize  -convert-linalg-to-tpp -convert-tpp-to-xsmm  -convert-xsmm-to-func |  /home/asiemien/tpp-sandbox/build/bin/tpp-run   -e entry -entry-point-result=void   -shared-libs=/home/asiemien/llvm-project/build/./lib/libmlir_c_runner_utils.so,/home/asiemien/tpp-sandbox/build/lib//libtpp_c_runner_utils.so |  /home/asiemien/llvm-project/build/bin/FileCheck /home/asiemien/tpp-sandbox/test/Benchmarks/simple-gemm.mlir
: 'RUN: at line 12';   /home/asiemien/tpp-sandbox/build/bin/tpp-opt /home/asiemien/tpp-sandbox/test/Benchmarks/simple-gemm.mlir -map-linalg-to-tpp  -one-shot-bufferize="bufferize-function-boundaries allow-return-allocs function-boundary-type-conversion=identity-layout-map"  -drop-equivalent-buffer-results -finalizing-bufferize -canonicalize  -convert-linalg-to-tpp | /home/asiemien/llvm-project/build/bin/FileCheck -check-prefix=TPP /home/asiemien/tpp-sandbox/test/Benchmarks/simple-gemm.mlir
--
Exit Code: 2

Command Output (stderr):
--
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /home/asiemien/tpp-sandbox/build/bin/tpp-run -e entry -entry-point-result=void -shared-libs=/home/asiemien/llvm-project/build/./lib/libmlir_c_runner_utils.so,/home/asiemien/tpp-sandbox/build/lib//libtpp_c_runner_utils.so
 #0 0x0000559cc3da24f3 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/asiemien/llvm-project/llvm/lib/Support/Unix/Signals.inc:569:13
 #1 0x0000559cc3da0790 llvm::sys::RunSignalHandlers() /home/asiemien/llvm-project/llvm/lib/Support/Signals.cpp:105:18
 #2 0x0000559cc3da2b3f SignalHandler(int) /home/asiemien/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1
 #3 0x00007f84ce8f5420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
 #4 0x0000559cc4da54d7 std::__uniq_ptr_impl<mlir::MLIRContextImpl, std::default_delete<mlir::MLIRContextImpl>>::_M_ptr() const /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_ptr.h:154:42
 #5 0x0000559cc4da54d7 std::unique_ptr<mlir::MLIRContextImpl, std::default_delete<mlir::MLIRContextImpl>>::get() const /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_ptr.h:361:21
 #6 0x0000559cc4da54d7 std::unique_ptr<mlir::MLIRContextImpl, std::default_delete<mlir::MLIRContextImpl>>::operator*() const /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_ptr.h:347:10
 #7 0x0000559cc4da54d7 mlir::MLIRContext::getImpl() /home/asiemien/llvm-project/mlir/include/mlir/IR/MLIRContext.h:197:39
 #8 0x0000559cc4da54d7 mlir::RegisteredOperationName::lookup(llvm::StringRef, mlir::MLIRContext*) /home/asiemien/llvm-project/mlir/lib/IR/MLIRContext.cpp:750:21
 #9 0x0000559cc3d69bca mlir::RegisteredOperationName mlir::OpBuilder::getCheckRegisteredInfo<mlir::vector::PrintOp>(mlir::MLIRContext*) /home/asiemien/llvm-project/mlir/include/mlir/IR/Builders.h:443:5
#10 0x0000559cc3d69bca mlir::vector::PrintOp mlir::OpBuilder::create<mlir::vector::PrintOp, mlir::vector::TransferReadOp&>(mlir::Location, mlir::vector::TransferReadOp&) /home/asiemien/llvm-project/mlir/include/mlir/IR/Builders.h:458:20
#11 0x0000559cc3d69bca prepareMLIRKernel(mlir::Operation*) /home/asiemien/tpp-sandbox/build/../tpp-run/tpp-run.cpp:254:48
#12 0x0000559cc4dcda9d mlir::LogicalResult::failed() const /home/asiemien/llvm-project/mlir/include/mlir/Support/LogicalResult.h:44:33
#13 0x0000559cc4dcda9d mlir::failed(mlir::LogicalResult) /home/asiemien/llvm-project/mlir/include/mlir/Support/LogicalResult.h:72:58
#14 0x0000559cc4dcda9d mlir::JitRunnerMain(int, char**, mlir::DialectRegistry const&, mlir::JitRunnerConfig) /home/asiemien/llvm-project/mlir/lib/ExecutionEngine/JitRunner.cpp:368:9
#15 0x0000559cc3d133c7 std::vector<std::unique_ptr<mlir::DialectExtensionBase, std::default_delete<mlir::DialectExtensionBase>>, std::allocator<std::unique_ptr<mlir::DialectExtensionBase, std::default_delete<mlir::DialectExtensionBase>>>>::~vector() /usr/include/c++/9/bits/stl_vector.h:677:15
#16 0x0000559cc3d133c7 mlir::DialectRegistry::~DialectRegistry() /home/asiemien/llvm-project/mlir/include/mlir/IR/DialectRegistry.h:109:7
#17 0x0000559cc3d133c7 main /home/asiemien/tpp-sandbox/build/../tpp-run/tpp-run.cpp:287:19
#18 0x00007f84ce38f083 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24083)
#19 0x0000559cc3d64f1e _start (/home/asiemien/tpp-sandbox/build/bin/tpp-run+0x2dff1e)

Compiling tpp-sandbox with the same default clang 10.0.0 appears to address these problems, as the errors are no longer present when running --target check-tpp-opt.
So, sticking to an LLVM-only toolchain might be a good workaround for now.

rengolin · 2022-11-14T10:49:13Z

Ok, the benchmark one is a slightly different error, it's not in the parsing of the instruction, but in the tpp-run IR builder. Perhaps we should open a new issue for that one, as it might just be a bug in the code.

rengolin · 2022-11-15T10:16:43Z

As a test of patience, when LLVM is built in Debug mode, the error doesn't happen, so I can't step through the code and will have to resort to printfs inside LLVM. Yay!

But this also tells me the bug is more likely in LLVM (as UB) or GCC (as an optimisation bug).

nhasabni · 2022-11-15T23:22:17Z

Just to report my findings, I see the following tests fail consistently in my setup:

Failed Tests (7):
  TPP_OPT :: Benchmarks/matmul_kernel_12x6x9.mlir
  TPP_OPT :: Benchmarks/matmul_kernel_48x64x96.mlir
  TPP_OPT :: Benchmarks/matmul_kernel_64x48x96.mlir
  TPP_OPT :: Benchmarks/matmul_kernel_64x64x64.mlir
  TPP_OPT :: Benchmarks/mlp_kernel.mlir
  TPP_OPT :: Benchmarks/simple-gemm.mlir
  TPP_OPT :: Benchmarks/simple_copy.mlir

The call stack looks the same as what Adam shared.

I am using gcc-9.4.0 to compile LLVM and tpp-sandbox.

chelini · 2022-11-16T08:08:31Z

I think these failures are because we do not properly register the operations in the jitter.

rengolin · 2022-11-16T11:52:36Z

If you mean the benchmark failures, that's tracked by #158. Unfortunately, this issue got mixed up with those failures, but this one is about the SmallVector crashing on emplace_back.

hfp · 2022-11-30T11:54:15Z

Building the tpp-mlir with GCC goes smoothly. However, running tests crashes with realloc claiming an invalid pointer. This can have two causes:

The pointer attempted to be reallocated does not originate from a compatible function like malloc.
The pointer which was reallocated remains in use (inside of another data structure), and is from there on invalid. If that pointer is reallocated once more, the reallocation fails.

For the record, the following starts a debug session on the issue (environment: enable-tpp, enable-all, or source enable-gdb):

gdb-oneapi --args /path/to/tpp-mlir/build/bin/tpp-opt -transform-dialect-interpreter -verify-diagnostics -split-input-file /path/to/tpp-mlir/test/TPP/transform/transform-propagation.mlir

Note: our standard environment needs a newer gdb, e.g., gdb-oneapi comes in handy, and is aliased by the above-mentioned environment scripts, i.e., "gdb" refers to the improved version.

rengolin · 2023-01-18T11:11:59Z

Just did a new test and here are the ops failing in the same way (parseCommaSeparatedList):

mlir::vnni::MatmulOp
mlir::perf::BenchOp

which have the following Variadic results (respectively):

let results = (outs Variadic<VNNIOperand>:$dest);
let results = (outs Variadic<AnyType>:$bodyResults);

Which is exactly the same problem as our pack and unpack, with:

let results = (outs Variadic<AnyShaped>:$results);

But it also fails on variadic arguments, which is the case for Perf but not VNNI.

However, the failure isn't when parsing the result types, but when parsing the argument types. And both argTypes and resultTypes are SmallVector<.., 1>, which clearly has an inline capacity of 1.

But if SmallVector is a stack object, and grow_pod uses realloc, then obviously the "pointer" wasn't allocated by malloc, since it's on the stack. Why is it using realloc in the first place?

rengolin · 2023-01-18T12:19:48Z

Looking at grow_pod, it seems the behaviour is to use llvm::safe_malloc when it's the first allocation out of the stack, but llvm::safe_realloc otherwise (which makes sense).

The possible undefined behaviour here is that in the case of the emplace_back() inside a lambda (like parseCommaSeparatedList), it may get the condition (BeginX == FirstEl) wrong?
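
The growth strategy described above can be sketched as follows. This is a simplified, hypothetical TinyVec for trivially copyable elements, not LLVM's actual code; member names such as elements and inlineStorage are assumptions, with isInline() playing the role of the (BeginX == FirstEl) check.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <new>
#include <type_traits>

// Simplified, hypothetical small-buffer vector (not LLVM's SmallVector).
template <typename T, std::size_t InlineCapacity>
class TinyVec {
  static_assert(InlineCapacity > 0, "sketch needs at least one inline slot");
  static_assert(std::is_trivially_copyable<T>::value,
                "this sketch moves elements with memcpy/realloc");

  alignas(T) unsigned char inlineStorage[InlineCapacity * sizeof(T)];
  T *elements = reinterpret_cast<T *>(inlineStorage);
  std::size_t count = 0;
  std::size_t capacity = InlineCapacity;

  void grow() {
    std::size_t newCapacity = capacity * 2 + 1;
    void *newElements;
    if (isInline()) {
      // First growth: the inline buffer never came from malloc, so it must
      // not be handed to realloc. Allocate fresh memory and copy instead.
      newElements = std::malloc(newCapacity * sizeof(T));
      if (!newElements)
        throw std::bad_alloc();
      std::memcpy(newElements, elements, count * sizeof(T));
    } else {
      // Later growths: the block is heap-owned, so realloc is legal.
      newElements = std::realloc(elements, newCapacity * sizeof(T));
      if (!newElements)
        throw std::bad_alloc();
    }
    elements = static_cast<T *>(newElements);
    capacity = newCapacity;
  }

public:
  TinyVec() = default;
  TinyVec(const TinyVec &) = delete;
  TinyVec &operator=(const TinyVec &) = delete;
  ~TinyVec() {
    if (!isInline())
      std::free(elements);
  }

  // Counterpart of the (BeginX == FirstEl) condition.
  bool isInline() const {
    return elements == reinterpret_cast<const T *>(inlineStorage);
  }

  std::size_t size() const { return count; }

  const T &operator[](std::size_t index) const {
    assert(index < count);
    return elements[index];
  }

  void push_back(const T &value) {
    T copy = value; // `value` may alias an element that grow() relocates
    if (count == capacity)
      grow();
    ::new (static_cast<void *>(elements + count)) T(copy);
    ++count;
  }
};
```

If the inline check ever evaluated to false while the elements still lived in the inline buffer, the stack address would reach realloc. That would be consistent with the "realloc(): invalid pointer" abort seen in the traces, though the sketch does not show why the check would go wrong.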

rengolin · 2023-03-15T13:02:01Z

The only two ops that have this problem as of today are perf::BenchOp and vnni::MatMulOp, which are the only ones with variadic results, except for vnni::BRGEMM, so I'm curious as to why the latter doesn't fail.

The problem seems to be when there actually is a return value.

This works:

func.func @vnni_dialect(%arg0: memref<4x256x512xbf16>,
                  %arg1: memref<4x256x1024x2xbf16>,
                  %arg2: memref<256x1024xbf16>,
                  %arg3: memref<512x2048x2xbf16>,
                  %arg4: memref<256x2048xbf16>)  {
  vnni.brgemm ins(%arg0 : memref<4x256x512xbf16>, %arg1 : memref<4x256x1024x2xbf16>) out(%arg2 : memref<256x1024xbf16>)
  vnni.matmul ins(%arg2: memref<256x1024xbf16>, %arg3: memref<512x2048x2xbf16>) out(%arg4: memref<256x2048xbf16>)

  return
}

While this crashes:

func.func @vnni_dialect(%arg0: memref<4x256x512xbf16>,
                  %arg1: memref<4x256x1024x2xbf16>,
                  %arg2: memref<256x1024xbf16>,
                  %arg3: memref<512x2048x2xbf16>,
                  %arg4: memref<256x2048xbf16>) -> memref<256x2048xbf16> {
  vnni.brgemm ins(%arg0 : memref<4x256x512xbf16>, %arg1 : memref<4x256x1024x2xbf16>) out(%arg2 : memref<256x1024xbf16>)
  %ret = vnni.matmul ins(%arg2: memref<256x1024xbf16>, %arg3: memref<512x2048x2xbf16>) out(%arg4: memref<256x2048xbf16>) -> memref<256x2048xbf16>

  return %ret :  memref<256x2048xbf16>
}

Same thing on tensors, so this is not a tensor vs memref issue, it's a parsing issue only.

rengolin · 2023-03-28T12:51:16Z

Just tried on the VNNI dialect, and using Optional<> instead of Variadic<> as a result type works with GCC.

This should be fine for most of our ops (tensor return vs memref outs), but it doesn't work for the current perf dialect's perf.bench op, which returns multiple values. There, we need a custom parser.

Adds a parser and printer for both bench and yield ops as parseTypeList was crashing the parser with `grow_pod` on `emplace_back`. This is likely an upstream problem that isn't being hit by upstream tests because no one uses `Variadic<>` quite like we do with the inline assembly (and TableGen code). So we add a custom parser/printer for both (mainly stolen from the TableGen implementation, replacing parseTypeList with parseCommaSeparatedList and a similar lambda). This takes care of the GCC code generation crash. Fixes plaidml#45

chelini assigned rengolin Oct 7, 2022

rengolin mentioned this issue Oct 7, 2022

exclude test #46

Merged

rengolin added a commit to rengolin/tpp-mlir that referenced this issue Oct 17, 2022

Re-enable stdx-ops test

071dbf2

Problem was ccache on a local machine. Fixes plaidml#45

rengolin mentioned this issue Oct 17, 2022

Re-enable stdx-ops test #73

Merged

rengolin closed this as completed in #73 Oct 17, 2022

rengolin added a commit that referenced this issue Oct 17, 2022

Re-enable stdx-ops test (#73)

173ff61

Problem was ccache on a local machine. Fixes #45

rengolin reopened this Nov 9, 2022

rengolin added the bug label Nov 9, 2022

rengolin mentioned this issue Nov 14, 2022

Crash when building vector print in tpp-run #158

Closed

rengolin removed their assignment Nov 15, 2022

adam-smnk self-assigned this Nov 15, 2022

This was referenced Dec 5, 2022

Move our use of pack/unpack to upstream tensor #201

Closed

Reduce diff between upstream pack/unpack and in-tree ones #204

Merged

This was referenced Mar 17, 2023

Implement perf.bench parser/printer #406

Closed

Remove VNNI dialect #389

Closed

adam-smnk removed their assignment Mar 17, 2023

rengolin added a commit to rengolin/tpp-mlir that referenced this issue May 3, 2023

Enable GCC tests after fixing plaidml#45

8bcec7e

rengolin mentioned this issue May 3, 2023

Parser/printer for perf ops #531

Merged

rengolin added a commit to rengolin/tpp-mlir that referenced this issue May 3, 2023

Enable GCC tests after fixing plaidml#45

e17e7c5

rengolin added a commit to rengolin/tpp-mlir that referenced this issue May 3, 2023

Enable GCC tests after fixing plaidml#45

5072804

rengolin closed this as completed in #531 May 4, 2023

rengolin added a commit that referenced this issue May 4, 2023

Enable GCC tests after fixing #45

1a1d6f6
