
realloc(): invalid pointer when parsing when Triton is built with clang #2398

Closed
jlebar opened this issue Sep 27, 2023 · 25 comments
jlebar commented Sep 27, 2023

This is an old issue that people ran into a few months ago on Slack. I'm now hitting it and wanted a place to record the debugging I'm doing.

Steps to reproduce:

$ clang --version
Debian clang version 14.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

$ ld.lld --version
Debian LLD 14.0.6 (compatible with GNU linkers)

$ rm -rf python/build
$ TRITON_BUILD_WITH_CLANG_LLD=true pip install -e python --no-build-isolation
$ cat <<EOF > /tmp/test.mlir
module {
  tt.func public @load_reduce_kernel_0d1d2de3c4c(%arg0: !tt.ptr<f16, 1> {tt.divisibility = 16 : i32}, %arg1: !tt.ptr<f16, 1> {tt.divisibility = 16 : i32}, %arg2: i32 {tt.divisibility = 16 : i32, tt.max_divisibility = 8 : i32}) attributes {noinline = false} {
    %0 = arith.extsi %arg2 : i32 to i64
    %1 = tt.make_range {end = 128 : i32, start = 0 : i32} : tensor<128xi32>
    %2 = arith.extsi %1 : tensor<128xi32> to tensor<128xi64>
    %3 = tt.expand_dims %2 {axis = 1 : i32} : (tensor<128xi64>) -> tensor<128x1xi64>
    %4 = tt.splat %0 : (i64) -> tensor<128x1xi64>
    %5 = arith.muli %3, %4 : tensor<128x1xi64>
    %6 = tt.splat %arg0 : (!tt.ptr<f16, 1>) -> tensor<128x1x!tt.ptr<f16, 1>>
    %7 = tt.addptr %6, %5 : tensor<128x1x!tt.ptr<f16, 1>>, tensor<128x1xi64>
    %8 = tt.broadcast %7 : (tensor<128x1x!tt.ptr<f16, 1>>) -> tensor<128x64x!tt.ptr<f16, 1>>
    %9 = tt.make_range {end = 64 : i32, start = 0 : i32} : tensor<64xi32>
    %10 = arith.extsi %9 : tensor<64xi32> to tensor<64xi64>
    %11 = tt.expand_dims %10 {axis = 0 : i32} : (tensor<64xi64>) -> tensor<1x64xi64>
    %12 = tt.broadcast %11 : (tensor<1x64xi64>) -> tensor<128x64xi64>
    %13 = tt.addptr %8, %12 : tensor<128x64x!tt.ptr<f16, 1>>, tensor<128x64xi64>
    %14 = tt.load %13 {cache = 1 : i32, evict = 1 : i32, isVolatile = false} : tensor<128x64xf16>
    %15 = arith.extf %14 : tensor<128x64xf16> to tensor<128x64xf32>
    %16 = "tt.reduce"(%15) <{axis = 1 : i32}> ({
    ^bb0(%arg3: f32, %arg4: f32):
      %20 = arith.maxf %arg3, %arg4 : f32
      tt.reduce.return %20 : f32
    }) : (tensor<128x64xf32>) -> tensor<128xf32>
    %17 = tt.splat %arg1 : (!tt.ptr<f16, 1>) -> tensor<128x!tt.ptr<f16, 1>>
    %18 = tt.addptr %17, %1 : tensor<128x!tt.ptr<f16, 1>>, tensor<128xi32>
    %19 = arith.truncf %16 : tensor<128xf32> to tensor<128xf16>
    tt.store %18, %19 {cache = 1 : i32, evict = 1 : i32} : tensor<128xf16>
    tt.return
  }
}
EOF

$ python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt /tmp/test.mlir -split-input-file -canonicalize -triton-combine

This outputs the following (flakily -- sometimes it exits successfully). I think I'm getting a stacktrace now because I installed llvm-symbolizer. Previously I just got realloc(): invalid pointer.

realloc(): invalid pointer
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt /usr/local/google/home/jlebar/code/triton/test/Triton/reduce.mlir -split-input-file -canonicalize -triton-combine
1.      MLIR Parser: custom op parser 'builtin.module'
2.      MLIR Parser: custom op parser 'tt.func'
3.      MLIR Parser: custom op parser 'tt.reduce.return'
 #0 0x0000564e688b76bb llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x55056bb)
 #1 0x0000564e688b5514 SignalHandler(int) Signals.cpp:0:0
 #2 0x00007f6c8383a540 (/lib/x86_64-linux-gnu/libc.so.6+0x3c540)
 #3 0x00007f6c8388812c (/lib/x86_64-linux-gnu/libc.so.6+0x8a12c)
 #4 0x00007f6c8383a4a2 raise (/lib/x86_64-linux-gnu/libc.so.6+0x3c4a2)
 #5 0x00007f6c838244b2 abort (/lib/x86_64-linux-gnu/libc.so.6+0x264b2)
 #6 0x00007f6c838251ed (/lib/x86_64-linux-gnu/libc.so.6+0x271ed)
 #7 0x00007f6c83891aa5 (/lib/x86_64-linux-gnu/libc.so.6+0x93aa5)
 #8 0x00007f6c8389674c __libc_realloc (/lib/x86_64-linux-gnu/libc.so.6+0x9874c)
 #9 0x0000564e6886b7bf llvm::SmallVectorBase<unsigned int>::grow_pod(void*, unsigned long, unsigned long) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x54b97bf)
#10 0x0000564e650a0263 llvm::SmallVectorTemplateBase<mlir::Type, true>::push_back(mlir::Type) AffineOps.cpp:0:0
#11 0x0000564e650a02cf mlir::ParseResult llvm::function_ref<mlir::ParseResult ()>::callback_fn<mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&)::'lambda'()>(long) AffineOps.cpp:0:0
#12 0x0000564e686465ea mlir::detail::Parser::parseCommaSeparatedList(mlir::AsmParser::Delimiter, llvm::function_ref<mlir::ParseResult ()>, llvm::StringRef) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x52945ea)
#13 0x0000564e65eafb34 mlir::LogicalResult::failed() const /usr/local/google/home/jlebar/.triton/llvm/llvm+mlir-17.0.0-x86_64-linux-gnu-ubuntu-18.04-release/include/mlir/Support/LogicalResult.h:44:33
#14 0x0000564e65eafb34 mlir::ParseResult::operator bool() const /usr/local/google/home/jlebar/.triton/llvm/llvm+mlir-17.0.0-x86_64-linux-gnu-ubuntu-18.04-release/include/mlir/Support/LogicalResult.h:126:43
#15 0x0000564e65eafb34 mlir::triton::ReduceReturnOp::parse(mlir::OpAsmParser&, mlir::OperationState&) /usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/include/triton/Dialect/Triton/IR/Ops.cpp.inc:9362:7
#16 0x0000564e686543e1 (anonymous namespace)::OperationParser::parseOperation() Parser.cpp:0:0
#17 0x0000564e68657b85 (anonymous namespace)::OperationParser::parseBlock(mlir::Block*&) (.part.0) Parser.cpp:0:0
#18 0x0000564e686581a5 (anonymous namespace)::OperationParser::parseRegionBody(mlir::Region&, llvm::SMLoc, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) Parser.cpp:0:0
#19 0x0000564e68658611 (anonymous namespace)::OperationParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) Parser.cpp:0:0
#20 0x0000564e68653325 (anonymous namespace)::OperationParser::parseGenericOperationAfterOpName(mlir::OperationState&, std::optional<llvm::ArrayRef<mlir::OpAsmParser::UnresolvedOperand>>, std::optional<llvm::ArrayRef<mlir::Block*>>, std::optional<llvm::MutableArrayRef<std::unique_ptr<mlir::Region, std::default_delete<mlir::Region>>>>, std::optional<llvm::ArrayRef<mlir::NamedAttribute>>, std::optional<mlir::Attribute>, std::optional<mlir::FunctionType>) (.isra.0) Parser.cpp:0:0
#21 0x0000564e68653aa6 (anonymous namespace)::OperationParser::parseGenericOperation() Parser.cpp:0:0
#22 0x0000564e686546b0 (anonymous namespace)::OperationParser::parseOperation() Parser.cpp:0:0
#23 0x0000564e68657f2d (anonymous namespace)::OperationParser::parseRegionBody(mlir::Region&, llvm::SMLoc, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) Parser.cpp:0:0
#24 0x0000564e68658611 (anonymous namespace)::OperationParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) Parser.cpp:0:0
#25 0x0000564e686587c5 (anonymous namespace)::CustomOpAsmParser::parseOptionalRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) Parser.cpp:0:0
#26 0x0000564e687c4322 mlir::function_interface_impl::parseFunctionOp(mlir::OpAsmParser&, mlir::OperationState&, bool, mlir::StringAttr, llvm::function_ref<mlir::Type (mlir::Builder&, llvm::ArrayRef<mlir::Type>, llvm::ArrayRef<mlir::Type>, mlir::function_interface_impl::VariadicFlag, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>&)>, mlir::StringAttr, mlir::StringAttr) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x5412322)
#27 0x0000564e65eba190 mlir::triton::FuncOp::parse(mlir::OpAsmParser&, mlir::OperationState&) /usr/local/google/home/jlebar/code/triton/lib/Dialect/Triton/IR/Ops.cpp:824:10
#28 0x0000564e686543e1 (anonymous namespace)::OperationParser::parseOperation() Parser.cpp:0:0
#29 0x0000564e68657f2d (anonymous namespace)::OperationParser::parseRegionBody(mlir::Region&, llvm::SMLoc, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) Parser.cpp:0:0
#30 0x0000564e68658611 (anonymous namespace)::OperationParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) Parser.cpp:0:0
#31 0x0000564e686586a1 (anonymous namespace)::CustomOpAsmParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) Parser.cpp:0:0
#32 0x0000564e68781bda mlir::ModuleOp::parse(mlir::OpAsmParser&, mlir::OperationState&) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x53cfbda)
#33 0x0000564e686543e1 (anonymous namespace)::OperationParser::parseOperation() Parser.cpp:0:0
#34 0x0000564e68655c4a mlir::parseAsmSourceFile(llvm::SourceMgr const&, mlir::Block*, mlir::ParserConfig const&, mlir::AsmParserState*, mlir::AsmParserCodeCompleteContext*) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x52a3c4a)
#35 0x0000564e68614934 mlir::parseSourceFile(std::shared_ptr<llvm::SourceMgr> const&, mlir::Block*, mlir::ParserConfig const&, mlir::LocationAttr*) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x5262934)
#36 0x0000564e665654f0 performActions(llvm::raw_ostream&, std::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*, mlir::MlirOptMainConfig const&) MlirOptMain.cpp:0:0
#37 0x0000564e665663b5 processBuffer(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, mlir::MlirOptMainConfig const&, mlir::DialectRegistry&, llvm::ThreadPool*) MlirOptMain.cpp:0:0
#38 0x0000564e665664a0 mlir::LogicalResult llvm::function_ref<mlir::LogicalResult (std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::'lambda'(std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>(long, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) MlirOptMain.cpp:0:0
#39 0x0000564e6881624e mlir::splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<mlir::LogicalResult (std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, bool, bool)::'lambda'(llvm::StringRef)::operator()(llvm::StringRef) const ToolUtilities.cpp:0:0
#40 0x0000564e68816866 mlir::splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<mlir::LogicalResult (std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, bool, bool) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x5464866)
#41 0x0000564e665643a3 mlir::MlirOptMain(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x31b23a3)
#42 0x0000564e665667d3 mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x31b47d3)
#43 0x0000564e64fbe0eb main /usr/local/google/home/jlebar/code/triton/bin/triton-opt.cpp:9:33
#44 0x00007f6c838256ca (/lib/x86_64-linux-gnu/libc.so.6+0x276ca)
#45 0x00007f6c83825785 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x27785)
#46 0x0000564e64fbdfe1 _start (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x1c0bfe1)

I got the test MLIR by running one of the Triton Python tests with a patch to output the temp file that's created with Triton MLIR.

$ git diff
diff --git a/python/src/triton.cc b/python/src/triton.cc
index 0068a23f8..49c6639a5 100644
--- a/python/src/triton.cc
+++ b/python/src/triton.cc
@@ -552,6 +552,7 @@ void init_triton_ir(py::module &&m) {
       [](const std::string &inputFilename, mlir::MLIRContext &context) {
         // initialize registry
         // note: we initialize llvm for undef
+        std::cerr << "parse_mlir_module: " << inputFilename << std::endl;
         mlir::DialectRegistry registry;
         registry.insert<
             mlir::triton::TritonDialect, mlir::triton::gpu::TritonGPUDialect,
@@ -564,6 +565,7 @@ void init_triton_ir(py::module &&m) {
         context.loadAllAvailableDialects();

         // parse module
+        std::cerr << "Calling mlir parseSourceFile." << std::endl;
         mlir::OwningOpRef<mlir::ModuleOp> module =
             mlir::parseSourceFile<mlir::ModuleOp>(inputFilename, &context);
         if (!module)
$ python -m pytest python/test/unit/hopper/test_mixed_io.py -k load_reduce --verbose -s
# prints a temp file it wrote the Triton IR to.

jlebar commented Sep 27, 2023

(@ThomasRaoux feel free to assign this to me.)


jlebar commented Sep 27, 2023

This really looks like a bug in the MLIR parser to me. There isn't any Triton code in here.

I wonder if this is fixed at LLVM HEAD and we're just on a bad LLVM revision?


jlebar commented Sep 27, 2023

Oh, I take it back, it even says it: MLIR Parser: custom op parser 'tt.reduce.return'

@jlebar jlebar changed the title Segfault in parsing when Triton is built with clang realloc(): invalid pointer when parsing when Triton is built with clang Sep 27, 2023
@joker-eph

Oh, I take it back, it even says it: MLIR Parser: custom op parser 'tt.reduce.return'

The parser is auto-generated from a declarative description in TableGen. That's "not allowed to crash" in theory ;) so it may still be an MLIR bug (unless there is some memory corruption before getting there -- did you try running with sanitizers?)


jlebar commented Sep 27, 2023

Aha, yup, when I build LLVM with ASan, it indicates its displeasure.

$ ASAN_OPTIONS=detect_odr_violation=0 lit -vvv python/build/cmake.linux-x86_64-cpython-3.11/test/Triton/reduce.mlir
-- Testing: 1 tests, 1 workers --
FAIL: TRITON :: Triton/reduce.mlir (1 of 1)
******************** TEST 'TRITON :: Triton/reduce.mlir' FAILED ********************
Script:
--
: 'RUN: at line 1';   MSAN_OPTIONS=abort_on_error=0 /usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt /usr/local/google/home/jlebar/code/triton/test/Triton/reduce.mlir -split-input-file -canonicalize -triton-combine | FileCheck /usr/local/google/home/jlebar/code/triton/test/Triton/reduce.mlir
--
Exit Code: 2

Command Output (stderr):
--
error: no check strings found with prefix 'CHECK:'
=================================================================
==2817286==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffe6bdfb750 at pc 0x559e3d547a39 bp 0x7ffe6bdfadf0 sp 0x7ffe6bdfade8
READ of size 8 at 0x7ffe6bdfb750 thread T0
    #0 0x559e3d547a38 in mlir::ParseResult llvm::function_ref<mlir::ParseResult ()>::callback_fn<mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&)::'lambda'()>(long) AffineOps.cpp
    #1 0x559e4fddb878 in mlir::detail::Parser::parseCommaSeparatedList(mlir::AsmParser::Delimiter, llvm::function_ref<mlir::ParseResult ()>, llvm::StringRef) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x19bc3878) (BuildId: 1c4e980b97f8f8cb)
    #2 0x559e43003fc4 in mlir::AsmParser::parseCommaSeparatedList(llvm::function_ref<mlir::ParseResult ()>) /usr/local/google/home/jlebar/code/llvm/mlir/include/mlir/IR/OpImplementation.h:710:12
    #3 0x559e43003fc4 in mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&) /usr/local/google/home/jlebar/code/llvm/mlir/include/mlir/IR/OpImplementation.h:1176:12
    #4 0x559e43003fc4 in mlir::triton::ReduceReturnOp::parse(mlir::OpAsmParser&, mlir::OperationState&) /usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/include/triton/Dialect/Triton/IR/Ops.cpp.inc:9362:14
    #5 0x559e4fdc750f in mlir::ParseResult llvm::function_ref<mlir::ParseResult (mlir::OpAsmParser&, mlir::OperationState&)>::callback_fn<llvm::unique_function<mlir::ParseResult (mlir::OpAsmParser&, mlir::OperationState&)> >(long, mlir::OpAsmParser&, mlir::OperationState&) Parser.cpp
    #6 0x559e4fe2af66 in (anonymous namespace)::OperationParser::parseOperation() Parser.cpp
    #7 0x559e4fe3ecfe in (anonymous namespace)::OperationParser::parseBlock(mlir::Block*&) Parser.cpp
    #8 0x559e4fe40711 in (anonymous namespace)::OperationParser::parseRegionBody(mlir::Region&, llvm::SMLoc, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) Parser.cpp
    #9 0x559e4fe426f8 in (anonymous namespace)::OperationParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) Parser.cpp
    #10 0x559e4fe22cc5 in (anonymous namespace)::OperationParser::parseGenericOperationAfterOpName(mlir::OperationState&, std::optional<llvm::ArrayRef<mlir::OpAsmParser::UnresolvedOperand> >, std::optional<llvm::ArrayRef<mlir::Block*> >, std::optional<llvm::MutableArrayRef<std::unique_ptr<mlir::Region, std::default_delete<mlir::Region> > > >, std::optional<llvm::ArrayRef<mlir::NamedAttribute> >, std::optional<mlir::Attribute>, std::optional<mlir::FunctionType>) (.isra.0) Parser.cpp
    #11 0x559e4fe26f0c in (anonymous namespace)::OperationParser::parseGenericOperation() Parser.cpp
    #12 0x559e4fe2c31b in (anonymous namespace)::OperationParser::parseOperation() Parser.cpp
    #13 0x559e4fe3dcc1 in (anonymous namespace)::OperationParser::parseBlock(mlir::Block*&) Parser.cpp
    #14 0x559e4fe41778 in (anonymous namespace)::OperationParser::parseRegionBody(mlir::Region&, llvm::SMLoc, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) Parser.cpp
    #15 0x559e4fe426f8 in (anonymous namespace)::OperationParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) Parser.cpp
    #16 0x559e4fe42b2d in (anonymous namespace)::CustomOpAsmParser::parseOptionalRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) Parser.cpp
    #17 0x559e506494b0 in mlir::function_interface_impl::parseFunctionOp(mlir::OpAsmParser&, mlir::OperationState&, bool, mlir::StringAttr, llvm::function_ref<mlir::Type (mlir::Builder&, llvm::ArrayRef<mlir::Type>, llvm::ArrayRef<mlir::Type>, mlir::function_interface_impl::VariadicFlag, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&)>, mlir::StringAttr, mlir::StringAttr) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x1a4314b0) (BuildId: 1c4e980b97f8f8cb)
    #18 0x559e43028b14 in mlir::triton::FuncOp::parse(mlir::OpAsmParser&, mlir::OperationState&) /usr/local/google/home/jlebar/code/triton/lib/Dialect/Triton/IR/Ops.cpp:824:10
    #19 0x559e4fdc750f in mlir::ParseResult llvm::function_ref<mlir::ParseResult (mlir::OpAsmParser&, mlir::OperationState&)>::callback_fn<llvm::unique_function<mlir::ParseResult (mlir::OpAsmParser&, mlir::OperationState&)> >(long, mlir::OpAsmParser&, mlir::OperationState&) Parser.cpp
    #20 0x559e4fe2af66 in (anonymous namespace)::OperationParser::parseOperation() Parser.cpp
    #21 0x559e4fe3dcc1 in (anonymous namespace)::OperationParser::parseBlock(mlir::Block*&) Parser.cpp
    #22 0x559e4fe40711 in (anonymous namespace)::OperationParser::parseRegionBody(mlir::Region&, llvm::SMLoc, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) Parser.cpp
    #23 0x559e4fe43a2e in (anonymous namespace)::CustomOpAsmParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) Parser.cpp
    #24 0x559e504c7934 in mlir::ModuleOp::parse(mlir::OpAsmParser&, mlir::OperationState&) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x1a2af934) (BuildId: 1c4e980b97f8f8cb)
    #25 0x559e4fdc750f in mlir::ParseResult llvm::function_ref<mlir::ParseResult (mlir::OpAsmParser&, mlir::OperationState&)>::callback_fn<llvm::unique_function<mlir::ParseResult (mlir::OpAsmParser&, mlir::OperationState&)> >(long, mlir::OpAsmParser&, mlir::OperationState&) Parser.cpp
    #26 0x559e4fe2af66 in (anonymous namespace)::OperationParser::parseOperation() Parser.cpp
    #27 0x559e4fe342e0 in mlir::parseAsmSourceFile(llvm::SourceMgr const&, mlir::Block*, mlir::ParserConfig const&, mlir::AsmParserState*, mlir::AsmParserCodeCompleteContext*) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x19c1c2e0) (BuildId: 1c4e980b97f8f8cb)
    #28 0x559e4fce0adb in mlir::parseSourceFile(std::shared_ptr<llvm::SourceMgr> const&, mlir::Block*, mlir::ParserConfig const&, mlir::LocationAttr*) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x19ac8adb) (BuildId: 1c4e980b97f8f8cb)
    #29 0x559e45885c58 in performActions(llvm::raw_ostream&, std::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*, mlir::MlirOptMainConfig const&) MlirOptMain.cpp
    #30 0x559e458898ec in processBuffer(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, mlir::MlirOptMainConfig const&, mlir::DialectRegistry&, llvm::ThreadPool*) MlirOptMain.cpp
    #31 0x559e4588a172 in mlir::LogicalResult llvm::function_ref<mlir::LogicalResult (std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::'lambda'(std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, llvm::raw_ostream&)>(long, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, llvm::raw_ostream&) MlirOptMain.cpp
    #32 0x559e5080ee3f in mlir::splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, llvm::function_ref<mlir::LogicalResult (std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, llvm::raw_ostream&)>, llvm::raw_ostream&, bool, bool)::'lambda'(llvm::StringRef)::operator()(llvm::StringRef) const ToolUtilities.cpp
    #33 0x559e508124f4 in mlir::splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, llvm::function_ref<mlir::LogicalResult (std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, llvm::raw_ostream&)>, llvm::raw_ostream&, bool, bool) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x1a5fa4f4) (BuildId: 1c4e980b97f8f8cb)
    #34 0x559e4587466a in mlir::MlirOptMain(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0xf65c66a) (BuildId: 1c4e980b97f8f8cb)
    #35 0x559e4588b4e2 in mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&) (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0xf6734e2) (BuildId: 1c4e980b97f8f8cb)
    #36 0x559e3d1fca6e in main /usr/local/google/home/jlebar/code/triton/bin/triton-opt.cpp:9:33
    #37 0x7ff0a80456c9 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #38 0x7ff0a8045784 in __libc_start_main csu/../csu/libc-start.c:360:3
    #39 0x559e3d1480a0 in _start (/usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt+0x6f300a0) (BuildId: 1c4e980b97f8f8cb)

Address 0x7ffe6bdfb750 is located in stack of thread T0 at offset 208 in frame
    #0 0x559e4fdc70bf in mlir::ParseResult llvm::function_ref<mlir::ParseResult (mlir::OpAsmParser&, mlir::OperationState&)>::callback_fn<llvm::unique_function<mlir::ParseResult (mlir::OpAsmParser&, mlir::OperationState&)> >(long, mlir::OpAsmParser&, mlir::OperationState&) Parser.cpp

  This frame has 9 object(s):
    [32, 40) '<unknown>'
    [64, 72) '<unknown>'
    [96, 104) 'V'
    [128, 136) 'P'
    [160, 168) 'V'
    [192, 200) 'P' <== Memory access at offset 208 overflows this variable
    [224, 232) '<unknown>' <== Memory access at offset 208 underflows this variable
    [256, 264) 'V'
    [288, 296) 'P'
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow AffineOps.cpp in mlir::ParseResult llvm::function_ref<mlir::ParseResult ()>::callback_fn<mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&)::'lambda'()>(long)
Shadow bytes around the buggy address:
  0x10004d7b7690: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004d7b76a0: 00 00 f2 f2 f2 f2 f2 f2 f2 f2 00 00 00 f3 f3 f3
  0x10004d7b76b0: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004d7b76c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004d7b76d0: f1 f1 f1 f1 f8 f2 f2 f2 f8 f2 f2 f2 f8 f2 f2 f2
=>0x10004d7b76e0: f8 f2 f2 f2 f8 f2 f2 f2 f8 f2[f2]f2 00 f2 f2 f2
  0x10004d7b76f0: 00 f2 f2 f2 00 f3 f3 f3 00 00 00 00 00 00 00 00
  0x10004d7b7700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004d7b7710: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004d7b7720: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
  0x10004d7b7730: f1 f1 f1 f1 01 f2 01 f2 01 f2 01 f2 01 f2 01 f2
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==2817286==ABORTING

--

********************
********************
Failed Tests (1):
  TRITON :: Triton/reduce.mlir


Testing Time: 0.55s
  Failed: 1

@ThomasRaoux

Interesting: the bad memory access still happens in the TableGen-generated parse function, triton::ReduceReturnOp::parse(). That's where the crash was before. This would point to a bug in MLIR, although this seems to be such a commonly used code path that it would be surprising.
I wonder if this is a bug in the symbolization, but it is very odd that it points to AffineOps.cpp:

    #0 0x559e3d547a38 in mlir::ParseResult llvm::function_ref<mlir::ParseResult ()>::callback_fn<mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&)::'lambda'()>(long) AffineOps.cpp

@ThomasRaoux

Are you running the debug version of MLIR (which can be enabled by setting TRITON_USE_ASSERT_ENABLED_LLVM=1)? That might have better symbol information.


jlebar commented Sep 27, 2023

Hm, I'm not really sure what's going on.

Starting at frame #4 0x559e43003fc4 in mlir::triton::ReduceReturnOp::parse(mlir::OpAsmParser&, mlir::OperationState&) /usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/include/triton/Dialect/Triton/IR/Ops.cpp.inc:9362:14:

The relevant function is

::mlir::ParseResult ReduceReturnOp::parse(::mlir::OpAsmParser &parser, ::mlir::OperationState &result) {
  ::llvm::SmallVector<::mlir::OpAsmParser::UnresolvedOperand, 4> resultOperands;
  ::llvm::SMLoc resultOperandsLoc;
  (void)resultOperandsLoc;
  ::llvm::SmallVector<::mlir::Type, 1> resultTypes;

  resultOperandsLoc = parser.getCurrentLocation();
  if (parser.parseOperandList(resultOperands))
    return ::mlir::failure();
  {
    auto loc = parser.getCurrentLocation();(void)loc;
    if (parser.parseOptionalAttrDict(result.attributes))
      return ::mlir::failure();
  }
  if (parser.parseColon())
    return ::mlir::failure();

  if (parser.parseTypeList(resultTypes))
    return ::mlir::failure();
  if (parser.resolveOperands(resultOperands, resultTypes, resultOperandsLoc, result.operands))
    return ::mlir::failure();
  return ::mlir::success();
}

We are inside parseTypeList() at #3 0x559e43003fc4 in mlir::AsmParser::parseTypeList(llvm::SmallVectorImpl<mlir::Type>&) /usr/local/google/home/jlebar/code/llvm/mlir/include/mlir/IR/OpImplementation.h:1176:12

  ParseResult parseTypeList(SmallVectorImpl<Type> &result) {
    return parseCommaSeparatedList(
        [&]() { return parseType(result.emplace_back()); });
  }

OK, presumably the problem is in this lambda.

I changed it to this:

    ParseResult parseTypeList(SmallVectorImpl<Type> &result) {
      return parseCommaSeparatedList(
         [&, result = &result]() { return parseType(result->emplace_back()); });
    }

and now it seems to work? But...surely [&] does not capture a reference-to-a-reference for result. What the heck?


jlebar commented Sep 27, 2023

are you running the debug version of MLIR? (can be done by setting TRITON_USE_ASSERT_ENABLED_LLVM=1) that might have better symbol information.

Yes, I built LLVM myself with assertions enabled.


jlebar commented Sep 27, 2023

Also confirmed that this code in OpImplementation.h has not changed since it was implemented in 2022.

Also also if there really is a bug in LLVM, how do we explain the fact that this crashes only when we build Triton with clang, irrespective of how we built LLVM itself?

@ThomasRaoux

Also confirmed that this code in OpImplementation.h has not changed since it was implemented in 2022.

Also also if there really is a bug in LLVM, how do we explain the fact that this crashes only when we build Triton with clang, irrespective of how we built LLVM itself?

That's a good point. Are you building LLVM with the same compiler and standard lib as triton?

@ThomasRaoux

Also confirmed that this code in OpImplementation.h has not changed since it was implemented in 2022.

Also also if there really is a bug in LLVM, how do we explain the fact that this crashes only when we build Triton with clang, irrespective of how we built LLVM itself?

Actually, even though this code is in LLVM, since it is in a header it gets built along with Triton and not with LLVM, right?


jlebar commented Sep 27, 2023

Actually, even though this code is in LLVM, since it is in a header it gets built along with Triton and not with LLVM, right?

Ah, indeed. It should get compiled into whatever is the translation unit that includes the header, which is our (generated) code, include/triton/Dialect/Triton/IR/Ops.cpp.inc:9362:14.

> Are you building LLVM with the same compiler and standard lib as triton?

Same standard lib, which is important, but I was compiling LLVM with GCC. Maybe this is not okay somehow? I...don't really see how this is not okay, but I don't understand how exactly we are linking together MLIR+Triton. I'll try rebuilding LLVM with clang.

Something really weird is going on; I just rebuilt LLVM exactly how I had done before (asan but without my [result=&result] change) and now I don't get a stacktrace anymore.

PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /usr/local/google/home/jlebar/code/triton/python/build/cmake.linux-x86_64-cpython-3.11/bin/triton-opt /usr/local/google/home/jlebar/code/triton/test/Triton/reduce.mlir -split-input-file -canonicalize -triton-combine
1.      MLIR Parser: custom op parser 'builtin.module'
2.      MLIR Parser: custom op parser 'tt.func'
3.      MLIR Parser: custom op parser 'tt.reduce.return'
AddressSanitizer:DEADLYSIGNAL
=================================================================
==2855683==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000003 (pc 0x7fbfe284e8ee bp 0x6320000126b0 sp 0x632000012660 T0)
==2855683==The signal is caused by a READ memory access.
==2855683==Hint: address points to the zero page.
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer: nested bug in the same thread, aborting.

@joker-eph
Contributor

There is actually a known incompatibility between gcc and clang related to lambdas...

@jlebar
Contributor Author

jlebar commented Sep 27, 2023

> There is a known incompatibility between gcc and clang actually related to lambda...

Oh, well that's almost definitely it then. Looks basically identical to https://discourse.llvm.org/t/gcc-abi-compatibility-lambda-captures/70850 ?

@joker-eph
Contributor

Here is the bug I filed against GCC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109963 ; it has a reference to the clang discussion.

In this case, I suspect we should just move this method out-of-line to avoid inlining and cross-compiler calls.

@jlebar
Contributor Author

jlebar commented Sep 27, 2023

> In this case, I suspect we should just move this method out-of-line to avoid inlining and cross-compiler calls.

Is that even worth doing? Like, this ABI incompatibility means that I can never safely pass a lambda across "compiler boundaries". Surely there are lots of places where this can happen in LLVM and MLIR, not just this one function?

Almost feels like we should somehow check for it and warn/crash, but I'm also not sure how to do that.

I was definitely not expecting to hit an ABI compatibility bug today.

@ThomasRaoux
Collaborator

Does that mean that we should never link LLVM built with gcc against Triton built with clang?
And that, if we want to enable clang builds, we should generate a new LLVM artifact built with clang and pick the right one based on the compiler the user chooses?

@joker-eph
Contributor

> Is that even worth doing? Like, this ABI incompatibility means that I can never safely pass a lambda across "compiler boundaries". Surely there are lots of places where this can happen in LLVM and MLIR, not just this one function?

It is a bit trickier, I believe: a lambda is never expressed in the ABI in general (because... how? Its type is anonymous, right?).
We'd mostly use function_ref or std::function or a function pointer for interfaces.

What's weird here is that the function is inlined, and so the code that builds the lambda is done by clang (if you build Triton with clang), but the actual symbol called (the lambda constructor?) is in the DSO built with gcc.
So the "anonymous" type that models the lambda is what's incompatible, and it can only blow up with this combination of inlining somehow.

@joker-eph
Contributor

This instance is fixed with llvm/llvm-project@76ce4736721a

@jlebar
Contributor Author

jlebar commented Sep 27, 2023

> that the function is inlined and so the code that builds the lambda is done by clang (if you build triton with clang) but the actual symbol called (the lambda constructor?) is in the DSO built with gcc.

Maybe I misunderstand, but is that actually what's happening?

The TU is Ops.cpp.inc. This is being built by clang.

The TU includes OpImplementation.h, which defines the lambda. So the lambda anonymous struct and operator() are also compiled by clang.

Eventually the lambda is wrapped in function_ref and passed to mlir::detail::Parser::parseCommaSeparatedList, which is in mlir/lib/AsmParser/Parser.cpp, so is compiled by GCC. This then invokes the function_ref, which invokes the lambda, and boom.

So wouldn't we have exactly the same problem in the much simpler situation of:

// gcc.h
void foo(function_ref<void()> fn);

// gcc.cc (compiled with gcc)
void foo(function_ref<void()> fn) { fn(); }

// clang.cc (compiled with clang)
#include "gcc.h"

void boom() { foo([&] { /* ... */ }); }

That is, the problem is caused anytime a lambda is passed across compiler boundaries (via function_ref or std::function or whatever)? Or am I missing something?

@joker-eph
Contributor

I don't think so; in your example gcc compiles function_ref, which "type erases" the lambda: I don't see how it can fail?

But let me try to write a minimal repro!
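A minimal function_ref-style sketch (a simplified illustration, not LLVM's actual implementation) shows why type erasure normally insulates the callee: the trampoline that knows the lambda's layout is instantiated wherever the function_ref is *constructed* — i.e. by the caller's compiler — and the callee only ever receives a context pointer plus a plain function pointer.

```cpp
#include <cassert>
#include <type_traits>

// Simplified sketch of an llvm::function_ref-like type (hypothetical).
template <typename Fn> class FunctionRef;

template <typename Ret, typename... Params>
class FunctionRef<Ret(Params...)> {
  void *callable;                        // erased pointer to the callee object
  Ret (*trampoline)(void *, Params...);  // instantiated by the caller's compiler

  template <typename Callable>
  static Ret call(void *c, Params... params) {
    // Knows the concrete (anonymous) closure type; recovers it and invokes.
    return (*static_cast<Callable *>(c))(params...);
  }

public:
  template <typename Callable>
  FunctionRef(Callable &&c)
      : callable(&c),
        trampoline(&call<typename std::remove_reference<Callable>::type>) {}

  Ret operator()(Params... params) const {
    return trampoline(callable, params...);
  }
};

// "gcc side" of the boundary: only sees the erased (void*, fn-ptr) pair,
// never the lambda's type, so this call cannot depend on the closure layout.
int invokeTwice(FunctionRef<int(int)> fn) { return fn(1) + fn(2); }
```

Usage: `invokeTwice([&](int x) { return x + k; })` — the trampoline for that specific closure type is emitted in the calling TU, which is the property being discussed here.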

@jlebar
Contributor Author

jlebar commented Sep 27, 2023

Oh, I think I see what you're saying, Mehdi. I could write an arbitrary struct and pass it into the function_ref, and that should still work. So the fact that GCC and clang do something different inside the lambda shouldn't make a difference.

@jlebar
Contributor Author

jlebar commented Sep 27, 2023

Hm, I think I understand. itanium-cxx-abi/cxx-abi#141 (comment) seems to describe it pretty well.

AIUI it's essentially like an ODR violation. We have two different definitions of the lambda body, one generated by clang and one generated by GCC. The linker may decide to use either of these definitions.

In our case, when the function_ref type-erases the operator() of the lambda, it has to get a pointer to the operator() implementation. It can choose either of the two implementations. But they're not equivalent.

So, yeah, I think this "only" happens if:

  • you have a header file in LLVM which creates a lambda, and
  • the lambda is eventually passed into a .cc file in LLVM, and
  • the header is "public", so it's included by a project that uses LLVM.

In this case the lambda body can be compiled twice.
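The three conditions can be collapsed into a single-file toy (all names hypothetical, one file only for brevity) to show where the duplicated body lives: `headerHelper` plays the role of the code in the LLVM public header, and `libFunction` the role of the code in an LLVM .cpp. In the real failure each compiler emits its own weak-symbol copy of the lambda's operator(), with incompatible capture layouts, and the linker keeps one.

```cpp
#include <cassert>
#include <vector>

// "lib side" (think mlir/lib/AsmParser/Parser.cpp, compiled by gcc into the
// DSO): receives a type-erased callback and invokes it.
void libFunction(void *ctx, void (*cb)(void *)) { cb(ctx); }

// "header side" (think OpImplementation.h, compiled into every consumer TU):
// creates the lambda. When this header is compiled both by gcc (inside LLVM)
// and by clang (inside Triton), the closure's operator() is emitted twice as
// a weak symbol, and the linker may pick either copy.
inline int headerHelper(std::vector<int> &out) {
  auto lam = [&]() { out.push_back(42); };
  libFunction(&lam, [](void *ctx) {
    // Re-derives the lambda's anonymous type to call through the erased ptr;
    // this is the code whose assumptions about the closure layout must match
    // whichever operator() definition the linker kept.
    (*static_cast<decltype(lam) *>(ctx))();
  });
  return static_cast<int>(out.size());
}
```

Built with one compiler this is well-defined; the hazard only appears when two compilers with different closure layouts each instantiate `headerHelper`.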

@joker-eph
Contributor

This should be fixed with the last merge from upstream in #2403

@jlebar jlebar closed this as completed Oct 28, 2023