Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bytecode from iree-import-tflite causes debug info to be dropped in translateModuleToLLVMIR #16781

Open
bjacob opened this issue Mar 14, 2024 · 27 comments

Comments

@bjacob
Copy link
Contributor

bjacob commented Mar 14, 2024

This is to continue the investigation in #15756, which is being reverted now.

Summary:

#15756 should have been a 1-line PR, simply adding -gline-tables-only to the clang flags that we use to compile ukernels to bitcode. All it does is generate some debug info in the compiled ukernels bitcode.

This runs into a bug discussed on this comment thread: https://github.com/openxla/iree/pull/15756/files#r1411583898

  • The bug in itself actually makes sense, as noted in this subsequent comment on that thread: https://github.com/openxla/iree/pull/15756/files#r1518132234 .
    • It's saying that a callee with debug info (here the ukernel) doesn't like getting inlined into a caller without debug info (here the dispatch function).
    • The mystery is why is the dispatch function lacking debug info? We are generating debug info by default, and in most models (more on that in the next bullet point) we are getting debug info and all works fine.
    • It seems that we had a preexisting bug, that debug info was sometimes missing from dispatch functions, that we only discovered now that linking with ukernels with debug info made it a compilation error.
  • The only models affected are some TFLite models. Even then, the problem goes away:
    • If the output of iree-import-tflite is run through iree-opt with no flags...
    • ... or just mlir-opt, or even just dumping the IR immediately after loading the input file in mlir-opt...
    • ... so that it seems that simply disassembling bitcode back to textual IR is enough to fix it.
    • This is the key problem making this so hard to debug: our normal first debugging step, looking at the text IR, prevents reproducing!
  • The problem also goes away:
  • The dumped .codegen.bc actually does have some limited debug info (on each function, but not on the ops inside each function). But the debug info somehow gets dropped inside of mlir::translateModuleToLLVMIR (https://github.com/openxla/iree/pull/15756/files#r1518324689). I haven't been able to debug exactly where, but that's the closest i've been able to get.
@bjacob
Copy link
Contributor Author

bjacob commented Mar 14, 2024

@bjacob
Copy link
Contributor Author

bjacob commented Mar 15, 2024

If at the very start of the serializeExecutable function I dump variantOp, the resulting (text) dump is identical in the bad and good cases.

If I dump the parent mlir::ModuleOp, the only difference is in the namespace string; if I replace the name (sed 's/iree_PoseNet_fp32_tflite_/llvm_module/g') then the ModuleOp is identical.

I think at this point we have narrowed this bug down to something about bitcode. In the bad case, the module as loaded from bitcode is corrupted in a way that somehow causes debug info to be dropped in translateModuleToLLVMIR, but that corruption doesn't survive text-serialization, so we can't see it in any kind of textual IR dump.

@bjacob
Copy link
Contributor Author

bjacob commented Mar 15, 2024

Confirmed with this experiment:

If I process the bad input file with iree-opt now with --emit-bytecode then the problem is still there in the output.

Whereas as noted earlier, without --emit-bytecode the problem goes away.

If I process first without --emit-bytecode (so I get text) and then I process that with --emit-bytecode then the problem is still gone. So the first processing to text IR did resolve some kind of corruption in the original IR. It's not like this MLIR module is consistently showing an issue whenever reencoded back to bytecode.

@bjacob bjacob changed the title Ukernels: enable limited debug information Bytecode from iree-import-tflite causes debug info to be dropped in translateModuleToLLVMIR Mar 15, 2024
@dan-garvey
Copy link
Contributor

Fascinating, I would have gotten lost the moment the bug failed to express as text

@bjacob
Copy link
Contributor Author

bjacob commented Mar 15, 2024

Fully traced through BytecodeReader, here is the divergence:

Call stack at point where things diverge:

(gdb) bt
#0  (anonymous namespace)::DialectReader::readVarInt (this=0x7fffffff9eb0, result=@0x7fffffff9ce0: 15) at /home/benoit/iree/third_party/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:993
#1  0x00007fffeac5e811 in (anonymous namespace)::readAttribute (context=0x5555555e90a0, reader=...) at llvm-project/tools/mlir/include/mlir/IR/BuiltinDialectBytecode.cpp.inc:360
#2  0x00007fffeac5e715 in (anonymous namespace)::BuiltinDialectBytecodeInterface::readAttribute (this=0x5555555f6540, reader=...) at /home/benoit/iree/third_party/llvm-project/mlir/lib/IR/BuiltinDialectBytecode.cpp:94
#3  0x00007ffff204c5d3 in (anonymous namespace)::AttrTypeReader::parseCustomEntry<mlir::Attribute> (this=0x7fffffffb510, entry=..., reader=..., entryType=...) at /home/benoit/iree/third_party/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:1328
#4  0x00007ffff204c11f in (anonymous namespace)::AttrTypeReader::resolveEntry<mlir::Attribute> (this=0x7fffffffb510, entries=..., index=0, entryType=...) at /home/benoit/iree/third_party/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:1236
#5  0x00007ffff204bf21 in (anonymous namespace)::AttrTypeReader::resolveAttribute (this=0x7fffffffb510, index=0) at /home/benoit/iree/third_party/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:824
#6  0x00007ffff204be91 in (anonymous namespace)::AttrTypeReader::parseAttribute (this=0x7fffffffb510, reader=..., result=...) at /home/benoit/iree/third_party/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:833
#7  0x00007ffff2050e6f in (anonymous namespace)::AttrTypeReader::parseAttribute<mlir::LocationAttr> (this=0x7fffffffb510, reader=..., result=...) at /home/benoit/iree/third_party/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:859
#8  0x00007ffff204ff29 in mlir::BytecodeReader::Impl::parseAttribute<mlir::LocationAttr> (this=0x7fffffffb4b0, reader=..., result=...) at /home/benoit/iree/third_party/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:1454
#9  0x00007ffff2046817 in mlir::BytecodeReader::Impl::parseOpWithoutRegions (this=0x7fffffffb4b0, reader=..., readState=..., isIsolatedFromAbove=@0x7fffffffab97: false) at /home/benoit/iree/third_party/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:2273
#10 0x00007ffff2045da7 in mlir::BytecodeReader::Impl::parseRegions (this=0x7fffffffb4b0, regionStack=std::vector of length 1, capacity 1 = {...}, readState=...) at /home/benoit/iree/third_party/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:2190
#11 0x00007ffff204406b in mlir::BytecodeReader::Impl::parseIRSection (this=0x7fffffffb4b0, sectionData=..., block=0x7fffffffbca8) at /home/benoit/iree/third_party/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:2126
#12 0x00007ffff204211c in mlir::BytecodeReader::Impl::read(mlir::Block*, llvm::function_ref<bool (mlir::Operation*)>) (this=0x7fffffffb4b0, block=0x7fffffffbca8, lazyOpsCallback=...) at /home/benoit/iree/third_party/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:1741
#13 0x00007ffff2047ddf in readBytecodeFileImpl (buffer=..., block=0x7fffffffbca8, config=..., bufferOwnerRef=std::shared_ptr<llvm::SourceMgr> (empty) = {...}) at /home/benoit/iree/third_party/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:2641
#14 0x00007ffff2047bab in mlir::readBytecodeFile (buffer=..., block=0x7fffffffbca8, config=...) at /home/benoit/iree/third_party/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:2649
#15 0x00007fffeadb12e2 in mlir::parseSourceFile (sourceMgr=..., block=0x7fffffffbca8, config=..., sourceFileLoc=0x7fffffffbcf0) at /home/benoit/iree/third_party/llvm-project/mlir/lib/Parser/Parser.cpp:30
#16 0x00007fffea951f1f in mlir::detail::parseSourceFile<mlir::ModuleOp, llvm::SourceMgr const&> (config=..., args=...) at /home/benoit/iree/third_party/llvm-project/mlir/include/mlir/Parser/Parser.h:159
#17 0x00007fffea9506c8 in mlir::parseSourceFile<mlir::ModuleOp> (sourceMgr=..., config=...) at /home/benoit/iree/third_party/llvm-project/mlir/include/mlir/Parser/Parser.h:178
#18 0x00007fffea918263 in mlir::iree_compiler::embed::(anonymous namespace)::Invocation::parseSource (this=0x5555555b26c0, source=...) at /home/benoit/iree/compiler/src/iree/compiler/API/Internal/CompilerDriver.cpp:822
#19 0x00007fffea918182 in ireeCompilerInvocationParseSource (inv=0x5555555b26c0, source=0x555555648080) at /home/benoit/iree/compiler/src/iree/compiler/API/Internal/CompilerDriver.cpp:1371
#20 0x00007fffeae3e820 in mlir::iree_compiler::runIreecMain(int, char**)::$_2::operator()(iree_compiler_source_t*) const (this=0x7fffffffc040, source=0x555555648080) at /home/benoit/iree/compiler/src/iree/compiler/Tools/iree_compile_lib.cc:241
#21 0x00007fffeae3dd49 in mlir::iree_compiler::runIreecMain (argc=9, argv=0x7fffffffda08) at /home/benoit/iree/compiler/src/iree/compiler/Tools/iree_compile_lib.cc:348
#22 0x00007fffea96167b in ireeCompilerRunMain (argc=9, argv=0x7fffffffda08) at /home/benoit/iree/compiler/src/iree/compiler/API/Internal/IREECompileToolEntryPoint.cpp:12
#23 0x00005555555557a2 in main (argc=9, argv=0x7fffffffda08) at /home/benoit/iree/tools/iree-compile-main.cc:9

Looking at frame #1, in the bad case, the readVarInt(kind) produces kind = 15, while in the good case it produces kind = 11.

Since it's in generated code I can't link it easily so here's a paste. This is at ${BUILD_DIR}/llvm-project/tools/mlir/include/mlir/IR/BuiltinDialectBytecode.cpp.inc:360.

static Attribute readAttribute(MLIRContext* context, DialectBytecodeReader &reader) {
  uint64_t kind;
  if (failed(reader.readVarInt(kind)))
    return Attribute();
  switch (kind) {
    case 0:
      return readArrayAttr(context, reader);
    case 1:
      return readDictionaryAttr(context, reader);
    case 2:
      return readStringAttr(context, reader);
    case 3:
      return readStringAttrWithType(context, reader);
    case 4:
      return readFlatSymbolRefAttr(context, reader);
    case 5:
      return readSymbolRefAttr(context, reader);
    case 6:
      return readTypeAttr(context, reader);
    case 7:
      return readUnitAttr(context, reader);
    case 8:
      return readIntegerAttr(context, reader);
    case 9:
      return readFloatAttr(context, reader);
    case 10:
      return readCallSiteLoc(context, reader);
    case 11:
      return readFileLineColLoc(context, reader);
    case 12:
      return readFusedLoc(context, reader);
    case 13:
      return readFusedLocWithMetadata(context, reader);
    case 14:
      return readNameLoc(context, reader);
    case 15:
      return readUnknownLoc(context, reader);
    case 16:
      return readDenseResourceElementsAttr(context, reader);
    case 17:
      return readDenseArrayAttr(context, reader);
    case 18:
      return readDenseIntOrFPElementsAttr(context, reader);
    case 19:
      return readDenseStringElementsAttr(context, reader);
    case 20:
      return readSparseElementsAttr(context, reader);
    case 21:
      return readDistinctAttr(context, reader);
    default:
      reader.emitError() << "unknown attribute code: " << kind;
      return Attribute();
  }
  return Attribute();
}

So:

  • The bad value 15 means we call readUnknownLoc.
  • The good value 11 means we call readFileLineColLoc.

Interpretation:

  • The bad bytecode file produced by iree-import-tflite has UnknownLoc attributes. These result in eventually dropping most debug information at the end of compilation in translateModuleToLLVMIR.
  • The good bytecode file (obtained from the bad one by round-tripping through text IR) has FileLineColLoc attributes. These result in real debug information surviving translateModuleToLLVMIR.

What's exactly the bug here? If simply round-tripping via text IR is enough to fix those locations, maybe that means that they should have been good locations in the first place, so the bug is iree-import-tflite producing UnknownLoc in the first place? Alternatively, could translateModuleToLLVMIR do something more graceful that just drop all debug info when it sees UnknownLoc ?

@ScottTodd
Copy link
Collaborator

ScottTodd commented Mar 15, 2024

The good bytecode file (obtained from the bad one by round-tripping through text IR) has FileLineColLoc attributes.

Can you give a sample value for one of those? Does it reference the .tflite file?

@bjacob
Copy link
Contributor Author

bjacob commented Mar 15, 2024

Again the good bytecode file is obtained from the bad bytecode file here by two successive iree-opt commands round-tripping via text IR and back to bytecode. So there isn't any actual new information in the good bytecode file compared to what there is in the bad bytecode file. I'll still dump what those FileLineColLoc contain, moment.

@ScottTodd
Copy link
Collaborator

Here are some hypotheses:

  • iree-import-tflite does not include source information
  • round tripping puts .mlir[bc] locations, not .tflite locations

We should also decouple the existence of source locations from whatever other debug information may or may not be included.

@bjacob
Copy link
Contributor Author

bjacob commented Mar 18, 2024

Indeed:

  • iree-import-tflite creates MLIR bytecode with UnknownLoc locations.
  • Converting from .mlirbc to text .mlir results in locations being created, referencing lines/columns in that text .mlir file. Then converting back to .mlirbc preserves these locations into that text .mlir file. That is how round-tripping avoids the issue.

Things I considered:

  1. I pondered whether it is a bug that iree-import-tflite creates UnknownLoc locations, but I couldn't convince myself that it is, as tflite files are binaries with no canonical text representation, so it's hard to see how any notion of location would be actually usable as debug information. Being a binary format is a size vs usability trade-off in the first place, and generating UnknownLoc is merely going with that flow.
  2. I pondered for a second whether it would make sense then to store text MLIR form as the source that debug info could refer to... and then I realized that that is exactly what Embedding executable source contents in binaries for tracing. #16757 just did.
  3. I tried hacking the logic inside translateModuleToLLVMIR to handle UnknownLoc gracefully. I found that the relevant logic is in third_party/llvm-project/mlir/lib/Target/LLVMIR/DebugTranslation.cpp, where UnknownLoc is special-cased in a couple of places. The problem, as explained in a comment, is that LLVM IR does not have a notion of "unknown loc", so all locations have to be real locations. I tried returning a make-believe line-column location with (0, 0) but stumbled on errors complaining that the scope was invalid. I guess I just learned why the existing code had to special-case UnknownLoc. At this point I thought back about what I had just learned about Embedding executable source contents in binaries for tracing. #16757 and decided to give up and instead do as follows:

What I now think we should do:

Remember that I need this in the first place to have debug info in ukernels, and the motivation for this is having a good Tracy source view. Now, as of #16757 , users who care about having a good source view are users who pass --iree-hal-executable-debug-level=3, in which case CaptureExecutableSourcesPass runs and we have an actual source that debug info can refer to. So I now think that we should:

  1. When debugLevel < 3, strip debug info from ukernel bitcode, since the user won't get a good source view in the parent scopes anyway, not much point fighting for a good source view in the leaf ukernel scopes.
  2. When debugLevel >= 3, ensure that we do have source locations. As noted earlier, --iree-hal-dump-executable-sources-to fixes the problem by creating source locations. Now, Embedding executable source contents in binaries for tracing. #16757 notes: "Unlike the DumpExecutableSourcesPass this does not change the original source locations [...]" and indeed I can see that passing --iree-hal-executable-debug-level=3 does not have the same effect of fixing my bug. But I'm hoping that some minor change here will do the trick (e.g. perhaps making this add source locations like --iree-hal-dump-executable-sources-to does? I'm not clear enough about the nuances here to be sure I'm making sense...)

@benvanik @ScottTodd

@ScottTodd
Copy link
Collaborator

(e.g. perhaps making this add source locations like --iree-hal-dump-executable-sources-to does? I'm not clear enough about the nuances here to be sure I'm making sense...)

The decoupling there was intentional. Tracing showing snapshotted sources produced by --iree-hal-dump-executable-sources was a clever trick. The new mechanism is explicit rather than implicit, so it should be easier to understand and work with.

I think the best paths forward are some mix of

  • Fix upstream MLIR to not conflate "has debug information" with "has source locations" (just one type/property of debug information)
  • Don't add debug information if a backend can't handle it (e.g. for LLVM if the module is missing location information)

Whether that is connected with --iree-hal-executable-debug-level... idk. The numbers there are arbitrary and mostly based on how much they increase binary size. We haven't done too much consistently with that option yet. Could possibly snapshot locations if they are missing and the level is >= 1?

@benvanik
Copy link
Collaborator

We need the upstream bug fixed. There's no reason debug info should fail if there's an UnknownLoc. I think there's a bit too much focus on workarounds that dictate how features behave in ways that should be orthogonal. --iree-hal-executable-debug-level= is explicitly designed to not reassign source locations for reasons.

@bjacob
Copy link
Contributor Author

bjacob commented Mar 18, 2024

I meant something like Scott alludes to in the last sentence of his above comment:

Could possibly snapshot locations if they are missing and the level is >= 1?

Emphasis on "if they are missing".

The situation here is the locations are UnknownLoc, so there is a need to add actual locations to get a useful source view. This is likely a preexisting problem: I bet that if we try to tracy those tflite models with UnknownLoc, source view isn't showing anything. My point was, this doesn't even feel like an "upstream bug" - there is no source location because there is no source, and even that may not be a bug given that TFLite is a binary format. So like Scott suggests in the above pasted sentence, I was thinking that for high enough debugLevel (I was suggesting debugLevel >= 3) we could actually rewrite UnknownLoc locations with the snapshotted source locations. I'm OK with leaving alone any existing actual locations (anything else than UnknownLoc).

If you still see an "upstream bug" here that needs fixing, I would like to understand what that is. I can think of two potential candidates but I'm way out of my depth in either:

  1. DebugTranslation.cpp special-casing UnknownLoc and completely giving up when it sees it.
  2. The lower-level LLVM error when a function with debug info is inlined at a call site without debug info. Certainly that seems unnecessarily rigid, but I'm extremely out of my depth there.

@bjacob
Copy link
Contributor Author

bjacob commented Mar 18, 2024

After posting the previous comment I thought: Wait, either I'm missing something, or #16757 does not actually fix #15699.

I am at current top-of-tree (ee32fc7). I did an experiment on a OPT-1.3b model from shark_tank, so this is a PyTorch model not TFLite. The command lines are in this gist: https://gist.github.com/bjacob/75717f6335603628d09c58313aa0d47d . It's totally self contained with a link to download the imported MLIR from.

I just added the usual Tracy-ing flags, and ran two experiments:

  1. With just --iree-hal-executable-debug-level=3 added to the compilation command line. Result: no source view (first screenshot below).
  2. With also --iree-hal-dump-executable-sources-to=/tmp. Result: source view (second screenshot below).

I thought that "#16757 fixes #15699" meant that --iree-hal-executable-debug-level=3 by itself would suffice to obtain the Tracy source view, without adding --iree-hal-dump-executable-sources-to=/tmp.

Screenshots:

  1. No source view with --iree-hal-executable-debug-level=3 :
  • bad
  1. Source view with --iree-hal-dump-executable-sources-to=/tmp:
  • good

@ScottTodd
Copy link
Collaborator

I just added the usual Tracy-ing flags, and ran two experiments:

  1. With just --iree-hal-executable-debug-level=3 added to the compilation command line. Result: no source view (first screenshot below).
  2. With also --iree-hal-dump-executable-sources-to=/tmp. Result: source view (second screenshot below).

For Tracy, you should prefer the new --iree-hal-capture-executable-sources pass. That stays in process (instead of dumping files to disk) and is explicit about the link between source files and tracing. (I'll update the docs soon)

@ScottTodd
Copy link
Collaborator

I bet that if we try to tracy those tflite models with UnknownLoc, source view isn't showing anything.

Source locations and debug information should be separable. We don't want to have a tight coupling between those concepts. If one frontend like TFLite doesn't include one type of metadata, that shouldn't stop us from including other kinds of metadata.

@bjacob
Copy link
Contributor Author

bjacob commented Mar 18, 2024

For Tracy, you should prefer the new --iree-hal-capture-executable-sources pass.

Ah, do I need to patch some PR? Trying --iree-hal-capture-executable-sources gives me:

iree-compile: Unknown command line argument '--iree-hal-capture-executable-sources'.  Try: 'tools/iree-compile --help'
iree-compile: Did you mean '--iree-hal-substitute-executable-source'?

I tried the suggested one but it needs a value?

That stays in process (instead of dumping files to disk) and is explicit about the link between source files and tracing. (I'll update the docs soon)

I see, that sounds great. I had assumed that --iree-hal-executable-debug-level=3 would have that behavior as of #16757 . Why wouldn't it? When I specify --iree-hal-executable-debug-level=3 , I'm already conveying "go ahead and bloat my module but make Tracy work!"

Source locations and debug information should be separable. We don't want to have a tight coupling between those concepts.

I need a lecture on this topic. I just don't really understand at the moment.

@ScottTodd
Copy link
Collaborator

Let me confirm how the new pass is working... it might be actually be added automatically with debug level >= 3.

Ah, do I need to patch some PR?

Depends on what version/commit you're building at :p. You do need Ben's PR / 15d9039

@bjacob
Copy link
Contributor Author

bjacob commented Mar 18, 2024

Yep, as mentioned above I am at current top-of-tree (ee32fc7) which is past that (I checked).

@benvanik
Copy link
Collaborator

the extra sources are embedded with debug-level>=3

What source locations are passed to LLVM and what source files we embed in the executables are orthogonal. It is useful to be able to point at our new captured sources, but that should be a separate flag. We could have an iree-hal-executable-source-stage= flag that does the location rewrite to a captured stage by name. This way you could have the debug info reference tensor linalg, buffer linag, LLVM dialect, etc. If we added that flag we could remove the implicit source location overwriting of dump-sources, as that was always a hack.

But we want that to be optional and off by default: to users (not internal devs) the source locations should always point back to their source programs by default. And if because of a bug that means this fails to work we need to fix that bug.

So:

  • users should always have some level of debug info, and by default it should be their source programs (not whatever we want)
  • we can capture source locations with optional flags at various stages now when debug-level>=3

@bjacob
Copy link
Contributor Author

bjacob commented Mar 18, 2024

I see.

What I'm seeing here is that with both TFLite and Pytorch sources, often we just don't have source locations out of the import process. So tracy source view doesn't show anything by default. Could we in that case and under debug-level>=3 set source locations to the embedded executable sources (which are tensor-linalg) ? Way more useful than no source at all.

Also a technicality - I get your explanation of "What source locations are passed to LLVM and what source files we embed in the executables are orthogonal", but then I don't understand how that works. If the source files in the embedded executable sources are orthogonal to the source locations passed to LLVM, how is the mapping done between locations in the final object code and locations in the embedded executable sources ?

@bjacob bjacob changed the title Bytecode from iree-import-tflite causes debug info to be dropped in translateModuleToLLVMIR Bytecode from importers (both iree-import-tflite and torch-mlir) causes debug info to be dropped in translateModuleToLLVMIR Mar 18, 2024
@benvanik
Copy link
Collaborator

I think I'm missing something - are you saying there are zero locations from tflite and pytorch? Or just some missing locations?
I don't think I understand how there can be zero locations - there's always some source file, unless the user strips them and sets them all to UnknownLoc - but users should never do that (and if there's an option for that in those tools it should be removed). If there's some partial missing information then that's the bug to fix.

The pass that captures the locations (which runs several times, each time with a compilation stage name) can set the flag to replace the source locations to point at the captured source - that's exactly what dump-sources does today - the only difference is in one we update the locations and in the other we don't.

@bjacob
Copy link
Contributor Author

bjacob commented Mar 18, 2024

Aah so to actually answer that question I had to patch the above-mentioned generated .cpp.inc parser code and my diff is there: https://gist.github.com/bjacob/ef04e138d571348edea0f5808c43fc3c

And you're right actually the "bad" imported bytecode has only one UnknownLoc and a large number of NameLoc. The "good" bytecode obtained by round-trip via text, has all FileLineLoc.

In the below table, "bad" means exhibits the pathologies discussed here, no source view etc.

Input to iree-compile UnknownLoc NameLoc FileLineColLoc
PoseNet from iree-import-tflite (bad) 1 85 0
the same, round-tripped via text (good) 0 0 128

Now I tried running the same experiment on that OPT-1.3 model but realized that it is text IR to begin with so it just doesn't go through that bytecode-reader code, and I have no idea why it exhibits the pathology.

@bjacob bjacob changed the title Bytecode from importers (both iree-import-tflite and torch-mlir) causes debug info to be dropped in translateModuleToLLVMIR Bytecode from iree-import-tflite causes debug info to be dropped in translateModuleToLLVMIR Mar 18, 2024
@bjacob
Copy link
Contributor Author

bjacob commented Mar 18, 2024

These are the NameLoc values in this PoseNet model:
https://gist.github.com/bjacob/31fdeb0902a42461d5cca6902aef6080

@bjacob
Copy link
Contributor Author

bjacob commented Mar 18, 2024

At this point I think I agree that it seems really unnecessarily rigid of the current DebugTranslation.cpp code to just give up on this perfectly good debug info just because of one UnknownLoc.

@bjacob
Copy link
Contributor Author

bjacob commented Mar 18, 2024

Note though, at this point we are trying to solve 2 problems, and a fix at that level would only solve 1 not 2:

  1. Failure to inline ukernels that have debug info.
  2. Failure to show a useful source view.

@bjacob
Copy link
Contributor Author

bjacob commented Mar 18, 2024

I tried again fixing translateLoc in DebugTranslation.cpp and noticed that in our situation where all the locations are NameLoc, it emits no LLVM DI at all. Indeed, here is what it does with NameLoc:

https://github.com/llvm/llvm-project/blob/6800f422c22ffd672b1e31f0d0a3fa29d19b7a13/mlir/lib/Target/LLVMIR/DebugTranslation.cpp#L434

So it's just ignoring the name string in the NameLoc and walking straight to the ChildLoc, ending with an UnknownLoc, returning null.

@jpienaar
Copy link
Member

Note: the round tripped via text is actually bad. It loses all traces to the original Tflite file and references an intermediate file. It accidentally works. The bug is how namedloc is being treated, well and indeed mixing two concerns together (but one definitely can have a namedloc and debug info and that should work).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants