Skip to content

Conversation

@krzysz00
Copy link
Contributor

Currently, the declarative pattern rewrite generator will always print the [source]:line from which a pattern came. This is a useful debugging hint, but it causes problem when absolute paths are used as arguments to mlir-tblgen (which LLVM's build rules automatically do). Specifially, it causes the source to be tied to the build location, harning reproducability and our collective ability to get ccache hits from, say, separate worktrees.

This commit resolves the issue by replacing absolute paths in thes "Generated from:" comments with their filenames. (The alternative would have been to implement an entire file-prefix-map the way the C compilers do, but since this is an isolated incident, I chose to resolve it locally.)

Currently, the declarative pattern rewrite generator will always print
the [source]:[line](s) from which a pattern came. This is a useful
debugging hint, but it causes problem when absolute paths are used as
arguments to mlir-tblgen (which LLVM's build rules automatically do).
Specifially, it causes the source to be tied to the build location,
harning reproducability and our collective ability to get ccache hits
from, say, separate worktrees.

This commit resolves the issue by replacing absolute paths in thes
"Generated from:" comments with their filenames. (The alternative
would have been to implement an entire file-prefix-map the way the C
compilers do, but since this is an isolated incident, I chose to
resolve it locally.)
@llvmbot llvmbot added mlir:core MLIR Core Infrastructure mlir labels Nov 21, 2025
@llvmbot
Copy link
Member

llvmbot commented Nov 21, 2025

@llvm/pr-subscribers-mlir-core

@llvm/pr-subscribers-mlir

Author: Krzysztof Drewniak (krzysz00)

Changes

Currently, the declarative pattern rewrite generator will always print the [source]:line from which a pattern came. This is a useful debugging hint, but it causes problem when absolute paths are used as arguments to mlir-tblgen (which LLVM's build rules automatically do). Specifially, it causes the source to be tied to the build location, harning reproducability and our collective ability to get ccache hits from, say, separate worktrees.

This commit resolves the issue by replacing absolute paths in thes "Generated from:" comments with their filenames. (The alternative would have been to implement an entire file-prefix-map the way the C compilers do, but since this is an isolated incident, I chose to resolve it locally.)


Full diff: https://github.com/llvm/llvm-project/pull/168984.diff

3 Files Affected:

  • (modified) mlir/include/mlir/TableGen/Pattern.h (+4-2)
  • (modified) mlir/lib/TableGen/Pattern.cpp (+17-4)
  • (modified) mlir/tools/mlir-tblgen/RewriterGen.cpp (+2-1)
diff --git a/mlir/include/mlir/TableGen/Pattern.h b/mlir/include/mlir/TableGen/Pattern.h
index 49b2dae62dc22..d2610f09b6241 100644
--- a/mlir/include/mlir/TableGen/Pattern.h
+++ b/mlir/include/mlir/TableGen/Pattern.h
@@ -643,8 +643,10 @@ class Pattern {
   using IdentifierLine = std::pair<StringRef, unsigned>;
 
   // Returns the file location of the pattern (buffer identifier + line number
-  // pair).
-  std::vector<IdentifierLine> getLocation() const;
+  // pair). If `forSourceOutput` is true, replace absolute paths in the buffer
+  // identifier with just their filename so that we don't leak build paths into
+  // the generated code.
+  std::vector<IdentifierLine> getLocation(bool forSourceOutput = false) const;
 
   // Recursively collects all bound symbols inside the DAG tree rooted
   // at `tree` and updates the given `infoMap`.
diff --git a/mlir/lib/TableGen/Pattern.cpp b/mlir/lib/TableGen/Pattern.cpp
index 1a1a58ad271bb..ce09f5c3f5183 100644
--- a/mlir/lib/TableGen/Pattern.cpp
+++ b/mlir/lib/TableGen/Pattern.cpp
@@ -18,6 +18,7 @@
 #include "llvm/ADT/Twine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/FormatVariadic.h"
+#include "llvm/Support/Path.h"
 #include "llvm/TableGen/Error.h"
 #include "llvm/TableGen/Record.h"
 
@@ -771,15 +772,27 @@ int Pattern::getBenefit() const {
   return initBenefit + dyn_cast<IntInit>(delta->getArg(0))->getValue();
 }
 
-std::vector<Pattern::IdentifierLine> Pattern::getLocation() const {
+std::vector<Pattern::IdentifierLine>
+Pattern::getLocation(bool forSourceOutput) const {
   std::vector<std::pair<StringRef, unsigned>> result;
   result.reserve(def.getLoc().size());
   for (auto loc : def.getLoc()) {
     unsigned buf = llvm::SrcMgr.FindBufferContainingLoc(loc);
     assert(buf && "invalid source location");
-    result.emplace_back(
-        llvm::SrcMgr.getBufferInfo(buf).Buffer->getBufferIdentifier(),
-        llvm::SrcMgr.getLineAndColumn(loc, buf).first);
+
+    StringRef bufferName =
+        llvm::SrcMgr.getBufferInfo(buf).Buffer->getBufferIdentifier();
+    // If we're emitting a generated file, we'd like to have some indication of
+    // where our patterns came from. However, LLVM's build rules use absolute
+    // paths as arguments to TableGen, and naively echoing such paths makes the
+    // contents of the generated source file depend on the build location,
+    // making MLIR builds substantially less reproducable. As a compromise, we
+    // trim absolute paths back to only the filename component.
+    if (forSourceOutput && llvm::sys::path::is_absolute(bufferName))
+      bufferName = llvm::sys::path::filename(bufferName);
+
+    result.emplace_back(bufferName,
+                        llvm::SrcMgr.getLineAndColumn(loc, buf).first);
   }
   return result;
 }
diff --git a/mlir/tools/mlir-tblgen/RewriterGen.cpp b/mlir/tools/mlir-tblgen/RewriterGen.cpp
index c3034bb843c4a..7698355c8fa6e 100644
--- a/mlir/tools/mlir-tblgen/RewriterGen.cpp
+++ b/mlir/tools/mlir-tblgen/RewriterGen.cpp
@@ -28,6 +28,7 @@
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/FormatAdapters.h"
+#include "llvm/Support/Path.h"
 #include "llvm/Support/PrettyStackTrace.h"
 #include "llvm/Support/Signals.h"
 #include "llvm/TableGen/Error.h"
@@ -1129,7 +1130,7 @@ void PatternEmitter::emit(StringRef rewriteName) {
   LLVM_DEBUG(llvm::dbgs() << "done collecting ops used in result patterns\n");
 
   // Emit RewritePattern for Pattern.
-  auto locs = pattern.getLocation();
+  auto locs = pattern.getLocation(/*forSourceOutput=*/true);
   os << formatv("/* Generated from:\n    {0:$[ instantiating\n    ]}\n*/\n",
                 llvm::reverse(locs));
   os << formatv(R"(struct {0} : public ::mlir::RewritePattern {

@github-actions
Copy link

github-actions bot commented Nov 21, 2025

🐧 Linux x64 Test Results

  • 7117 tests passed
  • 594 tests skipped

@ftynse
Copy link
Member

ftynse commented Nov 21, 2025

I think this is okay as long as we are not calling all of the files Patterns.td and just differentiate them by source tree paths. An alternative could be to try to somehow keep the path relative to the root of the source tree, e.g., by looking for llvm-project/mlir, though it's not guaranteed since the root directory name may change (I have multiple llvms for example).

Copy link
Member

@ftynse ftynse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please give other folks time to chime in

@krzysz00 krzysz00 merged commit af0fcf8 into llvm:main Nov 25, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

mlir:core MLIR Core Infrastructure mlir

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants