mgehre-amd
Contributor

@mgehre-amd mgehre-amd commented Jan 17, 2025

An emitc.file represents a file that can be emitted
into a single C++ file.

This allows managing multiple source files within the same MLIR module
while emitting them into separate files.

This feature is opt-in.
By default, mlir-translate emits all ops outside of emitc.file
and ignores all emitc.file ops and their bodies.

When the -file-id=id flag is specified,
mlir-translate emits all ops outside of emitc.file and
the ops within the emitc.file with the matching id.

Example:

emitc.file "main" {
  func @func_one() {
    return
  }
}
emitc.file "test" {
  func @func_two() {
    return
  }
}

mlir-translate -file-id=main will emit func_one and
mlir-translate -file-id=test will emit func_two.
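
The selection rule described above can be modeled in a few lines of standalone C++. This is only a sketch under stated assumptions: `TopLevelOp` and `selectEmitted` are hypothetical names that do not appear in the patch, ops are reduced to plain strings, and an empty `requestedId` stands for "flag not passed". The real logic lives in the MLIR C++ emitter.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for an op: `fileId` is empty for ops that live
// outside any emitc.file, otherwise it holds the id of the enclosing file.
struct TopLevelOp {
  std::string name;
  std::string fileId;
};

// Returns the names of the ops that would be emitted for a given -file-id
// value. An empty `requestedId` models the case where the flag is absent.
std::vector<std::string> selectEmitted(const std::vector<TopLevelOp> &ops,
                                       const std::string &requestedId) {
  std::vector<std::string> emitted;
  for (const TopLevelOp &op : ops) {
    // Ops outside of any emitc.file are always emitted.
    bool outsideFile = op.fileId.empty();
    // Ops inside an emitc.file are emitted only when its id was requested.
    bool matches = !requestedId.empty() && op.fileId == requestedId;
    if (outsideFile || matches)
      emitted.push_back(op.name);
  }
  return emitted;
}
```

With an op `helper` outside any file, `func_one` in file "main" and `func_two` in file "test", requesting "main" selects `helper` and `func_one`, while passing no id selects only `helper`.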

@llvmbot
Member

llvmbot commented Jan 17, 2025

@llvm/pr-subscribers-mlir-emitc

Author: Matthias Gehre (mgehre-amd)

Changes

An emitc.tu represents a translation unit that can be emitted into a single C++ file.

This allows managing multiple translation units within the same MLIR module while emitting them into separate files.
This feature is opt-in; by default,
mlir-translate will continue to emit the whole module.

When specifying the -translation-unit-id=id flag, mlir-translate emits only the translation unit with that id.

Example:

emitc.tu "main" {
  func @func_one() {
    return
  }
}
emitc.tu "test" {
  func @func_two() {
    return
  }
}

mlir-translate -translation-unit-id=main will emit func_one and mlir-translate -translation-unit-id=test will emit func_two.


Full diff: https://github.com/llvm/llvm-project/pull/123298.diff

6 Files Affected:

  • (modified) mlir/include/mlir/Dialect/EmitC/IR/EmitC.td (+49)
  • (modified) mlir/include/mlir/Target/Cpp/CppEmitter.h (+3-1)
  • (modified) mlir/lib/Dialect/EmitC/IR/EmitC.cpp (+10)
  • (modified) mlir/lib/Target/Cpp/TranslateRegistration.cpp (+7-1)
  • (modified) mlir/lib/Target/Cpp/TranslateToCpp.cpp (+31-7)
  • (added) mlir/test/Target/Cpp/tu.mlir (+29)
diff --git a/mlir/include/mlir/Dialect/EmitC/IR/EmitC.td b/mlir/include/mlir/Dialect/EmitC/IR/EmitC.td
index b16f5a8619fe7b..1fe4e34b3fa5ab 100644
--- a/mlir/include/mlir/Dialect/EmitC/IR/EmitC.td
+++ b/mlir/include/mlir/Dialect/EmitC/IR/EmitC.td
@@ -23,6 +23,7 @@ include "mlir/Interfaces/FunctionInterfaces.td"
 include "mlir/Interfaces/SideEffectInterfaces.td"
 include "mlir/IR/OpAsmInterface.td"
 include "mlir/IR/RegionKindInterface.td"
+include "mlir/IR/BuiltinAttributes.td"
 
 //===----------------------------------------------------------------------===//
 // EmitC op definitions
@@ -56,6 +57,54 @@ def IntegerIndexOrOpaqueType : Type<CPred<"emitc::isIntegerIndexOrOpaqueType($_s
 "integer, index or opaque type supported by EmitC">;
 def FloatIntegerIndexOrOpaqueType : AnyTypeOf<[EmitCFloatType, IntegerIndexOrOpaqueType]>;
 
+def EmitC_TranslationUnitOp
+    : EmitC_Op<"tu", [IsolatedFromAbove, NoRegionArguments, SymbolTable,
+                      OpAsmOpInterface]#GraphRegionNoTerminator.traits> {
+  let summary = "A translation unit container operation";
+  let description = [{
+    A `tu` represents a translation unit that can be emitted
+    into a single C++ file.
+
+    `mlir-translate` emits only the translation unit selected via
+    the `-translation-unit-id=id` flag. By default, no translation units are
+    emitted.
+
+    Example:
+
+    ```mlir
+    emitc.tu "main" {
+      emitc.func @func_one() {
+        emitc.return
+      }
+    }
+    ```
+  }];
+
+  let arguments = (ins Builtin_StringAttr:$id);
+  let regions = (region SizedRegion<1>:$bodyRegion);
+
+  let assemblyFormat = "$id attr-dict-with-keyword $bodyRegion";
+  let builders = [OpBuilder<(ins CArg<"StringRef">:$id)>];
+  let extraClassDeclaration = [{
+    /// Construct a module from the given location with an optional name.
+    static TranslationUnitOp create(Location loc, StringRef name);
+
+    //===------------------------------------------------------------------===//
+    // OpAsmOpInterface Methods
+    //===------------------------------------------------------------------===//
+
+    /// EmitC ops in the body of the translation_unit can omit their 'emitc.'
+    /// prefix in the assembly.
+    static ::llvm::StringRef getDefaultDialect() {
+      return "emitc";
+    }
+  }];
+
+  // We need to ensure that the body region has a block;
+  // the auto-generated builders do not guarantee that.
+  let skipDefaultBuilders = 1;
+}
+
 def EmitC_AddOp : EmitC_BinaryOp<"add", [CExpression]> {
   let summary = "Addition operation";
   let description = [{
diff --git a/mlir/include/mlir/Target/Cpp/CppEmitter.h b/mlir/include/mlir/Target/Cpp/CppEmitter.h
index 99d8696cc8e077..d76cfc9107332e 100644
--- a/mlir/include/mlir/Target/Cpp/CppEmitter.h
+++ b/mlir/include/mlir/Target/Cpp/CppEmitter.h
@@ -14,6 +14,7 @@
 #define MLIR_TARGET_CPP_CPPEMITTER_H
 
 #include "mlir/Support/LLVM.h"
+#include "llvm/ADT/StringRef.h"
 
 namespace mlir {
 class Operation;
@@ -24,7 +25,8 @@ namespace emitc {
 /// 'declareVariablesAtTop' enforces that all variables for op results and block
 /// arguments are declared at the beginning of the function.
 LogicalResult translateToCpp(Operation *op, raw_ostream &os,
-                             bool declareVariablesAtTop = false);
+                             bool declareVariablesAtTop = false,
+                             StringRef onlyTu = "");
 } // namespace emitc
 } // namespace mlir
 
diff --git a/mlir/lib/Dialect/EmitC/IR/EmitC.cpp b/mlir/lib/Dialect/EmitC/IR/EmitC.cpp
index fdc21d6c6e24b9..b51221b721dde3 100644
--- a/mlir/lib/Dialect/EmitC/IR/EmitC.cpp
+++ b/mlir/lib/Dialect/EmitC/IR/EmitC.cpp
@@ -1289,6 +1289,16 @@ void SwitchOp::getRegionInvocationBounds(
     bounds.emplace_back(/*lb=*/0, /*ub=*/regIndex == liveIndex);
 }
 
+//===----------------------------------------------------------------------===//
+// TranslationUnitOp
+//===----------------------------------------------------------------------===//
+void TranslationUnitOp::build(OpBuilder &builder, OperationState &state,
+                              StringRef id) {
+  state.addRegion()->emplaceBlock();
+  state.attributes.push_back(
+      builder.getNamedAttr("id", builder.getStringAttr(id)));
+}
+
 //===----------------------------------------------------------------------===//
 // TableGen'd op method definitions
 //===----------------------------------------------------------------------===//
diff --git a/mlir/lib/Target/Cpp/TranslateRegistration.cpp b/mlir/lib/Target/Cpp/TranslateRegistration.cpp
index 1aa98834a73f49..7e2bc9ad012b38 100644
--- a/mlir/lib/Target/Cpp/TranslateRegistration.cpp
+++ b/mlir/lib/Target/Cpp/TranslateRegistration.cpp
@@ -29,12 +29,18 @@ void registerToCppTranslation() {
       llvm::cl::desc("Declare variables at top when emitting C/C++"),
       llvm::cl::init(false));
 
+  static llvm::cl::opt<std::string> onlyTu(
+      "translation-unit-id",
+      llvm::cl::desc("Only emit the translation unit with the matching id"),
+      llvm::cl::init(""));
+
   TranslateFromMLIRRegistration reg(
       "mlir-to-cpp", "translate from mlir to cpp",
       [](Operation *op, raw_ostream &output) {
         return emitc::translateToCpp(
             op, output,
-            /*declareVariablesAtTop=*/declareVariablesAtTop);
+            /*declareVariablesAtTop=*/declareVariablesAtTop,
+            /*onlyTu=*/onlyTu);
       },
       [](DialectRegistry &registry) {
         // clang-format off
diff --git a/mlir/lib/Target/Cpp/TranslateToCpp.cpp b/mlir/lib/Target/Cpp/TranslateToCpp.cpp
index a91f5ab9311401..c9d66ad349db52 100644
--- a/mlir/lib/Target/Cpp/TranslateToCpp.cpp
+++ b/mlir/lib/Target/Cpp/TranslateToCpp.cpp
@@ -114,7 +114,8 @@ static FailureOr<int> getOperatorPrecedence(Operation *operation) {
 namespace {
 /// Emitter that uses dialect specific emitters to emit C++ code.
 struct CppEmitter {
-  explicit CppEmitter(raw_ostream &os, bool declareVariablesAtTop);
+  explicit CppEmitter(raw_ostream &os, bool declareVariablesAtTop,
+                      StringRef onlyTu);
 
   /// Emits attribute or returns failure.
   LogicalResult emitAttribute(Location loc, Attribute attr);
@@ -231,6 +232,9 @@ struct CppEmitter {
   /// be declared at the beginning of a function.
   bool shouldDeclareVariablesAtTop() { return declareVariablesAtTop; };
 
+  /// Returns whether this translation unit should be emitted
+  bool shouldEmitTu(TranslationUnitOp tu) { return tu.getId() == onlyTu; }
+
   /// Get expression currently being emitted.
   ExpressionOp getEmittedExpression() { return emittedExpression; }
 
@@ -258,6 +262,9 @@ struct CppEmitter {
   /// includes results from ops located in nested regions.
   bool declareVariablesAtTop;
 
+  /// Only emit translation units whose id matches this value.
+  std::string onlyTu;
+
   /// Map from value to name of C++ variable that contain the name.
   ValueMapper valueMapper;
 
@@ -960,6 +967,19 @@ static LogicalResult printOperation(CppEmitter &emitter, ModuleOp moduleOp) {
   return success();
 }
 
+static LogicalResult printOperation(CppEmitter &emitter, TranslationUnitOp tu) {
+  if (!emitter.shouldEmitTu(tu))
+    return success();
+
+  CppEmitter::Scope scope(emitter);
+
+  for (Operation &op : tu) {
+    if (failed(emitter.emitOperation(op, /*trailingSemicolon=*/false)))
+      return failure();
+  }
+  return success();
+}
+
 static LogicalResult printFunctionArgs(CppEmitter &emitter,
                                        Operation *functionOp,
                                        ArrayRef<Type> arguments) {
@@ -1159,8 +1179,10 @@ static LogicalResult printOperation(CppEmitter &emitter,
   return success();
 }
 
-CppEmitter::CppEmitter(raw_ostream &os, bool declareVariablesAtTop)
-    : os(os), declareVariablesAtTop(declareVariablesAtTop) {
+CppEmitter::CppEmitter(raw_ostream &os, bool declareVariablesAtTop,
+                       StringRef onlyTu)
+    : os(os), declareVariablesAtTop(declareVariablesAtTop),
+      onlyTu(onlyTu.str()) {
   valueInScopeCount.push(0);
   labelInScopeCount.push(0);
 }
@@ -1561,8 +1583,9 @@ LogicalResult CppEmitter::emitOperation(Operation &op, bool trailingSemicolon) {
                 emitc::GlobalOp, emitc::IfOp, emitc::IncludeOp, emitc::LoadOp,
                 emitc::LogicalAndOp, emitc::LogicalNotOp, emitc::LogicalOrOp,
                 emitc::MulOp, emitc::RemOp, emitc::ReturnOp, emitc::SubOp,
-                emitc::SwitchOp, emitc::UnaryMinusOp, emitc::UnaryPlusOp,
-                emitc::VariableOp, emitc::VerbatimOp>(
+                emitc::SwitchOp, emitc::TranslationUnitOp, emitc::UnaryMinusOp,
+                emitc::UnaryPlusOp, emitc::VariableOp, emitc::VerbatimOp>(
+
               [&](auto op) { return printOperation(*this, op); })
           // Func ops.
           .Case<func::CallOp, func::FuncOp, func::ReturnOp>(
@@ -1742,7 +1765,8 @@ LogicalResult CppEmitter::emitTupleType(Location loc, ArrayRef<Type> types) {
 }
 
 LogicalResult emitc::translateToCpp(Operation *op, raw_ostream &os,
-                                    bool declareVariablesAtTop) {
-  CppEmitter emitter(os, declareVariablesAtTop);
+                                    bool declareVariablesAtTop,
+                                    StringRef onlyTu) {
+  CppEmitter emitter(os, declareVariablesAtTop, onlyTu);
   return emitter.emitOperation(*op, /*trailingSemicolon=*/false);
 }
diff --git a/mlir/test/Target/Cpp/tu.mlir b/mlir/test/Target/Cpp/tu.mlir
new file mode 100644
index 00000000000000..ca10e0263a64fc
--- /dev/null
+++ b/mlir/test/Target/Cpp/tu.mlir
@@ -0,0 +1,29 @@
+// RUN: mlir-translate -mlir-to-cpp %s | FileCheck %s --check-prefix NO-FILTER
+// RUN: mlir-translate -mlir-to-cpp -translation-unit-id=non-existing %s | FileCheck %s --check-prefix NON-EXISTING
+// RUN: mlir-translate -mlir-to-cpp -translation-unit-id=tu_one %s | FileCheck %s --check-prefix TU-ONE
+// RUN: mlir-translate -mlir-to-cpp -translation-unit-id=tu_two %s | FileCheck %s --check-prefix TU-TWO
+
+
+// NO-FILTER-NOT: func_one
+// NO-FILTER-NOT: func_two
+
+// NON-EXISTING-NOT: func_one
+// NON-EXISTING-NOT: func_two
+
+// TU-ONE: func_one
+// TU-ONE-NOT: func_two
+
+// TU-TWO-NOT: func_one
+// TU-TWO: func_two
+
+emitc.tu "tu_one" {
+  emitc.func @func_one(%arg: f32) {
+    emitc.return
+  }
+}
+
+emitc.tu "tu_two" {
+  emitc.func @func_two(%arg: f32) {
+    emitc.return
+  }
+}

@llvmbot
Member

llvmbot commented Jan 17, 2025

@llvm/pr-subscribers-mlir

Contributor

@simon-camp simon-camp left a comment

LGTM

@simon-camp
Contributor

This feature is opt-in; by default,
mlir-translate will continue to emit the whole module.

Can you reformat this line, please? Maybe I'm reading this wrong, but I expected the code to emit all TUs if not filtered by the translation-unit-id flag. Should we reformulate to something like "mlir-translate will continue to emit operations defined outside of tu operations"?

@marbre
Member

marbre commented Jan 17, 2025

Quick question before having a chance to take a deeper look. Would it make sense to introduce an emitc.module instead of an emitc.tu? Only referring to the naming here.

@mgehre-amd
Contributor Author

Quick question before having a chance to take a deeper look. Would it make sense to introduce an emitc.module instead of an emitc.tu? Only referring to the naming here.

I don't mind the naming. In practice, we use a structure like

module {
  emitc.tu {
   func ...
  }
  emitc.tu {
   func ...
  }
}

and I liked that the emitc.tu was visually different from the enclosing module. But either is fine with me.

@mgehre-amd
Contributor Author

This feature is opt-in; by default,
mlir-translate will continue to emit the whole module.

Can you reformat this line, please? Maybe I'm reading this wrong, but I expected the code to emit all TUs if not filtered by the translation-unit-id flag. Should we reformulate to something like "mlir-translate will continue to emit operations defined outside of tu operations"?

I have reformulated the description. Does that sound better?

@aniragil
Contributor

Quick question before having a chance to take a deeper look. Would it make sense to introduce an emitc.module instead of an emitc.tu? Only referring to the naming here.

I don't mind the naming. In practice, we use a structure like

module {
  emitc.tu {
   func ...
  }
  emitc.tu {
   func ...
  }
}

and I liked that the emitc.tu was visually different from the enclosing module. But either is fine with me.

At least in C99, "translation unit" seems to refer to the output of the preprocessor, i.e. the source file after all preprocessing directives have been expanded. As emitc supports emitting preprocessing directives, notably #include statements, using this term might be misleading. The standard's term for the file keeping the user's source code seems to be "source file" / "preprocessing file", so emitc.source_file / emitc.source might be more accurate.

As C++20 defined a C++ module construct, emitc.module may be conflicting / misleading as well.

@marbre
Member

marbre commented Jan 17, 2025

At least in C99, "translation unit" seems to refer to the output of the preprocessor, i.e. the source file after all preprocessing directives have been expanded. As emitc supports emitting preprocessing directives, notably #include statements, using this term might be misleading. The standard's term for the file keeping the user's source code seems to be "source file" / "preprocessing file", so emitc.source_file / emitc.source might be more accurate.

As C++20 defined a C++ module construct, emitc.module may be conflicting / misleading as well.

Thanks, I didn't have C++20 modules in mind at that moment, so let's avoid this name.

@mgehre-amd
Contributor Author

How about emitc.file? This would indicate clearly that the body of this op is intended to be generated into its own file, and it would stay at the syntactic level where most of emitc is.

@aniragil
Contributor

Thinking about the semantics of the proposed op, such a file-scope emitc op may be useful beyond the functionality introduced by this patch, e.g. for

(a) Specifying the C dialect the emitc code belongs to, e.g. in order for the translator to know how to emit vector types when targeting OpenCL, GCC vector extensions etc.

(b) For offload/launch semantics such as in gpu.launch_func/gpu.module, which could prove useful for OpenCL, OpenMP etc.

(c) Validation: Utilizing standard op validation to verify that all its internal ops are emitc and potentially that these ops and types are valid in its designated C-dialect (e.g. vector types, address spaces, templates).

Would be good to discuss this op's designated uses to make sure we don't introduce backward-compatibility issues later.
WDYT?

@marbre
Member

marbre commented Jan 21, 2025

Thinking about the semantics of the proposed op, such a file-scope emitc op may be useful beyond the functionality introduced by this patch, e.g. for

(a) Specifying the C dialect the emitc code belongs to, e.g. in order for the translator to know how to emit vector types when targeting OpenCL, GCC vector extensions etc.

I think this needs to be modeled in the dialect. With this you wouldn't pass a flag to the emitter but of course the emitter would need to know how to handle those ops with dialect information.

(b) For offload/launch semantics such as in gpu.launch_func/gpu.module, which could prove useful for OpenCL, OpenMP etc.

Not sure I see how this is related to a file-scope as proposed by this patch.

(c) Validation: Utilizing standard op validation to verify that all its internal ops are emitc and potentially that these ops and types are valid in its designated C-dialect (e.g. vector types, address spaces, templates).

Adding a validation pass is something I started some time ago but didn't have time to finish. So yes, +1 from me. However, does it influence the changes introduced by this patch?

Would be good to discuss this op's designated uses to make sure we don't introduce backward-compatibility issues later. WDYT?

Makes sense, yes.

@marbre
Member

marbre commented Jan 22, 2025

How about emitc.file? This would indicate clearly that the body of this op is intended to be generated into its own file, and it would stay at the syntactic level where most of emitc is.

Thinking more about the naming, I also ended up at emitc.file several times:

  • emitc.source_file is rather lengthy
  • emitc.source works, but here the "source" gets emitted, so file might be the better fit
  • emitc.unit might work as well and is closer to tu but I am not really in favor of this

Any further opinions or suggestions?

@aniragil
Contributor

Thinking about the semantics of the proposed op, such a file-scope emitc op may be useful beyond the functionality introduced by this patch, e.g. for
(a) Specifying the C dialect the emitc code belongs to, e.g. in order for the translator to know how to emit vector types when targeting OpenCL, GCC vector extensions etc.

I think this needs to be modeled in the dialect. With this you wouldn't pass a flag to the emitter but of course the emitter would need to know how to handle those ops with dialect information.

Exactly. It could also guide the lowering process, e.g. by scalarizing vectors for C, implementing tuples as std::tuple for C++ or as structs for C etc.
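
As an illustration of the tuple example mentioned above, the same two-element value could take either of the following shapes depending on the target language. This is a sketch only: the names `PairCpp`, `PairC`, `field0` and `field1` are hypothetical, and nothing in this discussion fixes how a lowering would actually name or lay out such types.

```cpp
#include <tuple>

// C++ flavour: the tuple maps directly onto std::tuple.
using PairCpp = std::tuple<int, float>;

// C flavour: the same value as a plain struct. The field names are
// hypothetical placeholders chosen for this sketch.
struct PairC {
  int field0;
  float field1;
};

// Build the value in each representation.
inline PairCpp makePairCpp(int a, float b) { return std::make_tuple(a, b); }
inline PairC makePairC(int a, float b) { return PairC{a, b}; }
```

The struct form avoids any dependency on the C++ standard library, which is why it would suit a C target.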

(b) For offload/launch semantics such as in gpu.launch_func/gpu.module, which could prove useful for OpenCL, OpenMP etc.

Not sure I see how this is related to a file-scope as proposed by this patch.

So in order to support gpu.launch_func, gpu.module is a symbol. We could add this later, but we might want to plan ahead. For instance, if we allow any string as the proposed id it might later be unusable as a valid symbol name. We could then add a second attribute for the symbol name or pose restrictions on the id attribute, with both options breaking backward compatibility of existing MLIR files.

Another aspect is the necessity of a dedicated emitc op. If its sole purpose is to provide a named scope, would the builtin module suffice (as IINM in the LLVM dialect)?

(c) Validation: Utilizing standard op validation to verify that all its internal ops are emitc and potentially that these ops and types are valid in its designated C-dialect (e.g. vector types, address spaces, templates).

Adding a validation pass is something I started some time ago but didn't have time to finish. So yes, +1 from me. However, does it influence the changes introduced by this patch?

So I was thinking whether the op should have a ::verify() method. Unless the translator itself runs such a verifying pass (which I'm not sure translators are supposed to do) this pass would be optional, letting users call the translator on invalid files. If we implement ::verify() the op would have to be introduced after all lowering to emitc has been done. If we do this later on, we might be breaking downstream compilers which expected it to be usable earlier.

Would be good to discuss this op's designated uses to make sure we don't introduce backward-compatibility problems later. WDYT?

Makes sense, yes.

Any other potential uses for this op?

@mgehre-amd
Contributor Author

So in order to support gpu.launch_func, gpu.module is a symbol. We could add this later, but we might want to plan ahead. For instance, if we allow any string as the proposed id it might later be unusable as a valid symbol name. We could then add a second attribute for the symbol name or pose restrictions on the id attribute, with both options breaking backward compatibility of existing MLIR files.

Adding an additional attribute later would not break compatibility. And we don't know yet if we are actually going to get this, right?

Another aspect is the necessity of a dedicated emitc op. If its sole purpose is to provide a named scope, would the builtin module suffice (as IINM in the LLVM dialect)?

I think that would work. It would limit us if we want to extend it in the future, e.g. by adding language standards to the scope.

(c) Validation: Utilizing standard op validation to verify that all its internal ops are emitc and potentially that these ops and types are valid in its designated C-dialect (e.g. vector types, address spaces, templates).

Adding a validation pass is something I started some time ago but didn't have time to finish. So yes, +1 from me. However, does it influence the changes introduced by this patch?

So I was thinking whether the op should have a ::verify() method. Unless the translator itself runs such a verifying pass (which I'm not sure translators are supposed to do) this pass would be optional, letting users call the translator on invalid files. If we implement ::verify() the op would have to be introduced after all lowering to emitc has been done. If we do this later on, we might be breaking downstream compilers which expected it to be usable earlier.

I'm opposed to recursive verification in ::verify as this is very costly and ::verify gets called a lot. Others have done this heavy validation via separate validation passes (e.g. TOSA).

Would be good to discuss this op's designated uses to make sure we don't introduce backward-compatibility problems later. WDYT?

Makes sense, yes.

Any other potential uses for this op?

I think we should be pragmatic here. A source file is a source file. If we need different, more elaborate constructs, there is always the option to add new ops or change existing ones. MLIR/emitc have not been backwards compatible, so we shouldn't make it too hard for ourselves.

In summary, I see consensus to rename this to emitc.file.

Is there any objection to getting this merged afterwards? I heard many ideas about possible extensions,
but I don't see how the current design (an operation with an identifier) would make any of them impossible.

@marbre
Member

marbre commented Jan 23, 2025

Adding an additional attribute later would not break compatibility. And we don't know yet if we are actually going to get this, right?

Yes, that sounds reasonable to me.

So I was thinking whether the op should have a ::verify() method. Unless the translator itself runs such a verifying pass (which I'm not sure translators are supposed to do) this pass would be optional, letting users call the translator on invalid files. If we implement ::verify() the op would have to be introduced after all lowering to emitc has been done. If we do this later on, we might be breaking downstream compilers which expected it to be usable earlier.

I'm opposed to recursive verification in ::verify as this is very costly and ::verify gets called a lot. Others have done this heavy validation via separate validation passes (e.g. TOSA).

Yes, with "Adding a validation pass" in my comment above I was referring to a validation pass like the one TOSA has to check for compliance with a TOSA spec. A downstream user would still need to run this pass (or add it to a pipeline), but that should be acceptable.
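
To make the distinction concrete, here is a minimal toy sketch of the two kinds of checks discussed above. It does not use the MLIR API: `Op`, `verifyLocal` and `collectNonEmitC` are hypothetical names invented for this illustration, and the "is this op in the emitc dialect" test is just a string prefix check on the op name. The point is only that the local check stays cheap, while the recursive walk is something a user opts into by running a separate pass.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an operation: a name plus nested ops.
struct Op {
  std::string name;
  std::vector<Op> nested;
};

// Cheap, local check of the kind that could live in a verify() hook:
// it inspects only the op itself and never walks the body.
bool verifyLocal(const Op &op) { return !op.name.empty(); }

// Opt-in recursive validation, modeled as a separate walk: collects the
// names of all nested ops whose name does not start with "emitc.".
void collectNonEmitC(const Op &op, std::vector<std::string> &bad) {
  for (const Op &child : op.nested) {
    if (child.name.rfind("emitc.", 0) != 0)
      bad.push_back(child.name);
    collectNonEmitC(child, bad);
  }
}
```

In this model, a pipeline that never calls `collectNonEmitC` still accepts a file op with foreign ops in its body, which mirrors the trade-off raised above: the validation is optional, but the per-op verification stays inexpensive.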

Any other potential uses for this op?

I think we should be pragmatic here. A source file is a source file. If we need different, more elaborate constructs, there is always the option to add new ops or change existing ones. MLIR/emitc have not been backwards compatible, so we shouldn't make it too hard for ourselves.

👍

In summary, I see consensus to rename this to emitc.file.

Is there any objection to getting this merged afterwards? I heard many ideas about possible extensions, but I don't see how the current design (an operation with an identifier) would make any of them impossible.

No objections. I would say, let's go ahead with emitc.file.

@aniragil
Contributor

Adding an additional attribute later would not break compatibility. And we don't know yet if we are actually going to get this, right?

Yes, that sounds reasonable to me.

So I was thinking whether the op should have a ::verify() method. Unless the translator itself runs such a verifying pass (which I'm not sure translators are supposed to do) this pass would be optional, letting users call the translator on invalid files. If we implement ::verify() the op would have to be introduced after all lowering to emitc has been done. If we do this later on, we might be breaking downstream compilers which expected it to be usable earlier.

I'm opposed to recursive verification in ::verify as this is very costly and ::verify gets called a lot. Others have done this heavy validation via separate validation passes (e.g. TOSA).

Yes, with "Adding a validation pass" in my comment above I was referring to a validation pass like the one TOSA has to check for compliance with a TOSA spec. A downstream user would still need to run this pass (or add it to a pipeline), but that should be acceptable.

Any other potential uses for this op?

I think we should be pragmatic here. A source file is a source file. If we need different, more elaborate constructs, there is always the option to add new ops or change existing ones. MLIR/emitc have not been backwards compatible, so we shouldn't make it too hard for ourselves.

👍

In summary, I see consensus to rename this to emitc.file.

Is there any objection to getting this merged afterwards? I heard many ideas about possible extensions, but I don't see how the current design (an operation with an identifier) would make any of them impossible.

Yes, IIUC we're all OK with changing its behavior later as needed, so let's move forward.

No objections. I would say, let's go ahead with emitc.file.

Agreed.

@mgehre-amd mgehre-amd changed the title [MLIR] emitc: Add emitc translation unit op [MLIR] emitc: Add emitc.file op Jan 28, 2025
An `emitc.file` op represents a file that can be emitted
into a single C++ file.

This allows managing multiple source files within the same MLIR module
while emitting them into separate files.

This feature is opt-in.
By default, `mlir-translate` emits all ops outside of `emitc.file`
and ignores all `emitc.file` ops and their bodies.

When specifying the `-file-id=id` flag,
`mlir-translate` emits all ops outside of `emitc.file` and
the ops within the `emitc.file` with matching `id`.

Example:

```mlir
emitc.file "main" {
  func @func_one() {
    return
  }
}
emitc.file "test" {
  func @func_two() {
    return
  }
}
```

`mlir-translate -file-id=main` will emit `func_one` and
`mlir-translate -file-id=test` will emit `func_two`.
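
The selection rule described above can be sketched as a small self-contained model. This is not the code from `TranslateToCpp.cpp`; `TopLevelOp` and `selectOps` are hypothetical names made up for this illustration, and ops are reduced to plain strings. It only mirrors the stated behavior: ops outside any file are always emitted, and the body of a file is emitted only when its id matches the requested one.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a top-level op of the module: either a plain
// op (fileId is empty) or a file container with an id and nested ops.
struct TopLevelOp {
  std::string name;                   // name of a plain op
  std::optional<std::string> fileId;  // set only for file containers
  std::vector<std::string> body;      // nested ops of a file container
};

// Returns the names of the ops that would be emitted. `fileId` models the
// -file-id flag; std::nullopt models the flag being absent.
std::vector<std::string> selectOps(const std::vector<TopLevelOp> &module,
                                   const std::optional<std::string> &fileId) {
  std::vector<std::string> emitted;
  for (const TopLevelOp &op : module) {
    if (!op.fileId) {
      // Ops outside of any file are always emitted.
      emitted.push_back(op.name);
    } else if (fileId && *op.fileId == *fileId) {
      // The body of a file is emitted only for a matching id.
      emitted.insert(emitted.end(), op.body.begin(), op.body.end());
    }
  }
  return emitted;
}
```

For a module holding one plain op plus the two files from the example, calling `selectOps` without an id yields only the plain op, while passing `"main"` additionally yields `func_one`.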
@mgehre-amd
Contributor Author

I updated the PR to emitc.file

Member

@marbre marbre left a comment

Some quick comments.

@mgehre-amd mgehre-amd requested a review from marbre January 30, 2025 09:31
@mgehre-amd mgehre-amd requested a review from aniragil February 4, 2025 11:12
Member

@marbre marbre left a comment

Thanks!

@mgehre-amd mgehre-amd merged commit 4cc7d60 into llvm:main Feb 18, 2025
8 checks passed
@mgehre-amd mgehre-amd deleted the matthias.emitc_tu branch February 18, 2025 14:21
wldfngrs pushed a commit to wldfngrs/llvm-project that referenced this pull request Feb 19, 2025
5 participants