Skip to content

Conversation

@DataCorrupted
Copy link
Member

@DataCorrupted DataCorrupted commented Nov 13, 2025

We often wish to iterate over all function definitions in a module for transformations.
Such API doesn't exists yet, so what people normally used is to iterate over all functions and skip the declarations:

for (auto& F: M) {
    if (F.isDeclaration()) continue;
}

A regex search of the following pattern shows 100+ hits inside llvm, let along other patterns, e.g. collect all definitions in a module into a worklist.

for \(.*F.*\).*\n.*if \(F.isDeclaration\(\)\).*\n.*continue;

This pattern is verbose and hard to review in a long loop.
As an alternative, this patch provides a new API that iterates over definitions only.

Tests:

  1. Added unit test
  2. Touching @svkeerthy 's tool (also to show the cleanness of the new API).

Note: function_definitions seems a bit long, I'm open to suggestions to other names.

@llvmbot
Copy link
Member

llvmbot commented Nov 13, 2025

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-mlgo

Author: Peter Rong (DataCorrupted)

Changes

We often wish to iterate over all function definitions in a module for transformations.
Such API doesn't exists yet, so what people normally used is to iterate over all functions and skip the declarations:

for (auto& F: M) {
    if (F.isDeclaration()) contiue;
}

A regex search of the following pattern shows 100+ hits inside llvm, let along other patterns, e.g. collect all definitions in a module into a worklist.

for \(.*F.*\).*\n.*if \(F.isDeclaration\(\)\).*\n.*continue;

This pattern is verbose and hard to review in a long loop.
As an alternative, this patch provides a new API that iterates over definitions only.

Tests:

  1. Added unit test
  2. Touching @svkeerthy 's tool (also to show the cleanness of the new API).

Note: function_definitions seems a bit long, I'm open to suggestions to other names.


Full diff: https://github.com/llvm/llvm-project/pull/167972.diff

3 Files Affected:

  • (modified) llvm/include/llvm/IR/Module.h (+11)
  • (modified) llvm/tools/llvm-ir2vec/llvm-ir2vec.cpp (+3-9)
  • (modified) llvm/unittests/IR/ModuleTest.cpp (+78)
diff --git a/llvm/include/llvm/IR/Module.h b/llvm/include/llvm/IR/Module.h
index a99937a90cbb7..f307d8748c6c9 100644
--- a/llvm/include/llvm/IR/Module.h
+++ b/llvm/include/llvm/IR/Module.h
@@ -710,6 +710,17 @@ class LLVM_ABI Module {
     return make_range(begin(), end());
   }
 
+  /// Get an iterator range over all function definitions (excluding
+  /// declarations).
+  auto function_definitions() {
+    return make_filter_range(functions(),
+                             [](Function &F) { return !F.isDeclaration(); });
+  }
+  auto function_definitions() const {
+    return make_filter_range(
+        functions(), [](const Function &F) { return !F.isDeclaration(); });
+  }
+
 /// @}
 /// @name Alias Iteration
 /// @{
diff --git a/llvm/tools/llvm-ir2vec/llvm-ir2vec.cpp b/llvm/tools/llvm-ir2vec/llvm-ir2vec.cpp
index 7402782bfd404..19ecee9788f0e 100644
--- a/llvm/tools/llvm-ir2vec/llvm-ir2vec.cpp
+++ b/llvm/tools/llvm-ir2vec/llvm-ir2vec.cpp
@@ -395,9 +395,7 @@ class MIR2VecTool {
   /// FIXME: Use --target option to get target info directly, avoiding the need
   /// to parse machine functions for pre-training operations.
   bool initializeVocabularyForLayout(const Module &M) {
-    for (const Function &F : M) {
-      if (F.isDeclaration())
-        continue;
+    for (const Function &F : M.function_definitions()) {
 
       MachineFunction *MF = MMI.getMachineFunction(F);
       if (!MF)
@@ -431,9 +429,7 @@ class MIR2VecTool {
     std::string Relationships;
     raw_string_ostream RelOS(Relationships);
 
-    for (const Function &F : M) {
-      if (F.isDeclaration())
-        continue;
+    for (const Function &F : M.function_definitions()) {
 
       MachineFunction *MF = MMI.getMachineFunction(F);
       if (!MF) {
@@ -532,9 +528,7 @@ class MIR2VecTool {
       return;
     }
 
-    for (const Function &F : M) {
-      if (F.isDeclaration())
-        continue;
+    for (const Function &F : M.function_definitions()) {
 
       MachineFunction *MF = MMI.getMachineFunction(F);
       if (!MF) {
diff --git a/llvm/unittests/IR/ModuleTest.cpp b/llvm/unittests/IR/ModuleTest.cpp
index 30eda738020d0..609cd0d580f96 100644
--- a/llvm/unittests/IR/ModuleTest.cpp
+++ b/llvm/unittests/IR/ModuleTest.cpp
@@ -433,4 +433,82 @@ define void @Foo2() {
   ASSERT_EQ(M2Str, M1Print);
 }
 
+TEST(ModuleTest, FunctionDefinitions) {
+  // Test function_definitions() method which returns only functions with bodies
+  LLVMContext Context;
+  SMDiagnostic Err;
+  std::unique_ptr<Module> M = parseAssemblyString(R"(
+declare void @Decl1()
+declare void @Decl2()
+
+define void @Def1() {
+  ret void
+}
+
+define void @Def2() {
+  ret void
+}
+
+declare void @Decl3()
+
+define void @Def3() {
+  ret void
+}
+)",
+                                                  Err, Context);
+  ASSERT_TRUE(M);
+
+  // Count total functions (should be 6: 3 declarations + 3 definitions)
+  size_t TotalFunctions = 0;
+  for (Function &F : *M) {
+    (void)F;
+    ++TotalFunctions;
+  }
+  EXPECT_EQ(TotalFunctions, 6u);
+
+  // Count function definitions only (should be 3)
+  size_t DefinitionCount = 0;
+  for (Function &F : M->function_definitions()) {
+    EXPECT_FALSE(F.isDeclaration());
+    ++DefinitionCount;
+  }
+  EXPECT_EQ(DefinitionCount, 3u);
+
+  // Verify the names of the definitions
+  auto DefRange = M->function_definitions();
+  auto It = DefRange.begin();
+  EXPECT_EQ(It->getName(), "Def1");
+  ++It;
+  EXPECT_EQ(It->getName(), "Def2");
+  ++It;
+  EXPECT_EQ(It->getName(), "Def3");
+  ++It;
+  EXPECT_EQ(It, DefRange.end());
+}
+
+TEST(ModuleTest, FunctionDefinitionsEmpty) {
+  // Test function_definitions() with no definitions (only declarations)
+  LLVMContext Context;
+  SMDiagnostic Err;
+  std::unique_ptr<Module> M = parseAssemblyString(R"(
+declare void @Decl1()
+declare void @Decl2()
+declare void @Decl3()
+)",
+                                                  Err, Context);
+  ASSERT_TRUE(M);
+
+  // Should have functions
+  EXPECT_FALSE(M->empty());
+  EXPECT_EQ(M->size(), 3u);
+
+  // But no definitions
+  size_t DefinitionCount = 0;
+  for (Function &F : M->function_definitions()) {
+    (void)F;
+    ++DefinitionCount;
+  }
+  EXPECT_EQ(DefinitionCount, 0u);
+}
+
 } // end namespace

@mtrofin
Copy link
Member

mtrofin commented Nov 13, 2025

Very nice!

I'm assuming the plan for the 100+ other uses could be "if you find it, patch it"? (where "you" is "anyone")

@DataCorrupted
Copy link
Member Author

Anything that regex can identify can be replaced fairly easily, other cases would be "if you find it, patch it"; I should probably make a post on the forum to raise awareness once this lands


/// Get an iterator range over all function definitions (excluding
/// declarations).
auto function_definitions() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of naming conventions, methods are in general camelCase I believe (see getNamedMDList, getIFuncList, ...).

Suggested change
auto function_definitions() {
auto getFunctionDefinitions() {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason the functions around this point do not follow the convention. Not sure if one should follow the environment to not look out of place, or follow the convention.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that STL-style utilities (especially iterators) followed the naming convention of snake case (e.g. global_begin() and global_empty() above, like @drodriguez said, there's a bit confusion here). Also, getters hints that this is a concrete type std::vector or SmallVector, but this method only returns a range. Overall I'm more inclined to snake_case, but I'd love to hear if other people have opinions on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exact style guide text is the following:

As an exception, classes that mimic STL classes can have member names in STL’s style of lowercase words separated by underscores (e.g. begin(), push_back(), and empty()). Classes that provide multiple iterators should add a singular prefix to begin() and end() (e.g. global_begin() and use_begin()).

Module is not really mimicking a STL class. I think sticking with the typical LLVM naming convention of camel case starting with a lowercase letter makes the most sense here.

Copy link
Member Author

@DataCorrupted DataCorrupted Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am for camel case now.

Would definitions() (similar to functions()) or getDefinitions() a better option? getFunctionDefinitions is even longer than I'm happy with, and the return type (function iterator) should've implied this returns function definition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

functionDefinitions should work, but only cuts three characters. definitions() is probably fine, but I would prefer something more explicit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I stick to methods that start with verbs, but I don't think we have a rule for this, so I'll leave it to your preference!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather not have just definitions() or getDefinitions() because without context I wouldn't be sure what this was. Does it include globals? Metadata?
getFunctionDefinitions() seems fine, and I don't mind the length. But if we wanted to save space we could do getFunctionDefs().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you all for the feedback. I weighted on all your opinions, and find that getFunctionDefs() can strike a balance between length and clarity.

@llvmbot llvmbot added the mlgo label Nov 15, 2025
@ellishg
Copy link
Contributor

ellishg commented Nov 19, 2025

By the way, @DataCorrupted posted an RFC on this in https://discourse.llvm.org/t/rfc-llvm-getfunctiondefs-to-iterate-over-function-definitions/88872. I think this is a useful API so LGTM, but since it's a major change I want to make sure others have a chance to review and give feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants