-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[RFC] [IR] Modules can make filtered range to iterate function definitions only #167972
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-mlgo Author: Peter Rong (DataCorrupted) ChangesWe often wish to iterate over all function definitions in a module for transformations. for (auto& F: M) {
if (F.isDeclaration()) contiue;
}A regex search of the following pattern shows 100+ hits inside llvm, let along other patterns, e.g. collect all definitions in a module into a worklist. This pattern is verbose and hard to review in a long loop. Tests:
Note: Full diff: https://github.com/llvm/llvm-project/pull/167972.diff 3 Files Affected:
diff --git a/llvm/include/llvm/IR/Module.h b/llvm/include/llvm/IR/Module.h
index a99937a90cbb7..f307d8748c6c9 100644
--- a/llvm/include/llvm/IR/Module.h
+++ b/llvm/include/llvm/IR/Module.h
@@ -710,6 +710,17 @@ class LLVM_ABI Module {
return make_range(begin(), end());
}
+ /// Get an iterator range over all function definitions (excluding
+ /// declarations).
+ auto function_definitions() {
+ return make_filter_range(functions(),
+ [](Function &F) { return !F.isDeclaration(); });
+ }
+ auto function_definitions() const {
+ return make_filter_range(
+ functions(), [](const Function &F) { return !F.isDeclaration(); });
+ }
+
/// @}
/// @name Alias Iteration
/// @{
diff --git a/llvm/tools/llvm-ir2vec/llvm-ir2vec.cpp b/llvm/tools/llvm-ir2vec/llvm-ir2vec.cpp
index 7402782bfd404..19ecee9788f0e 100644
--- a/llvm/tools/llvm-ir2vec/llvm-ir2vec.cpp
+++ b/llvm/tools/llvm-ir2vec/llvm-ir2vec.cpp
@@ -395,9 +395,7 @@ class MIR2VecTool {
/// FIXME: Use --target option to get target info directly, avoiding the need
/// to parse machine functions for pre-training operations.
bool initializeVocabularyForLayout(const Module &M) {
- for (const Function &F : M) {
- if (F.isDeclaration())
- continue;
+ for (const Function &F : M.function_definitions()) {
MachineFunction *MF = MMI.getMachineFunction(F);
if (!MF)
@@ -431,9 +429,7 @@ class MIR2VecTool {
std::string Relationships;
raw_string_ostream RelOS(Relationships);
- for (const Function &F : M) {
- if (F.isDeclaration())
- continue;
+ for (const Function &F : M.function_definitions()) {
MachineFunction *MF = MMI.getMachineFunction(F);
if (!MF) {
@@ -532,9 +528,7 @@ class MIR2VecTool {
return;
}
- for (const Function &F : M) {
- if (F.isDeclaration())
- continue;
+ for (const Function &F : M.function_definitions()) {
MachineFunction *MF = MMI.getMachineFunction(F);
if (!MF) {
diff --git a/llvm/unittests/IR/ModuleTest.cpp b/llvm/unittests/IR/ModuleTest.cpp
index 30eda738020d0..609cd0d580f96 100644
--- a/llvm/unittests/IR/ModuleTest.cpp
+++ b/llvm/unittests/IR/ModuleTest.cpp
@@ -433,4 +433,82 @@ define void @Foo2() {
ASSERT_EQ(M2Str, M1Print);
}
+TEST(ModuleTest, FunctionDefinitions) {
+ // Test function_definitions() method which returns only functions with bodies
+ LLVMContext Context;
+ SMDiagnostic Err;
+ std::unique_ptr<Module> M = parseAssemblyString(R"(
+declare void @Decl1()
+declare void @Decl2()
+
+define void @Def1() {
+ ret void
+}
+
+define void @Def2() {
+ ret void
+}
+
+declare void @Decl3()
+
+define void @Def3() {
+ ret void
+}
+)",
+ Err, Context);
+ ASSERT_TRUE(M);
+
+ // Count total functions (should be 6: 3 declarations + 3 definitions)
+ size_t TotalFunctions = 0;
+ for (Function &F : *M) {
+ (void)F;
+ ++TotalFunctions;
+ }
+ EXPECT_EQ(TotalFunctions, 6u);
+
+ // Count function definitions only (should be 3)
+ size_t DefinitionCount = 0;
+ for (Function &F : M->function_definitions()) {
+ EXPECT_FALSE(F.isDeclaration());
+ ++DefinitionCount;
+ }
+ EXPECT_EQ(DefinitionCount, 3u);
+
+ // Verify the names of the definitions
+ auto DefRange = M->function_definitions();
+ auto It = DefRange.begin();
+ EXPECT_EQ(It->getName(), "Def1");
+ ++It;
+ EXPECT_EQ(It->getName(), "Def2");
+ ++It;
+ EXPECT_EQ(It->getName(), "Def3");
+ ++It;
+ EXPECT_EQ(It, DefRange.end());
+}
+
+TEST(ModuleTest, FunctionDefinitionsEmpty) {
+ // Test function_definitions() with no definitions (only declarations)
+ LLVMContext Context;
+ SMDiagnostic Err;
+ std::unique_ptr<Module> M = parseAssemblyString(R"(
+declare void @Decl1()
+declare void @Decl2()
+declare void @Decl3()
+)",
+ Err, Context);
+ ASSERT_TRUE(M);
+
+ // Should have functions
+ EXPECT_FALSE(M->empty());
+ EXPECT_EQ(M->size(), 3u);
+
+ // But no definitions
+ size_t DefinitionCount = 0;
+ for (Function &F : M->function_definitions()) {
+ (void)F;
+ ++DefinitionCount;
+ }
+ EXPECT_EQ(DefinitionCount, 0u);
+}
+
} // end namespace
|
|
Very nice! I'm assuming the plan for the 100+ other uses could be "if you find it, patch it"? (where "you" is "anyone") |
|
Anything that regex can identify can be replaced fairly easily, other cases would be "if you find it, patch it"; I should probably make a post on the forum to raise awareness once this lands |
llvm/include/llvm/IR/Module.h
Outdated
|
|
||
| /// Get an iterator range over all function definitions (excluding | ||
| /// declarations). | ||
| auto function_definitions() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of naming conventions, methods are in general camelCase I believe (see getNamedMDList, getIFuncList, ...).
| auto function_definitions() { | |
| auto getFunctionDefinitions() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason the functions around this point do not follow the convention. Not sure if one should follow the environment to not look out of place, or follow the convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that STL-style utilities (especially iterators) followed the naming convention of snake case (e.g. global_begin() and global_empty() above, like @drodriguez said, there's a bit confusion here). Also, getters hints that this is a concrete type std::vector or SmallVector, but this method only returns a range. Overall I'm more inclined to snake_case, but I'd love to hear if other people have opinions on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exact style guide text is the following:
As an exception, classes that mimic STL classes can have member names in STL’s style of lowercase words separated by underscores (e.g. begin(), push_back(), and empty()). Classes that provide multiple iterators should add a singular prefix to begin() and end() (e.g. global_begin() and use_begin()).
Module is not really mimicking a STL class. I think sticking with the typical LLVM naming convention of camel case starting with a lowercase letter makes the most sense here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am for camel case now.
Would definitions() (similar to functions()) or getDefinitions() a better option? getFunctionDefinitions is even longer than I'm happy with, and the return type (function iterator) should've implied this returns function definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
functionDefinitions should work, but only cuts three characters. definitions() is probably fine, but I would prefer something more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I stick to methods that start with verbs, but I don't think we have a rule for this, so I'll leave it to your preference!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather not have just definitions() or getDefinitions() because without context I wouldn't be sure what this was. Does it include globals? Metadata?
getFunctionDefinitions() seems fine, and I don't mind the length. But if we wanted to save space we could do getFunctionDefs().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you all for the feedback. I weighted on all your opinions, and find that getFunctionDefs() can strike a balance between length and clarity.
|
By the way, @DataCorrupted posted an RFC on this in https://discourse.llvm.org/t/rfc-llvm-getfunctiondefs-to-iterate-over-function-definitions/88872. I think this is a useful API so LGTM, but since it's a major change I want to make sure others have a chance to review and give feedback. |
We often wish to iterate over all function definitions in a module for transformations.
Such API doesn't exists yet, so what people normally used is to iterate over all functions and skip the declarations:
A regex search of the following pattern shows 100+ hits inside llvm, let along other patterns, e.g. collect all definitions in a module into a worklist.
This pattern is verbose and hard to review in a long loop.
As an alternative, this patch provides a new API that iterates over definitions only.
Tests:
Note:
function_definitionsseems a bit long, I'm open to suggestions to other names.