[SpecialCaseList] Filtering Globs with matching prefix #164531
Created using spr 1.3.7 [skip ci]
Created using spr 1.3.7
@llvm/pr-subscribers-llvm-support

Author: Vitaly Buka (vitalybuka)

Changes

This commit optimizes

Full diff: https://github.com/llvm/llvm-project/pull/164531.diff

2 Files Affected:
```diff
diff --git a/llvm/include/llvm/Support/SpecialCaseList.h b/llvm/include/llvm/Support/SpecialCaseList.h
index ead765562504d..16f309329a0b5 100644
--- a/llvm/include/llvm/Support/SpecialCaseList.h
+++ b/llvm/include/llvm/Support/SpecialCaseList.h
@@ -13,10 +13,13 @@
 #define LLVM_SUPPORT_SPECIALCASELIST_H
 
 #include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringMap.h"
+#include "llvm/ADT/iterator_range.h"
 #include "llvm/Support/Allocator.h"
 #include "llvm/Support/Compiler.h"
 #include "llvm/Support/GlobPattern.h"
+#include "llvm/Support/RadixTree.h"
 #include "llvm/Support/Regex.h"
 #include <memory>
 #include <string>
@@ -162,6 +165,10 @@ class SpecialCaseList {
     };
 
     std::vector<GlobMatcher::Glob> Globs;
+
+    RadixTree<iterator_range<StringRef::const_iterator>,
+              SmallVector<const GlobMatcher::Glob *, 1>>
+        PrefixToGlob;
   };
 
   /// Represents a set of patterns and their line numbers
diff --git a/llvm/lib/Support/SpecialCaseList.cpp b/llvm/lib/Support/SpecialCaseList.cpp
index f74e52a3a7fa9..2a86cc37b6000 100644
--- a/llvm/lib/Support/SpecialCaseList.cpp
+++ b/llvm/lib/Support/SpecialCaseList.cpp
@@ -89,14 +89,28 @@ void SpecialCaseList::GlobMatcher::preprocess(bool BySize) {
       return A.Name.size() < B.Name.size();
     });
   }
+
+  for (auto &G : Globs) {
+    StringRef Prefix = G.Pattern.prefix();
+
+    auto &V = PrefixToGlob.emplace(Prefix).first->second;
+    V.emplace_back(&G);
+  }
 }
 
 void SpecialCaseList::GlobMatcher::match(
     StringRef Query,
     llvm::function_ref<void(StringRef Rule, unsigned LineNo)> Cb) const {
-  for (const auto &G : reverse(Globs))
-    if (G.Pattern.match(Query))
-      return Cb(G.Name, G.LineNo);
+  if (!PrefixToGlob.empty()) {
+    for (const auto &[_, V] : PrefixToGlob.find_prefixes(Query)) {
+      for (const auto *G : reverse(V)) {
+        if (G->Pattern.match(Query)) {
+          Cb(G->Name, G->LineNo);
+          break;
+        }
+      }
+    }
+  }
 }
 
 SpecialCaseList::Matcher::Matcher(bool UseGlobs, bool RemoveDotSlash)
```
This commit introduces a RadixTree implementation to LLVM. RadixTree, as a trie, is very efficient at searching for prefixes. A radix tree is a more efficient (compressed) implementation of a trie. The tree will be used to optimize glob matching in SpecialCaseList:

* #164531
* #164543
* #164545

---------

Co-authored-by: Kazu Hirata <kazu@google.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
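To make the prefix query concrete, here is a minimal compressed-trie sketch. It is a hypothetical illustration, not the `llvm::RadixTree` introduced by that commit: the class `MiniRadixTree` and its `insert`/`findPrefixes` methods are names assumed for this example. `findPrefixes` returns every stored key that is a prefix of the query, which is the kind of lookup the diff above performs with `find_prefixes`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Minimal compressed trie (radix tree) sketch. Each edge carries a string
// label, so chains of single-child nodes collapse into one edge.
// Hypothetical illustration only; this is not llvm::RadixTree.
class MiniRadixTree {
  struct Node {
    bool IsKey = false;
    // Outgoing edges keyed by the first character of their label.
    std::map<char, std::pair<std::string, std::unique_ptr<Node>>> Children;
  };
  Node Root;

public:
  void insert(const std::string &Key) {
    Node *N = &Root;
    std::string Rest = Key;
    while (!Rest.empty()) {
      auto It = N->Children.find(Rest[0]);
      if (It == N->Children.end()) {
        // No edge starts with this character: add a leaf for the remainder.
        auto Leaf = std::make_unique<Node>();
        Leaf->IsKey = true;
        N->Children.emplace(Rest[0], std::make_pair(Rest, std::move(Leaf)));
        return;
      }
      std::string &Label = It->second.first;
      std::size_t I = 0; // length of the common prefix of Label and Rest
      while (I < Label.size() && I < Rest.size() && Label[I] == Rest[I])
        ++I;
      assert(I > 0 && "edge was selected by its first character");
      if (I < Label.size()) {
        // Split the edge at I: Label[0, I) -> Mid -> Label[I, end) -> old child.
        auto Mid = std::make_unique<Node>();
        std::string Tail = Label.substr(I);
        Mid->Children.emplace(
            Tail[0], std::make_pair(Tail, std::move(It->second.second)));
        Label.resize(I);
        It->second.second = std::move(Mid);
      }
      N = It->second.second.get();
      Rest = Rest.substr(I);
    }
    N->IsKey = true;
  }

  // Returns every stored key that is a prefix of Query, shortest first.
  std::vector<std::string> findPrefixes(const std::string &Query) const {
    std::vector<std::string> Out;
    const Node *N = &Root;
    if (N->IsKey)
      Out.push_back("");
    std::size_t Pos = 0;
    while (Pos < Query.size()) {
      auto It = N->Children.find(Query[Pos]);
      if (It == N->Children.end())
        break;
      const std::string &Label = It->second.first;
      if (Query.compare(Pos, Label.size(), Label) != 0)
        break; // Query diverges from, or ends inside, this edge.
      Pos += Label.size();
      N = It->second.second.get();
      if (N->IsKey)
        Out.push_back(Query.substr(0, Pos));
    }
    return Out;
  }
};
```

Inserting `fob` after `foo` splits the `foo` edge into `fo` followed by two edges, `o` and `b`; that node merging is what distinguishes a radix tree from a plain one-character-per-node trie. A lookup such as `findPrefixes("foobarbaz")` then collects `foo` and `foobar` in a single walk down the tree.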
Pull Request Overview
This PR optimizes SpecialCaseList by introducing a RadixTree-based prefix filtering mechanism for glob patterns. The optimization reduces the number of glob patterns that need full evaluation during matching by first filtering candidates based on their prefixes, achieving significant performance improvements (81.77% reduction for general lookups, 98.19% for prefix-pattern lookups).
Key Changes:
- Added `RadixTree` to index glob patterns by their prefixes during preprocessing
- Modified the matching algorithm to use prefix-based filtering before glob evaluation
- Maintained sorting order within prefix groups to preserve best-to-worst matching priority
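The ordering guarantee in the last bullet can be sketched with standard containers. This is a simplified model under stated assumptions, not the LLVM code: `Glob`, `globMatch`, `groupByPrefix`, `matchGroups`, and `bestLine` are hypothetical names, the matcher understands only `*` and `?`, a linear scan over a `std::map` stands in for the `RadixTree`, and keeping the highest line number is an assumed caller policy for combining results from several groups.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a glob rule; supports only '*' and '?'.
struct Glob {
  std::string Pattern;
  unsigned LineNo;
};

// Iterative '*'/'?' matcher that backtracks to the most recent '*'.
bool globMatch(const std::string &P, const std::string &S) {
  std::size_t Pi = 0, Si = 0, Star = std::string::npos, Mark = 0;
  while (Si < S.size()) {
    if (Pi < P.size() && (P[Pi] == '?' || P[Pi] == S[Si])) {
      ++Pi;
      ++Si;
    } else if (Pi < P.size() && P[Pi] == '*') {
      Star = Pi++;
      Mark = Si;
    } else if (Star != std::string::npos) {
      Pi = Star + 1; // let the last '*' absorb one more character
      Si = ++Mark;
    } else {
      return false;
    }
  }
  while (Pi < P.size() && P[Pi] == '*')
    ++Pi;
  return Pi == P.size();
}

using Groups = std::map<std::string, std::vector<const Glob *>>;

// Buckets globs by literal prefix (text before the first '*' or '?'),
// preserving their order in Globs: lowest priority first.
Groups groupByPrefix(const std::vector<Glob> &Globs) {
  Groups G;
  for (const Glob &Gl : Globs)
    G[Gl.Pattern.substr(0, Gl.Pattern.find_first_of("*?"))].push_back(&Gl);
  return G;
}

// Calls Cb at most once per group whose prefix is a prefix of Query.
template <typename Callback>
void matchGroups(const Groups &G, const std::string &Query, Callback Cb) {
  for (const auto &[Prefix, V] : G) {
    if (Query.compare(0, Prefix.size(), Prefix) != 0)
      continue; // prefix filter: skip the whole group
    for (auto It = V.rbegin(); It != V.rend(); ++It) {
      if (globMatch((*It)->Pattern, Query)) {
        Cb(**It);
        break; // remaining entries in this group have lower priority
      }
    }
  }
}

// Assumed caller policy: the matching rule with the highest line number wins.
unsigned bestLine(const Groups &G, const std::string &Query) {
  unsigned Best = 0;
  matchGroups(G, Query,
              [&](const Glob &Gl) { Best = std::max(Best, Gl.LineNo); });
  return Best;
}
```

Because each bucket keeps globs in their original order, scanning it in reverse finds that bucket's highest-priority match first. The callback can still fire once per matching bucket (for `foo_bar` with rules `foo*`, `foo*bar`, `f*`, `bar*` it fires for the `f` and `foo` buckets), so the caller has to choose among the candidates.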
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description | 
|---|---|
| llvm/include/llvm/Support/SpecialCaseList.h | Added `RadixTree` member and necessary includes to support prefix-based glob filtering | 
| llvm/lib/Support/SpecialCaseList.cpp | Implemented prefix indexing in `preprocess()` and updated `match()` to use the `RadixTree` for efficient pattern filtering | 
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
✅ With the latest revision this PR passed the C/C++ code formatter.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
LLVM Buildbot has detected a new failure on builder  Full details are available at: https://lab.llvm.org/buildbot/#/builders/138/builds/20915

Here is the relevant piece of the build log for the reference
This commit optimizes `SpecialCaseList` by using a `RadixTree` to filter glob patterns based on their prefixes. When matching a query, the `RadixTree` quickly identifies all glob patterns whose literal prefix is a prefix of the query. This significantly reduces the number of glob patterns that need to be fully evaluated, leading to performance improvements, especially when dealing with a large number of patterns.
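The filtering step can be illustrated with a linear stand-in for the tree lookup. In this sketch, `literalPrefix` and `prefixCandidates` are hypothetical helpers, and treating the literal prefix as everything before the first `?`, `*`, `[`, `{`, or `\` is an assumption meant to approximate `GlobPattern::prefix()`. Only the returned candidates would need a full glob match; the `RadixTree` finds the same set without scanning every pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumption: the literal prefix of a glob ends at the first metacharacter
// among ? * [ { and backslash (an approximation of GlobPattern::prefix()).
std::string literalPrefix(const std::string &Pattern) {
  return Pattern.substr(0, Pattern.find_first_of("?*[{\\"));
}

// Returns the indices of patterns that survive the prefix filter, i.e. whose
// literal prefix is a prefix of Query. Only these need a full glob match.
std::vector<std::size_t>
prefixCandidates(const std::vector<std::string> &Patterns,
                 const std::string &Query) {
  std::vector<std::size_t> Out;
  for (std::size_t I = 0; I < Patterns.size(); ++I) {
    std::string P = literalPrefix(Patterns[I]);
    if (Query.compare(0, P.size(), P) == 0)
      Out.push_back(I);
  }
  return Out;
}
```

For the query `src/foo/x.cpp` and the patterns `src/foo/*`, `src/bar/*.cpp`, `lib/*`, `*.h`, `src/foo/[ab]*`, the candidates are indices 0, 3 and 4. A pattern with an empty literal prefix, such as `*.h`, always survives the filter, so the benefit depends on how many patterns begin with literal text.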
According to SpecialCaseListBM:

Lookup benchmarks (significant improvements):

Lookup like `prefix*` benchmarks (huge improvements):

https://gist.github.com/vitalybuka/824884bcbc1713e815068c279159dafe