[SpecialCaseList] Filtering Globs with matching prefix and suffix #164543
base: main
Created using spr 1.3.7 [skip ci]
Created using spr 1.3.7
@llvm/pr-subscribers-llvm-support
Author: Vitaly Buka (vitalybuka)
Full diff: https://github.com/llvm/llvm-project/pull/164543.diff
2 Files Affected:
```diff
diff --git a/llvm/include/llvm/Support/SpecialCaseList.h b/llvm/include/llvm/Support/SpecialCaseList.h
index 16f309329a0b5..471f8e779fa24 100644
--- a/llvm/include/llvm/Support/SpecialCaseList.h
+++ b/llvm/include/llvm/Support/SpecialCaseList.h
@@ -167,8 +167,9 @@ class SpecialCaseList {
     std::vector<GlobMatcher::Glob> Globs;
 
     RadixTree<iterator_range<StringRef::const_iterator>,
-              SmallVector<const GlobMatcher::Glob *, 1>>
-        PrefixToGlob;
+              RadixTree<iterator_range<StringRef::const_reverse_iterator>,
+                        SmallVector<const GlobMatcher::Glob *, 1>>>
+        PrefixSuffixToGlob;
   };
 
   /// Represents a set of patterns and their line numbers
diff --git a/llvm/lib/Support/SpecialCaseList.cpp b/llvm/lib/Support/SpecialCaseList.cpp
index 2a86cc37b6000..9bd1c199695d1 100644
--- a/llvm/lib/Support/SpecialCaseList.cpp
+++ b/llvm/lib/Support/SpecialCaseList.cpp
@@ -92,8 +92,10 @@ void SpecialCaseList::GlobMatcher::preprocess(bool BySize) {
 
   for (auto &G : Globs) {
     StringRef Prefix = G.Pattern.prefix();
+    StringRef Suffix = G.Pattern.suffix();
 
-    auto &V = PrefixToGlob.emplace(Prefix).first->second;
+    auto &SToGlob = PrefixSuffixToGlob.emplace(Prefix).first->second;
+    auto &V = SToGlob.emplace(reverse(Suffix)).first->second;
     V.emplace_back(&G);
   }
 }
@@ -101,12 +103,14 @@ void SpecialCaseList::GlobMatcher::preprocess(bool BySize) {
 void SpecialCaseList::GlobMatcher::match(
     StringRef Query,
     llvm::function_ref<void(StringRef Rule, unsigned LineNo)> Cb) const {
-  if (!PrefixToGlob.empty()) {
-    for (const auto &[_, V] : PrefixToGlob.find_prefixes(Query)) {
-      for (const auto *G : reverse(V)) {
-        if (G->Pattern.match(Query)) {
-          Cb(G->Name, G->LineNo);
-          break;
+  if (!PrefixSuffixToGlob.empty()) {
+    for (const auto &[_, SToGlob] : PrefixSuffixToGlob.find_prefixes(Query)) {
+      for (const auto &[_, V] : SToGlob.find_prefixes(reverse(Query))) {
+        for (const auto *G : reverse(V)) {
+          if (G->Pattern.match(Query)) {
+            Cb(G->Name, G->LineNo);
+            break;
+          }
         }
       }
     }
```
This commit introduces a RadixTree implementation to LLVM. Like any trie, a RadixTree is very efficient at prefix searches, and as a path-compressed trie it is more efficient than a plain one. The tree will be used to optimize Glob matching in SpecialCaseList:
* #164531
* #164543
* #164545
---------
Co-authored-by: Kazu Hirata <kazu@google.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
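To illustrate why a trie suits prefix lookups, here is a minimal sketch of a plain (uncompressed) character trie whose `findPrefixes` query returns the values of every stored key that is a prefix of the query. The class `PrefixTrie` and its methods are hypothetical; this is neither the API nor the implementation of `llvm::RadixTree`, which additionally merges single-child chains into one edge.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical sketch: a plain character trie with one node per character.
// It does not model llvm::RadixTree's path compression or its API.
template <typename T> class PrefixTrie {
  struct Node {
    std::map<char, std::unique_ptr<Node>> Children;
    std::vector<T> Values; // Values whose key ends exactly at this node.
  };
  Node Root;

public:
  void insert(std::string_view Key, T Value) {
    Node *N = &Root;
    for (char C : Key) {
      std::unique_ptr<Node> &Child = N->Children[C];
      if (!Child)
        Child = std::make_unique<Node>();
      N = Child.get();
    }
    N->Values.push_back(std::move(Value));
  }

  // Returns the values of every stored key that is a prefix of Query,
  // shortest key first, collected during a single walk down the trie.
  std::vector<T> findPrefixes(std::string_view Query) const {
    std::vector<T> Result;
    const Node *N = &Root;
    for (std::size_t I = 0;; ++I) {
      Result.insert(Result.end(), N->Values.begin(), N->Values.end());
      if (I == Query.size())
        break;
      auto It = N->Children.find(Query[I]);
      if (It == N->Children.end())
        break; // No stored key extends this far along Query.
      N = It->second.get();
    }
    return Result;
  }
};
```

The walk visits at most `Query.size() + 1` nodes no matter how many keys are stored, which is the property that makes prefix-based glob filtering cheap.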
✅ With the latest revision this PR passed the C/C++ code formatter.
Pull Request Overview
This PR optimizes the SpecialCaseList::GlobMatcher by introducing a nested RadixTree structure that filters globs using both prefix and suffix matching, reducing unnecessary glob match attempts and improving lookup performance by ~58% overall and ~93% for patterns with suffixes.
Key Changes:
- Introduced nested `RadixTree` structure mapping prefixes to suffix-based `RadixTree`s
- Modified `preprocess()` to build the nested structure using both prefix and reversed suffix
- Updated `match()` to perform two-level filtering: prefix matching followed by reversed suffix matching
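The reversed-suffix step rests on a simple equivalence: a string ends with `S` exactly when its reverse starts with the reverse of `S`, so a prefix structure keyed by reversed suffixes can answer suffix queries. A minimal sketch of that check, using a hypothetical function name:

```cpp
#include <algorithm>
#include <cassert>
#include <string_view>

// Hypothetical helper: "Query ends with Suffix", checked as
// "reverse(Query) starts with reverse(Suffix)".
bool endsWithViaReverse(std::string_view Query, std::string_view Suffix) {
  if (Suffix.size() > Query.size())
    return false;
  // Reverse iterators compare the reversed Suffix against the leading
  // characters of the reversed Query.
  return std::equal(Suffix.rbegin(), Suffix.rend(), Query.rbegin());
}
```

This mirrors what the diff does with `StringRef::const_reverse_iterator` keys: suffixes are stored reversed (`reverse(Suffix)`), and the query is reversed before calling `find_prefixes`.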
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description | 
|---|---|
| llvm/lib/Support/SpecialCaseList.cpp | Implements nested RadixTree lookup logic with prefix and suffix matching | 
| llvm/include/llvm/Support/SpecialCaseList.h | Updates data structure from single RadixTree to nested RadixTree for prefix-suffix indexing | 
This commit enhances the `SpecialCaseList::GlobMatcher` to filter globs more efficiently by considering both prefixes and suffixes.
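For context, a glob's literal prefix and suffix can be sketched as the text before its first metacharacter and the text after its last one. The helper below is hypothetical and assumes a simplified, conservative metacharacter set; LLVM's `GlobPattern::prefix()` and `suffix()` follow their own parsing rules (escapes, brace expansion), which this sketch does not reproduce.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <utility>

// Hypothetical helper: split a glob into the literal text before its first
// metacharacter and the literal text after its last one. The metacharacter
// set is an assumption, chosen conservatively.
std::pair<std::string_view, std::string_view>
literalPrefixSuffix(std::string_view Glob) {
  constexpr std::string_view Meta = "*?[]{}\\";
  std::size_t First = Glob.find_first_of(Meta);
  if (First == std::string_view::npos)
    return {Glob, {}}; // Fully literal: everything is prefix, no suffix.
  std::size_t Last = Glob.find_last_of(Meta);
  return {Glob.substr(0, First), Glob.substr(Last + 1)};
}
```

Any string the glob matches must start with the prefix and end with the suffix. Both are necessary but not sufficient conditions, so candidates that pass the filter still need the full glob match.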
Previously, the `GlobMatcher` used a `RadixTree` to store globs based on their prefixes. This allowed for quick lookup of potential matches by matching the query string's prefix against the stored prefixes.
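However, for globs with common prefixes but different suffixes, unnecessary glob matching attempts could still occur: any glob whose prefix matched the query could still be passed to `Glob::match`, even when its suffix ruled out a match.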
This change introduces a nested `RadixTree` structure, `PrefixSuffixToGlob: RadixTree<Prefix, RadixTree<Suffix, Globs>>`. Now, when a query string is matched, it first finds matching prefixes, and then, within those prefix matches, it further filters by matching the reversed suffix of the query string against the reversed suffixes of the globs. This significantly reduces the number of `Glob::match` calls, especially for large special case lists with many globs sharing common prefixes but differing in their suffixes.
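The two-level lookup can be modeled in self-contained C++ as follows. This is a hypothetical sketch, not the patch's code: `std::map` lookups of every prefix of the query (and of its reverse) stand in for `RadixTree::find_prefixes`, `wildcardMatch` supports only `*` and `?` in place of `GlobPattern::match`, and all names are illustrative. As in the diff, the callback fires for the last-added matching glob in each (prefix, suffix) bucket.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical model of the two-level filter; std::map stands in for
// llvm::RadixTree and only '*' and '?' wildcards are supported.
struct Glob {
  std::string Pattern;
  unsigned LineNo;
};

// Greedy wildcard match that backtracks to the most recent '*'.
bool wildcardMatch(std::string_view P, std::string_view S) {
  constexpr std::size_t NoStar = std::string_view::npos;
  std::size_t Pi = 0, Si = 0, Star = NoStar, Mark = 0;
  while (Si < S.size()) {
    if (Pi < P.size() && P[Pi] == '*') {
      Star = Pi++; // Let '*' match nothing for now; remember its position.
      Mark = Si;
    } else if (Pi < P.size() && (P[Pi] == '?' || P[Pi] == S[Si])) {
      ++Pi;
      ++Si;
    } else if (Star != NoStar) {
      Pi = Star + 1; // Backtrack: the last '*' absorbs one more character.
      Si = ++Mark;
    } else {
      return false;
    }
  }
  while (Pi < P.size() && P[Pi] == '*')
    ++Pi;
  return Pi == P.size();
}

class PrefixSuffixIndex {
  // Literal prefix -> reversed literal suffix -> globs sharing both.
  std::map<std::string, std::map<std::string, std::vector<const Glob *>>>
      Index;

public:
  // Stores pointers, so Globs must outlive the index.
  explicit PrefixSuffixIndex(const std::vector<Glob> &Globs) {
    for (const Glob &G : Globs) {
      std::string_view P = G.Pattern;
      std::size_t First = P.find_first_of("*?");
      std::string_view Prefix = P.substr(0, First); // Whole pattern if literal.
      std::string_view Suffix;
      if (First != std::string_view::npos)
        Suffix = P.substr(P.find_last_of("*?") + 1);
      std::string RevSuffix(Suffix.rbegin(), Suffix.rend());
      Index[std::string(Prefix)][RevSuffix].push_back(&G);
    }
  }

  // Calls Cb(LineNo) for the last-added glob in each reachable bucket that
  // fully matches Query.
  template <typename Callback>
  void match(std::string_view Query, Callback Cb) const {
    std::string RevQuery(Query.rbegin(), Query.rend());
    // Level 1: prefixes of Query that are stored literal prefixes.
    for (std::size_t I = 0; I <= Query.size(); ++I) {
      auto Outer = Index.find(std::string(Query.substr(0, I)));
      if (Outer == Index.end())
        continue;
      // Level 2: prefixes of reversed Query that are stored reversed suffixes.
      for (std::size_t J = 0; J <= RevQuery.size(); ++J) {
        auto Inner = Outer->second.find(RevQuery.substr(0, J));
        if (Inner == Outer->second.end())
          continue;
        const std::vector<const Glob *> &Bucket = Inner->second;
        for (auto It = Bucket.rbegin(); It != Bucket.rend(); ++It) {
          if (wildcardMatch((*It)->Pattern, Query)) {
            Cb((*It)->LineNo);
            break;
          }
        }
      }
    }
  }
};
```

Globs whose bucket is never reached are never passed to the full matcher: with globs `src/*.cpp` and `src/*.h`, the query `src/a.cpp` never tries `src/*.h`. The map-based lookups here cost more than a radix tree walk; the sketch only demonstrates the filtering logic.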
According to SpecialCaseListBM:
Lookup benchmarks (significant improvements):
Lookup `*suffix` and `prefix*suffix`-like benchmarks (huge improvements): https://gist.github.com/vitalybuka/e586751902760ced6beefcdf0d7b26fd