vitalybuka (Collaborator) commented Oct 22, 2025

This commit enhances the SpecialCaseList::GlobMatcher to filter globs
more efficiently by considering both prefixes and suffixes.

Previously, the GlobMatcher stored globs in a RadixTree keyed by their
prefixes. This allowed quick lookup of candidate globs: only globs whose
prefix is a prefix of the query string were tried. However, globs that
share a prefix but differ in their suffixes were all passed to
Glob::match, even when the query's suffix ruled them out.

This change introduces a nested RadixTree structure:
PrefixSuffixToGlob: RadixTree<Prefix, RadixTree<Suffix, Globs>>.
When a query string is matched, the outer tree first yields every glob
prefix that is a prefix of the query. Within each of those entries, the
inner tree is searched with the reversed query against the globs'
reversed suffixes, so only globs whose suffix is also a suffix of the
query remain. This significantly reduces the number of Glob::match
calls, especially for large special case lists with many globs that
share common prefixes but differ in their suffixes.
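A minimal, self-contained sketch of the two-level filter, under stated assumptions: std::map stands in for llvm::RadixTree (its prefix search simply tries every prefix length instead of walking a tree), globs support only `*`, and the names Glob, makeGlob, globMatch, insert, forEachPrefixKey and match are hypothetical, not the SpecialCaseList API. As in the PR, the outer map is keyed by each glob's prefix and the inner map by its reversed suffix.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical glob type; only '*' is treated as a wildcard in this sketch.
struct Glob {
  std::string Pattern;
  std::string Prefix; // literal text before the first '*'
  std::string Suffix; // literal text after the last '*'
};

Glob makeGlob(const std::string &P) {
  size_t First = P.find('*');
  if (First == std::string::npos) // no wildcard: the whole pattern is a prefix
    return {P, P, ""};
  size_t Last = P.rfind('*');
  return {P, P.substr(0, First), P.substr(Last + 1)};
}

// Standard '*'-only wildcard matcher (backtracks to the most recent '*').
bool globMatch(std::string_view P, std::string_view S) {
  size_t Pi = 0, Si = 0, Star = std::string_view::npos, Mark = 0;
  while (Si < S.size()) {
    if (Pi < P.size() && P[Pi] == '*') {
      Star = Pi++;
      Mark = Si;
    } else if (Pi < P.size() && P[Pi] == S[Si]) {
      ++Pi;
      ++Si;
    } else if (Star != std::string_view::npos) {
      Pi = Star + 1;
      Si = ++Mark;
    } else {
      return false;
    }
  }
  while (Pi < P.size() && P[Pi] == '*')
    ++Pi;
  return Pi == P.size();
}

// Prefix -> (reversed suffix -> globs); std::map stands in for RadixTree.
using SuffixIndex = std::map<std::string, std::vector<const Glob *>>;
using PrefixIndex = std::map<std::string, SuffixIndex>;

void insert(PrefixIndex &Index, const Glob &G) {
  assert(G.Pattern.compare(0, G.Prefix.size(), G.Prefix) == 0);
  std::string RevSuffix(G.Suffix.rbegin(), G.Suffix.rend());
  Index[G.Prefix][RevSuffix].push_back(&G);
}

// Visits the value of every key in M that is a prefix of Key
// (a linear stand-in for RadixTree::find_prefixes).
template <typename Map, typename Fn>
void forEachPrefixKey(const Map &M, const std::string &Key, Fn F) {
  for (size_t Len = 0; Len <= Key.size(); ++Len)
    if (auto It = M.find(Key.substr(0, Len)); It != M.end())
      F(It->second);
}

// Returns how many globMatch calls were made; matching patterns go to Hits.
size_t match(const PrefixIndex &Index, const std::string &Query,
             std::vector<std::string> &Hits) {
  size_t Calls = 0;
  std::string RevQuery(Query.rbegin(), Query.rend());
  forEachPrefixKey(Index, Query, [&](const SuffixIndex &SToGlob) {
    forEachPrefixKey(SToGlob, RevQuery, [&](const auto &V) {
      for (const Glob *G : V) {
        ++Calls;
        if (globMatch(G->Pattern, Query))
          Hits.push_back(G->Pattern);
      }
    });
  });
  return Calls;
}
```

For the globs src/*.cc, src/*.h, src/*_test.cc and lib/*, the query src/foo.h has three candidates under the prefix src/ alone, but only src/*.h survives the suffix level, so the sketch makes a single globMatch call. Unlike this sketch, the PR walks each bucket in reverse and stops at the first match.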

According to SpecialCaseListBM:

Lookup benchmarks (significant improvements):

OVERALL_GEOMEAN                       -0.5815

Lookup benchmarks with *suffix and prefix*suffix style patterns (huge improvements):

OVERALL_GEOMEAN                       -0.9316

https://gist.github.com/vitalybuka/e586751902760ced6beefcdf0d7b26fd
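Assuming these figures come from Google Benchmark's compare.py, whose OVERALL_GEOMEAN row reports the relative change of the geometric-mean time between the old and new runs, they translate into speedup factors as follows (here \bar{t} denotes the geometric mean of the per-benchmark times):

```math
\Delta = \frac{\bar{t}_{\text{new}} - \bar{t}_{\text{old}}}{\bar{t}_{\text{old}}},
\qquad
\frac{\bar{t}_{\text{old}}}{\bar{t}_{\text{new}}} = \frac{1}{1 + \Delta}

\Delta = -0.5815 \;\Rightarrow\; \frac{1}{0.4185} \approx 2.39\times,
\qquad
\Delta = -0.9316 \;\Rightarrow\; \frac{1}{0.0684} \approx 14.6\times
```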

llvmbot (Member) commented Oct 22, 2025

@llvm/pr-subscribers-llvm-support

Author: Vitaly Buka (vitalybuka)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/164543.diff

2 Files Affected:

  • (modified) llvm/include/llvm/Support/SpecialCaseList.h (+3-2)
  • (modified) llvm/lib/Support/SpecialCaseList.cpp (+11-7)
diff --git a/llvm/include/llvm/Support/SpecialCaseList.h b/llvm/include/llvm/Support/SpecialCaseList.h
index 16f309329a0b5..471f8e779fa24 100644
--- a/llvm/include/llvm/Support/SpecialCaseList.h
+++ b/llvm/include/llvm/Support/SpecialCaseList.h
@@ -167,8 +167,9 @@ class SpecialCaseList {
     std::vector<GlobMatcher::Glob> Globs;
 
     RadixTree<iterator_range<StringRef::const_iterator>,
-              SmallVector<const GlobMatcher::Glob *, 1>>
-        PrefixToGlob;
+              RadixTree<iterator_range<StringRef::const_reverse_iterator>,
+                        SmallVector<const GlobMatcher::Glob *, 1>>>
+        PrefixSuffixToGlob;
   };
 
   /// Represents a set of patterns and their line numbers
diff --git a/llvm/lib/Support/SpecialCaseList.cpp b/llvm/lib/Support/SpecialCaseList.cpp
index 2a86cc37b6000..9bd1c199695d1 100644
--- a/llvm/lib/Support/SpecialCaseList.cpp
+++ b/llvm/lib/Support/SpecialCaseList.cpp
@@ -92,8 +92,10 @@ void SpecialCaseList::GlobMatcher::preprocess(bool BySize) {
 
   for (auto &G : Globs) {
     StringRef Prefix = G.Pattern.prefix();
+    StringRef Suffix = G.Pattern.suffix();
 
-    auto &V = PrefixToGlob.emplace(Prefix).first->second;
+    auto &SToGlob = PrefixSuffixToGlob.emplace(Prefix).first->second;
+    auto &V = SToGlob.emplace(reverse(Suffix)).first->second;
     V.emplace_back(&G);
   }
 }
@@ -101,12 +103,14 @@ void SpecialCaseList::GlobMatcher::preprocess(bool BySize) {
 void SpecialCaseList::GlobMatcher::match(
     StringRef Query,
     llvm::function_ref<void(StringRef Rule, unsigned LineNo)> Cb) const {
-  if (!PrefixToGlob.empty()) {
-    for (const auto &[_, V] : PrefixToGlob.find_prefixes(Query)) {
-      for (const auto *G : reverse(V)) {
-        if (G->Pattern.match(Query)) {
-          Cb(G->Name, G->LineNo);
-          break;
+  if (!PrefixSuffixToGlob.empty()) {
+    for (const auto &[_, SToGlob] : PrefixSuffixToGlob.find_prefixes(Query)) {
+      for (const auto &[_, V] : SToGlob.find_prefixes(reverse(Query))) {
+        for (const auto *G : reverse(V)) {
+          if (G->Pattern.match(Query)) {
+            Cb(G->Name, G->LineNo);
+            break;
+          }
         }
       }
     }

Created using spr 1.3.7

vitalybuka added a commit that referenced this pull request Oct 25, 2025
This commit introduces a RadixTree implementation to LLVM.

As a trie, a RadixTree is very efficient at searching for prefixes.

A radix tree is a more compact implementation of a trie: chains of
single-child nodes are merged into a single edge labeled with a string.
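To make the data structure concrete, here is a minimal radix-tree sketch with string-labeled edges, edge splitting on insert, and a prefix query that returns every stored key that is a prefix of the lookup string. The class MiniRadixTree and its emplace/findPrefixes methods are hypothetical; they loosely echo the emplace and find_prefixes calls used in the SpecialCaseList diff but are not the llvm::RadixTree interface.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal radix tree (compressed trie). Each edge carries a
// string label, so chains of single-child nodes collapse into one edge.
// Illustrative only; this is not the llvm::RadixTree interface.
template <typename Value> class MiniRadixTree {
  struct Node {
    // First character of the edge label -> (label, child node).
    std::map<char, std::pair<std::string, std::unique_ptr<Node>>> Children;
    std::optional<Value> Val; // set only if a key ends at this node
  };
  Node Root;

public:
  // Returns the value stored for Key, default-constructing it if absent.
  Value &emplace(std::string Key) {
    Node *N = &Root;
    while (!Key.empty()) {
      auto It = N->Children.find(Key[0]);
      if (It == N->Children.end()) {
        // No edge starts with this character: add one leaf edge for the rest.
        auto Leaf = std::make_unique<Node>();
        Node *LeafPtr = Leaf.get();
        N->Children.emplace(Key[0], std::make_pair(Key, std::move(Leaf)));
        N = LeafPtr;
        break;
      }
      std::string &Label = It->second.first;
      assert(!Label.empty() && Label[0] == Key[0]);
      size_t Common = 0;
      while (Common < Label.size() && Common < Key.size() &&
             Label[Common] == Key[Common])
        ++Common;
      if (Common < Label.size()) {
        // Key diverges inside this edge: split it after the shared part.
        auto Mid = std::make_unique<Node>();
        std::string Rest = Label.substr(Common);
        Mid->Children.emplace(
            Rest[0], std::make_pair(Rest, std::move(It->second.second)));
        Label.resize(Common);
        It->second.second = std::move(Mid);
      }
      N = It->second.second.get();
      Key.erase(0, Common);
    }
    if (!N->Val)
      N->Val.emplace();
    return *N->Val;
  }

  // Values of all stored keys that are prefixes of Query, shortest key first.
  std::vector<const Value *> findPrefixes(const std::string &Query) const {
    std::vector<const Value *> Out;
    const Node *N = &Root;
    size_t Pos = 0;
    while (true) {
      if (N->Val)
        Out.push_back(&*N->Val);
      if (Pos == Query.size())
        break;
      auto It = N->Children.find(Query[Pos]);
      if (It == N->Children.end())
        break;
      const std::string &Label = It->second.first;
      if (Query.compare(Pos, Label.size(), Label) != 0)
        break; // Query ends or diverges inside this edge.
      Pos += Label.size();
      N = It->second.second.get();
    }
    return Out;
  }
};
```

In this sketch, findPrefixes collects all matching keys in one walk from the root, following at most one edge per node; that is the property a prefix-keyed glob index benefits from.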

The tree will be used to optimize Glob matching in SpecialCaseList:
* #164531 
* #164543 
* #164545

---------

Co-authored-by: Kazu Hirata <kazu@google.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 25, 2025
github-actions bot commented Oct 25, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@vitalybuka vitalybuka changed the base branch from users/vitalybuka/spr/main.specialcaselist-filtering-globs-with-matching-prefix-and-suffix to main October 25, 2025 07:22
@vitalybuka vitalybuka requested a review from Copilot October 25, 2025 07:32
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR optimizes the SpecialCaseList::GlobMatcher by introducing a nested RadixTree structure that filters globs by both prefix and suffix, reducing unnecessary glob match attempts and cutting geomean lookup time by ~58% overall and by ~93% on the *suffix and prefix*suffix benchmarks.

Key Changes:

  • Introduced nested RadixTree structure mapping prefixes to suffix-based RadixTrees
  • Modified preprocess() to build the nested structure using both prefix and reversed suffix
  • Updated match() to perform two-level filtering: prefix matching followed by reversed suffix matching

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

  • llvm/lib/Support/SpecialCaseList.cpp: Implements nested RadixTree lookup logic with prefix and suffix matching
  • llvm/include/llvm/Support/SpecialCaseList.h: Updates the data structure from a single RadixTree to a nested RadixTree for prefix-suffix indexing
