@vitalybuka vitalybuka commented Oct 22, 2025

This commit adds a new RadixTree to SpecialCaseList for handling
substring matches. Previously, SpecialCaseList only supported prefix
and suffix matching. With this change, patterns that have neither
prefixes nor suffixes can now be efficiently filtered.

According to SpecialCaseListBM:

Lookup benchmarks (significant improvements):

OVERALL_GEOMEAN                       -0.7809

Lookup *test* like benchmarks (huge improvements):

OVERALL_GEOMEAN                       -0.9947

https://gist.github.com/vitalybuka/ee7f681b448eb18974386ab35e2d4d27

llvmbot commented Oct 22, 2025

@llvm/pr-subscribers-llvm-support

Author: Vitaly Buka (vitalybuka)

Changes

This commit adds a new RadixTree to SpecialCaseList for handling
substring matches. Previously, SpecialCaseList only supported prefix
and suffix matching. With this change, patterns that have neither
prefixes nor suffixes can now be efficiently filtered.


Full diff: https://github.com/llvm/llvm-project/pull/164545.diff

2 Files Affected:

  • (modified) llvm/include/llvm/Support/SpecialCaseList.h (+4)
  • (modified) llvm/lib/Support/SpecialCaseList.cpp (+28)
diff --git a/llvm/include/llvm/Support/SpecialCaseList.h b/llvm/include/llvm/Support/SpecialCaseList.h
index c077f8857c9c8..f66cd6fe733a7 100644
--- a/llvm/include/llvm/Support/SpecialCaseList.h
+++ b/llvm/include/llvm/Support/SpecialCaseList.h
@@ -170,6 +170,10 @@ class SpecialCaseList {
               RadixTree<iterator_range<StringRef::const_iterator>,
                         SmallVector<const GlobMatcher::Glob *, 1>>>
         SuffixPrefixToGlob;
+
+    RadixTree<iterator_range<StringRef::const_iterator>,
+              SmallVector<const GlobMatcher::Glob *, 1>>
+        SubstrToGlob;
   };
 
   /// Represents a set of patterns and their line numbers
diff --git a/llvm/lib/Support/SpecialCaseList.cpp b/llvm/lib/Support/SpecialCaseList.cpp
index 15367afd91e72..37fd5bfad750d 100644
--- a/llvm/lib/Support/SpecialCaseList.cpp
+++ b/llvm/lib/Support/SpecialCaseList.cpp
@@ -94,6 +94,19 @@ void SpecialCaseList::GlobMatcher::preprocess(bool BySize) {
     StringRef Prefix = G.Pattern.prefix();
     StringRef Suffix = G.Pattern.suffix();
 
+    if (Suffix.empty() && Prefix.empty()) {
+      // If both prefix and suffix are empty put into special tree to search by
+      // substring in a middle.
+      StringRef Substr = G.Pattern.longest_substr();
+      if (!Substr.empty()) {
+        // But only if substring is not empty. Searching this tree is more
+        // expensive.
+        auto &V = SubstrToGlob.emplace(Substr).first->second;
+        V.emplace_back(&G);
+        continue;
+      }
+    }
+
     auto &PToGlob = SuffixPrefixToGlob.emplace(reverse(Suffix)).first->second;
     auto &V = PToGlob.emplace(Prefix).first->second;
     V.emplace_back(&G);
@@ -116,6 +129,21 @@ void SpecialCaseList::GlobMatcher::match(
       }
     }
   }
+
+  if (!SubstrToGlob.empty()) {
+    // As we don't know when substring exactly starts, we will try all
+    // possibilities. In most cases search will fail on first characters.
+    for (StringRef Q = Query; !Q.empty(); Q = Q.drop_front()) {
+      for (const auto &[_, V] : SubstrToGlob.find_prefixes(Q)) {
+        for (const auto *G : reverse(V)) {
+          if (G->Pattern.match(Query)) {
+            Cb(G->Name, G->LineNo);
+            break;
+          }
+        }
+      }
+    }
+  }
 }
 
 SpecialCaseList::Matcher::Matcher(bool UseGlobs, bool RemoveDotSlash)
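
For readers without the LLVM tree at hand, the two hunks above can be approximated in standalone C++. This is a minimal sketch, not the patch itself: `SubstrIndex`, `globMatch`, and this `Glob` struct are hypothetical stand-ins, a `std::map` replaces `RadixTree`, the literal substring is passed in by hand instead of coming from `longest_substr()`, and the matcher only understands `*` and `?`.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for GlobMatcher::Glob.
struct Glob {
  std::string Pattern;
  unsigned LineNo;
};

// Minimal glob matcher: '*' matches any run of characters, '?' exactly one.
bool globMatch(std::string_view P, std::string_view S) {
  constexpr std::size_t NPos = std::string_view::npos;
  std::size_t PI = 0, SI = 0, Star = NPos, Mark = 0;
  while (SI < S.size()) {
    if (PI < P.size() && P[PI] == '*') {
      Star = PI++; // remember the star; first try matching it to nothing
      Mark = SI;
    } else if (PI < P.size() && (P[PI] == '?' || P[PI] == S[SI])) {
      ++PI;
      ++SI;
    } else if (Star != NPos) {
      PI = Star + 1; // backtrack: the last star absorbs one more character
      SI = ++Mark;
    } else {
      return false;
    }
  }
  while (PI < P.size() && P[PI] == '*')
    ++PI;
  return PI == P.size();
}

class SubstrIndex {
  // Assumption: an ordered map stands in for the radix tree.
  std::map<std::string, std::vector<const Glob *>> SubstrToGlob;
  std::size_t MaxKeyLen = 0;

public:
  // Mirrors the preprocess() hunk: file the glob under a literal substring.
  // The caller must keep G alive for as long as the index is used.
  void add(const Glob &G, const std::string &Substr) {
    SubstrToGlob[Substr].push_back(&G);
    if (Substr.size() > MaxKeyLen)
      MaxKeyLen = Substr.size();
  }

  // Mirrors the match() hunk: try every start offset, look up keys that are
  // prefixes of the remaining text, then confirm with the full glob.
  std::vector<unsigned> match(std::string_view Query) const {
    std::vector<unsigned> Lines;
    for (std::string_view Q = Query; !Q.empty(); Q.remove_prefix(1)) {
      std::size_t Limit = Q.size() < MaxKeyLen ? Q.size() : MaxKeyLen;
      for (std::size_t Len = 1; Len <= Limit; ++Len) {
        auto It = SubstrToGlob.find(std::string(Q.substr(0, Len)));
        if (It == SubstrToGlob.end())
          continue;
        // Later entries are tried first, like reverse(V) in the patch.
        for (auto G = It->second.rbegin(); G != It->second.rend(); ++G) {
          if (globMatch((*G)->Pattern, Query)) {
            Lines.push_back((*G)->LineNo);
            break;
          }
        }
      }
    }
    return Lines;
  }
};
```

With `*test*` indexed under `test`, a call such as `Index.match("my_test_fn")` reaches the key only at the offset where `test` begins, and the full glob is run once to confirm. A query that contains the key but fails the full pattern is filtered out by that final check.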

@vitalybuka vitalybuka changed the title from "[SubstringTree] Add RadixTree for substring matching" to "[SpecialCaseList] Add RadixTree for substring matching" Oct 22, 2025
Created using spr 1.3.7

[skip ci]
Created using spr 1.3.7
vitalybuka added a commit that referenced this pull request Oct 23, 2025
Finds the longest (almost) plain substring in the pattern.

The implementation is conservative to avoid false positives.

The result is not used to optimize `GlobPattern::match()`, so it is calculated on request.

For
* #164545

---------

Co-authored-by: Luke Lau <luke@igalia.com>
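
To illustrate what "conservative" can mean here, the following is a rough sketch under stated assumptions, not the code from the referenced commit. The function name `longestPlainSubstr` is hypothetical, and the rule is stricter than "almost plain": literal runs are split at `*` and `?`, and the function gives up with an empty result as soon as it sees an escape, a character class, or a brace group, since text inside those constructs is not guaranteed to appear in every match.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Returns the longest run of characters that must appear verbatim in every
// string the pattern matches, or "" when this sketch cannot be sure.
std::string longestPlainSubstr(std::string_view Pattern) {
  std::string Best, Cur;
  // Close the current literal run and keep it if it is the longest so far.
  auto Flush = [&] {
    if (Cur.size() > Best.size())
      Best = Cur;
    Cur.clear();
  };
  for (std::size_t I = 0; I < Pattern.size(); ++I) {
    char C = Pattern[I];
    switch (C) {
    case '*':
    case '?':
      Flush(); // wildcards end a literal run
      break;
    case '\\':
    case '[':
    case ']':
    case '{':
    case '}':
      // Escapes, classes, and alternatives need real parsing; give up
      // rather than risk reporting a substring that is not required.
      return "";
    default:
      Cur += C;
    }
  }
  Flush();
  return Best;
}
```

An empty result simply means the glob cannot be indexed by substring and has to take the slower path, so being overly cautious costs speed but never correctness.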
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 23, 2025
…512)

mikolaj-pirog pushed a commit to mikolaj-pirog/llvm-project that referenced this pull request Oct 23, 2025
fmayer commented Oct 23, 2025

nit: "Use RadixTree" in commit message?

vitalybuka added a commit that referenced this pull request Oct 25, 2025
This commit introduces a RadixTree implementation to LLVM.

RadixTree, as a trie, is very efficient at searching for prefixes.

A radix tree is a more efficient implementation of a trie.

The tree will be used to optimize Glob matching in SpecialCaseList:
* #164531 
* #164543 
* #164545

---------

Co-authored-by: Kazu Hirata <kazu@google.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
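
The prefix-search property mentioned above can be shown with a plain, uncompressed trie. This is only an illustrative sketch: `PrefixTrie` and `findPrefixes` are hypothetical names, not the interface of the RadixTree added by the commit, and a real radix tree would additionally merge chains of single-child nodes into one edge.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical uncompressed trie; one node per character.
class PrefixTrie {
  struct Node {
    std::map<char, std::unique_ptr<Node>> Children;
    bool IsKey = false; // true if a stored key ends at this node
  };
  Node Root;

public:
  void insert(std::string_view Key) {
    Node *N = &Root;
    for (char C : Key) {
      auto &Child = N->Children[C];
      if (!Child)
        Child = std::make_unique<Node>();
      N = Child.get();
    }
    N->IsKey = true;
  }

  // Returns every stored non-empty key that is a prefix of Query, shortest
  // first. A single walk down the tree is enough, and the walk stops at the
  // first character that has no matching child.
  std::vector<std::string> findPrefixes(std::string_view Query) const {
    std::vector<std::string> Out;
    const Node *N = &Root;
    for (std::size_t I = 0; I < Query.size(); ++I) {
      auto It = N->Children.find(Query[I]);
      if (It == N->Children.end())
        break;
      N = It->second.get();
      if (N->IsKey)
        Out.emplace_back(Query.substr(0, I + 1));
    }
    return Out;
  }
};
```

Because the walk ends at the first mismatching character, a lookup that shares no leading characters with any key costs a single child lookup, which is what makes trying many start offsets of a query affordable.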
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 25, 2025
@vitalybuka vitalybuka changed the base branch from users/vitalybuka/spr/main.substringtree-add-radixtree-for-substring-matching to main October 25, 2025 16:50