[SpecialCaseList] Add RadixTree for substring matching #164545
Created using spr 1.3.7 [skip ci]
@llvm/pr-subscribers-llvm-support
Author: Vitaly Buka (vitalybuka)
Changes
This commit adds a new RadixTree to SpecialCaseList for handling substring matches.
Full diff: https://github.com/llvm/llvm-project/pull/164545.diff
2 Files Affected:
diff --git a/llvm/include/llvm/Support/SpecialCaseList.h b/llvm/include/llvm/Support/SpecialCaseList.h
index c077f8857c9c8..f66cd6fe733a7 100644
--- a/llvm/include/llvm/Support/SpecialCaseList.h
+++ b/llvm/include/llvm/Support/SpecialCaseList.h
@@ -170,6 +170,10 @@ class SpecialCaseList {
               RadixTree<iterator_range<StringRef::const_iterator>,
                         SmallVector<const GlobMatcher::Glob *, 1>>>
         SuffixPrefixToGlob;
+
+    RadixTree<iterator_range<StringRef::const_iterator>,
+              SmallVector<const GlobMatcher::Glob *, 1>>
+        SubstrToGlob;
   };
 
   /// Represents a set of patterns and their line numbers
diff --git a/llvm/lib/Support/SpecialCaseList.cpp b/llvm/lib/Support/SpecialCaseList.cpp
index 15367afd91e72..37fd5bfad750d 100644
--- a/llvm/lib/Support/SpecialCaseList.cpp
+++ b/llvm/lib/Support/SpecialCaseList.cpp
@@ -94,6 +94,19 @@ void SpecialCaseList::GlobMatcher::preprocess(bool BySize) {
     StringRef Prefix = G.Pattern.prefix();
     StringRef Suffix = G.Pattern.suffix();
 
+    if (Suffix.empty() && Prefix.empty()) {
+      // If both prefix and suffix are empty put into special tree to search by
+      // substring in a middle.
+      StringRef Substr = G.Pattern.longest_substr();
+      if (!Substr.empty()) {
+        // But only if substring is not empty. Searching this tree is more
+        // expensive.
+        auto &V = SubstrToGlob.emplace(Substr).first->second;
+        V.emplace_back(&G);
+        continue;
+      }
+    }
+
     auto &PToGlob = SuffixPrefixToGlob.emplace(reverse(Suffix)).first->second;
     auto &V = PToGlob.emplace(Prefix).first->second;
     V.emplace_back(&G);
@@ -116,6 +129,21 @@ void SpecialCaseList::GlobMatcher::match(
       }
     }
   }
+
+  if (!SubstrToGlob.empty()) {
+    // As we don't know when substring exactly starts, we will try all
+    // possibilities. In most cases search will fail on first characters.
+    for (StringRef Q = Query; !Q.empty(); Q = Q.drop_front()) {
+      for (const auto &[_, V] : SubstrToGlob.find_prefixes(Q)) {
+        for (const auto *G : reverse(V)) {
+          if (G->Pattern.match(Query)) {
+            Cb(G->Name, G->LineNo);
+            break;
+          }
+        }
+      }
+    }
+  }
 }
 
 SpecialCaseList::Matcher::Matcher(bool UseGlobs, bool RemoveDotSlash)
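The lookup added to `GlobMatcher::match` in the diff above can be sketched outside LLVM roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: a sorted `std::map` stands in for `RadixTree`, probing every prefix length stands in for `find_prefixes`, and `globMatch`, `Glob`, and `SubstrIndex` are hypothetical names. The simplified matcher handles only `*` and `?`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Simplified glob matcher: '*' matches any run of characters, '?' matches one.
// Hypothetical stand-in for GlobPattern::match(); classes and braces omitted.
bool globMatch(std::string_view Pat, std::string_view Str) {
  const std::size_t NPos = std::string_view::npos;
  std::size_t P = 0, S = 0, StarP = NPos, StarS = 0;
  while (S < Str.size()) {
    if (P < Pat.size() && Pat[P] == '*') {
      StarP = P++; // remember the star and first try matching it to nothing
      StarS = S;
    } else if (P < Pat.size() && (Pat[P] == '?' || Pat[P] == Str[S])) {
      ++P;
      ++S;
    } else if (StarP != NPos) {
      P = StarP + 1; // backtrack: let the last star absorb one more character
      S = ++StarS;
    } else {
      return false;
    }
  }
  while (P < Pat.size() && Pat[P] == '*')
    ++P;
  return P == Pat.size();
}

struct Glob {
  std::string Pattern; // full glob, e.g. "*foo*bar*"
  std::string Substr;  // literal substring used as the index key
};

class SubstrIndex {
  // Assumption: std::map replaces RadixTree; std::less<> allows string_view
  // lookups without allocating.
  std::map<std::string, std::vector<const Glob *>, std::less<>> SubstrToGlob;

public:
  // The caller keeps G alive for as long as the index is used.
  void add(const Glob &G) {
    assert(!G.Substr.empty() && "only globs with a literal substring");
    SubstrToGlob[G.Substr].push_back(&G);
  }

  // Returns the patterns matching Query. The substring index only filters
  // candidates; the full glob is still checked against the whole query.
  std::vector<std::string> match(std::string_view Query) const {
    std::vector<std::string> Out;
    // The substring may start anywhere, so try every suffix of the query.
    for (std::string_view Q = Query; !Q.empty(); Q.remove_prefix(1)) {
      // A radix tree finds all keys that are prefixes of Q in one walk;
      // this sketch probes each prefix length separately instead.
      for (std::size_t Len = 1; Len <= Q.size(); ++Len) {
        auto It = SubstrToGlob.find(Q.substr(0, Len));
        if (It == SubstrToGlob.end())
          continue;
        for (const Glob *G : It->second)
          if (globMatch(G->Pattern, Query))
            Out.push_back(G->Pattern);
      }
    }
    return Out;
  }
};
```

With `Glob{"*test*", "test"}` added to the index, a query such as `src/test_file.cpp` reaches the key `test` at one suffix and then passes the full glob check. A query that contains the key but fails the full glob is rejected by the second step. In this sketch a pattern whose key occurs several times in the query is reported once per occurrence.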
Finds the longest (almost) plain substring in the pattern.
The implementation is conservative to avoid false positives.
The result is not used to optimize `GlobPattern::match()`, so it is calculated on request.
For
* #164545
---------
Co-authored-by: Luke Lau <luke@igalia.com>
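As a rough illustration of what "longest plain substring" means, here is a minimal sketch. It is not LLVM's `GlobPattern::longest_substr()`: the function name `longestPlainSubstr` is hypothetical, and the sketch assumes that only `*` and `?` are treated as wildcards. It gives up on any pattern containing brackets, braces, or a backslash, because literal text inside those constructs is not guaranteed to appear in every match.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns the longest run of literal characters that
// every string matching Pattern must contain, or an empty view if unsure.
std::string_view longestPlainSubstr(std::string_view Pattern) {
  const std::size_t NPos = std::string_view::npos;
  // Conservative bail-out: text inside [...] or {...}, or after an escape,
  // is not necessarily present in a match, so do not try to analyze it.
  if (Pattern.find_first_of("[]{}\\") != NPos)
    return {};

  std::string_view Best;
  std::size_t Begin = 0;
  while (Begin <= Pattern.size()) {
    // Each run between wildcards is literal text.
    std::size_t End = Pattern.find_first_of("*?", Begin);
    if (End == NPos)
      End = Pattern.size();
    std::string_view Run = Pattern.substr(Begin, End - Begin);
    if (Run.size() > Best.size()) // on ties the earliest run is kept
      Best = Run;
    Begin = End + 1;
  }
  assert(Best.find_first_of("*?") == NPos && "result must be literal");
  return Best;
}
```

For example, `longestPlainSubstr("*foo*barbaz*")` yields `barbaz`, while a bare `*` yields an empty view, which corresponds to the case the PR leaves out of the substring tree.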
nit: "Use RadixTree" in commit message?
This commit introduces a RadixTree implementation to LLVM.
RadixTree, as a trie, is very efficient at searching for prefixes. A radix tree is a more efficient implementation of a trie.
The tree will be used to optimize glob matching in SpecialCaseList:
* #164531
* #164543
* #164545
---------
Co-authored-by: Kazu Hirata <kazu@google.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
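To show why a trie suits prefix search, here is a minimal sketch of a per-character trie with a prefix query. It is not LLVM's `RadixTree`: the class `Trie` and the method `findPrefixes` are hypothetical names, and the structure is deliberately not path-compressed, which is the optimization a radix tree adds on top.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <string_view>
#include <vector>

// Plain trie: one node per character. A radix tree would merge chains of
// single-child nodes into one edge labeled with a whole substring.
class Trie {
  struct Node {
    std::map<char, std::unique_ptr<Node>> Children;
    bool IsKey = false; // true if a stored key ends at this node
  };
  Node Root;

public:
  void insert(std::string_view Key) {
    assert(!Key.empty() && "this sketch stores only non-empty keys");
    Node *N = &Root;
    for (char C : Key) {
      std::unique_ptr<Node> &Child = N->Children[C];
      if (!Child)
        Child = std::make_unique<Node>();
      N = Child.get();
    }
    N->IsKey = true;
  }

  // Returns every stored key that is a prefix of Query, shortest first.
  // The walk stops at the first character with no matching child, so a
  // query that shares nothing with the keys fails after one step.
  std::vector<std::string> findPrefixes(std::string_view Query) const {
    std::vector<std::string> Out;
    const Node *N = &Root;
    for (std::size_t I = 0; I < Query.size(); ++I) {
      auto It = N->Children.find(Query[I]);
      if (It == N->Children.end())
        break;
      N = It->second.get();
      if (N->IsKey)
        Out.emplace_back(Query.substr(0, I + 1));
    }
    return Out;
  }
};
```

After inserting `foo`, `foobar`, and `bar`, the query `foobarbaz` returns `foo` and `foobar` from a single walk down the tree, and the query `fo` returns nothing because no key ends there.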
This commit adds a new RadixTree to `SpecialCaseList` for handling substring matches. Previously, `SpecialCaseList` only supported prefix and suffix matching. With this change, patterns that have neither prefixes nor suffixes can now be efficiently filtered.
According to SpecialCaseListBM:
Lookup benchmarks (significant improvements):
Lookup `*test*`-like benchmarks (huge improvements):
https://gist.github.com/vitalybuka/ee7f681b448eb18974386ab35e2d4d27