Codecov Report
✅ All modified and coverable lines are covered by tests.
Coverage Diff
| Metric | main | #452 | +/- |
|---|---|---|---|
| Coverage | 55.86% | 56.72% | +0.85% |
| Files | 128 | 130 | +2 |
| Lines | 8053 | 8136 | +83 |
| Hits | 4499 | 4615 | +116 |
| Misses | 3182 | 3153 | -29 |
| Partials | 372 | 368 | -4 |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
dcf6cae to 1bddc50
Pull Request Overview
This pull request optimizes domain matching in the domainscrawl postprocessor by introducing an Adaptive Radix Tree (ART) for more efficient subdomain lookups. The optimization maintains O(1) exact domain matches via a map while providing O(k) subdomain matching where k is the domain length, replacing the previous O(n) iteration over all stored domains.
- Replaces map-based domain storage with a hybrid ART + map approach
- Implements efficient subdomain matching using reversed hostnames for prefix searches
- Adds comprehensive test coverage and benchmarks for the new functionality
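One detail of prefix matching on reversed hostnames is worth a small illustration: a raw string prefix is not enough, because `com.google` is also a prefix of `com.googlevideo`. The sketch below is a minimal illustration of a label-boundary check, not the code from this PR; the function name `isSubdomainPrefix` is hypothetical, and whether the PR handles the boundary this way is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// isSubdomainPrefix is a hypothetical helper (not from the PR). It reports
// whether reversedHost equals reversedDomain or extends it at a label
// boundary, so that "com.google" covers "com.google.www" but not
// "com.googlevideo".
func isSubdomainPrefix(reversedHost, reversedDomain string) bool {
	if !strings.HasPrefix(reversedHost, reversedDomain) {
		return false
	}
	rest := reversedHost[len(reversedDomain):]
	// Either an exact match, or the next character starts a new label.
	return rest == "" || rest[0] == '.'
}

func main() {
	fmt.Println(isSubdomainPrefix("com.google.www", "com.google"))  // true
	fmt.Println(isSubdomainPrefix("com.googlevideo", "com.google")) // false
}
```

Without the boundary check, a stored `google.com` would wrongly match `googlevideo.com`.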
Reviewed Changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| internal/pkg/postprocessor/domainscrawl/tree.go | Core ART implementation with domain insertion and matching logic |
| internal/pkg/postprocessor/domainscrawl/reversehost.go | Host reversal utility for efficient prefix matching in the ART |
| internal/pkg/postprocessor/domainscrawl/domainscrawl.go | Updated to use ART instead of map for domain storage and matching |
| internal/pkg/postprocessor/domainscrawl/*_test.go | Comprehensive test coverage for ART, host reversal, and integration |
| e2e/test/domainscrawl/ | End-to-end testing for domains crawl functionality |
| go.mod | Added dependency on go-adaptive-radix-tree library |
NGTmeaty left a comment
Looks good to me. Thanks for the improvements here!

    "strings"
    )

    // reverseHost turns "www.google.com" -> "com.google.www".
when I wrote that, I was like "wait, that seems familiar" 🤣
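For readers without the diff at hand, the comment under review describes a label-wise host reversal. The following is a minimal sketch of that idea, assuming a plain split on dots; `reverseHostSketch` is a hypothetical name, and the actual `reverseHost` in `reversehost.go` may be implemented differently.

```go
package main

import (
	"fmt"
	"strings"
)

// reverseHostSketch reverses the dot-separated labels of a hostname,
// e.g. "www.google.com" -> "com.google.www". It is an illustration only,
// not the PR's reverseHost.
func reverseHostSketch(host string) string {
	labels := strings.Split(host, ".")
	// Swap labels in place from both ends toward the middle.
	for i, j := 0, len(labels)-1; i < j; i, j = i+1, j-1 {
		labels[i], labels[j] = labels[j], labels[i]
	}
	return strings.Join(labels, ".")
}

func main() {
	fmt.Println(reverseHostSketch("www.google.com")) // com.google.www
}
```

Reversing puts the most significant label first, so every subdomain of a stored domain shares that domain's reversed form as a prefix.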
This pull request introduces significant improvements to the `domainscrawl` postprocessor package, focusing on more efficient domain matching. It doesn't touch the URL/regex matching part.
We now use an Adaptive Radix Tree (ART) in addition to the map. The map is still checked first, for O(1) lookup of exact domains. When that misses and we move to the second step, the subdomain check, we now run a prefix-based search on the reversed hostname in the ART, which gives O(k) lookup where k is the domain length.
It also adds simple e2e testing for the domains crawl feature.
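The two-step lookup described above can be sketched roughly as follows. To stay within the standard library, this sketch swaps the ART for a plain trie keyed by reversed hostname labels; the types and names (`matcherSketch`, `add`, `match`) are hypothetical and are not the API of this PR or of go-adaptive-radix-tree.

```go
package main

import (
	"fmt"
	"strings"
)

// node is one hostname label in a trie keyed by reversed labels.
type node struct {
	children map[string]*node
	terminal bool // true if a stored domain ends at this label
}

// matcherSketch pairs an exact-match set with a label trie (a stand-in
// for the ART used in the PR).
type matcherSketch struct {
	exact map[string]struct{}
	root  *node
}

func newMatcherSketch() *matcherSketch {
	return &matcherSketch{
		exact: map[string]struct{}{},
		root:  &node{children: map[string]*node{}},
	}
}

// reversedLabels turns "www.google.com" into ["com", "google", "www"].
func reversedLabels(host string) []string {
	labels := strings.Split(host, ".")
	for i, j := 0, len(labels)-1; i < j; i, j = i+1, j-1 {
		labels[i], labels[j] = labels[j], labels[i]
	}
	return labels
}

// add stores a domain in both the exact-match set and the trie.
func (m *matcherSketch) add(domain string) {
	m.exact[domain] = struct{}{}
	cur := m.root
	for _, label := range reversedLabels(domain) {
		next, ok := cur.children[label]
		if !ok {
			next = &node{children: map[string]*node{}}
			cur.children[label] = next
		}
		cur = next
	}
	cur.terminal = true
}

// match tries the map first, then walks the trie along the reversed host.
// The walk visits at most one node per label of the host.
func (m *matcherSketch) match(host string) bool {
	if _, ok := m.exact[host]; ok {
		return true
	}
	cur := m.root
	for _, label := range reversedLabels(host) {
		next, ok := cur.children[label]
		if !ok {
			return false
		}
		if next.terminal {
			return true // a stored domain is a suffix of host
		}
		cur = next
	}
	return false
}

func main() {
	m := newMatcherSketch()
	m.add("google.com")
	fmt.Println(m.match("google.com"))      // true (exact, via the map)
	fmt.Println(m.match("www.google.com"))  // true (subdomain, via the trie)
	fmt.Println(m.match("googlevideo.com")) // false
}
```

The cost of the second step depends on the length of the host being checked, not on how many domains are stored, which is the property the description attributes to the ART.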