feat: enhance auto pagination detection logic#976
Conversation
WalkthroughRefactors browser/client pagination detection into a multi-path system (pagination wrapper, nearby elements, full-document, infinite-scroll), adds multilingual text/arrow/load-more patterns, visibility/proximity helpers, and returns explicit actions ( Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@server/src/sdk/browserSide/pageAnalyzer.js`:
- Around line 2015-2032: The scrollToTopPatterns array in pageAnalyzer.js is
missing the accessibility/text selectors present in the TS version; update the
scrollToTopPatterns constant (the array used by the for...of loop that queries
document.querySelector and checks isVisible) to include the aria-label and title
selectors—specifically add selectors like '[aria-label*="scroll to top" i]' and
'[title*="back to top" i]' so the loop will detect those elements as well.
In `@src/helpers/clientPaginationDetector.ts`:
- Around line 339-357: The nav exclusion in shouldSkipElement is too narrow;
update the nav-detection logic (inside shouldSkipElement, where nav is computed)
to treat a nav as pagination if its className or aria-label matches a broader
allowlist (e.g., /\b(paginat(ion|e)|page|page[- ]navigation|prev|next)\b/i) OR
switch to a denylist approach by excluding common non-pagination navs (e.g.,
class/aria values containing 'site', 'main-nav', 'primary', 'header', 'footer',
'breadcrumb', 'sidebar') and only skip when a nav matches a denylist; modify the
conditional that currently uses /\bpaginat(ion|e)\b/i so it uses the new regex
or denylist check on nav.className and nav.getAttribute('aria-label')
accordingly.
- Around line 434-453: The JS implementation of scroll detection is missing two
attribute selectors that the TypeScript version includes; update the
scrollToTopPatterns array (the one used with doc.querySelector in the loop that
checks isVisible) to include '[aria-label*="scroll to top" i]' and
'[title*="back to top" i]' so both JS and TS versions stay in sync and the
for-loop that queries selectors behaves identically.
- Around line 406-432: The selector '[class*="infinite-scroll"]' is duplicated
in sentinelPatterns and infiniteScrollLibraries causing double-counting; update
clientPaginationDetector by removing the duplicate from one of the arrays (or
merge the two arrays into a single set) so that
doc.querySelector('[class*="infinite-scroll"]') only increments score
once—adjust the loops that iterate over sentinelPatterns and
infiniteScrollLibraries (or replace them with a single combinedDedupedPatterns
list) to ensure each unique selector contributes at most one score increment
used by hasStrongInfiniteScrollSignals/score logic.
- Around line 95-104: Anchored regexes are being matched against combinedText
(the concatenation of text, ariaLabel, and title), so start/end-anchored
patterns like those in nextButtonTextPatterns, loadMorePatterns, and
nextButtonArrowPatterns fail when ariaLabel/title are non-empty; update the
detection logic in the block that builds combinedText so anchored patterns are
tested only against the raw text variable (text) while
non-anchored/word-boundary patterns are tested against combinedText—use
matchesAnyPattern (or a new helper) to run anchoredPatterns on text and
unanchoredPatterns on combinedText, then adjust scoring/reasons the same way
(score += 10/8 and reasons.push) so Load More/Next detection works when
ariaLabel/title exist.
🧹 Nitpick comments (3)
src/helpers/clientPaginationDetector.ts (2)
65-74: Dead code:prevButtonTextPatternsandprevButtonArrowPatternsare defined but never used.These two pattern arrays were introduced but no code in this file references them. If prev-button detection was intentionally removed, these declarations should be cleaned up.
🧹 Suggested removal
- const prevButtonTextPatterns = [ - /^\s*prev(ious)?\s*$/i, - /\bprevious\s+page\b/i, - /\bpage\s+précédente\b/i, - ]; - - const prevButtonArrowPatterns = [ - /^[<\s‹←«]+$/, - /^<<$/, - ];
39-63: Significant code duplication withpageAnalyzer.js.Nearly the entire pagination detection logic — patterns,
shouldSkipElement,isNearList,detectInfiniteScrollIndicators, candidate scoring, threshold constants, and decision tree — is duplicated across this TypeScript class and the JS IIFE inpageAnalyzer.js. Any bug fix or tuning change must be applied twice, and the two copies have already diverged slightly (e.g.,scrollToTopPatternsmismatch noted separately).Consider extracting the shared constants (patterns, thresholds) and core scoring logic into a single source of truth that both consumers import or inline from.
Also applies to: 78-173, 175-235, 395-463
server/src/sdk/browserSide/pageAnalyzer.js (1)
1043-1067: Substantial duplication withclientPaginationDetector.ts.All patterns, constants,
shouldSkipElement,isNearList, infinite scroll detection, and the scoring/decision tree are duplicated. This has already caused a drift (scrollToTopPatterns). A shared module or build-time inlining would prevent future divergence.Also applies to: 1076-1091, 1847-1875, 1976-2042, 2047-2103
| const combinedText = `${text} ${ariaLabel} ${title}`; | ||
| if (this.matchesAnyPattern(combinedText, nextButtonTextPatterns)) { | ||
| score += 10; | ||
| reasons.push('text match (+10)'); | ||
| } else if (text.length <= 3 && this.matchesAnyPattern(text, nextButtonArrowPatterns)) { | ||
| score += 8; | ||
| reasons.push('arrow match (+8)'); | ||
| } | ||
|
|
||
| if (this.isNearList(element, listContainer)) { | ||
| if (score === 0) continue; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check loadMorePatterns usage
rg -n 'loadMorePatterns' src/helpers/clientPaginationDetector.ts
rg -n 'loadMorePatterns' server/src/sdk/browserSide/pageAnalyzer.jsRepository: getmaxun/maxun
Length of output: 275
🏁 Script executed:
# Also check for pattern definitions to understand if they're anchored
rg -n 'nextButtonTextPatterns\|nextButtonArrowPatterns\|loadMorePatterns' -A 5 src/helpers/clientPaginationDetector.ts | head -100Repository: getmaxun/maxun
Length of output: 40
🏁 Script executed:
# Get the full context of the file structure
wc -l src/helpers/clientPaginationDetector.tsRepository: getmaxun/maxun
Length of output: 101
🏁 Script executed:
# Get pattern definitions
sed -n '50,80p' src/helpers/clientPaginationDetector.tsRepository: getmaxun/maxun
Length of output: 859
🏁 Script executed:
# Get the context around lines 95-104
sed -n '85,115p' src/helpers/clientPaginationDetector.tsRepository: getmaxun/maxun
Length of output: 1064
🏁 Script executed:
# Find and display the matchesAnyPattern method
rg -n 'matchesAnyPattern' src/helpers/clientPaginationDetector.ts -A 10 | head -50Repository: getmaxun/maxun
Length of output: 1355
🏁 Script executed:
# Get nextButtonTextPatterns and nextButtonArrowPatterns definitions
rg -n 'nextButtonTextPatterns\|nextButtonArrowPatterns' -B 2 -A 8 src/helpers/clientPaginationDetector.ts | head -80Repository: getmaxun/maxun
Length of output: 40
🏁 Script executed:
# Get nextButtonTextPatterns definition specifically
rg -n 'const nextButtonTextPatterns' -A 15 src/helpers/clientPaginationDetector.tsRepository: getmaxun/maxun
Length of output: 475
🏁 Script executed:
# Let's also check if there's a nextButtonArrowPatterns definition
rg -n 'const nextButtonArrowPatterns' -A 10 src/helpers/clientPaginationDetector.tsRepository: getmaxun/maxun
Length of output: 375
Anchored patterns fail when ariaLabel or title are non-empty in combinedText.
combinedText concatenates text, ariaLabel, and title with spaces. Patterns with start/end anchors (like /^\s*next\s*$/i, /^\s*load\s+more\s*$/i) cannot match when ariaLabel or title are non-empty, because the combined string will have trailing content after the expected text.
Specifically:
- In
nextButtonTextPatterns: the first pattern/^\s*next\s*$/i(line 42) is anchored and will fail; only the word-boundary patterns (lines 43–47) will match incombinedText. - All
loadMorePatterns(lines 55–62) are anchored, so a "Load More" button with a non-emptyaria-labelortitlewill score 0 despite matching the text itself. - The same issue exists with
nextButtonArrowPatterns(lines 50–51).
Consider either:
- Testing anchored patterns against
textonly, and word-boundary patterns againstcombinedText, or - Removing the anchors from patterns that need to match inside
combinedText.
🤖 Prompt for AI Agents
In `@src/helpers/clientPaginationDetector.ts` around lines 95 - 104, Anchored
regexes are being matched against combinedText (the concatenation of text,
ariaLabel, and title), so start/end-anchored patterns like those in
nextButtonTextPatterns, loadMorePatterns, and nextButtonArrowPatterns fail when
ariaLabel/title are non-empty; update the detection logic in the block that
builds combinedText so anchored patterns are tested only against the raw text
variable (text) while non-anchored/word-boundary patterns are tested against
combinedText—use matchesAnyPattern (or a new helper) to run anchoredPatterns on
text and unanchoredPatterns on combinedText, then adjust scoring/reasons the
same way (score += 10/8 and reasons.push) so Load More/Next detection works when
ariaLabel/title exist.
| const sentinelPatterns = [ | ||
| '[class*="sentinel"]', | ||
| '[class*="trigger"]', | ||
| '[data-infinite]', | ||
| '[data-scroll-trigger]', | ||
| '#infinite-scroll-trigger', | ||
| '[class*="infinite"]', | ||
| '[id*="infinite"]' | ||
| '[class*="infinite-scroll"]', | ||
| '[id*="infinite-scroll"]', | ||
| ]; | ||
|
|
||
| for (const selector of sentinelPatterns) { | ||
| if (doc.querySelector(selector)) { | ||
| score += 4; | ||
| indicators.push(`Sentinel element: ${selector} (+4)`); | ||
| score += 6; | ||
| break; | ||
| } | ||
| } | ||
|
|
||
| const scrollToTopPatterns = [ | ||
| '[class*="scroll"][class*="top"]', | ||
| '[aria-label*="scroll to top"]', | ||
| '[title*="back to top"]', | ||
| '.back-to-top', | ||
| '#back-to-top', | ||
| '[class*="scrolltop"]', | ||
| '[class*="backtotop"]', | ||
| 'button[class*="top"]', | ||
| 'a[href="#top"]', | ||
| 'a[href="#"]' | ||
| ]; | ||
|
|
||
| for (const selector of scrollToTopPatterns) { | ||
| const element = doc.querySelector(selector); | ||
| if (element && this.isVisible(element as HTMLElement)) { | ||
| score += 2; | ||
| indicators.push(`Scroll-to-top button (+2)`); | ||
| break; | ||
| } | ||
| } | ||
|
|
||
| if (initialHeight > viewportHeight * 3) { | ||
| score += 3; | ||
| indicators.push(`Very tall page (${(initialHeight / viewportHeight).toFixed(1)}x viewport) (+3)`); | ||
| } else if (initialHeight > viewportHeight * 2) { | ||
| score += 2; | ||
| indicators.push(`Tall page (${(initialHeight / viewportHeight).toFixed(1)}x viewport) (+2)`); | ||
| } | ||
|
|
||
| if (initialItemCount >= 20) { | ||
| score += 2; | ||
| indicators.push(`Many list items (${initialItemCount}) (+2)`); | ||
| } else if (initialItemCount >= 10) { | ||
| score += 1; | ||
| indicators.push(`Good number of list items (${initialItemCount}) (+1)`); | ||
| } | ||
|
|
||
| const infiniteScrollLibraries = [ | ||
| '.infinite-scroll', | ||
| '[data-infinite-scroll]', | ||
| '[data-flickity]', | ||
| '[data-slick]', | ||
| '.masonry', | ||
| '[data-masonry]', | ||
| '[class*="infinite-scroll"]', | ||
| '[class*="lazy-load"]', | ||
| '[data-lazy]' | ||
| ]; | ||
|
|
||
| for (const selector of infiniteScrollLibraries) { | ||
| if (doc.querySelector(selector)) { | ||
| score += 4; | ||
| indicators.push(`Infinite scroll library: ${selector} (+4)`); | ||
| score += 6; | ||
| break; | ||
| } | ||
| } |
There was a problem hiding this comment.
Double-counting: sentinelPatterns and infiniteScrollLibraries both match [class*="infinite-scroll"].
If a page has an element with class="infinite-scroll-component", both loops match and the score jumps by 12 (6+6), which meets the hasStrongInfiniteScrollSignals threshold on its own. This single CSS class match would cause the detector to return scrollDown with medium confidence — potentially a false positive for pages that simply use an infinite-scroll library wrapper around a paginated list.
Either deduplicate the selector or treat the two groups as a single check.
♻️ Suggested deduplication
const sentinelPatterns = [
'[data-infinite]',
'[data-scroll-trigger]',
'#infinite-scroll-trigger',
- '[class*="infinite-scroll"]',
- '[id*="infinite-scroll"]',
- ];
-
- for (const selector of sentinelPatterns) {
- if (doc.querySelector(selector)) {
- score += 6;
- break;
- }
- }
-
- const infiniteScrollLibraries = [
- '.infinite-scroll',
- '[data-infinite-scroll]',
'[class*="infinite-scroll"]',
+ '[id*="infinite-scroll"]',
+ '.infinite-scroll',
+ '[data-infinite-scroll]',
];
- for (const selector of infiniteScrollLibraries) {
- if (doc.querySelector(selector)) {
+ for (const selector of sentinelPatterns) {
+ if (doc.querySelector(selector)) {
score += 6;
break;
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const sentinelPatterns = [ | |
| '[class*="sentinel"]', | |
| '[class*="trigger"]', | |
| '[data-infinite]', | |
| '[data-scroll-trigger]', | |
| '#infinite-scroll-trigger', | |
| '[class*="infinite"]', | |
| '[id*="infinite"]' | |
| '[class*="infinite-scroll"]', | |
| '[id*="infinite-scroll"]', | |
| ]; | |
| for (const selector of sentinelPatterns) { | |
| if (doc.querySelector(selector)) { | |
| score += 4; | |
| indicators.push(`Sentinel element: ${selector} (+4)`); | |
| score += 6; | |
| break; | |
| } | |
| } | |
| const scrollToTopPatterns = [ | |
| '[class*="scroll"][class*="top"]', | |
| '[aria-label*="scroll to top"]', | |
| '[title*="back to top"]', | |
| '.back-to-top', | |
| '#back-to-top', | |
| '[class*="scrolltop"]', | |
| '[class*="backtotop"]', | |
| 'button[class*="top"]', | |
| 'a[href="#top"]', | |
| 'a[href="#"]' | |
| ]; | |
| for (const selector of scrollToTopPatterns) { | |
| const element = doc.querySelector(selector); | |
| if (element && this.isVisible(element as HTMLElement)) { | |
| score += 2; | |
| indicators.push(`Scroll-to-top button (+2)`); | |
| break; | |
| } | |
| } | |
| if (initialHeight > viewportHeight * 3) { | |
| score += 3; | |
| indicators.push(`Very tall page (${(initialHeight / viewportHeight).toFixed(1)}x viewport) (+3)`); | |
| } else if (initialHeight > viewportHeight * 2) { | |
| score += 2; | |
| indicators.push(`Tall page (${(initialHeight / viewportHeight).toFixed(1)}x viewport) (+2)`); | |
| } | |
| if (initialItemCount >= 20) { | |
| score += 2; | |
| indicators.push(`Many list items (${initialItemCount}) (+2)`); | |
| } else if (initialItemCount >= 10) { | |
| score += 1; | |
| indicators.push(`Good number of list items (${initialItemCount}) (+1)`); | |
| } | |
| const infiniteScrollLibraries = [ | |
| '.infinite-scroll', | |
| '[data-infinite-scroll]', | |
| '[data-flickity]', | |
| '[data-slick]', | |
| '.masonry', | |
| '[data-masonry]', | |
| '[class*="infinite-scroll"]', | |
| '[class*="lazy-load"]', | |
| '[data-lazy]' | |
| ]; | |
| for (const selector of infiniteScrollLibraries) { | |
| if (doc.querySelector(selector)) { | |
| score += 4; | |
| indicators.push(`Infinite scroll library: ${selector} (+4)`); | |
| score += 6; | |
| break; | |
| } | |
| } | |
| const sentinelPatterns = [ | |
| '[data-infinite]', | |
| '[data-scroll-trigger]', | |
| '#infinite-scroll-trigger', | |
| '[class*="infinite-scroll"]', | |
| '[id*="infinite-scroll"]', | |
| '.infinite-scroll', | |
| '[data-infinite-scroll]', | |
| ]; | |
| for (const selector of sentinelPatterns) { | |
| if (doc.querySelector(selector)) { | |
| score += 6; | |
| break; | |
| } | |
| } |
🤖 Prompt for AI Agents
In `@src/helpers/clientPaginationDetector.ts` around lines 406 - 432, The selector
'[class*="infinite-scroll"]' is duplicated in sentinelPatterns and
infiniteScrollLibraries causing double-counting; update clientPaginationDetector
by removing the duplicate from one of the arrays (or merge the two arrays into a
single set) so that doc.querySelector('[class*="infinite-scroll"]') only
increments score once—adjust the loops that iterate over sentinelPatterns and
infiniteScrollLibraries (or replace them with a single combinedDedupedPatterns
list) to ensure each unique selector contributes at most one score increment
used by hasStrongInfiniteScrollSignals/score logic.
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@server/src/sdk/browserSide/pageAnalyzer.js`:
- Around line 2029-2037: In detectFromNearbyElements, the element.closest(...)
call that awards the +5 pagination-ancestor bonus only checks class selectors;
update that closest call to also include the [aria-label*="paginat" i] attribute
selector (i.e., add , [aria-label*="paginat" i] alongside [class*="paginat"],
[class*="pager"]) so elements inside containers identified by aria-label receive
the same +5 bonus; keep the existing try/catch and scoring logic unchanged.
- Around line 1871-1872: The getListContainer function currently returns
listElements[0] when listElements.length === 0 (producing undefined); change the
conditional so empty input returns a clear value (e.g., null) and otherwise
return the first element — i.e., in getListContainer replace the faulty branch
so when listElements.length === 0 it returns null (or another explicit sentinel)
and otherwise returns listElements[0], ensuring a stable, non-latent NPE-safe
return for callers.
In `@src/helpers/clientPaginationDetector.ts`:
- Line 278: The ternary assigning confidence is dead because both branches
return 'high'; update the expression in clientPaginationDetector.ts where
confidence is set (variable name: confidence, condition: hasNumberedPages) so
that the false branch returns 'medium' instead of 'high' (i.e., make the ternary
yield 'high' when hasNumberedPages is true and 'medium' when false) to reflect
differing confidence levels.
- Around line 116-118: getListContainer currently returns listElements[0] even
when listElements is empty which yields undefined typed as HTMLElement; change
getListContainer to return HTMLElement | null and update its implementation to
return null when listElements.length === 0 (otherwise return listElements[0]),
then update any callers of getListContainer to handle a possible null return
(e.g., guard before dereferencing the result or bail out) so you no longer risk
an NPE; reference: getListContainer(listElements: HTMLElement[]), and ensure
callers that previously assumed an HTMLElement now check for null.
🧹 Nitpick comments (2)
src/helpers/clientPaginationDetector.ts (1)
231-267: Significant code duplication across the three detection phases.
detectFromPaginationWrapper,detectFromNearbyElements, anddetectFromFullDocumentshare nearly identical loops for candidate scoring (text extraction, pattern matching, score accumulation). Consider extracting a sharedscoreCandidates(clickables, listContainer, options)helper to reduce duplication and risk of the phases drifting out of sync.Also applies to: 332-382, 416-460
server/src/sdk/browserSide/pageAnalyzer.js (1)
1940-2151: Same code duplication concern as the TS file — three detection phase functions share near-identical candidate scoring loops.Consider extracting the shared text-extraction + pattern-matching + scoring logic into a reusable helper to keep the JS and TS implementations maintainable and consistent.
| function getListContainer(listElements) { | ||
| if (listElements.length === 0) return listElements[0]; |
There was a problem hiding this comment.
Same empty-array bug as the TS file: getListContainer returns undefined for empty input.
Line 1872: listElements[0] is undefined when the array is empty. The caller guards against this, but this is a latent NPE if the guard is ever removed.
🐛 Proposed fix
function getListContainer(listElements) {
- if (listElements.length === 0) return listElements[0];
+ if (listElements.length === 0) return null;
var firstParent = listElements[0].parentElement;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| function getListContainer(listElements) { | |
| if (listElements.length === 0) return listElements[0]; | |
| function getListContainer(listElements) { | |
| if (listElements.length === 0) return null; | |
| var firstParent = listElements[0].parentElement; |
🤖 Prompt for AI Agents
In `@server/src/sdk/browserSide/pageAnalyzer.js` around lines 1871 - 1872, The
getListContainer function currently returns listElements[0] when
listElements.length === 0 (producing undefined); change the conditional so empty
input returns a clear value (e.g., null) and otherwise return the first element
— i.e., in getListContainer replace the faulty branch so when
listElements.length === 0 it returns null (or another explicit sentinel) and
otherwise returns listElements[0], ensuring a stable, non-latent NPE-safe return
for callers.
| private getListContainer(listElements: HTMLElement[]): HTMLElement { | ||
| if (listElements.length === 0) return listElements[0]; | ||
|
|
There was a problem hiding this comment.
getListContainer returns undefined when listElements is empty.
Line 117: listElements[0] is undefined when the array is empty. While the caller guards against this at line 61, this method has a latent bug — accessing an empty array and returning undefined typed as HTMLElement will cause NPEs downstream if this guard is ever bypassed.
🐛 Proposed fix
private getListContainer(listElements: HTMLElement[]): HTMLElement {
- if (listElements.length === 0) return listElements[0];
+ if (listElements.length === 0) {
+ throw new Error('getListContainer called with empty listElements');
+ }🤖 Prompt for AI Agents
In `@src/helpers/clientPaginationDetector.ts` around lines 116 - 118,
getListContainer currently returns listElements[0] even when listElements is
empty which yields undefined typed as HTMLElement; change getListContainer to
return HTMLElement | null and update its implementation to return null when
listElements.length === 0 (otherwise return listElements[0]), then update any
callers of getListContainer to handle a possible null return (e.g., guard before
dereferencing the result or bail out) so you no longer risk an NPE; reference:
getListContainer(listElements: HTMLElement[]), and ensure callers that
previously assumed an HTMLElement now check for null.
| } | ||
| if (nextButton) { | ||
| const selector = this.generateSelectorsForElement(nextButton, doc, selectorGenerator); | ||
| const confidence = hasNumberedPages ? 'high' : 'high'; |
There was a problem hiding this comment.
Dead ternary — both branches return 'high'.
hasNumberedPages ? 'high' : 'high' always evaluates to 'high'. If the intent was to differentiate confidence based on numbered pages, one branch should be 'medium'.
🐛 Proposed fix
- const confidence = hasNumberedPages ? 'high' : 'high';
+ const confidence = hasNumberedPages ? 'high' : 'medium';📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const confidence = hasNumberedPages ? 'high' : 'high'; | |
| const confidence = hasNumberedPages ? 'high' : 'medium'; |
🤖 Prompt for AI Agents
In `@src/helpers/clientPaginationDetector.ts` at line 278, The ternary assigning
confidence is dead because both branches return 'high'; update the expression in
clientPaginationDetector.ts where confidence is set (variable name: confidence,
condition: hasNumberedPages) so that the false branch returns 'medium' instead
of 'high' (i.e., make the ternary yield 'high' when hasNumberedPages is true and
'medium' when false) to reflect differing confidence levels.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@server/src/sdk/browserSide/pageAnalyzer.js`:
- Around line 2061-2100: detectInfiniteScrollScore currently double-counts the
same selector because sentinelPatterns and infiniteScrollLibraries both include
'[class*="infinite-scroll"]', inflating score; update detectInfiniteScrollScore
to deduplicate selectors (e.g., merge sentinelPatterns and
infiniteScrollLibraries into a single array or filter out duplicates before
querying) so each unique selector (from
sentinelPatterns/infiniteScrollLibraries) only adds to score once, keeping the
rest of the logic (score increments, scrollToTopPatterns checks,
initialHeight/viewportHeight checks) unchanged.
- Around line 1085-1096: In getClickableElements, avoid using
clickables.push.apply(clickables, Array.from(elements)) because spreading
thousands of nodes can throw a RangeError; instead concatenate or iterate to
append elements safely (e.g., use clickables =
clickables.concat(Array.from(elements)) or loop over elements and push
individually), keep the existing de-duplication via Array.from(new
Set(clickables)), and ensure the branch that adds root (when root !== document
and root.tagName/role indicates a button/link) still pushes the single root
element.
🧹 Nitpick comments (2)
server/src/sdk/browserSide/pageAnalyzer.js (2)
1953-1977: First-match-wins: scoring indetectFromPaginationWrapperdoesn't differentiate between candidates.Both
nextScoreandlmScoreare unconditionally set to 20, and the guardif (20 > lmScore)/if (20 > nextScore)prevents any subsequent match from replacing the first one. Unlike the other phase functions (detectFromNearbyElements,detectFromFullDocument) where scores vary based on element properties (tag, class, ancestor), here every candidate gets an identical fixed score.If the pagination wrapper contains multiple clickable elements matching
nextButtonTextPatterns(e.g., a disabled-looking "Next" inside an inner div and the real "Next" button), the first one encountered in DOM order wins regardless of quality. Consider adding differentiating signals (e.g., tag type, proximity to active page) as done in the other phases, or at minimum use>=so later (potentially better-positioned) elements can replace earlier ones.
2000-2001:getClickableElements(document)is called twice — once per phase — re-querying the entire DOM.Both
detectFromNearbyElements(line 2001) anddetectFromFullDocument(line 2103) independently callgetClickableElements(document), which runs sixquerySelectorAllcalls each time across the full document. On large pages this is wasteful. Consider computing the list once in the orchestration block and passing it as a parameter.Also applies to: 2102-2103
| function getClickableElements(root) { | ||
| var clickables = []; | ||
| var selectors = ['button', 'a', '[role="button"]', '[onclick]', '.btn', '.button']; | ||
| for (var i = 0; i < selectors.length; i++) { | ||
| var elements = root.querySelectorAll(selectors[i]); | ||
| clickables.push.apply(clickables, Array.from(elements)); | ||
| } | ||
| if (root !== document && (root.tagName === 'BUTTON' || root.tagName === 'A' || root.getAttribute('role') === 'button')) { | ||
| clickables.push(root); | ||
| } | ||
| return Array.from(new Set(clickables)); | ||
| } |
There was a problem hiding this comment.
push.apply can throw a RangeError on pages with many clickable elements.
clickables.push.apply(clickables, Array.from(elements)) spreads all elements as individual arguments. If a single querySelectorAll returns thousands of nodes (common for a tags on link-heavy pages), this can exceed the engine's maximum call-stack size. Use concat or a loop instead.
♻️ Suggested fix
function getClickableElements(root) {
var clickables = [];
var selectors = ['button', 'a', '[role="button"]', '[onclick]', '.btn', '.button'];
for (var i = 0; i < selectors.length; i++) {
var elements = root.querySelectorAll(selectors[i]);
- clickables.push.apply(clickables, Array.from(elements));
+ for (var j = 0; j < elements.length; j++) {
+ clickables.push(elements[j]);
+ }
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| function getClickableElements(root) { | |
| var clickables = []; | |
| var selectors = ['button', 'a', '[role="button"]', '[onclick]', '.btn', '.button']; | |
| for (var i = 0; i < selectors.length; i++) { | |
| var elements = root.querySelectorAll(selectors[i]); | |
| clickables.push.apply(clickables, Array.from(elements)); | |
| } | |
| if (root !== document && (root.tagName === 'BUTTON' || root.tagName === 'A' || root.getAttribute('role') === 'button')) { | |
| clickables.push(root); | |
| } | |
| return Array.from(new Set(clickables)); | |
| } | |
| function getClickableElements(root) { | |
| var clickables = []; | |
| var selectors = ['button', 'a', '[role="button"]', '[onclick]', '.btn', '.button']; | |
| for (var i = 0; i < selectors.length; i++) { | |
| var elements = root.querySelectorAll(selectors[i]); | |
| for (var j = 0; j < elements.length; j++) { | |
| clickables.push(elements[j]); | |
| } | |
| } | |
| if (root !== document && (root.tagName === 'BUTTON' || root.tagName === 'A' || root.getAttribute('role') === 'button')) { | |
| clickables.push(root); | |
| } | |
| return Array.from(new Set(clickables)); | |
| } |
🤖 Prompt for AI Agents
In `@server/src/sdk/browserSide/pageAnalyzer.js` around lines 1085 - 1096, In
getClickableElements, avoid using clickables.push.apply(clickables,
Array.from(elements)) because spreading thousands of nodes can throw a
RangeError; instead concatenate or iterate to append elements safely (e.g., use
clickables = clickables.concat(Array.from(elements)) or loop over elements and
push individually), keep the existing de-duplication via Array.from(new
Set(clickables)), and ensure the branch that adds root (when root !== document
and root.tagName/role indicates a button/link) still pushes the single root
element.
| function detectInfiniteScrollScore() { | ||
| try { | ||
| const debugInfo = { | ||
| indicators: [], | ||
| score: 0, | ||
| threshold: 5 | ||
| }; | ||
|
|
||
| const initialItemCount = listElements.length; | ||
| const initialHeight = document.documentElement.scrollHeight; | ||
| const viewportHeight = window.innerHeight; | ||
| const currentScrollY = window.scrollY; | ||
|
|
||
| if (initialHeight <= viewportHeight) { | ||
| return 0; | ||
| } | ||
|
|
||
| const loadingIndicators = [ | ||
| '[class*="loading"]', | ||
| '[class*="spinner"]', | ||
| '[class*="skeleton"]', | ||
| '[aria-busy="true"]', | ||
| '[data-loading="true"]', | ||
| '.loader', | ||
| '.load-more-spinner', | ||
| '[class*="load"]', | ||
| '[id*="loading"]', | ||
| '[id*="spinner"]' | ||
| ]; | ||
| var score = 0; | ||
| var initialHeight = document.documentElement.scrollHeight; | ||
| var viewportHeight = window.innerHeight; | ||
|
|
||
| for (const selector of loadingIndicators) { | ||
| if (document.querySelector(selector)) { | ||
| debugInfo.score += 3; | ||
| debugInfo.indicators.push(`Loading indicator: ${selector} (+3)`); | ||
| break; | ||
| } | ||
| } | ||
| if (initialHeight <= viewportHeight) return 0; | ||
|
|
||
| const sentinelPatterns = [ | ||
| '[class*="sentinel"]', | ||
| '[class*="trigger"]', | ||
| '[data-infinite]', | ||
| '[data-scroll-trigger]', | ||
| '#infinite-scroll-trigger', | ||
| '[class*="infinite"]', | ||
| '[id*="infinite"]' | ||
| var sentinelPatterns = [ | ||
| '[data-infinite]', '[data-scroll-trigger]', | ||
| '#infinite-scroll-trigger', '[class*="infinite-scroll"]', '[id*="infinite-scroll"]', | ||
| ]; | ||
|
|
||
| for (const selector of sentinelPatterns) { | ||
| if (document.querySelector(selector)) { | ||
| debugInfo.score += 4; | ||
| debugInfo.indicators.push(`Sentinel element: ${selector} (+4)`); | ||
| break; | ||
| } | ||
| for (var i = 0; i < sentinelPatterns.length; i++) { | ||
| if (document.querySelector(sentinelPatterns[i])) { score += 6; break; } | ||
| } | ||
|
|
||
| const scrollToTopPatterns = [ | ||
| '[class*="scroll"][class*="top"]', | ||
| '[aria-label*="scroll to top"]', | ||
| '[title*="back to top"]', | ||
| '.back-to-top', | ||
| '#back-to-top', | ||
| '[class*="scrolltop"]', | ||
| '[class*="backtotop"]', | ||
| 'button[class*="top"]', | ||
| 'a[href="#top"]', | ||
| 'a[href="#"]' | ||
| var infiniteScrollLibraries = [ | ||
| '.infinite-scroll', '[data-infinite-scroll]', '[class*="infinite-scroll"]', | ||
| ]; | ||
|
|
||
| for (const selector of scrollToTopPatterns) { | ||
| const element = document.querySelector(selector); | ||
| if (element && isVisible(element)) { | ||
| debugInfo.score += 2; | ||
| debugInfo.indicators.push('Scroll-to-top button (+2)'); | ||
| break; | ||
| } | ||
| for (var j = 0; j < infiniteScrollLibraries.length; j++) { | ||
| if (document.querySelector(infiniteScrollLibraries[j])) { score += 6; break; } | ||
| } | ||
|
|
||
| if (initialHeight > viewportHeight * 3) { | ||
| debugInfo.score += 3; | ||
| debugInfo.indicators.push(`Very tall page (${(initialHeight / viewportHeight).toFixed(1)}x viewport) (+3)`); | ||
| } else if (initialHeight > viewportHeight * 2) { | ||
| debugInfo.score += 2; | ||
| debugInfo.indicators.push(`Tall page (${(initialHeight / viewportHeight).toFixed(1)}x viewport) (+2)`); | ||
| } | ||
|
|
||
| if (initialItemCount >= 20) { | ||
| debugInfo.score += 2; | ||
| debugInfo.indicators.push(`Many list items (${initialItemCount}) (+2)`); | ||
| } else if (initialItemCount >= 10) { | ||
| debugInfo.score += 1; | ||
| debugInfo.indicators.push(`Good number of list items (${initialItemCount}) (+1)`); | ||
| } | ||
|
|
||
| const infiniteScrollLibraries = [ | ||
| '.infinite-scroll', | ||
| '[data-infinite-scroll]', | ||
| '[data-flickity]', | ||
| '[data-slick]', | ||
| '.masonry', | ||
| '[data-masonry]', | ||
| '[class*="infinite-scroll"]', | ||
| '[class*="lazy-load"]', | ||
| '[data-lazy]' | ||
| var scrollToTopPatterns = [ | ||
| '[aria-label*="scroll to top" i]', '[title*="back to top" i]', | ||
| '.back-to-top', '#back-to-top', '[class*="scrolltop"]', '[class*="backtotop"]', | ||
| ]; | ||
|
|
||
| for (const selector of infiniteScrollLibraries) { | ||
| if (document.querySelector(selector)) { | ||
| debugInfo.score += 4; | ||
| debugInfo.indicators.push(`Infinite scroll library: ${selector} (+4)`); | ||
| break; | ||
| } | ||
| for (var k = 0; k < scrollToTopPatterns.length; k++) { | ||
| try { | ||
| var el = document.querySelector(scrollToTopPatterns[k]); | ||
| if (el && isVisible(el)) { score += 2; break; } | ||
| } catch (e) { continue; } | ||
| } | ||
|
|
||
| const lastListItem = listElements[listElements.length - 1]; | ||
| if (lastListItem) { | ||
| const lastItemRect = lastListItem.getBoundingClientRect(); | ||
| const lastItemY = lastItemRect.bottom + currentScrollY; | ||
| const viewportBottom = currentScrollY + viewportHeight; | ||
| if (initialHeight > viewportHeight * 5) score += 2; | ||
| return score; | ||
| } catch (e) { | ||
| return 0; | ||
| } | ||
| } |
There was a problem hiding this comment.
detectInfiniteScrollScore: sentinel and library patterns overlap, inflating score.
sentinelPatterns includes [class*="infinite-scroll"] and [id*="infinite-scroll"] (line 2071), while infiniteScrollLibraries also includes [class*="infinite-scroll"] (line 2078). A single element matching [class*="infinite-scroll"] would contribute +6 from each loop (total +12), which could push the score past the threshold of 8 on this overlap alone — potentially misidentifying pages that simply have an "infinite-scroll" class for styling purposes.
Consider deduplicating the selectors or merging the two loops into one so a single element doesn't double-count.
🤖 Prompt for AI Agents
In `@server/src/sdk/browserSide/pageAnalyzer.js` around lines 2061 - 2100,
detectInfiniteScrollScore currently double-counts the same selector because
sentinelPatterns and infiniteScrollLibraries both include
'[class*="infinite-scroll"]', inflating score; update detectInfiniteScrollScore
to deduplicate selectors (e.g., merge sentinelPatterns and
infiniteScrollLibraries into a single array or filter out duplicates before
querying) so each unique selector (from
sentinelPatterns/infiniteScrollLibraries) only adds to score once, keeping the
rest of the logic (score increments, scrollToTopPatterns checks,
initialHeight/viewportHeight checks) unchanged.
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/components/recorder/RightSidePanel.tsx (2)
636-640:⚠️ Potential issue | 🟡 MinorSocket listener cleanup relies on a 5-second timeout — fragile on slow connections.
If
paginationScrollTestResultarrives after 5 seconds, the handler has already been removed and the result is silently lost. The user sees no feedback andautoDetectedPaginationstaysnull. Consider removing the arbitrary timeout and instead tying cleanup to component unmount or an explicit cancel action.
460-712:⚠️ Potential issue | 🟠 Major
handleConfirmListCaptureis very large and has a stale-closure risk withautoDetectedPagination.The
useCallbackat Line 460 does not listautoDetectedPaginationin its dependency array (Line 712). The check at Line 649 (autoDetectedPagination && ...) will therefore use a stale value. If auto-detection completes after the first render of this callback, the guard will incorrectly evaluate tofalse, andstartPaginationMode()will be called even though auto-detection succeeded.Add
autoDetectedPaginationto the dependency array:- }, [captureStage, paginationType, limitType, customLimit, startPaginationMode, setShowPaginationOptions, setCaptureStage, notify, stopPaginationMode, startLimitMode, setShowLimitOptions, stopLimitMode, stopCaptureAndEmitGetListSettings, t, browserSteps, currentListActionId, updateListStepLimit]); + }, [captureStage, paginationType, limitType, customLimit, startPaginationMode, setShowPaginationOptions, setCaptureStage, notify, stopPaginationMode, startLimitMode, setShowLimitOptions, stopLimitMode, stopCaptureAndEmitGetListSettings, t, browserSteps, currentListActionId, updateListStepLimit, autoDetectedPagination]);Also consider extracting the
'initial'case body into a separate function for readability.
🤖 Fix all issues with AI agents
In `@src/components/recorder/RightSidePanel.tsx`:
- Line 492: Several new notify(...) calls in RightSidePanel.tsx use hardcoded
English strings and bypass i18n; replace each raw string with the translated
string using the existing t() function (e.g., change notify("success", "Scroll
Down pagination has been auto-detected.") to notify("success", t("..."))).
Update every new notify invocation introduced in this PR (references: notify
calls around the identifiers in RightSidePanel.tsx where "Scroll Down pagination
has been auto-detected." and the other occurrences flagged) to wrap the message
in t(...) with an appropriate translation key or literal used elsewhere in this
file so the UI text is translatable. Ensure you import or reference the same t
function used in this component and keep the notify severity and signature
unchanged.
- Around line 522-527: The branch handling detectionResult.type currently
notifies for 'scrollDown'/'scrollUp' but ignores 'clickNext' and
'clickLoadMore', leaving users without feedback when selector is falsy; update
the branch that checks detectionResult.type to also call notify("success", ...)
for 'clickNext' and 'clickLoadMore' (using the same or an appropriate message)
when those types are detected, and then prevent duplicate notifications by
guarding the later notification that fires when detectionResult.selector is
present (e.g., remove or add a conditional so the later notify only runs when
the earlier one didn't already run or when selector exists). Ensure changes
reference detectionResult.type, detectionResult.selector and the notify(...)
call in the RightSidePanel component.
🧹 Nitpick comments (1)
src/components/recorder/RightSidePanel.tsx (1)
549-596:evaluateSelectoris duplicated five times in this file.The same XPath/CSS selector evaluation logic is copy-pasted at Lines 95, 389, 549, 767, and 972. Extract it into a shared utility (e.g., in a helper module alongside
clientSelectorGenerator) and import it.Sketch
// e.g., src/helpers/evaluateSelector.ts export function evaluateSelector(selector: string, doc: Document): Element[] { if (selector.startsWith('//') || selector.startsWith('(//')) { try { const result = doc.evaluate(selector, doc, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null); const elements: Element[] = []; for (let i = 0; i < result.snapshotLength; i++) { const node = result.snapshotItem(i); if (node && node.nodeType === Node.ELEMENT_NODE) { elements.push(node as Element); } } return elements; } catch { return []; } } try { return Array.from(doc.querySelectorAll(selector)); } catch { return []; } }Also applies to: 767-789, 972-994, 389-411, 95-117
|
|
||
| const handleScrollTestResult = (result: any) => { | ||
| if (result.success && result.contentLoaded) { | ||
| notify("success", "Scroll Down pagination has been auto-detected."); |
There was a problem hiding this comment.
Hardcoded English strings bypass i18n.
All new notification messages use raw English strings instead of the t() translation function used elsewhere in this file. This will result in untranslated UI text for non-English users.
Example fix for one notification
- notify("success", "Scroll Down pagination has been auto-detected.");
+ notify("success", t('right_panel.pagination.auto_detected_scroll_down'));Apply the same pattern to every new notify(...) call introduced in this PR.
Also applies to: 524-524, 526-526, 615-615, 624-624
🤖 Prompt for AI Agents
In `@src/components/recorder/RightSidePanel.tsx` at line 492, Several new
notify(...) calls in RightSidePanel.tsx use hardcoded English strings and bypass
i18n; replace each raw string with the translated string using the existing t()
function (e.g., change notify("success", "Scroll Down pagination has been
auto-detected.") to notify("success", t("..."))). Update every new notify
invocation introduced in this PR (references: notify calls around the
identifiers in RightSidePanel.tsx where "Scroll Down pagination has been
auto-detected." and the other occurrences flagged) to wrap the message in t(...)
with an appropriate translation key or literal used elsewhere in this file so
the UI text is translatable. Ensure you import or reference the same t function
used in this component and keep the notify severity and signature unchanged.
| if (detectionResult.type) { | ||
| if (detectionResult.type === 'scrollDown') { | ||
| notify("success", "Scroll Down pagination has been auto-detected."); | ||
| } else if (detectionResult.type === 'scrollUp') { | ||
| notify("success", "Scroll Up pagination has been auto-detected."); | ||
| } |
There was a problem hiding this comment.
Missing notification for clickNext/clickLoadMore detection types in this branch.
When detectionResult.type is clickNext or clickLoadMore, this block silently falls through. If the element also has a selector, notification eventually fires at Line 615—but if detectionResult.selector is falsy, the user receives no feedback at all about the auto-detection result.
Proposed fix
if (detectionResult.type) {
if (detectionResult.type === 'scrollDown') {
notify("success", "Scroll Down pagination has been auto-detected.");
} else if (detectionResult.type === 'scrollUp') {
notify("success", "Scroll Up pagination has been auto-detected.");
+ } else if (detectionResult.type === 'clickNext') {
+ notify("success", "Next Button pagination has been auto-detected.");
+ } else if (detectionResult.type === 'clickLoadMore') {
+ notify("success", "Load More pagination has been auto-detected.");
}Note: with this change, clickNext/clickLoadMore with a selector would get notified twice (here and at Line 615). You may want to guard the Line 615 notification or remove it in favor of this single notification point.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (detectionResult.type) { | |
| if (detectionResult.type === 'scrollDown') { | |
| notify("success", "Scroll Down pagination has been auto-detected."); | |
| } else if (detectionResult.type === 'scrollUp') { | |
| notify("success", "Scroll Up pagination has been auto-detected."); | |
| } | |
| if (detectionResult.type) { | |
| if (detectionResult.type === 'scrollDown') { | |
| notify("success", "Scroll Down pagination has been auto-detected."); | |
| } else if (detectionResult.type === 'scrollUp') { | |
| notify("success", "Scroll Up pagination has been auto-detected."); | |
| } else if (detectionResult.type === 'clickNext') { | |
| notify("success", "Next Button pagination has been auto-detected."); | |
| } else if (detectionResult.type === 'clickLoadMore') { | |
| notify("success", "Load More pagination has been auto-detected."); | |
| } |
🤖 Prompt for AI Agents
In `@src/components/recorder/RightSidePanel.tsx` around lines 522 - 527, The
branch handling detectionResult.type currently notifies for
'scrollDown'/'scrollUp' but ignores 'clickNext' and 'clickLoadMore', leaving
users without feedback when selector is falsy; update the branch that checks
detectionResult.type to also call notify("success", ...) for 'clickNext' and
'clickLoadMore' (using the same or an appropriate message) when those types are
detected, and then prevent duplicate notifications by guarding the later
notification that fires when detectionResult.selector is present (e.g., remove
or add a conditional so the later notify only runs when the earlier one didn't
already run or when selector exists). Ensure changes reference
detectionResult.type, detectionResult.selector and the notify(...) call in the
RightSidePanel component.
What this PR does?
Improves the auto pagination detection logic during capture list action.
Summary by CodeRabbit
New Features
Bug Fixes