-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat: add auto pagination detection #904
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis PR implements automatic pagination detection for the DOM-mode recorder. It adds a client-side detection algorithm, integrates pagination selector propagation through the rendering pipeline, synchronizes UI state with active list steps, and introduces new context APIs and UI components for displaying detected pagination patterns. Changes
Sequence DiagramsequenceDiagram
participant User
participant RightSidePanel
participant clientPaginationDetector
participant BrowserSteps as BrowserStepsContext
participant DOMRenderer as DOMBrowserRenderer
participant BrowserWindow
User->>RightSidePanel: Trigger auto-detection or pagination action
RightSidePanel->>clientPaginationDetector: autoDetectPagination(doc, listSelector, selectorGenerator)
clientPaginationDetector->>clientPaginationDetector: Evaluate list container via selector
clientPaginationDetector->>clientPaginationDetector: Scan clickable elements & infinite scroll signals
clientPaginationDetector->>clientPaginationDetector: Score candidates (next, prev, load-more, scroll)
clientPaginationDetector-->>RightSidePanel: PaginationDetectionResult {type, selector, confidence}
RightSidePanel->>RightSidePanel: Highlight detected element in iframe
RightSidePanel->>RightSidePanel: Scroll detected element into view
RightSidePanel->>RightSidePanel: Update autoDetectedPagination state
RightSidePanel->>RightSidePanel: Display "Auto-detected" summary box
User->>RightSidePanel: Accept detected pagination
RightSidePanel->>BrowserSteps: updateListStepPagination(listId, {type, selector})
BrowserSteps->>BrowserSteps: Update list step pagination object
BrowserSteps-->>RightSidePanel: Confirmation
RightSidePanel->>BrowserWindow: Refresh with updated pagination state
BrowserWindow->>DOMRenderer: Pass paginationSelector prop
DOMRenderer->>DOMRenderer: Update highlighting logic based on paginationSelector
DOMRenderer-->>User: Visual feedback (suppress/highlight as appropriate)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/components/browser/BrowserWindow.tsx (1)
163-163: Tighten hook dependencies and field comparison aroundpaginationSelectorThe new
paginationSelectorstate and syncing logic are directionally good, but there are a few hook/dependency issues that can lead to subtle desyncs:
Pagination-mode sync effect missing
paginationTypedependency (Lines 2106–2129)
- The effect branches on
paginationType(['clickNext', 'clickLoadMore'].includes(paginationType)) butpaginationTypeis not in the dependency array.- If the user changes the pagination type while
paginationModeis true, this effect won’t re-run and may leavepaginationSelectorin an inconsistent state.Consider:
- }, [browserSteps, paginationMode, currentListActionId, paginationSelector]); + }, [browserSteps, paginationMode, currentListActionId, paginationSelector, paginationType]);Highlighter callback using stale
paginationSelector(Lines 1526–1610)
highlighterHandlerbails out whenpaginationMode && paginationSelector, but theuseCallbackdeps do not includepaginationSelector.- Once
paginationSelectorchanges, the closure may not reflect the new selector value, so the bail-out condition can be wrong.Update deps accordingly:
- }, [getList, socket, listSelector, paginationMode, paginationType, limitMode]); + }, [getList, socket, listSelector, paginationMode, paginationType, limitMode, paginationSelector]);Field sync effect uses
JSON.stringify(fields)comparison (Lines 1245–1266)
- Comparing
fieldsandactiveStep.fieldsviaJSON.stringifyis potentially expensive and can flip from equal to unequal if insertion order changes, even when contents are logically the same.- If
fieldsis large or frequently updated, this can become a perf hotspot.If you only need to keep in sync with the active step, a cheaper approach is to key off a simple shape (e.g.
Object.keys(fields).lengthand maybe a version counter), or rely on referential equality from the context if you know when you’ve just updated that step.The rest of the
paginationSelectorwiring (state init at Line 163 and prop pass intoDOMBrowserRendererat Lines 2395–2396) looks consistent with the other pagination-related guards.Also applies to: 1245-1266, 1526-1610, 2106-2129, 2395-2396
src/components/recorder/RightSidePanel.tsx (1)
454-647: Address BiomenoSwitchDeclarationsinhandleConfirmListCaptureBiome correctly flags
constdeclarations (currentListStepForAutoDetect,shouldSkipPaginationMode) directly in theswitchbody (Lines 468–472 and 635–640). This pattern can cause subtle TDZ/scope issues across cases and violatesnoSwitchDeclarations.Wrap the
'initial'case body in a block so these declarations are scoped to that case only:- switch (captureStage) { - case 'initial': + switch (captureStage) { + case 'initial': { const hasValidListSelectorForCurrentAction = browserSteps.some(/* ... */); // ... const currentListStepForAutoDetect = browserSteps.find(/* ... */); // ... const shouldSkipPaginationMode = autoDetectedPagination && ( /* ... */ ); if (!shouldSkipPaginationMode) { startPaginationMode(); } - - setShowPaginationOptions(true); - setCaptureStage('pagination'); - break; + setShowPaginationOptions(true); + setCaptureStage('pagination'); + break; + }This should resolve the Biome errors without changing runtime behavior.
🧹 Nitpick comments (8)
src/helpers/clientSelectorGenerator.ts (1)
2479-2517:generateSelectorsFromElementis functionally sound; consider minor cleanupsThe new helper correctly:
- Centers the target element in the iframe viewport (vertical scroll only) in a guarded inner
try.- Recomputes the bounding rect post-scroll and delegates to
getSelectorsusing the element’s visual center.- Catches and logs both scroll errors and selector-generation errors, returning
nullon failure.Two small follow-ups you might consider:
- Avoid the second
getBoundingClientRectif you don’t depend on post-scroll layout, or add a comment clarifying that it’s intentional (to use the updated rect after scrolling).- Tighten the return type from
any | nullto the actual selector structure returned bygetSelectorsfor better type-safety across callers.Also applies to: 4340-4340
src/context/browserSteps.tsx (1)
83-83:updateListStepPaginationwiring looks consistent with existing state modelThe new context API and implementation correctly:
- Locate the list step by
id.- Replace its
paginationobject with the provided payload, normalizingnull/falsy selectors to"", which matches theListStep.pagination.selector: stringtype and downstream assumptions.- Expose the updater through the provider for consumers.
One small nit: if you ever need to distinguish “unset” vs “intentionally empty” selectors, you may want to preserve
nullinstead of coercing, but for CSS/XPath selectors this is unlikely to matter.Also applies to: 483-501, 557-557
src/components/browser/BrowserWindow.tsx (1)
1663-1677: Factor out duplicatedtargetListId/targetFieldsresolution for pagination clicksBoth
handleDOMElementSelectionandhandleClicknow contain nearly identical logic to derivetargetListIdandtargetFieldsfrom either local state or the active list step (Lines 1663–1677 and 1908–1923), andbrowserStepshas been added to the callback deps (Lines 1838–1856) to support this.Functionally this is correct and fixes the “no currentListId” edge case, but the duplication makes future changes easy to miss in one of the handlers.
Consider extracting a small helper, e.g.:
const resolveTargetListContext = ( currentListId: number | null, fields: Record<string, TextStep>, browserSteps: BrowserStep[], currentListActionId?: string | null ): { targetListId: number; targetFields: Record<string, TextStep> } => { let targetListId = currentListId ?? 0; let targetFields = fields; if ((!targetListId || targetListId === 0) && currentListActionId) { const activeStep = browserSteps.find( s => s.type === 'list' && s.actionId === currentListActionId ) as ListStep | undefined; if (activeStep) { targetListId = activeStep.id; if (!Object.keys(targetFields).length && Object.keys(activeStep.fields).length) { targetFields = activeStep.fields; } } } return { targetListId, targetFields }; };and reuse it in both handlers. This keeps the behaviour in one place and simplifies the hook dependency reasoning.
Also applies to: 1908-1923, 1838-1856
src/components/recorder/RightSidePanel.tsx (3)
749-799: Ensure pagination highlight is always cleaned up when leaving auto‑detected flowYou correctly remove the highlight for
autoDetectedPagination.selectoron discard (Lines 749–787) and in “Choose Different Element” (Lines 955–993), but:
- When going “Back” from pagination (Lines 700–716), you only clear
autoDetectedPaginationand stage, not any outline styles applied earlier.- That can leave dangling outlines/z‑indices on the page even though the UI no longer shows an active auto‑detected pagination.
To keep the page clean and predictable, consider extracting a small helper to remove the current auto‑detected highlight and reuse it from all exit paths (back, discard, “Choose Different Element”):
+ const clearAutoDetectedHighlight = useCallback(() => { + if (!autoDetectedPagination?.selector) return; + + const iframeElement = document.querySelector('#browser-window iframe') as HTMLIFrameElement; + if (!iframeElement?.contentDocument) return; + + try { + const elements = evaluateSelector(autoDetectedPagination.selector, iframeElement.contentDocument); + elements.forEach((el: Element) => { + const htmlEl = el as HTMLElement; + htmlEl.style.outline = ''; + htmlEl.style.outlineOffset = ''; + htmlEl.style.zIndex = ''; + }); + } catch (error) { + console.error('Error removing pagination highlight:', error); + } + }, [autoDetectedPagination]);Then call
clearAutoDetectedHighlight()in:
handleBackCaptureListbeforesetAutoDetectedPagination(null).discardGetListinstead of inlining similar logic.- The “Choose Different Element” handler.
This removes duplication and guarantees highlight cleanup across all flows.
Also applies to: 910-1015
96-122: Factor out repeated CSS/XPathevaluateSelectorhelpers
evaluateSelector(CSS + XPath support) is implemented multiple times in this file:
- Lines 96–122 (clearing previous pagination highlight).
- Lines 534–583 (highlighting detected button and handling chained selectors).
- Lines 753–775 (discard flow cleanup).
- Lines 959–981 (resetting selection on “Choose Different Element”).
This duplication makes the behavior harder to keep consistent and maintain (e.g., if you later tweak XPath handling or chained selectors).
Consider extracting one or two shared helpers at module level (or within the component) and reusing them:
+function evaluateSelector(selector: string, doc: Document): Element[] { + try { + const isXPath = selector.startsWith('//') || selector.startsWith('(//'); + if (isXPath) { + const result = doc.evaluate( + selector, + doc, + null, + XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, + null + ); + const elements: Element[] = []; + for (let i = 0; i < result.snapshotLength; i++) { + const node = result.snapshotItem(i); + if (node && node.nodeType === Node.ELEMENT_NODE) { + elements.push(node as Element); + } + } + return elements; + } + return Array.from(doc.querySelectorAll(selector)); + } catch (err) { + console.error('[RightSidePanel] Selector evaluation failed:', selector, err); + return []; + } +}If you need the “chained selector parts” behavior in one place only, consider a thin wrapper around this base helper rather than a fully separate implementation.
Also applies to: 534-583, 753-775, 959-981
476-477: Route new user‑visible strings through i18nSeveral new strings are hard‑coded in English:
- Line 476–477:
'Detecting pagination...'- Line 601–603: notification about the auto‑detected button.
- Line 944–945: “You can continue with this or manually select…”
- Line 1000–1001: “Please select a different pagination element”
- Line 1012–1013: “Choose Different Element”
Elsewhere in this component you already rely on
t(...)for translations. For consistency and localization support, consider adding appropriate keys in your i18n resources and switching tothere, for example:- notify('info', 'Detecting pagination...'); + notify('info', t('right_panel.pagination.detecting')); - const paginationTypeLabel = detectionResult.type === 'clickNext' ? 'Next Button' : 'Load More Button'; - notify('info', `${paginationTypeLabel} has been auto-detected and highlighted on the page`); + const paginationTypeLabel = + detectionResult.type === 'clickNext' + ? t('right_panel.pagination.next_button') + : t('right_panel.pagination.load_more_button'); + notify('info', t('right_panel.pagination.autodetected_highlighted', { paginationTypeLabel }));And similarly for the helper text and button label in the auto‑detected box.
Also applies to: 601-603, 944-945, 1000-1001, 1012-1013
src/helpers/clientPaginationDetector.ts (2)
394-542: UnusedlistContainerparameter indetectInfiniteScrollIndicators
detectInfiniteScrollIndicatorstakeslistContainer: HTMLElement(Line 394) but never uses it inside the function body. This is slightly confusing and may suggest missing logic.If you don’t need the container here, consider removing the parameter from both the signature and the call site:
- private detectInfiniteScrollIndicators(doc: Document, listElements: HTMLElement[], listContainer: HTMLElement): number { + private detectInfiniteScrollIndicators(doc: Document, listElements: HTMLElement[]): number { // ... } - const infiniteScrollScore = options?.disableScrollDetection - ? 0 - : this.detectInfiniteScrollIndicators(doc, listElements, listContainer); + const infiniteScrollScore = options?.disableScrollDetection + ? 0 + : this.detectInfiniteScrollIndicators(doc, listElements);If you do intend to factor the container into the heuristics later (e.g., checking for sentinels near the bottom of the list), adding a TODO comment to that effect would clarify the intent.
547-583: Consider tightening the debug/result typing ingenerateSelectorsForElement
generateSelectorsForElementbuilds a comma‑joined selector chain from whatever theClientSelectorGeneratorreturns. The logic is fine, but the surrounding type surface is quite loose:
PaginationDetectionResult.debugisany.- The structure of
primaryis accessed via'key' in primarychecks, also effectivelyany.If this type is going to be consumed elsewhere (e.g., logged, surfaced in diagnostics), consider introducing a dedicated interface for the selector generation result and using it here, and narrowing
debugto something likeRecord<string, unknown>or a dedicated debug shape. This would make refactors toClientSelectorGeneratorsafer.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
src/components/browser/BrowserWindow.tsx(8 hunks)src/components/recorder/DOMBrowserRenderer.tsx(3 hunks)src/components/recorder/RightSidePanel.tsx(9 hunks)src/context/browserSteps.tsx(3 hunks)src/helpers/clientPaginationDetector.ts(1 hunks)src/helpers/clientSelectorGenerator.ts(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/helpers/clientPaginationDetector.ts (2)
maxun-core/src/browserSide/scraper.js (4)
doc(225-225)selector(28-28)node(664-664)style(932-932)src/helpers/clientSelectorGenerator.ts (1)
ClientSelectorGenerator(4340-4340)
src/components/recorder/RightSidePanel.tsx (4)
src/context/browserActions.tsx (1)
PaginationType(6-6)src/context/browserSteps.tsx (1)
BrowserStep(43-43)src/helpers/clientPaginationDetector.ts (2)
evaluateSelector(290-318)clientPaginationDetector(586-586)src/helpers/clientSelectorGenerator.ts (1)
clientSelectorGenerator(4341-4341)
🪛 Biome (2.1.2)
src/components/recorder/RightSidePanel.tsx
[error] 468-472: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.
The declaration is defined in this switch clause:
Safe fix: Wrap the declaration in a block.
(lint/correctness/noSwitchDeclarations)
[error] 635-640: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.
The declaration is defined in this switch clause:
Safe fix: Wrap the declaration in a block.
(lint/correctness/noSwitchDeclarations)
🔇 Additional comments (2)
src/components/recorder/RightSidePanel.tsx (1)
82-147: Overall auto‑pagination flow looks soundThe integration of
clientPaginationDetectorinto the list capture flow is cohesive:
- The
useEffectat 82–147 correctly reacts topaginationTypechanges for click‑based types and clears prior highlights/selectors when the type changes.- The
'initial'arm ofhandleConfirmListCapturewires the scroll test, auto‑detection,updateListStepPagination, andupdatePaginationTypein a way that keepsbrowserStepsand UI in sync.- The “Auto‑detected” box at 910–1015 gives clear feedback and a path to override the detection.
Once the smaller issues above are addressed, the behavior should be robust and user‑friendly.
Also applies to: 468-640, 910-1015
src/helpers/clientPaginationDetector.ts (1)
24-285: autoDetectPagination heuristics and error‑handling look solidThe main detection routine:
- Validates the list selector and exits gracefully when no elements are found.
- Scores
clickNext,clickLoadMore, andprevcandidates with visibility and proximity checks, and independently scores infinite scroll indicators.- Uses clear thresholds and confidence levels, and falls back to
{ type: '', selector: null, confidence: 'low' }with useful debug metadata.- Wraps the whole method in a
try/catch, avoiding unhandled exceptions from DOM access.From a correctness and resilience standpoint, this is a good baseline for client‑side pagination detection.
| paginationSelector?: string; | ||
| paginationType?: string; | ||
| limitMode?: boolean; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Include paginationSelector in handleDOMHighlighting dependencies
The new paginationSelector prop and the extra guard in the list highlighting logic look consistent with the browser window behaviour (suppress highlighting once a pagination element is chosen). However, handleDOMHighlighting now reads paginationSelector but the useCallback dependency array does not include it (Lines 334–343).
That means updates to paginationSelector won’t be reflected in the callback, and shouldHighlight may be computed using a stale selector value.
Consider updating the deps as follows:
- }, [
- getText,
- getList,
- listSelector,
- paginationMode,
- cachedChildSelectors,
- paginationType,
- limitMode,
- onHighlight,
- ]
+ }, [
+ getText,
+ getList,
+ listSelector,
+ paginationMode,
+ cachedChildSelectors,
+ paginationType,
+ paginationSelector,
+ limitMode,
+ onHighlight,
+ ]Also applies to: 157-159, 262-269, 334-343
🤖 Prompt for AI Agents
In src/components/recorder/DOMBrowserRenderer.tsx around lines 103 and
specifically at the useCallback definitions near lines 157-159, 262-269, and
334-343, the new paginationSelector prop is read inside handleDOMHighlighting
(and related callbacks) but not included in their dependency arrays; update each
callback's dependency array to include paginationSelector so the callbacks
capture the latest selector value and avoid stale shouldHighlight calculations,
and run tests / lint to ensure no other missing deps remain.
| const [autoDetectedPagination, setAutoDetectedPagination] = useState<{ | ||
| type: PaginationType; | ||
| selector: string | null; | ||
| confidence: 'high' | 'medium' | 'low'; | ||
| } | null>(null); | ||
| const autoDetectionRunRef = useRef<string | null>(null); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
autoDetectionRunRef prevents re‑running detection after going back
autoDetectionRunRef is used to guard multiple runs per currentListActionId, but it’s never reset when the user navigates back from the pagination step:
- Line 472–476: once
autoDetectionRunRef.currentis set tocurrentListActionId, subsequent calls for the same action short‑circuit the detection block. - Line 700–716: when backing out of the pagination step you clear
autoDetectedPaginationand resetcaptureStageto'initial', butautoDetectionRunRef.currentremains set.
This means: if a user auto‑detects pagination, goes “Back” to adjust/retry, and confirms again, auto‑detection will not run a second time for that list action, even though the UI suggests a fresh attempt.
Consider resetting the guard when you abandon the previous detection flow, e.g.:
- if (currentListStepForAutoDetect?.listSelector) {
- if (autoDetectionRunRef.current !== currentListActionId) {
- autoDetectionRunRef.current = currentListActionId;
+ if (
+ currentListStepForAutoDetect?.listSelector &&
+ autoDetectionRunRef.current !== currentListActionId
+ ) {
+ autoDetectionRunRef.current = currentListActionId;
// ... existing detection logic ...
}
}and clear the ref when leaving pagination for this action:
- case 'pagination':
+ case 'pagination':
stopPaginationMode();
setShowPaginationOptions(false);
- setAutoDetectedPagination(null);
+ setAutoDetectedPagination(null);
+ autoDetectionRunRef.current = null;
setCaptureStage('initial');
break;Similarly, you may want to reset autoDetectionRunRef.current in discardGetList when discarding the list action.
Also applies to: 472-476, 700-716
🤖 Prompt for AI Agents
In src/components/recorder/RightSidePanel.tsx around lines 48-54 and the
referenced blocks at 472-476 and 700-716, autoDetectionRunRef.current is used to
prevent repeated detection but is never cleared when the user abandons the
pagination flow; update the code so that whenever you clear
autoDetectedPagination and reset captureStage to 'initial' (the "Back" path) you
also set autoDetectionRunRef.current = null, and likewise clear it in the
discardGetList path so a subsequent confirmation will allow auto-detection to
run again for the same currentListActionId.
What this PR does?
Adds the feature to auto detect pagination type and auto capture the pagination element.
Summary by CodeRabbit
New Features
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.