Description
katana version:
Latest version (1.1.2)
Current Behavior:
Currently, page processing relies on two steps: waitLoad, which waits for the page to finish loading, followed by waitIdle, which depends on the browser's window.requestIdleCallback. That callback fires when the main thread is idle (no urgent tasks are running), but it has a significant limitation:
Because waitIdle does not account for ongoing network requests, some XHR, fetch, or iframe requests may still be in progress when the callback fires. These pending requests may not complete before page processing starts and the page is closed, leading to data loss or incomplete content retrieval.
Expected Behavior:
The WaitRequestIdle function offers a more robust solution by explicitly monitoring network activity. It ensures there has been a defined period of network inactivity before proceeding, reducing the risk of pending requests being overlooked. Additionally, it provides filtering options (includes, excludes, and excludeTypes) to precisely target the types of requests that should be tracked.
Recommendation: Adopting WaitStable for Comprehensive Stability Control
To achieve full coverage and ensure the page is stable before processing, the WaitStable function combines:
- WaitLoad for initial page loading
- WaitRequestIdle to confirm network inactivity
- WaitDomStable to verify the DOM has remained stable for a specified period
Adopting WaitStable reduces the risk of missing in-flight network requests, so that all relevant data is captured before page processing proceeds.
Steps To Reproduce:
It would only require adding an argument in etreer that sets how long to wait for the page to become stable before processing continues.
Anything else:
I have already created a PR to fix this issue: #1217