Arguments of the chrome headless helper #22
I have only spent about an hour on this function, so there is certainly a lot of room for improvement. We should definitely add an argument that allows some waiting time before printing. I just didn't know the name of the argument, so thanks for the tip! Currently you can do this:

chrome_print('https://pagedown.rbind.io', extra_args = '--virtual-time-budget=5000000')

but it makes more sense to promote it to a dedicated argument.

The only issue with puppeteer is its dependency on Node. I'm not sure an average R user is willing to install it. That said, we could certainly provide a wrapper function for puppeteer for those who don't mind installing Node.
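As a rough illustration of what promoting the flag to an argument could look like, here is a minimal sketch of a helper that turns a numeric budget into the flag string passed through extra_args. The name virtual_time_arg() is hypothetical and not part of pagedown. Its main job is to avoid scientific notation, which R would otherwise produce for large values such as 5e6.

```r
# Hypothetical helper (an assumption, not part of pagedown): build the Chrome
# headless flag from a budget given in virtual milliseconds.
virtual_time_arg <- function(budget_ms) {
  stopifnot(
    is.numeric(budget_ms), length(budget_ms) == 1,
    !is.na(budget_ms), budget_ms >= 0
  )
  # scientific = FALSE prevents strings such as "5e+06" in the flag
  paste0(
    "--virtual-time-budget=",
    format(round(budget_ms), scientific = FALSE, trim = TRUE)
  )
}

flag <- virtual_time_arg(5e6)
print(flag)

# Usage with the call shown above (commented out because it needs Chrome):
# chrome_print('https://pagedown.rbind.io', extra_args = virtual_time_arg(5e6))
```

Keeping the conversion in one small function means the eventual argument name can change without touching the string formatting.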
@RLesur I wonder if you could clear up my confusion about the
I've had a quick look (mainly reading the sources). My guess is that virtual time advances when the network stack becomes empty. In other words, I wouldn't be surprised if it took a similar amount of real time whether you test 1,000, 10,000 or 100,000 virtual seconds... (I'm not totally sure.) I think I could run some experiments with the DevTools Protocol in order to mimic the
So when the page is still being loaded, the virtual time won't advance. After it is fully loaded, the virtual time will start to advance. My confusion is why it doesn't make much difference whether I want it to advance for 10 seconds or 1000 seconds.
On the virtual time budget: here's one test with Chrome in remote debugging mode (note: I don't think this script replicates the behavior of the print-to-pdf Chrome CLI):

remotes::install_github('rlesur/crrri')
# Log the DevTools Protocol messages to a file through the DEBUGME variables
Sys.setenv(DEBUGME='crrri')
Sys.setenv(DEBUGME_OUTPUT_FILE='log1e5.txt')
library(crrri)
chrome <- chr_connect()
chrome %>%
  Network.enable() %>%
  Page.enable() %>%
  # budget is in virtual milliseconds: 100000 ms = 100 virtual seconds
  Emulation.setVirtualTimePolicy(policy = 'pauseIfNetworkFetchesPending', budget = 100000L, waitForNavigation = TRUE) %>%
  Page.navigate(url = 'https://www.chromestatus.com/') %>%
  # wait for the event fired once the virtual time budget is spent
  Emulation.virtualTimeBudgetExpired() %>%
  chr_disconnect()

log file: log1e5.txt

If you inspect the log, you will see that it takes about 5 seconds for 100 virtual seconds. These results were obtained with Chrome in remote debugging mode, and Chrome surely takes extra time to send the messages through its websocket server. I suspect that virtual time flows faster with the Chrome CLI.

There's a document referenced in the Chromium issue opened for Emulation.setVirtualTimePolicy: it describes the concept of virtual time.

As an intermediate conclusion, I think that we cannot easily establish a rule that converts virtual time into real time.
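Since no simple rule seems to map virtual time to real time, one practical option is to measure it for a given page. The sketch below is only an illustration: compare_budgets() is a hypothetical helper, and fake_print() is a stand-in that sleeps briefly so the example runs without Chrome. In practice, run would wrap whatever call actually performs the printing.

```r
# Hypothetical helper (an assumption, not part of pagedown or crrri):
# time a user-supplied function for several virtual time budgets.
# `run` is any function taking one argument, the budget in virtual ms.
compare_budgets <- function(run, budgets_ms) {
  elapsed <- vapply(budgets_ms, function(b) {
    # system.time() reports wall-clock time in the "elapsed" element
    unname(system.time(run(b))[["elapsed"]])
  }, numeric(1))
  data.frame(budget_ms = budgets_ms, real_seconds = elapsed)
}

# Stand-in for a real print call so that the sketch runs without Chrome
fake_print <- function(budget_ms) Sys.sleep(0.01)

res <- compare_budgets(fake_print, c(1e4, 1e5, 1e6))
print(res)

# With a real page, `run` could be something like (needs Chrome):
# function(b) chrome_print(
#   'https://pagedown.rbind.io',
#   extra_args = paste0('--virtual-time-budget=', format(b, scientific = FALSE))
# )
```

If the real seconds stay roughly flat as the budget grows, that would be consistent with the guess that virtual time advances quickly once nothing is pending.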
Okay. That is very helpful! Thanks!
The chrome_print function is great! Thanks!

When I use Chrome headless on HTML pages with paged.js, I always need the --virtual-time-budget CLI argument. This seems logical, because paged.js is launched when the page is loaded. If I understand Chrome headless correctly, the PDF is built at the same time, so the DOM has not been processed by paged.js before the PDF is generated.

On average documents (paged.js + MathJax, about 30 pages), I often need to allow a budget of 5 to 10e+06 virtual milliseconds (5-6 effective seconds).

BTW, with puppeteer, it is easier to check that paged.js has finished its job.

What do you think of adding an argument to chrome_print() for the virtual time budget?