I am developing an online store parser in PHP. I need to get the HTML of the product catalog without waiting for the site to load completely. For this I use PHP WebDriver with the Chrome browser (ChromeDriver 81.0.4044.69). To keep the store from banning me, I use a proxy, which slows down page loading.

The problem is that the catalog HTML appears within the first seconds of loading, but the commands after $driver->get('....') are not executed until the store has fully loaded all its scripts, styles, images, etc. To speed up the parser, I would like to interrupt loading as soon as the catalog HTML appears and move on to analyzing it.

The best solution seems to be the page load strategy in eager or none mode. I know that this is possible with ChromeDriver 77.0. I found solutions for other programming languages, but I do not have enough experience to implement the same thing in PHP WebDriver. This resource also talks about it. My current code:
$host = 'http://localhost:4444';
$options = new ChromeOptions();
$options->addArguments([
'--window-size=1500,800',
'--proxy-server=socks4://proxyIP:proxyPort',
]);
$desiredCapabilities = DesiredCapabilities::chrome();
$desiredCapabilities->setCapability(ChromeOptions::CAPABILITY, $options);
$driver = RemoteWebDriver::create($host, $desiredCapabilities);
$content = $driver->get('https://www.some-internet-store.com/')->getPageSource();
// waits for the page to load completely before proceeding further
$catalog = ... html analysis: getting the catalog fragment that interests me from $content;
file_put_contents('Catalog.html', $catalog);
Replies: 1 comment
Hi, you need to set a capability to configure the browser; it is done much the same way as in other languages.
Have a look at the README: https://github.com/php-webdriver/php-webdriver#3-customize-desired-capabilities
So you will need to create the browser like this:
$capabilities = DesiredCapabilities::chrome();
$capabilities->setCapability('pageLoadStrategy', 'eager');
$driver = RemoteWebDriver::create($host, $capabilities);
...
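
For reference, here is a minimal sketch of how the capability might fit into the script from the question. It assumes php-webdriver is installed via Composer and that a Selenium/ChromeDriver endpoint is listening on the `$host` used in the question. The `.catalog` selector and the `page.html` file name are hypothetical placeholders, not taken from the real store. With `eager`, `get()` should return once the DOM is ready rather than after every image and stylesheet has loaded. With `none`, it returns right after navigation starts, so an explicit wait like the one below becomes essential.

```php
<?php
// Sketch only: assumes php-webdriver is installed via Composer and a
// Selenium/ChromeDriver endpoint is reachable at $host.
require __DIR__ . '/vendor/autoload.php';

use Facebook\WebDriver\Chrome\ChromeOptions;
use Facebook\WebDriver\Remote\DesiredCapabilities;
use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\WebDriverBy;
use Facebook\WebDriver\WebDriverExpectedCondition;

$host = 'http://localhost:4444';

$options = new ChromeOptions();
$options->addArguments(['--window-size=1500,800']);

$capabilities = DesiredCapabilities::chrome();
$capabilities->setCapability(ChromeOptions::CAPABILITY, $options);
// 'eager' stops waiting once the DOM is ready; 'none' does not wait at all.
$capabilities->setCapability('pageLoadStrategy', 'eager');

$driver = RemoteWebDriver::create($host, $capabilities);

try {
    $driver->get('https://www.some-internet-store.com/');

    // Poll every 250 ms, for up to 30 s, until the catalog container exists.
    // '.catalog' is a hypothetical selector; replace it with the real one.
    $driver->wait(30, 250)->until(
        WebDriverExpectedCondition::presenceOfElementLocated(
            WebDriverBy::cssSelector('.catalog')
        )
    );

    // The catalog markup is now in the DOM, so the source can be analyzed.
    $content = $driver->getPageSource();
    file_put_contents('page.html', $content);
} finally {
    // Always close the browser session, even if the wait times out.
    $driver->quit();
}
```

The explicit wait is a design choice rather than a requirement of `eager` mode: it guards against the case where the catalog is injected by a script shortly after the DOM becomes ready.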