Stress test: scrape search results user story #89
Comments
Why do you want to press enter? Usually there is an equally clickable element.
let Browser = require('../lib/Browser.js');
let browser = new Browser({ args: ['--no-sandbox'] });
browser.newPage().then(async page => {
await page.navigate('https://www.google.com');
let query = 'input[title="Search"]';
// focus/type/press return promises; await them so they run in order
await page.focus(query);
await page.type('blin');
await page.keyboard.press('Enter');
await page.waitFor('h3 a');
let urls = await page.$$('h3 a', a => a.href);
let i = 0;
for (let url of urls) {
i++;
await page.navigate(url);
await page.screenshot({
path: i + '.png'
});
}
browser.close();
});
A few observations:
The following works for me, but I don't really click the links and don't really click the "search" button. But we'll fix it!
const {Browser} = require('.');
const browser = new Browser({headless: false});
const VIEWPORT = {width: 1000, height: 600};
browser.newPage().then(async page => {
await page.setViewport(VIEWPORT);
await page.navigate('https://google.com');
await page.focus('#lst-ib');
await page.type('Blin');
await page.keyboard.press('Enter');
try {
await page.waitFor('div.g');
} catch(e) {
// Page might issue a navigation, so the page.waitFor will throw.
await page.waitFor('div.g');
}
await page.screenshot({path: 'ee.png'});
let links = await page.$$('div.g h3 > a', link => link.href);
// have to do this sequentially =/
for (let i = 0; i < links.length; ++i)
await screenshotURL(links[i], `${i}.png`);
browser.close();
});
async function screenshotURL(url, name) {
let page = await browser.newPage();
await page.setViewport(VIEWPORT);
try {
await page.navigate(url, { maxTime: 5000 });
} catch (e) {
// we did our best.
}
await page.screenshot({path: name});
await page.close();
console.log('Done: ' + name);
}
After the offline discussion of the stress-test, here are the bullets:
This patch introduces Page.waitForNavigation, which makes it possible to wait for render-initiated navigation. This patch also does a nice refactoring, replacing Navigator with NavigatorWatcher, which is not part of the page state. References #89
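To illustrate why a dedicated wait helps with click-initiated navigation, here is a minimal sketch that does not use puppeteer at all. FakePage and clickAndWait are hypothetical names made up for this example, and the fake page simply emits a 'load' event shortly after click(). Under those assumptions, the idea is to subscribe to the navigation before triggering it, so a fast load cannot slip past.

```javascript
const { EventEmitter, once } = require('events');

// Hypothetical stand-in for a page: click() schedules a 'load' event.
class FakePage extends EventEmitter {
  async click(selector) {
    setTimeout(() => this.emit('load', selector), 10);
  }

  waitForNavigation() {
    // Resolves on the next 'load' event; events emitted earlier are missed.
    return once(this, 'load');
  }
}

// Hypothetical helper: start waiting first, then click.
async function clickAndWait(page, selector) {
  const [loadArgs] = await Promise.all([
    page.waitForNavigation(),
    page.click(selector),
  ]);
  // events.once resolves with the array of arguments passed to emit().
  return loadArgs[0];
}
```

Something like `await clickAndWait(page, 'h3 a')` would then resolve only after the load triggered by that click.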
This patch introduces page.goBack/page.goForward methods to navigate the page history. References #89.
This patch improves the DEBUG module to trace all of puppeteer's public API calls. References #89.
Currently, it's impossible to do screenshots in parallel. This patch:
- makes all screenshot tasks sequential inside one browser
- starts activating target before taking screenshot
- adds a test to make sure it's possible to take screenshots across tabs
- starts waiting for a proper page closing after each test. This might finally solve the ECONNRESET issues in tests.
References #89
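As a rough illustration of "sequential inside one browser", tasks can be chained on a single promise so that each one starts only after the previous one settles. This is a sketch under assumptions, not the patch itself: the TaskQueue name, its postTask method, and the fakeScreenshot helper are all hypothetical, and timers stand in for real screenshots.

```javascript
// Hypothetical helper: runs async tasks one at a time, in submission order.
class TaskQueue {
  constructor() {
    this._chain = Promise.resolve();
  }

  postTask(task) {
    const result = this._chain.then(task);
    // Swallow rejections on the chain so one failed task does not block the rest.
    this._chain = result.catch(() => {});
    return result;
  }
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function demo() {
  const queue = new TaskQueue();
  const log = [];
  // Stand-in for a screenshot: records when it starts and ends.
  const fakeScreenshot = (name, ms) => async () => {
    log.push(`start ${name}`);
    await sleep(ms);
    log.push(`end ${name}`);
    return name;
  };
  // Both tasks are posted at once, but the second waits for the first.
  await Promise.all([
    queue.postTask(fakeScreenshot('a.png', 20)),
    queue.postTask(fakeScreenshot('b.png', 5)),
  ]);
  return log;
}
```

Callers still get a per-task promise back from postTask, so they can await their own screenshot without knowing about the others.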
This patch:
- introduces page.press() method
- adds more input tests
References #89
This patch implements page.waitFor method which survives navigation. References #89.
The resulting script looks like this:
const {Browser} = require('puppeteer');
const browser = new Browser({headless: false});
browser.newPage().then(async page => {
page.on('load', () => console.log('LOADED: ' + page.url()));
await page.navigate('https://google.com');
await page.waitFor('input[name=q]');
await page.focus('input[name=q]');
await page.type('blin');
await page.press('Enter');
for (let i = 0; i < 10; ++i) {
let searchResult = `div.g:nth-child(${i + 1}) h3 a`;
await page.waitFor(searchResult, {visible: true});
page.click(searchResult);
await page.waitForNavigation();
await page.screenshot({path: `screenshot-${i + 1}.png`});
await page.goBack();
}
browser.close();
});
The two checkboxes from feedback which are not addressed are filed separately:
You might be in different buckets too. I'm sure Google is doing A/B testing. There's a way to force a bucket, but I'm not sure how.
I attempted to create the following stress test:
This is pretty much impossible using the existing API :) Dumping observations here...
- page.waitFor resolves even if the element is not on the screen (display:none, etc). Many elements are in the DOM too early, no way to click-via-screen them.
- page.waitFor does not time out
- page.* .timestamps. google.com does not like that
- page.waitFor(predicate)
- page.sleep(time)
- page.waitForNavigation in case navigation is initiated via click [i'll fix]
- page.waitFor(selector) in my code is always followed by either page.click or page.focus with that element. Handles would make it look like (await page.waitFor(selector)).click().

^^ @aslushnikov @JoelEinbinder @dgozman @paulirish
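To make the waitFor(predicate), sleep(time) and timeout bullets a bit more concrete, here is a minimal polling sketch. It is plain Node with no puppeteer involved; waitForPredicate, sleep, and the timeout/interval option names are assumptions picked for this example, not an existing API.

```javascript
// Hypothetical helper: resolves after the given number of milliseconds.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Hypothetical helper: polls `predicate` until it returns a truthy value,
// and rejects once `timeout` milliseconds have passed without one.
async function waitForPredicate(predicate, { timeout = 1000, interval = 20 } = {}) {
  const deadline = Date.now() + timeout;
  while (true) {
    const value = await predicate();
    if (value) return value;
    if (Date.now() >= deadline)
      throw new Error(`waitForPredicate: timed out after ${timeout}ms`);
    await sleep(interval);
  }
}
```

A visibility check could in principle be passed in as the predicate, which would cover the first bullet as well, though how that check gets evaluated in the page is left open here.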