Stress test: scrape search results user story #89

pavelfeldman · 2017-07-18T06:14:28Z

I attempted to create the following stress test:

Navigate to google.com
Enter "Blin"
Capture screenshot with results
Go over the results
Capture screenshot for every result page

This is pretty much impossible using existing API :) Dumping observations here...

page.waitFor resolves even if element is not on the screen (display:none, etc). Many elements are in DOM too early, no way to click-via-screen them.
page.waitFor does not time out
keyboard API surface is large and cryptic. Did you figure out how to press 'Enter' using just page.*?
keyboard API does not allow for natural click delays, does not populate timestamps. google.com does not like that
Expose custom raf or timer-based page.waitFor(predicate)
Expose page.sleep(time)
Need to be able to page.waitForNavigation in case navigation is initiated via click [i'll fix]
Reload brings me back to the originally navigated page [i'll fix]
page.waitFor(selector) in my code is always followed by either page.click or page.focus with that element. Handles would make it look like (await page.waitFor(selector)).click().
navigation API is not exposed, I need to be able to click browser's back/forward.
Taking screenshot halts on mac headless good half of the time [i'll look at it]
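
For illustration, the timer-based wait and the sleep requested in the list above could look roughly like the sketch below. It runs in Node rather than inside the page, and `sleep` and `waitForPredicate` are hypothetical names chosen for this example, not part of any API mentioned in this thread.

```javascript
// Hypothetical helpers (assumed names, not an existing API).

// Resolve after `ms` milliseconds.
function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Poll `predicate` every `interval` ms until it returns a truthy value,
// and resolve with that value. Reject once `timeout` ms have passed.
async function waitForPredicate(predicate, {timeout = 30000, interval = 100} = {}) {
  const deadline = Date.now() + timeout;
  while (true) {
    const value = await predicate();
    if (value)
      return value;
    if (Date.now() >= deadline)
      throw new Error(`waitForPredicate: timed out after ${timeout}ms`);
    await sleep(interval);
  }
}
```

The predicate is checked once before the first sleep, so a condition that already holds resolves immediately. The timeout is only evaluated between polls, so the real wait can exceed `timeout` by up to one interval.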

^^ @aslushnikov @JoelEinbinder @dgozman @paulirish


JoelEinbinder · 2017-07-18T06:36:45Z

Why do you want to press enter? Usually there is an equally clickable element.

JoelEinbinder · 2017-07-18T07:54:33Z

let Browser = require('../lib/Browser.js');
let browser = new Browser({ args: ['--no-sandbox'] });
browser.newPage().then(async page => {
  await page.navigate('https://www.google.com');
  let query = 'input[title="Search"]';
  // Await each input step so that focus, typing and Enter run in order.
  await page.focus(query);
  await page.type('blin');
  await page.keyboard.press('Enter');
  await page.waitFor('h3 a');
  let urls = await page.$$('h3 a', a => a.href);
  let i = 0;
  for (let url of urls) {
    i++;
    await page.navigate(url);
    await page.screenshot({
      path: i + '.png'
    });
  }
  browser.close();
});

page.waitFor didn't work for me. Added text nodes crash it, and a new node could make the selector resolve anywhere in the document.

pavelfeldman · 2017-07-18T15:19:45Z

Your script does not work when I run it on a Mac: I get stuck with blink| in the search field.
Also, this script does not do what I want. I want to click the search results, not navigate manually. That covers the back/forward use cases.
And to add stress to it, I want to click the search button. It is in the DOM all the time but invisible half of the time, which adds fun!

aslushnikov · 2017-07-18T16:58:30Z

The page.waitFor was broken and is now fixed in 28265fc

A few observations:

For some reason, google.com handles "Enter" in the input field differently every other time. Sometimes it issues a navigation to the https://www.google.com/search?q=blin endpoint, and sometimes it behaves as an SPA and navigates to https://www.google.com/#q=Blin.
Opening multiple tabs and navigating them all at once doesn't work: half of the tabs time out on navigation.
Taking screenshots simultaneously from multiple tabs doesn't work well: screenshotting hangs.

The following works for me, but I don't really click the links and don't really click the "search" button. But we'll fix it!

const {Browser} = require('.');
const browser = new Browser({headless: false});

const VIEWPORT = {width: 1000, height: 600};

browser.newPage().then(async page => {
  await page.setViewport(VIEWPORT);
  await page.navigate('https://google.com');
  await page.focus('#lst-ib');
  await page.type('Blin');
  await page.keyboard.press('Enter');

  try {
    await page.waitFor('div.g');
  } catch(e) {
    // Page might issue a navigation, so the page.waitFor will throw.
    await page.waitFor('div.g');
  }

  await page.screenshot({path: 'ee.png'});
  let links = await page.$$('div.g h3 > a', link => link.href);
  // have to do this sequentially =/
  for (let i = 0; i < links.length; ++i)
    await screenshotURL(links[i], `${i}.png`);
  browser.close();
});

async function screenshotURL(url, name) {
  let page = await browser.newPage();
  await page.setViewport(VIEWPORT);
  try {
    await page.navigate(url, { maxTime: 5000 });
  } catch (e) {
    // we did our best.
  }
  await page.screenshot({path: name});
  page.close();
  console.log('Done: ' + name);
}

aslushnikov · 2017-07-18T22:03:13Z

This patch introduces Page.waitForNavigation, which allows waiting for render-initiated navigation. This patch also does a nice refactoring, replacing Navigator with NavigatorWatcher, which is not a part of a page state. References #89

This patch introduces page.goBack/page.goForward methods to navigate the page history. References #89.

This patch improves on DEBUG module to trace all puppeteer's public API calls. References #89.

Currently, it's impossible to do screenshots in parallel. This patch:
- makes all screenshot tasks sequential inside one browser
- starts activating target before taking screenshot
- adds a test to make sure it's possible to take screenshots across tabs
- starts waiting for a proper page closing after each test. This might finally solve the ECONNRESET issues in tests.

References #89
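
As an illustration of running screenshot tasks sequentially inside one browser, a promise chain is one way to serialize them. The sketch below is an assumption about the general technique, not the code from the patch; `TaskQueue` and `postTask` are names chosen for this example.

```javascript
// Hypothetical helper (assumed names): runs async tasks one at a time.
class TaskQueue {
  constructor() {
    // Tail of the chain; every new task is appended after it.
    this._chain = Promise.resolve();
  }

  // Schedule `task` (a function returning a value or a promise) to start
  // only after every previously posted task has settled.
  // Returns a promise for this task's own result.
  postTask(task) {
    const result = this._chain.then(task);
    // Swallow rejections on the chain so one failed task
    // does not prevent later tasks from running.
    this._chain = result.catch(() => {});
    return result;
  }
}
```

A caller would wrap each screenshot in `queue.postTask(() => takeScreenshot(tab))`, where `takeScreenshot` stands for whatever per-tab routine is used. The caller still sees a rejection from its own task, because `postTask` returns `result` rather than the swallowed chain.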

References #89

This patch:
- introduces page.press() method
- adds more input tests

References #89

This patch implements page.waitFor method which survives navigation. References #89.

This patch adds a 'visible' option to the Page.waitFor method, making it possible to wait for the element to become actually visible. References #89, #91.
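
To make "actually visible" concrete, a check along the following lines is one possibility. This is a hedged sketch and not the code from the patch: `looksVisible` is a hypothetical name, and `style` is assumed to be an object shaped like the result of `window.getComputedStyle(element)`.

```javascript
// Hypothetical check (assumed name and input shape).
// `style` mimics a computed style object: {display, visibility}.
function looksVisible(style) {
  return Boolean(style) &&
      style.display !== 'none' &&
      style.visibility !== 'hidden';
}
```

Computed `display` is not inherited, so this check alone would miss an element hidden by an ancestor's `display: none`. A fuller check would probably also need to look at the element's bounding box.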

This patch implements timeout option for page.waitFor. The function will throw if the selector doesn't appear during timeout milliseconds of waittime. References #89, #91.

This patch:
- adds Mouse class which holds mouse state and implements mouse primitives, such as moving, button down and button up.
- implements high-level mouse api, such as `page.click` and `page.hover`.

References #40, References #89

aslushnikov · 2017-07-24T23:20:34Z

The resulting script looks like this:

const {Browser} = require('puppeteer');
const browser = new Browser({headless: false});

browser.newPage().then(async page => {
  page.on('load', () => console.log('LOADED: ' + page.url()));
  await page.navigate('https://google.com');
  await page.waitFor('input[name=q]');
  await page.focus('input[name=q]');
  await page.type('blin');
  await page.press('Enter');
  for (let i = 0; i < 10; ++i) {
    let searchResult = `div.g:nth-child(${i + 1}) h3 a`;
    await page.waitFor(searchResult, {visible: true});
    page.click(searchResult);
    await page.waitForNavigation();
    await page.screenshot({path: `screenshot-${i + 1}.png`});
    await page.goBack();
  }
  browser.close();
});
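
In the loop above, the promise returned by page.click is not awaited, so a failed click would surface as an unhandled rejection rather than as an error from the loop. One way to keep both promises under a single await is sketched below. `clickAndWaitForNavigation` is a hypothetical helper; it only assumes that `page` exposes `click(selector)` and `waitForNavigation()` as used in the script.

```javascript
// Hypothetical helper (assumed name). `page` is any object with
// click(selector) and waitForNavigation() methods that return promises.
async function clickAndWaitForNavigation(page, selector) {
  // Start listening for the navigation before dispatching the click,
  // then wait for both. A rejection from either one is propagated.
  await Promise.all([
    page.waitForNavigation(),
    page.click(selector),
  ]);
}
```

Inside the loop this would replace the click and waitForNavigation pair with `await clickAndWaitForNavigation(page, searchResult);`.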

The two checkboxes from feedback which are not addressed are filed separately:

"screenshots on mac seem to be unstable", tracked as "UnitTests on Mac are timeouting" #100
"don't do any network throttling other than offline mode. We're bad at emulating it", tracked as "Emulate offline" #63

ralyodio · 2018-06-02T07:13:39Z

You might be in different buckets too. I'm sure google is doing A/B testing. There's a way to force a bucket, but I'm not sure how.

aslushnikov pushed a commit that referenced this issue Jul 19, 2017

Introduce page.goBack/page.goForward (#93)

f154d53

This patch introduces page.goBack/page.goForward methods to navigate the page history. References #89.

aslushnikov added a commit that referenced this issue Jul 19, 2017

Introduce DEBUG module which traces public API calls

55acae4

This patch improves on DEBUG module to trace all puppeteer's public API calls. References #89.

aslushnikov pushed a commit that referenced this issue Jul 19, 2017

Rename keyboard.hold and release to up and down (#95)

71f8c76

References #89

aslushnikov pushed a commit that referenced this issue Jul 19, 2017

Introduce page.press (#96)

febd747

This patch:
- introduces page.press() method
- adds more input tests

References #89

aslushnikov added a commit that referenced this issue Jul 20, 2017

Implement waitFor which survives navigation

0162830

This patch implements page.waitFor method which survives navigation. References #89.

aslushnikov mentioned this issue Jul 20, 2017

Implement waitFor which survives navigation #99

Merged

aslushnikov added a commit that referenced this issue Jul 20, 2017

Implement waitFor which survives navigation (#99)

a63a019

This patch implements page.waitFor method which survives navigation. References #89.

aslushnikov added a commit that referenced this issue Jul 21, 2017

Implement visible option for Page.waitFor method

52de757

This patch adds a 'visible' option to the Page.waitFor method, making it possible to wait for the element to become actually visible. References #89, #91.

aslushnikov added a commit that referenced this issue Jul 21, 2017

Implement timeout option for page.waitFor

1f954fa

This patch implements timeout option for page.waitFor. The function will throw if the selector doesn't appear during timeout milliseconds of waittime. References #89, #91.

aslushnikov closed this as completed Jul 24, 2017

