puppeteer-scraping

Scrape anything with very few lines of code.

puppeteer-scraping is a light framework to help you save time scraping any website with Puppeteer.

Motivation

Scraping websites often means repeating the same steps.
We should code only what's unique to the scraped website: the paths taken and the data to extract.
Puppeteer could be easier and quicker to code with.

Installation

Using npm:

npm install puppeteer-scraping

Using yarn:

yarn add puppeteer-scraping

Example

const puppeteer = require('puppeteer')
const scraping = require('puppeteer-scraping')
const StealthPlugin = require('puppeteer-extra-plugin-stealth')

module.exports = async (req, res) => {  
  const method = {
    startPage: 'https://example.com',
    goToPages: {
      '//a[@class="example"]': {
        extractItems: {
          'products': {
            'productTitle': { path: '//h1', getFirst: true },
            'productUrl': { value: context => context.page.url() }
          },
        }
      }
    }
  }

  const { items } = await scraping({
    puppeteer,
    options: { headless: true },
    plugins: [StealthPlugin],
    proxies: ['http://1.2.3.4', 'http://5.6.7.8'],
    method
  })
  
  res.json(items.products)
}
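
To make the shape of the method object easier to see, here is a minimal sketch that walks it and lists what each step would extract. The describeMethod helper is hypothetical and not part of puppeteer-scraping. It assumes, based only on the example above, that each goToPages key is an XPath for links to follow and that each extractItems key names a group of fields.

```javascript
// Hypothetical helper, NOT part of puppeteer-scraping: it only inspects a
// plain object shaped like the `method` in the example above.
function describeMethod (method) {
  const steps = []

  const walk = (node, trail) => {
    // Assumption: each key of `extractItems` names a group of fields.
    for (const [group, fields] of Object.entries(node.extractItems || {})) {
      steps.push({ trail, group, fields: Object.keys(fields) })
    }
    // Assumption: each key of `goToPages` is an XPath for links to follow.
    for (const [xpath, child] of Object.entries(node.goToPages || {})) {
      walk(child, [...trail, xpath])
    }
  }

  walk(method, [method.startPage])
  return steps
}

// Same structure as the example above.
const method = {
  startPage: 'https://example.com',
  goToPages: {
    '//a[@class="example"]': {
      extractItems: {
        products: {
          productTitle: { path: '//h1', getFirst: true },
          productUrl: { value: context => context.page.url() }
        }
      }
    }
  }
}

console.log(JSON.stringify(describeMethod(method), null, 2))
```

Each entry pairs the path taken (start page, then the XPaths followed) with the fields collected at that point, which can help when checking a nested method before running a real scrape.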

Documentation

(Coming soon)

Contributing

Any contribution is welcome! If you think an important feature is missing, please send a message to puppeteer-scraping@samy.mn.

License

MIT
