browserless

A puppeteer-like Node.js library for interacting with headless browsers in production scenarios.

Why

Although puppeteer alone might seem to be enough, some use cases built on top of it are necessary for robust production scenarios, such as:

  • Sensible defaults, aborting unnecessary requests based on what you are doing (e.g., aborting image requests if you just want to get .html content).
  • Easily create a pool of instances (via @browserless/pool).
  • Built-in adblocker for aborting ad requests.

Install

browserless is built on top of puppeteer, so you need to install it as well.

$ npm install puppeteer browserless --save

You can use browserless together with puppeteer, puppeteer-core or puppeteer-firefox.

Internally, the library is divided into different packages based on functionality.

Usage

The browserless API is like puppeteer's, but it does more things under the hood (not too much, I promise).

For example, if you want to take a screenshot, just do:

const browserless = require('browserless')()

browserless
  .screenshot('http://example.com', { device: 'iPhone 6' })
  .then(buffer => {
    console.log(`your screenshot is here!`)
  })

You can see more common recipes at @browserless/examples.

API

All methods follow the same interface:

  • url (required): The target URL.
  • options (optional): Specific settings for the method.

Each method returns a Promise, or invokes a Node.js-style callback if you pass an additional function as the last parameter.
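The dual Promise/callback interface described above can be sketched roughly as follows. `withCallback` and `fakeHtml` are hypothetical names used only for illustration; they are not part of browserless, and the library's internal implementation may differ.

```js
// Hypothetical helper, not part of browserless: wraps a Promise-returning
// function so it also accepts a Node.js-style callback as the last argument.
const withCallback = fn => (...args) => {
  const last = args[args.length - 1]
  // No trailing function: behave like the original and return the Promise.
  if (typeof last !== 'function') return fn(...args)
  const callback = args.pop()
  fn(...args).then(
    value => callback(null, value),
    error => callback(error)
  )
}

// Stand-in for a method such as `browserless.html`; it does no real navigation.
const fakeHtml = async url => `<html data-url="${url}"></html>`
const html = withCallback(fakeHtml)

// Promise style
html('https://example.com').then(content => console.log(content))

// Callback style
html('https://example.com', (error, content) => {
  if (error) throw error
  console.log(content)
})
```

With a callback the wrapped function returns nothing; without one it returns the Promise, so both calling styles can coexist.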

.constructor(options)

It creates the browser instance using the puppeteer.launch method.

// Creating a simple instance
const browserless = require('browserless')()

or passing specific launch options:

// Creating an instance for running it at AWS Lambda
const browserless = require('browserless')({
  ignoreHTTPSErrors: true,
  args: [
    '--disable-gpu',
    '--single-process',
    '--no-zygote',
    '--no-sandbox',
    '--hide-scrollbars'
  ]
})

options

See puppeteer.launch#options.

By default, the library passes a well-known list of flags, so you probably don't need any additional setup.

timeout

type: number
default: 30000

This setting will change the default maximum navigation time.

puppeteer

type: Puppeteer
default: puppeteer|puppeteer-core|puppeteer-firefox

It's automatically detected based on your dependencies. The supported packages are puppeteer, puppeteer-core and puppeteer-firefox.

Alternatively, you can pass it explicitly.

incognito

type: boolean
default: false

Every time a new page is created, it will be an incognito page.

An incognito page will not share cookies/cache with other browser pages.

.html(url, options)

It returns the full HTML content from the target url.

const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const html = await browserless.html(url)
  console.log(html)
})()

options

See page.goto.

Additionally, you can set up:

waitFor

type: string|function|number
default: 0

Wait for an amount of time (in milliseconds), a selector, or a function, using page.waitFor.

waitUntil

type: string[]
default: ['networkidle0']

Specify the list of events that must fire before navigation is considered successful, using page.waitForNavigation.

userAgent

Sets a custom user agent using the page.setUserAgent method.

viewport

Sets a custom viewport using the page.setViewport method.

abortTypes

type: string[]
default: ['image', 'media', 'stylesheet', 'font', 'xhr']

A list of resourceType requests that can be aborted in order to make the process faster.
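As a minimal sketch of how aborting by resource type could work on top of puppeteer's request interception, consider the following. `createRequestHandler` and `fakeRequest` are hypothetical names; the actual logic inside browserless may differ.

```js
// Hypothetical helper, not browserless' actual implementation: returns a
// request handler that aborts any request whose type is in `abortTypes`.
const createRequestHandler = (abortTypes = []) => request => {
  if (abortTypes.includes(request.resourceType())) return request.abort()
  return request.continue()
}

// With puppeteer, the handler would be wired up like this:
//   await page.setRequestInterception(true)
//   page.on('request', createRequestHandler(['image', 'font']))

// Minimal stand-in for a puppeteer request, for demonstration only.
const fakeRequest = type => ({
  resourceType: () => type,
  abort: () => 'aborted',
  continue: () => 'continued'
})

const handler = createRequestHandler(['image', 'media', 'stylesheet', 'font', 'xhr'])
console.log(handler(fakeRequest('image'))) // aborted
console.log(handler(fakeRequest('document'))) // continued
```

The main document request is never in the default list, so the HTML itself still loads while heavier assets are skipped.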

cookies

type: object[]

A collection of cookie objects to set on the requests sent.

headers

type: object

An object containing additional HTTP headers to be sent with every request.

adblock

type: boolean
default: true

Aborts requests detected as ads.
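As a rough illustration, the options above could be combined into a single object and passed as the second argument of .html. All values here are made up for the example; the cookie fields follow the shape accepted by puppeteer's page.setCookie.

```js
// Example values only; adjust them to your own target site.
const options = {
  waitFor: 1000, // wait 1000 ms using page.waitFor
  waitUntil: ['networkidle0'],
  abortTypes: ['image', 'media', 'font'],
  cookies: [{ name: 'session', value: 'abc123', domain: 'example.com' }],
  headers: { 'accept-language': 'en-US' },
  adblock: true
}

// Assuming `browserless` is an instance created as shown in the
// constructor section, the call would look like:
//   const html = await browserless.html('https://example.com', options)
console.log(Object.keys(options))
```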

.text(url, options)

It returns the full text content from the target url.

const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const text = await browserless.text(url)
  console.log(text)
})()

options

The options you can provide are the same as for the .html method; only the output is different.

.pdf(url, options)

It generates the PDF version of the website behind a URL.

const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const buffer = await browserless.pdf(url)
  console.log(`PDF generated!`)
})()

options

See page.pdf.

Additionally, you can set up:

media

type: string
default: 'screen'

Changes the CSS media type of the page using page.emulateMedia.

device

type: string
default: 'macbook pro 13'

It specifies the device descriptor to use in order to retrieve userAgent and viewport.

userAgent

Sets a custom user agent using the page.setUserAgent method.

viewport

Sets a custom viewport using the page.setViewport method.

.screenshot(url, options)

It takes a screenshot from the target url.

const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const buffer = await browserless.screenshot(url)
  console.log(`Screenshot taken!`)
})()

options

See page.screenshot.

Additionally, you can set up:

device

type: string
default: 'macbook pro 13'

It specifies the device descriptor to use in order to retrieve userAgent and viewport.

userAgent

Sets a custom user agent using the page.setUserAgent method.

viewport

Sets a custom viewport using the page.setViewport method.

hideElements

type: string[]

Hide DOM elements matching the given CSS selectors.

Can be useful for cleaning up the page.

;(async () => {
  const buffer = await browserless.screenshot(url.toString(), {
    hideElements: ['.crisp-client', '#cookies-policy']
  })
})()

This sets visibility: hidden on the matched elements.

removeElements

type: string[]

Remove DOM elements matching the given CSS selectors.

This sets display: none on the matched elements, so it could potentially break the website layout.

;(async () => {
  const buffer = await browserless.screenshot(url.toString(), {
    removeElements: ['.crisp-client', '#cookies-policy']
  })
})()
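The difference between hideElements and removeElements can be sketched as the CSS each one implies. `buildCleanupCss` is a hypothetical helper, and the use of !important is an assumption; browserless may apply the styles differently.

```js
// Hypothetical helper, not browserless' actual implementation: turns the
// `hideElements` and `removeElements` selectors into CSS rules.
const buildCleanupCss = ({ hideElements = [], removeElements = [] } = {}) => {
  const rules = []
  if (hideElements.length > 0) {
    // The element becomes invisible but keeps its space in the layout.
    rules.push(`${hideElements.join(', ')} { visibility: hidden !important; }`)
  }
  if (removeElements.length > 0) {
    // The element is taken out of the layout entirely.
    rules.push(`${removeElements.join(', ')} { display: none !important; }`)
  }
  return rules.join('\n')
}

console.log(
  buildCleanupCss({
    hideElements: ['.crisp-client'],
    removeElements: ['#cookies-policy']
  })
)
```

Because visibility: hidden preserves the element's box, it is the safer choice when the surrounding layout must stay unchanged.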

clickElement

type: string

Click the DOM element matching the given CSS selector.

disableAnimations

type: boolean
default: false

Disable CSS animations and transitions.
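For intuition, disabling animations and transitions usually comes down to a stylesheet along these lines. This is an assumption about the general technique, not a copy of what browserless injects.

```js
// Assumption: a stylesheet like this approximates the effect of
// `disableAnimations`; the rules browserless really applies may differ.
const disableAnimationsCss = `
*, *::before, *::after {
  animation: none !important;
  transition: none !important;
}
`.trim()

console.log(disableAnimationsCss)
```

A string like this could also be supplied manually as inline code through the styles option.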

modules

type: string[]

Inject JavaScript modules into the page.

Accepts an array of inline code, absolute URLs, and local file paths (must have a .js extension).

;(async () => {
  const buffer = await browserless.screenshot(url.toString(), {
    modules: ['https://cdn.jsdelivr.net/npm/@microlink/mql@0.3.12/src/browser.js', 'local-file.js', `document.body.style.backgroundColor = 'red'`]
  })
})()

scripts

type: string[]

Same as the modules option, but instead injects the code as <script> instead of <script type="module">. Prefer the modules option whenever possible.

;(async () => {
  const buffer = await browserless.screenshot(url.toString(), {
    scripts: ['https://cdn.jsdelivr.net/npm/jquery@3.4.1/dist/jquery.min.js', 'local-file.js', `document.body.style.backgroundColor = 'red'`]
  })
})()

styles

type: string[]

Inject CSS styles into the page.

Accepts an array of inline code, absolute URLs, and local file paths (must have a .css extension).

;(async () => {
  const buffer = await browserless.screenshot(url.toString(), {
    styles: ['https://cdn.jsdelivr.net/npm/hack@0.8.1/dist/dark.css', 'local-file.css', `body { background: red; }`]
  })
})()
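Since modules, scripts and styles each accept inline code, absolute URLs and local file paths in the same array, the entries have to be told apart somehow. One plausible way is sketched below; `classifyInjection` is a hypothetical helper and the detection rules are assumptions, not the library's documented behavior.

```js
// Hypothetical helper, not part of browserless: guesses how an entry of
// `modules`, `scripts` or `styles` should be treated.
const classifyInjection = (value, extension) => {
  if (/^https?:\/\//i.test(value)) return 'url' // absolute URL
  if (value.endsWith(`.${extension}`)) return 'path' // local file path
  return 'inline' // anything else is treated as inline code
}

console.log(classifyInjection('https://cdn.jsdelivr.net/npm/jquery@3.4.1/dist/jquery.min.js', 'js')) // url
console.log(classifyInjection('local-file.js', 'js')) // path
console.log(classifyInjection("document.body.style.backgroundColor = 'red'", 'js')) // inline
console.log(classifyInjection('body { background: red; }', 'css')) // inline
```

This also suggests why local files must carry the expected extension: without it, a path would be indistinguishable from inline code.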

scrollToElement

type: string | object

Scroll to the DOM element matching the given CSS selector.

overlay

type: boolean | object

After the screenshot has been taken, this option lets you combine it with an overlay.

You can configure the overlay by specifying:

  • path: The path of the image to put on top of the screenshot.
  • color: The hexadecimal background color to use (default is 'transparent').

;(async () => {
  const buffer = await browserless.screenshot(url.toString(), {
    hideElements: ['.crisp-client', '#cookies-policy'],
    overlay: {
      color: '#F76698'
    }
  })
})()

.devices

List of all available devices preconfigured with deviceName, viewport and userAgent settings.

These devices are used for emulation purposes.

.getDevice(deviceName)

Get the settings of a specific device descriptor by its name.

const browserless = require('browserless')

browserless.getDevice('Macbook Pro 15')

// {
//   name: 'Macbook Pro 15',
//   userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X …',
//   viewport: {
//     width: 1440,
//     height: 900,
//     deviceScaleFactor: 1,
//     isMobile: false,
//     hasTouch: false,
//     isLandscape: false
//   }
// }

Advanced

The following methods are exposed for scenarios where you need more granular control and less magic.

.browser

It returns the internal browser instance, used as a singleton.

const browserless = require('browserless')()

;(async () => {
  const browserInstance = await browserless.browser
})()

.evaluate(page, response)

It exposes an interface for creating your own evaluate function, passing you the page and response.

const browserless = require('browserless')()

const getUrlInfo = browserless.evaluate((page, response) => ({
  statusCode: response.status(),
  url: response.url(),
  redirectUrls: response.request().redirectChain()
}))

;(async () => {
  const url = 'https://example.com'
  const info = await getUrlInfo(url)

  console.log(info)
  // {
  //   "statusCode": 200,
  //   "url": "https://example.com/",
  //   "redirectUrls": []
  // }
})()

Note that you don't need to close the page; it will be done under the hood.

Internally the method performs a .goto.
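The behavior described above (navigate, run your function, then close the page) can be sketched as follows. `createEvaluate` and the fake dependencies are hypothetical and exist only so the example runs without launching a browser; `newPage` and `goto` stand in for .page() and .goto().

```js
// Hypothetical sketch, not browserless' actual implementation.
const createEvaluate = ({ newPage, goto }) => (fn, gotoOpts = {}) => async (url, opts = {}) => {
  const page = await newPage()
  try {
    const response = await goto(page, { url, ...gotoOpts, ...opts })
    return await fn(page, response)
  } finally {
    // The page is always closed, even if `fn` throws.
    await page.close()
  }
}

// Fake dependencies, for demonstration only.
const state = { closed: false }
const fakePage = { close: async () => { state.closed = true } }
const evaluate = createEvaluate({
  newPage: async () => fakePage,
  goto: async (page, { url }) => ({ status: () => 200, url: () => url })
})

const getStatus = evaluate((page, response) => response.status())

getStatus('https://example.com').then(status => {
  console.log(status, state.closed) // 200 true
})
```

Putting the cleanup in a finally block is what allows the caller to ignore page lifecycle entirely.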

.goto(page, options)

It performs a smart page.goto, blocking ad and tracker requests as well as other requests based on resourceType.

const browserless = require('browserless')()

;(async () => {
  const page = await browserless.page()
  await browserless.goto(page, {
    url: 'http://savevideo.me',
    abortTypes: ['image', 'media', 'stylesheet', 'font']
  })
})()

options

url

type: string

The target URL

abortTypes

type: string[]
default: []

A list of req.resourceType() to be aborted.

adblock

type: boolean
default: true

Aborts requests detected as ads.

waitFor

type: string|function|number
default: 0

Wait for an amount of time (in milliseconds), a selector, or a function, using page.waitFor.

waitUntil

type: string[]
default: ['networkidle2', 'load', 'domcontentloaded']

Specify the list of events that must fire before navigation is considered successful, using page.waitForNavigation.

userAgent

Sets a custom user agent using the page.setUserAgent method.

viewport

Sets a custom viewport using the page.setViewport method.

args

type: object

The settings to be passed to page.goto.

.page()

It returns a new standalone browser page.

const browserless = require('browserless')()

;(async () => {
  const page = await browserless.page()
})()

Pool of Instances

Internally, browserless uses a singleton browser instance.

You can use a pool of instances via the @browserless/pool package.

const createBrowserless = require('@browserless/pool')
const browserlessPool = createBrowserless({
  poolOpts: {
    max: 15,
    min: 2
  }
})

The API is the same as browserless, except that the constructor now accepts an extra option called poolOpts.

This setting is used for initializing the pool properly. You can see what you can specify there at node-pool#opts.

Also, you can interact with a standalone browserless instance of your pool.

const createBrowserless = require('browserless')
const browserlessPool = createBrowserless.pool()

// get a browserless instance from the pool
browserlessPool(async browserless => {
  // get a page from the browser instance
  const page = await browserless.page()
  await browserless.goto(page, { url: url.toString() })
  const html = await page.content()
  console.log(html)
  process.exit()
})

You don't need to think about the acquire/release step: it's done automagically.
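The automatic acquire/release step can be pictured with a small wrapper like the one below. `createUse` and `createFakePool` are hypothetical; the fake pool only mimics the acquire/release naming used by node-pool and does no real browser work.

```js
// Hypothetical sketch of an acquire/release wrapper.
const createUse = pool => async fn => {
  const resource = await pool.acquire()
  try {
    return await fn(resource)
  } finally {
    // The resource goes back to the pool even if `fn` throws.
    await pool.release(resource)
  }
}

// Tiny in-memory pool exposing `acquire` and `release`, for demonstration only.
const createFakePool = items => {
  const idle = [...items]
  return {
    acquire: async () => {
      if (idle.length === 0) throw new Error('pool exhausted')
      return idle.pop()
    },
    release: async item => {
      idle.push(item)
    },
    size: () => idle.length
  }
}

const pool = createFakePool(['instance-a', 'instance-b'])
const use = createUse(pool)

use(async instance => {
  console.log(`working with ${instance}`)
  return instance
}).then(() => console.log(`idle after release: ${pool.size()}`))
```

The caller only supplies the function that does the work; borrowing and returning the instance happen around it.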

Packages

browserless is internally divided into multiple packages, so that you only use the minimum amount of code necessary for your use case.

  • browserless
  • @browserless/benchmark
  • @browserless/devices
  • @browserless/examples
  • @browserless/goto
  • @browserless/pool
  • @browserless/screenshot

Benchmark

For testing different approaches, we include a tiny benchmark tool called @browserless/benchmark.

FAQ

Q: Why use browserless over Puppeteer?

browserless does not replace puppeteer; it complements it. It's just a layer of syntactic sugar over the official Headless Chrome API, oriented toward production scenarios.

Q: Why do you block ads scripts by default?

Headless navigation is expensive compared with just fetching the content from a website.

In order to speed up the process, we block ad scripts by default because they are so bloated.

Q: My output is different from the expected

Probably browserless was too smart and it blocked a request that you need.

You can activate debug mode using the DEBUG=browserless environment variable in order to see what is happening behind the scenes:

DEBUG=browserless node index.js

Consider opening an issue with the debug trace.

Q: Can I use browserless with my AWS Lambda-like project?

Yes, check chrome-aws-lambda to set up AWS Lambda with a compatible binary.

License

browserless © Kiko Beats, Released under the MIT License.
Authored and maintained by Kiko Beats with help from contributors.

logo designed by xinh studio.

kikobeats.com · GitHub Kiko Beats · Twitter @kikobeats
