A puppeteer-like Node.js library for interacting with Headless Chrome in production scenarios.
Although puppeteer alone might seem to be enough, a set of use cases makes sense to build on top of it, and they are necessary to support robust production scenarios, such as:
- Sensible defaults, aborting unnecessary requests based on what you are doing (e.g., aborting image requests if you just want to get .html content).
- Privacy by default, blocking tracker requests.
- Easily create a pool of instances (via @browserless/pool).
- Built-in AdBlocker (soon).
browserless is built on top of puppeteer, so you need to install it as well.
$ npm install puppeteer browserless --save
You can use browserless together with puppeteer, puppeteer-core or puppeteer-firefox.
Internally, the library is divided into different packages based on functionality.
The browserless API is like puppeteer's, but it does more things under the hood (not too much, I promise).
For example, if you want to take a screenshot, just do:
const browserless = require('browserless')()

browserless
  .screenshot('http://example.com', { device: 'iPhone 6' })
  .then(buffer => {
    console.log(`your screenshot is here!`)
  })
You can see more common recipes at @browserless/examples.
All methods follow the same interface:
- url (required): The target URL.
- options (optional): Specific settings for the method.
Each method returns a Promise, or calls a Node.js-style callback if you pass an additional function as the last parameter.
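The dual Promise/callback convention can be sketched as follows. This is a minimal illustration and not browserless's actual implementation: `withCallback` and `fakeHtml` are hypothetical names introduced only for this example, and no browser is opened.

```javascript
// Hypothetical helper: wraps an async function so that it also accepts
// a Node.js-style callback as the last argument.
const withCallback = fn => (...args) => {
  const last = args[args.length - 1]
  if (typeof last !== 'function') return fn(...args) // Promise style
  const cb = args.pop()
  fn(...args).then(value => cb(null, value), err => cb(err))
}

// Stand-in for a method such as .html(url, options).
const fakeHtml = withCallback(async (url, options = {}) => `<html>${url}</html>`)

// Promise style
fakeHtml('https://example.com').then(html => console.log(html))

// Callback style
fakeHtml('https://example.com', {}, (err, html) => {
  if (err) throw err
  console.log(html)
})
```

Both calls run the same underlying function; only the way the result is delivered differs.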
It creates the browser instance using the puppeteer.launch method.
// Creating a simple instance
const browserless = require('browserless')()
Or pass specific launcher options:
// Creating an instance for running it at AWS Lambda
const browserless = require('browserless')({
  ignoreHTTPSErrors: true,
  args: [
    '--disable-gpu',
    '--single-process',
    '--no-zygote',
    '--no-sandbox',
    '--hide-scrollbars'
  ]
})
By default, the library passes a well-known list of flags, so you probably don't need any additional setup.
type:number
default: 30000
This setting will change the default maximum navigation time.
type: Puppeteer
default: puppeteer|puppeteer-core|puppeteer-firefox
It's automatically detected based on your dependencies; puppeteer, puppeteer-core and puppeteer-firefox are supported.
Alternatively, you can pass it explicitly.
type:boolean
default: false
Every time a new page is created, it will be an incognito page.
An incognito page will not share cookies/cache with other browser pages.
It returns the full HTML content from the target url.
// create the instance by calling the exported factory
const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const html = await browserless.html(url)
  console.log(html)
})()
See page.goto.
Additionally, you can set up:
type: string|function|number
default: 0
Waits for an amount of time, a selector or a function using page.waitFor.
type: array
default: ['networkidle0']
Specifies the list of events to wait for before considering the navigation successful, using page.waitForNavigation.
Sets a custom user agent using the page.setUserAgent method.
Sets a custom viewport using the page.setViewport method.
type: array
default: ['image', 'media', 'stylesheet', 'font', 'xhr']
A list of resourceType requests that can be aborted in order to make the process faster.
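As an illustration of how aborting by resource type might work on top of Puppeteer's request interception (page.setRequestInterception, request.resourceType(), request.abort(), request.continue()), here is a minimal sketch. `createRequestHandler` is a hypothetical helper and not part of browserless; the fake request object exists only so the snippet runs without a browser.

```javascript
// Hypothetical helper: builds a 'request' event handler that aborts
// any request whose resource type is listed in abortTypes.
const createRequestHandler = (abortTypes = []) => req =>
  abortTypes.includes(req.resourceType()) ? req.abort() : req.continue()

// With a real Puppeteer page, this would be wired up roughly as:
//   await page.setRequestInterception(true)
//   page.on('request', createRequestHandler(['image', 'font']))

// Fake request, used here only to show the decision logic.
const fakeRequest = type => ({
  resourceType: () => type,
  abort: () => 'aborted',
  continue: () => 'continued'
})

const handler = createRequestHandler(['image', 'media', 'stylesheet', 'font', 'xhr'])
console.log(handler(fakeRequest('image'))) // prints: aborted
console.log(handler(fakeRequest('document'))) // prints: continued
```

Keeping the decision in a small pure function makes it easy to check which requests would be dropped before pointing it at a real page.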
type: boolean
default: true
Aborts requests coming from tracking domains.
It returns the full text content from the target url.
// create the instance by calling the exported factory
const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const text = await browserless.text(url)
  console.log(text)
})()
The options are the same as for the .html method.
It generates a PDF version of the website behind a url.
// create the instance by calling the exported factory
const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const buffer = await browserless.pdf(url)
  console.log(`PDF generated!`)
})()
See page.pdf.
Additionally, you can set up:
Changes the CSS media type of the page using page.emulateMedia.
Generates the PDF using the device descriptor name settings, like userAgent and viewport.
Sets a custom user agent using the page.setUserAgent method.
Sets a custom viewport using the page.setViewport method.
It takes a screenshot of the target url.
// create the instance by calling the exported factory
const browserless = require('browserless')()

;(async () => {
  const url = 'https://example.com'
  const buffer = await browserless.screenshot(url)
  console.log(`Screenshot taken!`)
})()
See page.screenshot.
The options provided are passed to page.screenshot.
Additionally, you can set up:
Takes the screenshot using the device descriptor name settings, like userAgent and viewport.
Sets a custom user agent using the page.setUserAgent method.
Sets a custom viewport using the page.setViewport method.
List of all available devices preconfigured with deviceName, viewport and userAgent settings.
These devices are used for emulation purposes.
Gets the settings of a specific device descriptor by its name.
const browserless = require('browserless')()

browserless.getDevice('Macbook Pro 15')
// {
//   name: 'Macbook Pro 15',
//   userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X …',
//   viewport: {
//     width: 1440,
//     height: 900,
//     deviceScaleFactor: 1,
//     isMobile: false,
//     hasTouch: false,
//     isLandscape: false
//   }
// }
The following methods are exposed for scenarios where you need more granular control and less magic.
It returns the internal browser instance, which is used as a singleton.
// create the instance by calling the exported factory
const browserless = require('browserless')()

;(async () => {
  const browserInstance = await browserless.browser
})()
It exposes an interface for creating your own evaluate function, passing you the page and response.
const browserless = require('browserless')()

const getUrlInfo = browserless.evaluate((page, response) => ({
  statusCode: response.status(),
  url: response.url(),
  redirectUrls: response.request().redirectChain()
}))

;(async () => {
  const url = 'https://example.com'
  const info = await getUrlInfo(url)
  console.log(info)
  // {
  //   "statusCode": 200,
  //   "url": "https://example.com/",
  //   "redirectUrls": []
  // }
})()
Note that you don't need to close the page; it will be done under the hood.
Internally the method performs a .goto.
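To show what "under the hood" could look like, here is a minimal sketch of an evaluate-style factory: it opens a page, navigates, hands page and response to your function, and always closes the page afterwards. It is an assumption about the general pattern, not the library's source. `createEvaluate`, `newPage` and `goto` are hypothetical names, and the fakes stand in for a Puppeteer page and response so the snippet runs without a browser.

```javascript
// Hypothetical factory: `newPage` and `goto` are injected so the
// example does not depend on a real browser.
const createEvaluate = ({ newPage, goto }) => fn => async (url, opts = {}) => {
  const page = await newPage()
  try {
    const response = await goto(page, { url, ...opts })
    return await fn(page, response)
  } finally {
    await page.close() // the page is always closed for the caller
  }
}

// Fakes standing in for a Puppeteer page and response.
const closed = []
const evaluate = createEvaluate({
  newPage: async () => ({ close: async () => closed.push('page') }),
  goto: async (page, { url }) => ({ status: () => 200, url: () => url })
})

const getStatus = evaluate((page, response) => ({
  statusCode: response.status(),
  url: response.url()
}))

getStatus('https://example.com').then(info => console.log(info))
```

The try/finally block is what lets the caller forget about page cleanup, even when the supplied function throws.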
It performs a smart page.goto, blocking ad and tracker requests, as well as other requests based on resourceType.
// create the instance by calling the exported factory
const browserless = require('browserless')()

;(async () => {
  const page = await browserless.page()
  await browserless.goto(page, {
    url: 'http://savevideo.me',
    abortTypes: ['image', 'media', 'stylesheet', 'font']
  })
})()
type: string
The target URL.
type: array
default: []
A list of req.resourceType() values to be blocked.
type: boolean
default: true
Aborts requests coming from tracking domains.
type: string|function|number
default: 0
Waits for an amount of time, a selector or a function using page.waitFor.
type: array
default: ['networkidle2', 'load', 'domcontentloaded']
Specifies the list of events to wait for before considering the navigation successful, using page.waitForNavigation.
Sets a custom user agent using the page.setUserAgent method.
Sets a custom viewport using the page.setViewport method.
type: object
The settings to be passed to page.goto.
It returns a new standalone browser page.
// create the instance by calling the exported factory
const browserless = require('browserless')()

;(async () => {
  const page = await browserless.page()
})()
browserless internally uses a singleton browser instance.
You can use a pool of instances with the @browserless/pool package.
const createBrowserless = require('@browserless/pool')

const browserlessPool = createBrowserless({
  poolOpts: {
    max: 15,
    min: 2
  }
})
The API is the same as browserless, except that the constructor accepts an extra option called poolOpts.
This setting is used for initializing the pool properly. You can see what you can specify there at node-pool#opts.
You can also interact with a standalone browserless instance from your pool.
const createBrowserless = require('browserless')
const browserlessPool = createBrowserless.pool()

// the target URL (it was left undefined in the original snippet)
const url = 'https://example.com'

// get a browserless instance from the pool
browserlessPool(async browserless => {
  // get a page from the browser instance
  const page = await browserless.page()
  await browserless.goto(page, { url })
  const html = await page.content()
  console.log(html)
  process.exit()
})
You don't need to think about the acquire/release step: It's done automagically ✨.
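To make the acquire/release step concrete, here is a minimal, dependency-free sketch of the pattern. It is not the @browserless/pool implementation (which builds on node-pool); `createPool` and `use` are hypothetical names, and plain objects stand in for browser instances.

```javascript
// Hypothetical minimal pool: hands a resource to a callback and always
// releases it afterwards, even if the callback throws.
const createPool = (create, { max = 2 } = {}) => {
  const idle = [] // resources ready to be reused
  const waiting = [] // resolvers of callers waiting for a free resource
  let size = 0 // resources created so far

  const acquire = async () => {
    if (idle.length > 0) return idle.pop()
    if (size < max) {
      size++
      return create()
    }
    // pool exhausted: wait until another caller releases a resource
    return new Promise(resolve => waiting.push(resolve))
  }

  const release = resource => {
    const next = waiting.shift()
    if (next) next(resource)
    else idle.push(resource)
  }

  // The caller never sees acquire/release directly.
  const use = async fn => {
    const resource = await acquire()
    try {
      return await fn(resource)
    } finally {
      release(resource)
    }
  }

  return { use }
}

// Usage: three tasks share at most two instances.
let created = 0
const pool = createPool(() => ({ id: ++created }), { max: 2 })

Promise.all([1, 2, 3].map(() => pool.use(async instance => instance.id))).then(ids =>
  console.log(ids)
)
```

Wrapping the work in a single `use` call is what removes the bookkeeping from the caller: the resource goes back to the pool in the `finally` branch regardless of how the callback ends.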
browserless is internally divided into multiple packages to ensure you use just the minimum amount of code necessary for your use case.
- browserless
- @browserless/pool
- @browserless/devices
- @browserless/goto
- @browserless/benchmark
- @browserless/examples
For testing different approaches, we included a tiny benchmark tool called @browserless/benchmark.
Q: Why use browserless over Puppeteer?
browserless does not replace puppeteer; it complements it. It's just a syntactic sugar layer over the official Headless Chrome API, oriented toward production scenarios.
Q: Why do you block ads scripts by default?
Headless navigation is expensive compared with just fetching the content of a website.
To speed up the process, we block ad scripts by default because they are so bloated.
Q: My output is different from the expected
browserless was probably too smart and blocked a request that you need.
You can activate debug mode using the DEBUG=browserless environment variable to see what is happening behind the scenes:
DEBUG=browserless node index.js
Consider opening an issue with the debug trace.
Q: Can I use browserless in an AWS Lambda-like project?
Yes, check chrome-aws-lambda to set up AWS Lambda with a compatible binary.
browserless © Kiko Beats, Released under the MIT License.
Authored and maintained by Kiko Beats with help from contributors.
logo designed by xinh studio.
kikobeats.com · GitHub Kiko Beats · Twitter @kikobeats