---
sidebar_label: Puppeteer
sidebar_class_name: node-only
---

# PuppeteerWebBaseLoader

```{=mdx}
:::tip Compatibility

Only available on Node.js.

:::
```

This notebook provides a quick overview for getting started with [PuppeteerWebBaseLoader](/docs/integrations/document_loaders/). For detailed documentation of all PuppeteerWebBaseLoader features and configurations, head to the [API reference](https://api.js.langchain.com/classes/langchain_community_document_loaders_web_puppeteer.PuppeteerWebBaseLoader.html).

Puppeteer is a Node.js library that provides a high-level API for controlling headless Chrome or Chromium. You can use Puppeteer to automate web page interactions, including extracting data from dynamic web pages that require JavaScript to render.

If you want a lighter-weight solution, and the web pages you want to load do not require JavaScript to render, you can use the [CheerioWebBaseLoader](/docs/integrations/document_loaders/web_loaders/web_cheerio) instead.

## Overview
### Integration details

| Class | Package | Local | Serializable | PY support |
| :--- | :--- | :---: | :---: | :---: |
| [PuppeteerWebBaseLoader](https://api.js.langchain.com/classes/langchain_community_document_loaders_web_puppeteer.PuppeteerWebBaseLoader.html) | [@langchain/community](https://api.js.langchain.com/modules/langchain_community_document_loaders_web_puppeteer.html) | ✅ | beta | ❌ |

### Loader features

| Source | Web Loader | Node Envs Only |
| :---: | :---: | :---: |
| PuppeteerWebBaseLoader | ✅ | ✅ |

## Setup

To access the `PuppeteerWebBaseLoader` document loader, you'll need to install the `@langchain/community` integration package along with the `puppeteer` peer dependency.

### Credentials

If you want automated tracing of your model calls, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting the lines below:

```bash
# export LANGSMITH_TRACING="true"
# export LANGSMITH_API_KEY="your-api-key"
```

### Installation

The LangChain PuppeteerWebBaseLoader integration lives in the `@langchain/community` package:

```{=mdx}
import IntegrationInstallTooltip from "@mdx_components/integration_install_tooltip.mdx";
import Npm2Yarn from "@theme/Npm2Yarn";

<IntegrationInstallTooltip></IntegrationInstallTooltip>

<Npm2Yarn>
  @langchain/community @langchain/core puppeteer
</Npm2Yarn>

```

## Instantiation

Now we can instantiate our loader object and load documents:

```typescript
import { PuppeteerWebBaseLoader } from "@langchain/community/document_loaders/web/puppeteer";

// The first argument is the URL to load. The second argument is optional and
// accepts `launchOptions`, `gotoOptions`, and `evaluate` (see Options below).
const loader = new PuppeteerWebBaseLoader("https://langchain.com", {});
```

## Load

```typescript
const docs = await loader.load();
console.log(docs[0]);
```

```text
Document {
  pageContent: '<div class="page-wrapper"><div class="global-styles w-embed"><style>\n' +
    '\n' +
    '* {\n' +
    '  -webkit-font-smoothing: antialiased;\n' +
    '}\n' +
    '\n' +
    '.page-wrapper {\n' +
    'overflow: clip;\n' +
    '  }\n' +
    '\n' +
    '\n' +
    '\n' +
    '/* Set fluid size change for smaller breakpoints */\n' +
    '  html { font-size: 1rem; }\n' +
    '  @media screen and (max-width:1920px) and (min-width:1281px) { html { font-size: calc(0.2499999999999999rem + 0.6250000000000001vw); } }\n' +
    '  @media screen and (max-width:1280px) and (min-width:992px) { html { font-size: calc(0.41223612197028925rem + 0.4222048475371384vw); } }\n' +
    '/* video sizing */\n' +
    '\n' +
    'video {\n' +
    '    object-fit: fill;\n' +
    '\t\twidth: 100%;\n' +
    '}\n' +
    '\n' +
    '\n' +
    '\n' +
    '#retrieval-video {\n' +
    '    object-fit: cover;\n' +
    '    width: 100%;\n' +
    '}\n' +
    '\n' +
    '\n' +
    '\n' +
    '/* Set
```

```typescript
console.log(docs[0].metadata);
```

```text
{ source: 'https://langchain.com' }
```


## Options

Here's an explanation of the parameters you can pass to the `PuppeteerWebBaseLoader` constructor through the `PuppeteerWebBaseLoaderOptions` type:

```typescript
type PuppeteerWebBaseLoaderOptions = {
  launchOptions?: PuppeteerLaunchOptions;
  gotoOptions?: PuppeteerGotoOptions;
  evaluate?: (page: Page, browser: Browser) => Promise<string>;
};
```
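As a sketch of how the first two options fit together, the helper below builds an options object with defaults you might choose for a JavaScript-heavy page. `buildLoaderOptions` is a hypothetical name, the chosen defaults (`waitUntil: "networkidle2"`, a 60-second timeout) are assumptions rather than the loader's own defaults, and the local types only mirror the Puppeteer fields used here so the snippet runs without Puppeteer installed.

```typescript
// Local mirror of Puppeteer's navigation lifecycle events. In real code these
// types come from Puppeteer itself; this copy keeps the sketch self-contained.
type WaitUntil = "load" | "domcontentloaded" | "networkidle0" | "networkidle2";

interface GotoOptionsSketch {
  timeout?: number; // maximum navigation time in milliseconds
  waitUntil?: WaitUntil | WaitUntil[]; // when navigation counts as finished
}

interface LaunchOptionsSketch {
  headless?: boolean;
  slowMo?: number; // delay in milliseconds added to each Puppeteer operation
}

// Hypothetical helper: assumed defaults for JavaScript-heavy pages,
// each of which can be overridden by the caller.
function buildLoaderOptions(
  goto: GotoOptionsSketch = {},
  launch: LaunchOptionsSketch = {},
) {
  return {
    launchOptions: { headless: true, ...launch },
    gotoOptions: {
      timeout: 60_000,
      waitUntil: "networkidle2" as WaitUntil,
      ...goto,
    },
  };
}
```

The result can be passed as the constructor's second argument, for example `new PuppeteerWebBaseLoader(url, buildLoaderOptions({ waitUntil: "load" }))`. Spreading the caller's values last means any field they set replaces the corresponding default.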

1. `launchOptions`: an optional object of additional options passed to `puppeteer.launch()`. This can include options such as `headless`, to launch the browser in headless mode, or `slowMo`, to slow down Puppeteer's operations by the given number of milliseconds so they are easier to follow.

2. `gotoOptions`: an optional object of additional options passed to `page.goto()`. This can include options such as `timeout`, the maximum navigation time in milliseconds, or `waitUntil`, which specifies when navigation is considered complete.

3. `evaluate`: an optional function that receives the Puppeteer `Page` and `Browser` after navigation and returns a Promise that resolves to the string used as the document's page content. Use it to extract specific data from the page or to interact with page elements (for example with `page.evaluate()`) before returning. If omitted, the loader returns the HTML of the page's `<body>`, as in the output above.

By passing these options to the `PuppeteerWebBaseLoader` constructor, you can control how the browser is launched, how the page is navigated, and what content is extracted from it.
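For instance, an `evaluate` callback can wait for a specific element and return only its text instead of the raw body HTML. The sketch below assumes a page with a `main` element; `makeTextEvaluate` and `PageLike` are hypothetical names, and `PageLike` is a minimal structural stand-in for the two Puppeteer `Page` methods used (`waitForSelector` and `$eval`) so the snippet runs and can be tested without a browser. In real code, type the parameter as Puppeteer's `Page`.

```typescript
// Minimal structural stand-in for the Puppeteer `Page` methods used below.
interface PageLike {
  waitForSelector(selector: string, options?: { timeout?: number }): Promise<unknown>;
  $eval(
    selector: string,
    pageFunction: (element: { textContent: string | null }) => string | null,
  ): Promise<string | null>;
}

// Hypothetical helper: returns an `evaluate` callback that waits for
// `selector` to appear and resolves to its trimmed text content.
function makeTextEvaluate(selector: string, timeoutMs = 30_000) {
  return async (page: PageLike): Promise<string> => {
    // Wait for the element, which helps with JavaScript-rendered content.
    await page.waitForSelector(selector, { timeout: timeoutMs });
    // In real Puppeteer this callback is serialized and runs in the browser,
    // so it must not reference variables from the surrounding scope.
    const text = await page.$eval(selector, (element) => element.textContent);
    // Treat a missing text node as an empty string.
    return (text ?? "").trim();
  };
}
```

With Puppeteer's real types, this would be passed as `evaluate: makeTextEvaluate("main")`; the loader's second `Browser` argument is simply ignored.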


## Screenshots

To take a screenshot of a site, initialize the loader as above and call its `.screenshot()` method.
This returns a `Document` whose page content is a base64-encoded image and whose metadata contains a `source` field with the URL of the page.

```typescript
import { PuppeteerWebBaseLoader } from "@langchain/community/document_loaders/web/puppeteer";

const loaderForScreenshot = new PuppeteerWebBaseLoader("https://langchain.com", {
  launchOptions: {
    headless: true,
  },
  gotoOptions: {
    waitUntil: "domcontentloaded",
  },
});
const screenshot = await loaderForScreenshot.screenshot();

console.log(screenshot.pageContent.slice(0, 100));
console.log(screenshot.metadata);
```

```text
iVBORw0KGgoAAAANSUhEUgAACWAAAAdoCAIAAAA/Q2IJAAAAAXNSR0IArs4c6QAAIABJREFUeJzsvUuzHUeSJuaPiMjMk3nOuU88
{ source: 'https://langchain.com' }
```
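The base64 content starts with `iVBORw0KGgo`, which decodes to the PNG file signature, so one way to persist the screenshot is to decode it with Node's `Buffer` and write the bytes to a `.png` file. This is a sketch: `saveScreenshot`, `decodeScreenshot`, and `isPng` are hypothetical helpers, and `ScreenshotDocument` is a minimal structural type covering only the `Document` fields used here.

```typescript
import { writeFile } from "node:fs/promises";

// Minimal structural type for the Document returned by `screenshot()`:
// `pageContent` holds the base64-encoded image, `metadata.source` the URL.
interface ScreenshotDocument {
  pageContent: string;
  metadata: { source?: string };
}

// The first eight bytes of every PNG file.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Decode the base64 page content back into raw image bytes.
function decodeScreenshot(doc: ScreenshotDocument): Buffer {
  return Buffer.from(doc.pageContent, "base64");
}

// True if the bytes start with the PNG signature.
function isPng(bytes: Buffer): boolean {
  return (
    bytes.length >= PNG_SIGNATURE.length &&
    bytes.subarray(0, PNG_SIGNATURE.length).equals(PNG_SIGNATURE)
  );
}

// Write the decoded image to `filePath` and return the number of bytes written.
async function saveScreenshot(doc: ScreenshotDocument, filePath: string): Promise<number> {
  const bytes = decodeScreenshot(doc);
  await writeFile(filePath, bytes);
  return bytes.length;
}
```

Used with the example above, `await saveScreenshot(screenshot, "langchain.png")` would write the full image to disk.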


## API reference

For detailed documentation of all PuppeteerWebBaseLoader features and configurations, head to the [API reference](https://api.js.langchain.com/classes/langchain_community_document_loaders_web_puppeteer.PuppeteerWebBaseLoader.html).