---
sidebar_label: Cheerio
---

# Cheerio

This notebook provides a quick overview of getting started with [CheerioWebBaseLoader](/docs/integrations/document_loaders/). For detailed documentation of all CheerioWebBaseLoader features and configurations, head to the [API reference](https://api.js.langchain.com/classes/langchain_community_document_loaders_web_cheerio.CheerioWebBaseLoader.html).

## Overview
### Integration details

This example goes over how to load data from webpages using Cheerio. One document will be created for each webpage.

Cheerio is a fast and lightweight library that allows you to parse and traverse HTML documents using a jQuery-like syntax. You can use Cheerio to extract data from web pages, without having to render them in a browser.

However, Cheerio does not simulate a web browser, so it cannot execute JavaScript code on the page. This means that it cannot extract data from dynamic web pages that require JavaScript to render. To do that, you can use the [`PlaywrightWebBaseLoader`](/docs/integrations/document_loaders/web_loaders/web_playwright) or [`PuppeteerWebBaseLoader`](/docs/integrations/document_loaders/web_loaders/web_puppeteer) instead.

| Class | Package | Local | Serializable | PY support |
| :--- | :--- | :---: | :---: | :---: |
| [CheerioWebBaseLoader](https://api.js.langchain.com/classes/langchain_community_document_loaders_web_cheerio.CheerioWebBaseLoader.html) | @langchain/community | ✅ | ✅ | ❌ |

### Loader features

| Source | Web Support | Node Support |
| :---: | :---: | :---: |
| CheerioWebBaseLoader | ✅ | ✅ |

## Setup

To access the `CheerioWebBaseLoader` document loader, you'll need to install the `@langchain/community` integration package along with the `cheerio` peer dependency.

### Credentials

If you want automated tracing of your model calls, you can set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting the lines below:

```bash
# export LANGSMITH_TRACING="true"
# export LANGSMITH_API_KEY="your-api-key"
```

### Installation

The LangChain CheerioWebBaseLoader integration lives in the `@langchain/community` package:

```{=mdx}
import IntegrationInstallTooltip from "@mdx_components/integration_install_tooltip.mdx";
import Npm2Yarn from "@theme/Npm2Yarn";

<IntegrationInstallTooltip></IntegrationInstallTooltip>

<Npm2Yarn>
  @langchain/community @langchain/core cheerio
</Npm2Yarn>

```

## Instantiation

Now we can instantiate our loader object and load documents:

```typescript
import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";

const loader = new CheerioWebBaseLoader("https://news.ycombinator.com/item?id=34817881", {
  // optional params: ...
});
```

## Load

```typescript
const docs = await loader.load();
docs[0]
```

```text
Document {
  pageContent: '\n' +
    '        \n' +
    '                  Hacker News\n' +
    '                            new | past | comments | ask | show | jobs | submit            \n' +
    '                              login\n' +
    '                          \n' +
    '              \n' +
    '\n' +
    '        \n' +
    '            What Lights the Universe’s Standard Candles? (quantamagazine.org)\n' +
    '          75 points by Amorymeltzer on Feb 17, 2023  | hide | past | favorite | 6 comments        \n' +
    '              \n' +
    '        \n' +
    '                  \n' +
    '          \n' +
    '          delta_p_delta_x on Feb 17, 2023           \n' +
    '             | next [–]          \n' +
    '                  \n' +
    "                  Astrophysical and cosmological simulations are often insightful. They're also very cross-disciplinary; besides the obvious astrophysics, there's networking and sysadmin, parallel computing and algorithm theory (so that 
```

```typescript
console.log(docs[0].metadata);
```

```text
{ source: 'https://news.ycombinator.com/item?id=34817881' }
```
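
The `pageContent` shown above keeps the page's original indentation and blank lines, which is often noise for later steps. The following is a minimal sketch of one way to tidy it, assuming only that each document exposes `pageContent` and `metadata` as in the output above. The `PageDoc` type and the `normalizeWhitespace` helper are hypothetical names used for illustration and are not part of LangChain.

```typescript
// Minimal structural type mirroring the two fields shown in the output above.
// Hypothetical stand-in for illustration; not the LangChain Document class.
interface PageDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Collapse runs of spaces and tabs, trim each line, and drop empty lines.
function normalizeWhitespace(doc: PageDoc): PageDoc {
  const pageContent = doc.pageContent
    .split("\n")
    .map((line) => line.replace(/[ \t]+/g, " ").trim())
    .filter((line) => line.length > 0)
    .join("\n");
  // Return a copy so the input document is left untouched.
  return { ...doc, pageContent };
}

// Sample input shaped like the start of the output above.
const sample: PageDoc = {
  pageContent: "\n        \n                  Hacker News\n      new | past   \n",
  metadata: { source: "https://news.ycombinator.com/item?id=34817881" },
};

const cleaned = normalizeWhitespace(sample);
console.log(JSON.stringify(cleaned.pageContent));
```

Applied to real loader output, something like `docs.map(normalizeWhitespace)` would return plain objects rather than `Document` instances, so this sketch suits inspection or preprocessing more than code that depends on the class itself.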


## Additional configurations

`CheerioWebBaseLoader` supports additional configuration when instantiating the loader. Here is an example that passes the `selector` field, so that only content from elements matching the given CSS selector (here, `p` elements) is loaded:

```typescript
import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";

const loaderWithSelector = new CheerioWebBaseLoader("https://news.ycombinator.com/item?id=34817881", {
  selector: "p",
});

const docsWithSelector = await loaderWithSelector.load();
docsWithSelector[0].pageContent;
```

```text
Some of my favourite simulation projects:- IllustrisTNG: https://www.tng-project.org/- SWIFT: https://swift.dur.ac.uk/- CO5BOLD: https://www.astro.uu.se/~bf/co5bold_main.html (which produced these animations of a red-giant star: https://www.astro.uu.se/~bf/movie/AGBmovie.html)- AbacusSummit: https://abacussummit.readthedocs.io/en/latest/And I can add the simulations in the article, too.
```
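
Each loader in the examples above is created for a single URL and yields one document per webpage, so loading several pages means creating one loader per URL and combining the results. The following is a minimal sketch of that pattern. The `PageDoc` and `PageLoader` types, the `loadPages` helper, and the `fakeLoader` stand-in are all hypothetical names introduced for illustration, and the stand-in replaces the real loader so the sketch runs without network access.

```typescript
// Minimal shapes for illustration; hypothetical stand-ins for the
// LangChain document and loader types.
interface PageDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

interface PageLoader {
  load(): Promise<PageDoc[]>;
}

// Create one loader per URL, run them concurrently, and flatten the results.
async function loadPages(
  urls: string[],
  makeLoader: (url: string) => PageLoader
): Promise<PageDoc[]> {
  const perPage = await Promise.all(urls.map((url) => makeLoader(url).load()));
  return ([] as PageDoc[]).concat(...perPage);
}

// Stand-in loader (an assumption for this sketch) that returns one document per URL.
const fakeLoader = (url: string): PageLoader => ({
  load: async () => [
    { pageContent: `content of ${url}`, metadata: { source: url } },
  ],
});

const pagesPromise = loadPages(
  ["https://example.com/a", "https://example.com/b"],
  fakeLoader
);

pagesPromise.then((pages) => {
  console.log(pages.map((doc) => doc.metadata.source));
});
```

With the real loader, `makeLoader` could be something like `(url) => new CheerioWebBaseLoader(url, { selector: "p" })`, assuming its `load()` result is compatible with the `PageDoc` shape. Because `Promise.all` starts every request at once, a long URL list may need batching to stay polite to the target site.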


## API reference

For detailed documentation of all CheerioWebBaseLoader features and configurations, head to the [API reference](https://api.js.langchain.com/classes/langchain_community_document_loaders_web_cheerio.CheerioWebBaseLoader.html).