In [1]:
import os
from dotenv import load_dotenv
load_dotenv()

True

# Simple and fast text extraction


If you want a simple string representation of the text embedded in a web page, the method below is appropriate.

It returns a list of ```Document``` objects, one per page, each containing a single string of the page's text. Under the hood it uses the ```beautifulsoup4``` Python library.
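To make the one-document-per-page behavior concrete, here is a minimal offline sketch. It does not call ```WebBaseLoader```. Instead, it uses Python's standard-library ```html.parser``` as a rough stand-in for BeautifulSoup's ```get_text()```, and a hypothetical ```SimpleDocument``` dataclass in place of LangChain's ```Document```. The URLs and HTML strings are made up for illustration.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class TextExtractor(HTMLParser):
    """Collects every text node, roughly like BeautifulSoup's get_text()."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # No separator: text nodes are concatenated as-is.
    return "".join(parser.chunks)


def load_pages(pages: dict) -> list:
    # One document per page: the whole page's text becomes a single string.
    return [
        SimpleDocument(page_content=html_to_text(html), metadata={"source": url})
        for url, html in pages.items()
    ]


pages = {
    "https://example.com/a": "<html><body><nav>Home</nav><p>Page A</p></body></html>",
    "https://example.com/b": "<html><body><p>Page B</p></body></html>",
}
docs = load_pages(pages)
print(len(docs))                  # 2
print(repr(docs[0].page_content))  # 'HomePage A'
```

Note how the navigation text ("Home") ends up in the page content alongside the body text, which is the same effect visible in the real output further down.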

LangChain document loaders implement ```lazy_load``` and its async variant, ```alazy_load```, which return an iterator and an async iterator of ```Document``` objects, respectively. The cells below use ```alazy_load```.
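The difference between the two methods can be sketched with a toy loader. ```ToyLoader``` is hypothetical and only mimics the shape of the interface: ```lazy_load``` is a generator that yields one document at a time, and ```alazy_load``` is an async generator consumed with ```async for```. Plain strings stand in for ```Document``` objects, and ```asyncio.run``` drives the async variant as a script would.

```python
import asyncio
from typing import AsyncIterator, Iterator


class ToyLoader:
    """Hypothetical loader mimicking the lazy_load / alazy_load pattern."""

    def __init__(self, pages):
        self.pages = pages

    def lazy_load(self) -> Iterator[str]:
        # Yield documents one by one instead of building a full list up front.
        for page in self.pages:
            yield f"text of {page}"

    async def alazy_load(self) -> AsyncIterator[str]:
        # Async variant: a real loader would await network I/O here.
        for page in self.pages:
            await asyncio.sleep(0)
            yield f"text of {page}"


async def collect(loader: ToyLoader) -> list:
    # Equivalent to the notebook's top-level `async for` loop.
    return [doc async for doc in loader.alazy_load()]


loader = ToyLoader(["page1", "page2"])
sync_docs = list(loader.lazy_load())
async_docs = asyncio.run(collect(loader))
print(sync_docs == async_docs)  # True
```

Because both methods yield lazily, a loader over many URLs never needs to hold every page in memory at once unless you collect the results into a list, as the cells below do.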

In [3]:
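import bs4  # used in a later cell for SoupStrainer
from langchain_community.document_loaders import WebBaseLoader

page_url = "https://python.langchain.com/docs/how_to/chatbots_memory/"

loader = WebBaseLoader(web_paths=[page_url])

# Top-level `async for` works in Jupyter; in a plain script, wrap this loop
# in an `async def` function and run it with asyncio.run().
docs = []
async for doc in loader.alazy_load():
    docs.append(doc)

# One URL in, one Document out.
assert len(docs) == 1
doc = docs[0]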

In [6]:
print(f"{doc.metadata}\n")
print(doc.page_content)

{'source': 'https://python.langchain.com/docs/how_to/chatbots_memory/', 'title': 'How to add memory to chatbots | 🦜️🔗 LangChain', 'description': 'A key feature of chatbots is their ability to use content of previous conversation turns as context. This state management can take several forms, including:', 'language': 'en'}






How to add memory to chatbots | ü¶úÔ∏èüîó LangChain






Skip to main contentIntegrationsAPI ReferenceMoreContributingPeopleError referenceLangSmithLangGraphLangChain HubLangChain JS/TSv0.3v0.3v0.2v0.1💬SearchIntroductionTutorialsBuild a Question Answering application over a Graph DatabaseTutorialsBuild a Simple LLM Application with LCELBuild a Query Analysis SystemBuild a ChatbotConversational RAGBuild an Extraction ChainBuild an AgentTaggingdata_generationBuild a Local RAG ApplicationBuild a PDF ingestion and Question/Answering systemBuild a Retrieval Augmented Generation (RAG) AppVector stores and retrieversBuild a Question/Answering sy

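This is essentially a dump of the text from the page's HTML, so it can include extraneous text such as headings, navigation bars, and sidebar links, as the output above shows.

# Load with BeautifulSoup

If you are familiar with the page's HTML structure, you can pass BeautifulSoup parameters to select specific ```<div>``` classes and other elements: ```WebBaseLoader``` forwards ```bs_kwargs``` to the ```BeautifulSoup``` constructor and ```bs_get_text_kwargs``` to ```get_text()```. Below we parse only the body text of the article: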
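Conceptually, ```parse_only=bs4.SoupStrainer(class_=...)``` keeps only the matching elements, and ```get_text(separator=" | ", strip=True)``` trims whitespace from each text fragment, drops empty fragments, and joins the rest with the separator. The offline sketch below imitates both steps with the standard-library ```html.parser```. ```FilteredTextExtractor```, ```extract```, and the sample HTML are hypothetical, and the sketch only approximates BeautifulSoup (for example, it assumes well-nested tags).

```python
from html.parser import HTMLParser

# Elements with no end tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class FilteredTextExtractor(HTMLParser):
    """Collects text only inside elements whose class attribute equals target_class."""

    def __init__(self, target_class: str):
        super().__init__()
        self.target_class = target_class
        self.depth = 0  # > 0 while inside a matching element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested element inside the match
        elif dict(attrs).get("class") == self.target_class:
            self.depth = 1  # entering a matching element

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract(html: str, target_class: str, separator: str = " | ", strip: bool = True) -> str:
    parser = FilteredTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    chunks = parser.chunks
    if strip:
        # Like get_text(strip=True): trim each fragment and drop empty ones.
        chunks = [c.strip() for c in chunks if c.strip()]
    return separator.join(chunks)


html = """
<html><body>
  <nav>Skip to main content</nav>
  <div class="theme-doc-markdown markdown">
    <h1>How to add memory</h1>
    <p>A key feature of chatbots...</p>
  </div>
</body></html>
"""

article_text = extract(html, "theme-doc-markdown markdown")
print(article_text)  # How to add memory | A key feature of chatbots...
```

The navigation text is dropped because it sits outside the matching ```<div>```, and the ```" | "``` separator marks where one text fragment ends and the next begins, mirroring the filtered output below.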

In [7]:
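loader = WebBaseLoader(
    web_paths=[page_url],
    # Passed to the BeautifulSoup constructor: parse only elements whose
    # class attribute matches "theme-doc-markdown markdown".
    bs_kwargs={
        "parse_only": bs4.SoupStrainer(class_="theme-doc-markdown markdown"),
    },
    # Passed to get_text(): join text fragments with " | " and strip whitespace.
    bs_get_text_kwargs={"separator": " | ", "strip": True},
)

docs = []
async for doc in loader.alazy_load():
    docs.append(doc)

assert len(docs) == 1
doc = docs[0]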

In [8]:
print(f"{doc.metadata}\n")
print(doc.page_content)

{'source': 'https://python.langchain.com/docs/how_to/chatbots_memory/'}

How to add memory to chatbots | A key feature of chatbots is their ability to use content of previous conversation turns as context. This state management can take several forms, including: | Simply stuffing previous messages into a chat model prompt. | The above, but trimming old messages to reduce the amount of distracting information the model has to deal with. | More complex modifications like synthesizing summaries for long running conversations. | We'll go into more detail on a few techniques below! | note | This how-to guide previously built a chatbot using | RunnableWithMessageHistory | . You can access this version of the guide in the | v0.2 docs | . | As of the v0.3 release of LangChain, we recommend that LangChain users take advantage of | LangGraph persistence | to incorporate | memory | into new LangChain applications. | If your code is already relying on | RunnableWithMessageHistory | or | BaseChatMe