### We are going to use the TextSplitter class to split the text into chunks

#### First We need to load any documents we want to split


In [8]:
from langchain_community.document_loaders import TextLoader

loader = TextLoader("./speech.txt")
documents = loader.load()

print(documents)

[Document(metadata={'source': './speech.txt'}, page_content="Sure! Here's a short speech on the **Bajaj Pulsar 150**, suitable for a presentation, school event, or a general talk:\n\n**Speech on Bajaj Pulsar 150**\n\nGood \\[morning/afternoon/evening] everyone,\n\nToday, I would like to talk about one of Indiaâ€™s most iconic motorcycles â€“ the **Bajaj Pulsar 150**.\n\nSince its launch in 2001, the Pulsar 150 has become a household name among bike enthusiasts across the country. It wasn't just a bikeâ€”it was a revolution. At a time when most bikes focused purely on mileage, Bajaj dared to deliver **performance, style, and power** in an affordable package. With a sporty design, muscular fuel tank, and a powerful 149cc engine, the Pulsar 150 redefined the Indian biking scene.\n\nOver the years, the bike has seen several updates in both design and features, but what has remained consistent is its balance of **power and practicality**. Itâ€™s strong enough for thrill seekers and efficien

In [12]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=50)

chunks = text_splitter.split_documents(documents)

print(f"Total chunks: {len(chunks)}")
print(chunks[0].page_content)
print(chunks[1].page_content)
print(chunks[2].page_content)
print(chunks[3].page_content)
print(chunks[-2].page_content)
print(chunks[-1].page_content)

Total chunks: 30
Sure! Here's a short speech on the **Bajaj Pulsar 150**, suitable for a presentation, school event,
150**, suitable for a presentation, school event, or a general talk:
**Speech on Bajaj Pulsar 150**

Good \[morning/afternoon/evening] everyone,
Today, I would like to talk about one of Indiaâ€™s most iconic motorcycles â€“ the **Bajaj Pulsar
more than two decades, it continues to lead the pack, proving that true legends never fade.
Thank you!

---

Would you like a version in Hindi or a more technical/detailed version?


#### HTML text splitter

In [1]:
from langchain_text_splitters import HTMLHeaderTextSplitter

html_content = """
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Sample Web Page</title>
</head>
<body>
    <h1>Welcome to My Sample Page</h1>
    <p>This is the first paragraph. It contains some introductory text.</p>
    <p>This is the second paragraph. It's a bit longer and used for demonstration purposes.</p>
    <h2>About This Page</h2>
    <p>This section provides more detail about the page content.</p>
    <ul>
        <li>Point One</li>
        <li>Point Two</li>
        <li>Point Three</li>
    </ul>
    <h2>Contact</h2>
    <p>You can reach us at <a href="mailto:contact@example.com">contact@example.com</a>.</p>
</body>
</html>
"""

headers_to_split_on = [
    ("h1", "Header 1"),
    ("h2", "Heading 2"),
    ("h3", "Heading 3")
]

html_splitter = HTMLHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
chunks = html_splitter.split_text(html_content)

chunks

[Document(metadata={'Header 1': 'Welcome to My Sample Page'}, page_content='Welcome to My Sample Page'),
 Document(metadata={'Header 1': 'Welcome to My Sample Page'}, page_content="This is the first paragraph. It contains some introductory text.  \nThis is the second paragraph. It's a bit longer and used for demonstration purposes."),
 Document(metadata={'Header 1': 'Welcome to My Sample Page', 'Heading 2': 'About This Page'}, page_content='About This Page'),
 Document(metadata={'Header 1': 'Welcome to My Sample Page', 'Heading 2': 'About This Page'}, page_content='This section provides more detail about the page content.  \nPoint One  \nPoint Two  \nPoint Three'),
 Document(metadata={'Header 1': 'Welcome to My Sample Page', 'Heading 2': 'Contact'}, page_content='Contact'),
 Document(metadata={'Header 1': 'Welcome to My Sample Page', 'Heading 2': 'Contact'}, page_content='You can reach us at .  \ncontact@example.com')]