# Text splitters
source: https://python.langchain.com/docs/how_to/#text-splitters

Text splitters take a document and split it into chunks that can be used for retrieval. The linked guides cover how to:

- recursively split text
- split HTML
- split by character
- split code
- split Markdown by headers
- recursively split JSON
- split text into semantic chunks
- split by tokens
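
To make "chunk" concrete, here is a minimal plain-Python sketch of the simplest possible strategy: fixed-size character windows that share a few characters with their neighbor. The helper `chunk_with_overlap` is hypothetical and is not part of LangChain; its `chunk_size` and `chunk_overlap` parameters only borrow the names used by the library's splitters.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Hypothetical helper for illustration only; not a LangChain function.
    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Fixed offsets cut words and sentences in half, which is why the strategies listed above prefer natural boundaries such as paragraphs, headers, or tokens.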


## Example: recursively split text

from langchain_text_splitters import RecursiveCharacterTextSplitter

filepath = "../assets/lepetitprince.txt"

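# Load the example document (assumes the file is UTF-8 encoded)
with open(filepath, encoding="utf-8") as f:
    little_prince_text = f.read()

text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size=250,  # maximum chunk length, as measured by length_function
    chunk_overlap=20,  # maximum overlap between consecutive chunks
    length_function=len,  # measure length in characters
    is_separator_regex=True,  # interpret separators as regular expressions
)
texts = text_splitter.create_documents([little_prince_text])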
print("--- chunk 1:\n", texts[0])
print("\n--- chunk 2:\n", texts[1])

--- chunk 1:
 page_content='THE LITTLE PRINCE 

Antoine De Saint-Exupery 


Antoine de Saint-Exupery, who was a French author, journalist and pilot wrote 
The Little Prince in 1943, one year before his death.'

--- chunk 2:
 page_content='The Little Prince appears to be a simple children’s tale, 
some would say that it is actually a profound and deeply moving tale, 
written in riddles and laced with philosophy and poetic metaphor.'
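
The first chunk above keeps the title, the author line, and a short paragraph together because they fit within `chunk_size` once joined. The sketch below shows one way that behavior can arise: split on the coarsest separator, merge neighboring pieces while they fit, and fall back to finer separators only for pieces that are still too long. It is a simplified illustration, not LangChain's implementation: `recursive_split` is a hypothetical helper, it ignores `chunk_overlap`, and the separator order is an assumption modeled on paragraph breaks, line breaks, then spaces.

```python
def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Simplified recursive splitting (illustration only, no overlap)."""
    if len(text) <= chunk_size:
        stripped = text.strip()
        return [stripped] if stripped else []
    if not separators:
        # No separators left: fall back to hard cuts of chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # still fits: keep merging
            continue
        # The merged text would be too long: emit what has been collected.
        if current.strip():
            chunks.append(current.strip())
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry it with finer separators.
            chunks.extend(recursive_split(piece, finer, chunk_size))
            current = ""
    if current.strip():
        chunks.append(current.strip())
    return chunks


sample = "Title\n\nAuthor\n\nA first line of text.\nA second line of text.\n\nEnd."
result = recursive_split(sample, ["\n\n", "\n", " "], chunk_size=30)
for chunk in result:
    print(repr(chunk))
# 'Title\n\nAuthor'
# 'A first line of text.'
# 'A second line of text.'
# 'End.'
```

The short title and author merge into one chunk, while the two-line paragraph is too long for 30 characters and is split again on single newlines.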


The same document can also be split with an explicit list of separators:
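# Load the example document (assumes the file is UTF-8 encoded)
with open(filepath, encoding="utf-8") as f:
    little_prince_text = f.read()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=250,
    chunk_overlap=20,
    length_function=len,
    is_separator_regex=False,  # treat separators as literal strings
    # Try paragraph breaks first, then single newlines.
    separators=[
        "\n\n",
        "\n",
    ],
)

texts = text_splitter.create_documents([little_prince_text])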
print("--- chunk 1:\n", texts[0])
print("\n--- chunk 2:\n", texts[1])

--- chunk 1:
 page_content='THE LITTLE PRINCE 

Antoine De Saint-Exupery 


Antoine de Saint-Exupery, who was a French author, journalist and pilot wrote 
The Little Prince in 1943, one year before his death.'

--- chunk 2:
 page_content='The Little Prince appears to be a simple children’s tale, 
some would say that it is actually a profound and deeply moving tale, 
written in riddles and laced with philosophy and poetic metaphor.'
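
Both runs print the same chunks even though `is_separator_regex` differs between them. A likely reason is that separators such as `"\n\n"` and `"\n"` contain no regex metacharacters, so matching them literally or as a pattern gives the same split. The standard-library sketch below illustrates the distinction. `split_on` is a hypothetical helper, and escaping the separator with `re.escape` when the flag is off is an assumption about how such a flag is typically implemented, not a quote from the library.

```python
import re


def split_on(text: str, separator: str, is_separator_regex: bool) -> list[str]:
    """Split text on a separator, read either literally or as a regex.

    Hypothetical helper for illustration only.
    """
    pattern = separator if is_separator_regex else re.escape(separator)
    # Drop empty pieces produced by adjacent matches.
    return [piece for piece in re.split(pattern, text) if piece]


text = "a.b\n\nc"

# A plain separator behaves the same in both modes.
newline_literal = split_on(text, "\n\n", is_separator_regex=False)
newline_regex = split_on(text, "\n\n", is_separator_regex=True)
print(newline_literal, newline_regex)
# ['a.b', 'c'] ['a.b', 'c']

# A metacharacter does not: "." as a regex matches any character except a newline.
dot_literal = split_on(text, ".", is_separator_regex=False)
dot_regex = split_on(text, ".", is_separator_regex=True)
print(dot_literal, dot_regex)
# ['a', 'b\n\nc'] ['\n\n']
```

The flag therefore only matters when a separator contains characters like `.`, `*`, or `|` that have a special meaning in regular expressions.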
