## Text Splitting - CharacterTextSplitter

CharacterTextSplitter is a simpler cousin of RecursiveCharacterTextSplitter. 

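- It splits text into chunks of roughly a given size (chunk_size), measured in characters.
- It can add some overlap between consecutive chunks (chunk_overlap).
- It does not look for sentence or word boundaries. It works with one separator and raw character counts, and it does not fall back to finer-grained separators the way RecursiveCharacterTextSplitter does.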
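
To make chunk_size and chunk_overlap concrete, here is a minimal sketch of fixed-size character slicing with overlap in plain Python. It is an illustration only, not LangChain's implementation, and the function name slice_with_overlap is hypothetical.

```python
def slice_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


print(slice_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one, which is the role chunk_overlap plays: context at a chunk boundary is not lost entirely.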

With CharacterTextSplitter we define a separator (default is \n\n).

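- CharacterTextSplitter first splits the text on the separator, then merges adjacent pieces as long as the result stays within chunk_size. A single piece that is longer than chunk_size is kept whole, so a chunk can exceed chunk_size (a warning is logged when this happens).
- If the separator does not occur in the text, there is nothing to split on and the whole text comes back as a single chunk. To get fixed-size character chunks, pass an empty separator (separator=""), which splits on individual characters before merging them up to chunk_size.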
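
The split-then-merge behaviour, and why a chunk can end up longer than chunk_size, can be sketched in plain Python. This is a simplified model under stated assumptions: it ignores chunk_overlap, the function name split_and_merge is hypothetical, and it is not the library's actual code.

```python
def split_and_merge(text: str, separator: str, chunk_size: int) -> list[str]:
    """Split on separator, then greedily merge neighbouring pieces up to chunk_size."""
    pieces = [p for p in text.split(separator) if p]  # drop empty pieces
    chunks = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate  # still fits, keep merging
        else:
            if current:
                chunks.append(current)
            # Start a new chunk. A piece is never cut, so this chunk
            # may be longer than chunk_size on its own.
            current = piece
    if current:
        chunks.append(current)
    return chunks


text = "aaaa\n\nbb\n\n" + "c" * 12
for chunk in split_and_merge(text, separator="\n\n", chunk_size=8):
    print(len(chunk), repr(chunk))
```

The two short paragraphs are merged into one 8-character chunk, while the 12-character paragraph is emitted whole even though it exceeds chunk_size.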


### Load the Data

In [1]:
from langchain_community.document_loaders import TextLoader

# Initialize the TextLoader
loader = TextLoader('sampletext.txt')

# Load the content of the text file as a list of Document objects
text_documents = loader.load()

### Split Documents into Chunks

In [2]:
from langchain_text_splitters import CharacterTextSplitter

# Split on blank lines; chunk_size and chunk_overlap are measured in characters
text_splitter = CharacterTextSplitter(
    separator="\n\n",
    chunk_size=100,
    chunk_overlap=20,
)

texts = text_splitter.split_documents(text_documents)
texts

Created a chunk of size 555, which is longer than the specified 100
Created a chunk of size 624, which is longer than the specified 100


[Document(metadata={'source': 'sampletext.txt'}, page_content='Agentic AI is a class of artificial intelligence that focuses on autonomous systems that can make decisions and perform tasks with or without human intervention. The independent systems automatically respond to conditions, with procedural, algorithmic, and human-like creative steps, to produce process results. The field is closely linked to agentic automation, also known as agent-based process management systems, when applied to process automation. Applications include software development, customer support, cybersecurity and business intelligence.'),
 Document(metadata={'source': 'sampletext.txt'}, page_content='The core concept of agentic AI is the use of AI agents to perform automated tasks with or without human intervention.[1] While robotic process automation (RPA) systems automate rule-based, repetitive tasks with fixed logic, agentic AI adapts and learns from data inputs. [2] Agentic AI refers to autonomous systems c
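
The warnings in the output above report chunks longer than the requested size. A small helper can list which chunks are oversized. This is a sketch: the name oversized_chunks is hypothetical, and plain strings stand in for each document's page_content.

```python
def oversized_chunks(chunks: list[str], chunk_size: int) -> list[tuple[int, int]]:
    """Return (index, length) for every chunk longer than chunk_size."""
    return [(i, len(c)) for i, c in enumerate(chunks) if len(c) > chunk_size]


# Stand-in strings; with real output you could pass
# [doc.page_content for doc in texts] instead.
sample = ["a" * 5, "b" * 12, "c" * 3]
print(oversized_chunks(sample, chunk_size=10))
# [(1, 12)]
```

If most chunks show up in this list, the separator is probably too coarse for the chosen chunk_size.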

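### Split Raw Text into Documents

create_documents takes a list of strings instead of Document objects, so the resulting chunks carry empty metadata.

In [5]:
# Read the file as a plain string instead of using a document loader
with open('sampletext.txt') as f:
    speech = f.read()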

text_splitter = CharacterTextSplitter(
    separator="\n\n",
    chunk_size=100,
    chunk_overlap=20,
)

texts = text_splitter.create_documents([speech])
texts

Created a chunk of size 555, which is longer than the specified 100
Created a chunk of size 624, which is longer than the specified 100


[Document(metadata={}, page_content='Agentic AI is a class of artificial intelligence that focuses on autonomous systems that can make decisions and perform tasks with or without human intervention. The independent systems automatically respond to conditions, with procedural, algorithmic, and human-like creative steps, to produce process results. The field is closely linked to agentic automation, also known as agent-based process management systems, when applied to process automation. Applications include software development, customer support, cybersecurity and business intelligence.'),
 Document(metadata={}, page_content='The core concept of agentic AI is the use of AI agents to perform automated tasks with or without human intervention.[1] While robotic process automation (RPA) systems automate rule-based, repetitive tasks with fixed logic, agentic AI adapts and learns from data inputs. [2] Agentic AI refers to autonomous systems capable of pursuing complex goals with minimal human 