# Chunking Basics with Delimiter-Based and Fixed-Length Chunkers

This notebook introduces two basic chunkers:
- **Delimiter-Based Chunker**
- **Fixed-Length Chunker**

We'll explore how to initialize these chunkers, examine their attributes, and use them to split text into smaller chunks.

In [35]:
# Importing necessary chunkers from the swarmauri library
from swarmauri.chunkers.concrete.DelimiterBasedChunker import DelimiterBasedChunker
from swarmauri.chunkers.concrete.FixedLengthChunker import FixedLengthChunker

## Delimiter-Based Chunking

A `DelimiterBasedChunker` splits a string wherever one of its delimiter characters occurs (e.g., sentence-ending punctuation) and keeps each delimiter attached to the chunk it ends. Let's see how it works in practice.

In [2]:
# Basic usage of DelimiterBasedChunker
chunker = DelimiterBasedChunker()

# Checking the resource and type attributes
print(f"Resource: {chunker.resource}")
print(f"Type: {chunker.type}")

Resource: Chunker
Type: DelimiterBasedChunker


### Chunking Text Using Delimiters

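Next, we chunk a short string using the default delimiters. Judging by the output, these include question marks, exclamation marks, and periods; trailing text with no delimiter becomes its own chunk.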

In [3]:
# Demonstrating the chunking of text based on delimiters
unchunked_text = 'question? test! period. run on'
chunks = chunker.chunk_text(unchunked_text)
print(f"Chunks: {chunks}")

Chunks: ['question?', 'test!', 'period.', 'run on']
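
The behaviour shown above can be approximated with the standard library alone. The following is a minimal sketch, not swarmauri's implementation: the function name `split_on_delimiters` and its `delimiters` parameter are hypothetical, and the default delimiter set `.!?` is an assumption inferred from the output above.

```python
import re


def split_on_delimiters(text, delimiters=".!?"):
    """Split text after each run of delimiter characters (illustrative only).

    `delimiters` is a string of single characters; the default is an
    assumption, not the library's documented default.
    """
    # Escape the characters so they are safe inside a regex character class.
    escaped = re.escape(delimiters)
    # One chunk = a run of non-delimiters plus any delimiters that follow it.
    pattern = rf"[^{escaped}]+[{escaped}]*"
    # Trim surrounding whitespace and drop chunks that end up empty.
    return [m.strip() for m in re.findall(pattern, text) if m.strip()]


print(split_on_delimiters("question? test! period. run on"))
# ['question?', 'test!', 'period.', 'run on']
```

Keeping the delimiter with the preceding chunk preserves sentence-ending punctuation, which matches the notebook output; a variant that discards delimiters would use `re.split` instead.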


## Fixed-Length Chunking

A `FixedLengthChunker` splits text into consecutive chunks of a fixed size, which is useful when downstream steps expect uniformly sized inputs.

In [5]:
# Basic usage of FixedLengthChunker
fixed_chunker = FixedLengthChunker()

# Checking the resource and type attributes
print(f"Resource: {fixed_chunker.resource}")
print(f"Type: {fixed_chunker.type}")

Resource: Chunker
Type: FixedLengthChunker


### Chunking Text Based on Fixed Length

Let's see how the `FixedLengthChunker` handles a longer input. The sample text repeats the three-character pattern `'ab '` 512 times, giving 1,536 characters in total.

In [6]:
# Demonstrating the chunking of text based on fixed lengths
unchunked_text = 'ab ' * 512  # Sample text with a repeated pattern
chunks = fixed_chunker.chunk_text(unchunked_text)
print(f"Number of chunks: {len(chunks)}")

Number of chunks: 6
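
Six chunks from a 1,536-character string is consistent with a chunk size of 256 characters (1536 / 256 = 6). The sketch below reproduces that arithmetic with plain slicing. It is an illustration under that assumption, not the library's code: `split_fixed_length` and its `chunk_size` default are hypothetical, and the notebook does not confirm whether swarmauri measures chunk size in characters.

```python
def split_fixed_length(text, chunk_size=256):
    """Slice text into consecutive pieces of at most chunk_size characters.

    The default of 256 is an assumption inferred from the notebook output.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    # Step through the string in strides of chunk_size; the final slice
    # is shorter when len(text) is not a multiple of chunk_size.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


sample = "ab " * 512
chunks = split_fixed_length(sample)
print(f"Characters: {len(sample)}, chunks: {len(chunks)}")
# Characters: 1536, chunks: 6
```

Because the slicing ignores word boundaries, a chunk can begin or end in the middle of a token; delimiter-based chunking avoids this at the cost of uneven chunk sizes.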


## Notebook Metadata

In [38]:
import os
import platform
import sys
from datetime import datetime

author_name = "Huzaifa Irshad"
github_username = "irshadhuzaifa"

print(f"Author: {author_name}")
print(f"GitHub Username: {github_username}")

notebook_file = "Notebook_01_Chunking_Basics.ipynb"
try:
    last_modified_time = os.path.getmtime(notebook_file)
    last_modified_datetime = datetime.fromtimestamp(last_modified_time)
    print(f"Last Modified: {last_modified_datetime}")
except Exception as e:
    print(f"Could not retrieve last modified datetime: {e}")

print(f"Platform: {platform.system()} {platform.release()}")
print(f"Python Version: {sys.version}")

try:
    import swarmauri
    print(f"Swarmauri Version: {swarmauri.__version__}")
except ImportError:
    print("Swarmauri is not installed.")

Author: Huzaifa Irshad 
GitHub Username: irshadhuzaifa
Last Modified: 2024-10-17 10:53:51.087449
Platform: Windows 11
Python Version: 3.12.7 | packaged by Anaconda, Inc. | (main, Oct  4 2024, 13:17:27) [MSC v.1929 64 bit (AMD64)]
Swarmauri Version: 0.5.0
