# Text Chunking Examples

## Simple Text Chunking

The sample `blog_post.txt` file is taken from the article [What is Data Science? Definition, Examples, Tools & More](https://www.datacamp.com/blog/what-is-data-science-the-definitive-guide).

In [None]:
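def chunk_text(path, chunk_size=200):
    # read the whole file as UTF-8 text
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

    # split into consecutive chunks of chunk_size characters
    # (the last chunk may be shorter)
    chunks = [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]

    # print each chunk, separated by a dashed line
    for chunk in chunks:
        print("-" * 20)
        print(chunk)

# call the function
chunk_text("blog_post.txt")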
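
Printing is handy for inspection, but later steps usually need the chunks as a list. The following is a minimal sketch under that assumption. The helper name `split_fixed` and the stand-in `sample` string are hypothetical and not part of the original notebook; the helper works on an in-memory string so that it runs without `blog_post.txt`.

```python
def split_fixed(text, chunk_size=200):
    """Split text into consecutive chunks of at most chunk_size characters."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]


# stand-in text (an assumption, not the real blog post)
sample = "Data science is an interdisciplinary field. " * 10
chunks = split_fixed(sample, chunk_size=200)

# every chunk except possibly the last has exactly chunk_size characters
print([len(c) for c in chunks])

# joining the chunks gives back the original text, so nothing is lost
print("".join(chunks) == sample)
```

Because the split is purely positional, a chunk boundary can fall in the middle of a word or sentence.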

## Sliding Window Chunking

In [None]:
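def sliding_window(text, window_size, step_size):
    # invalid sizes, or the text is shorter than one window
    if window_size < 1 or step_size < 1 or window_size > len(text):
        return []
    # keep full windows only; leftover trailing characters are dropped
    return [
        text[i : i + window_size]
        for i in range(0, len(text) - window_size + 1, step_size)
    ]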

text = "This is the sample text"
window_size = 4
step_size = 2

chunks = sliding_window(text, window_size, step_size)

for idx, chunk in enumerate(chunks):
    print(f"Chunk {idx + 1}: {chunk}")
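
With the values above, the text has 23 characters, so the last full window starts at index 18 and the final character is never covered. If the tail matters, one option is to also keep the last, shorter window. The sketch below is one possible variant, not the notebook's method; the name `sliding_window_with_tail` is hypothetical.

```python
def sliding_window_with_tail(text, window_size, step_size):
    """Sliding windows over text, keeping a final shorter window if needed."""
    if window_size < 1 or step_size < 1:
        raise ValueError("window_size and step_size must be at least 1")
    chunks = []
    for start in range(0, len(text), step_size):
        chunks.append(text[start : start + window_size])
        # stop as soon as a window reaches the end of the text
        if start + window_size >= len(text):
            break
    return chunks


text = "This is the sample text"
chunks = sliding_window_with_tail(text, window_size=4, step_size=2)

for idx, chunk in enumerate(chunks):
    print(f"Chunk {idx + 1}: {chunk!r}")
```

Here the last chunk is shorter than `window_size`, and a text shorter than one window comes back as a single chunk instead of an empty list.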

## Text Tokenization with Tiktoken

In [None]:
import tiktoken

# choose encoding method
encoding = tiktoken.get_encoding("cl100k_base")

# convert text into tokens
print(encoding.encode("This is the sample text"))

# decode a list of token IDs back into text
print(encoding.decode([576, 374, 264, 6205, 1984]))
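
Tokenization and chunking can be combined so that chunk sizes are measured in tokens instead of characters. The sketch below is a hedged illustration of that idea, not something the notebook itself does. The names `chunk_by_tokens`, `toy_encode`, and `toy_decode` are hypothetical, and a toy whitespace tokenizer stands in for a real encoding so the example runs without extra packages.

```python
def chunk_by_tokens(text, encode, decode, max_tokens):
    """Split text into chunks of at most max_tokens tokens each."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    tokens = encode(text)
    return [
        decode(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]


# toy tokenizer (an assumption for illustration): one token per word
def toy_encode(text):
    return text.split(" ")


def toy_decode(tokens):
    return " ".join(tokens)


chunks = chunk_by_tokens(
    "This is the sample text", toy_encode, toy_decode, max_tokens=2
)
print(chunks)  # ['This is', 'the sample', 'text']
```

With Tiktoken installed, the same function should accept `encoding.encode` and `encoding.decode` in place of the toy helpers. Note that a boundary between subword tokens can fall inside a word or, for some text, inside a multi-byte character.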