# Chunker Implementation: Serialization and Deserialization

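This notebook demonstrates how to serialize four chunkers to JSON with `model_dump_json()` and restore them with `model_validate_json()`. Serialization lets you save a chunker's state and restore it later, so the same chunker configuration can be reused. Each example confirms the round trip by printing the `id` of the original and the restored chunker.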

In [1]:
# Importing necessary chunkers from the swarmauri library
from swarmauri.chunkers.concrete.DelimiterBasedChunker import DelimiterBasedChunker
from swarmauri.chunkers.concrete.FixedLengthChunker import FixedLengthChunker
from swarmauri.chunkers.concrete.SentenceChunker import SentenceChunker
from swarmauri.chunkers.concrete.SlidingWindowChunker import SlidingWindowChunker

## Delimiter-Based Chunker Serialization

We will first demonstrate how to serialize and deserialize a `DelimiterBasedChunker`.

In [2]:
# DelimiterBasedChunker serialization
delimiter_chunker = DelimiterBasedChunker()
serialized_data = delimiter_chunker.model_dump_json()
restored_chunker = DelimiterBasedChunker.model_validate_json(serialized_data)

print(f"Original DelimiterBasedChunker ID: {delimiter_chunker.id}")
print(f"Restored DelimiterBasedChunker ID: {restored_chunker.id}")

Original DelimiterBasedChunker ID: 93190809-364f-4fe5-bb35-d17ce1f31ab4
Restored DelimiterBasedChunker ID: 93190809-364f-4fe5-bb35-d17ce1f31ab4
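
The cell above keeps the serialized JSON in memory. Saving it to disk and restoring it later might look like the sketch below. It uses only the standard library, and `StandInChunker`, its `delimiters` field, `to_json`, `from_json`, and `save_and_restore` are hypothetical names invented for illustration; they are not part of swarmauri. With a real chunker, the same idea would presumably apply by writing the string returned by `model_dump_json()` to a file and passing the file's contents to `model_validate_json()`.

```python
import json
import tempfile
import uuid
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class StandInChunker:
    """Hypothetical stand-in for a chunker; NOT a swarmauri class."""

    # Assumed configuration field, chosen only for illustration.
    delimiters: list = field(default_factory=lambda: [".", "!", "?"])
    # Each instance gets a unique ID, like the IDs printed above.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_json(self) -> str:
        """Serialize every field to a JSON string."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str) -> "StandInChunker":
        """Rebuild an instance from a JSON string produced by to_json."""
        return cls(**json.loads(payload))


def save_and_restore(chunker: StandInChunker, directory: str) -> StandInChunker:
    """Write the chunker's JSON to a file, then read it back into a new instance."""
    path = Path(directory) / "chunker.json"
    path.write_text(chunker.to_json(), encoding="utf-8")
    return StandInChunker.from_json(path.read_text(encoding="utf-8"))


original = StandInChunker()
with tempfile.TemporaryDirectory() as tmp:
    restored = save_and_restore(original, tmp)

# The restored object is a different instance with identical state.
print(f"Same ID: {restored.id == original.id}")
print(f"Same state: {restored == original}")
```

Because the ID is stored in the JSON rather than regenerated on load, the restored instance keeps the original's identity.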


## Fixed-Length Chunker Serialization

Now, let's demonstrate serialization and deserialization for the `FixedLengthChunker`.

In [3]:
# FixedLengthChunker serialization
fixed_chunker = FixedLengthChunker()
serialized_data = fixed_chunker.model_dump_json()
restored_fixed_chunker = FixedLengthChunker.model_validate_json(serialized_data)

print(f"Original FixedLengthChunker ID: {fixed_chunker.id}")
print(f"Restored FixedLengthChunker ID: {restored_fixed_chunker.id}")

Original FixedLengthChunker ID: 72cda408-01e7-48c6-bb98-0396d09030a4
Restored FixedLengthChunker ID: 72cda408-01e7-48c6-bb98-0396d09030a4


## Sentence-Based Chunker Serialization

Next, we will serialize and deserialize a `SentenceChunker`.

In [4]:
# SentenceChunker serialization
sentence_chunker = SentenceChunker()
serialized_data = sentence_chunker.model_dump_json()
restored_sentence_chunker = SentenceChunker.model_validate_json(serialized_data)

print(f"Original SentenceChunker ID: {sentence_chunker.id}")
print(f"Restored SentenceChunker ID: {restored_sentence_chunker.id}")

Original SentenceChunker ID: a3f7dcb2-d2d9-4deb-834f-224e810fbadb
Restored SentenceChunker ID: a3f7dcb2-d2d9-4deb-834f-224e810fbadb


## Sliding Window Chunker Serialization

Lastly, we will serialize and deserialize a `SlidingWindowChunker` constructed with explicit settings (`overlap=True`, `step_size=21`).

In [5]:
# SlidingWindowChunker serialization
sliding_chunker = SlidingWindowChunker(overlap=True, step_size=21)
serialized_data = sliding_chunker.model_dump_json()
restored_sliding_chunker = SlidingWindowChunker.model_validate_json(serialized_data)

print(f"Original SlidingWindowChunker ID: {sliding_chunker.id}")
print(f"Restored SlidingWindowChunker ID: {restored_sliding_chunker.id}")

Original SlidingWindowChunker ID: 178a35a9-f29b-475a-adca-c74dfc625017
Restored SlidingWindowChunker ID: 178a35a9-f29b-475a-adca-c74dfc625017
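
Matching IDs show that the identity survived the round trip, but they do not show that settings such as `overlap` and `step_size` did. One way to check this is to compare the full state and the chunks produced by both instances. The sketch below does that with a standard-library stand-in: `StandInSlidingWindow`, its `window_size` field, and its `chunk_text` method are assumptions made for illustration and do not reproduce swarmauri's implementation.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class StandInSlidingWindow:
    """Hypothetical stand-in for a sliding window chunker; NOT a swarmauri class."""

    window_size: int = 4  # assumed field: words per chunk
    step_size: int = 2    # words to advance between chunks when overlapping
    overlap: bool = True  # whether consecutive chunks may share words
    id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def chunk_text(self, text: str) -> list:
        """Split text into word windows, advancing by step_size or a full window."""
        words = text.split()
        if not words:
            return []
        # Without overlap, advance by a full window so chunks never share words.
        step = self.step_size if self.overlap else self.window_size
        last_start = max(len(words) - self.window_size, 0)
        return [
            " ".join(words[i:i + self.window_size])
            for i in range(0, last_start + 1, step)
        ]


original = StandInSlidingWindow(overlap=True, step_size=2)

# Round trip through a JSON string, as in the cell above.
serialized = json.dumps(asdict(original))
restored = StandInSlidingWindow(**json.loads(serialized))

text = "one two three four five six seven eight"
original_chunks = original.chunk_text(text)
restored_chunks = restored.chunk_text(text)

print(f"Same settings: {asdict(restored) == asdict(original)}")
print(f"Same chunks: {restored_chunks == original_chunks}")
```

Comparing behavior as well as fields catches the case where a setting is silently reset to its default during deserialization.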


## Notebook Metadata

In [9]:
import os
import platform
import sys
from datetime import datetime

author_name = "Huzaifa Irshad"
github_username = "irshadhuzaifa"

print(f"Author: {author_name}")
print(f"GitHub Username: {github_username}")

notebook_file = "Notebook_03_Chunker_Implementation.ipynb"
try:
    last_modified_time = os.path.getmtime(notebook_file)
    last_modified_datetime = datetime.fromtimestamp(last_modified_time)
    print(f"Last Modified: {last_modified_datetime}")
except OSError as e:  # raised when the notebook file is missing or unreadable
    print(f"Could not retrieve last modified datetime: {e}")

print(f"Platform: {platform.system()} {platform.release()}")
print(f"Python Version: {sys.version}")

try:
    import swarmauri
    print(f"Swarmauri Version: {swarmauri.__version__}")
except ImportError:
    print("Swarmauri is not installed.")

Author: Huzaifa Irshad
GitHub Username: irshadhuzaifa
Last Modified: 2024-10-17 10:53:58.977674
Platform: Windows 11
Python Version: 3.12.7 | packaged by Anaconda, Inc. | (main, Oct  4 2024, 13:17:27) [MSC v.1929 64 bit (AMD64)]
Swarmauri Version: 0.5.0
