# Stream Chunks From File To CPU Memory

In this notebook we read a file from storage in chunks. As an example, we read the book `Harry Potter and the Sorcerer's Stone` chapter by chapter and summarize each chapter while the remaining chapters are still being read from storage.

## Preparation
We will download the book file:

In [None]:
import subprocess

url = "https://github.com/amephraim/nlp/raw/refs/heads/master/texts/J.%20K.%20Rowling%20-%20Harry%20Potter%201%20-%20Sorcerer's%20Stone.txt"
local_filename = 'book.txt'

wget_command = ['wget', '--content-disposition', url, '-O', local_filename]
subprocess.run(wget_command, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

Next, we determine the size of each chapter (chunk) and the file offset at which the first chapter starts:

In [None]:
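def find_chapter_sizes_and_first_index(file_path):
    # Read as bytes so that positions are byte offsets into the file.
    # Text mode would yield character offsets, which can differ from byte
    # offsets (multi-byte characters, newline translation).
    with open(file_path, 'rb') as file:
        content = file.read()

    word = b"CHAPTER"
    chapter_positions = []
    index = content.find(word)
    while index != -1:
        chapter_positions.append(index)
        index = content.find(word, index + 1)

    # Offset of the first chapter, or None if no chapter was found
    first_index = chapter_positions[0] if chapter_positions else None

    # The end of the file closes the last chapter
    chapter_positions.append(len(content))

    # Each chapter spans from its own start to the start of the next one
    chapter_sizes = [
        end - start
        for start, end in zip(chapter_positions, chapter_positions[1:])
    ]
    return chapter_sizes, first_index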

chapter_sizes, first_index = find_chapter_sizes_and_first_index('book.txt')
print(f"First chapter starts at: {first_index}\nChapter sizes: {chapter_sizes}")
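Before streaming, it can help to sanity-check the result: if the sizes are byte counts and the first index is a byte offset, the chunks laid end to end should finish exactly at the end of the file. The sketch below is a minimal illustration of that check; the helper name `chunks_cover_file` and the demo data are hypothetical and not part of the original notebook.

```python
import os
import tempfile

def chunks_cover_file(file_path, first_offset, chunk_sizes):
    """Return True if the chunks, laid end to end from first_offset, end at EOF."""
    return first_offset + sum(chunk_sizes) == os.path.getsize(file_path)

# Demo on a small throwaway file with two "chapters" after a short preface
demo_data = b"preface\nCHAPTER ONE\nabc\nCHAPTER TWO\ndefg\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(demo_data)
    demo_path = tmp.name

# Locate the chapter starts in the raw bytes
demo_first = demo_data.find(b"CHAPTER")
demo_second = demo_data.find(b"CHAPTER", demo_first + 1)
demo_sizes = [demo_second - demo_first, len(demo_data) - demo_second]

demo_ok = chunks_cover_file(demo_path, demo_first, demo_sizes)
demo_bad = chunks_cover_file(demo_path, demo_first + 1, demo_sizes)  # shifted offset
os.remove(demo_path)

print(demo_ok, demo_bad)
```

For the book itself, the same call would be `chunks_cover_file('book.txt', first_index, chapter_sizes)`. A `False` result would suggest that the offsets were computed in characters rather than bytes.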

## Streaming

To load the chapters (chunks) from the file, we create a `FileStreamer` instance, submit the read request, and iterate over the chapters as they arrive:

In [None]:
from runai_model_streamer import FileStreamer

file_path = "book.txt"

def summarize_chapter(buffer, index):
    # Placeholder for a heavy computation on the chapter text,
    # for example summarizing its content.
    # Here we only print the start of the chapter.
    print(buffer[index:index + 20])

with FileStreamer() as streamer:
    streamer.stream_file(file_path, first_index, chapter_sizes)
    for chapter_index, buffer, buffer_offset in streamer.get_chunks():
        summarize_chapter(buffer, buffer_offset)
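The cell above only prints the first few entries of each chapter. To work with a chapter's full text, the offset can be combined with that chapter's size. The sketch below assumes that the buffer is bytes-like (for example `bytes`, `bytearray`, `memoryview`, or a `uint8` NumPy array) and that the text is UTF-8; neither assumption is confirmed by the notebook, and the helper `chunk_text` is hypothetical.

```python
def chunk_text(buffer, offset, size, encoding="utf-8"):
    """Decode `size` bytes of `buffer`, starting at `offset`, into a string."""
    raw = bytes(buffer[offset:offset + size])
    # errors="replace" keeps the sketch from raising on unexpected bytes
    return raw.decode(encoding, errors="replace")

# Demo with an in-memory stand-in for the streamer's buffer
demo_buffer = bytearray(b"CHAPTER ONE\nabcCHAPTER TWO\ndefg")
demo_chunk_sizes = [15, 16]

first_text = chunk_text(demo_buffer, 0, demo_chunk_sizes[0])
second_text = chunk_text(demo_buffer, demo_chunk_sizes[0], demo_chunk_sizes[1])
print(repr(first_text))
print(repr(second_text))
```

Inside the streaming loop this would presumably be called as `chunk_text(buffer, buffer_offset, chapter_sizes[chapter_index])`, assuming `chapter_index` is the position of the chunk in the requested size list.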

A heavy workload can run on each chunk as soon as it is yielded, while the C++ threads continue reading the remaining chunks from storage.
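The overlap between reading and processing can be pictured as a producer and a consumer connected by a queue. The sketch below is a pure-Python stand-in for that idea and not the library's implementation: an in-memory `bytes` object plays the role of storage, a background thread plays the role of the reader threads, and all names (`read_chunks`, `process_chunks`) are hypothetical.

```python
import queue
import threading

def read_chunks(data, offset, sizes, out_queue):
    """Producer: hand over each chunk as soon as it has been 'read'."""
    for index, size in enumerate(sizes):
        out_queue.put((index, data[offset:offset + size]))
        offset += size
    out_queue.put(None)  # sentinel: no more chunks

def process_chunks(data, offset, sizes, work):
    """Consumer: apply `work` to each chunk while the reader keeps going."""
    chunk_queue = queue.Queue(maxsize=2)  # small bound: reader stays slightly ahead
    reader = threading.Thread(
        target=read_chunks, args=(data, offset, sizes, chunk_queue)
    )
    reader.start()

    results = []
    while True:
        item = chunk_queue.get()
        if item is None:
            break
        index, chunk = item
        results.append((index, work(chunk)))

    reader.join()
    return results

# Demo: skip a 2-byte prefix, then process chunks of 4, 3 and 2 bytes
demo_results = process_chunks(b"xxAAAABBBCC", 2, [4, 3, 2], len)
print(demo_results)
```

The bounded queue is a design choice for the sketch: it lets the reader run ahead of the consumer without loading every chunk into memory at once.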