# Stream multiple Safetensors files to GPU

This notebook demonstrates how to use the RunAI Model Streamer to read, in parallel, model tensors that are split across multiple files and copy them to GPU memory.

## Prerequisite
Run this notebook on a Linux machine with a GPU.

## Preparation
We will start by downloading a few example `.safetensors` files. Feel free to use your own.

In [None]:
import subprocess

# Map each local filename to the URL it is downloaded from.
downloads = {
    'model-1.safetensors': 'https://huggingface.co/vidore/colpali/resolve/main/adapter_model.safetensors?download=true',
    'model-2.safetensors': 'https://huggingface.co/boltuix/NeuroBERT-Mini/resolve/main/model.safetensors?download=true',
}

for local_filename, url in downloads.items():
    # -O sets the output filename; check=True raises CalledProcessError if wget fails.
    wget_command = ['wget', '--content-disposition', url, '-O', local_filename]
    subprocess.run(wget_command, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
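
Before streaming, it can be useful to confirm that each download is a valid Safetensors file. The sketch below is not part of the streamer; `read_safetensors_header` is a hypothetical helper that relies only on the published Safetensors layout (an 8-byte little-endian header length followed by a JSON header) and returns the tensor entries.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the tensor entries of a .safetensors header as a dict.

    Hypothetical helper for illustration; it reads only the header,
    never the tensor data itself.
    """
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # The next header_len bytes are a JSON object keyed by tensor name.
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header
```

For example, `len(read_safetensors_header('model-1.safetensors'))` gives the number of tensors stored in the first file, and each entry carries that tensor's `dtype`, `shape`, and `data_offsets`.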

## Streaming

To load the tensors from the files, we create a `SafetensorsStreamer` instance, issue the streaming request, and transfer each tensor to GPU memory.

In [None]:
from runai_model_streamer import SafetensorsStreamer

file_paths = ["model-1.safetensors", "model-2.safetensors"]

with SafetensorsStreamer() as streamer:
    streamer.stream_files(file_paths)
    for name, tensor in streamer.get_tensors():
        # PyTorch device strings are lowercase: 'cuda:0', not 'CUDA:0'.
        gpu_tensor = tensor.to('cuda:0')

Each yielded tensor is copied to the GPU while the streamer continues to read the next tensors in the background. Reading from storage and copying to the GPU therefore run in parallel.
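
The loop above overwrites `gpu_tensor` on every iteration, so in practice the tensors need to be kept somewhere. A minimal sketch, assuming only that the iterator yields `(name, tensor)` pairs and that each tensor has a `.to(device)` method; `collect_on_device` is a hypothetical helper, not part of the streamer API.

```python
from typing import Any, Dict, Iterable, Tuple


def collect_on_device(
    named_tensors: Iterable[Tuple[str, Any]], device: str = "cuda:0"
) -> Dict[str, Any]:
    """Move each yielded tensor to `device` and store it under its name.

    Hypothetical helper for illustration. If two files contain a tensor
    with the same name, the later one overwrites the earlier one.
    """
    state = {}
    for name, tensor in named_tensors:
        # .to(device) returns the copy on the target device.
        state[name] = tensor.to(device)
    return state
```

Called inside the `with` block as `state = collect_on_device(streamer.get_tensors())`, this should produce a dictionary of GPU tensors keyed by name, similar in shape to a PyTorch state dict.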