# Stream Model To GPU

## Prerequisite
Run this notebook on a GPU machine.

## Preparation
We will start by downloading an example `.safetensors` file. Feel free to use your own.

In [11]:
import subprocess

def download_file_with_wget(url, local_filename):
    # Use wget to fetch the file; -O writes the response body to local_filename
    wget_command = ['wget', '--content-disposition', url, '-O', local_filename]

    try:
        # check=True raises CalledProcessError when wget exits with a non-zero status
        subprocess.run(wget_command, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except subprocess.CalledProcessError as e:
        print(f"Error occurred: {e.stderr.decode()}")

url = 'https://huggingface.co/vidore/colpali/resolve/main/adapter_model.safetensors?download=true'
local_filename = 'model.safetensors'
download_file_with_wget(url, local_filename)
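
Because the download helper only prints an error when `wget` fails, it can be worth checking that the file on disk really is a safetensors file before streaming it. The following is a minimal sketch, not part of the streamer library: `read_safetensors_header` is a hypothetical helper name. It relies only on the published safetensors layout, where the first 8 bytes hold the header length as a little-endian unsigned 64-bit integer, followed by a JSON header that maps tensor names to their dtype, shape, and data offsets.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict.

    Hypothetical helper for illustration; it reads only the header,
    never the tensor data itself.
    """
    with open(path, "rb") as f:
        # First 8 bytes: header length, little-endian unsigned 64-bit integer
        (header_len,) = struct.unpack("<Q", f.read(8))
        # Next header_len bytes: UTF-8 encoded JSON
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor
    header.pop("__metadata__", None)
    return header
```

Calling `read_safetensors_header('model.safetensors')` and iterating over the returned dict lists each tensor name with its `dtype` and `shape`. If the download failed and left something else on disk, the JSON parsing step will most likely raise an error instead.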

## Streaming

To load the tensors from the file, create a `SafetensorsStreamer` instance, call `stream_file` to start the read request, and move each tensor to GPU memory as it is yielded.

In [12]:
from runai_model_streamer import SafetensorsStreamer

file_path = "model.safetensors"

with SafetensorsStreamer() as streamer:
    streamer.stream_file(file_path)
    for name, tensor in streamer.get_tensors():
        gpu_tensor = tensor.to('cuda:0')  # PyTorch device strings are lowercase

Read throughput is 1.60 KB per second 
Read throughput is 39.25 MB per second 
[RunAI Streamer] CPU Buffer size: 74.9 MiB for file: model.safetensors


Read throughput is 257.66 MB per second 
[RunAI Streamer] Overall time to stream 74.9 MiB of all files: 0.32s, 236.3 MiB/s


Once the tensors are in GPU memory, you can load them into an `nn.Module` or use them directly in any other way.
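
As an illustration of the `nn.Module` route, here is a minimal sketch that collects `(name, tensor)` pairs into a state dict and loads it with `load_state_dict`. The helper name `load_streamed_tensors`, the toy `nn.Linear` model, and the stand-in list of tensors are assumptions made for this example; they are not part of the streamer library. The sketch falls back to the CPU when no GPU is available so that it can run anywhere.

```python
import torch
from torch import nn


def load_streamed_tensors(module, named_tensors, device):
    """Move (name, tensor) pairs to `device` and load them into `module`.

    Hypothetical helper: `named_tensors` can be any iterable of pairs,
    for example the generator returned by streamer.get_tensors().
    """
    module.to(device)
    state_dict = {name: tensor.to(device) for name, tensor in named_tensors}
    # strict loading (the default) raises if names or shapes do not match
    return module.load_state_dict(state_dict)


# Stand-in for streamed tensors so the sketch runs without a model file
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = nn.Linear(4, 2)
fake_stream = [("weight", torch.ones(2, 4)), ("bias", torch.zeros(2))]
result = load_streamed_tensors(model, fake_stream, device)
```

With the streamer from the cell above, the same helper could be called as `load_streamed_tensors(model, streamer.get_tensors(), 'cuda:0')` inside the `with` block. This only works when the tensor names in the file match the keys of `model.state_dict()`; an adapter file such as the example download will generally need the matching adapter-wrapped model rather than an arbitrary module.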