# Setup notebook: preload model weights to save time later

Before we dive into our main presentation, we'll do a tiny bit of setup.

In [None]:
import datetime
from transformers import pipeline
import torch

print(datetime.datetime.now())

We'll create the same pipeline we'll use later. This triggers a full download of the weights and verifies that we can load them.

> Note the `cache_dir` kwarg, which stores the downloaded weights on the fast NVMe storage (`/mnt/local_storage/`) on Anyscale.

In [None]:
%%time

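p = pipeline(model="stabilityai/stablelm-tuned-alpha-7b", task="text-generation",
             model_kwargs={"device_map": "auto",                 # let Accelerate place layers on the available GPU(s)
                           "torch_dtype": torch.float16,         # half precision: 2 bytes per parameter
                           "cache_dir": "/mnt/local_storage/"})  # fast NVMe storage on Anyscale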

In [None]:
print(datetime.datetime.now())
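To confirm that the weights actually landed on the NVMe volume, a small helper can total the bytes under `cache_dir`. This is a sketch: `dir_size_gib` is a hypothetical name, not part of `transformers`. It assumes the usual Hugging Face hub cache layout, where each file is stored once under `blobs/` and linked from `snapshots/`, so symlinks are skipped to avoid counting the weights twice.

```python
from pathlib import Path


def dir_size_gib(root: str) -> float:
    """Total size of regular files under `root`, in GiB.

    Symlinks are skipped: the hub cache links snapshot entries to shared
    blobs, so following them would double-count the weights.
    """
    total_bytes = sum(
        f.stat().st_size
        for f in Path(root).rglob("*")
        if f.is_file() and not f.is_symlink()
    )
    return total_bytes / 2**30
```

In this notebook you would call `dir_size_gib("/mnt/local_storage/")`; the result should be on the order of the checkpoint's size on disk.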

Note that our model is now loaded into GPU memory:

In [None]:
! nvidia-smi
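As a sanity check on the `nvidia-smi` numbers, the weights alone need roughly (parameter count) × (bytes per parameter). The sketch below uses a hypothetical helper, `estimate_weight_memory_gib`, and assumes about 7 billion parameters as suggested by the "7b" in the model name; the exact count differs somewhat. It ignores activations, the KV cache, the CUDA context, and PyTorch's allocator cache, so `nvidia-smi` will typically report more.

```python
# Bytes per parameter for common dtypes (float16 and bfloat16 use 2 bytes, float32 uses 4).
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def estimate_weight_memory_gib(num_params: int, dtype: str = "float16") -> float:
    """Lower-bound estimate of the memory taken by model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# ~7e9 parameters at 2 bytes each is about 13 GiB of weights.
print(round(estimate_weight_memory_gib(7_000_000_000), 2))
```

For the real count, pass `sum(t.numel() for t in p.model.parameters())` while the pipeline is still loaded.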

In production, this memory is released automatically when the process exits.

In a notebook (or any other long-running development process), however, it can be useful to free unneeded objects explicitly.

In [None]:
del p  # drop our reference to the pipeline and the model it holds

> 🤗 Accelerate is a library that enables the same PyTorch code to be run across any distributed configuration by adding just four lines of code! In short, training and inference at scale made simple, efficient and adaptable.

Here, we use it directly to free GPU memory:

In [None]:
from accelerate import Accelerator

accelerator = Accelerator()
accelerator.free_memory()  # release Accelerate's references, run the garbage collector, empty the CUDA cache
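Since the pipeline was never passed to `accelerator.prepare()`, the useful part of `free_memory()` here is, as we understand it, its final cleanup: running Python's garbage collector and asking PyTorch to return cached but unused CUDA blocks to the driver. A rough plain-PyTorch sketch of that step follows; `release_gpu_memory` is a hypothetical helper, not an Accelerate API. `torch.cuda.empty_cache()` can only release memory that no live tensor still references, which is why `del p` has to come first.

```python
import gc


def release_gpu_memory() -> bool:
    """Collect garbage, then empty PyTorch's CUDA caching allocator.

    Returns True if a CUDA cache was emptied, False if torch or a GPU is unavailable.
    """
    gc.collect()  # break reference cycles that may still keep tensors alive
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    torch.cuda.empty_cache()  # hand cached, now-unreferenced blocks back to the driver
    return True
```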

In [None]:
! nvidia-smi