This notebook inspects the system's NVIDIA GPU and determines how much memory is available for models. The other notebooks assume the system has at least 10 GB of GPU memory (vRAM). If your system has more or less than this, you can either choose a differently sized model (e.g. use flan-t5-large instead of flan-t5-xl) or adjust model quantization (e.g. load 8-bit model weights).

You can estimate the vRAM requirements of specific models, at various levels of quantization, with the Hugging Face Accelerate model size estimator:
https://huggingface.co/docs/accelerate/main/en/usage_guides/model_size_estimator
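As a rough rule, the weights alone need (number of parameters) × (bytes per parameter). The sketch below applies that rule to two FLAN-T5 checkpoints. The parameter counts are approximate assumptions, `weight_gib` is a hypothetical helper, and the estimate ignores activations, the CUDA context and memory already used by other processes, so leave some headroom.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1, "int4": 0.5}

# Approximate parameter counts (assumed, rounded from the published FLAN-T5 sizes).
FLAN_T5_PARAMS = {"google/flan-t5-large": 780e6, "google/flan-t5-xl": 3e9}

BUDGET_GIB = 10  # the 10 GB card assumed by the other notebooks


def weight_gib(n_params: float, dtype: str) -> float:
    """Memory for the weights alone, in GiB (1 GiB = 1024**3 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1024**3


for model, n in FLAN_T5_PARAMS.items():
    for dtype in ("float32", "float16", "int8"):
        need = weight_gib(n, dtype)
        verdict = "fits" if need < BUDGET_GIB else "too large"
        print(f"{model:22} {dtype:8} {need:6.2f} GiB  {verdict}")
```

Under these assumptions the XL weights alone exceed 10 GiB at float32 but fit comfortably at float16 or int8, which is why changing precision is an alternative to changing model size.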

In [1]:
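import pynvml

# Initialize NVML (NVIDIA Management Library) before any other pynvml call
pynvml.nvmlInit()

print(f'driver version : {pynvml.nvmlSystemGetDriverVersion()}')

# Number of NVIDIA GPUs visible to the driver
devices = pynvml.nvmlDeviceGetCount()
print(f'device count : {devices}')

# Print the model name of each GPU
for i in range(devices):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    print(f'device {i} : {pynvml.nvmlDeviceGetName(handle)}')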

driver version : 535.154.05
device count : 1
device 0 : NVIDIA GeForce RTX 3080


In [2]:
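# Query memory for the first GPU (index 0); NVML reports values in bytes
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
info = pynvml.nvmlDeviceGetMemoryInfo(handle)

# Integer-divide by 1024**2 to convert bytes to megabytes (MiB)
print(f'total : {info.total // 1024 ** 2} MB')
print(f'free  : {info.free // 1024 ** 2} MB')
print(f'used  : {info.used // 1024 ** 2} MB')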

total : 10240 MB
free  : 9624 MB
used  : 615 MB
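The cell above hard-codes device index 0. On a multi-GPU system you may want every device's memory at once. A minimal sketch, assuming the same `pynvml` package; `list_gpus` is a hypothetical helper that returns an empty list instead of raising when pynvml or the NVIDIA driver is unavailable, and calls `nvmlShutdown()` when it is done.

```python
def list_gpus() -> list:
    """Return one dict per NVIDIA GPU with its name and memory in MiB."""
    try:
        import pynvml
    except ImportError:
        return []  # pynvml is not installed
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return []  # no driver or no NVML library on this machine
    try:
        gpus = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            name = pynvml.nvmlDeviceGetName(handle)
            if isinstance(name, bytes):  # older pynvml releases may return bytes
                name = name.decode()
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            gpus.append({
                "index": i,
                "name": name,
                "total_mib": mem.total // 1024**2,
                "free_mib": mem.free // 1024**2,
            })
        return gpus
    finally:
        pynvml.nvmlShutdown()  # balance the nvmlInit() above


for gpu in list_gpus():
    print(gpu)
```

Each entry holds `index`, `name`, `total_mib` and `free_mib`; on the single RTX 3080 system shown above the list would have one entry.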


In [3]:
import torch

# Return cached but unoccupied GPU memory held by PyTorch's caching allocator to the driver
torch.cuda.empty_cache()

In [4]:
import gc

# Run Python's garbage collector; it returns the number of unreachable objects found
gc.collect()

0
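`torch.cuda.empty_cache()` can only release cached blocks that no live tensor occupies, so the usual order is to run `gc.collect()` first (freeing tensors kept alive only by reference cycles) and then empty the cache; the cells above run them in the opposite order. A minimal sketch of that sequence; `release_memory` and the `Node` class are hypothetical, and the PyTorch step is skipped when PyTorch or a CUDA device is unavailable.

```python
import gc
import importlib.util


def release_memory() -> int:
    """Collect unreachable Python objects, then empty PyTorch's CUDA cache if possible."""
    # Step 1: break reference cycles so objects (e.g. tensors) held only by garbage are freed.
    collected = gc.collect()
    # Step 2: only now can the caching allocator hand those freed blocks back to the driver.
    if importlib.util.find_spec("torch") is not None:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    return collected


class Node:
    def __init__(self):
        self.partner = None


a, b = Node(), Node()
a.partner, b.partner = b, a  # reference cycle: reference counting alone cannot free these
del a, b
print(release_memory())
```

`gc.collect()` returns the number of unreachable objects it found. The notebook's cell printed 0 because nothing was unreachable at that point, whereas the cycle above will usually produce a positive count.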