### Specific Model Loading vs. AutoModel Loading

* This notebook compares loading a model through its specific class, {ModelName}ForCausalLM.from_pretrained(), with loading it through AutoModelForCausalLM.from_pretrained(), to determine whether either path causes problems, specifically with the Falcon model.
* We use Falcon-RW-1B (tiiuae/falcon-rw-1b) because this is being done on a computer with a small GPU.

#### Check CUDA Availability

* We first need to confirm that CUDA is available. We can start with the nvidia-smi shell tool, which reports whether the NVIDIA driver can see the GPU.

In [7]:
!nvidia-smi

Failed to initialize NVML: Unknown Error
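
The same driver check can be run from Python so that a failure is captured instead of only printed. The following is a minimal sketch, not part of the original notebook: the helper name `nvidia_smi_status` and the 10-second timeout are assumptions chosen for illustration, and the sketch assumes only that a non-zero exit code means the driver could not be queried.

```python
import shutil
import subprocess


def nvidia_smi_status(timeout_s: float = 10.0) -> tuple[bool, str]:
    """Run nvidia-smi and return (succeeded, captured output or reason)."""
    # Hypothetical helper: locate the tool first so a missing binary is reported cleanly.
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False, "nvidia-smi not found on PATH"
    try:
        proc = subprocess.run(
            [exe], capture_output=True, text=True, timeout=timeout_s
        )
    except (subprocess.TimeoutExpired, OSError) as exc:
        return False, f"nvidia-smi could not be run: {exc}"
    # Prefer stdout; fall back to stderr when stdout is empty.
    output = (proc.stdout or proc.stderr).strip()
    return proc.returncode == 0, output


ok, message = nvidia_smi_status()
print("driver reachable:", ok)
print(message)
```

If `ok` is False, the captured message is the place to look before changing anything on the PyTorch side.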


In [5]:
!pip install -U torch



In [6]:
import torch
print(torch.__version__)

if torch.cuda.is_available():
    print("GPU is available.")
else:
    print("GPU is not available. Check your CUDA installation.")

# Note: the calls below are not guarded by the availability check above,
# so they raise RuntimeError on a machine with no usable CUDA device.
current_device = torch.cuda.current_device()
print(f"Current GPU device: {current_device}")

device_properties = torch.cuda.get_device_properties(0)  # Replace 0 with the desired GPU index
print(f"GPU Name: {device_properties.name}")
print(f"GPU Memory Capacity: {device_properties.total_memory / 1e9} GB")

2.1.1+cu121
GPU is not available. Check your CUDA installation.


RuntimeError: No CUDA GPUs are available
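
The RuntimeError above comes from querying the device after the availability check has already failed. A guarded version might look like the sketch below. The function name `describe_cuda` is an assumption for illustration, and the `find_spec` guard is added only so the snippet also runs where PyTorch is not installed.

```python
import importlib.util


def describe_cuda() -> list[str]:
    """Return status lines for the CUDA setup without raising when no GPU exists."""
    # Hypothetical helper: skip everything if PyTorch itself is missing.
    if importlib.util.find_spec("torch") is None:
        return ["PyTorch is not installed."]

    import torch

    lines = [f"PyTorch version: {torch.__version__}"]
    if not torch.cuda.is_available():
        # Stop here: device queries would raise RuntimeError without a GPU.
        lines.append("GPU is not available. Check your CUDA installation.")
        return lines

    index = torch.cuda.current_device()
    props = torch.cuda.get_device_properties(index)
    lines.append(f"Current GPU device: {index}")
    lines.append(f"GPU Name: {props.name}")
    lines.append(f"GPU Memory Capacity: {props.total_memory / 1e9:.2f} GB")
    return lines


for line in describe_cuda():
    print(line)
```

Returning early keeps the device queries reachable only when `torch.cuda.is_available()` is true, so the cell finishes cleanly on a CPU-only machine.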

#### AutoModel Loading

* First, we load the model with the Auto classes, which has worked for us previously. This gives a baseline before trying the model-specific class.

In [1]:
# imports - Auto classes for the model and tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer



In [None]:
# Hugging Face Hub identifier of the model to load
model_identifier = "tiiuae/falcon-rw-1b"
# Tokenizers are generally lightweight and are loaded into RAM. They convert text into a format that the model can understand (token IDs).
tokenizer = AutoTokenizer.from_pretrained(model_identifier)
# from_pretrained() places the weights on the CPU by default, even when a GPU is available.
# To use the GPU, move the model explicitly afterwards, e.g. model.to("cuda").
model = AutoModelForCausalLM.from_pretrained(model_identifier, trust_remote_code=True)
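
Before comparing the two loading paths, it can help to see which concrete class the Auto API resolves to. The sketch below is an illustration rather than part of the original run. It builds a tiny, randomly initialised Falcon model from a local config, so nothing is downloaded. The config sizes are arbitrary assumptions, the helper name `resolved_class_name` is hypothetical, and the sketch assumes a transformers release that ships built-in Falcon support. A checkpoint loaded from the Hub with trust_remote_code=True may resolve differently, so checking `type(model)` on the real model is still worthwhile.

```python
import importlib.util


def resolved_class_name():
    """Return the class name AutoModelForCausalLM picks for a Falcon config.

    Returns None when torch/transformers (or built-in Falcon support) is missing.
    """
    if any(importlib.util.find_spec(m) is None for m in ("torch", "transformers")):
        return None
    try:
        from transformers import AutoModelForCausalLM, FalconConfig
    except ImportError:
        # Older transformers releases do not include the Falcon classes.
        return None

    # Tiny, arbitrary sizes: this is only for inspecting the resolved class.
    tiny_config = FalconConfig(
        vocab_size=128,
        hidden_size=32,
        num_hidden_layers=1,
        num_attention_heads=2,
    )
    model = AutoModelForCausalLM.from_config(tiny_config)
    return type(model).__name__


print(resolved_class_name())
```

The same `type(model).__name__` check can be applied to the model loaded from tiiuae/falcon-rw-1b to confirm which implementation is actually in use.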