## Install required modules
Use an existing package manager (Conda, uv, or pip) to install the required modules.
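Before running the imports, it can help to confirm that the packages are actually installed in the active environment. The following is a minimal sketch using only the standard library; the `REQUIRED` mapping and the `missing_packages` helper are illustrative names, not part of the notebook, and the package list is an assumption based on the imports used below.

```python
import importlib.util

# Map of install name -> import name (assumed from the imports in this notebook).
REQUIRED = {
    "torch": "torch",
    "transformers": "transformers",
    "python-dotenv": "dotenv",
}


def missing_packages(required=REQUIRED):
    """Return the install names whose top-level module cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Any package reported as missing can then be installed with whichever package manager the environment already uses.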

In [None]:
import os
from dotenv import load_dotenv
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch


### Check the Torch version and whether Torch is GPU-enabled
CUDA is NVIDIA's GPU computing platform; PyTorch provides Python abstractions that can run on top of the CUDA libraries.

In [None]:
print(f"Torch Version: {torch.__version__}")
print(f"GPU enabled with PyTorch: {torch.cuda.is_available()}")

Torch Version: 2.6.0+cpu
GPU enabled with PyTorch: False
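The result of this check is typically used to choose a device for later steps. A minimal sketch, assuming a single-GPU or CPU-only machine; `pick_device` is a hypothetical helper, and the guarded import is only there so the snippet runs on its own even where Torch is not installed.

```python
try:
    import torch
except ImportError:  # keeps the sketch runnable without Torch installed
    torch = None


def pick_device() -> str:
    """Return "cuda" when a CUDA-enabled Torch build sees a GPU, else "cpu"."""
    if torch is not None and torch.cuda.is_available():
        return "cuda"
    return "cpu"


device = pick_device()
print(f"Using device: {device}")
```

With the CPU-only build shown in the output above (`2.6.0+cpu`), this would select `"cpu"`.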


### Hugging Face API
1. Create a Hugging Face account if you do not already have one.
2. Create an API token.
3. Configure the token in a `.env` file under the name `HUGGING_FACE_TOKEN`.

In [14]:
load_dotenv()
token = os.getenv("HUGGING_FACE_TOKEN")
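`os.getenv` returns `None` when the variable is not set, which only surfaces later as an authentication error. A small guard can make the failure explicit. This is a sketch under the assumption that the token is stored under `HUGGING_FACE_TOKEN`, as above; `require_token` is a hypothetical helper, not part of `python-dotenv` or `transformers`.

```python
import os


def require_token(var_name: str = "HUGGING_FACE_TOKEN", env=None) -> str:
    """Return the token from the environment, or raise if it is missing or empty."""
    env = os.environ if env is None else env
    value = (env.get(var_name) or "").strip()
    if not value:
        raise RuntimeError(
            f"{var_name} is not set; add it to the .env file before loading models."
        )
    return value
```

Calling `token = require_token()` after `load_dotenv()` would then fail early with a clear message instead of passing `None` to the model loader.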

Function: Load Model
1. Takes a model name.
2. Loads the model from the HF model hub into memory.

Note: 
1. `from_pretrained` places the model on the CPU unless a device is requested explicitly (for example with `model.to("cuda")` or a `device_map` argument).
2. By default, PyTorch stores weights as FP32 (`torch.float32`).
3. On GPUs, loading a model may fail if it exceeds the available GPU memory.


In [None]:
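def load_model(model_name="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"):
    """Load the tokenizer and causal LM for `model_name` from the Hugging Face hub."""
    # Pass the token to both calls so gated or private repositories also work.
    tokenizer = AutoTokenizer.from_pretrained(model_name, token=token)
    model = AutoModelForCausalLM.from_pretrained(model_name, token=token)
    return model, tokenizer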

Load Model in Memory

In [4]:
model, tokenizer = load_model()
print("Model loaded")

Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.


Model loaded


Let's understand the details of the model.
1. Number of parameters or weights
2. Datatype of weights.
3. CPU / GPU based compute
4. Model Layers

In [15]:
print(f"Number of model parameters: {model.num_parameters()}")

Number of model parameters: 1777088000
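The parameter count gives a rough estimate of the memory needed just to hold the weights: parameters multiplied by bytes per parameter. The sketch below applies that arithmetic to the count printed above. The `BYTES_PER_PARAM` table and `estimate_weight_memory_gib` are illustrative names, and the estimate deliberately ignores activations, the KV cache, and framework overhead.

```python
# Storage cost per parameter for common weight formats (weights only).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def estimate_weight_memory_gib(num_params: int, dtype: str = "fp32") -> float:
    """Approximate memory for the weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


num_params = 1_777_088_000  # value reported by model.num_parameters() above
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {estimate_weight_memory_gib(num_params, dtype):.2f} GiB")
```

At FP32 this works out to 7,108,352,000 bytes, or roughly 6.6 GiB, which is why the choice of datatype matters when a model has to fit into limited GPU memory.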


In [16]:
for name, param in model.named_parameters():
    print(name, param.dtype, param.device)

model.embed_tokens.weight torch.float32 cpu
model.layers.0.self_attn.q_proj.weight torch.float32 cpu
model.layers.0.self_attn.q_proj.bias torch.float32 cpu
model.layers.0.self_attn.k_proj.weight torch.float32 cpu
model.layers.0.self_attn.k_proj.bias torch.float32 cpu
model.layers.0.self_attn.v_proj.weight torch.float32 cpu
model.layers.0.self_attn.v_proj.bias torch.float32 cpu
model.layers.0.self_attn.o_proj.weight torch.float32 cpu
model.layers.0.mlp.gate_proj.weight torch.float32 cpu
model.layers.0.mlp.up_proj.weight torch.float32 cpu
model.layers.0.mlp.down_proj.weight torch.float32 cpu
model.layers.0.input_layernorm.weight torch.float32 cpu
model.layers.0.post_attention_layernorm.weight torch.float32 cpu
model.layers.1.self_attn.q_proj.weight torch.float32 cpu
model.layers.1.self_attn.q_proj.bias torch.float32 cpu
model.layers.1.self_attn.k_proj.weight torch.float32 cpu
model.layers.1.self_attn.k_proj.bias torch.float32 cpu
model.layers.1.self_attn.v_proj.weight torch.float32 cpu
m