## Remember: even this is a big model for CPU-based machines.
### Install required modules
Use your package manager (Conda, uv, or pip) to install the required modules: `torch`, `transformers`, and `python-dotenv`.
I ran this model on a CPU-based server with 64 GB of RAM; during inference the CPU stayed at 100% utilization for more than 5 minutes.

In [1]:
import warnings
warnings.filterwarnings("ignore")
import os
from dotenv import load_dotenv
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch
import time

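### Check the Torch version and whether Torch can use a GPU
CUDA is NVIDIA's computing platform for its GPUs, and PyTorch's GPU support is built on top of it. `torch.cuda.is_available()` returns `True` only when PyTorch was built with CUDA support and a usable NVIDIA GPU is present; the `+cpu` suffix in the version below indicates a CPU-only build.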

In [2]:
print(f"Torch Version: {torch.__version__}")
print(f"GPU enabled with Pytorch:  {torch.cuda.is_available()}")

Torch Version: 2.6.0+cpu
GPU enabled with Pytorch:  False


### Hugging Face API
1. Create a Hugging Face account if you don't already have one.
2. Create an access token (API token).
3. Store the token in a `.env` file as `HUGGING_FACE_TOKEN`.

In [3]:
load_dotenv()
token = os.getenv("HUGGING_FACE_TOKEN")
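If the variable is missing or misspelled in `.env`, `os.getenv` silently returns `None` and the problem only shows up later as an authentication error from the Hub. A minimal sketch of failing early instead, assuming the `HUGGING_FACE_TOKEN` name used above; `require_env` is a hypothetical helper, not part of `python-dotenv`:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or fail with a clear message.

    Hypothetical helper: call it after load_dotenv() so values from .env are visible.
    """
    value = os.getenv(name)
    if not value:  # catches both "not set" and "set to an empty string"
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Usage (after load_dotenv()):
# token = require_env("HUGGING_FACE_TOKEN")
```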

Function: Load Model
1. Takes a model name (a Hugging Face Hub repository ID).
2. Downloads the model and its tokenizer from the Hub (or reuses the local cache) and loads them into memory.

Note: 
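1. By default, `from_pretrained` loads the model onto the CPU, even when a GPU is available; pass `device_map="auto"` (requires the `accelerate` package) or call `model.to("cuda")` to use the GPU.
2. By default, the weights are loaded as FP32 (`torch.float32`), as the parameter listing below confirms; the `torch_dtype` argument of `from_pretrained` selects a different type.
3. Loading onto a GPU fails with an out-of-memory error when the weights exceed the available GPU memory.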


In [4]:
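def load_model(model_name="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"):
    # Download (or read from the local cache) the tokenizer and the model weights.
    # `token` is the Hugging Face token read from .env in the cell above.
    tokenizer = AutoTokenizer.from_pretrained(model_name, token=token)
    model = AutoModelForCausalLM.from_pretrained(model_name, token=token)
    return model, tokenizer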

Load the model into memory.

In [5]:
model, tokenizer = load_model("unsloth/DeepSeek-R1-Distill-Llama-8B")
print("Model loaded")

Fetching 4 files: 100%|██████████| 4/4 [03:03<00:00, 45.81s/it] 
Loading checkpoint shards: 100%|██████████| 4/4 [00:07<00:00,  1.90s/it]


Model loaded


Let's look at the details of the model:
1. Number of parameters (weights)
2. Data type of the weights
3. Device the weights live on (CPU or GPU)
4. Model layers

In [10]:
# FP32 stores each parameter in 4 bytes; dividing by 1024**3 converts bytes to GiB.
print(f"Number of model parameters: {model.num_parameters()}")
print(f"Approximate model size: {model.num_parameters()*4/1024/1024/1024} GB")

Number of model parameters: 8030261248
Approximate model size: 29.915054321289062 GB


In [11]:
for name, param in model.named_parameters():
    print(name, param.dtype, param.device)

model.embed_tokens.weight torch.float32 cpu
model.layers.0.self_attn.q_proj.weight torch.float32 cpu
model.layers.0.self_attn.k_proj.weight torch.float32 cpu
model.layers.0.self_attn.v_proj.weight torch.float32 cpu
model.layers.0.self_attn.o_proj.weight torch.float32 cpu
model.layers.0.mlp.gate_proj.weight torch.float32 cpu
model.layers.0.mlp.up_proj.weight torch.float32 cpu
model.layers.0.mlp.down_proj.weight torch.float32 cpu
model.layers.0.input_layernorm.weight torch.float32 cpu
model.layers.0.post_attention_layernorm.weight torch.float32 cpu
model.layers.1.self_attn.q_proj.weight torch.float32 cpu
model.layers.1.self_attn.k_proj.weight torch.float32 cpu
model.layers.1.self_attn.v_proj.weight torch.float32 cpu
model.layers.1.self_attn.o_proj.weight torch.float32 cpu
model.layers.1.mlp.gate_proj.weight torch.float32 cpu
model.layers.1.mlp.up_proj.weight torch.float32 cpu
model.layers.1.mlp.down_proj.weight torch.float32 cpu
model.layers.1.input_layernorm.weight torch.float32 cpu
...
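The listing repeats the same set of weights for every decoder layer (`model.layers.<i>.…`). A small sketch for turning parameter names into a layer count, assuming the `model.layers.<index>.` naming shown in the output above; `count_decoder_layers` is a hypothetical helper. On the real model you would pass it `(name for name, _ in model.named_parameters())`, and `model.config.num_hidden_layers` should report the same number.

```python
import re
from typing import Iterable

# Matches names such as "model.layers.12.mlp.up_proj.weight" and captures the index.
LAYER_PATTERN = re.compile(r"^model\.layers\.(\d+)\.")


def count_decoder_layers(param_names: Iterable[str]) -> int:
    """Count distinct decoder-layer indices found in parameter names."""
    indices = set()
    for name in param_names:
        match = LAYER_PATTERN.match(name)
        if match:
            indices.add(int(match.group(1)))
    return len(indices)


names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.mlp.down_proj.weight",
    "model.layers.1.self_attn.q_proj.weight",
]
print(count_decoder_layers(names))  # 2
```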

In [8]:
def generate_model_response(
        prompt: str,
        tokenizer: AutoTokenizer,
        model: AutoModelForCausalLM,
        max_length: int = 3500,
        temperature: float = 0.1,
        top_k: int = 50) -> str:
    # Tokenize the prompt; the result holds "input_ids" and "attention_mask" tensors.
    encoded = tokenizer(prompt, return_tensors="pt", padding=True)
    # Put the inputs on the same device as the model's weights.
    inputs = {k: v.to(model.device) for k, v in encoded.items()}
    print(inputs)  # debug: token ids and attention mask
    print(inputs["attention_mask"][0])
    start_time = time.time()
    with torch.no_grad():  # inference only, no gradients needed
        # **inputs passes both input_ids and attention_mask to generate().
        output = model.generate(
            **inputs,
            max_length=max_length,  # limit on prompt + generated tokens
            do_sample=True,         # sample instead of greedy decoding
            temperature=temperature,
            top_k=top_k,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )
    final_output = tokenizer.decode(output[0], skip_special_tokens=True)
    print(final_output)
    end_time = time.time()
    print(f"Time taken: {end_time - start_time:.1f} s")
    return final_output
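With `do_sample=True`, `generate` samples each next token instead of always taking the most likely one: `temperature` rescales the logits and `top_k` keeps only the k highest-scoring tokens. A pure-Python sketch of that step for a single token, under the simplifying assumption of a short list of raw logits (the library works on tensors over the whole vocabulary); `sample_top_k` is a hypothetical name, not a `transformers` function:

```python
import math
import random
from typing import List, Optional


def sample_top_k(logits: List[float], temperature: float = 0.1, top_k: int = 50,
                 rng: Optional[random.Random] = None) -> int:
    """Sample a token index using temperature scaling followed by top-k filtering.

    Assumes temperature > 0.
    """
    rng = rng or random.Random()
    # Temperature: dividing by a value < 1 sharpens the distribution.
    scaled = [x / temperature for x in logits]
    # Top-k: keep only the k highest-scoring token indices.
    k = min(top_k, len(scaled))
    top = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # Softmax numerators over the kept tokens (subtract the max for stability).
    m = max(scaled[i] for i in top)
    weights = [math.exp(scaled[i] - m) for i in top]
    # random.choices normalizes the weights, so they need not sum to 1.
    return rng.choices(top, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]
print(sample_top_k(logits, temperature=0.1, top_k=2, rng=random.Random(0)))
```

Because a positive temperature only rescales the logits, it never changes which tokens survive the top-k cut; with `temperature=0.1` as in the notebook, the sampling is close to greedy.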


In [21]:
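# .half() converts the weights to FP16 in place and returns the same module,
# so `model` and `model_fp16` now refer to the same (FP16) object.
model_fp16 = model.half()
tokenizer_fp16 = tokenizer  # the tokenizer does not depend on the weight dtype
print("Model loaded in FP16")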

Model loaded in FP16
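`model.half()` stores every weight in IEEE 754 half precision (FP16), which has a 10-bit mantissa and a largest finite value of 65504. The standard-library `struct` module supports the same format (format code `'e'`), so the rounding can be seen without PyTorch; `to_fp16` is a hypothetical helper for illustration:

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # Packing with 'e' rounds to FP16; unpacking converts back to a Python float.
    return struct.unpack("<e", struct.pack("<e", x))[0]


print(to_fp16(0.1))      # 0.0999755859375
print(to_fp16(65504.0))  # 65504.0, the largest finite FP16 value
try:
    to_fp16(70000.0)
except OverflowError as err:
    print("overflow:", err)
```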


In [16]:
print(f"Number of model parameters: {model_fp16.num_parameters()}")
print(f"Approximate model size: {model_fp16.num_parameters()*2/1024/1024/1024} GB")

Number of model parameters: 1777088000
Approximate model size: 3.310084342956543 GB
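The parameter count printed here (1,777,088,000) does not match the 8,030,261,248 reported for the 8B model in In [10]. The execution counts show this cell (In [16]) ran before the FP16 conversion (In [21]), possibly while a different model such as the 1.5B default was loaded, so this output is likely stale. A back-of-the-envelope sketch of the weight memory for the 8B model, assuming the weights dominate memory (activations, KV cache, quantization metadata, and framework overhead are ignored) and using GiB as the notebook's `/1024/1024/1024` does; `weight_memory_gib` and `BYTES_PER_PARAM` are hypothetical names:

```python
# Bytes per parameter for common weight formats (4-bit packs two weights per byte).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "4bit": 0.5}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Approximate size of the weights alone, in GiB (1024**3 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


num_params = 8_030_261_248  # reported by model.num_parameters() in In [10]
for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>5}: {weight_memory_gib(num_params, dtype):6.2f} GiB")
```

The FP32 row reproduces the roughly 29.9 GB printed in In [10], and FP16 halves it to about 15 GiB, which is what `model_fp16` should actually occupy.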


In [9]:
generate_model_response("What is the meaning of life?", tokenizer=tokenizer, model=model)

{'input_ids': tensor([[128000,   3923,    374,    279,   7438,    315,   2324,     30]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1]])}
tensor([1, 1, 1, 1, 1, 1, 1, 1])
What is the meaning of life? This is a question that has been asked by people throughout history. There are many theories and beliefs about what life's purpose is. Some people believe it's about happiness, others about success, while some look to religion or spirituality for answers. But what does the Bible say about the meaning of life?

First, I need to recall what the Bible teaches about life. The Bible is filled with stories and teachings that provide different perspectives. One common theme is that life is a gift from God. In Genesis, it says that God created man in His own image, implying that life is something of great value and purpose.

Another perspective is that life is about serving God. In the New Testament, Jesus says, "I have come to serve, not to be served." This suggests that the purpose of life
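`generate_model_response` tokenizes with `padding=True` and passes the resulting `attention_mask` to `generate`. For a single prompt the mask is all ones, as the printed `tensor([1, 1, 1, 1, 1, 1, 1, 1])` shows; it only matters once several prompts of different lengths are batched together. A pure-Python sketch of what padding produces, assuming a placeholder `pad_id` of 0 (the real tokenizer's pad id differs); `pad_batch` is a hypothetical helper. It defaults to left padding because `transformers` recommends it for batched generation with decoder-only models:

```python
from typing import List, Tuple


def pad_batch(sequences: List[List[int]], pad_id: int,
              side: str = "left") -> Tuple[List[List[int]], List[List[int]]]:
    """Pad token-id lists to equal length and build the matching attention masks.

    Mask value 1 marks a real token, 0 marks padding the model should ignore.
    """
    width = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        pad = width - len(seq)
        if side == "left":
            input_ids.append([pad_id] * pad + seq)
            attention_mask.append([0] * pad + [1] * len(seq))
        else:
            input_ids.append(seq + [pad_id] * pad)
            attention_mask.append([1] * len(seq) + [0] * pad)
    return input_ids, attention_mask


ids, mask = pad_batch([[128000, 3923, 374], [128000, 3923]], pad_id=0)
print(ids)   # [[128000, 3923, 374], [0, 128000, 3923]]
print(mask)  # [[1, 1, 1], [0, 1, 1]]
```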