![Hugging Face](https://huggingface.co/front/assets/huggingface_logo-noborder.svg)

# 🏗️ Working with [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B)

The **DeepSeek-R1-Distill-Qwen-7B** model is a distilled version of DeepSeek's R1 reasoning model: a Qwen2.5-Math-7B base fine-tuned on reasoning samples generated by DeepSeek-R1, targeting **advanced reasoning and chain-of-thought tasks**. It supports a **128k token context length**, which makes it well-suited to long, multi-step text generation.

## 🚀 **Key Features**
- **🔍 Distilled Model:** Retains strong reasoning abilities from DeepSeek-R1 in a more compact form.
- **📜 Extended Context Length:** Handles up to **128,000 tokens**, allowing for **better long-context understanding**.
- **🧠 Optimized for Reasoning:** Fine-tuned for **structured thought processes** and **logical inference**.

## 🛠️ **How to Use**
### **📦 Install Dependencies**
```bash
pip install torch transformers accelerate
```

In [3]:
import os

# Set environment variables before importing torch so they are in place
# before CUDA is initialized
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
os.environ["TOKENIZERS_PARALLELISM"] = "false"

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Specify the model name

# https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
cache_dir = "/kaggle/temp"  # Use Kaggle's temp storage to avoid disk space issues

# ✅ Load the tokenizer with caching
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    cache_dir=cache_dir,  # Store model files in Kaggle's temporary directory
    trust_remote_code=True
)

# ✅ Load the model with caching and multi-GPU execution
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    cache_dir=cache_dir,  # Cache model in temp storage
    torch_dtype=torch.float16,  # ✅ Use torch.float16 instead of string
    device_map="auto",  # Automatically distribute across available GPUs
    trust_remote_code=True
)

# Define multiple prompt variations to observe impact
prompts = [
    "Solve the equation: 3x + 5 = 20. What is x?",
    "What is x if 3x + 5 = 20?",
    "Find x in the equation: 3x + 5 = 20.",
    "Compute x given 3x + 5 = 20.",
    "If 3x + 5 equals 20, what does x equal?",
    "Determine the numerical value of x in 3x + 5 = 20.",
    "Can you solve for x: 3x + 5 = 20?",
    "What is the solution to 3x + 5 = 20?"
]

# Tokenize the prompts
inputs = tokenizer(prompts, padding=True, return_tensors="pt").to("cuda")

print("🔢 **Tokenized Input (IDs):**")
print(inputs["input_ids"])  # Shows numerical tokenized representation

# Perform inference
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,  # Limit the number of generated tokens
        do_sample=True,  # Enable sampling so temperature and top_p are applied
        temperature=0.7,  # Sampling temperature
        top_p=0.9,  # Nucleus sampling
        repetition_penalty=1.1  # Penalize repetition
    )

# Display raw model output (before converting to text)
print("\n📊 **Raw Model Output (IDs):**")
print(outputs)  # Shows the generated token IDs before decoding

# Decode and print responses
decoded_outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)

for i, response in enumerate(decoded_outputs):
    print(f"Response for prompt {i+1}:\n{response}\n")
    print("***********************************************")



Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


🔢 **Tokenized Input (IDs):**
tensor([[151646,     50,   3948,    279,  23606,     25,    220,     18,     87,
            488,    220,     20,    284,    220,     17,     15,     13,   3555,
            374,    856,     30],
        [151643, 151643, 151643, 151643, 151643, 151646,   3838,    374,    856,
            421,    220,     18,     87,    488,    220,     20,    284,    220,
             17,     15,     30],
        [151643, 151643, 151643, 151646,   9885,    856,    304,    279,  23606,
             25,    220,     18,     87,    488,    220,     20,    284,    220,
             17,     15,     13],
        [151643, 151643, 151643, 151643, 151643, 151643, 151646,  46254,    856,
           2661,    220,     18,     87,    488,    220,     20,    284,    220,
             17,     15,     13],
        [151643, 151643, 151643, 151646,   2679,    220,     18,     87,    488,
            220,     20,  16819,    220,     17,     15,     11,   1128,   1558,
            856,   6144, 
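
In the tensor above, shorter prompts start with runs of `151643`, the ID that the log reports as the pad/EOS token, so every row has the same length. A minimal sketch of that left-padding step follows; `left_pad` is a hypothetical helper written for illustration (the tokenizer does this internally), and the two short rows reuse only the first few IDs from the output above.

```python
PAD_ID = 151643  # pad/EOS ID reported in the log above

def left_pad(sequences, pad_id):
    """Pad every sequence on the left to the length of the longest one."""
    width = max(len(seq) for seq in sequences)
    input_ids = [[pad_id] * (width - len(seq)) + list(seq) for seq in sequences]
    # 0 marks padding positions the model should ignore, 1 marks real tokens
    attention_mask = [[0] * (width - len(seq)) + [1] * len(seq) for seq in sequences]
    return input_ids, attention_mask

# Shortened stand-ins for two tokenized prompts
batch = [
    [151646, 50, 3948, 279, 23606],
    [151646, 3838, 374],
]
input_ids, attention_mask = left_pad(batch, PAD_ID)
for ids, mask in zip(input_ids, attention_mask):
    print(ids, mask)
```

Padding on the left keeps the last real token of every prompt in the final position, which is where a decoder-only model continues generating.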

In [4]:
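# Free GPU memory before loading the next model
del outputs, inputs, tokenizer, model
import gc, torch
gc.collect()  # Drop the Python references to the deleted objects
torch.cuda.empty_cache()  # Return cached GPU memory to the driver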

# 🏆 **Mistral-7B-Instruct-v0.3**  
🚀 **[View on Hugging Face](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)**  

## 📌 **Overview**  
**Mistral-7B-Instruct-v0.3** is a **fine-tuned version** of Mistral-7B, optimized for **instruction-following tasks** and **chat-based interactions**. It builds upon Mistral-7B’s **efficient architecture**, delivering **high-quality reasoning and conversational abilities** while keeping the model **small (7B parameters)**, so it runs faster and needs less memory than larger models.

## ⚡ **Key Features**
- **🧠 Strong Instruction-Following:** Optimized for handling a wide range of user queries, including **reasoning, coding, and knowledge-based tasks**.
- **🔍 Enhanced Context Understanding:** Works well with **long-form text generation** while maintaining coherence.
- **📏 Compact Yet Powerful (7B Parameters):** **Balances speed and performance**, making it a great alternative to larger models.
- **🖥️ Efficient Execution:** Supports **multi-GPU execution** with **float16 precision for reduced memory usage**.
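
To put the float16 memory saving in numbers, here is a back-of-the-envelope sketch. It assumes a nominal 7 billion parameters (the exact count differs slightly) and counts weights only; activations and the KV cache need additional memory on top.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Approximate memory needed just to hold the model weights, in GiB."""
    return num_params * bytes_per_param / 1024**3

NOMINAL_PARAMS = 7_000_000_000  # assumption: "7B" taken literally

for dtype_name, size_in_bytes in {"float32": 4, "float16": 2}.items():
    gib = weight_memory_gib(NOMINAL_PARAMS, size_in_bytes)
    print(f"{dtype_name}: {gib:.1f} GiB")
# float32: 26.1 GiB
# float16: 13.0 GiB
```

Halving the bytes per parameter halves the weight footprint, which is why the float16 weights can be split across two GPUs with `device_map="auto"`.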
  
## 🛠 **How to Use**
### **1️⃣ Install Dependencies**
```bash
pip install torch transformers accelerate
```

## 🔑 Step 1: Log in to Hugging Face from Kaggle
Kaggle notebooks have no interactive login prompt like a local terminal, so you authenticate with a **Hugging Face access token** instead.

1. Go to **[Hugging Face Tokens](https://huggingface.co/settings/tokens)**
2. **Create a new access token** with **"read" permissions**
3. **Copy the token**


In [8]:
from huggingface_hub import login

# Replace the placeholder string below with your own read token
login("hf_X.............................")
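
Pasting the token into a cell means it ends up in every saved copy of the notebook. One possible alternative, sketched below, reads it from an environment variable. The helper name `read_hf_token` is hypothetical, and the sketch assumes you have stored the token under `HF_TOKEN` yourself (for example through a notebook secret that you export to the environment).

```python
import os

def read_hf_token(env_var="HF_TOKEN"):
    """Return the access token stored in an environment variable, or fail loudly."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your read token first.")
    return token

# Usage (requires huggingface_hub and a token in the environment):
# from huggingface_hub import login
# login(token=read_hf_token())
```

This keeps the secret out of the notebook source while leaving the rest of the workflow unchanged.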



## ✅ Step 2: Accept Model Terms on Hugging Face
Some Hugging Face models require **explicit access approval** before you can use them.

1. Go to the **[Mistral-7B-Instruct-v0.3 Model Page](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)**
2. Click **"Request Access"** and accept the terms
3. Wait for **access approval** (this might take some time)


In [7]:
import os

# Set environment variables before importing torch so they are in place
# before CUDA is initialized
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
os.environ["TOKENIZERS_PARALLELISM"] = "false"

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# 🔹 Specify the Mistral v3 model name
model_name = "mistralai/Mistral-7B-Instruct-v0.3"
cache_dir = "/kaggle/temp"  # ✅ Cache model in Kaggle’s temp directory to avoid space issues

# ✅ Step 3: Load the tokenizer with authentication
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    cache_dir=cache_dir,
    token=True,  # Use the token saved by login(); replaces the deprecated use_auth_token
    trust_remote_code=True
)

# 🔹 Fix: Set a padding token if it doesn't exist
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Use EOS token as padding

# ✅ Load the model with caching and multi-GPU execution
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    cache_dir=cache_dir,
    torch_dtype=torch.float16,  # ✅ Use torch.float16 for efficiency
    device_map="auto",  # ✅ Automatically distribute across available GPUs
    token=True,  # Use the token saved by login(); replaces the deprecated use_auth_token
    trust_remote_code=True
)

# Step 4: Define multiple prompts to test the model
prompts = [
    "Solve the equation: 3x + 5 = 20. What is x?",
    "What is x if 3x + 5 = 20?",
    "Find x in the equation: 3x + 5 = 20.",
    "Compute x given 3x + 5 = 20.",
    "If 3x + 5 equals 20, what does x equal?",
    "Determine the numerical value of x in 3x + 5 = 20.",
    "Can you solve for x: 3x + 5 = 20?",
    "What is the solution to 3x + 5 = 20?"
]

# Step 5: Tokenize the prompts
inputs = tokenizer(prompts, padding=True, return_tensors="pt").to("cuda")

# 🔢 Display tokenized input (series of numbers)
print("🔢 **Tokenized Input (IDs):**")
print(inputs["input_ids"])  # Shows numerical tokenized representation

# Step 6: Perform inference
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,  # Limit the number of generated tokens
        do_sample=True,  # Enable sampling so temperature and top_p are applied
        temperature=0.7,  # Sampling temperature
        top_p=0.9,  # Nucleus sampling
        repetition_penalty=1.1  # Penalize repetition
    )

# 📊 Display raw model output (before converting to text)
print("\n📊 **Raw Model Output (IDs):**")
print(outputs)  # Shows the generated token IDs before decoding

# Step 7: Decode and print responses
decoded_outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)

for i, response in enumerate(decoded_outputs):
    print(f"Response for prompt {i+1}:\n{response}\n")
    print("***********************************************")
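
The `repetition_penalty=1.1` argument used above lowers the score of tokens that already appear in the sequence. The sketch below illustrates the commonly used rule (divide positive scores by the penalty, multiply negative ones) in plain Python. The helper name and the toy values are assumptions for illustration, not the library's code.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty=1.1):
    """Make tokens that already appeared less likely to be chosen again."""
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        # Dividing a positive score or multiplying a negative one both lower it
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted

# Hypothetical scores for a 4-token vocabulary (illustration only)
toy_logits = [2.2, -1.0, 0.5, 3.3]
seen = [0, 1, 0]  # token IDs already present in the prompt or output
print(apply_repetition_penalty(toy_logits, seen, penalty=1.1))
```

Tokens 2 and 3 keep their scores because they have not appeared yet, while tokens 0 and 1 are pushed down. A penalty of `1.0` leaves every score unchanged.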





Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


🔢 **Tokenized Input (IDs):**
tensor([[    1,  1086,  6071,  1040,  9545, 29515, 29473, 29538, 29512,  1416,
         29473, 29550,  1095, 29473, 29518, 29502, 29491,  2592,  1117,  2086,
         29572],
        [    2,     2,     2,     2,     2,     1,  2592,  1117,  2086,  1281,
         29473, 29538, 29512,  1416, 29473, 29550,  1095, 29473, 29518, 29502,
         29572],
        [    2,     2,     2,     1,  9537,  2086,  1065,  1040,  9545, 29515,
         29473, 29538, 29512,  1416, 29473, 29550,  1095, 29473, 29518, 29502,
         29491],
        [    2,     2,     2,     2,     2,     1,  1892,  8913,  2086,  2846,
         29473, 29538, 29512,  1416, 29473, 29550,  1095, 29473, 29518, 29502,
         29491],
        [    2,     2,     2,     1,  1815, 29473, 29538, 29512,  1416, 29473,
         29550, 22356, 29473, 29518, 29502, 29493,  1535,  2003,  2086,  7298,
         29572],
        [    2,     1,  5926, 22592,  1040, 18893,  1960,  1070,  2086,  1065,
         29473, 2