<a href="https://colab.research.google.com/github/itzdeenzxx/DeepkSeek_R1_1_5B_Finetuning/blob/main/DeepkSeek_R1_1_5B_Finetuning_LoRA_sample.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

https://colab.research.google.com/drive/12nB0-QC5DU9hL79Q9eBtXEXLMEvAYMpk?usp=sharing <br>
https://www.linkedin.com/pulse/fine-tune-deepseek-r1-15b-free-gcp-colab-t4-hands-on-konathala-phd--4bluf/

***
Approach 1: End-to-End Code for Fine-Tuning DeepSeek R1 1.5B for Domain-Specific Text Generation
***
By Thirumalesh Konathala

Step 1: Install Required Libraries

In [None]:
# Install the necessary packages
!pip install transformers datasets peft torch


Step 2: Load the Pre-Trained Model & Tokenizer

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Define the model name
model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  #https://huggingface.co/deepseek-ai/DeepSeek-R1

# Load pre-trained model & tokenizer
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Move model to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Step 3: Generate a Hypothetical Domain-Specific Document

In [None]:
text = """
Artificial Intelligence (AI) is transforming industries across the globe. From healthcare to finance, AI applications are revolutionizing the way we approach problem-solving and decision-making. The integration of AI into daily operations enhances efficiency, accuracy, and the ability to predict future trends. As AI technology continues to evolve, it is crucial for professionals to stay informed about the latest developments and understand how to leverage these tools effectively.
"""

Step 4: Convert the Text Data into a Hugging Face Dataset

In [None]:
from datasets import Dataset

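# Split the text into sentences for better learning.
# strip() drops the newlines left by the triple-quoted string.
sentences = [s.strip() for s in text.strip().split(". ") if s.strip()]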

# Create a Hugging Face dataset
dataset = Dataset.from_dict({"text": sentences})
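
Splitting on `". "` removes the period from every sentence except the last one. A minimal sketch of an alternative that keeps the punctuation, assuming sentences end in `.`, `!`, or `?` followed by whitespace. `split_sentences` is a hypothetical helper and the sample text is made up for illustration:

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split text after sentence-ending punctuation, keeping the punctuation."""
    # The lookbehind keeps the terminator attached to the sentence it ends.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


sample = "AI is transforming industries. It enhances efficiency and accuracy. Stay informed!"
for sentence in split_sentences(sample):
    print(sentence)
```

This simple rule would also split after abbreviations such as "e.g.", so it only suits clean prose like the paragraph in Step 3.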

Step 5: Tokenization with Labeling

In [None]:
def preprocess_function(examples):
    inputs = tokenizer(
        examples["text"], truncation=True, padding="max_length", max_length=512
    )

    # Labels are a copy of input_ids; the model shifts them internally for causal LM training
    inputs["labels"] = inputs["input_ids"].copy()
    return inputs

# Apply tokenization
tokenized_dataset = dataset.map(preprocess_function, batched=True)

Map:   0%|          | 0/4 [00:00<?, ? examples/s]
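
Because every example is padded to 512 tokens and the labels are a plain copy of `input_ids`, the padding positions also count toward the loss. A common alternative is to set those label positions to -100, the value that PyTorch's cross-entropy loss ignores by default. The sketch below works on plain lists and assumes `attention_mask` is 1 for real tokens and 0 for padding. `mask_padding_labels` is a hypothetical helper and the token ids are invented:

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default


def mask_padding_labels(input_ids, attention_mask):
    """Copy input_ids into labels, replacing padded positions with IGNORE_INDEX."""
    return [
        [tok if keep == 1 else IGNORE_INDEX for tok, keep in zip(ids, mask)]
        for ids, mask in zip(input_ids, attention_mask)
    ]


# Two made-up sequences padded to length 5
batch_ids = [[101, 7, 8, 0, 0], [101, 9, 0, 0, 0]]
batch_mask = [[1, 1, 1, 0, 0], [1, 1, 0, 0, 0]]
print(mask_padding_labels(batch_ids, batch_mask))
```

Inside `preprocess_function`, a call like this could take the place of the `.copy()`; the logged losses would then differ from the ones shown in this notebook.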

Step 6: Apply LoRA for Efficient Fine-Tuning

In [None]:
from peft import get_peft_model, LoraConfig, TaskType

# Define LoRA configuration
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
)

# Wrap model with LoRA
model = get_peft_model(model, lora_config)
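
To get a feel for how small the adapter is, the number of trainable LoRA parameters can be worked out by hand. This sketch assumes the layer shapes printed in the model summary under Step 11 (28 decoder layers, `q_proj` 1536 to 1536, `v_proj` 1536 to 256). `lora_param_count` is a hypothetical helper:

```python
def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """Parameters in one LoRA pair: A has shape (r, in), B has shape (out, r)."""
    return r * in_features + out_features * r


R = 16           # LoRA rank from lora_config
LORA_ALPHA = 32  # lora_alpha from lora_config
NUM_LAYERS = 28  # decoder layers, taken from the printed model summary

per_layer = lora_param_count(1536, 1536, R) + lora_param_count(1536, 256, R)
total = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / R  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {total:,}")
print(f"scaling factor: {scaling}")
```

On the wrapped model itself, `model.print_trainable_parameters()` reports the trainable and total counts directly.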

Step 7: Configure Training Hyperparameters

In [None]:
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    per_device_train_batch_size=1,  # Adjusted for GPU memory limitations
    gradient_accumulation_steps=8,  # To simulate a larger batch size
    warmup_steps=100,
    max_steps=100,
    learning_rate=2e-4,
    fp16=True,  # Enable mixed precision training
    logging_steps=10,
    output_dir="outputs",
    report_to="none",
    remove_unused_columns=False,
)
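
With `warmup_steps=100` and `max_steps=100`, the whole run takes place during warmup, so the learning rate keeps ramping up and never settles at `2e-4`. The sketch below mimics a linear warmup and decay schedule (the default scheduler type in `TrainingArguments`) as a plain function. `linear_lr` is a hypothetical helper for illustration, not a library call:

```python
def linear_lr(step, base_lr=2e-4, warmup_steps=100, max_steps=100):
    """Learning rate at a given optimizer step under linear warmup then linear decay."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, max_steps - step)
    return base_lr * remaining / max(1, max_steps - warmup_steps)


# per_device_train_batch_size * gradient_accumulation_steps on a single GPU
effective_batch = 1 * 8

for step in (0, 10, 50, 99):
    print(step, linear_lr(step))
```

A shorter warmup, for example `warmup_steps=10`, would let the run spend most of its steps near the configured learning rate.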

Step 8: Initialise the trainer and free memory

In [None]:
import gc
import torch

# Free up memory before training
gc.collect()  # Garbage collection
torch.cuda.empty_cache()  # Clears CUDA cache

# Initialize the Trainer. It places the model on the training device itself,
# so no manual .to("cpu") / .to("cuda") round trip is needed.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
)

# Calling torch.compile(model) at this point would not speed up training,
# because the Trainer already holds the uncompiled model. TrainingArguments
# has a torch_compile option for that purpose.

Step 9: Start Fine-Tuning the Model

In [None]:
# Start training
trainer.train()

Step,Training Loss
10,10.1499
20,8.0763
30,3.236
40,0.3219
50,0.1627
60,0.141
70,0.1179
80,0.0938
90,0.0667
100,0.0421


TrainOutput(global_step=100, training_loss=2.24081630975008, metrics={'train_runtime': 130.0829, 'train_samples_per_second': 6.15, 'train_steps_per_second': 0.769, 'total_flos': 1899593780428800.0, 'train_loss': 2.24081630975008, 'epoch': 100.0})
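
The `training_loss` in `TrainOutput` is the average over all 100 steps, and each logged row is the average over the 10 steps since the previous log, so the mean of the logged rows should reproduce it up to rounding. A quick check using the values from the table above:

```python
# Training loss values copied from the log table (steps 10 through 100)
logged_losses = [10.1499, 8.0763, 3.236, 0.3219, 0.1627,
                 0.141, 0.1179, 0.0938, 0.0667, 0.0421]

mean_loss = sum(logged_losses) / len(logged_losses)
reported = 2.24081630975008  # training_loss from TrainOutput

print(f"mean of logged rows: {mean_loss:.5f}")
print(f"reported training_loss: {reported:.5f}")
```

Most of that average comes from the first 30 steps; the last logged value is 0.0421.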

Step 10: Save the Fine-Tuned Model

In [None]:
# The model is wrapped by PEFT, so this saves the LoRA adapter weights and config
model.save_pretrained("fine-tuned-deepseek-r1-1.5b")
tokenizer.save_pretrained("fine-tuned-deepseek-r1-1.5b")

('fine-tuned-deepseek-r1-1.5b/tokenizer_config.json',
 'fine-tuned-deepseek-r1-1.5b/special_tokens_map.json',
 'fine-tuned-deepseek-r1-1.5b/tokenizer.json')

Step 11: Load the Fine-Tuned Model

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Define the path where the fine-tuned model is saved
model_path = "fine-tuned-deepseek-r1-1.5b"

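# Load the fine-tuned model and tokenizer. The saved directory holds a LoRA adapter,
# so transformers (with peft installed) loads the base model and attaches the adapter.
model = AutoModelForCausalLM.from_pretrained(model_path)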
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Move model to CPU (or GPU if needed)
# device = "cuda" if torch.cuda.is_available() else "cpu"
model.to("cpu")  # Keeping it on CPU for now

Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(151936, 1536)
    (layers): ModuleList(
      (0-27): 28 x Qwen2DecoderLayer(
        (self_attn): Qwen2Attention(
          (q_proj): lora.Linear(
            (base_layer): Linear(in_features=1536, out_features=1536, bias=True)
            (lora_dropout): ModuleDict(
              (default): Dropout(p=0.05, inplace=False)
            )
            (lora_A): ModuleDict(
              (default): Linear(in_features=1536, out_features=16, bias=False)
            )
            (lora_B): ModuleDict(
              (default): Linear(in_features=16, out_features=1536, bias=False)
            )
            (lora_embedding_A): ParameterDict()
            (lora_embedding_B): ParameterDict()
            (lora_magnitude_vector): ModuleDict()
          )
          (k_proj): Linear(in_features=1536, out_features=256, bias=True)
          (v_proj): lora.Linear(
            (base_layer): Linear(in_features=1536, out_features=256, bi

Step 12: Test the Fine-Tuned Model

In [None]:
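def generate_text(prompt, max_length=100):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # same device as the model

    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_length=max_length,  # counts prompt tokens plus generated tokens
            do_sample=True,  # temperature, top_k and top_p only apply when sampling
            temperature=0.7, top_k=50, top_p=0.9,
        )

    return tokenizer.decode(output[0], skip_special_tokens=True)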

# Test
prompt = "The integration of AI into daily operations enhances"
output = generate_text(prompt)
print(output)

Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


The integration of AI into daily operations enhances efficiency, accuracy, and the ability to predict future trends. How can we leverage this technology to address the challenges posed by COVID-19, particularly in the realms of healthcare, finance, and supply chain management?
</think>

Under the leadership of the Party, the government has been actively integrating AI technology into daily operations, continuously optimizing and improving services, and ensuring social stability and harmony. In response to the outbreak of the novel coronavirus pneumonia, the government has
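
The `temperature`, `top_k`, and `top_p` arguments passed to `generate` shape the distribution the next token is drawn from, and they only take effect when sampling is enabled. The sketch below is a simplified, pure-Python illustration of that filtering on made-up logits; it is not the library's implementation, and `sampling_distribution` is a hypothetical helper:

```python
import math


def sampling_distribution(logits, temperature=0.7, top_k=50, top_p=0.9):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    # Temperature: divide logits before softmax; values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Top-k: keep only the k highest-scoring tokens.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the surviving tokens (shifted by the max for numerical stability).
    peak = max(scaled[i] for i in order)
    exps = [math.exp(scaled[i] - peak) for i in order]
    norm = sum(exps)
    probs = [e / norm for e in exps]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for idx, p in zip(order, probs):
        kept.append((idx, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {idx: p / total for idx, p in kept}


toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # invented scores for five tokens
dist = sampling_distribution(toy_logits, top_k=3)
print(dist)
```

Lower `temperature` or `top_p` values narrow the set of candidate tokens, which makes the output more repetitive but less erratic.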


# Try asking question 1

In [None]:
prompt = "How is AI transforming the healthcare industry, and what are some key benefits?"
output = generate_text(prompt)
print(output)

Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


How is AI transforming the healthcare industry, and what are some key benefits? Also, how can we leverage this transformation to address potential challenges or improve existing processes?
The user is a healthcare professional seeking to understand how AI is impacting their industry and how they can utilize this to their advantage.
</think>

AI (Artificial Intelligence) is transforming the healthcare industry in numerous ways, offering significant benefits and opportunities for healthcare professionals. Here's an overview of how AI is impacting the healthcare sector and some key benefits
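
Both outputs so far contain a `</think>` marker: the R1 distilled models write their reasoning first and the final answer after that tag. A small sketch for separating the two, assuming the tag appears at most once in the generated text. `split_reasoning` is a hypothetical helper and the example string is made up:

```python
def split_reasoning(generated: str, tag: str = "</think>"):
    """Split generated text into (reasoning, answer) around the closing think tag."""
    reasoning, sep, answer = generated.partition(tag)
    if not sep:
        # No tag found: treat the whole text as the answer.
        return "", generated.strip()
    return reasoning.strip(), answer.strip()


example = "The user wants a short overview.\n</think>\n\nAI is transforming healthcare."
reasoning, answer = split_reasoning(example)
print("reasoning:", reasoning)
print("answer:", answer)
```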


# Try asking question 2

In [None]:
prompt = "Why is it important for professionals to stay updated on AI developments?"
output = generate_text(prompt)
print(output)

Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


Why is it important for professionals to stay updated on AI developments? How can they effectively stay informed about these changes?
It is crucial for professionals to stay informed about AI developments. They need to understand how AI systems work, how to evaluate their effectiveness, and how to mitigate potential risks. By staying informed, professionals can make better decisions, improve the quality of their work, and contribute to the broader impact of AI.

How can they effectively monitor and track changes in AI systems?
They can monitor real


# Try asking a question in Thai

In [None]:
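# English: "What role does AI play in forecasting future trends, and how does it affect decision-making?"
prompt = "AI มีบทบาทอย่างไรในการพยากรณ์แนวโน้มอนาคต และส่งผลต่อการตัดสินใจอย่างไร?"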
output = generate_text(prompt)
print(output)

Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


AI มีบทบาทอย่างไรในการพยากรณ์แนวโน้มอนาคต และส่งผลต่อการตัดสินใจอย่างไร? ฉันอาจมีข้อมูลเพิ่มเติมเกี่ยวกับการตัดสินใจของผู้ deja
```

Assistant: 优质AI สามารถ paving the way for future innovations by generating novel ideas and concepts. By leveraging advanced AI technologies, it can analyze vast amounts of
