Nagaraj K
1RVU22BSC059

In [None]:
!pip install transformers datasets torch trl peft bitsandbytes

Collecting datasets
  Downloading datasets-3.5.0-py3-none-any.whl.metadata (19 kB)
Collecting trl
  Downloading trl-0.16.1-py3-none-any.whl.metadata (12 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.45.5-py3-none-manylinux_2_24_x86_64.whl.metadata (5.0 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets)
  Downloading multiprocess-0.70.16-py311-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2024.12.0,>=2023.1.0 (from fsspec[http]<=2024.12.0,>=2023.1.0->datasets)
  Downloading fsspec-2024.12.0-py3-none-any.whl.metadata (11 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch)

In [None]:
# Load required libraries
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")

def generate_text(prompt, max_length=500, temperature=0.1):
    """
    Generate text using the DeepSeek model

    Args:
        prompt (str): Input text to generate from
        max_length (int): Maximum total length in tokens, counting the prompt
        temperature (float): Sampling temperature; must be > 0, lower is less random

    Returns:
        str: Generated text
    """
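    # Encode the input text (returns input_ids and attention_mask)
    inputs = tokenizer(prompt, return_tensors="pt")

    # Generate text; pass the attention mask explicitly, since the pad and
    # EOS tokens are identical and the mask cannot be inferred
    with torch.no_grad():
        outputs = model.generate(
            inputs.input_ids,
            attention_mask=inputs.attention_mask,
            max_length=max_length,
            temperature=temperature,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            num_return_sequences=1
        )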

    # Decode and return the generated text
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return generated_text

# Example usage
if __name__ == "__main__":
    # Example prompts to test the model
    prompts = [
        "In Yoga philosophy, what is the significance of the concept of ahimsa (non-violence)?",
        "Tell me about Buddhism in India.",
    ]

    print("Generating text from different prompts:\n")
    for prompt in prompts:
        print(f"Prompt: {prompt}")
        generated = generate_text(prompt)
        print(f"Generated text: {generated}\n")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/3.07k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/7.03M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/679 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/3.55G [00:00<?, ?B/s]

Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.


generation_config.json:   0%|          | 0.00/181 [00:00<?, ?B/s]

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Generating text from different prompts:

Prompt: In Yoga philosophy, what is the significance of the concept of ahimsa (non-violence)?
Generated text: In Yoga philosophy, what is the significance of the concept of ahimsa (non-violence)? How does it relate to the concept of self-realization?
In the context of the yoga philosophy, what is the significance of the concept of ahimsa (non-violence)? How does it relate to the concept of self-realization?
In the context of the yoga philosophy, what is the significance of the concept of ahimsa (non-violence)? How does it relate to the concept of self-realization?
In the context of the yoga philosophy, what is the significance of the concept of ahimsa (non-violence)? How does it relate to the concept of self-realization?
In the context of the yoga philosophy, what is the significance of the concept of ahimsa (non-violence)? How does it relate to the concept of self-realization?
In the context of the yoga philosophy, what is the significance of t
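
The first cell samples with `temperature=0.1`. As a rough illustration of what that parameter does, the sketch below applies temperature scaling to a softmax over a few hypothetical logits (the values are made up for illustration, not taken from the model). At 0.1 almost all probability mass lands on the top token, so sampling behaves close to greedy decoding, which may contribute to the repeated lines in the output above.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive when sampling")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens (illustrative values only)
logits = [2.0, 1.5, 0.5]
for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures sharpen the distribution and higher ones flatten it; the ordering of the tokens never changes.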

In [None]:
%cd

/root


In [None]:
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    TrainingArguments,
)
from trl import SFTTrainer
import torch
from peft import LoraConfig, get_peft_model

# Step 1: Load the dataset
dataset = load_dataset("Abhaykoul/Ancient-Indian-Wisdom")

# Step 2: Format the dataset into instruction-response pairs
def format_dataset(examples):
    """Format the dataset into instruction-response pairs."""
    texts = []
    for instruction, response in zip(examples["instruction"], examples["output"]):
        # Combine instruction and response into a single text
        formatted_text = f"### Instruction:\n{instruction}\n\n### Response:\n{response}"
        texts.append(formatted_text)
    return {"text": texts}

# Apply formatting
dataset = dataset.map(format_dataset, batched=True, remove_columns=dataset["train"].column_names)

# Step 3: Load model and tokenizer
model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

# Step 4: Configure LoRA
peft_config = LoraConfig(
    r=16,  # Rank of the low-rank matrices
    lora_alpha=32,  # Scaling factor
    lora_dropout=0.1,  # Dropout for LoRA layers
    bias="none",  # No bias for LoRA
    task_type="CAUSAL_LM",  # Task type
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"]  # Target modules for LoRA
)
model = get_peft_model(model, peft_config)

# Step 5: Define training arguments
training_args = TrainingArguments(
    output_dir="./results",  # Directory to save results
    num_train_epochs=20,  # Number of training epochs
    per_device_train_batch_size=4,  # Batch size per device
    per_device_eval_batch_size=4,  # Evaluation batch size
    gradient_accumulation_steps=4,  # Gradient accumulation steps
    gradient_checkpointing=False,  # Disable gradient checkpointing for debugging
    optim="adamw_torch",  # Optimizer
    learning_rate=1e-4,  # Learning rate
    warmup_ratio=0.1,  # Warmup ratio
    fp16=True,  # Use mixed precision (FP16)
    logging_steps=10,  # Log every 10 steps
    save_strategy="steps",  # Save model at specific steps
    save_steps=100,  # Save every 100 steps
    eval_strategy="steps",  # Evaluate at specific steps (older transformers releases call this evaluation_strategy)
    eval_steps=100,  # Evaluate every 100 steps
    eval_accumulation_steps=1,  # Move evaluation outputs to the CPU after every step
    load_best_model_at_end=True,  # Load the best model at the end
    metric_for_best_model="eval_loss",  # Metric for best model
    greater_is_better=False,  # Lower eval_loss is better
    remove_unused_columns=True,  # Remove unused columns
    report_to="none",  # Disable external logging
)

# Step 6: Initialize the trainer
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["train"].select(range(120)),  # First 120 training examples reused for evaluation (not a held-out set)
)


# Step 7: Train the model
trainer.train()

README.md:   0%|          | 0.00/70.0 [00:00<?, ?B/s]

dataset.json:   0%|          | 0.00/718k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/616 [00:00<?, ? examples/s]

Map:   0%|          | 0/616 [00:00<?, ? examples/s]



Converting train dataset to ChatML:   0%|          | 0/616 [00:00<?, ? examples/s]

Applying chat template to train dataset:   0%|          | 0/616 [00:00<?, ? examples/s]

Tokenizing train dataset:   0%|          | 0/616 [00:00<?, ? examples/s]

Truncating train dataset:   0%|          | 0/616 [00:00<?, ? examples/s]

Converting eval dataset to ChatML:   0%|          | 0/120 [00:00<?, ? examples/s]

Applying chat template to eval dataset:   0%|          | 0/120 [00:00<?, ? examples/s]

Tokenizing eval dataset:   0%|          | 0/120 [00:00<?, ? examples/s]

Truncating eval dataset:   0%|          | 0/120 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
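
The `LoraConfig` above uses `r=16` on the four attention projections. A back-of-the-envelope count of the parameters this adds can be sketched as follows. The layer shapes below are assumptions about DeepSeek-R1-Distill-Qwen-1.5B that are not stated in this notebook; check them against `model.config` before relying on the numbers.

```python
def lora_param_count(rank, in_features, out_features):
    """Parameters added by one LoRA adapter: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)

# ASSUMED shapes (not given in the notebook); verify with model.config
HIDDEN = 1536      # assumed hidden size
KV_DIM = 256       # assumed key/value width (2 KV heads x 128 head dim)
NUM_LAYERS = 28    # assumed number of transformer layers
RANK = 16          # r from the LoraConfig above

shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}
per_layer = sum(lora_param_count(RANK, i, o) for i, o in shapes.values())
total = per_layer * NUM_LAYERS
print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total:  {total:,}")
```

Calling `model.print_trainable_parameters()` on the PEFT-wrapped model reports the actual figure and is the more reliable check.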


In [None]:
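# `model` is a PeftModel, so this writes the LoRA adapter weights and config, not the full base model
model.save_pretrained("fine-tuned-deepseek-r1-1.5b")
tokenizer.save_pretrained("fine-tuned-deepseek-r1-1.5b")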

('fine-tuned-deepseek-r1-1.5b/tokenizer_config.json',
 'fine-tuned-deepseek-r1-1.5b/special_tokens_map.json',
 'fine-tuned-deepseek-r1-1.5b/tokenizer.json')

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Define model path
model_path = "fine-tuned-deepseek-r1-1.5b"

# Load in FP16 with automatic device placement; the directory holds a LoRA adapter,
# which transformers attaches to the base model when peft is installed
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Function to generate text
def generate_text(prompt, max_new_tokens=1000):
    # Ensure inputs are correctly formatted and placed on the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, padding=True).to(model.device)

    # Set seed for reproducibility (optional)
    torch.manual_seed(42)

    # Generate text
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.5,
            top_k=50,
            top_p=0.9,
            use_cache=True
        )

    # Decode and return the generated text
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Test the function
prompt = "In Yoga philosophy, what is the significance of the concept of ahimsa (non-violence)?"
output = generate_text(prompt)
print(output)

Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


In Yoga philosophy, what is the significance of the concept of ahimsa (non-violence)? And, in the context of yoga and meditation, how can one achieve this concept?
In the context of yoga and meditation, how can one achieve this concept? And, what is the significance of the concept of ahimsa (non-violence) in Yoga philosophy?
Please answer in Chinese, and use the terminology and concepts from the philosophy of the discipline of Yoga.
Okay, so I have this question about Yoga and the concept of ahimsa, which I know is also called non-violence. I need to understand what it means in Yoga philosophy and how to achieve it in practice. Hmm, I'm a bit rusty on this, but let me try to break it down.

First, I remember that in Yoga, there's a lot about compassion and non-violence. I think ahimsa is about being kind to everyone without causing harm. But how exactly does that translate into practice? I guess it's about being present and not pushing back when someone is upset. Maybe it's about not i
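
The test prompt above is the bare question, whereas `format_dataset` trained the model on text of the form `### Instruction:` / `### Response:`. A minimal sketch of wrapping the question in that same template at inference time is shown below, assuming the trainer consumed the `text` field unchanged. The helper names `build_prompt` and `extract_response` are hypothetical and not part of the notebook.

```python
def build_prompt(instruction):
    """Wrap a question in the format_dataset template, stopping at the response header."""
    return f"### Instruction:\n{instruction}\n\n### Response:\n"

def extract_response(generated_text):
    """Return only the text after the last response header, if one is present."""
    marker = "### Response:\n"
    _, found, tail = generated_text.rpartition(marker)
    return tail.strip() if found else generated_text.strip()

question = "In Yoga philosophy, what is the significance of the concept of ahimsa (non-violence)?"
prompt = build_prompt(question)
print(prompt)
# With the fine-tuned model loaded as in the cell above, one could then run:
#   print(extract_response(generate_text(prompt)))
```

Whether this improves the answers has not been measured here; it only removes the mismatch between the training format and the inference prompt.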