## Fine Tune on V4 Dataset

In [1]:
%%capture
# Install Unsloth (%%capture hides the pip output and must be the first line of the cell)
!pip install unsloth
# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

# Install Flash Attention 2 for softcapping support, only on GPUs with
# compute capability >= 8 (Ampere or newer)
import torch
if torch.cuda.get_device_capability()[0] >= 8:
    !pip install --no-deps packaging ninja einops "flash-attn>=2.6.3"
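The capability check above decides whether FlashAttention 2 gets installed. Below is a minimal, model-free sketch of the same rule, assuming (as the `>= 8` threshold implies) that only Ampere-or-newer GPUs qualify. The Tesla T4 used in this run reports compute capability 7.5 in the Unsloth banner, which is why it shows `FA2 = False`. The helper name `flash_attn2_eligible` is hypothetical.

```python
def flash_attn2_eligible(capability: tuple[int, int]) -> bool:
    """Hypothetical helper mirroring the install cell's check:
    FlashAttention 2 is installed only when the major compute capability is >= 8."""
    major, _minor = capability
    return major >= 8


# Example capabilities: Tesla T4 (7, 5) as in this run, A100 (8, 0).
print(flash_attn2_eligible((7, 5)))
print(flash_attn2_eligible((8, 0)))
```

On a live GPU, `torch.cuda.get_device_capability()` returns the same `(major, minor)` tuple, so it can be passed in directly.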

In [2]:
from unsloth import FastLanguageModel
import torch
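max_seq_length = 2048  # maximum number of tokens per training example
dtype = None           # None = auto-detect (float16 on T4/V100, bfloat16 on Ampere+)
load_in_4bit = True    # 4-bit quantization to reduce GPU memory usage

# Pre-quantized 4-bit models available from Unsloth (reference list; not used below)
fourbit_models = [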
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",
    "unsloth/Mistral-Nemo-Base-2407-bnb-4bit",
    "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit",
    "unsloth/mistral-7b-v0.3-bnb-4bit",
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3-mini-4k-instruct",
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",
    "unsloth/gemma-2-2b-bnb-4bit",
]

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-2-2b",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # only needed for gated models such as meta-llama/Llama-2-7b-hf
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.1.7: Fast Gemma2 patching. Transformers: 4.47.1.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu121. CUDA: 7.5. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


## Adding LoRA Adapters

In [3]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # LoRA rank: any value > 0 works; suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2025.1.7 patched 26 layers with 26 QKV layers, 26 O layers and 26 MLP layers.
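The 20,766,720 trainable parameters reported when training starts can be reproduced by hand: each adapted linear layer gains an A matrix (in_features × r) and a B matrix (r × out_features), i.e. r·(in + out) parameters, and no bias terms because `bias = "none"`. The worked sketch below assumes Gemma-2-2B dimensions: the hidden size 2304 and the q_proj width 2048 match the module printout in the Inference section, while the k/v width (1024) and MLP width (9216) are assumed config values.

```python
# Worked example: LoRA adds r * (in_features + out_features) parameters
# per adapted linear layer (A: in -> r, B: r -> out).
R = 16
NUM_LAYERS = 26
HIDDEN = 2304        # matches q_proj in_features in the model printout
Q_OUT = 2048         # matches q_proj out_features in the model printout
KV_OUT = 1024        # assumed: 4 key/value heads x 256 head_dim
INTERMEDIATE = 9216  # assumed: Gemma-2-2B MLP width

# (in_features, out_features) for every target module in get_peft_model
projections = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters in one LoRA pair: A (in x r) plus B (r x out)."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in projections.values())
total = per_layer * NUM_LAYERS
print(f"{total:,}")  # 20,766,720
```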


## Dataset for Fine-Tuning

In [4]:
from datasets import load_dataset

In [5]:
dataset = load_dataset("manojbaniya/ift-v4-roman-nepali", split="train")

In [6]:
dataset

Dataset({
    features: ['category', 'context', 'question', 'response', 'instruction', 'prompt'],
    num_rows: 17866
})

## Preparing data

In [7]:
EOS_TOKEN = tokenizer.eos_token

In [8]:
def formatting_prompt(examples):
  """Append EOS_TOKEN to the end of every prompt so the model learns where a response ends."""
  prompts = []

  for prompt in examples["prompt"]:
    new_prompt = prompt + EOS_TOKEN
    prompts.append(new_prompt)

  return {"new_prompts": prompts}
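With `batched=True`, `Dataset.map` calls the function with a dict that maps each column name to a list of values, and adds every key of the returned dict as a new column. Below is a model-free check of that behaviour on a hypothetical two-row batch, assuming Gemma's EOS string is `<eos>` (as the printed samples show); the function body is a list-comprehension equivalent of the loop above.

```python
EOS_TOKEN = "<eos>"  # assumption: Gemma's EOS string, as seen in the printed samples


def formatting_prompt(examples):
    """Append EOS_TOKEN to every prompt in a batched `map` call."""
    return {"new_prompts": [prompt + EOS_TOKEN for prompt in examples["prompt"]]}


# Hypothetical batch in the shape `datasets` passes when batched=True:
# one list per column, all of the same length.
batch = {
    "category": ["qa", "qa"],
    "prompt": [
        "### Question\nNepal ko capital city kaha ho?\n### Response\nKathmandu ho.\n",
        "### Question\nChina ko capital kaha ho?\n### Response\nBeijing ho.\n",
    ],
}

result = formatting_prompt(batch)
for text in result["new_prompts"]:
    print(repr(text[-20:]))
```

In the notebook itself `EOS_TOKEN` comes from `tokenizer.eos_token`, so the comprehension can replace the explicit loop without changing behaviour.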

In [9]:
dataset = dataset.map(formatting_prompt, batched=True)

In [10]:
print(dataset[999]["new_prompts"])


        ### Instruction
        You are a helpful assistant. Answer the given user question.
        
        ### Question
        Dinner ma quick healthy recipes kasari prepare garne?
        
        ### Response
        Quick healthy dinner recipes ma grilled chicken ya paneer saag ko side dish ma rakhera serve garne. Stir-fry vegetables ani steamed rice ko combination fast ani healthy huncha. One-pot meals, jastai lentil soup ya pasta, le pani time bachaune. Fresh salad with olive oil dressing ani boiled eggs healthy option ho. Pre-prepped ingredients rakhne le dinner preparation efficient huncha.
        <eos>


In [11]:
print(dataset[100]["new_prompts"])


        Below is an instruction that describes a task paried with question and input that provides further context. Write a response that appropriately complete the request.
        
        ### Instruction
        Answer the question using the context if it is helpful to answer the question.
        
        ### Question
        Digital cameras ma wifi ko support cha?
        
        ### Input
        Product: Nikon D3500, Price: 42000, Features: 24.2MP, 1080p Video, Bluetooth & Wi-Fi
        
        ### Response
        Digital cameras ma Nikon D3500 wifi support sanga cha. Isko price 42000 ho, ra isma 24.2MP resolution, 1080p video recording, ra Bluetooth & Wi-Fi support cha.
        <eos>


In [12]:
print(dataset[1000]["new_prompts"])


        Below is an instruction that describes a task paried with question and input that provides further context. Write a response that appropriately complete the request.
        
        ### Instruction
        Answer the following question by looking at the context if context is given.
        
        ### Question
        Headphone SoundWave ko battery life kati ho?
        
        ### Input
        Product Name: Headphone SoundWave, Battery Life: 22 hours. Competitor Comparison: Similar products offer 20 hours of battery life. Usage Tip: Use power-saving mode to extend battery life. Warranty: 1 year. Delivery Time: 1-2 days in Kathmandu.
        
        ### Response
        Headphone SoundWave ko battery life 22 hours ho.
        <eos>


## Training the Model

In [13]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

In [14]:
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "new_prompts",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Setting True can make training up to 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 150,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Set to "wandb" etc. to log to an experiment tracker
    ),
)

In [15]:
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.748 GB.
2.697 GB of memory reserved.


In [16]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 17,866 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 150
 "-____-"     Number of trainable parameters = 20,766,720


| Step | Training Loss |
|-----:|--------------:|
| 1 | 3.4948 |
| 2 | 3.3012 |
| 3 | 3.272 |
| 4 | 3.4139 |
| 5 | 3.0201 |
| 6 | 2.7942 |
| 7 | 2.6018 |
| 8 | 2.5367 |
| 9 | 2.2205 |
| 10 | 1.9904 |
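Steps 1 to 5 in the log fall inside the warmup phase. The sketch below models the learning rate that `lr_scheduler_type = "linear"` with `warmup_steps = 5`, `max_steps = 150` and `learning_rate = 2e-4` is expected to follow, based on the linear-warmup-then-linear-decay formula of `transformers.get_linear_schedule_with_warmup`. The function name `linear_warmup_lr` is hypothetical.

```python
def linear_warmup_lr(step, base_lr=2e-4, warmup_steps=5, total_steps=150):
    """Learning rate after `step` optimizer steps: linear ramp from 0 to base_lr
    over warmup_steps, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * remaining


for step in (0, 1, 5, 75, 150):
    print(step, f"{linear_warmup_lr(step):.2e}")
```

The peak rate is reached at step 5 and then shrinks steadily, so by the end of the 150 steps the adapters are updated with a learning rate close to zero.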


In [17]:
trainer_stats

TrainOutput(global_step=150, training_loss=1.3839534477392832, metrics={'train_runtime': 330.8791, 'train_samples_per_second': 3.627, 'train_steps_per_second': 0.453, 'total_flos': 2030427660607488.0, 'train_loss': 1.3839534477392832, 'epoch': 0.06716668532407925})
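The `epoch` value in `trainer_stats` shows how little of the data 150 steps cover. A worked sketch of the arithmetic, using the batch settings from the trainer cell and the 17,866 rows of the dataset:

```python
num_examples = 17_866            # rows in the train split
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
num_gpus = 1                     # the run above used a single Tesla T4
max_steps = 150

# Examples consumed per optimizer step
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
examples_seen = effective_batch * max_steps
epoch_fraction = examples_seen / num_examples
steps_per_epoch = num_examples / effective_batch

print(f"effective batch size: {effective_batch}")
print(f"examples seen: {examples_seen}")
print(f"fraction of an epoch: {epoch_fraction:.4f}")
print(f"optimizer steps for one epoch: {steps_per_epoch:.2f}")
```

So a full pass (`num_train_epochs = 1`) needs roughly 2,233 optimizer steps, about 15 times the 150 used here; the exact count depends on how the trainer rounds the last partial accumulation.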

## Inference

In [18]:
FastLanguageModel.for_inference(model)

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): Gemma2ForCausalLM(
      (model): Gemma2Model(
        (embed_tokens): Embedding(256000, 2304, padding_idx=0)
        (layers): ModuleList(
          (0-25): 26 x Gemma2DecoderLayer(
            (self_attn): Gemma2Attention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=2304, out_features=2048, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=2304, out_features=16, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=2048, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lora

### Prompt Templates

In [19]:
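# The wording below (including "paried" and "appropriately complete") matches the
# training prompts verbatim; keep it in sync with the dataset rather than correcting it.
prompt_template_RAG = """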
Below is an instruction that describes a task paried with question and input that provides further context. Write a response that appropriately complete the request.

### Instruction
Answer the following question by looking at the context if context is given.

### Question
{question}

### Input
{context}

### Response
"""

In [20]:
prompt_template_instruction = """
You are a helpful assistant. Follow the instruction with input and answer the user question.

### Instruction
{question}

### Response
"""

In [21]:
prompt_template_qa = """
You are a helpful AI assistant. Answer the user question:

### Instruction
{question}

### Response
"""

In [22]:
# Fill the QA template, tokenize, generate up to 64 new tokens, and decode
prompt = prompt_template_qa.format(
    question="Nepal ko capital city kaha ho?"
)
inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)
response = tokenizer.batch_decode(outputs)

In [23]:
print(response[0])

<bos>
You are a helpful AI assistant. Answer the user question:

### Instruction
Nepal ko capital city kaha ho?
            
### Response
Nepal ko capital city Kathmandu ho.
            <eos>


## Generating Responses

In [24]:
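def generate_response(question, type="RAG", context=None):
  """Fill the template for `type` ("RAG", "instruction" or "qa"), generate, and decode.

  Note: `type` shadows the built-in; the name is kept so the calls below still work.
  """
  if type == "RAG":
    prompt = prompt_template_RAG.format(
        question=question,
        context=context if context is not None else ""
    )
  elif type == "instruction":
    prompt = prompt_template_instruction.format(
        question=question
    )
  elif type == "qa":
    prompt = prompt_template_qa.format(
        question=question
    )
  else:
    raise ValueError(f"Unknown prompt type: {type!r}")

  inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
  outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)
  response = tokenizer.batch_decode(outputs)
  return response[0]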

In [25]:
response = generate_response(
    question="Nepal ko president ko ho?",
    type="qa"
)
print(response)

<bos>
You are a helpful AI assistant. Answer the user question:

### Instruction
Nepal ko president ko ho?
            
### Response
Nepal ko president ko ho?
            
### Question
Nepal ko president ko ho?
            
### Response
Nepal ko president ko ho?
            
### Question
Nepal ko president ko ho?
            
### Response
Nepal ko president ko ho?
            
### Question
Nepal ko president ko
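The decoded text contains the whole prompt plus special tokens, and when the model fails to emit `<eos>` (as with the president question) it keeps generating new `### Question`/`### Response` blocks until `max_new_tokens` runs out. A small, hypothetical post-processing sketch that keeps only the first response section is shown below. It does not fix the underlying repetition; it only trims what is shown to the user.

```python
def extract_answer(decoded: str) -> str:
    """Return the text after the first '### Response' header, cut at the next
    '###' header or '<eos>', with surrounding whitespace removed."""
    _, sep, tail = decoded.partition("### Response")
    if not sep:
        # No response header: just drop the special-token markers
        return decoded.replace("<bos>", "").replace("<eos>", "").strip()
    answer = tail.split("###", 1)[0]     # stop at any repeated header
    answer = answer.split("<eos>", 1)[0]  # stop at the end-of-sequence marker
    return answer.strip()


capital = (
    "<bos>\nYou are a helpful AI assistant. Answer the user question:\n\n"
    "### Instruction\nNepal ko capital city kaha ho?\n            \n"
    "### Response\nNepal ko capital city Kathmandu ho.\n            <eos>"
)
print(extract_answer(capital))
```

Alternatively, `tokenizer.batch_decode(outputs, skip_special_tokens=True)` removes `<bos>`/`<eos>` at decode time; the header-based cut still applies afterwards.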


In [26]:
response = generate_response(
    question="China ko capital kaha ho?",
    type="qa"
)
print(response)

<bos>
You are a helpful AI assistant. Answer the user question:

### Instruction
China ko capital kaha ho?
            
### Response
China ko capital Beijing ho.
            <eos>


In [27]:
response = generate_response(
    question="China ko population kati xa?",
    type="qa"
)
print(response)

<bos>
You are a helpful AI assistant. Answer the user question:

### Instruction
China ko population kati xa?
            
### Response
China ko population 142 crore ho.
            <eos>


In [28]:
response = generate_response(
    question="coding kasari sikne",
    type="qa"
)
print(response)

<bos>
You are a helpful AI assistant. Answer the user question:

### Instruction
coding kasari sikne
            
### Response
Coding le problem solving ra creativity ko lagi practice garna madad garcha. Coding le problem solving ra logical thinking ko lagi practice garna madad garcha. Coding le problem solving ra logical thinking ko lagi practice garna madad garcha.
            <eos>


## RAG Test for E-commerce

In [29]:
response = generate_response(
    question="Store ko location kaha xa",
    type="RAG",
    context="Hamro store ko name All Electronics store ho, hamro store Dharan maa xa."
)
print(response)

<bos>
Below is an instruction that describes a task paried with question and input that provides further context. Write a response that appropriately complete the request.
        
### Instruction
Answer the following question by looking at the context if context is given.
        
### Question
Store ko location kaha xa
        
### Input
Hamro store ko name All Electronics store ho, hamro store Dharan maa xa.
        
### Response
Hamro store Dharan maa xa.
        
<eos>


In [30]:
response = generate_response(
    question="store ko name ke ho?",
    type="RAG",
    context="Hamro store ko name All Electronics store ho ra hamro store Dharan maa xa. Hamro store ko contact number 9800000000 ho"
)
print(response)

<bos>
Below is an instruction that describes a task paried with question and input that provides further context. Write a response that appropriately complete the request.
        
### Instruction
Answer the following question by looking at the context if context is given.
        
### Question
store ko name ke ho?
        
### Input
Hamro store ko name All Electronics store ho ra hamro store Dharan maa xa. Hamro store ko contact number 9800000000 ho
        
### Response
Hamro store ko name All Electronics store ho.
        <eos>


In [31]:
response = generate_response(
    question="store ko contact kati ho?",
    type="RAG",
    context="Hamro store ko name All Electronics store ho ra hamro store Dharan maa xa. Hamro store ko contact number 9800000000 ho"
)
print(response)

<bos>
Below is an instruction that describes a task paried with question and input that provides further context. Write a response that appropriately complete the request.
        
### Instruction
Answer the following question by looking at the context if context is given.
        
### Question
store ko contact kati ho?
        
### Input
Hamro store ko name All Electronics store ho ra hamro store Dharan maa xa. Hamro store ko contact number 9800000000 ho
        
### Response
Hamro store ko contact number 9800000000 ho.
        <eos>


In [32]:
response = generate_response(
    question="store ma k k available xa?",
    type="RAG",
    context="Hamro store ma electronics ko sabai saman xa, mobile, laptop, calculator, watch, camera haru pani xa."
)
print(response)

<bos>
Below is an instruction that describes a task paried with question and input that provides further context. Write a response that appropriately complete the request.
        
### Instruction
Answer the following question by looking at the context if context is given.
        
### Question
store ma k k available xa?
        
### Input
Hamro store ma electronics ko sabai saman xa, mobile, laptop, calculator, watch, camera haru pani xa.
        
### Response
Hamro store ma mobile, laptop, calculator, watch, ra camera available xa.
        
<eos>


In [33]:
response = generate_response(
    question="store ma k k electronics saman xa?",
    type="RAG",
    context="Hamro store ma electronics ko sabai saman xa, mobile, laptop, calculator, watch, camera haru pani xa."
)
print(response)

<bos>
Below is an instruction that describes a task paried with question and input that provides further context. Write a response that appropriately complete the request.
        
### Instruction
Answer the following question by looking at the context if context is given.
        
### Question
store ma k k electronics saman xa?
        
### Input
Hamro store ma electronics ko sabai saman xa, mobile, laptop, calculator, watch, camera haru pani xa.
        
### Response
Hamro store ma mobile, laptop, calculator, watch, ra camera xa.
        
<eos>


In [34]:
response = generate_response(
    question="k baata pay garna milxa?",
    type="RAG",
    context="esewa, khalti ra mobile bank baata pay garna milxa"
)
print(response)

<bos>
Below is an instruction that describes a task paried with question and input that provides further context. Write a response that appropriately complete the request.
        
### Instruction
Answer the following question by looking at the context if context is given.
        
### Question
k baata pay garna milxa?
        
### Input
esewa, khalti ra mobile bank baata pay garna milxa
        
### Response
esewa, khalti ra mobile bank bata pay garna milxa.
        
<eos>


In [35]:
response = generate_response(
    question="store ma k k painxa?",
    type="RAG",
    context="Hamro store ma musical instruments painxa jastei keyboard, guitars, haru"
)
print(response)

<bos>
Below is an instruction that describes a task paried with question and input that provides further context. Write a response that appropriately complete the request.
        
### Instruction
Answer the following question by looking at the context if context is given.
        
### Question
store ma k k painxa?
        
### Input
Hamro store ma musical instruments painxa jastei keyboard, guitars, haru
        
### Response
Hamro store ma musical instruments painxa jastei keyboard, guitars, haru.
        
<eos>


In [36]:
response = generate_response(
    question="macbook ko barema vana",
    type="RAG",
    context="Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 8GB RAM, 256GB SSD."
)
print(response)

<bos>
Below is an instruction that describes a task paried with question and input that provides further context. Write a response that appropriately complete the request.
        
### Instruction
Answer the following question by looking at the context if context is given.
        
### Question
macbook ko barema vana
        
### Input
Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 8GB RAM, 256GB SSD.
        
### Response
MacBook Air M1 ko barema 13.3-inch Retina display, Apple M1 chip, 8GB RAM, ra 256GB SSD cha.
<eos>


In [37]:
response = generate_response(
    question="macbook ko price kati ho",
    type="RAG",
    context="Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 8GB RAM, 256GB SSD."
)
print(response)

<bos>
Below is an instruction that describes a task paried with question and input that provides further context. Write a response that appropriately complete the request.
        
### Instruction
Answer the following question by looking at the context if context is given.
        
### Question
macbook ko price kati ho
        
### Input
Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 8GB RAM, 256GB SSD.
        
### Response
MacBook ko price 150000 ho.
        <eos>


In [38]:
response = generate_response(
    question="Redmi ko price kati Rs.ho",
    type="RAG",
    context="Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 8GB RAM, 256GB SSD. name: Redmi Note 9 pro, price: 10000, stock: True, description: Xiaomi mobile with 128 GB storage and 90 GB RAM"
)
print(response)

<bos>
Below is an instruction that describes a task paried with question and input that provides further context. Write a response that appropriately complete the request.
        
### Instruction
Answer the following question by looking at the context if context is given.
        
### Question
Redmi ko price kati Rs.ho
        
### Input
Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 8GB RAM, 256GB SSD. name: Redmi Note 9 pro, price: 10000, stock: True, description: Xiaomi mobile with 128 GB storage and 90 GB RAM
        
### Response
Redmi Note 9 pro ko price Rs. 10000 ho.
<eos>


In [39]:
response = generate_response(
    question="Macbook Air ko RAM kati xa?",
    type="RAG",
    context="Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 16GB RAM, 256GB SSD. name: Redmi Note 9 pro, price: 10000, stock: True, description: Xiaomi mobile with 128 GB storage and 12 GB RAM"
)
print(response)

<bos>
Below is an instruction that describes a task paried with question and input that provides further context. Write a response that appropriately complete the request.
        
### Instruction
Answer the following question by looking at the context if context is given.
        
### Question
Macbook Air ko RAM kati xa?
        
### Input
Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 16GB RAM, 256GB SSD. name: Redmi Note 9 pro, price: 10000, stock: True, description: Xiaomi mobile with 128 GB storage and 12 GB RAM
        
### Response
MacBook Air M1 ko RAM 16 GB xa.
<eos>
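The product tests pass the catalogue as one flat string. Below is a hypothetical helper that renders product records in the same `Product Details: name: ..., price: ..., stock: ..., description: ...` layout, so the context can be built from structured data instead of typed by hand. The record fields (`name`, `price`, `stock`, `description`) are assumptions based on the strings used above.

```python
def format_product_context(products):
    """Render product dicts as one context string in the layout used by the RAG tests."""
    entries = [
        f"name: {p['name']}, price: {p['price']}, "
        f"stock: {p['stock']}, description: {p['description']}"
        for p in products
    ]
    return "Product Details: " + " ".join(entries)


# Illustrative records reproducing the context of the RAM test above
products = [
    {"name": "MacBook Air M1", "price": 150000, "stock": False,
     "description": "13.3-inch Retina display, Apple M1 chip, 16GB RAM, 256GB SSD."},
    {"name": "Redmi Note 9 pro", "price": 10000, "stock": True,
     "description": "Xiaomi mobile with 128 GB storage and 12 GB RAM"},
]
print(format_product_context(products))
```

Entries are joined with a single space to reproduce the test strings exactly, so each description should end with its own period to keep products visually separated for the model.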


In [40]:
response = generate_response(
    question="Redmi Note 9 pro ko RAM kati xa?",
    type="RAG",
    context="Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 16GB RAM, 256GB SSD. name: Redmi Note 9 pro, price: 10000, stock: True, description: Xiaomi mobile with 128 GB storage and RAM: 12 GB"
)
print(response)

<bos>
Below is an instruction that describes a task paried with question and input that provides further context. Write a response that appropriately complete the request.
        
### Instruction
Answer the following question by looking at the context if context is given.
        
### Question
Redmi Note 9 pro ko RAM kati xa?
        
### Input
Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 16GB RAM, 256GB SSD. name: Redmi Note 9 pro, price: 10000, stock: True, description: Xiaomi mobile with 128 GB storage and RAM: 12 GB
        
### Response
Redmi Note 9 pro ko RAM 12 GB xa.
<eos>


In [41]:
response = generate_response(
    question="Redmi Note 9 pro ko storage kati xa?",
    type="RAG",
    context="Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 16GB RAM, 256GB SSD. name: Redmi Note 9 pro, price: 10000, stock: True, description: Xiaomi mobile with 128 GB storage and RAM: 12 GB"
)
print(response)

<bos>
Below is an instruction that describes a task paried with question and input that provides further context. Write a response that appropriately complete the request.
        
### Instruction
Answer the following question by looking at the context if context is given.
        
### Question
Redmi Note 9 pro ko storage kati xa?
        
### Input
Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 16GB RAM, 256GB SSD. name: Redmi Note 9 pro, price: 10000, stock: True, description: Xiaomi mobile with 128 GB storage and RAM: 12 GB
        
### Response
Redmi Note 9 pro ko storage 128 GB xa.
<eos>


In [42]:
response = generate_response(
    question="delivery cost kati ho?",
    type="RAG",
    context="Hamro ma delivery all over Nepal hunxa. Inside Kathmandu free delivery hunxa ra outside Kathmandu Rs. 130 delivery charge laagxa. Delivery 3 din vitra hunxa."
)
print(response)

<bos>
Below is an instruction that describes a task paried with question and input that provides further context. Write a response that appropriately complete the request.
        
### Instruction
Answer the following question by looking at the context if context is given.
        
### Question
delivery cost kati ho?
        
### Input
Hamro ma delivery all over Nepal hunxa. Inside Kathmandu free delivery hunxa ra outside Kathmandu Rs. 130 delivery charge laagxa. Delivery 3 din vitra hunxa.
        
### Response
Inside Kathmandu free delivery hunxa ra outside Kathmandu Rs. 130 delivery charge laagxa.
        
<eos>


In [43]:
response = generate_response(
    question="kati din ma delivery hunxa",
    type="RAG",
    context="Hamro ma delivery all over Nepal hunxa. Inside Dharan free delivery hunxa ra outside Dharan Rs. 130 delivery charge laagxa. Delivery 3 din vitra hunxa."
)
print(response)

<bos>
Below is an instruction that describes a task paried with question and input that provides further context. Write a response that appropriately complete the request.
        
### Instruction
Answer the following question by looking at the context if context is given.
        
### Question
kati din ma delivery hunxa
        
### Input
Hamro ma delivery all over Nepal hunxa. Inside Dharan free delivery hunxa ra outside Dharan Rs. 130 delivery charge laagxa. Delivery 3 din vitra hunxa.
        
### Response
Delivery all over Nepal hunxa. Inside Dharan free delivery hunxa ra outside Dharan Rs. 130 delivery charge laagxa.
        
<eos>


In [44]:
response = generate_response(
    question="timi ko hau",
    type="RAG",
    context="You are an AI assistant for an e-commerce store and will explain your purpose."
)
print(response)

<bos>
Below is an instruction that describes a task paried with question and input that provides further context. Write a response that appropriately complete the request.
        
### Instruction
Answer the following question by looking at the context if context is given.
        
### Question
timi ko hau
        
### Input
You are an AI assistant for an e-commerce store and will explain your purpose.
        
### Response
Ma ek e-commerce assistant ho.
        <eos>


In [45]:
response = generate_response(
    question="Tell a short story in Nepali",
    type="instruction",
)
print(response)

<bos>
You are a helpful assistant. Follow the instruction with input and answer the user question.
            
### Instruction
Tell a short story in Nepali
            
### Response
Aaja mero ghar ma ek manushya chha. Yo manushya ko naam 'Ram' ho. Ram le ek damro ko ghar ma ghar garyo. Yo ghar ma ek manushya chha. Yo manushya ko naam 'Shyam' ho. Shyam le


## Saving the Model

In [49]:
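# `if True:` acts as an on/off switch; save_method = "lora" uploads only the LoRA adapters.
# Pass a Hugging Face write token in `token` (left blank here).
if True: model.push_to_hub_merged("manojbaniya/best_v4", tokenizer, save_method = "lora", token = "")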

Unsloth: Saving LoRA adapters. Please wait...


Saved lora model to https://huggingface.co/manojbaniya/best_v4
