## Fine Tune on V4 Dataset

In [1]:
%%capture
# installing unsloth
!pip install unsloth
# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

# Install Flash Attention 2 for softcapping support
import torch
if torch.cuda.get_device_capability()[0] >= 8:
    !pip install --no-deps packaging ninja einops "flash-attn>=2.6.3"
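The install cell adds Flash Attention 2 only when the GPU's CUDA compute capability major version is 8 or higher. Below is a minimal sketch of that decision as a pure function. It assumes the capability arrives as a `(major, minor)` tuple, like the one `torch.cuda.get_device_capability()` returns. The Tesla T4 used in this run reports 7.5, so the install is skipped, which is consistent with `FA2 = False` in the Unsloth banner.

```python
def should_install_flash_attn(capability: tuple[int, int]) -> bool:
    """Return True when the GPU's compute capability major version is >= 8.

    `capability` is a (major, minor) tuple, e.g. the value returned by
    torch.cuda.get_device_capability().
    """
    major, _minor = capability
    return major >= 8


# Tesla T4 (compute capability 7.5): Flash Attention 2 is not installed
print(should_install_flash_attn((7, 5)))
# Compute capability 8.0 GPU: Flash Attention 2 would be installed
print(should_install_flash_attn((8, 0)))
```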

In [2]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048
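dtype = None         # None = auto-detect (float16 on a Tesla T4, bfloat16 on GPUs that support it)
load_in_4bit = True  # load 4-bit quantized weights to reduce GPU memory use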

# Pre-quantized 4-bit models offered by Unsloth (for reference only; this list is not used below)
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",
    "unsloth/Mistral-Nemo-Base-2407-bnb-4bit",
    "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit",
    "unsloth/mistral-7b-v0.3-bnb-4bit",
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3-mini-4k-instruct",
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",
    "unsloth/gemma-2-2b-bnb-4bit",
]

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-2-9b",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.2.15: Fast Gemma2 patching. Transformers: 4.48.3.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!



## Adding LoRA Adapters

In [3]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2025.2.15 patched 42 layers with 42 QKV layers, 42 O layers and 42 MLP layers.
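As a sanity check on the trainable-parameter count that Unsloth reports when training starts (54,018,048), recall what LoRA adds to each wrapped linear layer: an `A` matrix of shape `(r, in_features)` and a `B` matrix of shape `(out_features, r)`. That comes to `r * (in_features + out_features)` parameters per layer. The sketch below assumes Gemma-2-9B's dimensions. The hidden size 3584 and the 4096-wide `q_proj` also appear in the model printout in this notebook. The 2048-wide `k_proj`/`v_proj` (8 key/value heads × head dim 256) and the MLP intermediate size 14336 are taken from the model configuration, not from this notebook.

```python
# Assumed Gemma-2-9B dimensions (from the model config)
HIDDEN = 3584
Q_OUT = 4096          # 16 attention heads x head_dim 256
KV_OUT = 2048         # 8 key/value heads x head_dim 256
INTERMEDIATE = 14336  # MLP intermediate size
NUM_LAYERS = 42
RANK = 16             # r passed to get_peft_model

# (in_features, out_features) for every module in target_modules
TARGET_SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """A is (r x in_features), B is (out_features x r); bias='none' adds nothing."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} per layer, {total:,} total")  # 1,286,144 per layer, 54,018,048 total
```

The MLP projections account for about two thirds of the adapter size, because their intermediate dimension is four times the hidden size.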


## Dataset for Fine Tuning

In [4]:
from datasets import load_dataset

In [13]:
dataset = load_dataset("manojbaniya/roman-nepali-alpaca", split="train")


In [14]:
dataset

Dataset({
    features: ['category', 'context', 'question', 'response', 'instruction', 'prompt'],
    num_rows: 17866
})

## Preparing data

In [15]:
EOS_TOKEN = tokenizer.eos_token

In [16]:
def formatting_prompt(examples):
  """Append EOS_TOKEN to every prompt so the model learns where a response ends."""
  prompts = []

  for prompt in examples["prompt"]:
    new_prompt = prompt + EOS_TOKEN
    prompts.append(new_prompt)

  return {"new_prompts": prompts}
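With `batched=True`, `Dataset.map` passes the function a dict that maps each column name to a list of values for the batch. The function must return a dict of new columns with lists of the same length. Below is an equivalent list-comprehension version that runs on a plain dict. It assumes `"<eos>"` as the EOS string, which is Gemma's EOS token as it appears at the end of the printed prompts. The placeholder prompts are illustrative only.

```python
EOS_TOKEN = "<eos>"  # assumption: Gemma's EOS string, as seen in the printed prompts


def formatting_prompt(examples: dict[str, list[str]]) -> dict[str, list[str]]:
    """Append EOS_TOKEN to every prompt in a batch (list-comprehension form)."""
    return {"new_prompts": [prompt + EOS_TOKEN for prompt in examples["prompt"]]}


# A batch in the shape Dataset.map(..., batched=True) passes it: column -> list of values
batch = {"prompt": ["first prompt", "second prompt"]}
print(formatting_prompt(batch))
```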

In [17]:
dataset = dataset.map(formatting_prompt, batched=True)


In [18]:
print(dataset[999]["new_prompts"])

Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.
        
### Instruction
Vivo Y20 ko stock cha ra kati price cha?
        
### Input
Dharan Mobile Mart, Dharan-10 ma cha. Hami Realme Narzo 50, Xiaomi Redmi Note 10, ra Vivo Y20 ko mobiles pauna saknuhuncha. Price range NPR 18,000 to NPR 25,000.
        
### Response
Vivo Y20 ko stock available cha. Price NPR 22,000 cha.<eos>


In [19]:
print(dataset[100]["new_prompts"])

Below is an instruction that describes a task. Write a response that appriately complete the request.

### Instruction
Provide a Roman Nepali phrase to ask for someone's phone number.

### Response
Kripaya, tapai ko phone number dinu hola?<eos>
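The two printed rows show that the `prompt` column uses two layouts. Rows with context include an `### Input` section. Rows without context use a shorter preamble and omit that section entirely. The sketch below is a hypothetical reconstruction of those two layouts, not the dataset author's code, and the real rows' whitespace (for example the indented blank lines in the first example) may differ. It is useful to compare against inference: the `prompt_template` used later always includes `### Input`, even when the context is empty.

```python
# Hypothetical reconstruction of the two layouts seen in the printed rows.
# The "appriately complete" typo is kept because it appears in the data.
WITH_INPUT = (
    "Below is an instruction that describes a task paired with an input that provides "
    "further context. Write a response that appriately complete the request.\n\n"
    "### Instruction\n{instruction}\n\n"
    "### Input\n{context}\n\n"
    "### Response\n{response}"
)
WITHOUT_INPUT = (
    "Below is an instruction that describes a task. Write a response that appriately "
    "complete the request.\n\n"
    "### Instruction\n{instruction}\n\n"
    "### Response\n{response}"
)


def build_training_prompt(instruction: str, context: str, response: str) -> str:
    """Use the layout with an '### Input' section only when there is context."""
    template = WITH_INPUT if context else WITHOUT_INPUT
    # str.format ignores the unused `context` keyword for WITHOUT_INPUT
    return template.format(instruction=instruction, context=context, response=response)


print(build_training_prompt(
    "Provide a Roman Nepali phrase to ask for someone's phone number.",
    "",
    "Kripaya, tapai ko phone number dinu hola?",
))
```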


## Training the Model

In [29]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

In [30]:
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "new_prompts",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 1000,
        learning_rate = 1e-4, # 2e-4 is a common alternative
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 10,
        optim = "adamw_8bit",
        weight_decay = 0.02,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "wandb", # Use this for WandB etc
    ),
)
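`lr_scheduler_type = "linear"` with `warmup_steps = 5` raises the learning rate linearly from 0 to `learning_rate` over the first 5 steps. After warmup, it decays linearly to 0 at `max_steps`. The sketch below reproduces that multiplier as it is defined by `transformers.get_linear_schedule_with_warmup`, using this run's values. It illustrates the formula; it is not the Trainer's code path.

```python
def linear_lr(step: int, base_lr: float = 1e-4, warmup_steps: int = 5,
              total_steps: int = 1000) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to 0."""
    if step < warmup_steps:
        # Warmup: ramp from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Decay: straight line from base_lr (end of warmup) down to 0 (last step)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


for s in (0, 1, 5, 500, 1000):
    print(s, linear_lr(s))
```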

In [31]:
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.741 GB.
8.1 GB of memory reserved.
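The memory figures above are computed by dividing byte counts by 1024 three times, so strictly they are GiB. Here is a small sketch that expresses the reserved memory as a share of the T4's capacity, using the two numbers printed above. Calling `torch.cuda.max_memory_reserved()` again after training and subtracting `start_gpu_memory` would estimate the extra memory training used. That step is not shown here because it needs a GPU.

```python
def bytes_to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB (1024**3 bytes), rounded to 3 decimals like the cell above."""
    return round(n_bytes / 1024**3, 3)


def percent_of(part_gib: float, total_gib: float) -> float:
    """Share of `total_gib` taken by `part_gib`, in percent (1 decimal)."""
    return round(part_gib / total_gib * 100, 1)


# Values printed above for the Tesla T4 before training starts
start_gpu_memory = 8.1
max_memory = 14.741
print(f"{percent_of(start_gpu_memory, max_memory)}% of GPU memory reserved before training")
```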


In [32]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 17,866 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,000
 "-____-"     Number of trainable parameters = 54,018,048



[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mmanojbaniya444[0m ([33mmanojbaniya444-tribhuvan-university-institute-of-engineering[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.


| Step | Training Loss |
|---:|---:|
| 10 | 2.6202 |
| 20 | 1.3684 |
| 30 | 1.2838 |
| 40 | 1.1875 |
| 50 | 1.1456 |
| 60 | 1.0975 |
| 70 | 0.9997 |
| 80 | 0.9511 |
| 90 | 1.1102 |
| 100 | 0.9703 |




In [33]:
trainer_stats

TrainOutput(global_step=1000, training_loss=0.7935265159606933, metrics={'train_runtime': 6063.2812, 'train_samples_per_second': 1.319, 'train_steps_per_second': 0.165, 'total_flos': 4.906154950506701e+16, 'train_loss': 0.7935265159606933})
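The run's numbers can be checked by hand. Each optimizer step consumes 2 × 4 = 8 examples, so 1,000 steps cover 8,000 of the 17,866 rows, or about 45% of one epoch. The banner shows `Num Epochs = 1` because the run stops partway through the first epoch. The throughput metrics follow from `train_runtime`. Note also that `training_loss=0.7935` appears to be the mean over all 1,000 steps (the Trainer divides the accumulated loss by the step count), while the loss log above lists only the first 100 steps. The values below are copied from this run's configuration and output.

```python
# Values from this run's TrainingArguments, trainer banner and TrainOutput
num_examples = 17_866
per_device_batch = 2
grad_accum = 4
num_gpus = 1
max_steps = 1000
train_runtime = 6063.2812  # seconds

# Examples consumed per optimizer step and over the whole run
effective_batch = per_device_batch * grad_accum * num_gpus
examples_seen = max_steps * effective_batch
epoch_fraction = examples_seen / num_examples

# Throughput, matching train_steps_per_second / train_samples_per_second
steps_per_second = max_steps / train_runtime
samples_per_second = examples_seen / train_runtime

# Wall-clock time as hours / minutes / seconds
minutes, seconds = divmod(round(train_runtime), 60)
hours, minutes = divmod(minutes, 60)

print(effective_batch, examples_seen, round(epoch_fraction, 3))  # 8 8000 0.448
print(round(steps_per_second, 3), round(samples_per_second, 3))  # 0.165 1.319
print(f"{hours}h {minutes}m {seconds}s")                          # 1h 41m 3s
```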

## Inference

In [34]:
FastLanguageModel.for_inference(model)

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): Gemma2ForCausalLM(
      (model): Gemma2Model(
        (embed_tokens): Embedding(256000, 3584, padding_idx=0)
        (layers): ModuleList(
          (0-41): 42 x Gemma2DecoderLayer(
            (self_attn): Gemma2Attention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=3584, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=3584, out_features=16, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_proj): lora

### Prompt Template

In [35]:
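# The wording below (including the "appriately complete" typo) mirrors the training prompts;
# keep it identical so the inference prompt matches what the model saw during fine-tuning.
prompt_template = """Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.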

### Instruction
{question}

### Input
{context}

### Response
"""

In [36]:
inputs = prompt_template.format(
    question="Nepal ko capital city kaha ho?",
    context=""
)
inputs = tokenizer([inputs], return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)
response = tokenizer.batch_decode(outputs)

In [37]:
print(response[0])

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
Nepal ko capital city kaha ho?

### Input


### Response
Nepal ko capital city Kathmandu ho.<eos>
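`tokenizer.batch_decode(outputs)` returns the whole sequence: `<bos>`, the prompt, the answer and `<eos>`. Below is a minimal string-level sketch that keeps only the answer. `extract_response` is a hypothetical helper. It also trims any extra `###` section the model starts on its own, which can happen with sampling. A token-level alternative is to slice off the prompt tokens with `outputs[:, inputs["input_ids"].shape[1]:]` and decode with `skip_special_tokens=True`.

```python
def extract_response(decoded: str) -> str:
    """Return just the answer from a decoded prompt + completion string."""
    # Everything after the first "### Response" header is model output
    _, sep, answer = decoded.partition("### Response\n")
    if not sep:
        answer = decoded
    # Stop at the end-of-sequence token if the model emitted one
    answer = answer.split("<eos>", 1)[0]
    # Drop any extra section the model started on its own (e.g. "### Explanation")
    answer = answer.split("\n### ", 1)[0]
    return answer.strip()


decoded = (
    "<bos>Below is an instruction ...\n\n### Instruction\nNepal ko capital city kaha ho?\n\n"
    "### Input\n\n\n### Response\nNepal ko capital city Kathmandu ho.<eos>"
)
print(extract_response(decoded))
```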


## Generating Response

In [38]:
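def generate_response(question, type="RAG", context=None):
  # `type` is only a label for readability; it does not change the prompt.
  # Note: context=None is formatted as the literal string "None" in the
  # "### Input" section (visible in several outputs below); pass "" for no context.
  inputs = prompt_template.format(question=question, context=context)

  # Tokenize and move the input tensors to the GPU
  inputs = tokenizer([inputs], return_tensors="pt").to("cuda")
  # No sampling arguments are passed, so decoding is greedy by default; at most 64 new tokens
  outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)
  # Decode prompt + answer, keeping special tokens such as <bos> and <eos>
  response = tokenizer.batch_decode(outputs)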
  return response[0]
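Because `context=None` becomes the text `None` inside `### Input`, the model is shown a context that never appeared in the training data. Below is a small sketch of a prompt builder that maps a missing context to an empty section. `build_prompt` is a hypothetical helper. Its template text is copied from `prompt_template` above, and its result can be passed to `tokenizer([...], return_tensors="pt")` exactly like `prompt_template.format(...)` in `generate_response`.

```python
from typing import Optional

# Same text as prompt_template in this notebook (typo kept to match the training data)
PROMPT_TEMPLATE = """Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
{question}

### Input
{context}

### Response
"""


def build_prompt(question: str, context: Optional[str] = None) -> str:
    """Format the inference prompt, turning a missing context into an empty Input section."""
    return PROMPT_TEMPLATE.format(question=question, context=context or "")


print(build_prompt("Nepal ko president ko ho?"))
```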

In [39]:
response = generate_response(
    question="Nepal ko president ko ho?",
    type="qa"
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
Nepal ko president ko ho?

### Input
None

### Response
Nepal ko president ko Ram Chandra Paudel ho.<eos>


In [40]:
response = generate_response(
    question="China ko capital kaha ho?",
    type="qa"
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
China ko capital kaha ho?

### Input
None

### Response
China ko capital Beijing ho.<eos>


In [41]:
response = generate_response(
    question="China ko population kati xa?",
    type="qa"
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
China ko population kati xa?

### Input
None

### Response
China ko population 2023 ma 1.4 billion bhanda badhi xa.<eos>


In [42]:
response = generate_response(
    question="coding kasari sikne",
    type="qa"
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
coding kasari sikne

### Input
None

### Response
coding sikna practice garna, online tutorials herna, ra small projects suru garna parcha.<eos>


## RAG for Ecommerce Test

In [43]:
response = generate_response(
    question="Store ko location kaha xa",
    type="RAG",
    context="Hamro store ko name All Electronics store ho, hamro store Dharan maa xa."
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
Store ko location kaha xa

### Input
Hamro store ko name All Electronics store ho, hamro store Dharan maa xa.

### Response
Hamro store Dharan ma xa, Bhanu Chowk ma.<eos>


In [44]:
response = generate_response(
    question="store ko name ke ho?",
    type="RAG",
    context="Hamro store ko name All Electronics store ho ra hamro store Dharan maa xa. Hamro store ko contact number 9800000000 ho"
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
store ko name ke ho?

### Input
Hamro store ko name All Electronics store ho ra hamro store Dharan maa xa. Hamro store ko contact number 9800000000 ho

### Response
Hamro store ko name All Electronics store ho.<eos>


In [45]:
response = generate_response(
    question="store ko contact kati ho?",
    type="RAG",
    context="Hamro store ko name All Electronics store ho ra hamro store Dharan maa xa. Hamro store ko contact number 9800000000 ho"
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
store ko contact kati ho?

### Input
Hamro store ko name All Electronics store ho ra hamro store Dharan maa xa. Hamro store ko contact number 9800000000 ho

### Response
Hamro contact number 9800000000 ho<eos>


In [46]:
response = generate_response(
    question="store ma k k available xa?",
    type="RAG",
    context="Hamro store ma electronics ko sabai saman xa, mobile, laptop, calculator, watch, camera haru pani xa."
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
store ma k k available xa?

### Input
Hamro store ma electronics ko sabai saman xa, mobile, laptop, calculator, watch, camera haru pani xa.

### Response
Hamro store ma mobile, laptop, calculator, watch, camera haru available xa.<eos>


In [47]:
response = generate_response(
    question="store ma k k electronics saman xa?",
    type="RAG",
    context="Hamro store ma electronics ko sabai saman xa, mobile, laptop, calculator, watch, camera haru pani xa."
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
store ma k k electronics saman xa?

### Input
Hamro store ma electronics ko sabai saman xa, mobile, laptop, calculator, watch, camera haru pani xa.

### Response
Hamro store ma mobile, laptop, calculator, watch, camera haru xa.<eos>


In [48]:
response = generate_response(
    question="k baata pay garna milxa?",
    type="RAG",
    context="esewa, khalti ra mobile bank baata pay garna milxa"
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
k baata pay garna milxa?

### Input
esewa, khalti ra mobile bank baata pay garna milxa

### Response
Tapai esewa, khalti, ra mobile bank bata payment garna saknu hunxa.<eos>


In [49]:
response = generate_response(
    question="store ma k k painxa?",
    type="RAG",
    context="Hamro store ma musical instruments painxa jastei keyboard, guitars, haru"
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
store ma k k painxa?

### Input
Hamro store ma musical instruments painxa jastei keyboard, guitars, haru

### Response
Hamro store ma keyboard, guitars, ra musical accessories painxa.<eos>


In [50]:
response = generate_response(
    question="macbook ko barema vana",
    type="RAG",
    context="Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 8GB RAM, 256GB SSD."
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
macbook ko barema vana

### Input
Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 8GB RAM, 256GB SSD.

### Response
MacBook Air M1 ko barema jankari dinu cha. Isko price 150000 ho, ra stock available chaina. Isma 13.3-inch Retina display, Apple M1 chip, 8GB RAM, ra 256GB SSD cha.<eos>
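The product tests pass structured data as a flat `Product Details: name: ..., price: ..., stock: ..., description: ...` string. In a RAG pipeline, that string would be built from catalogue records. Below is a minimal sketch that reproduces the exact context format used in these cells. The `Product` dataclass and `format_product_context` are hypothetical helpers for illustration; they are not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Product:  # hypothetical record type for illustration
    name: str
    price: int
    stock: bool
    description: str


def format_product_context(products: list[Product]) -> str:
    """Render products in the 'Product Details: name: ..., price: ...' format used above."""
    parts = [
        f"name: {p.name}, price: {p.price}, stock: {p.stock}, description: {p.description}"
        for p in products
    ]
    return "Product Details: " + " ".join(parts)


macbook = Product("MacBook Air M1", 150000, False,
                  "13.3-inch Retina display, Apple M1 chip, 8GB RAM, 256GB SSD.")
print(format_product_context([macbook]))
```

Keeping the layout identical across queries matters more than the exact wording: the model only ever saw contexts in the shapes it was trained and tested on.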


In [51]:
response = generate_response(
    question="macbook ko price kati ho",
    type="RAG",
    context="Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 8GB RAM, 256GB SSD."
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
macbook ko price kati ho

### Input
Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 8GB RAM, 256GB SSD.

### Response
MacBook Air M1 ko price 150000 ho.<eos>


In [52]:
response = generate_response(
    question="macbook stock ma xa ki xaina",
    type="RAG",
    context="Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 8GB RAM, 256GB SSD."
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
macbook stock ma xa ki xaina

### Input
Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 8GB RAM, 256GB SSD.

### Response
Hajur, macbook stock ma available xaina.<eos>


In [53]:
response = generate_response(
    question="macbook stock ma xa ki xaina",
    type="RAG",
    context="Product Details: name: MacBook Air M1, price: 150000, stock: True, description: 13.3-inch Retina display, Apple M1 chip, 8GB RAM, 256GB SSD."
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
macbook stock ma xa ki xaina

### Input
Product Details: name: MacBook Air M1, price: 150000, stock: True, description: 13.3-inch Retina display, Apple M1 chip, 8GB RAM, 256GB SSD.

### Response
Hajur, MacBook Air M1 stock ma available xa.<eos>


In [54]:
response = generate_response(
    question="macbook stock ma xa ki xaina ani kati ota xa?",
    type="RAG",
    context="Product Details: name: MacBook Air M1, price: 150000, stock: True, description: 13.3-inch Retina display, Apple M1 chip, 8GB RAM, 256GB SSD. Quantity: 10"
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
macbook stock ma xa ki xaina ani kati ota xa?

### Input
Product Details: name: MacBook Air M1, price: 150000, stock: True, description: 13.3-inch Retina display, Apple M1 chip, 8GB RAM, 256GB SSD. Quantity: 10

### Response
MacBook Air M1 stock ma available xa. Quantity: 10 ota xa.<eos>


In [55]:
response = generate_response(
    question="Redmi ko price kati Rs.ho",
    type="RAG",
    context="Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 8GB RAM, 256GB SSD. name: Redmi Note 9 pro, price: 10000, stock: True, description: Xiaomi mobile with 128 GB storage and 90 GB RAM"
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
Redmi ko price kati Rs.ho

### Input
Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 8GB RAM, 256GB SSD. name: Redmi Note 9 pro, price: 10000, stock: True, description: Xiaomi mobile with 128 GB storage and 90 GB RAM

### Response
Redmi ko price Rs. 10000 ho.<eos>


In [56]:
response = generate_response(
    question="Macbook Air ko RAM kati xa?",
    type="RAG",
    context="Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 16GB RAM, 256GB SSD. name: Redmi Note 9 pro, price: 10000, stock: True, description: Xiaomi mobile with 128 GB storage and 12 GB RAM"
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
Macbook Air ko RAM kati xa?

### Input
Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 16GB RAM, 256GB SSD. name: Redmi Note 9 pro, price: 10000, stock: True, description: Xiaomi mobile with 128 GB storage and 12 GB RAM

### Response
Macbook Air M1 ma 16GB RAM xa.<eos>


In [57]:
response = generate_response(
    question="Redmi Note 9 pro ko RAM kati xa?",
    type="RAG",
    context="Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 16GB RAM, 256GB SSD. name: Redmi Note 9 pro, price: 10000, stock: True, description: Xiaomi mobile with 128 GB storage and RAM: 12 GB"
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
Redmi Note 9 pro ko RAM kati xa?

### Input
Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 16GB RAM, 256GB SSD. name: Redmi Note 9 pro, price: 10000, stock: True, description: Xiaomi mobile with 128 GB storage and RAM: 12 GB

### Response
Redmi Note 9 pro ma 12 GB RAM xa.<eos>


In [58]:
response = generate_response(
    question="Redmi Note 9 pro ko storage kati xa?",
    type="RAG",
    context="Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 16GB RAM, 256GB SSD. name: Redmi Note 9 pro, price: 10000, stock: True, description: Xiaomi mobile with 128 GB storage and RAM: 12 GB"
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
Redmi Note 9 pro ko storage kati xa?

### Input
Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 16GB RAM, 256GB SSD. name: Redmi Note 9 pro, price: 10000, stock: True, description: Xiaomi mobile with 128 GB storage and RAM: 12 GB

### Response
Redmi Note 9 pro ma 128 GB storage xa.<eos>


In [59]:
response = generate_response(
    question="delivery cost kati ho?",
    type="RAG",
    context="Hamro ma delivery all over Nepal hunxa. Inside Kathmandu free delivery hunxa ra outside Kathmandu Rs. 130 delivery charge laagxa. Delivery 3 din vitra hunxa."
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
delivery cost kati ho?

### Input
Hamro ma delivery all over Nepal hunxa. Inside Kathmandu free delivery hunxa ra outside Kathmandu Rs. 130 delivery charge laagxa. Delivery 3 din vitra hunxa.

### Response
Inside Kathmandu free delivery hunxa ra outside Kathmandu Rs. 130 delivery charge laagxa.<eos>


In [60]:
response = generate_response(
    question="kati din ma delivery hunxa",
    type="RAG",
    context="Hamro ma delivery all over Nepal hunxa. Inside Dharan free delivery hunxa ra outside Dharan Rs. 130 delivery charge laagxa. Delivery 3 din vitra hunxa."
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
kati din ma delivery hunxa

### Input
Hamro ma delivery all over Nepal hunxa. Inside Dharan free delivery hunxa ra outside Dharan Rs. 130 delivery charge laagxa. Delivery 3 din vitra hunxa.

### Response
Inside Dharan free delivery hunxa ra outside Dharan Rs. 130 delivery charge laagxa. Delivery 3 din vitra hunxa.<eos>


In [61]:
response = generate_response(
    question="timi ko hau",
    type="RAG",
    context="You are an AI assistant for an e-commerce store and will explain your purpose."
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
timi ko hau

### Input
You are an AI assistant for an e-commerce store and will explain your purpose.

### Response
ma ek e-commerce assistant hu.<eos>


In [62]:
response = generate_response(
    question="timi ko hau",
    type="RAG",
    context=""
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
timi ko hau

### Input


### Response
ma ek virtual assistant hu.<eos>


In [63]:
response = generate_response(
    question="Hi",
    type="RAG",
    context=""
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
Hi

### Input


### Response
Namaste! Ma tapai ko shopping assistant ho. Aba tapai ko shopping ma kasari madat garna sakchhu?<eos>


In [64]:
response = generate_response(
    question="Translate the text from English to Roman Nepali",
    type="translate",
    context="I love coding"
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
Translate the text from English to Roman Nepali

### Input
I love coding

### Response
Malai coding garna man parcha<eos>


In [65]:
response = generate_response(
    question="5 ota apple ma 2 ota khada kati baaki rahanxa",
    type="",
    context=""
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
5 ota apple ma 2 ota khada kati baaki rahanxa

### Input


### Response
5 ota apple ma 2 ota khada bhayo bhane, 3 ota baaki rahanxa.<eos>


In [66]:
response = generate_response(
    question="mt everest ko height kati ho",
    type="",
    context=""
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
mt everest ko height kati ho

### Input


### Response
mt everest ko height 8848.86 meters ho.<eos>


In [67]:
response = generate_response(
    question="store ko name ke ho?",
    type="",
    context="Hamro store Dharan ma xa, hamro store ko name Happy store ho contact no 9812324890 ho"
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
store ko name ke ho?

### Input
Hamro store Dharan ma xa, hamro store ko name Happy store ho contact no 9812324890 ho

### Response
Hamro store ko name Happy store ho<eos>


In [68]:
response = generate_response(
    question="store ko location ke ho ra contact kati ho",
    type="",
    context="Hamro store Dharan ma xa, hamro store ko name Happy store ho contact no 9812324890 ho"
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
store ko location ke ho ra contact kati ho

### Input
Hamro store Dharan ma xa, hamro store ko name Happy store ho contact no 9812324890 ho

### Response
Hamro store Dharan ma xa, contact no 9812324890 ho<eos>


In [69]:
response = generate_response(
    question="timi ko hau",
    type="RAG",
    context="You are an AI assistant for an Stock Customer support and will explain your purpose."
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
timi ko hau

### Input
You are an AI assistant for an Stock Customer support and will explain your purpose.

### Response
ma ek AI assistant hu stock customer support ko lagi.<eos>


In [70]:
response = generate_response(
    question="Japan ko popular cities haru list gara",
    type="RAG",
    context=""
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
Japan ko popular cities haru list gara

### Input


### Response
Japan ko popular cities haru: Tokyo, Kyoto, Osaka, Hiroshima, ra Yokohama hun.<eos>


In [71]:
response = generate_response(
    question="japan ko capital city tokyo ho vane India ko kaha ho",
    type="RAG",
    context=""
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
japan ko capital city tokyo ho vane India ko kaha ho

### Input


### Response
India ko capital city Delhi ho.<eos>


## Generation with `Sampling`

In [72]:
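def generate_response_sample(question, type="RAG", context=None):
  # Same as generate_response, but samples tokens instead of decoding greedily
  inputs = prompt_template.format(question=question, context=context)

  inputs = tokenizer([inputs], return_tensors="pt").to("cuda")
  # do_sample=True with top_k=50: each token is sampled from the 50 most likely candidates,
  # so repeated calls can return different answers
  outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True, top_k=50, do_sample=True)
  response = tokenizer.batch_decode(outputs)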
  return response[0]
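For intuition on what `top_k=50, do_sample=True` does at each step, here is a simplified pure-Python illustration of top-k sampling over a list of logits. It keeps the k highest-scoring tokens, applies a softmax over just those, and draws one in proportion to its probability. `generate` implements this with its own logits processors, which may combine it with other settings from the generation config. This sketch is not the library's code.

```python
import math
import random


def top_k_sample(logits, k, temperature=1.0, rng=random):
    """Simplified top-k sampling: draw one token index from the k highest logits."""
    # Indices of the k largest logits; every other token gets probability 0
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature-scaled softmax weights over the survivors (subtract max for stability)
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    # random.choices normalizes the weights itself
    return rng.choices(top, weights=weights, k=1)[0]


# Toy vocabulary of 4 tokens: with k=2 only tokens 1 and 2 can ever be drawn
rng = random.Random(0)
print([top_k_sample([0.0, 5.0, 4.0, -1.0], k=2, rng=rng) for _ in range(10)])
```

With `k=1` this reduces to greedy decoding, which is why `generate_response` and `generate_response_sample` agree on many short factual answers.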

In [73]:
response = generate_response_sample(
    question="Tell a short story in Nepali",
    type="instruction",
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
Tell a short story in Nepali

### Input
None

### Response
Ek din ek sano sano keta le apno manche ko hajur ko lagi ek thaal khana banaucha.<eos>


In [74]:
response = generate_response_sample(
    question="japan ko capital city tokyo ho vane India ko kaha ho",
    type="RAG",
    context=""
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
japan ko capital city tokyo ho vane India ko kaha ho

### Input


### Response
Indira Gandhi International Airport Delhi, India ko sabse popular airport ho.
        
### Explanation
India ko sabse popular airport Indira Gandhi International Airport ho, jo Delhi maa chha.<eos>


In [77]:
response = generate_response_sample(
    question="store ko location ke ho ra contact kati ho",
    type="",
    context="Hamro store Dharan ma xa, hamro store ko name Happy store ho contact no 9812324890 ho"
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
store ko location ke ho ra contact kati ho

### Input
Hamro store Dharan ma xa, hamro store ko name Happy store ho contact no 9812324890 ho

### Response
Hamro store Dharan ma xa, contact no 9812324890 ho.<eos>


In [79]:
response = generate_response_sample(
    question="macbook ko price kati ho",
    type="RAG",
    context="Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 8GB RAM, 256GB SSD."
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
macbook ko price kati ho

### Input
Product Details: name: MacBook Air M1, price: 150000, stock: False, description: 13.3-inch Retina display, Apple M1 chip, 8GB RAM, 256GB SSD.

### Response
MacBook Air M1 ko price 150000 ho.<eos>


In [80]:
response = generate_response_sample(
    question="store ko contact kati ho?",
    type="RAG",
    context="Hamro store ko name All Electronics store ho ra hamro store Dharan maa xa. Hamro store ko contact number 9800000000 ho"
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
store ko contact kati ho?

### Input
Hamro store ko name All Electronics store ho ra hamro store Dharan maa xa. Hamro store ko contact number 9800000000 ho

### Response
Hamro contact number 9800000000 ho<eos>


In [83]:
response = generate_response_sample(
    question="timi ko hau",
    type="RAG",
    context="You are ecommerce assistant for all electronics store who will help user"
)
print(response)

<bos>Below is an instruction that describes a task paired with an input that provides further context. Write a response that appriately complete the request.

### Instruction
timi ko hau

### Input
You are ecommerce assistant for all electronics store who will help user

### Response
Namaste! Ma tapaiko ecommerce assistant hu, sabai electronics ko thau khojasira help garna parchau!<eos>


## Save Model

In [84]:
import os  # read the Hugging Face token from the environment instead of hard-coding a secret
model.push_to_hub_merged("manojbaniya/roman-nepali-alpaca-1000-step", tokenizer, save_method = "lora", token = os.environ["HF_TOKEN"])

Unsloth: Saving LoRA adapters. Please wait...



Saved lora model to https://huggingface.co/manojbaniya/roman-nepali-alpaca-1000-step
