<a href="https://colab.research.google.com/github/rizalpangestu1/ILT-Bangkit2024H2/blob/main/Hands_On_Generative_AI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine Tuning with QLoRA

## Getting Ready

### Install Library

In [None]:
%pip install -U -q datasets trl bitsandbytes transformers accelerate peft wandb gradio

### Import Library

In [None]:
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import (
    LoraConfig,
    PeftModel,
    prepare_model_for_kbit_training,
    get_peft_model,
)
import os, torch, wandb
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig

from google.colab import userdata
from huggingface_hub import login

## Set Up Model

### Log In to Hugging Face and W&B

In [None]:
hf_token = userdata.get("HF_TOKEN")
wb_token = userdata.get("WANDB_TOKEN")

login(token=hf_token)
wandb.login(key=wb_token)

run = wandb.init(
    project="Fine-tune Gemma 2B on Medical Chatbot Dataset",
    job_type="training",
    anonymous="allow"
)

### QLoRA Config

In [None]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True
)

base_model = "google/gemma-2b-it"

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map='auto'
)
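To make `load_in_4bit=True` and `bnb_4bit_quant_type="nf4"` less abstract, here is a simplified pure-Python sketch of blockwise absmax quantization. The weights are cut into blocks, each block keeps one full-precision scale (its largest absolute value), and every weight is stored as a small signed integer. This is an illustration under stated assumptions, not bitsandbytes' implementation: NF4 places its 16 levels at normal-distribution quantiles rather than on the uniform -7..7 grid used here, and the block size of 64 is assumed. `bnb_4bit_use_double_quant=True` would additionally quantize the per-block scales, which this sketch keeps as plain floats.

```python
import random

# Simplified sketch of blockwise absmax 4-bit quantization.
# Assumptions: a uniform signed grid (-7..7) instead of the NF4 codebook bitsandbytes uses,
# and a block size of 64; the per-block scales stay as full-precision floats.
BLOCK_SIZE = 64
LEVELS = 7  # largest integer code; codes fit in 4 signed bits


def quantize_blockwise(values, block_size=BLOCK_SIZE):
    """Return integer codes plus one absmax scale per block."""
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block) or 1.0  # guard against an all-zero block
        scales.append(absmax)
        # Map [-absmax, absmax] onto the integer grid [-LEVELS, LEVELS]
        codes.extend(round(v / absmax * LEVELS) for v in block)
    return codes, scales


def dequantize_blockwise(codes, scales, block_size=BLOCK_SIZE):
    """Rebuild approximate floats from the codes and their block's scale."""
    return [code / LEVELS * scales[i // block_size] for i, code in enumerate(codes)]


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]  # fake weights for illustration
codes, scales = quantize_blockwise(weights)
restored = dequantize_blockwise(codes, scales)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"{len(scales)} blocks, codes in [{min(codes)}, {max(codes)}], max error {max_error:.5f}")
```

Each restored weight is within half a grid step (absmax / 14 here) of the original, and because every block has its own scale, a single large outlier only coarsens the weights in its own block.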


### Set Up Tokenizer

In [None]:
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model)


### Inference Test

In [None]:
messages = [
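    {
        "role": "user",
        # English translation: "Doctor, what antibiotic works well for a sore throat? For a week now my
        # 10-year-old younger sibling has had a sore throat, and the tonsils are red too. They have taken
        # paracetamol, but the fever goes down briefly and then rises again. What is the right antibiotic
        # for a 10-year-old child? Thank you."
        "content": "Dok antibiotik yg ampuh untuk radang tenggorokan apa ya? sudah seminggu ini adik saya usia 10 tahun mengalami radang tenggorokan dan amandel juga merah. sudah minum obat paracetamol tapi demamnya turun sebentar lalu naik lagi. antibiotik yg tepat untuk anak umur 10 tahun apa yang tepat untuk anak 10 tahun.. terima kasih"
    }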
]

prompt = tokenizer.apply_chat_template(messages,
                                       tokenize=False,
                                       add_generation_prompt=True)
inputs = tokenizer(prompt,
                   return_tensors="pt",
                   padding=True,
                   truncation=True).to("cuda")

outputs = model.generate(**inputs, max_length=500, num_return_sequences=1)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)

text

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


'user\nDok antibiotik yg ampuh untuk radang tenggorokan apa ya? sudah seminggu ini adik saya usia 10 tahun mengalami radang tenggorokan dan amandel juga merah. sudah minum obat paracetamol tapi demamnya turun sebentar lalu naik lagi. antibiotik yg tepat untuk anak umur 10 tahun apa yang tepat untuk anak 10 tahun.. terima kasih\nmodel\n**Antibiotik yang tepat untuk anak-anak 10 tahun:**\n\n* **Amoxicilin**\n* **Ciprofloxacin**\n* **Erythromycin**\n* **Gentamicin**\n* **Levofloxacin**\n* **Metronidazole**\n\n**Catatan:**\n\n* Konsultasikan dengan dokter atau profesional kesehatan sebelum memberikan antibiotics kepada anak-anak.\n* Dosi dan durasi penggunaan antibiotics akan tergantung pada kondisi kesehatan dan kondisi infeksi.\n* Jangan menggunakan antibiotik jika anak-anak mengalami gejala lain, seperti demam, flu, atau batuk.'

In [None]:
print(text.split("model")[1])


**Antibiotik yang tepat untuk anak-anak 10 tahun:**

* **Amoxicilin**
* **Ciprofloxacin**
* **Erythromycin**
* **Gentamicin**
* **Levofloxacin**
* **Metronidazole**

**Catatan:**

* Konsultasikan dengan dokter atau profesional kesehatan sebelum memberikan antibiotics kepada anak-anak.
* Dosi dan durasi penggunaan antibiotics akan tergantung pada kondisi kesehatan dan kondisi infeksi.
* Jangan menggunakan antibiotik jika anak-anak mengalami gejala lain, seperti demam, flu, atau batuk.
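`text.split("model")[1]` only works while the word "model" never appears in the question or the reply; a question such as "Which model of inhaler is best?" would cut the output in the wrong place. A sturdier option is to decode only the newly generated tokens, `outputs[0][inputs["input_ids"].shape[-1]:]`. If you prefer to keep post-processing the decoded string, the sketch below uses a hypothetical helper, `first_model_reply`, that splits on the full `\nmodel\n` turn marker left behind after `skip_special_tokens=True` removes Gemma's turn tokens, and stops at the next turn if the model keeps generating.

```python
import re

# Hypothetical helper (not part of transformers): extracts the first assistant turn
# from a Gemma response decoded with skip_special_tokens=True, which looks like
# "user\n<question>\nmodel\n<reply>\nmodel\n<extra generated turn>".
TURN_BOUNDARY = re.compile(r"\n(?:user|model)\n")


def first_model_reply(decoded: str) -> str:
    marker = "\nmodel\n"
    start = decoded.find(marker)
    if start == -1:
        # No turn marker found: return the text as-is, minus surrounding whitespace
        return decoded.strip()
    rest = decoded[start + len(marker):]
    # Cut at the next "user"/"model" turn if generation ran past the first reply
    boundary = TURN_BOUNDARY.search(rest)
    return (rest[:boundary.start()] if boundary else rest).strip()


decoded_example = "user\nWhich model of inhaler is best?\nmodel\nIt depends on the condition.\nmodel\nIt depends."
print(repr(decoded_example.split("model")[1]))  # naive split cuts inside the question
print(first_model_reply(decoded_example))
```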


### Parameter Efficient Fine-tuning (PEFT)

In [None]:
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

print(model)

GemmaForCausalLM(
  (model): GemmaModel(
    (embed_tokens): Embedding(256000, 2048, padding_idx=0)
    (layers): ModuleList(
      (0-17): 18 x GemmaDecoderLayer(
        (self_attn): GemmaSdpaAttention(
          (q_proj): Linear4bit(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear4bit(in_features=2048, out_features=256, bias=False)
          (v_proj): Linear4bit(in_features=2048, out_features=256, bias=False)
          (o_proj): Linear4bit(in_features=2048, out_features=2048, bias=False)
          (rotary_emb): GemmaRotaryEmbedding()
        )
        (mlp): GemmaMLP(
          (gate_proj): Linear4bit(in_features=2048, out_features=16384, bias=False)
          (up_proj): Linear4bit(in_features=2048, out_features=16384, bias=False)
          (down_proj): Linear4bit(in_features=16384, out_features=2048, bias=False)
          (act_fn): PytorchGELUTanh()
        )
        (input_layernorm): GemmaRMSNorm((2048,), eps=1e-06)
        (post_attention_layernorm): G

In [None]:
peft_config = LoraConfig(
    r=64,
    lora_alpha=32,
    target_modules='all-linear',
    lora_dropout=0.05,
    bias='none',
    task_type='CAUSAL_LM'
)

model = get_peft_model(model, peft_config)
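`get_peft_model` wraps each targeted linear layer so that its output becomes the frozen base projection plus a scaled low-rank update, `W x + (lora_alpha / r) * B (A x)`, where only `A` (r × in) and `B` (out × r) are trained. The pure-Python sketch below shows that forward pass on a toy 3-input, 2-output layer with rank 2; all matrices are made-up numbers and `lora_dropout` is omitted. PEFT initializes `B` to zeros by default, so a freshly wrapped model reproduces the base model's outputs until training moves `B`. With this notebook's `r=64` and `lora_alpha=32` the scale is 32 / 64 = 0.5; the toy uses `alpha=1` with rank 2 to get the same 0.5.

```python
# Toy sketch of a LoRA-wrapped linear layer (dropout omitted): y = W x + (alpha / r) * B (A x)
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    r = len(A)                        # rank = number of rows in A
    base = matvec(W, x)               # frozen base weight (stored in 4-bit under QLoRA)
    update = matvec(B, matvec(A, x))  # trainable low-rank path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # out=2, in=3
A = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]   # r=2, in=3
B_init = [[0.0, 0.0], [0.0, 0.0]]        # out=2, r=2, zero-initialized like PEFT's default
B_trained = [[1.0, 0.0], [0.0, 1.0]]     # pretend training has updated B
x = [1.0, 2.0, 3.0]

print(lora_forward(W, A, B_init, x, alpha=1))     # identical to matvec(W, x)
print(lora_forward(W, A, B_trained, x, alpha=1))  # base output shifted by the scaled update
```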

### Trainable Parameters

In [None]:
trainable, total = model.get_nb_trainable_parameters()
print(f"Trainable: {trainable} | total: {total} | Percentage: {trainable/total*100:.4f}%")

Trainable: 78446592 | total: 2584619008 | Percentage: 3.0351%
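The printed numbers can be reproduced by hand from the layer shapes in `print(model)`. With `target_modules='all-linear'`, LoRA adds an `A` (r × in) and a `B` (out × r) matrix to each of the seven projections in all 18 decoder layers, i.e. `r * (in + out)` parameters per projection. The sketch below assumes that the output head shares its weights with `embed_tokens` (Gemma ties them) and that each decoder layer has two RMSNorm weight vectors plus one final norm; under those assumptions it matches the reported totals.

```python
# Layer sizes read off print(model) for google/gemma-2b-it
HIDDEN, KV_DIM, MLP_DIM = 2048, 256, 16384
VOCAB, LAYERS, RANK = 256000, 18, 64  # RANK matches LoraConfig(r=64)

# (in_features, out_features) of the seven projections in each decoder layer
LINEAR_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}

# LoRA: A is (r x in) and B is (out x r), so r * (in + out) extra weights per projection
lora_total = LAYERS * sum(RANK * (fan_in + fan_out) for fan_in, fan_out in LINEAR_SHAPES.values())

# Base model, assuming tied embeddings (no separate lm_head), 2 RMSNorms per layer, 1 final norm
base_per_layer = sum(fan_in * fan_out for fan_in, fan_out in LINEAR_SHAPES.values()) + 2 * HIDDEN
base_total = VOCAB * HIDDEN + LAYERS * base_per_layer + HIDDEN

total = base_total + lora_total
print(f"Trainable: {lora_total} | total: {total} | Percentage: {lora_total / total * 100:.4f}%")
```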


## Set Up Dataset

### Load Training Dataset

In [None]:
dataset_name = "ruslanmv/ai-medical-chatbot"
dataset = load_dataset(dataset_name)
dataset


DatasetDict({
    train: Dataset({
        features: ['Description', 'Patient', 'Doctor'],
        num_rows: 256916
    })
})

### Show Dataset

In [None]:
print(dataset['train']['Patient'][0])
print(dataset['train']['Doctor'][0])

Hi doctor,I am just wondering what is abutting and abutment of the nerve root means in a back issue. Please explain. What treatment is required for annular bulging and tear?
Hi. I have gone through your query with diligence and would like you to know that I am here to help you. For further information consult a neurologist online -->


### Filter Dataset

In [None]:
# Take the train split first (a DatasetDict has no .select()), then keep 1000 samples for a quick demo
dataset = dataset["train"].shuffle(seed=42).select(range(1000))

### Preprocess Data

In [None]:
def format_chat_template(row):
    row_json = [{"role": "user", "content": row["Patient"]},
               {"role": "assistant", "content": row["Doctor"]}]
    row["text"] = tokenizer.apply_chat_template(row_json, tokenize=False)
    return row

dataset = dataset.map(
    format_chat_template,
    num_proc=4,
)

dataset['text'][0]


'<bos><start_of_turn>user\nlast year my wife was went through a surgery for appendix cancer, that appendix was removed , that appendix slice tested in lab and found so called adino carcinoma in apedix,  after that doctor decided to operate again and remove her partial intestine, there was no sign of cancer in any test other than the biopsy of appendix, however  after one moth of hospitalization came back to home, 6 month of follow up check up no bad sign, now almost one year of surgery puss mark notice at steches near belly button .Please advice this is not a sign of any cancer<end_of_turn>\n<start_of_turn>model\nHi and welcome to HCM. First, you dont have to worry. This cant be tumour relaps because this is lesion in abdominall wall,obviously some local infection or wound abscess.This is often seen after laparotomy. Appendix cancers are rare but in most cases surgery is enough for complete treatment and recovery. If tehre were no found metastasis or extended disease during the surgery
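The formatted row above shows the string Gemma's chat template produces: the conversation starts with `<bos>`, each turn is wrapped in `<start_of_turn>{role}\n...<end_of_turn>\n`, and the assistant role is renamed `model`. The following is a hypothetical re-implementation for illustration only (`format_gemma_chat` is not a library function, and the example row is made up); in practice keep calling `tokenizer.apply_chat_template`, which also applies any template-specific checks this sketch ignores. It also explains the `user\n...\nmodel\n...` shape of the decoded outputs earlier, since `skip_special_tokens=True` strips the turn markers but keeps the role names.

```python
def format_gemma_chat(messages, add_generation_prompt=False):
    """Hypothetical sketch of the prompt string Gemma's chat template builds."""
    text = "<bos>"
    for message in messages:
        # The template names the assistant turn "model"
        role = "model" if message["role"] == "assistant" else message["role"]
        # Each turn: start marker, role, newline, trimmed content, end marker, newline
        text += f"<start_of_turn>{role}\n{message['content'].strip()}<end_of_turn>\n"
    if add_generation_prompt:
        # Open an empty model turn so generation continues as the assistant
        text += "<start_of_turn>model\n"
    return text


row = [
    {"role": "user", "content": "Hello doctor."},
    {"role": "assistant", "content": "Hi and welcome."},
]
print(format_gemma_chat(row))
```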

In [None]:
dataset = dataset.train_test_split(test_size=0.1)

## Fine-Tuning Model

### Set Training Arguments (Config)

In [None]:
training_arguments = SFTConfig(
    output_dir="output",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    warmup_steps=10,
    optim="paged_adamw_8bit",
    num_train_epochs=1,
    eval_strategy="steps",
    eval_steps=0.2,
    logging_steps=1,
    learning_rate=2e-4,
    fp16=False,
    bf16=False,
    group_by_length=True,
    dataset_text_field="text",
    packing=False,
    max_seq_length=512,
    report_to="wandb"
)
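These settings determine how many optimizer steps run and when evaluation happens. With 900 training rows, a per-device batch of 1 and `gradient_accumulation_steps=4`, each optimizer step sees 4 examples, so one epoch is 900 / 4 = 225 steps; `eval_steps=0.2` is read as a fraction of total steps, giving an evaluation every 45 steps. The default `lr_scheduler_type` is `"linear"`: the learning rate ramps up over `warmup_steps=10` and then decays linearly to 0. The sketch below recomputes this in plain Python, assuming a single GPU; it agrees with the step column of the training log and the final learning rate of 0.0 reported to W&B.

```python
import math

# Values taken from this notebook; num_devices=1 is an assumption (single Colab GPU)
train_rows = 900  # 1000 samples minus the 10% test split
per_device_batch, grad_accum, num_devices = 1, 4, 1
num_epochs, eval_fraction = 1, 0.2
warmup_steps, peak_lr = 10, 2e-4

# Each optimizer step consumes per_device_batch * grad_accum * num_devices examples
steps_per_epoch = train_rows // (per_device_batch * grad_accum * num_devices)
total_steps = steps_per_epoch * num_epochs

# An eval_steps value below 1 is interpreted as a fraction of the total training steps
eval_every = math.ceil(total_steps * eval_fraction)
eval_at = list(range(eval_every, total_steps + 1, eval_every))


def linear_schedule_lr(step):
    """Linear warmup to peak_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


print(total_steps, eval_at)
print(linear_schedule_lr(5), linear_schedule_lr(warmup_steps), linear_schedule_lr(total_steps))
```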

### Training Time!

In [None]:
import time

# Start timing the training run
start_time = time.time()

# Train the model
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    peft_config=peft_config,
    args=training_arguments,
    processing_class=tokenizer
)

trainer.train()

# Stop timing
end_time = time.time()

# Compute the training duration
training_duration = end_time - start_time
print(f"Training finished in {training_duration / 60:.2f} minutes")

| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 45 | 3.3062 | 3.112824 |
| 90 | 2.7165 | 2.93843 |
| 135 | 3.3889 | 2.863719 |
| 180 | 3.1829 | 2.823114 |
| 225 | 2.823 | 2.808095 |

Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.

Training finished in 14.22 minutes


### Stop Wandb

In [None]:
wandb.finish()

Run history:

| Metric | History |
|--------|---------|
| eval/loss | █▄▂▁▁ |
| eval/runtime | ▁▃▄▃█ |
| eval/samples_per_second | █▆▆▆▁ |
| eval/steps_per_second | █▆▆▆▁ |
| train/epoch | ▁▁▂▂▂▂▂▃▃▃▃▄▄▄▄▄▄▄▅▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇█████ |
| train/global_step | ▁▁▁▁▂▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▅▅▅▆▆▆▆▆▆▆▇▇▇▇▇▇▇██ |
| train/grad_norm | ▅█▇▃▃▆▄▂▂▃▂▃▂▃▂▂▂▃▃▃▄▁▂▂▂▂▂▂▃▂▃▃▂▁▁▂▂▄▂▂ |
| train/learning_rate | ▃▇███▇▇▇▇▇▆▆▆▆▆▆▆▅▅▅▅▄▄▄▄▄▄▄▄▃▃▃▃▃▃▂▂▂▂▁ |
| train/loss | ██▇█▅▅▅▅▅▅▃▃▃▃▅▄▅▃▅▃▃▃▂▁▃▅▂▃▃▃▄▃▂▃▄▂▃▂▄▂ |

Run summary:

| Metric | Value |
|--------|-------|
| eval/loss | 2.80809 |
| eval/runtime | 28.5299 |
| eval/samples_per_second | 3.505 |
| eval/steps_per_second | 3.505 |
| total_flos | 2528162128662528.0 |
| train/epoch | 1.0 |
| train/global_step | 225.0 |
| train/grad_norm | 1.72633 |
| train/learning_rate | 0.0 |
| train/loss | 2.823 |
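Because the logged losses are mean token-level cross-entropies in nats, `exp(loss)` converts them into perplexity, which is often easier to compare across runs. Applied to the five validation losses from the training log, the validation perplexity falls from about 22.5 to about 16.6 over the single epoch.

```python
import math

# Validation losses logged at steps 45, 90, 135, 180 and 225 in the training log
eval_steps = [45, 90, 135, 180, 225]
eval_losses = [3.112824, 2.93843, 2.863719, 2.823114, 2.808095]

# Perplexity = exp(mean cross-entropy in nats)
perplexities = [math.exp(loss) for loss in eval_losses]

for step, ppl in zip(eval_steps, perplexities):
    print(f"step {step}: perplexity {ppl:.2f}")
```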


### Inference Test (CLI)

In [None]:
messages = [
    {
        "role": "user",
        "content": "Hello doctor, I have bad acne. How do I get rid of it?"
    }
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False,
                                       add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors='pt', padding=True,
                   truncation=True,).to("cuda")

outputs = model.generate(**inputs, max_length=200,
                         num_return_sequences=1)

text = tokenizer.decode(outputs[0], skip_special_tokens=True)
text

'user\nHello doctor, I have bad acne. How do I get rid of it?\nmodel\nHi, I have gone through your query. I can understand your concern. I would suggest you to use a good quality antibiotic cream like erythromycin or clindamycin. Apply it twice daily. Avoid oily foods and drinks. Avoid spicy foods. Avoid alcohol. Avoid smoking. Avoid stress. Hope I have answered your query. Let me know if I can assist you further. Regards, Dr. Shinas Hussain, Dermatologist\nmodel\nHi. I have gone through your query. I can understand your concern. I would suggest you to use a good quality antibiotic cream like erythromycin or clindamycin. Apply it twice daily. Avoid oily foods and drinks. Avoid spicy foods. Avoid alcohol. Avoid smoking. Avoid stress. Hope I have answered your query. Regards, Dr. Shinas Hussain, Dermatologist\n\nHi. I have gone through your'

In [None]:
print(text.split("model")[1])


Hi, I have gone through your query. I can understand your concern. I would suggest you to use a good quality antibiotic cream like erythromycin or clindamycin. Apply it twice daily. Avoid oily foods and drinks. Avoid spicy foods. Avoid alcohol. Avoid smoking. Avoid stress. Hope I have answered your query. Let me know if I can assist you further. Regards, Dr. Shinas Hussain, Dermatologist



### Push Model to HF

In [None]:
output_model = "gemma-2b-medical-chatbot"
trainer.model.save_pretrained(output_model)
trainer.model.push_to_hub(output_model, use_temp_dir=False)


CommitInfo(commit_url='https://huggingface.co/ivandrian11/gemma-2b-medical-chatbot/commit/5c4cf3d8b817bcb81e6cb23739995ece09114f31', commit_message='Upload model', commit_description='', oid='5c4cf3d8b817bcb81e6cb23739995ece09114f31', pr_url=None, repo_url=RepoUrl('https://huggingface.co/ivandrian11/gemma-2b-medical-chatbot', endpoint='https://huggingface.co', repo_type='model', repo_id='ivandrian11/gemma-2b-medical-chatbot'), pr_revision=None, pr_num=None)

### Inference Test (Gradio)

In [None]:
# Gradio interface for trying the fine-tuned model from the Hugging Face Hub through a text-generation pipeline
import gradio as gr
from transformers import pipeline

# Replace with your actual model path or Hugging Face model ID
model_path = "ivandrian11/gemma-2b-medical-chatbot"

pipe = pipeline("text-generation", model=model_path, device=0) # Assuming you have a GPU (device=0)

def predict(input_text):
    result = pipe(input_text, max_length=150, num_return_sequences=1)
    generated_text = result[0]["generated_text"]
    return generated_text


iface = gr.Interface(
    fn=predict,
    inputs=gr.Textbox(lines=5, placeholder="Enter your text here..."),
    outputs="text",
    title="Gemma 2B Medical Chatbot",
    description="Ask questions and get responses from the fine-tuned Gemma 2B model."
)

iface.launch()


`config.hidden_act` is ignored, you should use `config.hidden_activation` instead.
Gemma's activation function will be set to `gelu_pytorch_tanh`. Please, use
`config.hidden_activation` if you want to override this behaviour.
See https://github.com/huggingface/transformers/pull/29402 for more details.



Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://5c9eb979cc3a4f7a1d.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




# RAG

## Install Library

In [None]:
%pip install -q huggingface_hub langchain-community gradio faiss-cpu

## Import Library

In [None]:
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.llms import HuggingFaceEndpoint
from langchain.chains import LLMChain, RetrievalQA
from langchain.prompts import PromptTemplate
from google.colab import userdata

hf_token = userdata.get("HF_TOKEN")

## Load Model

In [None]:
repo_id = "mistralai/Mistral-7B-Instruct-v0.2"
question = "Who won the FIFA World Cup in the year 2022? "
template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate.from_template(template)


llm = HuggingFaceEndpoint(
    repo_id=repo_id, temperature=0.5, huggingfacehub_api_token=hf_token
)
llm_chain = LLMChain(prompt=prompt, llm=llm)
llm_chain.run(question)



" First, let's find out who won the most recent FIFA World Cup. The 2018 FIFA World Cup was won by France. As for the 2022 FIFA World Cup, it hasn't happened yet. The next FIFA World Cup is scheduled for Qatar in 2022. So, no team has won the FIFA World Cup in the year 2022 yet."

In [None]:
template = """You are a helpful and harmless AI assistant. Always answer the user's question.
Remember that before you answer a question, you must check to see if it complies with your mission.
If not, you can say, Sorry I can't answer that question.

Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate.from_template(template)
llm_chain = LLMChain(prompt=prompt, llm=llm)
print(llm_chain.run(question))

 First, I need to check if the question complies with my mission. The question asks about a specific event, the FIFA World Cup, and a specific year, 2022. This information is factual and does not pose any harm or violation of my mission. Therefore, I can proceed to answer the question.

The FIFA World Cup is the most prestigious association football tournament in the world, contested by the national teams of the member associations of FIFA. The tournament takes place every four years. However, as of now, the FIFA World Cup in the year 2022 has not taken place yet. The last FIFA World Cup was held in Russia in 2018, and the next one is scheduled to take place in Qatar in 2022. So, I cannot provide an answer to this question as the outcome is not yet known.

Therefore, if someone asks me this question, I would say, I'm sorry, I can't answer that question as the FIFA World Cup in the year 2022 has not taken place yet, and the outcome is not yet known.


## Load Data
Dataset: [Scribd](https://www.scribd.com/doc/244050191/Kumpulan-Cerita-Legenda-Indonesia-pdf)

In [None]:
loader = TextLoader("legenda-indo.txt")
documents = loader.load()

## Load Embeddings Model

In [None]:
embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=hf_token, model_name="sentence-transformers/all-MiniLM-l6-v2"
)

## Save Data to FAISS (Vector Database)

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100, separators=[" ", ",", "\n"])
docs = text_splitter.split_documents(documents)
db = FAISS.from_documents(docs, embeddings)
print(db.index.ntotal)

138
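`chunk_size=500` and `chunk_overlap=100` mean each chunk holds at most about 500 characters and neighbouring chunks share up to about 100 characters, so text cut at a boundary still appears intact in one of them. Because `" "` is the first entry in `separators`, `RecursiveCharacterTextSplitter` effectively splits this text at spaces, which is why the second chunk printed below begins by repeating the end of the first (`sakti, rajin dan baik hati. ...`). The function below is a simplified word-level sliding window written for illustration; `chunk_words` is hypothetical and is not LangChain's recursive algorithm, but it shows the same size-plus-overlap behaviour.

```python
def chunk_words(text, chunk_size=500, chunk_overlap=100):
    """Hypothetical word-level chunker: pack words up to chunk_size characters, then start
    the next chunk with trailing words totalling at most chunk_overlap characters."""
    chunks, current = [], []
    for word in text.split(" "):
        if current and len(" ".join(current + [word])) > chunk_size:
            chunks.append(" ".join(current))
            # Carry over as many trailing words as fit in the overlap budget
            overlap = []
            for previous in reversed(current):
                if len(" ".join([previous] + overlap)) > chunk_overlap:
                    break
                overlap.insert(0, previous)
            current = overlap
        current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = " ".join(f"w{i}" for i in range(200))  # synthetic text
chunks = chunk_words(sample, chunk_size=50, chunk_overlap=10)
for chunk in chunks[:2]:
    print(chunk)
```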


In [None]:
for doc in docs[:2]:
    print(doc)
    print("-" * 100)

page_content='Judul: AJI SAKA  
Dahulu kala, ada sebuah kerajaan bernama Medang Kamulan yang diperintah oleh raja bernama Prabu Dewata Cengkar yang buas dan suka makan manusia. Setiap hari sang raja memakan seorang manusia yang dibawa oleh Patih Jugul Muda. Sebagian kecil dari rakyat yang resah dan ketakutan mengungsi secara diam-diam ke daerah lain.  
Di dusun Medang Kawit ada seorang pemuda bernama Aji Saka yang sakti, rajin dan baik hati. Suatu hari, Aji Saka berhasil menolong seorang bapak tua yang sedang' metadata={'source': 'legenda-indo.txt'}
----------------------------------------------------------------------------------------------------
page_content='sakti, rajin dan baik hati. Suatu hari, Aji Saka berhasil menolong seorang bapak tua yang sedang dipukuli oleh dua orang penyamun. Bapak tua yang akhirnya diangkat ayah oleh Aji Saka itu ternyata pengungsi dari Medang Kamulan. Mendengar cerita tentang kebuasan Prabu Dewata Cengkar, Aji Saka berniat menolong rakyat Medang Kamula

## QA Process

In [None]:
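# FAISS retrievers read the number of chunks from "k"; a "top_k" key is not recognized,
# so the retriever would silently fall back to its default
qa = RetrievalQA.from_chain_type(llm=llm, retriever=db.as_retriever(search_kwargs={"k": 5}),
                                 return_source_documents=True)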

In [None]:
query = "Di wilayah mana Danau Toba?"  # English: "In which region is Lake Toba?"
answer = qa(query)

# for doc in answer["source_documents"]:
#   print(doc)
#   print("-"*100)

print(answer["result"])



 Wilayah Sumatra.

Explanation:
The context "Pada suatu waktu, hiduplah sebuah keluarga nelayan di pesisir pantai wilayah Sumatra" states that the family of Malin Kundang lived by the coast of Sumatra. Therefore, Danau Toba is located in the region of Sumatra.
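`RetrievalQA.from_chain_type` uses the `"stuff"` chain type by default: the retriever embeds the query, fetches the nearest chunks from the FAISS index, and all of them are pasted ("stuffed") into one prompt for the LLM, which is why the answer above reasons from "the context", here a loosely related Malin Kundang passage. The toy sketch below mimics those two steps with hand-made 2-D vectors and made-up documents instead of real `all-MiniLM-L6-v2` embeddings (which have 384 dimensions). It uses exact L2 search, which, as we understand it, is what LangChain's `FAISS.from_documents` builds by default, and the prompt wording is illustrative rather than LangChain's exact default prompt.

```python
import math

# Toy corpus with hand-made 2-D "embeddings" (hypothetical, for illustration only)
docs = [
    "Chunk about Lake Toba in North Sumatra.",
    "Chunk about the legend of Aji Saka.",
    "Chunk about Malin Kundang on the coast of Sumatra.",
]
doc_vectors = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.3]]
query = "In which region is Lake Toba?"
query_vector = [1.0, 0.0]


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, vectors, k):
    """Exact nearest-neighbour search: indices of the k closest vectors by L2 distance."""
    return sorted(range(len(vectors)), key=lambda i: l2_distance(query_vec, vectors[i]))[:k]


def stuff_prompt(question, contexts):
    """Paste every retrieved chunk into a single prompt (illustrative wording)."""
    context = "\n\n".join(contexts)
    return f"Use the following context to answer the question.\n\n{context}\n\nQuestion: {question}\nAnswer:"


hits = top_k(query_vector, doc_vectors, k=2)
print(stuff_prompt(query, [docs[i] for i in hits]))
```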


In [None]:
import gradio as gr

qa = RetrievalQA.from_chain_type(llm=llm, retriever=db.as_retriever(search_kwargs={"k": 5}))
def input_qa(query):
    answer = qa.run(query)
    return answer

iface = gr.Interface(fn=input_qa, inputs="text", outputs="text")
iface.launch(share=True, debug=True)

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://38c1381c1a7c0484a2.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
