# 1. BUSINESS UNDERSTANDING

Business objectives:
- Provide an initial foundation for an LLM-based academic chatbot by fine-tuning a model on data from the Unjaya academic guidelines.
- Improve access to and understanding of academic information at Universitas Jenderal Achmad Yani Yogyakarta through AI technology.
- Make information delivery more efficient by reducing the workload of academic staff who answer recurring questions about the academic guidelines.
- Evaluate the fine-tuned model with BERTScore.

## Preparation

### Import Libraries

In [1]:
# Setup environment
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
os.environ["TOKENIZERS_PARALLELISM"] = "false"

In [None]:
# Install the libraries
# !pip install peft datasets transformers trl accelerate bitsandbytes evaluate wandb -q

In [None]:
# !pip install -U bert_score -q

In [2]:
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit
import json
import torch
import wandb
from datasets import load_dataset, Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging
)
from peft import (
    LoraConfig,
    PeftModel,
    prepare_model_for_kbit_training,
    get_peft_model
)

from trl import SFTTrainer  # note: could not be installed
from transformers import Trainer, DataCollatorForLanguageModeling  # alternative trainer

import evaluate  # Hugging Face `evaluate`, used here for BERTScore




In [3]:
print(f"Pandas version: {pd.__version__}")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

Pandas version: 2.2.3
PyTorch version: 2.6.0+cu124
CUDA available: True


In [4]:
# Logging settings
logging.set_verbosity_info()

### Log in to Hugging Face and W&B

In [5]:
# Authentication and setup
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
secret_hf = user_secrets.get_secret("HUGGINGFACE_TOKEN")
secret_wandb = user_secrets.get_secret("WANDB_KEY")

In [6]:
# Log in to the Hugging Face Hub
!huggingface-cli login --token $secret_hf

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
The token `kaggle_llm` has been saved to /root/.cache/huggingface/stored_tokens
Your token has been saved to /root/.cache/huggingface/token
Login successful.
The current active token is: `kaggle_llm`


In [7]:
# Log in to Weights & Biases to monitor training
wandb.login(key = secret_wandb)
run = wandb.init(
    project='Fine-tuning-Mistral-7B-Pedoman-Akademik',
    job_type="training",
    notes="Fine-tuning Mistral 7B Instruct v0.3 untuk chatbot pedoman akademik",
    tags=["mistral", "chatbot", "academic", "qlora"]
)

[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mriakrst[0m ([33mriakrst-universitas-jenderal-achmad-yani-yogyakarta[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


# 2. DATA UNDERSTANDING

In [8]:
# Load the dataset from the CSV file
df = pd.read_csv('/kaggle/input/dataset-pedoman-akademik-unjaya/Dataset Pedoman Akademik 2024.csv')
df.head()

(index),ID,Instruction,Response,Sumber,ID_Original,Tipe_Variasi
0,1,Apa itu pedoman akademik di Unjaya?,Pedoman akademik adalah jabaran dari kebijakan...,"Kata Pengantar, Hal. 3 (Pedoman Akademik Unjay...",1,Original
1,2,Apa tujuan dari penyusunan pedoman akademik Un...,Tujuannya adalah menjadi panduan menyeluruh ba...,"Kata Pengantar, Hal. 3 (Pedoman Akademik Unjay...",2,Original
2,3,Apa saja yang dicakup dalam pedoman akademik U...,"Pedoman mencakup kebijakan mutu, visi, misi, t...","Kata Pengantar, Hal. 3 (Pedoman Akademik Unjay...",3,Original
3,4,Siapa yang menyusun pedoman akademik Unjaya 2024?,"Tim penyusun terdiri dari Niko Wahyu Nurcahyo,...","Kata Pengantar, Hal. 4 (Pedoman Akademik Unjay...",4,Original
4,5,Siapa Rektor Universitas Jenderal Achmad Yani ...,Rektor Unjaya adalah Prof. Dr.rer.nat.apt. Tri...,Struktur Organisasi Universitas Jenderal Achma...,5,Original


In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1626 entries, 0 to 1625
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   ID            1626 non-null   int64 
 1   Instruction   1626 non-null   object
 2   Response      1626 non-null   object
 3   Sumber        1626 non-null   object
 4   ID_Original   1626 non-null   int64 
 5   Tipe_Variasi  1626 non-null   object
dtypes: int64(2), object(4)
memory usage: 76.3+ KB


In [10]:
df['Tipe_Variasi'].unique()

array(['Original', 'gaya_pertanyaan'], dtype=object)

# 3. DATA PREPARATION
## 3.1. Fair Train/Eval Split

In [11]:
# Check that the required columns exist
assert 'Instruction' in df.columns, "Kolom 'Instruction' tidak ditemukan."
assert 'ID' in df.columns, "Kolom 'ID' tidak ditemukan."

# Separate the original rows (ID 1–413) from the variation rows (ID 414 and above)
df_original = df[df['ID'] <= 413].reset_index(drop=True)
df_variasi = df[df['ID'] > 413].reset_index(drop=True)

# Bin instruction length into quintiles; used to stratify the variation rows only
df_variasi['length_bin'] = pd.qcut(
    df_variasi['Instruction'].str.len(),
    q=5,
    labels=False,
    duplicates='drop'
)

# Stratified 90/10 split of the variation rows only
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.1, random_state=42)
train_idx, eval_idx = next(splitter.split(df_variasi, df_variasi['length_bin']))
df_variasi_train = df_variasi.iloc[train_idx].reset_index(drop=True)
df_variasi_eval = df_variasi.iloc[eval_idx].reset_index(drop=True)

# Training set: all original rows plus the training share of the variations.
# Evaluation set: the held-out variations only.
df_train = pd.concat([df_original, df_variasi_train], ignore_index=True)
df_eval = df_variasi_eval.copy()

# Drop the helper column
df_train = df_train.drop(columns=['length_bin'], errors='ignore')
df_eval = df_eval.drop(columns=['length_bin'], errors='ignore')

# Validate that the key columns contain no nulls
assert df_train['Instruction'].isnull().sum() == 0, "Ada instruksi kosong di train"
assert df_eval['Instruction'].isnull().sum() == 0, "Ada instruksi kosong di eval"
assert df_train['Response'].isnull().sum() == 0, "Ada respons kosong di train"
assert df_eval['Response'].isnull().sum() == 0, "Ada respons kosong di eval"

# Print the resulting sizes
print(f"Jumlah data original   : {len(df_original)}")
print(f"Jumlah variasi - train : {len(df_variasi_train)}")
print(f"Jumlah variasi - eval  : {len(df_variasi_eval)}")
print(f"Jumlah total train     : {len(df_train)}")
print(f"Jumlah total eval      : {len(df_eval)}")


Jumlah data original   : 413
Jumlah variasi - train : 1091
Jumlah variasi - eval  : 122
Jumlah total train     : 1504
Jumlah total eval      : 122
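
Every variation row carries the `ID_Original` of the question it paraphrases, and all original rows go into the training set, so each evaluation question appears to have its source answer in the training data. If a stricter hold-out is wanted, one option is to split by `ID_Original` group so that an original and all of its paraphrases land on the same side. The sketch below illustrates that alternative on hypothetical toy rows with the standard library only; `group_split` and the toy data are illustrative names and not part of this notebook.

```python
import random

def group_split(rows, group_key, eval_fraction=0.1, seed=42):
    """Split rows so that all rows sharing a group id end up on one side."""
    groups = sorted({row[group_key] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_eval = max(1, round(len(groups) * eval_fraction))
    eval_groups = set(groups[:n_eval])
    train = [r for r in rows if r[group_key] not in eval_groups]
    held_out = [r for r in rows if r[group_key] in eval_groups]
    return train, held_out

# Hypothetical toy data: 20 originals, each with two paraphrases.
rows = [
    {"ID_Original": group_id, "Tipe_Variasi": kind}
    for group_id in range(1, 21)
    for kind in ("Original", "gaya_pertanyaan", "gaya_pertanyaan")
]

train_rows, eval_rows = group_split(rows, "ID_Original")

# No ID_Original should appear on both sides of the split.
overlap = {r["ID_Original"] for r in train_rows} & {r["ID_Original"] for r in eval_rows}
print(len(train_rows), len(eval_rows), len(overlap))
```

The trade-off is that the training set no longer contains every original row, so this is a design choice rather than a drop-in replacement for the split above.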


In [12]:
df_eval

(index),ID,Instruction,Response,Sumber,ID_Original,Tipe_Variasi
0,622,Apa peran dan posisi Dekan menurut pedoman ini?,Dekan adalah pimpinan tertinggi fakultas dalam...,"Bab II Ketentuan Umum, Pasal 2 Daftar Istilah ...",71,gaya_pertanyaan
1,1366,Unjaya melibatkan siapa saja dalam proses peni...,Penilaian dapat dilakukan oleh: a) dosen penga...,"Bab IX Penilaian Pembelajaran, Pasal 42 Pelaks...",329,gaya_pertanyaan
2,751,Bisa dijelaskan keputusan menteri yang dijadik...,(1) Keputusan Menteri Pendidikan dan Kebudayaa...,"Bab III Dasar Hukum Pedoman, Pasal 3 Dasar Huk...",114,gaya_pertanyaan
3,854,Langkah apa saja yang harus dilakukan sebelum ...,"""Sebelum pengisian KRS, setiap mahasiswa wajib...","Bab V Administrasi dan Registrasi Akademik, Pa...",149,gaya_pertanyaan
4,614,Apa tugas dan posisi Wakil Rektor I dalam pedo...,Wakil Rektor I (Warek I) adalah unsur pimpinan...,"Bab II Ketentuan Umum, Pasal 2 Daftar Istilah ...",68,gaya_pertanyaan
...,...,...,...,...,...,...
117,657,Wisuda itu sebenarnya acara apa?,Wisuda adalah upacara akademik dalam forum rap...,"Bab II Ketentuan Umum, Pasal 2 Daftar Istilah ...",83,gaya_pertanyaan
118,941,Siapa yang menerima laporan rutin dari DPA ter...,Setiap DPA berkewajiban melaporkan kegiatan bi...,"Bab V Administrasi dan Registrasi Akademik, Pa...",179,gaya_pertanyaan
119,790,Unjaya menyediakan jenis RPL apa saja untuk la...,Jenis RPL untuk melanjutkan pendidikan formal ...,Bab IV Sistem Penerimaan dan Pendaftaran Mahas...,127,gaya_pertanyaan
120,1521,"Mahasiswa Unjaya yang sudah lulus yudisium, ap...","Ya, mahasiswa yang telah dinyatakan lulus dala...","Bab X Tugas Akhir, Yudisium, Wisuda, Pemberian...",379,gaya_pertanyaan


## 3.2. Convert to Chat Format (ChatML)

In [13]:
# Convert a DataFrame into JSONL in the conversational `messages` format
# (referred to as "ChatML" in this notebook). Each row becomes one 'messages'
# entry with a 'user' turn and an 'assistant' turn.
def to_chatml_format(df, filename):
    chat_data = []

    for _, row in df.iterrows():
        # One user/assistant exchange; the source citation is appended to the answer
        messages = [
            {
                "role": "user", 
                "content": row['Instruction']
            },
            {
                "role": "assistant", 
                "content": f"{row['Response']}\n\n(Sumber: {row['Sumber']})"
            }
        ]

        chat_data.append({"messages": messages})

    # Write one JSON object per line (JSONL)
    with open(filename, 'w', encoding='utf-8') as f:
        for item in chat_data:
            json.dump(item, f, ensure_ascii=False)
            f.write('\n')

    return chat_data

# Convert and save the data
train_jsonl_path = '/kaggle/working/train_chatml.jsonl'
eval_jsonl_path = '/kaggle/working/eval_chatml.jsonl'

train_chatml = to_chatml_format(df_train, train_jsonl_path)
eval_chatml = to_chatml_format(df_eval, eval_jsonl_path)

print(f"Train data disimpan di: {train_jsonl_path}")
print(f"Eval data disimpan di: {eval_jsonl_path}")

Train data disimpan di: /kaggle/working/train_chatml.jsonl
Eval data disimpan di: /kaggle/working/eval_chatml.jsonl
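
Before training, it can be useful to confirm that a JSONL file really follows the layout described above: one `messages` object per line, with a `user` turn followed by an `assistant` turn. The following is a minimal sketch of such a check; `validate_chat_jsonl` and the sample record are hypothetical and only mirror the shape produced by `to_chatml_format`.

```python
import json
import os
import tempfile

def validate_chat_jsonl(path):
    """Return the number of valid records; raise ValueError on the first bad one."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            record = json.loads(line)
            messages = record.get("messages", [])
            roles = [m.get("role") for m in messages]
            if roles != ["user", "assistant"]:
                raise ValueError(f"line {line_no}: unexpected roles {roles}")
            for m in messages:
                content = m.get("content")
                if not isinstance(content, str) or not content.strip():
                    raise ValueError(f"line {line_no}: empty content")
            count += 1
    return count

# Hypothetical record in the same shape as the notebook's output.
sample = {
    "messages": [
        {"role": "user", "content": "Apa itu pedoman akademik di Unjaya?"},
        {"role": "assistant", "content": "Contoh jawaban.\n\n(Sumber: contoh)"},
    ]
}

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample_chatml.jsonl")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")
    n_valid = validate_chat_jsonl(sample_path)

print(n_valid)
```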


In [14]:
# Load the datasets with Hugging Face `datasets`
train_dataset = load_dataset('json', data_files=train_jsonl_path, split='train')
eval_dataset = load_dataset('json', data_files=eval_jsonl_path, split='train')

print(f"Train dataset loaded: {len(train_dataset)} samples")
print(f"Eval dataset loaded: {len(eval_dataset)} samples")

Train dataset loaded: 1504 samples
Eval dataset loaded: 122 samples


In [15]:
# Example of the data format
print("\nContoh format data ChatML:")
print(json.dumps(train_chatml[0], indent=2, ensure_ascii=False))


Contoh format data ChatML:
{
  "messages": [
    {
      "role": "user",
      "content": "Apa itu pedoman akademik di Unjaya?"
    },
    {
      "role": "assistant",
      "content": "Pedoman akademik adalah jabaran dari kebijakan akademik Universitas Jenderal Achmad Yani Yogyakarta yang menjadi pedoman penyelenggaraan program akademik.\n\n(Sumber: Kata Pengantar, Hal. 3 (Pedoman Akademik Unjaya 2024))"
    }
  ]
}



# 4. MODELING

## 4.1 Model and Tokenizer Configuration

In [16]:
# QLoRA configuration (4-bit quantization)
# QLoRA makes it possible to fine-tune a large model with limited GPU memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # Quantize weights to 4-bit (about 75% less weight memory than 16-bit)
    bnb_4bit_quant_type="nf4",             # NormalFloat4, the 4-bit data type introduced with QLoRA
    bnb_4bit_compute_dtype=torch.bfloat16, # Compute dtype; note that the SFT config below uses fp16 because the T4 lacks native bf16
    bnb_4bit_use_double_quant=False        # Double quantization disabled; enabling it would save a little more memory
)
# Model configuration
base_model = "mistralai/Mistral-7B-Instruct-v0.3"
new_model_name = "riakrst/mistral-7b-pedoman-akademik-unjaya"

print(f"Loading model: {base_model}")

# Load the model with quantization
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,      # Apply the QLoRA quantization config
    torch_dtype=torch.bfloat16,          # Model dtype (wider dynamic range than float16)
    device_map="auto",                   # Automatically place the model on the available GPU(s)
    trust_remote_code=True               # Allow custom code from the model repository to run
)

# Disable the KV cache for training (re-enabled later for inference)
model.config.use_cache = False          # No cache during training, to save memory
model.config.pretraining_tp = 1         # Tensor parallelism = 1 (avoids a warning)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)

# Set up the tokenizer for the chat format
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Use the end-of-sequence token for padding
tokenizer.padding_side = "right"              # Right-side padding (standard for causal LM training)

print("Model dan tokenizer berhasil dimuat")
print(f"Vocab size: {tokenizer.vocab_size}")

loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.3/snapshots/e0bc86c23ce5aae1db576c8cca6f06f1f73af2db/config.json
Model config MistralConfig {
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": null,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.52.4",
  "use_cache": true,
  "vocab_size": 32768
}



Loading model: mistralai/Mistral-7B-Instruct-v0.3


loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.3/snapshots/e0bc86c23ce5aae1db576c8cca6f06f1f73af2db/model.safetensors.index.json
Instantiating MistralForCausalLM model under default dtype torch.bfloat16.
Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2
}

target_dtype {target_dtype} is replaced by `CustomDtype.INT4` for 4-bit BnB quantization


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

All model checkpoint weights were used when initializing MistralForCausalLM.

All the weights of MistralForCausalLM were initialized from the model checkpoint at mistralai/Mistral-7B-Instruct-v0.3.
If your task is similar to the task the model of the checkpoint was trained on, you can already use MistralForCausalLM for predictions without further training.
loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.3/snapshots/e0bc86c23ce5aae1db576c8cca6f06f1f73af2db/generation_config.json
Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2
}

loading file tokenizer.model from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.3/snapshots/e0bc86c23ce5aae1db576c8cca6f06f1f73af2db/tokenizer.model
loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.3/snapshots/e0bc86c23ce5aae1db576c8cca6f06f1f73af2db/to

Model dan tokenizer berhasil dimuat
Vocab size: 32768
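
The memory effect of 4-bit loading can be estimated with simple arithmetic. The sketch below is a rough lower bound, not a measurement: it assumes a base parameter count of 7,248,023,552 (the total minus the trainable count reported in section 4.2) and treats every weight as quantized, although embeddings, norms, and the LM head typically stay in 16-bit. The per-weight overhead figures for quantization constants follow the block-wise scheme described in the QLoRA paper (one fp32 constant per block of 64 weights, optionally re-quantized).

```python
BASE_PARAMS = 7_248_023_552  # assumption: total minus trainable, from section 4.2
GIB = 1024 ** 3

def weight_gib(n_params, bits_per_param):
    """Storage in GiB for n_params values at the given bit width."""
    return n_params * bits_per_param / 8 / GIB

fp16_gib = weight_gib(BASE_PARAMS, 16)
nf4_gib = weight_gib(BASE_PARAMS, 4)
saving = 1 - nf4_gib / fp16_gib

# Quantization constants: one fp32 absmax per block of 64 weights costs
# 32 / 64 = 0.5 extra bits per weight. Double quantization stores those
# constants in 8 bits with a second-level fp32 constant per 256 blocks.
overhead_single = weight_gib(BASE_PARAMS, 32 / 64)
overhead_double = weight_gib(BASE_PARAMS, 8 / 64 + 32 / (64 * 256))

print(f"16-bit weights        : {fp16_gib:.2f} GiB")
print(f"4-bit weights         : {nf4_gib:.2f} GiB")
print(f"Relative saving       : {saving:.0%}")
print(f"Constants (single)    : {overhead_single:.2f} GiB")
print(f"Constants (double)    : {overhead_double:.2f} GiB")
```

Under these assumptions, `bnb_4bit_use_double_quant=True` would trim the constant overhead further, which is why leaving it off does not save memory.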


## 4.2 PEFT (LoRA) Configuration

In [17]:
# Prepare the quantized model for k-bit training
model = prepare_model_for_kbit_training(model)

# LoRA (Low-Rank Adaptation) configuration
# LoRA adds small trainable matrices while the base model weights stay frozen
peft_config = LoraConfig(
    lora_alpha=16,                       # LoRA scaling factor (commonly 16 or 32); updates are scaled by lora_alpha / r
    lora_dropout=0.1,                    # Dropout on the LoRA path to reduce overfitting
    r=64,                                # Rank of the LoRA matrices (higher = more expressive, more parameters)
    bias="none",                         # Do not train bias terms (fewer parameters)
    task_type="CAUSAL_LM",               # Task type: causal language modeling
    target_modules=[                     # Layers that receive LoRA adapters
        "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj"      # attention projections plus the MLP gate projection
    ]
)

# Apply LoRA
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

trainable params: 92,274,688 || all params: 7,340,298,240 || trainable%: 1.2571
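
The trainable parameter count can be reproduced by hand. For a linear layer with `in_features` inputs and `out_features` outputs, LoRA adds two matrices with `r * in_features` and `out_features * r` entries. The sketch below uses the layer sizes from the `MistralConfig` printed in section 4.1 and assumes `head_dim = hidden_size / num_attention_heads`, since the config reports `head_dim` as null.

```python
# Sizes from the MistralConfig printed in section 4.1.
HIDDEN = 4096
INTERMEDIATE = 14336
NUM_LAYERS = 32
NUM_HEADS = 32
NUM_KV_HEADS = 8
HEAD_DIM = HIDDEN // NUM_HEADS    # assumption: head_dim is null in the config
KV_DIM = NUM_KV_HEADS * HEAD_DIM  # grouped-query attention: smaller K/V projections

R = 64
LORA_ALPHA = 16

# (in_features, out_features) of each targeted linear layer
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
}

def lora_params(in_features, out_features, r):
    """Entries in LoRA matrix A (r x in) plus matrix B (out x r)."""
    return r * (in_features + out_features)

per_layer = sum(lora_params(i, o, R) for i, o in TARGETS.values())
total_trainable = per_layer * NUM_LAYERS
trainable_pct = 100 * total_trainable / 7_340_298_240  # "all params" from the output above
scaling = LORA_ALPHA / R

print(f"trainable params: {total_trainable:,}")
print(f"trainable%: {trainable_pct:.4f}")
print(f"LoRA scaling (lora_alpha / r): {scaling}")
```

With `lora_alpha=16` and `r=64` the adapter update is scaled by 0.25; raising `lora_alpha` or lowering `r` would increase that factor.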


## 4.3 SFT Trainer Configuration

In [18]:
from trl import SFTConfig

sft_config = SFTConfig(
    output_dir = "/kaggle/working/results-pedoman-akademik",
    num_train_epochs=1,                         
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=16,
    eval_accumulation_steps=16,
    learning_rate=2e-4,
    weight_decay=0.001,

    # Logging and checkpointing
    eval_steps=10,
    save_steps=10,
    save_total_limit=1,
    logging_steps=10,
    disable_tqdm=False,

    # Evaluation and reproducibility
    eval_strategy="steps",                # evaluate every `eval_steps` steps
    seed=42,

    # Precision
    bf16=False,                                 # The T4 has no native bf16 support, so use fp16 mixed precision
    fp16=True,

    # Hugging Face Hub
    push_to_hub=True,
    hub_model_id=new_model_name,                

    # W&B monitoring
    report_to="wandb" if wandb.run else "none", # "none" disables reporting; None would fall back to the library default
    run_name=f"mistral-pedoman-{wandb.run.id}" if wandb.run else None,

    # SFT-specific params
    max_seq_length=512,
    packing=False,                              # No packing: each JSONL line is one separate dialogue
    neftune_noise_alpha=5,                      # Optional NEFTune embedding noise; can also be disabled
)

PyTorch: setting up devices
average_tokens_across_devices is set to True but it is invalid when world size is1. Turn it to False automatically.
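
The batch settings above determine how many optimizer steps and evaluations one epoch produces. The sketch below works this out by hand; it assumes a single GPU (consistent with `CUDA_VISIBLE_DEVICES="0"`) and approximates the step count with a ceiling division, which may differ slightly from the trainer's own bookkeeping when the dataset size is not a multiple of the effective batch size.

```python
import math

num_examples = 1504        # training set size from section 3.1
per_device_batch = 2
grad_accum = 16
num_devices = 1            # assumption: one visible GPU
epochs = 1
eval_steps = 10

effective_batch = per_device_batch * grad_accum * num_devices
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs

# Steps at which evaluation (and checkpointing) is triggered
eval_points = list(range(eval_steps, total_steps + 1, eval_steps))

print(f"effective batch size: {effective_batch}")
print(f"total optimizer steps: {total_steps}")
print(f"evaluation at steps: {eval_points}")
```

These values line up with the training log in section 4.4, which reports a total train batch size of 32, 47 optimization steps, and validation losses at steps 10 through 40.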


In [19]:
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,
    args=sft_config,              
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    # tokenizer=tokenizer,
    peft_config=peft_config,
)

trainer.tokenizer = tokenizer  # deprecated per the warning in the output below; newer versions use `processing_class`

loading file tokenizer.model from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.3/snapshots/e0bc86c23ce5aae1db576c8cca6f06f1f73af2db/tokenizer.model
loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.3/snapshots/e0bc86c23ce5aae1db576c8cca6f06f1f73af2db/tokenizer.json
loading file added_tokens.json from cache at None
loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.3/snapshots/e0bc86c23ce5aae1db576c8cca6f06f1f73af2db/special_tokens_map.json
loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.3/snapshots/e0bc86c23ce5aae1db576c8cca6f06f1f73af2db/tokenizer_config.json
loading file chat_template.jinja from cache at None


Tokenizing train dataset:   0%|          | 0/1504 [00:00<?, ? examples/s]

Truncating train dataset:   0%|          | 0/1504 [00:00<?, ? examples/s]

Tokenizing eval dataset:   0%|          | 0/122 [00:00<?, ? examples/s]

Truncating eval dataset:   0%|          | 0/122 [00:00<?, ? examples/s]

Using auto half precision backend
No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
Trainer.tokenizer is now deprecated. You should use `Trainer.processing_class = processing_class` instead.


## 4.4 Training

In [20]:
# Avoid an error during model card creation by replacing the method with a no-op
trainer.create_model_card = lambda *args, **kwargs: None

print("Memulai fine-tuning...")

# Log hyperparameters to W&B
if wandb.run:
    wandb.config.update({
        "base_model": base_model,
        "dataset_size": len(train_dataset),
        "eval_size": len(eval_dataset),
        "lora_r": peft_config.r,
        "lora_alpha": peft_config.lora_alpha,
        "learning_rate": sft_config.learning_rate,  
        "batch_size": sft_config.per_device_train_batch_size,  
        "epochs": sft_config.num_train_epochs,  
        "max_seq_length": sft_config.max_seq_length,
        "model_name": new_model_name,
    })
# Training
trainer.train()

The following columns in the Training set don't have a corresponding argument in `PeftModelForCausalLM.forward` and have been ignored: messages. If messages are not expected by `PeftModelForCausalLM.forward`,  you can safely ignore this message.


Memulai fine-tuning...


***** Running training *****
  Num examples = 1,504
  Num Epochs = 1
  Instantaneous batch size per device = 2
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 16
  Total optimization steps = 47
  Number of trainable parameters = 92,274,688
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"


Step,Training Loss,Validation Loss
10,2.8778,1.592857
20,1.3835,1.158748
30,1.1187,1.009609
40,0.9971,0.944186


The following columns in the Evaluation set don't have a corresponding argument in `PeftModelForCausalLM.forward` and have been ignored: messages. If messages are not expected by `PeftModelForCausalLM.forward`,  you can safely ignore this message.

***** Running Evaluation *****
  Num examples = 122
  Batch size = 2
Saving model checkpoint to /kaggle/working/results-pedoman-akademik/checkpoint-10

TrainOutput(global_step=47, training_loss=1.4998841691524425, metrics={'train_runtime': 2268.45, 'train_samples_per_second': 0.663, 'train_steps_per_second': 0.021, 'total_flos': 1.2234108777086976e+16, 'train_loss': 1.4998841691524425})

In [21]:
trainer.evaluate()

The following columns in the Evaluation set don't have a corresponding argument in `PeftModelForCausalLM.forward` and have been ignored: messages. If messages are not expected by `PeftModelForCausalLM.forward`,  you can safely ignore this message.

***** Running Evaluation *****
  Num examples = 122
  Batch size = 2


{'eval_loss': 0.9180750846862793,
 'eval_runtime': 57.3104,
 'eval_samples_per_second': 2.129,
 'eval_steps_per_second': 1.064}
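
The evaluation loss can be turned into a perplexity, which is sometimes easier to interpret. The sketch below assumes `eval_loss` is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training with the Hugging Face trainer.

```python
import math

eval_loss = 0.9180750846862793  # from trainer.evaluate() above

# Perplexity is the exponential of the mean per-token cross-entropy.
perplexity = math.exp(eval_loss)

print(f"eval perplexity: {perplexity:.2f}")
```

Because the evaluation questions paraphrase items whose answers are in the training set, this figure is best read as an in-distribution value rather than a measure of generalization.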

# 5. EVALUATION

In [32]:
# Switch the model to evaluation mode and re-enable the KV cache
model.config.use_cache = True
model.eval()

# Load BERTScore
bertscore = evaluate.load("bertscore")

# Sample the evaluation data
sample_size = min(15, len(eval_dataset))
sample_data = eval_dataset.shuffle(seed=42).select(range(sample_size))

# Extract the prompts and reference answers
prompts = [ex["messages"][0]["content"] for ex in sample_data]  # user message
references = [ex["messages"][1]["content"] for ex in sample_data]  # assistant message

# Set up the pipeline (no `device` argument; the model is already placed on the GPU)
pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    return_full_text=False
)

# Generate answers (sampling is enabled, so outputs vary between runs)
print(f"Generating {sample_size} responses...")
generated = []

for i, prompt in enumerate(prompts):
    print(f"Progress: {i+1}/{sample_size}")
    
    try:
        chat = [{"role": "user", "content": prompt}]
        input_text = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
        
        output = pipe(
            input_text,
            max_new_tokens=256,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id
        )
        
        generated.append(output[0]["generated_text"].strip())
        
    except Exception as e:
        print(f"Error pada sample {i+1}: {e}")
        generated.append("")

Device set to use cuda:0


Generating 15 responses...
Progress: 1/15
Progress: 2/15
Progress: 3/15
Progress: 4/15
Progress: 5/15
Progress: 6/15
Progress: 7/15
Progress: 8/15
Progress: 9/15
Progress: 10/15


You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


Progress: 11/15
Progress: 12/15
Progress: 13/15
Progress: 14/15
Progress: 15/15
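
BERTScore, the metric used for this evaluation, compares a candidate and a reference token by token in embedding space: precision averages each candidate token's best cosine similarity to any reference token, recall does the same from the reference side, and F1 is their harmonic mean. The sketch below shows only that greedy matching step on hypothetical 2-D toy vectors. The real metric obtains contextual embeddings from a pretrained model and can apply IDF weighting and baseline rescaling, all of which are omitted here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def greedy_match_scores(cand_vecs, ref_vecs):
    """Precision, recall, and F1 from greedy token matching."""
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(cosine(c, r) for r in ref_vecs) for c in cand_vecs) / len(cand_vecs)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(cosine(r, c) for c in cand_vecs) for r in ref_vecs) / len(ref_vecs)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical toy "token embeddings" (real ones have hundreds of dimensions).
reference_vecs = [(1.0, 0.0), (0.0, 1.0)]
candidate_vecs = [(1.0, 0.0), (1.0, 0.0), (1.0, 1.0)]

toy_precision, toy_recall, toy_f1 = greedy_match_scores(candidate_vecs, reference_vecs)
print(f"P={toy_precision:.4f} R={toy_recall:.4f} F1={toy_f1:.4f}")
```

In the toy case recall is lower than precision because one reference token has no close match in the candidate, which is the same pattern a generated answer shows when it omits part of the reference.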


In [37]:
import numpy as np

# Compute BERTScore
print("\nMenghitung BERTScore...")

try:
    results = bertscore.compute(
        predictions=generated,
        references=references,
        lang="id"  # Indonesian
    )

    # Compute the averages
    avg_f1 = np.mean(results["f1"])
    avg_precision = np.mean(results["precision"])
    avg_recall = np.mean(results["recall"])

    # Build a DataFrame holding the full, untruncated text.
    # Named df_scores so it does not overwrite the dataset DataFrame `df`.
    df_scores = pd.DataFrame({
        'Sample': [f"S{i+1}" for i in range(len(generated))],
        'F1': [f"{score:.3f}" for score in results['f1']],
        'Precision': [f"{score:.3f}" for score in results['precision']],
        'Recall': [f"{score:.3f}" for score in results['recall']],
        'Prompt': prompts,
        'Reference': references,
        'Prediksi': generated
    })

    # Show all columns at full width so the text is not cut off
    pd.set_option('display.max_columns', None)
    pd.set_option('display.max_colwidth', None)
    pd.set_option('display.width', None)

    print("\n" + "="*100)
    print("HASIL EVALUASI BERTSCORE (Format Per Sample)")
    print("="*100)

    for i, row in df_scores.iterrows():
        print(f"\n📌 Sample {row['Sample']}")
        print(f"🔹 F1       : {row['F1']}")
        print(f"🔹 Precision: {row['Precision']}")
        print(f"🔹 Recall   : {row['Recall']}")
        print(f"🔸 Prompt   : {row['Prompt']}")
        print(f"🔸 Reference:\n{row['Reference']}")
        print(f"🔸 Prediksi :\n{row['Prediksi']}")
        print("-" * 100)


    # Show the averages
    print("\nRATA-RATA SKOR:")
    print(f"   F1 Score  : {avg_f1:.4f}")
    print(f"   Precision : {avg_precision:.4f}")
    print(f"   Recall    : {avg_recall:.4f}")
    
    # Show the best and worst samples
    best_idx = np.argmax(results['f1'])
    worst_idx = np.argmin(results['f1'])
    
    print(f"\nSAMPLE TERBAIK (F1: {results['f1'][best_idx]:.3f}):")
    print(f"   Prompt: {prompts[best_idx]}")
    print(f"   Reference: {references[best_idx]}")
    print(f"   Prediksi: {generated[best_idx]}")
    
    print(f"\nSAMPLE TERBURUK (F1: {results['f1'][worst_idx]:.3f}):")
    print(f"   Prompt: {prompts[worst_idx]}")
    print(f"   Reference: {references[worst_idx]}")
    print(f"   Prediksi: {generated[worst_idx]}")
    
    # Log to W&B
    if wandb.run:
        wandb.log({
            "eval/bertscore_f1": avg_f1,
            "eval/bertscore_precision": avg_precision,
            "eval/bertscore_recall": avg_recall
        })
        print("\nHasil sudah di-log ke wandb")
    
    print("\nEvaluasi selesai!")
    
except Exception as e:
    print(f"Error saat menghitung BERTScore: {e}")
    print("Coba gunakan model default atau cek koneksi internet")

--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.11/logging/__init__.py", line 1114, in emit
    self.flush()
  File "/usr/lib/python3.11/logging/__init__.py", line 1094, in flush
    self.stream.flush()
OSError: [Errno 28] No space left on device
Call stack:
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.11/dist-packages/colab_kernel_launcher.py", line 37, in <module>
    ColabKernelApp.launch_instance()
  File "/usr/local/lib/python3.11/dist-packages/traitlets/config/application.py", line 992, in launch_instance
    app.start()
  File "/usr/local/lib/python3.11/dist-packages/ipykernel/kernelapp.py", line 712, in start
    self.io_loop.start()
  File "/usr/local/lib/python3.11/dist-packages/tornado/platform/asyncio.py", line 211, in start
    self.asyncio_loop.run_forever()
  File "/usr/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
    self._run_


Menghitung BERTScore...

HASIL EVALUASI BERTSCORE (Format Per Sample)

📌 Sample S1
🔹 F1       : 0.663
🔹 Precision: 0.673
🔹 Recall   : 0.654
🔸 Prompt   : Bisa dijelaskan berapa semester dalam setahun dan adakah semester pendek?
🔸 Reference:
Satu tahun akademik terdiri atas 2 (dua) semester dan perguruan tinggi dapat menyelenggarakan semester antara.

(Sumber: Bab II Ketentuan Umum, Pasal 2 Daftar Istilah Ayat (38), Hal. 11 (Pedoman Akademik Unjaya 2024))
🔸 Prediksi :
Dalam sistem akademik yang biasa, satu tahun akademik terdiri dari dua atau tiga semester. Semester adalah periode waktu yang pendek, biasanya mengikuti tahun angka, di mana mahasiswa akan mempelajari materi tertentu selama semester tersebut. Jumlah semester dalam satu tahun berbeda-beda bergantung pada institusi pendidikan yang dilamar.

Contohnya, di beberapa universiti, satu tahun akademik terdiri dari tiga semester yang panjang (semester pertama, semester kedua, dan semester ketiga). Namun, di beberapa universiti lain,


In [38]:
# Convert the F1 scores to a NumPy array
f1_scores = np.array(results["f1"])

# Classify an F1 score into a range
def classify_f1(f1):
    if f1 >= 0.70:
        return "≥ 0.70"
    elif f1 >= 0.60:
        return "0.60–0.69"
    else:
        return "< 0.60"

f1_categories = [classify_f1(score) for score in f1_scores]

# Count how many samples fall into each range
from collections import Counter

f1_distribution = Counter(f1_categories)
total_samples = len(f1_scores)

# Build the classification report DataFrame
classification_df = pd.DataFrame([
    {
        "Range F1 Score": k,
        "Jumlah Sampel": v,
        "Persentase (%)": f"{(v/total_samples)*100:.1f}%"
    }
    for k, v in sorted(f1_distribution.items(), reverse=True)
])

# Summary statistics
f1_mean = np.mean(f1_scores)
f1_median = np.median(f1_scores)
f1_min = np.min(f1_scores)
f1_max = np.max(f1_scores)
f1_std = np.std(f1_scores)

# Display
print("\n" + "="*50)
print("BERTScore Classification Report")
print("="*50)
print(classification_df.to_string(index=False))

print("\nStatistik Ringkasan F1:")
print(f"  Mean   : {f1_mean:.4f}")
print(f"  Median : {f1_median:.4f}")
print(f"  Min    : {f1_min:.4f}")
print(f"  Max    : {f1_max:.4f}")
print(f"  Std Dev: {f1_std:.4f}")



BERTScore Classification Report
Range F1 Score  Jumlah Sampel Persentase (%)
        < 0.60              1           6.7%
     0.60–0.69             14          93.3%

Statistik Ringkasan F1:
  Mean   : 0.6421
  Median : 0.6432
  Min    : 0.5989
  Max    : 0.6737
  Std Dev: 0.0179
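
The report above lists the `< 0.60` row before `0.60–0.69` because the bucket labels are sorted as strings. One way to keep the buckets in numeric order is sketched below using only the standard library. The names `BUCKET_EDGES`, `BUCKET_LABELS`, and `bucket_report`, as well as the sample scores, are hypothetical and not taken from the notebook.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bucket definition mirroring the thresholds used in classify_f1.
BUCKET_EDGES = [0.60, 0.70]                        # lower bounds of buckets 2 and 3
BUCKET_LABELS = ["< 0.60", "0.60–0.69", "≥ 0.70"]  # listed in ascending numeric order


def bucket_report(scores):
    """Return (label, count, percent) rows in ascending bucket order."""
    # bisect_right maps a score to the index of its bucket: a score equal to
    # an edge (e.g. exactly 0.60) lands in the bucket that starts at that edge.
    counts = Counter(bisect_right(BUCKET_EDGES, s) for s in scores)
    total = len(scores)
    return [
        (label, counts[i], 100.0 * counts[i] / total if total else 0.0)
        for i, label in enumerate(BUCKET_LABELS)
    ]


# Illustrative scores only (not the notebook's evaluation data).
sample_scores = [0.5989, 0.6432, 0.6737, 0.60, 0.70]
for label, count, pct in bucket_report(sample_scores):
    print(f"{label:>10}  {count:>3}  {pct:5.1f}%")
```

Because the row order comes from `BUCKET_LABELS` rather than from sorting the labels, empty buckets also appear in the report with a count of zero.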


# 6. SAVE MODEL

In [24]:
# Save the LoRA adapter and the tokenizer to adapter_dir
adapter_dir = "/kaggle/working/adapter-pedoman-akademik"
trainer.save_model(adapter_dir)
tokenizer.save_pretrained(adapter_dir)

Saving model checkpoint to /kaggle/working/adapter-pedoman-akademik
loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-Instruct-v0.3/snapshots/e0bc86c23ce5aae1db576c8cca6f06f1f73af2db/config.json
Model config MistralConfig {
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": null,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.52.4",
  "use_cache": true,
  "vocab_size": 32768
}

chat template saved in /kaggle/working/adapter-pedoman-akademik/chat_template.jinja
tokenizer confi

('/kaggle/working/adapter-pedoman-akademik/tokenizer_config.json',
 '/kaggle/working/adapter-pedoman-akademik/special_tokens_map.json',
 '/kaggle/working/adapter-pedoman-akademik/chat_template.jinja',
 '/kaggle/working/adapter-pedoman-akademik/tokenizer.model',
 '/kaggle/working/adapter-pedoman-akademik/added_tokens.json',
 '/kaggle/working/adapter-pedoman-akademik/tokenizer.json')

In [26]:
# Merge the LoRA adapter into the base model
merged_model = trainer.model.merge_and_unload()

# Name and local path of the full (merged) model
full_model_name = f"{new_model_name}-merged"
full_model_dir = f"/kaggle/working/{full_model_name}"

# Save the full model and tokenizer locally
merged_model.save_pretrained(full_model_dir)
tokenizer.save_pretrained(full_model_dir)

Configuration saved in /kaggle/working/riakrst/mistral-7b-pedoman-akademik-unjaya-merged/config.json
Configuration saved in /kaggle/working/riakrst/mistral-7b-pedoman-akademik-unjaya-merged/generation_config.json
The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at /kaggle/working/riakrst/mistral-7b-pedoman-akademik-unjaya-merged/model.safetensors.index.json.
chat template saved in /kaggle/working/riakrst/mistral-7b-pedoman-akademik-unjaya-merged/chat_template.jinja
tokenizer config file saved in /kaggle/working/riakrst/mistral-7b-pedoman-akademik-unjaya-merged/tokenizer_config.json
Special tokens file saved in /kaggle/working/riakrst/mistral-7b-pedoman-akademik-unjaya-merged/special_tokens_map.json


('/kaggle/working/riakrst/mistral-7b-pedoman-akademik-unjaya-merged/tokenizer_config.json',
 '/kaggle/working/riakrst/mistral-7b-pedoman-akademik-unjaya-merged/special_tokens_map.json',
 '/kaggle/working/riakrst/mistral-7b-pedoman-akademik-unjaya-merged/chat_template.jinja',
 '/kaggle/working/riakrst/mistral-7b-pedoman-akademik-unjaya-merged/tokenizer.model',
 '/kaggle/working/riakrst/mistral-7b-pedoman-akademik-unjaya-merged/added_tokens.json',
 '/kaggle/working/riakrst/mistral-7b-pedoman-akademik-unjaya-merged/tokenizer.json')
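
The save log above notes that the merged checkpoint exceeds the 5 GB shard limit. As a rough sanity check, the parameter count can be estimated from the `MistralConfig` values printed earlier. The sketch below assumes a standard Mistral-style decoder layout (no bias terms, two RMSNorm weights per layer, a gated MLP with three projection matrices); the function name `count_parameters` is hypothetical.

```python
# Values copied from the MistralConfig printed in the log above.
config = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "vocab_size": 32768,
    "tie_word_embeddings": False,
}


def count_parameters(cfg):
    """Count weights of a Mistral-style decoder (assumed: no biases, RMSNorm, gated MLP)."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]      # head_dim is null in the config
    kv_dim = head_dim * cfg["num_key_value_heads"]  # grouped-query attention

    attention = 2 * h * h + 2 * h * kv_dim          # q_proj, o_proj + k_proj, v_proj
    mlp = 3 * h * cfg["intermediate_size"]          # gate_proj, up_proj, down_proj
    norms = 2 * h                                   # two RMSNorm weights per layer
    per_layer = attention + mlp + norms

    embeddings = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    final_norm = h
    return cfg["num_hidden_layers"] * per_layer + embeddings + lm_head + final_norm


total_params = count_parameters(config)
bf16_gb = total_params * 2 / 1e9  # 2 bytes per parameter in bfloat16

print(f"Parameters : {total_params:,}")
print(f"bf16 size  : {bf16_gb:.1f} GB")
```

Under these assumptions the estimate is about 7.25 billion parameters, or roughly 14.5 GB in bfloat16. The two shards listed in the upload log (4.46 GB and 537 MB) add up to far less, which would be consistent with the merged weights having been saved in a quantized form, although the log itself does not confirm this.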

In [27]:
# Disabled because the merged model is very large
# (the recorded output below comes from a run before these lines were commented out).
# # Upload the merged model to the Hugging Face Hub
# merged_model.push_to_hub(full_model_name)
# tokenizer.push_to_hub(full_model_name)
# print(f"Full model berhasil disimpan di: {full_model_dir}")
# print(f"Full model berhasil diupload ke: https://huggingface.co/{full_model_name}")

Configuration saved in /tmp/tmpt2tjx1at/config.json
Configuration saved in /tmp/tmpt2tjx1at/generation_config.json
The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at /tmp/tmpt2tjx1at/model.safetensors.index.json.
Uploading the following files to riakrst/mistral-7b-pedoman-akademik-unjaya-merged: generation_config.json,config.json,README.md,model-00001-of-00002.safetensors,model-00002-of-00002.safetensors,model.safetensors.index.json


Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/537M [00:00<?, ?B/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.46G [00:00<?, ?B/s]

README.md: 0.00B [00:00, ?B/s]

chat template saved in /tmp/tmp6tcc5ybv/chat_template.jinja
tokenizer config file saved in /tmp/tmp6tcc5ybv/tokenizer_config.json
Special tokens file saved in /tmp/tmp6tcc5ybv/special_tokens_map.json
Uploading the following files to riakrst/mistral-7b-pedoman-akademik-unjaya-merged: special_tokens_map.json,tokenizer.json,tokenizer_config.json,chat_template.jinja,tokenizer.model,README.md


tokenizer.model:   0%|          | 0.00/587k [00:00<?, ?B/s]

Full model berhasil disimpan di: /kaggle/working/riakrst/mistral-7b-pedoman-akademik-unjaya-merged
Full model berhasil diupload ke: https://huggingface.co/riakrst/mistral-7b-pedoman-akademik-unjaya-merged


In [28]:
# Upload the adapter if push_to_hub is enabled in the config
if sft_config.push_to_hub:
    trainer.push_to_hub()
    print(f"Adapter (LoRA) juga berhasil diupload ke: https://huggingface.co/{new_model_name}")
else:
    print("Adapter tidak diupload otomatis (karena sft_config.push_to_hub = False)")
    print(f"Untuk upload manual: trainer.model.push_to_hub('{new_model_name}')")


Saving model checkpoint to /kaggle/working/results-pedoman-akademik
chat template saved in /kaggle/working/results-pedoman-akademik/chat_template.jinja
tokenizer confi

adapter_model.safetensors:   0%|          | 0.00/40.0 [00:00<?, ?B/s]

Adapter (LoRA) juga berhasil diupload ke: https://huggingface.co/riakrst/mistral-7b-pedoman-akademik-unjaya


In [29]:
print("PROSES MERGE DAN UPLOAD SELESAI!")
print("="*60)
print(f"Full model tersedia di:   https://huggingface.co/{full_model_name}")
print(f"LoRA adapter :  https://huggingface.co/{new_model_name}")
print("\nUntuk inference:")
print(f"model = AutoModelForCausalLM.from_pretrained('{full_model_name}')")

PROSES MERGE DAN UPLOAD SELESAI!
Full model tersedia di:   https://huggingface.co/riakrst/mistral-7b-pedoman-akademik-unjaya-merged
LoRA adapter :  https://huggingface.co/riakrst/mistral-7b-pedoman-akademik-unjaya

Untuk inference:
model = AutoModelForCausalLM.from_pretrained('riakrst/mistral-7b-pedoman-akademik-unjaya-merged')


In [54]:
!du -sh /root/.cache
!du -sh ~/.cache
!du -sh /kaggle/working/*  # Re-check disk usage


33G	/root/.cache
33G	/root/.cache
357M	/kaggle/working/adapter-pedoman-akademik
0	/kaggle/working/adapter-pedoman-akademik.zip
56K	/kaggle/working/eval_chatml.jsonl
0	/kaggle/working/mistral-7b-pedoman-akademik-unjaya-merged.zip
1.1G	/kaggle/working/results-pedoman-akademik
4.7G	/kaggle/working/riakrst
692K	/kaggle/working/train_chatml.jsonl
480K	/kaggle/working/wandb


In [59]:
!du -sh /root/.cache/* | sort -hr | head -20


16G	/root/.cache/uv
15G	/root/.cache/huggingface
1.7G	/root/.cache/pip
310M	/root/.cache/jedi
56M	/root/.cache/node-gyp
96K	/root/.cache/matplotlib
20K	/root/.cache/wandb


In [61]:
import shutil

# Location of the adapter folder and the target zip file
adapter_dir = "/kaggle/working/adapter-pedoman-akademik"
adapter_zip = "/kaggle/working/adapter-pedoman-akademik.zip"

# Create a zip archive of the adapter folder
shutil.make_archive(adapter_zip.replace(".zip", ""), 'zip', adapter_dir)
print(f"✅ Adapter berhasil diarsipkan: {adapter_zip}")

✅ Adapter berhasil diarsipkan: /kaggle/working/adapter-pedoman-akademik.zip


In [None]:
# The model is too large to download locally
# Location of the full model folder
full_model_dir = "/kaggle/working/riakrst/mistral-7b-pedoman-akademik-unjaya-merged"
full_model_zip = "/kaggle/working/mistral-7b-pedoman-akademik-unjaya-merged.zip"

# Create a zip archive of the full model folder
shutil.make_archive(full_model_zip.replace(".zip", ""), 'zip', full_model_dir)
print(f"✅ Full merged model berhasil diarsipkan: {full_model_zip}")

✅ Full merged model berhasil diarsipkan: /kaggle/working/mistral-7b-pedoman-akademik-unjaya-merged.zip
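
The earlier `du` listing shows both `.zip` files at 0 bytes alongside repeated `No space left on device` errors, which suggests that archiving may have run out of disk space at some point. A minimal sketch of a guard that checks free space before calling `shutil.make_archive` follows. The helper name `archive_if_space` and its `safety_factor` parameter are hypothetical, and the required space is only approximated by the uncompressed size of the source folder.

```python
import os
import shutil
import tempfile


def archive_if_space(src_dir, zip_base, safety_factor=1.0):
    """Zip src_dir to zip_base + '.zip' only if enough free space is available.

    Returns the archive path, or None when the check fails. The required space
    is approximated as the uncompressed size of src_dir times safety_factor.
    """
    needed = safety_factor * sum(
        os.path.getsize(os.path.join(root, name))
        for root, _dirs, files in os.walk(src_dir)
        for name in files
    )
    target_dir = os.path.dirname(os.path.abspath(zip_base))
    free = shutil.disk_usage(target_dir).free
    if free < needed:
        print(f"Skipping archive: need ~{needed / 1e9:.2f} GB, only {free / 1e9:.2f} GB free")
        return None
    return shutil.make_archive(zip_base, "zip", src_dir)


# Demonstration with temporary folders instead of /kaggle/working.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as out:
    with open(os.path.join(src, "weights.bin"), "wb") as fh:
        fh.write(b"\0" * 1024)
    archive_path = archive_if_space(src, os.path.join(out, "demo-archive"))
    print("Archive created:", archive_path is not None)
```

Safetensors weights usually compress poorly, so using the uncompressed size as the estimate is a conservative but reasonable choice here.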


# 7. SIMPLE INFERENCE

In [55]:
# Example inference
test_questions = [
    "Bagaimana cara mengajukan cuti akademik?",
    "Berapa lama masa studi maksimal untuk S1?",
]

print("\nContoh Inference:")
print("=" * 50)

for i, question in enumerate(test_questions, 1):
    try:
        messages = [{"role": "user", "content": question}]
        
        result = pipe(
            messages,
            max_new_tokens=256,
            do_sample=True,
            temperature=0.7,
            top_p=0.95,
            pad_token_id=tokenizer.eos_token_id
        )
        
        response = result[0]['generated_text'].strip()
        
        print(f"\n{i}. Pertanyaan: {question}")
        print(f"   Jawaban: {response}")
        print("-" * 50)
        
    except Exception as e:
        print(f"Error pada pertanyaan {i}: {e}")



Contoh Inference:

1. Pertanyaan: Bagaimana cara mengajukan cuti akademik?
   Jawaban: Untuk mengajukan cuti akademik di sekolah atau perusahaan, anda perlu menyusun surat permohonan cuti akademik yang sesuai. Berikut adalah langkah-langkah umum yang dapat dilakukan:

1. Tempatkan surat permohonan cuti akademik di bawah anda.
2. Tulis alamat, nomor telepon, dan email anda di bagian atas surat.
3. Tulis nama institusi atau perusahaan yang anda kerjakan di bagian atas surat.
4. Tulis nama pengurus atau pengelola yang bertanggung jawab atas anda di bagian atas surat.
5. Tulis tanggal surat di bagian atas surat.
6. Tulis tujuan cuti akademik anda di bagian pertama surat. Contoh: "Saya mengajukan cuti ak
--------------------------------------------------

2. Pertanyaan: Berapa lama masa studi maksimal untuk S1?
   Jawaban: Masa studi maksimal untuk S1 di Indonesia adalah 4 tahun atau 8 semester, mengikut Aturan Ketetapan Pemerintah Nomor 2 Tahun 2012 tentang Peraturan Pemerintah Nomor 123 
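
The inference cell above calls `.strip()` on `generated_text`, which assumes a string. In recent `transformers` versions, a text-generation pipeline that receives chat messages can instead return the whole conversation as a list of message dictionaries, depending on how the pipeline was configured (for example via `return_full_text`). How `pipe` was created is not shown in this section, so the following is only a sketch under that assumption; `extract_reply` and the mock outputs are hypothetical.

```python
def extract_reply(pipeline_output):
    """Return the assistant's text from a text-generation pipeline result.

    Handles two shapes of `generated_text`:
      * a plain string (only the generated text is returned), and
      * a list of chat messages whose last entry is the assistant's reply.
    """
    generated = pipeline_output[0]["generated_text"]
    if isinstance(generated, str):
        return generated.strip()
    if isinstance(generated, list) and generated:
        last = generated[-1]
        if isinstance(last, dict) and last.get("role") == "assistant":
            return str(last.get("content", "")).strip()
    raise ValueError(f"Unexpected generated_text format: {type(generated).__name__}")


# Mock results standing in for real pipeline output (illustrative only).
as_string = [{"generated_text": "  Cuti akademik diajukan melalui ...  "}]
as_chat = [{
    "generated_text": [
        {"role": "user", "content": "Bagaimana cara mengajukan cuti akademik?"},
        {"role": "assistant", "content": " Cuti akademik diajukan melalui ... "},
    ]
}]

print(extract_reply(as_string))
print(extract_reply(as_chat))
```

In the loop above, `response = extract_reply(result)` would then work for either output shape.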

In [70]:
# Cleanup
if wandb.run:
    wandb.finish()

Run history:
eval/bertscore_f1          ▁▁▁
eval/bertscore_precision   ▁▁▁
eval/bertscore_recall      ▁▁▁
eval/loss                  █▃▂▁▁
eval/mean_token_accuracy   ▁▅▇██
eval/num_tokens            ▁▃▅▇█
eval/runtime               ▃▁▃▄█
eval/samples_per_second    ▆█▆▅▁
eval/steps_per_second      ▆█▆▆▁
train/epoch                ▁▁▃▃▅▅▇▇██

Run summary:
eval/bertscore_f1          0.64212
eval/bertscore_precision   0.63229
eval/bertscore_recall      0.65254
eval/loss                  0.91808
eval/mean_token_accuracy   0.80338
eval/num_tokens            239036.0
eval/runtime               57.3104
eval/samples_per_second    2.129
eval/steps_per_second      1.064
total_flos                 1.2234108777086976e+16
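
BERTScore F1 is computed per sample as the harmonic mean of precision and recall, so the mean F1 logged above does not have to equal the harmonic mean of the mean precision and mean recall, though the two should be close. The sketch below checks this with the values from the W&B output; the function name `f1_from` is hypothetical.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Mean values copied from the W&B output above.
mean_precision = 0.63229
mean_recall = 0.65254
reported_mean_f1 = 0.64212

f1_of_means = f1_from(mean_precision, mean_recall)
print(f"F1 of mean P/R : {f1_of_means:.5f}")
print(f"Reported mean  : {reported_mean_f1:.5f}")
print(f"Difference     : {abs(f1_of_means - reported_mean_f1):.5f}")
```

The two figures differ only in the fourth decimal place, which is what one would expect when the per-sample scores are tightly clustered, as the low standard deviation in the classification report indicates.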


In [69]:
# Free GPU memory
import gc

del trainer
del merged_model
gc.collect()  # drop lingering references before emptying the CUDA cache
torch.cuda.empty_cache()

In [71]:
# Set logging back to normal
logging.set_verbosity(logging.WARNING)