# 🤖 Experiment 2: Hybrid ASR + LLM Post-Processing
**Researcher:** Muhammad Hendika Putra  
**Method:** ASR pipeline (Whisper) -> text correction (Gemini 2.5 Flash)  
**Dataset:** Google FLEURS (Indonesian)

---
### 💡 Hypothesis
ASR models often produce "raw" text (no punctuation or capitalization) or misspell named entities. An LLM can act as a *post-processor* that repairs sentence structure without changing the meaning, which should lower the **WER (Word Error Rate)** and improve readability.
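
For reference, WER counts the word-level substitutions $S$, deletions $D$, and insertions $I$ needed to turn the hypothesis into the reference, divided by the number of reference words $N$: $\mathrm{WER} = (S + D + I) / N$. The sketch below is a minimal pure-Python version for illustration only (the notebook itself relies on `jiwer.wer`; `simple_wer` is a hypothetical helper). It shows why restoring punctuation can move the score: with whitespace tokenization, `halo,` and `halo` are different words.

```python
def simple_wer(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (classic Levenshtein DP).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


reference = "halo, nama saya budi."
raw_asr = "halo nama saya budi"
corrected = "halo, nama saya budi."

print(simple_wer(reference, raw_asr))    # 0.5
print(simple_wer(reference, corrected))  # 0.0
```

Two of the four reference tokens carry punctuation, so the unpunctuated transcript scores 0.5 even though every spoken word is right.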

### ⚙️ Workflow (Pipeline)
1.  **Audio input:** Human speech (FLEURS dataset).
2.  **ASR engine:** Converts speech into a raw transcript.
3.  **LLM corrector:** Sends the raw transcript to Gemini with a dedicated correction prompt (punctuation, capitalization, obvious typos).
4.  **Evaluation:** Compares the WER of (raw vs. ground truth) against (LLM vs. ground truth).

Besides the audio libraries, we need `google-generativeai` to access the Gemini API. We also keep `datasets==2.19.0` pinned to avoid the `trust_remote_code` error encountered earlier.

In [18]:
# @title 1. Install Libraries & Gemini SDK
%pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
%pip install "datasets==2.19.0" transformers librosa soundfile torch accelerate jiwer pandas seaborn matplotlib google-generativeai --quiet

print("✅ Environment ready.")

Looking in indexes: https://download.pytorch.org/whl/cu124
Note: you may need to restart the kernel to use updated packages.

[notice] A new release of pip is available: 25.2 -> 25.3
[notice] To update, run: python.exe -m pip install --upgrade pip

Note: you may need to restart the kernel to use updated packages.
✅ Environment ready.


In [19]:
# @title 2. Setup Google Gemini API

import getpass
import os
import google.generativeai as genai

# Read the API key without echoing it to the screen
print("Enter your Google AI Studio API key:")
GOOGLE_API_KEY = getpass.getpass(prompt="").strip()

# Configure the library
genai.configure(api_key=GOOGLE_API_KEY)

print("Models available to you:")
for m in genai.list_models():
    if 'generateContent' in m.supported_generation_methods:
        print(m.name)

# Simple connection test
try:
    model_test = genai.GenerativeModel('gemini-2.5-flash')
    response = model_test.generate_content("Greet me in one word.")
    print(f"✅ Connection successful! Gemini replied: {response.text}")
except Exception as e:
    print(f"❌ Connection failed: {e}")

Enter your Google AI Studio API key:
Models available to you:
models/gemini-2.5-flash
models/gemini-2.5-pro
models/gemini-2.0-flash-exp
models/gemini-2.0-flash
models/gemini-2.0-flash-001
models/gemini-2.0-flash-exp-image-generation
models/gemini-2.0-flash-lite-001
models/gemini-2.0-flash-lite
models/gemini-2.0-flash-lite-preview-02-05
models/gemini-2.0-flash-lite-preview
models/gemini-exp-1206
models/gemini-2.5-flash-preview-tts
models/gemini-2.5-pro-preview-tts
models/gemma-3-1b-it
models/gemma-3-4b-it
models/gemma-3-12b-it
models/gemma-3-27b-it
models/gemma-3n-e4b-it
models/gemma-3n-e2b-it
models/gemini-flash-latest
models/gemini-flash-lite-latest
models/gemini-pro-latest
models/gemini-2.5-flash-lite
models/gemini-2.5-flash-image
models/gemini-2.5-flash-preview-09-2025
models/gemini-2.5-flash-lite-preview-09-2025
models/gemini-3-pro-preview
models/gemini-3-flash-preview
models/gemini-3-pro-image-preview
models/nano-banana-pro-preview
models/gemini-robotics-e

Prompt engineering: note the `prompt` variable inside `LLMCorrector.correct_text`. It gives the model specific instructions: fix punctuation, capitalization, and obvious typos, but do NOT change the word order or the meaning. This matters because it discourages the LLM from rewriting the text freely (hallucinating).
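
A prompt alone cannot guarantee that the model leaves the words untouched, so one possible safeguard is to compare the word sequences before and after correction and fall back to the raw transcript when they diverge. The sketch below is an assumption-laden illustration, not part of the notebook's pipeline: `normalize_words`, `words_preserved`, and the `0.9` threshold are hypothetical names and values.

```python
import difflib
import re


def normalize_words(text: str) -> list[str]:
    """Lowercase and keep only word tokens, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def words_preserved(raw_text: str, corrected_text: str, min_ratio: float = 0.9) -> bool:
    """Return True if the corrected text keeps (nearly) the same word sequence.

    min_ratio is an illustrative threshold, not a tuned value.
    """
    raw_words = normalize_words(raw_text)
    corrected_words = normalize_words(corrected_text)
    matcher = difflib.SequenceMatcher(a=raw_words, b=corrected_words, autojunk=False)
    return matcher.ratio() >= min_ratio


raw = "halo nama saya budi"
print(words_preserved(raw, "Halo, nama saya Budi."))                 # True
print(words_preserved(raw, "Selamat pagi, perkenalkan saya Budi."))  # False
```

A threshold below 1.0 leaves room for the typo fixes the prompt explicitly allows; a stricter check would reject those as well.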

In [20]:
import os
import time

import torch
import pandas as pd
import soundfile as sf
import google.generativeai as genai
from jiwer import wer
from tqdm import tqdm
from transformers import pipeline

class ASRModule:
    """Module 1: Speech recognition (the ear)."""
    def __init__(self, model_id="openai/whisper-tiny"):
        self.device = "cuda:0" if torch.cuda.is_available() else "cpu"
        print(f"👂 Loading ASR Model: {model_id}...")
        self.pipe = pipeline(
            "automatic-speech-recognition",
            model=model_id,
            chunk_length_s=30,
            device=self.device
        )

    def transcribe(self, audio_array):
        # Force 'indonesian' so Whisper does not have to guess the language.
        # A bare array is assumed to already be at the model's sampling rate (16 kHz for Whisper).
        res = self.pipe(audio_array, batch_size=1, generate_kwargs={"language": "indonesian"})
        return res['text']

class LLMCorrector:
    """Module 2: Text correction (the brain)."""
    def __init__(self):
        # Uses 'gemini-2.5-flash', which is available on the free tier
        print("🧠 Loading LLM Model: Gemini 2.5 Flash...")
        self.model = genai.GenerativeModel('gemini-2.5-flash')

    def correct_text(self, raw_text):
        """Send the raw transcript to Gemini and return the corrected text."""
        # --- PROMPT ENGINEERING ---
        # The prompt stays in Indonesian because the transcripts are Indonesian.
        # It asks for: (1) punctuation fixes, (2) capitalization fixes,
        # (3) fixes for obvious typos, (4) no change to word order or meaning,
        # (5) output that contains only the corrected text.
        prompt = f"""
        Anda adalah editor Bahasa Indonesia profesional. Tugas Anda memperbaiki teks transkripsi ASR.

        Instruksi:
        1. Perbaiki tanda baca (titik, koma, tanda tanya).
        2. Perbaiki huruf kapital (nama orang, awal kalimat, nama tempat).
        3. Perbaiki ejaan yang sangat jelas salah (typo).
        4. JANGAN mengubah susunan kata atau makna kalimat.
        5. Output HANYA teks yang sudah diperbaiki, tanpa basa-basi.

        Teks Mentah: "{raw_text}"
        Teks Perbaikan:
        """

        try:
            # Send to Gemini
            response = self.model.generate_content(prompt)
            # Strip surrounding whitespace
            return response.text.strip()
        except Exception as e:
            # On an API error, log the reason and fall back to the raw transcript.
            # A fallback row therefore has WER_LLM equal to WER_Raw.
            print(f"\n[LLM Error]: {e}")
            return raw_text

class HybridPipeline:
    """Manager that connects the ASR and LLM modules."""
    def __init__(self):
        self.asr = ASRModule()
        self.llm = LLMCorrector()
        self.results = []

    def process_dataset(self, dataset_stream, num_samples=10):
        print(f"🚀 Starting hybrid pipeline on {num_samples} samples...")

        counter = 0
        for sample in tqdm(dataset_stream, total=num_samples):
            if counter >= num_samples:
                break

            # 1. Fetch the data
            audio = sample['audio']['array']
            ground_truth = sample['raw_transcription']

            # 2. ASR stage (raw)
            raw_text = self.asr.transcribe(audio)

            # 3. LLM stage (correction)
            # Pause between requests to stay under the rate limit
            time.sleep(2.0)
            corrected_text = self.llm.correct_text(raw_text)

            # 4. Compute WER (evaluation)
            # Fall back to 1.0 if WER cannot be computed (e.g. empty text)
            try:
                wer_raw = wer(ground_truth.lower(), raw_text.lower())
                wer_llm = wer(ground_truth.lower(), corrected_text.lower())
            except Exception:
                wer_raw = 1.0
                wer_llm = 1.0

            self.results.append({
                "Ground_Truth": ground_truth,
                "ASR_Raw": raw_text,
                "ASR_LLM": corrected_text,
                "WER_Raw": wer_raw,
                "WER_LLM": wer_llm,
                "Improvement": wer_raw - wer_llm
            })
            counter += 1

        return pd.DataFrame(self.results)
    
class HybridPipelineEnhanced:
    """Manager that connects ASR and LLM, with full export support."""
    def __init__(self, asr_module, llm_module):
        self.asr = asr_module
        self.llm = llm_module
        self.results = []

    def process_dataset(self, dataset_stream, num_samples=30):
        print(f"🚀 Starting pipeline on {num_samples} samples (with audio export & split timing)...")

        counter = 0

        output_folder = "exp2_asr_plus_llm_results2_audio_samples"
        os.makedirs(output_folder, exist_ok=True)

        for sample in tqdm(dataset_stream, total=num_samples):
            if counter >= num_samples:
                break

            # 1. Fetch audio data & metadata
            audio_array = sample['audio']['array']
            sampling_rate = sample['audio']['sampling_rate']
            ground_truth = sample['raw_transcription']

            # Build a unique audio filename from the running index;
            # paths in a streaming dataset can be long or irregular.
            audio_filename = f"sample_{counter:03d}.wav"
            audio_path = os.path.join(output_folder, audio_filename)

            # Save the audio as a .wav file
            sf.write(audio_path, audio_array, sampling_rate)

            # 2. ASR stage (raw) + timing
            start_asr = time.perf_counter()
            raw_text = self.asr.transcribe(audio_array)
            time_asr = time.perf_counter() - start_asr

            # 3. LLM stage (correction) + timing
            # The sleep is kept outside the timed region
            time.sleep(2.0)  # Rate-limit pause (not counted as processing time)

            start_llm = time.perf_counter()
            corrected_text = self.llm.correct_text(raw_text)
            time_llm = time.perf_counter() - start_llm

            # Total processing time (time the models actually worked)
            total_proc_time = time_asr + time_llm

            # 4. Compute WER (evaluation)
            # Fall back to 1.0 if WER cannot be computed (e.g. empty text)
            try:
                wer_raw = wer(ground_truth.lower(), raw_text.lower())
                wer_llm = wer(ground_truth.lower(), corrected_text.lower())
            except Exception:
                wer_raw = 1.0
                wer_llm = 1.0

            # 5. Store the full record
            self.results.append({
                "Voice": audio_filename,         # Audio file name
                "Model": "Whisper+Gemini",       # Model name
                "Ref": ground_truth,             # Reference
                "Pred_Raw": raw_text,            # Raw ASR output
                "Pred": corrected_text,          # Final output (prediction)
                "WER_Raw": wer_raw,              # Raw WER
                "WER_LLM": wer_llm,              # Final WER
                "Time_ASR": time_asr,            # Whisper time (seconds)
                "Time_LLM": time_llm,            # Gemini time (seconds)
                "Time_Total": total_proc_time,   # Total time (seconds)
            })
            counter += 1

        return pd.DataFrame(self.results)
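
Both pipeline classes lowercase the texts before calling `wer`, but punctuation still counts as part of each word. To separate "the LLM restored punctuation" from "the LLM fixed actual words", one option is to report a second, punctuation-free WER next to the existing one. The helper below is a hedged sketch: `normalize_for_wer` is a hypothetical name, and it only strips ASCII punctuation. FLEURS also appears to ship a normalized `transcription` field alongside `raw_transcription`; treat that as an assumption to verify against the dataset card.

```python
import re
import string

# Translation table that deletes ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize_for_wer(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    text = text.lower().translate(_PUNCT_TABLE)
    return re.sub(r"\s+", " ", text).strip()


print(normalize_for_wer("Halo, nama saya  Budi."))  # halo nama saya budi
```

With this in place, a call such as `wer(normalize_for_wer(ground_truth), normalize_for_wer(corrected_text))` would measure word-level differences only. Note that hyphens are deleted too, so `anak-anak` becomes `anakanak` on both sides of the comparison.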

In [21]:
# @title 3. Load Dataset
from datasets import load_dataset

print("📡 Connecting to Google FLEURS...")
dataset_stream = load_dataset(
    "google/fleurs",
    "id_id",
    split="test",
    streaming=True,
    trust_remote_code=True
)
print("✅ Ready.")

📡 Connecting to Google FLEURS...
✅ Ready.


We run this pipeline on only 15-20 samples. Why so few? Each call to the LLM (Gemini) API adds latency, and the free tier caps the number of requests: besides per-minute rate limits, the 429 log below reports a quota of 20 requests per day for `gemini-2.5-flash`. 15 samples are enough for a proof of concept.
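
The fixed `time.sleep(2.0)` in the pipeline spaces requests out but does not react when a request is actually rejected. A common alternative is to retry with exponential backoff. The sketch below is a generic wrapper around any callable, under stated assumptions: `call_with_backoff`, its parameters, and the simulated `flaky` function are hypothetical and do not use the Gemini SDK.

```python
import time


def call_with_backoff(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt seconds and retry.

    Re-raises the last exception once max_attempts calls have failed.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


# Stand-in for an API call that fails twice before succeeding.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("429 quota exceeded (simulated)")
    return "ok"


# Record the waits instead of sleeping so the example runs instantly.
waits = []
result = call_with_backoff(flaky, base_delay=0.5, sleep=waits.append)
print(result, waits)  # ok [0.5, 1.0]
```

In this notebook the wrapper would go around the request itself, for example `call_with_backoff(lambda: self.model.generate_content(prompt))` inside `LLMCorrector.correct_text`. Backoff only helps with short-lived limits; a daily quota cannot be waited out within one session.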

In [22]:
# @title 4. Run the Pipeline
# Named hybrid_pipeline so it does not shadow transformers.pipeline, which ASRModule calls;
# reusing the name `pipeline` would break ASRModule() when this cell is re-run.
# hybrid_pipeline = HybridPipeline()
hybrid_pipeline = HybridPipelineEnhanced(ASRModule(), LLMCorrector())

# Run on 15 samples
# (This takes a while because of the sleep between API calls)
df_results = hybrid_pipeline.process_dataset(dataset_stream, num_samples=15)

print("\n✅ Processing finished!")

# --- Exporting Results ---

# 1. Export CSV
csv_filename = "exp2_asr_plus_llm_results2.csv"
# Fix the column order for a tidy file
final_columns = ["Voice", "Model", "Ref", "Pred", "WER_Raw", "WER_LLM", "Time_Total", "Time_ASR", "Time_LLM", "Pred_Raw"]
df_results[final_columns].to_csv(csv_filename, index=False, sep=';')
print(f"\n✅ CSV exported: {csv_filename}")

👂 Loading ASR Model: openai/whisper-tiny...


Device set to use cuda:0


🧠 Loading LLM Model: Gemini 2.5 Flash...
🚀 Starting pipeline on 15 samples (with audio export & split timing)...


 27%|██▋       | 4/15 [00:44<02:00, 10.92s/it]


[LLM Error]: 429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. To monitor your current usage, head to: https://ai.dev/rate-limit. 
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 20, model: gemini-2.5-flash
Please retry in 18.690455254s. [links {
  description: "Learn more about Gemini API quotas"
  url: "https://ai.google.dev/gemini-api/docs/rate-limits"
}
, violations {
  quota_metric: "generativelanguage.googleapis.com/generate_content_free_tier_requests"
  quota_id: "GenerateRequestsPerDayPerProjectPerModel-FreeTier"
  quota_dimensions {
    key: "model"
    value: "gemini-2.5-flash"
  }
  quota_dimensions {
    key: "location"
    value: "global"
  }
  quota_value: 20
}
, retry_delay {
  seconds: 18
}
]


 33%|███▎      | 5/15 [00:47<01:20,  8.03s/it]


[LLM Error]: 429 You exceeded your current quota, please check your plan and billing details. [...]
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 20, model: gemini-2.5-flash
Please retry in 15.77018978s. [...]


 40%|████      | 6/15 [00:50<00:56,  6.25s/it]


[LLM Error]: 429 You exceeded your current quota, please check your plan and billing details. [...]
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 20, model: gemini-2.5-flash
Please retry in 12.981193354s. [...]


 47%|████▋     | 7/15 [00:52<00:40,  5.06s/it]


[LLM Error]: 429 You exceeded your current quota, please check your plan and billing details. [...]
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 20, model: gemini-2.5-flash
Please retry in 10.365055829s. [...]


 53%|█████▎    | 8/15 [00:55<00:31,  4.48s/it]


[LLM Error]: 429 You exceeded your current quota, please check your plan and billing details. [...]
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 20, model: gemini-2.5-flash
Please retry in 7.136193393s. [...]


 60%|██████    | 9/15 [00:59<00:39,  6.56s/it]


[LLM Error]: 429 You exceeded your current quota, please check your plan and billing details. [...]
* Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 20, model: gemini-2.5-flash
Please retry in 4.181143445s. [...]





KeyboardInterrupt: 

This part is the most interesting one for the report.

* Comparison table: we see directly how Gemini turns "halo nama saya budi" into "Halo, nama saya Budi."

* Change chart: does the average WER drop after the LLM correction?
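
An average can hide what happens per sample, so alongside the two means it may help to count how many transcripts got better, worse, or stayed the same. The following is a minimal sketch: `summarize_paired_wer` is a hypothetical helper, and the numbers are made up for illustration, not results from this experiment.

```python
def summarize_paired_wer(wer_raw, wer_llm, tol=1e-9):
    """Count samples whose WER went down, up, or stayed the same after correction."""
    if len(wer_raw) != len(wer_llm):
        raise ValueError("both lists must have one entry per sample")
    summary = {"improved": 0, "worsened": 0, "unchanged": 0}
    for raw, llm in zip(wer_raw, wer_llm):
        delta = raw - llm  # positive means the LLM lowered the error rate
        if delta > tol:
            summary["improved"] += 1
        elif delta < -tol:
            summary["worsened"] += 1
        else:
            summary["unchanged"] += 1
    return summary


# Made-up values for illustration only; not results from this experiment.
example_raw = [0.50, 0.20, 0.30, 0.10]
example_llm = [0.25, 0.20, 0.40, 0.05]
print(summarize_paired_wer(example_raw, example_llm))
# {'improved': 2, 'worsened': 1, 'unchanged': 1}
```

With the notebook's DataFrame this could be called as `summarize_paired_wer(df_results["WER_Raw"].tolist(), df_results["WER_LLM"].tolist())`. Samples where the Gemini call failed land in `unchanged`, because `LLMCorrector.correct_text` returns the raw transcript on error.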

In [None]:
# @title 5. Report & Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# 1. Show the text comparison (qualitative)
print("🔍 LLM CORRECTION EXAMPLES:")
pd.set_option('display.max_colwidth', None)  # Show long text in full
# HybridPipelineEnhanced names these columns Pred_Raw / Pred / Ref;
# ASR_Raw / ASR_LLM / Ground_Truth exist only in HybridPipeline output and raise a KeyError here.
display(df_results[['Pred_Raw', 'Pred', 'Ref', 'WER_Raw', 'WER_LLM']].head(5))

# 2. Compute the average improvement
avg_wer_raw = df_results['WER_Raw'].mean()
avg_wer_llm = df_results['WER_LLM'].mean()
# Guard against division by zero when the raw WER is already 0
improvement_pct = ((avg_wer_raw - avg_wer_llm) / avg_wer_raw) * 100 if avg_wer_raw > 0 else 0.0

print("\n" + "="*40)
print("📊 FINAL STATISTICS")
print("="*40)
print(f"Average WER, ASR only   : {avg_wer_raw:.4f}")
print(f"Average WER, + Gemini   : {avg_wer_llm:.4f}")
print(f"Relative WER reduction  : {improvement_pct:.2f}%")

# 3. Visualization (bar chart comparison)
plt.figure(figsize=(8, 5))
data_plot = pd.DataFrame({
    'Method': ['ASR only (Raw)', 'ASR + LLM (Hybrid)'],
    'WER Score': [avg_wer_raw, avg_wer_llm]
})

sns.barplot(x='Method', y='WER Score', data=data_plot, palette=['gray', 'green'])
plt.title("Impact of LLM Post-Processing on Error Rate")
plt.ylabel("Word Error Rate (lower is better)")
plt.ylim(0, max(avg_wer_raw, avg_wer_llm) * 1.2)  # Leave headroom above the bars

# Add numeric labels above the bars
for index, row in data_plot.iterrows():
    plt.text(index, row['WER Score'] + 0.01, f"{row['WER Score']:.3f}", color='black', ha="center")

plt.show()