<a href="https://colab.research.google.com/github/nuriddinovN/practice_nlp/blob/main/testing_stt_models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!jupyter nbconvert --to notebook --output cleaned_notebook.ipynb --ClearMetadataPreprocessor.enabled=True your_notebook.ipynb

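
The command above still points at the placeholder `your_notebook.ipynb`. As a rough stand-in for what metadata clearing is meant to achieve, the sketch below empties the notebook-level and cell-level `metadata` fields using only the standard `json` module. It is not nbconvert's implementation, which has more options; the helper names `strip_metadata` and `clean_file` are hypothetical.

```python
import json


def strip_metadata(notebook: dict) -> dict:
    """Return a copy of a notebook dict with notebook- and cell-level metadata emptied."""
    # A JSON round trip is a cheap deep copy for JSON-compatible data.
    cleaned = json.loads(json.dumps(notebook))
    cleaned["metadata"] = {}
    for cell in cleaned.get("cells", []):
        cell["metadata"] = {}
    return cleaned


def clean_file(src: str, dst: str) -> None:
    """Read the notebook at src and write a metadata-free copy to dst (hypothetical helper)."""
    with open(src, encoding="utf-8") as f:
        notebook = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(strip_metadata(notebook), f, ensure_ascii=False, indent=1)
```

Cell sources and outputs are left untouched; only the two `metadata` levels are reset.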

In [2]:
!pip install transformers torch torchaudio librosa jiwer matplotlib seaborn plotly pandas numpy psutil



In [3]:
import torch
import torchaudio
import librosa
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import time
import psutil
import os
from pathlib import Path
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from jiwer import wer, cer
import warnings
warnings.filterwarnings('ignore')




In [4]:
plt.style.use('default')
sns.set_palette("husl")


In [5]:
class ModelEvaluator:
    def __init__(self):
        self.results = {}
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"Using device: {self.device}")

    def load_audio(self, audio_path, target_sr=16000):
        """Load and preprocess audio file"""
        try:
            # Load with librosa, which resamples to target_sr and downmixes to mono
            audio, sr = librosa.load(audio_path, sr=target_sr)
            return audio, sr
        except Exception as e:
            print(f"Error loading audio {audio_path}: {e}")
            return None, None

    def evaluate_single_model(self, model_name, audio_files, reference_texts=None, verbose=True):
        """
        Evaluate a single model on given audio files

        Args:
            model_name: HuggingFace model name (e.g., "shivkumarganesh/whisper-small-uz-v1")
            audio_files: List of audio file paths or single audio file path
            reference_texts: List of reference transcriptions (optional, for WER/CER calculation)
            verbose: Print detailed logs
        """
        if isinstance(audio_files, str):
            audio_files = [audio_files]

        if verbose:
            print(f"\n{'='*60}")
            print(f"🔄 EVALUATING MODEL: {model_name}")
            print(f"{'='*60}")

        # Initialize metrics
        metrics = {
            'model_name': model_name,
            'transcriptions': [],
            'load_time': 0,
            'inference_times': [],
            'memory_usage': {'before': 0, 'after': 0, 'peak': 0},
            'model_size_mb': 0,
            'audio_durations': [],
            'processing_speed_ratio': [],  # How many times faster than real-time
            'wer_scores': [],
            'cer_scores': [],
            'errors': []
        }

        try:
            # Host-process resident memory (RSS) before loading, in MB; GPU memory is not tracked
            process = psutil.Process()
            metrics['memory_usage']['before'] = process.memory_info().rss / 1024 / 1024

            # Load model and processor
            if verbose:
                print("📥 Loading model and processor...")
            start_load = time.time()

            processor = WhisperProcessor.from_pretrained(model_name)
            model = WhisperForConditionalGeneration.from_pretrained(model_name)
            model.to(self.device)

            metrics['load_time'] = time.time() - start_load
            metrics['model_size_mb'] = sum(p.numel() * p.element_size() for p in model.parameters()) / (1024 * 1024)

            # Memory after loading
            metrics['memory_usage']['after'] = process.memory_info().rss / 1024 / 1024

            if verbose:
                print(f"✅ Model loaded in {metrics['load_time']:.2f}s")
                print(f"📊 Model size: {metrics['model_size_mb']:.1f} MB")
                print(f"💾 Memory usage: {metrics['memory_usage']['after'] - metrics['memory_usage']['before']:.1f} MB")

            # Process each audio file
            for i, audio_path in enumerate(audio_files):
                if verbose:
                    print(f"\n🎵 Processing audio {i+1}/{len(audio_files)}: {Path(audio_path).name}")

                # Load audio
                audio, sr = self.load_audio(audio_path)
                if audio is None:
                    metrics['errors'].append(f"Failed to load {audio_path}")
                    continue

                audio_duration = len(audio) / sr
                metrics['audio_durations'].append(audio_duration)

                # Transcribe
                start_inference = time.time()

                try:
                    # Prepare inputs
                    inputs = processor(audio, sampling_rate=sr, return_tensors="pt").to(self.device)

                    # Generate transcription
                    with torch.no_grad():
                        predicted_ids = model.generate(
                            inputs["input_features"],
                            max_length=448,
                            num_beams=5,
                            do_sample=False,
                            task="transcribe",
                            language="uz"  # Uzbek language code
                        )

                    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
                    inference_time = time.time() - start_inference

                    metrics['transcriptions'].append(transcription)
                    metrics['inference_times'].append(inference_time)

                    # Calculate processing speed ratio
                    speed_ratio = audio_duration / inference_time
                    metrics['processing_speed_ratio'].append(speed_ratio)

                    if verbose:
                        print(f"📝 Transcription: '{transcription}'")
                        print(f"⏱️  Inference time: {inference_time:.2f}s")
                        print(f"🚀 Speed ratio: {speed_ratio:.1f}x real-time")

                    # Calculate WER and CER if reference provided
                    if reference_texts and i < len(reference_texts):
                        wer_score = wer([reference_texts[i]], [transcription])
                        cer_score = cer([reference_texts[i]], [transcription])
                        metrics['wer_scores'].append(wer_score)
                        metrics['cer_scores'].append(cer_score)

                        if verbose:
                            print(f"📊 WER: {wer_score:.3f} ({wer_score*100:.1f}%)")
                            print(f"📊 CER: {cer_score:.3f} ({cer_score*100:.1f}%)")

                except Exception as e:
                    error_msg = f"Inference error for {audio_path}: {e}"
                    metrics['errors'].append(error_msg)
                    if verbose:
                        print(f"❌ {error_msg}")

            # RSS after all files are processed (a final reading, not a true peak)
            metrics['memory_usage']['peak'] = process.memory_info().rss / 1024 / 1024

            # Calculate aggregate metrics
            if metrics['inference_times']:
                metrics['avg_inference_time'] = np.mean(metrics['inference_times'])
                metrics['avg_speed_ratio'] = np.mean(metrics['processing_speed_ratio'])
                metrics['total_audio_duration'] = sum(metrics['audio_durations'])
                metrics['total_processing_time'] = sum(metrics['inference_times'])

            if metrics['wer_scores']:
                metrics['avg_wer'] = np.mean(metrics['wer_scores'])
                metrics['avg_cer'] = np.mean(metrics['cer_scores'])

            if verbose:
                print(f"\n📈 SUMMARY FOR {model_name}:")
                print(f"   Average inference time: {metrics.get('avg_inference_time', 0):.2f}s")
                print(f"   Average speed ratio: {metrics.get('avg_speed_ratio', 0):.1f}x")
                if 'avg_wer' in metrics:
                    print(f"   Average WER: {metrics['avg_wer']:.3f} ({metrics['avg_wer']*100:.1f}%)")
                    print(f"   Average CER: {metrics['avg_cer']:.3f} ({metrics['avg_cer']*100:.1f}%)")

        except Exception as e:
            error_msg = f"Model loading/evaluation error: {e}"
            metrics['errors'].append(error_msg)
            if verbose:
                print(f"❌ {error_msg}")

        finally:
            # Clean up memory
            if 'model' in locals():
                del model
            if 'processor' in locals():
                del processor
            if torch.cuda.is_available():
                torch.cuda.empty_cache()

        # Store results
        self.results[model_name] = metrics
        return metrics

    def compare_models(self, model_names, audio_files, reference_texts=None):
        """Compare multiple models"""
        print(f"\n🏁 STARTING COMPARISON OF {len(model_names)} MODELS")
        print(f"📁 Audio files: {len(audio_files) if isinstance(audio_files, list) else 1}")

        for model_name in model_names:
            self.evaluate_single_model(model_name, audio_files, reference_texts)

        return self.results

    def create_comparison_plots(self, save_plots=True):
        """Create comprehensive comparison plots"""
        if not self.results:
            print("❌ No results to plot. Run evaluation first.")
            return

        # Prepare data for plotting
        plot_data = []
        for model_name, metrics in self.results.items():
            if 'avg_inference_time' in metrics:  # Only include successful evaluations
                plot_data.append({
                    'Model': model_name.split('/')[-1],  # Short name
                    'Full_Model': model_name,
                    'Avg_Inference_Time': metrics.get('avg_inference_time', 0),
                    'Avg_Speed_Ratio': metrics.get('avg_speed_ratio', 0),
                    'Memory_Usage_MB': metrics['memory_usage']['after'] - metrics['memory_usage']['before'],
                    'Model_Size_MB': metrics.get('model_size_mb', 0),
                    'Load_Time': metrics.get('load_time', 0),
                    'Avg_WER': metrics.get('avg_wer', None),
                    'Avg_CER': metrics.get('avg_cer', None),
                    # (1 - WER) as a percentage; this goes negative whenever the average WER exceeds 1.0
                    'Accuracy_Percent': (1 - metrics.get('avg_wer', 1)) * 100 if metrics.get('avg_wer') is not None else None
                })

        if not plot_data:
            print("❌ No successful evaluations to plot.")
            return

        df = pd.DataFrame(plot_data)

        # 1. Performance Overview
        fig = make_subplots(
            rows=2, cols=2,
            subplot_titles=('Inference Time Comparison', 'Speed Ratio (Real-time Multiple)',
                          'Memory Usage', 'Model Size'),
            specs=[[{"secondary_y": False}, {"secondary_y": False}],
                   [{"secondary_y": False}, {"secondary_y": False}]]
        )

        # Inference time
        fig.add_trace(
            go.Bar(x=df['Model'], y=df['Avg_Inference_Time'], name='Inference Time (s)',
                   marker_color='lightblue'),
            row=1, col=1
        )

        # Speed ratio
        fig.add_trace(
            go.Bar(x=df['Model'], y=df['Avg_Speed_Ratio'], name='Speed Ratio (x)',
                   marker_color='lightgreen'),
            row=1, col=2
        )

        # Memory usage
        fig.add_trace(
            go.Bar(x=df['Model'], y=df['Memory_Usage_MB'], name='Memory (MB)',
                   marker_color='orange'),
            row=2, col=1
        )

        # Model size
        fig.add_trace(
            go.Bar(x=df['Model'], y=df['Model_Size_MB'], name='Model Size (MB)',
                   marker_color='red'),
            row=2, col=2
        )

        fig.update_layout(height=800, title_text="🔍 Model Performance Comparison", showlegend=False)
        fig.show()

        # 2. Accuracy Comparison (if available)
        if df['Avg_WER'].notna().any():
            fig_acc = go.Figure()

            fig_acc.add_trace(go.Bar(
                x=df['Model'],
                y=df['Accuracy_Percent'],
                name='Accuracy %',
                marker_color='green',
                text=[f'{x:.1f}%' for x in df['Accuracy_Percent']],
                textposition='auto'
            ))

            fig_acc.add_trace(go.Bar(
                x=df['Model'],
                y=df['Avg_WER'] * 100,
                name='WER %',
                marker_color='red',
                text=[f'{x:.1f}%' for x in df['Avg_WER'] * 100],
                textposition='auto'
            ))

            fig_acc.update_layout(
                title='🎯 Accuracy Comparison',
                xaxis_title='Model',
                yaxis_title='Percentage (%)',
                barmode='group',
                height=500
            )
            fig_acc.show()

        # 3. Efficiency Scatter Plot
        if df['Avg_WER'].notna().any():
            fig_scatter = px.scatter(
                df,
                x='Avg_Inference_Time',
                y='Accuracy_Percent',
                size='Model_Size_MB',
                color='Model',
                title='⚡ Efficiency Analysis: Speed vs Accuracy',
                labels={
                    'Avg_Inference_Time': 'Average Inference Time (seconds)',
                    'Accuracy_Percent': 'Accuracy (%)',
                    'Model_Size_MB': 'Model Size (MB)'
                },
                hover_data=['Memory_Usage_MB', 'Avg_Speed_Ratio']
            )
            fig_scatter.update_layout(height=500)
            fig_scatter.show()

        # 4. Summary Table
        print("\n📊 DETAILED COMPARISON TABLE")
        print("="*100)

        display_df = df[['Model', 'Avg_Inference_Time', 'Avg_Speed_Ratio', 'Memory_Usage_MB',
                        'Model_Size_MB', 'Load_Time']]
        if 'Accuracy_Percent' in df.columns and df['Accuracy_Percent'].notna().any():
            display_df = pd.concat([display_df, df[['Avg_WER', 'Avg_CER', 'Accuracy_Percent']]], axis=1)

        print(display_df.round(3).to_string(index=False))

        # 5. Recommendations
        print("\n🏆 RECOMMENDATIONS")
        print("="*50)

        if len(df) > 1:
            fastest_model = df.loc[df['Avg_Inference_Time'].idxmin(), 'Model']
            print(f"🚀 Fastest Model: {fastest_model}")

            smallest_model = df.loc[df['Model_Size_MB'].idxmin(), 'Model']
            print(f"💾 Smallest Model: {smallest_model}")

            memory_efficient = df.loc[df['Memory_Usage_MB'].idxmin(), 'Model']
            print(f"🧠 Most Memory Efficient: {memory_efficient}")

            if df['Accuracy_Percent'].notna().any():
                most_accurate = df.loc[df['Accuracy_Percent'].idxmax(), 'Model']
                print(f"🎯 Most Accurate: {most_accurate}")

                # Sum of accuracy and speed, each scaled by the best value in its column
                balance_score = (df['Accuracy_Percent'] / df['Accuracy_Percent'].max()
                                 + df['Avg_Speed_Ratio'] / df['Avg_Speed_Ratio'].max())
                best_balance = df.loc[balance_score.idxmax(), 'Model']
                print(f"⚖️  Best Balance (Speed + Accuracy): {best_balance}")

        return df
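
`evaluate_single_model` reports model size as the sum of `numel() * element_size()` over all parameters, divided by 1024 * 1024. The sketch below reproduces that arithmetic without loading a model and inverts it to estimate a parameter count from a size in MB. It assumes float32 weights (4 bytes per parameter), which the notebook does not state; the function names are hypothetical.

```python
BYTES_PER_MB = 1024 * 1024


def size_mb(n_params: int, bytes_per_param: int = 4) -> float:
    """Size in MB of n_params parameters, mirroring the numel * element_size sum."""
    return n_params * bytes_per_param / BYTES_PER_MB


def params_from_size(size_in_mb: float, bytes_per_param: int = 4) -> int:
    """Approximate parameter count implied by a size in MB (float32 by default)."""
    return round(size_in_mb * BYTES_PER_MB / bytes_per_param)


# Sizes as printed in the comparison tables of this notebook.
for size in (276.924, 922.146, 2913.887, 5887.241):
    print(f"{size:9.3f} MB is about {params_from_size(size) / 1e6:.1f}M parameters")
```

Only parameters are counted, so buffers and activation memory are not part of this figure. It is also unrelated to the `Memory_Usage_MB` column, which is a difference of host RSS readings.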


In [6]:
def quick_test_single_model(model_name, audio_file_path, reference_text=None):
    """
    Quick test for a single model - just change the model name!

    Usage:
        quick_test_single_model("shivkumarganesh/whisper-small-uz-v1", "audio.wav")
    """
    evaluator = ModelEvaluator()
    reference_texts = [reference_text] if reference_text else None

    result = evaluator.evaluate_single_model(
        model_name=model_name,
        audio_files=[audio_file_path],
        reference_texts=reference_texts,
        verbose=True
    )

    return result, evaluator

In [7]:
def compare_multiple_models(model_list, audio_files, reference_texts=None):
    """
    Compare multiple models easily

    Usage:
        models = ["shivkumarganesh/whisper-small-uz-v1", "GitNazarov/whisper-large-uz"]
        compare_multiple_models(models, ["audio1.wav", "audio2.wav"])
    """
    evaluator = ModelEvaluator()
    results = evaluator.compare_models(model_list, audio_files, reference_texts)

    # Create plots
    comparison_df = evaluator.create_comparison_plots()

    return results, comparison_df, evaluator


In [8]:
models_to_compare = [
    # 👈 ADD YOUR MODELS HERE!
    "GitNazarov/whisper-large-uz",
    "Makhmud/whisper-uzbek"
]

audio_files = [
    "/content/s_1796.wav",                          # 👈 ADD YOUR AUDIO FILES!
    "/content/test2.wav"
]

reference_texts = [                                 # 👈 OPTIONAL REFERENCES
    "Quyidagilar Sug‘urta hodisasi hisoblanmaydi: mehnat shartnomasini xodimning tashabbusi bilan bekor qilinganda; xodimning o‘z mehnat vazifalarini muntazam ravishda buzganligi.",
    "Test versiya ikki, model aniqligini tekshiramiz, menning ismim Nurmuhammad"
]

results, comparison_df, evaluator = compare_multiple_models(
    models_to_compare,
    audio_files,
    reference_texts  # Can be None if you don't have references
)

Using device: cuda

🏁 STARTING COMPARISON OF 2 MODELS
📁 Audio files: 2

🔄 EVALUATING MODEL: GitNazarov/whisper-large-uz
📥 Loading model and processor...
✅ Model loaded in 75.23s
📊 Model size: 5887.2 MB
💾 Memory usage: 3137.6 MB

🎵 Processing audio 1/2: s_1796.wav


You have passed task=transcribe, but also have set `forced_decoder_ids` to [[1, None], [2, 50359]] which creates a conflict. `forced_decoder_ids` will be ignored in favor of task=transcribe.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


📝 Transcription: ' Qüydagiler suxurta xadisa səsəq soplanməidə. Mekhnad şartnamasını xadimnin təşabosub ilan bekarqalindəndə xadimnin oz meknad vazifələrini muntazam ravisdə buzgəlligə.'
⏱️  Inference time: 10.57s
🚀 Speed ratio: 1.0x real-time
📊 WER: 0.944 (94.4%)
📊 CER: 0.322 (32.2%)

🎵 Processing audio 2/2: test2.wav
📝 Transcription: ' Taz versiyaya ikki madar anaqla genetik shiramiz. Menin ismim Nur Muhammad.'
⏱️  Inference time: 4.03s
🚀 Speed ratio: 1.3x real-time
📊 WER: 1.111 (111.1%)
📊 CER: 0.324 (32.4%)

📈 SUMMARY FOR GitNazarov/whisper-large-uz:
   Average inference time: 7.30s
   Average speed ratio: 1.2x
   Average WER: 1.028 (102.8%)
   Average CER: 0.323 (32.3%)

🔄 EVALUATING MODEL: Makhmud/whisper-uzbek
📥 Loading model and processor...
✅ Model loaded in 9.55s
📊 Model size: 922.1 MB
💾 Memory usage: 14.6 MB

🎵 Processing audio 1/2: s_1796.wav
📝 Transcription: 'Quyidagilar sug‘urta hodisasi hisoblanmaydi. Mehnat shartnomasini xodimning tashabbusi bilan bekor qilinganda, xodim


📊 DETAILED COMPARISON TABLE
           Model  Avg_Inference_Time  Avg_Speed_Ratio  Memory_Usage_MB  Model_Size_MB  Load_Time  Avg_WER  Avg_CER  Accuracy_Percent
whisper-large-uz               7.296            1.193         3137.629       5887.241     75.234    1.028    0.323            -2.778
   whisper-uzbek               3.586            2.305           14.582        922.146      9.555    0.444    0.079            55.556

🏆 RECOMMENDATIONS
🚀 Fastest Model: whisper-uzbek
💾 Smallest Model: whisper-uzbek
🧠 Most Memory Efficient: whisper-uzbek
🎯 Most Accurate: whisper-uzbek
⚖️  Best Balance (Speed + Accuracy): whisper-uzbek
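
A WER above 100% (and hence the negative `Accuracy_Percent` for `whisper-large-uz`) is possible because insertions count as errors while the denominator is only the number of reference words. The sketch below is a plain word-level edit distance under whitespace tokenization, applied to the second reference and the `whisper-large-uz` transcription printed above. It is an illustration, not jiwer's code, and the function names are hypothetical.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, deletions and insertions."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distance from an empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                         # delete a reference word
                curr[j - 1] + 1,                     # insert a hypothesis word
                prev[j - 1] + (ref_word != hyp_word)  # match or substitute
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    return word_errors(reference, hypothesis) / len(reference.split())


reference = "Test versiya ikki, model aniqligini tekshiramiz, menning ismim Nurmuhammad"
hypothesis = " Taz versiyaya ikki madar anaqla genetik shiramiz. Menin ismim Nur Muhammad."

print(word_errors(reference, hypothesis))                # 10
print(round(word_error_rate(reference, hypothesis), 3))  # 1.111
```

The reference has 9 words and the hypothesis 11, with only `ismim` matching, so 10 edits over 9 reference words gives the 111.1% shown in the log.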


In [9]:
models_to_compare = [
    # 👈 ADD YOUR MODELS HERE!
    "aslon1213/whisper-small-uz-with-uzbekvoice"]

audio_files = [
    "/content/s_1796.wav",                          # 👈 ADD YOUR AUDIO FILES!
    "/content/test2.wav"
]

reference_texts = [                                 # 👈 OPTIONAL REFERENCES
    "Quyidagilar Sug‘urta hodisasi hisoblanmaydi: mehnat shartnomasini xodimning tashabbusi bilan bekor qilinganda; xodimning o‘z mehnat vazifalarini muntazam ravishda buzganligi.",
    "Test versiya ikki, model aniqligini tekshiramiz, menning ismim Nurmuhammad"
]

results, comparison_df, evaluator = compare_multiple_models(
    models_to_compare,
    audio_files,
    reference_texts  # Can be None if you don't have references
)

Using device: cuda

🏁 STARTING COMPARISON OF 1 MODELS
📁 Audio files: 2

🔄 EVALUATING MODEL: aslon1213/whisper-small-uz-with-uzbekvoice
📥 Loading model and processor...
✅ Model loaded in 9.06s
📊 Model size: 922.1 MB
💾 Memory usage: 28.1 MB

🎵 Processing audio 1/2: s_1796.wav
'
⏱️  Inference time: 1.97s
🚀 Speed ratio: 5.5x real-time
📊 WER: 0.222 (22.2%)
📊 CER: 0.023 (2.3%)

🎵 Processing audio 2/2: test2.wav
📝 Transcription: 'Test versiya ikki, model aniqligini tekshiramiz, mening ismim Nurmuhammad.'
⏱️  Inference time: 0.97s
🚀 Speed ratio: 5.6x real-time
📊 WER: 0.222 (22.2%)
📊 CER: 0.027 (2.7%)

📈 SUMMARY FOR aslon1213/whisper-small-uz-with-uzbekvoice:
   Average inference time: 1.47s
   Average speed ratio: 5.6x
   Average WER: 0.222 (22.2%)
   Average CER: 0.025 (2.5%)



📊 DETAILED COMPARISON TABLE
                           Model  Avg_Inference_Time  Avg_Speed_Ratio  Memory_Usage_MB  Model_Size_MB  Load_Time  Avg_WER  Avg_CER  Accuracy_Percent
whisper-small-uz-with-uzbekvoice                1.47            5.585           28.086        922.146      9.057    0.222    0.025            77.778

🏆 RECOMMENDATIONS


In [10]:
models_to_compare = [
    # 👈 ADD YOUR MODELS HERE!
    "mustafoyev202/whisper-uz"]

audio_files = [
    "/content/s_1796.wav",                          # 👈 ADD YOUR AUDIO FILES!
    "/content/test2.wav"
]

reference_texts = [                                 # 👈 OPTIONAL REFERENCES
    "Quyidagilar Sug‘urta hodisasi hisoblanmaydi: mehnat shartnomasini xodimning tashabbusi bilan bekor qilinganda; xodimning o‘z mehnat vazifalarini muntazam ravishda buzganligi.",
    "Test versiya ikki, model aniqligini tekshiramiz, menning ismim Nurmuhammad"
]

results, comparison_df, evaluator = compare_multiple_models(
    models_to_compare,
    audio_files,
    reference_texts  # Can be None if you don't have references
)

Using device: cuda

🏁 STARTING COMPARISON OF 1 MODELS
📁 Audio files: 2

🔄 EVALUATING MODEL: mustafoyev202/whisper-uz
📥 Loading model and processor...


You have passed task=transcribe, but also have set `forced_decoder_ids` to [[1, 50259], [2, 50359], [3, 50363]] which creates a conflict. `forced_decoder_ids` will be ignored in favor of task=transcribe.


✅ Model loaded in 9.63s
📊 Model size: 922.1 MB
💾 Memory usage: 11.4 MB

🎵 Processing audio 1/2: s_1796.wav
📝 Transcription: 'Quyidagilar sug‘urta hodisasiz hisoblanmaydi. Mehnat shartnomasini xodimning tashabbusi bilan bekor qilinganda, xodimning o‘z mehnat vazifalarini muntazam ravishda buzganligi'
⏱️  Inference time: 2.10s
🚀 Speed ratio: 5.2x real-time
📊 WER: 0.333 (33.3%)
📊 CER: 0.034 (3.4%)

🎵 Processing audio 2/2: test2.wav
📝 Transcription: 'Tasvirsiya ikki, madir aniqligini tekshiramiz, mening ismi Nur Muhammad.'
⏱️  Inference time: 0.80s
🚀 Speed ratio: 6.8x real-time
📊 WER: 0.778 (77.8%)
📊 CER: 0.162 (16.2%)

📈 SUMMARY FOR mustafoyev202/whisper-uz:
   Average inference time: 1.45s
   Average speed ratio: 6.0x
   Average WER: 0.556 (55.6%)
   Average CER: 0.098 (9.8%)



📊 DETAILED COMPARISON TABLE
     Model  Avg_Inference_Time  Avg_Speed_Ratio  Memory_Usage_MB  Model_Size_MB  Load_Time  Avg_WER  Avg_CER  Accuracy_Percent
whisper-uz               1.449            6.009           11.414        922.146      9.631    0.556    0.098            44.444

🏆 RECOMMENDATIONS
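
The `Avg_WER` column is the mean of per-file WER values, so a short file weighs as much as a long one. The sketch below contrasts that with a pooled WER (total errors over total reference words) for the two files above. The reference lengths (18 and 9 words) come from the reference texts; the error counts are an assumption, recovered by rounding WER times length. Function names are hypothetical.

```python
def mean_utterance_wer(errors, ref_lengths):
    """Average of per-file WERs, as the notebook's avg_wer does."""
    per_file = [e / n for e, n in zip(errors, ref_lengths)]
    return sum(per_file) / len(per_file)


def pooled_wer(errors, ref_lengths):
    """Total word errors divided by total reference words."""
    return sum(errors) / sum(ref_lengths)


ref_lengths = [18, 9]             # words in the two reference texts
reported_wers = [0.333, 0.778]    # per-file WER printed for mustafoyev202/whisper-uz
errors = [round(w * n) for w, n in zip(reported_wers, ref_lengths)]  # [6, 7]

print(round(mean_utterance_wer(errors, ref_lengths), 3))  # 0.556
print(round(pooled_wer(errors, ref_lengths), 3))          # 0.481
```

With only two files the gap is already visible; which aggregate is preferable depends on whether each file or each word should count equally.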


In [11]:
models_to_compare = [
    # 👈 ADD YOUR MODELS HERE!
    "nodirjon/whisper-small-uz",
    "aisha-org/Whisper-Uzbek"]

audio_files = [
    "/content/s_1796.wav",                          # 👈 ADD YOUR AUDIO FILES!
    "/content/test2.wav"
]

reference_texts = [                                 # 👈 OPTIONAL REFERENCES
    "Quyidagilar Sug‘urta hodisasi hisoblanmaydi: mehnat shartnomasini xodimning tashabbusi bilan bekor qilinganda; xodimning o‘z mehnat vazifalarini muntazam ravishda buzganligi.",
    "Test versiya ikki, model aniqligini tekshiramiz, menning ismim Nurmuhammad"
]

results, comparison_df, evaluator = compare_multiple_models(
    models_to_compare,
    audio_files,
    reference_texts  # Can be None if you don't have references
)

Using device: cuda

🏁 STARTING COMPARISON OF 2 MODELS
📁 Audio files: 2

🔄 EVALUATING MODEL: nodirjon/whisper-small-uz
📥 Loading model and processor...
✅ Model loaded in 9.55s
📊 Model size: 922.1 MB
💾 Memory usage: 8.2 MB

🎵 Processing audio 1/2: s_1796.wav
📝 Transcription: 'Quyidagilar sug‘urda xodisasi hisoblanmaydi. Mehnat shartnomasini xodimning tashabbusi bilan bekor qilinganda, xodimning o‘z mehnat vazifalarini muntazam ravishda buzganligi.'
⏱️  Inference time: 2.03s
🚀 Speed ratio: 5.4x real-time
📊 WER: 0.278 (27.8%)
📊 CER: 0.034 (3.4%)

🎵 Processing audio 2/2: test2.wav
📝 Transcription: 'Tast versiya ikki, model anaqligini tekshiramiz, mening ismim bur muhammad.'
⏱️  Inference time: 0.81s
🚀 Speed ratio: 6.7x real-time
📊 WER: 0.556 (55.6%)
📊 CER: 0.081 (8.1%)

📈 SUMMARY FOR nodirjon/whisper-small-uz:
   Average inference time: 1.42s
   Average speed ratio: 6.0x
   Average WER: 0.417 (41.7%)
   Average CER: 0.058 (5.8%)

🔄 EVALUATING MODEL: aisha-org/Whisper-Uzbek
📥 Loading model a


📊 DETAILED COMPARISON TABLE
           Model  Avg_Inference_Time  Avg_Speed_Ratio  Memory_Usage_MB  Model_Size_MB  Load_Time  Avg_WER  Avg_CER  Accuracy_Percent
whisper-small-uz               1.420            6.043            8.184        922.146      9.552    0.417    0.058            58.333
   Whisper-Uzbek               3.276            2.520         1464.660       2913.887     31.527    0.333    0.052            66.667

🏆 RECOMMENDATIONS
🚀 Fastest Model: whisper-small-uz
💾 Smallest Model: whisper-small-uz
🧠 Most Memory Efficient: whisper-small-uz
🎯 Most Accurate: Whisper-Uzbek
⚖️  Best Balance (Speed + Accuracy): whisper-small-uz
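
The "Best Balance" pick differs from "Most Accurate" here because the score adds accuracy and speed, each divided by the best value in its column, with equal weight. A minimal sketch of that score on the two rows of the table above, in plain Python instead of pandas (the function name and dictionary keys are hypothetical):

```python
def balance_scores(rows):
    """Accuracy and speed, each scaled by the column maximum, then summed."""
    best_accuracy = max(row["accuracy"] for row in rows)
    best_speed = max(row["speed"] for row in rows)
    return {
        row["model"]: row["accuracy"] / best_accuracy + row["speed"] / best_speed
        for row in rows
    }


# Accuracy_Percent and Avg_Speed_Ratio copied from the comparison table.
rows = [
    {"model": "whisper-small-uz", "accuracy": 58.333, "speed": 6.043},
    {"model": "Whisper-Uzbek", "accuracy": 66.667, "speed": 2.520},
]

scores = balance_scores(rows)
for model, score in scores.items():
    print(f"{model}: {score:.3f}")
print(max(scores, key=scores.get))  # whisper-small-uz
```

The speed ratios differ by more than a factor of two while the accuracies differ by about eight points, so the speed term decides the outcome. The scaling also assumes the best accuracy is positive, which does not hold if every model has a WER above 1.0.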


In [12]:
models_to_compare = [
    # 👈 ADD YOUR MODELS HERE!
    "jmshd/whisper-uz",
    "ShakhzoDavronov/whisper-large-lora-uz"]

audio_files = [
    "/content/s_1796.wav",                          # 👈 ADD YOUR AUDIO FILES!
    "/content/test2.wav"
]

reference_texts = [                                 # 👈 OPTIONAL REFERENCES
    "Quyidagilar Sug‘urta hodisasi hisoblanmaydi: mehnat shartnomasini xodimning tashabbusi bilan bekor qilinganda; xodimning o‘z mehnat vazifalarini muntazam ravishda buzganligi.",
    "Test versiya ikki, model aniqligini tekshiramiz, menning ismim Nurmuhammad"
]

results, comparison_df, evaluator = compare_multiple_models(
    models_to_compare,
    audio_files,
    reference_texts  # Can be None if you don't have references
)

Using device: cuda

🏁 STARTING COMPARISON OF 2 MODELS
📁 Audio files: 2

🔄 EVALUATING MODEL: jmshd/whisper-uz
📥 Loading model and processor...


`generation_config` default values have been modified to match model-specific defaults: {'suppress_tokens': [1, 2, 7, 8, 9, 10, 14, 25, 26, 27, 28, 29, 31, 58, 59, 60, 61, 62, 63, 90, 91, 92, 93, 359, 503, 522, 542, 873, 893, 902, 918, 922, 931, 1350, 1853, 1982, 2460, 2627, 3246, 3253, 3268, 3536, 3846, 3961, 4183, 4667, 6585, 6647, 7273, 9061, 9383, 10428, 10929, 11938, 12033, 12331, 12562, 13793, 14157, 14635, 15265, 15618, 16553, 16604, 18362, 18956, 20075, 21675, 22520, 26130, 26161, 26435, 28279, 29464, 31650, 32302, 32470, 36865, 42863, 47425, 49870, 50254, 50258, 50358, 50359, 50360, 50361, 50362], 'begin_suppress_tokens': [220, 50257]}. If this is not desired, please set these values explicitly.
A custom logits processor of type <class 'transformers.generation.logits_process.SuppressTokensLogitsProcessor'> has been passed to `.generate()`, but it was also created in `.generate()`, given its parameterization. The custom <class 'transformers.generation.logits_process.SuppressTok

✅ Model loaded in 3.90s
📊 Model size: 276.9 MB
💾 Memory usage: 3.9 MB

🎵 Processing audio 1/2: s_1796.wav
📝 Transcription: 'Quyidagilar sug'urta hodisasi hisoblanmaydi. Mehnat shartomasini xodimning tashabbusi bilan bekor qilinganda, xodimning o'z mehnat vazifalarini muntazam ravishda buzganligi.'
⏱️  Inference time: 0.88s
🚀 Speed ratio: 12.4x real-time
📊 WER: 0.333 (33.3%)
📊 CER: 0.040 (4.0%)

🎵 Processing audio 2/2: test2.wav
📝 Transcription: 'Tasvirsiya ikki, model aniqligini tekshiramiz, mening ismin bo'lmuhammad.'
⏱️  Inference time: 0.38s
🚀 Speed ratio: 14.4x real-time
📊 WER: 0.556 (55.6%)
📊 CER: 0.149 (14.9%)

📈 SUMMARY FOR jmshd/whisper-uz:
   Average inference time: 0.63s
   Average speed ratio: 13.4x
   Average WER: 0.444 (44.4%)
   Average CER: 0.094 (9.4%)

🔄 EVALUATING MODEL: ShakhzoDavronov/whisper-large-lora-uz
📥 Loading model and processor...
❌ Model loading/evaluation error: ShakhzoDavronov/whisper-large-lora-uz does not appear to have a file named preprocessor_config.


📊 DETAILED COMPARISON TABLE
     Model  Avg_Inference_Time  Avg_Speed_Ratio  Memory_Usage_MB  Model_Size_MB  Load_Time  Avg_WER  Avg_CER  Accuracy_Percent
whisper-uz                0.63           13.408            3.887        276.924      3.896    0.444    0.094            55.556

🏆 RECOMMENDATIONS
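
Much of the reported WER for `jmshd/whisper-uz` on the first file comes from surface differences: the reference uses the typographic apostrophe (U+2018) in `Sug‘urta` and `o‘z`, a capital `S`, and `:`/`;`, while the transcription uses ASCII apostrophes, different casing and `.`/`,`. The sketch below shows one possible normalization applied to both sides before scoring. The set of apostrophe variants and the function name are assumptions, and the position-wise comparison only works because both texts have 18 tokens.

```python
import re

# Characters treated as equivalent to an ASCII apostrophe (an assumed, non-exhaustive set).
APOSTROPHE_VARIANTS = "\u2018\u2019\u02bb\u02bc`"


def normalize(text: str) -> list:
    """Lowercase, unify apostrophes, drop other punctuation, split on whitespace."""
    text = text.lower()
    for variant in APOSTROPHE_VARIANTS:
        text = text.replace(variant, "'")
    text = re.sub(r"[^\w\s']", " ", text)
    return text.split()


reference = ("Quyidagilar Sug\u2018urta hodisasi hisoblanmaydi: mehnat shartnomasini "
             "xodimning tashabbusi bilan bekor qilinganda; xodimning o\u2018z mehnat "
             "vazifalarini muntazam ravishda buzganligi.")
hypothesis = ("Quyidagilar sug'urta hodisasi hisoblanmaydi. Mehnat shartomasini "
              "xodimning tashabbusi bilan bekor qilinganda, xodimning o'z mehnat "
              "vazifalarini muntazam ravishda buzganligi.")

raw = sum(r != h for r, h in zip(reference.split(), hypothesis.split()))
normalized = sum(r != h for r, h in zip(normalize(reference), normalize(hypothesis)))
print(raw, normalized)  # 6 1
```

Six mismatches out of 18 words is the 33.3% in the log; after normalization only `shartomasini` remains as an actual recognition error.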


In [13]:
models_to_compare = [
    # 👈 ADD YOUR MODELS HERE!
    "jmshd/whisper-uz",
    "ShakhzoDavronov/whisper-large-lora-uz"]

audio_files = [
    "/content/s_1796.wav",                          # 👈 ADD YOUR AUDIO FILES!
    "/content/test2.wav"
]

reference_texts = [                                 # 👈 OPTIONAL REFERENCES
    "Quyidagilar Sug‘urta hodisasi hisoblanmaydi: mehnat shartnomasini xodimning tashabbusi bilan bekor qilinganda; xodimning o‘z mehnat vazifalarini muntazam ravishda buzganligi.",
    "Test versiya ikki, model aniqligini tekshiramiz, menning ismim Nurmuhammad"
]

results, comparison_df, evaluator = compare_multiple_models(
    models_to_compare,
    audio_files,
    reference_texts  # Can be None if you don't have references
)

Using device: cuda

🏁 STARTING COMPARISON OF 2 MODELS
📁 Audio files: 2

🔄 EVALUATING MODEL: jmshd/whisper-uz
📥 Loading model and processor...
✅ Model loaded in 4.25s
📊 Model size: 276.9 MB
💾 Memory usage: 2.3 MB

🎵 Processing audio 1/2: s_1796.wav
📝 Transcription: 'Quyidagilar sug'urta hodisasi hisoblanmaydi. Mehnat shartomasini xodimning tashabbusi bilan bekor qilinganda, xodimning o'z mehnat vazifalarini muntazam ravishda buzganligi.'
⏱️  Inference time: 0.89s
🚀 Speed ratio: 12.3x real-time
📊 WER: 0.333 (33.3%)
📊 CER: 0.040 (4.0%)

🎵 Processing audio 2/2: test2.wav
📝 Transcription: 'Tasvirsiya ikki, model aniqligini tekshiramiz, mening ismin bo'lmuhammad.'
⏱️  Inference time: 0.38s
🚀 Speed ratio: 14.3x real-time
📊 WER: 0.556 (55.6%)
📊 CER: 0.149 (14.9%)

📈 SUMMARY FOR jmshd/whisper-uz:
   Average inference time: 0.63s
   Average speed ratio: 13.3x
   Average WER: 0.444 (44.4%)
   Average CER: 0.094 (9.4%)

🔄 EVALUATING MODEL: ShakhzoDavronov/whisper-large-lora-uz
📥 Loading model and 


📊 DETAILED COMPARISON TABLE
     Model  Avg_Inference_Time  Avg_Speed_Ratio  Memory_Usage_MB  Model_Size_MB  Load_Time  Avg_WER  Avg_CER  Accuracy_Percent
whisper-uz               0.635           13.314            2.328        276.924      4.253    0.444    0.094            55.556

🏆 RECOMMENDATIONS
