# Whisper Local - GPU Transcription with faster-whisper

**Module:** 01-Audio-Foundation  
**Level:** Intermediate  
**Technologies:** faster-whisper (CTranslate2), transformers  
**Estimated duration:** 45 minutes  
**VRAM:** ~10 GB (for `large-v3`)  

## Learning Objectives

- [ ] Install and configure faster-whisper
- [ ] Understand model sizes and their trade-offs
- [ ] Transcribe audio locally with `WhisperModel`
- [ ] Obtain detailed segments with timestamps
- [ ] Detect the language and read the detection probability
- [ ] Batch-transcribe several files
- [ ] Compare local and API performance (speed, quality, cost)
- [ ] Monitor VRAM usage

## Prerequisites

- NVIDIA GPU with at least 4 GB of VRAM (10 GB recommended for `large-v3`); the notebook falls back to CPU with `int8` when no GPU is detected
- CUDA Toolkit installed
- `pip install faster-whisper`
- Audio samples (generated in the previous notebooks or provided)
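
Before running the cells, it can help to confirm that the prerequisites are actually visible from the Python environment. The sketch below is only a hint, not a full diagnostic: `check_prerequisites` is a hypothetical helper name, and finding `nvidia-smi` on the `PATH` suggests an NVIDIA driver is installed but does not prove that CUDA works.

```python
import importlib.util
import shutil


def check_prerequisites() -> dict:
    """Report which prerequisites appear to be present (hypothetical helper)."""
    return {
        # True if the faster-whisper package can be found by the import system.
        "faster_whisper": importlib.util.find_spec("faster_whisper") is not None,
        # True if the NVIDIA driver utility is on PATH (a hint, not proof, of a usable GPU).
        "nvidia_smi": shutil.which("nvidia-smi") is not None,
    }


for name, present in check_prerequisites().items():
    print(f"{name}: {'found' if present else 'missing'}")
```

A missing `nvidia_smi` entry is not fatal for this notebook, since the setup cell switches to CPU when no GPU is detected.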

**Navigation:** [Index](../README.md) | [<< Previous](01-3-Basic-Audio-Operations.ipynb) | [Next >>](01-5-Kokoro-TTS-Local.ipynb)

In [1]:
# Parametres Papermill - JAMAIS modifier ce commentaire

# Notebook configuration
notebook_mode = "interactive"        # "interactive" or "batch"
skip_widgets = False               # True for MCP batch mode
debug_level = "INFO"

# Local Whisper parameters
model_size = "large-v3-turbo"      # "tiny", "base", "small", "medium", "large-v3", "large-v3-turbo"
device = "cuda"                    # "cuda" or "cpu"
compute_type = "float16"           # "float16", "int8", "int8_float16"

# Configuration
generate_test_audio = True         # Generate samples via TTS
compare_model_sizes = True         # Compare different model sizes
compare_with_api = True            # Compare with the OpenAI API
batch_transcribe = True            # Test batch transcription
monitor_vram = True                # Monitor VRAM usage
save_results = True                # Save results

In [2]:
# Setup environnement et imports
import os
import sys
import json
import time
import gc
from pathlib import Path
from datetime import datetime
from typing import Dict, List, Any, Optional
import logging

import numpy as np
from IPython.display import Audio, display

# Import helpers GenAI
GENAI_ROOT = Path.cwd()
while GENAI_ROOT.name != 'GenAI' and len(GENAI_ROOT.parts) > 1:
    GENAI_ROOT = GENAI_ROOT.parent

HELPERS_PATH = GENAI_ROOT / 'shared' / 'helpers'
if HELPERS_PATH.exists():
    sys.path.insert(0, str(HELPERS_PATH.parent))
    try:
        from helpers.audio_helpers import transcribe_local, load_audio, play_audio_file
        print("Helpers audio importes")
    except ImportError:
        print("Helpers audio non disponibles - mode autonome")

# Repertoires
OUTPUT_DIR = GENAI_ROOT / 'outputs' / 'audio' / 'whisper_local'
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
SAMPLES_DIR = GENAI_ROOT / 'outputs' / 'audio' / 'samples'
SAMPLES_DIR.mkdir(parents=True, exist_ok=True)

# Configuration logging
logging.basicConfig(level=getattr(logging, debug_level))
logger = logging.getLogger('whisper_local')

# Verification GPU
try:
    import torch
    gpu_available = torch.cuda.is_available()
    if gpu_available:
        gpu_name = torch.cuda.get_device_name(0)
        gpu_vram = torch.cuda.get_device_properties(0).total_memory / (1024**3)
        print(f"GPU : {gpu_name} ({gpu_vram:.1f} GB VRAM)")
    else:
        print("GPU non disponible - utilisation CPU")
        device = "cpu"
        compute_type = "int8"
except ImportError:
    print("torch non disponible - verification GPU ignoree")
    gpu_available = False

print(f"\nWhisper Local - Transcription GPU")
print(f"Date : {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Modele : {model_size}, Device : {device}, Compute : {compute_type}")
print(f"Sortie : {OUTPUT_DIR}")

Helpers audio importes


GPU non disponible - utilisation CPU

Whisper Local - Transcription GPU
Date : 2026-02-18 10:31:13
Modele : large-v3-turbo, Device : cpu, Compute : int8
Sortie : D:\Dev\CoursIA.worktrees\GenAI_Series\MyIA.AI.Notebooks\GenAI\outputs\audio\whisper_local


In [3]:
# Chargement .env et preparation des echantillons
from dotenv import load_dotenv

current_path = Path.cwd()
found_env = False
for _ in range(4):
    env_path = current_path / '.env'
    if env_path.exists():
        load_dotenv(env_path)
        print(f"Fichier .env charge depuis : {env_path}")
        found_env = True
        break
    current_path = current_path.parent

# Preparation des echantillons audio
test_files = {}

# Verifier les fichiers existants (generes par les notebooks precedents)
for sample_file in SAMPLES_DIR.glob("sample_*.mp3"):
    lang = sample_file.stem.split('_')[1]
    test_files[lang] = sample_file
    print(f"Echantillon existant : {sample_file.name}")

# Generer si necessaire
if generate_test_audio and len(test_files) == 0:
    openai_key = os.getenv('OPENAI_API_KEY')
    if openai_key:
        from openai import OpenAI
        client_api = OpenAI(api_key=openai_key)

        texts = {
            "fr": "La reconnaissance vocale locale permet de transcrire sans connexion internet.",
            "en": "Local speech recognition enables transcription without internet connection.",
            "de": "Die lokale Spracherkennung ermoeglicht die Transkription ohne Internetverbindung."
        }

        for lang, text in texts.items():
            print(f"Generation echantillon '{lang}'...")
            response = client_api.audio.speech.create(
                model="tts-1", voice="nova", input=text, response_format="mp3"
            )
            filepath = SAMPLES_DIR / f"sample_{lang}.mp3"
            with open(filepath, 'wb') as f:
                f.write(response.content)
            test_files[lang] = filepath
    else:
        print("OPENAI_API_KEY non disponible pour generer les echantillons")

print(f"\n{len(test_files)} echantillons prets pour transcription")

Fichier .env charge depuis : D:\Dev\CoursIA.worktrees\GenAI_Series\MyIA.AI.Notebooks\GenAI\.env
Echantillon existant : sample_en.mp3
Echantillon existant : sample_fr.mp3
Echantillon existant : sample_multi.mp3

3 echantillons prets pour transcription


## Section 1: Whisper models and trade-offs

faster-whisper runs Whisper models on CTranslate2, an inference engine optimized for Transformer models.

### Model sizes

| Model | Parameters | VRAM (float16) | Relative speed | Quality |
|-------|------------|----------------|----------------|---------|
| `tiny` | 39 M | ~1 GB | 32x | Basic |
| `base` | 74 M | ~1 GB | 16x | Fair |
| `small` | 244 M | ~2 GB | 6x | Good |
| `medium` | 769 M | ~5 GB | 2x | Very good |
| `large-v3` | 1.55 B | ~10 GB | 1x (reference) | Excellent |
| `large-v3-turbo` | 809 M | ~6 GB | 3x | Excellent |

VRAM and speed figures are approximate; actual values depend on the compute type, the beam size, and the hardware.

### Compute types

| Type | Precision | VRAM | Speed |
|------|-----------|------|-------|
| `float16` | High | Baseline | Baseline |
| `int8_float16` | Good | About 50% lower | Faster |
| `int8` | Acceptable | Lowest | Faster |
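
To see where the VRAM differences between compute types come from, one can estimate the memory taken by the weights alone: parameter count times bytes per weight. The sketch below uses the parameter counts from the model table; `weights_gb` is a hypothetical helper, and treating `int8_float16` as one byte per weight is a simplifying assumption (some layers may stay in higher precision). Real usage is higher than these figures because activations, decoding buffers, and the CUDA context also consume memory.

```python
# Parameter counts (in millions) taken from the model table above.
PARAMS_MILLIONS = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large-v3": 1550,
    "large-v3-turbo": 809,
}

# Bytes per stored weight. Assumption: int8_float16 stores weights as int8.
BYTES_PER_WEIGHT = {"float16": 2, "int8_float16": 1, "int8": 1}


def weights_gb(model: str, compute_type: str) -> float:
    """Rough size of the model weights in GiB (weights only, no runtime overhead)."""
    n_params = PARAMS_MILLIONS[model] * 1_000_000
    return n_params * BYTES_PER_WEIGHT[compute_type] / 1024**3


for ct in ("float16", "int8"):
    print(f"large-v3-turbo / {ct}: ~{weights_gb('large-v3-turbo', ct):.2f} GiB of weights")
```

Halving the bytes per weight halves this estimate, which is consistent with the roughly 50% VRAM reduction listed for `int8_float16`.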

In [4]:
# Chargement du modele Whisper
print("CHARGEMENT DU MODELE WHISPER")
print("=" * 45)

from faster_whisper import WhisperModel

# VRAM measurement before loading.
# CTranslate2 allocates GPU memory outside PyTorch's allocator, so
# torch.cuda.memory_allocated() would not see the model. mem_get_info()
# reports device-wide usage instead (it also counts other processes).
vram_before = 0
if monitor_vram and gpu_available:
    free_bytes, total_bytes = torch.cuda.mem_get_info(0)
    vram_before = (total_bytes - free_bytes) / (1024**3)
    print(f"VRAM avant chargement : {vram_before:.2f} GB")

print(f"\nChargement du modele '{model_size}' sur '{device}'...")
print(f"Type de calcul : {compute_type}")

start_time = time.time()
model = WhisperModel(model_size, device=device, compute_type=compute_type)
load_time = time.time() - start_time

print(f"Modele charge en {load_time:.1f}s")

# VRAM measurement after loading (same device-wide counter as above)
if monitor_vram and gpu_available:
    free_bytes, total_bytes = torch.cuda.mem_get_info(0)
    vram_after = (total_bytes - free_bytes) / (1024**3)
    vram_used = vram_after - vram_before
    print(f"VRAM apres chargement : {vram_after:.2f} GB")
    print(f"VRAM utilisee par le modele : {vram_used:.2f} GB")

CHARGEMENT DU MODELE WHISPER



Chargement du modele 'large-v3-turbo' sur 'cpu'...
Type de calcul : int8


INFO:httpx:HTTP Request: GET https://huggingface.co/api/models/mobiuslabsgmbh/faster-whisper-large-v3-turbo/revision/main "HTTP/1.1 307 Temporary Redirect"


INFO:httpx:HTTP Request: GET https://huggingface.co/api/models/dropbox-dash/faster-whisper-large-v3-turbo/revision/main "HTTP/1.1 200 OK"


Modele charge en 4.0s


## Section 2: Detailed transcription

`model.transcribe()` returns two objects:
- `segments`: a lazy generator of segments, each with timestamps, text, and a confidence score (`avg_logprob`); decoding only runs as the generator is iterated
- `info`: metadata (detected language, its probability, audio duration)
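
The practical consequence of `segments` being a generator is easy to reproduce without loading a model. In the sketch below, `FakeSegment` and `fake_transcribe` are hypothetical stand-ins that only mimic the lazy behaviour; they are not part of faster-whisper.

```python
from typing import Iterator, NamedTuple


class FakeSegment(NamedTuple):
    """Hypothetical stand-in for a transcription segment."""
    start: float
    end: float
    text: str


def fake_transcribe() -> Iterator[FakeSegment]:
    """Yield segments lazily, as a generator does."""
    yield FakeSegment(0.00, 3.18, " Speech recognition converts spoken language into written text.")
    yield FakeSegment(3.36, 8.84, " Modern systems use deep neural networks")


lazy_segments = fake_transcribe()
first_pass = [s.text.strip() for s in lazy_segments]
second_pass = [s.text.strip() for s in lazy_segments]  # generator is already exhausted

print(len(first_pass), len(second_pass))

# Materializing the generator once makes the result reusable.
segments_as_list = list(fake_transcribe())
print(len(segments_as_list), len(segments_as_list))
```

This is also why timing should stop only after the generator has been fully consumed: the call that creates it returns before the actual decoding work is done.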

In [5]:
# Transcription detaillee
print("TRANSCRIPTION DETAILLEE")
print("=" * 45)

if test_files:
    # Transcription du premier echantillon
    lang_key = list(test_files.keys())[0]
    sample_path = test_files[lang_key]

    print(f"Fichier : {sample_path.name}")
    print(f"\nTranscription en cours...")

    start_time = time.time()
    segments, info = model.transcribe(
        str(sample_path),
        beam_size=5,
        word_timestamps=True
    )

    # Collecter les segments (le generateur est consomme une seule fois)
    segments_list = list(segments)
    transcribe_time = time.time() - start_time

    # Informations de detection
    print(f"\n--- Metadonnees ---")
    print(f"  Langue detectee : {info.language} (probabilite : {info.language_probability:.2%})")
    print(f"  Duree audio : {info.duration:.1f}s")
    print(f"  Temps de transcription : {transcribe_time:.2f}s")
    print(f"  Ratio temps reel : {info.duration / transcribe_time:.1f}x")

    # Texte complet
    full_text = " ".join([s.text.strip() for s in segments_list])
    print(f"\n--- Texte transcrit ---")
    print(f"  {full_text}")

    # Segments detailles
    print(f"\n--- Segments ({len(segments_list)}) ---")
    for seg in segments_list:
        print(f"  [{seg.start:.2f}s - {seg.end:.2f}s] (conf: {seg.avg_logprob:.3f}) {seg.text.strip()}")

    # Mots avec timestamps
    all_words = []
    for seg in segments_list:
        if seg.words:
            all_words.extend(seg.words)

    if all_words:
        print(f"\n--- Mots avec timestamps ({len(all_words)}) ---")
        for w in all_words[:10]:
            print(f"  [{w.start:.2f}s - {w.end:.2f}s] (p={w.probability:.2f}) {w.word}")
        if len(all_words) > 10:
            print(f"  ... ({len(all_words) - 10} mots supplementaires)")

    # Sauvegarde
    if save_results:
        result = {
            "model": model_size,
            "device": device,
            "compute_type": compute_type,
            "language": info.language,
            "language_probability": info.language_probability,
            "duration": info.duration,
            "transcription_time": transcribe_time,
            "text": full_text,
            "segments": [
                {"start": s.start, "end": s.end, "text": s.text.strip(),
                 "avg_logprob": s.avg_logprob}
                for s in segments_list
            ]
        }
        result_file = OUTPUT_DIR / f"transcription_{lang_key}_{model_size}.json"
        with open(result_file, 'w', encoding='utf-8') as f:
            json.dump(result, f, indent=2, ensure_ascii=False)
        print(f"\nResultat sauvegarde : {result_file.name}")
else:
    print("Aucun echantillon audio disponible")

INFO:faster_whisper:Processing audio with duration 00:10.992


TRANSCRIPTION DETAILLEE
Fichier : sample_en.mp3

Transcription en cours...


INFO:faster_whisper:Detected language 'en' with probability 1.00



--- Metadonnees ---
  Langue detectee : en (probabilite : 99.94%)
  Duree audio : 11.0s
  Temps de transcription : 17.17s
  Ratio temps reel : 0.6x

--- Texte transcrit ---
  Speech recognition converts spoken language into written text. Modern systems use deep neural networks trained on thousands of hours of audio data to achieve human-level accuracy.

--- Segments (3) ---
  [0.00s - 3.18s] (conf: -0.131) Speech recognition converts spoken language into written text.
  [3.36s - 8.84s] (conf: -0.131) Modern systems use deep neural networks trained on thousands of hours of audio data
  [8.84s - 10.46s] (conf: -0.131) to achieve human-level accuracy.

--- Mots avec timestamps (27) ---
  [0.00s - 0.34s] (p=0.89)  Speech
  [0.34s - 0.90s] (p=0.98)  recognition
  [0.90s - 1.38s] (p=0.99)  converts
  [1.38s - 1.82s] (p=1.00)  spoken
  [1.82s - 2.18s] (p=1.00)  language
  [2.18s - 2.42s] (p=1.00)  into
  [2.42s - 2.68s] (p=1.00)  written
  [2.68s - 3.18s] (p=1.00)  text.
  [3.36s - 3.82s] (p

### Interpretation: local transcription

| Metric | Reading | Meaning |
|--------|---------|---------|
| Real-time ratio | >1x = faster than real time | The higher the ratio, the better the performance |
| avg_logprob | Close to 0 = high confidence | Values below -1.0 indicate uncertainty |
| language_probability | Close to 1.0 | Language detection is reliable |

In the recorded run above, the model ran on CPU with `int8` (no GPU was detected), and the ratio was 0.6x, i.e. slower than real time.

**Key points**:
1. `word_timestamps=True` enables word-level alignment
2. `beam_size=5` improves quality at the cost of higher latency
3. The `segments` generator can be consumed only once; convert it to a list to reuse it
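
The two numeric indicators in the table can be made concrete with a few lines of arithmetic. The helpers below are hypothetical names introduced for illustration; the -1.0 cut-off is the threshold quoted in the table, and the sample values come from the recorded output (10.992 s of audio transcribed in 17.17 s, `avg_logprob` of -0.131).

```python
import math


def realtime_ratio(audio_seconds: float, elapsed_seconds: float) -> float:
    """Audio duration divided by processing time; above 1.0 means faster than real time."""
    return audio_seconds / elapsed_seconds


def segment_probability(avg_logprob: float) -> float:
    """Convert an average token log-probability into a geometric-mean token probability."""
    return math.exp(avg_logprob)


def is_uncertain(avg_logprob: float, threshold: float = -1.0) -> bool:
    """Flag segments whose average log-probability falls below the threshold."""
    return avg_logprob < threshold


print(f"ratio: {realtime_ratio(10.992, 17.17):.1f}x")
print(f"mean token probability: {segment_probability(-0.131):.3f}")
print(f"uncertain: {is_uncertain(-0.131)}")
```

Reading `avg_logprob` through `exp()` is often more intuitive: a value near 0 maps to a probability near 1, while -1.0 maps to roughly 0.37.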

## Section 3: Batch transcription

To process several audio files, iterate over them while keeping the model loaded in memory. The model is loaded only once, so each additional file costs only its own inference time.
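
The "load once, transcribe many" pattern can be sketched independently of the model by injecting the transcription step as a function. Everything here is an illustrative assumption: `transcribe_many` is a hypothetical helper, and `transcribe_fn` stands for any callable that takes a path and returns `(text, audio_duration_seconds)`.

```python
import time
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple


def transcribe_many(
    paths: Iterable[str],
    transcribe_fn: Callable[[Path], Tuple[str, float]],
) -> Dict[str, dict]:
    """Run transcribe_fn on each file and record text, audio duration, and elapsed time."""
    results = {}
    for raw_path in paths:
        path = Path(raw_path)
        start = time.perf_counter()
        text, duration = transcribe_fn(path)
        elapsed = time.perf_counter() - start
        results[path.name] = {"text": text, "duration": duration, "time": elapsed}
    return results


def stub_transcribe(path: Path) -> Tuple[str, float]:
    """Stub used only to make the sketch runnable without a model."""
    return f"(transcript of {path.name})", 11.0


for name, row in transcribe_many(["sample_en.mp3", "sample_fr.mp3"], stub_transcribe).items():
    print(f"{name}: {row['text']} [{row['duration']:.1f}s audio]")
```

In the notebook, the injected function would wrap the already loaded `model`: call `model.transcribe(str(path))`, join the segment texts, and return them together with `info.duration`.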

In [6]:
# Transcription batch
print("TRANSCRIPTION BATCH")
print("=" * 45)

if batch_transcribe and len(test_files) > 0:
    batch_results = {}

    for lang_key, filepath in test_files.items():
        print(f"\nTranscription '{lang_key}' : {filepath.name}")
        start_time = time.time()

        segments, info = model.transcribe(str(filepath))
        segments_list = list(segments)
        elapsed = time.time() - start_time

        text = " ".join([s.text.strip() for s in segments_list])

        batch_results[lang_key] = {
            "text": text,
            "language": info.language,
            "probability": info.language_probability,
            "duration": info.duration,
            "time": elapsed
        }

        print(f"  Langue : {info.language} ({info.language_probability:.0%})")
        print(f"  Texte : {text[:80]}...")
        print(f"  Temps : {elapsed:.2f}s (ratio : {info.duration/elapsed:.1f}x)")

    # Tableau recapitulatif
    print(f"\nRecapitulatif batch :")
    print(f"{'Fichier':<12} {'Langue':<8} {'Proba':<8} {'Duree':<8} {'Temps':<8} {'Ratio':<8}")
    print("-" * 52)
    for lang_key, data in batch_results.items():
        ratio = data['duration'] / data['time'] if data['time'] > 0 else 0
        print(f"{lang_key:<12} {data['language']:<8} {data['probability']:<8.0%} "
              f"{data['duration']:<8.1f} {data['time']:<8.2f} {ratio:<8.1f}")

    total_duration = sum(d['duration'] for d in batch_results.values())
    total_time = sum(d['time'] for d in batch_results.values())
    print(f"\nTotal : {total_duration:.1f}s d'audio transcrits en {total_time:.1f}s")
else:
    print("Transcription batch desactivee ou pas d'echantillons")

INFO:faster_whisper:Processing audio with duration 00:10.992


TRANSCRIPTION BATCH

Transcription 'en' : sample_en.mp3


INFO:faster_whisper:Detected language 'en' with probability 1.00


INFO:faster_whisper:Processing audio with duration 00:11.784


  Langue : en (100%)
  Texte : Speech recognition converts spoken language into written text. Modern systems us...
  Temps : 22.74s (ratio : 0.5x)

Transcription 'fr' : sample_fr.mp3


INFO:faster_whisper:Detected language 'fr' with probability 1.00


INFO:faster_whisper:Processing audio with duration 00:05.424


  Langue : fr (100%)
  Texte : La reconnaissance vocale permet de convertir la parole en texte. Cette technolog...
  Temps : 21.79s (ratio : 0.5x)

Transcription 'multi' : sample_multi.mp3


INFO:faster_whisper:Detected language 'fr' with probability 0.79


  Langue : fr (79%)
  Texte : Bonjour, je parle franquet. Now I switch to English. Et je reviens au franquet p...
  Temps : 20.48s (ratio : 0.3x)

Recapitulatif batch :
Fichier      Langue   Proba    Duree    Temps    Ratio   
----------------------------------------------------
en           en       100%     11.0     22.74    0.5     
fr           fr       100%     11.8     21.79    0.5     
multi        fr       79%      5.4      20.48    0.3     

Total : 28.2s d'audio transcrits en 65.0s


## Section 4: Local vs API comparison

A comparison of local transcription (faster-whisper) with the OpenAI Whisper API.

| Criterion | Local (faster-whisper) | API (OpenAI) |
|-----------|------------------------|--------------|
| Cost | No per-minute fee (hardware and electricity) | $0.006/minute |
| Latency | Depends on local hardware | ~1-3 s per request |
| Privacy | Data stays local | Audio is sent to OpenAI |
| Availability | No connection required once the model is downloaded | Internet required |
| Maintenance | Manual updates | Managed by OpenAI |
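
The cost row reduces to a one-line formula: audio minutes times the per-minute rate. The sketch below uses the $0.006/minute rate from the table; `api_cost` is a hypothetical helper name, and the rate should be checked against current pricing before relying on it.

```python
API_RATE_PER_MINUTE = 0.006  # USD per audio minute, as quoted in the table above


def api_cost(audio_seconds: float, rate_per_minute: float = API_RATE_PER_MINUTE) -> float:
    """Estimated API cost in USD for a given audio duration."""
    return audio_seconds / 60 * rate_per_minute


for label, seconds in [("11 s sample", 10.992), ("1 hour", 3600), ("100 hours", 360_000)]:
    print(f"{label}: ${api_cost(seconds):.4f}")
```

The local side has no per-minute fee, so the comparison mostly comes down to volume: the API cost grows linearly with audio duration, while local costs are dominated by hardware and electricity.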

In [7]:
# Comparaison local vs API
print("COMPARAISON LOCAL VS API")
print("=" * 45)

if compare_with_api and len(test_files) > 0:
    openai_key = os.getenv('OPENAI_API_KEY')

    if openai_key:
        from openai import OpenAI
        client_api = OpenAI(api_key=openai_key)

        lang_key = list(test_files.keys())[0]
        filepath = test_files[lang_key]

        # --- Transcription locale ---
        print(f"\n--- Local ({model_size}) ---")
        start_local = time.time()
        segments, info = model.transcribe(str(filepath))
        text_local = " ".join([s.text.strip() for s in segments])
        time_local = time.time() - start_local
        print(f"  Texte : {text_local[:80]}...")
        print(f"  Temps : {time_local:.2f}s")

        # --- Transcription API ---
        print(f"\n--- API (whisper-1) ---")
        start_api = time.time()
        with open(filepath, 'rb') as audio_file:
            transcript = client_api.audio.transcriptions.create(
                model="whisper-1",
                file=audio_file,
                response_format="json"
            )
        text_api = transcript.text
        time_api = time.time() - start_api
        print(f"  Texte : {text_api[:80]}...")
        print(f"  Temps : {time_api:.2f}s")

        # --- Analyse ---
        print(f"\n--- Analyse comparative ---")
        print(f"{'Critere':<25} {'Local':<20} {'API':<20}")
        print("-" * 65)
        print(f"{'Temps':<25} {time_local:<20.2f} {time_api:<20.2f}")
        print(f"{'Longueur texte':<25} {len(text_local):<20} {len(text_api):<20}")

        cost_api = info.duration / 60 * 0.006
        print(f"{'Cout':<25} {'$0.00 (local)':<20} {f'${cost_api:.4f}':<20}")

        # Estimation cout pour 1 heure
        cost_1h = 60 * 0.006
        print(f"\nCout pour 1 heure d'audio :")
        print(f"  Local : $0.00 (cout electricite uniquement)")
        print(f"  API   : ${cost_1h:.2f}")
    else:
        print("OPENAI_API_KEY non disponible pour la comparaison")
else:
    print("Comparaison desactivee ou pas d'echantillons")

COMPARAISON LOCAL VS API


INFO:faster_whisper:Processing audio with duration 00:10.992



--- Local (large-v3-turbo) ---


INFO:faster_whisper:Detected language 'en' with probability 1.00


  Texte : Speech recognition converts spoken language into written text. Modern systems us...
  Temps : 21.51s

--- API (whisper-1) ---


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/audio/transcriptions "HTTP/1.1 200 OK"


  Texte : Speech recognition converts spoken language into written text. Modern systems us...
  Temps : 1.11s

--- Analyse comparative ---
Critere                   Local                API                 
-----------------------------------------------------------------
Temps                     21.51                1.11                
Longueur texte            179                  179                 
Cout                      $0.00 (local)        $0.0011             

Cout pour 1 heure d'audio :
  Local : $0.00 (cout electricite uniquement)
  API   : $0.36


In [8]:
# Mode interactif
if notebook_mode == "interactive" and not skip_widgets:
    print("MODE INTERACTIF")
    print("=" * 50)
    print("\nEntrez le chemin d'un fichier audio a transcrire :")
    print("(Laissez vide pour passer a la suite)")

    try:
        user_path = input("\nChemin du fichier audio : ")

        if user_path.strip():
            user_file = Path(user_path.strip())
            if user_file.exists():
                print(f"\nTranscription de {user_file.name}...")
                start_time = time.time()
                segments, info = model.transcribe(
                    str(user_file), word_timestamps=True
                )
                segments_list = list(segments)
                elapsed = time.time() - start_time

                text = " ".join([s.text.strip() for s in segments_list])
                print(f"\nLangue : {info.language} ({info.language_probability:.0%})")
                print(f"Duree : {info.duration:.1f}s")
                print(f"Temps : {elapsed:.2f}s")
                print(f"\nTexte : {text}")
            else:
                print(f"Fichier non trouve : {user_file}")
        else:
            print("Mode interactif ignore")

    except (KeyboardInterrupt, EOFError):
        print("Mode interactif interrompu")
    except Exception as e:
        error_type = type(e).__name__
        if "StdinNotImplemented" in error_type or "input" in str(e).lower():
            print("Mode interactif non disponible (execution automatisee)")
        else:
            print(f"Erreur : {error_type} - {str(e)[:100]}")
else:
    print("Mode batch - Interface interactive desactivee")

MODE INTERACTIF

Entrez le chemin d'un fichier audio a transcrire :
(Laissez vide pour passer a la suite)
Mode interactif non disponible (execution automatisee)


## Best practices and optimization

### Choosing a model

| Scenario | Recommended model | Reason |
|----------|-------------------|--------|
| Rapid prototyping | `small` or `base` | Fast loading, minimal VRAM |
| Production | `large-v3-turbo` | Best quality/speed trade-off |
| Maximum quality | `large-v3` | Highest accuracy, slower |
| Limited GPU (4 GB) | `small` + `int8` | Runs on most GPUs |
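
One way to turn these recommendations into code is to pick the largest model whose approximate VRAM need fits the available memory. This is a sketch under stated assumptions: the VRAM figures are the approximate float16 values from the Section 1 table, the 1 GB headroom is an arbitrary safety margin, and `pick_model` is a hypothetical helper, not a faster-whisper function.

```python
# Approximate float16 VRAM needs in GB, copied from the Section 1 model table.
VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large-v3-turbo": 6,
    "large-v3": 10,
}

# Candidates ordered from largest to smallest memory footprint.
PREFERENCE = ["large-v3", "large-v3-turbo", "medium", "small", "base", "tiny"]


def pick_model(available_vram_gb: float, headroom_gb: float = 1.0) -> str:
    """Return the first model in PREFERENCE that fits, keeping some headroom free."""
    for name in PREFERENCE:
        if VRAM_GB[name] + headroom_gb <= available_vram_gb:
            return name
    return "tiny"  # fallback when nothing fits with headroom


for vram in (4, 8, 12):
    print(f"{vram} GB -> {pick_model(vram)}")
```

With 4 GB this selects `small`, which matches the limited-GPU row above; using `int8` on top of that choice lowers memory use further.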

### Performance optimization

| Technique | Impact | Description |
|-----------|--------|-------------|
| `int8_float16` | About 50% less VRAM | Mixed quantization |
| `beam_size=1` | Faster (around +50%, indicative) | Minimal quality loss |
| Specify the language (`language="fr"`) | Faster (around +20%, indicative) | Skips language detection |
| `condition_on_previous_text=False` | Avoids repetition loops | Useful for long files |
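
The speed-oriented settings in the table can be gathered into a single set of keyword arguments. `fast_transcribe_options` is a hypothetical helper; the keys it produces (`beam_size`, `language`, `condition_on_previous_text`) are keyword arguments of `WhisperModel.transcribe`, and the gains they bring should be measured on your own audio rather than assumed.

```python
from typing import Optional


def fast_transcribe_options(language: Optional[str] = None) -> dict:
    """Build keyword arguments favouring speed over maximum accuracy."""
    options = {
        "beam_size": 1,                        # greedy decoding instead of beam search
        "condition_on_previous_text": False,   # do not feed previous output back as a prompt
    }
    if language is not None:
        options["language"] = language         # skips automatic language detection
    return options


print(fast_transcribe_options("fr"))

# In the notebook, with a loaded model, the options would be passed as:
# segments, info = model.transcribe(str(sample_path), **fast_transcribe_options("fr"))
```

Leaving `language` unset keeps automatic detection, which is the safer choice for mixed-language recordings such as the `multi` sample.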

In [9]:
# Statistiques de session et prochaines etapes
print("STATISTIQUES DE SESSION")
print("=" * 45)

print(f"Date : {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Modele : {model_size}")
print(f"Device : {device}, Compute : {compute_type}")
print(f"Fichiers transcrits : {len(test_files)}")

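if monitor_vram and gpu_available:
    # Device-wide usage: CTranslate2 memory is invisible to torch.cuda.memory_allocated()
    free_bytes, total_bytes = torch.cuda.mem_get_info(0)
    vram_current = (total_bytes - free_bytes) / (1024**3)
    # The peak counter only covers PyTorch-side allocations, not the CTranslate2 model
    vram_peak = torch.cuda.max_memory_allocated(0) / (1024**3)
    print(f"VRAM actuelle : {vram_current:.2f} GB")
    print(f"VRAM pic : {vram_peak:.2f} GB")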

if save_results:
    saved = list(OUTPUT_DIR.glob('*'))
    print(f"Fichiers sauvegardes : {len(saved)} dans {OUTPUT_DIR}")

# Liberation memoire
print(f"\nLiberation du modele...")
del model
gc.collect()
if gpu_available:
    torch.cuda.empty_cache()
print(f"Memoire liberee")

print(f"\nPROCHAINES ETAPES")
print(f"1. Essayer le TTS local avec Kokoro (01-5-Kokoro-TTS-Local)")
print(f"2. Decouvrir le voice cloning avec XTTS (02-2)")
print(f"3. Comparer tous les modeles STT (03-1)")
print(f"4. Construire un pipeline STT->LLM->TTS (03-2)")

print(f"\nNotebook Whisper Local termine - {datetime.now().strftime('%H:%M:%S')}")

STATISTIQUES DE SESSION
Date : 2026-02-18 10:33:04
Modele : large-v3-turbo
Device : cpu, Compute : int8
Fichiers transcrits : 3
Fichiers sauvegardes : 1 dans D:\Dev\CoursIA.worktrees\GenAI_Series\MyIA.AI.Notebooks\GenAI\outputs\audio\whisper_local

Liberation du modele...
Memoire liberee

PROCHAINES ETAPES
1. Essayer le TTS local avec Kokoro (01-5-Kokoro-TTS-Local)
2. Decouvrir le voice cloning avec XTTS (02-2)
3. Comparer tous les modeles STT (03-1)
4. Construire un pipeline STT->LLM->TTS (03-2)

Notebook Whisper Local termine - 10:33:04
