# HunyuanVideo - High-Quality Video Generation

**Module:** 02-Video-Advanced  
**Level:** Intermediate  
**Technologies:** HunyuanVideo 1.5 (Tencent), ComfyUI API or diffusers  
**Estimated duration:** 60 minutes  
**VRAM:** ~12 GB (API) or ~18 GB (local with INT8)  

## Learning Objectives

- [ ] Understand the HunyuanVideo architecture and its advantages
- [ ] Choose between the ComfyUI API (production) and diffusers (learning)
- [ ] Generate text-to-video clips from detailed prompts
- [ ] Explore the generation parameters (steps, guidance_scale, num_frames, fps)
- [ ] Control video resolution and duration
- [ ] Save results as MP4 with imageio
- [ ] Analyze generation quality and metrics

## Prerequisites

### ComfyUI API mode (recommended for production)
- ComfyUI-Video service running (docker-compose comfyui-video)
- No heavy Python dependencies on the client side
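
Before running the notebook in API mode, it can help to confirm that the service answers at all. The sketch below is a minimal check using only the standard library. It assumes the service exposes ComfyUI's `/system_stats` route at the notebook's default URL and that an optional token is sent as a standard Bearer header; both function names are hypothetical and are not part of the `comfyui_client` helper.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def system_stats_url(base_url: str) -> str:
    """Build the stats URL, tolerating a trailing slash on base_url."""
    return base_url.rstrip("/") + "/system_stats"


def comfyui_is_reachable(base_url: str, token: Optional[str] = None, timeout: float = 3.0) -> bool:
    """Return True if the server answers the stats request with HTTP 200 and valid JSON."""
    request = urllib.request.Request(system_stats_url(base_url))
    if token:
        # Assumption: the deployment expects a standard Bearer token header
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            json.loads(response.read().decode("utf-8"))
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a payload that is not JSON
        return False
```

For example, `comfyui_is_reachable("http://localhost:8189")` returns `False` while the container is stopped, which is a cheaper signal than a failed generation request.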

### Local diffusers mode (for learning)
- GPU with 18+ GB VRAM (RTX 3090 / RTX 4090)
- Packages: `diffusers>=0.32`, `transformers`, `torch`, `accelerate`, `bitsandbytes`, `imageio`
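
The package list above can be checked without importing the heavy libraries themselves. This is a small sketch based on `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the check only reports presence and version, it does not enforce the `>=0.32` constraint.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


REQUIRED = ["diffusers", "transformers", "torch", "accelerate", "bitsandbytes", "imageio"]

for name, version in installed_versions(REQUIRED).items():
    status = version if version is not None else "MISSING"
    print(f"{name:<14} {status}")
```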

**Navigation:** [<< 01-5](../01-Foundation/01-5-AnimateDiff-Introduction.ipynb) | [Index](../README.md) | [Next >>](02-2-LTX-Video-Lightweight.ipynb)

In [None]:
# Parametres Papermill - JAMAIS modifier ce commentaire

# Notebook configuration
notebook_mode = "interactive"        # "interactive" or "batch"
skip_widgets = False               # True for MCP batch mode
debug_level = "INFO"

# EXECUTION MODE: API or local
# - True  : use the ComfyUI API (recommended, no local GPU required)
# - False : use diffusers locally (for learning, requires a GPU)
use_api = True

# ComfyUI API parameters (if use_api=True)
comfyui_url = "http://localhost:8189"  # ComfyUI-Video service
comfyui_token = None                 # Bearer token (optional for localhost)

# HunyuanVideo model parameters (if use_api=False)
# NOTE: diffusers examples load converted weights from "hunyuanvideo-community/HunyuanVideo";
# try that ID if this repository does not load with HunyuanVideoPipeline.
model_id = "tencent/HunyuanVideo"  # HunyuanVideo model
quantize = True                      # INT8 quantization (recommended)
device = "cuda"                     # Compute device

# Generation parameters (shared by both modes)
num_frames = 33                    # Number of frames to generate (HunyuanVideo optimal)
guidance_scale = 7.0               # CFG scale (7.0 recommended for HunyuanVideo)
num_inference_steps = 30           # Number of denoising steps
height = 720                       # Video height (720p optimal)
width = 1280                       # Video width
fps_output = 24                    # FPS of the output video

# Run settings
run_generation = True              # Run the generation
save_as_mp4 = True                 # Save as MP4
save_results = True

In [2]:
# Parameters
notebook_mode = "batch"
skip_widgets = True


In [None]:
# Environment setup and imports
import os
import sys
import json
import time
import warnings
from pathlib import Path
from datetime import datetime
from typing import Dict, List, Any, Optional
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
import logging
from dotenv import load_dotenv

warnings.filterwarnings('ignore', category=DeprecationWarning)
warnings.filterwarnings('ignore', category=FutureWarning)

# Import the GenAI helpers
GENAI_ROOT = Path.cwd()
while GENAI_ROOT.name != 'GenAI' and len(GENAI_ROOT.parts) > 1:
    GENAI_ROOT = GENAI_ROOT.parent

# Default to None so later cells can test availability even if the helpers folder is missing
comfyui_client = None
HELPERS_PATH = GENAI_ROOT / 'shared' / 'helpers'
if HELPERS_PATH.exists():
    sys.path.insert(0, str(HELPERS_PATH.parent))
    try:
        from helpers import comfyui_client
        print("✅ comfyui_client helper imported")
    except ImportError as e:
        print(f"⚠️ comfyui_client helper NOT available: {e}")
        comfyui_client = None

OUTPUT_DIR = GENAI_ROOT / 'outputs' / 'hunyuan_video'
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

logging.basicConfig(level=getattr(logging, debug_level))
logger = logging.getLogger('hunyuan_video')

# Display the execution mode
mode_str = "ComfyUI API" if use_api else "Local diffusers"
print("HunyuanVideo 1.5 - High-Quality Video Generation")
print(f"Execution mode: {mode_str}")
print(f"Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Frames: {num_frames}, Steps: {num_inference_steps}, CFG: {guidance_scale}")
print(f"Resolution: {width}x{height}")

In [None]:
# Load .env and check the environment
current_path = Path.cwd()
found_env = False
# Look in the current directory and up to three parents
for _ in range(4):
    env_path = current_path / '.env'
    if env_path.exists():
        load_dotenv(env_path)
        print(f"✅ .env file loaded from: {env_path}")
        found_env = True
        break
    current_path = current_path.parent

if not found_env:
    print("⚠️ No .env file found")

# Check and initialize according to the mode
print("\n" + "=" * 50)
print(f"MODE: {'ComfyUI API' if use_api else 'Local diffusers'}")
print("=" * 50)

client = None
pipe = None
comfyui_available = False
local_available = False

if use_api:
    # === COMFYUI API MODE ===
    print("\n📡 Checking the ComfyUI-Video API...")

    if comfyui_client is not None:
        try:
            client = comfyui_client.ComfyUIClient(
                base_url=comfyui_url,
                api_token=comfyui_token
            )

            stats = client.get_system_stats()

            print(f"✅ ComfyUI-Video reachable at: {comfyui_url}")
            comfyui_available = True

        except Exception as e:
            print(f"⚠️ ComfyUI-Video not reachable: {type(e).__name__}: {str(e)[:100]}")
            print("\n💡 To start ComfyUI-Video:")
            print("   docker-compose -f docker-configurations/services/comfyui-video/docker-compose.yml up -d")
            run_generation = False
    else:
        print("⚠️ comfyui_client helper not available")
        run_generation = False
        
else:
    # === LOCAL DIFFUSERS MODE ===
    print("\n🔧 Checking the local environment...")

    # GPU check
    try:
        import torch
        if torch.cuda.is_available():
            gpu_name = torch.cuda.get_device_name(0)
            # The device-properties attribute is total_memory (in bytes)
            vram_total = torch.cuda.get_device_properties(0).total_memory / 1024**3
            print(f"✅ GPU: {gpu_name}")
            print(f"   Total VRAM: {vram_total:.1f} GB")

            if vram_total < 18:
                print("⚠️ Low VRAM (< 18 GB), enabling quantization")
                quantize = True
                if vram_total < 12:
                    height = 480
                    width = 640
                    num_frames = 24
                    print(f"  Resolution reduced to {width}x{height}, {num_frames} frames")
        else:
            print("⚠️ CUDA not available")
            run_generation = False
    except ImportError:
        print("⚠️ PyTorch not installed")
        run_generation = False
    
    # Dependency check
    deps_ok = True

    try:
        import diffusers
        print(f"✅ diffusers: v{diffusers.__version__}")
    except ImportError:
        print("⚠️ diffusers NOT INSTALLED (pip install 'diffusers>=0.32')")
        deps_ok = False

    try:
        import transformers
        print(f"✅ transformers: v{transformers.__version__}")
    except ImportError:
        print("⚠️ transformers NOT INSTALLED")
        deps_ok = False

    if quantize:
        try:
            import bitsandbytes as bnb
            print(f"✅ bitsandbytes: v{bnb.__version__}")
        except ImportError:
            # Fall back to the unquantized pipeline rather than failing
            print("⚠️ bitsandbytes NOT INSTALLED (pip install bitsandbytes)")
            quantize = False

    try:
        import imageio
        print(f"✅ imageio: v{imageio.__version__}")
    except ImportError:
        print("⚠️ imageio NOT INSTALLED")
        deps_ok = False
    
    if deps_ok and run_generation:
        print("\n📦 Loading the HunyuanVideo pipeline...")
        try:
            from diffusers import HunyuanVideoPipeline
            from diffusers.utils import export_to_video

            start_load = time.time()

            if quantize:
                # In diffusers, a BitsAndBytesConfig applies to a single model, so the
                # transformer is quantized first and then handed to the pipeline.
                from diffusers import BitsAndBytesConfig, HunyuanVideoTransformer3DModel
                quant_config = BitsAndBytesConfig(load_in_8bit=True)
                transformer = HunyuanVideoTransformer3DModel.from_pretrained(
                    model_id,
                    subfolder="transformer",
                    quantization_config=quant_config,
                    torch_dtype=torch.float16
                )
                pipe = HunyuanVideoPipeline.from_pretrained(
                    model_id,
                    transformer=transformer,
                    torch_dtype=torch.float16
                )
            else:
                pipe = HunyuanVideoPipeline.from_pretrained(
                    model_id,
                    torch_dtype=torch.float16
                )

            pipe = pipe.to(device)
            pipe.enable_vae_slicing()
            pipe.enable_vae_tiling()

            load_time = time.time() - start_load
            print(f"✅ Pipeline loaded in {load_time:.1f}s")
            local_available = True

        except Exception as e:
            print(f"⚠️ Pipeline loading error: {type(e).__name__}: {str(e)[:200]}")
            run_generation = False

print(f"\n{'='*50}")
print(f"Generation enabled: {run_generation}")
print(f"{'='*50}")

## Section 1: HunyuanVideo architecture

HunyuanVideo is an open-source text-to-video generation model developed by Tencent.
It stands out for its generation quality and its ability to produce longer videos
with good temporal coherence.

### Two ways to use HunyuanVideo

| Aspect | ComfyUI API | Local diffusers |
|--------|-------------|-----------------|
| **Use case** | Production, applications | Learning, research |
| **GPU required** | No (server side) | Yes (18+ GB) |
| **Installation** | None (Docker) | diffusers, transformers, torch |
| **Flexibility** | Medium | High |
| **Performance** | Optimized server | Depends on the local GPU |

### HunyuanVideo architecture

| Component | Description |
|-----------|-------------|
| **Backbone** | 3D Transformer with spatio-temporal attention |
| **Text encoders** | DualCLIP (clip_l + llava_llama3) |
| **VAE** | Video encoder/decoder with temporal compression |
| **Scheduler** | Flow matching for progressive denoising |
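
To make the VAE row concrete, the sketch below computes the latent tensor shape the transformer would work on for a given request. The compression factors (4x in time, 8x in space) and the 16 latent channels are assumptions taken from the published model description, not values stated in this notebook, and the helper names are hypothetical. Under these assumptions only frame counts of the form `4k + 1` map exactly onto latent frames, which would explain why 33 is a natural default.

```python
from typing import Tuple

# Assumed compression factors (not stated in this notebook)
TEMPORAL_FACTOR = 4
SPATIAL_FACTOR = 8
LATENT_CHANNELS = 16


def snap_num_frames(num_frames: int) -> int:
    """Largest frame count <= num_frames of the form TEMPORAL_FACTOR * k + 1."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    return (num_frames - 1) // TEMPORAL_FACTOR * TEMPORAL_FACTOR + 1


def latent_shape(num_frames: int, height: int, width: int) -> Tuple[int, int, int, int]:
    """Return (channels, latent_frames, latent_height, latent_width)."""
    latent_frames = (num_frames - 1) // TEMPORAL_FACTOR + 1
    return (LATENT_CHANNELS, latent_frames, height // SPATIAL_FACTOR, width // SPATIAL_FACTOR)


print(latent_shape(33, 720, 1280))  # notebook defaults -> (16, 9, 90, 160)
print(snap_num_frames(24))          # -> 21
```

If the assumption holds, a request for 24 frames would cover only 21 decoded frames, so it is worth checking `len(frames)` after generation rather than relying on `num_frames`.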

### Advantages over AnimateDiff

| Aspect | AnimateDiff (01-5) | HunyuanVideo |
|--------|-------------------|---------------|
| Architecture | SD 1.5 + motion module | Native 3D Transformer |
| Resolution | 512x512 max | Up to 720p |
| Temporal coherence | Medium | High |
| Video duration | 2-3 seconds | 5+ seconds |
| VRAM | ~12 GB | ~18 GB (INT8) |

In [None]:
# Unified generation function (API or local)
def generate_hunyuan_video(prompt: str, negative_prompt: str = "", seed: int = 42) -> Dict[str, Any]:
    """
    Generate a video with HunyuanVideo (ComfyUI API or local diffusers).

    The function follows the execution mode selected by `use_api`.

    Args:
        prompt: Text description of the video
        negative_prompt: Elements to avoid
        seed: Random seed for reproducibility

    Returns:
        Dict with a success flag, the generation time and metadata; local mode
        also returns the frames, API mode returns the raw API result
    """
    if use_api:
        # === COMFYUI API MODE ===
        if not comfyui_available:
            return {"success": False, "error": "ComfyUI API not available"}
        
        try:
            start_time = time.time()
            
            result = client.generate_text2video_hunyuan(
                prompt=prompt,
                width=width,
                height=height,
                num_frames=num_frames,
                steps=num_inference_steps,
                seed=seed,
                cfg=guidance_scale,
                negative_prompt=negative_prompt or "bad quality, low quality, blurry, distortion",
                save_prefix=f"hunyuan_gen_{seed}",
                timeout=600
            )
            
            gen_time = time.time() - start_time
            
            return {
                "success": True,
                "result": result,
                "generation_time": gen_time,
                "mode": "API ComfyUI",
                "seed": seed
            }
            
        except Exception as e:
            return {"success": False, "error": f"{type(e).__name__}: {str(e)[:200]}"}
    
    else:
        # === LOCAL DIFFUSERS MODE ===
        if not local_available:
            return {"success": False, "error": "Local pipeline not available"}
        
        try:
            import torch
            from diffusers.utils import export_to_video
            
            generator = torch.Generator(device=device).manual_seed(seed)
            
            if device == "cuda":
                torch.cuda.reset_peak_memory_stats()
            
            start_time = time.time()
            
            output = pipe(
                prompt=prompt,
                negative_prompt=negative_prompt or "bad quality, low quality, blurry, distortion, artifacts",
                num_frames=num_frames,
                guidance_scale=guidance_scale,
                num_inference_steps=num_inference_steps,
                height=height,
                width=width,
                generator=generator
            )
            
            gen_time = time.time() - start_time
            frames = output.frames[0]
            
            # Save as MP4
            mp4_path = OUTPUT_DIR / f"hunyuan_local_{seed}.mp4"
            export_to_video(frames, str(mp4_path), fps=fps_output)
            
            result_dict = {
                "success": True,
                "frames": frames,
                "generation_time": gen_time,
                "time_per_frame": gen_time / num_frames,
                "prompt": prompt,
                "seed": seed,
                "mode": "Local diffusers",
                "mp4_path": str(mp4_path)
            }
            
            if device == "cuda":
                result_dict["vram_peak"] = torch.cuda.max_memory_allocated(0) / 1024**3
            
            return result_dict
            
        except Exception as e:
            return {"success": False, "error": f"{type(e).__name__}: {str(e)[:200]}"}

print("✅ Unified generation function loaded")

In [None]:
# Text-to-video generation
print("\n--- TEXT-TO-VIDEO GENERATION ---")
print("=" * 40)

# First test: cinematic prompt
prompt_1 = "a majestic eagle soaring over snow-capped mountains at golden hour, cinematic aerial shot, smooth camera movement, volumetric clouds"

if run_generation:
    print(f"Prompt: {prompt_1}")
    print(f"Parameters: {num_frames} frames, {num_inference_steps} steps, CFG={guidance_scale}")
    print(f"Resolution: {width}x{height}")
    print(f"Mode: {'ComfyUI API' if use_api else 'Local diffusers'}")
    print("\nGenerating...")

    result_1 = generate_hunyuan_video(prompt_1, seed=42)

    if result_1['success']:
        print(f"\n✅ Generation finished in {result_1['generation_time']:.1f}s ({result_1['mode']})")
        if 'vram_peak' in result_1:
            print(f"   Peak VRAM: {result_1['vram_peak']:.1f} GB")
        if 'time_per_frame' in result_1:
            print(f"   Time/frame: {result_1['time_per_frame']:.2f}s")

        # Display the frames when they are available (local mode)
        if 'frames' in result_1:
            frames = result_1['frames']
            print(f"   Frames: {len(frames)}")

            # Show up to 8 evenly spaced frames on a 2x4 grid
            n_display = min(8, len(frames))
            indices = np.linspace(0, len(frames) - 1, n_display, dtype=int)
            fig, axes = plt.subplots(2, 4, figsize=(16, 8))
            axes_flat = axes.flatten()
            for i, idx in enumerate(indices):
                if i < len(axes_flat):
                    axes_flat[i].imshow(frames[idx])
                    axes_flat[i].set_title(f"Frame {idx + 1}/{len(frames)}", fontsize=9)
                    axes_flat[i].axis('off')
            # Hide any unused grid cells
            for i in range(len(indices), len(axes_flat)):
                axes_flat[i].axis('off')
            plt.suptitle(f"HunyuanVideo: {prompt_1[:60]}...", fontsize=11, fontweight='bold')
            plt.tight_layout()
            plt.show()

        # MP4 report (local mode only)
        if save_as_mp4 and 'mp4_path' in result_1:
            mp4_path = Path(result_1['mp4_path'])
            if mp4_path.exists():
                mp4_size_kb = mp4_path.stat().st_size / 1024
                print(f"   MP4 saved: {mp4_path.name} ({mp4_size_kb:.1f} KB)")
    else:
        print(f"❌ Error: {result_1['error']}")
else:
    print("Generation disabled")
    print("\nExample code to generate:")
    print(f"  result = generate_hunyuan_video('{prompt_1[:50]}...', seed=42)")

### Interpretation: first generation

### LEARNING MODE (GPU not available)

On a GPU environment (RTX 3090, 24 GB VRAM), this code would produce:

| Parameter | Typical value | Meaning |
|--------|---------------|---------------|
| **Total time** | 60-180s (RTX 3090) | Significantly longer than AnimateDiff |
| **Peak VRAM** | 16-22 GB | INT8 quantization keeps VRAM under 24 GB |
| **Quality** | High | Better temporal coherence than AnimateDiff |
| **Resolution** | 512x320 | Quality/memory trade-off |
| **Frames** | 24 | ~3 seconds at 8 fps (1 second at the notebook's default 24 fps) |

**Expected visual result:**

For the prompt "a majestic eagle soaring over snow-capped mountains at golden hour", HunyuanVideo would generate:

- **Eagle**: Spread wings, detailed plumage, natural flight motion
- **Mountains**: Snow-capped peaks, dramatic shadows, depth of field
- **Sky**: Volumetric clouds, golden light, cinematic atmosphere
- **Camera**: Smooth tracking movement following the eagle, wide framing

**Comparison with AnimateDiff (01-5):**

| Aspect | AnimateDiff | HunyuanVideo | Advantage |
|--------|-------------|--------------|----------|
| **Architecture** | SD 1.5 + motion module | Native 3D Transformer | Hunyuan |
| **Max resolution** | 512x512 | 720p | Hunyuan |
| **Temporal coherence** | Medium | High | Hunyuan |
| **Video duration** | 2-3s | 5+s | Hunyuan |
| **VRAM** | ~12 GB | ~18 GB (INT8) | AnimateDiff |
| **Speed** | Fast | Slow | AnimateDiff |

**Code to reproduce:**

```python
import torch
from diffusers import BitsAndBytesConfig, HunyuanVideoPipeline, HunyuanVideoTransformer3DModel

# INT8 quantization is configured on the transformer, which is then passed to the pipeline
quant_config = BitsAndBytesConfig(load_in_8bit=True)
transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    "tencent/HunyuanVideo",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16
)

pipe = HunyuanVideoPipeline.from_pretrained(
    "tencent/HunyuanVideo",
    transformer=transformer,
    torch_dtype=torch.float16
)

# Memory optimizations
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()
# CPU offload manages device placement, so no explicit .to("cuda") is needed
pipe.enable_model_cpu_offload()

# Generation
prompt = "a majestic eagle soaring over snow-capped mountains at golden hour, cinematic aerial shot"
output = pipe(
    prompt=prompt,
    negative_prompt="low quality, blurry",
    num_frames=24,
    guidance_scale=6.0,
    num_inference_steps=30,
    height=320,
    width=512,
    generator=torch.Generator("cuda").manual_seed(42)
)

frames = output.frames[0]
```

In [7]:
# Parameter exploration
if run_generation and pipe is not None:
    print("\n--- PARAMETER EXPLORATION ---")
    print("=" * 45)

    # Test several guidance_scale values
    test_prompt = "a serene waterfall in a lush forest, sunlight filtering through trees, mist rising"

    cfg_values = [3.0, 6.0, 9.0]
    cfg_results = []

    print(f"guidance_scale test: {cfg_values}")
    print(f"Prompt: {test_prompt[:60]}...")

    for cfg_val in cfg_values:
        print(f"\n  CFG = {cfg_val}...")

        # generate_hunyuan_video reads these globals, so save and override them temporarily
        original_cfg = guidance_scale
        original_steps = num_inference_steps
        guidance_scale = cfg_val
        num_inference_steps = 20  # Reduced to speed up the sweep

        result = generate_hunyuan_video(test_prompt, seed=42)

        # Restore the notebook defaults
        guidance_scale = original_cfg
        num_inference_steps = original_steps

        if result['success']:
            cfg_results.append({
                "cfg": cfg_val,
                "frames": result['frames'],
                "time": result['generation_time'],
                "vram_peak": result.get('vram_peak', 0)
            })
            print(f"    Time: {result['generation_time']:.1f}s")
        else:
            print(f"    Error: {result['error']}")

    # Side-by-side display
    if cfg_results:
        n_cfgs = len(cfg_results)
        n_preview = 4
        fig, axes = plt.subplots(n_cfgs, n_preview, figsize=(3.5 * n_preview, 3 * n_cfgs))
        if n_cfgs == 1:
            axes = [axes]

        for v_idx, cr in enumerate(cfg_results):
            frame_indices = np.linspace(0, len(cr['frames']) - 1, n_preview, dtype=int)
            for f_idx, fi in enumerate(frame_indices):
                axes[v_idx][f_idx].imshow(cr['frames'][fi])
                axes[v_idx][f_idx].axis('off')
                if f_idx == 0:
                    axes[v_idx][f_idx].set_ylabel(f"CFG={cr['cfg']}", fontsize=11, fontweight='bold')

        plt.suptitle("Impact of guidance_scale on generation", fontsize=13, fontweight='bold')
        plt.tight_layout()
        plt.show()

        # Summary table
        print("\nguidance_scale summary:")
        print(f"{'CFG':<10} {'Time (s)':<12} {'Peak VRAM (GB)':<15}")
        print("-" * 37)
        for cr in cfg_results:
            print(f"  {cr['cfg']:<10} {cr['time']:<12.1f} {cr['vram_peak']:<15.1f}")
else:
    print("Parameter exploration: generation disabled")
    print("\nParameter guide:")
    print("  CFG 3-4 : Creative, more freedom")
    print("  CFG 5-7 : Balanced (recommended)")
    print("  CFG 8-10 : Strict, may introduce artifacts")

Parameter exploration: generation disabled

Parameter guide:
  CFG 3-4 : Creative, more freedom
  CFG 5-7 : Balanced (recommended)
  CFG 8-10 : Strict, may introduce artifacts


### Interpretation: impact of the parameters

| guidance_scale | Behavior | Recommendation |
|---------------|-------------|----------------|
| 3.0 (low) | Creative, more variation, sometimes off-topic | Creative exploration |
| 6.0 (medium) | Good balance between fidelity and creativity | General use |
| 9.0 (high) | Very faithful to the prompt, risk of artifacts | Precise prompts |

**Key points**:
1. Unlike image generation with Stable Diffusion, a CFG value that is too high also degrades temporal coherence
2. For HunyuanVideo, the 5.0-7.0 range generally gives the best results
3. Generation time varies little with CFG (same number of steps)
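
The behavior in the table follows from how classifier-free guidance combines two predictions. The sketch below shows the standard formula on toy vectors: the guided prediction starts from the unconditional one and moves along the direction of the conditional one, scaled by `guidance_scale`. It illustrates the general technique only; a given pipeline may implement guidance differently (for example through a distilled guidance embedding), and `apply_cfg` is a hypothetical name.

```python
from typing import List


def apply_cfg(uncond: List[float], cond: List[float], guidance_scale: float) -> List[float]:
    """Combine unconditional and conditional predictions: u + s * (c - u)."""
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0, -1.0]   # toy prediction without the prompt
cond = [1.0, 1.0, 0.0]      # toy prediction with the prompt

for scale in (1.0, 3.0, 6.0, 9.0):
    print(scale, apply_cfg(uncond, cond, scale))
```

A scale of 1.0 returns the conditional prediction unchanged, while larger values extrapolate past it, which is one way to read the "very faithful, risk of artifacts" row. The number of arithmetic operations does not depend on the scale, consistent with point 3.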

## Section 4: Resolution and duration

This section explores the trade-offs between resolution, number of frames, and memory consumption.
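
Duration is simply the frame count divided by the playback rate, which is worth computing before choosing `num_frames`. A minimal sketch with hypothetical helper names:

```python
import math


def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Playback duration of a clip in seconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


def frames_for_duration(seconds: float, fps: float) -> int:
    """Smallest frame count that lasts at least `seconds` at `fps`."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return math.ceil(seconds * fps)


print(clip_duration_seconds(33, 24))  # notebook defaults -> 1.375
print(frames_for_duration(5, 24))     # -> 120
```

With the notebook defaults (33 frames, `fps_output = 24`) a clip lasts about 1.4 seconds; a 5-second clip at the same rate needs 120 frames, with a matching increase in memory and time.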

In [8]:
# Resolution and duration test
if run_generation and pipe is not None:
    print("\n--- RESOLUTION AND DURATION ---")
    print("=" * 40)

    resolution_prompt = "a golden retriever running through a field of sunflowers, joyful, sunny day, slow motion"

    # Configurations to test (resolution, frames)
    configs = [
        {"w": 384, "h": 256, "frames": 24, "label": "384x256 / 24f"},
        {"w": 512, "h": 320, "frames": 16, "label": "512x320 / 16f"},
        {"w": 512, "h": 320, "frames": 32, "label": "512x320 / 32f"},
    ]

    config_results = []

    for cfg in configs:
        print(f"\nTest: {cfg['label']}")

        # Per-run values are passed directly to the pipeline; the notebook globals stay untouched
        gen_width = cfg['w']
        gen_height = cfg['h']
        gen_frames = cfg['frames']

        try:
            if device == "cuda":
                torch.cuda.reset_peak_memory_stats()

            generator = torch.Generator(device=device).manual_seed(42)
            start_time = time.time()

            output = pipe(
                prompt=resolution_prompt,
                negative_prompt="low quality, blurry, distorted",
                num_frames=gen_frames,
                guidance_scale=6.0,
                num_inference_steps=20,
                height=gen_height,
                width=gen_width,
                generator=generator
            )

            gen_time = time.time() - start_time
            frames = output.frames[0]

            vram_peak = 0
            if device == "cuda":
                vram_peak = torch.cuda.max_memory_allocated(0) / 1024**3

            config_results.append({
                "label": cfg['label'],
                "frames": frames,
                "time": gen_time,
                "vram_peak": vram_peak,
                "n_frames": gen_frames,
                "resolution": f"{gen_width}x{gen_height}"
            })

            print(f"  Time: {gen_time:.1f}s, peak VRAM: {vram_peak:.1f} GB")

            # Save as MP4
            if save_as_mp4:
                mp4_path = OUTPUT_DIR / f"hunyuan_{cfg['label'].replace(' / ', '_').replace('x', '_')}.mp4"
                export_to_video(frames, str(mp4_path), fps=fps_output)

        except Exception as e:
            print(f"  Error: {type(e).__name__}: {str(e)[:100]}")

    # Summary table
    if config_results:
        print(f"\n{'Configuration':<25} {'Time (s)':<12} {'VRAM (GB)':<12} {'Video duration':<15}")
        print("-" * 64)
        for cr in config_results:
            duration = cr['n_frames'] / fps_output
            print(f"  {cr['label']:<25} {cr['time']:<12.1f} {cr['vram_peak']:<12.1f} {duration:.1f}s")
else:
    print("Resolution/duration test: generation disabled")
    print("\nResolution/VRAM guide:")
    print("  384x256 : ~14 GB, fast, low quality")
    print("  512x320 : ~18 GB, good trade-off (recommended)")
    print("  640x480 : ~22 GB, high quality, slow")
    print("  720p    : ~28 GB+, requires more aggressive quantization")

Resolution/duration test: generation disabled

Resolution/VRAM guide:
  384x256 : ~14 GB, fast, low quality
  512x320 : ~18 GB, good trade-off (recommended)
  640x480 : ~22 GB, high quality, slow
  720p    : ~28 GB+, requires more aggressive quantization


### Interpretation: resolution and duration

### LEARNING MODE (GPU not available)

On a GPU environment (RTX 3090, 24 GB VRAM), this code would produce:

| Parameter | Value |
|-----------|--------|
| **Device** | cuda (RTX 3090/4090) |
| **VRAM used** | ~14-22 GB (depending on the configuration) |
| **Time per generation** | 20-60 seconds |
| **Configurations tested** | 3 resolution/duration pairs |

**Expected result:**
HunyuanVideo would generate 3 videos of the golden retriever with different configurations:

| Configuration | VRAM | Relative time | Video duration | Quality |
|--------------|------|---------------|-------------|---------|
| **384x256 / 24f** | ~14 GB | 1x (20s) | 3s @ 8fps | Basic, fast |
| **512x320 / 16f** | ~16 GB | 1.2x (24s) | 2s @ 8fps | Good trade-off |
| **512x320 / 32f** | ~20 GB | 2x (40s) | 4s @ 8fps | High quality |

Durations assume playback at 8 fps; at the notebook's default `fps_output = 24`, the same clips last about 1.0 s, 0.7 s and 1.3 s.

**Visual description:**

- **384x256 / 24f**: Running dog visible, simplified textures, smooth motion but low resolution
- **512x320 / 16f**: Better resolution, more detailed fur, shorter duration
- **512x320 / 32f**: Best overall quality, longer duration, excellent temporal coherence

**Trade-off analysis:**

| Aspect | Increases with... | Impact |
|--------|-----------------|--------|
| **VRAM** | Resolution (W x H) | More pixels = more memory |
| **VRAM** | Number of frames | Linear in the duration |
| **Time** | Frames + steps | Proportional |
| **Quality** | Resolution | Visual detail |
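
The first two rows of the trade-off table can be turned into a rough back-of-the-envelope comparison: the number of pixel-frames (`width x height x frames`) each configuration asks the model to produce, relative to the smallest one. This is only a proxy under the stated proportionality, not a VRAM or timing model, and the helper names are hypothetical.

```python
from typing import Dict, List

configs = [
    {"w": 384, "h": 256, "frames": 24},
    {"w": 512, "h": 320, "frames": 16},
    {"w": 512, "h": 320, "frames": 32},
]


def pixel_volume(cfg: Dict[str, int]) -> int:
    """Total number of pixels across all frames of one clip."""
    return cfg["w"] * cfg["h"] * cfg["frames"]


def relative_volumes(cfgs: List[Dict[str, int]]) -> List[float]:
    """Pixel volume of each configuration relative to the first one."""
    baseline = pixel_volume(cfgs[0])
    return [round(pixel_volume(c) / baseline, 2) for c in cfgs]


print(relative_volumes(configs))  # -> [1.0, 1.11, 2.22]
```

The ratios land close to the relative times quoted in the expected-result table (1x, 1.2x, 2x), which suggests that pixel volume is a reasonable first estimate when planning a new configuration.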

**Code to reproduce:**

```python
import torch
from diffusers import HunyuanVideoPipeline

pipe = HunyuanVideoPipeline.from_pretrained(
    "tencent/HunyuanVideo",
    torch_dtype=torch.float16
).to("cuda")
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()

configs = [
    {"w": 384, "h": 256, "frames": 24},
    {"w": 512, "h": 320, "frames": 16},
    {"w": 512, "h": 320, "frames": 32},
]

prompt = "a golden retriever running through a field of sunflowers"

for cfg in configs:
    output = pipe(
        prompt=prompt,
        negative_prompt="low quality",
        num_frames=cfg['frames'],
        guidance_scale=6.0,
        num_inference_steps=20,
        height=cfg['h'],
        width=cfg['w'],
        generator=torch.Generator("cuda").manual_seed(42)
    )
```
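
The parameter sweeps in this notebook (guidance scale earlier, resolution and frame count here) can also be expressed as immutable configuration objects instead of loose variables or temporarily reassigned globals. The sketch below is one possible shape for that; `GenerationConfig` is a hypothetical class whose defaults mirror the notebook's parameter cell, and its field names match the keyword arguments already passed to `pipe(...)` above.

```python
from dataclasses import asdict, dataclass, replace
from typing import Any, Dict


@dataclass(frozen=True)
class GenerationConfig:
    """Hypothetical container for the notebook's generation parameters."""
    num_frames: int = 33
    guidance_scale: float = 7.0
    num_inference_steps: int = 30
    height: int = 720
    width: int = 1280

    def pipeline_kwargs(self) -> Dict[str, Any]:
        """Keyword arguments in the form used by the pipe(...) calls in this notebook."""
        return asdict(self)


base = GenerationConfig()

# Each variant is a new object; the base configuration is never mutated
cfg_sweep = [replace(base, guidance_scale=s, num_inference_steps=20) for s in (3.0, 6.0, 9.0)]
size_sweep = [
    replace(base, width=w, height=h, num_frames=f)
    for (w, h, f) in ((384, 256, 24), (512, 320, 16), (512, 320, 32))
]

for cfg in cfg_sweep + size_sweep:
    print(cfg.pipeline_kwargs())
```

A call would then look like `pipe(prompt=prompt, generator=generator, **cfg.pipeline_kwargs())`, and a failed run cannot leave the defaults in a modified state.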

In [9]:
# Prompt comparison and quality analysis
if run_generation and pipe is not None:
    print("\n--- PROMPT COMPARISON ---")
    print("=" * 40)

    prompts = [
        {
            "text": "a candle flame flickering gently in a dark room, warm light, intimate atmosphere, close-up",
            "label": "Candle"
        },
        {
            "text": "ocean waves rolling onto a sandy beach at sunset, aerial view, golden hour lighting",
            "label": "Ocean"
        },
        {
            "text": "a timelapse of clouds moving over a mountain landscape, dramatic sky, epic scale",
            "label": "Timelapse"
        }
    ]

    comparison_results = []

    for p_idx, prompt_info in enumerate(prompts):
        print(f"\nGeneration {p_idx + 1}/{len(prompts)}: {prompt_info['label']}")
        print(f"  Prompt: {prompt_info['text'][:70]}...")

        result = generate_hunyuan_video(prompt_info['text'], seed=42 + p_idx)

        if result['success']:
            print(f"  Time: {result['generation_time']:.1f}s")
            comparison_results.append({
                "label": prompt_info['label'],
                "prompt": prompt_info['text'],
                "frames": result['frames'],
                "time": result['generation_time']
            })

            if save_as_mp4:
                mp4_path = OUTPUT_DIR / f"hunyuan_{prompt_info['label'].lower()}.mp4"
                export_to_video(result['frames'], str(mp4_path), fps=fps_output)
        else:
            print(f"  Error: {result['error']}")

    # Side-by-side display
    if comparison_results:
        n_videos = len(comparison_results)
        n_preview = 4
        fig, axes = plt.subplots(n_videos, n_preview, figsize=(3.5 * n_preview, 3 * n_videos))
        if n_videos == 1:
            axes = [axes]

        for v_idx, cr in enumerate(comparison_results):
            frame_indices = np.linspace(0, len(cr['frames']) - 1, n_preview, dtype=int)
            for f_idx, fi in enumerate(frame_indices):
                axes[v_idx][f_idx].imshow(cr['frames'][fi])
                axes[v_idx][f_idx].axis('off')
                if f_idx == 0:
                    axes[v_idx][f_idx].set_ylabel(cr['label'], fontsize=11, fontweight='bold')

        plt.suptitle("Prompt comparison - HunyuanVideo", fontsize=13, fontweight='bold')
        plt.tight_layout()
        plt.show()

        # Temporal coherence analysis (difference between consecutive frames)
        print("\nTemporal coherence analysis:")
        print(f"{'Prompt':<15} {'Time (s)':<12} {'Mean frame diff':<18} {'Stability':<15}")
        print("-" * 60)
        for cr in comparison_results:
            # Mean absolute pixel difference between consecutive frames
            diffs = []
            for i in range(len(cr['frames']) - 1):
                f1 = np.array(cr['frames'][i]).astype(float)
                f2 = np.array(cr['frames'][i + 1]).astype(float)
                diff = np.mean(np.abs(f1 - f2))
                diffs.append(diff)
            avg_diff = np.mean(diffs)
            stability = "High" if avg_diff < 15 else "Medium" if avg_diff < 30 else "Low"
            print(f"  {cr['label']:<15} {cr['time']:<12.1f} {avg_diff:<18.2f} {stability:<15}")
else:
    print("Prompt comparison: generation disabled")
    print("\nPrompt types that work well with HunyuanVideo:")
    print("  - Natural motion: water, fire, clouds, wind")
    print("  - Cinematic scenes: aerial camera, slow motion")
    print("  - Timelapse: clouds, sunset, flowers")
    print("  - Animals in motion: a bird in flight, a running dog")

Prompt comparison: generation disabled

Prompt types that work well with HunyuanVideo:
  - Natural motion: water, fire, clouds, wind
  - Cinematic scenes: aerial camera, slow motion
  - Timelapse: clouds, sunset, flowers
  - Animals in motion: a bird in flight, a running dog
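
The temporal coherence analysis in the cell above can also be written as a small reusable function. The following is a minimal sketch, not part of the original notebook: the name `temporal_stability` is hypothetical, and the 15 / 30 thresholds are simply the heuristic values used in the comparison cell, assuming pixel values on a 0-255 scale.

```python
import numpy as np

def temporal_stability(frames, high=15.0, medium=30.0):
    """Return (mean absolute diff between consecutive frames, coarse label).

    `frames` is a sequence of H x W x C arrays or PIL images with 0-255 values.
    `high` and `medium` are heuristic thresholds, not calibrated metrics.
    """
    if len(frames) < 2:
        # No consecutive pair to compare
        return 0.0, "n/a"
    # Convert to float before subtracting so uint8 values cannot wrap around
    stack = np.stack([np.asarray(f, dtype=np.float32) for f in frames])
    # np.diff along axis 0 yields frame[i + 1] - frame[i] for every pair at once
    avg_diff = float(np.abs(np.diff(stack, axis=0)).mean())
    if avg_diff < high:
        label = "High"
    elif avg_diff < medium:
        label = "Medium"
    else:
        label = "Low"
    return avg_diff, label

# Usage: a perfectly static clip scores 0.0 and is labelled "High"
static_clip = [np.full((4, 4, 3), 100, dtype=np.uint8)] * 5
print(temporal_stability(static_clip))
```

A low score only says that consecutive frames are similar. A nearly static video also scores as "High", so the number is best read alongside the frame previews rather than on its own.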


In [10]:
# Interactive mode
if notebook_mode == "interactive" and not skip_widgets:
    print("\n--- INTERACTIVE MODE ---")
    print("=" * 40)
    print("Enter your own prompt to generate a HunyuanVideo video.")
    print("(Leave empty to skip)")
    
    try:
        user_prompt = input("\nYour prompt: ").strip()
        
        if user_prompt and run_generation and pipe is not None:
            print("\nGenerating...")
            result_user = generate_hunyuan_video(user_prompt, seed=123)
            
            if result_user['success']:
                print(f"Generation succeeded in {result_user['generation_time']:.1f}s")
                
                # Display up to 8 evenly spaced frames
                n_display = min(8, len(result_user['frames']))
                fig, axes = plt.subplots(1, n_display, figsize=(2.5 * n_display, 3))
                if n_display == 1:
                    axes = [axes]
                indices = np.linspace(0, len(result_user['frames']) - 1, n_display, dtype=int)
                for ax, idx in zip(axes, indices):
                    ax.imshow(result_user['frames'][idx])
                    ax.set_title(f"Frame {idx+1}", fontsize=8)
                    ax.axis('off')
                # Append an ellipsis only when the prompt is actually truncated
                title = user_prompt if len(user_prompt) <= 50 else user_prompt[:50] + "..."
                plt.suptitle(f"Your video: {title}", fontweight='bold')
                plt.tight_layout()
                plt.show()
                
                if save_as_mp4:
                    user_mp4 = OUTPUT_DIR / "user_generation.mp4"
                    export_to_video(result_user['frames'], str(user_mp4), fps=fps_output)
                    print(f"MP4 saved: {user_mp4.name}")
            else:
                print(f"Error: {result_user['error']}")
        elif user_prompt:
            print("Generation unavailable (pipeline not loaded)")
        else:
            print("Interactive mode skipped")
    
    except (KeyboardInterrupt, EOFError) as e:
        print(f"\nInteractive mode interrupted ({type(e).__name__})")
    except Exception as e:
        # ipykernel raises StdinNotImplementedError when stdin is unavailable (e.g. Papermill)
        error_type = type(e).__name__
        if "StdinNotImplemented" in error_type or "input" in str(e).lower():
            print("\nInteractive mode unavailable (automated execution)")
        else:
            print(f"\nUnexpected error: {error_type} - {str(e)[:100]}")
            print("Continuing with the rest of the notebook")
else:
    print("\nBatch mode - interactive interface disabled")


Batch mode - interactive interface disabled


## HunyuanVideo best practices and optimization

### Prompt engineering tips

| Good prompt | Bad prompt | Why |
|-------------|------------|-----|
| "a bird flying over a lake, aerial shot, cinematic" | "bird lake" | Specifies the action and the style |
| "timelapse of sunset, clouds moving, warm colors" | "nice sunset video" | States the type of motion |
| "close-up of rain drops on a window" | "rain" | The framing guides the generation |
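
The pattern in this table (subject and action first, then framing, then style) can be made systematic with a small helper. This is an illustrative sketch only: `build_video_prompt` and its parameters are hypothetical names, not part of the notebook or of any HunyuanVideo API.

```python
def build_video_prompt(subject, action, framing=None, style=None):
    """Compose a prompt as '<subject> <action>, <framing>, <style>'.

    Hypothetical helper: it only joins strings and does not call any model.
    Optional parts that are None or empty are skipped.
    """
    parts = [f"{subject} {action}".strip()]
    parts += [p for p in (framing, style) if p]
    return ", ".join(parts)

# Rebuilds the first "good prompt" from the table above
print(build_video_prompt("a bird", "flying over a lake",
                         framing="aerial shot", style="cinematic"))
# a bird flying over a lake, aerial shot, cinematic
```

Keeping the parts separate makes it easy to vary one element at a time, for example the framing, when comparing generations with a fixed seed.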

### Comparison with the other Module 02 models

| Aspect | HunyuanVideo | LTX-Video (02-2) | Wan (02-3) | SVD (02-4) |
|--------|--------------|------------------|------------|------------|
| Type | Text-to-video | Text/Img/Vid | Text-to-video | Image-to-video |
| VRAM | ~18 GB | ~8 GB | ~10 GB | ~10 GB |
| Quality | High | Medium | High | High |
| Speed | Slow | Fast | Medium | Medium |
| Max resolution | 720p | 512p | 720p | 576p |
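
As a rough aid for reading this table, the sketch below filters the models by available VRAM. It is an assumption-laden illustration: `MODEL_VRAM_GB` copies the approximate figures from the table, and `models_that_fit` and `margin_gb` are hypothetical names introduced here.

```python
# Approximate VRAM needs in GB, copied from the comparison table above
MODEL_VRAM_GB = {
    "HunyuanVideo": 18,
    "LTX-Video": 8,
    "Wan": 10,
    "SVD": 10,
}

def models_that_fit(available_gb, margin_gb=0.0):
    """List the models whose approximate VRAM need fits in `available_gb`.

    `margin_gb` is an optional safety margin (hypothetical parameter) to
    leave headroom for the rest of the process.
    """
    return [name for name, need in MODEL_VRAM_GB.items()
            if need + margin_gb <= available_gb]

print(models_that_fit(12))
# ['LTX-Video', 'Wan', 'SVD']
```

The result is only a memory filter. SVD is image-to-video, so it is not a drop-in replacement for a text-to-video model even when it fits.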

In [11]:
# Session statistics and next steps
print("\n--- SESSION STATISTICS ---")
print("=" * 40)

print(f"Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Mode: {notebook_mode}")
print(f"Model: {model_id}")
print(f"Quantization: {'INT8' if quantize else 'FP16'}")
print(f"Device: {device}")
print(f"Parameters: {num_frames} frames, {num_inference_steps} steps, CFG={guidance_scale}")
print(f"Resolution: {width}x{height}")

if device == "cuda" and torch.cuda.is_available():
    # Peak memory allocated by PyTorch tensors on GPU 0 during this session
    vram_peak = torch.cuda.max_memory_allocated(0) / 1024**3
    print(f"Session peak VRAM: {vram_peak:.1f} GB")

if save_results and OUTPUT_DIR.exists():
    generated_files = list(OUTPUT_DIR.glob('*'))
    print(f"\nGenerated files ({len(generated_files)}):")
    for f in sorted(generated_files):
        size_kb = f.stat().st_size / 1024
        print(f"  {f.name} ({size_kb:.1f} KB)")

# Release VRAM. Rebinding to None (rather than `del pipe`) keeps the name defined,
# so this cell can be re-run without raising a NameError.
if pipe is not None:
    pipe = None
    if device == "cuda":
        torch.cuda.empty_cache()
        print("\nVRAM released")

print("\n--- NEXT STEPS ---")
print("1. Notebook 02-2: LTX-Video (fast, lightweight generation, ~8 GB VRAM)")
print("2. Notebook 02-3: Wan 2.1/2.2 (multilingual prompts, motion control)")
print("3. Notebook 02-4: SVD (animating static images)")
print("4. Module 03: Multi-model comparison and pipeline orchestration")

print(f"\nNotebook 02-1 HunyuanVideo Generation complete - {datetime.now().strftime('%H:%M:%S')}")


--- SESSION STATISTICS ---
Date: 2026-02-19 10:29:32
Mode: batch
Model: tencent/HunyuanVideo
Quantization: FP16
Device: cpu
Parameters: 24 frames, 30 steps, CFG=6.0
Resolution: 512x320

Generated files (0):

--- NEXT STEPS ---
1. Notebook 02-2: LTX-Video (fast, lightweight generation, ~8 GB VRAM)
2. Notebook 02-3: Wan 2.1/2.2 (multilingual prompts, motion control)
3. Notebook 02-4: SVD (animating static images)
4. Module 03: Multi-model comparison and pipeline orchestration

Notebook 02-1 HunyuanVideo Generation complete - 10:29:32
