# Music Composition Workflow

**Module:** 04-Audio-Applications  
**Level:** Applications  
**Technologies:** MusicGen, Demucs, pydub, scipy, numpy  
**Estimated VRAM:** ~14 GB (for `musicgen-large`; the default `musicgen-small` needs ~4 GB)  
**Estimated duration:** 60 minutes  

## Learning Objectives

- [ ] Generate music from text descriptions with MusicGen
- [ ] Separate stems (vocals, drums, bass, other) with Demucs
- [ ] Remix the stems (adjust volumes, swap elements)
- [ ] Apply audio effects (reverb, EQ, compression)
- [ ] Create multi-section compositions (intro, verse, chorus)
- [ ] Export a final production with metadata

## Prerequisites

- Foundation notebook (01-3 Basic Audio Operations) and Advanced notebooks (02-3 MusicGen, 02-4 Demucs) completed
- GPU with at least 14 GB of VRAM (RTX 3090 recommended)
- audiocraft, demucs, pydub, and soundfile installed

**Navigation:** [Index](../README.md) | [<< Previous](04-2-Transcription-Pipeline.ipynb) | [Next >>](04-4-Audio-Video-Sync.ipynb)

In [1]:
# Parametres Papermill - JAMAIS modifier ce commentaire

# Notebook configuration
notebook_mode = "interactive"        # "interactive" or "batch"
skip_widgets = False               # True for MCP batch mode
debug_level = "INFO"

# MusicGen parameters
musicgen_model = "facebook/musicgen-small"  # small, medium, large
generation_duration = 8            # Generation duration in seconds
musicgen_device = "cuda"           # "cuda" or "cpu"

# Demucs parameters
demucs_model = "htdemucs"          # htdemucs, htdemucs_ft, mdx_extra

# Pipeline configuration
generate_audio = True
save_audio_files = True
apply_effects = True               # Apply audio effects

In [2]:
# Environment setup and imports
import os
import sys
import json
import time
import struct
from pathlib import Path
from datetime import datetime
from typing import Dict, List, Any, Optional, Tuple
from io import BytesIO
import logging

import numpy as np
from IPython.display import Audio, display, HTML

# Resolve GENAI_ROOT by walking up to the 'GenAI' directory
GENAI_ROOT = Path.cwd()
while GENAI_ROOT.name != 'GenAI' and len(GENAI_ROOT.parts) > 1:
    GENAI_ROOT = GENAI_ROOT.parent

HELPERS_PATH = GENAI_ROOT / 'shared' / 'helpers'
if HELPERS_PATH.exists():
    sys.path.insert(0, str(HELPERS_PATH.parent))
    try:
        from helpers.audio_helpers import (
            load_audio, save_audio, play_audio,
            plot_waveform, plot_spectrogram
        )
        print("Audio helpers imported")
    except ImportError as e:
        print(f"Audio helpers not available - standalone mode: {e}")

# Output directory
OUTPUT_DIR = GENAI_ROOT / 'outputs' / 'audio' / 'composition'
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
STEMS_DIR = OUTPUT_DIR / 'stems'
STEMS_DIR.mkdir(parents=True, exist_ok=True)

logging.basicConfig(level=getattr(logging, debug_level))
logger = logging.getLogger('music_composition')

print("Music Composition Workflow")
print(f"Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Mode: {notebook_mode}")
print(f"MusicGen: {musicgen_model} | Demucs: {demucs_model}")
print(f"Output: {OUTPUT_DIR}")

Audio helpers imported
Music Composition Workflow
Date: 2026-02-18 10:52:33
Mode: interactive
MusicGen: facebook/musicgen-small | Demucs: htdemucs
Output: D:\Dev\CoursIA.worktrees\GenAI_Series\MyIA.AI.Notebooks\GenAI\outputs\audio\composition


In [3]:
# Load configuration and validate the GPU
from dotenv import load_dotenv

# Look for a .env file in the current directory and up to three parents
current_path = Path.cwd()
found_env = False
for _ in range(4):
    env_path = current_path / '.env'
    if env_path.exists():
        load_dotenv(env_path)
        print(f".env file loaded from: {env_path}")
        found_env = True
        break
    current_path = current_path.parent

if not found_env:
    print("No .env file found")

# GPU check
import torch
gpu_available = torch.cuda.is_available()
if gpu_available:
    gpu_name = torch.cuda.get_device_name(0)
    # The attribute is total_memory (in bytes); total_mem does not exist
    gpu_mem = torch.cuda.get_device_properties(0).total_memory / (1024**3)
    print(f"GPU: {gpu_name} ({gpu_mem:.1f} GB)")
else:
    print("GPU not available - falling back to CPU (slow)")
    musicgen_device = "cpu"

# Dependency check
musicgen_available = False
demucs_available = False

try:
    from audiocraft.models import MusicGen
    musicgen_available = True
    print("audiocraft (MusicGen) available")
except ImportError:
    print("audiocraft not installed - pip install audiocraft")

try:
    import demucs
    demucs_available = True
    print("demucs available")
except ImportError:
    print("demucs not installed - pip install demucs")

from pydub import AudioSegment
from scipy import signal
print("pydub and scipy available")

.env file loaded from: D:\Dev\CoursIA.worktrees\GenAI_Series\MyIA.AI.Notebooks\GenAI\.env
GPU not available - falling back to CPU (slow)
audiocraft not installed - pip install audiocraft
demucs not installed - pip install demucs
pydub and scipy available


## Section 1: Music generation with MusicGen

MusicGen generates music from text descriptions. The checkpoints listed below output 32 kHz mono audio, which is good enough for prototyping but falls short of studio quality.

| Model | Parameters | VRAM | Quality | Max duration |
|-------|-----------|------|---------|--------------|
| musicgen-small | 300M | ~4 GB | Fair | 30 s |
| musicgen-medium | 1.5B | ~8 GB | Good | 30 s |
| musicgen-large | 3.3B | ~14 GB | Excellent | 30 s |

The text prompt steers the style, instrumentation, and mood of the generated music.
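
The table above can be read as a lookup when choosing a checkpoint for a given GPU. The sketch below is only an illustration: `pick_musicgen_model` is a hypothetical helper, not part of audiocraft, and the VRAM figures are the approximate values from the table rather than measured requirements.

```python
from typing import Optional

# Approximate VRAM needs in GB, copied from the table above (rough guides only).
# Ordered from largest to smallest so the first match is the best fit.
MUSICGEN_VRAM_GB = {
    "facebook/musicgen-large": 14.0,
    "facebook/musicgen-medium": 8.0,
    "facebook/musicgen-small": 4.0,
}


def pick_musicgen_model(available_vram_gb: float) -> Optional[str]:
    """Return the largest checkpoint whose approximate VRAM need fits, or None."""
    for name, needed_gb in MUSICGEN_VRAM_GB.items():
        if available_vram_gb >= needed_gb:
            return name
    return None  # Nothing fits: consider running the small model on CPU


print(pick_musicgen_model(24.0))
print(pick_musicgen_model(6.0))
```

The returned name could be assigned to `musicgen_model` in the parameters cell.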

In [4]:
# Music generation with MusicGen
print("MUSIC GENERATION - MUSICGEN")
print("=" * 50)

generated_tracks = {}

# Descriptions for the different sections of a composition
track_descriptions = {
    "intro": "soft ambient piano with gentle strings, calm and peaceful, cinematic intro",
    "verse": "upbeat electronic music with synth pads and light drums, energetic but not aggressive",
    "chorus": "powerful orchestral music with drums and brass, epic and uplifting, full arrangement"
}

if generate_audio and musicgen_available:
    print(f"Loading model {musicgen_model}...")
    start_time = time.time()
    # Pass the device explicitly so the CPU fallback set during the GPU check is honored
    mg_model = MusicGen.get_pretrained(musicgen_model, device=musicgen_device)
    mg_model.set_generation_params(duration=generation_duration)
    load_time = time.time() - start_time
    print(f"Model loaded in {load_time:.1f}s")

    for section_name, description in track_descriptions.items():
        print(f"\n--- Generating: {section_name} ---")
        print(f"Prompt: {description}")

        start_time = time.time()
        wav = mg_model.generate([description])
        gen_time = time.time() - start_time

        # Extract a 1-D numpy array (batch item 0, single channel)
        audio_data = wav[0].cpu().numpy().squeeze()
        sample_rate = mg_model.sample_rate

        generated_tracks[section_name] = {
            "audio": audio_data,
            "sr": sample_rate,
            "duration": len(audio_data) / sample_rate,
            "time": gen_time,
            "description": description
        }

        print(f"Generated in {gen_time:.1f}s | Duration: {len(audio_data)/sample_rate:.1f}s | SR: {sample_rate}Hz")
        display(Audio(data=audio_data, rate=sample_rate, autoplay=False))

        # Save
        if save_audio_files:
            import soundfile as sf
            filepath = OUTPUT_DIR / f"musicgen_{section_name}.wav"
            sf.write(str(filepath), audio_data, sample_rate)
            print(f"Saved: {filepath.name}")

    # Summary
    print("\nGeneration summary:")
    print(f"{'Section':<12} {'Duration (s)':<12} {'Gen time (s)':<15}")
    print("-" * 39)
    for name, data in generated_tracks.items():
        print(f"{name:<12} {data['duration']:<12.1f} {data['time']:<15.1f}")

    # Free GPU memory
    del mg_model
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    print("\nGPU memory released")
else:
    if not musicgen_available:
        print("MusicGen not available - installation required: pip install audiocraft")
    else:
        print("Generation disabled")

MUSIC GENERATION - MUSICGEN
MusicGen not available - installation required: pip install audiocraft


### Interpretation: MusicGen generation

| Aspect | Value | Meaning |
|--------|-------|---------|
| Generation time | ~2-5 s per section (small) | Fast, even for complex music |
| Quality | 32 kHz mono | Sufficient for prototyping, not for a final production |
| Control | Through the text prompt | The style depends heavily on the description |

**Key points**:
1. Detailed prompts (instrumentation, mood, tempo) improve quality
2. Generation is non-deterministic: the model samples its output, so each run produces a different result
3. The `large` model offers significantly higher quality than `small`
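
To illustrate the first key point, a prompt can be assembled from its ingredients so that none is forgotten. This is a minimal sketch: `build_prompt` is a hypothetical helper, and whether the model actually follows a hint such as the tempo is not guaranteed.

```python
from typing import Optional, Sequence


def build_prompt(style: str, instruments: Sequence[str] = (),
                 mood: str = "", tempo_bpm: Optional[int] = None) -> str:
    """Combine style, instrumentation, mood and an optional tempo into one description."""
    head = style
    if instruments:
        head += " with " + " and ".join(instruments)
    parts = [head]
    if mood:
        parts.append(mood)
    if tempo_bpm:
        parts.append(f"{tempo_bpm} bpm")
    return ", ".join(parts)


prompt = build_prompt(
    "upbeat electronic music",
    instruments=["synth pads", "light drums"],
    mood="energetic but not aggressive",
    tempo_bpm=120,
)
print(prompt)
```

The resulting string can be used as a value in `track_descriptions`.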

## Section 2: Stem separation with Demucs

Demucs separates a musical mix into 4 stems:

| Stem | Description | Use in a remix |
|------|-------------|----------------|
| `drums` | Drums, percussion | Adjust the rhythm |
| `bass` | Bass line | Change the foundation |
| `other` | Melody, harmony (synths, guitars) | Melodic elements |
| `vocals` | Vocals (if present) | Isolate or remove the voice |

Separation makes it possible to remix each element independently.
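
Before running the separation, it can help to see which command is launched and where the stems are expected to land. The sketch below assumes the Demucs command-line options `-n` (model name) and `-o` (output directory) and the output layout `<out>/<model>/<track name>/<stem>.wav` that this notebook relies on. `demucs_command` and `expected_stem_paths` are hypothetical helper names, and nothing is executed here.

```python
import sys
from pathlib import Path
from typing import Dict, List

STEM_NAMES = ("drums", "bass", "other", "vocals")


def demucs_command(input_wav: Path, out_dir: Path, model: str = "htdemucs") -> List[str]:
    """Build the argument list for `python -m demucs` using the current interpreter."""
    return [sys.executable, "-m", "demucs", "-n", model, "-o", str(out_dir), str(input_wav)]


def expected_stem_paths(input_wav: Path, out_dir: Path, model: str = "htdemucs") -> Dict[str, Path]:
    """Return the paths where each stem should appear after separation."""
    base = out_dir / model / input_wav.stem
    return {stem: base / f"{stem}.wav" for stem in STEM_NAMES}


cmd = demucs_command(Path("in/song.wav"), Path("out"))
paths = expected_stem_paths(Path("in/song.wav"), Path("out"))
print(" ".join(cmd))
for stem, path in paths.items():
    print(stem, "->", path)
```

Using `sys.executable` instead of a bare `python` keeps the subprocess in the same environment as the notebook kernel.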

In [5]:
# Stem separation with Demucs
print("STEM SEPARATION - DEMUCS")
print("=" * 50)

separated_stems = {}

if generate_audio and demucs_available and generated_tracks:
    # Use the "chorus" track (the richest arrangement)
    target_track = "chorus"
    if target_track not in generated_tracks:
        target_track = list(generated_tracks.keys())[0]

    track_data = generated_tracks[target_track]
    print(f"Separating track: {target_track}")
    print(f"Duration: {track_data['duration']:.1f}s | SR: {track_data['sr']}Hz")

    # Write the track to disk so Demucs can read it
    import soundfile as sf
    input_path = OUTPUT_DIR / f"demucs_input_{target_track}.wav"
    sf.write(str(input_path), track_data['audio'], track_data['sr'])

    # Run Demucs
    print(f"\nRunning Demucs ({demucs_model})...")
    start_time = time.time()

    import subprocess
    # sys.executable ensures Demucs runs in the same environment as this kernel
    result = subprocess.run(
        [sys.executable, "-m", "demucs", "--name", demucs_model,
         "--out", str(STEMS_DIR), str(input_path)],
        capture_output=True, text=True, timeout=300
    )

    sep_time = time.time() - start_time

    if result.returncode == 0:
        print(f"Separation finished in {sep_time:.1f}s")

        # Load the stems from <out>/<model>/<track name>/
        stems_path = STEMS_DIR / demucs_model / input_path.stem
        stem_names = ["drums", "bass", "other", "vocals"]

        print("\nSeparated stems:")
        for stem_name in stem_names:
            stem_file = stems_path / f"{stem_name}.wav"
            if stem_file.exists():
                stem_audio, stem_sr = sf.read(str(stem_file))
                # If stereo, average the channels for playback and mixing
                if len(stem_audio.shape) > 1:
                    stem_mono = stem_audio.mean(axis=1)
                else:
                    stem_mono = stem_audio

                energy = np.sqrt(np.mean(stem_mono ** 2))
                separated_stems[stem_name] = {
                    "audio": stem_audio,
                    "mono": stem_mono,
                    "sr": stem_sr,
                    "energy": energy,
                    "path": stem_file
                }

                print(f"  {stem_name:8s} | RMS energy: {energy:.4f} | Shape: {stem_audio.shape}")
                display(Audio(data=stem_mono, rate=stem_sr, autoplay=False))
            else:
                print(f"  {stem_name:8s} | File not found")
    else:
        print(f"Demucs error: {result.stderr[:200]}")

elif not demucs_available:
    print("Demucs not available - pip install demucs")
elif not generated_tracks:
    print("No generated tracks - run Section 1 first")
else:
    print("Separation disabled")

STEM SEPARATION - DEMUCS
Demucs not available - pip install demucs


### Interpretation: Stem separation

| Stem | Typical RMS energy | Observation |
|------|-------------------|-------------|
| drums | High | Drums are well isolated when present |
| bass | Medium | Low frequencies extracted |
| other | Variable | Contains melody and harmony |
| vocals | Low (instrumental music) | Empty if the original has no vocals |

> **Technical note**: For music generated by MusicGen (instrumental), the `vocals` stem will be nearly empty. This is expected.
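
RMS energy is easier to compare across stems when expressed in dB relative to full scale (dBFS). The sketch below computes it and flags a stem as nearly empty. The helper names and the -50 dBFS threshold are assumptions chosen for illustration, not values taken from Demucs.

```python
import numpy as np


def rms_dbfs(audio: np.ndarray, floor_db: float = -120.0) -> float:
    """RMS level in dB relative to full scale (1.0), clamped at floor_db for silence."""
    x = np.asarray(audio, dtype=np.float64)
    if x.size == 0:
        return floor_db
    rms = float(np.sqrt(np.mean(x ** 2)))
    if rms <= 0.0:
        return floor_db
    return max(float(20.0 * np.log10(rms)), floor_db)


def is_nearly_empty(audio: np.ndarray, threshold_db: float = -50.0) -> bool:
    """True when the stem's RMS level is below the (arbitrary) threshold."""
    return rms_dbfs(audio) < threshold_db


# One second of a full-scale 440 Hz sine at 32 kHz, and a very quiet signal
t = np.arange(32000) / 32000.0
sine = np.sin(2.0 * np.pi * 440.0 * t)
quiet = 1e-4 * np.ones(32000)

print(f"sine : {rms_dbfs(sine):.2f} dBFS, nearly empty: {is_nearly_empty(sine)}")
print(f"quiet: {rms_dbfs(quiet):.2f} dBFS, nearly empty: {is_nearly_empty(quiet)}")
```

Applied to `separated_stems[name]["mono"]`, this gives a quick check that the `vocals` stem of an instrumental track is indeed close to silent.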

## Section 3: Remixing the stems

Remixing means recombining the stems at different levels to create a new version. The cell below implements volume, mute, and solo through per-stem linear gains; panning is not implemented because the mix is mono.

| Operation | Description | Parameter |
|-----------|-------------|----------|
| Volume | Adjust the gain of each stem | Linear gain (1.0 = unchanged) |
| Mute | Remove a stem | On/Off |
| Solo | Isolate a stem | On/Off |
| Pan | Stereo position | Left/Center/Right |
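
Mixing tools usually express gain in decibels, while a mix function typically multiplies samples by a linear factor. The sketch below shows the conversion in both directions, plus a constant-power pan law for the stereo position. All three function names are hypothetical helpers introduced for illustration.

```python
import math
from typing import Tuple


def db_to_gain(db: float) -> float:
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def gain_to_db(gain: float) -> float:
    """Convert a linear amplitude factor to dB (-inf for a muted stem)."""
    return 20.0 * math.log10(gain) if gain > 0 else float("-inf")


def constant_power_pan(pan: float) -> Tuple[float, float]:
    """Map pan in [-1 (left), +1 (right)] to (left, right) gains with L^2 + R^2 = 1."""
    angle = (pan + 1.0) * math.pi / 4.0  # 0 .. pi/2
    return math.cos(angle), math.sin(angle)


print(f"-6 dB  -> gain {db_to_gain(-6.0):.3f}")
print(f"gain 1.2 -> {gain_to_db(1.2):+.2f} dB")
print("center pan ->", constant_power_pan(0.0))
```

A gain of 1.2, as used for some stems in this notebook, is therefore a boost of roughly 1.6 dB.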

In [6]:
# Remixing the stems
print("STEM REMIXING")
print("=" * 50)

def remix_stems(stems: Dict, mix_config: Dict[str, float],
                sample_rate: int) -> np.ndarray:
    """Remix the stems with the given gains.

    Args:
        stems: Dict {name: {"mono": np.array, ...}}
        mix_config: Dict {name: gain_linear} (0.0 = mute, 1.0 = unchanged)
        sample_rate: Sampling rate in Hz (currently unused)

    Returns:
        np.ndarray: Final mix
    """
    # Find the longest stem
    max_len = max(len(s['mono']) for s in stems.values())

    mix = np.zeros(max_len, dtype=np.float32)

    for stem_name, stem_data in stems.items():
        gain = mix_config.get(stem_name, 1.0)
        mono = stem_data['mono']

        # Zero-pad shorter stems
        if len(mono) < max_len:
            mono = np.pad(mono, (0, max_len - len(mono)))

        mix += mono * gain

    # Peak-normalize to avoid clipping
    peak = np.max(np.abs(mix))
    if peak > 0:
        mix = mix / peak * 0.95

    return mix

if separated_stems:
    sr = list(separated_stems.values())[0]['sr']

    # Different mix configurations
    mix_configs = {
        "Original (balanced)": {"drums": 1.0, "bass": 1.0, "other": 1.0, "vocals": 1.0},
        "Instrumental (no vocals)": {"drums": 1.0, "bass": 1.0, "other": 1.0, "vocals": 0.0},
        "Rhythm only": {"drums": 1.2, "bass": 0.8, "other": 0.0, "vocals": 0.0},
        "Melodic (no drums)": {"drums": 0.0, "bass": 0.5, "other": 1.2, "vocals": 0.8},
    }

    remix_results = {}
    for mix_name, config in mix_configs.items():
        print(f"\n--- {mix_name} ---")
        config_str = " | ".join([f"{k}: {v:.1f}" for k, v in config.items()])
        print(f"  Config: {config_str}")

        mix = remix_stems(separated_stems, config, sr)
        remix_results[mix_name] = mix

        display(Audio(data=mix, rate=sr, autoplay=False))

        if save_audio_files:
            import soundfile as sf
            safe_name = mix_name.lower().replace(' ', '_').replace('(', '').replace(')', '')
            filepath = OUTPUT_DIR / f"remix_{safe_name}.wav"
            sf.write(str(filepath), mix, sr)
            print(f"  Saved: {filepath.name}")

    print(f"\n{len(remix_results)} remix versions created")
else:
    print("No stems available - run Section 2 first")
    remix_results = {}

STEM REMIXING
No stems available - run Section 2 first


### Interpretation: Remixing

| Version | Drums | Bass | Other | Vocals | Character |
|---------|-------|------|-------|--------|----------|
| Original | 1.0 | 1.0 | 1.0 | 1.0 | Faithful to the original mix |
| Instrumental | 1.0 | 1.0 | 1.0 | 0.0 | Karaoke / backing track |
| Rhythm only | 1.2 | 0.8 | 0.0 | 0.0 | Rhythmic foundation |
| Melodic | 0.0 | 0.5 | 1.2 | 0.8 | Melodic atmosphere |

**Key points**:
1. Peak normalization prevents clipping when stems are amplified
2. Combining separation and remixing lets you rebalance, mute, or isolate any element of an existing mix
3. Gains >1.0 amplify, gains <1.0 attenuate, and 0.0 mutes the stem entirely

## Section 4: Audio effects

Audio effects change the sonic character of a signal. The cell below implements reverb and a 3-band EQ; fades are applied in Section 5, and compression is not implemented in this notebook.

| Effect | Description | Main parameter |
|--------|-------------|----------------|
| Reverb | Simulates a space (room, cathedral) | Decay time |
| EQ | Frequency equalization | Frequency bands |
| Compression | Reduces the dynamic range | Ratio, threshold |
| Fade | Fade in/out | Duration (ms) |
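
Compression appears in the table but has no implementation in the notebook, so here is a minimal sketch of the idea. It is a static, hard-knee compressor that acts on each sample independently: any level above the threshold is reduced according to the ratio. Real compressors smooth the gain with attack and release times. This simplified version does not, so it can add audible distortion. The function name and default values are assumptions.

```python
import numpy as np


def compress(audio: np.ndarray, threshold_db: float = -20.0,
             ratio: float = 4.0) -> np.ndarray:
    """Static hard-knee compressor (no attack/release smoothing).

    Above the threshold, every `ratio` dB of input yields 1 dB of output.
    """
    x = np.asarray(audio, dtype=np.float64)
    # Instantaneous level of each sample in dBFS (epsilon avoids log of zero)
    level_db = 20.0 * np.log10(np.maximum(np.abs(x), 1e-12))
    # Amount by which each sample exceeds the threshold (0 when below it)
    over_db = np.maximum(level_db - threshold_db, 0.0)
    # Gain reduction that brings the excess down to over_db / ratio
    gain_db = -over_db * (1.0 - 1.0 / ratio)
    return (x * 10.0 ** (gain_db / 20.0)).astype(np.float32)


samples = np.array([0.05, 1.0, -1.0])
compressed = compress(samples, threshold_db=-20.0, ratio=4.0)
print(compressed)
```

With a -20 dB threshold and a 4:1 ratio, a full-scale sample sits 20 dB above the threshold and is brought down to -15 dBFS, while the 0.05 sample is below the threshold and passes unchanged.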

In [7]:
# Applying audio effects
print("AUDIO EFFECTS")
print("=" * 50)

def apply_reverb(audio: np.ndarray, sr: int, decay: float = 0.3,
                 delay_ms: float = 40) -> np.ndarray:
    """Apply a simple reverb built from several delayed, attenuated copies (no feedback)."""
    delay_samples = int(sr * delay_ms / 1000)
    output = np.copy(audio).astype(np.float64)

    # Several delay taps to simulate early reflections
    for tap_mult in [1.0, 1.7, 2.3, 3.1]:
        tap_delay = int(delay_samples * tap_mult)
        tap_gain = decay ** tap_mult
        if tap_delay < len(output):
            output[tap_delay:] += audio[:len(audio) - tap_delay] * tap_gain

    # Peak normalization
    peak = np.max(np.abs(output))
    if peak > 0:
        output = output / peak * 0.95

    return output.astype(np.float32)

def apply_eq(audio: np.ndarray, sr: int,
             bass_gain: float = 1.0, mid_gain: float = 1.0,
             treble_gain: float = 1.0) -> np.ndarray:
    """Simple 3-band EQ (bass, mids, treble)."""
    from scipy.signal import butter, sosfilt

    # Low-pass (<300 Hz), band-pass (300-4000 Hz) and high-pass (>4000 Hz) filters
    sos_low = butter(4, 300, btype='low', fs=sr, output='sos')
    sos_mid = butter(4, [300, 4000], btype='band', fs=sr, output='sos')
    sos_high = butter(4, 4000, btype='high', fs=sr, output='sos')

    bass = sosfilt(sos_low, audio) * bass_gain
    mid = sosfilt(sos_mid, audio) * mid_gain
    treble = sosfilt(sos_high, audio) * treble_gain

    result = (bass + mid + treble).astype(np.float32)

    # Peak normalization
    peak = np.max(np.abs(result))
    if peak > 0:
        result = result / peak * 0.95

    return result

if generate_audio and apply_effects and generated_tracks:
    # Use the "verse" track for the effects
    target = "verse" if "verse" in generated_tracks else list(generated_tracks.keys())[0]
    source_audio = generated_tracks[target]['audio']
    sr = generated_tracks[target]['sr']

    print(f"Applying effects to: {target}")

    effects_results = {}

    # Reverb
    print("\n--- Reverb (decay=0.4, delay=50ms) ---")
    reverbed = apply_reverb(source_audio, sr, decay=0.4, delay_ms=50)
    effects_results["reverb"] = reverbed
    display(Audio(data=reverbed, rate=sr, autoplay=False))

    # EQ: bass boost
    print("\n--- EQ (bass boost) ---")
    eq_bass = apply_eq(source_audio, sr, bass_gain=1.5, mid_gain=1.0, treble_gain=0.8)
    effects_results["eq_bass_boost"] = eq_bass
    display(Audio(data=eq_bass, rate=sr, autoplay=False))

    # EQ: clarity (mid boost)
    print("\n--- EQ (mid boost - clarity) ---")
    eq_mid = apply_eq(source_audio, sr, bass_gain=0.8, mid_gain=1.3, treble_gain=1.1)
    effects_results["eq_clarity"] = eq_mid
    display(Audio(data=eq_mid, rate=sr, autoplay=False))

    # Save
    if save_audio_files:
        import soundfile as sf_lib
        for name, audio in effects_results.items():
            filepath = OUTPUT_DIR / f"effect_{name}.wav"
            sf_lib.write(str(filepath), audio, sr)
            print(f"Saved: {filepath.name}")

    print(f"\n{len(effects_results)} effects applied")
else:
    print("Effects disabled or no tracks available")
    effects_results = {}

AUDIO EFFECTS
Effects disabled or no tracks available


## Section 5: Multi-section composition

A complete composition follows a standard musical structure:

| Section | Typical duration | Character | Transition |
|---------|-----------------|-----------|------------|
| Intro | 4-8 s | Soft, gradual | Fade in |
| Verse | 8-16 s | Medium energy | Crossfade |
| Chorus | 8-16 s | Full energy | Crossfade |
| Outro | 4-8 s | Decrescendo | Fade out |

We assemble the generated sections and apply smooth transitions between them.
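
Because each crossfade overlaps two neighboring sections, the assembled piece is shorter than the sum of its parts. The sketch below works out the expected length for this notebook's settings (five sections of 8 s joined by 0.5 s crossfades), assuming every generated section lasts exactly `generation_duration` seconds. `composition_duration` is a hypothetical helper.

```python
from typing import Sequence


def composition_duration(section_durations: Sequence[float], crossfade_s: float) -> float:
    """Total length when each of the N-1 joins overlaps by crossfade_s seconds."""
    if not section_durations:
        return 0.0
    overlaps = len(section_durations) - 1
    return sum(section_durations) - overlaps * crossfade_s


# intro -> verse -> chorus -> verse -> chorus, 8 s each, 0.5 s crossfades
total = composition_duration([8.0] * 5, 0.5)
print(total)  # 38.0
```

The four joins each remove 0.5 s, so 40 s of material yields a 38 s composition.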

In [8]:
# Multi-section composition with transitions
print("MULTI-SECTION COMPOSITION")
print("=" * 50)

if generate_audio and generated_tracks:
    # Structure of the composition
    composition_structure = ["intro", "verse", "chorus", "verse", "chorus"]

    # Determine the sample rate
    sr = list(generated_tracks.values())[0]['sr']

    # Transition parameters
    fade_duration_ms = 500  # Crossfade duration in ms
    fade_samples = int(sr * fade_duration_ms / 1000)

    print(f"Structure: {' -> '.join(composition_structure)}")
    print(f"Crossfade: {fade_duration_ms}ms ({fade_samples} samples)")

    # Build the composition
    composition = np.array([], dtype=np.float32)

    for i, section_name in enumerate(composition_structure):
        if section_name not in generated_tracks:
            print(f"  Section '{section_name}' not available, skipping")
            continue

        section_audio = generated_tracks[section_name]['audio'].copy()

        # Fade in on the first section
        if i == 0:
            fade_in = np.linspace(0, 1, min(fade_samples, len(section_audio)))
            section_audio[:len(fade_in)] *= fade_in

        # Fade out on the last section
        if i == len(composition_structure) - 1:
            fade_out = np.linspace(1, 0, min(fade_samples, len(section_audio)))
            section_audio[-len(fade_out):] *= fade_out

        # Crossfade with the previous section
        if i > 0 and len(composition) >= fade_samples:
            # Overlap region
            xfade_len = min(fade_samples, len(section_audio))
            fade_out_curve = np.linspace(1, 0, xfade_len)
            fade_in_curve = np.linspace(0, 1, xfade_len)

            # Fade the tail of the composition out and the head of the section in, then sum
            composition[-xfade_len:] *= fade_out_curve
            section_audio[:xfade_len] *= fade_in_curve
            composition[-xfade_len:] += section_audio[:xfade_len]

            # Append the rest of the section
            composition = np.concatenate([composition, section_audio[xfade_len:]])
        else:
            composition = np.concatenate([composition, section_audio])

        print(f"  [{i+1}] {section_name:8s} | {len(section_audio)/sr:.1f}s | Total: {len(composition)/sr:.1f}s")

    # Final peak normalization
    peak = np.max(np.abs(composition))
    if peak > 0:
        composition = composition / peak * 0.95

    print("\nFinal composition:")
    print(f"  Duration: {len(composition)/sr:.1f}s")
    print(f"  Sections: {len(composition_structure)}")
    print(f"  Sample rate: {sr}Hz")

    display(Audio(data=composition, rate=sr, autoplay=False))

    # Save
    if save_audio_files:
        import soundfile as sf
        filepath = OUTPUT_DIR / "composition_finale.wav"
        sf.write(str(filepath), composition, sr)
        print(f"\nSaved: {filepath.name} ({filepath.stat().st_size/1024:.1f} KB)")
else:
    composition = np.array([])
    print("Composition disabled (no tracks available)")

MULTI-SECTION COMPOSITION
Composition disabled (no tracks available)


### Interpretation: Multi-section composition

| Aspect | Value | Meaning |
|--------|-------|---------|
| Crossfade | 500 ms | Smooth transition between sections |
| Fade in/out | 500 ms | Gradual entry and exit |
| Normalization | 0.95 peak (about -0.45 dBFS) | Safety margin against clipping |

> **Technical note**: For a more advanced production, use non-linear crossfades (exponential curves) and musically coherent transitions (aligned on strong beats).
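
One common non-linear option is the equal-power crossfade, which uses cosine and sine curves so that the summed power of the two gains stays constant through the transition. Note that this is a different curve from the exponential one mentioned in the note; it is shown here as an alternative sketch, and `equal_power_crossfade` is a hypothetical helper.

```python
import numpy as np


def equal_power_crossfade(a: np.ndarray, b: np.ndarray, fade_samples: int) -> np.ndarray:
    """Join a and b, overlapping them by up to fade_samples with cos/sin gain curves."""
    n = min(fade_samples, len(a), len(b))
    if n == 0:
        return np.concatenate([a, b]).astype(np.float32)
    t = np.linspace(0.0, np.pi / 2.0, n)
    fade_out = np.cos(t)  # 1 -> 0, applied to the tail of a
    fade_in = np.sin(t)   # 0 -> 1, applied to the head of b
    overlap = a[-n:] * fade_out + b[:n] * fade_in
    return np.concatenate([a[:-n], overlap, b[n:]]).astype(np.float32)


n = 200
# Fading a constant signal into silence (and the reverse) exposes the two gain curves
x = equal_power_crossfade(np.ones(1000), np.zeros(1000), n)
y = equal_power_crossfade(np.zeros(1000), np.ones(1000), n)
print(len(x), float(x[0]), float(x[-1]))
```

Equal-power curves suit material that is uncorrelated across the join, such as two different sections. For nearly identical signals, a linear crossfade avoids a level bump in the middle.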

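## Section 6: Final export with metadata

The final export writes an MP3 file and adds ID3 metadata (title, artist, album, and so on) so that players and libraries can identify the track. pydub delegates MP3 encoding to ffmpeg, which must be installed and available on the PATH.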
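
MP3 encoders and `AudioSegment` work with integer PCM, so the floating-point composition must first be converted to 16-bit samples. The sketch below shows that conversion with explicit clipping: without it, any value outside [-1, 1] would overflow the int16 range and turn into a loud click. `float_to_int16` is a hypothetical helper.

```python
import numpy as np


def float_to_int16(audio: np.ndarray) -> np.ndarray:
    """Convert float audio in [-1, 1] to 16-bit PCM, clipping out-of-range samples."""
    clipped = np.clip(np.asarray(audio, dtype=np.float64), -1.0, 1.0)
    return np.round(clipped * 32767.0).astype(np.int16)


pcm = float_to_int16(np.array([0.0, 1.0, -1.0, 2.0, 0.95]))
print(pcm.tolist())
```

The resulting array can be passed to `AudioSegment` as `data=pcm.tobytes()` with `sample_width=2`.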

In [9]:
# Final export with metadata
print("FINAL EXPORT")
print("=" * 50)

if generate_audio and len(composition) > 0:
    sr = list(generated_tracks.values())[0]['sr']

    # Convert the numpy array to an AudioSegment for export with metadata
    # Clip to [-1, 1] before scaling to int16 to rule out integer overflow
    audio_int16 = (np.clip(composition, -1.0, 1.0) * 32767).astype(np.int16)
    audio_segment = AudioSegment(
        data=audio_int16.tobytes(),
        sample_width=2,
        frame_rate=sr,
        channels=1
    )

    # MP3 export with metadata
    export_path = OUTPUT_DIR / "production_finale.mp3"
    metadata = {
        "title": "AI Composition",
        "artist": "MusicGen + Demucs Pipeline",
        "album": "GenAI Audio Course",
        "date": datetime.now().strftime("%Y"),
        "genre": "Electronic",
        "comment": f"Generated with MusicGen ({musicgen_model}) and remixed with Demucs ({demucs_model})"
    }

    # pydub forwards the tags dict to the encoder as metadata
    audio_segment.export(
        str(export_path),
        format="mp3",
        bitrate="256k",
        tags=metadata
    )

    print(f"Exported file: {export_path.name}")
    print(f"Size: {export_path.stat().st_size/1024:.1f} KB")
    print("Bitrate: 256 kbps")
    print("\nMetadata:")
    for key, value in metadata.items():
        print(f"  {key:10s} : {value}")

    print("\nListen to the final production:")
    display(Audio(filename=str(export_path)))
else:
    print("Export disabled (no composition available)")

FINAL EXPORT
Export disabled (no composition available)


In [10]:
# Interactive mode - Custom composition
if notebook_mode == "interactive" and not skip_widgets:
    print("INTERACTIVE MODE - CUSTOM COMPOSITION")
    print("=" * 50)
    print("\nDescribe the style of music you want to generate:")
    print("(Leave empty to skip)")

    try:
        user_desc = input("\nMusic description: ")

        if user_desc.strip() and musicgen_available:
            print("\nGenerating...")
            # Pass the device explicitly so the CPU fallback is honored
            mg_model = MusicGen.get_pretrained(musicgen_model, device=musicgen_device)
            mg_model.set_generation_params(duration=generation_duration)

            wav = mg_model.generate([user_desc])
            audio = wav[0].cpu().numpy().squeeze()
            sr = mg_model.sample_rate

            print(f"Generated music ({len(audio)/sr:.1f}s):")
            display(Audio(data=audio, rate=sr, autoplay=False))

            if save_audio_files:
                import soundfile as sf
                ts = datetime.now().strftime('%Y%m%d_%H%M%S')
                filepath = OUTPUT_DIR / f"custom_{ts}.wav"
                sf.write(str(filepath), audio, sr)
                print(f"Saved: {filepath.name}")

            del mg_model
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
        else:
            print("Interactive mode skipped")

    except (KeyboardInterrupt, EOFError):
        print("Interactive mode interrupted")
    except Exception as e:
        # input() raises StdinNotImplementedError when the kernel has no stdin (e.g. Papermill)
        error_type = type(e).__name__
        if "StdinNotImplemented" in error_type or "input" in str(e).lower():
            print("Interactive mode not available (automated execution)")
        else:
            print(f"Error: {error_type} - {str(e)[:100]}")
else:
    print("Batch mode - Interactive interface disabled")

INTERACTIVE MODE - CUSTOM COMPOSITION

Describe the style of music you want to generate:
(Leave empty to skip)
Interactive mode not available (automated execution)


In [11]:
# Session statistics
print("SESSION STATISTICS")
print("=" * 50)

print(f"Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Mode: {notebook_mode}")
print(f"MusicGen: {musicgen_model} | Demucs: {demucs_model}")

if generated_tracks:
    total_gen_time = sum(t['time'] for t in generated_tracks.values())
    total_duration = sum(t['duration'] for t in generated_tracks.values())
    print(f"Generated tracks: {len(generated_tracks)} ({total_duration:.1f}s total)")
    print(f"Total generation time: {total_gen_time:.1f}s")

if separated_stems:
    print(f"Separated stems: {len(separated_stems)}")

if remix_results:
    print(f"Remixed versions: {len(remix_results)}")

if len(composition) > 0:
    sr = list(generated_tracks.values())[0]['sr']
    print(f"Final composition: {len(composition)/sr:.1f}s")

if save_audio_files:
    # Note: glob('*') also returns subdirectories (e.g. stems/), so the count includes them;
    # only regular files contribute to the total size
    saved_files = list(OUTPUT_DIR.glob('*'))
    total_size = sum(f.stat().st_size for f in saved_files if f.is_file()) / 1024
    print(f"Saved files: {len(saved_files)} ({total_size:.1f} KB)")

print("\nNEXT STEPS")
print("1. Synchronize audio and video (04-4-Audio-Video-Sync)")
print("2. Explore the Video series (Video/01-Foundation)")

print(f"\nNotebook finished - {datetime.now().strftime('%H:%M:%S')}")

SESSION STATISTICS
Date: 2026-02-18 10:52:36
Mode: interactive
MusicGen: facebook/musicgen-small | Demucs: htdemucs
Saved files: 1 (0.0 KB)

NEXT STEPS
1. Synchronize audio and video (04-4-Audio-Video-Sync)
2. Explore the Video series (Video/01-Foundation)

Notebook finished - 10:52:36
