<!-- Logo at 25% -->
<td width="45%" align="left" valign="middle">
  <img src="https://www.upc.edu/comunicacio/ca/identitat/descarrega-arxius-grafics/fitxers-marca-principal/upc-positiu-p3005.png" width="300">
</td>

<!-- Text at 75%, right-aligned -->
<td width="5%" align="right" valign="middle">
  <p style="margin: 0;"><b>Intelligent Data Science and Artificial Intelligence (IDEAI)</b></p>
  <p style="margin: 0;"><b>Grau en Estadística (UB - UPC)</b></p>
  <p style="margin: 0;">Mètodes Estadístics per la Mineria de Dades (MeMDa)</p>
</td>

# 🎼 **Neural Networks: Music Generator**

### 🧩 1. Installing dependencies

In [1]:
!pip install -q transformers scipy accelerate

### 🧠 2. Loading the MusicGen model

We use the small model (`musicgen-small`) to keep memory use and run time low on Colab.

In [2]:
import torch
from transformers import AutoProcessor, MusicgenForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
print("Usando dispositivo:", device)

# Load the processor and the model
processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

# Move the model to the GPU if one is available
model.to(device)


Usando dispositivo: cuda


MusicgenForConditionalGeneration(
  (text_encoder): T5EncoderModel(
    (shared): Embedding(32128, 768)
    (encoder): T5Stack(
      (embed_tokens): Embedding(32128, 768)
      (block): ModuleList(
        (0): T5Block(
          (layer): ModuleList(
            (0): T5LayerSelfAttention(
              (SelfAttention): T5Attention(
                (q): Linear(in_features=768, out_features=768, bias=False)
                (k): Linear(in_features=768, out_features=768, bias=False)
                (v): Linear(in_features=768, out_features=768, bias=False)
                (o): Linear(in_features=768, out_features=768, bias=False)
                (relative_attention_bias): Embedding(32, 12)
              )
              (layer_norm): T5LayerNorm()
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (1): T5LayerFF(
              (DenseReluDense): T5DenseActDense(
                (wi): Linear(in_features=768, out_features=3072, bias=False)
                (wo): L... [output truncated]

`facebook/musicgen-small` is a text-to-music model trained by Meta, released under the **cc-by-nc-4.0** license and available on Hugging Face.

### 🎼 3. The `compose_music(prompt, ...)` function

This function:

* Receives a **text prompt**
* Takes an **approximate duration in seconds**
* Generates music with MusicGen
* Saves the result as a `.wav` file
* Plays it in the Colab cell
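
The save step listed above relies on `scipy.io.wavfile.write` with `float32` samples, which produces an IEEE-float WAV that some players may not open. As a hedged alternative, the sketch below converts floats in `[-1, 1]` to 16-bit PCM using only the standard library. The helper names `float_to_pcm16` and `write_wav_pcm16` are hypothetical, and the 32 kHz rate is an assumption about MusicGen's output that should be checked against `model.config.audio_encoder.sampling_rate`.

```python
import math
import struct
import wave


def float_to_pcm16(samples):
    """Clip each sample to [-1, 1] and scale it to a signed 16-bit integer."""
    pcm = []
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))
        pcm.append(int(round(clipped * 32767)))
    return pcm


def write_wav_pcm16(path, samples, sampling_rate):
    """Write a mono 16-bit PCM WAV file (hypothetical helper, not part of the notebook)."""
    pcm = float_to_pcm16(samples)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes per sample = 16 bits
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


# Stand-in for model output: one second of a 440 Hz sine wave.
rate = 32000  # assumed sampling rate
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
write_wav_pcm16("tone_pcm16.wav", tone, rate)
```

With real model output, the same helper could be called as `write_wav_pcm16(path, audio_array.tolist(), sampling_rate)`.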

In [3]:
import scipy.io.wavfile
from IPython.display import Audio, display
import numpy as np
import random

def compose_music(
    prompt: str,
    duration_seconds: int = 10,
    output_path: str = "musicgen_out.wav",
    seed: int | None = None,
):
    """
    Generates music from a text prompt using MusicGen (facebook/musicgen-small).

    - prompt: musical description (English prompts usually work better)
    - duration_seconds: approximate audio duration (max ~30 s)
    - output_path: path where the .wav file will be saved
    - seed: seed for reproducibility (None = random)
    """

    if seed is None:
        seed = random.randint(0, 10_000)

    torch.manual_seed(seed)

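    # MusicGen is limited to roughly 30 s of audio (about 1503 tokens).
    # NOTE: 1503 tokens / 30 s is about 50 tokens per second, so with the 60.0
    # used here the clip comes out somewhat longer than duration_seconds.
    tokens_per_second = 60.0
    max_new_tokens = int(duration_seconds * tokens_per_second)
    max_new_tokens = min(max_new_tokens, 1503)  # safety cap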

    print(f"Prompt: {prompt}")
    print(f"Duración objetivo: ~{duration_seconds}s  | max_new_tokens: {max_new_tokens}  | seed: {seed}")

    # Prepare the input
    inputs = processor(
        text=[prompt],
        padding=True,
        return_tensors="pt",
    )
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Generate the audio
    with torch.no_grad():
        audio_values = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,   # sampling works better than greedy decoding for music
            temperature=1.0,
        )

    # audio_values: [batch, channels, time]
    audio_array = audio_values[0, 0].cpu().numpy()

    # Ensure float32 and clip to [-1, 1] just in case
    audio_array = audio_array.astype(np.float32)
    audio_array = np.clip(audio_array, -1.0, 1.0)

    sampling_rate = model.config.audio_encoder.sampling_rate

    # Save to WAV
    scipy.io.wavfile.write(output_path, rate=sampling_rate, data=audio_array)
    print(f"\nArchivo guardado en: {output_path}")

    # Play back in Colab
    display(Audio(audio_array, rate=sampling_rate))

    return output_path
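
`compose_music` converts the requested duration to a token budget with a fixed `tokens_per_second = 60.0`, yet its own comment cites about 1503 tokens for about 30 s, which is closer to 50 tokens per second. The sketch below isolates that conversion with the rate as a parameter. It is a minimal illustration: `tokens_for_duration` and `duration_for_tokens` are hypothetical helpers, and the 50.0 default is an assumption that should be checked against `model.config.audio_encoder.frame_rate` in a live session.

```python
import math


def tokens_for_duration(duration_seconds, frame_rate=50.0, max_tokens=1503):
    """Return the token budget for a target duration, capped at max_tokens."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return min(math.ceil(duration_seconds * frame_rate), max_tokens)


def duration_for_tokens(num_tokens, frame_rate=50.0):
    """Return the audio length in seconds implied by a token count."""
    return num_tokens / frame_rate


print(tokens_for_duration(12))   # 600
print(duration_for_tokens(720))  # 14.4
```

Under this assumption, a run that reports `max_new_tokens: 720` for a 12 s target would yield about 14.4 s of audio.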

### ▶️ 4. Usage examples

#### Example 1: Relaxed lo-fi

In [4]:
compose_music(
    prompt="lo-fi chill beat with soft piano and vinyl crackle, relaxed night study vibe",
    duration_seconds=12,
    output_path="lofi_study.wav",
    seed=1234,
)

Prompt: lo-fi chill beat with soft piano and vinyl crackle, relaxed night study vibe
Duración objetivo: ~12s  | max_new_tokens: 720  | seed: 1234

Archivo guardado en: lofi_study.wav


'lofi_study.wav'

#### Example 2: Epic orchestral

In [5]:
compose_music(
    prompt="epic orchestral soundtrack with strings, brass, and big percussion, heroic and cinematic",
    duration_seconds=15,
    output_path="epic_orchestra.wav",
)

Prompt: epic orchestral soundtrack with strings, brass, and big percussion, heroic and cinematic
Duración objetivo: ~15s  | max_new_tokens: 900  | seed: 3137

Archivo guardado en: epic_orchestra.wav


'epic_orchestra.wav'

#### Example 3: Techno / EDM

In [6]:
compose_music(
    prompt="fast techno track with punchy kick, deep bass and futuristic synths, club atmosphere",
    duration_seconds=8,
    output_path="techno_clip.wav",
)

Prompt: fast techno track with punchy kick, deep bass and futuristic synths, club atmosphere
Duración objetivo: ~8s  | max_new_tokens: 480  | seed: 3178

Archivo guardado en: techno_clip.wav


'techno_clip.wav'

#### Example 4: Reggaeton

In [9]:
compose_music(
    prompt="modern reggaeton beat with strong dembow rhythm, sub heavy bass, crisp hi hats, and latin percussion. add male vocal melody singing wordless reggaeton-style phrases like \"ohh\", \"ahh\", \"ey\", with rhythmic flow, catchy hook, warm reverb, nightclub vibe, high quality production in the style of a modern latin urban hit, not copying any existing melody.",
    duration_seconds=20,
    output_path="reggaeton_music.wav",
)

Prompt: modern reggaeton beat with strong dembow rhythm, sub heavy bass, crisp hi hats, and latin percussion. add male vocal melody singing wordless reggaeton-style phrases like "ohh", "ahh", "ey", with rhythmic flow, catchy hook, warm reverb, nightclub vibe, high quality production in the style of a modern latin urban hit, not copying any existing melody.
Duración objetivo: ~20s  | max_new_tokens: 1200  | seed: 5698

Archivo guardado en: reggaeton_music.wav


'reggaeton_music.wav'
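
The examples above call `compose_music` once per prompt and name each output file by hand. As a rough sketch, several prompts could be run in a loop with file names derived from the prompt text. `slugify`, `compose_batch`, and `dry_run_compose` are hypothetical helpers introduced only for illustration. The stand-in `dry_run_compose` lets the block run without the model; in the notebook, `compose_music` would be passed instead.

```python
import re


def slugify(text, max_length=40):
    """Turn a prompt into a short, file-system-friendly name."""
    slug = re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")
    return slug[:max_length].rstrip("_")


def compose_batch(prompts, compose_fn, duration_seconds=10, seed=1234):
    """Call compose_fn once per prompt and collect the returned paths."""
    paths = []
    for prompt in prompts:
        output_path = f"{slugify(prompt)}.wav"
        paths.append(
            compose_fn(
                prompt=prompt,
                duration_seconds=duration_seconds,
                output_path=output_path,
                seed=seed,
            )
        )
    return paths


def dry_run_compose(prompt, duration_seconds, output_path, seed):
    """Stand-in for compose_music: generates nothing, only echoes the path."""
    return output_path


print(compose_batch(["fast techno track", "epic orchestral soundtrack"], dry_run_compose))
```

Passing the same `seed` for every prompt keeps the random state comparable across runs, so differences in the output come mainly from the prompt.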