<a href="https://colab.research.google.com/github/hypereikon/ml_art_notebooks/blob/main/text2wav.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#text2wav

〰ଘ(੭ˊᵕˋ)੭ ..。.:*･ﾟ♫₊ ♪ *♬‧₊‹𝟹" </br>
a notebook for playing with [MusicGen from AudioCraft (Meta AI)](https://github.com/facebookresearch/audiocraft)

this notebook only supports long text-to-audio generation<br>
other MusicGen features are not included in this notebook

**the process is:**
- generates a 30-second audio clip from the prompt given in ***descripcion***
- continues the generation from the end of the previous clip, using its last 10 seconds as an audio prompt, and repeats this for the number of passes set in ***cantidad_pasadas*** (the first clip counts as one pass)
- concatenates these generations, one after another, into a final audio file
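
the steps above can be sketched with plain lists standing in for audio. this is only an illustration, not the notebook's code: `total_seconds` and `stitch` are hypothetical helper names, and the sketch assumes (as the generation cell below does) that every continuation starts with the 10 seconds it was prompted with, so those seconds are dropped before concatenating.

```python
def total_seconds(num_passes, clip_seconds=30, overlap_seconds=10):
    """Length of the final audio: one full clip plus the new part of each later pass."""
    passes = max(num_passes, 1)  # the initial clip is always generated
    return clip_seconds + (passes - 1) * (clip_seconds - overlap_seconds)


def stitch(clips, overlap):
    """Concatenate clips, dropping the first `overlap` samples of every clip after the first."""
    if not clips:
        return []
    stitched = list(clips[0])
    for clip in clips[1:]:
        stitched.extend(clip[overlap:])
    return stitched


# Toy "audio": 3 samples per clip, 1 sample of overlap between neighbours
first = [1, 2, 3]
second = [3, 4, 5]  # begins with the prompt taken from the end of `first`
third = [5, 6, 7]
print(stitch([first, second, third], overlap=1))  # [1, 2, 3, 4, 5, 6, 7]
print(total_seconds(3))  # 70
print(total_seconds(4))  # 90
```

under these assumptions 3 passes give 1:10 of audio and 4 passes give 1:30.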

**multiple descriptions**
- you can use more than one description by separating them with // (with a space before and after the separator)
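
a minimal sketch of how several descriptions could be spread over the passes, assuming they are used in rotation (pass 0 gets the first one, pass 1 the second, and so on, wrapping around). `split_descriptions` and `description_for_pass` are hypothetical names, the two example prompts are made up, and the `strip()` call is an extra tidy-up that the notebook itself does not do.

```python
def split_descriptions(text, separator=" // "):
    """Split one string into a list of descriptions on the ' // ' separator."""
    return [part.strip() for part in text.split(separator)]


def description_for_pass(descriptions, pass_index):
    """Pick the description for a pass, cycling when there are more passes than descriptions."""
    return descriptions[pass_index % len(descriptions)]


descriptions = split_descriptions("slow ambient drone // fast breakbeat")
for pass_index in range(4):
    print(pass_index, description_for_pass(descriptions, pass_index))
```

with two descriptions and four passes, the passes alternate between the first and the second description.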


the files are organized in the colab file system; you can reach them by clicking 📁*the folder icon*📁 in the left sidebar

generating a 1:30 audio takes about 7 minutes with **large** on a T4 (colab free)<br>
generating a 1:30 audio takes about 3 minutes with **small** on a T4


notebook by [@hypereikon](https://hypereikon.glitch.me/) 10/06/23, *questions and ideas via [dm](https://www.instagram.com/hypereikon/)*

♥ ♬♪♫ ヾ(*・。・)ﾉ ♬♪♫ ♥

#####*update details*


---
initial implementation 10/06<br>
update 07/07:
- set a seed for deterministic generation; a negative or zero seed picks a random one
- reusing the same name no longer overwrites previous generations
- saves the parameters of each generation as a text file
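
a small sketch of the seed rule from this update, assuming a seed of zero or below (or no seed at all) means "pick a random one" and that each pass offsets the base seed by its index so passes do not repeat each other. `resolve_seed` and `seed_for_pass` are hypothetical names used only for illustration.

```python
import random


def resolve_seed(seed=None):
    """Return a positive seed unchanged; otherwise draw a random 32-bit seed."""
    if seed is None or seed <= 0:
        return random.randint(0, 2**32 - 1)
    return seed


def seed_for_pass(base_seed, pass_index):
    """Derive a different but reproducible seed for each pass."""
    return base_seed + pass_index


base = resolve_seed(1357924680)
print([seed_for_pass(base, i) for i in range(3)])
print(resolve_seed(-1))  # a random value, different on each run
```

keeping the printed seed is what makes a run repeatable later with the same parameters.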

update 17/07:
- set multiple descriptions
---

#####**generation**

In [1]:
#@markdown install and import libraries
#!pip install -U git+https://github.com/facebookresearch/audiocraft

!pip install 'torch>=2.0'
!pip install -U git+https://github.com/hypereikon/audiocraft  # our fork, which allows setting the seed
from audiocraft.models import musicgen
import torch
from audiocraft.models import MusicGen
from audiocraft.utils.notebook import display_audio
from audiocraft.modules.conditioners import ConditioningAttributes
import math
import torchaudio
import os
from tqdm.notebook import tqdm
import random
from google.colab import output

output.clear()

In [2]:
#@markdown define functions
def save_file(output, project_name, projects_path, continuation_i=None, is_final=False):
    """
    Save an audio tensor as a 32 kHz WAV file inside the project directory.

    Args:
        output (torch.Tensor): The tensor containing the audio data to be saved; only the first item of the batch is written.
        project_name (str): The name of the project directory, also used as the file name prefix.
        projects_path (str): The directory that contains all project directories.
        continuation_i (int, optional): The number of the continuation being saved. Default is None.
        is_final (bool, optional): Specifies if the audio being saved is the final output. Default is False.

    Returns:
        filename (str): The path of the saved file.
    """

    # Create the project's continuations directory if it does not exist
    dir_path = f"{projects_path}/{project_name}/continuations"
    os.makedirs(dir_path, exist_ok=True)

    # Create the new filename
    if is_final:
        filename = f"{projects_path}/{project_name}/{project_name}.wav"
    else:
        filename = f"{projects_path}/{project_name}/continuations/{project_name}_{continuation_i}.wav"

    # Save the file
    torchaudio.save(filename, output[0].cpu(), 32000)

    # Return the filename
    return filename


def generate_and_save_continuation(model, waveform, input_sr, project_name, projects_path, continuation_i, seed, input_texts):
    """
    Generate an audio continuation from a prompt waveform, then save the output.

    Args:
        model (MusicGen): The pretrained model used to generate the audio continuation.
        waveform (torch.Tensor): The tensor containing the prompt audio data.
        input_sr (int): The sample rate of the prompt audio.
        project_name (str): The name of the project directory where the file will be saved.
        projects_path (str): The directory that contains all project directories.
        continuation_i (int): The number of the continuation being generated.
        seed (int): The base seed; the generator is seeded with seed + continuation_i.
        input_texts (list[str] or str): The description(s); a list is cycled through by continuation number.

    Returns:
        output (torch.Tensor): The generated audio continuation.
        filename (str): The path of the saved file.
    """

    # Check if input_texts is a list or a single string and select the description accordingly
    if isinstance(input_texts, list):
        num_prompts = len(input_texts)
        description = input_texts[continuation_i % num_prompts]
    else:
        description = input_texts

    # Generate the continuation
    output = model.generate_continuation(
        prompt=waveform,
        prompt_sample_rate=input_sr,
        descriptions=[description],
        progress=True,
#        generator=torch.Generator(device='cuda').manual_seed(seed)
        generator=torch.Generator(device='cuda').manual_seed(seed+continuation_i)
    )

    # Save the file and get its filename
    filename = save_file(output, project_name, projects_path, continuation_i, is_final=False)

    return output, filename


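def generate_arbitrary_length(input_texts, project_name, model, num_continuations, seed=None):
    """
    Generate a long audio file by chaining an initial clip with overlapping continuations.

    Reads the notebook globals modelo, diversidad, temperatura and guidance for logging.
    """
    # Split multiple descriptions on the ' // ' separator (this always yields a list)
    input_texts = input_texts.split(' // ')

    # Use the given seed if it is positive; otherwise draw a random one and print it
    if seed is None or seed <= 0:
        seed = random.randint(0, 2**32-1)
        print(seed)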

    # Path where your projects are located
    # (a planned guardar_en_drive option would switch this to Google Drive; for now it is always /content)
    projects_path = "/content"

    # Check if directory already exists and find the maximum suffix of existing directories
    dirs = [d for d in os.listdir(projects_path) if os.path.isdir(os.path.join(projects_path, d)) and d.startswith(project_name)]
    suffixes = [int(d.split("_")[-1]) for d in dirs if d.split("_")[-1].isdigit()]
    if suffixes:
        max_suffix = max(suffixes)
    else:
        max_suffix = -1

    # Create a new directory with an incremented suffix
    project_name = f"{project_name}_{str(max_suffix + 1).zfill(2)}"

    # Ensure the base directory exists
    os.makedirs(os.path.join(projects_path, project_name), exist_ok=True)

    # Save parameters to log file
    with open(f"{projects_path}/{project_name}/{project_name}.txt", "w") as log_file:
        # Check if input_texts is a list or a single string and write accordingly
        log_file.write(f"nombre: {project_name}\n")
        if isinstance(input_texts, list):
            for i, prompt in enumerate(input_texts):
                log_file.write(f"prompt {i + 1}: {prompt}\n")
        else:
            log_file.write(f"prompt: {input_texts}\n")
        log_file.write(f"seed: {seed}\n")
        log_file.write(f"modelo: {modelo}\n")
        log_file.write(f"diversidad: {diversidad}\n")
        log_file.write(f"temperatura: {temperatura}\n")
        log_file.write(f"guidance: {guidance}")

    # Check if input_texts is a list or a single string and select the initial description accordingly
    initial_text = input_texts[0] if isinstance(input_texts, list) else input_texts


    # Generate the initial audio clip
    audio_inicial = model.generate(
        descriptions=[initial_text],
        progress=True,
        generator=torch.Generator(device='cuda').manual_seed(seed),
    )

    # Save the initial file and get its filename
    filename = save_file(audio_inicial, project_name, projects_path, continuation_i=0, is_final=False)

    # Initialize list to hold continuations
    continuations = [audio_inicial]
    filenames = [filename]
    # Generate the continuations
    for continuation_i in range(1, num_continuations):
        prompt = continuations[-1][:, :, -10*32000:]  # Take the last 10 seconds as the new prompt

        continuation, filename = generate_and_save_continuation(
            model,
            prompt,
            32000,  # Sample rate
            project_name,
            projects_path,
            continuation_i,
            seed,
            input_texts  # Pass the list of prompts or a single prompt
        )

        continuations.append(continuation[:, :, 10*32000:])  # Discard the first 10 seconds of the continuation
        filenames.append(filename)

    # Concatenate and save the final output
    final_output = torch.cat(continuations, dim=-1)
    final_filename = save_file(final_output, project_name, projects_path, continuation_i=None, is_final=True)
    display_audio(final_output, 32000)

    print(" ")
    print(project_name)
    print(" ")
    print("seed:", seed)
    print("diversidad:", diversidad, "temperatura:", temperatura, "guidance:", guidance)
    print(" ")
    # Check if input_texts is a list or a single string and print accordingly
    if isinstance(input_texts, list):
        for i, prompt in enumerate(input_texts):
            print(f"prompt {i + 1}: {prompt}")
    else:
        print("prompt:", input_texts)

    print(" ")
    return final_filename
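
the "do not overwrite" behaviour above comes from numbering each run's directory. below is a hedged sketch of that idea using a temporary directory instead of `/content`. `next_run_name` is a hypothetical helper, and unlike the notebook's `startswith` check it only counts directories named exactly `<project_name>_<number>`, which is a deliberate variation rather than the notebook's own logic.

```python
import os
import tempfile


def next_run_name(projects_path, project_name):
    """Return project_name plus the next free two-digit suffix, e.g. text2wav_01."""
    suffixes = []
    for entry in os.listdir(projects_path):
        prefix, _, suffix = entry.rpartition("_")
        is_dir = os.path.isdir(os.path.join(projects_path, entry))
        if is_dir and prefix == project_name and suffix.isdigit():
            suffixes.append(int(suffix))
    next_suffix = max(suffixes, default=-1) + 1
    return f"{project_name}_{next_suffix:02d}"


# Simulate three runs with the same name inside a throwaway directory
with tempfile.TemporaryDirectory() as root:
    for _ in range(3):
        name = next_run_name(root, "text2wav")
        os.makedirs(os.path.join(root, name))
        print(name)
```

the exact-prefix match keeps a project called `text2wav_long` from affecting the numbering of `text2wav`.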


In [5]:
#@markdown choose the model to use
modelo = 'large' #@param ["small", "medium", "large"]
model = MusicGen.get_pretrained(modelo, device='cuda')


#@markdown the default values are 250, 1.0, and 3.0 respectively
diversidad=512#@param{type:"integer"}
temperatura=1.05#@param{type:"number"}
guidance=2.5#@param{type:"number"}
duration=30#@param{type:"number"}
model.set_generation_params(
    use_sampling=True,
    top_k=diversidad, # diversity
    temperature=temperatura, # confidence
    duration=duration,
    cfg_coef=guidance, #guidance
)
output.clear()
print("diversidad:",diversidad,"temperatura:",temperatura,"guidance:",guidance)

diversidad: 512 temperatura: 1.05 guidance: 2.5
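
to give a feel for what these three values control, here is a toy pure-python sketch of guided top-k sampling over four made-up logits. it is not audiocraft's implementation: the function names and numbers are illustrative assumptions, showing the usual formulation where guidance pushes the prompt-conditioned logits away from the unconditioned ones, temperature rescales them, and top-k keeps only the k most likely tokens.

```python
import math
import random


def apply_guidance(cond_logits, uncond_logits, guidance):
    """Classifier-free guidance: move away from the unconditioned prediction."""
    return [u + guidance * (c - u) for c, u in zip(cond_logits, uncond_logits)]


def top_k_probabilities(logits, top_k, temperature):
    """Softmax over the top_k largest temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    keep = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    peak = max(scaled[i] for i in keep)  # subtract the max for numerical stability
    weights = {i: math.exp(scaled[i] - peak) for i in keep}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


def sample_token(logits, top_k, temperature, rng):
    """Draw one token index from the filtered distribution."""
    probs = top_k_probabilities(logits, top_k, temperature)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


rng = random.Random(0)
logits = apply_guidance([2.0, 1.0, 0.5, -1.0], [1.0, 1.0, 1.0, 1.0], guidance=2.5)
print(top_k_probabilities(logits, top_k=2, temperature=1.05))
print(sample_token(logits, top_k=2, temperature=1.05, rng=rng))
```

in this toy setup a larger `top_k` or `temperature` spreads probability over more tokens, while a larger `guidance` sharpens the preference for tokens that fit the description.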


In [6]:

#@title run the generation

#@markdown remember to change the name on each run
nombre_archivos= "text2wav" #@param{type:"string"}

descripcion = "immersed sounds of a biomechanical organism that generates robotic rhythms, astringent sounds that corrode like metal and spilling liquids, along with drones and environments of an aphextwin-type rhythmic ecosystem where chaotic moments of saturated overdrive and complex combinations are generated" #@param {type:"string"}

cantidad_pasadas=3 #@param{type:"integer"}
seed = 1357924680 #@param{type:"integer"}
# Generate files
generate_arbitrary_length(descripcion, nombre_archivos, model, cantidad_pasadas, seed)




 
text2wav_01
 
seed: 1357924680
diversidad: 512 temperatura: 1.05 guidance: 2.5
 
prompt 1: immersed sounds of a biomechanical organism that generates robotic rhythms, astringent sounds that corrode like metal and spilling liquids, along with drones and environments of an aphextwin-type rhythmic ecosystem where chaotic moments of saturated overdrive and complex combinations are generated
 


'/content/text2wav_01/text2wav_01.wav'
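
each run also writes a small text file with its parameters next to the audio, one `key: value` pair per line. a hedged sketch of reading such a file back into a dictionary follows. `parse_run_log` is a hypothetical helper, and the sample text imitates the log layout with a shortened, made-up prompt instead of a real file.

```python
def parse_run_log(text):
    """Turn 'key: value' lines into a dict, splitting on the first ': ' only."""
    params = {}
    for line in text.splitlines():
        key, separator, value = line.partition(": ")
        if separator:  # skip lines that are not key/value pairs
            params[key] = value
    return params


# Sample text in the same layout as the parameter log (prompt shortened for illustration)
sample_log = "\n".join([
    "nombre: text2wav_01",
    "prompt 1: robotic rhythms",
    "seed: 1357924680",
    "modelo: large",
    "diversidad: 512",
    "temperatura: 1.05",
    "guidance: 2.5",
])

params = parse_run_log(sample_log)
print(params["seed"], params["modelo"])
```

all values come back as strings, so convert the seed with `int(...)` before reusing it.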