
Flow Matching Text-To-Music Tutorial

This is a PyTorch tutorial on Flow Matching for Text-To-Music. The main goal of this repository is to learn flow matching at the code level through a fun task and a simple dataset.

Setup

Clone the Repository

git clone git@github.com:jakeoneijk/FlowMatchingTextToMusicTutorial.git
cd FlowMatchingTextToMusicTutorial

Create a Conda Environment (Optional)

If you don't want to use a Conda environment, you may skip this step.

conda create -n flow python=3.11
conda activate flow

Install PyTorch

👉 Check your CUDA version and install a compatible PyTorch build.

Install Requirements

pip install -r ./requirements.txt

Download Pretrained Weights

Download the pretrained weights for both the AutoEncoder and CLAP models, and save them to the following directory:

 .
 └── CKPT
     ├── autoencoder.pth
     └── music_audioset_epoch_15_esc_90.14.pt

Download the Medley-solos-DB dataset and place it in the following directory:

 .
 └── Data
     └── Dataset
         └── MedleySolosDB
             ├── ~.wav
             ├── ...
             └── ~.wav

Training

Check HParams.py for Configurations

class Mode:
    # Choose how to optimize the model: 'diffusion' or 'flow'
    config_name: str = [
        'diffusion',
        'flow'
    ][1]
    # Currently only the "train" stage is supported
    stage: str = {
        0: "preprocess",
        1: "train",
        2: "inference",
        3: "evaluate"
    }[1]

class Resource:
    # Use the GPU when one is available
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
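The `'flow'` option trains with a flow matching objective. As a refresher on what that objective computes (this is my own minimal NumPy sketch of the standard conditional flow matching targets, not code from this repository):

```python
import numpy as np

def flow_matching_targets(x1, rng):
    """Build flow matching training targets for a batch of data samples x1.

    Draws noise x0 and a per-example time t, forms the point x_t on the
    straight path from x0 to x1, and returns the constant velocity target
    x1 - x0 that the model is trained to regress at (x_t, t).
    """
    x0 = rng.standard_normal(x1.shape)       # noise endpoint of the path
    t = rng.uniform(size=(x1.shape[0], 1))   # per-example time in [0, 1]
    x_t = (1.0 - t) * x0 + t * x1            # linear interpolation
    v_target = x1 - x0                       # velocity of the straight path
    return x_t, v_target

rng = np.random.default_rng(0)
x1 = rng.standard_normal((4, 8))             # toy "data" batch
x_t, v = flow_matching_targets(x1, rng)
```

The training loss is then a mean squared error between the model's predicted velocity at `(x_t, t)` and `v_target`.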

Train the Model

If you don't pass the `-lv` (log visualizer) option, TensorBoard is used by default; the command below selects wandb instead.

python Main.py -lv wandb -do
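After training, flow matching generates samples by integrating the learned velocity field from noise (t=0) to data (t=1). A hedged, framework-agnostic sketch of the usual Euler integrator (my illustration; the repository's sampler may differ):

```python
import numpy as np

def euler_sample(velocity_fn, x0, n_steps=50):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x

# Toy check: with the exact constant velocity of a straight path from x0 to a
# fixed target x1 (v = x1 - x0), Euler integration lands exactly on x1.
x0 = np.zeros(3)
x1 = np.array([1.0, -2.0, 0.5])
out = euler_sample(lambda x, t: x1 - x0, x0)
```

In practice `velocity_fn` is the trained network, and more steps (or a higher-order ODE solver) trade compute for sample quality.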

References
