sakemin committed Nov 2, 2023
2 parents fde0947 + eba100f commit 011220e
Showing 1 changed file with 118 additions and 0 deletions.
118 changes: 118 additions & 0 deletions README.md
# Cog Implementation of MusicGen-Remixer
[![Replicate](https://replicate.com/sakemin/musicgen-chord/badge)](https://replicate.com/sakemin/musicgen-remixer)

MusicGen Remixer is an app based on [MusicGen Chord](https://github.com/sakemin/cog-musicgen-chord), a modified version of Meta's [MusicGen](https://github.com/facebookresearch/audiocraft) Melody model that can generate music from audio-based or text-based chord conditions.

You can demo this model or learn how to use it with Replicate's API [here](https://replicate.com/sakemin/musicgen-remixer).

# Run with Cog

[Cog](https://github.com/replicate/cog) is an open-source tool that packages machine learning models in a standard, production-ready container.
You can deploy your packaged model to your own infrastructure, or to [Replicate](https://replicate.com/), where users can interact with it via web interface or API.

## Prerequisites

**Cog.** Follow these [instructions](https://github.com/replicate/cog#install) to install Cog, or just run:

```
sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)"
sudo chmod +x /usr/local/bin/cog
```

Note: to use Cog, you'll also need [Docker](https://docs.docker.com/get-docker/) installed.

* **GPU machine.** You'll need a Linux machine with an NVIDIA GPU attached and the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker) installed. If you don't already have access to a machine with a GPU, check out Replicate's [guide to getting a GPU machine](https://replicate.com/docs/guides/get-a-gpu-machine).

## Step 1. Clone this repository

```sh
git clone https://github.com/sakemin/cog-musicgen-remixer
cd cog-musicgen-remixer
```

## Step 2. Run the model

To run the model, you need a local copy of the model's Docker image. You can satisfy this requirement by specifying the image ID in your call to `predict`, for example:

```
cog predict r8.im/sakemin/musicgen-remixer@sha256:d7e98a2e92eaa33c4e1d43588fb4b37a9766b3ba2df634295218d165618dc733 -i prompt="bossa nova" -i music_input=@/your/path/to/input/music.wav
```

For more information, see the Cog section [here](https://replicate.com/sakemin/musicgen-remixer/api#run).

Alternatively, you can build the image yourself, either by running `cog build` or by letting `cog predict` trigger the build process implicitly. For example, the following will trigger the build process and then execute prediction:

```
cog predict -i prompt="bossa nova" -i music_input=@/your/path/to/input/music.wav
```

Note: the first time you run `cog predict`, model weights and other required assets are downloaded if they're not available locally. This download only needs to happen once.
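
When the same prediction has to be launched from a script, the `cog predict` calls above can be assembled programmatically. The following is a minimal sketch, not part of this repository: the helper name `build_cog_predict_command` is hypothetical, and it only mirrors the `-i name=value` syntax shown in the commands above.

```python
import shlex


def build_cog_predict_command(inputs, image=None):
    """Return the argument list for a `cog predict` call.

    inputs: dict of input names to values; file inputs carry the same
            '@' prefix used on the command line (e.g. '@/path/to/music.wav').
    image:  optional image reference such as 'r8.im/owner/model@sha256:...';
            when omitted, Cog builds the image from the current directory.
    """
    argv = ["cog", "predict"]
    if image:
        argv.append(image)
    for name, value in inputs.items():
        # Each input becomes its own `-i name=value` pair.
        argv += ["-i", f"{name}={value}"]
    return argv


if __name__ == "__main__":
    argv = build_cog_predict_command(
        {"prompt": "bossa nova", "music_input": "@/your/path/to/input/music.wav"}
    )
    # shlex.join quotes arguments that contain spaces, so the result is copy-pasteable.
    print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` would execute the prediction; building a list rather than a single string avoids shell-quoting problems with prompts that contain spaces.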

# Run on Replicate

## Step 1. Ensure that all assets are available locally

If you haven't already, you should ensure that your model runs locally with `cog predict`. This will guarantee that all assets are accessible. E.g., run:

```
cog predict -i prompt="bossa nova" -i music_input=@/your/path/to/input/music.wav
```

## Step 2. Create a model on Replicate

Go to [replicate.com/create](https://replicate.com/create) to create a Replicate model. If you want to keep the model private, make sure to specify "private".

## Step 3. Configure the model's hardware

Replicate supports running models on a variety of CPU and GPU configurations. For the best performance, you'll want to run this model on an A100 instance.

Click on the "Settings" tab on your model page, scroll down to "GPU hardware", and select "A100". Then click "Save".

## Step 4. Push the model to Replicate


Log in to Replicate:

```
cog login
```

Push the contents of your current directory to Replicate, using the model name you specified in step 2:

```
cog push r8.im/username/modelname
```
[Learn more about pushing models to Replicate.](https://replicate.com/docs/guides/push-a-model)
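
Once the model is pushed, it can also be called from Python. The sketch below assumes the third-party `replicate` client (`pip install replicate`) with a `REPLICATE_API_TOKEN` set in the environment; `build_remixer_input` and `run_remix` are hypothetical helper names, and `model` and `version` stand for the `username/modelname` you pushed and the version ID shown on your model page. The accepted input names are copied from the Prediction Parameters list in this README.

```python
# Input names taken from the "Prediction Parameters" section of this README.
KNOWN_INPUTS = {
    "multi_band_diffusion", "normalization_strategy", "beat_sync_threshold",
    "chroma_coefficient", "top_k", "top_p", "temperature",
    "classifier_free_guidance", "output_format", "seed",
}


def build_remixer_input(prompt, music_input, **extra):
    """Build the input payload, rejecting parameter names the model does not list."""
    unknown = set(extra) - KNOWN_INPUTS
    if unknown:
        raise ValueError(f"unknown input(s): {sorted(unknown)}")
    return {"prompt": prompt, "music_input": music_input, **extra}


def run_remix(model, version, prompt, music_path, **extra):
    """Run one prediction on Replicate (requires network access and an API token)."""
    import replicate  # imported lazily so the payload helper works without the client

    with open(music_path, "rb") as audio:
        payload = build_remixer_input(prompt, audio, **extra)
        return replicate.run(f"{model}:{version}", input=payload)
```

A call might look like `run_remix("username/modelname", "<version-id>", "bossa nova", "/your/path/to/input/music.wav", output_format="wav")`. Validating names locally is a design choice that surfaces typos before a request is sent.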

---
# Prediction
## Prediction Parameters
- `prompt`: A description of the music you want to generate.
- `music_input`: An audio file input for the remix.
- `multi_band_diffusion`: If `True`, the EnCodec tokens will be decoded with MultiBand Diffusion.
- `normalization_strategy`: Strategy for normalizing audio.
- `beat_sync_threshold`: When beat syncing, if the gap between a generated downbeat and the corresponding input-audio downbeat is larger than `beat_sync_threshold`, the two beats are treated as not corresponding.
- `chroma_coefficient`: Coefficient by which the multi-hot chord chroma is multiplied.
- `top_k`: Reduces sampling to the k most likely tokens.
- `top_p`: Reduces sampling to tokens with cumulative probability of p. When set to `0` (default), top_k sampling is used.
- `temperature`: Controls the 'conservativeness' of the sampling process. Higher temperature means more diversity.
- `classifier_free_guidance`: Increases the influence of inputs on the output. Higher values produce lower-variance outputs that adhere more closely to inputs.
- `output_format`: Output format for the generated audio, either `wav` or `mp3`.
- `seed`: Seed for random number generator. If `None` or `-1`, a random seed will be used.
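
To make the interaction between `temperature`, `top_k`, and `top_p` concrete, here is a minimal pure-Python sketch of the sampling rule described above: nucleus (top-p) filtering applies when `top_p > 0`, otherwise top-k filtering is used. It illustrates the general technique only and is not the model's actual implementation; the function names are hypothetical, and `temperature` is assumed to be greater than zero.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_probs(probs, top_k, top_p=0.0):
    """Zero out tokens outside the candidate set, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_p > 0:
        # Nucleus: keep the most likely tokens until their mass reaches top_p.
        keep, cumulative = [], 0.0
        for i in order:
            keep.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
    elif top_k > 0:
        keep = order[:top_k]  # keep only the k most likely tokens
    else:
        keep = order  # no truncation
    kept = set(keep)
    total = sum(probs[i] for i in keep)
    return [p / total if i in kept else 0.0 for i, p in enumerate(probs)]


def sample_token(logits, temperature, top_k, top_p=0.0, rng=random):
    """Draw one token index from the filtered, renormalized distribution."""
    probs = filter_probs(softmax(logits, temperature), top_k, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With `top_k=1` the sampler always returns the most likely token, while raising `temperature` spreads probability mass across more candidates before any filtering happens.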

### Multi-Band Diffusion
- [Multi-Band Diffusion (MBD)](https://github.com/facebookresearch/audiocraft/blob/main/docs/MBD.md) can be used to decode the EnCodec tokens.
- Decoding the tokens with MBD improves the output audio quality.
- Using MBD takes more computation time, since it runs its own prediction sequence.
---
## References
- Chord recognition from audio files is performed using the [BTC](https://github.com/jayg996/BTC-ISMIR19) model by [Jonggwon Park](https://github.com/jayg996).
  - Paper: [A Bi-Directional Transformer for Musical Chord Recognition](https://arxiv.org/abs/1907.02698)
- Vocal dropping is implemented using Meta's [`demucs`](https://github.com/facebookresearch/demucs).
- Downbeat tracking and BPM retrieval are performed using [All-In-One Music Structure Analyzer](https://github.com/mir-aidj/all-in-one#all-in-one-music-structure-analyzer) by [Taejun Kim](https://github.com/mir-aidj).
- Beat-syncing is performed with [PyTSMod](https://github.com/KAIST-MACLab/PyTSMod) by [MAC Lab @KAIST](https://github.com/KAIST-MACLab).
## Licenses
- All code in this repository is licensed under the [Apache License 2.0](https://github.com/sakemin/cog-musicgen-remixer/blob/main/LICENSE).
- The weights in [this repository](https://github.com/sakemin/cog-musicgen-remixer) are released under the CC-BY-NC 4.0 license as found in the [LICENSE_weights file](https://github.com/sakemin/cog-musicgen-remixer/blob/main/LICENSE_weights).
- The code in the [Audiocraft](https://github.com/facebookresearch/audiocraft) repository is released under the MIT license (see [LICENSE file](https://github.com/facebookresearch/audiocraft/blob/main/LICENSE)).
- The weights in the [Audiocraft](https://github.com/facebookresearch/audiocraft) repository are released under the CC-BY-NC 4.0 license (see [LICENSE_weights file](https://github.com/facebookresearch/audiocraft/blob/main/LICENSE_weights)).
