Merge branch 'master' of https://github.com/sakemin/cog-musicgen-remixer

sakemin · Nov 2, 2023 · 011220e
2 parents fde0947 + eba100f
commit 011220e
Showing 1 changed file with 118 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -0,0 +1,118 @@
+# Cog Implementation of MusicGen-Remixer
+[![Replicate](https://replicate.com/sakemin/musicgen-remixer/badge)](https://replicate.com/sakemin/musicgen-remixer)
+
+MusicGen Remixer is an app based on [MusicGen Chord](https://github.com/sakemin/cog-musicgen-chord), a modified version of Meta's [MusicGen](https://github.com/facebookresearch/audiocraft) Melody model that can generate music from audio-based or text-based chord conditions.
+
+You can demo this model or learn how to use it with Replicate's API [here](https://replicate.com/sakemin/musicgen-remixer). 
+
+# Run with Cog
+
+[Cog](https://github.com/replicate/cog) is an open-source tool that packages machine learning models in a standard, production-ready container. 
+You can deploy your packaged model to your own infrastructure, or to [Replicate](https://replicate.com/), where users can interact with it via web interface or API.
+
+## Prerequisites 
+
+**Cog.** Follow these [instructions](https://github.com/replicate/cog#install) to install Cog, or just run: 
+
+```
+sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)"
+sudo chmod +x /usr/local/bin/cog
+```
+
+Note that to use Cog, you'll also need an installation of [Docker](https://docs.docker.com/get-docker/).
+
+**GPU machine.** You'll need a Linux machine with an NVIDIA GPU attached and the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker) installed. If you don't already have access to a machine with a GPU, check out Replicate's [guide to getting a 
+GPU machine](https://replicate.com/docs/guides/get-a-gpu-machine).
+
+## Step 1. Clone this repository
+
+```sh
+git clone https://github.com/sakemin/cog-musicgen-remixer
+```
+
+## Step 2. Run the model
+
+To run the model, you need a local copy of the model's Docker image. You can satisfy this requirement by specifying the image ID in your call to `predict` like:
+
+```
+cog predict r8.im/sakemin/musicgen-remixer@sha256:d7e98a2e92eaa33c4e1d43588fb4b37a9766b3ba2df634295218d165618dc733 -i prompt="bossa nova" -i music_input=@/your/path/to/input/music.wav
+```
+
+For more information, see the Cog section [here](https://replicate.com/sakemin/musicgen-remixer/api#run).
+
+Alternatively, you can build the image yourself, either by running `cog build` or by letting `cog predict` trigger the build process implicitly. For example, the following will trigger the build process and then execute prediction: 
+
+```
+cog predict -i prompt="bossa nova" -i music_input=@/your/path/to/input/music.wav
+```
+
+Note that the first time you run `cog predict`, model weights and other required assets are downloaded if they are not available locally. This download only needs to happen once.
+
+# Run on Replicate
+
+## Step 1. Ensure that all assets are available locally
+
+If you haven't already, you should ensure that your model runs locally with `cog predict`. This will guarantee that all assets are accessible. E.g., run: 
+
+```
+cog predict -i prompt="bossa nova" -i music_input=@/your/path/to/input/music.wav
+```
+
+## Step 2. Create a model on Replicate
+
+Go to [replicate.com/create](https://replicate.com/create) to create a Replicate model. If you want to keep the model private, make sure to specify "private".
+
+## Step 3. Configure the model's hardware
+
+Replicate supports running models on a variety of CPU and GPU configurations. For the best performance, you'll want to run this model on an A100 instance.
+
+Click on the "Settings" tab on your model page, scroll down to "GPU hardware", and select "A100". Then click "Save".
+
+## Step 4. Push the model to Replicate
+
+
+Log in to Replicate:
+
+```
+cog login
+```
+
+Push the contents of your current directory to Replicate, using the model name you specified in step 2:
+
+```
+cog push r8.im/username/modelname
+```
+[Learn more about pushing models to Replicate.](https://replicate.com/docs/guides/push-a-model)
+
+---
+# Prediction
+## Prediction Parameters
+- `prompt`: A description of the music you want to generate.
+- `music_input`: An audio file input for the remix.
+- `multi_band_diffusion`: If `True`, the EnCodec tokens will be decoded with MultiBand Diffusion.
+- `normalization_strategy`: Strategy for normalizing audio.
+- `beat_sync_threshold`: When beat syncing, if the gap between a generated downbeat and the corresponding input audio downbeat is larger than `beat_sync_threshold`, the two beats are treated as not corresponding.
+- `chroma_coefficient`: Coefficient by which the multi-hot chord chroma is multiplied.
+- `top_k`: Reduces sampling to the k most likely tokens.
+- `top_p`: Reduces sampling to tokens with cumulative probability of p. When set to `0` (default), `top_k` sampling is used.
+- `temperature`: Controls the 'conservativeness' of the sampling process. Higher temperature means more diversity.
+- `classifier_free_guidance`: Increases the influence of inputs on the output. Higher values produce lower-variance outputs that adhere more closely to inputs.
+- `output_format`: Output format for generated audio, either `wav` or `mp3`.
+- `seed`: Seed for random number generator. If `None` or `-1`, a random seed will be used.
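+
+As a rough sketch, optional parameters can be passed to `cog predict` with additional `-i` flags, in the same way as `prompt` and `music_input`. The values below are illustrative assumptions, not recommended settings:
+
+```
+# Illustrative values only: adjust the input path and parameters to your needs
+cog predict \
+  -i prompt="bossa nova" \
+  -i music_input=@/your/path/to/input/music.wav \
+  -i multi_band_diffusion=true \
+  -i output_format="wav" \
+  -i seed=42
+```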
+
+### Multi-Band Diffusion
+- [Multi-Band Diffusion(MBD)](https://github.com/facebookresearch/audiocraft/blob/main/docs/MBD.md) is used for decoding the EnCodec tokens.
+- If the tokens are decoded with MBD, then the output audio quality is better.
+- Using MBD takes more computation time, since it has its own prediction sequence.
+---
+## References
+- Chord recognition from an audio file is performed using the [BTC](https://github.com/jayg996/BTC-ISMIR19) model by [Jonggwon Park](https://github.com/jayg996).
+  - Paper: [A Bi-Directional Transformer for Musical Chord Recognition](https://arxiv.org/abs/1907.02698)
+- Vocal dropping is implemented using Meta's [`demucs`](https://github.com/facebookresearch/demucs).
+- Downbeat tracking and BPM retrieval are performed using [All-In-One Music Structure Analyzer](https://github.com/mir-aidj/all-in-one#all-in-one-music-structure-analyzer) by [Taejun Kim](https://github.com/mir-aidj).
+- Beat-syncing is performed with [PyTSMod](https://github.com/KAIST-MACLab/PyTSMod) by [MAC Lab @KAIST](https://github.com/KAIST-MACLab).
+## Licenses
+- All code in this repository is licensed under the [Apache License 2.0](https://github.com/sakemin/cog-musicgen-remixer/blob/main/LICENSE).
+- The weights in [this repository](https://github.com/sakemin/cog-musicgen-remixer) are released under the CC-BY-NC 4.0 license as found in the [LICENSE_weights file](https://github.com/sakemin/cog-musicgen-remixer/blob/main/LICENSE_weights).
+- The code in the [Audiocraft](https://github.com/facebookresearch/audiocraft) repository is released under the MIT license (see [LICENSE file](https://github.com/facebookresearch/audiocraft/blob/main/LICENSE)).
+- The weights in the [Audiocraft](https://github.com/facebookresearch/audiocraft) repository are released under the CC-BY-NC 4.0 license (see [LICENSE_weights file](https://github.com/facebookresearch/audiocraft/blob/main/LICENSE_weights)).