
MagicMusicMachine

Overview

MagicMusicMachine is a demonstration of multimodal music generation technology. This application combines several cutting-edge models to transform text or images into music. It utilizes:

  • Text-to-Music Generation: Powered by Meta's AudioCraft, whose transformer-based MusicGen model generates music from text prompts.
  • Music Continuation: Extends an input recording or motif, also powered by Meta's MusicGen.
  • Image-to-Text Conversion: Converts images into descriptive text suitable for music generation. Powered by OpenAI's GPT-4 Vision API.
  • Music Transcription: Transcribes generated audio clips into MIDI files
    for musicians. Powered by Spotify's Basic Pitch.

Additionally, this app is hosted on Hugging Face Spaces, allowing users to easily interact with the model online.

Getting Started Locally

Prerequisites

Ensure you have git, python3, and pip installed on your computer.

Installation

  1. Clone the repository:

    git clone git@github.com:suyuchenxm/MagicMusicMachine.git
  2. Navigate into the project directory:

    cd MagicMusicMachine
  3. Create and activate a virtual environment:

    python3 -m venv venv
    source venv/bin/activate
  4. Install the dependencies:

    pip install -r requirements.txt

Running the app

  1. To start the application, run:

    python app.py
  2. For development with hot reloading on code changes, use:

    gradio app.py

The app will be accessible at http://localhost:7860.


Usage

Google Colab

For an interactive demo, check out the Colab notebook.

Hugging Face Spaces

The application is available on Hugging Face Spaces. The free CPU instance can run smaller models; for larger models, cloning the Space and using an A100 GPU instance is recommended.

Examples

  1. Text-to-Music

Select a model for text-to-music generation. After entering your prompts, you can choose the music style or adjust the generation settings, such as temperature and sampling methods.
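Under the hood, these settings map onto MusicGen's `set_generation_params`. Below is a minimal sketch of the text-to-music step, assuming the `audiocraft` package and the `facebook/musicgen-small` checkpoint; the helper and function names are illustrative, not the app's actual code:

```python
def build_generation_params(duration=10.0, temperature=1.0, top_k=250, top_p=0.0):
    """Validate the UI settings before handing them to MusicGen."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return {"duration": duration, "temperature": temperature,
            "top_k": top_k, "top_p": top_p}

def text_to_music(prompt, **settings):
    # Heavy import kept local so the helper above works without GPU dependencies.
    from audiocraft.models import MusicGen
    model = MusicGen.get_pretrained("facebook/musicgen-small")
    model.set_generation_params(**build_generation_params(**settings))
    # Returns a [batch, channels, samples] tensor at model.sample_rate.
    return model.generate([prompt])
```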


  2. Music Continuation

As a piano performer, I don't have the training to compose or improvise a fully structured piece of music. That's why music continuation is a dream for musicians like me: it seamlessly completes the audio or motif you provide.
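Conceptually, the continuation step trims the recording to a short prompt window and asks MusicGen to extend it. A rough sketch, assuming the `audiocraft` and `torchaudio` packages (the checkpoint and function names are illustrative):

```python
def prompt_slice(total_samples, sample_rate, seconds=5.0):
    """Index of the first sample to keep so the prompt is at most `seconds` long."""
    keep = int(sample_rate * seconds)
    return max(0, total_samples - keep)

def continue_melody(audio_path, description=None, prompt_seconds=5.0):
    # Heavy imports kept local; requires audiocraft + torchaudio installed.
    import torchaudio
    from audiocraft.models import MusicGen
    waveform, sr = torchaudio.load(audio_path)          # [channels, samples]
    start = prompt_slice(waveform.shape[-1], sr, prompt_seconds)
    prompt = waveform[None, :, start:]                  # [batch, channels, samples]
    model = MusicGen.get_pretrained("facebook/musicgen-melody")
    # The output contains the prompt followed by the generated continuation.
    return model.generate_continuation(
        prompt, sr, descriptions=[description] if description else None)
```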


My improvisation recording:

melody_continuation_input.mp4

Output result:

melody_continuation_ouput_sample.mp4

  3. Image-to-Music

Magic Music Machine lets you generate music from an image input. It uses GPT-4's vision capabilities to describe the image and create a text prompt for MusicGen. You can also provide a melody as a conditioning input, allowing the generated music to be based on your melody.
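The image-to-text step boils down to a single vision-enabled chat request. Here is a sketch of building that request's message payload, assuming the OpenAI chat-completions message format; the model name and helper are illustrative assumptions, not the app's actual code:

```python
import base64

def image_prompt_payload(image_bytes, mime="image/png",
                         instruction="Describe this image as a music prompt."):
    """Build the chat message list for a vision request: one user turn
    containing the instruction text plus the image as a base64 data URL."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": instruction},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }]

# The messages would then be sent with something like:
#   client.chat.completions.create(model="gpt-4o", messages=image_prompt_payload(data))
# and the returned description fed to MusicGen as the text prompt.
```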

In addition to music generation, AudioCraft offers models to generate sound effects.


  4. Audio Transcription

Magic Music Machine is designed to enable musicians, music learners, and performers to play music that is generated or co-created by AI. To make this possible, I used Basic Pitch from Spotify to transcribe the audio into MIDI, which can then be rendered as a score using software like MuseScore.
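The transcription step can be sketched with Basic Pitch's `predict_and_save` helper, assuming the `basic-pitch` package and its default `<stem>_basic_pitch.mid` output naming; the function names here are illustrative, not the app's actual code:

```python
from pathlib import Path

def midi_output_path(audio_path, output_dir):
    """Basic Pitch writes `<stem>_basic_pitch.mid` into the output directory."""
    stem = Path(audio_path).stem
    return str(Path(output_dir) / f"{stem}_basic_pitch.mid")

def transcribe_to_midi(audio_path, output_dir="transcriptions"):
    # Heavy import kept local; requires `pip install basic-pitch`.
    from basic_pitch.inference import predict_and_save
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    predict_and_save(
        [audio_path], output_dir,
        save_midi=True, sonify_midi=False,
        save_model_outputs=False, save_notes=False,
    )
    # The resulting MIDI can be opened in MuseScore to render a score.
    return midi_output_path(audio_path, output_dir)
```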


Generated Music:

image-to-music-output-sample.mp4

Music Piano Score:

[Score screenshot: transcription rendered in MuseScore]
