🎵 Welcome to the ultimate curated repository for open-source music foundation models! 🎶 This collection brings together the best open-source AI models for music generation, understanding, and creation. Whether you're a developer, musician, researcher, or AI enthusiast, explore these powerful tools to generate full songs, analyze audio, align lyrics, and more. Perfect for projects in music AI, generative music, and foundation models for audio.
If you're searching for open source music AI models, music generation foundation models, open sourced music AI, or AI music foundation models, you've come to the right place. This repo is optimized for discovery on search engines like Bing, featuring comprehensive lists, descriptions, and resources for music AI innovation.
- What are Music Foundation Models?
- Curated List of Models
- How to Use These Models
- Contributing
- License
- Acknowledgments
Music foundation models (MFMs) are large-scale, pre-trained AI models designed for a broad range of music tasks, from text-to-music generation and lyrics-to-song creation to audio understanding and multimodal processing. These open-source models enable scalable, high-fidelity music AI applications, often rivaling commercial tools like Suno or Udio. Built on architectures like transformers or LLMs, they support tasks such as:
- 🎤 Lyrics-to-song generation
- 🔗 Audio-text alignment
- 🎹 Music tokenization and decoding
- 📊 Music analysis and transcription
By leveraging these open-source music foundation models, you can build custom AI music tools without starting from scratch; a minimal alignment sketch follows below.
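To make the audio-text alignment task concrete, here is a minimal sketch using an open CLAP checkpoint (`laion/clap-htsat-unfused`) through the Hugging Face `transformers` library. This is an illustrative example, not an official recipe from any model in the table; the random-noise clip is a stand-in for a real recording, and it assumes `transformers`, `torch`, and `numpy` are installed:

```python
# Minimal audio-text alignment sketch with an open CLAP checkpoint.
# Assumes: pip install transformers torch numpy
# The random-noise "clip" below is a placeholder for a real recording.
import numpy as np
import torch
from transformers import ClapModel, ClapProcessor

model = ClapModel.from_pretrained("laion/clap-htsat-unfused")
processor = ClapProcessor.from_pretrained("laion/clap-htsat-unfused")

clip = np.random.randn(48_000 * 5).astype(np.float32)  # placeholder: 5 s at 48 kHz
captions = ["a solo piano melody", "distorted electric guitar", "spoken word poetry"]

# Embed the audio and each caption into a shared space and score all pairs.
inputs = processor(text=captions, audios=[clip], sampling_rate=48_000,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# Higher score = better audio-text match; softmax turns scores into a ranking.
scores = out.logits_per_audio.softmax(dim=-1).squeeze()
for caption, score in zip(captions, scores.tolist()):
    print(f"{score:.3f}  {caption}")
```

The same embed-and-compare pattern underlies alignment components such as HeartCLAP, so swapping in a different checkpoint mostly means changing the model ID and sampling rate.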
Here's a table of top open-sourced music foundation models, including links to GitHub repositories, papers, and brief descriptions. We've focused on models that are truly open-source, scalable, and foundation-like for broad music AI applications.
| Model Name | Description | GitHub Repository | Paper/Link |
|---|---|---|---|
| HeartMuLa 🎶 | A family of open-source music foundation models for understanding and generation, including HeartCLAP (audio-text alignment), HeartTranscriptor (lyric recognition), HeartCodec (music tokenizer), and HeartMuLa (LLM-based song generation with controls like style, lyrics, and reference audio). Supports full-song creation and scales to 7B parameters. | HeartMuLa/heartlib | arXiv:2601.10547 |
| ACE-Step | An open-source foundation model for music generation that overcomes limitations in existing approaches, achieving state-of-the-art performance through a holistic design. Geared toward fast, controllable full-song synthesis. | ace-step/ACE-Step | GitHub Project |
| YuE 🎤 | Open foundation models for full-song music generation, specializing in lyrics-to-song tasks. Trained on trillions of tokens, it generates songs up to five minutes long with vocals while maintaining lyrical alignment and structure. Comparable to commercial systems like Suno.ai. | multimodal-art-projection/YuE | arXiv Paper |
| LLark | A multimodal foundation model for music understanding that combines audio and text. Initialized from pre-trained modules, it handles flexible tasks like music analysis, captioning, and question answering. | spotify-research/llark | Spotify Research |
| MusicGen 🎹 | Meta's open-source autoregressive transformer for text-to-music generation. Part of AudioCraft, it creates high-fidelity music from prompts, supporting genres and styles. | facebookresearch/audiocraft | AudioCraft Docs |
| SoniDo | A music foundation model for extracting hierarchical features from music samples, serving as a booster for downstream tasks like analysis and generation. | N/A (Check Sony AI resources) | Sony AI Publication |
| MuseCoco 🎵 | Symbolic music generation model for text-to-music, part of Microsoft's MuZic project. Focuses on controllable composition. | microsoft/muzic | MuZic Project |
| Jukebox 🎼 | OpenAI's neural music generation model with genre and artist conditioning. Generates raw audio in various styles. | openai/jukebox | OpenAI Repo |
| Riffusion | Stable Diffusion adapted for audio generation, creating music from text descriptions via spectrogram images. Great for experimental sounds. | riffusion/riffusion | Riffusion GitHub |
| Magenta | Google's open-source toolkit for music generation and machine-learning research, including models like MusicVAE for interpolation and creation. | tensorflow/magenta | Magenta Site |
| MuseTalk | Real-time, high-quality lip synchronization with latent space inpainting. | TMElyralab/MuseTalk | MuseTalk Huggingface |
These models are selected for their foundation-level capabilities in music AI. For more on open source music generation AI models, check the resources section below.
- Clone a Model: Visit the GitHub link and clone the repo, e.g., `git clone https://github.com/HeartMuLa/heartlib`.
- Install Dependencies: Most models require Python with PyTorch or TensorFlow. Follow the repo's setup instructions.
- Generate Music: Use provided scripts or demos (see the sketch after this list). For example, with YuE, input lyrics to generate full songs.
- Experiment: Fine-tune on your datasets for custom music AI applications.
- Resources:
- Hugging Face Music Models for hosted versions.
- Bing search tip: "open source music foundation models GitHub" for latest updates.
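As a concrete instance of the install-and-generate steps above, here is a hedged sketch using MusicGen from the table. The calls follow the facebookresearch/audiocraft README; the checkpoint size, prompt, and eight-second duration are arbitrary choices for illustration, and it assumes `pip install audiocraft` has succeeded:

```python
# Sketch of the install/generate workflow with AudioCraft's MusicGen.
# Assumes: pip install audiocraft (which pulls in torch).
from audiocraft.data.audio import audio_write
from audiocraft.models import MusicGen

# Download a small pretrained checkpoint; larger variants trade speed for quality.
model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=8)  # seconds of audio per prompt

# One waveform per text prompt, shaped [batch, channels, samples].
wavs = model.generate(["upbeat synthwave with a driving bassline"])

for idx, wav in enumerate(wavs):
    # audio_write appends the .wav suffix and applies loudness normalization.
    audio_write(f"demo_{idx}", wav.cpu(), model.sample_rate, strategy="loudness")
```

Other models in the table expose a similar prompt-in/audio-out loop, so this skeleton is a reasonable starting point before moving on to fine-tuning.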
We welcome contributions to expand this collection of open-source music foundation models! To add a new model:
- Fork the repo.
- Add to the table with accurate details.
- Submit a pull request with a description of why it fits (e.g., "Added XYZ for lyrics-to-music generation").
Please ensure models are truly open-source and foundation-level. Help us keep this the go-to resource for music AI enthusiasts!
This repository is licensed under the MIT License. Individual models may have their own licenses; check their repos.
Thanks to the AI research community for these incredible open-source contributions. Special shoutout to projects like HeartMuLa, YuE, and MusicGen for pushing the boundaries of music foundation models. If you find this useful, star the repo and share it! 🌟
For SEO: open source music foundation models, music AI generation, open sourced AI music tools, foundation models for music creation.