Stars

audio

83 repositories

End-to-End Speech Processing Toolkit

Python 9,743 2,384 Updated Feb 25, 2026

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Python 20,301 2,153 Updated Feb 22, 2026

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 44,618 5,977 Updated Aug 16, 2024

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 55,259 6,032 Updated Feb 9, 2026

WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide seamless conversations with an AI.

Python 1,643 128 Updated Jul 31, 2024

High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean.

Python 7,219 1,017 Updated Dec 24, 2024

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Jupyter Notebook 8,462 798 Updated Mar 15, 2025

Instant voice cloning by MIT and MyShell. Audio foundation model.

Python 36,007 4,024 Updated Apr 19, 2025

Create Music in Seconds with SunoAPI.

Python 1,748 281 Updated Apr 26, 2025

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

Python 1,433 133 Updated Apr 24, 2024

SOTA Open Source TTS

Python 25,006 2,089 Updated Feb 2, 2026

[ACM MM 2024] This is the official code for "AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding"

Jupyter Notebook 1,604 142 Updated Aug 15, 2024

A generative speech model for daily dialogue.

Python 38,764 4,209 Updated Jan 18, 2026

A simple local web interface that uses ChatTTS to synthesize text into speech, with support for exposing an external API.

Python 7,520 907 Updated Dec 5, 2025

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Python 8,235 730 Updated Feb 24, 2026

[IJCV] FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley artist that adds vivid, synchronized sound effects to your silent videos 😝

Python 642 65 Updated Jul 26, 2024

Multi-lingual large voice generation model, providing full-stack inference, training and deployment capabilities.

Python 19,698 2,225 Updated Feb 11, 2026

Multilingual Voice Understanding Model

Python 7,551 704 Updated Dec 30, 2025

Python 62 5 Updated Jun 15, 2025

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 2,058 161 Updated Apr 21, 2025

Official PyTorch implementation of BigVGAN (ICLR 2023)

Python 1,187 142 Updated Sep 5, 2024

Easily train a good VC model with voice data <= 10 mins!

Python 34,549 4,907 Updated Nov 24, 2024

Landing Page for All Things Source Separation

36 1 Updated Sep 12, 2025

Muzic: Music Understanding and Generation with Artificial Intelligence

Python 4,905 494 Updated Oct 12, 2024

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

Python 1,270 110 Updated Mar 2, 2025

An Open-Sourced LLM-empowered Foundation TTS System

Python 902 83 Updated Sep 28, 2025

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 3,125 223 Updated May 19, 2025

Music repair method to convert lossy MP3 compressed music to lossless music.

Python 357 34 Updated Aug 12, 2025

High-quality Text-to-Audio Generation with Efficient Diffusion Transformer

Python 328 25 Updated Dec 17, 2025