ai
Foundational Models for State-of-the-Art Speech and Text Translation
Deezer source separation library including pretrained models.
Wunjo CE: Face Swap, Lip Sync, Control Remove Objects & Text & Background, Restyling, Audio Separator, Clone Voice, Video Generation. Open Source, Local & Free.
GeneFace: Generalized and High-Fidelity 3D Talking Face Synthesis; ICLR 2023; Official code
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
🚀 Clone a voice in 5 seconds to generate arbitrary speech in real-time
Chat with any character you like: ChatGLM2+SadTalker+Voice Cloning | Immersive conversations with your favorite characters: ChatGLM2 + voice cloning + video chat
Easily train a good VC model with voice data <= 10 mins!
GUI for a Vocal Remover that uses Deep Neural Networks.
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
SoftVC VITS Singing Voice Conversion
This repo is a pipeline of VITS finetuning for fast speaker adaptation TTS, and many-to-many voice conversion
High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
Faster Whisper transcription with CTranslate2
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
vits2 backbone with multilingual-bert
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…
FastGPT is a knowledge-based platform built on LLMs that offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, le…
LLM API management & distribution system supporting OpenAI, Azure, Anthropic Claude, Google Gemini, DeepSeek, ByteDance Doubao, ChatGLM, ERNIE Bot, iFlytek Spark, Tongyi Qianwen, 360 Zhinao, Tencent Hunyuan, and other mainstream models, with unified API adaptation; usable for key management and redistribution. Single executable, Docker image provided, one-click deployment, works out of the box. LLM API management & k…
Cross-browser audio/video/screen recording. It supports Chrome, Firefox, Opera and Microsoft Edge. It even works on Android browsers. It follows the latest MediaRecorder API standards and provides simi…
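Easy to use stem (e.g. instrumental/vocals) separation from CLI or as a Python package, using a variety of amazing pre-trained models (primarily from UVR)
A lightweight open-source Python toolkit for calling the APIs of various Chinese and international large language models and agent orchestration tools in a unified, OpenAI-compatible way.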
Vocal Remover using Deep Neural Networks
A Deep-Learning-Based Chinese Speech Recognition System
A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone


