Here we keep track of the latest AI game development tools, including LLM, Agent, Code, Writer, Image, Texture, Shader, 3D Model, Animation, Video, Audio, Music, Singing Voice, and Analytics. 🔥
- Tool (AI LLM)
- Game (Agent)
- Code
- Writer
- Image
- Texture
- Shader
- 3D Model
- Avatar
- Animation
- Visual
- Video
- Audio
- Music
- Singing Voice
- Speech
- Analytics
| Source | Description | Paper | Game Engine | Type |
|---|---|---|---|---|
| AgentGPT | 🤖 Assemble, configure, and deploy autonomous AI Agents in your browser. | | | Tool |
| AICommand | ChatGPT integration with Unity Editor. | | Unity | Tool |
| AIOS | LLM Agent Operating System. | | | Tool |
| AI Scientist | The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery. | arXiv | | Tool |
| Assistant CLI | A comfortable CLI tool for using the ChatGPT service 🔥 | | | Tool |
| Auto-GPT | An experimental open-source attempt to make GPT-4 fully autonomous. | | | Tool |
| BabyAGI | This Python script is an example of an AI-powered task management system. | | | Tool |
| 👶🤖🖥️ BabyAGI UI | BabyAGI UI is designed to make it easier to run and develop with babyagi in a web app, like a ChatGPT. | | | Tool |
| baichuan-7B | A large-scale 7B pretraining language model developed by Baichuan. | | | Tool |
| Baichuan-13B | A 13B large language model developed by Baichuan Intelligent Technology. | | | Tool |
| Baichuan 2 | A series of large language models developed by Baichuan Intelligent Technology. | | | Tool |
| Bisheng | Bisheng is an open LLM devops platform for next generation AI applications. | | | Tool |
| Character-LLM | A Trainable Agent for Role-Playing. | arXiv | | Tool |
| ChatDev | Communicative Agents for Software Development. | arXiv | | Tool |
| ChatGPT-API-unity | Binds ChatGPT chat completion API to pure C# on Unity. | | Unity | Tool |
| ChatGPTForUnity | ChatGPT for Unity. | | Unity | Tool |
| ChatRWKV | ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source. | | | Tool |
| ChatYuan | Large Language Model for Dialogue in Chinese and English. | | | Tool |
| Chinese-LLaMA-Alpaca-3 | Chinese Llama-3 LLMs developed from Meta Llama 3. | | | Tool |
| Chrome-GPT | An AutoGPT agent that controls Chrome on your desktop. | | | Tool |
| CogVLM | CogVLM, a powerful open-source visual language foundation model. | arXiv | | Tool |
| CoreNet | A library for training deep neural networks. | | | Tool |
| Cosmos | Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. | | | LLM |
| DBRX | DBRX is a large language model trained by Databricks. | | | Tool |
| DCLM | DataComp for Language Models. | arXiv | | Tool |
| DeepSeek-V3 | DeepSeek-V3 is a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. | arXiv | | LLM |
| DemoGPT | Auto Gen-AI App Generator with the Power of Llama 2 | | | Tool |
| Design2Code | Automating Front-End Engineering | | | Tool |
| Devika | Devika is an Agentic AI Software Engineer. | | | Tool |
| Devon | An open-source pair programmer. | | | Tool |
| Dora | Generating powerful websites, one prompt at a time. | | | Tool |
| Flowise | Drag & drop UI to build your customized LLM flow using LangchainJS. | | | Tool |
| Gemini | Gemini is built from the ground up for multimodality, reasoning seamlessly across text, images, video, audio, and code. | | | Tool |
| Gemma | Gemma is a family of lightweight, state-of-the-art open models built from research and technology used to create Google Gemini models. | | | Tool |
| gemma.cpp | Lightweight, standalone C++ inference engine for Google's Gemma models. | | | Tool |
| GLM-4 | GLM-4-9B is the open-source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. | | | Tool |
| GPT4All | A chatbot trained on a massive collection of clean assistant data including code, stories and dialogue. | | | Tool |
| GPT-4o | GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction: it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. | | | Tool |
| GPTScript | Develop LLM Apps in Natural Language. | | | Tool |
| Grok-1 | The weights and architecture of our 314 billion parameter Mixture-of-Experts model, Grok-1. | | | Tool |
| HuggingChat | Making the community's best AI chat models available to everyone. | | | Tool |
| Hugging Face API Unity Integration | This Unity package provides an easy-to-use integration for the Hugging Face Inference API, allowing developers to access and use Hugging Face AI models within their Unity projects. | | Unity | Tool |
| ImageBind | ImageBind: One Embedding Space to Bind Them All. | arXiv | | Tool |
| Index-1.9B | A SOTA lightweight multilingual LLM. | | | Tool |
| InteractML-Unity | InteractML, an Interactive Machine Learning Visual Scripting framework for Unity3D. | | Unity | Tool |
| InteractML-Unreal Engine | Bringing Machine Learning to Unreal Engine. | | Unreal Engine | Tool |
| InternLM | InternLM has open-sourced a 7 billion parameter base model, a chat model tailored for practical scenarios and the training system. | arXiv | | Tool |
| InternLM-XComposer | InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension. | arXiv | | Tool |
| Jan | Bring AI to your Desktop. | | | Tool |
| Lamini | Lamini allows any engineering team to outperform general purpose LLMs through RLHF and fine-tuning on their own data. | | | Tool |
| LaMini-LM | LaMini-LM is a collection of small-sized, efficient language models distilled from ChatGPT and trained on a large-scale dataset of 2.58M instructions. | | | Tool |
| LangChain | LangChain is a framework for developing applications powered by language models. | | | Tool |
| LangFlow | ⛓️ LangFlow is a UI for LangChain, designed with react-flow to provide an effortless way to experiment and prototype flows. | | | Tool |
| LaVague | Automate automation with Large Action Model framework. | | | Tool |
| Lemur | Open Foundation Models for Language Agents. | | | Tool |
| Lepton AI | A Pythonic framework to simplify AI service building. | | | Tool |
| Lit-LLaMA | Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. | | | Tool |
| llama2-webui | Run Llama 2 locally with gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). | | | Tool |
| Llama 3 | The official Meta Llama 3 GitHub site. | | | Tool |
| Llama 3.1 | Llama is an accessible, open large language model (LLM) designed for developers, researchers, and businesses to build, experiment, and responsibly scale their generative AI ideas. | | | Tool |
| LLaSM | Large Language and Speech Model. | | | Tool |
| LLM Answer Engine | Build a Perplexity-Inspired Answer Engine Using Next.js, Groq, Mixtral, Langchain, OpenAI, Brave & Serper. | | | Tool |
| llm.c | LLM training in simple, raw C/CUDA. | | | Tool |
| LLMUnity | Create characters in Unity with LLMs! | | Unity | Tool |
| LLocalSearch | LLocalSearch is a completely locally running search engine using LLM Agents. | | | Tool |
| LogicGamesSolver | A Python tool to solve logic games with AI, Deep Learning and Computer Vision. | | | Tool |
| LongWriter | LongWriter: Unleashing 10,000+ Word Generation From Long Context LLMs. | arXiv | | Tool |
| Large World Model (LWM) | Large World Model (LWM) is a general-purpose large-context multimodal autoregressive model. | arXiv | | Tool |
| Lumina-T2X | Lumina-T2X is a unified framework for Text to Any Modality Generation. | arXiv | | Tool |
| MetaGPT | The Multi-Agent Framework | | | Tool |
| MiniCPM-2B | An end-side LLM that outperforms Llama2-13B. | | | Tool |
| MiniGPT-4 | Enhancing Vision-language Understanding with Advanced Large Language Models. | arXiv | | Tool |
| MiniGPT-5 | Interleaved Vision-and-Language Generation via Generative Vokens. | arXiv | | Tool |
| Mixtral 8x7B | A high quality Sparse Mixture-of-Experts. | arXiv | | Tool |
| Mistral 7B | The best 7B model to date, Apache 2.0. | | | Tool |
| Mistral Large | Mistral Large is a new cutting-edge text generation model. It reaches top-tier reasoning capabilities. | | | Tool |
| MLC LLM | Enable everyone to develop, optimize and deploy AI models natively on everyone's devices. | | | Tool |
| MobiLlama | Towards Accurate and Lightweight Fully Transparent GPT. | arXiv | | Tool |
| MoE-LLaVA | Mixture of Experts for Large Vision-Language Models. | arXiv | | Tool |
| Moshi | Moshi is an experimental conversational AI. | | | Tool |
| Moshi | Moshi: a speech-text foundation model for real time dialogue. | | | Tool |
| MOSS | An open-source tool-augmented conversational language model from Fudan University. | | | Tool |
| mPLUG-Owl🦉 | Modularization Empowers Large Language Models with Multimodality. | arXiv | | Tool |
| Nemotron-4 | A 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. | arXiv | | Tool |
| NExT-GPT | Any-to-Any Multimodal Large Language Model. | | | Tool |
| OLMo | Open Language Model | arXiv | | Tool |
| OmniLMM | Large multi-modal models for strong performance and efficient deployment. | | | Tool |
| OneLLM | One Framework to Align All Modalities with Language. | arXiv | | Tool |
| Open-Assistant | OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so. | | | Tool |
| OpenDevin | An autonomous AI software engineer. | | | Tool |
| Orion-14B | Orion-14B is a family of models that includes a 14B foundation LLM and a series of models. | arXiv | | Tool |
| Panda | An overseas Chinese open-source large language model, continually pre-trained on Chinese-domain data from Llama-7B, -13B, -33B, and -65B. | | | Tool |
| Perplexica | An AI-powered search engine. | | | Tool |
| Pi | AI chatbot designed for personal assistance and emotional support. | | | Tool |
| Qwen1.5 | Qwen1.5 is the improved version of Qwen. | | | Tool |
| Qwen2 | Qwen2 is the large language model series developed by Qwen team, Alibaba Cloud. | | | Tool |
| Qwen-7B | The official repo of Qwen-7B (通义千问-7B) chat & pretrained large language model proposed by Alibaba Cloud. | | | Tool |
| RepoAgent | RepoAgent is an Open-Source project driven by Large Language Models (LLMs) that aims to provide an intelligent way to document projects. | arXiv | | Tool |
| Sanity AI Engine | Sanity AI Engine for the Unity Game Development Tool. | | Unity | Tool |
| SearchGPT | 🌳 Connecting ChatGPT with the Internet | | | Tool |
| ShareGPT4V | Improving Large Multi-Modal Models with Better Captions. | | | Tool |
| Skywork | Skywork series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. | | | Tool |
| StableLM | Stability AI Language Models. | arXiv | | Tool |
| Stanford Alpaca | An Instruction-following LLaMA Model. | | | Tool |
| Text generation web UI | A gradio web UI for running Large Language Models like LLaMA, llama.cpp, GPT-J, OPT, and GALACTICA. | | | Tool |
| TinyChatEngine | On-Device LLM Inference Library. | | | Tool |
| ToolBench | An open platform for training, serving, and evaluating large language models for tool learning. | | | Tool |
| Unity ChatGPT | Unity ChatGPT Experiments. | | Unity | Tool |
| Unity OpenAI-API Integration | Integrate OpenAI GPT-3 language model and ChatGPT API into a Unity project. | | Unity | Tool |
| Unreal Engine 5 Llama LoRA | A proof-of-concept project that showcases the potential for using small, locally trainable LLMs to create next-generation documentation tools. | | Unreal Engine | Tool |
| UnrealGPT | A collection of Unreal Engine 5 Editor Utility widgets powered by GPT3/4. | | Unreal Engine | Tool |
| Video-LLaVA | Learning United Visual Representation by Alignment Before Projection. | arXiv | | Tool |
| WebGPT | Run GPT model on the browser with WebGPU. | | | Tool |
| Web3-GPT | Deploy smart contracts with AI | | | Tool |
| WordGPT | 🤖 Bring the power of ChatGPT to Microsoft Word | | | Tool |
| XAgent | An Autonomous LLM Agent for Complex Task Solving. | | | Tool |
| Yi | A series of large language models trained from scratch by developers. | | | Tool |
| 01 Project | The open-source language model computer. | | | Tool |
| Source | Description | Paper | Game Engine | Type |
|---|---|---|---|---|
| AgentBench | A Comprehensive Benchmark to Evaluate LLMs as Agents. | arXiv | | Agent |
| Agent Group Chat | An Interactive Group Chat Simulacra For Better Eliciting Collective Emergent Behavior. | arXiv | | Agent |
| Agent K | An autoagentic AGI that is self-evolving and modular. | | | Agent |
| Agent Laboratory | Agent Laboratory: Using LLM Agents as Research Assistants. | arXiv | | Agent |
| AgentScope | Start building LLM-empowered multi-agent applications in an easier way. | arXiv | | Agent |
| AgentSims | An Open-Source Sandbox for Large Language Model Evaluation. | | | Agent |
| AI Town | AI Town is a virtual town where AI characters live, chat and socialize. | | | Agent |
| anime.gf | Local & Open Source Alternative to CharacterAI. | | | Game |
| Astrocade | Create games with AI | | | Game |
| Atomic Agents | The Atomic Agents framework is designed to be modular, extensible, and easy to use. | | | Agent |
| AutoAgents | A Framework for Automatic Agent Generation. | | | Agent |
| AutoGen | Enable Next-Gen Large Language Model Applications. | arXiv | | Agent |
| behaviac | Behaviac is a framework for game AI development. | | | Framework |
| Biomes | Biomes is an open source sandbox MMORPG built for the web using web technologies such as Next.js, Typescript, React and WebAssembly. | | | Game |
| Buffer of Thoughts | Thought-Augmented Reasoning with Large Language Models. | arXiv | | Agent |
| Byzer-Agent | Easy, fast, and distributed agent framework for everyone. | | | Agent |
| Cat Town | A C(h)atGPT-powered simulation with cats. | | | Agent |
| CharacterGLM | Customizing Chinese Conversational AI Characters with Large Language Models. | arXiv | | Agent |
| ChatDev | Communicative Agents for Software Development. | arXiv | | Agent |
| CogAgent | CogAgent is an open-source visual language model improved based on CogVLM. | arXiv | | Agent |
| Cradle | Towards General Computer Control. | | | Agent |
| crewAI | Framework for orchestrating role-playing, autonomous AI agents. | | | Agent |
| Dify | Dify is an open-source LLM app building platform. | | | Agent |
| Digital Life Project | Autonomous 3D Characters with Social Intelligence. | arXiv | | Agent |
| everything-ai | Your fully proficient, AI-powered and local chatbot assistant🤖. | | | Agent |
| fabric | fabric is an open-source framework for augmenting humans using AI. | | | Agent |
| FastGPT | FastGPT is a knowledge-based platform built on LLMs. | | | Agent |
| fastRAG | Efficient Retrieval Augmentation and Generation Framework. | | | Agent |
| GameAISDK | Image-based game AI automation framework. | | | Framework |
| GameNGen | Diffusion Models Are Real-Time Game Engines. | arXiv | | Game |
| GameGen-O | GameGen-O: Open-world Video Game Generation. | | | Game |
| GenAgent | GenAgent: Build Collaborative AI Systems with Automated Workflow Generation - Case Studies on ComfyUI. | arXiv | | Agent |
| Generative Agents | Interactive Simulacra of Human Behavior. | arXiv | | Agent |
| Genesis | Genesis: A Generative and Universal Physics Engine for Robotics and Beyond. | | | Game |
| Genie | Generative Interactive Environments. | | | Game |
| gigax | Runtime, LLM-powered NPCs. | | | Game |
| HippoRAG | Neurobiologically Inspired Long-Term Memory for Large Language Models. | arXiv | | Agent |
| Interactive LLM Powered NPCs | Interactive LLM Powered NPCs is an open-source project that completely transforms your interaction with non-player characters (NPCs) in any game! | | | Game |
| IoA | An open-source framework for collaborative AI agents, enabling diverse, distributed agents to team up and tackle complex tasks through internet-like connectivity. | | | Agent |
| KwaiAgents | A generalized information-seeking agent system with Large Language Models (LLMs). | arXiv | | Agent |
| LangChain | Get your LLM application from prototype to production. | | | Agent |
| Langflow | Langflow is a UI for LangChain, designed with react-flow to provide an effortless way to experiment and prototype flows. | | | Agent |
| LangGraph Studio | LangGraph Studio offers a new way to develop LLM applications by providing a specialized agent IDE that enables visualization, interaction, and debugging of complex agentic applications. | | | Agent |
| LARP | Language-Agent Role Play for open-world games. | arXiv | | Agent |
| LLama Agentic System | Agentic components of the Llama Stack APIs. | | | Agent |
| LlamaIndex | LlamaIndex is a data framework for your LLM application. | | | Agent |
| MindSearch | 🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT). | | | Agent |
| Mixture of Agents (MoA) | Mixture-of-Agents Enhances Large Language Model Capabilities. | arXiv | | Agent |
| MMRole | MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents. | arXiv | | Agent |
| Moonlander.ai | Start building 3D games without any coding using generative AI. | | | Framework |
| MuG Diffusion | MuG Diffusion is a charting AI for rhythm games based on Stable Diffusion (one of the most powerful AIGC models) with a large modification to incorporate audio waves. | | | Game |
| Oasis | Oasis is an interactive world model developed by Decart and Etched. Based on diffusion transformers, Oasis takes in user keyboard input and generates gameplay in an autoregressive manner. | | | Game |
| OmAgent | A multimodal agent framework for solving complex tasks. | | | Agent |
| OpenAgents | An Open Platform for Language Agents in the Wild. | | | Agent |
| Opus | An AI app that turns text into a video game. | | | Game |
| Pipecat | Open Source framework for voice and multimodal conversational AI. | | | Agent |
| Qwen-Agent | Qwen-Agent is a framework for developing LLM applications based on the instruction following, tool usage, planning, and memory capabilities of Qwen. | | | Agent |
| Ragas | Ragas is a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines. | | | Agent |
| RPBench-Auto | An automated pipeline for evaluating LLMs for role-playing. | | | Game |
| SIMA | A generalist AI agent for 3D virtual environments. | | | Agent |
| StoryGames.ai | AI for Dreamers Make Games. | | | Game |
| SWE-agent | Agent Computer Interfaces Enable Software Engineering Language Models. | arXiv | | Agent |
| TaskGen | A Task-based agentic framework building on StrictJSON outputs by LLM agents. | | | Agent |
| TEN Agent | TEN Agent is the world’s first real-time multimodal agent integrated with the OpenAI Realtime API, RTC, and features weather checks, web search, vision, and RAG capabilities. | | | Agent |
| Translation Agent | Agentic translation using reflection workflow. | | | Agent |
| Twitter Personality | Twitter Personality is a web application that analyzes your Twitter handle to create a personalized personality profile using Wordware AI Agent. | | | Agent |
| Unbounded | Unbounded: A Generative Infinite Game of Character Life Simulation. | arXiv | | Game |
| Video2Game | Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video. | arXiv | | Game |
| V-IRL | Grounding Virtual Intelligence in Real Life. | arXiv | | Agent |
| WebDesignAgent | An agent used for web design. | | | Agent |
| XAgent | An Autonomous LLM Agent for Complex Task Solving. | | | Agent |
| Source | Description | Paper | Game Engine | Type |
|---|---|---|---|---|
| AI Code Translator | Use AI to translate code from one language to another. | | | Code |
| aiXcoder-7B | aiXcoder-7B Code Large Language Model. | | | Code |
| bloop | bloop is a fast code search engine written in Rust. | | | Code |
| Chapyter | ChatGPT Code Interpreter in Jupyter Notebooks. | | | Code |
| CodeGeeX | An Open Multilingual Code Generation Model. | arXiv | | Code |
| CodeGeeX2 | A More Powerful Multilingual Code Generation Model. | | | Code |
| CodeGeeX4 | CodeGeeX4: Open Multilingual Code Generation Model. | | | Code |
| CodeGen | CodeGen is an open-source model for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex. | arXiv | | Code |
| CodeGen2 | CodeGen2 models for program synthesis. | arXiv | | Code |
| Code Llama | Code Llama is a family of large language models for code based on Llama 2. | | | Code |
| CodeTF | One-stop Transformer Library for State-of-the-art Code LLM. | | | Code |
| CodeT5 | Open Code LLMs for Code Understanding and Generation. | | | Code |
| Cursor | Write, edit, and chat about your code with GPT-4 in a new type of editor. | | | Code |
| DeepSeek Coder | DeepSeek Coder: Let the Code Write Itself. | arXiv | | Code |
| OpenAI Codex | OpenAI Codex is a descendant of GPT-3. | | | Code |
| PandasAI | Pandas AI is a Python library that integrates generative artificial intelligence capabilities into Pandas, making dataframes conversational. | | | Code |
| RobloxScripterAI | RobloxScripterAI is an AI-powered code generation tool for Roblox. | | Roblox | Code |
| Scikit-LLM | Seamlessly integrate powerful language models like ChatGPT into scikit-learn for enhanced text analysis tasks. | | | Code |
| SoTaNa | The Open-Source Software Development Assistant. | arXiv | | Code |
| Stable Code 3B | Coding on the Edge. | | | Code |
| StarCoder | 💫 StarCoder is a language model (LM) trained on source code and natural language text. | arXiv | | Code |
| StarCoder 2 | StarCoder2 is a family of code generation models (3B, 7B, and 15B), trained on 600+ programming languages from The Stack v2 and some natural language text such as Wikipedia, Arxiv, and GitHub issues. | arXiv | | Code |
| UnityGen AI | UnityGen AI is an AI-powered code generation plugin for Unity. | | Unity | Code |
| Void | Void is an open source Cursor alternative. Write code with the best AI tools, retain full control over your data, and access powerful AI features. | | | Code |
| Source | Description | Paper | Game Engine | Type |
|---|---|---|---|---|
| AI-Writer | AI writes novels, generates fantasy and romance web articles, etc. Chinese pre-trained generative model. | | | Writer |
| Notebook.ai | Notebook.ai is a set of tools for writers, game designers, and roleplayers to create magnificent universes – and everything within them. | | | Writer |
| Novel | Notion-style WYSIWYG editor with AI-powered autocompletions. | | | Writer |
| NovelAI | Driven by AI, painlessly construct unique stories, thrilling tales, seductive romances, or just fool around. | | | Writer |
| Source | Description | Paper | Game Engine | Type |
|---|---|---|---|---|
| AnyDoor | Zero-shot Object-level Image Customization. | arXiv | | Image |
| AnyText | Multilingual Visual Text Generation And Editing. | arXiv | | Image |
| AutoStudio | Crafting Consistent Subjects in Multi-turn Interactive Image Generation. | arXiv | | Image |
| Blender-ControlNet | Using ControlNet right in Blender. | | Blender | Image |
| BriVL | Bridging Vision and Language Model. | arXiv | | Image |
| CatVTON | CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models. | arXiv | | Image |
| CLIPasso | A method for converting an image of an object to a sketch, allowing for varying levels of abstraction. | arXiv | | Image |
| ClipDrop | Create stunning visuals in seconds. | | | Image |
| ComfyUI | A powerful and modular stable diffusion GUI with a graph/nodes interface. | | | Image |
| ConceptLab | Creative Generation using Diffusion Prior Constraints. | arXiv | | Image |
| ControlNet | ControlNet is a neural network structure to control diffusion models by adding extra conditions. | arXiv | | Image |
| CSGO | CSGO: Content-Style Composition in Text-to-Image Generation. | arXiv | | Image |
| DALL·E 2 | DALL·E 2 is an AI system that can create realistic images and art from a description in natural language. | | | Image |
| Dashtoon Studio | Dashtoon Studio is an AI powered comic creation platform. | | | Comic |
| DeepAI | DeepAI offers a suite of tools that use AI to enhance your creativity. | | | Image |
| DeepFloyd IF | IF by DeepFloyd Lab at StabilityAI. | | | Image |
| Depth Anything V2 | Depth Anything V2 | arXiv | | Image |
| Depth map library and poser | Depth map library for use with the Control Net extension for Automatic1111/stable-diffusion-webui. | | | Image |
| Diffuse to Choose | Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All. | arXiv | | Image |
| Disco Diffusion | A frankensteinian amalgamation of notebooks, models and techniques for the generation of AI Art and Animations. | | | Image |
| DragGAN | Interactive Point-based Manipulation on the Generative Image Manifold. | arXiv | | Image |
| Draw Things | AI-assisted image generation in Your Pocket. | | | Image |
| DWPose | Effective Whole-body Pose Estimation with Two-stages Distillation. | arXiv | | Image |
| EasyPhoto | Your Smart AI Photo Generator. | | | Image |
| Flux | This repo contains minimal inference code to run text-to-image and image-to-image with our Flux latent rectified flow transformers. | | | Image |
| Follow-Your-Click | Open-domain Regional Image Animation via Short Prompts. | arXiv | | Image |
| Fooocus | Focus on prompting and generating. | | | Image |
| GIFfusion | Create GIFs and Videos using Stable Diffusion. | | | Image |
| Grounded-Segment-Anything | Automatically Detect, Segment and Generate Anything with Image, Text, and Audio Inputs. | arXiv | | Image |
| HivisionIDPhotos | HivisionIDPhotos: a lightweight and efficient AI ID photo tool. | | | Image |
| Hua | Hua is an AI image editor with Stable Diffusion (and more). | | | Image |
| Hunyuan-DiT | A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding. | arXiv | | Image |
| IC-Light | IC-Light is a project to manipulate the illumination of images. | | | Image |
| Ideogram | Helping people become more creative. | | | Image |
| Imagen | Imagen is an AI system that creates photorealistic images from input text. | | | Image |
| img2img-turbo | One-Step Image-to-Image with SD-Turbo. | | | Image |
| Img2Prompt | Get prompts from stable diffusion generated images. | | | Image |
| Infinity | Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis. | arXiv | | Image |
| InstantID | Zero-shot Identity-Preserving Generation in Seconds. | arXiv | | Image |
| InternLM-XComposer2 | InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension. | arXiv | | Image |
| KOALA | Self-Attention Matters in Knowledge Distillation of Latent Diffusion Models for Memory-Efficient and Fast Image Synthesis. | | | Image |
| Kolors | Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis. | | | Image |
| KREA | Generate images and videos with a delightful AI-powered design tool. | | | Image |
| LaVi-Bridge | Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation. | arXiv | | Image |
| LayerDiffusion | Transparent Image Layer Diffusion using Latent Transparency. | arXiv | | Image |
| Lexica | A Stable Diffusion prompts search engine. | | | Image |
| LlamaGen | Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation. | arXiv | | Image |
| Lumina-mGPT | Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining. | arXiv | | Image |
| MetaShoot | MetaShoot is a digital twin of a photo studio, developed as a plugin for Unreal Engine that gives any creator the ability to produce highly realistic renders in the easiest and quickest way. | | Unreal Engine | Image |
| Midjourney | Midjourney is an independent research lab exploring new mediums of thought and expanding the imaginative powers of the human species. | | | Image |
| MIGC | MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis. | arXiv | | Image |
| MimicBrush | Zero-shot Image Editing with Reference Imitation. | arXiv | | Image |
| OmniGen | OmniGen: Unified Image Generation. | arXiv | | Image |
| Omost | Omost is a project to convert LLM's coding capability to image generation (or more accurately, image composing) capability. | | | Image |
| Openpose Editor | Openpose Editor for AUTOMATIC1111's stable-diffusion-webui. | | | Image |
| Outfit Anyone | Ultra-high quality virtual try-on for Any Clothing and Any Person. | | | Image |
| PaintsUndo | PaintsUndo: A Base Model of Drawing Behaviors in Digital Paintings. | | | Image |
| PhotoMaker | Customizing Realistic Human Photos via Stacked ID Embedding. | arXiv | | Image |
| Photoroom | AI Background Generator. | | | Image |
| Plask | AI image generation in the cloud. | | | Image |
| Prompt.Art | The Generators Hub. | | | Image |
| PuLID | Pure and Lightning ID Customization via Contrastive Alignment. | arXiv | | Image |
| Rich-Text-to-Image | Expressive Text-to-Image Generation with Rich Text. | arXiv | | Image |
| RPG-DiffusionMaster | Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG). | | | Image |
| SEED-Story | SEED-Story: Multimodal Long Story Generation with Large Language Model. | arXiv | | Image |
| Segment Anything | Segment Anything Model (SAM): a new AI model from Meta AI that can "cut out" any object, in any image, with a single click. | arXiv | | Image |
| Segment Anything Model 2 (SAM 2) | SAM 2: Segment Anything in Images and Videos. | arXiv | | Image |
| sd-webui-controlnet | WebUI extension for ControlNet. | | | Image |
| SDXL-Lightning | Progressive Adversarial Diffusion Distillation. | arXiv | | Image |
| SDXS | Real-Time One-Step Latent Diffusion Models with Image Conditions. | | | Image |
| Stable.art | Photoshop plugin for Stable Diffusion with Automatic1111 as backend (locally or with Google Colab). | | | Image |
| Stable Cascade | Stable Cascade consists of three models: Stage A, Stage B and Stage C, representing a cascade for generating images, hence the name "Stable Cascade". | | | Image |
| Stable Diffusion | A latent text-to-image diffusion model. | | | Image |
| stable-diffusion.cpp | Stable Diffusion in pure C/C++. | | | Image |
| Stable Diffusion web UI | A browser interface based on Gradio library for Stable Diffusion. | | | Image |
| Stable Diffusion web UI | Web-based UI for Stable Diffusion. | | | Image |
| Stable Diffusion WebUI Chinese | Chinese version of stable-diffusion-webui. | | | Image |
| Stable Diffusion XL | Generate images from text. | arXiv | | Image |
| Stable Diffusion XL Turbo | Real-Time Text-to-Image Generation. | | | Image |
| Stable Diffusion 3.5 | Stable Diffusion 3.5 open release includes multiple model variants, including Stable Diffusion 3.5 Large and Stable Diffusion 3.5 Large Turbo. | | | Image |
| Stable Doodle | Stable Doodle is a sketch-to-image tool that converts a simple drawing into a dynamic image. | | | Image |
| StableStudio | StableStudio by Stability AI | | | Image |
| StoryMaker | StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation. | arXiv | | Image |
| StreamDiffusion | A Pipeline-Level Solution for Real-Time Interactive Generation. | | | Image |
| StyleDrop | Text-To-Image Generation in Any Style. | arXiv | | Image |
| SyncDreamer | Generating Multiview-consistent Images from a Single-view Image. | arXiv | | Image |
| UltraEdit | UltraEdit: Instruction-based Fine-Grained Image Editing at Scale. | arXiv | | Image |
| UltraPixel | UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks. | arXiv | | Image |
| Unity ML Stable Diffusion | Core ML Stable Diffusion on Unity. | | Unity | Image |
| Vispunk Visions | Text-to-Image generation platform. | | | Image |
| Source | Description | Paper | Game Engine | Type |
|---|---|---|---|---|
| CRM | Single Image to 3D Textured Mesh with Convolutional Reconstruction Model. | arXiv | Texture | |
| DreamMat | High-quality PBR Material Generation with Geometry- and Light-aware Diffusion Models. | arXiv | Texture | |
| DreamSpace | Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation. | Texture | ||
| Dream Textures | Stable Diffusion built-in to Blender. Create textures, concept art, background assets, and more with a simple text prompt. | Blender | Texture | |
| InstructHumans | Editing Animated 3D Human Textures with Instructions. | arXiv | Texture | |
| InteX | Interactive Text-to-Texture Synthesis via Unified Depth-aware Inpainting. | arXiv | Texture | |
| LLaMA-Mesh | LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models. | arXiv | Mesh | |
| MaterialSeg3D | MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets. | arXiv | Texture | |
| MeshAnything | MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers. | arXiv | Mesh | |
| Neuralangelo | High-Fidelity Neural Surface Reconstruction. | arXiv | Texture | |
| Paint-it | Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering. | Texture | ||
| Polycam | Create your own 3D textures just by typing. | Texture | ||
| TexFusion | Synthesizing 3D Textures with Text-Guided Image Diffusion Models. | arXiv | Texture | |
| Text2Tex | Text-driven texture Synthesis via Diffusion Models. | arXiv | Texture | |
| Texture Lab | AI-generated textures. You can generate your own with a text prompt. | Texture | ||
| With Poly | Create Textures With Poly. Generate 3D materials with AI in a free online editor, or search our growing community library. | Texture | ||
| X-Mesh | X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance. | arXiv | Texture |
| Source | Description | Paper | Game Engine | Type |
|---|---|---|---|---|
| AI Shader | ChatGPT-powered shader generator for Unity. | Unity | Shader |
| Source | Description | Paper | Game Engine | Type |
|---|---|---|---|---|
| Animate3D | Animate3D: Animating Any 3D Model with Multi-view Video Diffusion. | arXiv | 3D | |
| Anything-3D | Segment-Anything + 3D. Let's lift anything to 3D. | arXiv | Model | |
| Any2Point | Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding. | arXiv | 3D | |
| BlenderGPT | Use commands in English to control Blender with OpenAI's GPT-4. | Blender | Model | |
| Blender-GPT | An all-in-one Blender assistant powered by GPT3/4 + Whisper integration. | Blender | Model | |
| Blockade Labs | Digital alchemy is real with Skybox Lab - the ultimate AI-powered solution for generating incredible 360° skybox experiences from text prompts. | Model | ||
| CF-3DGS | COLMAP-Free 3D Gaussian Splatting. | arXiv | 3D | |
| CharacterGen | CharacterGen: Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization. | arXiv | 3D | |
| chatGPT-maya | Simple Maya tool that uses OpenAI to perform basic tasks based on descriptive instructions. | Maya | Model | |
| CityDreamer | Compositional Generative Model of Unbounded 3D Cities. | arXiv | 3D | |
| CSM | Generate 3D worlds from images and videos. | 3D | ||
| Dash | Your Copilot for World Building in Unreal Engine. | Unreal Engine | 3D | |
| DreamCatalyst | DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation. | arXiv | 3D | |
| DreamGaussian4D | Generative 4D Gaussian Splatting. | arXiv | 4D | |
| DUSt3R | Geometric 3D Vision Made Easy. | arXiv | 3D | |
| Edify 3D | Edify 3D: Scalable High-Quality 3D Asset Generation. | arXiv | 3D | |
| GALA3D | GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting. | arXiv | 3D | |
| GaussCtrl | GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing. | arXiv | 3D | |
| GaussianCube | A Structured and Explicit Radiance Representation for 3D Generative Modeling. | arXiv | 3D | |
| GaussianDreamer | Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors. | arXiv | 3D | |
| GenieLabs | Empower your game with AI-UGC. | 3D | ||
| HiFA | High-fidelity Text-to-3D with Advanced Diffusion Guidance. | Model | ||
| HoloDreamer | HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions. | arXiv | 3D | |
| Hunyuan3D-1.0 | Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation. | arXiv | 3D | |
| Infinigen | Infinite Photorealistic Worlds using Procedural Generation. | arXiv | 3D | |
| Instruct-NeRF2NeRF | Editing 3D Scenes with Instructions. | arXiv | Model | |
| Interactive3D | Create What You Want by Interactive 3D Generation. | arXiv | 3D | |
| Isotropic3D | Image-to-3D Generation Based on a Single CLIP Embedding. | 3D | ||
| LATTE3D | Large-scale Amortized Text-To-Enhanced3D Synthesis. | arXiv | 3D | |
| LION | Latent Point Diffusion Models for 3D Shape Generation. | arXiv | Model | |
| Luma AI | Capture in lifelike 3D. Unmatched photorealism, reflections, and details. The future of VFX is now, for everyone! | Model | ||
| lumine AI | AI-Powered Creativity. | 3D | ||
| Make-It-3D | High-Fidelity 3D Creation from A Single Image with Diffusion Prior. | arXiv | Model | |
| Meshy | Create Stunning 3D Game Assets with AI. | 3D | ||
| Mootion | Magical 3D AI Animation Maker. | 3D | ||
| MVDream | Multi-view Diffusion for 3D Generation. | arXiv | 3D | |
| NVIDIA Instant NeRF | Instant neural graphics primitives: lightning fast NeRF and more. | Model | ||
| One-2-3-45 | Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization. | arXiv | Model | |
| Paint3D | Paint Anything 3D with Lighting-Less Texture Diffusion Models. | arXiv | 3D | |
| PAniC-3D | Stylized Single-view 3D Reconstruction from Portraits of Anime Characters. | arXiv | Model | |
| Point·E | Point cloud diffusion for 3D model synthesis. | Model | ||
| ProlificDreamer | High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation. | arXiv | Model | |
| SF3D | SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement. | arXiv | 3D | |
| Shap-E | Generate 3D objects conditioned on text or images. | arXiv | Model | |
| Sloyd | 3D modelling has never been easier. | Model | ||
| Spline AI | The power of AI is coming to the 3rd dimension. Generate objects, animations, and textures using prompts. | Model | ||
| Stable Dreamfusion | A PyTorch implementation of the text-to-3D model Dreamfusion, powered by the Stable Diffusion text-to-2D model. | Model | ||
| SV3D | Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion. | arXiv | 3D | |
| Tafi | AI text to 3D character engine. | Model | ||
| 3D-GPT | Procedural 3D Modeling with Large Language Models. | arXiv | 3D | |
| 3D-LLM | Injecting the 3D World into Large Language Models. | arXiv | 3D | |
| 3Dpresso | Extract a 3D model of an object, captured on a video. | Model | ||
| 3DTopia | Text-to-3D Generation within 5 Minutes. | arXiv | 3D | |
| 3DTopia-XL | 3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion. | arXiv | 3D | |
| threestudio | A unified framework for 3D content generation. | Model | ||
| TripoSR | A state-of-the-art open-source model for fast feedforward 3D reconstruction from a single image. | arXiv | Model | |
| Unique3D | High-Quality and Efficient 3D Mesh Generation from a Single Image. | arXiv | 3D | |
| UnityGaussianSplatting | Toy Gaussian Splatting visualization in Unity. | Unity | 3D | |
| ViVid-1-to-3 | Novel View Synthesis with Video Diffusion Models. | arXiv | 3D | |
| Voxcraft | Crafting Ready-to-Use 3D Models with AI. | 3D | ||
| Wonder3D | Single Image to 3D using Cross-Domain Diffusion. | arXiv | 3D | |
| Zero-1-to-3 | Zero-shot One Image to 3D Object. | arXiv | Model |
| Source | Description | Paper | Game Engine | Type |
|---|---|---|---|---|
| AniPortrait | Audio-Driven Synthesis of Photorealistic Portrait Animations. | arXiv | Avatar | |
| CALM | Conditional Adversarial Latent Models for Directable Virtual Characters. | arXiv | Avatar | |
| ChatAvatar | Progressive generation Of Animatable 3D Faces Under Text guidance. | Avatar | ||
| ChatdollKit | ChatdollKit enables you to make your 3D model into a chatbot. | Unity | Avatar | |
| DreamTalk | When Expressive Talking Head Generation Meets Diffusion Probabilistic Models. | arXiv | Avatar | |
| Duix | Duix - Silicon-Based Digital Human SDK 🌐🤖 | Avatar | ||
| EchoMimic | EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions. | arXiv | Avatar | |
| EMOPortraits | Emotion-enhanced Multimodal One-shot Head Avatars. | Avatar | ||
| E3 Gen | Efficient, Expressive and Editable Avatars Generation. | arXiv | Avatar | |
| ExAvatar | ExAvatar - Expressive Whole-Body 3D Gaussian Avatar. | arXiv | Avatar | |
| GeneAvatar | Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image. | arXiv | Avatar | |
| GeneFace++ | Generalized and Stable Real-Time 3D Talking Face Generation. | Avatar | ||
| Hallo | Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation. | arXiv | Avatar | |
| Hallo2 | Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation. | arXiv | Avatar | |
| HeadSculpt | Crafting 3D Head Avatars with Text. | arXiv | Avatar | |
| IntrinsicAvatar | IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing. | arXiv | Avatar | |
| Linly-Talker | Digital Avatar Conversational System. | Avatar | ||
| LivePortrait | LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control. | arXiv | Avatar | |
| MotionGPT | Human Motion as a Foreign Language, a unified motion-language generation model using LLMs. | arXiv | Avatar | |
| MusePose | MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation. | Avatar | ||
| MuseTalk | Real-Time High Quality Lip Synchronization with Latent Space Inpainting. | Avatar | ||
| MuseV | Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising. | Avatar | ||
| Portrait4D | Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data. | arXiv | Avatar | |
| Ready Player Me | Integrate customizable avatars into your game or app in days. | Avatar | ||
| RodinHD | RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models. | arXiv | Avatar | |
| StyleAvatar3D | Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation. | arXiv | Avatar | |
| Text2Control3D | Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model. | arXiv | Avatar | |
| Topo4D | Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture. | arXiv | Avatar | |
| UnityAIWithChatGPT | A Unity-based voice-interactive demo that combines ChatGPT with UnityChan. | Unity | Avatar | |
| Vid2Avatar | 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition. | arXiv | Avatar | |
| VLOGGER | Multimodal Diffusion for Embodied Avatar Synthesis. | Avatar | ||
| Wild2Avatar | Rendering Humans Behind Occlusions. | arXiv | Avatar |
| Source | Description | Paper | Game Engine | Type |
|---|---|---|---|---|
| Animate Anyone | Consistent and Controllable Image-to-Video Synthesis for Character Animation. | arXiv | Animation | |
| AnimateAnything | Fine-Grained Open Domain Image Animation with Motion Guidance. | arXiv | Animation | |
| AnimateDiff | Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning. | arXiv | Animation | |
| AnimateLCM | Let's Accelerate the Video Generation within 4 Steps! | arXiv | Animation | |
| Animate-X | Animate-X: Universal Character Image Animation with Enhanced Motion Representation. | arXiv | Animation | |
| AnimateZero | Video Diffusion Models are Zero-Shot Image Animators. | arXiv | Animation | |
| AnimationGPT | An AIGC tool for generating game combat motion assets. | Animation | ||
| Deforum | Deforum leverages Stable Diffusion to generate evolving AI visuals. | Animation | ||
| DrawingSpinUp | DrawingSpinUp: 3D Animation from Single Character Drawings. | arXiv | Animation | |
| DreaMoving | A Human Video Generation Framework based on Diffusion Models. | arXiv | Animation | |
| FaceFusion | Next generation face swapper and enhancer. | Animation | ||
| FreeInit | Bridging Initialization Gap in Video Diffusion Models. | arXiv | Animation | |
| GeneFace | Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis. | arXiv | Animation | |
| ID-Animator | Zero-Shot Identity-Preserving Human Video Generation. | arXiv | Animation | |
| MagicAnimate | Temporally Consistent Human Image Animation using Diffusion Model. | arXiv | Animation | |
| NUWA | DragNUWA is an open-domain diffusion-based video generation model that takes text, image, and trajectory controls as inputs to achieve controllable video generation. | arXiv | Animation | |
| NUWA-Infinity | NUWA-Infinity is a multimodal generative model that is designed to generate high-quality images and videos from given text, image or video input. | Animation | ||
| NUWA-XL | A novel Diffusion over Diffusion architecture for eXtremely Long video generation. | Animation | ||
| Omni Animation | AI Generated High Fidelity Animations. | Animation | ||
| PIA | Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models. | arXiv | Animation | |
| SadTalker | Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation. | arXiv | Animation | |
| SadTalker-Video-Lip-Sync | This project is based on SadTalker's Wav2Lip for video lip synthesis. | Animation | ||
| Stable Animation | A powerful text-to-animation tool for developers. | Animation | ||
| TaleCrafter | An interactive story visualization tool that supports multiple characters. | arXiv | Animation | |
| ToonCrafter | ToonCrafter: Generative Cartoon Interpolation. | arXiv | Animation | |
| Wav2Lip | Accurately Lip-syncing Videos In The Wild. | arXiv | Animation | |
| Wonder Studio | An AI tool that automatically animates, lights and composes CG characters into a live-action scene. | Animation |
| Source | Description | Paper | Game Engine | Type |
|---|---|---|---|---|
| Cambrian-1 | Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs. | arXiv | Multimodal LLMs | |
| CogVLM2 | GPT4V-level open-source multi-modal model based on Llama3-8B. | Visual | ||
| CoTracker | It is Better to Track Together. | arXiv | Visual | |
| EVF-SAM | EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model. | arXiv | Visual | |
| FaceHi | It is Better to Track Together. | Visual | ||
| InternLM-XComposer2 | InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension. | arXiv | Visual | |
| Kangaroo | Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input. | Visual | ||
| LGVI | Towards Language-Driven Video Inpainting via Multimodal Large Language Models. | Visual | ||
| LLaVA++ | Extending Visual Capabilities with LLaMA-3 and Phi-3. | Visual | ||
| LLaVA-OneVision | LLaVA-OneVision: Easy Visual Task Transfer. | arXiv | Visual | |
| LongVA | Long Context Transfer from Language to Vision. | arXiv | Visual | |
| MaskViT | Masked Visual Pre-Training for Video Prediction. | arXiv | Visual | |
| MiniCPM-Llama3-V 2.5 | A GPT-4V Level MLLM on Your Phone. | Visual | ||
| MoE-LLaVA | Mixture of Experts for Large Vision-Language Models. | arXiv | Visual | |
| MotionLLM | Understanding Human Behaviors from Human Motions and Videos. | arXiv | Visual | |
| PLLaVA | Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning. | arXiv | Visual | |
| Qwen-VL | A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond. | arXiv | Visual | |
| Sapiens | Sapiens: Foundation for Human Vision Models. | arXiv | Visual | |
| ShareGPT4V | Improving Large Multi-modal Models with Better Captions. | arXiv | Visual | |
| SOLO | SOLO: A Single Transformer for Scalable Vision-Language Modeling. | arXiv | Visual | |
| Video-CCAM | Video-CCAM: Advancing Video-Language Understanding with Causal Cross-Attention Masks. | Visual | ||
| Video-LLaVA | Learning United Visual Representation by Alignment Before Projection. | arXiv | Visual | |
| VideoLLaMA 2 | Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs. | arXiv | Visual | |
| Video-MME | The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis. | arXiv | Visual | |
| Vitron | A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing. | Visual | ||
| VILA | VILA: On Pre-training for Visual Language Models. | arXiv | Visual |
| Source | Description | Paper | Game Engine | Type |
|---|---|---|---|---|
| 360DVD | Controllable Panorama Video Generation with 360-Degree Video Diffusion Model. | arXiv | Video | |
| Animate-A-Story | Retrieval-Augmented Video Generation for Telling a Story. | arXiv | Video | |
| Anything in Any Scene | Photorealistic Video Object Insertion. | Video | ||
| ART•V | Auto-Regressive Text-to-Video Generation with Diffusion Models. | arXiv | Video | |
| Assistive | Meet the generative video platform that brings your ideas to life. | Video | ||
| AtomoVideo | High Fidelity Image-to-Video Generation. | arXiv | Video | |
| BackgroundRemover | Background Remover lets you Remove Background from images and video using AI with a simple command line interface that is free and open source. | Video | ||
| Boximator | Generating Rich and Controllable Motions for Video Synthesis. | arXiv | Video | |
| CoDeF | Content Deformation Fields for Temporally Consistent Video Processing. | arXiv | Video | |
| CogVideo | Generate Videos from Text Descriptions. | Video | ||
| CogVideoX | CogVideoX is an open-source video generation model that shares its origin with 清影 (Qingying). | Video | ||
| CogVLM | CogVLM is a powerful open-source visual language model (VLM). | Visual | ||
| CoNR | Generate vivid dancing videos from hand-drawn anime character sheets (ACS). | arXiv | Video | |
| Decohere | Create what can't be filmed. | Video | ||
| Descript | Descript is the simple, powerful, and fun way to edit. | Video | ||
| Diffutoon | High-Resolution Editable Toon Shading via Diffusion Models. | arXiv | Video | |
| dolphin | General video interaction platform based on LLMs. | Video | ||
| DomoAI | Amplify Your Creativity with DomoAI. | Video | ||
| DreamCinema | DreamCinema: Cinematic Transfer with Free Camera and 3D Character. | arXiv | Video | |
| DynamiCrafter | Animating Open-domain Images with Video Diffusion Priors. | arXiv | Video | |
| EDGE | We introduce EDGE, a powerful method for editable dance generation that is capable of creating realistic, physically-plausible dances while remaining faithful to arbitrary input music. | arXiv | Video | |
| EMO | Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions. | arXiv | Video | |
| Emu Video | Factorizing Text-to-Video Generation by Explicit Image Conditioning. | Video | ||
| Etna | Etna can generate corresponding video content based on short text descriptions. | Video | ||
| Fairy | Fast Parallelized Instruction-Guided Video-to-Video Synthesis. | Video | ||
| Follow-Your-Canvas | Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation. | arXiv | Video | |
| Follow Your Pose | Pose-Guided Text-to-Video Generation using Pose-Free Videos. | arXiv | Video | |
| FullJourney | Your complete suite of AI Creation tools at your fingertips. | Video | ||
| Gen-2 | A multi-modal AI system that can generate novel videos with text, images, or video clips. | Video | ||
| Generative Dynamics | Generative Image Dynamics. | Video | ||
| Genie | Generative Interactive Environments. | arXiv | Video | |
| Genmo | Magically make videos with AI. | Video | ||
| GenTron | Diffusion Transformers for Image and Video Generation. | Video | ||
| HiGen | Hierarchical Spatio-temporal Decoupling for Text-to-Video generation. | Video | ||
| Hotshot-XL | Hotshot-XL is an AI text-to-GIF model trained to work alongside Stable Diffusion XL. | Video | ||
| HunyuanVideo | HunyuanVideo: A Systematic Framework For Large Video Generation Model. | arXiv | Video | |
| Imagen Video | Given a text prompt, Imagen Video generates high definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models. | Video | ||
| InstructVideo | Instructing Video Diffusion Models with Human Feedback. | arXiv | Video | |
| I2VGen-XL | High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models. | arXiv | Video | |
| LaVie | High-Quality Video Generation with Cascaded Latent Diffusion Models. | arXiv | Video | |
| LTX Studio | LTX Studio is a holistic, AI-driven filmmaking platform for creators, marketers, filmmakers and studios. | Video | ||
| LTX-Video | LTX-Video is the first DiT-based video generation model that can generate high-quality videos in real-time. It can generate 24 FPS videos at 768x512 resolution, faster than it takes to watch them. | Video | ||
| Lumiere | A Space-Time Diffusion Model for Video Generation. | arXiv | Video | |
| LVDM | Latent Video Diffusion Models for High-Fidelity Long Video Generation. | arXiv | Video | |
| MagicVideo | Efficient Video Generation With Latent Diffusion Models. | arXiv | Video | |
| MagicVideo-V2 | Multi-Stage High-Aesthetic Video Generation. | arXiv | Video | |
| Magic Hour | AI Video for Creators made simple. | Video | ||
| MAGVIT-v2 | Tokenizer is key to visual generation. | Video | ||
| MAGVIT | Masked Generative Video Transformer. | Video | ||
| Make-A-Video | Make-A-Video is a state-of-the-art AI system that generates videos from text. | arXiv | Video | |
| Make Pixels Dance | High-Dynamic Video Generation. | arXiv | Video | |
| Make-Your-Video | Customized Video Generation Using Textual and Structural Guidance. | arXiv | Video | |
| MicroCinema | A Divide-and-Conquer Approach for Text-to-Video Generation. | arXiv | Video | |
| MIMO | MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling. | arXiv | Video | |
| Mini-Gemini | Mining the Potential of Multi-modality Vision Language Models. | Visual | ||
| MobileVidFactory | Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text. | Video | ||
| Mochi 1 | Mochi 1 is an open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence in preliminary evaluation. | Video | ||
| MOFA-Video | Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model. | arXiv | Video | |
| MoneyPrinterTurbo | Use large models to generate short videos with one click. | Video | ||
| Moonvalley | Moonvalley is a groundbreaking new text-to-video generative AI model. | Video | ||
| Mora | More like Sora for Generalist Video Generation. | arXiv | Video | |
| Morph Studio | With our Text-to-Video AI Magic, manifest your creativity through your prompt. | Video | ||
| MotionClone | MotionClone: Training-Free Motion Cloning for Controllable Video Generation. | arXiv | Video | |
| MotionCtrl | A Unified and Flexible Motion Controller for Video Generation. | arXiv | Video | |
| MotionDirector | Motion Customization of Text-to-Video Diffusion Models. | arXiv | Video | |
| Motionshop | An application of replacing the characters in video with 3D avatars. | Video | ||
| Mov2mov | Mov2mov plugin for Automatic1111/stable-diffusion-webui. | Video | ||
| MovieFactory | Automatic Movie Creation from Text using Large Generative Models for Language and Images. | arXiv | Video | |
| Neural Frames | Discover the synthesizer for the visual world. | Video | ||
| NeverEnds | Create your world. | Video | ||
| Open-Sora | Democratizing Efficient Video Production for All. | Video | ||
| Open-Sora | Open-Sora Plan. | Video | ||
| Phenaki | A model for generating videos from text, with prompts that can change over time, and videos that can be as long as multiple minutes. | arXiv | Video | |
| Pika Labs | Pika Labs is revolutionizing video-making experience with AI. | Video | ||
| Pixeling | Pixeling empowers our customers to create highly precise, ultra-realistic, and extremely controllable visual content including images, videos and 3D models. | Video | ||
| PixVerse | Create breath-taking videos with AI. | Video | ||
| Pollinations | Creating gets easy, fast, and fun. | Video | ||
| Reuse and Diffuse | Iterative Denoising for Text-to-Video Generation. | arXiv | Video | |
| Ruyi | Ruyi is an image-to-video model capable of generating cinematic-quality videos at a resolution of 768, with a frame rate of 24 frames per second, totaling 5 seconds and 120 frames. | Video | ||
| ShortGPT | An experimental AI framework for automated short/video content creation. | Video | ||
| Show-1 | Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation. | arXiv | Video | |
| Snap Video | Scaled Spatiotemporal Transformers for Text-to-Video Synthesis. | arXiv | Video | |
| Sora | Creating video from text. | Video | ||
| SoraWebui | SoraWebui is an open-source Sora web client, enabling users to easily create videos from text with OpenAI's Sora model. | Video | ||
| StableVideo | Text-driven Consistency-aware Diffusion Video Editing. | Video | ||
| Stable Video Diffusion | Stable Video Diffusion (SVD) Image-to-Video. | Video | ||
| StoryDiffusion | Consistent Self-Attention for Long-Range Image and Video Generation. | arXiv | Video | |
| StreamingT2V | Consistent, Dynamic, and Extendable Long Video Generation from Text. | arXiv | Video | |
| StyleCrafter | Enhancing Stylized Text-to-Video Generation with Style Adapter. | arXiv | Video | |
| TATS | Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer. | Video | ||
| Text2Video-Zero | Text-to-Image Diffusion Models are Zero-Shot Video Generators. | arXiv | Video | |
| TF-T2V | A Recipe for Scaling up Text-to-Video Generation with Text-free Videos. | arXiv | Video | |
| Tora | Tora: Trajectory-oriented Diffusion Transformer for Video Generation. | arXiv | Video | |
| Track-Anything | Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything and XMem. | arXiv | Video | |
| Tune-A-Video | One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation. | arXiv | Video | |
| TwelveLabs | Multimodal AI that understands videos like humans. | Video | ||
| UniVG | Towards UNIfied-modal Video Generation. | Video | ||
| Vchitect-2.0 | Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models. | Video | ||
| VGen | A holistic video generation ecosystem for video generation building on diffusion models. | arXiv | Video | |
| ViewCrafter | ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis. | arXiv | Video | |
| Video-ChatGPT | Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. | arXiv | Video | |
| VideoComposer | Compositional Video Synthesis with Motion Controllability. | arXiv | Video | |
| VideoCrafter1 | Open Diffusion Models for High-Quality Video Generation. | arXiv | Video | |
| VideoCrafter2 | Overcoming Data Limitations for High-Quality Video Diffusion Models. | arXiv | Video | |
| VideoDrafter | Content-Consistent Multi-Scene Video Generation with LLM. | arXiv | Video | |
| VideoElevator | Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models. | arXiv | Video | |
| VideoFactory | Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation. | Video | ||
| VideoGen | A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation. | arXiv | Video | |
| VideoLCM | Video Latent Consistency Model. | arXiv | Video | |
| Video LDMs | Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models. | arXiv | Video | |
| Video-LLaVA | Learning United Visual Representation by Alignment Before Projection. | arXiv | Video | |
| VideoMamba | State Space Model for Efficient Video Understanding. | arXiv | Video | |
| Video-of-Thought | Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition. | Video | ||
| VideoPoet | A large language model for zero-shot video generation. | arXiv | Video | |
| Vispunk Motion | Create realistic videos using just text. | Video | ||
| VisualRWKV | VisualRWKV is the visual-enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks. | Visual | ||
| V-JEPA | Video Joint Embedding Predictive Architecture. | arXiv | Video | |
| W.A.L.T | Photorealistic Video Generation with Diffusion Models. | arXiv | Video | |
| Zeroscope | Zeroscope Text-to-Video. | Video |
| Source | Description | Paper | Game Engine | Type |
|---|---|---|---|---|
| AcademiCodec | An Open Source Audio Codec Model for Academic Research. | Audio | ||
| Amphion | An Open-Source Audio, Music, and Speech Generation Toolkit. | arXiv | Audio | |
| ArchiSound | Audio generation using diffusion models, in PyTorch. | Audio | ||
| Audiobox | Unified Audio Generation with Natural Language Prompts. | Audio | ||
| AudioEditing | Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion. | arXiv | Audio | |
| Audiogen Codec | A low compression 48kHz stereo neural audio codec for general audio, optimizing for audio fidelity 🎵. | Audio | ||
| AudioGPT | Understanding and Generating Speech, Music, Sound, and Talking Head. | arXiv | | Audio |
| AudioLCM | Text-to-Audio Generation with Latent Consistency Models. | arXiv | | Audio |
| AudioLDM | Text-to-Audio Generation with Latent Diffusion Models. | arXiv | | Audio |
| AudioLDM 2 | Learning Holistic Audio Generation with Self-supervised Pretraining. | arXiv | | Audio |
| Auffusion | Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation. | arXiv | | Audio |
| CTAG | Creative Text-to-Audio Generation via Synthesizer Programming. | | | Audio |
| FoleyCrafter | FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. | arXiv | | Audio |
| MAGNeT | Masked Audio Generation using a Single Non-Autoregressive Transformer. | | | Audio |
| Make-An-Audio | Text-To-Audio Generation with Prompt-Enhanced Diffusion Models. | arXiv | | Audio |
| Make-An-Audio 3 | Transforming Text into Audio via Flow-based Large Diffusion Transformers. | arXiv | | Audio |
| NeuralSound | Learning-based Modal Sound Synthesis with Acoustic Transfer. | arXiv | | Audio |
| OptimizerAI | Sounds for Creators, Game makers, Artists, Video makers. | | | Audio |
| Qwen2-Audio | Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. | arXiv | | Audio |
| SEE-2-SOUND | Zero-Shot Spatial Environment-to-Spatial Sound. | arXiv | | Audio |
| SoundStorm | Efficient Parallel Audio Generation. | arXiv | | Audio |
| Stable Audio | Fast Timing-Conditioned Latent Audio Diffusion. | | | Audio |
| Stable Audio Open | Stable Audio Open 1.0 generates variable-length (up to 47s) stereo audio at 44.1kHz from text prompts. | | | Audio |
| SyncFusion | SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis. | arXiv | | Audio |
| TANGO | Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model. | | | Audio |
| VTA-LDM | Video-to-Audio Generation with Hidden Alignment. | arXiv | | Audio |
| WavJourney | Compositional Audio Creation with Large Language Models. | arXiv | | Audio |
| Source | Description | Paper | Game Engine | Type |
|---|---|---|---|---|
| AIVA | The Artificial Intelligence composing emotional soundtrack music. | | | Music |
| Amper Music | Custom music generation technology powered by Amper. | | | Music |
| Boomy | Create generative music. Share it with the world. | | | Music |
| ChatMusician | Fostering Intrinsic Musical Abilities Into LLM. | | | Music |
| Chord2Melody | Automatic Music Generation AI. | | | Music |
| Diff-BGM | A Diffusion Model for Video Background Music Generation. | arXiv | | Music |
| FluxMusic | FluxMusic: Text-to-Music Generation with Rectified Flow Transformer. | arXiv | | Music |
| GPTAbleton | Draft script for processing GPT response and sending the MIDI notes into the Ableton clips with AbletonOSC and python-osc. | | | Music |
| HeyMusic.AI | AI Music Generator. | | | Music |
| Image to Music | AI Image to Music Generator is a tool that uses artificial intelligence to convert images into music. | | | Music |
| JEN-1 | Text-Guided Universal Music Generation with Omnidirectional Diffusion Models. | | | Music |
| Jukebox | A Generative Model for Music. | arXiv | | Music |
| Magenta | Magenta is a research project exploring the role of machine learning in the process of creating art and music. | | | Music |
| MeLoDy | Efficient Neural Music Generation. | | | Music |
| Mubert | AI Generative Music. | | | Music |
| MuseNet | A deep neural network that can generate 4-minute musical compositions with 10 different instruments, and can combine styles from country to Mozart to the Beatles. | | | Music |
| MusicGen | Simple and Controllable Music Generation. | arXiv | | Music |
| MusicLDM | Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies. | arXiv | | Music |
| MusicLM | Generating Music From Text. | arXiv | | Music |
| Riffusion App | Riffusion is an app for real-time music generation with stable diffusion. | | | Music |
| Sonauto | Sonauto is an AI music editor that turns prompts, lyrics, or melodies into full songs in any style. | | | Music |
| SoundRaw | AI music generator for creators. | | | Music |
| Soundry AI | Generative AI tools including text-to-sound and infinite sample packs. | | | Music |
| Source | Description | Paper | Game Engine | Type |
|---|---|---|---|---|
| DiffSinger | Singing Voice Synthesis via Shallow Diffusion Mechanism. | arXiv | | Singing Voice |
| Retrieval-based-Voice-Conversion-WebUI | An easy-to-use SVC framework based on VITS. | | | Singing Voice |
| so-vits-svc | SoftVC VITS Singing Voice Conversion. | | | Singing Voice |
| VI-SVS | Use VITS and Opencpop to develop singing voice synthesis; different from VISinger. | | | Singing Voice |
| Source | Description | Paper | Game Engine | Type |
|---|---|---|---|---|
| Applio | Ultimate voice cloning tool, meticulously optimized for unrivaled power, modularity, and user-friendly experience. | | | Speech |
| Audyo | Text in. Audio out. | | | Speech |
| Bark | Text-Prompted Generative Audio Model. | | | Speech |
| Bert-VITS2 | VITS2 backbone with multilingual BERT. | | | Speech |
| ChatTTS | ChatTTS is a generative speech model for daily dialogue. | | | Speech |
| CLAPSpeech | Learning Prosody from Text Context with Contrastive Language-Audio Pre-Training. | arXiv | | Speech |
| CosyVoice | Multilingual large voice generation model with full-stack inference, training, and deployment capabilities. | | | Speech |
| DEX-TTS | Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability. | arXiv | | Speech |
| EmotiVoice | A Multi-Voice and Prompt-Controlled TTS Engine. | | | Speech |
| Fliki | Turn text into videos with AI voices. | | | Speech |
| GLM-4-Voice | GLM-4-Voice is an end-to-end voice model launched by Zhipu AI. GLM-4-Voice can directly understand and generate Chinese and English speech, engage in real-time voice conversations, and change attributes such as emotion, intonation, speech rate, and dialect based on user instructions. | | | Speech |
| Glow-TTS | A Generative Flow for Text-to-Speech via Monotonic Alignment Search. | arXiv | | Speech |
| GPT-SoVITS | A Powerful Few-shot Voice Conversion and Text-to-Speech WebUI. | | | Speech |
| LOVO | LOVO is the go-to AI Voice Generator & Text to Speech platform for thousands of creators. | | | Speech |
| MahaTTS | An Open-Source Large Speech Generation Model. | | | Speech |
| Matcha-TTS | A fast TTS architecture with conditional flow matching. | arXiv | | Speech |
| MeloTTS | High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean. | | | Speech |
| MetaVoice-1B | AI for human-level speech intelligence. | | | Speech |
| Narakeet | Easily Create Voiceovers Using Realistic Text to Speech. | | | Speech |
| Mini-Omni | Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming. Mini-Omni is an open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation. | arXiv | | Speech |
| One-Shot-Voice-Cloning | One-shot voice cloning based on Unet-TTS. | | | Speech |
| OpenVoice | Instant voice cloning by MyShell. | | | Speech |
| OverFlow | Putting flows on top of neural transducers for better TTS. | | | Speech |
| RealtimeTTS | RealtimeTTS is a state-of-the-art text-to-speech (TTS) library designed for real-time applications. | | | Speech |
| SenseVoice | SenseVoice is a speech foundation model with multiple speech understanding capabilities, including automatic speech recognition (ASR), spoken language identification (LID), speech emotion recognition (SER), and audio event detection (AED). | | | Speech |
| SpeechGPT | Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities. | arXiv | | Speech |
| speech-to-text-gpt3-unity | A repo that uses the Whisper and ChatGPT APIs from OpenAI in Unity. | | Unity | Speech |
| Stable Speech | Stability AI's Text-to-Speech model. | | | Speech |
| StableTTS | Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3. | | | Speech |
| StyleTTS 2 | Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models. | arXiv | | Speech |
| tortoise.cpp | tortoise.cpp: GGML implementation of tortoise-tts. | | | Speech |
| TorToiSe-TTS | A multi-voice TTS system trained with an emphasis on quality. | | | Speech |
| TTS Generation WebUI | TTS Generation WebUI (Bark, MusicGen, Tortoise, RVC, Vocos, Demucs). | | | Speech |
| VALL-E | Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers. | arXiv | | Speech |
| VALL-E X | Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling. | arXiv | | Speech |
| Vocode | Vocode is an open-source library for building voice-based LLM applications. | | | Speech |
| Voicebox | Text-Guided Multilingual Universal Speech Generation at Scale. | arXiv | | Speech |
| VoiceCraft | Zero-Shot Speech Editing and Text-to-Speech in the Wild. | | | Speech |
| Whisper | Whisper is a general-purpose speech recognition model. | | | Speech |
| WhisperSpeech | An Open Source text-to-speech system built by inverting Whisper. | | | Speech |
| X-E-Speech | Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion. | | | Speech |
| XTTS | XTTS is a library for advanced Text-to-Speech generation. | | | Speech |
| YourTTS | Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone. | arXiv | | Speech |
| ZMM-TTS | Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations. | arXiv | | Speech |
| Source | Description | Game Engine | Type |
|---|---|---|---|
| Ludo.ai | Assistant for game research and design. | | Analytics |
