ComfyUI-LongCat-Avatar v0.2.0

Initial public release of ComfyUI-LongCat-Avatar, a ComfyUI custom node package for audio-driven human video generation with LongCat Video Avatar 1.5.

Highlights

ComfyUI-native workflow for LongCat Video Avatar 1.5
Supports single-person audio-driven generation in ai2v and at2v modes
Supports two-person / dual-audio generation in ai2v mode
Uses Whisper-large-v3 audio conditioning for Avatar 1.5
Supports inference with the DMD/distill LoRA that Avatar 1.5 requires, using the official 8-step default
Supports 480p and 720p generation
Includes a README demo video and ready-to-use example workflow

Model Loading

This release supports three DiT weight modes:

single_file_safetensors
official_sharded
official_int8_sharded

Official sharded and INT8 sharded checkpoints are validated before inference. The node can also perform bounded automatic downloads for known official Avatar 1.5 sharded DiT assets and shared LongCat text encoder assets.
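As a rough illustration of what pre-inference validation of a sharded checkpoint can look like, the sketch below checks that every shard named in an index file is present on disk. It is not the package's own validation code. It assumes a Hugging Face style index JSON with a "weight_map" key that maps tensor names to shard filenames; the function name find_missing_shards is hypothetical.

```python
import json
from pathlib import Path


def find_missing_shards(index_path):
    """Return shard filenames listed in the index but absent on disk.

    Assumption: the index is a JSON file whose "weight_map" entry maps
    tensor names to shard filenames stored next to the index file.
    """
    index_path = Path(index_path)
    with index_path.open("r", encoding="utf-8") as f:
        index = json.load(f)

    # Several tensors usually share one shard, so deduplicate the names.
    shard_names = sorted(set(index["weight_map"].values()))
    return [
        name for name in shard_names
        if not (index_path.parent / name).is_file()
    ]
```

An empty list means every referenced shard file exists. A real validator would likely also check file sizes or hashes, which this sketch leaves out.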

VRAM And Runtime Controls

This release includes practical controls for lower-VRAM setups:

official_int8_sharded mode for lower VRAM usage
block_num layer-streaming control
CPU offload options for VAE and native text encoder paths
Optional attention backends: sdpa, flash_attn_2, flash_attn_3, xformers, and sageattn

For 12GB-class GPUs, start with:

official_int8_sharded
480p
block_num = 1
sampler offload_device = cpu
text encode offload_device = cpu
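The starting point above can be written down as a small preset, which may be handy when scripting workflows. The values come from the list above; the dictionary key names and the apply_preset helper are hypothetical and may not match the actual node widget names.

```python
# Hypothetical key names; the actual node inputs may be labeled differently.
LOW_VRAM_START = {
    "dit_weight_mode": "official_int8_sharded",
    "resolution": "480p",
    "block_num": 1,
    "sampler_offload_device": "cpu",
    "text_encode_offload_device": "cpu",
}


def apply_preset(current, preset=LOW_VRAM_START):
    """Return a copy of `current` with the preset values applied on top."""
    merged = dict(current)   # leave the caller's dict untouched
    merged.update(preset)    # preset values win on conflicting keys
    return merged
```

Settings that the preset does not mention, such as a seed, pass through unchanged.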

Nodes Included

(auto)Load LongCat Avatar Model
LongCat Avatar Whisper
LongCat Avatar Text Encode
LongCat Avatar Audio Crop
LongCat Avatar Audio Encode
LongCat Avatar Audio Window
LongCat Avatar Sampler
LongCat Avatar Vocal Model
LongCat Avatar Vocal Extract

Optional Features

Optional vocal extraction, with its dependencies listed in requirements-vocal.txt
Optional acceleration packages documented separately in requirements-acceleration.txt
Audio crop preview support
Optional muxed MP4 output through mux_audio_path
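For readers unfamiliar with muxing, the sketch below shows one common way to combine a silent video with an audio track by calling the ffmpeg command-line tool. It is an illustration only and says nothing about how the mux_audio_path option is implemented in this package. It assumes ffmpeg is installed and on PATH; the function names are hypothetical.

```python
import subprocess


def build_mux_command(video_path, audio_path, output_path):
    """Build an ffmpeg command that copies the video and adds an AAC audio track."""
    return [
        "ffmpeg", "-y",        # overwrite the output file if it exists
        "-i", video_path,      # input 0: generated video
        "-i", audio_path,      # input 1: driving audio
        "-map", "0:v:0",       # take video from the first input
        "-map", "1:a:0",       # take audio from the second input
        "-c:v", "copy",        # do not re-encode the video stream
        "-c:a", "aac",         # encode audio as AAC for MP4 compatibility
        "-shortest",           # stop at the end of the shorter stream
        output_path,
    ]


def mux(video_path, audio_path, output_path):
    """Run ffmpeg and raise CalledProcessError if it fails."""
    subprocess.run(build_mux_command(video_path, audio_path, output_path), check=True)
```

Copying the video stream avoids a second lossy encode, at the cost of requiring that the source video codec is already valid inside an MP4 container.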

Not Supported Yet

GGUF DiT loading
CPU inference
macOS / MPS inference
Avatar 1.0 / Wav2Vec2 runtime path
FP16 or FP8 runtime precision switches
Generic third-party wrapper scheduler modes

Notes

This release is CUDA-oriented and expects a working ComfyUI Python environment with a compatible NVIDIA GPU and CUDA PyTorch installation already available. The default requirements.txt intentionally avoids installing or replacing PyTorch and optional acceleration packages to reduce the chance of breaking an existing ComfyUI setup.
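Before installing, it can help to confirm that the ComfyUI Python environment already has a CUDA-enabled PyTorch build. The following is a minimal sketch of such a check, run with the same Python interpreter that ComfyUI uses; the function name check_cuda_environment is hypothetical and not part of this package.

```python
import importlib.util


def check_cuda_environment():
    """Report whether PyTorch is installed and whether it can see a CUDA device."""
    report = {
        "torch_installed": False,
        "torch_version": None,
        "cuda_available": False,
        "device_name": None,
    }
    # Avoid an ImportError if PyTorch is missing from this environment.
    if importlib.util.find_spec("torch") is None:
        return report

    import torch

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    if torch.cuda.is_available():
        report["cuda_available"] = True
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in check_cuda_environment().items():
        print(f"{key}: {value}")
```

If cuda_available is False, the PyTorch installation should be fixed separately, since requirements.txt deliberately does not install or replace it.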


