Releases: rookiestar28/ComfyUI-LongCat-Avatar

Initial Release

10 Jun 18:57

ComfyUI-LongCat-Avatar v0.2.0

Initial public release of ComfyUI LongCat Avatar, a ComfyUI custom node package for LongCat Video Avatar 1.5 audio-driven human video generation.

Highlights

  • ComfyUI-native workflow for LongCat Video Avatar 1.5
  • Supports single-person audio-driven generation in ai2v and at2v modes
  • Supports two-person / dual-audio generation in ai2v mode
  • Uses Whisper-large-v3 audio conditioning for Avatar 1.5
  • Supports inference with the required Avatar 1.5 DMD/distill LoRA, using the official 8-step default
  • Supports 480p and 720p generation
  • Includes a README demo video and ready-to-use example workflow

Model Loading

This release supports three DiT weight modes:

  • single_file_safetensors
  • official_sharded
  • official_int8_sharded

Official sharded and INT8 sharded checkpoints are validated before inference. The node can also perform bounded automatic downloads for known official Avatar 1.5 sharded DiT assets and shared LongCat text encoder assets.
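
As a rough illustration of what a pre-inference check on a sharded checkpoint can look like, the sketch below lists shard files that an index references but that are missing on disk. It assumes a Hugging Face style index JSON with a "weight_map" key; the function name and the index layout are assumptions for illustration and are not taken from this package, whose actual validation may check more (or different) things.

```python
import json
from pathlib import Path


def missing_shards(index_path: Path) -> list[str]:
    """Return shard file names listed in the index that are absent on disk."""
    index = json.loads(index_path.read_text(encoding="utf-8"))
    # Assumed layout: "weight_map" maps each tensor name to its shard file name.
    shard_names = sorted(set(index["weight_map"].values()))
    # Shards are expected to sit next to the index file.
    return [name for name in shard_names if not (index_path.parent / name).is_file()]
```

An empty result means every referenced shard is present; it does not prove the shards are complete or uncorrupted.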

VRAM And Runtime Controls

This release includes practical controls for lower-VRAM setups:

  • official_int8_sharded mode for lower VRAM usage
  • block_num layer-streaming control
  • CPU offload options for VAE and native text encoder paths
  • Optional attention backends: sdpa, flash_attn_2, flash_attn_3, xformers, and sageattn
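
Because all backends other than sdpa depend on optional packages, it can help to check what is importable before selecting one. The sketch below is one possible pre-check, not the node's own logic: the import names in BACKEND_MODULES are assumptions about how those packages are usually installed, flash_attn_3 is left out because its import name varies by build, and whether the node itself falls back automatically is not stated in these notes.

```python
import importlib.util

# Assumed import names for the optional backends; not taken from this package.
BACKEND_MODULES = {
    "flash_attn_2": "flash_attn",
    "xformers": "xformers",
    "sageattn": "sageattention",
}


def resolve_backend(requested: str, modules: dict[str, str] = BACKEND_MODULES) -> str:
    """Return `requested` if its module is importable, otherwise "sdpa"."""
    if requested == "sdpa":
        return "sdpa"  # scaled dot-product attention ships with PyTorch 2.x
    module = modules.get(requested)
    # find_spec locates a top-level module without importing it.
    if module is not None and importlib.util.find_spec(module) is not None:
        return requested
    return "sdpa"
```

For example, resolve_backend("sageattn") returns "sageattn" only when a module named sageattention can be found in the current environment.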

For 12GB-class GPUs, start with:

  • official_int8_sharded
  • 480p
  • block_num = 1
  • sampler offload_device = cpu
  • text encode offload_device = cpu
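
The starting point above can be written down as a small preset for scripted or documented setups. This is only a sketch: the dictionary keys are illustrative labels, not the nodes' actual widget names, and with_overrides is a hypothetical helper.

```python
# Key names are illustrative labels, not the nodes' actual widget names.
LOW_VRAM_PRESET = {
    "dit_weight_mode": "official_int8_sharded",
    "resolution": "480p",
    "block_num": 1,
    "sampler_offload_device": "cpu",
    "text_encode_offload_device": "cpu",
}


def with_overrides(preset: dict, **overrides) -> dict:
    """Return a copy of `preset` with selected values replaced."""
    unknown = set(overrides) - set(preset)
    if unknown:
        # Reject typos instead of silently adding new keys.
        raise KeyError(f"unknown setting(s): {sorted(unknown)}")
    return {**preset, **overrides}
```

Changing one value at a time, for example with_overrides(LOW_VRAM_PRESET, resolution="720p"), makes it easier to see which setting pushes a given GPU past its memory limit.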

Nodes Included

  • (auto)Load LongCat Avatar Model
  • LongCat Avatar Whisper
  • LongCat Avatar Text Encode
  • LongCat Avatar Audio Crop
  • LongCat Avatar Audio Encode
  • LongCat Avatar Audio Window
  • LongCat Avatar Sampler
  • LongCat Avatar Vocal Model
  • LongCat Avatar Vocal Extract

Optional Features

  • Optional vocal extraction through requirements-vocal.txt
  • Optional acceleration packages documented separately in requirements-acceleration.txt
  • Audio crop preview support
  • Optional muxed MP4 output through mux_audio_path

Not Supported Yet

  • GGUF DiT loading
  • CPU inference
  • macOS / MPS inference
  • Avatar 1.0 / Wav2Vec2 runtime path
  • FP16 or FP8 runtime precision switches
  • Generic third-party wrapper scheduler modes

Notes

This release is CUDA-oriented and expects a working ComfyUI Python environment with a compatible NVIDIA GPU and CUDA PyTorch installation already available. The default requirements.txt intentionally avoids installing or replacing PyTorch and optional acceleration packages to reduce the chance of breaking an existing ComfyUI setup.
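
A quick way to confirm that the existing environment meets this expectation is to query PyTorch from the same Python interpreter that ComfyUI uses. The helper below is a minimal sketch and is not part of this package; cuda_status is a hypothetical name.

```python
def cuda_status() -> str:
    """Describe whether a CUDA-enabled PyTorch is usable in this interpreter."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "PyTorch is installed but CUDA is unavailable"
    # Report the first visible GPU.
    return f"CUDA OK: {torch.cuda.get_device_name(0)}"


if __name__ == "__main__":
    print(cuda_status())
```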