Xiangyang Luo<sup>1,2</sup>\*, Xiaozhe Xin<sup>2</sup>\*✉, Tao Feng<sup>1</sup>, Xu Guo<sup>1</sup>, Meiguang Jin<sup>2</sup>, Junfeng Ma<sup>2</sup>

<sup>1</sup> Tsinghua University  <sup>2</sup> Alibaba Group

\* Equal contribution  ✉ Corresponding author
[Demo video](demo.mp4)
| Stage | Status | Description |
|---|---|---|
| 1 | 🔜 | Release inference code and model weights (within one week) |
| 2 | 🔜 | Release training code |
| 3 | 📋 | Add pose control support |
CoInteract enables high-quality, speech-driven human-object interaction (HOI) video synthesis with fine-grained spatial control. It supports multiple generation modes, including standard video generation, unified generation, and interactive generation.
Key contributions:
- Human-Aware Mixture-of-Experts (MoE): a spatial routing mechanism that dynamically dispatches tokens to specialized expert networks (a hand expert and a face expert). Routing is supervised by ground-truth bounding boxes during training and runs fully automatically at inference; see the routing sketch after this list.
- Spatially-Structured Co-Generation: joint training on RGB video and HOI depth maps provides structural guidance for physically realistic interactions, with no depth input required at inference time; a minimal co-generation sketch follows the routing example below.
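To make the routing idea concrete, here is a minimal PyTorch sketch of box-driven spatial token dispatch: tokens whose grid cells fall inside hand/face boxes go to dedicated experts, everything else to a shared base expert. All names (`boxes_to_token_mask`, `HumanAwareMoE`, the single-`Linear` experts) are illustrative assumptions, not the released implementation.

```python
# Sketch: bounding-box-driven spatial token routing on a patchified grid.
# At training time the boxes are ground truth; at inference they would come
# from an automatic detector/router instead.
import torch
import torch.nn as nn


def boxes_to_token_mask(boxes, grid_h, grid_w):
    """Rasterize normalized (x0, y0, x1, y1) boxes onto a (grid_h, grid_w)
    token grid; returns a flat boolean mask of shape (grid_h * grid_w,)."""
    mask = torch.zeros(grid_h, grid_w, dtype=torch.bool)
    for x0, y0, x1, y1 in boxes:
        r0, r1 = int(y0 * grid_h), max(int(y1 * grid_h), int(y0 * grid_h) + 1)
        c0, c1 = int(x0 * grid_w), max(int(x1 * grid_w), int(x0 * grid_w) + 1)
        mask[r0:r1, c0:c1] = True
    return mask.flatten()


class HumanAwareMoE(nn.Module):
    """Dispatches hand/face-region tokens to specialized experts and the
    remaining tokens to a base expert, then scatters the outputs back."""

    def __init__(self, dim):
        super().__init__()
        self.base_expert = nn.Linear(dim, dim)
        self.hand_expert = nn.Linear(dim, dim)
        self.face_expert = nn.Linear(dim, dim)

    def forward(self, tokens, hand_mask, face_mask):
        # tokens: (num_tokens, dim); masks: (num_tokens,) boolean
        out = self.base_expert(tokens)
        out[hand_mask] = self.hand_expert(tokens[hand_mask])
        out[face_mask] = self.face_expert(tokens[face_mask])
        return out


if __name__ == "__main__":
    dim, gh, gw = 64, 16, 16
    tokens = torch.randn(gh * gw, dim)
    hand_mask = boxes_to_token_mask([(0.1, 0.6, 0.3, 0.8)], gh, gw)
    face_mask = boxes_to_token_mask([(0.4, 0.1, 0.6, 0.3)], gh, gw)
    moe = HumanAwareMoE(dim)
    print(moe(tokens, hand_mask, face_mask).shape)  # torch.Size([256, 64])
```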
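And a minimal sketch of the co-generation training setup: one shared backbone with an RGB head and an HOI-depth head, where the depth branch is only supervised during training and simply skipped at inference, so no depth input is ever needed. `CoGenHead`, `training_step`, and `depth_weight` are hypothetical names for illustration.

```python
# Sketch: joint RGB + HOI-depth co-generation from a shared backbone.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoGenHead(nn.Module):
    def __init__(self, dim, channels=4):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(dim, dim), nn.GELU())
        self.rgb_head = nn.Linear(dim, channels)  # RGB (latent) prediction
        self.depth_head = nn.Linear(dim, 1)       # HOI depth prediction

    def forward(self, feats, with_depth=True):
        h = self.backbone(feats)
        rgb = self.rgb_head(h)
        depth = self.depth_head(h) if with_depth else None
        return rgb, depth


def training_step(model, feats, rgb_gt, depth_gt, depth_weight=0.5):
    rgb, depth = model(feats, with_depth=True)
    # The depth term supervises scene structure (contacts, occlusion),
    # steering the shared backbone toward physically plausible interactions.
    return F.mse_loss(rgb, rgb_gt) + depth_weight * F.mse_loss(depth, depth_gt)


if __name__ == "__main__":
    model = CoGenHead(dim=64)
    feats = torch.randn(256, 64)
    loss = training_step(model, feats, torch.randn(256, 4), torch.randn(256, 1))
    loss.backward()
    rgb, _ = model(feats, with_depth=False)  # inference: depth branch skipped
```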
```bibtex
@misc{luo2026cointeractphysicallyconsistenthumanobjectinteraction,
      title={CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation},
      author={Xiangyang Luo and Xiaozhe Xin and Tao Feng and Xu Guo and Meiguang Jin and Junfeng Ma},
      year={2026},
      eprint={2604.19636},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2604.19636},
}
```
