Multi-view Consistent 3D Gaussian Head Avatars 'without' Multi-view Generation

Carnegie Mellon University

CVPR 2026

MVCHead achieves state-of-the-art results in unconditional generation of high-fidelity, multi-view consistent 3D Gaussian head avatars in a minimal-resource setting, without requiring intermediate views or any 3D data. The generated Gaussian heads capture complex textures and fine facial micro-structure, including wrinkles, hair wisps, ear rims, lip contours, skin blemishes, eyes, and accessories.

🚀 Updates

✅ Coming soon: MVCHead code, weights, and the FaceGS-10K dataset. Stay tuned!
✅ June 7, 2026: We will present our poster at CVPR 2026. Stop by to check it out. See everyone in Denver!
✅ May 25, 2026: The MVCHead project page is now live!
✅ May 25, 2026: We released the MVCHead paper on arXiv. Check out the preprint!

📖 Abstract

High-fidelity 3D Gaussian head avatar generation is critical for applications such as AR/VR, telepresence, and digital humans. Existing methods depend on multi-view datasets, 3D captures, or intermediate 2D view synthesis. In contrast, we learn both conditional and unconditional 3D head models from randomly sampled 2D images alone, without using multi-view data, 3D supervision, or intermediate view generation. We introduce MVCHead, a single-shot state space model that enforces multi-view consistency (MVC) directly in the 3D representation while regressing 3D Gaussians under these constraints. At its core, we propose a Hierarchical State Space (HiSS) block that progressively refines Gaussians from coarse to fine, while capturing long-range dependencies. Within each HiSS block, we modify Mamba's standard unidirectional scan with the proposed Hierarchical Bi-directional State Scan (HiBiSS) that aligns recurrence with the axes along which multi-view inconsistencies are strongest. Finally, we design an SE(3) Multi-view Critic that judges whether a set of self-renders arises from a single underlying 3D configuration, rewarding cross-view pixel alignment without observing real multi-view pairs. MVCHead achieves state-of-the-art perceptual quality, surpasses prior methods in both texture and geometric consistency, and maintains comparable shape consistency. To demonstrate scalability, we release FaceGS-10K, the first large-scale dataset of ready-to-use 3D Gaussian head assets for training and evaluation of 3D head models.
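To give a feel for what "bi-directional state scan" means in general, here is a minimal NumPy sketch of a bidirectional linear state-space recurrence. It is a generic illustration only, not the HiBiSS block or any released MVCHead code: the function names `linear_scan` and `bidirectional_scan`, the scalar parameters `a`, `b`, `c`, and the choice to sum the two directions are all assumptions made for this example. Mamba itself uses input-dependent (selective) parameters, which this sketch replaces with fixed scalars for brevity.

```python
import numpy as np

def linear_scan(x, a, b, c):
    """Run h_t = a * h_{t-1} + b * x_t and y_t = c * h_t along axis 0 of x."""
    h = np.zeros_like(x[0], dtype=float)   # hidden state, one value per feature
    ys = np.empty(x.shape, dtype=float)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]               # state update (assumed fixed scalars)
        ys[t] = c * h                      # readout
    return ys

def bidirectional_scan(x, a=0.9, b=1.0, c=1.0):
    """Sum a forward scan and a backward scan over the same sequence."""
    forward = linear_scan(x, a, b, c)
    backward = linear_scan(x[::-1], a, b, c)[::-1]  # scan reversed, then flip back
    return forward + backward

# Toy sequence: 4 tokens with 2 features each (hypothetical data).
tokens = np.arange(8, dtype=float).reshape(4, 2)
out = bidirectional_scan(tokens)
```

With a unidirectional scan, each token only sees the tokens before it; summing the reversed scan lets every position aggregate context from both sides of the sequence.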

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Permission is granted for non-commercial research use. For commercial use, please reach out to our lab.

Acknowledgements

Parts of the code are adapted from the repositories below. Please acknowledge and adhere to the licenses of each repository that MVCHead builds upon.

📑 Citation

If you find our work useful for your project, please consider adding a star to this repo and citing our paper:

    @inproceedings{chharia2026multiviewconsistent,
        title={Multi-view Consistent 3D Gaussian Head Avatars 'without' Multi-view Generation},
        author={Aviral Chharia and Fernando De la Torre},
        archivePrefix={arXiv},
        primaryClass={cs.CV},
        year={2026}
    }
