MVCHead achieves state-of-the-art unconditional generation of high-fidelity, multi-view consistent 3D Gaussian head avatars in a minimal-resource setting, without requiring intermediate views or any 3D data. The generated Gaussian heads capture complex textures and fine facial micro-structure, including wrinkles, hair wisps, ear rims, lip contours, skin blemishes, eyes, and accessories.
- ✅ Coming Soon: MVCHead code, weights, and the FaceGS-10K dataset. Stay tuned!
- ✅ June 7, 2026: We will be presenting our poster at CVPR 2026. Stop by to check it out. See everyone in Denver!
- ✅ May 25, 2026: The MVCHead project page is now live!
- ✅ May 25, 2026: We released the MVCHead paper on arXiv. Check out the preprint!
High-fidelity 3D Gaussian head avatar generation is critical for applications such as AR/VR, telepresence, and digital humans. Existing methods depend on multi-view datasets, 3D captures, or intermediate 2D view synthesis. In contrast, we learn both conditional and unconditional 3D head models from randomly sampled 2D images alone, without using multi-view data, 3D supervision, or intermediate view generation. We introduce MVCHead, a single-shot state space model that enforces multi-view consistency (MVC) directly in the 3D representation while regressing 3D Gaussians under these constraints. At its core, we propose a Hierarchical State Space (HiSS) block that progressively refines Gaussians from coarse to fine, while capturing long-range dependencies. Within each HiSS block, we modify Mamba's standard unidirectional scan with the proposed Hierarchical Bi-directional State Scan (HiBiSS) that aligns recurrence with the axes along which multi-view inconsistencies are strongest. Finally, we design an SE(3) Multi-view Critic that judges whether a set of self-renders arises from a single underlying 3D configuration, rewarding cross-view pixel alignment without observing real multi-view pairs. MVCHead achieves state-of-the-art perceptual quality, surpasses prior methods in both texture and geometric consistency, and maintains comparable shape consistency. To demonstrate scalability, we release FaceGS-10K, the first large-scale dataset of ready-to-use 3D Gaussian head assets for training and evaluation of 3D head models.
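To make the difference between a unidirectional and a bi-directional state scan concrete, here is a minimal toy sketch in plain Python. It is not the HiBiSS implementation: the scalar recurrence, the function names `linear_scan` and `bidirectional_scan`, the fixed parameters `a` and `b`, and the choice to merge the two directions by summation are all assumptions made purely for illustration.

```python
def linear_scan(xs, a=0.5, b=1.0):
    """Unidirectional scan: h_t = a * h_{t-1} + b * x_t (hypothetical scalar recurrence)."""
    h, out = 0.0, []
    for x in xs:
        h = a * h + b * x
        out.append(h)
    return out


def bidirectional_scan(xs, a=0.5, b=1.0):
    """Run the same recurrence left-to-right and right-to-left, then sum per position.

    Summation is an assumed merge rule; other merges (concatenation, gating) are possible.
    """
    forward = linear_scan(xs, a, b)
    # Scan the reversed sequence, then flip the result back to the original order.
    backward = linear_scan(xs[::-1], a, b)[::-1]
    return [f + r for f, r in zip(forward, backward)]


if __name__ == "__main__":
    # An impulse at the first position only influences later positions in a
    # forward scan; the backward pass lets every position see later inputs too.
    print(linear_scan([0.0, 0.0, 1.0]))
    print(bidirectional_scan([0.0, 0.0, 1.0]))
```

In the forward-only scan, an input at the last position cannot affect earlier outputs, whereas the bi-directional variant propagates it to every position. This is the general motivation for scanning in both directions; how MVCHead orders and aligns its scans is described in the paper.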
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Permission is granted for non-commercial research use. For commercial use, please reach out to our lab.
Parts of the code have been adapted from the repositories below. Please acknowledge and adhere to the license of each repository that MVCHead builds upon.
- Mamba
- VMamba
- VisionMamba
- FFHQ
- EG3D
- GSGAN
- CGSGAN
- GGHead
- 3DGS
- Diff. 3DGS Rasterizer
- MASt3R
- DINO
- FeatUp
- MEt3R
- MVGBench
If you find our work useful for your project, please consider adding a star to this repo and citing our paper:
@inproceedings{chharia2026multiviewconsistent,
title={Multi-view Consistent 3D Gaussian Head Avatars 'without' Multi-view Generation},
author={Aviral Chharia and Fernando De la Torre},
archivePrefix={arXiv},
primaryClass={cs.CV},
year={2026}
} 