0 parents, commit 4cd9c39
Showing 1,755 changed files with 464,618 additions and 0 deletions.
@@ -0,0 +1,107 @@
# GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation

> **GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation** <br>
> Yinghao Xu*, Zifan Shi*, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, Gordon Wetzstein<br>
## [[Paper](https://arxiv.org/abs/2403.14621)] [[Project Page](https://justimyhxu.github.io/projects/grm)] [[Blender Demo](https://github.com/justimyhxu/GRM/assets/29980330/0cf713aa-ba87-4a15-a8ee-1b0da643cb3c)] [[HF Demo](https://huggingface.co/spaces/GRM-demo/GRM)] [[Weights](https://huggingface.co/justimyhxu/GRM/tree/main)]

https://github.com/justimyhxu/GRM/assets/29980330/32f41f04-5ebe-4aa4-b1b7-bf4f78e5f197

## Todo List
- [x] Release gradio demo code.
- [x] Release inference code.
- [x] Release pretrained models.
- [ ] Release training code.

## GRM Demo
* [Hugging Face Demo](https://huggingface.co/spaces/GRM-demo/GRM)
* [Replicate Demo](https://replicate.com/camenduru/grm). Thanks to [@camenduru](https://github.com/camenduru) for the [jupyter code](https://github.com/camenduru/GRM-jupyter)!

## Requirements
* 64-bit Python 3.10 and PyTorch 2.0.1 or higher.
* CUDA 11.8
* Install the packages with the following commands:
```bash
conda create -n grm python=3.10
conda activate grm
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu118
cd third_party/diff-gaussian-rasterization && pip install -e .
```
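
Before building the rasterizer, it can help to confirm that the interpreter and PyTorch build match the versions listed above. The following is a minimal sketch, not part of the repository: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and the comparison assumes plain dotted version strings with an optional local suffix such as `+cu118`.

```python
import re
import sys


def parse_version(text):
    """Turn '2.0.1+cu118' into (2, 0, 1); the local suffix is ignored."""
    core = text.split("+", 1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", core)[:3])


def meets_minimum(found, required):
    """Return True if version string `found` is at least `required`."""
    return parse_version(found) >= parse_version(required)


def check_environment():
    """Report whether Python and (if installed) PyTorch match the requirements."""
    report = {"python_ok": sys.version_info[:2] >= (3, 10)}
    try:
        import torch  # optional: only inspected when it is installed
    except ImportError:
        report["torch_ok"] = None
    else:
        report["torch_ok"] = meets_minimum(torch.__version__, "2.0.1")
        report["cuda_build"] = torch.version.cuda  # CUDA version string, or None for CPU builds
    return report


if __name__ == "__main__":
    print(check_environment())
```

Running the script inside the `grm` conda environment prints a small dictionary; a `torch_ok` value of `None` means PyTorch is not installed yet.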
## Pretrained weights
Pretrained weights can be downloaded from [Hugging Face](https://huggingface.co/justimyhxu/GRM/tree/main).
```bash
# Example: use the /resolve/ URL so wget fetches the file rather than the HTML viewer page
mkdir checkpoints && cd checkpoints
wget https://huggingface.co/justimyhxu/GRM/resolve/main/grm_u.pth && cd ..
```
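
On the Hugging Face Hub, `/blob/` URLs point to the HTML file viewer, while `/resolve/` URLs return the file itself, so a direct download needs the latter. As a rough sketch (the helper name `to_resolve_url` is hypothetical and not part of the repository), a viewer link can be converted like this:

```python
from urllib.parse import urlsplit, urlunsplit


def to_resolve_url(url):
    """Rewrite a Hub file-viewer URL (/blob/) into a direct-download URL (/resolve/)."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Expected path layout: /<owner>/<repo>/blob/<revision>/<file path>
    if len(segments) > 3 and segments[3] == "blob":
        segments[3] = "resolve"
    return urlunsplit(parts._replace(path="/".join(segments)))


if __name__ == "__main__":
    print(to_resolve_url("https://huggingface.co/justimyhxu/GRM/blob/main/grm_u.pth"))
```

URLs that already use `/resolve/` pass through unchanged, so the helper is safe to apply to any of the checkpoint links.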

We provide three GRM checkpoints, plus a reproduced Instant3D first-stage diffusion model. We use the OpenCV coordinate system.

| Checkpoint | Training settings |
| ---------- | ----------------- |
| [grm_u.pth](https://huggingface.co/justimyhxu/GRM/blob/main/grm_u.pth) | All elevations are 20 degrees and the azimuths uniformly cover the full 360 degrees. |
| [grm_r.pth](https://huggingface.co/justimyhxu/GRM/blob/main/grm_r.pth) | The azimuths roughly cover the full 360 degrees. |
| [grm_zero123plus.pth](https://huggingface.co/justimyhxu/GRM/blob/main/grm_zero123plus.pth) | Three views have 30-degree elevations, with azimuths evenly spaced at 120-degree intervals. A fourth view has an elevation of -20 degrees and an azimuth offset by 60 degrees from one of the three. |
| [instant3d.pth](https://huggingface.co/justimyhxu/GRM/resolve/main/instant3d.pth) | We reproduce the first-stage diffusion model of [instant3d](https://arxiv.org/pdf/2311.06214.pdf), which can produce consistent multi-view images. |

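
To make the `grm_u.pth` setting concrete, the sketch below builds camera-to-world matrices for cameras at a fixed 20-degree elevation with evenly spaced azimuths. It is an illustration under stated assumptions, not the repository's camera code: the function names are hypothetical, the world is assumed to be z-up, the number of views (4) and the orbit radius are placeholders, and the camera axes follow the OpenCV convention (x right, y down, z forward) mentioned above.

```python
import math


def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def orbit_c2w_opencv(elevation_deg, azimuth_deg, radius=1.0):
    """4x4 camera-to-world matrix for a camera on a sphere, looking at the origin.

    Assumptions (not taken from the README): world z is up, and the matrix
    columns are the camera's x (right), y (down), z (forward) axes plus its
    position. Undefined at elevations of +/-90 degrees.
    """
    el, az = math.radians(elevation_deg), math.radians(azimuth_deg)
    pos = [radius * math.cos(el) * math.cos(az),
           radius * math.cos(el) * math.sin(az),
           radius * math.sin(el)]
    forward = _normalize([-c for c in pos])               # camera z: toward the origin
    right = _normalize(_cross(forward, [0.0, 0.0, 1.0]))  # camera x
    down = _cross(forward, right)                         # camera y
    rows = [[right[i], down[i], forward[i], pos[i]] for i in range(3)]
    return rows + [[0.0, 0.0, 0.0, 1.0]]


def uniform_views(num_views, elevation_deg=20.0):
    """(elevation, azimuth) pairs with azimuths evenly covering 360 degrees."""
    step = 360.0 / num_views
    return [(elevation_deg, i * step) for i in range(num_views)]


if __name__ == "__main__":
    # Four views is an assumed count, chosen only for illustration.
    for elevation, azimuth in uniform_views(4):
        print(azimuth, orbit_c2w_opencv(elevation, azimuth, radius=2.0)[0])
```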
You also need to download the checkpoint for [SV3D](https://huggingface.co/stabilityai/sv3d/tree/main).
```bash
cd checkpoints
wget https://huggingface.co/stabilityai/sv3d/resolve/main/sv3d_p.safetensors && cd ..
```

## Inference
```bash
# text-to-3D
python test.py --prompt 'a car made out of cheese'
# image-to-3D with zero123plus-v1.1
python test.py --image_path examples/dragon2.png --model zero123plus-v1.1
# image-to-3D with zero123plus-v1.2
python test.py --image_path examples/dragon2.png --model zero123plus-v1.2
# image-to-3D with SV3D
python test.py --image_path examples/dragon2.png --model sv3d
```

Add `--fuse_mesh True` to export a textured mesh.
Add `--optimize_texture True` to optimize the texture on the extracted textured mesh.

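
When running many inputs in a batch, the flags above can be assembled programmatically. The helper below is a hypothetical sketch (`build_test_command` is not part of the repository); it uses only the flags shown in this README and assumes that a run takes either a text prompt or an image path, not both.

```python
import shlex


def build_test_command(prompt=None, image_path=None, model=None,
                       fuse_mesh=False, optimize_texture=False):
    """Assemble the argument list for `python test.py` from the README flags."""
    # Assumption: exactly one of the two input modes is used per run.
    if (prompt is None) == (image_path is None):
        raise ValueError("pass exactly one of prompt or image_path")
    cmd = ["python", "test.py"]
    if prompt is not None:
        cmd += ["--prompt", prompt]
    else:
        cmd += ["--image_path", image_path]
    if model is not None:
        cmd += ["--model", model]
    if fuse_mesh:
        cmd += ["--fuse_mesh", "True"]
    if optimize_texture:
        cmd += ["--optimize_texture", "True"]
    return cmd


if __name__ == "__main__":
    cmd = build_test_command(image_path="examples/dragon2.png",
                             model="sv3d", fuse_mesh=True)
    print(shlex.join(cmd))  # a shell-quoted string for logging
```

Returning a list rather than a string lets the result be passed directly to `subprocess.run(cmd, check=True)` without any shell quoting concerns.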
## Gradio Demo
We provide an offline Gradio demo, which can be run with the following command:
```bash
python app.py
```

## Results

### Blender Demo
https://github.com/justimyhxu/GRM/assets/29980330/0cf713aa-ba87-4a15-a8ee-1b0da643cb3c

### Sparse-view Reconstruction
https://github.com/justimyhxu/GRM/assets/29980330/d436bca9-ddf9-4507-aed3-828fd6508ec3

## Acknowledgement
We thank the authors of the following codebases:
- [gaussian-splatting](https://github.com/graphdeco-inria/gaussian-splatting) and [diff-gaussian-rasterization](https://github.com/ashawkey/diff-gaussian-rasterization) for depth rendering
- [ARF](https://github.com/Kai-46/ARF-svox2)
- [zero123++](https://github.com/SUDO-AI-3D/zero123plus)
- [Instant3D](https://instant-3d.github.io/)
- [SV3D](https://github.com/Stability-AI/generative-models)
- [V3D](https://github.com/heheyas/V3D)
- [nvdiffrast](https://github.com/NVlabs/nvdiffrast)
- [MVEdit](https://github.com/Lakonik/MVEdit)

## BibTeX

```bibtex
@article{xu2024grm,
  author  = {Xu, Yinghao and Shi, Zifan and Yifan, Wang and Chen, Hansheng and Yang, Ceyuan and Peng, Sida and Shen, Yujun and Wetzstein, Gordon},
  title   = {GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation},
  journal = {arXiv preprint arXiv:2403.14621},
  year    = {2024},
}
```
@@ -0,0 +1,84 @@
import os
import sys

# Add this file's directory to the import path, and cap OpenMP threads
# unless the user has already set OMP_NUM_THREADS.
sys.path.append(os.path.abspath(os.path.join(__file__, '../')))
if 'OMP_NUM_THREADS' not in os.environ:
    os.environ['OMP_NUM_THREADS'] = '16'

import shutil
import os.path as osp
import argparse
import torch
import gradio as gr
from functools import partial
from webui.tab_text_to_img_to_3d import create_interface_text_to_img_to_3d
from webui.tab_img_to_3d import create_interface_img_to_3d
from webui.tab_instant3d import create_interface_instant3d
from webui.runner import GRMRunner
from webui.shared_opts import send_to_click


def parse_args():
    parser = argparse.ArgumentParser(description='GRM Live Demo')
    parser.add_argument('--advanced', action='store_true', help='Show advanced settings')
    return parser.parse_args()


def main():
    args = parse_args()

    # Inference only: disable autograd globally.
    torch.set_grad_enabled(False)
    device = torch.device('cuda')
    runner = GRMRunner(device)

    with gr.Blocks(analytics_enabled=False,
                   title='GRM Live Demo',
                   css='webui/style.css'
                   ) as demo:
        md_txt = '# GRM Live Demo' \
                 '\n\nOfficial demo of the paper [GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation](https://justimyhxu.github.io/projects/grm/). ' \
                 'Part of this demo is based on [MVEdit Web UI](https://huggingface.co/spaces/Lakonik/MVEdit).' \
                 '<br>GRM can reconstruct 3D Gaussians and meshes from various sources, including **Zero123++**, **Instant3D**, **V3D**, **SV3D**. To save VRAM, this demo only supports **Zero123++** and **Instant3D**, while full support will be available in the official [code release](https://github.com/justimyhxu/grm).'
        gr.Markdown(md_txt)

        with gr.Tabs() as main_tabs:

            with gr.TabItem('Image-to-3D', id='tab_img_to_3d'):
                _, var_img_to_3d = create_interface_img_to_3d(
                    runner.run_segmentation,
                    runner.run_img_to_3d)

            with gr.TabItem('Text-to-3D', id='tab_text_to_3d'):
                with gr.Tabs() as sub_tabs_text_to_3d:
                    with gr.TabItem('Instant3D', id='tab_instant3d'):
                        _, var_instant3d = create_interface_instant3d(
                            runner.run_instant3d,
                            examples=[
                                'a wooden carving of a wise old turtle',
                                'a glowing robotic unicorn, full body',
                                'a ceramic mug shaped like a smiling cat',
                                'a car made out of cheese',
                                'a beagle in a detective’s outfit',
                            ])
                    with gr.TabItem('Text-to-Image-to-3D', id='tab_text_to_img_to_3d'):
                        _, var_text_to_img_to_3d = create_interface_text_to_img_to_3d(
                            runner.run_text_to_img,
                            examples=[
                                [768, 512, 'a wooden carving of a wise old turtle', ''],
                                [512, 512, 'a glowing robotic unicorn, full body', ''],
                                [512, 512, 'a ceramic mug shaped like a smiling cat', ''],
                            ],
                            advanced=args.advanced)

        # Send the generated image to the Image-to-3D tab and switch to it.
        var_text_to_img_to_3d['to_img_to_3d'].click(
            fn=partial(send_to_click, target_tab_ids=['tab_img_to_3d']),
            inputs=[var_text_to_img_to_3d['output_image']],
            outputs=[var_img_to_3d['in_image'], main_tabs],
            api_name=False
        )

    demo.queue().launch(share=False)


if __name__ == "__main__":
    main()
@@ -0,0 +1,65 @@
import torch
from torch import nn

from model.visual_encoder.vit_gs import ViTGSEncoder
from model.render.gaussian_renderer import GaussianRenderer

class GRM(nn.Module):
    def __init__(self, config):

        super().__init__()

        self.gs_renderer = GaussianRenderer(
            renderer_config=config.render.params
        )

        self.visual_encoder = ViTGSEncoder(
            **config.visual.params,
        )

        # Default number of input views when the caller does not pass one.
        self.num_input_views = config.visual.params.get("num_input_views", 1)

    def forward_visual(self, x, camera=None, input_c2ws=None, input_fxfycxcy=None):
        features = self.visual_encoder(x, camera, input_c2ws=input_c2ws, input_fxfycxcy=input_fxfycxcy)
        latent, img_features = features
        # The third value (posterior) is always None here.
        return latent, img_features, None

    def forward(
        self,
        imgs,
        camera: torch.Tensor=None,
        num_input_views=None,
        input_c2ws=None,
        input_fxfycxcy=None,
        output_c2ws=None,
        output_fxfycxcy=None
    ):

        # Fall back to the configured view count, capped by the views available.
        num_input_views = num_input_views or self.num_input_views
        num_input_views = min(num_input_views, imgs.shape[1])

        if num_input_views == 1:
            # Single view: drop the view dimension.
            imgs = imgs[:, 0]
            camera = camera[:, 0]
        else:
            # Multiple views: keep the first num_input_views along the view dimension.
            imgs = imgs[:, :num_input_views]
            camera = camera[:, :num_input_views]
            input_c2ws = input_c2ws[:, :num_input_views]
            input_fxfycxcy = input_fxfycxcy[:, :num_input_views]

        latent, _, posterior = self.forward_visual(imgs, camera, input_c2ws=input_c2ws, input_fxfycxcy=input_fxfycxcy)

        result = {"latent": latent, "posterior": posterior}

        # Render the predicted Gaussians from the requested output cameras.
        gs_result = self.gs_renderer.render(latent=latent,
                                            output_c2ws=output_c2ws,
                                            output_fxfycxcy=output_fxfycxcy)
        result.update(gs_result)

        return result
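
The view-count logic at the top of `GRM.forward` can be isolated as a small pure-Python function. This is an illustrative sketch only; `resolve_num_views` is a hypothetical helper, not part of the repository. It also makes one consequence of the `or` idiom visible: an explicit `0` is treated like `None` and falls back to the default.

```python
def resolve_num_views(requested, default, available):
    """Mirror `requested or default`, capped by the number of views in the batch."""
    chosen = requested or default  # None (or 0) falls back to the default
    return min(chosen, available)


if __name__ == "__main__":
    print(resolve_num_views(None, 4, 6))  # default is used
    print(resolve_num_views(8, 4, 6))     # capped by the available views
```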