init code

justimyhxu · Apr 2, 2024 · commit 4cd9c39

Showing 1,755 changed files with 464,618 additions and 0 deletions.
diff --git a/Readme.md b/Readme.md
@@ -0,0 +1,107 @@
+# GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation
+
+
+
+> **GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation** <br>
+> Yinghao Xu*, Zifan Shi*, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, Gordon Wetzstein<br>
+
+## [[Paper](https://arxiv.org/abs/2403.14621)] [[Project Page](https://justimyhxu.github.io/projects/grm)] [[Blender Demo](https://github.com/justimyhxu/GRM/assets/29980330/0cf713aa-ba87-4a15-a8ee-1b0da643cb3c)] [[HF Demo](https://huggingface.co/spaces/GRM-demo/GRM)] [[Weights](https://huggingface.co/justimyhxu/GRM/tree/main)]
+
+https://github.com/justimyhxu/GRM/assets/29980330/32f41f04-5ebe-4aa4-b1b7-bf4f78e5f197
+
+## Todo List
+- [x] Release gradio demo code.
+- [x] Release inference code.
+- [x] Release pretrained models.
+- [ ] Release training code.
+
+## GRM Demo
+* [Huggingface Demo](https://huggingface.co/spaces/GRM-demo/GRM)
+* [Replicate Demo](https://replicate.com/camenduru/grm). Thanks to [@camenduru](https://github.com/camenduru) for the [Jupyter code](https://github.com/camenduru/GRM-jupyter)!
+
+## Requirements
+* 64-bit Python 3.10 and PyTorch 2.0.1 or higher.
+* CUDA 11.8 
+* Install the required packages with the following commands:
+```bash
+conda create -n grm python=3.10
+conda activate grm
+pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu118
+cd third_party/diff-gaussian-rasterization && pip install -e . && cd ../..
+```
+## Pretrained weights
+Pretrained weights can be downloaded from [Hugging Face](https://huggingface.co/justimyhxu/GRM/tree/main).
+```bash
+# Example
+mkdir checkpoints && cd checkpoints
+wget https://huggingface.co/justimyhxu/GRM/resolve/main/grm_u.pth && cd ..
+```
+
+We provide the checkpoints listed below: three GRM reconstruction models and one reproduced Instant3D multi-view diffusion model. GRM uses the OpenCV camera coordinate system.
+
+| Checkpoint | Training settings |
+| ---------- | ----------------- |
+| [grm_u.pth](https://huggingface.co/justimyhxu/GRM/blob/main/grm_u.pth)  | All views have a 20-degree elevation, and the azimuths are uniformly distributed over the full 360 degrees. |
+| [grm_r.pth](https://huggingface.co/justimyhxu/GRM/blob/main/grm_r.pth)  | The azimuths roughly cover the full 360 degrees. |
+| [grm_zero123plus.pth](https://huggingface.co/justimyhxu/GRM/blob/main/grm_zero123plus.pth) | Three views have a 30-degree elevation, with azimuths spaced evenly at 120-degree intervals. A fourth view has a -20-degree elevation and an azimuth offset by 60 degrees from one of the other three. |
+| [instant3d.pth](https://huggingface.co/justimyhxu/GRM/resolve/main/instant3d.pth) | Our reproduction of the first-stage diffusion model of [Instant3D](https://arxiv.org/pdf/2311.06214.pdf), which generates consistent multi-view images. |
+
+
+You also need to download the [SV3D](https://huggingface.co/stabilityai/sv3d/tree/main) checkpoint:
+```bash
+cd checkpoints
+wget https://huggingface.co/stabilityai/sv3d/resolve/main/sv3d_p.safetensors && cd ..
+```
+
+
+## Inference
+```bash
+# text-to-3D
+python test.py --prompt 'a car made out of cheese'
+# image-to-3D with zero123plus-v1.1
+python test.py --image_path examples/dragon2.png --model zero123plus-v1.1
+# image-to-3D with zero123plus-v1.2
+python test.py --image_path examples/dragon2.png --model zero123plus-v1.2
+# image-to-3D with SV3D
+python test.py --image_path examples/dragon2.png --model sv3d
+```
+
+Add `--fuse_mesh True` to obtain a textured mesh.
+Add `--optimize_texture True` to further optimize the texture of the extracted mesh.
+
+## Gradio Demo
+We provide a local Gradio demo, which can be launched with:
+```bash
+python app.py  # add --advanced to show advanced settings
+```
+
+## Results
+
+### Blender Demo
+https://github.com/justimyhxu/GRM/assets/29980330/0cf713aa-ba87-4a15-a8ee-1b0da643cb3c
+
+### Sparse-view Reconstruction
+https://github.com/justimyhxu/GRM/assets/29980330/d436bca9-ddf9-4507-aed3-828fd6508ec3
+
+
+## Acknowledgement
+We thank the authors of the following excellent open-source projects:
+- [gaussian-splatting](https://github.com/graphdeco-inria/gaussian-splatting) and [diff-gaussian-rasterization](https://github.com/ashawkey/diff-gaussian-rasterization), used for depth rendering
+- [ARF](https://github.com/Kai-46/ARF-svox2)
+- [zero123++](https://github.com/SUDO-AI-3D/zero123plus)
+- [Instant3D](https://instant-3d.github.io/)
+- [SV3D](https://github.com/Stability-AI/generative-models)
+- [V3D](https://github.com/heheyas/V3D)
+- [nvdiffrast](https://github.com/NVlabs/nvdiffrast)
+- [MVEdit](https://github.com/Lakonik/MVEdit)
+
+## BibTeX
+
+```bibtex
+@article{xu2024grm,
+     author    = {Xu, Yinghao and Shi, Zifan and Yifan, Wang and Chen, Hansheng and Yang, Ceyuan and Peng, Sida and Shen, Yujun and Wetzstein, Gordon},
+     title     = {GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation},
+     journal   = {arXiv preprint arXiv:2403.14621},
+     year      = {2024},
+}
+```
diff --git a/app.py b/app.py
@@ -0,0 +1,84 @@
+import os
+import sys
+
+sys.path.append(os.path.abspath(os.path.join(__file__, '../')))
+if 'OMP_NUM_THREADS' not in os.environ:
+    os.environ['OMP_NUM_THREADS'] = '16'
+
+import shutil
+import os.path as osp
+import argparse
+import torch
+import gradio as gr
+from functools import partial
+from webui.tab_text_to_img_to_3d import create_interface_text_to_img_to_3d
+from webui.tab_img_to_3d import create_interface_img_to_3d
+from webui.tab_instant3d import create_interface_instant3d
+from webui.runner import GRMRunner
+from webui.shared_opts import send_to_click
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(description='GRM Live Demo')
+    parser.add_argument('--advanced', action='store_true', help='Show advanced settings')
+    return parser.parse_args()
+
+
+def main():
+    args = parse_args()
+
+    torch.set_grad_enabled(False)
+    device = torch.device('cuda')
+    runner = GRMRunner(device)
+
+    with gr.Blocks(analytics_enabled=False,
+                   title='GRM Live Demo',
+                   css='webui/style.css'
+                   ) as demo:
+        md_txt = '# GRM Live Demo' \
+                 '\n\nOfficial demo of the paper [GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation](https://justimyhxu.github.io/projects/grm/). ' \
+                 'Part of this demo is based on [MVEdit Web UI](https://huggingface.co/spaces/Lakonik/MVEdit).' \
+                 '<br>GRM can reconstruct 3D Gaussians and meshes from various sources, including **Zero123++**, **Instant3D**, **V3D**, and **SV3D**. To save VRAM, this demo supports only **Zero123++** and **Instant3D**; full support will be available in the official [code release](https://github.com/justimyhxu/grm).'
+        gr.Markdown(md_txt)
+
+        with gr.Tabs() as main_tabs:
+
+            with gr.TabItem('Image-to-3D', id='tab_img_to_3d'):
+                _, var_img_to_3d = create_interface_img_to_3d(
+                    runner.run_segmentation,
+                    runner.run_img_to_3d)
+
+            with gr.TabItem('Text-to-3D', id='tab_text_to_3d'):
+                with gr.Tabs() as sub_tabs_text_to_3d:
+                    with gr.TabItem('Instant3D', id='tab_instant3d'):
+                        _, var_instant3d = create_interface_instant3d(
+                            runner.run_instant3d,
+                            examples=[
+                                'a wooden carving of a wise old turtle',
+                                'a glowing robotic unicorn, full body',
+                                'a ceramic mug shaped like a smiling cat',
+                                'a car made out of cheese',
+                                'a beagle in a detective’s outfit',
+                            ])
+                    with gr.TabItem('Text-to-Image-to-3D', id='tab_text_to_img_to_3d'):
+                        _, var_text_to_img_to_3d = create_interface_text_to_img_to_3d(
+                            runner.run_text_to_img,
+                            examples=[
+                                [768, 512, 'a wooden carving of a wise old turtle', ''],
+                                [512, 512, 'a glowing robotic unicorn, full body', ''],
+                                [512, 512, 'a ceramic mug shaped like a smiling cat', ''],
+                            ],
+                            advanced=args.advanced)
+
+        var_text_to_img_to_3d['to_img_to_3d'].click(
+            fn=partial(send_to_click, target_tab_ids=['tab_img_to_3d']),
+            inputs=[var_text_to_img_to_3d['output_image']],
+            outputs=[var_img_to_3d['in_image'], main_tabs],
+            api_name=False
+        )
+
+        demo.queue().launch(share=False)
+
+
+if __name__ == "__main__":
+    main()
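In `app.py`, the "to_img_to_3d" button is wired with `partial(send_to_click, target_tab_ids=['tab_img_to_3d'])`. The sketch below illustrates that pattern with a hypothetical stand-in, `send_to_click_stub`, because the real `send_to_click` in `webui/shared_opts.py` is not part of this excerpt; its signature and return values here are assumptions. Binding `target_tab_ids` ahead of time leaves a callable that receives only the values of the `inputs` components, and it must return one value per entry in `outputs` (here, the image input and the tab container).

```python
from functools import partial


def send_to_click_stub(image, target_tab_ids):
    """Hypothetical stand-in for webui.shared_opts.send_to_click.

    Returns one value per output component listed in app.py: the image for
    var_img_to_3d['in_image'], then a marker for the tab to select
    (real Gradio code would return a Tabs update here, not a dict).
    """
    return image, {"selected": target_tab_ids[0]}


# Bind the extra keyword up front, as app.py does, so the resulting callable
# takes exactly the values of the `inputs` components.
on_click = partial(send_to_click_stub, target_tab_ids=["tab_img_to_3d"])

image_out, tab_update = on_click("generated-image")
```

Using `partial` keeps `send_to_click` reusable: the same function can forward images to any tab just by binding a different `target_tab_ids` list.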
diff --git a/docs/assets/blender_demo.mp4 b/docs/assets/blender_demo.mp4
diff --git a/docs/assets/image-to-3d.mp4 b/docs/assets/image-to-3d.mp4
diff --git a/docs/assets/pipeline.png b/docs/assets/pipeline.png
diff --git a/docs/assets/sparse-view.mp4 b/docs/assets/sparse-view.mp4
diff --git a/docs/assets/text-to-3d.mp4 b/docs/assets/text-to-3d.mp4
diff --git a/examples/1.jpg b/examples/1.jpg
diff --git a/examples/17_dalle3_rockingchair1.png b/examples/17_dalle3_rockingchair1.png
diff --git a/examples/19_dalle3_stump1.png b/examples/19_dalle3_stump1.png
diff --git a/examples/astronaut.webp b/examples/astronaut.webp
diff --git a/examples/bag.jpg b/examples/bag.jpg
diff --git a/examples/bowl.png b/examples/bowl.png
diff --git a/examples/cdog.webp b/examples/cdog.webp
diff --git a/examples/coat.webp b/examples/coat.webp
diff --git a/examples/david.jpg b/examples/david.jpg
diff --git a/examples/david.png b/examples/david.png
diff --git a/examples/dragon2.png b/examples/dragon2.png
diff --git a/examples/dreamcraft3d_00.png b/examples/dreamcraft3d_00.png
diff --git a/examples/dreamcraft3d_01.png b/examples/dreamcraft3d_01.png
diff --git a/examples/dreamcraft3d_02.png b/examples/dreamcraft3d_02.png
diff --git a/examples/frog.png b/examples/frog.png
diff --git a/examples/girl.jpg b/examples/girl.jpg
diff --git a/examples/girl1_padded.png b/examples/girl1_padded.png
diff --git a/examples/girl2_copy.png b/examples/girl2_copy.png
diff --git a/examples/horse.jpg b/examples/horse.jpg
diff --git a/examples/horsing.webp b/examples/horsing.webp
diff --git a/examples/image.png b/examples/image.png
diff --git a/examples/ironman_helmet.png b/examples/ironman_helmet.png
diff --git a/examples/kunkun.webp b/examples/kunkun.webp
diff --git a/examples/panda.png b/examples/panda.png
diff --git a/examples/porsche.png b/examples/porsche.png
diff --git a/examples/pumpkin.png b/examples/pumpkin.png
diff --git a/examples/sculpture_0.png b/examples/sculpture_0.png
diff --git a/examples/sdog.webp b/examples/sdog.webp
diff --git a/examples/turtle.png b/examples/turtle.png
diff --git a/examples/unicorn.png b/examples/unicorn.png
diff --git a/examples/yann-lecun.jpg b/examples/yann-lecun.jpg
diff --git a/examples/zebra.png b/examples/zebra.png
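The images under `examples/` can be run through several image-to-3D backends using the `test.py` flags documented in the README's Inference section (`--image_path`, `--model`, `--fuse_mesh True`). Below is a small sketch that only builds the command lines; the helper name `build_commands` is hypothetical, and it assumes `test.py` is invoked from the repository root.

```python
import shlex

# --model values that appear in the README's Inference section.
MODELS = ("zero123plus-v1.1", "zero123plus-v1.2", "sv3d")


def build_commands(image_paths, models=MODELS, fuse_mesh=False):
    """Return one argv list per (image, model) pair for GRM's test.py."""
    commands = []
    for image in image_paths:
        for model in models:
            cmd = ["python", "test.py", "--image_path", str(image), "--model", model]
            if fuse_mesh:
                # README: add `--fuse_mesh True` to obtain a textured mesh.
                cmd += ["--fuse_mesh", "True"]
            commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in build_commands(["examples/dragon2.png", "examples/turtle.png"], fuse_mesh=True):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to actually run it; building argument lists rather than shell strings avoids quoting problems with unusual file names.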
diff --git a/model/__init__.py b/model/__init__.py
diff --git a/model/model.py b/model/model.py
@@ -0,0 +1,65 @@
+import torch
+from torch import nn
+
+from model.visual_encoder.vit_gs import ViTGSEncoder
+from model.render.gaussian_renderer import GaussianRenderer
+
+class GRM(nn.Module):
+    def __init__(self, config):
+
+        super().__init__()
+
+        self.gs_renderer = GaussianRenderer(
+            renderer_config=config.render.params
+        )
+
+        self.visual_encoder = ViTGSEncoder(
+            **config.visual.params,
+        )
+
+        self.num_input_views = config.visual.params.get("num_input_views", 1)
+
+
+    def forward_visual(self, x, camera=None, input_c2ws=None, input_fxfycxcy=None):
+        features = self.visual_encoder(x, camera, input_c2ws=input_c2ws, input_fxfycxcy=input_fxfycxcy)
+        latent, img_features = features
+        return latent, img_features, None 
+
+
+    def forward(
+            self,
+            imgs,
+            camera: torch.Tensor=None,
+            num_input_views=None,
+            input_c2ws=None, 
+            input_fxfycxcy=None,
+            output_c2ws=None,
+            output_fxfycxcy=None
+    ):
+
+        num_input_views = num_input_views or self.num_input_views
+        num_input_views = min(num_input_views, imgs.shape[1])
+
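+        if num_input_views == 1:
+            imgs = imgs[:, 0]
+            camera = camera[:, 0] if camera is not None else None
+        else:
+            imgs = imgs[:, :num_input_views]
+            camera = camera[:, :num_input_views] if camera is not None else None
+            input_c2ws = input_c2ws[:, :num_input_views] if input_c2ws is not None else None
+            input_fxfycxcy = input_fxfycxcy[:, :num_input_views] if input_fxfycxcy is not None else None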
+
+        latent, _, posterior = self.forward_visual(imgs, camera, input_c2ws=input_c2ws, input_fxfycxcy=input_fxfycxcy)
+
+        result = {"latent": latent, "posterior": posterior}
+
+        gs_result = self.gs_renderer.render(latent=latent,
+                                            output_c2ws=output_c2ws,
+                                            output_fxfycxcy=output_fxfycxcy)
+        result.update(gs_result)
+
+        return result
+
+
+
+
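`GRM.forward` decides how many input views to feed the encoder: an explicit `num_input_views` argument wins, otherwise the value from `config.visual.params` (default 1) is used, and the count is clamped to the number of views actually present on axis 1 of `imgs`. A single view drops the view axis (`imgs[:, 0]`), while more than one keeps it (`imgs[:, :n]`). The following shape-only sketch mirrors that logic in plain Python so it can be checked without PyTorch; `selected_input_shape` is a hypothetical helper, the example shapes are made up, and it relies only on axis 1 being the view axis.

```python
def selected_input_shape(imgs_shape, num_input_views=None, config_views=1):
    """Shape of `imgs` after GRM.forward's view selection (axis 1 = views)."""
    n = num_input_views or config_views  # explicit argument first, else the config value
    n = min(n, imgs_shape[1])            # clamp to the views actually provided
    batch, rest = imgs_shape[0], tuple(imgs_shape[2:])
    if n == 1:
        return (batch,) + rest           # imgs[:, 0] drops the view axis
    return (batch, n) + rest             # imgs[:, :n] keeps it


print(selected_input_shape((2, 6, 3, 256, 256), 4))
print(selected_input_shape((2, 6, 3, 256, 256)))
```

Because the selection uses `or`, an explicit `0` behaves like `None` and falls back to the config value. Also note that only the multi-view branch slices `input_c2ws` and `input_fxfycxcy`; the single-view branch passes them through unchanged.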
diff --git a/model/render/__init__.py b/model/render/__init__.py