# Example usage of [stable-dreamfusion](https://github.com/ashawkey/stable-dreamfusion)

To run this notebook, you need a Hugging Face account with access to the Stable Diffusion repository, plus an access token (https://huggingface.co/CompVis/stable-diffusion, https://huggingface.co/docs/hub/security-tokens).

### Check the machine

In [None]:
! nvidia-smi

Tue Oct 11 05:47:51 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   38C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
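`Training_nerf_resolution` is limited by GPU memory, so it can help to read the GPU name and total memory programmatically rather than from the table above. Below is a minimal sketch using `nvidia-smi`'s CSV query mode (`--query-gpu=name,memory.total --format=csv,noheader,nounits`). `parse_gpu_csv` and `query_gpus` are hypothetical helper names, and the sample line is modeled on the table above rather than captured from it.

```python
import subprocess


def parse_gpu_csv(text: str):
    """Parse `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits` output.

    Returns a list of (gpu_name, total_memory_MiB) tuples, one per GPU line.
    """
    gpus = []
    for line in text.strip().splitlines():
        # Split on the last comma so GPU names containing commas stay intact
        name, mem_mib = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, int(mem_mib)))
    return gpus


def query_gpus():
    """Run nvidia-smi and return (name, total MiB) for every visible GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


# On a GPU machine: print(query_gpus())
print(parse_gpu_csv("Tesla T4, 15109\n"))  # [('Tesla T4', 15109)]
```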

### Setup

In [None]:
#@title install dependencies
! git clone https://github.com/ashawkey/stable-dreamfusion.git

%cd stable-dreamfusion

# install requirements
! pip install -r requirements.txt
! pip install git+https://github.com/NVlabs/nvdiffrast/

# install extension modules
! bash scripts/install_ext.sh

Cloning into 'stable-dreamfusion'...
remote: Enumerating objects: 211, done.
remote: Counting objects: 100% (58/58), done.
remote: Compressing objects: 100% (30/30), done.
remote: Total 211 (delta 35), reused 36 (delta 27), pack-reused 153
Receiving objects: 100% (211/211), 120.23 KiB | 17.18 MiB/s, done.
Resolving deltas: 100% (110/110), done.
/content/stable-dreamfusion
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting torch-ema
  Downloading torch_ema-0.3-py3-none-any.whl (5.5 kB)
Collecting ninja
  Downloading ninja-1.10.2.4-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (120 kB)
Collecting trimesh
  Downloading trimesh-3.15.3-py3-none-any.whl (659 kB)
Collecting tensorboardX
  Downloading tensorboardX-2.5.1-py2.py3-none-any.whl (125 kB)

In [None]:
#@title login to huggingface to download stable diffusion
from huggingface_hub import notebook_login
from google.colab import output

output.enable_custom_widget_manager()
notebook_login()

Login successful
Your token has been saved to /root/.huggingface/token


### Training & Testing
* Training already exports the video and mesh when it finishes, so you don't need to run testing if you let training complete. If you interrupt training, run the testing cells manually.
* Each training step takes roughly 0.7 to 0.8 s on the Tesla T4 used here, so the default 5000 steps take around an hour. A larger `Training_iters` usually gives better results.
* Increasing `Training_nerf_resolution` also improves rendering quality, but requires more GPU memory.
* If the NeRF fails to learn anything (an empty scene with only background), try decreasing `Lambda_entropy`, which regularizes the learned opacity.
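For intuition about `Lambda_entropy`, here is a sketch of one common form of opacity-entropy regularization: treat each ray's accumulated opacity as a Bernoulli probability and penalize its binary entropy, scaled by the lambda. This is an assumed form for illustration, not code taken from stable-dreamfusion, and `binary_entropy` / `entropy_loss` are hypothetical helpers.

```python
import math


def binary_entropy(alpha: float, eps: float = 1e-5) -> float:
    """Entropy in bits of an opacity value treated as a Bernoulli probability."""
    a = min(max(alpha, eps), 1.0 - eps)  # clamp so log2 never sees 0
    return -a * math.log2(a) - (1.0 - a) * math.log2(1.0 - a)


def entropy_loss(opacities, lambda_entropy: float = 1e-4) -> float:
    """Lambda-weighted mean binary entropy over a batch of per-ray opacities (assumed form)."""
    if not opacities:
        raise ValueError("need at least one opacity value")
    mean_h = sum(binary_entropy(a) for a in opacities) / len(opacities)
    return lambda_entropy * mean_h


# Semi-transparent rays (opacity near 0.5) cost the most; near-0 or near-1 rays cost almost nothing.
print(entropy_loss([0.5, 0.5]))  # 0.0001
print(entropy_loss([0.0, 1.0]))  # close to 0
```

The term only rewards rays for being fully transparent or fully opaque. One plausible reading of the advice above is that, when this term is weighted too heavily, the all-transparent scene becomes an easy way to lower it, and decreasing `Lambda_entropy` gives the guidance loss more room.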

In [None]:
#@markdown ####**Training Settings:**
Prompt_text = "a DSLR photo of a delicious hamburger" #@param {type: 'string'}
Training_iters = 5000 #@param {type: 'integer'}
Learning_rate = 1e-3 #@param {type: 'number'}
Training_nerf_resolution = 64  #@param {type: 'integer'}
# CUDA_ray = True #@param {type: 'boolean'}
# View_dependent_prompt = True #@param {type: 'boolean'}
# FP16 = True #@param {type: 'boolean'}
Seed = 0 #@param {type: 'integer'}
Lambda_entropy = 1e-4 #@param {type: 'number'}
Checkpoint = 'scratch' #@param {type: 'string'}

#@markdown ---

#@markdown ####**Output Settings:**
Workspace = "trial" #@param{type: 'string'}
# Save_mesh = True #@param {type: 'boolean'}

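# processing: quote the prompt so %run passes it to main.py as a single --text argument
# (shlex.quote also handles prompts that contain an apostrophe)
import shlex
Prompt_text = shlex.quote(Prompt_text)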
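The prompt has to reach `main.py` as one `--text` argument. Wrapping it naively in single quotes keeps multi-word prompts together but breaks when the prompt itself contains an apostrophe, while `shlex.quote` escapes it correctly. This sketch assumes `%run` splits its argument line shell-style, as IPython does on POSIX systems such as Colab; `tokenizes` is a hypothetical helper.

```python
import shlex


def tokenizes(arg_line: str):
    """Return the shell-style tokens of arg_line, or None if its quoting is unbalanced."""
    try:
        return shlex.split(arg_line)
    except ValueError:  # raised for e.g. "No closing quotation"
        return None


prompt = "a DSLR photo of a dog's toy"
naive = "'" + prompt + "'"  # the apostrophe closes the quote early
safe = shlex.quote(prompt)  # POSIX-safe quoting of the whole prompt

print(tokenizes(f"--text {naive}"))  # None
print(tokenizes(f"--text {safe}"))   # ['--text', "a DSLR photo of a dog's toy"]
```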

In [None]:
#@title start training
%run main.py -O --text {Prompt_text} --workspace {Workspace} --iters {Training_iters} --lr {Learning_rate} --w {Training_nerf_resolution} --h {Training_nerf_resolution} --seed {Seed} --lambda_entropy {Lambda_entropy} --ckpt {Checkpoint} --save_mesh

Namespace(H=800, O=True, O2=False, W=800, albedo_iters=1000, angle_front=60, angle_overhead=30, backbone='grid', bg_radius=1.4, bound=1, ckpt='scratch', cuda_ray=True, density_thresh=10, dir_text=True, dt_gamma=0, eval_interval=10, fovy=60, fovy_range=[40, 70], fp16=True, gui=False, guidance='stable-diffusion', h=64, iters=5000, jitter_pose=False, lambda_entropy=0.0001, lambda_opacity=0, lambda_orient=0.01, light_phi=0, light_theta=60, lr=0.001, max_ray_batch=4096, max_spp=1, max_steps=1024, min_near=0.1, num_steps=64, radius=3, radius_range=[1.0, 1.5], save_mesh=True, seed=0, test=False, text='a DSLR photo of a delicious hamburger', update_extra_interval=16, upsample_steps=64, w=64, workspace='trial')
NeRFNetwork(
  (encoder): GridEncoder: input_dim=3 num_levels=16 level_dim=2 resolution=16 -> 2048 per_level_scale=1.3819 params=(6119864, 2) gridtype=tiled align_corners=False
  (sigma_net): MLP(
    (net): ModuleList(
      (0): Linear(in_features=32, out_features=64, bias=True)
      


[INFO] loaded stable diffusion!


loss=0.0000 (0.0000), lr=0.009550: : 100% 100/100 [01:04<00:00,  1.54it/s]


loss=0.0000 (0.0000), lr=0.009120: : 100% 100/100 [01:03<00:00,  1.58it/s]


loss=0.0000 (0.0000), lr=0.008710: : 100% 100/100 [01:04<00:00,  1.55it/s]


loss=0.0000 (0.0000), lr=0.008318: : 100% 100/100 [01:04<00:00,  1.54it/s]


loss=0.0000 (0.0000), lr=0.007943: : 100% 100/100 [01:05<00:00,  1.53it/s]


loss=0.0000 (0.0000), lr=0.007586: : 100% 100/100 [01:06<00:00,  1.51it/s]


loss=0.0000 (0.0000), lr=0.007244: : 100% 100/100 [01:06<00:00,  1.50it/s]


loss=0.0000 (0.0000), lr=0.006918: : 100% 100/100 [01:07<00:00,  1.49it/s]


loss=0.0000 (0.0000), lr=0.006607: : 100% 100/100 [01:06<00:00,  1.49it/s]


loss=0.0000 (0.0000), lr=0.006310: : 100% 100/100 [01:06<00:00,  1.49it/s]


loss=0.0000 (0.0000): : 100% 5/5 [00:02<00:00,  2.11it/s]


loss=0.0021 (0.0014), lr=0.006026: : 100% 100/100 [01:23<00:00,  1.20it/s]


loss=0.0017 (0.0014), lr=0.005754: : 100% 100/100 [01:23<00:00,  1.20it/s]


loss=0.0022 (0.0015), lr=0.005495: : 100% 100/100 [01:23<00:00,  1.19it/s]


loss=0.0020 (0.0014), lr=0.005248: : 100% 100/100 [01:22<00:00,  1.21it/s]


loss=0.0015 (0.0013), lr=0.005012: : 100% 100/100 [01:23<00:00,  1.20it/s]


loss=0.0000 (0.0013), lr=0.004786: : 100% 100/100 [01:22<00:00,  1.20it/s]


loss=0.0018 (0.0014), lr=0.004571: : 100% 100/100 [01:24<00:00,  1.18it/s]


loss=0.0017 (0.0016), lr=0.004365: : 100% 100/100 [01:25<00:00,  1.18it/s]


loss=0.0023 (0.0014), lr=0.004169: : 100% 100/100 [01:23<00:00,  1.20it/s]


loss=0.0015 (0.0014), lr=0.003981: : 100% 100/100 [01:21<00:00,  1.22it/s]


loss=0.0000 (0.0000): : 100% 5/5 [00:01<00:00,  2.77it/s]


loss=0.0018 (0.0014), lr=0.003802: : 100% 100/100 [01:22<00:00,  1.21it/s]


loss=0.0000 (0.0012), lr=0.003631: : 100% 100/100 [01:20<00:00,  1.23it/s]


loss=0.0023 (0.0015), lr=0.003467: : 100% 100/100 [01:24<00:00,  1.19it/s]


loss=0.0024 (0.0014), lr=0.003311: : 100% 100/100 [01:21<00:00,  1.23it/s]


loss=0.0000 (0.0013), lr=0.003162: : 100% 100/100 [01:22<00:00,  1.21it/s]


loss=0.0023 (0.0013), lr=0.003020: : 100% 100/100 [01:21<00:00,  1.23it/s]


loss=0.0015 (0.0014), lr=0.002884: : 100% 100/100 [01:22<00:00,  1.21it/s]


loss=0.0015 (0.0015), lr=0.002754: : 100% 100/100 [01:24<00:00,  1.18it/s]


loss=0.0015 (0.0013), lr=0.002630: : 100% 100/100 [01:22<00:00,  1.22it/s]


loss=0.0022 (0.0013), lr=0.002512: : 100% 100/100 [01:21<00:00,  1.23it/s]


loss=0.0000 (0.0000): : 100% 5/5 [00:01<00:00,  2.99it/s]


loss=0.0013 (0.0014), lr=0.002399: : 100% 100/100 [01:21<00:00,  1.23it/s]


loss=0.0000 (0.0013), lr=0.002291: : 100% 100/100 [01:22<00:00,  1.22it/s]


loss=0.0022 (0.0013), lr=0.002188: : 100% 100/100 [01:22<00:00,  1.22it/s]


loss=0.0014 (0.0012), lr=0.002089: : 100% 100/100 [01:20<00:00,  1.25it/s]


loss=0.0014 (0.0013), lr=0.001995: : 100% 100/100 [01:22<00:00,  1.22it/s]


loss=0.0014 (0.0013), lr=0.001905: : 100% 100/100 [01:21<00:00,  1.22it/s]


loss=0.0000 (0.0013), lr=0.001820: : 100% 100/100 [01:20<00:00,  1.24it/s]


loss=0.0013 (0.0013), lr=0.001738: : 100% 100/100 [01:20<00:00,  1.24it/s]


loss=0.0014 (0.0014), lr=0.001660: : 100% 100/100 [01:20<00:00,  1.24it/s]


loss=0.0014 (0.0015), lr=0.001585: : 100% 100/100 [01:22<00:00,  1.21it/s]


loss=0.0000 (0.0000): : 100% 5/5 [00:01<00:00,  3.26it/s]


loss=0.0015 (0.0013), lr=0.001514: : 100% 100/100 [01:23<00:00,  1.20it/s]


loss=0.0015 (0.0013), lr=0.001445: : 100% 100/100 [01:23<00:00,  1.20it/s]


loss=0.0018 (0.0012), lr=0.001380: : 100% 100/100 [01:19<00:00,  1.26it/s]


loss=0.0021 (0.0014), lr=0.001318: : 100% 100/100 [01:22<00:00,  1.22it/s]


loss=0.0015 (0.0012), lr=0.001259: : 100% 100/100 [01:22<00:00,  1.22it/s]


loss=0.0013 (0.0013), lr=0.001202: : 100% 100/100 [01:22<00:00,  1.21it/s]


loss=0.0016 (0.0012), lr=0.001148: : 100% 100/100 [01:20<00:00,  1.23it/s]


loss=0.0014 (0.0012), lr=0.001096: : 100% 100/100 [01:20<00:00,  1.24it/s]


loss=0.0021 (0.0014), lr=0.001047: : 100% 100/100 [01:22<00:00,  1.21it/s]


loss=0.0018 (0.0014), lr=0.001000: : 100% 100/100 [01:21<00:00,  1.22it/s]


loss=0.0000 (0.0000): : 100% 5/5 [00:01<00:00,  3.36it/s]


100% 100/100 [00:27<00:00,  3.84it/s]

100% 100/100 [00:36<00:00,  2.73it/s]


[INFO] running xatlas to unwrap UVs for mesh: v=(129072, 3) f=(257500, 3)
[INFO] writing obj mesh to trial/mesh/mesh.obj
[INFO] writing vertices (129072, 3)
[INFO] writing vertices texture coords (236023, 2)
[INFO] writing faces (257500, 3)
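The learning rates printed in the log above are consistent with an exponential decay from 10 × `Learning_rate` down to `Learning_rate` over `Training_iters` steps, i.e. `lr_t = 10 * Learning_rate * 0.1 ** (t / Training_iters)`. The factor of 10 is inferred from the logged values (0.009550 after 100 steps, 0.001000 at step 5000), not read from `main.py`. The sketch below reproduces those values and turns the progress-bar rates into a rough runtime estimate. `logged_lr` and `estimate_minutes` are hypothetical helpers.

```python
def logged_lr(step: int, base_lr: float = 1e-3, iters: int = 5000, scale: float = 10.0) -> float:
    """Learning rate matching the log: scale * base_lr * 0.1 ** (step / iters).

    `scale` = 10 is inferred from the logged values, not taken from the repo's code.
    """
    return scale * base_lr * 0.1 ** (step / iters)


def estimate_minutes(phases) -> float:
    """Total wall-clock minutes for a list of (steps, iterations_per_second) phases."""
    return sum(steps / its for steps, its in phases) / 60.0


for step in (100, 1000, 2000, 5000):
    print(step, f"{logged_lr(step):.6f}")
# 100 0.009550
# 1000 0.006310
# 2000 0.003981
# 5000 0.001000

# Rates read off the progress bars: ~1.5 it/s for the first 1000 steps, ~1.2 it/s afterwards
print(round(estimate_minutes([(1000, 1.5), (4000, 1.2)])))  # 67
```

The estimate of roughly 67 minutes agrees with the "around an hour" guidance in the notes above. The faster first 1000 steps coincide with `albedo_iters=1000` in the printed `Namespace`.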


In [None]:
#@markdown ####**Testing Settings:**

Workspace_test = "trial" #@param{type: 'string'}
# Save_mesh = True #@param {type: 'boolean'}

In [None]:
#@title (optional) testing 
%run main.py -O --test --workspace {Workspace_test} --save_mesh

### Display results
* The RGB and depth videos are saved to `{Workspace}/results/*.mp4`.
* The mesh is saved under `{Workspace}/mesh/` as three files: `mesh.obj`, `mesh.mtl`, and `albedo.png`.
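To sanity-check the exported mesh without extra dependencies, you can count the records in `mesh.obj` by keyword: `v` for vertices, `vt` for texture coordinates, and `f` for faces. The counts should match the `[INFO] writing vertices / vertices texture coords / faces` lines in the training log. `count_obj_records` is a hypothetical helper, and the sketch assumes a plain-text OBJ file.

```python
from collections import Counter


def count_obj_records(lines) -> Counter:
    """Count OBJ records by leading keyword ('v', 'vt', 'f', ...) in an iterable of lines."""
    counts = Counter()
    for line in lines:
        parts = line.split(maxsplit=1)
        # Skip blank lines and comments
        if parts and not parts[0].startswith("#"):
            counts[parts[0]] += 1
    return counts


# Hypothetical usage in this notebook:
# with open(os.path.join(Workspace, "mesh", "mesh.obj")) as f:
#     c = count_obj_records(f)
# print(c["v"], c["vt"], c["f"])
```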

In [None]:
#@title display RGB video
import os
import glob
from base64 import b64encode
from IPython.display import HTML

def get_latest_file(pattern):
  # Return the most recently modified file matching a glob pattern
  matches = glob.glob(pattern)
  if not matches:
    raise FileNotFoundError(f"no files match {pattern!r}; has training or testing finished?")
  return max(matches, key=os.path.getmtime)

def show_video(video_path, video_width=600):
  # Embed the MP4 as a base64 data URL so it plays inline in the notebook
  with open(video_path, "rb") as f:
    video_url = f"data:video/mp4;base64,{b64encode(f.read()).decode()}"
  return HTML(f"""<video width={video_width} controls><source src="{video_url}" type="video/mp4"></video>""")

rgb_video = get_latest_file(os.path.join(Workspace, 'results', '*_rgb.mp4'))
show_video(rgb_video)