# dreamfield-3D

A toolkit for generating 3D mesh models, videos, NeRF instances, and multiview images of colourful 3D objects from text and image prompts. Edited by [Shengyu Meng (Simon)](https://twitter.com/meng_shengyu)   
You are welcome to share your outputs on social media with the tag **#dreamfields3D**  😀

## Update logs

**Beta v0.65**: added an option to apply image prompts only in an assigned direction to avoid overfitting; saving depth maps can now be skipped in training and testing; sequence images are output during validation. (2022-10-03)  
**Beta v0.60**: applied random fovy (view angle) in training; updated the image augmentation applied before feeding into CLIP; improved training stability and performance. (2022-09-25)

## Credits & Changelog
Dreamfields-3D is modified from ashawkey's [dreamfields-torch](https://github.com/ashawkey/dreamfields-torch) and the forks of it by [IV Pravdin](https://github.com/ivpravdin) and
[Pollinations.AI](https://github.com/pollinations), released under the [MIT License](https://github.com/ashawkey/dreamfields-torch/blob/main/LICENSE).

The main improvements in this notebook & repository were made by [Shengyu Meng (Simon)](https://twitter.com/meng_shengyu), including visualizing the training process in Colab, exporting 360° video and 3D mesh models with vertex colour, and additional arguments. This code is released under the [MIT License](https://github.com/shengyu-meng/dreamfields-3D/blob/main/LICENSE).

The [original dreamfields](https://ajayj.com/dreamfields) was released under the [Apache-2.0 license](https://github.com/google-research/google-research/blob/master/LICENSE). It was proposed by [Ajay Jain](https://ajayj.com/), [Ben Mildenhall](https://bmild.github.io/), [Jonathan T. Barron](https://jonbarron.info/), [Pieter Abbeel](https://people.eecs.berkeley.edu/~pabbeel/), and [Ben Poole](https://cs.stanford.edu/~poole/) in their paper [Zero-Shot Text-Guided Object Generation with Dream Fields](https://arxiv.org/abs/2112.01455), published at CVPR 2022. The main difference between dreamfields-torch and the original dreamfields is that dreamfields-torch uses the [torch version of instant-ngp](https://github.com/ashawkey/torch-ngp) instead of the [original NeRF](https://github.com/bmild/nerf) as its backend, and rewrites most of the code.

The technical foundations of the original dreamfields are [NeRF: Neural Radiance Fields](https://github.com/bmild/nerf), released under the [MIT License](https://github.com/bmild/nerf/blob/master/LICENSE) and proposed by Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng in their paper [NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis](https://arxiv.org/abs/2003.08934), published at ECCV 2020, and [CLIP: Connecting Text and Images](https://openai.com/blog/clip/), developed by [OpenAI](https://openai.com/) and released under the [MIT License](https://github.com/openai/CLIP/blob/main/LICENSE).

## check the machine

In [None]:
#@title ### Check GPU
!nvidia-smi

In [None]:
#@title ### Mount Google Drive (optional)
from google.colab import drive
drive.mount('/content/drive')

## setup

In [None]:
#@title ### pull the repository & get working dir
import os 
!git clone https://github.com/shengyu-meng/dreamfields-3D
%cd dreamfields-3D
DC_dir = os.getcwd()

In [None]:
#@title install dependencies
!pip install -r requirements.txt
!bash scripts/install_ext.sh
!bash scripts/install_PyMarchingCubes.sh

## Run training and test in colab

In [None]:
#@title ###Define necessary functions 

#find the path of marching_cubes
import os
import site
import glob
import sys

def find_pm_path(pm_path = '/usr/local/lib/python3.7/dist-packages/'):
  # return the newest PyMarchingCubes folder under pm_path, or None if there is none
  pm_path = pm_path + "/PyMarchingCubes*/"
  path = sorted(glob.glob(pm_path), reverse=True)
  if path:
      return path[0]

pm_path = find_pm_path(site.getsitepackages()[0])
# marching_cubes_path = r"/usr/local/lib/python3.7/dist-packages/PyMarchingCubes-0.0.2-py3.7-linux-x86_64.egg" #this depends on your OS; paste the path where mcubes is installed
# only extend sys.path when the folder was actually found
if pm_path and pm_path not in sys.path:
    sys.path.append(pm_path)
import marching_cubes as mcubes


def get_latest_dir(path):
    # return the name of the most recently modified entry in path
    dir_list = os.listdir(path)
    assert len(dir_list) >= 1, "there is no previous project in the present output_dir, please check the path"
    dir_list.sort(key=lambda x: os.path.getmtime(os.path.join(path, x)))
    return dir_list[-1]

def get_latest_file(path):
    # return the name of the most recently modified file in path
    dir_list = os.listdir(path)
    dir_list.sort(key=lambda x: os.path.getmtime(os.path.join(path, x)))
    return dir_list[-1]

### **Training:**

Parameter introduction:

*  Use ViT-B/16 for **clip_model** and 100 for **Epoch_num** for draft training; use ViT-L/14 and 200 for fine training. ViT-L/14@336px has not been tested yet.
*  See [here](https://openai.com/blog/clip/) for the performance of the different **clip_model** options; a better-performing model normally takes longer to train.
*  **clip_w_h** and **clip_aug_copy** significantly affect GPU RAM usage and training speed. Try tuning them in different combinations, such as 232x12 \ 224x16 \ 168x20 \ 128x24 on a GPU with 16 GB of RAM.
*  **use_clip_direction_text** determines whether a prefix describing the camera direction, such as "the front view of...", is added to the text prompt. This may reduce mismatches but can cause overfitting.
*  When using **Prompt_image**, it is strongly recommended to select **Prompt_image_direction** to assign the view direction of the image prompt and avoid overfitting (even so, training with an image prompt is still not very stable).
*  set **dt_gamma** above 0 to enable adaptive ray marching, which accelerates training but may lead to worse results.
*  set **seed** to -1 to get a random seed  
*  **clip_aug** is an experimental option that augments the image before it is fed into CLIP; enabling it seems to produce more detail, while disabling it makes training more stable.
*  **random_fovy_training** applies a random view angle during training, so that CLIP observes the render at various scales.
*  a larger **camera_fovy** makes the object appear smaller in the overall view.
*  set **resume_project** to "latest" to automatically load the latest project in **output_dir**; otherwise, give the specific project name.
*  The training outputs are saved under **output_dir**, in a subfolder named **project_name** plus the timestamp of when training started.
*  **Attention:** Sometimes the Colab UI gets stuck, but as long as the files in the checkpoints folder keep updating, training is still running. You can wait for it to finish, restart the kernel, and then run the **test** stage.
*  For full usage details, please check [main_nerf.py](https://github.com/shengyu-meng/dreamfields-3D/blob/main/main_nerf.py).
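
To compare **clip_w_h** / **clip_aug_copy** combinations before launching a run, a rough pixel budget can help. The sketch below is illustrative only: it assumes each augmented copy is a square `clip_w_h` x `clip_w_h` image, and the helper names `clip_batch_pixels` and `is_valid_clip_size` are hypothetical, not part of the repository. Actual GPU memory use also depends on the CLIP model and the renderer, so treat the numbers as a relative guide rather than a prediction.

```python
# Illustrative helpers (not part of dreamfields-3D): compare CLIP input budgets.

def clip_batch_pixels(clip_w_h: int, clip_aug_copy: int) -> int:
    """Pixels per training step, assuming one square image per augmented copy."""
    return clip_w_h * clip_w_h * clip_aug_copy

def is_valid_clip_size(clip_w_h: int) -> bool:
    """Mirror the notebook's own check that clip_w_h is divisible by 8."""
    return clip_w_h % 8 == 0

# Example combinations of (clip_w_h, clip_aug_copy); adjust to your GPU.
candidates = [(232, 12), (224, 16), (168, 20), (128, 24)]

for size, copies in candidates:
    status = "ok" if is_valid_clip_size(size) else "not divisible by 8"
    print(f"{size}x{copies}: {clip_batch_pixels(size, copies):,} pixels ({status})")
```

A size such as 225 would fail the divisibility check in the settings cell, which is why the example uses multiples of 8 only.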

In [None]:
# from typing_extensions import Text
#@markdown ####**Training Settings:**
Prompt_text = "a organic biological pavilion could breath with building skin growing moss by Zaha Hadid, cgsociety, surreal, 8K HD" #@param{type: 'string'}
Prompt_image = ""  #@param{type: 'string'}
Prompt_image_direction = 'None' #@param ['None','front', 'left side', 'back', 'right side', 'top', 'bottom']
Prompt_image_direction = "" if Prompt_image_direction == 'None' else "--image_direction " + "'" + Prompt_image_direction  + "'"
Epoch_num =  200 #@param {type: 'integer'}
learning_rate = 0.0005 #@param {type: 'number'}
dt_gamma = 0 #@param {type: 'number'}
render_W_H = 384  #@param {type: 'integer'}
clip_w_h = 224  #@param {type: 'integer'}
clip_aug_copy = 8 #@param {type: 'integer'}
clip_aug = True #@param {type:"boolean"}
clip_aug = "--clip_aug " if clip_aug else ""
use_clip_direction_text = True #@param {type:"boolean"}
use_clip_direction_text = "--dir_text " if use_clip_direction_text else ""
clip_model =  'ViT-L/14' #@param ['ViT-B/32','ViT-B/16','ViT-L/14','ViT-L/14@336px','RN50x16','RN50x64']
seed =  -1 #@param {type: 'integer'}
camera_fovy = 43 #@param {type: 'integer'}
random_fovy_training = True #@param {type:"boolean"}
random_fovy_training = "--rnd_fovy" if random_fovy_training else ""
##@markdown *  set seed to -1 to get random seed
resume_train = False #@param {type:"boolean"}
resume_project = "latest"  #@param{type: 'string'}
ckpt = "latest"  #@param{type: 'string'}

#@markdown ---

#@markdown ####**Output Settings:**
output_dir = "/content/drive/MyDrive/Dreamfields3D_output/" #@param{type: 'string'}
project_name = "pavilion"  #@param{type: 'string'}
display_interval = 12 #@param {type: "integer"}
interval_samples = 12 #@param {type: "integer"}
save_interval_img = False #@param {type:"boolean"}
save_interval_img = "--save_interval_img" if save_interval_img else ""
save_depth = False #@param {type:"boolean"}
save_depth = "--save_depth" if save_depth else ""
ckpt_save_interval = 50 #@param {type: "integer"}
mesh_resolution = 256 #@param {type: "integer"}
mesh_threshold = 10 #@param {type: "integer"}
# assign_subdir = True #@param {type:"boolean"}

#parameter conversion
import time
import os

#convert the prompt text
Prompt_text = "'" + Prompt_text + "'"
iteration = Epoch_num * 100
#get the workspace folder

try:
    DC_dir
except NameError:
    DC_dir = '/content/dreamfields-3D/'
assert os.path.isfile(os.path.join(DC_dir, 'main_nerf.py')), \
"wrong working folder for dreamfields3D, please enter the right folder and run again"

#create the output folder (including parent folders) if it does not exist
os.makedirs(output_dir, exist_ok=True)

#check the parameters
assert clip_w_h % 8 ==0, f"the value of clip_w_h should be divisible by 8, but current value {clip_w_h} is not, try to use 224 or 128"

#convert the arguments
if resume_train:
  #reuse an existing project folder
  if resume_project == "latest":
    workspace = get_latest_dir(output_dir)
  else:
    workspace = resume_project
else:
  #new project: project name plus a MMDD-HHMM timestamp
  workspace = project_name + "_" + time.strftime('%Y%m%d-%H%M%S')[4:-2]

if Prompt_image != "":
  Prompt_image = "--image " +  "'" + Prompt_image + "'"

In [None]:
#@title Training
%cd {DC_dir}

%run main_nerf.py --text {Prompt_text} {Prompt_image} {Prompt_image_direction} --cuda_ray --fp16 --iters {iteration} --seed {seed} \
--output_dir {output_dir} --workspace {workspace} --colab --val_int {display_interval} --val_samples {interval_samples} --w {clip_w_h} --h {clip_w_h} \
--H {render_W_H} --W {render_W_H} --fovy {camera_fovy} --clip_model {clip_model} --aug_copy {clip_aug_copy} --lr {learning_rate} --dt_gamma {dt_gamma} \
--ckpt_save_interval {ckpt_save_interval} {save_interval_img} {use_clip_direction_text} {clip_aug} {random_fovy_training} --ckpt {ckpt} {save_depth} 
%cd {DC_dir}

### **Test:**

Generate images / video / mesh with the latest pretrained checkpoint file.

*  set **test_project** to "latest" to automatically load the latest project in output_dir; otherwise, give the folder name of the specific project (including the auto-generated timestamp).
*  **test_samples** controls how many images are generated, but only 20 are displayed in Colab; please check the output folder to view all images.
*  Check **save_video** to generate the 360° video. The video is 20 FPS and its total number of frames equals **test_samples**.
*  Check **save_mesh** to generate OBJ and PLY 3D models with the marching cubes algorithm.
*  The output models contain vertex colour and can be viewed directly in MeshLab and Rhino3D. To view the colour in Blender, import the PLY model (OBJ also works in the latest Blender 3.3), create a new material for it, and plug a Color Attribute node into the new material in the shader editor; you should then see the vertex colour.
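
Since the video runs at 20 FPS with one frame per test sample, the length of the 360° video follows directly from **test_samples**. The sketch below shows that arithmetic; the function names are hypothetical helpers for illustration and are not part of the repository.

```python
# Illustrative helpers (not part of dreamfields-3D): relate test_samples to video length.

VIDEO_FPS = 20  # frame rate stated in the notes above

def video_duration_seconds(test_samples: int, fps: int = VIDEO_FPS) -> float:
    """Length of the output video when every test sample becomes one frame."""
    return test_samples / fps

def samples_for_duration(seconds: float, fps: int = VIDEO_FPS) -> int:
    """Number of test samples needed for a video of the given length."""
    return round(seconds * fps)

print(video_duration_seconds(240))  # default test_samples in this notebook
print(samples_for_duration(6))
```

With the default of 240 samples the video lasts 12 seconds; halving **test_samples** halves both the video length and the number of frames rendered.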

In [None]:
# from typing_extensions import Text
#@markdown ####**Test Settings:**
test_project = "latest"  #@param{type: 'string'}
ckpt = "latest"  #@param{type: 'string'}
test_samples = 240 #@param {type: "integer"}
render_W_H = 1024  #@param {type: 'integer'}
mesh_resolution = 256 #@param {type: "integer"}
mesh_threshold = 10 #@param {type: "integer"}
save_depth = False #@param {type:"boolean"}
save_depth = "--save_depth" if save_depth else ""
save_video = True #@param {type:"boolean"}
save_video = "--save_video" if save_video else ""
save_mesh = True #@param {type:"boolean"}
save_mesh = "--save_mesh" if save_mesh else ""

#convert the parameters
if test_project == "latest" :
  workspace = get_latest_dir(output_dir)
else:
  workspace = test_project

In [None]:
#@title Testing
#@ markdown Generates image and mesh outputs using the latest checkpoint, even if the training has not finished.
%cd {DC_dir}
%run main_nerf.py --text {Prompt_text} --cuda_ray --fp16 --iters {iteration} --seed {seed} \
--output_dir {output_dir} --workspace {workspace} --colab --val_int {display_interval} --w {clip_w_h} --h {clip_w_h} \
--H {render_W_H} --W {render_W_H} --fovy {camera_fovy} --clip_model {clip_model} --aug_copy {clip_aug_copy} --dt_gamma 0 \
--ckpt_save_interval {ckpt_save_interval} {save_interval_img} {use_clip_direction_text} {clip_aug} --ckpt {ckpt} {save_depth} \
--test_samples {test_samples} --mesh_res {mesh_resolution} --mesh_trh {mesh_threshold}  {save_video} {save_mesh} \
--test
%cd {DC_dir}

In [None]:
#@title display latest RGB video
from IPython.display import HTML
from base64 import b64encode
path2video =  f"{output_dir}/{workspace}/videos/" + get_latest_file (f"{output_dir}/{workspace}/videos/")
def show_video(video_path, video_width = 600):
  # read the mp4 and embed it as a base64 data URL so it plays inline in the notebook
  with open(video_path, "rb") as f:
    video_file = f.read()

  video_url = f"data:video/mp4;base64,{b64encode(video_file).decode()}"
  return HTML(f"""<video width={video_width} controls><source src="{video_url}"></video>""")

show_video(path2video)