Goals: In this assignment, you will explore loss functions and decoder architectures for regressing to voxel, point cloud, and mesh representations from single-view RGB input.
Please download and extract the dataset from here.
After unzipping, set the appropriate path references in the dataset_location.py file here.
```bash
# Better do this after you've secured a GPU.
conda create -n pytorch3d-env python=3.9
conda activate pytorch3d-env
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
conda install pytorch3d -c pytorch3d
pip install numpy PyMCubes matplotlib
```
Make sure you have installed the packages mentioned in requirements.txt.
This assignment requires the GPU version of PyTorch.
How to use GPUs on the UMIACS cluster?
This section involves defining loss functions for fitting voxels, point clouds, and meshes.
In this subsection, we will define a binary cross-entropy loss that can help us fit a 3D binary voxel grid.
Define the loss function here, in the losses.py file.
For this, you can use the pre-defined losses in the PyTorch library.
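A minimal sketch of what this could look like, assuming the source grid holds raw logits and the target holds binary occupancies as floats (check the shapes fit_data.py actually passes):

```python
import torch

def voxel_loss(voxel_src, voxel_tgt):
    # voxel_src: (B, 1, H, W, D) raw occupancy logits (assumed)
    # voxel_tgt: (B, 1, H, W, D) binary ground-truth occupancies, as floats
    # BCE-with-logits folds the sigmoid into the loss for numerical stability.
    return torch.nn.functional.binary_cross_entropy_with_logits(voxel_src, voxel_tgt)
```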
Run python fit_data.py --type 'vox' to fit the source voxel grid to the target voxel grid.
Visualize the optimized voxel grid alongside the ground-truth voxel grid using the tools learnt in the previous section.
In this subsection, we will define the Chamfer loss, which can help us fit a 3D point cloud.
Define the loss function here, in the losses.py file.
We expect you to write your own code for this and not use PyTorch3D utilities, except that you are allowed to use functions inside pytorch3d.ops.knn such as knn_gather or knn_points.
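A minimal sketch of a symmetric Chamfer loss built on knn_points (the exact reduction, sum vs. mean over points, is a design choice left to you):

```python
from pytorch3d.ops.knn import knn_points

def chamfer_loss(point_cloud_src, point_cloud_tgt):
    # point_cloud_src: (B, N_src, 3), point_cloud_tgt: (B, N_tgt, 3) (assumed shapes)
    # knn_points with K=1 returns the *squared* distance to the nearest neighbour.
    dist_src_to_tgt = knn_points(point_cloud_src, point_cloud_tgt, K=1).dists  # (B, N_src, 1)
    dist_tgt_to_src = knn_points(point_cloud_tgt, point_cloud_src, K=1).dists  # (B, N_tgt, 1)
    return dist_src_to_tgt.mean() + dist_tgt_to_src.mean()
```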
Run python fit_data.py --type 'point' to fit the source point cloud to the target point cloud.
Visualize the optimized point cloud alongside the ground-truth point cloud using the tools learnt in the previous section.
In this subsection, we will define an additional smoothness loss that can help us fit a mesh.
Define the loss function here, in the losses.py file.
For this, you can use the pre-defined losses in the PyTorch3D library.
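One possible sketch, using PyTorch3D's mesh_laplacian_smoothing as the smoothness term (writing your own Laplacian regularizer instead is equally valid):

```python
from pytorch3d.loss import mesh_laplacian_smoothing

def smoothness_loss(mesh_src):
    # mesh_src: a pytorch3d.structures.Meshes object (assumed)
    # Penalizes each vertex's deviation from the centroid of its neighbours,
    # discouraging spiky, irregular geometry.
    return mesh_laplacian_smoothing(mesh_src, method="uniform")
```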
Run python fit_data.py --type 'mesh' to fit the source mesh to the target mesh.
Visualize the optimized mesh alongside the ground-truth mesh using the tools learnt in the previous section.
This section involves training a single-view-to-3D pipeline for voxels, point clouds, and meshes.
Refer to the save_freq argument in train_model.py to save model checkpoints more or less frequently.
We also provide pretrained ResNet18 features of the images to save the computation and GPU resources required. Use the --load_feat argument to load these features during training and evaluation. This should be False by default; only use it if you are facing issues in getting GPU resources. You can also enable training on a CPU via the device argument. Also indicate in your submission if you had to use this argument.
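For example, a training run using the precomputed features might look like: python train_model.py --type 'vox' --load_feat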
In this subsection, we will define a neural network to decode binary voxel grids.
Define the decoder network here in the model.py file, then reference your decoder here in the same file.
We have provided a decoder network in model.py, but you can also modify it as you wish.
Run python train_model.py --type 'vox' to train the single-view-to-voxel-grid pipeline; feel free to tune the hyperparameters as per your need.
After training, visualize the input RGB, the ground-truth voxel grid, and the predicted voxel grid in the eval_model.py file using:
python eval_model.py --type 'vox' --load_checkpoint
You need to add the respective visualization code in eval_model.py.
On your webpage, you should include visuals of any three examples in the test set. For each example, show the input RGB, a render of the predicted 3D voxel grid, and a render of the ground-truth mesh.
In this subsection, we will define a neural network to decode point clouds.
Similar to above, define the decoder network here in the model.py file, then reference your decoder here in the same file.
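A minimal sketch, where n_points and the 512-dim feature size are assumptions, and the tanh presumes targets in a normalized cube:

```python
import torch.nn as nn

class PointDecoder(nn.Module):
    # Hypothetical decoder: maps a (B, 512) image feature to (B, n_points, 3)
    # point coordinates with a small MLP.
    def __init__(self, n_points=1000):
        super().__init__()
        self.n_points = n_points
        self.mlp = nn.Sequential(
            nn.Linear(512, 1024),
            nn.ReLU(),
            nn.Linear(1024, n_points * 3),
            nn.Tanh(),  # assumes ground-truth points lie roughly in [-1, 1]^3
        )

    def forward(self, feat):
        return self.mlp(feat).view(-1, self.n_points, 3)
```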
Run python train_model.py --type 'point' to train the single-view-to-point-cloud pipeline; feel free to tune the hyperparameters as per your need.
After training, visualize the input RGB, the ground-truth point cloud, and the predicted point cloud in the eval_model.py file using:
python eval_model.py --type 'point' --load_checkpoint
You need to add the respective visualization code in eval_model.py.
On your webpage, you should include visuals of any three examples in the test set. For each example, show the input RGB, a render of the predicted 3D point cloud, and a render of the ground-truth mesh.
In this subsection, we will define a neural network to decode meshes.
Similar to above, define the decoder network here in the model.py file, then reference your decoder here in the same file.
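One common design, sketched here under assumptions (512-dim features, a level-4 ico_sphere, a CUDA device), is to keep the template's connectivity fixed and predict only per-vertex offsets:

```python
import torch.nn as nn
from pytorch3d.utils import ico_sphere

class MeshDecoder(nn.Module):
    # Hypothetical decoder: predicts per-vertex offsets for a fixed ico_sphere
    # template, so only vertex positions (not connectivity) are learned.
    def __init__(self, level=4, device="cuda"):
        super().__init__()
        self.template = ico_sphere(level, device)
        self.n_verts = self.template.verts_packed().shape[0]
        self.mlp = nn.Sequential(
            nn.Linear(512, 1024),
            nn.ReLU(),
            nn.Linear(1024, self.n_verts * 3),
        )

    def forward(self, feat):
        batch = feat.shape[0]
        offsets = self.mlp(feat).reshape(batch * self.n_verts, 3)
        # extend() replicates the template per sample; offset_verts() adds the
        # packed offsets to the packed template vertices.
        return self.template.extend(batch).offset_verts(offsets)
```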
Run python train_model.py --type 'mesh' to train the single-view-to-mesh pipeline; feel free to tune the hyperparameters as per your need. We also encourage you to try different mesh initializations here.
After training, visualize the input RGB, the ground-truth mesh, and the predicted mesh in the eval_model.py file using:
python eval_model.py --type 'mesh' --load_checkpoint
You need to add the respective visualization code in eval_model.py.
On your webpage, you should include visuals of any three examples in the test set. For each example, show the input RGB, a render of the predicted mesh, and a render of the ground-truth mesh.
Quantitatively compare the F1 scores of 3D reconstruction for meshes vs. point clouds vs. voxel grids. Provide an intuitive explanation justifying the comparison.
For evaluation, you can run:
python eval_model.py --type voxel|mesh|point --load_checkpoint
On your webpage, you should include the F1-score curve at different thresholds for the voxel grid, point cloud, and mesh networks. The plot is saved as eval_{type}.png.
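For reference, the F1 score at a threshold t is the harmonic mean of precision (the fraction of predicted points within t of some ground-truth point) and recall (the reverse). A sketch, assuming both shapes are represented as sampled point sets:

```python
from pytorch3d.ops.knn import knn_points

def f1_at_threshold(points_pred, points_gt, threshold):
    # points_pred, points_gt: (1, N, 3) point samples from the two shapes (assumed)
    # knn_points returns squared distances, so take the square root first.
    dist_pred_to_gt = knn_points(points_pred, points_gt, K=1).dists.sqrt()
    dist_gt_to_pred = knn_points(points_gt, points_pred, K=1).dists.sqrt()
    precision = (dist_pred_to_gt < threshold).float().mean()
    recall = (dist_gt_to_pred < threshold).float().mean()
    return 2 * precision * recall / (precision + recall + 1e-8)
```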
Analyze the results by varying a hyperparameter of your choice, for example n_points, vox_size, w_chamfer, or the initial mesh (ico_sphere).
Try to be unique and conclusive in your analysis.
Simply seeing final predictions and numerical evaluations is not always insightful. Can you create some visualizations that help highlight what your learned model does? Be creative and think of what visualizations would help you gain insights. There is no 'right' answer - although reading some papers to get inspiration might give you ideas.