# i3ce 2024: Workshop on Deep Learning Tools for Understanding and Modeling the Built Environment

In this workshop, we will implement a deep learning pipeline to perform classification of 3D point clouds of building elements.

The neural network architecture will be based on [PointNet](https://arxiv.org/abs/1612.00593), *Qi et al. (2017) PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation*. The original source code for PointNet can be found [here](https://github.com/charlesq34/pointnet/blob/master/models/pointnet_cls.py).

A basic PyTorch tutorial can be found here:
[link](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html).
Doing the tutorial is optional but it should help explain many concepts that we will cover in this workshop.

# Part 1: Setup

In [1]:
# Install the Open3D library for point cloud processing and visualization
# This step is necessary because Open3D is not included by default on Google Colab
!pip install open3d



In [2]:
# Mount a Google Drive folder so that the data files can be accessed
from google.colab import drive
from google.colab import files
import sys
drive.mount('/content/drive', force_remount=True)
%cd drive/MyDrive/i3ce 2024 DL Workshop/code/
sys.path.insert(0,'/content/drive/MyDrive/i3ce 2024 DL Workshop/code/')

Mounted at /content/drive
/content/drive/MyDrive/i3ce 2024 DL Workshop/code


In [3]:
# Import the necessary libraries and utility functions
import numpy as np
import torch
import torch.nn.functional as F
import open3d as o3d

# This file contains the model definition for PointNet
from pointnet import PointNet_Classification

# This file contains the data loader code for the S3DIS dataset
from dataloader_modelnet import ClassificationDataset

# This file contains the code for drawing 3D point clouds in a Python notebook
from utils import draw_geometries

In [5]:
 # define training parameters for deep learning
learning_rate = 2e-4
batch_size = 10
max_epochs = 100
num_resampled_points = 1024

# Part 2: Data Loading and Visualization

In [6]:
# Label-mapping for our object classification dataset

class_names = ['balcony', 'beam', 'column', 'door', 'fence', 'floor', 'roof', 'stairs', 'wall', 'window']
num_class = len(class_names)

In [7]:
# create data loaders for the ModelNet10 dataset
train_dataset = ClassificationDataset(filepath='data/modelnet10_train.h5', N=num_resampled_points)
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=1)
test_dataset = ClassificationDataset(filepath='data/modelnet10_test.h5', N=num_resampled_points)
test_dataloader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=1)
num_train_batches = int(np.ceil(len(train_dataset) / batch_size))

Created dataset from data/modelnet10_train.h5 with 165 samples
Created dataset from data/modelnet10_test.h5 with 51 samples


In [8]:
# Visualize one point cloud in the training dataset
pc = train_dataset.points[0]
class_id = train_dataset.labels[0]
print('Visualizing point cloud of a <%s> with %d points' % (class_names[class_id], len(pc)))

# Use Open3D to plot the point cloud
pcd_object = o3d.geometry.PointCloud()
pcd_object.points = o3d.utility.Vector3dVector(pc)
draw_geometries([pcd_object], show_axes=True)

Visualizing point cloud of a <column> with 2048 points


# Part 3: Creating the Neural Network Model

In [9]:
# Allow the model to be trained on GPU (if the CUDA driver is available)
# Google Colab allows using a T4 GPUs for free accounts whereas premium GPUs require a subscription
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Using device:', device)

# Create a PointNet model
# PointNet consists of 5 convolution and batch norm layers and 1 max pooling layer
model = PointNet_Classification(num_class = num_class).to(device)
print('PointNet model:')
print(model)

# Create the Adam optimizer, an extension to the stochastic gradient descent algorithm for updating model weights
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

Using device: cuda
PointNet model:
PointNet_Classification(
  (conv1): Conv1d(3, 64, kernel_size=(1,), stride=(1,))
  (conv2): Conv1d(64, 128, kernel_size=(1,), stride=(1,))
  (conv3): Conv1d(128, 1024, kernel_size=(1,), stride=(1,))
  (fc1): Linear(in_features=1024, out_features=512, bias=True)
  (fc2): Linear(in_features=512, out_features=256, bias=True)
  (fc3): Linear(in_features=256, out_features=10, bias=True)
  (dropout): Dropout(p=0.4, inplace=False)
  (bn1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (bn2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (bn3): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (bn4): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (bn5): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)


# Part 4: Training

The training process for a neural network proceeds in 2 passes: a forward pass and a backward pass

Data is passed to the neural network in batches: each batch consists of a set of 10 point clouds and 10 labels.

During the forward pass, the point cloud is passed to the input layer and processed through successive layers until the prediction is generated at the output layer. The prediction is compared to the ground truth labels and the negative log-likelihood loss is calculated.

During the backward pass, backpropagation is performed to update the weights of the network to minize the loss function, using the technique of gradient descent.

This process is repeated over and over until the loss function reaches a minimum. As this point, the neural network is optimized to accurately predict the class label from point cloud data


In [10]:
# First, we will try to run the forward pass and backward pass using one batch of data

# Get a batch of training data from the dataloader
points, target = next(iter(train_dataloader))
print('"points" is a tensor with shape:', points.shape)
print('"target" is a tensor with shape:', target.shape)

print('Target labels are:')
for i in range(len(target)):
    print(target[i], class_names[target[i]])

"points" is a tensor with shape: torch.Size([10, 3, 1024])
"target" is a tensor with shape: torch.Size([10])
Target labels are:
tensor(9, dtype=torch.uint8) window
tensor(4, dtype=torch.uint8) fence
tensor(6, dtype=torch.uint8) roof
tensor(5, dtype=torch.uint8) floor
tensor(7, dtype=torch.uint8) stairs
tensor(8, dtype=torch.uint8) wall
tensor(2, dtype=torch.uint8) column
tensor(9, dtype=torch.uint8) window
tensor(4, dtype=torch.uint8) fence
tensor(1, dtype=torch.uint8) beam


In [11]:
#put the model in training mode
model = model.train()

#move this batch of data to the GPU if device is cuda
points, target = points.to(device), target.to(device)

In [12]:
# Forward pass: process the input point clouds through the neural network and predict the output probabilities
pred_probs = model(points)

print('"pred_probs" is a tensor with shape:', pred_probs.shape)
print(pred_probs)

pred_labels = pred_probs.data.max(1)[1]
print(pred_labels)
print('"pred_labels" is a tensor with shape:', pred_labels.shape)

print('Predicted labels are:')
for i in range(len(pred_labels)):
    print(pred_labels[i], class_names[pred_labels[i]])

"pred_probs" is a tensor with shape: torch.Size([10, 10])
tensor([[-2.9997, -1.8713, -2.5450, -1.8975, -1.7093, -2.6521, -2.4794, -2.3673,
         -2.3743, -3.0845],
        [-2.9080, -2.7940, -2.2493, -2.3085, -1.7644, -2.0400, -2.3829, -2.9602,
         -2.1606, -2.1312],
        [-2.9965, -2.2069, -2.9962, -3.3246, -1.3425, -1.5249, -3.0260, -2.9405,
         -2.7886, -2.1863],
        [-2.1399, -2.9318, -2.1772, -2.3297, -2.0516, -1.6786, -2.7266, -2.3561,
         -2.5221, -2.7700],
        [-2.0487, -1.8649, -3.1387, -2.4812, -2.3644, -1.8838, -2.9560, -1.8449,
         -2.4606, -3.0424],
        [-2.7752, -2.5274, -2.2579, -2.4716, -2.0180, -1.8802, -2.8105, -2.7729,
         -1.9695, -2.1110],
        [-2.3340, -2.4040, -3.0352, -2.8668, -1.5797, -3.0288, -2.3593, -2.1852,
         -2.0507, -2.1388],
        [-3.0440, -2.0812, -2.1098, -2.3499, -2.0787, -2.0895, -2.6810, -2.5517,
         -2.1894, -2.2666],
        [-1.8439, -3.1155, -2.2300, -2.2647, -2.0269, -1.9868, -3.0167

In [13]:
# compare the prediction vs the target labels and determine the negative log-likelihood loss
loss = F.nll_loss(pred_probs, target)
print('Negative log-likelihood loss is', loss)

Negative log-likelihood loss is tensor(2.3213, device='cuda:0', grad_fn=<NllLossBackward0>)


In [14]:
# Backward pass: perform backpropagation to update the weights of the network based on the computed loss function
loss.backward()
optimizer.step()

In [15]:
# Run the forward pass again: this time, the predictions should be closer to the target
# The network has 'learned' how to associate patterns in the point cloud with specific classes of objects
pred_probs = model(points)

print('"pred_probs" is a tensor with shape:', pred_probs.shape)
print(pred_probs)

pred_labels = pred_probs.data.max(1)[1]
print(pred_labels)
print('"pred_labels" is a tensor with shape:', pred_labels.shape)

print('Predicted labels are:')
for i in range(len(pred_labels)):
    print(pred_labels[i], class_names[pred_labels[i]])

"pred_probs" is a tensor with shape: torch.Size([10, 10])
tensor([[-3.2446, -1.8836, -3.3151, -2.4810, -2.0278, -2.1020, -2.5399, -2.1390,
         -2.3602, -1.9372],
        [-2.3211, -2.7600, -2.5418, -2.8177, -1.3268, -2.4712, -2.7998, -2.8014,
         -2.0278, -2.3320],
        [-2.4754, -2.9789, -2.1080, -2.5820, -1.7874, -1.5414, -2.2048, -3.0312,
         -2.6624, -2.8448],
        [-2.6848, -2.1607, -3.0947, -2.2893, -2.2692, -1.7292, -2.0760, -2.3944,
         -2.2221, -2.7484],
        [-2.4540, -2.0353, -2.4499, -2.3724, -1.8902, -2.0462, -2.8067, -1.9217,
         -2.4675, -3.4409],
        [-2.4146, -2.4316, -2.6224, -2.6457, -1.6489, -2.4176, -2.9670, -2.8666,
         -2.1662, -1.7445],
        [-2.1538, -2.5169, -2.1983, -3.1367, -2.2798, -2.2404, -2.9521, -1.8705,
         -1.8708, -2.5278],
        [-2.5148, -2.5290, -2.1144, -2.3082, -1.8320, -2.3137, -2.7330, -2.6868,
         -2.2648, -2.0935],
        [-2.3262, -2.5609, -2.6935, -2.2453, -1.6950, -2.1048, -3.1882

In [16]:
# Now we will scale up the training process
# Run the training loop for 100 epochs
# Observe the trend in how the training loss and accuracy values changes over time

for epoch in range(max_epochs):
    train_loss, train_correct, train_accuracy = 0, 0, 0
    for i, data in enumerate(train_dataloader):
        points, target = data

        # put the model in training mode
        model = model.train()

        # move this batch of data to the GPU if device is cuda
        points, target = points.to(device), target.to(device)

        # run a forward pass through the neural network and predict the outputs
        pred_probs = model(points)

        # compare the prediction vs the target labels and determine the negative log-likelihood loss
        loss = F.nll_loss(pred_probs, target)

        # perform backpropagation to update the weights of the network based on the computed loss function
        loss.backward()
        optimizer.step()

        # keep track of the accuracy of our predictions
        pred_labels = pred_probs.data.max(1)[1]
        train_loss += loss.item()
        train_correct += pred_labels.eq(target).sum().item()

    train_loss /= num_train_batches
    train_accuracy = train_correct / len(train_dataset)
    print('[Epoch %d] train loss: %.3f accuracy: %.3f' % (epoch, train_loss, train_accuracy))

[Epoch 0] train loss: 1.934 accuracy: 0.327
[Epoch 1] train loss: 1.606 accuracy: 0.442
[Epoch 2] train loss: 1.467 accuracy: 0.479
[Epoch 3] train loss: 1.333 accuracy: 0.503
[Epoch 4] train loss: 1.331 accuracy: 0.588
[Epoch 5] train loss: 1.212 accuracy: 0.564
[Epoch 6] train loss: 1.125 accuracy: 0.588
[Epoch 7] train loss: 1.164 accuracy: 0.624
[Epoch 8] train loss: 1.143 accuracy: 0.624
[Epoch 9] train loss: 1.286 accuracy: 0.552
[Epoch 10] train loss: 1.245 accuracy: 0.594
[Epoch 11] train loss: 1.052 accuracy: 0.606
[Epoch 12] train loss: 1.044 accuracy: 0.600
[Epoch 13] train loss: 0.980 accuracy: 0.588
[Epoch 14] train loss: 1.021 accuracy: 0.558
[Epoch 15] train loss: 1.016 accuracy: 0.636
[Epoch 16] train loss: 0.946 accuracy: 0.661
[Epoch 17] train loss: 0.977 accuracy: 0.648
[Epoch 18] train loss: 0.927 accuracy: 0.648
[Epoch 19] train loss: 0.876 accuracy: 0.661
[Epoch 20] train loss: 0.881 accuracy: 0.703
[Epoch 21] train loss: 0.752 accuracy: 0.655
[Epoch 22] train los

In [17]:
# Save the trained model weights in Google Drive so that it can be used later
torch.save(model.state_dict(), 'pointnet_classification.pth')

# Part 4: Testing

In [18]:
# Load the previously saved PointNet model

import os
model_path = 'pointnet_classification.pth'
if os.path.exists(model_path):
    print('Loading PointNet model from', model_path)
    model.load_state_dict(torch.load(model_path))
else:
    print('Failed to load model from', model_path)
    sys.exit(1)


Loading PointNet model from pointnet_classification.pth


In [19]:
# Compute predictions on test dataset

# put model in evaluation mode
model.eval()

all_predicted_labels = []
for i, data in enumerate(test_dataloader):
    points, target = data

    #move this batch of data to the GPU if device is cuda
    points, target = points.to(device), target.to(device)

    # run a forward pass through the neural network and predict the outputs
    pred_probs = model(points)

    # IMPORTANT: backward pass is not performed for test data

    # keep track of the accuracy of our predictions
    pred_labels = pred_probs.data.max(1)[1]
    all_predicted_labels.extend(pred_labels.cpu().numpy())

print('Predicted labels:')
print(all_predicted_labels)
test_accuracy = sum(all_predicted_labels == test_dataset.labels) / len(test_dataset)
print('Accuracy on test dataset is %.3f over %d samples' % (test_accuracy, len(test_dataset)))

Predicted labels:
[2, 4, 2, 0, 4, 3, 8, 2, 3, 8, 4, 0, 4, 2, 7, 5, 5, 9, 3, 8, 3, 5, 6, 8, 2, 4, 6, 6, 9, 3, 4, 6, 4, 3, 1, 1, 3, 3, 8, 4, 7, 7, 0, 8, 8, 6, 4, 3, 8, 5, 2]
Accuracy on test dataset is 0.765 over 51 samples


In [20]:
# Visualize predictions on test dataset and compare to the ground truth

sample_idx = 0
pc = test_dataset.points[sample_idx]
actual_class = test_dataset.labels[sample_idx]
predicted_class = all_predicted_labels[sample_idx]

print('Predicted class: %s, Actual class: %s' % (class_names[predicted_class], class_names[actual_class]))

# Use Open3D to plot the point cloud
pcd_object = o3d.geometry.PointCloud()
pcd_object.points = o3d.utility.Vector3dVector(pc)
draw_geometries([pcd_object], show_axes=True)

Predicted class: column, Actual class: column
