# <font color='blue'> 2. Point Cloud Segmentation with Deep Learning </font>
- Introduction
- Network Architecture
- Hands-on-Experience

## <font color = 'black'> 2.1 Introduction </font>

<font color = 'black'> A point cloud is an important geometric data type. It is the canonical output of depth sensors and lidar, but it is irregular: the points form an unordered, unevenly spaced set. Because of this irregularity, most earlier work first converts the point cloud into a regular 3D voxel grid (for a 3D CNN) or a collection of images (for a 2D CNN). Such conversions make the data unnecessarily voluminous and can discard the underlying structure (or features) of the original point cloud. </font>

<font color = 'black'> To this end, PointNet was proposed in 2017 to learn a model directly on raw point cloud data. It was the first network of its kind; although simple, it is robust to input perturbation and corruption. It is efficient and effective across many point cloud tasks, such as object classification, part segmentation, and semantic segmentation.

In the following, we perform semantic segmentation with this method on the S3DIS dataset. The method can be extended to any custom dataset.

For further details, the article is available at: https://arxiv.org/abs/1612.00593 

Reference: Qi, Charles R., et al. "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. </font>

## 2.2 Network Architecture: PointNet
<img src="../../description/pointnet.png">
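A central idea in PointNet is that the same per-point function is applied to every point and the results are aggregated with a symmetric operation (max pooling), so the global feature does not depend on point order. The sketch below illustrates only that property in plain Python. The function `shared_mlp` is a hypothetical stand-in for the learned per-point layers and is not taken from the actual model.

```python
import random


def shared_mlp(point):
    """Toy per-point feature function, applied identically to every point."""
    x, y, z = point
    return [x + y, y * z, x - z, x * x + y * y + z * z]


def global_feature(points):
    """Max-pool each feature channel over all points (a symmetric function)."""
    features = [shared_mlp(p) for p in points]
    return [max(channel) for channel in zip(*features)]


points = [(0.0, 1.0, 2.0), (1.5, -0.5, 0.25), (-1.0, 2.0, 0.5), (3.0, 0.0, -1.0)]

# Shuffle a copy of the points; the pooled feature should be unchanged.
shuffled = points[:]
random.Random(0).shuffle(shuffled)

same_feature = global_feature(points) == global_feature(shuffled)
print(same_feature)
```

Because `max` ignores the order of its inputs, any permutation of the points yields the same pooled feature vector.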

## 2.3 Hands-on-Experience

Import required modules

In [1]:
import os
import torch
from tqdm import tqdm
import numpy as np
import time
from utilities.data_provide import rotate_point_cloud_z
from dataloaders.S3DISDataLoader import S3DISDataset

List of classes in the S3DIS dataset

In [2]:
classes = ['ceiling', 'floor', 'wall', 'beam', 'column', 'window', 'door', 'table', 'chair', 'sofa', 'bookcase',
           'board', 'clutter']
class2label = {cls: i for i, cls in enumerate(classes)}
seg_classes = class2label
# Inverse mapping: integer label -> class name
seg_label_to_cat = {i: cat for i, cat in enumerate(seg_classes.keys())}


<font color = 'red'> Parameters </font>

In [3]:
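class Args:
    '''PARAMETERS'''
    gpu = '0'                   # value assigned to CUDA_VISIBLE_DEVICES
    batch_size = 16
    model = 'pointnet_sem_seg'
    epoch = 32                  # total number of training epochs
    learning_rate = 0.001       # initial learning rate
    num_point = 1024            # not referenced in the cells below; NUM_POINT is taken from npoint
    optimizer = 'Adam'          # any other value selects SGD
    log_dir = 'runs'
    decay_rate = 1e-4           # weight decay passed to Adam
    npoint = 4096               # number of points per sample
    step_size = 10              # epochs between learning-rate / BN-momentum decay steps
    lr_decay = 0.7              # multiplicative learning-rate decay factor
    test_area = 5               # S3DIS area held out for testing
args = Args()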

In-place ReLU saves memory by overwriting its input tensor instead of allocating a new output.

In [4]:
def inplace_relu(m):
    classname = m.__class__.__name__
    if classname.find('ReLU') != -1:
        m.inplace=True

Evaluation function for the test dataloader. It is called inside the training loop.

In [5]:
def test(model, loader):    
    num_batches = len(loader)
    total_correct = 0
    total_seen = 0
    loss_sum = 0
    labelweights = np.zeros(NUM_CLASSES)
    total_seen_class = [0 for _ in range(NUM_CLASSES)]
    total_correct_class = [0 for _ in range(NUM_CLASSES)]
    total_iou_deno_class = [0 for _ in range(NUM_CLASSES)]
    classifier = model.eval()

    print('---- EPOCH %03d EVALUATION ----' % (global_epoch + 1))
    for i, (points, target) in tqdm(enumerate(loader), total=len(loader), smoothing=0.9):
        points = points.data.numpy()
        points = torch.Tensor(points)
        points, target = points.float().cuda(), target.long().cuda()
        points = points.transpose(2, 1)

        seg_pred, trans_feat = classifier(points)
        pred_val = seg_pred.contiguous().cpu().data.numpy()
        seg_pred = seg_pred.contiguous().view(-1, NUM_CLASSES)

        batch_label = target.cpu().data.numpy()
        target = target.view(-1, 1)[:, 0]
        loss = criterion(seg_pred, target, trans_feat, weights)
        loss_sum += loss.item()
        pred_val = np.argmax(pred_val, 2)
        correct = np.sum((pred_val == batch_label))
        total_correct += correct
        total_seen += (BATCH_SIZE * NUM_POINT)
        tmp, _ = np.histogram(batch_label, range(NUM_CLASSES + 1))
        labelweights += tmp

        for l in range(NUM_CLASSES):
            total_seen_class[l] += np.sum((batch_label == l))
            total_correct_class[l] += np.sum((pred_val == l) & (batch_label == l))
            total_iou_deno_class[l] += np.sum(((pred_val == l) | (batch_label == l)))

    labelweights = labelweights.astype(np.float32) / np.sum(labelweights.astype(np.float32))
    mIoU = np.mean(np.array(total_correct_class) / (np.array(total_iou_deno_class, dtype=np.float64) + 1e-6))
    print('eval mean loss: %f' % (loss_sum / float(num_batches)))
    print('eval point avg class IoU: %f' % (mIoU))
    print('eval point accuracy: %f' % (total_correct / float(total_seen)))
    print('eval point avg class acc: %f' % (
        np.mean(np.array(total_correct_class) / (np.array(total_seen_class, dtype=np.float64) + 1e-6))))

    iou_per_class_str = '------- IoU --------\n'
    for l in range(NUM_CLASSES):
        iou_per_class_str += 'class %s weight: %.3f, IoU: %.3f \n' % (
            seg_label_to_cat[l] + ' ' * (14 - len(seg_label_to_cat[l])), labelweights[l],
            total_correct_class[l] / (float(total_iou_deno_class[l]) + 1e-6))

    print(iou_per_class_str)
    print('Eval mean loss: %f' % (loss_sum / num_batches))
    print('Eval accuracy: %f' % (total_correct / float(total_seen)))

    return mIoU
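The per-class IoU accumulated in `test` is the number of points where prediction and label agree on a class, divided by the number of points where either one has that class. The following plain-Python sketch works through a tiny hand-made example. The helper `per_class_iou` and the label lists are illustrative assumptions, not part of the notebook.

```python
def per_class_iou(pred, label, num_classes):
    """Return one IoU value per class from flat lists of integer labels."""
    ious = []
    for c in range(num_classes):
        # Intersection: points predicted as c that are truly c.
        inter = sum(1 for p, t in zip(pred, label) if p == c and t == c)
        # Union: points that are predicted as c or truly c.
        union = sum(1 for p, t in zip(pred, label) if p == c or t == c)
        # Unlike the notebook (which adds 1e-6), absent classes are reported as None here.
        ious.append(inter / union if union else None)
    return ious


pred = [0, 0, 1, 1, 2, 2]   # hypothetical predictions for six points
label = [0, 1, 1, 1, 2, 0]  # hypothetical ground truth

ious = per_class_iou(pred, label, num_classes=3)
mean_iou = sum(ious) / len(ious)
print(ious, mean_iou)
```

For class 0 the intersection is one point and the union is three, giving an IoU of 1/3; classes 1 and 2 give 2/3 and 1/2, so the mean IoU is 0.5.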


A GPU is required. On a machine with several GPUs, set the index of the GPU to use; the default, 0, selects the first (or only) GPU.

In [6]:
os.environ["CUDA_VISIBLE_DEVICES"] = args.gpu
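As a small illustration of how this variable behaves, the sketch below restricts the process to a different device. The index `"1"` is only an example value. The variable has to be set before the framework initialises CUDA, and the devices that remain visible are renumbered starting from 0 inside the process.

```python
import os

# Example only: expose the second physical GPU to this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# Several devices can be listed, separated by commas (e.g. "0,2").
visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
print("GPUs visible to this process:", visible)
```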

### Dataloader Part:
We use the S3DIS dataset for our implementation. The Stanford 3D Indoor Scene Dataset (S3DIS, CVPR 2016) contains 6 large-scale indoor areas with 271 rooms. Each point in a scene point cloud is annotated with one of 13 semantic categories.

Reference:
Iro Armeni et al. "3D Semantic Parsing of Large-Scale Indoor Spaces", CVPR, 2016

The dataset and related information are available at: http://buildingparser.stanford.edu/dataset.html


 <font color = 'red'> Note: We respect the data policy imposed by the authors, so we neither distribute the data nor support its distribution. Please follow the download link, read the conditions, fill in the Google form, and download the data from the link generated after you submit the form. </font>


In [7]:
root = '../../data/stanford_indoor3d/'
NUM_CLASSES = 13
NUM_POINT = args.npoint
BATCH_SIZE = args.batch_size

In [8]:
print("start loading training data ...")
TRAIN_DATASET = S3DISDataset(split='train', data_root=root, num_point=NUM_POINT, test_area=args.test_area, block_size=1.0, sample_rate=1.0, transform=None)
print("start loading test data ...")
TEST_DATASET = S3DISDataset(split='test', data_root=root, num_point=NUM_POINT, test_area=args.test_area, block_size=1.0, sample_rate=1.0, transform=None)

trainDataLoader = torch.utils.data.DataLoader(TRAIN_DATASET, batch_size=BATCH_SIZE, shuffle=True, num_workers=10,
                                                pin_memory=True, drop_last=True,
                                                worker_init_fn=lambda x: np.random.seed(x + int(time.time())))
testDataLoader = torch.utils.data.DataLoader(TEST_DATASET, batch_size=BATCH_SIZE, shuffle=False, num_workers=10,
                                                pin_memory=True, drop_last=True)
weights = torch.Tensor(TRAIN_DATASET.labelweights).cuda()

print("The number of training data is: %d" % len(TRAIN_DATASET))
print("The number of test data is: %d" % len(TEST_DATASET))


start loading training data ...
100%|██████████| 204/204 [00:19<00:00, 10.35it/s]
[1.1233332 1.1800324 1.        2.238213  2.337216  2.3404622 1.7047739
 2.0308683 1.8827153 3.8201103 1.7911378 2.7820194 1.343442 ]
Totally 47576 samples in train set.
start loading test data ...
100%|██████████| 67/67 [00:09<00:00,  7.19it/s]
[ 1.1457089  1.2112554  1.        10.023792   2.5368133  2.0141404
  2.1255984  2.0063603  2.5044875  4.7404885  1.4208089  2.9067025
  1.4772114]
Totally 18822 samples in test set.
The number of training data is: 47576
The number of test data is: 18822

### Model Part:

In [9]:
'''MODEL LOADING'''

from models.pointnet_sem_seg import get_model, get_loss

classifier = get_model(NUM_CLASSES).cuda()
criterion = get_loss().cuda()
classifier.apply(inplace_relu)

get_model(
  (feat): PointNetEncoder(
    (stn): STN3d(
      (conv1): Conv1d(9, 64, kernel_size=(1,), stride=(1,))
      (conv2): Conv1d(64, 128, kernel_size=(1,), stride=(1,))
      (conv3): Conv1d(128, 1024, kernel_size=(1,), stride=(1,))
      (fc1): Linear(in_features=1024, out_features=512, bias=True)
      (fc2): Linear(in_features=512, out_features=256, bias=True)
      (fc3): Linear(in_features=256, out_features=9, bias=True)
      (relu): ReLU(inplace=True)
      (bn1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (bn2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (bn3): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (bn4): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (bn5): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (conv1): Conv1d(9, 64, kernel_size=(1,), stride=

Load pretrained weights if a checkpoint exists, then set up the optimizer

In [10]:
try:
    checkpoint = torch.load('weights/best_model_segmentation.pth')
    start_epoch = checkpoint['epoch']
    classifier.load_state_dict(checkpoint['model_state_dict'])
    print('Use pretrain model')
except Exception:
    # No usable checkpoint (e.g. the file is missing): train from scratch.
    print('No existing model, starting training from scratch...')
    start_epoch = 0


if args.optimizer == 'Adam':
    optimizer = torch.optim.Adam(
        classifier.parameters(),
        lr=args.learning_rate,
        betas=(0.9, 0.999),
        eps=1e-08,
        weight_decay=args.decay_rate
    )
else:
    optimizer = torch.optim.SGD(classifier.parameters(), lr=args.learning_rate, momentum=0.9)

No existing model, starting training from scratch...


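Adjust Batchnorm momentum

Momentum controls the "lag" with which batch normalization tracks the running mean and variance, so that noise from individual mini-batches is averaged out. PyTorch updates the running statistics as `running = (1 - momentum) * running + momentum * batch_value`, so a smaller momentum value means more lag: slower but steadier updates of the moving mean. The training loop halves the momentum every `step_size` epochs, down to a floor of 0.01.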

In [11]:
def bn_momentum_adjust(m, momentum):
    if isinstance(m, torch.nn.BatchNorm2d) or isinstance(m, torch.nn.BatchNorm1d):
        m.momentum = momentum
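To make the effect of the momentum value concrete, the sketch below applies the PyTorch-style running-statistic update, `running = (1 - momentum) * running + momentum * batch_value`, to a constant stream of batch means. The helper names and the input values are made up for illustration.

```python
def update_running_mean(running, batch_mean, momentum):
    """One PyTorch-style running-mean update step."""
    return (1.0 - momentum) * running + momentum * batch_mean


def track(batch_means, momentum, start=0.0):
    """Apply the update over a sequence of batch means and return the final estimate."""
    running = start
    for batch_mean in batch_means:
        running = update_running_mean(running, batch_mean, momentum)
    return running


# Twenty mini-batches whose mean is always 1.0 (hypothetical data).
batch_means = [1.0] * 20

fast = track(batch_means, momentum=0.1)   # initial value used in this notebook
slow = track(batch_means, momentum=0.01)  # floor value used in this notebook
print(fast, slow)
```

With momentum 0.1 the running mean has covered most of the distance to 1.0 after twenty batches, while with momentum 0.01 it has moved less than a fifth of the way, which is the "lag" described above.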

<font color ='red'> Hyperparameters </font> 

In [12]:
LEARNING_RATE_CLIP = 1e-5
MOMENTUM_ORIGINAL = 0.1
MOMENTUM_DECCAY = 0.5
MOMENTUM_DECCAY_STEP = args.step_size

global_epoch = 0
best_iou = 0
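These constants drive two step-decay schedules in the training loop: the learning rate is multiplied by `lr_decay` every `step_size` epochs and clipped at `LEARNING_RATE_CLIP`, and the BN momentum is halved on the same schedule with a floor of 0.01. The sketch below evaluates both formulas for a few epochs using the values from this notebook as defaults. The function names are illustrative.

```python
def learning_rate(epoch, base_lr=0.001, decay=0.7, step=10, floor=1e-5):
    """Step-decayed learning rate, clipped from below at `floor`."""
    return max(base_lr * decay ** (epoch // step), floor)


def bn_momentum(epoch, base=0.1, decay=0.5, step=10, floor=0.01):
    """Step-decayed batch-norm momentum, clipped from below at `floor`."""
    return max(base * decay ** (epoch // step), floor)


for epoch in (0, 10, 20, 30):
    print('epoch %2d  lr=%.6f  bn_momentum=%.4f' % (epoch, learning_rate(epoch), bn_momentum(epoch)))
```

With 32 epochs and a step size of 10, each schedule changes three times during training, and neither reaches its floor.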

In [None]:
for epoch in range(start_epoch, args.epoch):
    '''Train on chopped scenes'''
    print('**** Epoch %d (%d/%s) ****' % (global_epoch + 1, epoch + 1, args.epoch))
    lr = max(args.learning_rate * (args.lr_decay ** (epoch // args.step_size)), LEARNING_RATE_CLIP)
    print('Learning rate:%f' % lr)
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
    momentum = MOMENTUM_ORIGINAL * (MOMENTUM_DECCAY ** (epoch // MOMENTUM_DECCAY_STEP))
    if momentum < 0.01:
        momentum = 0.01
    print('BN momentum updated to: %f' % momentum)
    classifier = classifier.apply(lambda x: bn_momentum_adjust(x, momentum))
    num_batches = len(trainDataLoader)
    total_correct = 0
    total_seen = 0
    loss_sum = 0
    classifier = classifier.train()

    for i, (points, target) in tqdm(enumerate(trainDataLoader), total=len(trainDataLoader), smoothing=0.9):
        optimizer.zero_grad()

        points = points.data.numpy()
        points[:, :, :3] = rotate_point_cloud_z(points[:, :, :3])
        points = torch.Tensor(points)
        points, target = points.float().cuda(), target.long().cuda()
        points = points.transpose(2, 1)

        seg_pred, trans_feat = classifier(points)
        seg_pred = seg_pred.contiguous().view(-1, NUM_CLASSES)

        batch_label = target.view(-1, 1)[:, 0].cpu().data.numpy()
        target = target.view(-1, 1)[:, 0]
        loss = criterion(seg_pred, target, trans_feat, weights)
        loss.backward()
        optimizer.step()

        pred_choice = seg_pred.cpu().data.max(1)[1].numpy()
        correct = np.sum(pred_choice == batch_label)
        total_correct += correct
        total_seen += (BATCH_SIZE * NUM_POINT)
        loss_sum += loss.item()
    print('Training mean loss: %f' % (loss_sum / num_batches))
    print('Training accuracy: %f' % (total_correct / float(total_seen)))

    with torch.no_grad():
        
        mIoU = test(classifier.eval(), testDataLoader)

        if mIoU >= best_iou:
            best_iou = mIoU
            print('Save model...')
            savepath = 'weights' + '/best_model_segmentation.pth'
            print('Saving at %s' % savepath)
            state = {
                'epoch': epoch,
                'class_avg_iou': mIoU,
                'model_state_dict': classifier.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
            }
            torch.save(state, savepath)
            print('Saving model....')
        print('Best mIoU: %f' % best_iou)
    global_epoch += 1



**** Epoch 1 (1/32) ****
Learning rate:0.001000
BN momentum updated to: 0.100000


  1%|          | 33/2973 [00:48<1:08:13,  1.39s/it]
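The training loop augments every batch with `rotate_point_cloud_z` before the forward pass. The helper's source in `utilities.data_provide` is not shown, so the following is only a sketch of what a rotation about the vertical (z) axis does, assuming a single random angle is drawn per call. The function `rotate_z` and the sample points are hypothetical.

```python
import math
import random


def rotate_z(points, angle):
    """Rotate (x, y, z) points by `angle` radians about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


points = [(1.0, 0.0, 0.5), (0.0, 2.0, -1.0)]  # hypothetical xyz coordinates

# Draw a random angle in [0, 2*pi), as a typical augmentation would.
angle = random.uniform(0.0, 2.0 * math.pi)
rotated = rotate_z(points, angle)
print(rotated)
```

The rotation leaves each point's height and its distance from the z axis unchanged, so floors stay floors and walls stay vertical while the room's orientation varies between batches.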