### Name: Sally Sisi Qu
### ID: 169001

# Introduction
In this project, you will be asked to implement the [PointNet](https://arxiv.org/abs/1612.00593) architecture and train a classification network (left) and a segmentation network (middle).
![title](img/cls_sem.jpg)

### Grading Points
* Task 1.1 - 5
* Task 1.2 - 5
* Task 2.1 - 10
* Task 2.2 - 5
* Task 2.3 - 5
* Task 2.4 - 5
* Task 2.5 - 5
* Task 2.6 - 10
* Task 2.7 - 5
* Task 2.8 - 10
* Task 2.9 - 10
* Task 2.10 - 5 
* Task 2.11 - 5
* Task 2.12 - 5
* Task 2.13 - 10

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import random
import math
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.data as data

from torchvision.transforms import Compose

import dataset # custom dataset for ModelNet10 and ShapeNet

# 1. Data Loading

The point cloud is usually written as $X\in\mathbb{R}^{N\times 3}$. In code, however, we use the `B x 3 x N` layout, where `B` is the batch size and `N` is the number of points in a single point cloud.

## 1.1 Jitter the position of each point by a zero-mean Gaussian
For input $X\in\mathbb{R}^{N\times 3}$, we transform $X$ by $X \leftarrow X + \mathcal{N}(0, \sigma^2)$.

In [3]:
class RandomJitter(object):
    def __init__(self, sigma):
        self.sigma = sigma
        
    def __call__(self, data):
        ## hint: useful function `torch.randn` and `torch.randn_like`
        ## TASK 1.1
        ## This function takes as input a point cloud of layout `3 x N`, 
        ## and output the jittered point cloud of layout `3 x N`.
        # out-of-place add, so the tensor passed in is not modified
        data = data + torch.randn_like(data) * self.sigma
        
        return data

In [4]:
## random generate data and test your transform here
a = torch.randn(3,10)
b = torch.randn_like(a).normal_(0, 2)
print(b)

tensor([[-1.0735,  0.0092, -0.4380,  3.6914, -0.6869,  0.6769, -0.7465,  0.2815,
          0.1756, -1.5641],
        [-0.2080, -0.2250,  0.3701, -0.6151, -0.3259,  0.7582, -1.2463, -2.5635,
         -2.3841, -1.8180],
        [-0.9242, -3.9937,  2.8220, -2.9388, -1.0028, -0.1808,  0.1078,  0.9008,
          3.0704, -2.7517]])
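
One way to sanity-check the jitter is to compare the empirical mean and standard deviation of the added noise with `0` and `sigma`. The sketch below assumes a small functional helper, `jitter`, which is a hypothetical name used only for illustration and mirrors the transform in Task 1.1.

```python
import torch

def jitter(points, sigma):
    # hypothetical helper: add zero-mean Gaussian noise with standard deviation `sigma`
    return points + torch.randn_like(points) * sigma

torch.manual_seed(0)
points = torch.zeros(3, 100000)      # `3 x N` layout
noisy = jitter(points, sigma=0.02)
noise = noisy - points
print(noise.mean().item(), noise.std().item())  # should be close to 0 and 0.02
```

Using a large `N` keeps the sampling error small enough that the two statistics are easy to read off.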


## 1.2 Rotate the object about the z-axis randomly
For input $X\in\mathbb{R}^{N\times 3}$, we rotate all points about the z-axis (up-axis) by an angle $\theta$.


Suppose $T$ is the transformation matrix,
$$X\leftarrow XT,$$
where $$T=\begin{bmatrix}\cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
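
Because the code stores a point cloud as `3 x N` rather than `N x 3`, the update $X\leftarrow XT$ becomes $X^\intercal\leftarrow T^\intercal X^\intercal$. The following is a minimal sketch of that equivalence with an arbitrary example angle; it also checks that the z-coordinates and the point norms are unchanged, as expected for a rotation about the z-axis.

```python
import math
import torch

theta = math.radians(30.0)  # arbitrary example angle, converted to radians
c, s = math.cos(theta), math.sin(theta)
T = torch.tensor([[c, s, 0.0],
                  [-s, c, 0.0],
                  [0.0, 0.0, 1.0]])

X = torch.randn(5, 3)            # math layout: `N x 3`
rotated_math = X @ T             # X <- X T
rotated_code = T.t() @ X.t()     # the same rotation in `3 x N` layout

same = torch.allclose(rotated_math.t(), rotated_code, atol=1e-6)
z_kept = torch.allclose(rotated_code[2], X[:, 2])
norms_kept = torch.allclose(rotated_code.norm(dim=0), X.norm(dim=1), atol=1e-5)
print(same, z_kept, norms_kept)
```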

In [5]:
class RandomZRotate(object):
    def __init__(self, degrees):
        ## here `self.degrees` is a tuple (0, 360) which defines the range of degree
        self.degrees = degrees
        
    def __call__(self, data):
        ## TASK 1.2
        ## This function takes as input a point cloud of layout `3 x N`, 
        ## and output the rotated point cloud of layout `3 x N`.
        ##
        ## The rotation is along z-axis, and the degree is uniformly distributed
        ## between [0, 360]
        ##
        ## hint: useful function `torch.randn`, `torch.randn_like` and `torch.matmul`
        ##
        ## Notice:   
        ## Different from its math notation `N x 3`, the input has size of `3 x N`
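        # `torch.rand` samples uniformly from [0, 1); rescale to the degree range
        theta = torch.rand(1).item() * (self.degrees[1] - self.degrees[0]) + self.degrees[0]
        theta = math.radians(theta)  # convert once so sin and cos use the same angle
        sin, cos = math.sin(theta), math.cos(theta)
        # X <- X T in `N x 3` layout becomes X <- T^T X in `3 x N` layout
        mat = torch.tensor([[cos, -sin, 0.0],
                            [sin, cos, 0.0],
                            [0.0, 0.0, 1.0]], dtype=data.dtype, device=data.device)
        data = torch.matmul(mat, data)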
        
        return data

In [6]:
## random generate data and test your transform here
a = torch.randn(3,10)
thetat = torch.rand(1) * 360 + 0
sint = torch.sin(thetat)
cost = torch.cos(thetat)
matt = torch.tensor([[cost, sint, 0],[-sint, cost, 0],[0,0,1]])
datat = torch.matmul(matt, a)
print(datat)

tensor([[ 0.4911, -0.7101,  0.3581, -0.5455,  1.4950, -1.5668, -0.6998, -2.6751,
          1.1887,  1.1018],
        [-0.4432, -0.7006, -0.0878, -0.3871, -2.4042, -0.1400, -0.0682, -0.9424,
         -0.5377,  0.0166],
        [ 0.3169,  2.2118, -0.7978,  1.3652, -0.1872, -0.6404,  0.1839, -0.8140,
          0.9096,  1.0974]])


## 1.3 Load dataset ModelNet10 for Point Cloud Classification

### ModelNet10
By loading this dataset, we have data of shape `B x 3 x N` and label of shape `B`.

In [7]:
# It may take some time to download and pre-process the dataset.
train_transform = Compose([RandomZRotate((0, 360)), RandomJitter(0.02)])
train_cls_dataset = dataset.ModelNet(root='./ModelNet10', transform=train_transform, train=True)
test_cls_dataset = dataset.ModelNet(root='./ModelNet10', train=False)
train_cls_loader = data.DataLoader(
    train_cls_dataset,
    batch_size=32,
    shuffle=True,
    num_workers=1,
)
test_cls_loader = data.DataLoader(
    test_cls_dataset,
    batch_size=1,
    shuffle=False,
    num_workers=1,
)

In [8]:
print(train_cls_dataset.num_classes)

10


## ShapeNet
By loading this dataset, we have data of shape `B x 3 x N` and target of shape `B x N`.

Here is the list of categories:
['Airplane', 'Bag', 'Cap', 'Car', 'Chair', 'Earphone', 'Guitar', 'Knife', 'Lamp', 'Laptop', 'Motorbike', 'Mug', 'Pistol', 'Rocket', 'Skateboard', 'Table']

In [9]:
## Here, as an example, we choose the category 'Airplane'
category = 'Airplane'
train_seg_dataset = dataset.ShapeNet(root='./ShapeNet', category=category, train=True)
test_seg_dataset = dataset.ShapeNet(root='./ShapeNet', category=category, train=False)
train_seg_loader = data.DataLoader(
    train_seg_dataset,
    batch_size=32,
    shuffle=True,
    num_workers=1,
)
test_seg_loader = data.DataLoader(
    test_seg_dataset,
    batch_size=1,
    shuffle=False,
    num_workers=1,
)

In [10]:
print(train_seg_dataset.num_classes)

5


# 2 PointNet Architecture (Read Section 4.2 and Appendix C)
In this section, you will be asked to implement classification and segmentation step by step.
![pointnet](img/pointnet.jpg)

## 2.1 Joint Alignment Network 
This mini-network takes as input a matrix of size $N \times K$ and outputs a transformation matrix of size $K \times K$. 

In programming, the input size of this module is `B x K x N` and output size is `B x K x K`.

For the shared MLP, use structure like this `(FC(64), BN, ReLU, FC(128), BN, ReLU, FC(1024), BN, ReLU)`.

For the MLP after global max pooling, use structure like this `(FC(512), BN, ReLU, FC(256), BN, ReLU, FC(K*K))`.
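
A "shared" MLP applies the same fully connected layer to every point independently. In the `B x K x N` layout this is what `nn.Conv1d` with `kernel_size=1` computes. The sketch below illustrates the equivalence by copying the convolution weights into an `nn.Linear` layer; the sizes `B`, `K`, `N` are arbitrary example values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
B, K, N = 2, 3, 5
x = torch.randn(B, K, N)

conv = nn.Conv1d(K, 64, kernel_size=1)
fc = nn.Linear(K, 64)
# copy the conv weights (64 x K x 1) into the linear layer (64 x K)
with torch.no_grad():
    fc.weight.copy_(conv.weight.squeeze(-1))
    fc.bias.copy_(conv.bias)

out_conv = conv(x)                              # B x 64 x N
out_fc = fc(x.transpose(1, 2)).transpose(1, 2)  # same layer applied to each point
shared = torch.allclose(out_conv, out_fc, atol=1e-5)
print(shared)
```

The convolution form avoids the two transposes, which is one reason it is convenient for per-point layers.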


In [11]:
from collections import OrderedDict


In [12]:
class Transformation(nn.Module):
    def __init__(self, k=3):
        super(Transformation, self).__init__()
        
        self.k = k
        
        ## TASK 2.1
        
        ## define your network layers here
        ## shared mlp
        ## input size: B x K x N
        ## output size: B x 1024 x N
        ## hint: you may want to use `nn.Conv1d` here. Why?
        # `nn.Conv1d` with kernel size 1 applies the same linear layer to every point of a `B x K x N` tensor
        self.shared_mlp = nn.Sequential(OrderedDict([
          ('conv1', nn.Conv1d(k,64,1)),
          ('bn1', nn.BatchNorm1d(64)),  
          ('relu1', nn.ReLU()),
          ('conv2', nn.Conv1d(64,128,1)),
          ('bn2', nn.BatchNorm1d(128)),  
          ('relu2', nn.ReLU()),
          ('conv3', nn.Conv1d(128,1024,1)),
          ('bn3', nn.BatchNorm1d(1024)),  
          ('relu3', nn.ReLU())
        ]))
        
        ## define your network layers here
        ## mlp
        ## input size: B x 1024
        ## output size: B x (K*K)
        self.mlp = nn.Sequential(OrderedDict([
          # the third positional argument of `nn.Linear` is `bias`, so it is omitted here (default: True)
          ('fc1', nn.Linear(1024,512)),
          ('bn1', nn.BatchNorm1d(512)),  
          ('relu1', nn.ReLU()),
          ('fc2', nn.Linear(512,256)),
          ('bn2', nn.BatchNorm1d(256)),  
          ('relu2', nn.ReLU()),
          ('fc3', nn.Linear(256,k*k)),            
        ]))
        
    
    def forward(self, x):
        B, K, N = x.shape # batch-size, dim, number of points
        ## TASK 2.1

        ## forward of shared mlp
        # input - B x K x N
        # output - B x 1024 x N
        x = self.shared_mlp(x)
        
        ## global max pooling
        # input - B x 1024 x N
        # output - B x 1024
        x = torch.max(x, 2, keepdim=True)[0]
        x = x.view(-1, 1024)
        
        ## mlp
        # input - B x 1024
        # output - B x (K*K)
        x = self.mlp(x)
        
        ## reshape the transformation matrix to B x K x K
        identity = torch.eye(self.k, device=x.device)
        x = x.view(B, self.k, self.k) + identity[None]
        return x

In [13]:
from torchsummary import summary
T = Transformation().cuda()
summary(T, input_size= (3,  100))
print(T)

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv1d-1              [-1, 64, 100]             256
       BatchNorm1d-2              [-1, 64, 100]             128
              ReLU-3              [-1, 64, 100]               0
            Conv1d-4             [-1, 128, 100]           8,320
       BatchNorm1d-5             [-1, 128, 100]             256
              ReLU-6             [-1, 128, 100]               0
            Conv1d-7            [-1, 1024, 100]         132,096
       BatchNorm1d-8            [-1, 1024, 100]           2,048
              ReLU-9            [-1, 1024, 100]               0
           Linear-10                  [-1, 512]         524,800
      BatchNorm1d-11                  [-1, 512]           1,024
             ReLU-12                  [-1, 512]               0
           Linear-13                  [-1, 256]         131,328
      BatchNorm1d-14                  [

In [14]:
## random generate data and test this network
d = torch.randn(10,3,100).cuda()
T(d)

tensor([[[ 1.9902,  0.1703, -0.6293],
         [-0.3769,  1.6853,  0.3421],
         [-0.0086,  1.0022,  1.8099]],

        [[ 1.8576, -0.4449,  0.7118],
         [-0.1011,  1.6754, -0.1013],
         [ 0.2772,  0.3234,  1.2093]],

        [[ 1.3320, -0.4658, -0.6449],
         [-0.0248,  1.5130, -0.0843],
         [-1.0352, -0.3630,  0.7296]],

        [[ 1.8176, -0.8262, -0.2564],
         [ 0.1584,  1.4714,  0.3585],
         [-0.3193,  0.6245,  0.9849]],

        [[ 1.1638, -0.3135,  0.0896],
         [-0.2103,  0.9192, -0.0642],
         [ 0.1120, -0.3275,  0.8746]],

        [[ 0.7147, -0.2876, -0.1989],
         [ 0.3344,  1.4045, -0.3549],
         [ 0.0133,  0.1862,  1.3974]],

        [[ 1.9801,  0.5717,  0.1130],
         [-0.3813,  1.4358,  0.2630],
         [-0.4527,  0.8670,  1.1567]],

        [[ 1.3908, -0.1667,  1.2239],
         [ 0.4074,  1.3785, -0.1214],
         [ 0.1438,  0.6716,  0.8487]],

        [[ 1.3346, -0.4561, -0.0301],
         [-0.2884,  1.6216, -0.110

## 2.2 Regularization Loss
$$L_{reg}=\|I-TT^\intercal\|^2_F$$

The output of the `Transformation` network is of size `B x K x K`. The `OrthoLoss` module has no trainable parameters; it only computes this norm.

In [15]:
class OrthoLoss(nn.Module):
    def __init__(self):
        super(OrthoLoss, self).__init__()
        
    def forward(self, x):
        ## hint: useful function `torch.bmm` and `torch.matmul`
        
        ## TASK 2.2
        ## compute the matrix product
        prod = torch.bmm(x, x.transpose(1,2))

        norm = torch.norm(prod - torch.eye(x.shape[1], device=x.device)[None], dim=(1,2))
        return norm.mean()

In [16]:
## random generate data and test this network
a = torch.randn(3, 10, 10)
ortho = OrthoLoss()
ortho(a)

tensor(46.0790)
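
As a check on the formula $L_{reg}=\|I-TT^\intercal\|^2_F$, the sketch below evaluates the squared Frobenius norm directly: it should vanish for an orthogonal matrix and be positive otherwise. The helper name `ortho_penalty` is hypothetical, and averaging over the batch is an assumption. Note that `torch.norm(..., dim=(1, 2))` on its own returns the unsquared norm.

```python
import math
import torch

def ortho_penalty(T):
    # hypothetical helper: squared Frobenius norm of I - T T^T, averaged over the batch
    I = torch.eye(T.shape[1], device=T.device)[None]
    diff = I - torch.bmm(T, T.transpose(1, 2))
    return diff.pow(2).sum(dim=(1, 2)).mean()

c, s = math.cos(0.7), math.sin(0.7)
rot = torch.tensor([[[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]])  # orthogonal, `1 x 3 x 3`
scaled = 2.0 * torch.eye(3)[None]                                    # not orthogonal

print(ortho_penalty(rot).item())     # close to 0
print(ortho_penalty(scaled).item())  # (1 - 4)^2 on each of the 3 diagonal entries = 27
```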

## 2.3 Feature Network
In this subsection, you will be asked to implement the feature network (the top branch).

Local features are a matrix of size `B x 64 x N`, which will be used in the segmentation task.

Global features are a matrix of size `B x 1024`, which will be used in the classification task.
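
The global feature is obtained by max pooling over the point dimension, which makes it independent of the order of the points. A minimal sketch with random per-point features (the shapes are example values):

```python
import torch

torch.manual_seed(0)
feat = torch.randn(2, 1024, 100)              # per-point features, `B x 1024 x N`
perm = torch.randperm(feat.shape[2])          # random reordering of the points

pooled = feat.max(dim=2)[0]                   # `B x 1024`
pooled_perm = feat[:, :, perm].max(dim=2)[0]  # pooled after shuffling the points
invariant = torch.equal(pooled, pooled_perm)
print(invariant)
```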

In [17]:
class Feature(nn.Module):
    def __init__(self, alignment=False):
        super(Feature, self).__init__()
        
        self.alignment = alignment
        
        ## `input_transform` calculates the input transform matrix of size `3 x 3`
        if self.alignment:
            self.input_transform = Transformation(3)
        
        ## TASK 2.3
        ## define your network layers here
        ## local feature
        ## shared mlp
        ## input size: B x 3 x N
        ## output size: B x 64 x N
        ## hint: you may want to use `nn.Conv1d` here.
        self.local_sharedmlp = nn.Sequential(OrderedDict([
          ('conv1', nn.Conv1d(3,64,1)),
          ('bn1', nn.BatchNorm1d(64)),  
          ('relu1', nn.ReLU()),
          ('conv2', nn.Conv1d(64,64,1)),
          ('bn2', nn.BatchNorm1d(64)),  
          ('relu2', nn.ReLU()),
        ]))        
        
        ## `feature_transform` calculates the feature transform matrix of size `64 x 64`
        if self.alignment:
            self.feature_transform = Transformation(64)
        
        ## TASK 2.4
        ## define your network layers here
        ## global feature
        ## shared mlp
        ## input size: B x 64 x N
        self.global_sharedmlp = nn.Sequential(OrderedDict([
          ('conv1', nn.Conv1d(64,64,1)),
          ('bn1', nn.BatchNorm1d(64)),  
          ('relu1', nn.ReLU()),
          ('conv2', nn.Conv1d(64,128,1)),
          ('bn2', nn.BatchNorm1d(128)),  
          ('relu2', nn.ReLU()),
          ('conv3', nn.Conv1d(128,1024,1)),
          ('bn3', nn.BatchNorm1d(1024)),  
        ]))
  
    
    
    def forward(self, x):
        
        ## apply the input transform
        if self.alignment:
            transform = self.input_transform(x)
            ## TASK 2.5
            ## apply the input transform
            x = torch.bmm(transform, x)

        ## TASK 2.3
        ## forward of shared mlp
        # input - B x K x N
        # output - B x 64 x N
        x = self.local_sharedmlp(x)
        
        if self.alignment:
            transform = self.feature_transform(x)
            ## TASK 2.5
            ## apply the feature transform
            x = torch.bmm(transform, x)
        else:
            ## do not modify this line
            transform = None
        
        local_feature = x
        
        ## TASK 2.4
        ## forward of shared mlp
        # input - B x 64 x N
        # output - B x 1024 x N
        x = self.global_sharedmlp(x)
        
        ## TASK 2.4
        ## global max pooling
        # input - B x 1024 x N
        # output - B x 1024
        x = torch.max(x, 2, keepdim=True)[0]
        x = x.view(-1, 1024)
        
        global_feature = x
        
        ## summary:
        ## global_feature: B x 1024
        ## local_feature: B x 64 x N
        ## transform: B x K x K
        return global_feature, local_feature, transform

In [18]:
Ft = Feature().cuda()
summary(Ft, input_size= (3,  100))
print(Ft)

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv1d-1              [-1, 64, 100]             256
       BatchNorm1d-2              [-1, 64, 100]             128
              ReLU-3              [-1, 64, 100]               0
            Conv1d-4              [-1, 64, 100]           4,160
       BatchNorm1d-5              [-1, 64, 100]             128
              ReLU-6              [-1, 64, 100]               0
            Conv1d-7              [-1, 64, 100]           4,160
       BatchNorm1d-8              [-1, 64, 100]             128
              ReLU-9              [-1, 64, 100]               0
           Conv1d-10             [-1, 128, 100]           8,320
      BatchNorm1d-11             [-1, 128, 100]             256
             ReLU-12             [-1, 128, 100]               0
           Conv1d-13            [-1, 1024, 100]         132,096
      BatchNorm1d-14            [-1, 10

In [20]:
## random generate data and test this network
d = torch.randn(10,3,100).cuda()
test = Ft(d)
print(len(test))

3


## 2.4 Classification Network
In this network, you will use the global features generated by the `Feature` network defined above.

In [21]:
class Classification(nn.Module):
    def __init__(self, num_classes, alignment=False):
        super(Classification, self).__init__()
                
        self.feature = Feature(alignment=alignment)
        
        ## TASK 2.6
        ## define your network layers here
        ## mlp
        ## input size: B x 1024
        ## output size: B x num_classes
        self.cls_layers = nn.Sequential(OrderedDict([
          ('fc1', nn.Linear(1024,512)),
          ('bn1', nn.BatchNorm1d(512)),  
          ('relu1', nn.ReLU()),
          ('fc2', nn.Linear(512,256)),
          ('dp', nn.Dropout(p = 0.3)),
          ('bn2', nn.BatchNorm1d(256)),  
          ('relu2', nn.ReLU()),
          ('fc3', nn.Linear(256,num_classes))
        ]))       
        
    def forward(self, x):
        # the first return value is the global feature matrix;
        # the local feature matrix is not used for classification
        x, _, trans = self.feature(x)
        
        ## TASK 2.6
        ## forward of mlp
        # input - B x 1024
        # output - B x num_classes        
        x = self.cls_layers(x)
        
        ## x: B x num_classes
        ## trans: B x K x K
        return x, trans

In [22]:
C = Classification(10, True).cuda()
print(C)

Classification(
  (feature): Feature(
    (input_transform): Transformation(
      (shared_mlp): Sequential(
        (conv1): Conv1d(3, 64, kernel_size=(1,), stride=(1,))
        (bn1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv2): Conv1d(64, 128, kernel_size=(1,), stride=(1,))
        (bn2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv3): Conv1d(128, 1024, kernel_size=(1,), stride=(1,))
        (bn3): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu3): ReLU()
      )
      (mlp): Sequential(
        (fc1): Linear(in_features=1024, out_features=512, bias=True)
        (bn1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (fc2): Linear(in_features=512, out_features=256, bias=True)
        (bn2): BatchNorm1d(256, eps=1e-05, moment

In [23]:
## random generate data and test this network
d = torch.randn(10,3,100).cuda()
test = C(d)
print(test)

(tensor([[-0.0149, -0.3381, -0.4680,  0.2336, -0.6388, -0.6762,  0.4531, -0.4477,
          0.4952,  0.2996],
        [ 0.6362, -0.4092,  0.3624,  0.5588, -0.0513, -0.3604, -0.1470, -0.7514,
          0.1698, -0.3216],
        [ 0.0385,  0.2391, -0.2328, -0.0151, -1.0534, -0.8248, -0.1934, -0.1495,
          0.1572, -0.6947],
        [ 0.5139, -0.3029, -0.5252,  0.4859, -0.1303, -0.4121, -0.0935, -0.2230,
          0.5934, -0.4780],
        [ 0.6860, -0.1916, -0.3970,  0.1552,  0.1949, -0.0973,  0.6671, -1.1011,
          0.2994, -0.9529],
        [ 0.1610,  0.0054, -0.0847,  0.0691, -0.1432, -0.1653, -0.2491, -0.3015,
          0.0695, -0.3644],
        [ 0.2532, -0.0539, -0.1568, -0.0044, -0.1443, -0.3656,  0.2104, -0.7889,
         -0.3642, -0.8007],
        [ 0.3872, -0.1963, -0.4969,  0.3615, -0.0271, -0.1964,  0.1987, -0.0254,
         -0.3401, -0.8509],
        [ 1.0935, -0.0339,  0.3628,  0.5094,  0.0744, -0.8301,  0.5789,  0.4883,
          0.6891,  0.3797],
        [ 0.4256, 

### 2.4.1 Train this network on ModelNet10

In [24]:
# main train function for classification
def train_cls(train_loader, test_loader, network, optimizer, epochs, scheduler):
    reg = OrthoLoss()
    for epoch in range(epochs):
        print('Epoch:[{:02d}/{:02d}]'.format(epoch+1, epochs))
        print('Training...')
        # train mode: BatchNorm uses batch statistics and Dropout is active
        network.train()
        train_loss = 0
        correct = 0
        for batch, (pos, label) in enumerate(train_loader):
            network.zero_grad()
            pos, label = pos.cuda(), label.cuda()
            
            ## TASK 2.7
            ## forward propagation
            output, trans = network(pos)
            loss = nn.functional.nll_loss(nn.functional.log_softmax(output, dim = 1), label)
            ##########
            
            ## regularizer
            if trans is not None:
                loss += reg(trans) * 0.001

            pred = output.max(1)[1]
            correct += pred.eq(label).sum().item()

            loss.backward()
            optimizer.step()
            train_loss += loss.item()
            print('\rIter: [{:03d}/{:03d}] Loss: {:.4f}'.format(batch+1, len(train_loader), loss.item()), end='', flush=True)
        
        scheduler.step()
        print('\nAverage Train Loss: {:.4f}; Train Acc: {:.4f}'.format(train_loss/len(train_loader), correct/len(train_loader.dataset) * 100))
        
        print('\nTesting...')
        with torch.no_grad():
            # eval mode: BatchNorm uses running statistics and Dropout is disabled;
            # gradient tracking is turned off by `torch.no_grad()`, not by `eval()`
            network.eval()
            test_loss = 0
            correct = 0
            for batch, (pos, label) in enumerate(test_loader):
                pos, label = pos.cuda(), label.cuda()
    
                ## TASK 2.7
                ## forward propagation
                output, trans = network(pos)
                loss = nn.functional.nll_loss(nn.functional.log_softmax(output, dim = 1), label)

                ##########

                if trans is not None:
                    loss += reg(trans) * 0.001

                pred = output.max(1)[1]
                correct += pred.eq(label).sum().item()

                test_loss += loss.item()
                print('\rIter: [{:03d}/{:03d}] Loss: {:.4f}'.format(batch+1, len(test_loader), loss.item()), end='', flush=True)

            print('\nAverage Test Loss: {:.4f}; Test Acc: {:.4f}'.format(test_loss/len(test_loader), correct/len(test_loader.dataset) * 100))
        print('-------------------------------------------')


In [25]:
network = Classification(10, alignment=True).cuda()
epochs = 200 # you can change the value to a small number for debugging

## TASK 2.8
# see Appendix C
# choose an optimizer and an initial learning rate
optimizer = torch.optim.Adam(network.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-08,
            weight_decay=1e-4)
# choose a lr scheduler
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)
##########

# start training
train_cls(train_cls_loader, test_cls_loader, network, optimizer, epochs, scheduler)

Epoch:[01/200]
Training...
Iter: [125/125] Loss: 1.0431
Average Train Loss: 1.6643; Train Acc: 46.1789

Testing...
Iter: [908/908] Loss: 1.5541
Average Test Loss: 1.1715; Test Acc: 59.2511
-------------------------------------------
Epoch:[02/200]
Training...
Iter: [125/125] Loss: 1.2018
Average Train Loss: 1.0418; Train Acc: 66.7502

Testing...
Iter: [908/908] Loss: 0.2933
Average Test Loss: 0.9675; Test Acc: 64.3172
-------------------------------------------
Epoch:[03/200]
Training...
Iter: [125/125] Loss: 1.2838
Average Train Loss: 0.9357; Train Acc: 70.1579

Testing...
Iter: [908/908] Loss: 0.6684
Average Test Loss: 1.1025; Test Acc: 60.9031
-------------------------------------------
Epoch:[04/200]
Training...
Iter: [125/125] Loss: 0.8934
Average Train Loss: 0.8105; Train Acc: 74.5928

Testing...
Iter: [908/908] Loss: 0.0889
Average Test Loss: 0.9539; Test Acc: 64.5374
-------------------------------------------
Epoch:[05/200]
Training...
Iter: [125/125] Loss: 0.5806
Average Trai

### Report the best test accuracy you can get.

The best test accuracy is 90.7489%.
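
With the settings chosen in Task 2.8 (initial learning rate `0.001`, `StepLR` with `step_size=20` and `gamma=0.5`, stepped once per epoch), the learning rate during a given epoch follows $lr_0\cdot\gamma^{\lfloor epoch/20\rfloor}$. The plain-Python sketch below tabulates a few values; `step_lr` is a hypothetical helper written for illustration, not a PyTorch function.

```python
def step_lr(epoch, base_lr=0.001, step_size=20, gamma=0.5):
    # hypothetical helper: learning rate used during the 0-based `epoch`
    return base_lr * gamma ** (epoch // step_size)

for epoch in (0, 19, 20, 100, 199):
    print(epoch, step_lr(epoch))
```

Over 200 epochs the rate is halved nine times, so the last 20 epochs run at `0.001 / 512`.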

## 2.5 Segmentation Network
In this network, you will use the global features and local features generated by the `Feature` network defined above.

The global feature matrix is of size `B x 1024` and the local feature matrix is of size `B x 64 x N`.

They need to be concatenated into a new matrix of size `B x 1088 x N` (How?). 
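
One possible answer is to repeat the global feature along the point dimension and then concatenate along the channel dimension. The sketch below uses random tensors with example sizes; `expand` creates a broadcast view rather than copying the data.

```python
import torch

B, N = 2, 100
g = torch.randn(B, 1024)      # global feature
l = torch.randn(B, 64, N)     # local features

g_tiled = g.unsqueeze(2).expand(-1, -1, N)   # `B x 1024 x N`, a broadcast view
x = torch.cat([g_tiled, l], dim=1)           # `B x 1088 x N`
print(x.shape)
```

Every point therefore sees the same 1024 global channels next to its own 64 local channels.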

In [26]:
# segmentation network
class Segmentation(nn.Module):
    def __init__(self, num_classes, alignment=False):
        super(Segmentation, self).__init__()
        
        self.feature = Feature(alignment=alignment)
        
        ## TASK 2.9
        ## shared mlp
        ## input size: B x 1088 x N
        ## output size: B x num_classes x N
        self.seg_layers = nn.Sequential(OrderedDict([
          ('conv1', nn.Conv1d(1088,512,1)),
          ('bn1', nn.BatchNorm1d(512)),  
          ('relu1', nn.ReLU()),
          ('conv2', nn.Conv1d(512,256,1)),
          ('bn2', nn.BatchNorm1d(256)),  
          ('relu2', nn.ReLU()),
          ('conv3', nn.Conv1d(256,128,1)),
          ('bn3', nn.BatchNorm1d(128)),  
          ('relu3', nn.ReLU()),
          ('conv4', nn.Conv1d(128,num_classes,1))         
        ])) 
        
    def forward(self, x):
        g, l, trans = self.feature(x)
        
        ## TASK 2.10
        # concat global features and local features to a single matrix
        # g - B x 1024, global features
        # l - B x 64 x N, local features
        # x - B x 1088 x N, concatenated features
        B,_,N = l.shape
        # repeat the global feature for every point, then concatenate along the channel axis
        g = g.view(B,1024,1).expand([B,1024,N])
        x = torch.cat([g,l],dim=1)
        
        ## TASK 2.9
        ## forward of shared mlp
        # input - B x 1088 x N
        # output - B x num_classes x N  
        x = self.seg_layers(x)
        
        return x, trans

In [27]:
S = Segmentation(5, True).cuda()
print(S)

Segmentation(
  (feature): Feature(
    (input_transform): Transformation(
      (shared_mlp): Sequential(
        (conv1): Conv1d(3, 64, kernel_size=(1,), stride=(1,))
        (bn1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (conv2): Conv1d(64, 128, kernel_size=(1,), stride=(1,))
        (bn2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU()
        (conv3): Conv1d(128, 1024, kernel_size=(1,), stride=(1,))
        (bn3): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu3): ReLU()
      )
      (mlp): Sequential(
        (fc1): Linear(in_features=1024, out_features=512, bias=True)
        (bn1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU()
        (fc2): Linear(in_features=512, out_features=256, bias=True)
        (bn2): BatchNorm1d(256, eps=1e-05, momentum

In [28]:
## random generate data and test this network
d = torch.randn(10,3,100).cuda()
test = S(d)
print(test)

(tensor([[[ 4.0678e-01,  3.3083e-01,  4.8476e-01,  ...,  5.5892e-01,
           2.5157e-01,  4.2127e-01],
         [ 2.7037e-01,  2.1058e-01,  3.6191e-01,  ...,  2.3889e-01,
          -1.9467e-01,  1.0230e-01],
         [-1.9216e-01, -7.6937e-02, -1.5251e-01,  ..., -5.5411e-03,
          -3.5494e-04, -1.2999e-01],
         [ 7.2277e-01,  6.9677e-01,  6.7076e-01,  ...,  4.4341e-01,
           7.9080e-01,  2.9861e-01],
         [ 1.2703e-01, -2.6921e-01,  8.4761e-02,  ...,  3.9022e-01,
           3.4498e-02,  4.7865e-01]],

        [[-2.5109e-01, -4.8191e-01,  6.0220e-02,  ..., -3.5375e-01,
          -3.2851e-01, -4.7964e-01],
         [ 1.6387e-01,  1.7673e-01, -9.4886e-02,  ...,  3.4510e-01,
           2.3036e-01,  2.8644e-01],
         [ 6.8194e-02, -5.8677e-02, -1.9294e-01,  ..., -5.4642e-02,
          -9.5231e-02,  6.8759e-02],
         [ 1.4040e-01,  1.9371e-01,  3.2248e-01,  ..., -2.6951e-01,
           8.0368e-02, -7.2421e-02],
         [-1.7834e-01,  1.3822e-01,  2.1928e-01,  ..

### 2.5.1 Calculating Intersection over Union (IoU) 
For 2D images, the IoU is calculated as follows:
![iou](img/iou.png)

How is it used in the literature of point clouds?

In [29]:

## TASK 2.11
# implement the helper functions to calculate the IoU
def get_i_and_u(pred, target, num_classes):
    """Calculate intersection and union between pred and target.
    pred -- B x N matrix
    target -- B x N matrix
    num_classes -- number of classes
    return i, u
    i -- B x N binary matrix, intersection, i[b, n] equals 1 if and only if it is a true-positive.
    u -- B x N binary matrix, union, u[b, n] equals 0 if and only if it is a true-negative
    """
    ## TASK 2.11
    ## calculate i and u here
    ## hint: useful function `F.one_hot`    
    ## hint: use element-wise logical tensor operation (`&` and `|`)
    p = F.one_hot(pred, num_classes)
    t = F.one_hot(target, num_classes)
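    # summing over the class axis gives one value per point (`B x N`):
    # i is 1 where pred == target and 0 otherwise
    i = (p & t).sum(-1)
    # u is 1 where pred == target and 2 otherwise
    u = (p | t).sum(-1)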
    
    return i, u

def get_iou(pred, target, num_classes):
    """Calculate IoU
    pred -- B x N matrix
    target -- B x N matrix
    num_classes -- number of classes
    return iou
    iou -- B matrix, iou[b] is the IoU of b-th point cloud in this batch
    """
    ## use the helper function `get_i_and_u` defined above
    i, u = get_i_and_u(pred, target, num_classes)
    ## TASK 2.11
    ## calculate iou
    iou = i.float() / u.float()
    return iou
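
In the part-segmentation literature, including the PointNet paper, IoU is usually computed per part class over the points of each shape and then averaged over the classes, with a class that is absent from both prediction and ground truth counted as IoU 1. The sketch below follows that convention; `per_shape_miou` is a hypothetical helper name. By contrast, averaging a per-point agreement indicator gives plain accuracy (0.75 in the toy example).

```python
import torch
import torch.nn.functional as F

def per_shape_miou(pred, target, num_classes):
    # hypothetical helper; `pred` and `target` are `B x N` integer label tensors
    p = F.one_hot(pred, num_classes).bool()     # B x N x C
    t = F.one_hot(target, num_classes).bool()   # B x N x C
    inter = (p & t).sum(dim=1).float()          # B x C, true positives per class
    union = (p | t).sum(dim=1).float()          # B x C, points that are not true negatives
    # assumption: a class absent from both pred and target counts as IoU = 1
    iou = torch.where(union > 0, inter / union.clamp(min=1), torch.ones_like(union))
    return iou.mean(dim=1)                      # B

pred = torch.tensor([[0, 0, 1, 1]])
target = torch.tensor([[0, 1, 1, 1]])
miou = per_shape_miou(pred, target, num_classes=2)
print(miou)  # class 0: 1/2, class 1: 2/3, mean = 7/12
```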

### 2.5.2 Train this network on ShapeNet

In [30]:
# main train function for segmentation
def train_seg(train_loader, test_loader, network, optimizer, epochs, scheduler):  
    reg = OrthoLoss()
    for epoch in range(epochs):
        print('Epoch:[{:02d}/{:02d}]'.format(epoch+1, epochs))
        print('Training...')
        network.train()
        train_loss = 0
        correct = 0
        total = 0
        ious = []
        for batch, (pos, label) in enumerate(train_loader):
            network.zero_grad()
            pos, label = pos.cuda(), label.cuda()
            
            ## TASK 2.12
            ## forward propagation
            output, trans = network(pos)
            loss = nn.functional.nll_loss(nn.functional.log_softmax(output, dim = 1), label)
            ##########
            if trans is not None:
                loss += reg(trans) * 0.001        

            pred = output.max(1)[1]
            correct += pred.eq(label).sum().item()
            total += label.numel()

            loss.backward()
            optimizer.step()
            train_loss += loss.item()

            ious += [get_iou(pred, label, train_loader.dataset.num_classes)]
            print('\rIter: [{:03d}/{:03d}] Loss: {:.4f}'.format(batch+1, len(train_loader), loss.item()), end='', flush=True)
        
        scheduler.step()
        print('\nAverage Train Loss: {:.4f}; Train Acc: {:.4f}; Train mean IoU: {:.4f}'.format(train_loss/len(train_loader), correct/total * 100, torch.cat(ious, dim=0).mean().item()))

        print('\nTesting...')
        with torch.no_grad():
            network.eval()
            test_loss = 0
            correct = 0
            total = 0
            ious = []
            for batch, (pos, label) in enumerate(test_loader):
                pos, label = pos.cuda(), label.cuda()
                
                ## TASK 2.12
                ## forward propagation
                output, trans = network(pos)
                loss = nn.functional.nll_loss(nn.functional.log_softmax(output, dim = 1), label)
                ##########
                
                if trans is not None:
                    loss += reg(trans) * 0.001   

                pred = output.max(1)[1]
                correct += pred.eq(label).sum().item()
                total += label.numel()

                test_loss += loss.item()

                ious += [get_iou(pred, label, test_loader.dataset.num_classes)]
                print('\rIter: [{:03d}/{:03d}] Loss: {:.4f}'.format(batch+1, len(test_loader), loss.item()), end='', flush=True)

            print('\nAverage Test Loss: {:.4f}; Test Acc: {:.4f}; Test mean IoU: {:.4f}'.format(test_loss/len(test_loader), correct/total * 100, torch.cat(ious, dim=0).mean().item()))
        print('-------------------------------------------')

In [31]:
network = Segmentation(train_seg_dataset.num_classes, alignment=True).cuda()
epochs = 200 # you can change the value to a small number for debugging

## TASK 2.13
# see Appendix C
# choose an optimizer and an initial learning rate
optimizer = torch.optim.Adam(network.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-08,
            weight_decay=1e-4)
# choose a lr scheduler
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)
##########

train_seg(train_seg_loader, test_seg_loader, network, optimizer, epochs, scheduler)

Epoch:[01/200]
Training...
Iter: [074/074] Loss: 0.5263
Average Train Loss: 0.6830; Train Acc: 76.9861; Train mean IoU: 0.7699

Testing...
Iter: [341/341] Loss: 0.5374
Average Test Loss: 0.5251; Test Acc: 82.2901; Test mean IoU: 0.8229
-------------------------------------------
Epoch:[02/200]
Training...
Iter: [074/074] Loss: 0.4487
Average Train Loss: 0.3812; Train Acc: 87.3273; Train mean IoU: 0.8733

Testing...
Iter: [341/341] Loss: 0.3340
Average Test Loss: 0.4339; Test Acc: 84.0112; Test mean IoU: 0.8401
-------------------------------------------
Epoch:[03/200]
Training...
Iter: [074/074] Loss: 0.3557
Average Train Loss: 0.3299; Train Acc: 88.6881; Train mean IoU: 0.8869

Testing...
Iter: [341/341] Loss: 0.3667
Average Test Loss: 0.4137; Test Acc: 84.8100; Test mean IoU: 0.8481
-------------------------------------------
Epoch:[04/200]
Training...
Iter: [074/074] Loss: 0.2502
Average Train Loss: 0.2986; Train Acc: 89.3741; Train mean IoU: 0.8937

Testing...
Iter: [341/341] Loss:

### Report the best test mIoU you can get.

The best test mIoU obtained from the training is 0.9166.