# Exercise 11 - PointNet for 3D Point Cloud Classification

This exercise is about implementing the PointNet architecture for point-wise classification of a 3D point cloud. Each point is classified by considering its local neighborhood of 20 points. (The point itself is also used, so that in the end there are 21 points as input to PointNet.)

The datasets are provided as 3-dimensional tensors of shape Dx21x3: for each of the D points of the point cloud, the N=21 points of its neighborhood are given with their 3D coordinates. The ISPRS training dataset is used for training, and the evaluation dataset for testing (predictions).

## Setup TensorFlow

In [1]:
# Set USE_GPU to the index of the GPU you want to use;
# an index that does not exist will raise a Python error
# e.g. USE_GPU = 4
USE_GPU = 4

In [2]:
# Import TensorFlow 
import tensorflow as tf

# Print the installed TensorFlow version
print(f'TensorFlow version: {tf.__version__}\n')

# Get all GPU devices on this server
gpu_devices = tf.config.list_physical_devices('GPU')

# Print the name and the type of all GPU devices
print('Available GPU Devices:')
for gpu in gpu_devices:
    print(' ', gpu.name, gpu.device_type)
    
# Set only the GPU specified as USE_GPU to be visible
tf.config.set_visible_devices(gpu_devices[USE_GPU], 'GPU')

# Get all visible GPU  devices on this server
visible_devices = tf.config.get_visible_devices('GPU')

# Print the name and the type of all visible GPU devices
print('\nVisible GPU Devices:')
for gpu in visible_devices:
    print(' ', gpu.name, gpu.device_type)
    
# Set the visible device(s) to not allocate all available memory at once,
# but rather let the memory grow whenever needed
for gpu in visible_devices:
    tf.config.experimental.set_memory_growth(gpu, True)
    
# Import Keras
from tensorflow import keras

# Print the installed Keras version
print(f'\nKeras version: {keras.__version__}\n')

TensorFlow version: 2.3.0

Available GPU Devices:
  /physical_device:GPU:0 GPU
  /physical_device:GPU:1 GPU
  /physical_device:GPU:2 GPU
  /physical_device:GPU:3 GPU
  /physical_device:GPU:4 GPU
  /physical_device:GPU:5 GPU
  /physical_device:GPU:6 GPU
  /physical_device:GPU:7 GPU

Visible GPU Devices:
  /physical_device:GPU:4 GPU

Keras version: 2.4.0



## Helper functions

In [3]:
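def save_colorized_point_cloud(xyz, y, filename):
    """Write x, y, z and an RGB color per point (chosen by class label) to a CSV file without header."""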

    color_map = np.array([
        [255, 255, 125],
        [  0, 255, 255],
        [255, 255, 255],
        [255, 255,   0],
        [  0, 255, 125],
        [  0,   0, 255],
        [  0, 125, 255],
        [125, 255,   0],
        [  0, 255,   0]])
    
    u, inverses = np.unique(y, return_inverse=True)    
    
    colors = color_map[inverses]
    
    df = pd.DataFrame(xyz, columns=['x', 'y', 'z'])    

    df['red'] = pd.Series(data=colors[:,0], name='red')
    df['green'] = pd.Series(data=colors[:,1], name='green')
    df['blue'] = pd.Series(data=colors[:,2], name='blue')
    
    df.to_csv(filename, index=False, header=False)
    
    print(f'Saved "{filename}"')

## Load training data

The dataset used in this exercise consists of a large number (753,876 for training) of very small point clouds with 21 points each. Each of them is the result of taking one point from the original point cloud together with its 20 neighbor points. These small point clouds form the input to the PointNet neural network, which predicts the class of the one "central" point.

The point clouds are centered on their "central" point, so that the x,y-coordinates of the central point are at the origin of the coordinate system. The z-coordinate of this point is its elevation above the terrain.
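How such a neighborhood tensor might be built can be sketched as follows. This is only an illustration under assumptions that are not stated in this exercise: the function name `build_neighborhoods` is hypothetical, neighbors are found by brute-force 3D Euclidean distance, and only x and y are shifted while z is kept as given. The provided `.npy` files may have been produced differently.

```python
import numpy as np

def build_neighborhoods(points, k=20):
    """Return an array of shape (D, k + 1, 3): each point followed by its k nearest neighbors.

    Hypothetical helper for illustration; brute force, so only suitable for small D.
    """
    points = np.asarray(points, dtype=float)
    # Pairwise squared Euclidean distances, shape (D, D)
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    # Indices of the k + 1 closest points; the point itself (distance 0) comes first
    idx = np.argsort(dist2, axis=1, kind='stable')[:, :k + 1]
    neighborhoods = points[idx]                      # shape (D, k + 1, 3)
    # Shift x and y so that the central point lies at the origin; z is left unchanged
    center_xy = neighborhoods[:, :1, :2].copy()
    neighborhoods[:, :, :2] -= center_xy
    return neighborhoods

rng = np.random.default_rng(0)
toy_points = rng.random((100, 3))                    # made-up point cloud
toy_X = build_neighborhoods(toy_points, k=20)
print(toy_X.shape)                                   # (100, 21, 3)
```

The first row of every small point cloud is the central point itself, which matches the layout of the example printed further below.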

In [4]:
from pathlib import Path
import os

root_dir = str(Path.home()) + r'/coursematerial/GIS/ISPRS/PointNet'

In [5]:
import numpy as np

filename_xyz = 'Vaihingen3D_Training_PointNet_XYZ.npy'
filename_lab = 'Vaihingen3D_Training_PointNet_Labels.npy'

# Load the point clouds with the  x,y,z-values
X = np.load(os.path.join(root_dir, filename_xyz))
print(f'Successfully read "{filename_xyz}" feature matrix of shape {X.shape}')

# Load labels
y = np.load(os.path.join(root_dir, filename_lab))
print(f'Successfully read "{filename_lab}" label vector of shape  {y.shape}')

Successfully read "Vaihingen3D_Training_PointNet_XYZ.npy" feature matrix of shape (753876, 21, 3)
Successfully read "Vaihingen3D_Training_PointNet_Labels.npy" label vector of shape  (753876, 1)


In [6]:
# see, e.g., the point cloud of the third point (at index 2)
# (the elevation of the first point should be 0.02)
# this is the input for each training and prediction pass
X[2]

array([[ 0.  ,  0.  ,  0.02],
       [ 0.07,  0.15,  0.  ],
       [ 0.04, -0.25, -0.02],
       [-0.3 , -0.2 , -0.03],
       [ 0.36, -0.1 ,  0.  ],
       [-0.01, -0.38, -0.01],
       [ 0.01,  0.38, -0.02],
       [ 0.38,  0.27,  0.  ],
       [ 0.1 ,  0.55, -0.01],
       [ 0.35, -0.48,  0.02],
       [-0.32, -0.56, -0.03],
       [ 0.01, -0.66, -0.02],
       [ 0.39,  0.64, -0.01],
       [-0.02, -0.77,  0.02],
       [ 0.34, -0.85,  0.03],
       [ 0.4 ,  1.01, -0.01],
       [ 1.15,  0.02,  0.  ],
       [ 1.13, -0.35,  0.01],
       [ 1.18,  0.39, -0.01],
       [ 1.25, -0.14, -0.01],
       [ 1.1 , -0.71,  0.02]])

## Shuffle the training data

In [7]:
# Calculate a permutation of indices (shuffle indices)
p = np.random.default_rng().permutation(X.shape[0])

# Use the permuted indices to shuffle the array of features
X = np.take(X, p, axis=0)

# Use the (same) permuted indices to shuffle the array of labels
# (It is very important to use the same indices for both features and labels.)
y = np.take(y, p)
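One detail worth checking: `np.take` without an `axis` argument works on the flattened array, so a label array of shape (D, 1) comes back with shape (D,). The sketch below uses small made-up arrays (`X_toy` and `y_toy` are placeholders, not the exercise data) to show that the pairing of features and labels survives the shuffle, and how `axis=0` would preserve the label shape if that is needed.

```python
import numpy as np

# Toy data: the label of sample i simply repeats the first value of sample i
X_toy = np.arange(5 * 2 * 3, dtype=float).reshape(5, 2, 3)
y_toy = X_toy[:, 0, 0].reshape(5, 1).copy()

p = np.random.default_rng(42).permutation(X_toy.shape[0])

X_shuffled = np.take(X_toy, p, axis=0)
y_flat = np.take(y_toy, p)            # flattened: shape (5,)
y_kept = np.take(y_toy, p, axis=0)    # shape preserved: (5, 1)

print(y_flat.shape)                                   # (5,)
print(y_kept.shape)                                   # (5, 1)
print(np.array_equal(X_shuffled[:, 0, 0], y_flat))    # True
```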

## Construct the PointNet model

In [8]:
from tensorflow.keras import models
from tensorflow.keras import layers

Please construct the PointNet model in the following cell. The comments will guide you through the construction. If you want to do it without any guidance, then just delete the comments and start from scratch.

**Note:** You can implement the PointNet model either with 1D or 2D convolutional layers, but the explanations that follow are for using 2D convolutional layers. Your suggested task in the end will be to also implement PointNet with the other type of convolutional layers. So, you should do it in both ways to get more familiar with convolutional filters.

In order to use a 2D convolutional layer, the input tensor must have 4 dimensions: batch size (B), number of points (N), 1 (in our case, to make it 4-dimensional), and channels (C). With the dimensions arranged this way (BxNx1x3), the filter size is (1, 1). If you decide to exchange the 3rd and 4th dimension and end up with BxNx3x1, then your filter size is (1, 3). Remember that you do not provide the last dimension (the number of channels) in the convolutional filter. Either way, after the first layer the network will be the same, and the features derived through the convolutional layers will be in the last dimension. From then on, (1, 1) convolutional filters are used. The dimensions of the input need to be provided to the Input layer.
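The equivalence of the two layouts can be checked with a small NumPy stand-in for a convolution without padding. This is only a sketch: the array names are made up, and the weights are random rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)
B, N, C, F = 2, 21, 3, 64

points = rng.normal(size=(B, N, C))     # toy batch of neighborhoods
weights = rng.normal(size=(C, F))       # one weight per (coordinate, filter) pair
bias = rng.normal(size=F)

# Layout B x N x 1 x 3 with a (1, 1) kernel: the three coordinates are channels,
# so the filter is a weighted sum over the last axis.
x_a = points[:, :, None, :]             # (B, N, 1, 3)
out_a = x_a @ weights + bias            # (B, N, 1, F)

# Layout B x N x 3 x 1 with a (1, 3) kernel: the three coordinates lie along the
# third axis, and the kernel of shape (1, 3, 1, F) spans all of them at once.
x_b = points[:, :, :, None]             # (B, N, 3, 1)
k_b = weights[None, :, None, :]         # (1, 3, 1, F)
out_b = np.einsum('bnjc,jcf->bnf', x_b, k_b[0])[:, :, None, :] + bias

print(out_a.shape, out_b.shape)         # (2, 21, 1, 64) (2, 21, 1, 64)
print(np.allclose(out_a, out_b))        # True
```

In both cases every point is transformed by the same weights, independently of the other 20 points.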

The implementation of the PointNet model follows the architecture presented on the lecture slides. The layers of the MLPs are each composed of a 2D convolutional layer, followed by batch normalization and a ReLU activation layer. After 5 layers for the first MLP, there is a max pooling layer, and another (second) MLP with 3 layers. In between the 2nd and the 3rd layer (of the second MLP), a dropout layer with a dropout rate of 0.3 can help the training process. The last layer of the second MLP has no batch normalization and no ReLU activation; instead, a softmax activation layer is used to generate the 9 class scores.

Either before or after the softmax activation layer, you need to reduce the dimensions of the tensor from Bx1x1x9 to Bx1x9 by reshaping it to (1,9). Otherwise, the loss function will not work properly.
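Why the extra dimension matters can be illustrated with plain NumPy broadcasting. This is not the Keras loss or metric code itself, only a sketch of the kind of silent shape mismatch that can occur when predictions of shape Bx1x1x9 meet labels of shape Bx1; all arrays are made up.

```python
import numpy as np

B, num_classes = 4, 9
rng = np.random.default_rng(0)

scores_4d = rng.random((B, 1, 1, num_classes))       # stand-in for the network output
labels = rng.integers(0, num_classes, size=(B, 1))   # stand-in for the label array

# Without the reshape: predicted classes have shape (B, 1, 1)
pred_4d = scores_4d.argmax(axis=-1)
mismatch = (pred_4d == labels)       # broadcasts every prediction against every label
print(mismatch.shape)                # (4, 4, 1)

# With the reshape to (1, 9): predicted classes have shape (B, 1)
scores_3d = scores_4d.reshape(B, 1, num_classes)
pred_3d = scores_3d.argmax(axis=-1)
match = (pred_3d == labels)          # one comparison per sample
print(match.shape)                   # (4, 1)
```

An average over the (4, 4, 1) array would compare each prediction with the labels of all samples in the batch, which is not a meaningful accuracy.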

I suggest you use the functional API to define the neural network model, as is already given in the following cell. Then, you can add further information from the point cloud, like the intensity, as an additional input. Your (optional) task later on will be to inject intensity, number of returns, and return number, which are also provided by the ISPRS data for the point cloud, into the network after the feature extraction (after max pooling). In the functional model, you need to provide the sequence of layers by defining the output of the previous layer to be the input of the current layer.

If you have difficulties with the functional API, then you can also use the sequential API. But we have not looked into whether further information can then be injected into the model as another input layer.

In [15]:
# input layer that takes input of shape batch size x number of points x 1 x 3 channels (B, N, 1, 3)
input_xyz = keras.layers.Input(shape=(21, 1, 3), dtype='float32', name='Input_xyz')

# first layer of first MLP with 64 filters
conv1 = keras.layers.Conv2D(filters=64, kernel_size=(1, 1), name ='1_Conv2D')(input_xyz)
bn1   = keras.layers.BatchNormalization(name='1_BN')(conv1)
relu1 = keras.layers.Activation('relu', name='1_ReLU')(bn1)

# second layer of first MLP with 64 filters
conv2 = keras.layers.Conv2D(filters=64, kernel_size=(1,1), name="2_Conv2D")(relu1)
bn2   = keras.layers.BatchNormalization(name='2_BN')(conv2)
relu2 = keras.layers.Activation('relu', name='2_ReLU')(bn2)

# third layer of first MLP with 64 filters
conv3 = keras.layers.Conv2D(filters=64, kernel_size=(1,1), name="3_Conv2D")(relu2)
bn3   = keras.layers.BatchNormalization(name='3_BN')(conv3)
relu3 = keras.layers.Activation('relu', name='3_ReLU')(bn3)

# fourth layer of first MLP with 128 filters
conv4 = keras.layers.Conv2D(filters=128, kernel_size=(1,1), name="4_Conv2D")(relu3)
bn4   = keras.layers.BatchNormalization(name='4_BN')(conv4)
relu4 = keras.layers.Activation('relu', name='4_ReLU')(bn4)

# fifth layer of first MLP with 1024 filters
conv5 = keras.layers.Conv2D(filters=1024, kernel_size=(1,1), name="5_Conv2D")(relu4)
bn5   = keras.layers.BatchNormalization(name='5_BN')(conv5)
relu5 = keras.layers.Activation('relu', name='5_ReLU')(bn5)

# max pooling, the pool size is (21, 1)
maxpooling = keras.layers.MaxPooling2D(pool_size=(21,1), name="MaxPooling")(relu5)

# first layer of the second MLP with 512 filters
# (note: the second MLP could also be implemented with dense layers, but then the 
#        tensor needs to be reshaped to get rid of the extra dimension using (1, 1024).)
conv6 = keras.layers.Conv2D(filters=512, kernel_size=(1,1), name="6_Conv2D")(maxpooling)
bn6   = keras.layers.BatchNormalization(name='6_BN')(conv6)
relu6 = keras.layers.Activation('relu', name='6_ReLU')(bn6)

# second layer of the second MLP with 256 filters
conv7 = keras.layers.Conv2D(filters=256, kernel_size=(1,1), name="7_Conv2D")(relu6)
bn7   = keras.layers.BatchNormalization(name='7_BN')(conv7)
relu7 = keras.layers.Activation('relu', name='7_ReLU')(bn7)

# insert a dropout layer with rate 0.3
dropout = keras.layers.Dropout(0.3)(relu7)

# third layer of the second MLP with 9 filters (for classification scores)
# WITHOUT batch normalization and without activation function
conv8 = keras.layers.Conv2D(filters=9, kernel_size=(1,1), name="8_Conv2D")(dropout)

# the tensor needs to be reshaped to (1, 9) to get rid of extra dimension
# (If you decided to implement the second MLP with dense layers, then you do not
#  need to perform a reshape here as you already did earlier.)
#...

# softmax activation layer
softmax = keras.layers.Softmax()(conv8)

# Functional model with defined inputs and outputs
model = keras.Model(inputs=[input_xyz], outputs=[softmax], name='PointNet')

model.summary()

Model: "PointNet"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
Input_xyz (InputLayer)       [(None, 21, 1, 3)]        0         
_________________________________________________________________
1_Conv2D (Conv2D)            (None, 21, 1, 64)         256       
_________________________________________________________________
1_BN (BatchNormalization)    (None, 21, 1, 64)         256       
_________________________________________________________________
1_ReLU (Activation)          (None, 21, 1, 64)         0         
_________________________________________________________________
2_Conv2D (Conv2D)            (None, 21, 1, 64)         4160      
_________________________________________________________________
2_BN (BatchNormalization)    (None, 21, 1, 64)         256       
_________________________________________________________________
2_ReLU (Activation)          (None, 21, 1, 64)         0  
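The parameter counts in the summary can be verified by hand. A convolution has one weight per kernel position, input channel and filter, plus one bias per filter. Batch normalization keeps four values per channel (scale, offset, moving mean, moving variance). The helper names below are made up for this check.

```python
def conv_params(in_channels, filters, kernel_size=(1, 1)):
    """Number of weights and biases of a 2D convolution."""
    kh, kw = kernel_size
    return kh * kw * in_channels * filters + filters

def bn_params(channels):
    """Number of values stored by batch normalization (trainable and non-trainable)."""
    return 4 * channels

# Channels before and after each of the 8 convolutional layers of the model above
layer_sizes = [3, 64, 64, 64, 128, 1024, 512, 256, 9]
conv_counts = [conv_params(c_in, c_out)
               for c_in, c_out in zip(layer_sizes[:-1], layer_sizes[1:])]

for i, count in enumerate(conv_counts, start=1):
    print(f'{i}_Conv2D: {count}')
print(f'1_BN: {bn_params(64)}')
```

The first two convolution counts (256 and 4160) and the batch normalization count (256) agree with the values listed in the summary.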

In [16]:
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='sgd', 
              metrics=['accuracy'])

## Train the model

The dataset is currently in the shape (753876, 21, 3). But the first dimension of the training data is the number of training instances (753876) and not a dimension that is used for training. The network will randomly use one (or a whole batch) of these 753876 training instances and provide it to the network. Then, the data instance is of shape (21,3). But the network expects a tensor of shape (21,1,3). (The first dimension is the batch size, and we do not need to provide that ourselves.) Therefore, we need to expand the dimensions of the tensor in the second to last (-2) dimension. (We do not actually change the data itself, we just change the definition of the tensor. No data is added and no extra memory is required.)

Use about 20 to 40 epochs for training on the given dataset.
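That no data is copied on the NumPy side can be checked on a small stand-in array (`X_toy` is made up; what Keras does internally when it assembles batches is a separate matter):

```python
import numpy as np

X_toy = np.zeros((1000, 21, 3), dtype=np.float32)
X_expanded = np.expand_dims(X_toy, axis=-2)

print(X_expanded.shape)                       # (1000, 21, 1, 3)
print(np.shares_memory(X_toy, X_expanded))    # True

# Writing through the expanded view changes the original array as well
X_expanded[0, 0, 0, 0] = 1.0
print(X_toy[0, 0, 0])                         # 1.0
```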

In [17]:
print(f'Shape before expand: {X.shape}\nShape after expand:  {np.expand_dims(X, axis=-2).shape}')

Shape before expand: (753876, 21, 3)
Shape after expand:  (753876, 21, 1, 3)


In [18]:
BATCH_SIZE = 64  # you can increase this for faster training

# train directly on numpy array
history = model.fit(np.expand_dims(X, axis=-2), y, 
                    batch_size=BATCH_SIZE, 
                    epochs=20, 
                    validation_split=0.15)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
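According to the Keras documentation, `validation_split` takes the validation samples from the end of the provided arrays, before any per-epoch shuffling, which is one reason why the data was shuffled beforehand. The resulting sizes can be estimated as below. The exact rounding that Keras applies is an assumption here (the training size is rounded down).

```python
import math

n_samples = 753876
validation_split = 0.15
batch_size = 64

# Assumed rounding: training part rounded down, the rest is used for validation
n_train = math.floor(n_samples * (1.0 - validation_split))
n_val = n_samples - n_train
steps_per_epoch = math.ceil(n_train / batch_size)

print(f'training samples:   {n_train}')
print(f'validation samples: {n_val}')
print(f'steps per epoch:    {steps_per_epoch}')
```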


## Testing on Vaihingen evaluation data

In this exercise, we use the evaluation part of the ISPRS dataset for real testing, so that we get some realistic numbers for quality metrics.

In [19]:
filename_xyz = 'Vaihingen3D_Evaluation_PointNet_XYZ.npy'
filename_lab = 'Vaihingen3D_Evaluation_PointNet_Labels.npy'

# Load the point clouds with the  x,y,z-values
X_test = np.load(os.path.join(root_dir, filename_xyz))
print(f'Successfully read "{filename_xyz}" feature matrix of shape {X_test.shape}')

# Load labels
y_test = np.load(os.path.join(root_dir, filename_lab))
print(f'Successfully read "{filename_lab}" label vector of shape  {y_test.shape}')

Successfully read "Vaihingen3D_Evaluation_PointNet_XYZ.npy" feature matrix of shape (411722, 21, 3)
Successfully read "Vaihingen3D_Evaluation_PointNet_Labels.npy" label vector of shape  (411722, 1)


The accuracy on the test dataset should be around 68%. This low accuracy is to be expected and should be approximately the same as with the hand-crafted features. However, the network now extracts features by itself, and we provided less information to it than in the last exercise.

In [20]:
model.evaluate(np.expand_dims(X_test, axis=-2), y_test, 
               batch_size=BATCH_SIZE, 
               verbose=1)



[0.8911899328231812, 0.40693071484565735]

In the following, some typical evaluation reports are given.

**confusion matrix**

In [27]:
from sklearn.metrics import confusion_matrix
import pandas as pd

class_names = ['Powerline', 'Low vegetation', 'Impervious surfaces', 'Car', 'Fence/Hedge', 'Roof', 'Facade', 'Shrub', 'Tree']

y_test_pred = np.argmax(model.predict(np.expand_dims(X_test, axis=-2)), axis=-1)[:,:,0]
cm = confusion_matrix(y_test, y_test_pred)

df = pd.DataFrame(data=cm, columns=class_names, index=class_names)
df

(411722, 1)


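| | Powerline | Low vegetation | Impervious surfaces | Car | Fence/Hedge | Roof | Facade | Shrub | Tree |
|---|---|---|---|---|---|---|---|---|---|
| Powerline | 100 | 1 | 2 | 0 | 0 | 98 | 17 | 6 | 376 |
| Low vegetation | 0 | 38191 | 47279 | 214 | 219 | 838 | 270 | 8585 | 3094 |
| Impervious surfaces | 0 | 8662 | 92788 | 14 | 8 | 120 | 43 | 291 | 60 |
| Car | 0 | 479 | 31 | 969 | 205 | 198 | 8 | 1691 | 127 |
| Fence/Hedge | 0 | 886 | 79 | 67 | 741 | 533 | 199 | 3610 | 1307 |
| Roof | 3 | 701 | 651 | 45 | 389 | 85609 | 291 | 2065 | 19294 |
| Facade | 7 | 742 | 108 | 24 | 31 | 693 | 4162 | 1181 | 4276 |
| Shrub | 1 | 3021 | 568 | 187 | 289 | 1629 | 254 | 12296 | 6573 |
| Tree | 8 | 890 | 80 | 44 | 94 | 5291 | 535 | 5191 | 42093 |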
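The overall accuracy and the per-class scores can be derived directly from the confusion matrix. The sketch below copies the numbers shown above into a NumPy array. In scikit-learn's convention, rows are the true classes and columns are the predicted classes.

```python
import numpy as np

class_names = ['Powerline', 'Low vegetation', 'Impervious surfaces', 'Car',
               'Fence/Hedge', 'Roof', 'Facade', 'Shrub', 'Tree']

cm = np.array([
    [100,     1,     2,   0,   0,    98,   17,     6,   376],
    [  0, 38191, 47279, 214, 219,   838,  270,  8585,  3094],
    [  0,  8662, 92788,  14,   8,   120,   43,   291,    60],
    [  0,   479,    31, 969, 205,   198,    8,  1691,   127],
    [  0,   886,    79,  67, 741,   533,  199,  3610,  1307],
    [  3,   701,   651,  45, 389, 85609,  291,  2065, 19294],
    [  7,   742,   108,  24,  31,   693, 4162,  1181,  4276],
    [  1,  3021,   568, 187, 289,  1629,  254, 12296,  6573],
    [  8,   890,    80,  44,  94,  5291,  535,  5191, 42093]])

# Correctly classified points are on the diagonal
accuracy = np.trace(cm) / cm.sum()
recall = np.diag(cm) / cm.sum(axis=1)       # per true class (row)
precision = np.diag(cm) / cm.sum(axis=0)    # per predicted class (column)

print(f'Overall accuracy: {accuracy:.3f}')
for name, p_c, r_c in zip(class_names, precision, recall):
    print(f'{name:20s} precision {p_c:.2f}  recall {r_c:.2f}')
```

For this matrix, the overall accuracy comes out at roughly 67%, close to the value expected for this exercise.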


**precision, recall, F1-score**

In [None]:
from sklearn.metrics import precision_score, recall_score, f1_score

precision = precision_score(y_test, y_test_pred, average='weighted')
recall = recall_score(y_test, y_test_pred, average='weighted')
f1 = f1_score(y_test, y_test_pred, average='weighted')

print(f'Precision: {precision:.2f}')
print(f'Recall   : {recall:.2f}')
print(f'F1       : {f1:.2f}')

**classification report**

In [None]:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_test_pred, target_names=class_names))

## Save colorized point cloud of testing data

For outputting the point cloud with the predicted class labels, we need the original 3D point cloud with the original x,y,z-coordinates. The points are in the same order as in the NumPy arrays that provide the x,y,z-coordinates of the small point clouds (with 21 points each) and the labels. (Remember that the points in the small point clouds are all centered on the first point, so that we do not have the original coordinates anymore.)

In [None]:
filename_pc = 'Vaihingen3D_Evaluation.pts'

xyz_df = pd.read_csv(os.path.join(root_dir, filename_pc), 
                     sep=" ", 
                     names=['x', 'y', 'z', 'intensity', 'return_number', 'number_of_returns', 'label'],
                     usecols=['x', 'y', 'z'],
                     header=None)

print(f'Successfully read "{filename_pc}" of shape {xyz_df.shape}')

In [None]:
save_colorized_point_cloud(xyz_df.to_numpy(), y_test_pred, 'Vaihingen3D_Evaluation_Results.csv')

## Conclusion

In this exercise, we used PointNet as a feature encoder for points with their 20-point neighborhood for the classification of 3D point clouds. The feature encoding is the part up to and including max pooling. What follows is a typical classification network with fully connected layers that are implemented with convolutional filters, followed by softmax. (A convolutional filter that has the same size as the input has the same effect as a fully connected layer. Only the interpretation of the data is different with regard to tensor dimensions.)

As seen by the results, the quality is approximately the same as with hand-crafted features.

## Tasks

Below are a few ideas for tasks that could still be done for PointNet in order to practice more.

## Task 1 - Predictions on large dataset

Use the large dataset for prediction and outputting the colorized point cloud. The files are called "Vaihingen3D_Large_PointNet_XYZ.npy" for the small input point clouds. And the points with the original x,y,z-coordinates are stored in "Vaihingen3D_Large_Points.csv.gz". As there are no labels in this large dataset, it can only be used for prediction and not for training.

## Task 2 - 1D Convolutions

Implement the PointNet model also with 1D convolutions. Be careful with the tensor dimensions (input data, input and filter sizes, reshape?, etc.). When working with 1D convolutions, your whole model needs to process data with 3 dimensions: batch size, number of points, channels. (You do not need to provide batch sizes, so the definition of layers is without the batch size dimension.)

## Task 3 - Intensity, return number, number of returns

You can include the 3 other columns of the dataset, 'intensity', 'return_number', and 'number_of_returns', as further (second) input to the network and inject this data into the network after the max pooling layer. The following cells should give you some information on how to implement this.

Explanation on the sensor features:
- Intensity is the intensity with which the laser beam was reflected. Flat, impervious surfaces typically have a higher intensity than vegetation.
- The laser beam is sometimes reflected several times, e.g. going through a tree, the laser beam is reflected at the branches, and several returns are registered and digitized. Number of returns is the total number of these reflections.
- The laser beam is typically not going straight downwards, so there is a tilt in its direction. The different returns (of the same beam) at different elevation levels, lead to 3D points with different x,y-coordinates. Therefore, all points have individual coordinates and the return number denotes the "index" of the return it was derived from.

Read the point cloud data with intensity, return number, and number of returns. Do not forget to also shuffle it with the same permutation as the xyz values and labels. But make absolutely sure that the shape of the tensor does not change; check with the .shape attribute.

In [None]:
import pandas as pd

pc_df = pd.read_csv(os.path.join(root_dir, 'Vaihingen3D_Training.pts'), 
                     sep=" ", 
                     names=['x', 'y', 'z', 'intensity', 'return_number', 'number_of_returns', 'label'],
                     header=None)

intensity = pc_df[['intensity', 'return_number', 'number_of_returns']].to_numpy(dtype=float)
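The shuffling step mentioned above could look like the following sketch. It uses made-up toy arrays (`X_toy`, `sensor_toy`) and assumes that the permutation `p` from the shuffling cell is still available in the notebook, so that exactly the same index order is applied to the sensor values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
X_toy = rng.random((n, 21, 3))
sensor_toy = rng.random((n, 3))   # stand-in for intensity, return_number, number_of_returns

# In the notebook, reuse the permutation p that was applied to X and y
p = rng.permutation(n)

X_shuffled = np.take(X_toy, p, axis=0)
sensor_shuffled = np.take(sensor_toy, p, axis=0)   # axis=0 keeps the shape (n, 3)

print(sensor_toy.shape, sensor_shuffled.shape)     # (6, 3) (6, 3)

# Row i of the shuffled sensor array still belongs to row i of the shuffled features
for i, original_index in enumerate(p):
    assert np.array_equal(X_shuffled[i], X_toy[original_index])
    assert np.array_equal(sensor_shuffled[i], sensor_toy[original_index])
```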

Construct a second input layer (right after the first input layer) with input dimensions (3). (Depending on your implementation of PointNet, you will need to reshape the tensor within the network before the concatenation with the extracted features.) In the summary, your input layer will only appear if it is actually used, and at the position where it is used. So please be aware that you might not find it right away, or that it might appear at a position where you did not expect it.

In [None]:
input_intensity = keras.layers.Input(shape=(3,), dtype='float32', name='Input_intensity')

Insert a concatenation layer to concatenate the (1024) channels of the extracted features with the (3) channels of the intensity input. But before that, you have to adjust the dimensions of the second input, so that it fits the global features.

If your output dimensions from the max pooling layer are (None, 1, 1, 1024) and the output dimensions of your intensity input are (None, 3), then reshape your intensity to (None, 1, 1, 3).

Then concatenate the output of the pooling layer with the reshape layer of the intensity. Do not forget to change the next layer to take the new concat output as input.

In [None]:
reshape_intensity = keras.layers.Reshape((1, 1, 3), name='ReshapeIntensity')(input_intensity)

# 'maxpooling' is the output of the max pooling layer defined in the model above
concat = keras.layers.Concatenate(name='Concatenate')([maxpooling, reshape_intensity])
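The effect of the reshape and the concatenation on the tensor shapes can be followed with NumPy stand-ins (all arrays are made up; concatenation along the last axis corresponds to the default axis of the Keras Concatenate layer):

```python
import numpy as np

B = 4
global_features = np.zeros((B, 1, 1, 1024), dtype=np.float32)   # stand-in for the max pooling output
sensor = np.ones((B, 3), dtype=np.float32)                      # stand-in for the second input

sensor_reshaped = sensor.reshape(B, 1, 1, 3)
combined = np.concatenate([global_features, sensor_reshaped], axis=-1)
print(combined.shape)                     # (4, 1, 1, 1027)

# The next (1, 1) convolution with 512 filters now sees 1027 input channels
next_conv_params = 1027 * 512 + 512
print(next_conv_params)                   # 526336
```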

The model must be defined to have two input layers. Just exchange the respective line in the model.

In [None]:
model = keras.Model(inputs=[input_xyz, input_intensity], outputs=[softmax])

When fitting the model, you have to provide a tuple (a pair) of the inputs, using parentheses (for the tuple) around the two input NumPy arrays. (You could also expand the dimensions of the intensity input before passing it to the fit() function instead of reshaping it in the model. The effect, however, is the same.)

In [None]:
history = model.fit((np.expand_dims(X, axis=-2), intensity), y, ...

Note that you also need to provide a tuple as input for the predictions you make. The evaluation dataset for testing is called "Vaihingen3D_Evaluation.pts", from which you need to extract the intensity, etc. for the predictions.

Your accuracy on the evaluation dataset should rise by about 8 percentage points to roughly 76%. As you can see, this kind of sensor information can bring quite an improvement. The added value will, however, diminish once we continue with multi-scale feature extraction.

But also output the colorized point cloud and check what really changed. You can also take the labeled file "Vaihingen3D_Training.pts", extract the x,y,z-coordinates, and the labels as two numpy arrays, and save it with the helper function as a colorized point cloud as a referenc