# Exercise 10 - Feature-Based Classification of 3D Point Clouds

In this exercise, we use a multi-layer perceptron (fully connected feedforward neural network) to classify individual points from a 3D point cloud by a set of 12 pre-computed features. The inputs of the network are the 12 feature values of a point, and it predicts the label for this point only.

The dataset is the one used for the 3D semantic labeling contest of the ISPRS. You can find more information about it on the ISPRS website (https://www2.isprs.org/commissions/comm2/wg4/benchmark/3d-semantic-labeling/). It was created for aerial 3D point cloud classification before the era of Deep Learning and is really too small for Deep Learning to learn from sufficiently. But it is open and should suffice to learn about the concepts and methods. With the approach presented here, you should reach accuracies typical of pre-Deep Learning machine learning methods.

**Learning objectives:**
- Review fully connected neural networks
- Perform and understand a feature-based classification using neural networks

**Please note:** To get more out of the dataset and this methodology, we will cheat a little (ok, not a little, but a lot) by using both the training and the testing (validation) part of the benchmark dataset, and by using only random points within the dataset to validate our training process (validating in the sense of making sure it actually works). Normally, we should have separate regions for training, validation, and testing. But the dataset does not provide enough variety for this kind of Deep Learning approach. In future exercises, when we make full use of Deep Learning as a feature encoder, we will use the dataset as intended and get even better results than presented here.

**Your tasks:**
- Go through and understand the implementation as well as the underlying methodology
- Define and implement the neural network model
- Experiment with hyperparameters
- Compare the results of the small training dataset with the results of the large dataset

(The implementation of how the points are colorized in the respective helper function uses Numpy vector operations. This might not be very obvious to understand without any further experience or explanation. I suggest you skip over this function and just accept that it colorizes and saves point clouds. The neural network part is more important.)

## Setup TensorFlow

In [1]:
# Change X to the GPU number you want to use,
# otherwise you will get a Python error
# e.g. USE_GPU = 4
USE_GPU = 4

In [2]:
# Import TensorFlow 
import tensorflow as tf

# Print the installed TensorFlow version
print(f'TensorFlow version: {tf.__version__}\n')

# Get all GPU devices on this server
gpu_devices = tf.config.list_physical_devices('GPU')

# Print the name and the type of all GPU devices
print('Available GPU Devices:')
for gpu in gpu_devices:
    print(' ', gpu.name, gpu.device_type)
    
# Set only the GPU specified as USE_GPU to be visible
tf.config.set_visible_devices(gpu_devices[USE_GPU], 'GPU')

# Get all visible GPU  devices on this server
visible_devices = tf.config.get_visible_devices('GPU')

# Print the name and the type of all visible GPU devices
print('\nVisible GPU Devices:')
for gpu in visible_devices:
    print(' ', gpu.name, gpu.device_type)
    
# Set the visible device(s) to not allocate all available memory at once,
# but rather let the memory grow whenever needed
for gpu in visible_devices:
    tf.config.experimental.set_memory_growth(gpu, True)
    
# Import Keras
from tensorflow import keras

# Print the installed Keras version
print(f'\nKeras version: {keras.__version__}\n')

TensorFlow version: 2.3.0

Available GPU Devices:
  /physical_device:GPU:0 GPU
  /physical_device:GPU:1 GPU
  /physical_device:GPU:2 GPU
  /physical_device:GPU:3 GPU
  /physical_device:GPU:4 GPU
  /physical_device:GPU:5 GPU
  /physical_device:GPU:6 GPU
  /physical_device:GPU:7 GPU

Visible GPU Devices:
  /physical_device:GPU:4 GPU

Keras version: 2.4.0



Setup Numpy and Pandas.

In [3]:
import numpy as np
import pandas as pd

print(f'Numpy version:  {np.__version__}\nPandas version: {pd.__version__}\n')

Numpy version:  1.18.5
Pandas version: 1.0.5



## Helper functions

The helper function **load_csv_file()** reads a file in csv format (the values in these files are separated by spaces rather than commas), outputs its shape and columns, and returns the data as a Numpy array.

In [4]:
from pathlib import Path
import os

def load_csv_file(filename):

    root_dir = str(Path.home()) + r'/coursematerial/GIS/ISPRS/PointsWithFeatures'

    df = pd.read_csv(os.path.join(root_dir, filename), sep=" ")
    
    print(f'Loaded "{filename}"\n  Shape: {df.shape}')

    print('  Columns:', ', '.join([c for c in df.columns]), '\n')
    
    return df.to_numpy()

The helper function **save_colorized_point_cloud()** assigns each point of the Numpy array xyz a color according to the labels y, and saves the colorized points as a csv file with the given filename. The color coding is the same as used for the ISPRS benchmark.

You can then download this file to your computer and visualize it with the open source software CloudCompare (https://www.danielgm.net/cc/). CloudCompare is probably the most widely used open source point cloud visualization software, especially in the field of geodata. When you open the file in CloudCompare, make sure you change the file filter to either all or csv. (Sorry about making you install software on your own computer. We normally have this software on the computers in the computer pool. Depending on your hardware, the visualization of the large point cloud might be slow.) For the two dialogs that pop up, you can press "Yes to All". CloudCompare then translates your data to the estimated centroid of the dataset and interprets the columns of the data to determine which column is x, y, z, red, green, and blue.

If you click your dataset in the list view of CloudCompare that is called 'DB Tree' and then move your mouse to the top-left corner of the 3D view, you can increase the point size, which helps when you zoom into the point clouds. You can also do this in the 'Properties' below 'DB Tree'. Try both ways and decide which you find more convenient.

In [5]:
def save_colorized_point_cloud(xyz, y, filename):

    color_map = np.array([
        [255, 255, 125],
        [  0, 255, 255],
        [255, 255, 255],
        [255, 255,   0],
        [  0, 255, 125],
        [  0,   0, 255],
        [  0, 125, 255],
        [125, 255,   0],
        [  0, 255,   0]])
    
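    # Use the labels directly as row indices into the color map.
    # (Indexing via np.unique(..., return_inverse=True) would assign
    # wrong colors whenever a class is missing from y.)
    labels = np.asarray(y).astype(int).ravel()

    colors = color_map[labels]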
    
    df = pd.DataFrame(xyz, columns=['x', 'y', 'z'])    

    df['red'] = pd.Series(data=colors[:,0], name='red')
    df['green'] = pd.Series(data=colors[:,1], name='green')
    df['blue'] = pd.Series(data=colors[:,2], name='blue')
    
    df.to_csv(filename, index=False, header=False)
    
    print(f'Saved "{filename}"')
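
The colorization step relies on NumPy integer-array indexing: indexing a color table with an array of class indices returns one color row per point. The following minimal sketch shows the idea with a hypothetical three-class palette (the names `palette` and `labels` are illustrative only and not part of the helper function).

```python
import numpy as np

# Hypothetical three-class palette (RGB), one row per class index
palette = np.array([
    [255,   0,   0],   # class 0
    [  0, 255,   0],   # class 1
    [  0,   0, 255]])  # class 2

# One class index per point
labels = np.array([2, 0, 0, 1])

# Indexing with an integer array picks one palette row per label
colors = palette[labels]

print(colors.shape)            # (4, 3): one RGB triple per point
print(colors[:, 0].tolist())   # red channel of every point: [0, 255, 255, 0]
```

The columns `colors[:, 0]`, `colors[:, 1]`, and `colors[:, 2]` are then the red, green, and blue values that are written to the csv file.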

## Load training data

Load the features and the labels as Numpy arrays from the files provided in the course material folder. The 12 features were selected mostly from those presented in the lecture and proved to be effective for point cloud classification on this particular data set. For other data sets, other features might work better and should be selected accordingly.

The only feature that is not mentioned in the lecture is 'ground_z', which is the height of the point above the digital elevation model. This is, of course, a very strong feature, and it helps to differentiate between impervious surfaces and roofs. The digital elevation model was generated from a prior classification that differentiates between ground and non-ground points. With this feature, we therefore already introduce some strong prior information. However, the generation of digital terrain models is a long-solved problem, and such data exists from many sources. (There was a bug in the data preparation, and the 'ground_z' values are negated. But this will not affect the classification in any way. The neural network does not know that we do not live in an upside-down world.)

The 2D geometric features from a discretized grid were generated with a bin size of 1m by 1m.

As k for the nearest neighbor search, a value of 21 was used.

In [6]:
# Numpy array with pre-calculated features
X = load_csv_file('Vaihingen3D_FEX12.csv.gz')

# Numpy array with labels
y = load_csv_file('Vaihingen3D_Labels.csv.gz')

Loaded "Vaihingen3D_FEX12.csv.gz"
  Shape: (1165598, 12)
  Columns: planarity, scattering, omnivariance, sum_eigenvalues, change_of_curvature, verticality, delta_z_knn, std_z_knn, delta_z, std_z, eigenvalue 3, ground_z 

Loaded "Vaihingen3D_Labels.csv.gz"
  Shape: (1165598, 1)
  Columns: label 



Define the class names of the labels, and determine and output the number of points per class. You will notice that some classes (e.g. powerline, car) are strongly underrepresented.

In [7]:
class_names = ['Powerline', 'Low vegetation', 'Impervious surfaces', 'Car', 'Fence/Hedge', 'Roof', 'Facade', 'Shrub', 'Tree']
u, counts = np.unique(y, return_counts=True)

for i in u:
    print(f'{class_names[i]:20}: {counts[i]}')

Powerline           : 1146
Low vegetation      : 279540
Impervious surfaces : 295709
Car                 : 8322
Fence/Hedge         : 19492
Roof                : 261093
Facade              : 38474
Shrub               : 72423
Tree                : 189399
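
To put the imbalance into numbers, the counts printed above can be turned into class shares. This is only a small illustrative sketch; the counts are copied by hand from the output, and the variable names are chosen for this example.

```python
import numpy as np

class_names = ['Powerline', 'Low vegetation', 'Impervious surfaces', 'Car',
               'Fence/Hedge', 'Roof', 'Facade', 'Shrub', 'Tree']

# Points per class, copied from the output above
counts = np.array([1146, 279540, 295709, 8322, 19492,
                   261093, 38474, 72423, 189399])

# Fraction of all points that belongs to each class
shares = counts / counts.sum()

for name, share in zip(class_names, shares):
    print(f'{name:20}: {100 * share:6.2f} %')

# Ratio between the largest and the smallest class
imbalance = counts.max() / counts.min()
print(f'Largest / smallest class: {imbalance:.0f}x')
```
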


We could normalize the features, but it has no noticeable effect here. You can try it out by uncommenting and executing the next cell.

In [None]:
#X = tf.keras.utils.normalize(X, axis=-1, order=2)
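
Note that `tf.keras.utils.normalize` with `axis=-1` rescales the feature vector of each point to unit L2 norm. A different and common alternative is to standardize each feature column to zero mean and unit variance. The following is a minimal sketch of that alternative on hypothetical toy data (`X_demo` is made up for illustration); whether it helps on this data set is something you would have to try.

```python
import numpy as np

# Hypothetical toy data: 3 points, 2 features with very different scales
X_demo = np.array([[1.0, 100.0],
                   [2.0, 300.0],
                   [3.0, 500.0]])

# Per-feature statistics (one value per column)
mean = X_demo.mean(axis=0)
std = X_demo.std(axis=0)

# Guard against constant features to avoid a division by zero
std[std == 0] = 1.0

# Standardize every column to zero mean and unit variance
X_std = (X_demo - mean) / std

print(X_std.mean(axis=0))
print(X_std.std(axis=0))
```

If you apply this to real data, the mean and standard deviation computed on the training features would also have to be reused for any other data that is passed to the model.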

## Shuffle the training data

The data set used in this exercise is a toy dataset and not sufficiently large for Deep Learning. We therefore cheat a little in what part of the data we use for training and validation, and we do not evaluate the performance on test data. Rather, we use as much data as possible for training and just enough for validation to observe how the training process is doing. In the end, we do predictions on a larger data set (where we have no labels) and do a visual evaluation (testing) instead.

In the following cell, the input point features and the labels are shuffled (using a permutation of indices). The reason is that the points of the input 3D point cloud are ordered according to how they were acquired by the laser scanner. Consecutive points are therefore close to each other. If you let TensorFlow reserve a certain percentage of this data as validation data, then it takes this validation part from the end of the data set. Unfortunately, this validation data is then a region of the data set that is not represented well in the other part, the training data. So, validation data and training data would not be much alike. This would make the evaluation scores rather low. Also, we would lose that underrepresented part of the data for training, which might lower the quality of the results for the large data set.

Therefore, another approach is taken. Since we classify each point individually, we just shuffle all the points and take a certain percentage of the shuffled points as validation data. These validation points are now better distributed in the data set. Of course, the validation points are also located close to the points that have been used for training. The evaluation is therefore not really fair and has little validity for unseen data. It just states how well the model would work on points from objects that have similar features to the ones in the training data. But as already stated, the data set is just not large enough for this task.

You can try not shuffling the data and check how this affects the training.

The data set is also a benchmark data set for (non Deep Learning) semantic segmentation. (However, the original training data and evaluation data are now concatenated into one large data set.) If you keep the sequential order, you could use 40% of the data for training, 20% for validation, and 40% for testing. That should approximately be the original setup, and you can see how this network performs on it.
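
A sequential 40/20/40 split could be sketched as follows. This is only an illustration under the assumption that the arrays are still in their original (unshuffled) order; the function name `sequential_split` and its parameters are hypothetical and not part of the exercise code.

```python
import numpy as np

def sequential_split(n_points, train=0.4, val=0.2):
    """Return index arrays for consecutive train/validation/test blocks."""
    n_train = int(n_points * train)
    n_val = int(n_points * val)
    idx = np.arange(n_points)
    # The remaining points after training and validation form the test block
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

# Number of points taken from the shape printed when loading the features
train_idx, val_idx, test_idx = sequential_split(1165598)

print(len(train_idx), len(val_idx), len(test_idx))
```

With unshuffled `X` and `y`, you would then train on `X[train_idx]`, pass `(X[val_idx], y[val_idx])` as `validation_data` to `fit()`, and keep `X[test_idx]` for a final evaluation.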

In [8]:
# Calculate a permutation of indices (shuffle indices)
p = np.random.default_rng().permutation(X.shape[0])

# Use the permuted indices to shuffle the array of features
X = np.take(X, p, axis=0)

# Use the (same) permuted indices to shuffle the array of labels
# (It is very important to use the same indices for both features and labels.)
y = np.take(y, p)
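
The following toy sketch illustrates why the same permutation must be used for features and labels, and how the original order can be restored later. All arrays here are hypothetical demo data (`X_demo`, `y_demo`, `p_demo`), and a fixed seed is used only to make the example repeatable.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Toy data: the first feature of each row encodes its label
X_demo = np.array([[0.0, 0.1],
                   [1.0, 1.1],
                   [2.0, 2.1],
                   [3.0, 3.1]])
y_demo = np.array([0, 1, 2, 3])

# One permutation of the row indices, applied to both arrays
p_demo = rng.permutation(X_demo.shape[0])
X_shuf = np.take(X_demo, p_demo, axis=0)
y_shuf = np.take(y_demo, p_demo)

# Every feature row still belongs to its label after shuffling
print(np.array_equal(X_shuf[:, 0].astype(int), y_shuf))

# argsort of the permutation gives the inverse permutation,
# which restores the original order
inverse = np.argsort(p_demo)
print(np.array_equal(y_shuf[inverse], y_demo))
```

This is also why the permutation `p` is kept: any other per-point array (such as the coordinates) has to be shuffled with the same indices to stay aligned.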

## Create the model

Define your classification model here (with the sequential API) as you have learned in previous exercises. Use only dense layers. Convolutions would not make sense on the features as they are not related in any way (spatially or otherwise). ReLU activation and He normal initialization should work fine (Xavier/Glorot initialization is a different scheme that you can also try), and you can experiment with other choices as well. Check whether batch normalization or Dropout makes any difference.

Most relevant will probably be the number of layers and the number of neurons per layer.

In [14]:
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(12,)),
    tf.keras.layers.Dense(128, activation='relu', kernel_initializer=keras.initializers.HeNormal()),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(256, activation='relu', kernel_initializer=keras.initializers.HeNormal()),
    tf.keras.layers.Dense(9, activation="softmax", kernel_initializer=keras.initializers.HeNormal())],
    name='point_cloud_classifier')

The summary() method outputs a summary of the model, including the name of each layer, its type, the output shape, and the number of parameters. The first dimension (row) of the output shape determines the batch size, where None means that the batch size can be anything.

In [15]:
model.summary()

Model: "point_cloud_classifier"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten_1 (Flatten)          (None, 12)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 128)               1664      
_________________________________________________________________
batch_normalization_1 (Batch (None, 128)               512       
_________________________________________________________________
dense_3 (Dense)              (None, 256)               33024     
_________________________________________________________________
dense_4 (Dense)              (None, 9)                 2313      
Total params: 37,513
Trainable params: 37,257
Non-trainable params: 256
_________________________________________________________________
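
The parameter counts in the summary can be reproduced by hand. A dense layer has one weight per input-output pair plus one bias per output, and a batch normalization layer has four values per feature (scale and offset, which are trained, plus moving mean and moving variance, which are not). The helper names below are made up for this sketch.

```python
def dense_params(n_in, n_out):
    # Weight matrix (n_in x n_out) plus one bias per output neuron
    return n_in * n_out + n_out

def batch_norm_params(n_features):
    # gamma and beta (trainable) + moving mean and variance (non-trainable)
    return 4 * n_features

layer_params = {
    'dense (12 -> 128)':   dense_params(12, 128),
    'batch_normalization': batch_norm_params(128),
    'dense (128 -> 256)':  dense_params(128, 256),
    'dense (256 -> 9)':    dense_params(256, 9),
}

for name, n in layer_params.items():
    print(f'{name:22}: {n}')

total = sum(layer_params.values())
non_trainable = 2 * 128  # moving mean and moving variance
print('Total        :', total)
print('Trainable    :', total - non_trainable)
print('Non-trainable:', non_trainable)
```
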


Compile the model with loss function, optimizer, and metrics to be calculated.

The sparse categorical crossentropy is used because each instance of the training data has a target class index (with a value between 0 and 8 for the 9 classes) as its label.
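
As a rough illustration of what this loss computes, the following NumPy sketch takes the negative logarithm of the probability that the model assigned to the true class of each sample. It is a simplified re-implementation for understanding only (for example, it omits the clipping of probabilities that Keras applies for numerical stability), and the example values are made up.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs):
    """Per-sample loss: -log(probability of the true class)."""
    # Pick, for every row, the probability at the true class index
    picked = probs[np.arange(len(y_true)), y_true]
    return -np.log(picked)

# Hypothetical softmax outputs for two samples and three classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])

# True class indices (not one-hot vectors, hence "sparse")
y_true = np.array([0, 1])

losses = sparse_categorical_crossentropy(y_true, probs)
print(losses)         # small loss for the confident correct prediction,
                      # large loss for the confident wrong one
print(losses.mean())  # the value reported as "loss" is the mean over the batch
```
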

In [16]:
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='sgd', 
              metrics=['accuracy'])

## Train the model

Train the model with the fit method, providing the features X as the training data and the labels y. The fraction of the data to be used for validation is defined in the validation_split argument. Training for 5 epochs should give quite good results that might improve with further epochs. Just note that since we have no truly separate validation and testing data, you will not notice any overfitting. You might only see it visually in the predictions for the larger data set, when the quality of the class predictions decreases for the whole area but is very good for the training area (which is included in the larger dataset).

The batch size has a strong influence on training time and model convergence. (When you do not provide any batch size yourself, a batch size of 32 is the default. This will slow down your training considerably. Try it out!)
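
A quick back-of-the-envelope calculation shows why the batch size matters for the training time: it determines the number of weight updates (steps) per epoch. The sketch below assumes that 80% of the points (rounded down) remain for training after the validation split; the variable names are chosen for this example.

```python
import math

n_points = 1165598        # number of points in the training file
validation_split = 0.20

# Points that remain for training after the validation split
n_train = math.floor(n_points * (1 - validation_split))

# Number of steps per epoch for the default and the chosen batch size
steps_per_epoch = {}
for batch_size in (32, 128):
    steps_per_epoch[batch_size] = math.ceil(n_train / batch_size)
    print(f'batch_size={batch_size:4}: {steps_per_epoch[batch_size]} steps per epoch')
```

With a batch size of 128, each epoch needs roughly a quarter of the steps that the default of 32 would need.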

In [17]:
# Provide the training data as a Numpy array X.
history = model.fit(x=X, y=y, batch_size=128, epochs=5, validation_split=0.20)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


Your validation accuracy should be around 72%, maybe even a little higher. But this is the best that can be achieved in a feature-based approach on this particular dataset. (Beware of overfitting.)

If you did not shuffle the data, then your accuracy should be a little lower. But it also depends on the percentage of the validation split in the fit method. If only 10% is used, then the validation data probably contains mostly ground points and vegetation, which are quite easy to differentiate. (This is because without shuffling, the split takes the last 10% of the points, which is the border of the region without any buildings.) In that case, you should get almost the same accuracy. But if you use 20% as validation data, then it includes more buildings that have different properties than those in the other 80%. Then your accuracy drops to maybe 67%.

With no test data, we unfortunately cannot determine any performance measures, so we skip this part and note that our validation accuracy is not really a fair measure.

## Predictions for training data

Again, this is not a fair evaluation of the model. But the dataset is small enough that it can be visualized even on a slow notebook. The following cell predicts the labels for the training data, uses the predicted labels to colorize the point cloud, and saves the output as a csv file. You can then visualize it in CloudCompare, as noted above.

Because the training data contains no coordinates, we also need to load the original points.

In [18]:
y_pred = np.argmax(model.predict(X, verbose=1), axis=-1)

xyz = load_csv_file('Vaihingen3D_Points.csv.gz')

# make sure to shuffle also the xyz in the same way
xyz = np.take(xyz, p, axis=0)

save_colorized_point_cloud(xyz, y_pred, 'Vaihingen3D_Results.csv')

Loaded "Vaihingen3D_Points.csv.gz"
  Shape: (1165598, 3)
  Columns: x, y, z 

Saved "Vaihingen3D_Results.csv"
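
The step from the model output to class labels is the `np.argmax` call: the softmax layer returns one probability per class for each point, and the index of the largest probability is taken as the predicted class. A minimal sketch with made-up probabilities for three classes:

```python
import numpy as np

# Hypothetical softmax outputs: 2 points, 3 classes (each row sums to 1)
probs = np.array([[0.05, 0.80, 0.15],
                  [0.60, 0.30, 0.10]])

# Index of the most probable class for every point
pred = np.argmax(probs, axis=-1)

# The probability of that class can serve as a rough confidence value
confidence = probs.max(axis=-1)

print(pred.tolist())        # [1, 0]
print(confidence.tolist())  # [0.8, 0.6]
```
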


The calculated evaluation values that follow are in no way legitimate (as we did the predictions on the training data) and should never be performed like this in practice.

It is just to show the implementation of how to get the confusion matrix, the precision, recall, and f1 score for the predictions.

In [19]:
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y, y_pred)

df = pd.DataFrame(data=cm, columns=class_names, index=class_names)

df

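| True / Predicted | Powerline | Low vegetation | Impervious surfaces | Car | Fence/Hedge | Roof | Facade | Shrub | Tree |
|---|---|---|---|---|---|---|---|---|---|
| Powerline | 199 | 6 | 33 | 0 | 0 | 79 | 33 | 10 | 786 |
| Low vegetation | 0 | 142112 | 118376 | 230 | 377 | 3069 | 646 | 12072 | 2658 |
| Impervious surfaces | 0 | 46239 | 247994 | 68 | 84 | 229 | 159 | 847 | 89 |
| Car | 0 | 1758 | 18 | 816 | 416 | 426 | 18 | 4804 | 66 |
| Fence/Hedge | 0 | 2930 | 39 | 367 | 1409 | 1439 | 151 | 11490 | 1667 |
| Roof | 5 | 1128 | 958 | 38 | 241 | 234085 | 1247 | 3453 | 19938 |
| Facade | 45 | 2079 | 153 | 32 | 27 | 4115 | 18113 | 2741 | 11169 |
| Shrub | 10 | 13158 | 919 | 384 | 645 | 2398 | 1077 | 40963 | 12869 |
| Tree | 29 | 3482 | 167 | 82 | 193 | 7749 | 2184 | 16536 | 158977 |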


In [20]:
from sklearn.metrics import precision_score, recall_score, f1_score

precision = precision_score(y, y_pred, average='weighted')
recall = recall_score(y, y_pred, average='weighted')
f1 = f1_score(y, y_pred, average='weighted')

print(f'Precision: {precision:.2f}')
print(f'Recall   : {recall:.2f}')
print(f'F1       : {f1:.2f}')

Precision: 0.72
Recall   : 0.72
F1       : 0.72
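
The weighted averages hide large differences between the classes. As a sketch of how per-class values follow from the confusion matrix (rows are the true classes, columns the predicted classes), the matrix shown above is re-entered by hand below; the variable names are chosen for this example.

```python
import numpy as np

class_names = ['Powerline', 'Low vegetation', 'Impervious surfaces', 'Car',
               'Fence/Hedge', 'Roof', 'Facade', 'Shrub', 'Tree']

# Confusion matrix copied from the output above
cm = np.array([
    [199,      6,     33,   0,    0,     79,    33,    10,    786],
    [  0, 142112, 118376, 230,  377,   3069,   646, 12072,   2658],
    [  0,  46239, 247994,  68,   84,    229,   159,   847,     89],
    [  0,   1758,     18, 816,  416,    426,    18,  4804,     66],
    [  0,   2930,     39, 367, 1409,   1439,   151, 11490,   1667],
    [  5,   1128,    958,  38,  241, 234085,  1247,  3453,  19938],
    [ 45,   2079,    153,  32,   27,   4115, 18113,  2741,  11169],
    [ 10,  13158,    919, 384,  645,   2398,  1077, 40963,  12869],
    [ 29,   3482,    167,  82,  193,   7749,  2184, 16536, 158977]])

true_positives = np.diag(cm)
support = cm.sum(axis=1)     # number of true points per class (row sums)
predicted = cm.sum(axis=0)   # number of predicted points per class (column sums)

recall = true_positives / support
precision = true_positives / predicted

for name, p, r in zip(class_names, precision, recall):
    print(f'{name:20}: precision {p:.2f}, recall {r:.2f}')

# The support-weighted recall is identical to the overall accuracy
accuracy = true_positives.sum() / cm.sum()
weighted_recall = np.sum(recall * support) / support.sum()
print(f'Accuracy: {accuracy:.4f}, weighted recall: {weighted_recall:.4f}')
```

The rare classes such as powerline, car, and fence/hedge have a much lower recall than the weighted average suggests, because the average is dominated by the large classes.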


## Predictions on large 3D point cloud

In the following, we do the predictions on a larger area around and including the ISPRS benchmark area, and save the result as a colorized point cloud.

In [21]:
X_large = load_csv_file('Vaihingen3D_Large_FEX12.csv.gz')

y_large_pred = np.argmax(model.predict(X_large, verbose=1), axis=-1)

xyz_large = load_csv_file('Vaihingen3D_Large_Points.csv.gz')

save_colorized_point_cloud(xyz_large, y_large_pred, 'Vaihingen3D_Large_Results.csv')

Loaded "Vaihingen3D_Large_FEX12.csv.gz"
  Shape: (6112862, 12)
  Columns: planarity, scattering, omnivariance, sum_eigenvalues, change_of_curvature, verticality, delta_z_knn, std_z_knn, delta_z, std_z, eigenvalue 3, ground_z 

Loaded "Vaihingen3D_Large_Points.csv.gz"
  Shape: (6112862, 3)
  Columns: x, y, z 

Saved "Vaihingen3D_Large_Results.csv"


## Visual evaluation

Use CloudCompare to visually evaluate the results. You can load both datasets, the small one from the training data and the larger one. If you click "Yes to All", they should be perfectly aligned with one another. Then, you can switch them on and off in the "DB Tree" and compare how the model performed on the larger dataset in comparison to the training data. Buildings that are similar to the ones in the training data should be captured rather well, while others might be worse.

While preparing this notebook, we noticed that the prediction on the larger dataset does not give the same results for the points that are also included in the training area. This is rather strange and should not happen. The reasons could be a bug or the use of different parameters when preparing the data. We cannot exclude this as a source of error. More likely, however, is that different digital terrain models were generated for the small dataset and the larger dataset, and that the elevations differ, e.g., because of the interpolation used and the ground/non-ground point classification. As we will not continue with the dataset of features, we decided not to follow up on this issue.

## Outlook

In the next lecture and exercise, we will have the Deep Learning model calculate relevant features by itself. This better reflects the recent advances of neural networks as feature encoders.