# Train PointNet (https://arxiv.org/abs/1612.00593).

This notebook shows how to use the PreprocessedDataGenerator to train PointNet.

The PreprocessedDataGenerator uses preprocessed data instead of ETL data. Whereas ETL data comes mainly as PCD files, preprocessed data comes mainly as pointclouds stored as numpy arrays. We identified loading PCD files as a bottleneck, which is why this notebook works with the preprocessed data.
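To illustrate the difference, here is a minimal sketch that stores a pointcloud as a binary `.npy` file and reads it back with a single `np.load` call. Unlike ASCII PCD files, this needs no line-by-line text parsing. The file name `example_pointcloud.npy` and the random pointcloud are placeholders. The actual layout of the preprocessed dataset is defined by cgmcore and is not shown here.

```python
import os
import tempfile

import numpy as np

# A placeholder pointcloud: 35000 points with x, y, z coordinates.
rng = np.random.default_rng(0)
pointcloud = rng.normal(size=(35000, 3)).astype(np.float32)

# Store it as a binary .npy file (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "example_pointcloud.npy")
np.save(path, pointcloud)

# Loading reads the binary buffer directly into an array with dtype and shape preserved.
loaded = np.load(path)
print(loaded.shape, loaded.dtype)  # (35000, 3) float32
```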

In [3]:
import sys
sys.path.insert(0, "..")

import numpy as np
import os
import random

# Get the dataset path.

This snippet shows how to get the path of the latest preprocessed dataset.

In [4]:
from cgmcore.preprocesseddatagenerator import get_dataset_path

dataset_path = get_dataset_path("/whhdata/preprocessed")
print("Using dataset path", dataset_path)

Using dataset path /whhdata/preprocessed/2019_04_21_22_32_09
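`get_dataset_path` is part of cgmcore, and its implementation is not shown here. Judging from the output, the dataset folders are named by timestamp (`2019_04_21_22_32_09`), so a lexicographic sort puts the newest one last. Here is a minimal sketch under that assumption, using a hypothetical helper `find_latest_dataset_path`:

```python
import os
import tempfile


def find_latest_dataset_path(root):
    """Hypothetical helper: return the newest timestamp-named subfolder of root.

    Assumes folder names follow the pattern YYYY_MM_DD_HH_MM_SS, so that
    lexicographic order equals chronological order.
    """
    subfolders = [
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    ]
    if not subfolders:
        raise FileNotFoundError("No dataset folders found in " + root)
    return os.path.join(root, sorted(subfolders)[-1])


# Usage with a temporary directory standing in for /whhdata/preprocessed.
root = tempfile.mkdtemp()
for name in ["2019_03_01_10_00_00", "2019_04_21_22_32_09", "2019_01_15_08_30_00"]:
    os.makedirs(os.path.join(root, name))
print(os.path.basename(find_latest_dataset_path(root)))  # 2019_04_21_22_32_09
```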


# Hyperparameters.

In [6]:
steps_per_epoch = 10
validation_steps = 10
epochs = 100
batch_size = 1
random_seed = 667
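When training from a generator, Keras cannot infer the dataset size. Instead, `steps_per_epoch` and `validation_steps` define how many batches make up one epoch. The short calculation below shows how many samples these hyperparameters draw. The helper `samples_drawn` is only for illustration and is called with the values above.

```python
def samples_drawn(steps_per_epoch, validation_steps, epochs, batch_size):
    """Return (train samples per epoch, validation samples per epoch, total train samples)."""
    # Each step draws one batch of batch_size samples.
    train_per_epoch = steps_per_epoch * batch_size
    validation_per_epoch = validation_steps * batch_size
    return train_per_epoch, validation_per_epoch, train_per_epoch * epochs


print(samples_drawn(steps_per_epoch=10, validation_steps=10, epochs=100, batch_size=1))
# (10, 10, 1000)
```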

# Create data-generator.

The method create_datagenerator_from_parameters is a convenience method. It allows you to instantiate a generator from a specification dictionary.

In [7]:
from cgmcore.preprocesseddatagenerator import create_datagenerator_from_parameters

dataset_parameters_pointclouds = {}
dataset_parameters_pointclouds["input_type"] = "pointcloud"
dataset_parameters_pointclouds["output_targets"] = ["weight"]
dataset_parameters_pointclouds["random_seed"] = random_seed
dataset_parameters_pointclouds["pointcloud_target_size"] = 35000
dataset_parameters_pointclouds["pointcloud_random_rotation"] = False
dataset_parameters_pointclouds["sequence_length"] = 0
datagenerator_instance_pointclouds = create_datagenerator_from_parameters(dataset_path, dataset_parameters_pointclouds)

Creating data-generator...
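A typo in a key of the specification dictionary is easy to miss. The following minimal sketch shows how such a dictionary could be validated before it is passed on. It uses a hypothetical `PointcloudGeneratorSpec` dataclass that is not part of cgmcore. The field names mirror the keys used above, while the defaults and the unknown-key check are assumptions.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import List


@dataclass
class PointcloudGeneratorSpec:
    """Hypothetical typed view of the parameters used above (not cgmcore API)."""
    input_type: str = "pointcloud"
    output_targets: List[str] = field(default_factory=lambda: ["weight"])
    random_seed: int = 667
    pointcloud_target_size: int = 35000
    pointcloud_random_rotation: bool = False
    sequence_length: int = 0

    @classmethod
    def from_dict(cls, params):
        # Reject unknown keys so that a typo does not silently fall back to a default.
        known = {f.name for f in fields(cls)}
        unknown = set(params) - known
        if unknown:
            raise KeyError("Unknown parameters: " + ", ".join(sorted(unknown)))
        return cls(**params)


spec = PointcloudGeneratorSpec.from_dict({"pointcloud_target_size": 35000, "random_seed": 667})
print(asdict(spec))
```

`asdict(spec)` turns the spec back into a plain dictionary with the same keys as `dataset_parameters_pointclouds`.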


# Getting the QR-codes and doing a train-validate split.

The data-generator can retrieve all QR-codes from the dataset. This snippet shows how to do so and how to split the QR-codes into two sets: one for training and one for validation.

In [11]:
# Get the QR-codes.
qrcodes_to_use = datagenerator_instance_pointclouds.qrcodes[0:9999]

# Do the split.
random.seed(random_seed)
qrcodes_shuffle = qrcodes_to_use[:]
random.shuffle(qrcodes_shuffle)
split_index = int(0.8 * len(qrcodes_shuffle))
qrcodes_train = sorted(qrcodes_shuffle[:split_index])
qrcodes_validate = sorted(qrcodes_shuffle[split_index:])
del qrcodes_shuffle
print("QR-codes for training:\n", "\t".join(qrcodes_train))
print("QR-codes for validation:\n", "\t".join(qrcodes_validate))

QR-codes for training:
 MH_WHH_0001	MH_WHH_0002	MH_WHH_0003	MH_WHH_0004	MH_WHH_0007	MH_WHH_0008	MH_WHH_0011	MH_WHH_0012	MH_WHH_0013	MH_WHH_0014	MH_WHH_0016	MH_WHH_0018	MH_WHH_0019	MH_WHH_0022	MH_WHH_0027	MH_WHH_0028	MH_WHH_0030	MH_WHH_0031	MH_WHH_0032	MH_WHH_0033	MH_WHH_0034	MH_WHH_0035	MH_WHH_0036	MH_WHH_0039	MH_WHH_0041	MH_WHH_0042	MH_WHH_0043	MH_WHH_0044	MH_WHH_0045	MH_WHH_0046	MH_WHH_0047	MH_WHH_0048	MH_WHH_0049	MH_WHH_0053	MH_WHH_0056	MH_WHH_0063	MH_WHH_0075	MH_WHH_0081	MH_WHH_0082	MH_WHH_0083	MH_WHH_0095	MH_WHH_0096	MH_WHH_0104	MH_WHH_0116	MH_WHH_0117	MH_WHH_0118	MH_WHH_0120	MH_WHH_0125	MH_WHH_0135	MH_WHH_0137	MH_WHH_0143	MH_WHH_0148	MH_WHH_0149	MH_WHH_0150	MH_WHH_0153	MH_WHH_0154	MH_WHH_0155	MH_WHH_0156	MH_WHH_0158	MH_WHH_0159	MH_WHH_0161	MH_WHH_0162	MH_WHH_0164	MH_WHH_0166	MH_WHH_0167	MH_WHH_0170	MH_WHH_0176	MH_WHH_0177	MH_WHH_0178	MH_WHH_0179	MH_WHH_0182	MH_WHH_0183	MH_WHH_0185	MH_WHH_0187	MH_WHH_0188	MH_WHH_0189	MH_WHH_0192	MH_WHH_0202	MH_WHH_0204	MH_WHH_0206	MH_WHH_0207	MH_W
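The split above works on QR-codes rather than on individual pointclouds, so all samples belonging to one QR-code end up on the same side of the split. The same logic can be packaged as a reusable function. This sketch uses a local `random.Random` instance, which is a design choice and not what the notebook does, so the split neither depends on nor modifies the global random state. The QR-codes in the usage example are placeholders.

```python
import random


def split_qrcodes(qrcodes, train_fraction=0.8, seed=667):
    """Deterministically split QR-codes into sorted train and validation lists."""
    rng = random.Random(seed)  # local generator, leaves the global state untouched
    shuffled = list(qrcodes)
    rng.shuffle(shuffled)
    split_index = int(train_fraction * len(shuffled))
    return sorted(shuffled[:split_index]), sorted(shuffled[split_index:])


qrcodes = ["MH_WHH_{:04d}".format(i) for i in range(1, 11)]
train, validate = split_qrcodes(qrcodes)
print(len(train), len(validate))  # 8 2
```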

# Creating Python generators for training and validation.

Both QR-code lists can now be used to create the actual generators: one for training and one for validation.

In [12]:
# Create python generators.
generator_pointclouds_train = datagenerator_instance_pointclouds.generate(size=batch_size, qrcodes_to_use=qrcodes_train)
generator_pointclouds_validate = datagenerator_instance_pointclouds.generate(size=batch_size, qrcodes_to_use=qrcodes_validate)
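`generate` returns a Python generator that Keras can consume. Its internals belong to cgmcore. The following minimal sketch shows the general pattern using a hypothetical `generate_batches` function and an in-memory dictionary `samples` that maps each QR-code to a list of (pointcloud, target) pairs. Both are assumptions made for illustration.

```python
import random

import numpy as np


def generate_batches(samples, qrcodes_to_use, size, seed=667):
    """Hypothetical endless batch generator yielding (x, y) pairs.

    samples maps a QR-code to a list of (pointcloud, target) pairs, where each
    pointcloud already has shape (n_points, 3).
    """
    rng = random.Random(seed)
    while True:  # Keras expects a training generator to loop indefinitely
        x_batch, y_batch = [], []
        for _ in range(size):
            qrcode = rng.choice(qrcodes_to_use)
            pointcloud, target = rng.choice(samples[qrcode])
            x_batch.append(pointcloud)
            y_batch.append([target])
        yield np.array(x_batch), np.array(y_batch)


# Usage with random placeholder data: two QR-codes, 100 points per pointcloud.
samples = {
    "MH_WHH_0001": [(np.random.rand(100, 3), 9.5)],
    "MH_WHH_0002": [(np.random.rand(100, 3), 11.2)],
}
x, y = next(generate_batches(samples, ["MH_WHH_0001", "MH_WHH_0002"], size=4))
print(x.shape, y.shape)  # (4, 100, 3) (4, 1)
```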

# Using the generator to create data manually.

You can also use a generator to create data manually at any time, for example to check the shapes of inputs and outputs.

In [13]:
train_x, train_y = next(generator_pointclouds_train)
print("Input-shape:", train_x.shape)
print("Output-shape:", train_y.shape)

Input-shape: (1, 35000, 3)
Output-shape: (1, 1)
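The input shape `(1, 35000, 3)` is batch size × `pointcloud_target_size` × (x, y, z). Real scans rarely contain exactly 35000 points, so each pointcloud has to be brought to the target size. This notebook does not show how cgmcore does that. One common approach is random sampling without replacement when there are too many points and with replacement when there are too few. The sketch below uses a hypothetical `resample_pointcloud` helper to illustrate it.

```python
import numpy as np


def resample_pointcloud(pointcloud, target_size, rng):
    """Hypothetical helper: randomly sample rows to get exactly target_size points."""
    n_points = pointcloud.shape[0]
    # Sample without replacement if possible, otherwise allow duplicate points.
    indices = rng.choice(n_points, size=target_size, replace=n_points < target_size)
    return pointcloud[indices]


rng = np.random.default_rng(667)
large = rng.normal(size=(50000, 3))
small = rng.normal(size=(20000, 3))
print(resample_pointcloud(large, 35000, rng).shape)  # (35000, 3)
print(resample_pointcloud(small, 35000, rng).shape)  # (35000, 3)

# Adding a batch axis gives the input shape printed above.
batch = resample_pointcloud(large, 35000, rng)[np.newaxis]
print(batch.shape)  # (1, 35000, 3)
```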


# Training-details.

The training details are a dictionary that is stored in a file after training. It should contain all information that is valuable for later analysis, for example the dataset path, the QR-code split and the hyperparameters. It is intended to be used when comparing different models.

In [14]:
training_details = {
    "dataset_path" : dataset_path,
    "qrcodes_train" : qrcodes_train,
    "qrcodes_validate" : qrcodes_validate,
    "steps_per_epoch" : steps_per_epoch,
    "validation_steps" : validation_steps,
    "epochs" : epochs,
    "batch_size" : batch_size,
    "random_seed" : random_seed,
}
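The saved artifacts end in `.p` (see the output at the end of the notebook), which suggests they are pickle files. That is an assumption, since the implementation of `save_model_and_history` is not shown. Under that assumption, a dictionary like `training_details` can be written and read back with the standard `pickle` module. The file name and the reduced dictionary `example_details` below are placeholders.

```python
import os
import pickle
import tempfile

# A reduced stand-in for training_details (kept separate so the notebook's variable is untouched).
example_details = {
    "steps_per_epoch": 10,
    "validation_steps": 10,
    "epochs": 100,
    "batch_size": 1,
    "random_seed": 667,
}

path = os.path.join(tempfile.mkdtemp(), "example-details.p")  # hypothetical file name
with open(path, "wb") as f:
    pickle.dump(example_details, f)

with open(path, "rb") as f:
    restored = pickle.load(f)
print(restored == example_details)  # True
```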

# Training PointNet.

The module modelutils contains methods for creating neural networks. The following code shows how to instantiate and train PointNet.
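The central idea of PointNet is to apply the same small network to every point independently and then aggregate the results with a symmetric function (max pooling), so the result does not depend on the order of the points. The minimal numpy sketch below illustrates that idea with random weights and a single layer. It is not the cgmcore `create_point_net` model, which, as its summary shows, also uses batch normalization and further layers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared per-point layer: 3 input coordinates -> 64 features (weights are random).
weights = rng.normal(size=(3, 64))
bias = rng.normal(size=64)


def global_feature(pointcloud):
    # Same weights for every point, equivalent to a Conv1D with kernel size 1.
    per_point = np.maximum(pointcloud @ weights + bias, 0.0)  # ReLU, shape (n, 64)
    # Max pooling over the points is symmetric, so point order does not matter.
    return per_point.max(axis=0)  # shape (64,)


pointcloud = rng.normal(size=(1000, 3))
shuffled = pointcloud[rng.permutation(len(pointcloud))]
print(np.allclose(global_feature(pointcloud), global_feature(shuffled)))  # True
```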

In [15]:
from cgmcore import modelutils

input_shape = (dataset_parameters_pointclouds["pointcloud_target_size"], 3)
output_size = 1
model_pointnet = modelutils.create_point_net(input_shape, output_size, hidden_sizes=[64])
model_pointnet.summary()

model_pointnet.compile(
    optimizer="rmsprop",
    loss="mse",
    metrics=["mae"]
    )

history = model_pointnet.fit_generator(
    generator_pointclouds_train,
    steps_per_epoch=steps_per_epoch,
    epochs=epochs,
    validation_data=generator_pointclouds_validate,
    validation_steps=validation_steps
    )

Using TensorFlow backend.


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 35000, 3)          0         
_________________________________________________________________
lambda_1 (Lambda)            (None, 35000, 3)          0         
_________________________________________________________________
conv1d_4 (Conv1D)            (None, 35000, 64)         256       
_________________________________________________________________
batch_normalization_6 (Batch (None, 35000, 64)         256       
_________________________________________________________________
conv1d_5 (Conv1D)            (None, 35000, 64)         4160      
_________________________________________________________________
batch_normalization_7 (Batch (None, 35000, 64)         256       
_________________________________________________________________
lambda_2 (Lambda)            (None, 35000, 64)         0         
__________

Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78/100
Epoch 79/100
Epoch 80/100
Epoch 81/100
Epoch 82/100
Epoch 83/100
Epoch 84/100
Epoch 85/100
Epoch 86/100
Epoch 87/100
Epoch 88/100
Epoch 89/100
Epoch 90/100
Epoch 91/100
Epoch 92/100
Epoch 93/100
Epoch 94/100
Epoch 95/100
Epoch 96/100
Epoch 97/100
Epoch 98/100
Epoch 99/100
Epoch 100/100
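The parameter counts in the model summary can be checked by hand. A `Conv1D` layer with kernel size k, C_in input channels and C_out filters has k·C_in·C_out + C_out parameters (weights plus biases). A `BatchNormalization` layer has 4 parameters per channel (gamma, beta, moving mean, moving variance). The kernel size of 1 below is inferred from the numbers, not read from cgmcore's code.

```python
def conv1d_params(kernel_size, in_channels, out_channels):
    # Weights plus one bias per filter.
    return kernel_size * in_channels * out_channels + out_channels


def batch_norm_params(channels):
    # gamma, beta, moving mean and moving variance per channel.
    return 4 * channels


print(conv1d_params(1, 3, 64))    # 256, matches conv1d_4
print(batch_norm_params(64))      # 256, matches batch_normalization_6
print(conv1d_params(1, 64, 64))   # 4160, matches conv1d_5
```

A kernel size of 1 means the same weights are applied to each point independently, which corresponds to PointNet's shared per-point MLP.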


# Saving everything.

This saves the model, its history and the training details to an output directory. The created artifacts can later be used to compare different models.

In [11]:
output_path = "."

modelutils.save_model_and_history(output_path, model_pointnet, history, training_details, "pointnet")

Saving model and history...
Saved model weights to./20190505-1652-pointnet-model-weights.h5
Saved training details to./20190505-1652-pointnet-details.p
Saved history to./20190505-1652-pointnet-history.p
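The file names start with a timestamp such as `20190505-1652`, which looks like the `%Y%m%d-%H%M` format. That is an inference from the output. The sketch below also assumes that the history file is a pickled dictionary in the layout of Keras' `History.history` (metric name → list of per-epoch values). Under that assumption, the best epoch can be found as shown. The values in `example_history` are placeholders, not results from this run.

```python
import pickle
from datetime import datetime


def timestamp_prefix(moment):
    # Reproduces the pattern seen in the file names, e.g. 20190505-1652.
    return moment.strftime("%Y%m%d-%H%M")


def best_epoch(history, metric="val_loss"):
    """Return the 1-based epoch with the lowest value of the given metric."""
    values = history[metric]
    return min(range(len(values)), key=values.__getitem__) + 1


print(timestamp_prefix(datetime(2019, 5, 5, 16, 52)))  # 20190505-1652

# Placeholder history; with a real artifact you would pickle.load() the history file instead.
example_history = {"loss": [3.0, 2.1, 1.7], "val_loss": [2.8, 1.9, 2.2]}
restored_history = pickle.loads(pickle.dumps(example_history))
print(best_epoch(restored_history))  # 2
```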
