## Training a Semantic Segmentation Model

In this tutorial, we will learn how to train a semantic segmentation model using either PyTorch or TensorFlow.

## Installing dependencies

Before you start, choose either PyTorch or TensorFlow; you cannot use both at once. The requirements text files are in the main directory of the Open3D-ML repository. Use them to ensure that the right dependencies are installed in your environment.

From the Open3D-ML main directory:

PyTorch users may run:
```sh
pip install -r requirements-torch-cuda.txt
```

TensorFlow users may run:
```sh
pip install -r requirements-tensorflow.txt
```
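After installing, it can help to confirm which framework is actually importable in the current environment. The following is a minimal sketch using only the standard library; the helper name `installed_frameworks` is made up for illustration and is not part of Open3D-ML.

```python
import importlib.util


def installed_frameworks(candidates=("torch", "tensorflow")):
    """Return a dict mapping each module name to True if it can be found."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in candidates
    }


# Report which of the two frameworks is available, without importing them.
for name, found in installed_frameworks().items():
    print(f"{name}: {'found' if found else 'missing'}")
```

Because `find_spec` only locates the module, this check is fast and does not initialize any GPU.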

Create a folder where your dataset will be downloaded.

In [3]:
!mkdir -p data

### Downloading Toronto3D dataset

Open3D-ML provides [scripts](https://github.com/isl-org/Open3D-ML/tree/master/scripts/download_datasets) to download datasets locally. We use the Toronto3D dataset here because it is small compared to other datasets such as SemanticKITTI.

Let us first write the Toronto3D download script to a local file:

In [4]:
%%writefile data/download_toronto3d.sh
#!/bin/bash
  
if [ "$#" -ne 1 ]; then
    echo "Please, provide the base directory to store the dataset."
    exit 1
fi

if ! command -v unzip &> /dev/null
then
    echo "Error: unzip could not be found. Please, install it to continue."
    exit 1
fi

BASE_DIR="$1"/Toronto3D

export url="https://xx9lca.sn.files.1drv.com/y4mUm9-LiY3vULTW79zlB3xp0wzCPASzteId4wdUZYpzWiw6Jp4IFoIs6ADjLREEk1-IYH8KRGdwFZJrPlIebwytHBYVIidsCwkHhW39aQkh3Vh0OWWMAcLVxYwMTjXwDxHl-CDVDau420OG4iMiTzlsK_RTC_ypo3z-Adf-h0gp2O8j5bOq-2TZd9FD1jPLrkf3759rB-BWDGFskF3AsiB3g"

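mkdir -p "$BASE_DIR"

# -c resumes a partial download; -O sets the output file name.
wget -c -O "$BASE_DIR/Toronto_3D.zip" "$url"

cd "$BASE_DIR" || exit 1

# -j extracts all files flat, ignoring the archive's directory structure.
unzip -j Toronto_3D.zip

# cleanup (paths are relative because we are already inside $BASE_DIR)
mkdir -p zip_files
mv Toronto_3D.zip zip_files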

Writing data/download_toronto3d.sh


The bash script takes one argument: the path to the output folder where the dataset will be downloaded.

In [6]:
!bash data/download_toronto3d.sh data/toronto3d_dataset


--2022-06-14 22:46:55--  https://xx9lca.sn.files.1drv.com/y4mUm9-LiY3vULTW79zlB3xp0wzCPASzteId4wdUZYpzWiw6Jp4IFoIs6ADjLREEk1-IYH8KRGdwFZJrPlIebwytHBYVIidsCwkHhW39aQkh3Vh0OWWMAcLVxYwMTjXwDxHl-CDVDau420OG4iMiTzlsK_RTC_ypo3z-Adf-h0gp2O8j5bOq-2TZd9FD1jPLrkf3759rB-BWDGFskF3AsiB3g
Resolving xx9lca.sn.files.1drv.com (xx9lca.sn.files.1drv.com)... 13.107.42.12
Connecting to xx9lca.sn.files.1drv.com (xx9lca.sn.files.1drv.com)|13.107.42.12|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1143871417 (1.1G) [application/zip]
Saving to: ‘data/toronto3d_dataset/Toronto3D/Toronto_3D.zip’


2022-06-14 22:48:03 (16.2 MB/s) - ‘data/toronto3d_dataset/Toronto3D/Toronto_3D.zip’ saved [1143871417/1143871417]

Archive:  Toronto_3D.zip
  inflating: Colors.xml              
  inflating: L001.ply                
  inflating: L002.ply                
  inflating: L003.ply                
  inflating: L004.ply                
  inflating: Mavericks_classes_9.txt  


After the download finishes, the dataset folder should look like this:

```
data/toronto3d_dataset
└── Toronto3D
    ├── Colors.xml
    ├── L001.ply
    ├── L002.ply
    ├── L003.ply
    ├── L004.ply
    ├── Mavericks_classes_9.txt
    └── zip_files
        └── Toronto_3D.zip
```
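Before moving on, you may want to confirm that the point cloud files from the listing above are really in place. This is a small sketch using only `pathlib`; the helper name `missing_files` is hypothetical, and the directory passed at the bottom assumes the download command shown earlier.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = ("L001.ply", "L002.ply", "L003.ply", "L004.ply")


def missing_files(dataset_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in expected if not (root / name).is_file()]


missing = missing_files("data/toronto3d_dataset/Toronto3D")
print("All files present" if not missing else f"Missing: {missing}")
```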

## Limiting memory usage (TensorFlow users)

TensorFlow maps nearly all of the GPU memory by default. This can cause out-of-memory errors if some ops allocate GPU memory independently of TensorFlow. To have TensorFlow allocate memory only as the process needs it, run the following code right after importing TensorFlow:

```python
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        print(e)
```

## Building components for training

First, let us do some basic imports. The import statements depend on which framework you are using: PyTorch or TensorFlow.

TensorFlow users may run:

```py
import open3d.ml.tf as ml3d
from open3d.ml.tf.models import RandLANet
from open3d.ml.tf.pipelines import SemanticSegmentation
```

PyTorch users may run:

In [23]:
import open3d.ml.torch as ml3d
from open3d.ml.torch.models import RandLANet
from open3d.ml.torch.pipelines import SemanticSegmentation

Either way, `ml3d`, `RandLANet` and `SemanticSegmentation` now refer to the implementations for your chosen framework, so the rest of the tutorial is the same for PyTorch and TensorFlow.

Let us now define a config file with `dataset`, `model` and `pipeline` sections. Start from the config file in `Open3D-ML/ml3d/configs/randlanet_toronto3d.yml` and set `dataset_path` to your dataset folder:

In [30]:
%%writefile data/randlanet_toronto3d.yml
dataset:
  name: Toronto3D
  cache_dir: ./logs/cache
  dataset_path: data/toronto3d_dataset/Toronto3D # path/to/your/dataset
  class_weights: [41697357, 1745448, 6572572, 19136493, 674897, 897825, 4634634, 374721]
  ignored_label_inds:
  - 0
  num_classes: 8
  num_points: 65536
  test_files:
  - L002.ply
  test_result_folder: ./test
  train_files:
  - L001.ply
  - L003.ply
  - L004.ply
  use_cache: true
  val_files:
  - L002.ply
  steps_per_epoch_train: 100
  steps_per_epoch_valid: 10
model:
  name: RandLANet
  batcher: DefaultBatcher
  ckpt_path: # path/to/your/checkpoint
  num_neighbors: 16
  num_layers: 5
  num_points: 65536
  num_classes: 8
  ignored_label_inds: [0]
  sub_sampling_ratio: [4, 4, 4, 4, 2]
  in_channels: 6
  dim_features: 8
  dim_output: [16, 64, 128, 256, 512]
  grid_size: 0.05
  augment:
    recenter:
      dim: [0, 1, 2]
    normalize:
      points:
        method: linear
pipeline:
  name: SemanticSegmentation
  optimizer:
    lr: 0.001
  batch_size: 2
  main_log_dir: ./logs
  max_epoch: 200
  save_ckpt_freq: 5
  scheduler_gamma: 0.99
  test_batch_size: 1
  train_sum_dir: train_log
  val_batch_size: 2
  summary:
    record_for: []
    max_pts:
    use_reference: false
    max_outputs: 1

Overwriting data/randlanet_toronto3d.yml
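Since the `dataset` and `model` sections repeat some values, a quick consistency check can catch typos before a long training run. The sketch below works on plain dictionaries; the function `check_config` and the rules it applies are assumptions about what a consistent config looks like, not checks performed by Open3D-ML itself.

```python
def check_config(dataset_cfg, model_cfg):
    """Return a list of human-readable problems found in the two sections."""
    problems = []
    # Values that appear in both sections should agree.
    for key in ("num_classes", "num_points"):
        if dataset_cfg.get(key) != model_cfg.get(key):
            problems.append(
                f"{key} differs: dataset={dataset_cfg.get(key)}, "
                f"model={model_cfg.get(key)}")
    # Assumed rule: one class weight per class.
    weights = dataset_cfg.get("class_weights", [])
    if len(weights) != dataset_cfg.get("num_classes"):
        problems.append(
            f"class_weights has {len(weights)} entries, "
            f"expected {dataset_cfg.get('num_classes')}")
    # Training files should not also be used for testing.
    overlap = set(dataset_cfg.get("train_files", [])) & set(
        dataset_cfg.get("test_files", []))
    if overlap:
        problems.append(f"files in both train and test: {sorted(overlap)}")
    return problems


# Values copied from the config above.
dataset_cfg = {
    "num_classes": 8,
    "num_points": 65536,
    "class_weights": [41697357, 1745448, 6572572, 19136493,
                      674897, 897825, 4634634, 374721],
    "train_files": ["L001.ply", "L003.ply", "L004.ply"],
    "test_files": ["L002.ply"],
}
model_cfg = {"num_classes": 8, "num_points": 65536}

print(check_config(dataset_cfg, model_cfg))
```

An empty list means none of the assumed rules were violated.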


Load the config file:

In [31]:
from open3d.ml import utils

cfg_file = "data/randlanet_toronto3d.yml"
cfg = utils.Config.load_from_file(cfg_file)

Create dataset object from config file:

In [32]:
dataset = ml3d.datasets.Toronto3D(**cfg.dataset)

Create model object from config file: 

In [33]:
model = RandLANet(**cfg.model)

Create a pipeline object from the model, dataset and config file:

In [34]:
pipeline = SemanticSegmentation(model=model,
                                dataset=dataset,
                                **cfg.pipeline)
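The `optimizer.lr` and `scheduler_gamma` entries in the config determine how the learning rate evolves during training. As a rough illustration, the sketch below assumes that `scheduler_gamma` is applied as a multiplicative decay once per epoch; the tutorial does not state this, so treat the schedule as an assumption. The helper `lr_at_epoch` is hypothetical.

```python
def lr_at_epoch(base_lr, gamma, epoch):
    """Learning rate after `epoch` steps of exponential decay (assumed)."""
    return base_lr * gamma ** epoch


# Values from the pipeline section of the config above.
base_lr, gamma, max_epoch = 0.001, 0.99, 200

for epoch in (0, 50, 100, max_epoch):
    print(f"epoch {epoch:3d}: lr = {lr_at_epoch(base_lr, gamma, epoch):.6f}")
```

Under this assumption the learning rate falls by roughly an order of magnitude over the 200 configured epochs, which is worth keeping in mind if you shorten `max_epoch`.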

### Training the model

In [None]:
pipeline.run_train()

INFO - 2022-06-14 23:01:41,840 - semantic_segmentation - DEVICE : cpu
INFO - 2022-06-14 23:01:41,842 - semantic_segmentation - Logging in file : ./logs/RandLANet_Toronto3D_torch/log_train_2022-06-14_23:01:41.txt
INFO - 2022-06-14 23:01:41,844 - toronto3d - Found 3 pointclouds for train
preprocess:   0%|                                                | 0/3 [00:00<?, ?it/s]

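Training checkpoints are saved under `pipeline.main_log_dir` (the default path is `'./logs/Model_Dataset/'`, for example `./logs/RandLANet_Toronto3D_torch/` in the log above). You can use them for testing, inference and re-training.

Review the training logs and loss curves to judge whether your model is generalizing well, for example by checking that the validation loss does not start rising while the training loss keeps falling.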

## Evaluation on holdout split

Evaluate the trained model on the test split by calling the `run_test()` method:

In [None]:
pipeline.run_test()

INFO - 2022-06-15 00:43:22,308 - semantic_segmentation - DEVICE : cpu
INFO - 2022-06-15 00:43:22,309 - semantic_segmentation - Logging in file : ./logs/RandLANet_Toronto3D_torch/log_test_2022-06-15_00:43:22.txt
INFO - 2022-06-15 00:43:22,311 - toronto3d - Found 1 pointclouds for test
INFO - 2022-06-15 00:43:25,911 - semantic_segmentation - Initializing from scratch.
INFO - 2022-06-15 00:43:25,912 - semantic_segmentation - Started testing
test 0/1:  75%|████████████████████▉       | 3731827/4990714 [10:25<05:27, 3846.59it/s]
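Segmentation quality is commonly summarized with overall accuracy and per-class intersection-over-union (IoU). The following plain-Python sketch shows how these two numbers are defined; it is independent of the metric code inside Open3D-ML, the function name `accuracy_and_iou` is hypothetical, and it assumes that predictions and labels use the same class numbering.

```python
def accuracy_and_iou(predictions, labels, num_classes):
    """Return (overall accuracy, per-class IoU list; None if class unseen)."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    label_count = [0] * num_classes
    correct = 0
    for pred, label in zip(predictions, labels):
        pred_count[pred] += 1
        label_count[label] += 1
        if pred == label:
            intersection[pred] += 1
            correct += 1
    ious = []
    for c in range(num_classes):
        # Union = points predicted as c + points labeled c - points counted twice.
        union = pred_count[c] + label_count[c] - intersection[c]
        ious.append(intersection[c] / union if union else None)
    accuracy = correct / len(labels) if len(labels) else 0.0
    return accuracy, ious


# Toy example with four points and three classes.
acc, ious = accuracy_and_iou([0, 1, 1, 2], [0, 1, 2, 2], num_classes=3)
print(acc, ious)
```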

## Running inference on an individual point cloud

The `pipeline.run_inference` method lets us make predictions on new point cloud data. It takes a dictionary with `point`, `feat` and `label` keys as input.

In [None]:
# Input data: take the first point cloud from the test split
test_split = dataset.get_split("test")
data = test_split.get_data(0)

print(data.keys())

In [None]:
# Run inference
results = pipeline.run_inference(data)

# Print the results
print(results)

<div class="alert alert-info">
**Note:** You may replace `data` dictionary with custom point cloud data. 
</div>
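As a rough sketch of what such a custom dictionary might look like, the helper below packs NumPy arrays under the `point`, `feat` and `label` keys named above. The array shapes, the dtypes, the use of colors as features, and the all-zero placeholder labels are assumptions for illustration, and `make_inference_input` is a hypothetical helper rather than an Open3D-ML function. Check how the dataset you trained on scales its features (for example the color range) before relying on this.

```python
import numpy as np


def make_inference_input(points, colors=None):
    """Pack raw arrays into a dict with `point`, `feat` and `label` keys."""
    points = np.asarray(points, dtype=np.float32)
    if points.ndim != 2 or points.shape[1] != 3:
        raise ValueError("points must have shape (N, 3)")
    feat = None if colors is None else np.asarray(colors, dtype=np.float32)
    if feat is not None and feat.shape[0] != points.shape[0]:
        raise ValueError("colors must have one row per point")
    return {
        "point": points,
        "feat": feat,
        # Placeholder labels: new data has no ground truth (assumption).
        "label": np.zeros(points.shape[0], dtype=np.int32),
    }


# Random stand-in data; replace with your own point cloud.
rng = np.random.default_rng(0)
custom_data = make_inference_input(rng.random((1000, 3)), rng.random((1000, 3)))
print({key: value.shape for key, value in custom_data.items()})
```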

## Restoring from checkpoints

In the above example, we used the most recently trained model to make predictions. The latest model is not always the best one, so you may want to use a model from an earlier checkpoint instead.

The pipeline provides a `load_ckpt` method to restore model weights from training checkpoints.

In [None]:
ckpt_path = "path/to/trained/model"
pipeline.load_ckpt(ckpt_path=ckpt_path)
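To pick a value for `ckpt_path`, it helps to see which checkpoint files exist. The sketch below searches a log directory recursively and lists matching files from oldest to newest. The `*.pth` pattern is an assumption that fits typical PyTorch checkpoints; adjust it to whatever files your run actually wrote. `list_checkpoints` is a hypothetical helper.

```python
from pathlib import Path


def list_checkpoints(log_dir, pattern="*.pth"):
    """Return checkpoint files under log_dir, oldest first."""
    root = Path(log_dir)
    if not root.is_dir():
        return []
    files = [p for p in root.rglob(pattern) if p.is_file()]
    # Sort by modification time, using the name to break ties.
    return sorted(files, key=lambda p: (p.stat().st_mtime, p.name))


# "./logs" matches main_log_dir in the config above.
for path in list_checkpoints("./logs"):
    print(path)
```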