# Example training job using PyTorch Lightning with local GPU compute

This notebook contains the steps to download data and run a training job using PyTorch Lightning with distributed training on a multi-GPU instance. The code was tested on a SageMaker notebook instance of type ml.p3.16xlarge (8 GPUs) with the `conda_pytorch_p38` kernel.

Steps in the notebook:
1. Download and prepare the KITTI Semantic Segmentation Benchmark dataset.
2. Install additional packages.
3. Run the training job from a Python script.

The training code is based on the original source from [Lightning-AI](https://github.com/Lightning-AI/lightning/blob/9cc714cdd12b90faea1b4fc7265dd384b224792e/src/pytorch_lightning/trainer/trainer.py).

### Download data

In [None]:
!wget https://s3.eu-central-1.amazonaws.com/avg-kitti/data_semantics.zip

In [None]:
!unzip data_semantics.zip -d data_semantics
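
Before training, it can help to confirm that the archive was extracted where the script will look for it. The sketch below counts `.png` files in a few sub-folders of `data_semantics`. The sub-folder names (`training/image_2`, `training/semantic`) are assumptions based on the usual layout of the KITTI semantics archive, and `count_images` is a hypothetical helper, not part of the training code; adjust the paths to whatever `unzip` actually produced.

```python
from pathlib import Path


def count_images(root, subdirs=("training/image_2", "training/semantic")):
    """Return {subdir: number of .png files} for each expected sub-folder.

    The default sub-folder names are an assumption about the archive layout.
    """
    root = Path(root)
    counts = {}
    for sub in subdirs:
        folder = root / sub
        # A missing folder is reported as 0 instead of raising an error.
        counts[sub] = len(list(folder.glob("*.png"))) if folder.is_dir() else 0
    return counts


if __name__ == "__main__":
    for sub, n in count_images("data_semantics").items():
        print(f"{sub}: {n} png files")
```

A count of zero for every folder suggests that the path passed to `--data_path` does not match the extracted directory structure.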

### Install packages

In [None]:
! pip install --upgrade future
! pip install -r code/requirements.txt

### Run the Python script
This script uses the locally stored data to train a semantic segmentation model and saves the trained model as a `.pth` file in the local directory.
Note that this example sets the precision to 16-bit floating point, which requires PyTorch version 1.10.0 or later.
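
The version requirement above can be checked from the kernel before launching the job. This is a minimal sketch: `version_tuple` and `meets_minimum` are hypothetical helpers written for this notebook, and the parsing only looks at the leading `major.minor[.patch]` digits of the version string.

```python
import re


def version_tuple(version):
    """Turn '1.12.1+cu113' into (1, 12, 1), ignoring any build suffix."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(version, minimum="1.10.0"):
    """True if `version` is at least `minimum` (numeric, not string, comparison)."""
    return version_tuple(version) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this kernel")
    else:
        print("torch", torch.__version__, "ok:", meets_minimum(torch.__version__))
        print("visible GPUs:", torch.cuda.device_count())
```

Comparing tuples of integers avoids the pitfall of string comparison, where `"1.9.1"` would sort after `"1.10.0"`.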

In [11]:
!python code/semantic_segmentation.py --data_path data_semantics --batch_size=4 --max_epochs 5 --model_dir ./

Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/8
Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/8
data_semantics
Input file list: []
Initializing distributed: GLOBAL_RANK: 3, MEMBER: 4/8
data_semantics
Input file list: []
Initializing distributed: GLOBAL_RANK: 4, MEMBER: 5/8
data_semantics
Input file list: []
Initializing distributed: GLOBAL_RANK: 5, MEMBER: 6/8
data_semantics
Input file list: []
Initializing distributed: GLOBAL_RANK: 6, MEMBER: 7/8
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/8
data_semantics
Input file list: []
Initializing distributed: GLOBAL_RANK: 7, MEMBER: 8/8
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 8 processes
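
The log above reports 16-bit AMP, the NCCL backend, and 8 registered processes. A `Trainer` configured roughly as in the sketch below would be expected to start in that way. This is an illustration only and not the contents of `code/semantic_segmentation.py`, which is not shown in this notebook. `trainer_kwargs` and `make_trainer` are hypothetical helper names, and the sketch assumes `pytorch_lightning` is available in the kernel.

```python
def trainer_kwargs(max_epochs=5, devices=8, precision=16):
    """Keyword arguments for pytorch_lightning.Trainer (illustrative values)."""
    return {
        "accelerator": "gpu",      # train on CUDA devices
        "devices": devices,        # number of GPUs, one process per GPU
        "strategy": "ddp",         # DistributedDataParallel across the GPUs
        "precision": precision,    # 16 selects mixed-precision training
        "max_epochs": max_epochs,  # matches --max_epochs on the command line
    }


def make_trainer(**overrides):
    """Build a Trainer; the import is deferred so the module loads without Lightning."""
    import pytorch_lightning as pl  # assumption: installed in this environment

    return pl.Trainer(**trainer_kwargs(**overrides))
```

On the instance used here, `make_trainer()` would request all 8 GPUs; passing `devices=1` is a simple way to try the same settings on a smaller machine.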


While the code is running, you can open a terminal session and run `nvidia-smi` to check GPU utilisation.
The output should look similar to the following:
![nvidia-smi](img/nvidia-smi.png)
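
The same information can be read programmatically with the query mode of `nvidia-smi`. The sketch below is one possible way to do this: `parse_gpu_stats` and `read_gpu_stats` are hypothetical helpers, and the parser assumes every field is numeric, which may not hold on devices that report `[N/A]` for some values.

```python
import subprocess

# Ask nvidia-smi for a compact CSV: GPU index, utilisation (%), memory used (MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text):
    """Parse 'index, util, mem' lines into a list of dicts with integer values."""
    stats = []
    for line in csv_text.strip().splitlines():
        # Assumes all three fields are integers; '[N/A]' would raise ValueError.
        index, util, mem = (int(field.strip()) for field in line.split(","))
        stats.append({"gpu": index, "util_pct": util, "mem_mib": mem})
    return stats


def read_gpu_stats():
    """Run nvidia-smi once and return the parsed per-GPU statistics."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)
```

Calling `read_gpu_stats()` from a second notebook while training runs returns one entry per GPU, which makes it easy to spot a device that is idle.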