# PyTorch Lightning
#### Experiments using PyTorch Lightning to build a custom CNN classification model.


PyTorch Lightning implementation
In this notebook I implement a CNN model using PyTorch Lightning. It is more flexible than the model from initial_experiments.ipynb and exposes more hyperparameters for training sessions.

In [1]:
from scripts.models.lightning_model import LightningCNN, ImageDataModule

In [2]:
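import os
from pathlib import Path

from dotenv import load_dotenv

# Load KAGGLE_FILES_DIR from the local .env file
load_dotenv()
root_data = os.getenv("KAGGLE_FILES_DIR")
# Resolves to <cwd>/../<KAGGLE_FILES_DIR>/processed
dataset_path = Path(os.getcwd(), "..", root_data, 'processed')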

By default, the only transformation applied is resizing to 256x256.

In [3]:
datamodule = ImageDataModule(data_dir=dataset_path, batch_size=32)

# Training
1st model parameters:
- conv_layers: 5
- fc_layer_sizes: (256, 128)
- input_size: torch.Size([3, 256, 256])
- out_classes: 1
- initial_filters: 32
- hl_kernel_size: 5
- activation_func: nn.ReLU
- max_pool_kernel: 2
- dropout_conv: False
- dropout_fc: False
- dropout_rate: 0.5
- initial_learning_rate: 0.01
- loss_func: nn.BCEWithLogitsLoss
- optimizer: Adam
- metrics: Accuracy, Precision, Recall, F1, AUC, ConfusionMatrix
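The parameter count implied by this configuration can be estimated by hand. The sketch below is plain Python and makes several guesses about `LightningCNN`, whose source is not shown in this notebook: filter counts double with every conv layer, convolutions use stride 1 and `padding=1`, each conv layer is followed by one max-pool, and there are no other trainable layers. The names `conv_params` and `estimate_params` are hypothetical helpers, not part of the project.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weights plus biases of one 2D convolution layer."""
    return in_ch * out_ch * kernel * kernel + out_ch


def estimate_params(conv_layers, fc_layer_sizes, input_size=(3, 256, 256),
                    initial_filters=32, kernel=5, pool=2, out_classes=1,
                    padding=1):
    """Return (conv_params, fc_params) under the assumptions listed above."""
    channels, height, width = input_size
    conv_total = 0
    out_ch = initial_filters
    for _ in range(conv_layers):
        conv_total += conv_params(channels, out_ch, kernel)
        # Stride-1 convolution followed by max-pooling (floor division).
        height = (height + 2 * padding - kernel + 1) // pool
        width = (width + 2 * padding - kernel + 1) // pool
        # Assumption: the number of filters doubles with every layer.
        channels, out_ch = out_ch, out_ch * 2

    fc_total = 0
    in_features = channels * height * width  # flattened conv output
    for size in (*fc_layer_sizes, out_classes):
        fc_total += in_features * size + size  # weights plus biases
        in_features = size
    return conv_total, fc_total


conv_total, fc_total = estimate_params(conv_layers=5, fc_layer_sizes=(256, 128))
total = conv_total + fc_total
print(f"conv: {conv_total:,}  fc: {fc_total:,}  total: {total:,}")
print(f"float32 size: {total * 4 / 1e6:.3f} MB")
```

Under these assumptions the estimate comes to 9,107,265 parameters and 36.429 MB, in line with the 9.1 M and 36.429 MB that the trainer summary reports for this model. The agreement suggests the guesses are close, but it does not confirm the actual architecture.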

In [5]:
import torch
import pytorch_lightning as pl

model_1 = LightningCNN(
    conv_layers=5,
    fc_layer_sizes=(256, 128),
    input_size=torch.Size([3, 256, 256]),
    initial_filters=32,
    out_classes=1,
    hl_kernel_size=5,
    max_pool_kernel=2,
    dropout_conv=False,
    dropout_fc=False,
    initial_learning_rate=0.01
)

checkpoint_callback = pl.callbacks.ModelCheckpoint(
    monitor='valid_loss',
    dirpath='../models/lightning',
    filename='model-9M-5conv-nodrop-{epoch:02d}-{valid_loss:.2f}',
    save_top_k=2,
    mode='min') 

early_stopping = pl.callbacks.EarlyStopping(
    monitor='valid_loss',
    min_delta=0.001,
    patience=5,
    verbose=True,
    mode='min'
)

trainer = pl.Trainer(
    max_epochs=100,
    callbacks=[checkpoint_callback, early_stopping]
)

GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
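How `min_delta` and `patience` interact in the `EarlyStopping` callback above can be illustrated without Lightning. This is a simplified sketch of the stopping rule for `mode='min'`, not the library's implementation; `early_stop_epoch` is a hypothetical helper and the loss values are made up for illustration.

```python
def early_stop_epoch(losses, min_delta=0.001, patience=5):
    """Return (epoch index at which training stops, best loss seen).

    A value counts as an improvement only if it beats the best
    score by more than min_delta; otherwise a counter grows until
    it reaches patience.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best
    return None, best  # never triggered


# Made-up validation losses: 0.5865 is lower than 0.587,
# but not by more than min_delta, so it does not reset the counter.
losses = [0.587, 0.590, 0.5865, 0.588, 0.589, 0.5871, 0.50]
stop_epoch, best = early_stop_epoch(losses)
print(stop_epoch, best)
```

With `patience=5`, five consecutive validation checks without a sufficient improvement end the run, so the later value of 0.50 in this toy list is never reached.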


In [9]:
trainer.fit(model_1, datamodule)


  | Name          | Type                  | Params
--------------------------------------------------------
0 | loss_func     | BCEWithLogitsLoss     | 0     
1 | accuracy      | BinaryAccuracy        | 0     
2 | precision     | BinaryPrecision       | 0     
3 | recall        | BinaryRecall          | 0     
4 | f1            | BinaryF1Score         | 0     
5 | auc           | BinaryAUROC           | 0     
6 | confmat       | BinaryConfusionMatrix | 0     
7 | hidden_layers | Sequential            | 4.4 M 
8 | fc_layers     | Sequential            | 4.8 M 
--------------------------------------------------------
9.1 M     Trainable params
0         Non-trainable params
9.1 M     Total params
36.429    Total estimated model params size (MB)


Metric valid_loss improved. New best score: 0.587

Monitored metric valid_loss did not improve in the last 5 records. Best score: 0.587. Signaling Trainer to stop.


In [11]:
trainer.test(model_1, datamodule)

[{'test_loss': 0.5862252116203308,
  'test_accuracy': 0.7269971966743469,
  'test_precision': 0.0,
  'test_recall': 0.0,
  'test_f1': 0.0,
  'test_auc': 0.0}]

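Accuracy: 0.727 -> this is no better than the distribution of classes in the dataset. Precision, recall and F1 are all 0.0, so the model never correctly predicts the positive class.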
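The test loss can be checked against the same baseline. A model that ignores its input and always outputs the class prior has a binary cross-entropy equal to the entropy of the label distribution. The sketch below assumes the negative class makes up 0.727 of the test set (taken from the accuracy above, not measured directly); `constant_predictor_baseline` is a hypothetical helper.

```python
import math


def constant_predictor_baseline(pos_rate):
    """Accuracy and BCE of a model that always predicts the class prior."""
    neg_rate = 1.0 - pos_rate
    # Always predicting the majority class gets the majority share right.
    accuracy = max(pos_rate, neg_rate)
    # BCE of the constant probability pos_rate equals the label entropy.
    bce = -(pos_rate * math.log(pos_rate) + neg_rate * math.log(neg_rate))
    return accuracy, bce


# Assumption: the share of positives is 1 - 0.727, inferred from test_accuracy.
pos_rate = 1.0 - 0.7269971966743469
accuracy, bce = constant_predictor_baseline(pos_rate)
print(f"baseline accuracy: {accuracy:.3f}, baseline BCE: {bce:.4f}")
```

The baseline loss comes out at roughly 0.586, essentially the reported `test_loss`. That is consistent with the network having collapsed to a constant output rather than having learned anything from the images.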

# Training
2nd model parameters:
- conv_layers: 6
- fc_layer_sizes: (256, 128)
- input_size: torch.Size([3, 256, 256])
- out_classes: 1
- initial_filters: 32
- hl_kernel_size: 5
- activation_func: nn.ReLU
- max_pool_kernel: 2
- dropout_conv: True
- dropout_fc: True
- dropout_rate: 0.5
- initial_learning_rate: 0.01
- loss_func: nn.BCEWithLogitsLoss
- optimizer: Adam
- metrics: Accuracy, Precision, Recall, F1, AUC, ConfusionMatrix

In [9]:

model_2 = LightningCNN(
    conv_layers=6,
    fc_layer_sizes=(256, 128),
    input_size=torch.Size([3, 256, 256]),
    initial_filters=32,
    out_classes=1,
    hl_kernel_size=5,
    max_pool_kernel=2,
    dropout_conv=True,
    dropout_fc=True,
    dropout_rate=0.5,
    initial_learning_rate=0.01
)

checkpoint_callback = pl.callbacks.ModelCheckpoint(
    monitor='valid_loss',
    dirpath='../models/lightning',
    filename='model-18M-6conv-0.5drop-{epoch:02d}-{valid_loss:.2f}',
    save_top_k=2,
    mode='min') 

early_stopping = pl.callbacks.EarlyStopping(
    monitor='valid_loss',
    min_delta=0.001,
    patience=3,
    verbose=True,
    mode='min'
)

trainer = pl.Trainer(
    max_epochs=100,
    callbacks=[checkpoint_callback, early_stopping]
)

GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


In [10]:
trainer.fit(model_2, datamodule)


  | Name          | Type                  | Params
--------------------------------------------------------
0 | loss_func     | BCEWithLogitsLoss     | 0     
1 | accuracy      | BinaryAccuracy        | 0     
2 | precision     | BinaryPrecision       | 0     
3 | recall        | BinaryRecall          | 0     
4 | f1            | BinaryF1Score         | 0     
5 | auc           | BinaryAUROC           | 0     
6 | confmat       | BinaryConfusionMatrix | 0     
7 | hidden_layers | Sequential            | 17.5 M
8 | fc_layers     | Sequential            | 1.1 M 
--------------------------------------------------------
18.5 M    Trainable params
0         Non-trainable params
18.5 M    Total params
74.182    Total estimated model params size (MB)


Metric valid_loss improved. New best score: 0.587

Monitored metric valid_loss did not improve in the last 3 records. Best score: 0.587. Signaling Trainer to stop.


In [11]:
trainer.test(model_2, datamodule)

[{'test_loss': 0.5863068699836731,
  'test_accuracy': 0.7269971966743469,
  'test_precision': 0.0,
  'test_recall': 0.0,
  'test_f1': 0.0,
  'test_auc': 0.0}]

Accuracy: 0.727 -> almost no difference from the first model.

One more model, more complex and with more layers, scaled up until I run out of memory to allocate.
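A rough lower bound on training memory helps judge how far the model can grow. The sketch below counts only float32 weights, their gradients, and the two moment buffers Adam keeps per parameter. Activations, which depend on batch size and image resolution, are deliberately left out, so real usage will be higher. `training_memory_mb` is a hypothetical helper, and the parameter counts are the rounded totals from the trainer summaries in this notebook.

```python
def training_memory_mb(n_params, bytes_per_value=4, adam=True):
    """Lower bound in MB: weights + gradients + optimizer state."""
    weights = n_params * bytes_per_value
    grads = weights                              # one gradient per weight
    optimizer_state = 2 * weights if adam else 0  # Adam: two moments per weight
    return (weights + grads + optimizer_state) / 1e6


# Rounded totals reported by the trainer summaries.
for name, n_params in [("model_1", 9.1e6), ("model_2", 18.5e6), ("model_3", 78.9e6)]:
    print(f"{name}: at least {training_memory_mb(n_params):.0f} MB")
```

Under these assumptions the cost is about four times the size of the weights alone, before any activations are stored.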

# Training
3rd model parameters:
- conv_layers: 6
- fc_layer_sizes: (1024, 512, 256)
- input_size: torch.Size([3, 256, 256])
- out_classes: 1
- initial_filters: 64
- hl_kernel_size: 5
- activation_func: nn.ReLU
- max_pool_kernel: 2
- dropout_conv: True
- dropout_fc: True
- dropout_rate: 0.5
- initial_learning_rate: 0.01
- loss_func: nn.BCEWithLogitsLoss
- optimizer: Adam
- metrics: Accuracy, Precision, Recall, F1, AUC, ConfusionMatrix

In [27]:

model_3 = LightningCNN(
    conv_layers=6,
    fc_layer_sizes=(1024, 512, 256),
    input_size=torch.Size([3, 256, 256]),
    initial_filters=64,
    out_classes=1,
    hl_kernel_size=5,
    max_pool_kernel=2,
    dropout_conv=True,
    dropout_fc=True,
    dropout_rate=0.5,
    initial_learning_rate=0.01
)

checkpoint_callback = pl.callbacks.ModelCheckpoint(
    monitor='valid_loss',
    dirpath='../models/lightning',
    filename='model-79M-6conv-0.5drop-{epoch:02d}-{valid_loss:.2f}',
    save_top_k=2,
    mode='min') 

early_stopping = pl.callbacks.EarlyStopping(
    monitor='valid_loss',
    min_delta=0.001,
    patience=3,
    verbose=True,
    mode='min'
)

trainer = pl.Trainer(
    max_epochs=100,
    callbacks=[checkpoint_callback, early_stopping]
)

GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


In [28]:
trainer.fit(model_3, datamodule)


  | Name          | Type                  | Params
--------------------------------------------------------
0 | loss_func     | BCEWithLogitsLoss     | 0     
1 | accuracy      | BinaryAccuracy        | 0     
2 | precision     | BinaryPrecision       | 0     
3 | recall        | BinaryRecall          | 0     
4 | f1            | BinaryF1Score         | 0     
5 | auc           | BinaryAUROC           | 0     
6 | confmat       | BinaryConfusionMatrix | 0     
7 | hidden_layers | Sequential            | 69.8 M
8 | fc_layers     | Sequential            | 9.0 M 
--------------------------------------------------------
78.9 M    Trainable params
0         Non-trainable params
78.9 M    Total params
315.567   Total estimated model params size (MB)


Metric valid_loss improved. New best score: 0.586

Monitored metric valid_loss did not improve in the last 3 records. Best score: 0.586. Signaling Trainer to stop.


In [30]:
trainer.test(model_3, datamodule)

[{'test_loss': 0.5862261652946472,
  'test_accuracy': 0.7269971966743469,
  'test_precision': 0.0,
  'test_recall': 0.0,
  'test_f1': 0.0,
  'test_auc': 0.5}]
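An accuracy of 0.727 together with zero precision, recall and F1 and an AUC of 0.5 is the pattern produced by a classifier that gives every sample the same score below the decision threshold. The sketch below reproduces that pattern on a small hypothetical label set. The labels and scores are made up, 0/0 is treated as 0.0 by convention, and the helper names are illustrative only.

```python
def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall and F1; undefined ratios are reported as 0.0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def pairwise_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 3 positives, 8 negatives, one constant score for all.
y_true = [1] * 3 + [0] * 8
scores = [0.27] * len(y_true)
y_pred = [int(s >= 0.5) for s in scores]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = pairwise_auc(y_true, scores)
print(f"accuracy={accuracy:.3f} precision={precision} recall={recall} f1={f1} auc={auc}")
```

Accuracy alone hides this failure on an imbalanced dataset, which is why the other metrics are worth reading alongside it.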

PyTorch Lightning models are easier to build, but the results are no better.
In the next step I will use a pretrained ResNet model.