# Metrics
Once we have trained our model, we want to estimate whether it is good enough for our purposes or whether we have to change something. The problem with the CT scan dataset is that most samples are not nodules: the ratio between negative samples (non-nodules) and positive samples (nodules) is about 400:1. A model trained on such data learns very little about how to tell a nodule from normal tissue, yet it can still reach a high overall accuracy simply by classifying everything as negative. To evaluate it we have to distinguish four outcomes: true positives (TP, nodules classified as nodules), false positives (FP, non-nodules classified as nodules), true negatives (TN, non-nodules classified as non-nodules), and false negatives (FN, nodules classified as non-nodules). In our CT scan dataset there are many negatives and few positives, and samples of either kind can be misclassified, producing false positives or false negatives. We therefore need metrics other than the one we used previously, which only counted the correct predictions (true positives and true negatives).
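
To see why the fraction of correct predictions is misleading at this class ratio, here is a minimal sketch. The sample counts are made up for illustration (they only respect the 400:1 ratio), and the "model" is a hypothetical one that always answers "not a nodule".

```python
# Hypothetical validation set with a 400:1 negative-to-positive ratio.
n_negative = 4000   # non-nodule samples
n_positive = 10     # nodule samples

# A degenerate "model" that classifies every sample as negative.
true_negatives = n_negative   # every non-nodule is classified correctly
true_positives = 0            # every nodule is missed

accuracy = (true_positives + true_negatives) / (n_negative + n_positive)
nodules_found = true_positives / n_positive

print(f"accuracy: {accuracy:.2%}")
print(f"nodules found: {nodules_found:.0%}")
```

The accuracy is above 99% even though the model never finds a nodule, which is why the metrics below look at the positive class separately.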

## Precision and Recall
Recall (R), also known as sensitivity, is the ratio between the true positives and the sum of the true positives and the false negatives, that is, the fraction of actual nodules that the model finds:

$$R = \frac{TP}{TP + FN}$$

Precision (P) is the ratio between the true positives and the sum of the true positives and the false positives, that is, the fraction of samples flagged as nodules that really are nodules:

$$P = \frac{TP}{TP + FP}$$

A model can have high recall with low precision, or high precision with low recall. If either precision or recall is very low, the model is not working properly.
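
The two definitions can be sketched in a few lines. The function name `precision_recall` and the counts are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this example.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) computed from confusion-matrix counts."""
    # Convention for this sketch: return 0.0 instead of dividing by zero
    # when nothing was predicted positive or no positives exist.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Made-up counts: 8 nodules found, 24 false alarms, 2 nodules missed.
p, r = precision_recall(tp=8, fp=24, fn=2)
print(f"precision = {p:.2f}, recall = {r:.2f}")
```

With these counts the model finds most nodules (recall 0.80), but only a quarter of its positive predictions are right (precision 0.25).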

## F1 score
A metric that combines precision and recall is the [F1 score](https://en.wikipedia.org/wiki/F-score). We want a metric that is high only when both precision and recall are high. The $F_1$ score is defined as the harmonic mean of precision and recall:

$$F_1 = 2\frac{1}{\frac{1}{P} + \frac{1}{R}} = 2\frac{PR}{P + R}$$

The $F_1$ score lies in the interval [0, 1], where 0 corresponds to a model with zero precision or zero recall and 1 to a model with perfect precision and recall. The $F_1$ score is implemented in the p2ch12/training.py script.
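
As an illustration of the formula, the sketch below compares a balanced and a lopsided pair of precision and recall values. It is a standalone example, not the code from p2ch12/training.py; the function name `f1_score` and the input values are chosen for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


balanced = f1_score(0.5, 0.5)   # both values moderate
lopsided = f1_score(0.9, 0.1)   # same arithmetic mean, very unbalanced

print(f"balanced: {balanced:.2f}")
print(f"lopsided: {lopsided:.2f}")
```

Both pairs have an arithmetic mean of 0.5, but the harmonic mean drops to about 0.18 for the lopsided pair, so one strong value cannot compensate for a weak one.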



In [None]:
!git clone https://github.com/deep-learning-with-pytorch/dlwpt-code.git

Cloning into 'dlwpt-code'...
remote: Enumerating objects: 703, done.
remote: Total 703 (delta 0), reused 0 (delta 0), pack-reused 703
Receiving objects: 100% (703/703), 176.00 MiB | 16.15 MiB/s, done.
Resolving deltas: 100% (309/309), done.
Checking out files: 100% (228/228), done.


## Downloading the data

In [None]:
cd dlwpt-code/

/content/dlwpt-code


In [None]:
mkdir data-unversioned

In [None]:
cd data-unversioned/

/content/dlwpt-code/data-unversioned


In [None]:
mkdir part2

In [None]:
cd part2

/content/dlwpt-code/data-unversioned/part2


In [None]:
mkdir luna

In [None]:
cd luna

/content/dlwpt-code/data-unversioned/part2/luna


In [None]:
!wget https://zenodo.org/record/3723295/files/subset0.zip

--2022-12-04 18:19:44--  https://zenodo.org/record/3723295/files/subset0.zip
Resolving zenodo.org (zenodo.org)... 188.185.124.72
Connecting to zenodo.org (zenodo.org)|188.185.124.72|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6811924508 (6.3G) [application/octet-stream]
Saving to: ‘subset0.zip’


2022-12-04 18:27:01 (14.9 MB/s) - ‘subset0.zip’ saved [6811924508/6811924508]



In [None]:
!7z x subset0.zip


7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,2 CPUs Intel(R) Xeon(R) CPU @ 2.00GHz (50653),ASM,AES-NI)

Scanning the drive for archives:
  0M Scan         1 file, 6811924508 bytes (6497 MiB)

Extracting archive: subset0.zip

ERRORS:
Headers Error

--
Path = subset0.zip
Type = zip
ERRORS:
Headers Error
Physical Size = 6811924508
64-bit = +


In [None]:
!pip install SimpleITK

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting SimpleITK
  Downloading SimpleITK-2.2.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (52.8 MB)
     |████████████████████████████████| 52.8 MB 256 kB/s 
Installing collected packages: SimpleITK
Successfully installed SimpleITK-2.2.0


In [None]:
!pip install "diskcache==4.1.0"

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting diskcache==4.1.0
  Downloading diskcache-4.1.0-py2.py3-none-any.whl (44 kB)
     |████████████████████████████████| 44 kB 3.1 MB/s 
Installing collected packages: diskcache
Successfully installed diskcache-4.1.0


In [None]:
cd /content/dlwpt-code/

/content/dlwpt-code


## Setting up the LunaDataset
We set up the LunaDataset to train the model. We use the code in p2ch12explore_data.ipynb.

In [None]:
import torch
from p2ch12.dsets import getCandidateInfoList, getCt, LunaDataset
from util.util import xyz2irc

In [None]:
candidateInfo_list = getCandidateInfoList(requireOnDisk_bool=False)
candidateInfo_list[0]

CandidateInfoTuple(isNodule_bool=True, diameter_mm=32.27003025, series_uid='1.3.6.1.4.1.14519.5.2.1.6279.6001.287966244644280690737019247886', center_xyz=(67.61451718, 85.02525992, -109.8084416))

In [None]:
from p2ch12.vis import findPositiveSamples, showCandidate
positiveSample_list = findPositiveSamples()

2022-12-04 18:32:43,945 INFO     pid:75 p2ch12.dsets:266:__init__ <p2ch12.dsets.LunaDataset object at 0x7f8644e739d0>: 56938 training samples, 56816 neg, 122 pos, unbalanced ratio


0 CandidateInfoTuple(isNodule_bool=True, diameter_mm=25.23320204, series_uid='1.3.6.1.4.1.14519.5.2.1.6279.6001.511347030803753100045216493273', center_xyz=(63.4740118048, 73.9174523314, -213.736128767))
1 CandidateInfoTuple(isNodule_bool=True, diameter_mm=21.58311204, series_uid='1.3.6.1.4.1.14519.5.2.1.6279.6001.905371958588660410240398317235', center_xyz=(109.142472723, 49.6356928166, -121.183579092))
2 CandidateInfoTuple(isNodule_bool=True, diameter_mm=19.65387738, series_uid='1.3.6.1.4.1.14519.5.2.1.6279.6001.752756872840730509471096155114', center_xyz=(56.1226132601, 67.868268695, -65.6269886453))
3 CandidateInfoTuple(isNodule_bool=True, diameter_mm=18.7832325, series_uid='1.3.6.1.4.1.14519.5.2.1.6279.6001.202811684116768680758082619196', center_xyz=(-82.79150362, -21.43587141, -97.18427459))
4 CandidateInfoTuple(isNodule_bool=True, diameter_mm=17.75323185, series_uid='1.3.6.1.4.1.14519.5.2.1.6279.6001.187451715205085403623595258748', center_xyz=(94.1132711884, -15.8936132585, -2

In [None]:
tuple_list = LunaDataset()

2022-12-04 18:32:48,565 INFO     pid:75 p2ch12.dsets:266:__init__ <p2ch12.dsets.LunaDataset object at 0x7f85c1c38fd0>: 56938 training samples, 56816 neg, 122 pos, unbalanced ratio
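
The class imbalance in this subset can be read directly from the log line above. A short sketch of the arithmetic, using only the counts reported in that line:

```python
# Counts taken from the LunaDataset log line above.
neg_count = 56816
pos_count = 122

total = neg_count + pos_count
ratio = neg_count / pos_count          # negatives per positive
positive_fraction = pos_count / total  # share of nodules among all samples

print(f"total samples: {total}")
print(f"negatives per positive: {ratio:.1f}")
print(f"positive fraction: {positive_fraction:.2%}")
```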


## Class balancing
Since negative samples (no nodule) far outnumber positive ones (nodules), if we build our batches from the data as it is, the great majority of them will contain only negative samples and the network will not learn to discriminate negative from positive samples. To overcome this problem we make each batch more balanced, with a 1:1 ratio, by using the positive samples more than once. The class balancing is implemented in p2ch12/dsets.py.
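
One way to get a 1:1 ratio is to map a running sample index alternately onto the positive and the negative list, wrapping around when the shorter list is exhausted. The sketch below illustrates that idea; the function `balanced_pick`, its parameter `ratio_int`, and the toy lists are hypothetical, and the code in p2ch12/dsets.py may differ in its details.

```python
def balanced_pick(ndx, pos_list, neg_list, ratio_int=1):
    """Map a running index to a sample, with ratio_int negatives per positive.

    Hypothetical helper for illustration only.
    """
    pos_ndx = ndx // (ratio_int + 1)
    if ndx % (ratio_int + 1):
        # Non-zero remainder: take the next negative sample.
        neg_ndx = ndx - 1 - pos_ndx
        return neg_list[neg_ndx % len(neg_list)]
    # Zero remainder: take a positive sample, wrapping around so the
    # few positives are reused as often as needed.
    return pos_list[pos_ndx % len(pos_list)]


pos_list = ["P0", "P1"]                          # few positives
neg_list = ["n0", "n1", "n2", "n3", "n4", "n5"]  # many negatives

batch = [balanced_pick(i, pos_list, neg_list) for i in range(8)]
print(batch)
# ['P0', 'n0', 'P1', 'n1', 'P0', 'n2', 'P1', 'n3']
```

In the output, positives and negatives alternate, and the two positive samples are repeated while each negative appears once.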

## Train and run the model
We first train the model without balancing, using the code from p2_run_everything.ipynb for chapter 12.

In [None]:
import datetime

from util.util import importstr
from util.logconf import logging
log = logging.getLogger('nb')

In [None]:
def run(app, *argv):
    argv = list(argv)
    argv.insert(0, '--num-workers=4')  # prepend the worker count to the caller's arguments
    log.info("Running: {}({!r}).main()".format(app, argv))

    app_cls = importstr(*app.rsplit('.', 1))  # split 'module.path.ClassName' and import the class
    app_cls(argv).main()

    log.info("Finished: {}.{!r}).main()".format(app, argv))
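
The `run` helper relies on `importstr` from util/util.py, whose source is not shown here. As a rough illustration of how a dotted path can be turned into a class, the sketch below uses the standard library's `importlib`. The function name `import_by_path` is hypothetical, the example resolves a standard-library class rather than a training app, and the repository helper may work differently.

```python
import importlib


def import_by_path(module_path, attr_name):
    """Import module_path and return the attribute attr_name from it."""
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# Same splitting as in run(): everything before the last dot is the module,
# the remainder is the class name.
app = 'collections.OrderedDict'
module_path, attr_name = app.rsplit('.', 1)
cls = import_by_path(module_path, attr_name)

print(module_path, attr_name, cls)
```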

In [None]:
import os
import shutil

# clean up any old data that might be around.
# We don't call this by default because it's destructive, 
# and would waste a lot of time if it ran when nothing 
# on the application side had changed.
def cleanCache():
    shutil.rmtree('data-unversioned/cache')
    os.mkdir('data-unversioned/cache')

# cleanCache()

In [None]:
training_epochs = 4
experiment_epochs = 2
final_epochs = 5
seg_epochs = 10

In [None]:
run('p2ch12.prepcache.LunaPrepCacheApp')

2022-12-04 18:33:20,089 INFO     pid:75 nb:004:run Running: p2ch12.prepcache.LunaPrepCacheApp(['--num-workers=4']).main()
2022-12-04 18:33:20,096 INFO     pid:75 p2ch12.prepcache:043:main Starting LunaPrepCacheApp, Namespace(batch_size=1024, num_workers=4)
2022-12-04 18:33:20,145 INFO     pid:75 p2ch12.dsets:266:__init__ <p2ch12.dsets.LunaDataset object at 0x7f85c1c38f70>: 56938 training samples, 56816 neg, 122 pos, unbalanced ratio
2022-12-04 18:34:12,542 INFO     pid:75 util.util:236:enumerateWithEstimate Stuffing cache    8/56, done at 2022-12-04 18:38:51, 0:05:08
2022-12-04 18:35:00,050 INFO     pid:75 util.util:236:enumerateWithEstimate Stuffing cache   16/56, done at 2022-12-04 18:38:51, 0:05:08
2022-12-04 18:36:34,066 INFO     pid:75 util.util:236:enumerateWithEstimate Stuffing cache   32/56, done at 2022-12-04 18:38:49, 0:05:07
2022-12-04 18:38:38,263 INFO     pid:75 nb:009:run Finished: p2ch12.prepcache.LunaPrepCacheApp.['--num-workers=4']).main()


In [None]:
run('p2ch12.training.LunaTrainingApp', '--epochs=1', 'unbalanced')

2022-12-04 18:39:56,828 INFO     pid:75 nb:004:run Running: p2ch12.training.LunaTrainingApp(['--num-workers=4', '--epochs=1', 'unbalanced']).main()
2022-12-04 18:39:58,745 INFO     pid:75 p2ch12.training:127:initModel Using CUDA; 1 devices.
2022-12-04 18:40:03,599 INFO     pid:75 p2ch12.training:188:main Starting LunaTrainingApp, Namespace(augment_flip=False, augment_noise=False, augment_offset=False, augment_rotate=False, augment_scale=False, augmented=False, balanced=False, batch_size=32, comment='unbalanced', epochs=1, num_workers=4, tb_prefix='p2ch12')
2022-12-04 18:40:03,636 INFO     pid:75 p2ch12.dsets:266:__init__ <p2ch12.dsets.LunaDataset object at 0x7f85bb8619a0>: 51244 training samples, 51135 neg, 109 pos, unbalanced ratio
2022-12-04 18:40:03,641 INFO     pid:75 p2ch12.dsets:266:__init__ <p2ch12.dsets.LunaDataset object at 0x7f85c1e5bc40>: 5694 validation samples, 5681 neg, 13 pos, unbalanced ratio
2022-12-04 18:40:03,645 INFO     pid:75 p2ch12.training:195:main Epoch 1 of 1,

## Train and run with balancing
The LunaDataset is already set up to produce balanced batches; we only have to train the model again, passing the '--balanced' flag to the LunaTrainingApp application. The run below uses training_epochs = 4, and we compare the training and validation results of the first two epochs to see whether there is an improvement. Because of the limited resources available on Google Colab we use only one subset of the data; its validation set contains 5681 negative samples and only 13 positive samples. On the validation set, 53.8% of the positive samples are classified correctly in the 1st epoch and 69.2% in the 2nd, so the model is getting better at recognizing positive samples. The negative samples improve as well: 94.3% correct in the 1st epoch and 98.8% in the 2nd. These results may change if we run more epochs. If performance on the positive validation samples started to drop, that would indicate overfitting, which is a risk here because the few positive samples are reused many times during training.
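
The per-class percentages can be turned into approximate precision, recall, and $F_1$ values. In the sketch below the counts are reconstructed by rounding the percentages reported above against the validation set sizes (13 positive, 5681 negative), so they are estimates rather than logged values; the function name `approx_metrics` is hypothetical.

```python
def approx_metrics(pos_total, neg_total, pos_correct_pct, neg_correct_pct):
    """Estimate confusion counts and metrics from per-class accuracy (in %)."""
    tp = round(pos_total * pos_correct_pct / 100)  # positives classified correctly
    tn = round(neg_total * neg_correct_pct / 100)  # negatives classified correctly
    fn = pos_total - tp
    fp = neg_total - tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return tp, fp, fn, precision, recall, f1


epoch1 = approx_metrics(13, 5681, 53.8, 94.3)
epoch2 = approx_metrics(13, 5681, 69.2, 98.8)

for name, (tp, fp, fn, p, r, f1) in (("epoch 1", epoch1), ("epoch 2", epoch2)):
    print(f"{name}: TP={tp} FP={fp} FN={fn} "
          f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Under these assumptions, even with 98.8% of the negatives classified correctly, the few dozen false positives outnumber the true positives, so precision and $F_1$ stay low.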

In [None]:
run('p2ch12.training.LunaTrainingApp', f'--epochs={training_epochs}', '--balanced', 'balanced')


2022-12-04 20:03:38,340 INFO     pid:75 nb:004:run Running: p2ch12.training.LunaTrainingApp(['--num-workers=4', '--epochs=4', '--balanced', 'balanced']).main()
2022-12-04 20:03:38,368 INFO     pid:75 p2ch12.training:127:initModel Using CUDA; 1 devices.
2022-12-04 20:03:38,383 INFO     pid:75 p2ch12.training:188:main Starting LunaTrainingApp, Namespace(augment_flip=False, augment_noise=False, augment_offset=False, augment_rotate=False, augment_scale=False, augmented=False, balanced=True, batch_size=32, comment='balanced', epochs=4, num_workers=4, tb_prefix='p2ch12')
2022-12-04 20:03:38,421 INFO     pid:75 p2ch12.dsets:266:__init__ <p2ch12.dsets.LunaDataset object at 0x7f85b34fafa0>: 51244 training samples, 51135 neg, 109 pos, 1:1 ratio
2022-12-04 20:03:38,429 INFO     pid:75 p2ch12.dsets:266:__init__ <p2ch12.dsets.LunaDataset object at 0x7f85c1afd1c0>: 5694 validation samples, 5681 neg, 13 pos, unbalanced ratio
2022-12-04 20:03:38,432 INFO     pid:75 p2ch12.training:195:main Epoch 1 of 

## Data augmentation
We can address the overfitting problem with data augmentation. It consists of applying transformations (mostly affine) to the samples in our dataset, so that we have more varied samples for training:

- Mirroring the image up-down, left-right, and/or front-back
- Shifting the image around by a few voxels
- Scaling the image up or down
- Rotating the image around the head-foot axis
- Adding noise to the image 

The function to augment the data is implemented in p2ch12/dsets.py.
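
The first four transformations in the list can all be expressed as a single 4x4 affine matrix, while noise is added to the voxel values afterwards. The sketch below builds such a matrix in plain Python. It is only an illustration under stated assumptions: the function name `build_transform` is hypothetical, the dictionary keys follow the `augmentation_dict` used in the next cell, the third axis is assumed to be the head-foot axis, and the actual function in p2ch12/dsets.py may be written differently.

```python
import math
import random


def build_transform(augmentation_dict, rng=random):
    """Return a 4x4 affine matrix (nested lists) for the requested augmentations."""
    # Start from the identity: no augmentation leaves the sample unchanged.
    t = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]

    for i in range(3):  # one pass per spatial axis
        if augmentation_dict.get('flip') and rng.random() > 0.5:
            t[i][i] *= -1.0  # mirror along this axis
        if 'offset' in augmentation_dict:
            # Random shift in [-offset, +offset], stored in the last column.
            t[i][3] = augmentation_dict['offset'] * (rng.random() * 2 - 1)
        if 'scale' in augmentation_dict:
            # Random scale factor in [1 - scale, 1 + scale].
            t[i][i] *= 1.0 + augmentation_dict['scale'] * (rng.random() * 2 - 1)

    if augmentation_dict.get('rotate'):
        # Rotate in the plane of the first two axes; the third axis
        # (assumed here to be head-foot) stays fixed.
        angle = rng.random() * 2 * math.pi
        s, c = math.sin(angle), math.cos(angle)
        rot = [[c, -s, 0.0, 0.0],
               [s, c, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.0],
               [0.0, 0.0, 0.0, 1.0]]
        t = [[sum(t[r][k] * rot[k][col] for k in range(4)) for col in range(4)]
             for r in range(4)]
    return t


identity = build_transform({})

random.seed(0)  # fixed seed so the example is repeatable
augmented = build_transform({'flip': True, 'offset': 0.1, 'scale': 0.2, 'rotate': True})
for row in augmented:
    print([round(v, 3) for v in row])
```

Combining everything into one matrix means the sample only has to be resampled once, whatever combination of augmentations is enabled.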

In [None]:
augmentation_dict = {}
augmentation_list = [
    ('None', {}),
    ('flip', {'flip': True}),
    ('offset', {'offset': 0.1}),
    ('scale', {'scale': 0.2}),
    ('rotate', {'rotate': True}),
    ('noise', {'noise': 25.0}),    
]
ds_list = [
    LunaDataset(sortby_str='label_and_size', augmentation_dict=augmentation_dict) 
    for title_str, augmentation_dict in augmentation_list
]

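# Merge every individual augmentation into one dictionary.
all_dict = {}
for title_str, augmentation_dict in augmentation_list:
    all_dict.update(augmentation_dict)
all_ds = LunaDataset(sortby_str='label_and_size', augmentation_dict=all_dict)

# Add three entries for the fully augmented dataset, labeled with the merged
# dictionary rather than the last loop variable.
augmentation_list.extend([('All', all_dict)] * 3)
ds_list.extend([all_ds] * 3)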

2022-12-04 20:42:35,352 INFO     pid:75 p2ch12.dsets:266:__init__ <p2ch12.dsets.LunaDataset object at 0x7f85b33c8940>: 56938 training samples, 56816 neg, 122 pos, unbalanced ratio
2022-12-04 20:42:35,360 INFO     pid:75 p2ch12.dsets:266:__init__ <p2ch12.dsets.LunaDataset object at 0x7f85b33c8760>: 56938 training samples, 56816 neg, 122 pos, unbalanced ratio
2022-12-04 20:42:35,370 INFO     pid:75 p2ch12.dsets:266:__init__ <p2ch12.dsets.LunaDataset object at 0x7f85b33c8550>: 56938 training samples, 56816 neg, 122 pos, unbalanced ratio
2022-12-04 20:42:35,379 INFO     pid:75 p2ch12.dsets:266:__init__ <p2ch12.dsets.LunaDataset object at 0x7f85b34fa730>: 56938 training samples, 56816 neg, 122 pos, unbalanced ratio
2022-12-04 20:42:35,387 INFO     pid:75 p2ch12.dsets:266:__init__ <p2ch12.dsets.LunaDataset object at 0x7f85b34fa670>: 56938 training samples, 56816 neg, 122 pos, unbalanced ratio
2022-12-04 20:42:35,395 INFO     pid:75 p2ch12.dsets:266:__init__ <p2ch12.dsets.LunaDataset object a

We train the model again using the augmented dataset.

In [None]:
run('p2ch12.training.LunaTrainingApp', f'--epochs={training_epochs}', '--balanced', '--augmented', 'fully-augmented')

2022-12-04 20:44:17,275 INFO     pid:75 nb:004:run Running: p2ch12.training.LunaTrainingApp(['--num-workers=4', '--epochs=4', '--balanced', '--augmented', 'fully-augmented']).main()
2022-12-04 20:44:17,292 INFO     pid:75 p2ch12.training:127:initModel Using CUDA; 1 devices.
2022-12-04 20:44:17,296 INFO     pid:75 p2ch12.training:188:main Starting LunaTrainingApp, Namespace(augment_flip=False, augment_noise=False, augment_offset=False, augment_rotate=False, augment_scale=False, augmented=True, balanced=True, batch_size=32, comment='fully-augmented', epochs=4, num_workers=4, tb_prefix='p2ch12')
2022-12-04 20:44:17,336 INFO     pid:75 p2ch12.dsets:266:__init__ <p2ch12.dsets.LunaDataset object at 0x7f85b34029d0>: 51244 training samples, 51135 neg, 109 pos, 1:1 ratio
2022-12-04 20:44:17,341 INFO     pid:75 p2ch12.dsets:266:__init__ <p2ch12.dsets.LunaDataset object at 0x7f85c1e74280>: 5694 validation samples, 5681 neg, 13 pos, unbalanced ratio
2022-12-04 20:44:17,345 INFO     pid:75 p2ch12.t