<a href="https://ibb.co/5Mpm3ZF"><img src="https://i.ibb.co/gjqN8YV/metastases.png" alt="metastases" border="0" align="right"></a>
**Sections of this kernel**
- Project understanding
- Data understanding
- Data visualization
- Baseline model
- Validation and analysis
    - Metrics
    - Prediction visualizations
    - Confusion matrix
    - ROC & AUC
- Submit

---------------------------------------------------
# Project understanding
###  What exactly is the problem?

**Binary image classification problem.** Identify the presence of metastases from 96 x 96px digital histopathology images. One key challenge is that the metastases can be as small as single cells in a large area of tissue.

### What would a solution look like?

**Our evaluation metric is [area under the ROC curve](http://en.wikipedia.org/wiki/Receiver_operating_characteristic).** The ROC curve plots the *true positive rate* against the *false positive rate* at various classification thresholds, and the area under the curve (AUC) equals the probability that the classifier ranks a randomly chosen positive instance higher than a randomly chosen negative one. The best possible solution would yield an AUC of 1, which means there is a threshold at which all positive samples are classified correctly without any false positives.
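The ranking interpretation of AUC can be checked directly. The sketch below uses a hypothetical helper `pairwise_auc` and toy labels and scores (not real predictions): it counts, over all positive/negative pairs, how often the positive gets the higher score, with ties counting as one half. On the same inputs it should agree with `sklearn.metrics.roc_auc_score`.

```python
import numpy as np

def pairwise_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as one half.
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # compare every positive score with every negative score
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

# toy example: 3 negatives and 3 positives
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
print(pairwise_auc(labels, scores))  # 0.777... (7 of the 9 pairs are ranked correctly)
```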

![ROC curve example](https://i.ibb.co/mBKh6ZB/roc.png)
<p style="text-align: center;"> ROC curve from a previous run of this kernel </p>

### What is known about the domain?

**The histopathological images are glass slide microscope images of lymph nodes that are stained with hematoxylin and eosin (H&E).** This staining method is one of the most widely used in medical diagnosis, and it produces blue, violet and red colors. Dark blue hematoxylin binds to negatively charged substances such as nucleic acids, and pink eosin binds to positively charged substances like amino-acid side chains (most proteins). Typically, nuclei are stained blue, whereas cytoplasm and extracellular parts are stained in various shades of pink.

**Low-resolution**             | **Mid-resolution**            | **High-resolution** 
:-------------------------:|:-------------------------:|:-------------------------:
![](https://camelyon17.grand-challenge.org/site/CAMELYON17/serve/public_html/example_low_resolution.png) | ![Example of a metastatic region](https://camelyon17.grand-challenge.org/site/CAMELYON17/serve/public_html/example_mid_resolution.png) | ![Example of a metastatic region](https://camelyon17.grand-challenge.org/site/CAMELYON17/serve/public_html/example_high_resolution.png)
**[<p style="text-align: center;"> Example of a metastatic region in lymph nodes, CAMELYON17 </p>](https://camelyon17.grand-challenge.org/Background/)**

Lymph nodes are small glands that filter the fluid in the lymphatic system, and they are the first place breast cancer is likely to spread to. Histological assessment of lymph node metastases is part of determining the stage of breast cancer in the TNM classification, a globally recognized standard for classifying the extent of spread of cancer. The diagnostic procedure for pathologists is tedious and time-consuming, as a large area of tissue has to be examined and small metastases can easily be missed.

**Useful links for background knowledge**
- [Patch Camelyon (PCam)](https://github.com/basveeling/pcam)
- [Hematoxylin and eosin staining of tissue and cell sections](https://www.ncbi.nlm.nih.gov/pubmed/21356829)
- [H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset](https://academic.oup.com/gigascience/article/7/6/giy065/5026175)
- [CAMELYON16 - background](https://camelyon16.grand-challenge.org/Background/)
- [CAMELYON17 - background](https://camelyon17.grand-challenge.org/Background/)
- [TNM classification](https://www.uicc.org/resources/tnm)

----------------------------------------------
# Data understanding
### What data do we have available?

**220k training images and 57k evaluation images.** The dataset is a subset of the [PCam dataset](https://github.com/basveeling/pcam); the only difference between the two is that all duplicate images have been removed. The PCam dataset is derived from the [Camelyon16 Challenge dataset](https://camelyon16.grand-challenge.org/Data/), which contains 400 H&E-stained whole slide images of sentinel lymph node sections that were acquired and digitized at 2 different centers using a 40x objective. The PCam dataset, including this one, uses 10x undersampling to increase the field of view, which gives a resulting pixel resolution of 2.43 microns.

According to the data description, there is a 50/50 balance between positive and negative examples in the training and test splits. However, **the training distribution seems to be 60/40 (negatives/positives)**. A positive label means that there is at least one pixel of tumor tissue in the center region (32 x 32px) of the image. **Tumor tissue in the outer region of the patch does not influence the label.** This means that a negatively labeled image could contain metastases in the outer region. Thus, it would be a good idea to crop the images to the center region.
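Cropping to the labeled center is plain array slicing. A minimal sketch, assuming images are NumPy arrays of shape (96, 96, 3) as returned by OpenCV; `center_crop` is a hypothetical helper, and the dummy patch only marks where the labeled 32 x 32px region sits (pixels 32 to 63 in both axes).

```python
import numpy as np

def center_crop(img, size):
    """Return the central size x size region of an H x W (x C) image array."""
    h, w = img.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]

# a dummy 96 x 96 RGB patch standing in for a real image
patch = np.zeros((96, 96, 3), dtype=np.uint8)
patch[32:64, 32:64] = 255  # mark the region that determines the label

center = center_crop(patch, 32)   # exactly the labeled region
context = center_crop(patch, 48)  # labeled region plus a 8px margin of context
print(center.shape, context.shape)  # (32, 32, 3) (48, 48, 3)
```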

**Image file descriptors**

Property | Value
:--------:|:-------:
Format | TIF
Size | 96 x 96
Channels | 3
Bits per channel | 8
Data type | Unsigned char
Compression | Jpeg

### Is the data relevant to the problem?

This dataset is a combination of two independent datasets collected at Radboud University Medical Center (Nijmegen, the Netherlands) and the University Medical Center Utrecht (Utrecht, the Netherlands). The slides were produced in routine clinical practice, and a trained pathologist would examine similar images to identify metastases. However, some relevant information about the surroundings might be left out of these small image patches.

### Is it valid? Does it reflect our expectations?

According to the data description, the dataset has been stripped of duplicates. However, this has not been confirmed by testing.

> For the entire dataset, when the slide-level label was unclear during the inspection of the H&E-stained slide, an additional WSI with a consecutive tissue section, immunohistochemically stained for cytokeratin, was used to confirm the classification.
- [1399 H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset](https://academic.oup.com/gigascience/article/7/6/giy065/5026175)

### Is the data quality, quantity, recency sufficient?

> All glass slides included in the CAMELYON dataset were part of routine clinical care and are thus of diagnostic quality. However, during the acquisition process, scanning can fail or result in out-of-focus images. As a quality-control measure, all slides were inspected manually after scanning. The inspection was performed by an experienced technician (Q.M. and N.S. for UMCU, M.H. or R.vd.L. for the other centers) to assess the quality of the scan; when in doubt, a pathologist was consulted on whether scanning issues might affect diagnosis.
- [1399 H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset](https://academic.oup.com/gigascience/article/7/6/giy065/5026175)

-----------------------------------------
# Data visualization

In [None]:
import numpy as np
import pandas as pd
import os
import cv2
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import random
from sklearn.utils import shuffle

data = pd.read_csv('/kaggle/input/train_labels.csv')
train_path = '/kaggle/input/train/'
test_path = '/kaggle/input/test/'
# quick look at the label stats
data.describe()

We can see that the negative/positive ratio is not exactly 50/50, as the label mean is well below 0.5. The ratio is closer to 60/40, meaning that there are about 1.5 times as many negative images as positive ones.
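Because the labels are 0/1, the label mean is the share of positives, and the negative/positive ratio follows from it. A small sketch on toy labels with an assumed 60/40 split (not the real label file); `class_balance` is a hypothetical helper.

```python
import numpy as np

def class_balance(labels):
    """Return (negative share, positive share, negatives per positive) for 0/1 labels."""
    labels = np.asarray(labels)
    pos_share = labels.mean()  # mean of 0/1 labels = fraction of positives
    neg_share = 1.0 - pos_share
    return float(neg_share), float(pos_share), float(neg_share / pos_share)

# toy labels with a 60/40 split, standing in for data['label']
toy_labels = [0] * 60 + [1] * 40
neg_share, pos_share, ratio = class_balance(toy_labels)
print(neg_share, pos_share, round(ratio, 2))
```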

### Plot some images with and without cancer tissue for comparison

In [None]:
def readImage(path):
    # OpenCV reads the image in BGR format by default
    bgr_img = cv2.imread(path)
    # convert it to RGB for visualization purposes
    rgb_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2RGB)
    return rgb_img

In [None]:
# random sampling
shuffled_data = shuffle(data)

fig, ax = plt.subplots(2,5, figsize=(20,8))
fig.suptitle('Histopathologic scans of lymph node sections',fontsize=20)
# Negatives
for i, idx in enumerate(shuffled_data[shuffled_data['label'] == 0]['id'][:5]):
    path = os.path.join(train_path, idx)
    ax[0,i].imshow(readImage(path + '.tif'))
    # Create a Rectangle patch
    box = patches.Rectangle((32,32),32,32,linewidth=4,edgecolor='b',facecolor='none', linestyle=':', capstyle='round')
    ax[0,i].add_patch(box)
ax[0,0].set_ylabel('Negative samples', size='large')
# Positives
for i, idx in enumerate(shuffled_data[shuffled_data['label'] == 1]['id'][:5]):
    path = os.path.join(train_path, idx)
    ax[1,i].imshow(readImage(path + '.tif'))
    # Create a Rectangle patch
    box = patches.Rectangle((32,32),32,32,linewidth=4,edgecolor='r',facecolor='none', linestyle=':', capstyle='round')
    ax[1,i].add_patch(box)
ax[1,0].set_ylabel('Tumor tissue samples', size='large')

Classifying metastases is probably not an easy task for a trained pathologist and extremely difficult for an untrained eye. According to [Libre Pathology](https://librepathology.org/wiki/Lymph_node_metastasis), lymph node metastases can have these features:

> - Foreign cell population - key feature (Classic location: subcapsular sinuses)
> - Cells with cytologic features of malignancy
>     - Nuclear pleomorphism (variation in size, shape and staining).
>     - Nuclear atypia:
>         - **Nuclear enlargement**.
>         - **Irregular nuclear membrane**.
>         - **Irregular chromatin pattern, esp. asymmetry**.
>         - **Large or irregular nucleolus**.
>     - Abundant mitotic figures.
> - Cells in architectural arrangements seen in malignancy; highly variable - dependent on tumour type and differentiation.
>     - Gland formation.
>     - Single cells.
>     - Small clusters of cells.
  
**The takeaway from this is probably that irregular nuclear shapes, sizes or staining shades can indicate metastases.**

### How is the data best transformed for modeling?

We know that the label of the image is influenced only by the center region (32 x 32px), so it would make sense to crop our data to that region only. However, some useful information about the surroundings could be lost if we crop too close. This hypothesis could be tested by training models with varying crop sizes. My initial results with a 32 x 32px crop showed worse performance than with 48 x 48px, but I haven't searched for the optimal size.

### How may we increase the data quality?

We could check whether the dataset contains bad images (out of focus or corrupted) and remove them to increase the overall quality. *TODO*
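One common way to flag out-of-focus patches is the variance of the Laplacian: sharp images have strong edges and therefore a high variance. The sketch below computes a 4-neighbour Laplacian with NumPy (with OpenCV, `cv2.Laplacian(gray, cv2.CV_64F).var()` computes essentially the same quantity, apart from border handling) and also flags patches that are almost entirely bright background. `looks_bad` and both thresholds are hypothetical and would need tuning on real patches.

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian; low values suggest a blurry patch."""
    g = np.asarray(gray, dtype=np.float64)
    # kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]] applied to interior pixels only
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return lap.var()

def looks_bad(rgb, blur_threshold=50.0, white_threshold=230.0):
    """Hypothetical quality flag: too blurry, or almost entirely bright background.

    Both thresholds are assumptions for illustration, not tuned values.
    """
    gray = np.asarray(rgb, dtype=np.float64).mean(axis=2)
    too_blurry = laplacian_variance(gray) < blur_threshold
    mostly_background = gray.mean() > white_threshold
    return bool(too_blurry or mostly_background)
```

Applied to every training image (e.g. `looks_bad(readImage(path))`), the flagged ids could be inspected by eye before deciding to drop them.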

### Preprocessing and augmentation
There are a couple of ways to avoid overfitting: more data, augmentation, regularization and less complex model architectures. Here we define which image augmentations to use and add them directly to our image loader function. Note that augmentations applied in the loader will also be applied when we are predicting (inference). This is called test time augmentation (TTA), and it can improve our results if we run inference multiple times for each image and average the predictions.

**The augmentations we can use for this type of data:**
- random rotation
- random crop
- random flip (horizontal and vertical both)
- random lighting
- random zoom (not implemented here)
- Gaussian blur (not implemented here)

We will use OpenCV for image operations because, in my experience, it is a lot faster than *PIL* or *scikit-image*.

In [None]:
import random
ORIGINAL_SIZE = 96      # original size of the images - do not change

# AUGMENTATION VARIABLES
CROP_SIZE = 68          # final size after crop
RANDOM_ROTATION = 180   # range (0-180), 180 allows all rotation variations, 0=no change
RANDOM_SHIFT = 4        # center crop shift in x and y axes, 0=no change
RANDOM_BRIGHTNESS = 5   # range (0-100), 0=no change
RANDOM_CONTRAST = 5     # range (0-100), 0=no change

def readCroppedImage(path):
    # OpenCV reads the image in BGR format by default; convert it to RGB
    bgr_img = cv2.imread(path)
    rgb_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2RGB)
    
    # random rotation around the image center
    # (corners that fall outside the original image are filled with black, OpenCV's default border)
    rotation = random.randint(-RANDOM_ROTATION, RANDOM_ROTATION)
    center = (ORIGINAL_SIZE // 2, ORIGINAL_SIZE // 2)
    M = cv2.getRotationMatrix2D(center, rotation, 1)
    rgb_img = cv2.warpAffine(rgb_img, M, (ORIGINAL_SIZE, ORIGINAL_SIZE))
    
    #random x,y-shift
    x = random.randint(-RANDOM_SHIFT, RANDOM_SHIFT)
    y = random.randint(-RANDOM_SHIFT, RANDOM_SHIFT)
    
    # crop to center and normalize to 0-1 range
    start_crop = (ORIGINAL_SIZE - CROP_SIZE) // 2
    end_crop = start_crop + CROP_SIZE
    rgb_img = rgb_img[(start_crop + x):(end_crop + x), (start_crop + y):(end_crop + y)] / 255
    
    # Random flip
    flip_hor = bool(random.getrandbits(1))
    flip_ver = bool(random.getrandbits(1))
    if(flip_hor):
        rgb_img = rgb_img[:, ::-1]
    if(flip_ver):
        rgb_img = rgb_img[::-1, :]
        
    # Random brightness
    br = random.randint(-RANDOM_BRIGHTNESS, RANDOM_BRIGHTNESS) / 100.
    rgb_img = rgb_img + br
    
    # Random contrast
    cr = 1.0 + random.randint(-RANDOM_CONTRAST, RANDOM_CONTRAST) / 100.
    rgb_img = rgb_img * cr
    
    # clip values to 0-1 range
    rgb_img = np.clip(rgb_img, 0, 1.0)
    
    return rgb_img

In [None]:
fig, ax = plt.subplots(2,5, figsize=(20,8))
fig.suptitle('Cropped histopathologic scans of lymph node sections',fontsize=20)
# Negatives
for i, idx in enumerate(shuffled_data[shuffled_data['label'] == 0]['id'][:5]):
    path = os.path.join(train_path, idx)
    ax[0,i].imshow(readCroppedImage(path + '.tif'))
ax[0,0].set_ylabel('Negative samples', size='large')
# Positives
for i, idx in enumerate(shuffled_data[shuffled_data['label'] == 1]['id'][:5]):
    path = os.path.join(train_path, idx)
    ax[1,i].imshow(readCroppedImage(path + '.tif'))
ax[1,0].set_ylabel('Tumor tissue samples', size='large')

**To see the effects of our augmentation, we can plot one image multiple times.**

In [None]:
fig, ax = plt.subplots(1,5, figsize=(20,4))
fig.suptitle('Random augmentations to the same image',fontsize=20)
# Negatives
for i, idx in enumerate(shuffled_data[shuffled_data['label'] == 0]['id'][:1]):
    for j in range(5):
        path = os.path.join(train_path, idx)
        ax[j].imshow(readCroppedImage(path + '.tif'))

-----------------------------------------
# Baseline model
In an ML production pipeline, it is a good idea to start with a relatively simple model, a sort of minimum viable product (MVP) or baseline. With an MVP, we can quickly see whether there are unexpected problems, such as bad data quality, that would make further investment in model tuning not worth it.

### What kind of model architecture suits the problem best?

Here we will use a pretrained convnet and transfer learning to adapt its weights to our data. A deeper model architecture will tend to start overfitting sooner.

For different pretrained model architectures, check [Fast.ai conv_learner.py](https://github.com/fastai/fastai/blob/master/old/fastai/conv_learner.py). Note that some of the architectures require additional pretrained weight files to be downloaded. In that case, uncomment the **Download missing weight files** cell.

In [None]:
# This is a temporary fix to work with fastai 0.7. I will later convert this notebook to fastai 1.0
!pip install fastai==0.7.0 --no-deps
!pip install torch==0.4.1 torchvision==0.2.1

In [None]:
from fastai.conv_learner import *
from fastai.dataset import *
from fastai.plots import ImageModelResults
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, auc

In [None]:
# Download missing weight files for inceptionresnet pretrained model
#!wget "http://files.fast.ai/models/weights.tgz"
# unzip to the correct location
#!mkdir /opt/conda/lib/python3.6/site-packages/fastai/weights/
#!tar -xvzf weights.tgz -C /opt/conda/lib/python3.6/site-packages/fastai/

In [None]:
nw = 3   #number of workers for data loader
arch = resnet50 #specify target architecture
BATCH_SIZE = 128
sz = CROP_SIZE
MODEL_PATH = 'resnet50_1'

### Prepare the data and split train
Split the train data into 90% training and 10% validation parts. We want to maintain the same negative/positive ratio (60/40) in both the training and validation splits. This is not so crucial here, as both labels are well represented, but if we had a rare class, a random split could cause severe underrepresentation or, in the worst case, leave the rare class out of one split entirely.
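To see what `stratify=train_labels` achieves, here is a simplified NumPy-only sketch of a stratified split. `stratified_split` is a hypothetical helper, not scikit-learn's implementation: each class is shuffled and split separately, so the 60/40 ratio carries over to both parts. The toy names and labels are made up.

```python
import numpy as np

def stratified_split(names, labels, val_fraction=0.1, seed=123):
    """Split each class separately so that class shares match in both parts."""
    rng = np.random.default_rng(seed)
    names = np.asarray(names)
    labels = np.asarray(labels)
    train_idx, val_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        n_val = int(round(len(idx) * val_fraction))
        val_idx.extend(idx[:n_val])    # first part of this class goes to validation
        train_idx.extend(idx[n_val:])  # the rest stays in training
    return names[np.sort(train_idx)], names[np.sort(val_idx)]

# toy data: 600 negatives and 400 positives
toy_names = np.array([f'img{i}' for i in range(1000)])
toy_labels = np.array([0] * 600 + [1] * 400)
tr, val = stratified_split(toy_names, toy_labels)
print(len(tr), len(val))  # 900 100
```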

In [None]:
train_df = pd.read_csv('/kaggle/input/train_labels.csv').set_index('id')
train_names = train_df.index.values
train_labels = train_df['label'].values  # same row order as train_names
test_names = [f.replace(".tif","") for f in os.listdir(test_path)]
tr_n, val_n = train_test_split(train_names, test_size=0.1,
                               random_state=123, stratify=train_labels)

In [None]:
class hcdDataset(FilesDataset):
    def __init__(self, fnames, path, transform):
        self.train_df = train_df
        super().__init__(fnames, transform, path)

    def get_x(self, i):
        img = readCroppedImage(os.path.join(self.path, str(self.fnames[i]) + '.tif'))
        return img

    def get_y(self, i):
        if (self.path == test_path): return 0
        return self.train_df.loc[self.fnames[i]]['label']

    def get_c(self):
        return 2 #number of classes

In [None]:
# Fast.ai data loader
def get_data(sz, bs):
    aug_tfms = [] # we are using our own augmentations with our image loader function so we leave this array empty
    
    # mean and std in of each channel in the train set
    # these stats have been calculated for 48x48 crop size but it has hopefully captured the general variance
    stats = A([0.70185, 0.54483, 0.69568], [0.22262, 0.26757, 0.1995 ])
    
    # Here we define the transforms that are performed for images when loaded. Only statistical regularization.
    tfms = tfms_from_stats(stats, sz, crop_type=CropType.NO, tfm_y=TfmType.NO,
                           aug_tfms=aug_tfms)
    
    ds = ImageData.get_ds(hcdDataset, (tr_n[:-(len(tr_n) % bs)], train_path),
                          (val_n, train_path), tfms, test=(test_names, test_path))
    md = ImageData("./", ds, bs, num_workers=nw, classes=None)
    return md

### Compute image statistics
Computing these once is enough. **Do not use color/brightness/contrast augmentation here!**
This statistics function is copied and altered from [iafoss's kernel](https://www.kaggle.com/iafoss/pretrained-resnet34-with-rgby-0-460-public-lb)

This gives channel means of [0.70185, 0.54483, 0.69568]
and standard deviations of [0.22262, 0.26757, 0.1995].

In [None]:
#md = get_data(sz, BATCH_SIZE)
#x_tot = np.zeros(3)
#x2_tot = np.zeros(3)
#for x,y in iter(md.trn_dl):
#    tmp =  md.trn_ds.denorm(x).reshape(BATCH_SIZE,-1)
#    x = md.trn_ds.denorm(x).reshape(-1,3)
#    x_tot += x.mean(axis=0)
#    x2_tot += (x**2).mean(axis=0)
#
#channel_avr = x_tot/len(md.trn_dl)
#channel_std = np.sqrt(x2_tot/len(md.trn_dl) - channel_avr**2)
#channel_avr,channel_std
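The commented-out cell relies on the identity Var[x] = E[x^2] - E[x]^2, accumulated batch by batch. A self-contained NumPy sketch of the same idea, with random data in place of the real loader; `channel_stats` is a hypothetical helper. Averaging per-batch means gives the global mean only when all batches have the same size, which is why the training set is trimmed to a multiple of the batch size.

```python
import numpy as np

def channel_stats(batches):
    """Per-channel mean and std from running sums of x and x**2 over equally sized batches."""
    x_tot = np.zeros(3)
    x2_tot = np.zeros(3)
    n_batches = 0
    for batch in batches:  # each batch: (B, H, W, 3) floats in the 0-1 range
        pixels = batch.reshape(-1, 3)
        x_tot += pixels.mean(axis=0)
        x2_tot += (pixels ** 2).mean(axis=0)
        n_batches += 1
    mean = x_tot / n_batches
    std = np.sqrt(x2_tot / n_batches - mean ** 2)  # Var[x] = E[x^2] - E[x]^2
    return mean, std

# random stand-in for 5 batches of 8 cropped 48 x 48 RGB images
rng = np.random.default_rng(0)
batches = [rng.random((8, 48, 48, 3)) for _ in range(5)]
mean, std = channel_stats(batches)
```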

### Training
We will use the [Adam](https://arxiv.org/abs/1412.6980) optimizer.

In [None]:
md = get_data(sz, BATCH_SIZE)
learn = ConvLearner.pretrained(arch, md) # dropout in the head could be added with e.g. ps=0.5 (50%)
learn.opt_fn = optim.Adam

First, we find a good learning rate. The best choice lies before the minimum of the loss curve and well before the loss starts to diverge. It is important that the loss is still descending at the learning rate we select.

In [None]:
learn.lr_find()
learn.sched.plot()

We can select the learning rate around 1e-3 where it is close to the bottom but still descending.
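The learning rate finder sweeps the learning rate exponentially over a range while recording the loss. The sketch below reproduces that sweep on a synthetic loss curve; the start and end values are assumptions, and `suggest_lr` is a hypothetical helper implementing one common rule of thumb: back off one order of magnitude from the loss minimum, which lands where the loss is still descending.

```python
import numpy as np

def lr_schedule(start_lr=1e-5, end_lr=10.0, num_it=100):
    """Exponentially spaced learning rates, as used in an LR range test."""
    return start_lr * (end_lr / start_lr) ** (np.arange(num_it) / (num_it - 1))

def suggest_lr(lrs, losses):
    """Rule of thumb: take the LR at the loss minimum and divide it by 10."""
    best = int(np.argmin(losses))
    return lrs[best] / 10.0

lrs = lr_schedule(num_it=61)  # one step per 0.1 decade from 1e-5 to 10
losses = (np.log10(lrs) + 2.0) ** 2  # synthetic curve with its minimum at lr = 1e-2
suggested = suggest_lr(lrs, losses)
print(f'{suggested:.0e}')
```

Real loss curves are noisy, so in practice the curve is usually smoothed before looking for its minimum.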

Next, we train only the heads while keeping the rest of the model frozen. Otherwise the random initialization of the head weights could harm the relatively well performing pretrained weights of the model. After the heads have adjusted and the model somewhat works, we can continue to train all the weights.

In [None]:
lr = 1e-3
learn.fit(lr,2)

Next, we unfreeze the model and train it with differential learning rates. The lower layers, which activate on low-level shapes and patterns, are probably well suited to this task as they are and don't need much adjusting. The higher layers, which activate on more complex features, will however need more adjusting to this training set.

So we want to train the lower levels with very small adjustments (**lr/100**) and the middle (**lr/10**) and higher (**lr**) levels with larger adjustments.

In [None]:
learn.unfreeze()
# set different learning rate for low-mid-top layers
lrs=np.array([lr/100,lr/10,lr])

To avoid getting stuck in local minima, we lower the learning rate as we train (cosine annealing) but periodically jump back up (restart). This is called stochastic gradient descent with restarts (SGDR), and it tends to work well in practice. The idea is that as we get closer to a minimum, it makes sense to lower the learning rate and take smaller steps so that we don't jump over it. However, the minimum we approach is usually a local one rather than the global one we would like to reach. The learning rate restarts temporarily give us enough momentum to break out of those local minima and hopefully get closer to a better one. With enough restarts, we can explore more of the loss landscape (pictured below) and typically settle in a good, stable minimum.

![LR cycles](https://cdn-images-1.medium.com/max/880/1*9Fca3kpx3pVW8SaYz2pjpw.png)
<a href="https://openreview.net/pdf?id=BJYwwY9ll"><p style="text-align: center;"> Example of cyclic learning rates from the paper: Snapshot Ensembles </p></a>

The [authors of SGDR](https://arxiv.org/abs/1608.03983) also propose to lengthen the restart cycle as the training progresses.
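The schedule of the next training call (`cycle_len=1, cycle_mult=2`, three cycles) can be sketched with the cosine annealing formula from the SGDR paper, lr = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t)), where t is the fraction of the current cycle completed. This is a simplified sketch: `sgdr_schedule` and `iters_per_epoch` are hypothetical, and fastai's own implementation may differ in details such as the minimum learning rate.

```python
import numpy as np

def sgdr_schedule(lr_max, n_cycles=3, cycle_len=1, cycle_mult=2,
                  iters_per_epoch=100, lr_min=0.0):
    """Cosine annealing with warm restarts; each cycle is cycle_mult times longer than the last."""
    lrs = []
    length = cycle_len  # cycle length in epochs
    for _ in range(n_cycles):
        n_iter = length * iters_per_epoch
        t = np.arange(n_iter) / n_iter  # progress through the current cycle, 0 -> 1
        lrs.append(lr_min + 0.5 * (lr_max - lr_min) * (1 + np.cos(np.pi * t)))
        length *= cycle_mult
    return np.concatenate(lrs)

# 1 + 2 + 4 = 7 epochs; the rate jumps back to lr_max at iterations 0, 100 and 300
sched = sgdr_schedule(1e-3)
```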

In [None]:
# breakdown of this train phase: 1.cycle len=1 epoch, 2.cycle len=2 epochs, 3.cycle len=4 epochs
learn.fit(lrs/4, 3, cycle_len=1, cycle_mult=2)

In [None]:
# plot the learning rate to see the SGDR
learn.sched.plot_lr()

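Finally, we fine-tune our model with small learning rates in one long 5-epoch cycle. With `use_clr=(5,20)`, fastai 0.7 uses a cyclical schedule for this cycle: a short warm-up over the first 1/20 of the iterations, followed by a decay down to 1/5 of the peak learning rate.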

In [None]:
learn.fit(lrs/10,1,cycle_len=5,use_clr=(5,20))

In [None]:
#save model for later
learn.save(MODEL_PATH)

-------------------------
# Validation and analysis
Now the training is done.

### How good does the model perform technically?

We can only compute metrics on our validation set, and the final test metrics will most likely be a bit different. Let's start with the loss and the accuracy.

In [None]:
# plot the loss
learn.sched.plot_loss()

In [None]:
# Get metrics: accuracies
learn.sched.rec_metrics

### How good is the model in terms of project requirements?
It is a good idea to look at examples of images from:

- The most correctly labeled (highest probability)
- The most incorrectly labeled (highest probability but wrong label)
- The most uncertain labels (probability closest to 0.5).

Visualizing these is a good way to understand where our model performs well and which images it struggles with. It might also reveal something about the dataset, such as bad-quality data.
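The selection behind these plots can be sketched with NumPy: for one true class, sort its samples by the probability the model assigns to that class. `pick_examples` is a hypothetical helper that takes positive-class probabilities (like `probs` in the next cell) and the true labels; fastai's `ImageModelResults` does its own selection and may differ in details.

```python
import numpy as np

def pick_examples(probs, labels, cls, n=4):
    """Indices of the most correct, most incorrect and most uncertain samples of one class.

    probs: predicted probability of the positive class (metastases), shape (N,)
    labels: true labels (0 or 1), shape (N,)
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    idx = np.flatnonzero(labels == cls)
    # probability the model assigns to the true class of these samples
    p_true = probs[idx] if cls == 1 else 1.0 - probs[idx]
    most_correct = idx[np.argsort(-p_true)[:n]]   # highest confidence in the true class
    most_incorrect = idx[np.argsort(p_true)[:n]]  # highest confidence in the wrong class
    most_uncertain = idx[np.argsort(np.abs(probs[idx] - 0.5))[:n]]  # closest to 0.5
    return most_correct, most_incorrect, most_uncertain

# toy predictions: three negatives followed by three positives
toy_probs = [0.1, 0.9, 0.45, 0.8, 0.3, 0.55]
toy_labels = [0, 0, 0, 1, 1, 1]
correct, incorrect, uncertain = pick_examples(toy_probs, toy_labels, cls=0, n=1)
```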


In [None]:
# Predict the validation set
log_preds = learn.predict()
log_preds.shape
# probs from log preds
probs = np.exp(log_preds[:,1])

In [None]:
# vectorized lookup of the validation labels and of their row positions in train_df
val_labels = train_df.loc[val_n, 'label'].values
val_indexes = train_df.index.get_indexer(val_n)

In [None]:
# we create the dataset again for validation purposes, the previous dataset is not compatible with the following visualization methods
train_names_tif = [name + '.tif' for name in train_names]

val_data = ImageClassifierData.from_names_and_array(
    path=train_path, 
    fnames=train_names_tif,
    val_idxs=val_indexes.astype(int),
    y=train_labels, 
    classes=['negative', 'metastases'],  
    tfms=tfms_from_model(arch, sz)
)

In [None]:
# a class that will help us plot our results
results = ImageModelResults(val_data.val_ds, log_preds)

### The most correct **Negatives**
These negative samples our model got right with very high prediction probability.

In [None]:
results.plot_most_correct(0)

### The most correct **Metastases**
These metastasis samples our model got right with very high prediction probability.

In [None]:
results.plot_most_correct(1)

### The most incorrect **Negatives**
Our model predicted incorrectly that these were metastases with high probability.

In [None]:
results.plot_most_incorrect(0)

### The most incorrect **Metastases**
Our model predicted incorrectly that these were negative samples with high probability.

In [None]:
results.plot_most_incorrect(1)

### The most uncertain **Negatives**
These were the most confusing negative samples for our model. It could not decide which class they belonged to.

In [None]:
results.plot_most_uncertain(0)

### The most uncertain **Metastases**
These were the most confusing metastasis samples for our model. It could not decide which class they belonged to.

In [None]:
results.plot_most_uncertain(1)

### Confusion matrix
A confusion matrix helps us understand the balance between false negatives and false positives. It is a simple table that shows the counts as **true label vs. predicted label**.
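For a binary problem, the table can be computed by hand, which makes its layout explicit: with rows as true labels and columns as predictions, the entries are [[TN, FP], [FN, TP]], the same layout `sklearn.metrics.confusion_matrix` uses for labels 0 and 1. `binary_confusion_matrix` is a hypothetical helper, and the toy labels and probabilities are made up; the 0.5 threshold mirrors the one used in the next cell.

```python
import numpy as np

def binary_confusion_matrix(y_true, probs, threshold=0.5):
    """2 x 2 matrix of counts: rows are true labels, columns are predicted labels."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = (np.asarray(probs) > threshold).astype(int)
    cm = np.zeros((2, 2), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # count each (true, predicted) pair
    return cm

toy_cm = binary_confusion_matrix([0, 0, 1, 1, 1], [0.2, 0.7, 0.9, 0.4, 0.6])
tn, fp, fn, tp = toy_cm.ravel()
print(toy_cm)
print('false negative rate:', fn / (fn + tp))
```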

In [None]:
from sklearn.metrics import confusion_matrix
from fastai.plots import plot_confusion_matrix
preds_th = np.where(probs > 0.5, 1, 0)
cm = confusion_matrix(val_labels, preds_th)
plot_confusion_matrix(cm, ['negative','metastases'])

### ROC curve and AUC
Remember, AUC is the metric used to evaluate submissions. We can calculate it here for our validation set, but it will most likely differ from the final score.

In [None]:
# Compute ROC curve
fpr, tpr, thresholds = roc_curve(val_labels, probs, pos_label=1)

# Compute ROC area
roc_auc = auc(fpr, tpr)
print('ROC area is {0}'.format(roc_auc))

In [None]:
plt.figure()
plt.plot(fpr, tpr, color='darkorange', label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', linestyle='--')
plt.xlim([-0.01, 1.0])
plt.ylim([0.0, 1.01])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic')
plt.legend(loc="lower right")

----------------

# Submit predictions
### TTA
To evaluate the model, we run inference on all test images. Because our image loader applies random augmentations at inference time too, our results will probably improve if we predict multiple times per image and average the results.
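The averaging step of TTA can be sketched independently of fastai. This sketch assumes a hypothetical `predict_fn` that maps a batch of shape (N, H, W, C) to positive-class probabilities; it applies random flips only (no rotations or lighting changes) and averages probabilities rather than log-probabilities, as the next cell does.

```python
import numpy as np

def tta_predict(predict_fn, images, n_aug=8, seed=0):
    """Average predicted probabilities over randomly flipped copies of a batch."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_aug):
        batch = images
        if rng.random() < 0.5:
            batch = batch[:, :, ::-1]  # horizontal flip (width axis)
        if rng.random() < 0.5:
            batch = batch[:, ::-1, :]  # vertical flip (height axis)
        preds.append(predict_fn(batch))
    return np.mean(preds, axis=0)

# toy usage: a dummy "model" that scores each image by its mean intensity
rng = np.random.default_rng(1)
imgs = rng.random((4, 96, 96, 3))
avg_probs = tta_predict(lambda batch: batch.mean(axis=(1, 2, 3)), imgs)
```

Since the dummy model ignores orientation, its averaged output equals a single prediction; a real model gives slightly different outputs per flip, and averaging smooths them out.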

In [None]:
# Use a fair number of iterations to cover different combinations of flips and rotations.
# The predictions are then averaged.
preds_t,y_t = learn.TTA(n_aug=12, is_test=True)
preds_t = np.stack(preds_t, axis=-1)
# Fast.ai returns the log of the prediction. To get probabilities from log, we do exp()
preds_t = np.exp(preds_t)
preds_t = preds_t.mean(axis=-1)[:,1]

### Submit the model for evaluation

In [None]:
SAMPLE_SUB = '/kaggle/input/sample_submission.csv'
sample_df = pd.read_csv(SAMPLE_SUB)
sample_list = list(sample_df.id)
# map each test image id to its averaged probability, then follow the sample submission order
pred_dic = dict(zip(learn.data.test_ds.fnames, preds_t))
pred_list_cor = [pred_dic[img_id] for img_id in sample_list]
df = pd.DataFrame({'id':sample_list,'label':pred_list_cor})
df.to_csv('{0}_submission.csv'.format(MODEL_PATH), header=True, index=False)