# Introduction
**Who this notebook is for**  
This notebook is for anyone interested in creating a baseline model using Tensor Processing Units (TPUs) and making submissions to the **[Cassava Leaf Disease Classification competition](https://www.kaggle.com/c/cassava-leaf-disease-classification)**. If you've taken the **[Kaggle Intro to Deep Learning](https://www.kaggle.com/learn/intro-to-deep-learning)** and/or the **[Kaggle Computer Vision](https://www.kaggle.com/learn/computer-vision)** course, you'll find this notebook a good starting place for bridging what you've learned in our micro-courses and applying that knowledge in a competition.  

**How to use this notebook**  
Feel free to use this notebook as a walkthrough on how to build a preliminary image classification model using TensorFlow and Tensor Processing Units (TPUs). You can copy and edit the notebook by clicking on the corresponding button in the top right, which will make your own personal copy of the notebook in your Kaggle account. From there any edits you make will be unique to your own copy of the notebook!


**TPUs with TensorFlow**  
We'll be using TensorFlow and Keras to build our computer vision model, and using TPUs to both train our model and make predictions. If you'd like to learn more about TPUs, be sure to check out our **[Learn With Me: Getting Started with Tensor Processing Units (TPUs)](https://youtu.be/1pdwRQ1DQfY)** video.  

**References**  
This notebook was built using the following amazing resources created by Kagglers:
- **Martin Gorner:** [Getting Started With 100 Flowers on TPU](https://www.kaggle.com/mgornergoogle/getting-started-with-100-flowers-on-tpu)
- **Amy Jang:** [TensorFlow + Transfer Learning: Melanoma](https://www.kaggle.com/amyjang/tensorflow-transfer-learning-melanoma)
- **Phil Culliton:** [A Simple TF 2.1 Notebook](https://www.kaggle.com/philculliton/a-simple-tf-2-1-notebook)

# Set up environment

In [None]:
!pip install --upgrade tensorflow
!pip install -U efficientnet

In [None]:
import math, re, os, warnings
import tensorflow as tf
import numpy as np
import pandas as pd
import shutil
import matplotlib.pyplot as plt
from matplotlib import gridspec
from kaggle_datasets import KaggleDatasets
from tensorflow import keras
from functools import partial
from sklearn.model_selection import train_test_split
import tensorflow_hub as hub
from tensorflow.keras.layers.experimental import preprocessing
from tensorflow.keras import layers
from tensorflow.keras.callbacks import EarlyStopping
print("Tensorflow version " + tf.__version__)
tf.config.optimizer.set_jit(True)

from tensorflow.keras.preprocessing import image_dataset_from_directory
import efficientnet.keras as efn 
import gc

import cv2
from PIL import Image
import PIL

## Place images in the right folders

In [None]:
# Create training and validation folder
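# exist_ok=True lets this cell be re-run without raising FileExistsError
os.makedirs('/kaggle/working/train_data/', exist_ok=True)
os.makedirs('/kaggle/working/valid_data/', exist_ok=True)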

# Open dataset file 
dataset = pd.read_csv("../input/cassava-leaf-disease-classification/train.csv")

# Split training images in training and validation images
training_data, validation_data = train_test_split(dataset, test_size=0.33)

training_file_names = list(training_data['image_id'].values) 
training_img_labels = list(training_data['label'].values) 
validation_file_names = list(validation_data['image_id'].values) 
validation_img_labels = list(validation_data['label'].values) 

# Create folders of labels
folders_to_be_created = np.unique(list(dataset['label'])) #.values 

# set source and destination
source = "../input/cassava-leaf-disease-classification/train_images"
training_destination = '/kaggle/working/train_data'
validation_destination = '/kaggle/working/valid_data'

# Create one subfolder per label for training and validation images
for new_path in folders_to_be_created: 
    train_map = os.path.join('/kaggle/working/train_data/', str(new_path))
    valid_map = os.path.join('/kaggle/working/valid_data/', str(new_path))
    os.makedirs(train_map, exist_ok=True)
    os.makedirs(valid_map, exist_ok=True)
        
#os.path.exists('/kaggle/working/valid_data/2')
        
folders = folders_to_be_created.copy() 

# Places training images in the right folders   
for f in range(len(training_file_names)): 
    tr_current_img = training_file_names[f] 
    tr_current_label = training_img_labels[f] 
    src = os.path.join(source, tr_current_img)
    dst = os.path.join(training_destination, str(tr_current_label))
    shutil.copy(src, dst)
    
# Places validation images in the right folders    
for f in range(len(validation_file_names)): 
    va_current_img = validation_file_names[f] 
    va_current_label = validation_img_labels[f] 
    src = os.path.join(source, va_current_img)
    dst = os.path.join(validation_destination, str(va_current_label))
    shutil.copy(src, dst)
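
The two copy loops above follow the same pattern, so they can be folded into one small helper. The following is a minimal sketch rather than the notebook's own code: `copy_into_label_folders` is a hypothetical name, and it assumes the file names and labels are parallel lists such as `training_file_names` and `training_img_labels`.

```python
import os
import shutil


def copy_into_label_folders(source_dir, destination_dir, file_names, labels):
    """Copy each file into destination_dir/<label>/ and return the new paths.

    Hypothetical helper: file_names and labels are parallel sequences.
    """
    copied = []
    for file_name, label in zip(file_names, labels):
        label_dir = os.path.join(destination_dir, str(label))
        # exist_ok=True makes the helper safe to re-run in the same session
        os.makedirs(label_dir, exist_ok=True)
        src = os.path.join(source_dir, file_name)
        copied.append(shutil.copy(src, label_dir))
    return copied
```

Under these assumptions, a call such as `copy_into_label_folders(source, training_destination, training_file_names, training_img_labels)` would stand in for the first loop.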

### Data exploration
With thanks to: https://www.kaggle.com/xhlulu/reducing-image-sizes-to-32x32

In [None]:
label_df = pd.read_csv('../input/cassava-leaf-disease-classification/train.csv')
label_df.head()

label_df['label'].value_counts().plot(kind='bar')

'''def display_samples(df, columns=4, rows=3):
    fig=plt.figure(figsize=(5*columns, 3*rows))

    for i in range(columns*rows):
        image_path = df.loc[i,'image_id']
        image_id = df.loc[i,'label']
        img = cv2.imread(f'kaggle/working/train_data/{image_path}')
        fig.add_subplot(rows, columns, i+1)
        plt.title(image_id)
        plt.imshow(img)

display_samples(label_df)'''

In [None]:
# Count the images in the training folder for class 4
path, dirs, files = next(os.walk("/kaggle/working/train_data/4"))
file_count = len(files)
file_count

total_images = 8452+1580 + 1479 + 706 + 1718
weight_0 = 1 / (706/total_images)
weight_1 = 1 / (1479/total_images)
weight_2 = 1 / (1580/total_images)
weight_3 = 1 / (8452/total_images)
weight_4 = 1 / (1718/total_images)


# Possibly remove photos from folder 3?
#filenames = os.listdir("/kaggle/working/train_data/3")

#for i in filenames[:1000]:
    #os.remove("/kaggle/working/train_data/3/" + i) 
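
The class weights above are the inverse of each class's share of the training images, with the counts typed in by hand. As a hedged alternative, the same weights can be derived from a list of labels. This is only a sketch: `inverse_frequency_weights` is a hypothetical helper and the example labels are made up.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {label: total / count}, so rarer classes get larger weights."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: total / count for label, count in sorted(counts.items())}


# Toy labels (made up for illustration): class 3 dominates
example_labels = [3, 3, 3, 3, 3, 3, 0, 1, 2, 4]
class_weights = inverse_frequency_weights(example_labels)
print(class_weights)
```

In this notebook the input could be `training_img_labels`, and the resulting dictionary has the same shape as the `class_weight` argument that `model.fit` accepts.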

In [None]:
# Set Matplotlib defaults
plt.rc('figure', autolayout=True)
plt.rc('axes', labelweight='bold', labelsize='large',
       titleweight='bold', titlesize=18, titlepad=10)
plt.rc('image', cmap='magma')
warnings.filterwarnings("ignore") # to clean up output cells


# Load training and validation sets
'''ds_train_ = image_dataset_from_directory(
    '/kaggle/working/train_data',
    labels='inferred',
    label_mode='int',
    image_size=[512, 512], #128, 299
    interpolation='nearest',
    batch_size=64,
    shuffle=True,
)'''

ds_train_ = image_dataset_from_directory(
    '/kaggle/working/train_data',
    labels='inferred',
    label_mode='int',
    image_size=[48, 48], #128, 299
    interpolation='nearest',
    batch_size=128,
    shuffle=True,
    validation_split=0.8,
    subset="training",
    seed=123
    )

ds_train_valid_ = image_dataset_from_directory(
    '/kaggle/working/train_data',
    labels='inferred',
    label_mode='int',
    image_size=[48, 48], #128, 299, 512
    interpolation='nearest',
    batch_size=128,
    shuffle=True,
    validation_split=0.8,
    subset="validation",
    seed=123
    )

ds_valid_ = image_dataset_from_directory(
    '/kaggle/working/valid_data',
    labels='inferred',
    label_mode='int',
    image_size=[48, 48], #128 512
    interpolation='nearest',
    batch_size=128,
    shuffle=False,
    #validation_split=0.1,
    #subset="training",
    #seed=123
)

# Data Pipeline
def convert_to_float(image, label):
    image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    return image, label

AUTOTUNE = tf.data.experimental.AUTOTUNE
ds_train = (
    ds_train_
    .map(convert_to_float)
    .cache()
    .prefetch(buffer_size=AUTOTUNE)
)
ds_train_valid = (
    ds_train_valid_
    .map(convert_to_float)
    .cache()
    .prefetch(buffer_size=AUTOTUNE)
)
ds_valid = (
    ds_valid_
    .map(convert_to_float)
    .cache()
    .prefetch(buffer_size=AUTOTUNE)
)

# Set up variables
We'll set up some of our variables for our notebook here. 

If by chance you're using a private dataset, you'll also want to make sure that you have the **Google Cloud Software Development Kit (SDK)** attached to your notebook. You can find the Google Cloud SDK under the `Add-ons` dropdown menu at the top of your notebook. Documentation for the **Google Cloud Software Development Kit (SDK)** can be found **[here](https://www.kaggle.com/product-feedback/163416)**.

In [None]:
#gc.collect()

In [None]:
AUTOTUNE = tf.data.experimental.AUTOTUNE
#GCS_PATH = KaggleDatasets().get_gcs_path()
BATCH_SIZE = 128
IMAGE_SIZE = [48, 48]
CLASSES = ['0', '1', '2', '3', '4']
EPOCHS = 40

early_stopping = EarlyStopping(
    min_delta=0.01, # minimum amount of change to count as an improvement
    patience=10, # how many epochs to wait before stopping
    restore_best_weights=True,
    monitor = 'val_loss', 
    mode = 'auto',
)

class MyCustomCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        gc.collect()
        tf.keras.backend.clear_session()

lr_scheduler = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-5, 
    decay_steps=10000, 
    decay_rate=0.9)
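
To make the `min_delta` and `patience` settings above more concrete, here is a simplified illustration of patience-based early stopping on a validation loss. It is a sketch of the idea, not Keras's exact implementation; `early_stopping_epoch` is a hypothetical function and the loss values in the usage note are made up.

```python
def early_stopping_epoch(val_losses, min_delta=0.01, patience=10):
    """Return the index of the epoch at which training would stop, or None.

    Simplified illustration: an epoch counts as an improvement only if it
    beats the best loss seen so far by more than min_delta.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None
```

For example, with `patience=3` the losses `[1.0, 0.9, 0.895, 0.893, 0.9]` stop at epoch index 4, because none of the last three epochs improves on 0.9 by more than 0.01.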

### Model with pre-trained base 'EfficientnetB0'

In [None]:
'''!pip install segmentation-models
#from tensorflow.keras.mixed_precision import experimental as mixed_precision

#keras.utils.generic_utils = keras.utils
#policy = mixed_precision.Policy('mixed_float16')
#mixed_precision.set_policy(policy)

# parameters for data
height = 224
width = 224
channels = 3
input_shape = (height, width, channels)
n_classes = CLASSES

efnb0 = tf.keras.applications.efficientnet.EfficientNetB0(
    include_top=False, weights='imagenet',
    input_shape=input_shape, classes=n_classes,
    classifier_activation='softmax'
)


model = keras.Sequential([
    layers.Rescaling(1./255),
    efnb0, 
    layers.GlobalAveragePooling2D(),
    layers.Flatten(),
    #layers.LeakyReLU(alpha=0.01),
    layers.Dense(6, activation = 'relu'), # LeakyReLU? layers.LeakyReLU(alpha=0.3) or activation = LeakyReLU
    layers.Dropout(rate=0.3),
    layers.Dense(5, activation = 'softmax')
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['sparse_categorical_accuracy'],
    run_eagerly=True,
)

#model.summary()

history = model.fit(
    ds_train_valid,
    validation_data=ds_valid,
    epochs=EPOCHS,
    callbacks=[MyCustomCallback()], #earlystopping
    batch_size = BATCH_SIZE #  put your callbacks in a list
    #verbose=0,  # turn off training log
)

history_frame = pd.DataFrame(history.history)
history_frame.loc[:, ['loss', 'val_loss']].plot()
history_frame.loc[:, ['sparse_categorical_accuracy', 'val_sparse_categorical_accuracy']].plot()'''

### Model with pre-trained base 'inception'

In [None]:
'''pretrained_base = tf.keras.models.load_model(
    '../input/cv-course-models/cv-course-models/inceptionv3'
)

pretrained_base.trainable = False

model = keras.Sequential([
    layers.Rescaling(1./255),
    #layers.BatchNormalization(renorm=True),
    pretrained_base,
    layers.Flatten(),
    layers.Dense(6, activation = 'relu'), #Leakyrelu?
    #layers.Dropout(rate=0.3),
    layers.Dense(5, activation = 'softmax')
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['sparse_categorical_accuracy'],
    run_eagerly=True,
)

history = model.fit(
    ds_train_valid,
    validation_data=ds_valid,
    epochs=EPOCHS,
    callbacks=[MyCustomCallback()], #earlystopping
    batch_size = BATCH_SIZE,# put your callbacks in a list
    #verbose=0,  # turn off training log
)

history_frame = pd.DataFrame(history.history)
history_frame.loc[:, ['loss', 'val_loss']].plot()
history_frame.loc[:, ['sparse_categorical_accuracy', 'val_sparse_categorical_accuracy']].plot();'''

### Self defined base

In [None]:
model = keras.Sequential([
    layers.InputLayer(input_shape=[48, 48, 3]),
    layers.Rescaling(1./255),
    layers.Dropout(0.2),
    
    # Data Augmentation
    preprocessing.RandomFlip(mode='horizontal'), 
    preprocessing.RandomRotation(factor=0.1),
    preprocessing.RandomFlip(mode='vertical'), # meaning, top-to-bottom
    #preprocessing.RandomWidth(factor=0.15), # horizontal stretch
    #preprocessing.RandomTranslation(height_factor=0.1, width_factor=0.1),

    # Block One
    layers.BatchNormalization(renorm=True),
    layers.Conv2D(filters=32, 
                  kernel_size=3,
                  activation='relu',
                  padding='same',
                  strides = (2,2)),
    layers.Conv2D(filters=32, 
                  kernel_size=3,
                  activation='relu',
                  padding='same',
                  strides = (2,2)),
    layers.MaxPool2D(pool_size=2,
                     strides=2,
                     padding='same'),
    layers.Dropout(0.2), 
    
    # Block Two
    layers.BatchNormalization(renorm=True),
    layers.Conv2D(filters=64,
                  kernel_size=3,
                  activation='relu',
                  padding='same', 
                  strides = (2,2)),
    layers.Conv2D(filters=64,
                  kernel_size=3,
                  activation='relu',
                  padding='same', 
                  strides = (2,2)),
    layers.MaxPool2D(pool_size=2,
                     strides=2,
                     padding='same'),
    layers.Dropout(0.2),
    
    # Block Three
    layers.BatchNormalization(renorm=True),
    layers.Conv2D(filters=128,
                  kernel_size=3,
                  activation='relu', 
                  padding='same',
                  strides = (2,2)),
    layers.Conv2D(filters=128,
                  kernel_size=3,
                  activation='relu',
                  padding='same',
                  strides = (2,2)),
    layers.MaxPool2D(pool_size=2,
                     strides=2,
                     padding='same'),
    layers.Dropout(0.2),
    
     # Block four
    layers.BatchNormalization(renorm=True),
    layers.Conv2D(filters=256,
                  kernel_size=3,
                  activation='relu', 
                  padding='same',
                  strides = (2,2)),
    layers.Conv2D(filters=256,
                  kernel_size=3,
                  activation='relu',
                  padding='same',
                  strides = (2,2)),
    layers.MaxPool2D(pool_size=2,
                     strides=2,
                     padding='same'),
    layers.Dropout(0.2),

    # Head
    layers.BatchNormalization(renorm=True),
    layers.Flatten(),
    layers.Dense(512, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(512, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(512, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(512, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(len(CLASSES), activation='softmax'),
])

optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)

model.compile(
    #optimizer=tf.keras.optimizers.Adam(learning_rate=lr_scheduler, epsilon=0.001),
    #optimizer = Adam(lr=0.001)
    #optimizer='adam',
    optimizer = optimizer,
    loss='sparse_categorical_crossentropy',
    metrics=['sparse_categorical_accuracy'], 
)

history = model.fit(
    ds_train_valid,
    validation_data=ds_valid,
    epochs=EPOCHS,
    # No batch_size here: ds_train_valid is a tf.data.Dataset that is already batched
    callbacks=[early_stopping],
    verbose=1,
    class_weight = {0: weight_0, 
                    1: weight_1, 
                    2: weight_2, 
                    3: weight_3, 
                    4: weight_4, 
                    }
)

#tf_reset_default_graph()

history_frame = pd.DataFrame(history.history)
history_frame.loc[:, ['loss', 'val_loss']].plot()
history_frame.loc[:, ['sparse_categorical_accuracy', 'val_sparse_categorical_accuracy']].plot();

#keras.utils.plot_model(model, show_shapes=True)
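
With `padding='same'`, a convolution or pooling layer with stride `s` maps a spatial size `n` to `ceil(n / s)`. The sketch below traces the 48-pixel input through the stride-2 layers of the model above; `trace_spatial_size` is a hypothetical helper, not part of Keras.

```python
import math


def trace_spatial_size(input_size, strides):
    """Return the spatial size after each layer, assuming padding='same'."""
    sizes = []
    size = input_size
    for stride in strides:
        size = math.ceil(size / stride)
        sizes.append(size)
    return sizes


# Each block above has two stride-2 convolutions and one stride-2 max pool
strides_per_block = [2, 2, 2]
sizes = trace_spatial_size(48, strides_per_block * 4)
print(sizes)
```

Under these assumptions the feature map shrinks to 1x1 by the end of the second block, so the third and fourth blocks no longer see any spatial structure. A larger `image_size` or a stride of 1 in the convolutions may be worth trying.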

In [None]:
predictions = model.predict(ds_valid)
values, counts = np.unique([CLASSES[np.argmax(predictions[i,])] for i in range(len(predictions))], return_counts=True)
values, counts

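from sklearn import metrics

# True labels, in the same order as the predictions (ds_valid is not shuffled)
target = np.concatenate([label for example, label in ds_valid])

# Compare predicted class indices (0-4) with the true labels (0-4)
predicted_labels = np.argmax(predictions, axis=1)
metrics.cohen_kappa_score(target, predicted_labels)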
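
Cohen's kappa compares the observed agreement between predictions and true labels with the agreement expected by chance. The following is a minimal pure-Python sketch of the unweighted statistic, intended only to show what the score measures; `cohen_kappa` is a hypothetical function and the labels are made up.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    # Observed agreement: fraction of samples where both labelings match
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected agreement by chance, from each labeling's class frequencies
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Toy example (made-up labels)
y_true = [0, 0, 1, 1]
y_pred = [0, 0, 1, 0]
print(cohen_kappa(y_true, y_pred))
```

A score of 1 means perfect agreement and a score near 0 means agreement no better than chance, which makes kappa more informative than accuracy when one class dominates, as class 3 does here.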

In [None]:
'''# Block Two
    layers.BatchNormalization(renorm=True),
    layers.Conv2D(filters=64,
                  kernel_size=3,
                  activation='relu',
                  padding='same', 
                  strides = (2,2)),
    layers.Conv2D(filters=64,
                  kernel_size=3,
                  activation='relu',
                  padding='same', 
                  strides = (2,2)),
    layers.MaxPool2D(pool_size=2,
                     strides=2,
                     padding='same'),
    layers.Dropout(0.25),
    
    # Block Three
    layers.BatchNormalization(renorm=True),
    layers.Conv2D(filters=128,
                  kernel_size=3,
                  activation='relu', 
                  padding='same',
                  strides = (2,2)),
    layers.Conv2D(filters=128,
                  kernel_size=3,
                  activation='relu',
                  padding='same',
                  strides = (2,2)),
    layers.MaxPool2D(pool_size=2,
                     strides=2,
                     padding='same'),'''

## Adding in augmentations 
You learned about augmentations in the **[Computer Vision: Data Augmentation](https://www.kaggle.com/ryanholbrook/data-augmentation)** lesson on Kaggle Learn, and here I've applied an augmentation available to us through TensorFlow. You can read more about these augmentations (as well as all of the other augmentations available to you!) in the **[TensorFlow tf.image documentation](https://www.tensorflow.org/api_docs/python/tf/image)**.  

If you're interested in learning how to create and use custom augmentations, check out the **[Rotation Augmentation GPU/TPU](https://www.kaggle.com/cdeotte/rotation-augmentation-gpu-tpu-0-96)** and **[CutMix and MixUp on GPU/TPU](https://www.kaggle.com/cdeotte/cutmix-and-mixup-on-gpu-tpu)** notebooks from Kaggle Grandmaster Chris Deotte.
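
As a rough idea of what a random flip augmentation does, here is a minimal sketch that uses plain Python lists in place of image tensors. `random_flip` is a hypothetical function written for illustration; it is not the TensorFlow implementation.

```python
import random


def random_flip(image, rng, horizontal=True, vertical=True):
    """Randomly mirror an image given as a list of rows of pixels."""
    if horizontal and rng.random() < 0.5:
        image = [row[::-1] for row in image]   # left-to-right
    if vertical and rng.random() < 0.5:
        image = image[::-1]                    # top-to-bottom
    return image


# A tiny 2x2 "image" and a seeded generator for repeatable results
tiny_image = [[1, 2], [3, 4]]
flipped = random_flip(tiny_image, random.Random(0))
print(flipped)
```

Because each flip is applied with probability 0.5, the model sees a slightly different version of the same image from epoch to epoch, while the label stays the same.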

## Define data loading methods
The following functions will be used to load our `training`, `validation`, and `test` datasets, as well as print out the number of images in each dataset.

# Brief exploratory data analysis (EDA)
First we'll print out the shapes and labels for a sample of each of our three datasets:

In [None]:
'''print("Training data shapes:")
for image, label in get_training_dataset().take(3):
    print(image.numpy().shape, label.numpy().shape)
print("Training data label examples:", label.numpy())
print("Validation data shapes:")
for image, label in get_validation_dataset().take(3):
    print(image.numpy().shape, label.numpy().shape)
print("Validation data label examples:", label.numpy())
print("Test data shapes:")
for image, idnum in get_test_dataset().take(3):
    print(image.numpy().shape, idnum.numpy().shape)
print("Test data IDs:", idnum.numpy().astype('U')) # U=unicode string'''

The following code chunk sets up a series of functions that will print out a grid of images. The grid of images will contain images and their corresponding labels.

You can also modify the above code to look at your `validation` and `test` data, like this:

# Building the model
## Learning rate schedule
We learned about learning rates in the **[Intro to Deep Learning: Stochastic Gradient Descent](https://www.kaggle.com/ryanholbrook/stochastic-gradient-descent)** lesson, and here I've created a learning rate schedule mostly using the defaults in the **[Keras Exponential Decay Learning Rate Scheduler](https://keras.io/api/optimizers/learning_rate_schedules/exponential_decay/)** documentation (I did change the `initial_learning_rate`). You can adjust the learning rate scheduler below, and read more about the other types of schedulers available to you in the **[Keras learning rate schedules API](https://keras.io/api/optimizers/learning_rate_schedules/)**.
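
An exponential decay schedule computes the learning rate as `initial_learning_rate * decay_rate ** (step / decay_steps)`. The sketch below reproduces that formula in plain Python using the values set earlier in this notebook; `exponential_decay` is a hypothetical function name used only for illustration.

```python
def exponential_decay(step, initial_learning_rate=1e-5, decay_steps=10000,
                      decay_rate=0.9, staircase=False):
    """Learning rate after `step` optimizer steps under exponential decay."""
    exponent = step / decay_steps
    if staircase:
        # Decay in discrete jumps, once every decay_steps steps
        exponent = step // decay_steps
    return initial_learning_rate * decay_rate ** exponent


for step in (0, 10000, 20000):
    print(step, exponential_decay(step))
```

With `decay_steps=10000`, the rate only falls to 90% of its initial value after 10,000 optimizer steps, so shorter runs see an almost constant learning rate.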

## Building our model
In order to ensure that our model is trained on the TPU, we build it using `with strategy.scope()`.    

This model was built using transfer learning, meaning that we have a _pre-trained model_ (ResNet50) as our base model and then the customizable model built using `tf.keras.Sequential`. If you're new to transfer learning I recommend setting `base_model.trainable` to **False**, but _do_ encourage you to change which base model you're using (more options are available in the **[`tf.keras.applications` Module](https://www.tensorflow.org/api_docs/python/tf/keras/applications)** documentation) as well as iterate on the custom model. 

Note that we're using `sparse_categorical_crossentropy` as our loss function, because we did _not_ one-hot encode our labels.
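
To see why integer labels pair with the sparse loss, here is a small sketch comparing the two forms for a single sample. It ignores the numerical clipping that Keras applies, the function names simply mirror the loss names, and the probabilities are made up.

```python
import math


def sparse_categorical_crossentropy(label, probabilities):
    """Loss for one sample whose label is an integer class index."""
    return -math.log(probabilities[label])


def categorical_crossentropy(one_hot, probabilities):
    """Loss for one sample whose label is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probabilities))


# Made-up softmax output for one image over the five classes
probabilities = [0.1, 0.1, 0.2, 0.5, 0.1]
print(sparse_categorical_crossentropy(3, probabilities))
print(categorical_crossentropy([0, 0, 0, 1, 0], probabilities))
```

Both calls give the same value, because the one-hot vector only selects the probability of the true class. The sparse form skips building that vector.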

# Train the model
As our model is training you'll see a printout for each epoch, and can also monitor TPU usage by clicking on the TPU metrics in the toolbar at the top right of your notebook.

With `model.summary()` we'll see a printout of each of our layers, their corresponding shape, as well as the associated number of parameters. Notice that at the bottom of the printout we'll see information on the total parameters, trainable parameters, and non-trainable parameters. Because we're using a pre-trained model, we expect there to be a large number of non-trainable parameters (because the weights have already been assigned in the pre-trained model).
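
The parameter counts in such a summary can be checked by hand. The sketch below gives the standard counts for a dense layer and a 2D convolution with biases; `dense_params` and `conv2d_params` are hypothetical helper names, and the example layers are assumed to match the first convolution and the final layer of a small custom model like the one in this notebook.

```python
def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units


def conv2d_params(kernel_size, in_channels, filters):
    """One kernel_size x kernel_size x in_channels kernel and one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters


print(conv2d_params(3, 3, 32))   # a 3x3 convolution with 32 filters on an RGB image
print(dense_params(512, 5))      # a 5-class output layer after a 512-unit layer
```

Layers such as dropout, pooling, and flatten add no parameters, so they appear with a count of 0.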

# Evaluating our model
The first chunk of code is provided to show you where the variables in the second chunk of code came from. As you can see, there's a lot of room for improvement in this model, but because we're using TPUs and have a relatively short training time, we're able to iterate on our model fairly rapidly.

In [None]:
# print out variables available to us
'''print(history.history.keys())'''

In [None]:
# create learning curves to evaluate model performance
'''history_frame = pd.DataFrame(history.history)
history_frame.loc[:, ['loss', 'val_loss']].plot()
history_frame.loc[:, ['sparse_categorical_accuracy', 'val_sparse_categorical_accuracy']].plot();'''

# Making predictions
Now that we've trained our model we can use it to make predictions! 

In [None]:
# Load test data
'''ds_test_ = image_dataset_from_directory(
    '../input/cassava-leaf-disease-classification/test_images',
    labels='inferred',
    label_mode='int',
    image_size=[48, 48], #128, 299
    interpolation='nearest',
    batch_size=64,
    shuffle=True,
)

# Data Pipeline
def convert_to_float(image, label):
    image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    return image, label

AUTOTUNE = tf.data.experimental.AUTOTUNE
ds_test = (
    ds_test_
    .map(convert_to_float)
    .cache()
    .prefetch(buffer_size=AUTOTUNE)
)

probabilities = model.predict(ds_test)
predictions = np.argmax(probabilities, axis=-1)
print(predictions)

'''

#OLD CODE
'''test_ds = get_test_dataset(ordered=True) 
test_ds = test_ds.map(to_float32)

print('Computing predictions...')
test_images_ds = testing_dataset
test_images_ds = test_ds.map(lambda image, idnum: image)
probabilities = model.predict(test_images_ds)
predictions = np.argmax(probabilities, axis=-1)
print(predictions)'''

# Creating a submission file
Now that we've trained a model and made predictions, we're ready to submit to the competition! You can run the code below to create your submission file.

In [None]:
'''print('Generating submission.csv file...')
test_ids_ds = test_ds.map(lambda image, idnum: idnum).unbatch()
test_ids = next(iter(test_ids_ds.batch(NUM_TEST_IMAGES))).numpy().astype('U') # all in one batch
np.savetxt('submission.csv', np.rec.fromarrays([test_ids, predictions]), fmt=['%s', '%d'], delimiter=',', header='id,label', comments='')
!head submission.csv'''
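
As an alternative to `np.savetxt`, the submission file can be written with Python's `csv` module. This is a minimal sketch: `write_submission` is a hypothetical helper, and the `image_id` and `label` column names are an assumption based on the columns of `train.csv`, so check them against the competition's sample submission.

```python
import csv


def write_submission(image_ids, predictions, path="submission.csv"):
    """Write one row per test image: its file name and predicted class index."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # Assumed header; check it against the competition's sample submission
        writer.writerow(["image_id", "label"])
        for image_id, label in zip(image_ids, predictions):
            writer.writerow([image_id, int(label)])
```

The predictions passed in must be in the same order as the image ids, which means the test dataset should not be shuffled before calling `model.predict`.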

Be aware that because this is a code competition with a hidden test set, internet and TPUs cannot be enabled on your submission notebook. Therefore TPUs will only be available for training models. For a walk-through on how to train on TPUs and run inference/submit on GPUs, see our [TPU Docs](https://www.kaggle.com/docs/tpu#tpu6).