# <center>Assignment 6 [Final]

In [1]:
Your_name = "V N D S R Prasad Jettiboina"
Your_emailid = "prasadjv99@gmail.com"

### **NOTE: Open this notebook in kaggle and import Artificial Lunar Landscape Dataset.** 

## > If you run this notebook as it is, you will get the val_iou_score of around 0.20 (remember to use GPU for training the model)

## > Your goal is to increase the val_iou_score as much as you can for this project using any method. The evaluation of this assignment will be based on your acquired val_iou_score: one point for each 0.01 of val_iou_score. 

> For example - if val_iou_score = 0.41, your points will be 41/100. 
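
The scoring rule above can be written as a short calculation. This is a minimal sketch: the helper name `points_from_iou`, the cap at 100, and the choice to round partial hundredths down are assumptions, since the assignment text only gives the 0.41 example.

```python
import math

def points_from_iou(val_iou_score: float) -> int:
    """Hypothetical helper: one point per full 0.01 of val_iou_score, capped at 100."""
    # the small epsilon guards against float products that land just below an integer
    points = math.floor(val_iou_score * 100 + 1e-9)
    return max(0, min(100, points))

print(points_from_iou(0.41))
```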

## > Please check your notebook before submission and make sure it runs without errors.

### Some tips to increase the performance
* Increase the number of epochs
* Increase the number of layers in your model
* Use SOTA high-performance networks with transfer learning
* Use callbacks and carefully observe your model's performance

You are free to use other techniques too. 

# GUIDELINES ABOUT MAKING CHANGES TO THIS NOTEBOOK

For every change you make to this notebook, only those supported by the following would be considered for evaluation:

1. A descriptive comment explaining the change, e.g., if you are adding an extra conv2d layer, write about all the aspects of the conv2d layer you are adding. The comment should be placed at the point where the layer will be added. 

2. Changes brought to the system because of changes you introduced, e.g., if you changed layers - added, deleted, etc., you MUST show model properties before and after the changes were made.

3. Data preprocessing changes - if you use new data preprocessing techniques that are not a part of this notebook, you MUST explain their inner workings using 2-3 MARKDOWN cells, and ONLY AFTER THAT,  proceed to use that technique. Without this explanation, your technique will not be considered for evaluation.

4. ALL improvements MUST BE REPORTED VIA PLOTS OR TABLES OR BOTH, e.g., if increasing epochs from 30 to 50 improved your results, but decreasing learning_rate from 0.0001 to 0.00005 also improved your results, then these gains FIRST HAVE TO BE REPORTED SEPARATELY VIA PLOTS, THEN AGAIN TOGETHER VIA TABLES. 

  One plot can show iou values increasing from epoch 30 to epoch 50. Another plot can show iou values improving at varying learning rates. Finally tables can be used to show iou values for different learning rates, table 1 for lr_1 shows iou for epochs 30 through 50, table 2 for lr_2 shows iou for epochs 30 through 50, and so on and so forth - ALL YOUR RESULTS MUST BE COMPULSORILY QUANTIFIABLE! 

It is therefore advised to work on one improvement, optimize it, plot it, document it, then proceed to the next improvement - till you get a satisfactory IOU score.

5. FINAL IMPROVEMENT SUMMARY TABLE: Prepare a table with columns (improv#, description, increase in iou from, increase in iou to), and list out all the improvements you made to improve your model performance.
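
One possible way to produce the final summary table is to render it as markdown from a list of rows. The sketch below is only an illustration: the helper name `summary_table` is hypothetical, and the rows are placeholders to be replaced with scores measured in your own runs.

```python
def summary_table(rows):
    """Render rows of (description, iou_from, iou_to) as a markdown table."""
    lines = [
        "| improv# | description | increase in iou from | increase in iou to |",
        "|---|---|---|---|",
    ]
    for number, (description, iou_from, iou_to) in enumerate(rows, start=1):
        lines.append(f"| {number} | {description} | {iou_from:.2f} | {iou_to:.2f} |")
    return "\n".join(lines)

# placeholder rows only: replace with the scores measured in your own runs
placeholder_rows = [
    ("example change A", 0.0, 0.0),
    ("example change B", 0.0, 0.0),
]
print(summary_table(placeholder_rows))
```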


In [3]:
!pip install segmentation_models

In [4]:
# import the necessary libraries

import os

# SM_FRAMEWORK must be set before segmentation_models is imported
os.environ["SM_FRAMEWORK"] = "tf.keras"

import tensorflow as tf
import segmentation_models as sm
import glob
import cv2
import numpy as np
from matplotlib import pyplot as plt
import keras 
from sklearn.model_selection import train_test_split

* Set the environment variable SM_FRAMEWORK=keras or SM_FRAMEWORK=tf.keras before importing segmentation_models
* Alternatively, change the framework after the import with sm.set_framework('keras') or sm.set_framework('tf.keras')

In [5]:
# Setting framework environment
os.environ["SM_FRAMEWORK"] = "tf.keras"
sm.set_framework('tf.keras')
keras.backend.set_image_data_format('channels_last')

## Data Preprocessing Pipeline

In [6]:
H = 256 # height of image
W = 256 # width of image

'''This function returns the lists of image paths and mask paths from the
given directories, each in sorted order.'''
# function to return list of image paths and mask paths 
def process_data(IMG_DIR, MASK_DIR):
    images = [os.path.join(IMG_DIR, x) for x in sorted(os.listdir(IMG_DIR))]
    masks = [os.path.join(MASK_DIR, x) for x in sorted(os.listdir(MASK_DIR))]

    return images, masks

'''This function returns the image paths and the corresponding mask paths,
split into train and test sets (test size 0.2).'''
# function to load data and train test split
def load_data(IMG_DIR, MASK_DIR):
    X, y = process_data(IMG_DIR, MASK_DIR)
    
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
    
    return X_train, X_test, y_train, y_test

'''This function reads an image from the given path, resizes it to the
width and height defined above (256 x 256), and normalizes it by dividing
each value by 255. The result is returned as float32.'''
# function to read image
def read_image(x):
    x = cv2.imread(x, cv2.IMREAD_COLOR)
    x = cv2.resize(x, (W, H))
    x = x / 255.0
    x = x.astype(np.float32)
    return x

'''This function is used to read masks.'''
# function to read mask
def read_mask(x):
    x = cv2.imread(x, cv2.IMREAD_GRAYSCALE)
    x = cv2.resize(x, (W, H), interpolation=cv2.INTER_NEAREST)  # nearest-neighbour avoids blending label values
    x = x.astype(np.int32)
    return x

'''This function builds the tensorflow data pipeline. 
Each (image path, mask path) pair is mapped through the function 'preprocess'.'''
# function for tensorflow dataset pipeline
def tf_dataset(x, y, batch=8):
    dataset = tf.data.Dataset.from_tensor_slices((x, y))
    dataset = dataset.shuffle(buffer_size=5000)
    dataset = dataset.map(preprocess)
    dataset = dataset.batch(batch)
    dataset = dataset.repeat()
    dataset = dataset.prefetch(2)
    return dataset

'''This function takes an image path and a mask path, 
reads the image and the mask from those paths, 
and one-hot encodes the mask for multi-class segmentation (here 4 classes).'''
# function to read image and mask and create one hot encoding for mask
def preprocess(x, y):
    def f(x, y):
        x = x.decode()
        y = y.decode()

        image = read_image(x)
        mask = read_mask(y)

        return image, mask

    image, mask = tf.numpy_function(f, [x, y], [tf.float32, tf.int32])
    mask = tf.one_hot(mask, 4, dtype=tf.int32)
    image.set_shape([H, W, 3])
    mask.set_shape([H, W, 4])

    return image, mask
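
`tf.one_hot` only writes a 1 where the label equals a channel index in the range 0..3; any other value yields an all-zero vector. The NumPy sketch below mimics that behaviour to show why a mask needs to hold class ids, rather than raw grayscale intensities, before it is one-hot encoded. The helper name `one_hot_like_tf` and the sample pixel values are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def one_hot_like_tf(labels, depth):
    """NumPy stand-in for tf.one_hot: out-of-range labels become all-zero vectors."""
    labels = np.asarray(labels)
    return (labels[..., None] == np.arange(depth)).astype(np.int32)

class_ids = np.array([[0, 1], [2, 3]])      # already valid class indices
raw_gray = np.array([[0, 29], [76, 150]])   # made-up raw 0-255 intensities

print(one_hot_like_tf(class_ids, 4).sum(axis=-1))  # every pixel belongs to one class
print(one_hot_like_tf(raw_gray, 4).sum(axis=-1))   # pixels above 3 belong to no class
```

If the values returned by `read_mask` turn out to be raw intensities, remapping them to 0..3 before the `tf.one_hot` call is one thing to try.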

## Load the dataset

In [7]:
'''RENDER_IMAGE_DIR_PATH: ‘Path of image directory’
GROUND_MASK_DIR_PATH: ‘Path of mask directory’

Here load_data function is called. This will load the dataset paths and 
split it into X_train, X_test, y_train, y_test '''

RENDER_IMAGE_DIR_PATH = '../input/artificial-lunar-rocky-landscape-dataset/images/render'
GROUND_MASK_DIR_PATH = '../input/artificial-lunar-rocky-landscape-dataset/images/clean'

X_train, X_test, y_train, y_test = load_data(RENDER_IMAGE_DIR_PATH, GROUND_MASK_DIR_PATH)
print(f"Dataset:\n Train: {len(X_train)} \n Test: {len(X_test)}")

## Generate tensorflow data pipeline

In [7]:
batch_size = 8

'''Here the tf_dataset function is called to generate the tensorflow data pipeline.'''
# calling tf_dataset
train_dataset = tf_dataset(X_train, y_train, batch=batch_size)
valid_dataset = tf_dataset(X_test, y_test, batch=batch_size)

## Creating U-net Architecture

In [8]:
from tensorflow.keras.layers import Input, Conv2D, BatchNormalization, Activation, MaxPool2D, UpSampling2D, Concatenate
from tensorflow.keras.models import Model

'''conv_block creates one block of two convolution layers, 
each followed by BatchNormalization and a ReLU activation. 
If pooling is requested, MaxPool2D is applied and both the block output and the pooled output are returned; otherwise only the block output is returned.'''
# function to create convolution block
def conv_block(inputs, filters, pool=True):
    x = Conv2D(filters, 3, padding="same")(inputs)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)

    x = Conv2D(filters, 3, padding="same")(x)
    x = BatchNormalization()(x)
    x = Activation("relu")(x)

    if pool:
        p = MaxPool2D((2, 2))(x)
        return x, p
    else:
        return x

'''build_unet creates the U-net architecture.'''
# function to build U-net
def build_unet(shape, num_classes):
    inputs = Input(shape)

    """ Encoder """
    x1, p1 = conv_block(inputs, 16, pool=True)
    x2, p2 = conv_block(p1, 32, pool=True)
    x3, p3 = conv_block(p2, 48, pool=True)
    x4, p4 = conv_block(p3, 64, pool=True)

    """ Bridge """
    b1 = conv_block(p4, 128, pool=False)

    """ Decoder """
    u1 = UpSampling2D((2, 2), interpolation="bilinear")(b1)
    c1 = Concatenate()([u1, x4])
    x5 = conv_block(c1, 64, pool=False)

    u2 = UpSampling2D((2, 2), interpolation="bilinear")(x5)
    c2 = Concatenate()([u2, x3])
    x6 = conv_block(c2, 48, pool=False)

    u3 = UpSampling2D((2, 2), interpolation="bilinear")(x6)
    c3 = Concatenate()([u3, x2])
    x7 = conv_block(c3, 32, pool=False)

    u4 = UpSampling2D((2, 2), interpolation="bilinear")(x7)
    c4 = Concatenate()([u4, x1])
    x8 = conv_block(c4, 16, pool=False)

    """ Output layer """
    output = Conv2D(num_classes, 1, padding="same", activation="softmax")(x8)

    return Model(inputs, output)
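
To document model properties before and after a change, it helps to know the tensor shape at each stage of `build_unet`. The pure-Python sketch below traces height/width and channel counts using the filter sizes from the function above; the helper name `trace_unet` is hypothetical, and the trace ignores the final 1x1 output convolution.

```python
def trace_unet(size=256, encoder_filters=(16, 32, 48, 64), bridge_filters=128):
    """Return (stage, height/width, channels) tuples for the U-net defined above."""
    trace, skips = [], []
    for i, f in enumerate(encoder_filters, start=1):
        trace.append((f"x{i}", size, f))      # conv_block output kept as skip connection
        skips.append((size, f))
        size //= 2                            # MaxPool2D((2, 2)) halves height and width
    trace.append(("b1", size, bridge_filters))
    channels = bridge_filters
    for i, (skip_size, skip_f) in enumerate(reversed(skips), start=1):
        size *= 2                             # UpSampling2D((2, 2)) doubles height and width
        assert size == skip_size              # Concatenate needs matching spatial sizes
        trace.append((f"c{i}", size, channels + skip_f))  # channels add up on concat
        channels = skip_f                     # decoder conv_block reuses the skip's filter count
        trace.append((f"x{i + len(skips)}", size, channels))
    return trace

for stage, hw, ch in trace_unet():
    print(f"{stage}: {hw} x {hw} x {ch}")
```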

In [9]:
# calling build_unet function
model = build_unet((256, 256, 3), 4)

#printing model summary
model.summary()

## Load model and compile

In [10]:
# importing libraries
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, EarlyStopping, TensorBoard
from segmentation_models.metrics import iou_score
import datetime, os

""" Defining Hyperparameters """
img_shape = (256, 256, 3)
num_classes = 4
lr = 1e-4
batch_size = 16
epochs = 5

""" Model building and compiling """
model = build_unet(img_shape, num_classes)
model.compile(loss="categorical_crossentropy", 
              optimizer=tf.keras.optimizers.Adam(lr), 
              metrics=[iou_score])


train_steps = len(X_train)//batch_size
valid_steps = len(X_test)//batch_size
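
`steps_per_epoch` counts batches, so the share of the training set consumed per epoch depends on the batch size the `tf.data` pipeline was built with, not on the `batch_size` variable used in the division. A minimal sketch of that arithmetic follows; the helper name `epoch_coverage` is hypothetical and the sample count of 8000 is chosen only for illustration.

```python
def epoch_coverage(n_samples, steps_batch_size, pipeline_batch_size):
    """Return (steps_per_epoch, samples drawn per epoch, fraction of the dataset)."""
    steps = n_samples // steps_batch_size          # same formula as train_steps above
    samples_seen = steps * pipeline_batch_size     # each step draws one pipeline batch
    return steps, samples_seen, samples_seen / n_samples

print(epoch_coverage(8000, 16, 8))    # steps computed with 16, pipeline batches of 8
print(epoch_coverage(8000, 8, 8))     # both sizes equal
```

When the two batch sizes differ, one epoch covers only part of the data, or wraps around it, because the pipeline calls `repeat()`.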

## Train model

In [11]:
'''model.fit is used to train the model'''
model_history = model.fit(train_dataset,
        steps_per_epoch=train_steps,
        validation_data=valid_dataset,
        validation_steps=valid_steps,
        epochs=epochs,
    )
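
For reference, the IoU (intersection over union) score reported during training compares predicted and true masks class by class. The NumPy sketch below is a simplified stand-in, not the `segmentation_models` implementation: the function name `mean_iou`, the smoothing constant, and the toy arrays are assumptions made for illustration.

```python
import numpy as np

def mean_iou(y_true, y_pred, smooth=1e-5):
    """Mean per-class IoU for one-hot arrays shaped (batch, height, width, classes)."""
    axes = (0, 1, 2)                                   # sum over batch and pixels
    intersection = np.sum(y_true * y_pred, axis=axes)
    union = np.sum(y_true + y_pred, axis=axes) - intersection
    return float(np.mean((intersection + smooth) / (union + smooth)))

# toy example: one 1 x 4 image with two classes
true_ids = np.array([[[0, 0, 1, 1]]])
pred_ids = np.array([[[0, 1, 1, 1]]])
y_true = np.eye(2)[true_ids]                           # one-hot encode
y_pred = np.eye(2)[pred_ids]

print(round(mean_iou(y_true, y_pred), 4))
```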

In [12]:
acc = model_history.history['iou_score']
val_acc = model_history.history['val_iou_score']

loss = model_history.history['loss']
val_loss = model_history.history['val_loss']

plt.figure(figsize=(8, 8))
plt.subplot(2, 1, 1)
plt.plot(acc, label='Training IOU score')
plt.plot(val_acc, label='Validation IOU score')
plt.legend(loc='lower right')
plt.ylabel('IOU Score')
plt.ylim([min(plt.ylim()),1])
plt.title('Training and Validation IOU score with learning rate 1e-4')

plt.subplot(2, 1, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.ylabel('Cross Entropy')
plt.ylim([0,1.0])
plt.title('Training and Validation Loss with learning rate 1e-4')
plt.xlabel('epoch')
plt.show()
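
Because every improvement has to be reported numerically, it can help to pull the best validation score out of `model_history.history`, which is a dict of per-epoch lists. A small sketch, assuming the key names used in the plot above; the helper name `best_epoch` is hypothetical and the sample numbers are placeholders, not measured results.

```python
def best_epoch(history, key="val_iou_score"):
    """Return (1-based epoch, value) of the highest score stored under `key`."""
    values = history[key]
    index = max(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]

# placeholder history with made-up numbers, shaped like model_history.history
fake_history = {"iou_score": [0.10, 0.15, 0.18], "val_iou_score": [0.12, 0.19, 0.17]}
print(best_epoch(fake_history))
```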

# <center>Advanced ML pipeline with segmentation_models and Callbacks (W7S1) (Week 7 lecture reference)

In [8]:
import os
import numpy as np
from matplotlib import pyplot as plt

import cv2
import keras
from tqdm import tqdm

import tensorflow as tf
import glob
from PIL import Image
from sklearn.model_selection import train_test_split

from skimage.io import imread
from skimage.transform import resize
import math
from tensorflow.keras.utils import to_categorical, Sequence


import datetime

In [9]:
'''
Here the image and mask paths are listed in sorted order and 
split by position into X_train, y_train, X_valid, y_valid '''

img_dir = '../input/artificial-lunar-rocky-landscape-dataset/images/render'
mask_dir = '../input/artificial-lunar-rocky-landscape-dataset/images/clean'


# let's get the list of image paths and mask paths in sorted order from the given directory respectively
images = [os.path.join(img_dir, x) for x in sorted(os.listdir(img_dir))]
masks = [os.path.join(mask_dir, x) for x in sorted(os.listdir(mask_dir))]


# in this session, we will use our complete dataset
X_train = images[:8000]
y_train = masks[:8000]

X_valid = images[8000:]
y_valid = masks[8000:]

In [10]:
# Here, `x_set` is the list of paths to the images
# and `y_set` is the list of paths to the associated masks.
# https://www.tensorflow.org/api_docs/python/tf/keras/utils/Sequence
class LunarDataset(Sequence):

    def __init__(self, x_set, y_set, batch_size, dims, classes):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size
        self.img_height, self.img_width = dims
        self.classes = classes


    def __len__(self):
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
        
        # the last batch can be smaller than batch_size, so size the arrays from the slice
        # https://numpy.org/doc/stable/reference/generated/numpy.zeros.html
        xtr = np.zeros((len(batch_x), self.img_height, self.img_width, 3), dtype=np.float32)
        for count, filename in enumerate(batch_x):
            # crop to the top-left img_height x img_width window and scale to [0, 1]
            img = imread(filename)[:self.img_height, :self.img_width, :] / 255.0
            xtr[count] = img.astype(np.float32)
            
        ytr = np.zeros((len(batch_y), self.img_height, self.img_width, self.classes))
        for count, filename in enumerate(batch_y):
            # grayscale values are binned with // 0.07, then remapped to class ids 0..3
            mask = imread(filename, as_gray = True)[:self.img_height, :self.img_width] // 0.07
            mask[mask == 3] = 2
            mask[mask == 10] = 3
            
            # one hot encoding our masks using to_categorical
            # https://www.tensorflow.org/api_docs/python/tf/keras/utils/to_categorical
            mask = to_categorical(mask, num_classes = self.classes)
            ytr[count] = mask

        return xtr, ytr.astype(np.int32)


batch_size = 16
dims = (480, 480)
num_classes = 4

train_dataset = LunarDataset(X_train, y_train, batch_size, dims, num_classes)
valid_dataset = LunarDataset(X_valid, y_valid, batch_size, dims, num_classes)
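
The `// 0.07` step in `__getitem__` relies on how `imread(..., as_gray=True)` turns colours into luminance. The sketch below assumes that the masks use pure black, blue, red and green, and that the grayscale conversion uses the weights 0.2125, 0.7154 and 0.0721 for R, G and B (the `skimage` `rgb2gray` weights). Both points are assumptions to check against the actual files, and `gray_to_class` is a hypothetical helper name.

```python
import numpy as np

WEIGHTS = np.array([0.2125, 0.7154, 0.0721])   # assumed R, G, B luminance weights

def gray_to_class(rgb):
    """Map RGB colours in [0, 1] to class ids using the same binning as __getitem__."""
    gray = np.asarray(rgb, dtype=float) @ WEIGHTS
    bins = gray // 0.07                         # black -> 0, blue -> 1, red -> 3, green -> 10
    bins[bins == 3] = 2
    bins[bins == 10] = 3
    return bins.astype(int)

colours = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]])  # black, blue, red, green
print(gray_to_class(colours))
```

Any other mask colour would land in a bin outside 0..3, so it is worth running `np.unique` on a few binned masks before relying on this mapping.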

In [11]:
sam = next(iter(train_dataset))

sample = sam[1][1]

i, v = np.unique(sample, return_counts = True)
for a,b in zip(i,v):
    print(a," ", b)
    

fig, (a1, a2, a3, a4) = plt.subplots(1, 4, figsize = (20, 5))

a1.imshow(sample[:, :, 0])
a2.imshow(sample[:, :, 1])
a3.imshow(sample[:, :, 2])
a4.imshow(sample[:, :, 3])

plt.show()

# four channels showing different classes
# each channel has only 0 and 1 values

In [17]:
#### Step 1: Creating a base model 

IMG_SHAPE = (480, 480, 3)

# include_top=False specifies that we don't want to use the top layer (classifier)
base_model = tf.keras.applications.VGG16(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')




#### Step 2: Freezing the base

# It is important to freeze the convolutional base before you compile and train the model.
# Freezing prevents the weights in a given layer from being updated during training
# VGG16 has many layers, so setting the entire model's trainable flag to False will freeze all of them.

base_model.trainable = False

# Let's take a look at the base model architecture
base_model.summary()



#### Step 3: Adding the head

# inputs
inputs = tf.keras.Input(shape=(480, 480, 3))

# base with pretrained model
x = base_model(inputs, training=False)

# head layers
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.Dense(2)(x)

# model
model = tf.keras.Model(inputs, outputs)

# Let's take a look at the final model architecture
model.summary()


# reference: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

In [18]:
# run this command to directly install the library in our notebook

!pip install segmentation_models

In [19]:
# SM_FRAMEWORK has to be set before segmentation_models is imported;
# by default it tries to import keras, and if that is not installed it falls back to tensorflow.keras

os.environ["SM_FRAMEWORK"] = "tf.keras"
import segmentation_models as sm
sm.set_framework('tf.keras')
tf.keras.backend.set_image_data_format('channels_last')

In [20]:
BACKBONE = 'vgg16'
input_shape = (480, 480, 3)
n_classes = 4
activation = 'softmax'

# using segmentation_models to create U-net with vgg16 as a backbone
# and pretrained imagenet weights

# segmentation_models builds the expansion path (decoder) as a mirror of the backbone and attaches it to the contraction path (encoder)
model = sm.Unet(backbone_name = BACKBONE, 
                input_shape = input_shape, 
                classes = n_classes, 
                activation = activation,
                encoder_weights = 'imagenet')
model.summary()
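
U-net style models halve the feature maps several times in the encoder and double them again in the decoder, so the input height and width usually have to be divisible by 32 (five halvings). A quick check before building the model, with the helper name `fits_unet` and the factor 32 stated as assumptions about the chosen backbone:

```python
def fits_unet(input_shape, factor=32):
    """True if height and width are both divisible by `factor` (assumed 2 ** 5)."""
    height, width = input_shape[:2]
    return height % factor == 0 and width % factor == 0

print(fits_unet((480, 480, 3)))
print(fits_unet((500, 500, 3)))
```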

In [21]:
""" Hyperparameters """
lr = 1e-4
batch_size = 32
epochs = 10

# metrics for result validation
metrics = [sm.metrics.IOUScore(threshold=0.5), sm.metrics.FScore(threshold=0.5)]

# compiling the model
model.compile(loss = 'categorical_crossentropy', 
              optimizer = tf.keras.optimizers.Adam(lr), 
              metrics = metrics)

train_steps = len(X_train)//batch_size
valid_steps = len(X_valid)//batch_size


""" Callbacks """
current_datetime = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

# https://www.tensorflow.org/api_docs/python/tf/keras/callbacks
callbacks = [
        tf.keras.callbacks.ModelCheckpoint(filepath=f'models/lunarModel_{current_datetime}.h5',
                        monitor='val_iou_score', verbose=0, 
                        mode='max', save_best_only=True),  # valid argument name; keeps only the best val_iou_score checkpoint
             
        tf.keras.callbacks.ReduceLROnPlateau(monitor="val_iou_score", mode='max', patience=4,
                          factor=0.1, verbose=0, min_lr=1e-6),
             
        tf.keras.callbacks.EarlyStopping(monitor="val_iou_score", patience=5, verbose=0, mode='max'),

        tf.keras.callbacks.TensorBoard(f'models/logs_{current_datetime}')
    ]
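
The `ReduceLROnPlateau` and `EarlyStopping` callbacks above both count epochs without improvement, with patience 4 and 5 respectively. The pure-Python sketch below is a simplified simulation of that interplay, not the Keras implementation: it ignores `min_delta` and cooldown, the function name `simulate_callbacks` is hypothetical, and the score list is made up.

```python
def simulate_callbacks(val_scores, lr=1e-4, lr_patience=4, factor=0.1,
                       min_lr=1e-6, stop_patience=5):
    """Return (learning rate used in each epoch, epoch at which training stops or None)."""
    best_for_lr = best_for_stop = float("-inf")
    lr_wait = stop_wait = 0
    lrs = []
    for epoch, score in enumerate(val_scores, start=1):
        lrs.append(lr)                          # learning rate in effect during this epoch
        # ReduceLROnPlateau-like logic (mode='max')
        if score > best_for_lr:
            best_for_lr, lr_wait = score, 0
        else:
            lr_wait += 1
            if lr_wait >= lr_patience:
                lr = max(lr * factor, min_lr)   # applies from the next epoch
                lr_wait = 0
        # EarlyStopping-like logic (mode='max')
        if score > best_for_stop:
            best_for_stop, stop_wait = score, 0
        else:
            stop_wait += 1
            if stop_wait >= stop_patience:
                return lrs, epoch
    return lrs, None

made_up_scores = [0.5, 0.6] + [0.6] * 8         # plateau after epoch 2, 10 epochs in total
lrs, stopped_at = simulate_callbacks(made_up_scores)
print(lrs, stopped_at)
```

With this made-up plateau, the learning rate drops once after the fourth epoch without improvement, and early stopping ends training one epoch later, so the reduced rate gets very little time to help.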

In [2]:
# Fitting the model
model_history = model.fit(train_dataset,
        steps_per_epoch=train_steps,
        validation_data=valid_dataset,
        validation_steps=valid_steps,
        epochs=epochs,
        callbacks=callbacks
    )
# val_iou_score is expected to reach almost 0.8 after 5 epochs even without using callbacks

In [1]:
acc = model_history.history['iou_score']
val_acc = model_history.history['val_iou_score']

loss = model_history.history['loss']
val_loss = model_history.history['val_loss']

plt.figure(figsize=(8, 8))
plt.subplot(2, 1, 1)
plt.plot(acc, label='Training IOU score')
plt.plot(val_acc, label='Validation IOU score')
plt.legend(loc='lower right')
plt.ylabel('IOU Score')
plt.ylim([min(plt.ylim()),1])
plt.title('Training and Validation IOU score with learning rate 1e-4')

plt.subplot(2, 1, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.ylabel('Cross Entropy')
plt.ylim([0,1.0])
plt.title('Training and Validation Loss with learning rate 1e-4')
plt.xlabel('epoch')
plt.show()

In [None]:
""" Hyperparameters """
lr = 1e-3
batch_size = 32
epochs = 2

# metrics for result validation
metrics = [sm.metrics.IOUScore(threshold=0.5), sm.metrics.FScore(threshold=0.5)]

# compiling the model
model.compile(loss = 'categorical_crossentropy', 
              optimizer = tf.keras.optimizers.Adam(lr), 
              metrics = metrics)

train_steps = len(X_train)//batch_size
valid_steps = len(X_valid)//batch_size


""" Callbacks """
current_datetime = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

# https://www.tensorflow.org/api_docs/python/tf/keras/callbacks
callbacks = [
        tf.keras.callbacks.ModelCheckpoint(filepath=f'models/lunarModel_{current_datetime}.h5',
                        monitor='val_iou_score', verbose=0, 
                        mode='max', save_best_only=True),  # valid argument name; keeps only the best val_iou_score checkpoint
             
        tf.keras.callbacks.ReduceLROnPlateau(monitor="val_iou_score", mode='max', patience=4,
                          factor=0.1, verbose=0, min_lr=1e-6),
             
        tf.keras.callbacks.EarlyStopping(monitor="val_iou_score", patience=5, verbose=0, mode='max'),

        tf.keras.callbacks.TensorBoard(f'models/logs_{current_datetime}')
    ]

In [None]:
# Fitting the model
model_history = model.fit(train_dataset,
        steps_per_epoch=train_steps,
        validation_data=valid_dataset,
        validation_steps=valid_steps,
        epochs=epochs,
        callbacks=callbacks
    )
# val_iou_score is expected to reach almost 0.8 after 5 epochs even without using callbacks

In [None]:
""" Hyperparameters """
lr = 1e-5
batch_size = 32
epochs = 2

# metrics for result validation
metrics = [sm.metrics.IOUScore(threshold=0.5,smooth=1e-5), sm.metrics.FScore(threshold=0.5,smooth=1e-5)]

# compiling the model
model.compile(loss = 'categorical_crossentropy', 
              optimizer = tf.keras.optimizers.Adam(lr), 
              metrics = metrics)

train_steps = len(X_train)//batch_size
valid_steps = len(X_valid)//batch_size


""" Callbacks """
current_datetime = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

# https://www.tensorflow.org/api_docs/python/tf/keras/callbacks
callbacks = [
        tf.keras.callbacks.ModelCheckpoint(filepath=f'models/lunarModel_{current_datetime}.h5',
                        monitor='val_iou_score', verbose=0, 
                        mode='max', save_best_only=True),  # valid argument name; keeps only the best val_iou_score checkpoint
             
        tf.keras.callbacks.ReduceLROnPlateau(monitor="val_iou_score", mode='max', patience=4,
                          factor=0.1, verbose=0, min_lr=1e-6),
             
        tf.keras.callbacks.EarlyStopping(monitor="val_iou_score", patience=5, verbose=0, mode='max'),

        tf.keras.callbacks.TensorBoard(f'models/logs_{current_datetime}')
    ]

In [None]:
# Fitting the model
model_history = model.fit(train_dataset,
        steps_per_epoch=train_steps,
        validation_data=valid_dataset,
        validation_steps=valid_steps,
        epochs=epochs,
        callbacks=callbacks
    )
# val_iou_score is expected to reach almost 0.8 after 5 epochs even without using callbacks

After trying different learning rates and other parameters, I picked the best-performing settings and ran them on different backbone models.

# Using Resnet as backbone

In [22]:
#### Step 1: Creating a base model 

IMG_SHAPE = (480, 480, 3)

# include_top=False specifies that we don't want to use the top layer (classifier)
base_model = tf.keras.applications.ResNet50(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')




#### Step 2: Freezing the base

# It is important to freeze the convolutional base before you compile and train the model.
# Freezing prevents the weights in a given layer from being updated during training
# ResNet50 has many layers, so setting the entire model's trainable flag to False will freeze all of them.

base_model.trainable = False

# Let's take a look at the base model architecture
base_model.summary()



#### Step 3: Adding the head

# inputs
inputs = tf.keras.Input(shape=(480, 480, 3))

# base with pretrained model
x = base_model(inputs, training=False)

# head layers
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.Dense(2)(x)

# model
model = tf.keras.Model(inputs, outputs)

# Let's take a look at the final model architecture
model.summary()


# reference: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

In [23]:
BACKBONE = 'resnet34'
input_shape = (480, 480, 3)
n_classes = 4
activation = 'softmax'

# using segmentation_models to create U-net with resnet34 as a backbone
# and pretrained imagenet weights

# segmentation_models builds the expansion path (decoder) as a mirror of the backbone and attaches it to the contraction path (encoder)
model = sm.Unet(backbone_name = BACKBONE, 
                input_shape = input_shape, 
                classes = n_classes, 
                activation = activation,
                encoder_weights = 'imagenet')
model.summary()

In [24]:
""" Hyperparameters """
lr = 1e-4
batch_size = 32
epochs = 10

# metrics for result validation
metrics = [sm.metrics.IOUScore(threshold=0.5), sm.metrics.FScore(threshold=0.5)]

# compiling the model
model.compile(loss = 'categorical_crossentropy', 
              optimizer = tf.keras.optimizers.Adam(lr), 
              metrics = metrics)

train_steps = len(X_train)//batch_size
valid_steps = len(X_valid)//batch_size


""" Callbacks """
current_datetime = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

# https://www.tensorflow.org/api_docs/python/tf/keras/callbacks
callbacks = [
        tf.keras.callbacks.ModelCheckpoint(filepath=f'models/lunarModel_{current_datetime}.h5',
                        monitor='val_iou_score', verbose=0, 
                        mode='max', save_best_only=True),  # valid argument name; keeps only the best val_iou_score checkpoint
             
        tf.keras.callbacks.ReduceLROnPlateau(monitor="val_iou_score", mode='max', patience=4,
                          factor=0.1, verbose=0, min_lr=1e-6),
             
        tf.keras.callbacks.EarlyStopping(monitor="val_iou_score", patience=5, verbose=0, mode='max'),

        tf.keras.callbacks.TensorBoard(f'models/logs_{current_datetime}')
    ]

In [25]:
# Fitting the model
model_history = model.fit(train_dataset,
        steps_per_epoch=train_steps,
        validation_data=valid_dataset,
        validation_steps=valid_steps,
        epochs=epochs,
        callbacks=callbacks
    )
# val_iou_score is expected to reach almost 0.8 after 5 epochs even without using callbacks

In [26]:
model_history1=model_history.history
print(model_history1)

In [27]:
acc = model_history.history['iou_score']
val_acc = model_history.history['val_iou_score']

loss = model_history.history['loss']
val_loss = model_history.history['val_loss']

plt.figure(figsize=(8, 8))
plt.subplot(2, 1, 1)
plt.plot(acc, label='Training IOU score')
plt.plot(val_acc, label='Validation IOU score')
plt.legend(loc='lower right')
plt.ylabel('IOU Score')
plt.ylim([min(plt.ylim()),1])
plt.title('Training and Validation IOU score with learning rate 1e-4')

plt.subplot(2, 1, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.ylabel('Cross Entropy')
plt.ylim([0,1.0])
plt.title('Training and Validation Loss with learning rate 1e-4')
plt.xlabel('epoch')
plt.show()

In [None]:
BACKBONE = 'resnet50'
input_shape = (480, 480, 3)
n_classes = 4
activation = 'softmax'

# using segmentation_models to create U-net with resnet50 as a backbone
# and pretrained imagenet weights

# segmentation_models builds the expansion path (decoder) as a mirror of the backbone and attaches it to the contraction path (encoder)
model = sm.Unet(backbone_name = BACKBONE, 
                input_shape = input_shape, 
                classes = n_classes, 
                activation = activation,
                encoder_weights = 'imagenet')
model.summary()

In [None]:
""" Hyperparameters """
lr = 1e-4
batch_size = 32
epochs = 10

# metrics for result validation
metrics = [sm.metrics.IOUScore(threshold=0.5), sm.metrics.FScore(threshold=0.5)]

# compiling the model
model.compile(loss = 'categorical_crossentropy', 
              optimizer = tf.keras.optimizers.Adam(lr), 
              metrics = metrics)

train_steps = len(X_train)//batch_size
valid_steps = len(X_valid)//batch_size


""" Callbacks """
current_datetime = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

# https://www.tensorflow.org/api_docs/python/tf/keras/callbacks
callbacks = [
        tf.keras.callbacks.ModelCheckpoint(filepath=f'models/lunarModel_{current_datetime}.h5',
                        monitor='val_iou_score', verbose=0, 
                        mode='max', save_best_only=True),  # valid argument name; keeps only the best val_iou_score checkpoint
             
        tf.keras.callbacks.ReduceLROnPlateau(monitor="val_iou_score", mode='max', patience=4,
                          factor=0.1, verbose=0, min_lr=1e-6),
             
        tf.keras.callbacks.EarlyStopping(monitor="val_iou_score", patience=5, verbose=0, mode='max'),

        tf.keras.callbacks.TensorBoard(f'models/logs_{current_datetime}')
    ]

In [None]:
# Fitting the model
model_history = model.fit(train_dataset,
        steps_per_epoch=train_steps,
        validation_data=valid_dataset,
        validation_steps=valid_steps,
        epochs=epochs,
        callbacks=callbacks
    )
# val_iou_score is expected to reach almost 0.8 after 5 epochs even without using callbacks

In [None]:
model_history2=model_history.history
print(model_history2)

In [None]:
acc = model_history.history['iou_score']
val_acc = model_history.history['val_iou_score']

loss = model_history.history['loss']
val_loss = model_history.history['val_loss']

plt.figure(figsize=(8, 8))
plt.subplot(2, 1, 1)
plt.plot(acc, label='Training IOU score')
plt.plot(val_acc, label='Validation IOU score')
plt.legend(loc='lower right')
plt.ylabel('IOU Score')
plt.ylim([min(plt.ylim()),1])
plt.title('Training and Validation IOU score with learning rate 1e-4')

plt.subplot(2, 1, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.ylabel('Cross Entropy')
plt.ylim([0,1.0])
plt.title('Training and Validation Loss with learning rate 1e-4')
plt.xlabel('epoch')
plt.show()

# Using Inception as backbone

In [None]:
BACKBONE = 'inceptionv3'
input_shape = (480, 480, 3)
n_classes = 4
activation = 'softmax'

# using segmentation_models to create U-net with inceptionv3 as a backbone
# and pretrained imagenet weights

# segmentation_models builds the expansion path (decoder) as a mirror of the backbone and attaches it to the contraction path (encoder)
model = sm.Unet(backbone_name = BACKBONE, 
                input_shape = input_shape, 
                classes = n_classes, 
                activation = activation,
                encoder_weights = 'imagenet')
model.summary()

In [None]:
""" Hyperparameters """
lr = 1e-4
batch_size = 32
epochs = 10

# metrics for result validation
metrics = [sm.metrics.IOUScore(threshold=0.5), sm.metrics.FScore(threshold=0.5)]

# compiling the model
model.compile(loss = 'categorical_crossentropy', 
              optimizer = tf.keras.optimizers.Adam(lr), 
              metrics = metrics)

train_steps = len(X_train)//batch_size
valid_steps = len(X_valid)//batch_size


""" Callbacks """
current_datetime = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

# https://www.tensorflow.org/api_docs/python/tf/keras/callbacks
callbacks = [
        tf.keras.callbacks.ModelCheckpoint(filepath=f'models/lunarModel_{current_datetime}.h5',
                        monitor='val_iou_score', verbose=0, 
                        mode='max', save_best_only=True),  # valid argument name; keeps only the best val_iou_score checkpoint
             
        tf.keras.callbacks.ReduceLROnPlateau(monitor="val_iou_score", mode='max', patience=4,
                          factor=0.1, verbose=0, min_lr=1e-6),
             
        tf.keras.callbacks.EarlyStopping(monitor="val_iou_score", patience=5, verbose=0, mode='max'),

        tf.keras.callbacks.TensorBoard(f'models/logs_{current_datetime}')
    ]

In [None]:
# Fitting the model
model_history = model.fit(train_dataset,
        steps_per_epoch=train_steps,
        validation_data=valid_dataset,
        validation_steps=valid_steps,
        epochs=epochs,
        callbacks=callbacks
    )
# val_iou_score is expected to reach almost 0.8 after 5 epochs even without using callbacks

In [None]:
model_history3=model_history.history
print(model_history3)

In [None]:
acc = model_history.history['iou_score']
val_acc = model_history.history['val_iou_score']

loss = model_history.history['loss']
val_loss = model_history.history['val_loss']

plt.figure(figsize=(8, 8))
plt.subplot(2, 1, 1)
plt.plot(acc, label='Training IOU score')
plt.plot(val_acc, label='Validation IOU score')
plt.legend(loc='lower right')
plt.ylabel('IOU Score')
plt.ylim([min(plt.ylim()),1])
plt.title('Training and Validation IOU score with learning rate 1e-4')

plt.subplot(2, 1, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.ylabel('Cross Entropy')
plt.ylim([0,1.0])
plt.title('Training and Validation Loss with learning rate 1e-4')
plt.xlabel('epoch')
plt.show()

In [None]:
BACKBONE = 'inceptionresnetv2'
input_shape = (480, 480, 3)
n_classes = 4
activation = 'softmax'

# using segmentation_models to create a U-Net with InceptionResNetV2 as the backbone
# and pretrained ImageNet weights

# segmentation_models uses the backbone as the contraction path and builds a mirrored expansion path on top of it
model = sm.Unet(backbone_name = BACKBONE, 
                input_shape = input_shape, 
                classes = n_classes, 
                activation = activation,
                encoder_weights = 'imagenet')
model.summary()

In [None]:
""" Hyperparameters """
lr = 1e-4
batch_size = 32
epochs = 10

# metrics for result validation
metrics = [sm.metrics.IOUScore(threshold=0.5), sm.metrics.FScore(threshold=0.5)]

# compiling the model
model.compile(loss = 'categorical_crossentropy', 
              optimizer = tf.keras.optimizers.Adam(lr), 
              metrics = metrics)

train_steps = len(X_train)//batch_size
valid_steps = len(X_valid)//batch_size


""" Callbacks """
current_datetime = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

# https://www.tensorflow.org/api_docs/python/tf/keras/callbacks
callbacks = [
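        # Write a checkpoint after every epoch (the same file is overwritten each time);
        # set save_best_only=True to keep only the epoch with the best val_iou_score.
        tf.keras.callbacks.ModelCheckpoint(filepath=f'models/lunarModel_{current_datetime}.h5',
                        monitor='val_iou_score', verbose=0, 
                        mode='max', save_best_only=False),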
             
        tf.keras.callbacks.ReduceLROnPlateau(monitor="val_iou_score", mode='max', patience=4,
                          factor=0.1, verbose=0, min_lr=1e-6),
             
#         tf.keras.callbacks.EarlyStopping(monitor="val_iou_score", patience=5, verbose=0, mode='max'),
        tf.keras.callbacks.EarlyStopping(monitor="val_iou_score", patience=1, verbose=0, mode='max',min_delta=0.01),

        tf.keras.callbacks.TensorBoard(f'models/logs_{current_datetime}')
    ]
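
The `IOUScore(threshold=0.5)` metric configured above binarises the predicted probabilities before comparing them with the ground-truth mask. A minimal pure-Python sketch of that idea for a single class follows. It is an illustration under stated assumptions, not the `segmentation_models` implementation (which also applies smoothing and averages over classes); the helper `iou_score` and the six-pixel example are hypothetical.

```python
def iou_score(y_true, y_pred, threshold=0.5):
    """Intersection over union for one class, on flat lists of pixel values.

    y_true holds 0/1 ground-truth labels; y_pred holds predicted
    probabilities, binarised at `threshold` before counting.
    """
    pred = [1 if p > threshold else 0 for p in y_pred]
    intersection = sum(t * p for t, p in zip(y_true, pred))
    union = sum(y_true) + sum(pred) - intersection
    # Convention chosen for this sketch: an empty union counts as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Hypothetical 6-pixel mask for a single class.
truth = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
print(iou_score(truth, probs))  # 2 overlapping pixels / 4 pixels in the union = 0.5
```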

In [None]:
# Fitting the model
model_history = model.fit(train_dataset,
        steps_per_epoch=train_steps,
        validation_data=valid_dataset,
        validation_steps=valid_steps,
        epochs=epochs,
        callbacks=callbacks
    )
# val_iou_score is expected to reach almost 0.8 after 5 epochs even without using callbacks

In [None]:
model_history4=model_history.history
print(model_history4)

In [None]:
acc = model_history.history['iou_score']
val_acc = model_history.history['val_iou_score']

loss = model_history.history['loss']
val_loss = model_history.history['val_loss']

plt.figure(figsize=(8, 8))
plt.subplot(2, 1, 1)
plt.plot(acc, label='Training IOU score')
plt.plot(val_acc, label='Validation IOU score')
plt.legend(loc='lower right')
plt.ylabel('IOU Score')
plt.ylim([min(plt.ylim()),1])
plt.title('Training and Validation IOU score with learning rate 1e-4')

plt.subplot(2, 1, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.ylabel('Cross Entropy')
plt.ylim([0,1.0])
plt.title('Training and Validation Loss with learning rate 1e-4')
plt.xlabel('epoch')
plt.show()

## [IMPORTANT] Paste your final model training history here in the markdown (just double-click this line and you'll be able to edit it).

NOTE: If we find that your actual model score and what you paste here is differing, your assignment will get rejected.  

here ----

# Across all these models, the achieved val_iou_score is between 0.80 and 0.82
Stable, incremental improvement across epochs was observed with InceptionResNetv2:

Epoch 1/10
250/250 [==============================] - 393s 1s/step - loss: 0.6413 - iou_score: 0.4547 - f1-score: 0.5372 - val_loss: 0.3477 - val_iou_score: 0.6490 - val_f1-score: 0.7417
Epoch 2/10
250/250 [==============================] - 370s 1s/step - loss: 0.2104 - iou_score: 0.7187 - f1-score: 0.8082 - val_loss: 0.1596 - val_iou_score: 0.7190 - val_f1-score: 0.8063
Epoch 3/10
250/250 [==============================] - 367s 1s/step - loss: 0.1368 - iou_score: 0.7788 - f1-score: 0.8590 - val_loss: 0.1148 - val_iou_score: 0.7886 - val_f1-score: 0.8659
Epoch 4/10
250/250 [==============================] - 327s 1s/step - loss: 0.1142 - iou_score: 0.7974 - f1-score: 0.8736 - val_loss: 0.1020 - val_iou_score: 0.8077 - val_f1-score: 0.8808
Epoch 5/10
250/250 [==============================] - 327s 1s/step - loss: 0.1024 - iou_score: 0.8052 - f1-score: 0.8790 - val_loss: 0.0998 - val_iou_score: 0.8025 - val_f1-score: 0.8767
Epoch 6/10
250/250 [==============================] - 328s 1s/step - loss: 0.0985 - iou_score: 0.8144 - f1-score: 0.8861 - val_loss: 0.0993 - val_iou_score: 0.7735 - val_f1-score: 0.8536
Epoch 7/10
250/250 [==============================] - 329s 1s/step - loss: 0.0875 - iou_score: 0.8268 - f1-score: 0.8947 - val_loss: 0.0893 - val_iou_score: 0.8186 - val_f1-score: 0.8882
Epoch 8/10
250/250 [==============================] - 367s 1s/step - loss: 0.0820 - iou_score: 0.8319 - f1-score: 0.8987 - val_loss: 0.0904 - val_iou_score: 0.8121 - val_f1-score: 0.8832
Epoch 9/10
250/250 [==============================] - 368s 1s/step - loss: 0.0748 - iou_score: 0.8427 - f1-score: 0.9067 - val_loss: 0.0844 - val_iou_score: 0.8228 - val_f1-score: 0.8918
Epoch 10/10
250/250 [==============================] - 328s 1s/step - loss: 0.0735 - iou_score: 0.8430 - f1-score: 0.9064 - val_loss: 0.0938 - val_iou_score: 0.7771 - val_f1-score: 0.8568


I first ran some experiments that tweaked the hyperparameters and checked the results over very few epochs. Changing these parameters did not move the results noticeably toward the desired score.

Based on those experiments, I went on to test several standard models (ResNet and Inception) as backbones, applied transfer learning, and compared the results.

ResNet is a stable model that is good for feature extraction.
Inception is a very deep network that uses computing resources efficiently and is well known for its accuracy.

The results from this mix of models can be seen in the code cells above.

Observations:
* For 10 epochs, almost all models give a val_iou_score of nearly 0.8.
* InceptionResNetv2 scored better in the early epochs than the other models.
* InceptionResNetv2 was run for another 10 epochs with the `min_delta` parameter set in `EarlyStopping`.
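
To show what `min_delta` changes in early stopping, here is a simplified pure-Python sketch of the stopping rule in `max` mode. It is a re-implementation of the idea for illustration, not the Keras source, and the function name `early_stop_epoch` is hypothetical. The scores are the `val_iou_score` values from the first InceptionResNetv2 run recorded in this notebook.

```python
def early_stop_epoch(scores, patience=1, min_delta=0.01):
    """Return the 1-based epoch at which training would stop, or None.

    Simplified 'max'-mode rule: an epoch counts as an improvement only if
    it beats the best score so far by more than `min_delta`.
    """
    best = float("-inf")
    wait = 0
    for epoch, score in enumerate(scores, start=1):
        if score - min_delta > best:
            best = score
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# val_iou_score per epoch, copied from the first InceptionResNetv2 run.
val_iou = [0.6490, 0.7190, 0.7886, 0.8077, 0.8025,
           0.7735, 0.8186, 0.8121, 0.8228, 0.7771]

print(early_stop_epoch(val_iou, patience=1, min_delta=0.01))  # 5
print(early_stop_epoch(val_iou, patience=5, min_delta=0.0))   # None
```

Under this simplified rule, `patience=1` with `min_delta=0.01` would have ended that run at epoch 5, while `patience=5` without `min_delta` would have let all 10 epochs run.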

# 10 epochs of ResNet

Epoch 1/10
250/250 [==============================] - 376s 1s/step - loss: 0.8468 - iou_score: 0.3734 - f1-score: 0.4427 - val_loss: 1.3447 - val_iou_score: 0.0019 - val_f1-score: 0.0038
Epoch 2/10
250/250 [==============================] - 331s 1s/step - loss: 0.3897 - iou_score: 0.6389 - f1-score: 0.7390 - val_loss: 0.6531 - val_iou_score: 0.2620 - val_f1-score: 0.3393
Epoch 3/10
250/250 [==============================] - 308s 1s/step - loss: 0.2275 - iou_score: 0.7300 - f1-score: 0.8190 - val_loss: 0.2898 - val_iou_score: 0.4909 - val_f1-score: 0.5403
Epoch 4/10
250/250 [==============================] - 269s 1s/step - loss: 0.1560 - iou_score: 0.7684 - f1-score: 0.8505 - val_loss: 0.1811 - val_iou_score: 0.6150 - val_f1-score: 0.6971
Epoch 5/10
250/250 [==============================] - 261s 1s/step - loss: 0.1242 - iou_score: 0.7843 - f1-score: 0.8635 - val_loss: 0.1156 - val_iou_score: 0.7714 - val_f1-score: 0.8525
Epoch 6/10
250/250 [==============================] - 246s 981ms/step - loss: 0.1090 - iou_score: 0.7948 - f1-score: 0.8714 - val_loss: 0.1164 - val_iou_score: 0.7715 - val_f1-score: 0.8518
Epoch 7/10
250/250 [==============================] - 256s 1s/step - loss: 0.0994 - iou_score: 0.8038 - f1-score: 0.8783 - val_loss: 0.0997 - val_iou_score: 0.7979 - val_f1-score: 0.8732
Epoch 8/10
250/250 [==============================] - 260s 1s/step - loss: 0.0902 - iou_score: 0.8137 - f1-score: 0.8859 - val_loss: 0.1022 - val_iou_score: 0.7556 - val_f1-score: 0.8386
Epoch 9/10
250/250 [==============================] - 297s 1s/step - loss: 0.0842 - iou_score: 0.8188 - f1-score: 0.8893 - val_loss: 0.0917 - val_iou_score: 0.8090 - val_f1-score: 0.8810
Epoch 10/10
250/250 [==============================] - 297s 1s/step - loss: 0.0779 - iou_score: 0.8267 - f1-score: 0.8953 - val_loss: 0.1012 - val_iou_score: 0.7576 - val_f1-score: 0.8385

# First 10 epochs of InceptionResNetv2

Epoch 1/10
250/250 [==============================] - 393s 1s/step - loss: 0.6413 - iou_score: 0.4547 - f1-score: 0.5372 - val_loss: 0.3477 - val_iou_score: 0.6490 - val_f1-score: 0.7417
Epoch 2/10
250/250 [==============================] - 370s 1s/step - loss: 0.2104 - iou_score: 0.7187 - f1-score: 0.8082 - val_loss: 0.1596 - val_iou_score: 0.7190 - val_f1-score: 0.8063
Epoch 3/10
250/250 [==============================] - 367s 1s/step - loss: 0.1368 - iou_score: 0.7788 - f1-score: 0.8590 - val_loss: 0.1148 - val_iou_score: 0.7886 - val_f1-score: 0.8659
Epoch 4/10
250/250 [==============================] - 327s 1s/step - loss: 0.1142 - iou_score: 0.7974 - f1-score: 0.8736 - val_loss: 0.1020 - val_iou_score: 0.8077 - val_f1-score: 0.8808
Epoch 5/10
250/250 [==============================] - 327s 1s/step - loss: 0.1024 - iou_score: 0.8052 - f1-score: 0.8790 - val_loss: 0.0998 - val_iou_score: 0.8025 - val_f1-score: 0.8767
Epoch 6/10
250/250 [==============================] - 328s 1s/step - loss: 0.0985 - iou_score: 0.8144 - f1-score: 0.8861 - val_loss: 0.0993 - val_iou_score: 0.7735 - val_f1-score: 0.8536
Epoch 7/10
250/250 [==============================] - 329s 1s/step - loss: 0.0875 - iou_score: 0.8268 - f1-score: 0.8947 - val_loss: 0.0893 - val_iou_score: 0.8186 - val_f1-score: 0.8882
Epoch 8/10
250/250 [==============================] - 367s 1s/step - loss: 0.0820 - iou_score: 0.8319 - f1-score: 0.8987 - val_loss: 0.0904 - val_iou_score: 0.8121 - val_f1-score: 0.8832
Epoch 9/10
250/250 [==============================] - 368s 1s/step - loss: 0.0748 - iou_score: 0.8427 - f1-score: 0.9067 - val_loss: 0.0844 - val_iou_score: 0.8228 - val_f1-score: 0.8918
Epoch 10/10
250/250 [==============================] - 328s 1s/step - loss: 0.0735 - iou_score: 0.8430 - f1-score: 0.9064 - val_loss: 0.0938 - val_iou_score: 0.7771 - val_f1-score: 0.8568

# Next 10 epochs with early stopping (8 epochs reached)

Epoch 1/10
250/250 [==============================] - 391s 1s/step - loss: 0.0697 - iou_score: 0.8466 - f1-score: 0.9088 - val_loss: 0.0931 - val_iou_score: 0.8057 - val_f1-score: 0.8794
Epoch 2/10
250/250 [==============================] - 369s 1s/step - loss: 0.0750 - iou_score: 0.8393 - f1-score: 0.9044 - val_loss: 0.0869 - val_iou_score: 0.8110 - val_f1-score: 0.8829
Epoch 3/10
250/250 [==============================] - 367s 1s/step - loss: 0.0625 - iou_score: 0.8544 - f1-score: 0.9146 - val_loss: 0.0872 - val_iou_score: 0.8249 - val_f1-score: 0.8932
Epoch 4/10
250/250 [==============================] - 329s 1s/step - loss: 0.0605 - iou_score: 0.8556 - f1-score: 0.9155 - val_loss: 0.1005 - val_iou_score: 0.8225 - val_f1-score: 0.8913
Epoch 5/10
250/250 [==============================] - 368s 1s/step - loss: 0.0615 - iou_score: 0.8529 - f1-score: 0.9135 - val_loss: 0.0933 - val_iou_score: 0.8029 - val_f1-score: 0.8765
Epoch 6/10
250/250 [==============================] - 327s 1s/step - loss: 0.0570 - iou_score: 0.8580 - f1-score: 0.9167 - val_loss: 0.0933 - val_iou_score: 0.8096 - val_f1-score: 0.8815
Epoch 7/10
250/250 [==============================] - 368s 1s/step - loss: 0.0488 - iou_score: 0.8690 - f1-score: 0.9241 - val_loss: 0.1256 - val_iou_score: 0.7895 - val_f1-score: 0.8666
Epoch 8/10
250/250 [==============================] - 368s 1s/step - loss: 0.0471 - iou_score: 0.8678 - f1-score: 0.9227 - val_loss: 0.0908 - val_iou_score: 0.8294 - val_f1-score: 0.8962
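
The pasted logs can be summarised programmatically instead of being read line by line. The sketch below pulls the best `val_iou_score` and its epoch out of Keras console output using regular expressions. The helper `best_val_iou` is hypothetical, and it assumes every `Epoch N/M` header is followed by exactly one end-of-epoch line containing `val_iou_score`. The sample text is the last three epochs of the ResNet run.

```python
import re

# "Epoch 9/10" headers and the validation IoU printed at the end of each epoch.
EPOCH_RE = re.compile(r"^Epoch (\d+)/\d+", re.MULTILINE)
VAL_IOU_RE = re.compile(r"val_iou_score: (\d+\.\d+)")


def best_val_iou(log_text):
    """Return (epoch, score) for the highest val_iou_score in the log."""
    epochs = [int(e) for e in EPOCH_RE.findall(log_text)]
    scores = [float(s) for s in VAL_IOU_RE.findall(log_text)]
    return max(zip(epochs, scores), key=lambda pair: pair[1])


# Last three epochs of the ResNet run, copied from the log in this notebook.
sample = """\
Epoch 8/10
250/250 [==============================] - 260s 1s/step - loss: 0.0902 - iou_score: 0.8137 - f1-score: 0.8859 - val_loss: 0.1022 - val_iou_score: 0.7556 - val_f1-score: 0.8386
Epoch 9/10
250/250 [==============================] - 297s 1s/step - loss: 0.0842 - iou_score: 0.8188 - f1-score: 0.8893 - val_loss: 0.0917 - val_iou_score: 0.8090 - val_f1-score: 0.8810
Epoch 10/10
250/250 [==============================] - 297s 1s/step - loss: 0.0779 - iou_score: 0.8267 - f1-score: 0.8953 - val_loss: 0.1012 - val_iou_score: 0.7576 - val_f1-score: 0.8385
"""

print(best_val_iou(sample))  # (9, 0.809)
```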