# Knowledge Distillation on the CIFAR-10 Dataset Using VGG16 Pretrained on ImageNet as the Teacher

## Summary:

### Key Points:
In this notebook we implement some of the extensions discussed in one of our previous notebooks on knowledge distillation. The first extension replaces the teacher network with VGG16 pretrained on the ImageNet dataset. We first test on the CIFAR-10 dataset to see whether what the VGG16 teacher learned from ImageNet transfers to classifying the CIFAR-10 classes. To compare against our previous teacher (which was just an enlarged neural network), we also retrain that model and show its results.

### Terminology that we use throughout this section:
The "Teacher" network is a large and complex neural network that takes a long time to train.

The "Student" network is a small neural network that learns from the output of the "Teacher" network. This is the network we are trying to optimize throughout the distillation process, because it has far fewer parameters, which leads to a reasonable execution time for production products (such as real-time object recognition camera systems).

The "Control" network is a small neural network with exactly the same architecture as the student network. However, we train this network independently, so it learns nothing from the teacher model and serves as a baseline for checking whether knowledge distillation really makes a difference.

### The general algorithm that we follow for Knowledge Distillation:
1. Define the teacher and student neural network models (the teacher model should be much larger, with more parameters to train).
2. Train the teacher model using ordinary methods (this will most likely be time consuming, depending on how large the teacher model is).
3. Train the student model on the "blurred" (softened) class-probability outputs of the teacher model. We achieve this blurring by dividing the logits by a temperature value before the softmax. Note that the higher the temperature, the softer the probability distribution over the classes.
4. Apply this same high temperature to the student's outputs while it is being trained.
5. Set the temperature back to 1 when evaluating the student model.
6. Train a control network (same architecture as the student) using ordinary training techniques to get a comparison.
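
The effect of the temperature in step 3 can be sketched in plain NumPy. This is only an illustration, not the notebook's training code: the helper name `softmax_with_temperature` and the logit values are made up for the example.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of (logits / temperature), computed in a numerically stable way."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()          # subtracting the max does not change the result
    exp = np.exp(scaled)
    return exp / exp.sum()

# Hypothetical logits for a 4-class problem (illustrative values only)
example_logits = [6.0, 2.0, 1.0, -1.0]

for t in (1, 4, 12):
    probs = softmax_with_temperature(example_logits, t)
    print(f"T={t:>2}: {np.round(probs, 3)}")
```

As the temperature grows, the largest probability shrinks and the smaller ones grow, while the ranking of the classes stays the same. That extra mass on the non-argmax classes is the information the student is meant to pick up from the teacher.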

### Why do we use Knowledge Distillation?:
- We use knowledge distillation to get better results from smaller networks that take far less time and computing power to run. These small, high-performing networks are well suited to marketable products that must work in real time.

In [2]:
#General Imports
import numpy as np

#All Imports for Tensorflow/Keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt

In [3]:
# Importing the cifar10 dataset now
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

# Normalize data
x_train = x_train.astype("float32") / 255.0
x_train = np.reshape(x_train, (-1, 32, 32, 3))

x_test = x_test.astype("float32") / 255.0
x_test = np.reshape(x_test, (-1, 32, 32, 3))

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


In [4]:
# Create the teacher
teacher = keras.Sequential(
    [
        keras.Input(shape=(32, 32, 3)),
        layers.Conv2D(256, (3, 3), strides=(2, 2), padding="same"),
        layers.LeakyReLU(alpha=0.2),
        layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding="same"),
        layers.Conv2D(512, (3, 3), strides=(2, 2), padding="same"),
        layers.Flatten(),
        layers.Dense(10),
    ],
    name="teacher",
)

# Create the student
student = keras.Sequential(
    [
        keras.Input(shape=(32, 32, 3)),
        layers.Conv2D(16, (3, 3), strides=(2, 2), padding="same"),
        layers.LeakyReLU(alpha=0.2),
        layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding="same"),
        layers.Conv2D(32, (3, 3), strides=(2, 2), padding="same"),
        layers.Flatten(),
        layers.Dense(10),
    ],
    name="student",
)

# Clone the student for the control model
controlModel = keras.models.clone_model(student)
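
To get a feel for the size gap between the two architectures above, the parameter counts can be worked out by hand. The sketch below assumes the standard counting rules (a convolution has `kernel * kernel * in_channels * out_channels` weights plus one bias per filter, a dense layer has `in * out` weights plus one bias per unit) and that the two stride-2 convolutions with `"same"` padding reduce the 32x32 input to 8x8. The helper names are hypothetical and not part of Keras.

```python
def conv2d_params(in_channels, out_channels, kernel=3):
    # weights + one bias per output filter
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(in_features, out_features):
    # weights + one bias per output unit
    return in_features * out_features + out_features

def count_params(filters1, filters2, in_channels=3, final_spatial=8, classes=10):
    """Parameter count for Conv -> LeakyReLU -> MaxPool -> Conv -> Flatten -> Dense.

    LeakyReLU, MaxPooling2D and Flatten have no trainable parameters.
    """
    conv1 = conv2d_params(in_channels, filters1)
    conv2 = conv2d_params(filters1, filters2)
    flattened = final_spatial * final_spatial * filters2
    return conv1 + conv2 + dense_params(flattened, classes)

teacher_params = count_params(256, 512)
student_params = count_params(16, 32)
print(f"teacher: {teacher_params:,} parameters")
print(f"student: {student_params:,} parameters")
print(f"ratio:   {teacher_params / student_params:.1f}x")
```

Calling `teacher.summary()` and `student.summary()` in the notebook is the authoritative way to check these numbers. The hand count is only meant to show where the difference comes from.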

In [5]:
class distillationModel(keras.Model):
    def __init__(self, student, teacher):
        super(distillationModel, self).__init__()
        self.teacher = teacher
        self.student = student

    def compile(
        self,
        optimizer,
        metrics,
        student_loss_fn,
        distillation_loss_fn,
        alpha=0.5,
        temperature=12,
    ):
        super(distillationModel, self).compile(optimizer=optimizer, metrics=metrics)
        self.student_loss_fn = student_loss_fn
        self.distillation_loss_fn = distillation_loss_fn
        self.alpha = alpha
        self.temperature = temperature

    #This method does a forward pass of the "student" and "teacher".  Only student weights are updated though
    #thus we only calculate gradients for the "student"
    def train_step(self, data):
        x, y = data
        teacher_predictions = self.teacher(x, training=False)  #"teachers forward pass"

        with tf.GradientTape() as tape:
            student_predictions = self.student(x, training=True)  #"students forward pass"
            student_loss = self.student_loss_fn(y, student_predictions)  #student and distillation losses
            distillation_loss = self.distillation_loss_fn(
                tf.nn.softmax(teacher_predictions / self.temperature, axis=1),
                tf.nn.softmax(student_predictions / self.temperature, axis=1),
            )
            loss = self.alpha * student_loss + (1 - self.alpha) * distillation_loss
        trainable_vars = self.student.trainable_variables     #student gradient
        gradients = tape.gradient(loss, trainable_vars)

        self.optimizer.apply_gradients(zip(gradients, trainable_vars)) #updating the weights with the gradient applied
        
        self.compiled_metrics.update_state(y, student_predictions)  #updating the metrics

        results = {m.name: m.result() for m in self.metrics}       #returning the performance currently in a dictionary
        results.update(
            {"student_loss": student_loss, "distillation_loss": distillation_loss}
        )
        return results

    #In this function the student model is evaluated on the current dataset
    def test_step(self, data):
        x, y = data
        y_prediction = self.student(x, training=False) 
        student_loss = self.student_loss_fn(y, y_prediction)   
        self.compiled_metrics.update_state(y, y_prediction)
        results = {m.name: m.result() for m in self.metrics}
        results.update({"student_loss": student_loss})
        return results
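
The loss computed inside `train_step` can be reproduced on a tiny batch with NumPy to see how its two terms combine. This is a sketch under stated assumptions: the logits and labels are made up, the function name `distillation_objective` is hypothetical, and the KL term omits the small clipping that Keras applies internally.

```python
import numpy as np

def softmax(z, temperature=1.0):
    z = np.asarray(z, dtype=np.float64) / temperature
    z = z - z.max(axis=-1, keepdims=True)     # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_objective(student_logits, teacher_logits, labels,
                           alpha=0.5, temperature=12.0):
    # Hard-label term: sparse categorical cross-entropy on the unscaled student logits
    student_probs = softmax(student_logits)
    rows = np.arange(len(labels))
    student_loss = -np.log(student_probs[rows, labels]).mean()

    # Soft-label term: KL(teacher || student) on the temperature-softened distributions
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1).mean()

    total = alpha * student_loss + (1 - alpha) * kl
    return total, student_loss, kl

# Hypothetical batch of two samples with three classes
teacher_logits = np.array([[8.0, 2.0, 1.0], [0.5, 6.0, 1.5]])
student_logits = np.array([[3.0, 1.0, 0.5], [1.0, 2.0, 0.0]])
labels = np.array([0, 1])

total, hard, soft = distillation_objective(student_logits, teacher_logits, labels)
print(f"student loss={hard:.4f}  distillation loss={soft:.4f}  combined={total:.4f}")
```

With `alpha=0.5` the two terms are weighted equally. Moving `alpha` toward 1 makes the student rely more on the true labels, and moving it toward 0 makes it rely more on the teacher.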

In [6]:
# Train teacher as usual
teacher.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
)

# Train and evaluate teacher on data.
teacher.fit(x_train, y_train, epochs=5)
teacher.evaluate(x_test, y_test)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


[1.3401508331298828, 0.5630000233650208]

In [8]:
# Initialize and compile distiller
distiller = distillationModel(student=student, teacher=teacher)
distiller.compile(
    optimizer=keras.optimizers.Adam(),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
    student_loss_fn=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    distillation_loss_fn=keras.losses.KLDivergence(),
    alpha=0.5,
    temperature=12,
)

# Distill teacher to student
distiller.fit(x_train, y_train, epochs=5)

# Evaluate student on test dataset
distiller.evaluate(x_test, y_test)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


[0.5938000082969666, 1.2796223163604736]

In [9]:
# Train the control model in the usual way (no teacher involved)
controlModel.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
)

# Train the control model (a clone of the student) from scratch.
controlModel.fit(x_train, y_train, epochs=5)
#controlModel.evaluate(x_test, y_test)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7fe83c7ba810>

## Part 2: Updating the teacher to be VGG16 pretrained on ImageNet

In [12]:
vgg16 = keras.applications.vgg16
# It is important to use include_top=False because ImageNet has 1000 categories,
# but now we only have 10 categories!
new_Teacher_Model = vgg16.VGG16(weights='imagenet', include_top=False, input_shape=(32,32,3))
x = keras.layers.Flatten()(new_Teacher_Model.output)
x = keras.layers.Dense(20, activation='relu')(x)
# NOTE: this head outputs softmax probabilities, while the loss used to train it is
# configured with from_logits=True, which expects raw logits (no activation here)
predictions = keras.layers.Dense(10, activation='softmax')(x)

full_Teacher_Model = keras.models.Model(inputs=new_Teacher_Model.input, outputs=predictions)
full_Teacher_Model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_4 (InputLayer)         [(None, 32, 32, 3)]       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 32, 32, 64)        1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 32, 32, 64)        36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 16, 16, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 16, 16, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 16, 16, 128)       147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 8, 8, 128)         0     

In [13]:
# Freeze the pretrained VGG16 convolutional base so that only the new dense head is trained
for layer in new_Teacher_Model.layers:
    layer.trainable = False

In [14]:
full_Teacher_Model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_4 (InputLayer)         [(None, 32, 32, 3)]       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 32, 32, 64)        1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 32, 32, 64)        36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 16, 16, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 16, 16, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 16, 16, 128)       147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 8, 8, 128)         0     

In [15]:
# Train teacher as usual
full_Teacher_Model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
)

# Train and evaluate teacher on data.
full_Teacher_Model.fit(x_train, y_train, epochs=5)
full_Teacher_Model.evaluate(x_test, y_test)

Epoch 1/5


  '"`sparse_categorical_crossentropy` received `from_logits=True`, but '


Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


[1.1950926780700684, 0.5824999809265137]
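
The truncated warning in the output above appears because the model ends in a softmax layer while the loss is created with `from_logits=True`. The NumPy sketch below shows, outside of Keras, what goes wrong if probabilities are treated as if they were logits. The values are made up, and what Keras does internally in this situation may differ between versions.

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=np.float64)
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy_from_logits(logits, label):
    """Cross-entropy that applies softmax itself, as a from_logits=True loss would."""
    return -np.log(softmax(logits)[label])

logits = np.array([4.0, 1.0, 0.0])   # hypothetical, confidently correct prediction
label = 0

probs = softmax(logits)
correct_loss = cross_entropy_from_logits(logits, label)
double_loss = cross_entropy_from_logits(probs, label)   # probabilities mistaken for logits

# Probabilities lie in [0, 1], so a second softmax can never be confident:
# with K classes the loss cannot drop below log(1 + (K - 1) / e)
floor = np.log(1 + (len(logits) - 1) / np.e)

print(f"loss on logits:        {correct_loss:.4f}")
print(f"loss on probabilities: {double_loss:.4f}")
print(f"lower bound when softmax is applied twice: {floor:.4f}")
```

The same concern applies to distillation: `train_step` applies `tf.nn.softmax` to the teacher's output divided by the temperature, so a teacher that already outputs probabilities would hand the student nearly uniform targets. Removing the softmax activation from the final `Dense` layer is one way to keep the teacher's output as logits.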

## Investigating Xception, since pretrained VGG16 gave accuracy similar to the existing teacher

In [22]:
Xception_teacher = tf.keras.applications.Xception(weights='imagenet', include_top=False, input_shape=(32,32,3))
x = keras.layers.Flatten()(Xception_teacher.output)
x = keras.layers.Dense(20, activation='relu')(x)
predictions = keras.layers.Dense(10, activation='softmax')(x)

full_Xception_teacher = keras.models.Model(inputs=Xception_teacher.input, outputs=predictions)
full_Xception_teacher.summary()


ValueError: ignored
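
A likely cause of this error is the input size: the Keras documentation for Xception states that the width and height should be no smaller than 71, and CIFAR-10 images are 32x32. One possible workaround, sketched below with NumPy on a dummy batch, is to upsample the images before feeding them to the network. The helper `upsample_nearest` is hypothetical, and nearest-neighbour repetition is just the simplest choice.

```python
import numpy as np

def upsample_nearest(images, factor):
    """Nearest-neighbour upsampling of an (N, H, W, C) batch by an integer factor."""
    return np.repeat(np.repeat(images, factor, axis=1), factor, axis=2)

# Dummy stand-in for a small batch of normalized CIFAR-10 images
rng = np.random.default_rng(0)
dummy_batch = rng.random((2, 32, 32, 3), dtype=np.float32)

upsampled = upsample_nearest(dummy_batch, 3)   # 32 * 3 = 96, which is at least 71
print(dummy_batch.shape, "->", upsampled.shape)
```

With images of this size, `input_shape=(96, 96, 3)` would be the matching argument for `tf.keras.applications.Xception`. Upsampling the whole training set this way multiplies its memory footprint by nine, so resizing inside the input pipeline may be preferable in practice.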

In [None]:
tf.keras.applications.Xception(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

## Perhaps the VGG16 model knows numbers better and may outperform the original teacher model on the MNIST dataset

In [18]:
# Below we load the MNIST dataset that we used in previous weeks, so we don't
# do much preprocessing or EDA this week!
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Normalizing our data
x_train = x_train.astype("float32") / 255.0
x_train = np.reshape(x_train, (-1, 28, 28, 1))

x_test = x_test.astype("float32") / 255.0
x_test = np.reshape(x_test, (-1, 28, 28, 1))

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [19]:
# Create the teacher network model
teacher = keras.Sequential(
    [
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(256, (3, 3), strides=(2, 2), padding="same"),            # Note the large number of filters in this layer
        layers.LeakyReLU(alpha=0.2),
        layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding="same"),
        layers.Conv2D(512, (3, 3), strides=(2, 2), padding="same"),            # And in this layer!
        layers.Flatten(),
        layers.Dense(10),
    ],
    name="teacher",
)

# Create the student
student = keras.Sequential(
    [
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(16, (3, 3), strides=(2, 2), padding="same"),      # Note that the student has far fewer filters here (16 vs. 256)
        layers.LeakyReLU(alpha=0.2),
        layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding="same"),
        layers.Conv2D(32, (3, 3), strides=(2, 2), padding="same"),      # and here (32 vs. 512)
        layers.Flatten(),
        layers.Dense(10),
    ],
    name="student",
)

#Creating a copy of the student model to be the control model!
controlModel = keras.models.clone_model(student)

# Create copies of the student model for later, when we optimize temperature and alpha
studentDict = {}
numStudents = 10
for i in range(numStudents):
    currString = 'student'+str(i)
    studentDict[currString] = keras.models.clone_model(student)

print(studentDict)

{'student0': <keras.engine.sequential.Sequential object at 0x7fe7c4e81dd0>, 'student1': <keras.engine.sequential.Sequential object at 0x7fe7c4e86590>, 'student2': <keras.engine.sequential.Sequential object at 0x7fe7c4e4c790>, 'student3': <keras.engine.sequential.Sequential object at 0x7fe7c4e65710>, 'student4': <keras.engine.sequential.Sequential object at 0x7fe7c4e61750>, 'student5': <keras.engine.sequential.Sequential object at 0x7fe7c4f10610>, 'student6': <keras.engine.sequential.Sequential object at 0x7fe7c4ec9250>, 'student7': <keras.engine.sequential.Sequential object at 0x7fe7c4e21ed0>, 'student8': <keras.engine.sequential.Sequential object at 0x7fe7c4dbffd0>, 'student9': <keras.engine.sequential.Sequential object at 0x7fe7c4dc9e90>}


In [20]:
# Train teacher model in the basic way 
teacher.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
)

# Train and evaluate teacher on data.
teacher.fit(x_train, y_train, epochs=5)
teacher.evaluate(x_test, y_test)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


[0.10577042400836945, 0.972100019454956]

In [21]:
vgg16 = keras.applications.vgg16
mnist_VGG_teacher = vgg16.VGG16(weights='imagenet', include_top=False, input_shape=(28,28,1)) 
                                                        #It is important to use the include_top=False
                                                         #because imagenet has 1000 categories but now we only
                                                         #have 10 categories!
x = keras.layers.Flatten()(mnist_VGG_teacher.output)
x = keras.layers.Dense(20, activation='relu')(x)
predictions = keras.layers.Dense(10, activation='softmax')(x)

full_Teacher_Model = keras.models.Model(inputs=mnist_VGG_teacher.input, outputs=predictions)
full_Teacher_Model.summary()

ValueError: ignored
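
This error is most likely due to the input shape: according to the Keras documentation, VGG16 with ImageNet weights needs exactly 3 input channels and a width and height of at least 32, while MNIST images are 28x28 with a single channel. One way around this, sketched here with NumPy on dummy data, is to zero-pad each image to 32x32 and copy the grayscale channel three times. The helper `mnist_to_vgg_input` is a hypothetical name, and padding (rather than resizing) is an assumption made for simplicity.

```python
import numpy as np

def mnist_to_vgg_input(images, pad=2):
    """Turn an (N, 28, 28, 1) batch into (N, 32, 32, 3).

    Zero-pads 2 pixels on every side, then repeats the single channel 3 times.
    """
    padded = np.pad(images, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="constant")
    return np.repeat(padded, 3, axis=-1)

# Dummy stand-in for a small batch of normalized MNIST digits
rng = np.random.default_rng(0)
dummy_digits = rng.random((4, 28, 28, 1), dtype=np.float32)

vgg_ready = mnist_to_vgg_input(dummy_digits)
print(dummy_digits.shape, "->", vgg_ready.shape)
```

With data in this form, `input_shape=(32, 32, 3)` matches what the CIFAR-10 version of the VGG16 teacher used. A student that learns from such a teacher would need to receive the same inputs, or the teacher's inputs would have to be converted inside the training step.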

In [17]:
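# Initialize and compile distiller
# NOTE: new_Teacher_Model is the headless VGG16 base (include_top=False), so its output is a
# feature map rather than 10 class scores; this is a likely cause of the ValueError below.
# full_Teacher_Model, which adds the classification head, is probably the intended teacher.
distiller = distillationModel(student=student, teacher=new_Teacher_Model)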
distiller.compile(
    optimizer=keras.optimizers.Adam(),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
    student_loss_fn=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    distillation_loss_fn=keras.losses.KLDivergence(),
    alpha=0.5,
    temperature=12,
)

# Distill teacher to student
distiller.fit(x_train, y_train, epochs=5)

# Evaluate student on test dataset
distiller.evaluate(x_test, y_test)

Epoch 1/5


ValueError: ignored

In [None]:
class distillationModel(keras.Model):
    def __init__(self, student, teacher):
        super(distillationModel, self).__init__()
        self.teacher = teacher
        self.student = student

    def compile(
        self,
        optimizer,
        metrics,
        student_loss_fn,
        distillation_loss_fn,
        alpha=0.1,
        temperature=10,
    ):
        super(distillationModel, self).compile(optimizer=optimizer, metrics=metrics)
        self.student_loss_fn = student_loss_fn
        self.distillation_loss_fn = distillation_loss_fn
        self.alpha = alpha
        self.temperature = temperature

    #This method does a forward pass of the "student" and "teacher".  Only student weights are updated though
    #thus we only calculate gradients for the "student"
    def train_step(self, data):
        x, y = data
        teacher_predictions = self.teacher(x, training=False)  #"teachers forward pass"

        with tf.GradientTape() as tape:
            student_predictions = self.student(x, training=True)  #"students forward pass"
            student_loss = self.student_loss_fn(y, student_predictions)  #student and distillation losses
            distillation_loss = self.distillation_loss_fn(
                tf.nn.softmax(teacher_predictions / self.temperature, axis=1),
                tf.nn.softmax(student_predictions / self.temperature, axis=1),
            )
            loss = self.alpha * student_loss + (1 - self.alpha) * distillation_loss
        trainable_vars = self.student.trainable_variables     #student gradient
        gradients = tape.gradient(loss, trainable_vars)

        self.optimizer.apply_gradients(zip(gradients, trainable_vars)) #updating the weights with the gradient applied
        
        self.compiled_metrics.update_state(y, student_predictions)  #updating the metrics
        
        results = {m.name: m.result() for m in self.metrics}       #returning the performance currently in a dictionary
        results.update(
            {"student_loss": student_loss, "distillation_loss": distillation_loss}
        )
        return results

    #In this function the student model is evaluated on the current dataset
    def test_step(self, data):
        x, y = data
        y_prediction = self.student(x, training=False) 
        student_loss = self.student_loss_fn(y, y_prediction)   
        self.compiled_metrics.update_state(y, y_prediction)
        results = {m.name: m.result() for m in self.metrics}
        results.update({"student_loss": student_loss})
        return results

In [None]:
# Initialize and compile distiller
distiller = distillationModel(student=student, teacher=teacher)
distiller.compile(
    optimizer=keras.optimizers.Adam(),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
    student_loss_fn=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    distillation_loss_fn=keras.losses.KLDivergence(),
    alpha=0.1,
    temperature=10,
)

# Distill teacher to student
distiller.fit(x_train, y_train, epochs=3)

# Evaluate student on test dataset
distiller.evaluate(x_test, y_test)

In [None]:

# Train Control Model Using the Student Clone from Earlier
controlModel.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
)

# Train and evaluate student trained from scratch.
controlModel.fit(x_train, y_train, epochs=3)
controlModel.evaluate(x_test, y_test)

In [None]:
alphaArr = [0.05, 0.5, 0.95]
numAlphas = len(alphaArr)
tempArr = [(j + 1) * 2 for j in range(numStudents)]   # temperatures 2, 4, ..., 20
alphaTempArr = np.zeros((numAlphas, numStudents))     # accuracy for each (alpha, temperature) pair

for i in range(numAlphas):
    for j in range(numStudents):
        # Use a freshly initialized student for every (alpha, temperature) pair so that
        # no run continues training from another run's weights
        currStudent = keras.models.clone_model(student)
        currDistiller = distillationModel(student=currStudent, teacher=teacher)
        currDistiller.compile(
            optimizer=keras.optimizers.Adam(),
            metrics=[keras.metrics.SparseCategoricalAccuracy()],
            student_loss_fn=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
            distillation_loss_fn=keras.losses.KLDivergence(),
            alpha=alphaArr[i],          # alpha varies along the rows
            temperature=tempArr[j],     # temperature varies along the columns
        )
        # Distill teacher to student
        currDistiller.fit(x_train, y_train, epochs=3)

        # Evaluate student on test dataset
        currEval = currDistiller.evaluate(x_test, y_test)
        print(f"Current Distillation evaluation is: {currEval}")

        # As in the earlier distiller.evaluate output, index 0 is the accuracy
        alphaTempArr[i, j] = currEval[0]

In [None]:
print("Below are the results of each of the distillation 'students' after 3 epochs of training in an array")
print(alphaTempArr)
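
Once the grid is filled, the best combination can be read off with `np.argmax` and `np.unravel_index`. The sketch below uses a randomly generated placeholder grid (not real results) with the same layout as `alphaTempArr`: rows are alpha values and columns are temperatures.

```python
import numpy as np

alphas = [0.05, 0.5, 0.95]
temperatures = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

# Placeholder accuracies standing in for alphaTempArr; these are NOT measured results
rng = np.random.default_rng(0)
accuracy_grid = rng.uniform(0.90, 0.99, size=(len(alphas), len(temperatures)))

# argmax works on the flattened array, so convert the flat index back to (row, column)
best_row, best_col = np.unravel_index(np.argmax(accuracy_grid), accuracy_grid.shape)
best_alpha = alphas[best_row]
best_temperature = temperatures[best_col]

print(f"best alpha={best_alpha}, best temperature={best_temperature}, "
      f"accuracy={accuracy_grid[best_row, best_col]:.4f}")
```

Because each cell comes from a single training run of 3 epochs, small differences between cells may be noise. Repeating runs with different seeds would make the comparison more reliable.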

In [None]:
import seaborn as sns

temps = ['2','4','6','8','10','12','14','16','18','20']
alphas = ['0.05','0.5','0.95']

# Passing the labels to heatmap() keeps each tick centered on its cell
ax = sns.heatmap(alphaTempArr, linewidths=0.5, xticklabels=temps, yticklabels=alphas)
ax.set_title('Final Accuracy of Distillation "Student Models" Varying Temperature and Alpha Parameters')
ax.set(xlabel='Temperature', ylabel='Alpha')
plt.show()

# Image Segmentation Using Detectron2 and the ADE20K Dataset

## Semantic vs. Instance Segmentation
Image segmentation can be formulated as a classification problem of pixels with semantic labels (semantic segmentation) or partitioning of individual objects (instance segmentation). Semantic segmentation performs pixel-level labeling with a set of object categories (for example, people, trees, sky, cars) for all image pixels.


It is generally a more difficult undertaking than image classification, which predicts a single label for the entire image or frame. Instance segmentation extends the scope of semantic segmentation further by detecting and delineating all the objects of interest in an image.
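
The difference between the two kinds of labels can be shown on a toy mask. In the sketch below, the semantic mask marks every "object" pixel with class 1, while the instance mask gives each separate object its own id. The instance ids are derived here with a simple 4-connected flood fill, which is only a toy stand-in: real instance segmentation models predict a mask per object and can separate objects that touch.

```python
import numpy as np
from collections import deque

# Toy semantic mask: 0 = background, 1 = object class (two separate objects)
semantic = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
])

def label_instances(semantic_mask, target_class):
    """Give each 4-connected region of target_class its own id (1, 2, ...)."""
    instance = np.zeros_like(semantic_mask)
    height, width = semantic_mask.shape
    next_id = 0
    for r in range(height):
        for c in range(width):
            if semantic_mask[r, c] != target_class or instance[r, c] != 0:
                continue
            next_id += 1
            instance[r, c] = next_id
            queue = deque([(r, c)])
            while queue:                      # breadth-first flood fill
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and semantic_mask[ny, nx] == target_class
                            and instance[ny, nx] == 0):
                        instance[ny, nx] = next_id
                        queue.append((ny, nx))
    return instance

instance = label_instances(semantic, target_class=1)
print("semantic labels:", np.unique(semantic))
print("instance ids:   ", np.unique(instance))
print(instance)
```

Both objects share the same semantic label, but they receive different instance ids, which is exactly the extra information instance segmentation provides.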

Detectron2:
https://youtu.be/9a_Z14M-msc


https://yann-leguilly.gitlab.io/post/2019-12-14-tensorflow-tfdata-segmentation/



Here is a tutorial on the process of creating these image segmentation labels for help with creating a dataset:
https://scikit-image.org/docs/dev/user_guide/tutorial_segmentation.html