<a href="https://colab.research.google.com/github/miam-bonbon/assignment-computer-vision/blob/main/assignment_computer_vision.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Multi-label Image Classification of Fruits on a Platter

My goal is to develop a deep learning model that can accurately identify and classify multiple fruits present in images of fruit platters.

The [dataset](https://universe.roboflow.com/suratthani-rajabhat-university/rcnn-oumhw) consists of images of fruit platters with various fruits arranged on them. Each image can contain one or more fruits from a set of 6 classes, including an "other" class for objects that are not among the defined fruit categories.

My approach applies theory I learned in the module "Computer Vision" by Susanne Suter and Marco Willi and implements best practices according to [Karpathy:19](http://karpathy.github.io/2019/04/25/recipe/).

Google Colab GPUs: we start with an A100 GPU.

## Dataset analysis

### Loading the dataset

In [1]:
# mount google drive
from google.colab import drive
drive.mount('/content/drive/')

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


In [2]:
!pwd

/content


In [3]:
%cd drive/MyDrive/assignment-computer-vision
!ls -l

/content/drive/MyDrive/assignment-computer-vision
total 16
drwx------ 5 root root 4096 Nov 23 13:30 data
-rw------- 1 root root 7733 Nov 23 13:33 plot_functions.py
drwx------ 2 root root 4096 Nov 23 13:43 __pycache__


In [4]:
%%capture

# %load_ext autoreload
# %autoreload 2
import os
import glob
import random

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tqdm.auto as tqdm

import tensorflow as tf

# from sklearn.model_selection import train_test_split

class print_style:
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
    END = '\033[0m'

In [5]:
import plot_functions

In [6]:
!nvidia-smi
tf.config.list_physical_devices('GPU')

Sat Nov 23 20:34:30 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA A100-SXM4-40GB          Off | 00000000:00:04.0 Off |                    0 |
| N/A   29C    P0              42W / 400W |      2MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                    

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

In [7]:
# Dataset configuration
base_path = 'data'
num_classes = 6

In [8]:
# Prepare for further processing
# Train
train_image_dir = os.path.join(base_path, 'train')
train_annotation_file = os.path.join(base_path, 'train/_annotations.csv')

if not os.path.exists(train_annotation_file):
    print(f"Error: Annotation file not found at {train_annotation_file}")

train_annotations = pd.read_csv(train_annotation_file)
train_image_files = glob.glob(os.path.join(train_image_dir, '*.jpg'))

# Test
test_image_dir = os.path.join(base_path, 'test')
test_annotation_file = os.path.join(base_path, 'test/_annotations.csv')

if not os.path.exists(test_annotation_file):
    print(f"Error: Annotation file not found at {test_annotation_file}")

test_annotations = pd.read_csv(test_annotation_file)
test_image_files = glob.glob(os.path.join(test_image_dir, '*.jpg'))

# Validation
validation_image_dir = os.path.join(base_path, 'valid')
validation_annotation_file = os.path.join(base_path, 'valid/_annotations.csv')

if not os.path.exists(validation_annotation_file):
    print(f"Error: Annotation file not found at {validation_annotation_file}")

validation_annotations = pd.read_csv(validation_annotation_file)
validation_image_files = glob.glob(os.path.join(validation_image_dir, '*.jpg'))

In [37]:
# Keep only the first 500 annotation rows of the training and validation sets

train_annotations = train_annotations.head(500)
validation_annotations = validation_annotations.head(500)

len(validation_annotations), len(train_annotations)

(500, 500)

### Analyze the Data Qualitatively

First, we dive into the data to get an understanding of it, starting with a sense of the quantities involved.

*TODO: label distribution and per-class counts*
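
The label distribution noted in the TODO could be computed along these lines. This is a minimal sketch, not the notebook's code: it assumes the `_annotations.csv` layout used later in the notebook (one row per bounding box with the columns `filename`, `width`, `height`, `class`, `xmin`, `ymin`, `xmax`, `ymax`). The inline CSV rows and the fruit names are made up for illustration.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample in the _annotations.csv layout (one row per bounding box).
SAMPLE_CSV = """filename,width,height,class,xmin,ymin,xmax,ymax
a.jpg,640,640,banana,10,20,200,300
a.jpg,640,640,apple,250,40,400,220
b.jpg,640,640,banana,30,30,120,310
c.jpg,640,640,other,5,5,60,60
"""

def label_statistics(csv_text):
    """Return (boxes per class, images per class, distinct classes per image)."""
    boxes_per_class = Counter()
    classes_per_image = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        boxes_per_class[row["class"]] += 1
        classes_per_image[row["filename"]].add(row["class"])
    # Count each class once per image, no matter how many boxes it has there.
    images_per_class = Counter(
        cls for classes in classes_per_image.values() for cls in classes
    )
    labels_per_image = {name: len(c) for name, c in classes_per_image.items()}
    return boxes_per_class, images_per_class, labels_per_image

boxes, images, per_image = label_statistics(SAMPLE_CSV)
print(boxes.most_common())
print(per_image)
```

With the DataFrames loaded above, `train_annotations['class'].value_counts()` gives the per-class box counts directly.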

In [9]:
import matplotlib.patches as patches
import matplotlib.image as mpimg

In [10]:
# show images with bounding boxes
def show_images_with_bboxes(base_path, num_images=5, cols=2, shuffle=False, fontsize=37):
    """
    Iterates through images, displays them in a grid, and overlays bounding boxes.
    """
    if not train_image_files:
        print(f"Error: No images found in {train_image_dir}")
        return

    if (shuffle):
      random.shuffle(train_image_files) # Shuffle for random display
    num_images = min(num_images, len(train_image_files))
    rows = (num_images + cols -1) // cols

    image = mpimg.imread(train_image_files[0])
    max_width = image.shape[1]
    max_height = image.shape[0]

    # Dynamic divisor based on image dimensions
    divisor = max(max_width, max_height) / 10  # Adjust 10 as needed

    fig, axes = plt.subplots(rows, cols, figsize=((max_width * cols) / divisor, (max_height * rows) / divisor))

    # fig, axes = plt.subplots(rows, cols, figsize=(10, 5 * rows))

    if rows == 1 and cols == 1:
        axes = np.array([axes])

    for i in range(num_images):
      image_path = train_image_files[i]
      image = mpimg.imread(image_path)
      row = i // cols
      col = i % cols
      ax = axes[row, col] if rows > 1 else axes[col]
      ax.imshow(image)

      filename = os.path.basename(image_path)
      bboxes = train_annotations[train_annotations['filename'] == filename]


      # Add bounding boxes
      for index, bbox in bboxes.iterrows():  # `bbox` avoids shadowing the grid index `row`
          xmin, ymin, xmax, ymax, label = bbox['xmin'], bbox['ymin'], bbox['xmax'], bbox['ymax'], bbox['class']
          rect = patches.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin, linewidth=2, edgecolor='r', facecolor='none')
          ax.add_patch(rect)
          ax.text(xmin, ymin, label, color='r', fontsize=fontsize)  # Add label text

      ax.axis('off') # Hide axis

    # remove empty subplots
    for j in range(i+1, rows*cols):
      row = j // cols
      col = j % cols
      fig.delaxes(axes[row,col] if rows > 1 else axes[col])

    plt.tight_layout()
    plt.show()

In [11]:
# show_images_with_bboxes(base_path, num_images=20, cols=4, shuffle=True, fontsize=37)

In [12]:
def create_dataset(train_annotation_file, train_image_dir):
  """
  Creates training dataset from annotation file and image directory.

  Args:
      train_annotation_file: A pandas DataFrame containing annotation data.
      train_image_dir: The directory containing the images.

  Returns:
      dataset_images: A list of image arrays.
      dataset_targets: A list of bounding box coordinates.
      dataset_labels: A list of labels (1 for banana, 0 for others).
  """
  dataset_images = []
  dataset_targets = []
  dataset_labels = []

  for index, row in train_annotation_file.iterrows():
      (filename, width, height, class_name, xmin, ymin, xmax, ymax) = row

      dataset_image_fullpath = os.path.join(train_image_dir, filename)
      dataset_img = tf.keras.preprocessing.image.load_img(dataset_image_fullpath, target_size=(height, width))
      dataset_img_arr = tf.keras.preprocessing.image.img_to_array(dataset_img)

      xmin = round(xmin/ width, 2)
      ymin = round(ymin/ height, 2)
      xmax = round(xmax/ width, 2)
      ymax = round(ymax/ height, 2)

      dataset_images.append(dataset_img_arr)
      dataset_targets.append((xmin, ymin, xmax, ymax))
      dataset_labels.append(1 if class_name == 'banana' else 0)

  return dataset_images, dataset_targets, dataset_labels
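
`create_dataset` emits one sample per bounding-box row with a binary banana label. For the multi-label goal stated in the introduction, each image would instead need a single multi-hot vector over all classes. A minimal sketch of that encoding, assuming a hypothetical class list (the real names would come from the `class` column of the annotation files):

```python
# Hypothetical class names; replace with the unique values of the
# `class` column in the annotation files.
CLASS_NAMES = ["apple", "banana", "grape", "kiwi", "orange", "other"]

def multi_hot_labels(rows, class_names):
    """Map each filename to a multi-hot vector over class_names.

    rows: iterable of (filename, class_name) pairs, one per bounding box.
    """
    index = {name: i for i, name in enumerate(class_names)}
    labels = {}
    for filename, class_name in rows:
        vector = labels.setdefault(filename, [0] * len(class_names))
        vector[index[class_name]] = 1  # repeated boxes of one class stay 1
    return labels

rows = [
    ("a.jpg", "banana"),
    ("a.jpg", "apple"),
    ("a.jpg", "banana"),
    ("b.jpg", "other"),
]
labels = multi_hot_labels(rows, CLASS_NAMES)
print(labels)
```

Targets of this form are usually paired with one sigmoid output unit per class and a binary cross-entropy loss, so that each class is predicted independently.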

In [38]:
# Get training sets
train_images, train_targets, train_labels = create_dataset(train_annotations, train_image_dir)

# # Get test sets
# test_images, test_targets, test_labels = create_dataset(test_annotations, test_image_dir)

# Get validation sets
validation_images, validation_targets, validation_labels = create_dataset(validation_annotations, validation_image_dir)

In [39]:
width = 640
height = 640
num_classes = 2
# classes = ["other", "banana"]  # index matches the labels from create_dataset (0 = other, 1 = banana)

In [40]:
#create the common input layer
input_shape = (height, width, 3)
input_layer = tf.keras.layers.Input(input_shape)

In [41]:
# Build the model
from tensorflow.keras import layers

base_layers = layers.Rescaling(1./255, name='bl_1')(input_layer)
base_layers = layers.Conv2D(16, 3, padding='same', activation='relu', name='bl_2')(base_layers)
base_layers = layers.MaxPooling2D(name='bl_3')(base_layers)
base_layers = layers.Conv2D(32, 3, padding='same', activation='relu', name='bl_4')(base_layers)
base_layers = layers.MaxPooling2D(name='bl_5')(base_layers)
base_layers = layers.Conv2D(64, 3, padding='same', activation='relu', name='bl_6')(base_layers)
base_layers = layers.MaxPooling2D(name='bl_7')(base_layers)
base_layers = layers.Flatten(name='bl_8')(base_layers)

classifier_branch = layers.Dense(128, activation='relu', name='cl_1')(base_layers)
classifier_branch = layers.Dense(num_classes, name='cl_head')(classifier_branch)

locator_branch = layers.Dense(128, activation='relu', name='bb_1')(base_layers)
locator_branch = layers.Dense(64, activation='relu', name='bb_2')(locator_branch)
locator_branch = layers.Dense(32, activation='relu', name='bb_3')(locator_branch)
locator_branch = layers.Dense(4, activation='sigmoid', name='bb_head')(locator_branch)
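
Most of this model's weights sit in the two `Dense(128)` layers that follow `Flatten`. The sketch below reproduces the arithmetic by hand for the 640×640×3 input defined above, assuming Keras defaults (2×2 max pooling with stride 2, a bias term in every layer). The helper names are illustrative, and `model.summary()` remains the authoritative source.

```python
def conv_params(kernel, in_ch, out_ch):
    """Weights plus biases of a square-kernel Conv2D layer."""
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(in_units, out_units):
    """Weights plus biases of a Dense layer."""
    return in_units * out_units + out_units

side, channels = 640, 3
total = 0
for filters in (16, 32, 64):  # bl_2, bl_4, bl_6, each followed by a pooling layer
    total += conv_params(3, channels, filters)
    channels = filters
    side //= 2  # 'same' padding keeps the size; pooling halves it

flat = side * side * channels  # size of the bl_8 Flatten output
classifier = dense_params(flat, 128) + dense_params(128, 2)
locator = (dense_params(flat, 128) + dense_params(128, 64)
           + dense_params(64, 32) + dense_params(32, 4))
total += classifier + locator

print(f"flatten units: {flat:,}")
print(f"total parameters: {total:,}")
```

Under these assumptions the model has roughly 105 million parameters, almost all of them in `cl_1` and `bb_1`. Further downsampling or a global pooling layer before `Flatten` would likely shrink this considerably.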

In [42]:
model = tf.keras.Model(input_layer, outputs=[classifier_branch, locator_branch])

In [43]:
model.summary()

In [44]:
losses = {"cl_head":tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), "bb_head":tf.keras.losses.MSE}


In [45]:
model.compile(loss=losses, optimizer='Adam', metrics=['accuracy'])

In [46]:
training_epochs = 20

print(train_images[0].shape)
print(train_targets[0])
print(train_labels[0])
train_images = np.array(train_images)
train_targets = np.array(train_targets)
train_labels = np.array(train_labels)
print(train_images[0].shape)
print(train_targets[0])
print(train_labels[0])

validation_images = np.array(validation_images)
validation_targets = np.array(validation_targets)
validation_labels = np.array(validation_labels)

(640, 640, 3)
(0.25, 0.18, 0.59, 0.47)
0
(640, 640, 3)
[0.25 0.18 0.59 0.47]
0


In [47]:
trainTargets = {
    "cl_head": train_labels,
    "bb_head": train_targets
}

validationTargets = {
    "cl_head": validation_labels,
    "bb_head": validation_targets
}

In [48]:
print(type(trainTargets))

<class 'dict'>


In [33]:
import tensorflow as tf
import numpy as np
import random

def data_generator(images, targets, labels, batch_size):
  """Generates batches of data for training.

  Args:
    images: List of image arrays.
    targets: List of bounding box coordinates.
    labels: List of labels.
    batch_size: The size of each batch.

  Yields:
    A tuple of (image_batch, target_batch) for each batch.
  """
  num_samples = len(images)
  indices = list(range(num_samples))
  while True:  # Loop indefinitely for multiple epochs
      random.shuffle(indices)  # Shuffle data for each epoch
      for i in range(0, num_samples, batch_size):
          batch_indices = indices[i:i + batch_size]
          image_batch = [images[j] for j in batch_indices]
          # Ensure target batch elements are at least rank 1 and have defined types
          # Adjust the shape of 'cl_head' to match the model's output shape (e.g., [?, 4])
          target_batch = {
              # Reshape labels to (None, 4) before stacking
              "cl_head": tf.stack([tf.cast(tf.reshape(labels[j], (1, 4)), dtype=tf.int32) for j in batch_indices]),
              "bb_head": tf.stack([tf.cast(targets[j], dtype=tf.float32) for j in batch_indices])
          }
          yield np.array(image_batch), target_batch

# Specify output signature for the data generator
output_signature = (
    tf.TensorSpec(shape=(None, 640, 640, 3), dtype=tf.float32), # Assuming your images are 640x640x3 and float32
    {
        "cl_head": tf.TensorSpec(shape=(None, 4), dtype=tf.int32),  # Adjust shape if necessary
        "bb_head": tf.TensorSpec(shape=(None, 4), dtype=tf.float32) # Adjust shape if necessary (assuming 4 coordinates for bounding box)
    }
)

train_generator = tf.data.Dataset.from_generator(
    lambda: data_generator(train_images, train_targets, train_labels, batch_size=32),
    output_signature=output_signature
)

validation_generator = tf.data.Dataset.from_generator(
    lambda: data_generator(validation_images, validation_targets, validation_labels, batch_size=32),
    output_signature=output_signature
)

history = model.fit(
    train_generator,
    steps_per_epoch=len(train_images) // 32,  # Number of batches per epoch
    epochs=training_epochs,
    validation_data=validation_generator,
    validation_steps=len(validation_images) // 32
)


Epoch 1/20


UnknownError: Graph execution error:

Detected at node PyFunc defined at (most recent call last):
<stack traces unavailable>
InvalidArgumentError: {{function_node __wrapped__Reshape_device_/job:localhost/replica:0/task:0/device:GPU:0}} Input to reshape is a tensor with 1 values, but the requested shape has 4 [Op:Reshape]
Traceback (most recent call last):

  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
    ret = func(*args)

  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)

  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/data/ops/from_generator_op.py", line 198, in generator_py_func
    values = next(generator_state.get_iterator(iterator_id))

  File "<ipython-input-33-ae33c62b490d>", line 28, in data_generator
    "cl_head": tf.stack([tf.cast(tf.reshape(labels[j], (1, 4)), dtype=tf.int32) for j in batch_indices]),

  File "<ipython-input-33-ae33c62b490d>", line 28, in <listcomp>
    "cl_head": tf.stack([tf.cast(tf.reshape(labels[j], (1, 4)), dtype=tf.int32) for j in batch_indices]),

  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/ops/weak_tensor_ops.py", line 88, in wrapper
    return op(*args, **kwargs)

  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None

  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/eager/execute.py", line 53, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,

tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__Reshape_device_/job:localhost/replica:0/task:0/device:GPU:0}} Input to reshape is a tensor with 1 values, but the requested shape has 4 [Op:Reshape]


	 [[{{node PyFunc}}]]
	 [[IteratorGetNext]] [Op:__inference_one_step_on_iterator_503]
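
The traceback points at `tf.reshape(labels[j], (1, 4))`: each entry of `labels` is a single integer (0 or 1), so it cannot be reshaped into four values. The bounding boxes have four values per sample; the class labels have one. A minimal sketch of a batch generator with matching shapes, written with NumPy only and using small random arrays as stand-ins for the real data (the name `batch_generator` and the `seed` parameter are illustrative):

```python
import numpy as np

def batch_generator(images, boxes, labels, batch_size, seed=0):
    """Yield (image_batch, targets) forever, reshuffling every epoch."""
    images = np.asarray(images, dtype=np.float32)
    boxes = np.asarray(boxes, dtype=np.float32)   # shape (N, 4)
    labels = np.asarray(labels, dtype=np.int32)   # shape (N,), one class id each
    rng = np.random.default_rng(seed)
    while True:
        order = rng.permutation(len(images))
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            yield images[idx], {"cl_head": labels[idx], "bb_head": boxes[idx]}

# Stand-in data: 10 tiny "images" instead of 640x640 photos.
rng = np.random.default_rng(1)
demo_images = rng.random((10, 8, 8, 3))
demo_boxes = rng.random((10, 4))
demo_labels = rng.integers(0, 2, size=10)

x, y = next(batch_generator(demo_images, demo_boxes, demo_labels, batch_size=4))
print(x.shape, y["cl_head"].shape, y["bb_head"].shape)
```

If such a generator is wrapped in `tf.data.Dataset.from_generator`, the matching `output_signature` entry for `cl_head` would be `tf.TensorSpec(shape=(None,), dtype=tf.int32)`. Yielding arrays rather than Python lists also gives TensorFlow a concrete dtype and shape for every target.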

In [50]:
# Fit directly on the in-memory arrays instead of using a generator

history = model.fit(
    x=train_images,
    y=trainTargets,
    epochs=training_epochs,
    validation_data=(validation_images, validationTargets),
    batch_size=10
)

Epoch 1/20


ValueError: Dimensions must be equal, but are 4 and 2 for '{{node compile_loss/mean_squared_error/sub}} = Sub[T=DT_FLOAT](data_2, functional_1_1/cl_head_1/Add)' with input shapes: [10,4], [10,2].
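
The message shows the mean-squared-error loss receiving a `[10, 4]` target together with the `[10, 2]` output of `cl_head`, which suggests that the bounding-box target and loss were paired with the classifier output rather than with `bb_head`. One plausible cause, not verified here, is that the dict-keyed targets and losses are matched to the model's list of outputs in a different order. Passing the losses and targets as lists in the same order as `outputs=[classifier_branch, locator_branch]`, or declaring the model outputs as a dict keyed by `cl_head` and `bb_head`, may avoid the ambiguity. The NumPy sketch below only illustrates what each loss expects; the helper names are hypothetical and the data is random.

```python
import numpy as np

def sparse_ce_from_logits(labels, logits):
    """Mean sparse categorical cross-entropy; labels (B,), logits (B, C)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def mse(targets, predictions):
    """Mean squared error; both arrays must have the same shape."""
    if targets.shape != predictions.shape:
        raise ValueError(f"shape mismatch: {targets.shape} vs {predictions.shape}")
    return float(((targets - predictions) ** 2).mean())

rng = np.random.default_rng(0)
cl_logits = rng.normal(size=(10, 2))   # stand-in for the cl_head output
bb_pred = rng.random((10, 4))          # stand-in for the bb_head output
labels = rng.integers(0, 2, size=10)   # class ids
boxes = rng.random((10, 4))            # normalized (xmin, ymin, xmax, ymax)

# Correct pairing: each loss sees the output of its own head.
print(sparse_ce_from_logits(labels, cl_logits), mse(boxes, bb_pred))

# Swapped pairing, as in the error message: box targets vs. classifier output.
try:
    mse(boxes, cl_logits)
except ValueError as err:
    print("swapped pairing fails:", err)
```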

In [24]:
def data_generator(images, targets, labels, batch_size):
  """Generates batches of data for training.

  Args:
    images: List of image arrays.
    targets: List of bounding box coordinates.
    labels: List of labels.
    batch_size: The size of each batch.

  Yields:
    A tuple of (image_batch, target_batch) for each batch.
  """
  num_samples = len(images)
  indices = list(range(num_samples))
  while True:  # Loop indefinitely for multiple epochs
    random.shuffle(indices)  # Shuffle data for each epoch
    for i in range(0, num_samples, batch_size):
      batch_indices = indices[i:i + batch_size]
      image_batch = [images[j] for j in batch_indices]
      target_batch = {
          "cl_head": [labels[j] for j in batch_indices],
          "bb_head": [targets[j] for j in batch_indices]
      }
      yield np.array(image_batch), target_batch

In [27]:
import numpy as np
import random

def data_generator(images, targets, labels, batch_size):
  """Generates batches of data for training.

  Args:
    images: List of image arrays.
    targets: List of bounding box coordinates.
    labels: List of labels.
    batch_size: The size of each batch.

  Yields:
    A tuple of (image_batch, target_batch) for each batch.
  """
  num_samples = len(images)
  indices = list(range(num_samples))
  while True:  # Loop indefinitely for multiple epochs
    random.shuffle(indices)  # Shuffle data for each epoch
    for i in range(0, num_samples, batch_size):
      batch_indices = indices[i:i + batch_size]
      image_batch = [images[j] for j in batch_indices]
      # Ensure target batch elements are at least rank 1
      target_batch = {
          "cl_head": [np.expand_dims(labels[j], axis=0) for j in batch_indices], #  Ensure 'labels' is at least rank 1
          "bb_head": [np.expand_dims(targets[j], axis=0) for j in batch_indices] #  Ensure 'targets' is at least rank 1
      }
      yield np.array(image_batch), target_batch

In [30]:
import tensorflow as tf
import numpy as np
import random

def data_generator(images, targets, labels, batch_size):
  """Generates batches of data for training.

  Args:
    images: List of image arrays.
    targets: List of bounding box coordinates.
    labels: List of labels.
    batch_size: The size of each batch.

  Yields:
    A tuple of (image_batch, target_batch) for each batch.
  """
  num_samples = len(images)
  indices = list(range(num_samples))
  while True:  # Loop indefinitely for multiple epochs
    random.shuffle(indices)  # Shuffle data for each epoch
    for i in range(0, num_samples, batch_size):
      batch_indices = indices[i:i + batch_size]
      image_batch = [images[j] for j in batch_indices]
      # Ensure target batch elements are at least rank 1 and have defined types
      target_batch = {
          "cl_head": tf.stack([tf.cast(np.expand_dims(labels[j], axis=0), dtype=tf.int32) for j in batch_indices]), # Stack and define type as tf.int32
          "bb_head": tf.stack([tf.cast(np.expand_dims(targets[j], axis=0), dtype=tf.float32) for j in batch_indices]) # Stack and define type as tf.float32
      }
      yield np.array(image_batch), target_batch

# Specify output signature for the data generator
output_signature = (
    tf.TensorSpec(shape=(None, 640, 640, 3), dtype=tf.float32), # Assuming your images are 640x640x3 and float32
    {
        "cl_head": tf.TensorSpec(shape=(None, 1), dtype=tf.int32),  # Adjust shape if necessary
        "bb_head": tf.TensorSpec(shape=(None, 4), dtype=tf.float32) # Adjust shape if necessary (assuming 4 coordinates for bounding box)
    }
)

train_generator = tf.data.Dataset.from_generator(
    lambda: data_generator(train_images, train_targets, train_labels, batch_size=32),
    output_signature=output_signature
)

validation_generator = tf.data.Dataset.from_generator(
    lambda: data_generator(validation_images, validation_targets, validation_labels, batch_size=32),
    output_signature=output_signature
)

history = model.fit(
    train_generator,
    steps_per_epoch=len(train_images) // 32,  # Number of batches per epoch
    epochs=training_epochs,
    validation_data=validation_generator,
    validation_steps=len(validation_images) // 32
)

Epoch 1/20


ValueError: Dimensions must be equal, but are 4 and 2 for '{{node compile_loss/mean_squared_error/sub}} = Sub[T=DT_FLOAT](data_2, functional_1/cl_head_1/Add)' with input shapes: [?,4], [?,2].

In [28]:
train_generator = data_generator(train_images, train_targets, train_labels, batch_size=32)  # Adjust batch_size as needed
validation_generator = data_generator(validation_images, validation_targets, validation_labels, batch_size=32)

In [29]:
history = model.fit(
    train_generator,
    steps_per_epoch=len(train_images) // 32,  # Number of batches per epoch
    epochs=training_epochs,
    validation_data=validation_generator,
    validation_steps=len(validation_images) // 32
)

TypeError: `output_signature` must contain objects that are subclass of `tf.TypeSpec` but found <class 'list'> which is not.

In [None]:
history = model.fit(train_images, trainTargets,
                    validation_data=(validation_images, validationTargets),
                    batch_size=4,
                    epochs=training_epochs,
                    shuffle=True,
                    verbose=1)