# Computer Vision Practical Assignment

### Dataset description
This dataset provides a large number of images with their corresponding fashion/apparel segmentations. Each image is named with a unique ImageId. The task is to segment and classify the images in the test set. The images show people wearing a variety of clothing types in a variety of poses.

#### Files
* train/ - The training images
* test/ - The test images (you are segmenting and classifying these images)
* train.csv - Training annotations, contains images with both segmented apparel categories and fine-grained attributes; and images with segmented apparel categories only.
* label_descriptions.json - A file giving the apparel categories and fine-grained attributes descriptions.
* sample_submission.csv - A sample submission file in the correct format.

#### Columns
* ImageId - the unique Id of an image
* EncodedPixels - masks in run-length encoded format (please refer to evaluation page for details).
* ClassId - the class id for this mask. It represents the apparel category.
* AttributesIds - the attributes ids for this mask. We concatenate all the attributes (if any) together.
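
To make the `EncodedPixels` format concrete, here is a minimal sketch of decoding a run-length string on a toy 3x4 image. It assumes that starts are 1-indexed and that pixels are numbered column by column (top to bottom, then left to right), which is what the decoding helpers later in this notebook rely on. `decode_rle` is an illustrative name, not part of the competition code.

```python
import numpy as np

def decode_rle(rle, height, width):
    """Decode a 'start length start length ...' string into a binary mask."""
    values = [int(v) for v in rle.split()]
    starts, lengths = values[::2], values[1::2]
    flat = np.zeros(height * width, dtype=np.uint8)
    for start, length in zip(starts, lengths):
        # Assumption: starts are 1-indexed, so shift to a 0-based index
        flat[start - 1:start - 1 + length] = 1
    # Column-major reshape: consecutive pixels run down a column
    return flat.reshape((height, width), order="F")

# Two runs: pixels 2-3 (first column) and pixels 7-9 (third column)
toy_mask = decode_rle("2 2 7 3", height=3, width=4)
print(toy_mask)
```

The first run fills the lower two pixels of the first column and the second run fills the whole third column.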

#### Import libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import json
import os
import random
import cv2

print(f"pandas version: {pd.__version__}")
print(f"numpy version: {np.__version__}")

## Exploratory data analysis

### Load datasets

In [None]:
# Training dataset
train_df = pd.read_csv('/kaggle/input/imaterialist-fashion-2020-fgvc7/train.csv')
train_df.head()

### Dataset details

#### Datasets size

In [None]:
# Get datasets shapes
print(f'Training dataset shape: {train_df.shape}')
print(f'Unique images training: {train_df["ImageId"].nunique()}')

#### Size distribution

In [None]:
# Get image size distribution
shape_df = train_df.groupby("ImageId")[["Height", "Width"]].first()
for dim in ["Height", "Width"]:
    plt.figure()
    plt.hist(shape_df[dim], bins=50)
    plt.grid()
    plt.title(f"{dim} distribution")

#### Classes per image distribution

In [None]:
# Get classes per image distribution
plt.hist(train_df.groupby(['ImageId'], as_index=False).count()[['ClassId']], bins=30)
plt.grid()
plt.title("Classes per image distribution")

#### Plotting random images

In [None]:
# Plot randomly selected image
plt.figure(figsize=(70,7))
random_image = train_df.sample()["ImageId"].item()
plt.imshow(mpimg.imread(f'/kaggle/input/imaterialist-fashion-2020-fgvc7/train/{random_image}.jpg'))
plt.grid(False)
plt.show()

#### Label description analysis

In [None]:
# Get label file
with open('/kaggle/input/imaterialist-fashion-2020-fgvc7/label_descriptions.json', 'r') as file:
    label_d = json.load(file)

print("Label description columns {}".format(list(label_d.keys())))

In [None]:
# Separate label description into categories and attributes
categories_df = pd.DataFrame(label_d['categories'])
attributes_df = pd.DataFrame(label_d['attributes'])

In [None]:
# Categories
categories_df

In [None]:
categ_names = categories_df["name"].unique()
print(categ_names)
print(f"Number of categories {len(categ_names)}")

In [None]:
# Attributes
pd.set_option('display.max_rows', 500)
attributes_df

In [None]:
attr_names = attributes_df["name"].unique()
print(attr_names)
print(f"Number of attributes {len(attr_names)}")

In [None]:
# Create dictionaries to map the IDs with the category and attributes strings
cat_map = {category["id"]: category["name"] for category in label_d['categories']}
cat_map_inv = {category["name"]: category["id"] for category in label_d['categories']}

attr_map = {attribute["id"]: attribute["name"] for attribute in label_d['attributes']}
attr_map_inv = {attribute["name"]: attribute["id"] for attribute in label_d['attributes']}

#### Plot segmented images

In [None]:
def plot_raw_segmented_image(df, figsize=(15, 15)):
    # Read a random image and its annotations
    random_id = df.sample()["ImageId"].item()
    image = mpimg.imread(f'/kaggle/input/imaterialist-fashion-2020-fgvc7/train/{random_id}.jpg')
    rows = df[df['ImageId'] == random_id]
    encoded_pixels = rows['EncodedPixels']
    class_ids = rows['ClassId']
    
    # Create a single mask in which every class gets its own gray level
    height, width = image.shape[:2]
    mask = np.zeros(height * width)
    for pixels, class_id in zip(encoded_pixels, class_ids):
        pixels_split = list(map(int, pixels.split()))
        pixel_starts = pixels_split[::2]
        run_lengths = pixels_split[1::2]
        for pixel_start, run_length in zip(pixel_starts, run_lengths):
            mask[pixel_start:pixel_start + run_length] = 255 - class_id * 4
    mask = mask.reshape(height, width, order='F')
    
    # Plot the raw image next to the image with the mask overlaid
    fig, axs = plt.subplots(1, 2, figsize=figsize)
    axs[0].imshow(image)
    axs[1].imshow(image)
    axs[1].imshow(mask, alpha=0.8)
    plt.show()

In [None]:
# Plot raw and segmented images
size = 3
for _ in range(size):
    plot_raw_segmented_image(train_df)

In [None]:
def plot_classes_image(df, figsize=(15,15)):
    # Select random image
    random_id = df.sample()["ImageId"].item()
    image = mpimg.imread(f'/kaggle/input/imaterialist-fashion-2020-fgvc7/train/{random_id}.jpg')
    rows = df[df['ImageId'] == random_id]
    encoded_pixels = rows['EncodedPixels']
    class_ids = rows['ClassId']
    
    # Create mask and plot every specific class in the image
    height, width = image.shape[:2]
    for pixels, class_id in zip(encoded_pixels, class_ids):
        mask = np.zeros((height, width)).reshape(-1)
        pixels_split = list(map(int, pixels.split()))
        pixel_starts = pixels_split[::2]
        run_lengths = pixels_split[1::2]
        for pixel_start, run_length in zip(pixel_starts, run_lengths):
            mask[pixel_start:pixel_start + run_length] = 255 - class_id * 4
        mask = mask.reshape(height, width, order='F')
        
        # Plot masked image
        plt.figure(figsize=figsize)
        plt.title(cat_map[class_id])
        plt.imshow(image)    
        plt.imshow(mask, alpha=0.8)
        plt.show()

In [None]:
plot_classes_image(train_df)

#### Replace ClassId value

In [None]:
# Replace ClassId for class string 
train_df['ClassId'] = train_df['ClassId'].map(cat_map)
train_df.head()

In [None]:
# Plot class value count
train_df['ClassId'].value_counts()[:20].plot(kind='barh')
plt.grid()

In [None]:
# Transform ClassId back to int to perform the training
train_df['ClassId'] = train_df['ClassId'].map(cat_map_inv)
train_df.head()

## Detectron

#### Install dependencies

In [None]:
!pip install -q cython pyyaml

In [None]:
!pip install pycocotools==2.0.2

In [None]:
pip install 'git+https://github.com/facebookresearch/detectron2.git'

#### Import libraries

In [None]:
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog
from detectron2.structures import BoxMode

#### Use detectron in current dataset

In [None]:
# Use detectron to predict images in the current dataset,
# use the default weights and labels from the network
config_file = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(config_file))
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(config_file)

predictor = DefaultPredictor(cfg)

In [None]:
# Plot images classified by the pretrained detectron
rows, cols = 2, 2
plt.figure(figsize=(20, 20))

for i in range(int(rows * cols)):
    plt.subplot(rows, cols, i + 1)
    
    # Get random image
    random_id = train_df.sample()["ImageId"].item()
    im = mpimg.imread(f'/kaggle/input/imaterialist-fashion-2020-fgvc7/train/{random_id}.jpg')
    height, width = im.shape[:2]
    
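    # Get detectron prediction from the selected image
    # (mpimg.imread returns RGB, while the predictor expects BGR by default)
    outputs = predictor(cv2.cvtColor(im, cv2.COLOR_RGB2BGR))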
    
    # Create visualizer
    visualizer = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=0.4)
    
    # Change font size for better reading
    visualizer._default_font_size = np.sqrt(height * width) // 20
    visualizer = visualizer.draw_instance_predictions(outputs["instances"].to("cpu"))
    
    # Plot images
    plt.axis('off')
    plt.imshow(visualizer.get_image()[:, :, ::-1])

plt.show()

#### Modify dataset to use detectron

In [None]:

def rle_decode_string(string, h, w):
    """
    Transforms an RLE string into a pixel mask
    
    :param string: RLE string to transform into a mask
    :type string: str
    :param h: image height
    :type h: int
    :param w: image width
    :type w: int
    :return: image mask
    :rtype: numpy array

    """
    mask = np.full(h * w, 0, dtype=np.uint8)
    annotation = [int(x) for x in string.split(' ')]
    for i, start_pixel in enumerate(annotation[::2]):
        # `start` is 1-indexed (see rle2bbox), so shift to a 0-based index
        start = start_pixel - 1
        mask[start: start + annotation[2 * i + 1]] = 1
    mask = mask.reshape((h, w), order='F')
    return mask

def rle2bbox(rle, shape):
    '''
    Get a bbox from a mask which is required for Detectron 2 dataset
    :param rle: run-length encoded image mask, as string
    :type rle: str
    :param shape: (height, width) of image on which RLE was produced
    :type shape: tuple
    :return: (x0, y0, x1, y1) tuple describing the bounding box of the rle mask
    :rtype: tuple
    '''
    
    a = np.fromiter(rle.split(), dtype=np.uint)
    a = a.reshape((-1, 2))  # an array of (start, length) pairs
    a[:,0] -= 1  # `start` is 1-indexed
    
    y0 = a[:,0] % shape[0]
    y1 = y0 + a[:,1]
    if np.any(y1 > shape[0]):
        # a run wraps into the next column, so the mask touches both the top (0) and bottom (shape[0]) rows
        y0 = 0
        y1 = shape[0]
    else:
        y0 = np.min(y0)
        y1 = np.max(y1)
    
    x0 = a[:,0] // shape[0]
    x1 = (a[:,0] + a[:,1]) // shape[0]
    x0 = np.min(x0)
    x1 = np.max(x1)
    
    if x1 > shape[1]:
        # just went out of the image dimensions
        raise ValueError("invalid RLE or image dimensions: x1=%d > shape[1]=%d" % (
            x1, shape[1]
        ))

    return x0, y0, x1, y1
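
One way to sanity-check a box derived directly from the RLE is to decode the mask and take the extent of its nonzero pixels. The sketch below does this on a toy mask. `bbox_from_mask` is a hypothetical helper, and it returns exclusive upper bounds (`x1` and `y1` are one past the last mask pixel), which is an assumption made here rather than something taken from the code above, so results may differ from `rle2bbox` by a pixel at the right edge.

```python
import numpy as np

def bbox_from_mask(mask):
    """Return (x0, y0, x1, y1) for the nonzero region of a 2-D binary mask."""
    rows = np.flatnonzero(mask.any(axis=1))  # rows that contain mask pixels
    cols = np.flatnonzero(mask.any(axis=0))  # columns that contain mask pixels
    if rows.size == 0:
        raise ValueError("empty mask has no bounding box")
    # Upper bounds are exclusive (assumption made for this sketch)
    return int(cols[0]), int(rows[0]), int(cols[-1]) + 1, int(rows[-1]) + 1

toy_mask = np.zeros((5, 6), dtype=np.uint8)
toy_mask[1:4, 2:5] = 1  # rows 1-3, columns 2-4
toy_bbox = bbox_from_mask(toy_mask)
print(toy_bbox)
```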

In [None]:
# Transform ImageId into image path
image_dir = '/kaggle/input/imaterialist-fashion-2020-fgvc7/train/'
train_df['ImageId'] = image_dir + train_df['ImageId'] + '.jpg'
train_df.head()

In [None]:
# Create boxes list
bboxes = [rle2bbox(c.EncodedPixels, (c.Height, c.Width)) for n, c in train_df.iterrows()]
bboxes_array = np.array(bboxes)

In [None]:
# Fill NaNs
train_df = train_df.fillna(999)

In [None]:
# Add bounding boxes coordinates to train using detectron
train_df['x0'], train_df['y0'], train_df['x1'], train_df['y1'] = bboxes_array[:,0], bboxes_array[:,1], bboxes_array[:,2], bboxes_array[:,3]
train_df.head()

In [None]:
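def transform_to_array(value):
    """Convert an attribute entry into a fixed-length (14) integer array."""
    if isinstance(value, (np.ndarray, np.generic)):
        # Already converted
        return value
    elif isinstance(value, str):
        # Comma-separated attribute IDs
        array = [int(val) for val in value.split(",")]
    else:
        # Missing attributes were filled with the 999 placeholder above
        array = [999]
    array = np.array(array)
    # Right-pad with zeros so that every row has the same length
    return np.pad(array, (0, 14 - len(array)))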
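
The padding step can be illustrated in isolation. The sketch below parses a comma-separated attribute string and right-pads it to a fixed length. The length of 14 mirrors the constant used above, while `pad_attributes`, the `fill` parameter and the sample IDs are illustrative. Note that zero-padding cannot be told apart from a real attribute whose ID is 0, should one exist, so a dedicated fill value may be preferable.

```python
import numpy as np

def pad_attributes(attr_string, length=14, fill=0):
    """Parse 'id,id,...' and right-pad the IDs to a fixed length."""
    ids = np.array([int(v) for v in attr_string.split(",")], dtype=np.int64)
    if len(ids) > length:
        raise ValueError(f"{len(ids)} attributes do not fit in length {length}")
    return np.pad(ids, (0, length - len(ids)), constant_values=fill)

padded = pad_attributes("115,136,143")
padded_neg = pad_attributes("7", length=4, fill=-1)
print(padded)
print(padded_neg)
```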

In [None]:
# Transform attribute string into tensor
train_df["AttributesIds"] = train_df["AttributesIds"].map(transform_to_array)
train_df.head()

In [None]:
# Store modified train_df
train_df.to_pickle("train_df.pickle")

In [None]:
print(len(train_df))

In [None]:
import pycocotools
def get_materialist_dicts(df):
    """
    Transforms dataframe into dictionary used to train using detectron
    """
    dataset_dicts = []
    for idx, filename in enumerate(df["ImageId"].unique()):
        record = {}
        # Get useful image information
        height, width = df[df["ImageId"] == filename][["Height", "Width"]].values[0]
        record["file_name"] = filename
        record["image_id"] = idx
        record["height"] = int(height)
        record["width"] = int(width)
        
        if idx % 1000 == 0:
            print(idx)
        
        objs = []
        for i, row in df[(df['ImageId'] == filename)].iterrows():
            
            # Get segmentation polygons
            mask = rle_decode_string(row['EncodedPixels'], row['Height'], row['Width'])
            # segmentation = pycocotools.mask.encode(np.asarray(mask, order="F"))
            contours, hierarchy = cv2.findContours((mask).astype(np.uint8), cv2.RETR_TREE,
                                                    cv2.CHAIN_APPROX_SIMPLE)
            segmentation = []

            for contour in contours:
                contour = contour.flatten().tolist()
                if len(contour) > 4:
                    segmentation.append(contour)

            obj = {
                "bbox": [row['x0'], row['y0'], row['x1'], row['y1']],
                "bbox_mode": BoxMode.XYXY_ABS,
                "segmentation": segmentation,
                "category_id": row['ClassId'],
                "attributes": row['AttributesIds'],
                "iscrowd": 0,
            }
            objs.append(obj)
        
        record['annotations'] = objs
        dataset_dicts.append(record)
    return dataset_dicts

# Use a subset of the annotations to reduce the time needed to build the dictionaries:
# the first 100,000 rows for training and the next 10,000 rows for validation.
# Note that slicing by row can leave one image's annotations on both sides of the boundary.
df_copy = train_df[:100000].copy()
df_copy_val = train_df[100000:110000].copy()

# Full dataset
# df_copy = train_df.copy()

materialist_dict = get_materialist_dicts(df_copy)
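
Slicing the annotation table by row position can place annotations of the same image in both the training and the validation subset. A minimal sketch of an alternative, assuming the split should instead be made on unique `ImageId` values. `split_by_image`, the seed and the validation fraction are illustrative choices, not the split used in this notebook.

```python
import random

def split_by_image(image_ids, val_fraction=0.1, seed=0):
    """Split annotation rows so that each image lands entirely in one subset.

    Returns two lists of row positions: (train_rows, val_rows).
    """
    unique_ids = sorted(set(image_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n_val = max(1, int(len(unique_ids) * val_fraction))
    val_ids = set(unique_ids[:n_val])
    train_rows = [i for i, img in enumerate(image_ids) if img not in val_ids]
    val_rows = [i for i, img in enumerate(image_ids) if img in val_ids]
    return train_rows, val_rows

# Toy annotation table: one entry per mask, several masks per image
toy_rows = ["a", "a", "b", "c", "c", "c", "d", "e", "e", "f"]
train_rows, val_rows = split_by_image(toy_rows, val_fraction=0.34)
train_images = {toy_rows[i] for i in train_rows}
val_images = {toy_rows[i] for i in val_rows}
print(sorted(train_images), sorted(val_images))
```

With a pandas DataFrame, the returned positions could be passed to `df.iloc[...]` to build the two subsets.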

In [None]:
materialist_dict[0].keys()

In [None]:
print(len(df_copy))
print(len(materialist_dict))
print(len(train_df))
print(len(train_df["ImageId"].unique()))
print(len(df_copy["ImageId"].unique()))

In [None]:
# Register the custom training and validation datasets with detectron2
for d in ["train", "val"]:
    if d == "train":
        used_df = df_copy
    else:
        used_df = df_copy_val
    DatasetCatalog.register("mat_" + d, lambda df=used_df: get_materialist_dicts(df))
    # DatasetCatalog.register("mat_" + d, lambda df=df_copy: get_materialist_dicts(df))
    MetadataCatalog.get("mat_" + d).set(thing_classes=list(categories_df.name))
materialist_metadata = MetadataCatalog.get("mat_train")
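
The `df=used_df` default argument in the registration loop matters. A plain `lambda: get_materialist_dicts(used_df)` would look up `used_df` only when it is called, by which time the loop has finished, so both datasets would see the validation frame. A small standalone illustration of the difference (all names here are placeholders):

```python
late, bound = {}, {}
for name in ["train", "val"]:
    data = f"{name}_data"
    late[name] = lambda: data        # looks up `data` when the lambda is called
    bound[name] = lambda d=data: d   # captures the value of `data` right now

# After the loop, `data` holds the last value assigned to it
print(late["train"](), bound["train"]())  # prints: val_data train_data
```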

In [None]:
# To verify the data loading is correct we visualize the annotations of randomly selected samples in the training set
for d in random.sample(materialist_dict, 5):
    img = mpimg.imread(d["file_name"])
    height, width = img.shape[:2]
    plt.figure(figsize=(20, 20))
    visualizer = Visualizer(img[:, :, ::-1], metadata=materialist_metadata, scale=0.5)
    visualizer._default_font_size = np.sqrt(height * width) // 20
    out = visualizer.draw_dataset_dict(d)
    plt.imshow(out.get_image()[:, :, ::-1])
    plt.axis('off')
    plt.show()

## FPN

In [None]:
from detectron2.engine import DefaultTrainer
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Fine-tune a COCO-pretrained R50-FPN Mask R-CNN model on the dataset
cfg_FPN = get_cfg()
cfg_FPN.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg_FPN.DATASETS.TRAIN = ("mat_train",)
cfg_FPN.DATASETS.TEST = ("mat_val",)
cfg_FPN.DATALOADER.NUM_WORKERS = 1
cfg_FPN.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg_FPN.SOLVER.IMS_PER_BATCH = 2
cfg_FPN.SOLVER.BASE_LR = 0.00025 
cfg_FPN.SOLVER.MAX_ITER = 1000  
cfg_FPN.SOLVER.STEPS = []       
cfg_FPN.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128  
cfg_FPN.MODEL.ROI_HEADS.NUM_CLASSES = 46 

# Train
cfg_FPN.OUTPUT_DIR = "./output_FPN"
os.makedirs(cfg_FPN.OUTPUT_DIR, exist_ok=True)
trainer_FPN = DefaultTrainer(cfg_FPN) 
trainer_FPN.resume_or_load(resume=False)
trainer_FPN.train()

In [None]:
# Create predictor from the weights obtained during the training
cfg_FPN.MODEL.WEIGHTS = os.path.join(cfg_FPN.OUTPUT_DIR, "model_final.pth")
cfg_FPN.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5   # set the testing threshold for this model
cfg_FPN.DATASETS.TEST = ('mat_val',)
predictor_FPN = DefaultPredictor(cfg_FPN)

In [None]:
from detectron2.utils.visualizer import ColorMode
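plt.figure(figsize=(20, 20))
for i, d in enumerate(random.sample(materialist_dict, 3)):
    plt.subplot(1, 3, i + 1)  # one panel per image instead of overwriting the same axes
    im = cv2.imread(d["file_name"])  # BGR, as the predictor expects
    outputs = predictor_FPN(im)
    visualizer = Visualizer(im[:, :, ::-1],  # the Visualizer expects RGB
                   metadata=materialist_metadata, 
                   scale=0.8, 
                   instance_mode=ColorMode.IMAGE_BW   # remove the colors of unsegmented pixels
    )
    v = visualizer.draw_instance_predictions(outputs["instances"].to("cpu"))
    plt.imshow(v.get_image())  # get_image() already returns RGB
    plt.axis('off')
plt.show()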

In [None]:
from detectron2.utils.visualizer import ColorMode

# Show different images at random
rows, cols = 3, 3
plt.figure(figsize=(20,20))

for i, d in enumerate(random.sample(materialist_dict, 9)):
    # Process image
    plt.subplot(rows, cols, i+1)

    im = cv2.imread(d["file_name"])  # BGR, as the predictor expects
    
    # Run through predictor
    outputs = predictor_FPN(im)
    
    # Visualize (the Visualizer expects RGB, hence the channel flip)
    v = Visualizer(im[:, :, ::-1],
                   metadata=materialist_metadata, 
                   scale=0.8, 
                   instance_mode=ColorMode.IMAGE_BW   # remove the colors of unsegmented pixels
    )
    v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    plt.imshow(v.get_image())  # get_image() already returns RGB
    plt.axis('off')

plt.show()

In [None]:
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader

# Evaluate model
evaluator_FPN = COCOEvaluator("mat_val", output_dir="./output")
val_loader_FPN = build_detection_test_loader(cfg_FPN, "mat_val")

In [None]:
# Get results
result_FPN = inference_on_dataset(predictor_FPN.model, val_loader_FPN, evaluator_FPN)

In [None]:
result_FPN

## DC5

In [None]:
from detectron2.engine import DefaultTrainer
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Fine-tune a COCO-pretrained R50-DC5 Mask R-CNN model on the dataset
cfg_DC5 = get_cfg()
cfg_DC5.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_DC5_3x.yaml"))
cfg_DC5.DATASETS.TRAIN = ("mat_train",)
cfg_DC5.DATASETS.TEST = ()
cfg_DC5.DATALOADER.NUM_WORKERS = 1
cfg_DC5.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_DC5_3x.yaml")
cfg_DC5.SOLVER.IMS_PER_BATCH = 2
cfg_DC5.SOLVER.BASE_LR = 0.00025  
cfg_DC5.SOLVER.MAX_ITER = 1000 
cfg_DC5.SOLVER.STEPS = []    
cfg_DC5.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128  
cfg_DC5.MODEL.ROI_HEADS.NUM_CLASSES = 46  

# Train
cfg_DC5.OUTPUT_DIR = "./output_DC5"
os.makedirs(cfg_DC5.OUTPUT_DIR, exist_ok=True)
trainer_DC5 = DefaultTrainer(cfg_DC5) 
trainer_DC5.resume_or_load(resume=False)
trainer_DC5.train()

In [None]:
cfg_DC5.MODEL.WEIGHTS = os.path.join(cfg_DC5.OUTPUT_DIR, "model_final.pth")
cfg_DC5.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5   # set the testing threshold for this model
cfg_DC5.DATASETS.TEST = ('mat_val',)  # trailing comma makes this a tuple, not a string
predictor_DC5 = DefaultPredictor(cfg_DC5)

In [None]:
from detectron2.utils.visualizer import ColorMode
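plt.figure(figsize=(20, 20))
for i, d in enumerate(random.sample(materialist_dict, 3)):
    plt.subplot(1, 3, i + 1)  # one panel per image instead of overwriting the same axes
    im = cv2.imread(d["file_name"])  # BGR, as the predictor expects
    outputs = predictor_DC5(im)
    visualizer = Visualizer(im[:, :, ::-1],  # the Visualizer expects RGB
                   metadata=materialist_metadata, 
                   scale=0.8, 
                   instance_mode=ColorMode.IMAGE_BW   # remove the colors of unsegmented pixels
    )
    v = visualizer.draw_instance_predictions(outputs["instances"].to("cpu"))
    plt.imshow(v.get_image())  # get_image() already returns RGB
    plt.axis('off')
plt.show()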

In [None]:
# Show different images at random
rows, cols = 3, 3
plt.figure(figsize=(20,20))
for i, d in enumerate(random.sample(materialist_dict, 9)):
    # Process image
    plt.subplot(rows, cols, i+1)
    im = cv2.imread(d["file_name"])  # BGR, as the predictor expects
    
    # Run through predictor
    outputs = predictor_DC5(im)
    
    # Visualize (the Visualizer expects RGB, hence the channel flip)
    v = Visualizer(im[:, :, ::-1],
                   metadata=materialist_metadata, 
                   scale=0.8, 
                   instance_mode=ColorMode.IMAGE_BW   # remove the colors of unsegmented pixels
    )
    v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    plt.imshow(v.get_image())  # get_image() already returns RGB
    plt.axis('off')
plt.show()

In [None]:
# Evaluate model
evaluator_DC5 = COCOEvaluator("mat_val", output_dir=cfg_DC5.OUTPUT_DIR)
val_loader_DC5 = build_detection_test_loader(cfg_DC5, "mat_val")
result_DC5 = inference_on_dataset(predictor_DC5.model, val_loader_DC5, evaluator_DC5)

In [None]:
result_DC5

## C4

In [None]:
from detectron2.engine import DefaultTrainer

# Fine-tune a COCO-pretrained R50-C4 Mask R-CNN model on the dataset
cfg_C4 = get_cfg()
cfg_C4.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_C4_3x.yaml"))
cfg_C4.DATASETS.TRAIN = ("mat_train",)
cfg_C4.DATASETS.TEST = ()
cfg_C4.DATALOADER.NUM_WORKERS = 2
cfg_C4.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_C4_3x.yaml")
cfg_C4.SOLVER.IMS_PER_BATCH = 2
cfg_C4.SOLVER.BASE_LR = 0.00025
cfg_C4.SOLVER.MAX_ITER = 1000
cfg_C4.SOLVER.STEPS = []
cfg_C4.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128
cfg_C4.MODEL.ROI_HEADS.NUM_CLASSES = 46

# Train
cfg_C4.OUTPUT_DIR = "./output_C4"
os.makedirs(cfg_C4.OUTPUT_DIR, exist_ok=True)
trainer_C4 = DefaultTrainer(cfg_C4) 
trainer_C4.resume_or_load(resume=False)
trainer_C4.train()

In [None]:
# Create predictor from the weights obtained during the training
cfg_C4.MODEL.WEIGHTS = os.path.join(cfg_C4.OUTPUT_DIR, "model_final.pth")
cfg_C4.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
cfg_C4.DATASETS.TEST = ('mat_val',)  # trailing comma makes this a tuple, not a string
predictor_C4 = DefaultPredictor(cfg_C4)

In [None]:
from detectron2.utils.visualizer import ColorMode
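plt.figure(figsize=(20, 20))
for i, d in enumerate(random.sample(materialist_dict, 3)):
    plt.subplot(1, 3, i + 1)  # one panel per image instead of overwriting the same axes
    im = cv2.imread(d["file_name"])  # BGR, as the predictor expects
    outputs = predictor_C4(im)
    visualizer = Visualizer(im[:, :, ::-1],  # the Visualizer expects RGB
                   metadata=materialist_metadata, 
                   scale=0.8, 
                   instance_mode=ColorMode.IMAGE_BW   # remove the colors of unsegmented pixels
    )
    v = visualizer.draw_instance_predictions(outputs["instances"].to("cpu"))
    plt.imshow(v.get_image())  # get_image() already returns RGB
    plt.axis('off')
plt.show()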

In [None]:
# Show different images at random
rows, cols = 3, 3
plt.figure(figsize=(20,20))
for i, d in enumerate(random.sample(materialist_dict, 9)):
    # Process image
    plt.subplot(rows, cols, i+1)
    im = cv2.imread(d["file_name"])  # BGR, as the predictor expects
    
    # Run through predictor
    outputs = predictor_C4(im)
    
    # Visualize (the Visualizer expects RGB, hence the channel flip)
    v = Visualizer(im[:, :, ::-1],
                   metadata=materialist_metadata, 
                   scale=0.8, 
                   instance_mode=ColorMode.IMAGE_BW   # remove the colors of unsegmented pixels
    )
    v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    plt.imshow(v.get_image())  # get_image() already returns RGB
    plt.axis('off')
plt.show()

In [None]:
# Evaluate model
evaluator_C4 = COCOEvaluator("mat_val", output_dir="./output")
val_loader_C4 = build_detection_test_loader(cfg_C4, "mat_val")
result_C4 = inference_on_dataset(predictor_C4.model, val_loader_C4, evaluator_C4)

In [None]:
result_C4

## Evaluation

In [None]:
results_dict = {}

# Create table to compare results
results_dict['bbox_DC5'] = result_DC5['bbox']
results_dict['bbox_C4'] = result_C4['bbox']
results_dict['bbox_FPN'] = result_FPN['bbox']
results_dict['segm_DC5'] = result_DC5['segm']
results_dict['segm_C4'] = result_C4['segm']
results_dict['segm_FPN'] = result_FPN['segm']

df_results = pd.DataFrame.from_dict(results_dict)
df_results.loc[['AP', 'AP50', 'AP75', 'APs', 'APm', 'APl']]

As shown in the table, the three models reach similar values on most metrics, which is plausible given that they all share the same ResNet-50 backbone and differ only in the feature-extraction variant (FPN, DC5 or C4). To improve these metrics, we could train on a larger portion of the dataset (only a subset was used here to speed up training), train for more iterations, or use a deeper backbone such as ResNet-101.
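
To put the training budget in perspective, the number of passes over the data follows directly from the solver settings. The sketch below uses the values configured above (`MAX_ITER = 1000`, `IMS_PER_BATCH = 2`). The image count of 10,000 is a placeholder, not the actual number of images in the training subset, and both function names are illustrative.

```python
import math

def epochs_seen(max_iter, ims_per_batch, num_images):
    """Number of passes over the dataset completed in `max_iter` iterations."""
    return max_iter * ims_per_batch / num_images

def iters_for_epochs(epochs, ims_per_batch, num_images):
    """Iterations needed to complete `epochs` passes over the dataset."""
    return math.ceil(epochs * num_images / ims_per_batch)

num_images = 10_000  # placeholder value, replace with the real image count
fraction = epochs_seen(max_iter=1000, ims_per_batch=2, num_images=num_images)
one_epoch = iters_for_epochs(epochs=1, ims_per_batch=2, num_images=num_images)
print(f"epochs seen: {fraction:.2f}, iterations for one epoch: {one_epoch}")
```

Under that placeholder count, 1,000 iterations cover only a fifth of the images once, which illustrates why longer training is a natural first thing to try.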

Because of hardware limitations, training was short (1,000 iterations on a subset of the data), so the metrics are not as good as expected.

As the per-class results of each model show, the AP metric tends to be higher for the classes that are more frequent in the dataset (for instance shoes and sleeves), as expected.
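
For reference, the COCO-style `AP50` and `AP75` values in the table count a prediction as a match when its intersection over union (IoU) with a ground-truth instance reaches 0.50 or 0.75 respectively, while `AP` averages over IoU thresholds from 0.50 to 0.95. A minimal sketch of mask IoU on toy arrays follows. `mask_iou` is an illustrative helper, not the evaluator's implementation.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection over union of two binary masks of equal shape."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)

gt = np.zeros((10, 10), dtype=bool)
gt[2:8, 2:8] = True        # 6x6 ground-truth square
pred = np.zeros((10, 10), dtype=bool)
pred[3:9, 2:8] = True      # same square shifted down by one row

iou = mask_iou(gt, pred)
print(f"IoU = {iou:.3f}, match at 0.50: {iou >= 0.5}, match at 0.75: {iou >= 0.75}")
```

A one-row shift of a 6x6 mask already drops the IoU below 0.75, which is why `AP75` is usually well below `AP50`.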

In [None]:
metrics = {}

# Load the training metrics logged by each model (one JSON object per line)
for folder in ["FPN", "C4", "DC5"]:
    with open(f'output_{folder}/metrics.json') as file:
        metrics[folder] = [json.loads(line) for line in file if line.strip()]

In [None]:
import matplotlib.pyplot as plt

fig, axs = plt.subplots(3, 4, figsize=(20,10))

for idx, model in enumerate(["FPN", "C4", "DC5"]):
    met = metrics[model]
    for loss_id, loss in enumerate(["loss_box_reg", "loss_mask", "loss_cls", "total_loss"]):
        ax = axs[idx, loss_id]
        # Keep only the log entries that contain this loss
        entries = [entry for entry in met if loss in entry]
        values = [entry[loss] for entry in entries]
        iterations = [entry['iteration'] for entry in entries]

        ax.set_title(f"{model} - {loss}")
        ax.set_xlabel("iterations")
        ax.plot(iterations, values)
        ax.grid()
        ax.grid()
        
fig.tight_layout()

The increase in the box regression loss early in training is expected, because this loss is only applied to the positive (foreground) boxes.
At the beginning of training the objectness classifier is still poor, so very few boxes are positive and the regression loss is small. As training goes on, objectness improves, more boxes are regressed, and the regression loss increases slightly during the first iterations.
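
A toy calculation can illustrate the effect. It assumes that the regression loss is summed over the foreground boxes and divided by the total number of sampled boxes per image (128 here, matching `BATCH_SIZE_PER_IMAGE`), rather than by the number of foreground boxes. The per-box errors and foreground counts are made-up numbers, and `box_reg_loss` is an illustrative helper.

```python
def box_reg_loss(num_foreground, per_box_loss, num_sampled=128):
    """Regression loss summed over foreground boxes, normalized by all sampled boxes."""
    return num_foreground * per_box_loss / num_sampled

# Early in training: few foreground boxes, each with a large error
early = box_reg_loss(num_foreground=4, per_box_loss=0.5)
# Later: many more foreground boxes, each with a smaller error
later = box_reg_loss(num_foreground=32, per_box_loss=0.3)

print(f"early: {early:.4f}, later: {later:.4f}")
```

Even though each box is regressed more accurately in the second case, the reported loss is higher because more boxes contribute to it.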