<a href="https://colab.research.google.com/github/majsylw/detect-waste-workshop/blob/main/Detecting_trash_in_a_wild_Part_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<p><img alt="Colaboratory logo" height="45px" src="https://colab.research.google.com/img/colab_favicon.ico" align="left" hspace="10px" vspace="0px"></p>
Author: Sylwia Majchrowska


<h1>Welcome to the workshop notebook "Exploring Trash Annotations in Context"!</h1>

<img src="https://raw.githubusercontent.com/majsylw/detect-waste-workshop/main/imgs/graphic.jpg" alt="logo" width="700"/>

This workshop provides an overview of waste types using a public dataset, covering data cleaning, preparation, and labeling standards. Participants will also conduct exploratory data analysis to understand the data and label quality and their impact on machine learning training.

## Notebook Handling
The file you are reading is a [Jupyter Notebook](https://jupyter.org/). It is not a static page but an interactive environment that allows you to create and execute code written in the Python language.

The notebook consists of two types of cells: for entering text and for source code. In one cell, you can enter multiple lines of code, but it is advisable to do so thoughtfully and in moderation, as all commands placed in one cell will execute sequentially when it is run (Shift+Enter). Below is an example of a cell with code that saves a certain value to a variable and prints its contents on the screen.

In [None]:
number_of_seconds_in_a_day = 24 * 60 * 60
number_of_seconds_in_a_day

To execute the code in the above cell, select it by clicking on it and then simultaneously press the "Shift+Enter" key combination (or click the play button, which appears to the left of the code after clicking the cell).

Within one notebook, all code cells share a common memory. Furthermore, the order in which commands in the cells are run depends on you. This means that if you create an object in memory and give it a name in one cell, every subsequently executed cell will be aware of this object. This has its advantages and disadvantages. An important side effect of this solution is that the events in the notebook depend strictly on the order in which its cells are run. Below is an example prepared to illustrate the described property of notebooks.

In [None]:
number_of_seconds_in_a_week = 7 * number_of_seconds_in_a_day
number_of_seconds_in_a_week

#### Data Access

To avoid installing all libraries and dependencies, you can use [Google Colaboratory](https://colab.research.google.com/notebooks/welcome.ipynb) - a Python workspace in the cloud. To do this, you need to move all data (as well as this notebook) to your Google Drive.

You can access files on Google Drive by connecting (mapping) Google Drive in the virtual machine of the execution environment (notebook). To do this, execute the code cell below.

**NOTE:** Before executing the script below, make sure that you have uploaded the necessary data to your Google Drive and edited the access paths.

**NOTE:** If you prefer to work on your own device, you need to install (or make sure you have installed) a [Python interpreter](https://docs.anaconda.com/anaconda/install/windows/) and the modules used in this notebook - you can find them by looking at all the instructions with the keyword *import*.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Python libraries

**Useful imports** (select the cell below and press Shift+Enter to execute it)

In [None]:
%matplotlib inline
import json
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
import pylab

from collections import Counter

from graphviz import Digraph  # Note: graphviz may require more than pip installation due to path issue: e.g. brew install graphviz for mac

from PIL import Image, ExifTags
from pycocotools.coco import COCO
from matplotlib.patches import Polygon, Rectangle
from matplotlib.collections import PatchCollection
import colorsys
import random

import cv2
import glob
import os

from tqdm import tqdm

# How Do Machines Learn?

Some artificial intelligence technologies have been around for a long time, but advances in computational power, the availability of vast amounts of data, and new algorithms have led to major breakthroughs in this field. Artificial intelligence is widely used to provide personalized recommendations when shopping or during ordinary web searches. More advanced applications include autonomous (self-driving) cars, which, put simply, decide on their next moves based on data collected from the various sensors installed in them.

## Types of Machine Learning

We can perceive machine learning as the art of extracting knowledge from data. The basic division of the field into sub-areas results from the type of task the machine is to solve:
- Supervised learning: occurs when all data presented to the machine is labeled, i.e., every example comes with the expected response (for instance, an image together with the name of the object it shows).
- Unsupervised learning: occurs when we have a large amount of unlabeled data, and the machine's task is to determine the data structure, such as grouping them accordingly.
- Reinforcement learning: through trial and error, the machine seeks a solution to a formulated task, being rewarded (when it acts correctly) or punished (when it makes mistakes), but otherwise, it is not given any hints or suggestions.

## Common Challenges

As you've probably noticed, machine learning relies heavily on data, so both the quantity and the quality of the data are particularly important. Imagine that you are building an autonomous car: put simply, you need to teach the machine to recognize the road, the roadside, other vehicles, and pedestrians. If you only have data collected during the day, you won't be able to teach your device to react at night, when the lighting of the objects it encounters is completely different. Furthermore, even for daytime driving, with too few example images you cannot be sure that the trained network will react correctly in every situation: how would it know how to behave if it had never 'seen' a similar case before? Consider an extreme case: a pedestrian wearing a shirt with black and white stripes. It would be unacceptable to release an algorithm that could mistake them for a pedestrian crossing, right? Finally, suppose you want to deploy your invention on the streets of New York, but during model training you only had photos from Skopje. The two cities have different infrastructure, which also affects how well the device works.

## Images - A Data Treasury for Computer Vision
The example of machine learning discussed above relies largely on images. Computer vision, a field of artificial intelligence, aims to give machines the ability to understand what they see, mimicking human vision. In supervised computer vision, a model is built from images paired with the expected results (labels), and it then produces predictions for new images.

**NOTE:** The accuracy of predictions depends on the quality of input data and the constructed model (a kind of *computer's eyes*).

<img src="https://raw.githubusercontent.com/majsylw/detect-waste-workshop/main/imgs/CV.jpg" alt="cv" width="700"/>

# The Trash Annotations in Context (TACO) Dataset 🌮
For this tutorial, we will use the open-source [TACO dataset](http://tacodataset.org/). The dataset repository contains an official dataset with 1500 images and 4784 annotations and an unofficial dataset with 3736 images and 8419 annotations. Both datasets contain various large and small litter objects on different backgrounds such as streets, parks, and beaches. The dataset contains 60 categories of litter.

## About the dataset

- Research Paper: [TACO: Trash Annotations in Context for Litter Detection](https://arxiv.org/abs/2003.06975)
- Authors: Pedro F Proença, Pedro Simões
- Dataset Size: Official: 1500 images, 4784 annotations & Unofficial: 3736 images, 8419 annotations.
- Categories: 60 litter categories
- License: CC BY 4.0
- Release: 17 March 2020
- Read more: [Github](https://github.com/pedropro/TACO) & [Webpage](http://tacodataset.org/)

## Downloading the Dataset

So, let’s download the dataset from the official website by following the instructions. Currently, there are two datasets, namely, TACO-Official and TACO-Unofficial.

- The annotations of the official dataset are collected, annotated, and reviewed by the creators of the TACO project.
- The annotations of the unofficial dataset are provided by the community and they are not reviewed.

In [None]:
# Clone the TACO repo
!git clone https://github.com/pedropro/TACO

In [None]:
# Move the folder that contains the JSON annotations to pwd
#!cp -r TACO/data .
# Download the images
#!python TACO/download.py

In [None]:
# For simplicity we will use provided data at google drive

HOME_FOLDER = '/content/drive/MyDrive/detect-waste-workshop/'
TACO_DATA_FOLDER = HOME_FOLDER + 'TACO/'
MODELS_FOLDER = HOME_FOLDER + 'models/'

## Understanding Data and Label Quality

TACO is a dataset specifically designed for the detection of litter in complex environments. It is built upon and extends the COCO (Common Objects in Context) format, a widely used standard for object detection, segmentation, and captioning tasks. The COCO format provides a structured way to organize image data and annotations, making it easier for machine learning models to learn from complex datasets.

### COCO Annotations in the TACO Dataset
In the context of the TACO dataset, [COCO annotations](https://cocodataset.org/#home) are used to describe various aspects of the images, particularly focusing on the litter items present. Here's a breakdown of how COCO annotations are typically structured and applied within the TACO dataset:

- `Images`: Each entry contains information about the image, including a unique ID, file name, and dimensions (height and width).
- `Categories`: This section defines the different types of litter or waste categories present in the dataset. Each category has a unique ID, a name (e.g., "Clear plastic bottle", "Drink can"), and a supercategory that groups related categories (e.g., "Bottle" for "Clear plastic bottle").
- `Annotations`: Annotations link specific instances of objects (in this case, litter items) in the images to their categories. Each annotation includes:
  - An ID for the annotation itself.
  - The ID of the image the annotation belongs to.
  - The category ID indicating the type of litter.
  - A segmentation field that outlines the exact shape of the object within the image. This can be in the form of a polygon or a mask.
  - A bounding box field that provides the coordinates of a rectangle enclosing the object.
  - Additional attributes like area (of the segmentation or bounding box) and iscrowd (indicating if the object is part of a group or a single instance).
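To make the structure above concrete, here is a tiny, hand-written dictionary in COCO format with one image, one category and one annotation, followed by the kind of lookup the COCO API performs for us: grouping annotations by `image_id`. All values, the file name and the helper `index_annotations_by_image` are made up for illustration; they are not taken from TACO.

```python
from collections import defaultdict

# A tiny, hand-written example in COCO format (values are illustrative only)
toy_dataset = {
    "images": [
        {"id": 0, "file_name": "batch_1/000000.jpg", "width": 640, "height": 480},
    ],
    "categories": [
        {"id": 0, "name": "Clear plastic bottle", "supercategory": "Bottle"},
    ],
    "annotations": [
        {
            "id": 0,
            "image_id": 0,
            "category_id": 0,
            # One polygon, stored as a flat list [x1, y1, x2, y2, ...]
            "segmentation": [[100, 100, 200, 100, 200, 180, 100, 180]],
            # Bounding box in COCO order: [x, y, width, height]
            "bbox": [100, 100, 100, 80],
            "area": 8000,
            "iscrowd": 0,
        },
    ],
}


def index_annotations_by_image(dataset):
    """Hypothetical helper: map each image id to the list of its annotations."""
    anns_by_image = defaultdict(list)
    for ann in dataset["annotations"]:
        anns_by_image[ann["image_id"]].append(ann)
    return anns_by_image


anns_by_image = index_annotations_by_image(toy_dataset)
toy_cat_names = {cat["id"]: cat["name"] for cat in toy_dataset["categories"]}
for ann in anns_by_image[0]:
    print(toy_cat_names[ann["category_id"]], ann["bbox"])
```

Annotations only reference images and categories by ID, which is why almost every analysis below starts by building such lookup tables.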



In [None]:
# Set up path to annotations
anns_file_path = TACO_DATA_FOLDER + 'annotations.json'

# Read annotations
with open(anns_file_path, 'r') as f:
    dataset = json.load(f)

In [None]:
print(dataset.keys())

In [None]:
print(dataset['annotations'][0].keys())

### Purpose and Application
The use of COCO annotations in the TACO dataset facilitates the application of advanced computer vision techniques for litter detection. By providing detailed annotations, the dataset allows for:

- `Object Detection`: Identifying and locating litter items within images.
  - `Annotation Format`: The annotations typically include bounding boxes that specify the coordinates of the rectangle enclosing each object. Along with the bounding box, each object is labeled with a category from a predefined set.
  - `Example`: In a waste detection dataset, a plastic bottle on the beach might be annotated with a bounding box around the bottle and labeled as "plastic."
- `Segmentation`: Precisely outlining the shape of each litter item, useful for understanding the spatial distribution and context of litter in various environments.
  - `Annotation Format`: Segmentation annotations are more detailed than detection annotations. They can be in the form of masks that outline the exact shape of each object, with each pixel in the mask being assigned a class label. For instance segmentation, each object instance is uniquely identified.
  - `Example`: In the TACO dataset, a segmentation mask might precisely outline the shape of each piece of litter, differentiating between multiple pieces of the same type of waste.
- `Classification`: Classifying litter items into predefined categories, aiding in waste management and recycling efforts.
  - `Annotation Format`: These annotations are simpler, consisting of one or more labels that apply to the whole image or to identified objects without spatial information. In object-level classification, the objects are usually first detected or segmented.
  - `Example`: An image of a landfill might be labeled with categories like "organic waste," "plastic," and "metal," indicating the presence of these types of waste without specifying their locations or shapes.
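Detection and segmentation annotations are closely related: a bounding box can be derived from a polygon. The sketch below computes a COCO-style `[x, y, width, height]` box and the polygon area (via the shoelace formula) from a flat segmentation list. It is only an illustration; the `area` stored in an annotation file may have been computed differently (for example from a rasterized mask), so it can differ slightly from the exact polygon area. The function names are hypothetical.

```python
def polygon_to_bbox(seg):
    """Return the COCO-style [x, y, width, height] box enclosing a flat polygon [x1, y1, x2, y2, ...]."""
    xs = seg[0::2]
    ys = seg[1::2]
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]


def polygon_area(seg):
    """Area of a simple (non-self-intersecting) polygon using the shoelace formula."""
    xs = seg[0::2]
    ys = seg[1::2]
    n = len(xs)
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n  # next vertex, wrapping around to the first one
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(twice_area) / 2.0


# A triangle with vertices (0, 0), (4, 0), (0, 3)
triangle = [0, 0, 4, 0, 0, 3]
print(polygon_to_bbox(triangle))  # [0, 0, 4, 3]
print(polygon_area(triangle))     # 6.0
```

The reverse direction does not work: a box alone cannot recover the object's outline, which is one reason segmentation annotations are more expensive to produce.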

The structured format of COCO annotations, combined with the specific focus on litter in the TACO dataset, makes it a valuable resource for developing and testing machine learning models aimed at environmental conservation and waste management solutions.

#### Visualize Annotated Images

For simplicity, to select and show the dataset images with the respective masks, we make use of the COCO API. The script below shows how to load and visualize an image with all its annotations.

Unfortunately, several Python libraries (including PIL, used below) ignore the EXIF orientation tag, so we have to rotate the images explicitly. Alternatively, you can use OpenCV instead.
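Pillow also provides `ImageOps.exif_transpose`, which applies the EXIF orientation for you, including the mirrored orientations (2, 4, 5 and 7) that the manual rotation in the cell below does not handle. A minimal sketch, assuming Pillow 6.0 or newer; the helper name `load_upright` and the synthetic test image are only for illustration:

```python
import io

from PIL import Image, ImageOps


def load_upright(path_or_file):
    """Open an image and rotate/flip it according to its EXIF orientation tag."""
    image = Image.open(path_or_file)
    # Returns a transposed copy; images without an orientation tag are returned without rotation
    return ImageOps.exif_transpose(image)


# Build a synthetic 40x20 JPEG whose EXIF says "rotate 90 degrees clockwise to display" (orientation 6)
exif = Image.Exif()
exif[0x0112] = 6  # 0x0112 is the standard EXIF Orientation tag
buffer = io.BytesIO()
Image.new("RGB", (40, 20)).save(buffer, format="JPEG", exif=exif.tobytes())
buffer.seek(0)

upright = load_upright(buffer)
print(upright.size)  # (20, 40): width and height are swapped
```

With this helper, `I = load_upright(TACO_DATA_FOLDER + image_filepath)` could replace the manual `_getexif()` handling.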

In [None]:
# User settings
image_filepath = 'batch_11/000028.jpg'
pylab.rcParams['figure.figsize'] = (28,28)
####################

# Obtain Exif orientation tag code
for orientation in ExifTags.TAGS.keys():
    if ExifTags.TAGS[orientation] == 'Orientation':
        break

# Loads dataset as a coco object
coco = COCO(anns_file_path)

# Find image id
img_id = -1
for img in dataset['images']:
    if img['file_name'] == image_filepath:
        img_id = img['id']
        break

# Show image and corresponding annotations
if img_id == -1:
    print('Incorrect file name')
else:
    # Load image
    print(image_filepath)
    I = Image.open(TACO_DATA_FOLDER + image_filepath)

    # Load and process image metadata
    if I._getexif():
        exif = dict(I._getexif().items())
        # Rotate portrait and upside down images if necessary
        if orientation in exif:
            if exif[orientation] == 3:
                I = I.rotate(180,expand=True)
            if exif[orientation] == 6:
                I = I.rotate(270,expand=True)
            if exif[orientation] == 8:
                I = I.rotate(90,expand=True)

    # Show image
    fig,ax = plt.subplots(1)
    plt.axis('off')
    plt.imshow(I)

    # Load mask ids
    annIds = coco.getAnnIds(imgIds=img_id, catIds=[], iscrowd=None)
    anns_sel = coco.loadAnns(annIds)

    # Show annotations
    for ann in anns_sel:
        color = colorsys.hsv_to_rgb(np.random.random(),1,1)
        for seg in ann['segmentation']:
            poly = Polygon(np.array(seg).reshape((int(len(seg)/2), 2)))
            p = PatchCollection([poly], facecolor=color, edgecolors=color,linewidths=0, alpha=0.4)
            ax.add_collection(p)
            p = PatchCollection([poly], facecolor='none', edgecolors=color, linewidths=2)
            ax.add_collection(p)
        [x, y, w, h] = ann['bbox']
        rect = Rectangle((x,y),w,h,linewidth=2,edgecolor=color,
                         facecolor='none', alpha=0.7, linestyle = '--')
        ax.add_patch(rect)

    plt.show()

In [None]:
# Print the (TACO) super category of every annotation on the image shown above
for ann in anns_sel:
    for cat in dataset['categories']:
        if cat['id'] == ann['category_id']:
            print(cat['supercategory'])

#### Key Differences
- Level of Detail: Segmentation annotations provide the highest level of detail, followed by detection annotations. Category classification annotations provide the least detail, focusing only on the presence of categories.
- Use Cases: Detection is suitable for applications where the location of objects is important. Segmentation is used when precise outlines are needed, for example, in medical imaging or detailed environmental monitoring. Category classification is used for simpler tasks where only the presence of certain types of objects is relevant.
- Complexity and Effort: Creating segmentation annotations requires the most effort due to the need for pixel-level precision. Detection annotations are less time-consuming but still require careful placement of bounding boxes. Category classification annotations are the simplest and fastest to create.

Understanding these differences is crucial for selecting the appropriate annotation type for a given task, balancing the level of detail needed against the effort required to create the annotations.


**NOTE:** Try it yourself by annotating TACO images at http://tacodataset.org/annotate

### TACO waste type distribution



In [None]:
# Summary statistics: number of super categories, categories, annotations, images and scenes

categories = dataset['categories']
anns = dataset['annotations']
imgs = dataset['images']
scenes = dataset['scene_categories']

nr_cats = len(categories)
nr_annotations = len(anns)
nr_images = len(imgs)
nr_scenes = len(scenes)

# Load categories and super categories
cat_names = []
super_cat_names = []
super_cat_ids = {}
nr_super_cats = 0
for cat_it in categories:
    cat_names.append(cat_it['name'])
    super_cat_name = cat_it['supercategory']
    # Add each super category only once, in order of first appearance
    # (this does not assume that categories are sorted by super category)
    if super_cat_name not in super_cat_ids:
        super_cat_names.append(super_cat_name)
        super_cat_ids[super_cat_name] = nr_super_cats
        nr_super_cats += 1

print('Number of super categories:', nr_super_cats)
print('Number of categories:', nr_cats)
print('Number of annotations:', nr_annotations)
print('Number of images:', nr_images)
print('Number of scenes:', nr_scenes)

In total, there are 4784 annotations across 60 classes. The waste was observed in 7 diverse environments.

In [None]:
# This shows the scene distribution

# Get scene cat names
scene_cats = dataset['scene_categories']
scene_name = []
for scene_cat in scene_cats:
    scene_name.append(scene_cat['name'])

nr_scenes = len(scene_cats)
scene_cat_histogram = np.zeros(nr_scenes,dtype=int)

for scene_ann in dataset['scene_annotations']:
    scene_ann_ids = scene_ann['background_ids']
    for scene_ann_id in scene_ann_ids:
        if scene_ann_id<len(scene_cats):
            scene_cat_histogram[scene_ann_id]+=1

# Convert to DataFrame (number of images tagged with each scene)
df = pd.DataFrame({'scene_cats': scene_name, 'nr_images': scene_cat_histogram})

# Plot
colors = ['white','black','gray', 'gold', 'red','green','lightskyblue']
plt.pie(scene_cat_histogram, labels=scene_name, colors=colors,
        shadow=False, startangle=-120)

plt.axis('equal')
plt.show()

In [None]:
# This shows the number of annotations per category

# Count annotations
cat_histogram = np.zeros(nr_cats,dtype=int)
for ann in anns:
    cat_histogram[ann['category_id']] += 1

# Initialize the matplotlib figure
f, ax = plt.subplots(figsize=(5,15))

# Convert to DataFrame
df = pd.DataFrame({'Categories': cat_names, 'Number of annotations': cat_histogram})
df = df.sort_values(by='Number of annotations', ascending=False)

# Plot the histogram
sns.set_color_codes("pastel")
sns.set(style="whitegrid")
plot_1 = sns.barplot(x="Number of annotations", y="Categories", data=df,
            label="Total", color="b")


In [None]:
cat_ids_2_supercat_ids = {}
for cat in categories:
    cat_ids_2_supercat_ids[cat['id']] = super_cat_ids[cat['supercategory']]

# Count annotations
super_cat_histogram = np.zeros(nr_super_cats,dtype=int)
for ann in anns:
    cat_id = ann['category_id']
    super_cat_histogram[cat_ids_2_supercat_ids[cat_id]] +=1

# Initialize the matplotlib figure
f, ax = plt.subplots(figsize=(5,10))

# Convert to DataFrame
d ={'Super categories': super_cat_names, 'Number of annotations': super_cat_histogram}
df = pd.DataFrame(d)
df = df.sort_values(by='Number of annotations', ascending=False)

sns.set_color_codes("pastel")
sns.set(style="whitegrid")
plot_1 = sns.barplot(x="Number of annotations", y="Super categories", data=df,
                     label="Total", color="b")

The plots show a strong class imbalance: some classes have fewer than 10 annotations, while others have more than 500.
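One common way to quantify such imbalance is the ratio between the most and the least frequent class; one common way to counteract it is to derive inverse-frequency ("balanced") class weights that can be passed to a weighted loss. The sketch below is a generic illustration on made-up counts, not the approach used in this workshop; the function names are hypothetical.

```python
import numpy as np


def imbalance_ratio(counts):
    """Ratio between the most and the least frequent class (classes with zero samples are ignored)."""
    counts = np.asarray(counts, dtype=float)
    nonzero = counts[counts > 0]
    return nonzero.max() / nonzero.min()


def inverse_frequency_weights(counts):
    """'Balanced' weights n_samples / (n_classes * count_c): every class then carries the same total weight."""
    counts = np.asarray(counts, dtype=float)
    weights = np.zeros_like(counts)
    present = counts > 0
    # Classes without any samples get weight 0 instead of dividing by zero
    weights[present] = counts.sum() / (present.sum() * counts[present])
    return weights


example_counts = [500, 100, 8]  # made-up annotation counts for three classes
print(imbalance_ratio(example_counts))  # 62.5
print(inverse_frequency_weights(example_counts))
```

In this notebook, the counts could come from `cat_histogram` or `super_cat_histogram`, e.g. `imbalance_ratio(cat_histogram)`. Very large weights for rare classes can make training unstable, so in practice they are often clipped or smoothed.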

In [None]:
# Visualize dataset graph

dot = Digraph('Dataset graph', filename='asd.gv')
dot.attr(rankdir='LR', size='8,10')

# One node per category, with an edge from its super category
# (categories that are their own super category get no edge)
for cat_it in categories:
    dot.node(cat_it['name'])
    if cat_it['name'] != cat_it['supercategory']:
        dot.edge(cat_it['supercategory'], cat_it['name'])
dot
# Uncomment the next line to render the graph to a PDF and open it
#dot.view()

# Detect-waste categories

The [Detect-waste](https://detectwaste.netlify.app/) dataset is inspired by the waste segregation principles in Gdańsk, Poland, and it categorizes waste according to these principles, aiming to align with Polish recycling standards. The dataset proposes seven well-defined categories for sorting litter, which are reflective of the broader recycling categories recognized in Poland. These categories are:

- **Bio**: Organic waste that can be composted.
- **Glass**: All types of glass products.
- **Metal and Plastic**: This category combines both metal and plastic waste, acknowledging the common collection of these materials in single recycling bins in some recycling schemes.
- **Non-recyclable**: Waste that cannot be recycled and is typically destined for landfill or incineration.
- **Other**: A category for waste that does not fit into the other categories or is of an ambiguous nature.
- **Paper**: All types of paper and cardboard products.
- **Unknown**: Items that cannot be easily classified into the above categories due to lack of visibility or information.

These categories are based on the recycling rules in Gdańsk, Poland, and aim to provide a comprehensive framework for waste classification that can be used for automatic waste detection and sorting.

<img src="https://raw.githubusercontent.com/majsylw/detect-waste-workshop/main/imgs/dw-cat.jpg" alt="DW-cat"/>

In [None]:
# Convert TACO labels to detect-waste labels based on Polish recycling standards

# Step 1. Manually assign categories from TACO to detectwaste

def taco_to_detectwaste(label):
    glass = ["Glass bottle", "Broken glass", "Glass jar"]
    metals_and_plastics = ["Aluminium foil", "Clear plastic bottle","Other plastic bottle",
                           "Plastic bottle cap","Metal bottle cap","Aerosol","Drink can",
                           "Food can","Drink carton","Disposable plastic cup","Other plastic cup",
                           "Plastic lid","Metal lid","Single-use carrier bag","Polypropylene bag",
                           "Plastic Film","Six pack rings","Spread tub","Tupperware",
                           "Disposable food container","Other plastic container",
                           "Plastic glooves","Plastic utensils","Pop tab","Scrap metal",
                           "Plastic straw","Other plastic", "Plastic film", "Food Can"]

    non_recyclable = ["Aluminium blister pack","Carded blister pack",
                      "Meal carton","Pizza box","Cigarette","Paper cup",
                      "Foam cup","Glass cup","Wrapping paper",
                      "Magazine paper","Garbage bag","Plastified paper bag",
                      "Crisp packet","Other plastic wrapper","Foam food container",
                      "Rope","Shoe","Squeezable tube","Paper straw","Styrofoam piece",
                      "Rope & strings", "Tissues"]

    other = ["Battery"]
    paper = ["Corrugated carton","Egg carton","Toilet tube","Other carton", "Normal paper", "Paper bag"]
    bio = ["Food waste"]
    unknown = ["Unlabeled litter"]

    if label in glass:
        label = "glass"
    elif label in metals_and_plastics:
        label = "metals_and_plastics"
    elif label in non_recyclable:
        label = "non-recyclable"
    elif label in other:
        label = "other"
    elif label in paper:
        label = "paper"
    elif label in bio:
        label = "bio"
    elif label in unknown:
        label = "unknown"
    else:
        # Names outside the lists above fall back to "unknown"
        print(label, "is non-taco label")
        label = "unknown"
    return label
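The `if`/`elif` chain above works, but the same mapping can also be expressed as a dictionary built once from the group lists, which gives constant-time lookups and keeps the fallback to `"unknown"` in one place. A minimal sketch using a shortened, illustrative subset of the lists from `taco_to_detectwaste`; `DETECTWASTE_GROUPS`, `build_label_map` and `to_detectwaste` are hypothetical names:

```python
# Shortened, illustrative subset of the TACO -> detect-waste groups defined above
DETECTWASTE_GROUPS = {
    "glass": ["Glass bottle", "Broken glass", "Glass jar"],
    "metals_and_plastics": ["Clear plastic bottle", "Drink can", "Plastic straw"],
    "non-recyclable": ["Cigarette", "Pizza box", "Tissues"],
    "other": ["Battery"],
    "paper": ["Corrugated carton", "Egg carton", "Normal paper"],
    "bio": ["Food waste"],
    "unknown": ["Unlabeled litter"],
}


def build_label_map(groups):
    """Invert {detect-waste label: [TACO names]} into {TACO name: detect-waste label}."""
    label_map = {}
    for dw_label, taco_names in groups.items():
        for name in taco_names:
            # Refuse names assigned to two different groups instead of silently picking one
            if name in label_map and label_map[name] != dw_label:
                raise ValueError(f"{name!r} is assigned to both {label_map[name]!r} and {dw_label!r}")
            label_map[name] = dw_label
    return label_map


LABEL_MAP = build_label_map(DETECTWASTE_GROUPS)


def to_detectwaste(label):
    # Fall back to "unknown" for names that are not in any group
    return LABEL_MAP.get(label, "unknown")


print(to_detectwaste("Drink can"))    # metals_and_plastics
print(to_detectwaste("Banana peel"))  # unknown
```

A side benefit of building the dictionary explicitly is that a name listed in two groups raises an error, whereas an `if`/`elif` chain silently resolves it by the order of the branches.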

In [None]:
# Step 2. Use function to convert annotations to desired form

# Convert every TACO category that occurs in the annotations to a detect-waste supercategory.
# deepcopy keeps dataset['categories'] (and `categories`) unchanged; without it they would be overwritten in place.
import copy
detectwaste_categories = copy.deepcopy(dataset['categories'])
for ann in anns:
    cat_id = ann['category_id']
    cat_taco = categories[cat_id]['name']
    detectwaste_categories[cat_id]['supercategory'] = taco_to_detectwaste(cat_taco)

In [None]:
# "Plastified paper bag" has no annotations in the data, so the loop above never reaches it;
# set its supercategory manually (looked up by name instead of a hard-coded index).
for cat in detectwaste_categories:
    if cat['name'] == 'Plastified paper bag':
        print(cat)
        cat['supercategory'] = taco_to_detectwaste(cat['name'])
        print(cat)

## Detect-waste evaluation
It is extremely important to get to know what your dataset looks like. This can be helpful during the evaluation of the system: misclassifications or other errors are sometimes caused by the imbalance of the dataset or by erroneous annotations. The first thing we did was to extract as much information from the dataset as we could. Here you can see a few diagrams representing vital statistics.

To prevent the negative effects of the data imbalance in our dataset, we first have to know how many annotations we have in each category. As you can see, most of the trash found in our dataset is metals and plastics. Unfortunately, the second most represented category is unknown: litter that has probably decomposed so much that it is hard to classify. This makes our dataset highly imbalanced, and it will require special attention in the future.

In [None]:
# Generate new ids for plotting histograms

detectwaste_ids = {}
detectwaste_cat_names = []
cat_id = 0
for cat in detectwaste_categories:
    if cat['supercategory'] not in detectwaste_ids:
        detectwaste_cat_names.append(cat['supercategory'])
        detectwaste_ids[cat['supercategory']] = cat_id
        cat_id += 1

print(detectwaste_ids)
print(detectwaste_cat_names)

taco_to_detectwaste_ids = {}
for i, cat in enumerate(detectwaste_categories):
    taco_to_detectwaste_ids[cat['id']] = detectwaste_ids[cat['supercategory']]

# print(taco_to_detectwaste_ids)

colors_recykling = ['yellow', 'gray', 'gray', 'green', 'blue', 'brown', 'pink']

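# Build new annotation dicts (without segmentation) so that the original `anns` stay intact;
# a shallow anns.copy() would share the same dicts and overwrite their category ids in place.
anns_detectwaste = []
for ann in anns:
    ann_dw = {key: value for key, value in ann.items() if key != 'segmentation'}
    ann_dw['category_id'] = taco_to_detectwaste_ids[ann['category_id']]
    anns_detectwaste.append(ann_dw)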

In [None]:
# Count annotations
detectwaste_cat_histogram = np.zeros(len(detectwaste_cat_names),dtype=int)

for ann in anns_detectwaste:
    cat_id = ann['category_id']
    detectwaste_cat_histogram[cat_id] +=1

# Initialize the matplotlib figure
f, ax = plt.subplots(figsize=(5,10))

# Convert to DataFrame
d ={'Super categories': detectwaste_cat_names, 'Number of annotations': detectwaste_cat_histogram}
df = pd.DataFrame(d)
df = df.sort_values(by='Number of annotations', ascending=False)

sns.set_palette(sns.color_palette(colors_recykling))
plot_1 = sns.barplot(x="Number of annotations", y="Super categories", data=df, label="Total")
plot_1.set_title('Annotations per detectwaste category',fontsize=20)

## Visualize Annotated Images

The script below shows how to filter images by either category or supercategory.

Go ahead and try different (super)category searches by editing the `categories_to_show` list. Note that small objects may be hard to see.
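For reference, the "category name first, then supercategory" lookup that the visualization loop in this section performs with `coco.getCatIds` and `coco.getImgIds` can also be sketched in plain Python over the parsed JSON. This is an illustrative equivalent of that logic, not the COCO API itself; `find_image_ids` is a hypothetical helper and the toy data is made up.

```python
def find_image_ids(dataset, name):
    """Ids of images containing `name`, matched first as a category name, then as a supercategory."""
    cat_ids = {c["id"] for c in dataset["categories"] if c["name"] == name}
    if not cat_ids:
        # No category with that name: treat it as a supercategory instead
        cat_ids = {c["id"] for c in dataset["categories"] if c["supercategory"] == name}
    # A set removes duplicates when an image has several matching annotations
    return sorted({a["image_id"] for a in dataset["annotations"] if a["category_id"] in cat_ids})


toy = {
    "categories": [
        {"id": 0, "name": "Glass bottle", "supercategory": "Bottle"},
        {"id": 1, "name": "Clear plastic bottle", "supercategory": "Bottle"},
        {"id": 2, "name": "Shoe", "supercategory": "Shoe"},
    ],
    "annotations": [
        {"id": 0, "image_id": 10, "category_id": 0},
        {"id": 1, "image_id": 10, "category_id": 1},
        {"id": 2, "image_id": 11, "category_id": 1},
        {"id": 3, "image_id": 12, "category_id": 2},
    ],
}
print(find_image_ids(toy, "Bottle"))        # [10, 11]
print(find_image_ids(toy, "Glass bottle"))  # [10]
```

Because a category name takes precedence, a name like "Shoe" that is both a category and a supercategory is resolved as the category.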

In [None]:
def extract_detectwaste_color(ann, taco_to_detectwaste_ids, colors_recykling):
    color_id = taco_to_detectwaste_ids[ann['category_id']]
    color = colors_recykling[color_id]
    return color

In [None]:
categories_to_show = ['Bottle', 'Shoe', 'Food waste']
nr_img_2_display = 1
pylab.rcParams['figure.figsize'] = (14,14)

# Obtain Exif orientation tag code (needed only once)
for orientation in ExifTags.TAGS.keys():
    if ExifTags.TAGS[orientation] == 'Orientation':
        break

# Load dataset as a COCO object once, outside the loop
coco = COCO(anns_file_path)

for category_name in categories_to_show:  # names of categories or super categories to display

    # Get image ids
    imgIds = []
    catIds = coco.getCatIds(catNms=[category_name])
    if catIds:
        # Get all images containing an instance of the chosen category
        imgIds = coco.getImgIds(catIds=catIds)
    else:
        # Get all images containing an instance of the chosen super category
        catIds = coco.getCatIds(supNms=[category_name])
        for catId in catIds:
            imgIds += (coco.getImgIds(catIds=catId))
        imgIds = list(set(imgIds))

    nr_images_found = len(imgIds)
    print('Number of images found: ',nr_images_found)

    # Select N random images
    random.shuffle(imgIds)
    imgs = coco.loadImgs(imgIds[0:min(nr_img_2_display,nr_images_found)])

    for img in imgs:
        image_path = TACO_DATA_FOLDER + img['file_name']
        # Load image
        I = Image.open(image_path)

        # Load and process image metadata
        if I._getexif():
            exif = dict(I._getexif().items())
            # Rotate portrait and upside down images if necessary
            if orientation in exif:
                if exif[orientation] == 3:
                    I = I.rotate(180,expand=True)
                if exif[orientation] == 6:
                    I = I.rotate(270,expand=True)
                if exif[orientation] == 8:
                    I = I.rotate(90,expand=True)

        # Show image
        fig,ax = plt.subplots(1)
        plt.axis('off')
        plt.imshow(I)

        # Load mask ids
        annIds = coco.getAnnIds(imgIds=img['id'], catIds=catIds, iscrowd=None)
        anns_sel = coco.loadAnns(annIds)

        # Show annotations
        for ann in anns_sel:
            color = extract_detectwaste_color(ann, taco_to_detectwaste_ids, colors_recykling)
            for seg in ann['segmentation']:
                poly = Polygon(np.array(seg).reshape((int(len(seg)/2), 2)))
                p = PatchCollection([poly], facecolor=color, edgecolors=color,linewidths=0, alpha=0.4)
                ax.add_collection(p)
                p = PatchCollection([poly], facecolor='none', edgecolors=color, linewidths=2)
                ax.add_collection(p)
            [x, y, w, h] = ann['bbox']
            rect = Rectangle((x,y),w,h,linewidth=2,edgecolor=color,
                             facecolor='none', alpha=0.7, linestyle = '--')
            ax.add_patch(rect)

        plt.show()

## Detect-waste statistics

Information about the number of objects per image might be helpful during the detection process: we need to know whether most images contain a single object or several.

In [None]:
# Count annotations per image in a single pass over the annotations
anns_per_image = Counter(ann['image_id'] for ann in anns_detectwaste)
# Counter returns 0 for images without annotations
nr_annotation_per_image = [anns_per_image[img['id']] for img in dataset['images']]

plt.figure(figsize=(20,7))
# sns.distplot is deprecated; histplot is its replacement (kde is off by default)
ax = sns.histplot(nr_annotation_per_image, bins=100, color='g')
ax.set_yscale('log')
ax.set(xlabel='Number of annotations per image', ylabel='Image Count')

In [None]:
def bbox_counts_per_image(anns):
    """Number of bounding boxes on each image that has at least one annotation in `anns`."""
    return list(Counter(ann['image_id'] for ann in anns).values())

def no_bbox_per_image(anns):
    return np.mean(bbox_counts_per_image(anns))

def no_bbox_summary(anns):
    counts = bbox_counts_per_image(anns)
    print('Maximum bboxes: ', np.max(counts))
    print('Mean number of bboxes: ', np.mean(counts))
    print('Median number of bboxes: ', np.median(counts))


print('all:', no_bbox_per_image(anns_detectwaste))
for cat_nr, cat in enumerate(detectwaste_cat_names):
    temp_anns = [ann for ann in anns_detectwaste if ann['category_id'] == cat_nr]
    if not temp_anns:
        continue  # skip categories without any annotations
    print('\n', cat)
    no_bbox_summary(temp_anns)

Some ML architectures require a fixed input size, so it is worth knowing the size of our images in order to resize them properly. The distribution of resolutions is also a good indicator of general data quality.
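A common approach is to rescale each image so that its longer side reaches a fixed size while the aspect ratio stays the same. Bounding boxes then scale by the same factor. The bbox-size cell further down uses this idea with `max_image_dim = 1024`. The minimal sketch below assumes that 1024 px target, and the 3024 x 4032 photo is a hypothetical example.

```python
def resize_scale(width, height, max_dim=1024):
    """Factor that makes the longer image side equal to max_dim (aspect ratio is kept)."""
    return max_dim / max(width, height)

def scale_bbox(bbox, scale):
    """Apply the same scale factor to every coordinate of a COCO bbox [x, y, w, h]."""
    return [v * scale for v in bbox]

# Hypothetical portrait photo of 3024 x 4032 px with one annotated object
s = resize_scale(3024, 4032)
new_size = (round(3024 * s), round(4032 * s))
new_bbox = scale_bbox([1008, 2016, 504, 1008], s)
print(new_size)   # (768, 1024)
print(new_bbox)
```

Because a single factor is used for both axes, objects keep their shape. The trade-off is that portrait and landscape images end up with different sizes unless they are padded afterwards.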

In [None]:
# Parsing image shapes (resolutions)
widths = []
heights = []
shape_freqs = []
img_shapes_keys = {}
for img in dataset['images']:
    key = str(img['width'])+'-'+str(img['height'])
    if key in img_shapes_keys:
        shape_id = img_shapes_keys[key]
        shape_freqs[shape_id] += 1
    else:
        img_shapes_keys[key] = len(widths)
        widths.append(img['width'])
        heights.append(img['height'])
        shape_freqs.append(1)

d ={'Image width (px)': widths, 'Image height (px)': heights, '# images': shape_freqs}
df = pd.DataFrame(d)
cmap = sns.cubehelix_palette(dark=.1, light=.6, as_cmap=True)
plot = sns.scatterplot(x="Image width (px)", y="Image height (px)", size='# images', hue="# images", palette = cmap,data=df)
plt.xlabel('Image width (px)', fontsize=15)
plt.ylabel('Image height (px)', fontsize=15)
plot = plot.set_title('Number of images per image shape',fontsize=15)

Knowing where most of the objects occur in the image is useful information that can help us prevent some later errors. For instance, if our objects appear only in the center of an image, we should consider data augmentation methods that teach the detector to recognize objects anywhere in the image.

Interestingly, in our case most of the objects are in the center. This may be because most images contain only a single object, which is typically placed in the center. When there are many objects, they are usually scattered across the whole image.
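The claim "most objects are in the center" can be turned into a number. Normalise each bbox centre by the image size and count how many centres fall into the middle cell of a coarse grid. The sketch below is an illustration on hypothetical boxes. The 3 x 3 grid is an arbitrary choice, and rows follow image coordinates, so row 0 is the top of the image.

```python
import numpy as np

def normalised_centres(bboxes, img_sizes):
    """Bbox centres divided by image width/height, so values lie in [0, 1]."""
    b = np.asarray(bboxes, dtype=float)        # rows: [x, y, w, h]
    wh = np.asarray(img_sizes, dtype=float)    # rows: [img_width, img_height]
    cx = (b[:, 0] + b[:, 2] / 2) / wh[:, 0]
    cy = (b[:, 1] + b[:, 3] / 2) / wh[:, 1]
    return cx, cy

def centre_grid(cx, cy, n=3):
    """Count centres per cell of an n x n grid (row index = vertical position)."""
    counts, _, _ = np.histogram2d(cy, cx, bins=n, range=[[0, 1], [0, 1]])
    return counts.astype(int)

# Hypothetical boxes, all inside 100 x 100 px images
cx, cy = normalised_centres(
    [[40, 40, 20, 20], [45, 35, 10, 30], [0, 0, 10, 10]],
    [[100, 100]] * 3,
)
grid = centre_grid(cx, cy)
print(grid)
print('share in central cell:', grid[1, 1] / grid.sum())
```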

In [None]:
# Map image id -> image record so each annotation finds its image in O(1)
images_by_id = {img['id']: img for img in dataset['images']}

center_x = []
center_y = []
for ann in anns_detectwaste:
    img = images_by_id.get(ann['image_id'])
    if img is None:
        continue
    x, y, w, h = ann['bbox']
    # Bbox centre relative to the image size (0..1 on both axes)
    center_x.append((x + w / 2) / img['width'])
    center_y.append((y + h / 2) / img['height'])

plt.figure(figsize=(30,15))
plt.plot(center_x, center_y, 'bo')
# Image y coordinates grow downwards, so flip the axis to match the picture
plt.gca().invert_yaxis()
plt.title('Placement of central point of the bbox in the image', fontsize=30)
plt.xlabel('Relative bbox centre x', fontsize=30)
plt.ylabel('Relative bbox centre y', fontsize=30)
plt.show()

The size of the bounding boxes also matters: we need to know whether we will deal mostly with small or with larger objects. Even state-of-the-art detectors are known to perform noticeably worse on small objects than on large ones.

As our images vary in size, bounding-box size should also be considered relative to the image. It is worth knowing how many annotations fall into each bounding-box size range in our dataset.
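The cell below reports counts using the MS COCO size ranges: small is below 32² px², medium runs from 32² to 96² px², and large is 96² px² and above. A minimal sketch of that bucketing follows. It assumes bbox area (width * height) is used as the object area, and putting values exactly on a boundary into the larger bucket is our own choice.

```python
import numpy as np

def coco_size_buckets(areas):
    """Count objects per MS COCO size range, given areas in px^2."""
    a = np.asarray(areas, dtype=float)
    small = int(np.sum(a < 32 ** 2))
    medium = int(np.sum((a >= 32 ** 2) & (a < 96 ** 2)))
    large = int(np.sum(a >= 96 ** 2))
    return {'small': small, 'medium': medium, 'large': large}

# Hypothetical bbox areas (w * h) in pixels
buckets = coco_size_buckets([10 * 10, 31 * 31, 32 * 32, 50 * 80, 96 * 96, 300 * 200])
print(buckets)  # {'small': 2, 'medium': 2, 'large': 2}
```

Note that the three buckets are disjoint, so their counts add up to the total number of objects.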

In [None]:
bbox_widths = []
bbox_heights = []
obj_areas_sqrt = []
obj_areas_sqrt_fraction = []
bbox_aspect_ratio = []
max_image_dim = 1024

# Look up the image each annotation belongs to
images_by_id = {img['id']: img for img in dataset['images']}

for ann in anns_detectwaste:
    img = images_by_id[ann['image_id']]

    # Scale so that the longer side of this image becomes max_image_dim pixels
    resize_scale = max_image_dim / max(img['width'], img['height'])
    # Uncomment this to work on original image size
    # resize_scale = 1

    bbox_widths.append(ann['bbox'][2]*resize_scale)
    bbox_heights.append(ann['bbox'][3]*resize_scale)
    obj_area = ann['bbox'][2]*ann['bbox'][3]*resize_scale**2 # alternatively ann['area']
    obj_areas_sqrt.append(np.sqrt(obj_area))

    img_area = img['width']*img['height']*resize_scale**2
    obj_areas_sqrt_fraction.append(np.sqrt(obj_area/img_area))

# MS COCO size ranges: small < 32^2, 32^2 <= medium < 96^2, large >= 96^2 (after resizing)
obj_areas_sqrt_arr = np.array(obj_areas_sqrt)
print('According to the MS COCO evaluation size ranges, this dataset has:')
print(np.sum(obj_areas_sqrt_arr < 32), 'small objects (area < 32*32 px)')
print(np.sum((obj_areas_sqrt_arr >= 32) & (obj_areas_sqrt_arr < 96)), 'medium objects (32*32 <= area < 96*96 px)')
print(np.sum(obj_areas_sqrt_arr >= 96), 'large objects (area >= 96*96 px)')

# d ={'Bbox width (px)': bbox_widths, 'Bbox height (px)': bbox_heights, 'area': seg_areas}
# df = pd.DataFrame(d)

plt.figure(figsize=(30,15))
ax = sns.histplot(obj_areas_sqrt_fraction, bins=200, color='g')
ax.set_yscale('log')
plt.title('Number of annotations per relative bbox size', fontsize=30)
plt.xlabel(r'Annotation relative size as $\sqrt{ Bbox\_area \ /  \ Image\_area}$', fontsize=30)
plt.ylabel('Number of annotations', fontsize=30)

plt.figure(figsize=(20,7))
ax = sns.histplot(np.maximum(np.array(bbox_widths), np.array(bbox_heights)), bins=200, color='g')
ax.set(xlabel='Maximum bbox dimension (px)', ylabel='Number of annotations')

fig, ax = plt.subplots(1, 1, figsize=(5,5))

# Plotting bbox dims (after resizing)
d = {'BBox width (px)': bbox_widths, 'BBox height (px)': bbox_heights}
df = pd.DataFrame(d)
sns.scatterplot(x="BBox width (px)", y="BBox height (px)", data=df, ax=ax)

print('Number of bboxes smaller than 1024:',np.sum(np.array(bbox_widths)<1024))
print('Number of bboxes larger than 1024:',np.sum(np.array(bbox_widths)>1024))

# anchors = [(32,32),(64,64),(128,128),(256,256),(512,512)]
scales, ratios = np.meshgrid(np.array([16,32,64,128,256,512]), np.array([0.5,1,2]))
scales = scales.flatten()
ratios = ratios.flatten()
# Enumerate heights and widths from scales and ratios
anchor_heights = scales / np.sqrt(ratios)
anchor_widths = scales * np.sqrt(ratios)

# Best IoU of each bbox with any anchor, assuming anchor and bbox share the same centre
# (so the intersection is min(widths) * min(heights)); 0.0 means no anchor with IoU > 0.5
IoUs = []
for i in range(len(bbox_widths)):
    bbox_area = bbox_widths[i]*bbox_heights[i]
    IoU_max = 0.0
    for j in range(len(anchor_heights)):
        anchor_area = anchor_heights[j]*anchor_widths[j]
        intersection_area = min(anchor_widths[j],bbox_widths[i])*min(anchor_heights[j], bbox_heights[i])
        IoU = intersection_area / (bbox_area + anchor_area - intersection_area)
        if IoU > 0.5 and IoU > IoU_max:
            IoU_max = IoU
    IoUs.append(IoU_max)

print('Number of annotations not matched by any anchor (IoU > 0.5):', np.sum(np.array(IoUs)==0.0))

# Plotting bbox dims coloured by best anchor IoU (in a new figure, so it does not cover the previous scatter plot)
plt.figure(figsize=(8,8))
d = {'BBox width (px)': bbox_widths, 'BBox height (px)': bbox_heights, 'IoU': IoUs}
df = pd.DataFrame(d)
ax = sns.scatterplot(x="BBox width (px)", y="BBox height (px)", hue='IoU', data=df)
plt.title('Bounding-box sizes coloured by best anchor IoU', fontsize=15)
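The anchor check above compares each box with anchors placed at the same centre. Because of this, the intersection is simply the smaller width times the smaller height. Below is a vectorised NumPy sketch of a best-IoU check under that same-centre assumption. It takes the maximum over all anchors and treats anything at or below 0.5 as unmatched. The scale/ratio grid copies the values used above, with ratio meaning width / height.

```python
import numpy as np

def centred_anchor_iou(box_w, box_h, anchor_w, anchor_h):
    """IoU between every box and every anchor, assuming both share the same centre.

    Returns an array of shape (num_boxes, num_anchors).
    """
    bw = np.asarray(box_w, dtype=float)[:, None]
    bh = np.asarray(box_h, dtype=float)[:, None]
    aw = np.asarray(anchor_w, dtype=float)[None, :]
    ah = np.asarray(anchor_h, dtype=float)[None, :]
    inter = np.minimum(bw, aw) * np.minimum(bh, ah)
    union = bw * bh + aw * ah - inter
    return inter / union

def best_anchor_iou(box_w, box_h, anchor_w, anchor_h, threshold=0.5):
    """Best IoU per box, or 0.0 if no anchor exceeds the threshold."""
    best = centred_anchor_iou(box_w, box_h, anchor_w, anchor_h).max(axis=1)
    return np.where(best > threshold, best, 0.0)

# Anchor grid: scales x aspect ratios (ratio = width / height)
scales, ratios = np.meshgrid([16, 32, 64, 128, 256, 512], [0.5, 1, 2])
anchor_w = (scales * np.sqrt(ratios)).ravel()
anchor_h = (scales / np.sqrt(ratios)).ravel()

# Hypothetical boxes: a 64x64 square, a tiny 5x5 box and a flat 64x16 box
ious = best_anchor_iou([64, 5, 64], [64, 5, 16], anchor_w, anchor_h)
print(ious)
```

The tiny 5 x 5 box gets 0.0 because even the smallest 16 px anchor overlaps it by less than 10%. This is the kind of small object that such an anchor set would miss.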

In [None]:
def bbox_stats(anns, calc='mean'):
    """Mean or median bbox area (bbox width * height, in px^2)."""
    bbox_size = [ann['bbox'][2] * ann['bbox'][3] for ann in anns]
    if calc == 'mean':
        return np.mean(bbox_size)
    if calc == 'median':
        return np.median(bbox_size)
    raise ValueError(f"calc must be 'mean' or 'median', got {calc!r}")

def area_stats(anns, calc='mean'):
    """Mean or median of the annotation 'area' field (object area in px^2)."""
    # 'area' already holds an area, so it is used directly
    area_size = [ann['area'] for ann in anns]
    if calc == 'mean':
        return np.mean(area_size)
    if calc == 'median':
        return np.median(area_size)
    raise ValueError(f"calc must be 'mean' or 'median', got {calc!r}")

In [None]:
pylab.rcParams['figure.figsize'] = (12,6)

mean_bbox = []
median_bbox = []
for cat_nr, cat in enumerate(detectwaste_cat_names):
    temp_anns = [ann for ann in anns_detectwaste if(ann['category_id'] == cat_nr)]
    mean_bbox.append(bbox_stats(temp_anns,))
    median_bbox.append(bbox_stats(temp_anns,'median'))
mean_bbox[-1] =0
median_bbox[-1] =0

# append stats for all for comparison
temp_detectwaste_cat_names = detectwaste_cat_names.copy()
temp_detectwaste_cat_names.append('all')
mean_bbox.append(bbox_stats(anns_detectwaste))
median_bbox.append(bbox_stats(anns_detectwaste,'median'))

colors = colors_recykling.copy()
colors.append('red')  # colour of the 'all' bar
plt.bar(temp_detectwaste_cat_names, mean_bbox, color=colors)
plt.title('Mean bbox area (px^2)')
plt.show()

plt.bar(temp_detectwaste_cat_names, median_bbox, color=colors)
plt.title('Median bbox area (px^2)')
plt.show()

In [None]:
mean_area = []
median_area = []
for cat_nr, cat in enumerate(detectwaste_cat_names):
    temp_anns = [ann for ann in anns_detectwaste if(ann['category_id'] == cat_nr)]
    mean_area.append(area_stats(temp_anns,))
    median_area.append(area_stats(temp_anns,'median'))

mean_area[-1] = 0
median_area[-1] = 0
mean_area.append(area_stats(anns_detectwaste))
median_area.append(area_stats(anns_detectwaste,'median'))
print(temp_detectwaste_cat_names)

plt.bar(temp_detectwaste_cat_names, mean_area, color=colors)
plt.title('Mean annotation area (px^2)')
plt.show()

plt.bar(temp_detectwaste_cat_names, median_area, color=colors)
plt.title('Median annotation area (px^2)')
plt.show()

# Takeaways and Insights

- The images are very large:
  - we should downsize the images prior to model training.
- The aspect ratio of most images suggests that they were taken with phone cameras in portrait orientation. This could bias inference if the target (edge) devices capture images differently.
- The dataset has a few overrepresented classes (cigarettes, plastic film, unlabeled litter) and many categories with fewer than 20 annotations:
  - we will have to try data augmentation methods or weighted loss functions to compensate for the imbalance,
  - we have to take the imbalance into account when preparing dataset splits.
- The data contains a lot of small objects:
  - we will have to try architectures that are better at dealing with small objects.
- Many annotated objects fall into the “unknown” category:
  - we may try to crop the images to their bounding boxes and train a classifier. Perhaps we will then be able to assign those unknown objects to approximate categories.
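One common way to implement the weighted-loss idea from the list above is to weight each class by the inverse of its annotation count. This is only a hedged sketch. Normalising the weights to a mean of 1 and giving absent classes a weight of 0 are our own choices. The resulting list would still have to be passed to whatever weighted loss the chosen training framework provides.

```python
from collections import Counter

def inverse_frequency_weights(category_ids, num_classes):
    """Per-class weights proportional to 1 / class frequency, normalised to mean 1.

    Classes without annotations get weight 0 (they cannot contribute to the loss).
    """
    counts = Counter(category_ids)
    raw = [1.0 / counts[c] if counts[c] > 0 else 0.0 for c in range(num_classes)]
    present = [w for w in raw if w > 0]
    # Rescale so the weights of the classes that occur average to 1
    scale = len(present) / sum(present) if present else 0.0
    return [w * scale for w in raw]

# Hypothetical, heavily imbalanced label list: class 0 dominates, class 3 is absent
labels = [0] * 90 + [1] * 9 + [2] * 1
weights = inverse_frequency_weights(labels, num_classes=4)
print([round(w, 3) for w in weights])
```

Pure inverse frequency can give very rare classes extreme weights. Square-root or capped variants are gentler alternatives.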

After the above analysis, we are ready to start preparing the data for training :)

Can you think about more examinations which may be done here?

# Additional exercise - splitting and cropping images

In [None]:
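# Function for cropping images

def crop(annotation_obj, fname, category_name, square, zoom, src_img, dst_img, i):
    """Crop one annotated object and save it to dst_img/<category_name>/<i>_<annotation id>.jpg."""
    # A separator keeps names unique (e.g. i=1, id=23 vs i=12, id=3)
    annotation_id = f"{i}_{annotation_obj['id']}"
    file_name = os.path.join(src_img, fname)

    img = cv2.imread(file_name)
    if img is None:  # cv2.imread returns None instead of raising on unreadable files
        print(f"ERROR: cannot read {file_name}")
        return
    img_h, img_w = img.shape[:2]

    # prepare for cropping - using the bbox width, height and centre
    x, y, width, height = annotation_obj['bbox']
    cx, cy = x + width / 2, y + height / 2
    if square:
        # pad the shorter side so the crop is square and stays centred on the bbox
        width = height = max(width, height)
    # zoom in (<1) or out (>1) around the bbox centre
    width *= zoom
    height *= zoom
    # clamp to the image borders; crops touching a border may end up non-square
    x0, y0 = max(0, int(cx - width / 2)), max(0, int(cy - height / 2))
    x1, y1 = min(img_w, int(cx + width / 2)), min(img_h, int(cy + height / 2))
    crop_img = img[y0:y1, x0:x1]
    if crop_img.size == 0:
        print(f"ERROR: empty crop in {file_name}")
        return

    out_path = os.path.join(dst_img, category_name, annotation_id + '.jpg')
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    if not cv2.imwrite(out_path, crop_img):
        print(f"ERROR: cannot write {out_path}")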
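The geometry of a square, zoomed crop can be checked without any image files. The hypothetical helper below is not part of the workshop code. It pads the bbox to a square centred on the object, scales it by `zoom` around that centre and clamps it to the image, returning the pixel limits as plain numbers that are easy to test.

```python
def square_zoom_box(bbox, img_w, img_h, square=True, zoom=1.0):
    """Return integer crop limits (x0, y0, x1, y1) for a COCO bbox [x, y, w, h].

    The box is padded to a square centred on the bbox (if square=True),
    scaled by `zoom` around its centre, then clamped to the image borders.
    """
    x, y, w, h = bbox
    cx, cy = x + w / 2, y + h / 2
    if square:
        w = h = max(w, h)          # pad the shorter side
    w, h = w * zoom, h * zoom      # zoom around the centre
    x0, y0 = max(0, int(cx - w / 2)), max(0, int(cy - h / 2))
    x1, y1 = min(img_w, int(cx + w / 2)), min(img_h, int(cy + h / 2))
    return x0, y0, x1, y1

# Hypothetical 100 x 40 px object in a 640 x 480 image
print(square_zoom_box([200, 100, 100, 40], 640, 480))             # (200, 70, 300, 170)
print(square_zoom_box([200, 100, 100, 40], 640, 480, zoom=1.5))   # (175, 45, 325, 195)
# Object in the top-left corner: clamping makes the crop non-square
print(square_zoom_box([0, 0, 50, 20], 640, 480))                  # (0, 0, 50, 35)
```

The last case shows that objects touching the image border cannot be padded symmetrically. If a classifier needs strictly square inputs, those crops would require extra padding or resizing.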

In [None]:
# path to source directory with images
src_img = TACO_DATA_FOLDER
# path to destination directory for the cropped images
dst_img = HOME_FOLDER + 'images_square/'
# pad the bounding boxes to a square shape before cropping
square = True
# zoom out (>1) or in (<1) around the bounding box: extra context can help a
# classifier used together with a detector whose boxes are not tight,
# but it can lower the scores when images are crowded with many objects
zoom = 1

os.makedirs(dst_img, exist_ok=True)

waste_list = dataset['categories']

print(waste_list)
# map category id -> supercategory name and create one output folder per supercategory
mapping_category = {}
for item in waste_list:
    mapping_category[item['id']] = item['supercategory']
    os.makedirs(os.path.join(dst_img, item['supercategory']), exist_ok=True)

# build a dictionary mapping the image id to the file name
images = {img_obj['id']: img_obj['file_name'] for img_obj in dataset['images']}

for i, annotation_obj in enumerate(tqdm(dataset['annotations'])):
    category_name = mapping_category[annotation_obj['category_id']]
    image_id = int(annotation_obj['image_id'])
    crop(annotation_obj, images[image_id], category_name, square, zoom,
         src_img, dst_img, i)

#dataset['annotations']