# Deep Learning Homework: Waste classification
Authors: Gergály Anna, Mészáros Péter

## Downloading the datasets

Insert your Kaggle API credentials below to download the datasets with the Kaggle API.
The first block sets the environment variables that the Kaggle API reads. More info about creating a Kaggle API token can be found here: https://www.kaggle.com/docs/api.
The second block downloads two datasets from Kaggle.
The third and fourth blocks download a third dataset from GitHub as a zip file and extract it.

In [None]:
import os
os.environ['KAGGLE_USERNAME'] = ''  # insert your Kaggle username here
os.environ['KAGGLE_KEY'] = ''  # insert your Kaggle API key here
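
Both variables start out as empty strings, so it can help to confirm they were filled in before authenticating. A minimal sketch, assuming a hypothetical helper `missing_kaggle_credentials` that is not part of the Kaggle package:

```python
import os

def missing_kaggle_credentials(env=None):
    """Return the names of the Kaggle variables that are unset or empty.

    Hypothetical helper for this notebook; it only inspects the environment.
    """
    env = os.environ if env is None else env
    required = ['KAGGLE_USERNAME', 'KAGGLE_KEY']
    return [name for name in required if not env.get(name, '').strip()]

missing = missing_kaggle_credentials()
if missing:
    print('Please fill in:', ', '.join(missing))
```

Running this before the download block points to the missing variable by name.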

In [None]:
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
api.dataset_download_files('asdasdasasdas/garbage-classification', path="./garbage1", quiet=False, unzip=True)
api.dataset_download_files('mostafaabla/garbage-classification', path="./garbage2", quiet=False, unzip=True)

In [None]:
import wget
wget.download("https://github.com/nikhilvenkatkumsetty/TrashBox/archive/refs/heads/main.zip", out="./garbage3.zip")

In [None]:
import zipfile
with zipfile.ZipFile("garbage3.zip", mode='r') as z:
    z.extractall("./garbage3")
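
Re-running the extraction block unpacks the archive again each time. The sketch below skips the work when the target folder already has content; `extract_once` is a hypothetical helper name, and "already has content" is only a rough stand-in for "was fully extracted".

```python
import os
import zipfile

def extract_once(zip_path, target_dir):
    """Extract zip_path into target_dir unless target_dir already has content.

    Returns True if an extraction happened, False if it was skipped.
    """
    if os.path.isdir(target_dir) and os.listdir(target_dir):
        return False
    with zipfile.ZipFile(zip_path, mode='r') as z:
        z.extractall(target_dir)
    return True

# Only runs when the archive from the previous block is present.
if os.path.exists('garbage3.zip'):
    extract_once('garbage3.zip', './garbage3')
```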

In [None]:
dataset1_directory='garbage1/Garbage classification/Garbage classification/'
dataset2_directory='garbage2/garbage_classification/'
dataset3_directory='garbage3/TrashBox-main/TrashBox_train_dataset_subfolders'
dataset_directories = [dataset1_directory, dataset2_directory, dataset3_directory]
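
Before changing any folders, it may be useful to see which classes each dataset contains and how many images each one holds. A small sketch, assuming every direct subfolder of a dataset directory is one class and counting files recursively; `count_images_per_class` and `IMAGE_EXTENSIONS` are hypothetical names introduced here.

```python
import os

# Extensions treated as images in this sketch (an assumption, adjust as needed).
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.bmp', '.gif')

def count_images_per_class(root):
    """Map each class folder under root to the number of image files below it."""
    counts = {}
    for entry in sorted(os.listdir(root)):
        class_dir = os.path.join(root, entry)
        if os.path.isdir(class_dir):
            counts[entry] = sum(
                name.lower().endswith(IMAGE_EXTENSIONS)
                for _, _, files in os.walk(class_dir)
                for name in files
            )
    return counts

# Example use in the notebook:
# for directory in dataset_directories:
#     print(directory, count_images_per_class(directory))
```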

Removing the classes from the second dataset that do not appear in the first dataset (battery, clothes, biological, shoes).
Merging the white-glass, brown-glass and green-glass classes into a single class named glass.

In [None]:
import shutil
import os

# Drop the classes of the second dataset that the first dataset does not have.
removable_classes = ['battery', 'clothes', 'biological', 'shoes']
for label in removable_classes:
    shutil.rmtree(os.path.join(dataset2_directory, label))

# Reuse the brown-glass folder as the merged glass folder, then move the other two into it.
glass_directory = os.path.join(dataset2_directory, 'glass')
os.rename(os.path.join(dataset2_directory, 'brown-glass'), glass_directory)
for glass_class in ['white-glass', 'green-glass']:
    glass_directory_name = os.path.join(dataset2_directory, glass_class)
    for filename in os.listdir(glass_directory_name):
        shutil.move(os.path.join(glass_directory_name, filename), os.path.join(glass_directory, filename))
    os.rmdir(glass_directory_name)
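
The merge above relies on file names being unique across the glass folders. As a more defensive variant, the sketch below prefixes a file with its source class name when the name is already taken in the target. `merge_class_folders` is a hypothetical helper, not the approach used in the block above.

```python
import os
import shutil

def merge_class_folders(root, source_classes, target_class):
    """Move every file from the source class folders into root/target_class.

    A file whose name already exists in the target is prefixed with its
    source class name. Returns the number of files moved.
    """
    target_dir = os.path.join(root, target_class)
    os.makedirs(target_dir, exist_ok=True)
    moved = 0
    for source_class in source_classes:
        source_dir = os.path.join(root, source_class)
        for filename in os.listdir(source_dir):
            destination = os.path.join(target_dir, filename)
            if os.path.exists(destination):
                destination = os.path.join(target_dir, f'{source_class}_{filename}')
            shutil.move(os.path.join(source_dir, filename), destination)
            moved += 1
        os.rmdir(source_dir)  # the source folder is empty at this point
    return moved

# Example use in the notebook:
# merge_class_folders(dataset2_directory, ['brown-glass', 'white-glass', 'green-glass'], 'glass')
```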

Adding empty folders for the extra classes (e-waste and medical) to the first and second datasets, and an empty trash folder to the third dataset, so that all three datasets contain the same set of class folders.

In [None]:
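# exist_ok=True lets this cell be re-run without raising FileExistsError.
os.makedirs(os.path.join(dataset1_directory, 'e-waste'), exist_ok=True)
os.makedirs(os.path.join(dataset1_directory, 'medical'), exist_ok=True)
os.makedirs(os.path.join(dataset2_directory, 'e-waste'), exist_ok=True)
os.makedirs(os.path.join(dataset2_directory, 'medical'), exist_ok=True)
os.makedirs(os.path.join(dataset3_directory, 'trash'), exist_ok=True)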
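
After these folder changes, each dataset directory should hold exactly the same eight class folders. The following sketch reports any difference; `compare_class_folders` is a hypothetical helper, and the expected class list is passed in by the caller.

```python
import os

def compare_class_folders(root, expected_classes):
    """Return (missing, unexpected) class folder names under root, both sorted."""
    found = {
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    }
    expected = set(expected_classes)
    return sorted(expected - found), sorted(found - expected)

# Example use in the notebook, with the eight class names used later on:
# for directory in dataset_directories:
#     print(directory, compare_class_folders(directory, class_names))
```

Two empty lists for every dataset mean the folder layouts agree.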

## Reading the datasets
Importing libraries and setting hyperparameter variables.
The datasets are split for training and validation in a 4:1 ratio.

In [None]:
import tensorflow.keras as keras
from tensorflow.keras.preprocessing import image as image_utils
import tensorflow as tf
import numpy as np

class_names=['glass', 'paper', 'cardboard', 'trash', 'metal', 'plastic', 'e-waste', 'medical']
image_size=(256, 256)
validation_split=0.2
seed=111
batch_size=32

In [None]:
train_ = []
val_ = []
# With labels='inferred', label indices follow the alphanumeric order of the class folder names.
for directory in dataset_directories:
    train_.append(keras.utils.image_dataset_from_directory(
        directory,
        labels='inferred',
        label_mode='categorical',
        batch_size=batch_size,
        image_size=image_size,
        validation_split=validation_split,
        seed=seed,
        subset='training'
    ))
    val_.append(keras.utils.image_dataset_from_directory(
        directory,
        labels='inferred',
        label_mode='categorical',
        batch_size=batch_size,
        image_size=image_size,
        validation_split=validation_split,
        seed=seed,
        subset='validation'
    ))
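
Since the labels are inferred from the folder names, the position of each class in the one-hot vector follows the alphanumeric order of those names rather than the order of the `class_names` list. The sketch below shows the resulting mapping, assuming the folder names match `class_names` exactly; `decode_one_hot` is a hypothetical helper.

```python
class_names = ['glass', 'paper', 'cardboard', 'trash', 'metal', 'plastic', 'e-waste', 'medical']

# Inferred labels follow the sorted folder names, not the order of the list above.
inferred_order = sorted(class_names)

def decode_one_hot(one_hot_label):
    """Return the class name for a one-hot (or probability) vector."""
    best_index = max(range(len(one_hot_label)), key=lambda i: one_hot_label[i])
    return inferred_order[best_index]

print(inferred_order)
print(decode_one_hot([0, 0, 1, 0, 0, 0, 0, 0]))  # glass
```

The same mapping can be used later to turn model predictions back into class names.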

Normalizing the images to the [0, 1] range and concatenating the datasets.

In [None]:
normalization_layer = tf.keras.layers.Rescaling(1./255)
for i in range(3):
    train_[i] = train_[i].map(lambda x, y: (normalization_layer(x), y))
    val_[i] = val_[i].map(lambda x, y: (normalization_layer(x), y))

In [None]:
train = train_[0].concatenate(train_[1]).concatenate(train_[2])  # training images and labels
val = val_[0].concatenate(val_[1]).concatenate(val_[2])  # validation images and labels
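
Each dataset was batched before concatenation, so the combined dataset yields the batches of the first dataset, then the second, then the third, and each of them may end in a partial batch. The sketch below counts the resulting batches for made-up dataset sizes; `batches_after_concatenation` is a hypothetical helper.

```python
import math

def batches_after_concatenation(dataset_sizes, batch_size=32):
    """Total number of batches when each dataset is batched separately."""
    return sum(math.ceil(size / batch_size) for size in dataset_sizes)

sizes = [100, 70, 33]  # made-up image counts, not the real dataset sizes
print(batches_after_concatenation(sizes))  # 4 + 3 + 2 = 9
```

If the source-by-source batch order turns out to matter for training, shuffling the concatenated dataset may be worth considering.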