# **Waste Material Segregation for Improving Waste Management**

## **Objective**

The objective of this project is to implement an effective waste material segregation system using convolutional neural networks (CNNs) that categorises waste into distinct groups. This process enhances recycling efficiency, minimises environmental pollution, and promotes sustainable waste management practices.

The key goals are:

* Accurately classify waste materials into categories like cardboard, glass, paper, and plastic.
* Improve waste segregation efficiency to support recycling and reduce landfill waste.
* Understand the properties of different waste materials to optimise sorting methods for sustainability.

## **Data Understanding**

The dataset consists of images of common waste materials in the following seven classes:

1. Food Waste
2. Metal
3. Paper
4. Plastic
5. Other
6. Cardboard
7. Glass


**Data Description**

* The dataset consists of multiple folders, each representing a specific class, such as `Cardboard`, `Food_Waste`, and `Metal`.
* Within each folder, there are images of objects that belong to that category.
* However, these items are not further subcategorised. <br> For instance, the `Food_Waste` folder may contain images of items like coffee grounds, teabags, and fruit peels, without explicitly stating that they are actually coffee grounds or teabags.
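
Because each class lives in its own folder, counting the files per folder gives an early view of the class balance. The sketch below assumes the folder layout described above; `count_images_per_class` and the extension filter are illustrative choices, not part of the original notebook.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the actual dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def count_images_per_class(root):
    """Return {class_folder_name: number_of_image_files} for each subfolder of root."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
    return counts
```

Once the archive has been extracted, `count_images_per_class('/content/data')` would return one entry per class folder.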

## **1. Load the data**

Load and unzip the dataset zip file.

**Import Necessary Libraries**

In [None]:
# Recommended versions:

# numpy version: 1.26.4
# pandas version: 2.2.2
# seaborn version: 0.13.2
# matplotlib version: 3.10.0
# PIL version: 11.1.0
# tensorflow version: 2.18.0
# keras version: 3.8.0
# sklearn version: 1.6.1

In [1]:
# Import essential libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from PIL import Image
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split
import os
from tensorflow.keras import datasets, layers, models

Load the dataset.

In [6]:
# Load and unzip the dataset
local_zip_path = '/content/data.zip'
!unzip {local_zip_path} -d /content/


Archive:  /content/data.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of /content/data.zip or
        /content/data.zip.zip, and cannot find /content/data.zip.ZIP, period.
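
The output above suggests that `/content/data.zip` is not a valid zip archive, for example because of an incomplete upload. As a minimal sketch, the archive can be checked with Python's standard `zipfile` module before extraction; `safe_extract` is a hypothetical helper name.

```python
import zipfile

def safe_extract(zip_path, dest_dir):
    """Extract zip_path into dest_dir; return False if it is not a readable zip."""
    if not zipfile.is_zipfile(zip_path):
        print(f"{zip_path} is missing or not a valid zip archive; upload it again.")
        return False
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    return True
```

Checking the return value before calling `load_images` avoids the later `FileNotFoundError` and makes the root cause visible.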


## **2. Data Preparation** <font color=red> [25 marks] </font><br>


### **2.1 Load and Preprocess Images** <font color=red> [8 marks] </font><br>

Let us create a function to load the images first. We can then directly use this function while loading images of the different categories to load and crop them in a single step.

#### **2.1.1** <font color=red> [3 marks] </font><br>
Create a function to load the images.

In [4]:
# Create a function to load the raw images
def load_images(folder_path):
    images = []
    labels = []
    # Iterate through each category (subdirectory) in the given folder path
    for category in os.listdir(folder_path):
        category_path = os.path.join(folder_path, category)
        # Check if the path is a directory
        if os.path.isdir(category_path):
            # Iterate through each image file in the category directory
            for image_name in os.listdir(category_path):
                image_path = os.path.join(category_path, image_name)
                # Check if the path is a file
                if os.path.isfile(image_path):
                    try:
                        # Open and append the image to the images list
                        img = Image.open(image_path)
                        images.append(np.array(img)) # Convert PIL image to numpy array
                        # Append the category name to the labels list
                        labels.append(category)
                    except Exception as e:
                        print(f"Error loading image {image_path}: {e}") # Print error if image fails to load

    # Return the lists of images and labels
    return images, labels

#### **2.1.2** <font color=red> [5 marks] </font><br>
Load images and labels.

Load the images from the dataset directory. Labels of images are present in the subdirectories.

Verify if the images and labels are loaded correctly.

In [5]:
# Get the images and their labels
images, labels = load_images('/content/data')
print("Number of images loaded:", len(images))
print("Number of labels loaded:", len(labels))
print("Sample labels:", labels[:5])
print("Sample image shape:", images[0].shape)

FileNotFoundError: [Errno 2] No such file or directory: '/content/data'

Perform any operations, if needed, on the images and labels to get them into the desired format.
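
One operation that is often needed here is making every image three-channel, since a folder can mix greyscale, RGB, and RGBA files while the CNN built later expects inputs with 3 channels. The sketch assumes images are held as NumPy arrays, as returned by `load_images`; `to_rgb` is a hypothetical helper.

```python
import numpy as np

def to_rgb(img):
    """Return an (H, W, 3) array from a greyscale, RGB, or RGBA image array."""
    if img.ndim == 2:                          # greyscale: repeat the single channel
        return np.stack([img] * 3, axis=-1)
    if img.ndim == 3 and img.shape[2] == 4:    # RGBA: drop the alpha channel
        return img[:, :, :3]
    if img.ndim == 3 and img.shape[2] == 3:    # already RGB
        return img
    raise ValueError(f"Unsupported image shape: {img.shape}")
```

An alternative is to call `Image.open(path).convert('RGB')` at load time, which also handles palette-based images.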

### **2.2 Data Visualisation** <font color=red> [9 marks] </font><br>

#### **2.2.1** <font color=red> [3 marks] </font><br>
Create a bar plot to display the class distribution

In [None]:
# Visualise Data Distribution
plt.figure(figsize=(10, 6))
sns.countplot(x=labels)
plt.title('Class Distribution')
plt.xlabel('Class')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()
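
If the bar plot shows an uneven distribution, per-class weights are one way to compensate during training. The sketch below uses the common "balanced" heuristic, weight = n_samples / (n_classes × class_count); `balanced_class_weights` is an illustrative name and the example labels are made up.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Give rarer classes proportionally larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Made-up labels purely for illustration
example_labels = ["Paper"] * 6 + ["Glass"] * 3 + ["Metal"] * 1
print(balanced_class_weights(example_labels))
```

Keras' `model.fit` accepts a `class_weight` dictionary keyed by integer class index, so the weights would need to be computed on the encoded labels before being passed in.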

#### **2.2.2** <font color=red> [3 marks] </font><br>
Visualise some sample images

In [None]:
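# Visualise Sample Images (one per label, up to nine)
plt.figure(figsize=(15, 10))
# Index of the first image found for each distinct label
first_index = {label: labels.index(label) for label in sorted(set(labels))}
for i, (label, idx) in enumerate(list(first_index.items())[:9]):
    plt.subplot(3, 3, i + 1)
    plt.imshow(images[idx])
    plt.title(label)
    plt.axis('off')
plt.show()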


#### **2.2.3** <font color=red> [3 marks] </font><br>
Based on the smallest and largest image dimensions, resize the images.

In [None]:
# Find the smallest and largest image dimensions from the data set
min_width = min(img.shape[1] for img in images)
min_height = min(img.shape[0] for img in images)
max_width = max(img.shape[1] for img in images)
max_height = max(img.shape[0] for img in images)

print(f"Minimum width: {min_width}, Minimum height: {min_height}")
print(f"Maximum width: {max_width}, Maximum height: {max_height}")



In [None]:
# Resize the image dimensions
target_width = 224
target_height = 224

resized_images = []
for img in images:
    resized_img = Image.fromarray(img).resize((target_width, target_height))
    resized_images.append(np.array(resized_img))

images = resized_images
print("Sample resized image shape:", images[0].shape)
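
After resizing, all images share one shape, so they can be stacked into a single array and scaled to the [0, 1] range, which usually helps training. The sketch assumes 8-bit RGB images; `stack_and_scale` is a hypothetical helper, and the dummy images are placeholders.

```python
import numpy as np

def stack_and_scale(images):
    """Stack equally sized uint8 images into a float32 array with values in [0, 1]."""
    batch = np.stack(images).astype("float32")
    return batch / 255.0

# Two dummy all-white images standing in for real data
dummy = [np.full((224, 224, 3), 255, dtype=np.uint8) for _ in range(2)]
print(stack_and_scale(dummy).shape)  # (2, 224, 224, 3)
```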

### **2.3 Encoding the classes** <font color=red> [3 marks] </font><br>

There are seven classes present in the data.

We have extracted the images and their labels, and visualised their distribution. Now, we need to perform encoding on the labels. Encode the labels suitably.

#### **2.3.1** <font color=red> [3 marks] </font><br>
Encode the target class labels.

In [None]:
# Encode the labels suitably
# Sort the class names so the label-to-index mapping is the same on every run
label_to_index = {label: idx for idx, label in enumerate(sorted(set(labels)))}
encoded_labels = [label_to_index[label] for label in labels]
print("Encoded labels:", encoded_labels[:5])


### **2.4 Data Splitting** <font color=red> [5 marks] </font><br>

#### **2.4.1** <font color=red> [5 marks] </font><br>
Split the dataset into training and validation sets

In [None]:
# Assign specified parts of the dataset to train and validation sets
X_train, X_val, y_train, y_val = train_test_split(images, encoded_labels, test_size=0.2, random_state=42)
print("Training set size:", len(X_train))
print("Validation set size:", len(X_val))
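
Stratification keeps the class proportions similar in the training and validation sets, which matters when some classes are rare. `train_test_split` supports this through its `stratify` argument; the sketch below shows the underlying idea in plain Python, with `stratified_indices` as a hypothetical helper.

```python
import random
from collections import defaultdict

def stratified_indices(labels, val_fraction=0.2, seed=42):
    """Split sample indices so each class contributes about val_fraction to validation."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # At least one validation sample per class
        n_val = max(1, round(len(indices) * val_fraction))
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return train_idx, val_idx
```

Note that a class with a single sample would end up only in the validation set under this rule, so very small classes need separate handling.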


## **3. Model Building and Evaluation** <font color=red> [20 marks] </font><br>

### **3.1 Model building and training** <font color=red> [15 marks] </font><br>

#### **3.1.1** <font color=red> [10 marks] </font><br>
Build and compile the model. Use 3 convolutional layers. Add suitable normalisation, dropout, and fully connected layers to the model.

Test out different configurations and report the results in conclusions.

In [None]:
# Build and compile the model
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(target_height, target_width, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(len(label_to_index), activation='softmax')
])
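
To see which layer dominates the model size, it helps to trace the feature-map sizes by hand. The sketch assumes the intended architecture is three 3×3 convolutions with 32, 64, and 128 filters, using the Keras defaults of 'valid' padding and stride 1, each followed by 2×2 max pooling, on a 224×224×3 input with seven output classes.

```python
def conv_out(size, kernel=3):
    return size - kernel + 1      # 'valid' padding, stride 1

def pool_out(size, pool=2):
    return size // pool           # 2x2 max pooling, 'valid' padding

size, channels, params = 224, 3, 0
for filters in (32, 64, 128):
    params += (3 * 3 * channels + 1) * filters   # kernel weights + one bias per filter
    size, channels = pool_out(conv_out(size)), filters

flat = size * size * channels     # length of the flattened feature vector
params += (flat + 1) * 128        # Dense(128)
params += (128 + 1) * 7           # Dense(7) softmax output
print(size, flat, params)  # 26 86528 11169863
```

Under these assumptions, about 99% of the parameters sit in the first dense layer, which is why adding pooling or reducing the dense width has a large effect on model size.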

#### **3.1.2** <font color=red> [5 marks] </font><br>
Train the model.

Use appropriate metrics and callbacks as needed.

In [None]:
# Compile the model, then train it
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history = model.fit(np.array(X_train), np.array(y_train), epochs=10, validation_data=(np.array(X_val), np.array(y_val)))
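
A typical callback for this step is early stopping, which halts training once the validation loss stops improving. The plain-Python sketch below only illustrates the patience rule on a made-up loss curve; it is not Keras' implementation, and `early_stop_epoch` is a hypothetical name.

```python
def early_stop_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None if it never does."""
    best, wait = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, wait = loss, 0      # improvement: reset the counter
        else:
            wait += 1                 # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

losses = [1.20, 0.95, 0.80, 0.82, 0.81, 0.83, 0.79]  # made-up values
print(early_stop_epoch(losses, patience=3))  # 5
```

In Keras, the corresponding callback is `keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)`, passed to `model.fit` through the `callbacks` argument.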


### **3.2 Model Testing and Evaluation** <font color=red> [5 marks] </font><br>

#### **3.2.1** <font color=red> [5 marks] </font><br>
Evaluate the model on test dataset. Derive appropriate metrics.

In [None]:
# Evaluate on the held-out validation set (no separate test split was created); display suitable metrics
test_loss, test_accuracy = model.evaluate(np.array(X_val), np.array(y_val))
print(f"Test Loss: {test_loss}, Test Accuracy: {test_accuracy}")
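
Accuracy alone can hide weak classes, so a confusion matrix with per-class precision and recall is a useful complement. The NumPy sketch below assumes integer-encoded true labels and predicted labels (for example from `np.argmax(model.predict(...), axis=1)`); the helper names are illustrative.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def precision_recall(cm):
    """Per-class precision and recall; classes with no samples get 0."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)   # column sums: how often each class was predicted
    actual = cm.sum(axis=1)      # row sums: how often each class truly occurs
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    return precision, recall
```

Classes with low recall are the ones the model most often misses, which is usually more informative for a sorting task than a single accuracy figure.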


## **4. Data Augmentation** <font color=red> [optional] </font><br>

#### **4.1 Create a Data Augmentation Pipeline**

##### **4.1.1**
Define augmentation steps for the datasets.

In [None]:
# Define augmentation steps to augment images
data_augmentation = keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),
    layers.RandomRotation(0.2),
    layers.RandomZoom(0.2),
])

Augment and resample the images.
In case of class imbalance, you can also undersample the majority classes and augment the images so that the input data stays consistent across all classes.
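
Undersampling can be sketched as keeping at most a fixed number of randomly chosen samples per class. The helper below works on indices so that images and labels stay aligned; `undersample_indices` and its default of matching the rarest class are assumptions made for illustration.

```python
import random
from collections import defaultdict

def undersample_indices(labels, max_per_class=None, seed=42):
    """Return sorted indices with at most max_per_class samples from each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    if max_per_class is None:
        # Default: shrink every class to the size of the rarest one
        max_per_class = min(len(v) for v in by_class.values())
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, min(len(indices), max_per_class)))
    return sorted(kept)
```

The returned indices can then be used to select matching entries from both the image list and the label list.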

Augment the images.

In [None]:
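# Create a function to augment the images
def augment_images(images):
    augmented_images = []
    for img in images:
        # Pass training=True explicitly so the random transforms are applied
        augmented = data_augmentation(img, training=True)
        augmented_images.append(np.array(augmented))  # tensor -> NumPy array
    return augmented_images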

In [None]:
# Create the augmented training dataset
augmented_X_train = augment_images(X_train)


##### **4.1.2**

Train the model on the new augmented dataset.

In [None]:
# Train the model using augmented images
history_augmented = model.fit(np.array(augmented_X_train), np.array(y_train), epochs=10, validation_data=(np.array(X_val), np.array(y_val)))


## **5. Conclusions** <font color = red> [5 marks]</font>

#### **5.1 Conclude with outcomes and insights gained** <font color =red> [5 marks] </font>

## Report your findings about the data


*   A Convolutional Neural Network (CNN) was implemented successfully to classify waste images into different categories.

*   The waste segregation dataset was split into training and validation sets (80/20), and the images were resized to 224 × 224 pixels during preprocessing.

*   Data augmentation of the training images helps the model generalise, which can improve accuracy and reduce loss.

*   The CNN architecture effectively learned the distinguishing features of the different waste categories.

*   The training process highlighted the importance of tuning hyperparameters such as the number of epochs, batch size, and learning rate for optimal performance.




## Report model training results

*   The dataset contains images categorised into different classes of waste.
*   The model used multiple convolutional layers with ReLU activations, max-pooling layers, and dense layers followed by a softmax output.
*   Training loss decreased steadily, indicating good learning. There was a slight gap between training and validation accuracy.
*   The final model achieved satisfactory performance on unseen data, showing its ability to generalise. Techniques such as early stopping, dropout tuning, and hyperparameter optimisation could improve it further.
