# **Waste Material Segregation for Improving Waste Management**

## **Objective**

The objective of this project is to implement an effective waste material segregation system using convolutional neural networks (CNNs) that categorises waste into distinct groups. This process enhances recycling efficiency, minimises environmental pollution, and promotes sustainable waste management practices.

The key goals are:

* Accurately classify waste materials into categories like cardboard, glass, paper, and plastic.
* Improve waste segregation efficiency to support recycling and reduce landfill waste.
* Understand the properties of different waste materials to optimise sorting methods for sustainability.

## **Data Understanding**

The dataset consists of images of seven common categories of waste material:

1. Food Waste
2. Metal
3. Paper
4. Plastic
5. Other
6. Cardboard
7. Glass


**Data Description**

* The dataset consists of multiple folders, each representing a specific class, such as `Cardboard`, `Food_Waste`, and `Metal`.
* Within each folder, there are images of objects that belong to that category.
* However, these items are not further subcategorised. <br> For instance, the `Food_Waste` folder may contain images of items like coffee grounds, teabags, and fruit peels, without explicitly stating that they are actually coffee grounds or teabags.
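
The folder-per-class layout described above can be inspected with a few lines of standard-library code. The sketch below is only illustrative: `count_images_per_class` and the list of image extensions are hypothetical choices made here, not part of the original notebook, and the demo builds a throwaway directory instead of reading the real dataset.

```python
import os
import tempfile

# Assumption: images use one of these extensions.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")

def count_images_per_class(root):
    """Map each class folder under `root` to the number of image files in it."""
    counts = {}
    for entry in sorted(os.listdir(root)):
        class_dir = os.path.join(root, entry)
        if not os.path.isdir(class_dir):
            continue  # skip stray files that sit next to the class folders
        counts[entry] = sum(
            1 for name in os.listdir(class_dir)
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts

# Demo on a temporary directory that mimics the layout (not the real data).
with tempfile.TemporaryDirectory() as demo_root:
    for class_name, n_files in [("Cardboard", 2), ("Metal", 3)]:
        os.makedirs(os.path.join(demo_root, class_name))
        for i in range(n_files):
            open(os.path.join(demo_root, class_name, f"img_{i}.png"), "w").close()
    demo_counts = count_images_per_class(demo_root)

print(demo_counts)
```

Run against the extracted dataset folder, the same helper would give a quick check that all seven class folders are present and non-empty.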

## **1. Load the data**

Load and unzip the dataset zip file.

**Import Necessary Libraries**

In [None]:
# Recommended versions:

# numpy version: 1.26.4
# pandas version: 2.2.2
# seaborn version: 0.13.2
# matplotlib version: 3.10.0
# PIL version: 11.1.0
# tensorflow version: 2.18.0
# keras version: 3.8.0
# sklearn version: 1.6.1

In [1]:
# Import essential libraries
import numpy as np # Numerical analysis
import tensorflow as tf # Neural-network framework
from tensorflow.keras.models import Sequential # Model container for stacking layers
from tensorflow.keras.layers import Dense, Flatten # Fully connected and flattening layers
from tensorflow.keras.utils import to_categorical # One-hot encoding for multi-class labels
import matplotlib.pyplot as plt # Visualisation


In [19]:
import os
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.optimizers import Adam

Load the dataset.

In [None]:
import pandas as pd

In [15]:
import os

In [9]:
data_dir

'waste_data'

In [29]:
os.getcwd()

'C:\\Users\\hp\\course4\\Assignment'

In [31]:
os.listdir()

['.ipynb_checkpoints', 'CNN_Assg_Waste_Segregation_Starter.ipynb', 'data.zip']

In [35]:
!unzip course4/Assignment/data.zip -d waste_data

'unzip' is not recognized as an internal or external command,
operable program or batch file.


In [43]:
print(os.listdir("waste_data"))

['data']


In [39]:
import zipfile

with zipfile.ZipFile("data.zip", 'r') as zip_ref:
    zip_ref.extractall("waste_data")

In [40]:
import os
os.listdir('waste_data')

['data']

In [45]:
csv_files = [f for f in os.listdir('waste_data') if f.endswith('.csv')]
print(csv_files)

[]


In [49]:
if csv_files:
    df = pd.read_csv(os.path.join('waste_data', csv_files[0]))
    print(df.head())

In [None]:
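# `df` is defined only when a CSV file exists; none was found above
# (the images live in class folders), so the preview is guarded.
if csv_files:
    print(df.iloc[0])  # show the first record of the CSV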


## **2. Data Preparation** <font color=red> [25 marks] </font><br>


### **2.1 Load and Preprocess Images** <font color=red> [8 marks] </font><br>

Let us create a function to load the images first. We can then directly use this function while loading images of the different categories to load and crop them in a single step.

#### **2.1.1** <font color=red> [3 marks] </font><br>
Create a function to load the images.

In [51]:
# Create a function to load the raw images

# Parameters
img_height = 150
img_width = 150
batch_size = 32
epochs = 10
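
The loading cell in section 2.1.2 calls a function named `load_and_crop_image(image_path, target_size=None)`, which is not defined anywhere in the notebook. The following is a minimal sketch of what such a helper might look like, assuming that "crop" means a centred square crop and that unreadable files should be skipped by returning `None`; both are assumptions, as is the helper name `center_crop_square`.

```python
import numpy as np

def center_crop_square(img):
    """Crop the largest centred square from an H x W (x C) array."""
    h, w = img.shape[:2]
    side = min(h, w)
    top = (h - side) // 2
    left = (w - side) // 2
    return img[top:top + side, left:left + side]

def load_and_crop_image(image_path, target_size=None):
    """Load an image as RGB, centre-crop it, and optionally resize it.

    Hypothetical helper: `target_size` is (height, width); returns None
    when the file cannot be opened or decoded.
    """
    from PIL import Image  # imported lazily so the crop can be used on its own
    try:
        with Image.open(image_path) as im:
            arr = center_crop_square(np.asarray(im.convert("RGB")))
    except OSError:
        return None
    if target_size is not None:
        # Pillow expects (width, height), hence the swapped order.
        resized = Image.fromarray(arr).resize((target_size[1], target_size[0]))
        arr = np.asarray(resized)
    return arr

# Quick check of the cropping step on a synthetic 4 x 6 RGB array.
demo = np.arange(4 * 6 * 3).reshape(4, 6, 3)
cropped = center_crop_square(demo)
print(cropped.shape)  # (4, 4, 3)
```

Note that with a centre crop the stored dimensions are those of the cropped square, not of the raw file; drop the crop step if the raw dimensions are needed for section 2.2.3.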

In [53]:
datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2
)

In [55]:
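# Note: `data_dir` is 'waste_data', whose only child is the 'data' folder, so
# flow_from_directory treats 'data' as a single class (see the output below).
# For the seven-class problem, point it at the folder that holds the class
# subfolders and use class_mode='categorical'.
val_gen = datagen.flow_from_directory(
    data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='binary',
    subset='validation'
)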

Found 1525 images belonging to 1 classes.


In [65]:
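# Note: 'waste_data' has a single child folder ('data'), hence the
# "1 classes" report below; all labels produced by this generator are 0.
train_gen = datagen.flow_from_directory(
    data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='binary',
    subset='training'
)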


Found 6100 images belonging to 1 classes.
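
The "1 classes" messages above are consistent with how `flow_from_directory` works: it treats each immediate subfolder of the given directory as one class, and `waste_data` contains only the `data` folder. One way to guard against this, sketched below with a hypothetical helper `find_class_root`, is to descend through single-folder levels until a level with several subfolders is reached. The demo uses a temporary directory with made-up class names.

```python
import os
import tempfile

def visible_subdirs(path):
    """Sorted subdirectory names, ignoring hidden folders such as .ipynb_checkpoints."""
    return [d for d in sorted(os.listdir(path))
            if os.path.isdir(os.path.join(path, d)) and not d.startswith(".")]

def find_class_root(root):
    """Descend while a level holds exactly one subfolder; return the first level that does not."""
    current = root
    while True:
        subdirs = visible_subdirs(current)
        if len(subdirs) != 1:
            return current
        current = os.path.join(current, subdirs[0])

# Demo: <tmp>/data/{Glass,Paper} mimics the nesting seen after extraction.
with tempfile.TemporaryDirectory() as demo_root:
    for class_name in ("Glass", "Paper"):
        os.makedirs(os.path.join(demo_root, "data", class_name))
    class_root = find_class_root(demo_root)
    class_names = visible_subdirs(class_root)

print(os.path.basename(class_root), class_names)
```

Passing the returned folder to `flow_from_directory` together with `class_mode='categorical'` should then report one class per waste category. The heuristic would misfire on a dataset that genuinely has a single class, so it is worth printing the result before relying on it.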


In [57]:
val_gen

<keras.src.legacy.preprocessing.image.DirectoryIterator at 0x1f1ba373fe0>

#### **2.1.2** <font color=red> [5 marks] </font><br>
Load images and labels.

Load the images from the dataset directory. Labels of images are present in the subdirectories.

Verify if the images and labels are loaded correctly.

In [None]:
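# Get the images and their labels
# Assumes the archive was extracted to 'waste_data' with the class folders under 'waste_data/data',
# and that `load_and_crop_image` has been defined in section 2.1.1.
data_dir = os.path.join('waste_data', 'data') # Adjust if structure is different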

images = []
labels = []
image_dimensions = []

print("Loading images and labels...")
for category in os.listdir(data_dir):
    category_path = os.path.join(data_dir, category)
    if os.path.isdir(category_path):
        for image_name in os.listdir(category_path):
            image_path = os.path.join(category_path, image_name)
            img = load_and_crop_image(image_path, target_size=None) # Load without initial resize to get original dims
            if img is not None:
                images.append(img)
                labels.append(category)
                image_dimensions.append(img.shape[:2]) # Store (height, width)

print(f"Loaded {len(images)} images with {len(labels)} labels.")

# Verify if the images and labels are loaded correctly.
if len(images) > 0:
    print(f"First image shape: {images[0].shape}, First label: {labels[0]}")
else:
    print("No images loaded. Check data directory and paths.")

# Perform any operations, if needed, on the images and labels to get them into the desired format.
# At this stage, images are loaded as numpy arrays. Labels are strings.

Perform any operations, if needed, on the images and labels to get them into the desired format.

### **2.2 Data Visualisation** <font color=red> [9 marks] </font><br>

#### **2.2.1** <font color=red> [3 marks] </font><br>
Create a bar plot to display the class distribution

In [None]:
# Visualise Data Distribution
import seaborn as sns

plt.figure(figsize=(10, 6))
sns.countplot(y=labels, order=pd.Series(labels).value_counts().index)
plt.title('Distribution of Waste Material Categories')
plt.xlabel('Number of Images')
plt.ylabel('Waste Category')
plt.show()

In [None]:
# Visualise Sample Images (across different labels)
import random

unique_labels = sorted(set(labels))  # sorted so the grid order is reproducible
plt.figure(figsize=(15, 10))

for i, label in enumerate(unique_labels):
    # Get indices of images for the current label
    indices = [j for j, l in enumerate(labels) if l == label]
    if indices:
        # Pick a random image from this category
        sample_image_index = random.choice(indices)
        plt.subplot(3, 3, i + 1) # Adjust subplot grid based on number of unique_labels
        plt.imshow(images[sample_image_index])
        plt.title(f"Category: {label}")
        plt.axis('off')

plt.tight_layout()
plt.show()

#### **2.2.2** <font color=red> [3 marks] </font><br>
Visualise some sample images

In [None]:
# Visualise Sample Images (across different labels)



#### **2.2.3** <font color=red> [3 marks] </font><br>
Based on the smallest and largest image dimensions, resize the images.

In [None]:
# Find the smallest and largest image dimensions from the data set
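
A possible way to fill in this step, assuming the `image_dimensions` list of `(height, width)` pairs collected while loading the images. The values below are made-up stand-ins so that the snippet runs on its own; "smallest" and "largest" are taken here to mean by pixel area, which is one of several reasonable definitions.

```python
import numpy as np

# Hypothetical stand-in for the `image_dimensions` list built during loading.
image_dimensions = [(256, 256), (300, 400), (128, 512), (480, 360)]

dims = np.array(image_dimensions)
areas = dims[:, 0] * dims[:, 1]

# Smallest and largest images by pixel area.
smallest = tuple(int(v) for v in dims[areas.argmin()])
largest = tuple(int(v) for v in dims[areas.argmax()])

# Per-axis extremes, useful when choosing a common target size.
min_height, min_width = (int(v) for v in dims.min(axis=0))
max_height, max_width = (int(v) for v in dims.max(axis=0))

print("smallest:", smallest, "largest:", largest)
print("height range:", (min_height, max_height), "width range:", (min_width, max_width))
```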



In [None]:
# Resize the image dimensions
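
The images must share one shape before they can be stacked into a single array for training. The sketch below uses a dependency-free nearest-neighbour resize (`resize_nearest` is a hypothetical helper) purely to show the idea; in practice an interpolating resize such as Pillow's `Image.resize` or `tf.image.resize` would usually give better quality. The 5 x 5 target is only for the demo.

```python
import numpy as np

def resize_nearest(img, target_size):
    """Nearest-neighbour resize of an H x W (x C) array to (height, width)."""
    h, w = img.shape[:2]
    th, tw = target_size
    rows = np.arange(th) * h // th   # source row for each target row
    cols = np.arange(tw) * w // tw   # source column for each target column
    return img[rows][:, cols]

# Two synthetic images of different sizes stand in for the loaded dataset.
demo_images = [
    np.zeros((4, 6, 3), dtype=np.uint8),
    np.ones((8, 2, 3), dtype=np.uint8),
]
resized = np.stack([resize_nearest(im, (5, 5)) for im in demo_images])
print(resized.shape)  # (2, 5, 5, 3)
```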



### **2.3 Encoding the classes** <font color=red> [3 marks] </font><br>

There are seven classes present in the data.

We have extracted the images and their labels, and visualised their distribution. Now, we need to perform encoding on the labels. Encode the labels suitably.

#### **2.3.1** <font color=red> [3 marks] </font><br>
Encode the target class labels.

In [None]:
# Encode the labels suitably
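
One way to encode the string labels is to map each class name to an integer index and then to a one-hot vector. The sketch below does this with NumPy only, on a made-up label list. The later evaluation cell refers to `label_encoder.classes_`, which suggests that scikit-learn's `LabelEncoder` was intended; `class_names` here plays the same role and is an assumption of this sketch, while `one_hot_labels` matches the name used in the splitting cell.

```python
import numpy as np

# Hypothetical sample of the string labels collected while loading images.
labels = ["Paper", "Glass", "Paper", "Metal"]

# Sorted unique class names plus, for every label, its index into that array.
class_names, label_indices = np.unique(labels, return_inverse=True)

# One row per image, one column per class, with a single 1.0 per row.
one_hot_labels = np.eye(len(class_names), dtype="float32")[label_indices]

print(list(class_names))
print(label_indices.tolist())
print(one_hot_labels.shape)
```

With one-hot targets like these, the matching Keras loss is `categorical_crossentropy` and the output layer needs one softmax unit per class.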



### **2.4 Data Splitting** <font color=red> [5 marks] </font><br>

In [None]:
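# Assign specified parts of the dataset to train and validation sets
from sklearn.model_selection import train_test_split

# Normalise image data. Assumes every image has already been resized to the same shape
# and that `one_hot_labels` holds the encoded labels from section 2.3.
images = np.array(images, dtype='float32') / 255.0

X_train, X_val, y_train, y_val = train_test_split(images, one_hot_labels, test_size=0.2, random_state=42, stratify=one_hot_labels)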

print(f"X_train shape: {X_train.shape}")
print(f"X_val shape: {X_val.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"y_val shape: {y_val.shape}")

#### **2.4.1** <font color=red> [5 marks] </font><br>
Split the dataset into training and validation sets

In [None]:
# Assign specified parts of the dataset to train and validation sets



## **3. Model Building and Evaluation** <font color=red> [20 marks] </font><br>

### **3.1 Model building and training** <font color=red> [15 marks] </font><br>

#### **3.1.1** <font color=red> [10 marks] </font><br>
Build and compile the model. Use 3 convolutional layers. Add suitable normalisation, dropout, and fully connected layers to the model.

Test out different configurations and report the results in conclusions.

In [59]:
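# Build and compile the model
# Note: this configuration has two convolutional layers and a single sigmoid unit,
# i.e. a binary classifier. The seven-class task calls for a third Conv2D block
# and a Dense(7, activation='softmax') output.
model = Sequential([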
    Conv2D(32, (3, 3), activation='relu', input_shape=(img_height, img_width, 3)),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])
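
To see what adding a third convolutional block does to the network size, the feature-map arithmetic can be worked out by hand. The sketch below assumes the same settings as the model above (3 x 3 kernels, no padding, stride 1, 2 x 2 max pooling, 150 x 150 input); the 128 filters for the third block are an assumption, not something the notebook specifies.

```python
def conv_output_size(size, kernel=3, stride=1, padding=0):
    """Spatial size after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_output_size(size, pool=2):
    """Spatial size after non-overlapping max pooling (remainder is dropped)."""
    return size // pool

def flattened_features(input_size, filters):
    """Return (final spatial size, length of the flattened vector)."""
    size = input_size
    for _ in filters:
        size = pool_output_size(conv_output_size(size))
    return size, size * size * filters[-1]

two_block = flattened_features(150, [32, 64])         # model as written above
three_block = flattened_features(150, [32, 64, 128])  # with an assumed third block

print(two_block)    # (36, 82944)
print(three_block)  # (17, 36992)
```

The third block more than halves the input to the first dense layer, which also reduces that layer's parameter count.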





#### **3.1.2** <font color=red> [5 marks] </font><br>
Train the model.

Use appropriate metrics and callbacks as needed.
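
Keras provides callbacks such as `tf.keras.callbacks.EarlyStopping` and `tf.keras.callbacks.ModelCheckpoint` for this purpose. The pure-Python sketch below only illustrates the patience rule behind early stopping; `early_stopping_epoch` and the loss values are hypothetical, and the real callback has further options (for example `restore_best_weights`).

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once the validation loss has failed to improve on its
    best value by more than `min_delta` for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses: the best value is reached at epoch 3.
demo_losses = [0.90, 0.70, 0.65, 0.66, 0.68, 0.67, 0.69]
stop_epoch = early_stopping_epoch(demo_losses, patience=3)
print(stop_epoch)  # 6
```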

In [61]:
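# Training
# Compile with Adam. Binary cross-entropy matches the single sigmoid output above;
# a seven-class softmax model would use categorical_crossentropy instead.
model.compile(optimizer=Adam(), loss='binary_crossentropy', metrics=['accuracy'])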
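
Because the generators above reported a single class, every label they yield is 0. Under binary cross-entropy a model can then reach near-zero loss and 100% accuracy simply by always predicting 0, which is a likely explanation for any perfect scores in the training log of this run. The NumPy sketch below reproduces the effect on made-up predictions; it is an illustration, not the notebook's actual computation.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

y_true = np.zeros(8)        # one class folder: every label is index 0
y_prob = np.full(8, 1e-6)   # a model that always outputs (almost) 0

loss = binary_cross_entropy(y_true, y_prob)
accuracy = float(np.mean((y_prob > 0.5).astype(int) == y_true))

print(f"loss={loss:.2e}, accuracy={accuracy:.2f}")
```

A perfect score obtained this way says nothing about how well the network separates waste categories, so the label setup should be fixed before the results are interpreted.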

In [67]:
history = model.fit(
    train_gen,
    validation_data=val_gen,
    epochs=epochs
)



Epoch 1/10
191/191 ━━━━━━━━━━━━━━━━━━━━ 279s 1s/step - accuracy: 0.9743 - loss: 0.0235 - val_accuracy: 1.0000 - val_loss: 5.8582e-38
Epoch 2/10
191/191 ━━━━━━━━━━━━━━━━━━━━ 178s 928ms/step - accuracy: 1.0000 - loss: 7.9601e-20 - val_accuracy: 1.0000 - val_loss: 5.8582e-38
Epoch 3/10
191/191 ━━━━━━━━━━━━━━━━━━━━ 162s 850ms/step - accuracy: 1.0000 - loss: 2.4806e-21 - val_accuracy: 1.0000 - val_loss: 5.8582e-38
Epoch 4/10
191/191 ━━━━━━━━━━━━━━━━━━━━ 193s 1s/step - accuracy: 1.0000 - loss: 2.2877e-16 - val_accuracy: 1.0000 - val_loss: 5.8582e-38
Epoch 5/10
191/191 ━━━━━━━━━━━━━━━━━━━━ 219s 1s/step - accuracy: 1.0000 - loss: 7.1406e-17 - val_accuracy: 1.0000 - val_loss: 5.8582e-38
Epoch 6/10
191/191 ━━━━━━━━━━━━━━━━━━━━ 203s 838ms/step - accuracy: 1.0000 - loss: 8.8975e-20 - val_accuracy:

In [None]:
# Plot training and validation curves side by side in one figure
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Val Accuracy')
plt.legend()
plt.title('Accuracy')

plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Val Loss')
plt.legend()
plt.title('Loss')

plt.tight_layout()
plt.show()


### **3.2 Model Testing and Evaluation** <font color=red> [5 marks] </font><br>

#### **3.2.1** <font color=red> [5 marks] </font><br>
Evaluate the model on test dataset. Derive appropriate metrics.

In [None]:
# Evaluate on the test set; display suitable metrics
import seaborn as sns
from sklearn.metrics import classification_report, confusion_matrix

# The notebook only defines a train/validation split, so the validation set stands in for a test set here.
# In a real-world scenario, you would have a separate, unseen test set.

# Load the best model saved by a ModelCheckpoint callback
# (assumes training was run with such a callback writing to this path).
# Note: For Keras 3.x, use the .keras extension when saving models.
best_model = tf.keras.models.load_model('best_waste_classifier.keras')

loss, accuracy = best_model.evaluate(X_val, y_val, verbose=1)
print(f"\nValidation Loss: {loss:.4f}")
print(f"Validation Accuracy: {accuracy:.4f}")

# Predict on the validation set
y_pred_probs = best_model.predict(X_val)
y_pred = np.argmax(y_pred_probs, axis=1)
y_true = np.argmax(y_val, axis=1)

# Classification Report
print("\nClassification Report:")
print(classification_report(y_true, y_pred, target_names=label_encoder.classes_))

# Confusion Matrix
plt.figure(figsize=(10, 8))
cm = confusion_matrix(y_true, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=label_encoder.classes_, yticklabels=label_encoder.classes_)
plt.title('Confusion Matrix')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()


## **4. Data Augmentation** <font color=red> [optional] </font><br>

#### **4.1 Create a Data Augmentation Pipeline**

##### **4.1.1**
Define augmentation steps for the datasets.

In [None]:
# Define augmentation steps to augment images
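
A possible set of augmentation steps is sketched below with NumPy only: a random horizontal flip, a random rotation by a multiple of 90 degrees, and a small brightness change. The function name `augment_image` and the chosen ranges are assumptions. Keras's `ImageDataGenerator` offers comparable options (for example `horizontal_flip`, `rotation_range`, `zoom_range`) if a built-in pipeline is preferred.

```python
import numpy as np

def augment_image(img, rng):
    """Randomly flip, rotate by a multiple of 90 degrees, and rescale brightness.

    Expects a square float image in [0, 1] with shape H x W x C, so that
    the rotation keeps the shape unchanged.
    """
    out = img
    if rng.random() < 0.5:
        out = out[:, ::-1]                           # horizontal flip
    out = np.rot90(out, k=int(rng.integers(0, 4)))   # 0, 90, 180 or 270 degrees
    factor = rng.uniform(0.8, 1.2)                   # brightness change of up to 20%
    return np.clip(out * factor, 0.0, 1.0).astype("float32")

rng = np.random.default_rng(42)
demo = rng.random((150, 150, 3)).astype("float32")   # synthetic stand-in image
augmented = augment_image(demo, rng)
print(augmented.shape, augmented.dtype)
```

Augmentation should be applied to the training set only, so that validation metrics reflect unmodified images.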



Augment and resample the images.
In case of class imbalance, you can also undersample the majority class and augment the resulting images so that the input datasets stay consistent across all classes.

Augment the images.

In [None]:
# Create a function to augment the images




In [None]:
# Create the augmented training dataset
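
When building the augmented training set, class imbalance can be handled by resampling indices before augmentation. The sketch below oversamples minority classes with replacement up to the size of the largest class, which is an alternative to the undersampling mentioned above; `balanced_indices` is a hypothetical helper and the labels are made up.

```python
import numpy as np

def balanced_indices(label_indices, rng):
    """Return sample indices in which every class appears equally often.

    Each class keeps all of its samples; smaller classes are topped up by
    drawing extra indices with replacement until they match the largest class.
    """
    label_indices = np.asarray(label_indices)
    classes, counts = np.unique(label_indices, return_counts=True)
    target = counts.max()
    chosen = []
    for cls, count in zip(classes, counts):
        members = np.flatnonzero(label_indices == cls)
        extra = rng.choice(members, size=target - count, replace=True)
        chosen.append(np.concatenate([members, extra]))
    return np.concatenate(chosen)

rng = np.random.default_rng(0)
demo_labels = np.array([0, 0, 0, 0, 1, 1, 2])   # integer class indices, imbalanced
idx = balanced_indices(demo_labels, rng)
balanced_counts = np.bincount(demo_labels[idx])
print(balanced_counts.tolist())  # [4, 4, 4]
```

The repeated indices would then be passed through the augmentation function so that duplicated images differ from one another. As with augmentation, this should be done on the training split only.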



##### **4.1.2**

Train the model on the new augmented dataset.

In [None]:
# Train the model using augmented images



## **5. Conclusions** <font color = red> [5 marks]</font>

#### **5.1 Conclude with outcomes and insights gained** <font color =red> [5 marks] </font>

* Report your findings about the data
* Report model training results

In [None]:
**Findings about the data**

* Initial observations on class distribution (e.g., whether any classes are heavily imbalanced).
* Insights from sample image visualisation (e.g., variations within categories, image quality).
* Observations on image dimensions and the choice of `TARGET_SIZE`.

**Model training results**

Without augmentation:

* Initial validation accuracy and loss.
* How the accuracy and loss changed over epochs.
* Whether early stopping was triggered and at what epoch.
* Key metrics from the classification report (precision, recall, and F1-score for each class, and overall accuracy).
* Insights from the confusion matrix (e.g., which classes are frequently confused with each other).

With augmentation (if implemented):

* Comparison of training and validation accuracy/loss with and without augmentation.
* Whether augmentation helped to improve generalisation or reduce overfitting.
* Any changes in the classification report or confusion matrix compared to the non-augmented model.

Overall best-performing model and its metrics.
