# Waste_Image_Classification

Objective: Develop a deep learning model, specifically a Convolutional Neural Network (CNN), that classifies waste images into material categories, with accuracy as the evaluation metric. The target is at least 70% accuracy across 9 material categories within a 12-week timeframe. Manual waste sorting is inefficient, which leads to low recycling rates and increased environmental harm; automating the sorting step aims to improve recycling efficiency and environmental sustainability.
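
To make the 70% target concrete: accuracy is the fraction of images whose predicted class matches the true class. The snippet below is a minimal sketch of that check. The label lists are made-up illustration data, not project results, and the names `TARGET_ACCURACY` and `accuracy` are introduced here for illustration.

```python
TARGET_ACCURACY = 0.70  # threshold taken from the stated objective


def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for illustration only (not project results)
y_true = ["plastic", "metal", "glass", "paper", "plastic"]
y_pred = ["plastic", "metal", "paper", "paper", "plastic"]

score = accuracy(y_true, y_pred)
print(f"accuracy = {score:.2f}, meets target: {score >= TARGET_ACCURACY}")
```

Accuracy alone can hide weak performance on small classes, which is one reason the notebook also imports a confusion matrix and a classification report.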

## Problem Identification
In this section, we define the problem statement and its goals using the SMART framework, then break the problem down into questions.

Topic : Real Waste Classification

### Background
In modern society, waste management is a significant environmental and operational challenge. Proper waste classification can improve recycling rates and reduce environmental impact, but manual sorting is time-consuming and prone to errors. Leveraging deep learning models to classify waste materials can greatly assist in automating this process, reducing both costs and the need for human labor while improving the accuracy of waste sorting.

### Problem statement
"Manual waste sorting is inefficient, leading to low recycling rates and increased environmental harm. Our goal is to develop a deep learning-based waste classification system using a Convolutional Neural Network (CNN) that can accurately classify at least 70% of waste images across 9 material categories within a 12-week timeframe. The project aims to improve waste management efficiency and environmental sustainability."

### Breaking Down The Problem
Main problem: Developing a CNN-based deep learning model capable of accurately classifying waste images into distinct material types (e.g., plastic, metal, glass).

- How can the dataset be pre-processed to ensure high model accuracy?
- What is the most effective CNN architecture/model for this classification task?
- How can the model be evaluated and optimized for real-world applications in waste management?

### Dataset Description
The dataset used in this analysis is the RealWaste dataset, obtained from the UC Irvine Machine Learning Repository. It is an image classification dataset of waste items across 9 major material types, collected in an authentic landfill environment. RealWaste was created as part of an honors thesis investigating how convolutional neural networks perform on authentic waste material when trained on objects in pure and unadulterated forms, compared with training on real waste items. The color images were captured at the point of reception in a landfill environment and are released at 524x524 resolution, in line with the accompanying research paper.

# Import libraries

In [11]:
# Import libraries for data loading
import os
import random

In [12]:
# Import libraries for data operations
import numpy as np
import pandas as pd
import math

In [13]:
# Import libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns
import cv2
from PIL import Image
import textwrap

In [14]:
# Import libraries for feature engineering
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array

In [15]:
# Import libraries for model creation
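# Use tensorflow.keras throughout so models, optimizers, and callbacks come from the same Keras implementation
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU, MaxPooling2D, GlobalAveragePooling2D, Dense, Dropout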

In [16]:
# Import libraries for model training
from sklearn.utils import class_weight
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

In [17]:
# Import libraries for model evaluation
from sklearn.metrics import confusion_matrix, classification_report

In [18]:
# Import libraries for pre-trained model
from tensorflow.keras.applications import InceptionV3

In [19]:
# Import libraries for warnings 
import warnings
warnings.filterwarnings('ignore')

Because this notebook runs locally, we first check whether TensorFlow can use a GPU. The cell below checks whether TensorFlow was built with CUDA support, lists the available GPUs, and enables memory growth on each GPU so that TensorFlow allocates GPU memory as needed instead of reserving it all at startup. Training on a GPU can substantially reduce training time for convolutional networks; if no GPU is listed, TensorFlow falls back to the CPU.

In [21]:
# Check whether this TensorFlow build includes CUDA support
# (this reports how TensorFlow was built, not whether a GPU is actually in use)
print("Is TensorFlow using GPU:", tf.test.is_built_with_cuda())

# List available GPUs
gpus = tf.config.list_physical_devices('GPU')
print("Available GPUs:", gpus)

# Enable memory growth so TensorFlow allocates GPU memory on demand
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        # Memory growth must be set before the GPUs have been initialized
        print(e)

Is TensorFlow using GPU: False
Available GPUs: []


# Data Loading

We start by loading the dataset and building a structured DataFrame to support the later analysis. The image data, sourced from the UCI Machine Learning Repository, has been downloaded and stored in a local directory. Because the notebook runs locally, we first set the file path to that directory.

In [31]:
# Define the path to the dataset
dataset_path = r"C:\Users\LENOVO\OneDrive\Desktop\wet_data\TrashType_Image_Dataset"

# Define classes
classes = ["Cardboard", "Glass", "Metal", "Paper", "Plastic", "trash"]

In [37]:
# List the class subdirectories in the dataset directory
# (sorted for a stable order; stray files are skipped; this overrides the hard-coded list above)
classes = sorted(
    entry for entry in os.listdir(dataset_path)
    if os.path.isdir(os.path.join(dataset_path, entry))
)

# Print the number of classes and their names
print(f"Number of classes: {len(classes)}")
print("Class names:", classes)

Number of classes: 6
Class names: ['cardboard', 'glass', 'metal', 'paper', 'plastic', 'trash']
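
The absolute path above only resolves on the original machine. One possible way to make the notebook portable is to read the dataset location from an environment variable and fail early if the folder is missing. This is a sketch, not the notebook's own method: the variable name `WASTE_DATASET_DIR` and the helpers `resolve_dataset_path` and `list_class_dirs` are assumptions introduced for illustration.

```python
import os
from pathlib import Path


def resolve_dataset_path(default, env_var="WASTE_DATASET_DIR"):
    """Return the dataset directory, preferring an environment variable.

    `env_var` is a hypothetical variable name; `default` is used when it is unset.
    """
    path = Path(os.environ.get(env_var, default))
    if not path.is_dir():
        raise FileNotFoundError(f"Dataset directory not found: {path}")
    return path


def list_class_dirs(dataset_path):
    """Return the sorted names of the subdirectories, one per class."""
    return sorted(p.name for p in Path(dataset_path).iterdir() if p.is_dir())
```

With these helpers, `dataset_path = resolve_dataset_path(dataset_path)` keeps the existing path as the fallback, and `classes = list_class_dirs(dataset_path)` yields the class names in a stable order.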


Next, we build the path to each class folder and preview its contents. Mapping each class name to its folder lets us iterate over the images, associate each one with its label, and prepare it for loading and preprocessing.

In [38]:
# Define the paths for each class
class_paths = {cls: os.path.join(dataset_path, cls) for cls in classes}

# Function to list files in a directory
def list_files(directory_path):
    return os.listdir(directory_path)

# List files in each directory
print("Paths to dataset folder:")
print("Dataset Path:", dataset_path)
print("\nFiles in each folder:")
for cls, path in class_paths.items():
    files = list_files(path)
    print(f"\n{cls}:")
    for file in files[:5]:  # Display only the first 5 files
        print(f"  - {file}")
    if len(files) > 5:
        print(f"  ... and {len(files) - 5} more files")

Paths to dataset folder:
Dataset Path: C:\Users\LENOVO\OneDrive\Desktop\wet_data\TrashType_Image_Dataset

Files in each folder:

cardboard:
  - cardboard_001.jpg
  - cardboard_002.jpg
  - cardboard_003.jpg
  - cardboard_004.jpg
  - cardboard_005.jpg
  ... and 398 more files

glass:
  - glass_001.jpg
  - glass_002.jpg
  - glass_003.jpg
  - glass_004.jpg
  - glass_005.jpg
  ... and 496 more files

metal:
  - metal_001.jpg
  - metal_002.jpg
  - metal_003.jpg
  - metal_004.jpg
  - metal_005.jpg
  ... and 405 more files

paper:
  - paper_001.jpg
  - paper_002.jpg
  - paper_003.jpg
  - paper_004.jpg
  - paper_005.jpg
  ... and 589 more files

plastic:
  - plastic_001.jpg
  - plastic_002.jpg
  - plastic_003.jpg
  - plastic_004.jpg
  - plastic_005.jpg
  ... and 477 more files

trash:
  - trash_001.jpg
  - trash_002.jpg
  - trash_003.jpg
  - trash_004.jpg
  - trash_005.jpg
  ... and 132 more files
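
The data loading step mentions building a structured DataFrame. One way to get there is to collect one record per image, pairing its file path with the class folder it came from. The sketch below uses only the standard library. The helper `collect_image_records` and the `IMAGE_EXTENSIONS` tuple are hypothetical names, and the extension list is an assumption based on the `.jpg` files shown above.

```python
import os

# Assumed set of image extensions; adjust to the files actually present
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def collect_image_records(dataset_path, classes):
    """Return a list of {"filepath", "label"} dicts, one per image file."""
    records = []
    for label in classes:
        class_dir = os.path.join(dataset_path, label)
        # Sort the names so the record order is reproducible across runs
        for name in sorted(os.listdir(class_dir)):
            if name.lower().endswith(IMAGE_EXTENSIONS):
                records.append(
                    {"filepath": os.path.join(class_dir, name), "label": label}
                )
    return records
```

Passing the result to `pd.DataFrame(records)` gives a two-column table of file paths and labels that later steps can split or feed to a data generator.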


Next, we count the images in each class folder to see how the classes are distributed.

In [41]:
# Function to count files in a folder
def count_files(folder_path):
    file_list = os.listdir(folder_path)
    return len(file_list)

# Create a dictionary with class names and counts
classes_dict = {cls: count_files(class_paths[cls]) for cls in classes}

# Create a DataFrame from the dictionary
classes_df = pd.DataFrame.from_dict(classes_dict, orient='index', columns=['Number of Images'])

# Sort the DataFrame by 'Number of Images' in descending order
classes_df_sorted = classes_df.sort_values('Number of Images', ascending=False)

# Display the sorted DataFrame
classes_df_sorted

| Class | Number of Images |
|---|---|
| paper | 594 |
| glass | 501 |
| plastic | 482 |
| metal | 410 |
| cardboard | 403 |
| trash | 137 |
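
The counts show a clear imbalance: trash has 137 images against 594 for paper. The notebook imports `class_weight` from scikit-learn, which suggests the classes may be weighted during training. The sketch below reproduces the "balanced" heuristic, weight = n_samples / (n_classes * class_count), in plain Python using the counts from the table above. Whether the notebook uses exactly this weighting is an assumption, and `balanced_class_weights` is a name introduced here.

```python
# Image counts per class, copied from the table above
class_counts = {
    "paper": 594,
    "glass": 501,
    "plastic": 482,
    "metal": 410,
    "cardboard": 403,
    "trash": 137,
}


def balanced_class_weights(counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


weights = balanced_class_weights(class_counts)

# Print classes from most to least up-weighted
for cls, weight in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{cls:<10} {weight:.3f}")
```

Rare classes receive weights above 1 and common classes weights below 1. Keras `fit` expects the `class_weight` dictionary to be keyed by integer class index, so the class names would need to be mapped to the indices used by the data pipeline.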
