
Deep Learning Real Life Item Classification

This project solves a real-life item classification problem using deep learning technology.



About The Project

Deep learning is widely used in real-life classification tasks across various industries. Its uses include product categorization, recommendation systems, and visual search on online retail platforms, as well as object detection, tracking, and classification in security and surveillance systems.

In this project we will build a deep learning application to address the problem of non-recyclable waste being mixed with recyclable waste. Many people find it difficult to distinguish recyclable from non-recyclable waste, which results in non-recyclable items ending up in recycling bins. For example, soft plastics such as plastic bags and polystyrene takeaway boxes are not recyclable, yet they often appear in the hard-plastic recycling bin.

To address this, we will implement an image classification application using deep learning technology. The application will use a neural network model to analyze images of waste items and classify them as recyclable or non-recyclable.

Built With

  • Python 3.6
  • TensorFlow 2.5
  • Keras 2.12
  • Pandas
  • IPython
  • scikit-learn

(back to top)

Project Plan

1. Data Collection

Gather a dataset of images representing different types of recyclable items, such as plastic bottles, aluminum cans, paper products, and glass containers.

2. Data Preprocessing

Resize images to a uniform size and convert them to a standard format. Augment the dataset with techniques such as rotation and flipping to increase the diversity of training samples.
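As a rough illustration, the resizing step could be done with Pillow. The folder names below are hypothetical, not the repository's actual paths.

```python
# Minimal sketch: resize every image in a folder to 224x224 RGB JPEGs.
from pathlib import Path
from PIL import Image

SRC_DIR = Path("raw_images")      # assumed location of downloaded images
DST_DIR = Path("resized_images")  # assumed output location
DST_DIR.mkdir(exist_ok=True)

for img_path in SRC_DIR.glob("*"):
    try:
        with Image.open(img_path) as img:
            img = img.convert("RGB")      # standardize colour mode
            img = img.resize((224, 224))  # uniform size expected by VGG19
            img.save(DST_DIR / (img_path.stem + ".jpg"), "JPEG")
    except OSError:
        print(f"Skipping unreadable file: {img_path}")
```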

3. Labelling

Annotate the images with corresponding labels indicating the type of recyclable item depicted in each image.

4. Model Development

A convolutional neural network (CNN) was chosen as the deep learning architecture, built on top of the pre-trained VGG19 model.
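As a rough illustration of this design, the model can be assembled from a frozen VGG19 base with a small dense head. The head sizes and optimizer below are assumptions, not necessarily the exact configuration used in the script.

```python
# Sketch of a VGG19-based binary classifier (layer sizes are assumptions).
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19

base_model = VGG19(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False  # keep the pre-trained convolutional features frozen

model = models.Sequential([
    base_model,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # recyclable vs. non-recyclable
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```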

5. Model Training

The annotated dataset will be split into training, validation, and test sets. The deep learning model will be trained on the training set using an optimization algorithm.

6. Evaluation

The trained model's performance will be evaluated on the test set using its accuracy, the metrics recorded by the training callbacks, and eyeball checks of sample predictions.

(back to top)

Getting Started

  1. Set up a GPU machine locally or on a cloud platform
  2. Install IPython
  3. Clone the git repository
  4. Execute the IPython script file in the "code" folder

Prerequisites

  • Basic understanding of Python
  • Basic knowledge of running Python scripts in an IPython environment
  • Basic knowledge of neural network structure

Data Source

2,000 image files downloaded from the Internet, of which 1,000 are for training and 1,000 for testing. Among the training images, 500 show recyclable waste and 500 show non-recyclable waste. The test set is split roughly evenly, with about 500 images of each class.

The downloaded images are labelled as recyclable or non-recyclable waste and placed into two separate folders.

(back to top)

IPython Script Work Flow Description

The script executes the steps below sequentially.

  1. Import libraries
  2. Unzip the dataset file
  3. Apply a rescale parameter in ImageDataGenerator to rescale the pixel values of the images (see the sketch after this list)
  • Rescaling normalizes the pixel values to a specific range
  • With rescale = 1.0/255, the pixel values are divided by 255, resulting in normalized values between 0 and 1
  • validation_split = 0.2 means 80% of the images are used for training and 20% for validation
  4. Define a callback that monitors validation accuracy and saves the model's weights during training
  5. Create the model using VGG19 as the base model
  6. Fit the model
  7. Check model accuracy
  8. Check model loss
  9. Launch TensorBoard for a detailed review of the model
  10. Evaluate overall loss and accuracy on the test data
  11. Calculate the confusion matrix and plot it as a heatmap
  12. Apply data augmentation to the training dataset
  13. Create a new model
  14. Train the new model with data augmentation
  15. Compare the performance of the new model and the original model
  16. Re-plot the heatmap of the confusion matrix
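Steps 3 to 6 could look roughly like the sketch below. The directory names, batch size, and epoch count are assumptions rather than the notebook's exact values.

```python
# Sketch of steps 3-6: rescaling, validation split, checkpoint callback, training.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Step 3: rescale pixel values to [0, 1] and reserve 20% of images for validation.
datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)

train_gen = datagen.flow_from_directory(
    "dataset",                 # assumed folder with one sub-folder per class
    target_size=(224, 224),
    batch_size=32,
    class_mode="binary",
    subset="training",
)
val_gen = datagen.flow_from_directory(
    "dataset",
    target_size=(224, 224),
    batch_size=32,
    class_mode="binary",
    subset="validation",
)

# Step 4: callback that monitors validation accuracy and saves the best weights.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_weights.h5",
    monitor="val_accuracy",
    save_best_only=True,
    save_weights_only=True,
)

# Step 5: VGG19-based model, as sketched in the Model Development section.
base = tf.keras.applications.VGG19(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False
model = models.Sequential([base,
                           layers.Flatten(),
                           layers.Dense(256, activation="relu"),
                           layers.Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Step 6: fit the model with the checkpoint callback.
history = model.fit(train_gen,
                    validation_data=val_gen,
                    epochs=20,
                    callbacks=[checkpoint])
```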

(back to top)

Key Concepts

Convolutional neural network (CNN)

A convolutional neural network (CNN) is a type of deep learning algorithm that is specifically designed to analyze and process structured grid data, such as images and video. CNNs have proven to be highly effective for tasks such as image classification, object detection, facial recognition, and more, and have become the cornerstone of modern computer vision systems.

Key components of a typical CNN architecture include (a minimal sketch follows the list):

  1. Convolutional Layers - apply a set of learnable filters to the input image to extract features such as edges, textures, and patterns.
  2. Activation Function - the most commonly used activation function in CNNs is the Rectified Linear Unit (ReLU).
  3. Pooling Layers - max pooling is a commonly used pooling technique, which selects the maximum value from each patch of the feature map.
  4. Flattening - converts the spatial information in the feature maps into a format that can be processed by the dense layers.
  5. Fully Connected Layers - dense layers.
  6. Softmax Layer - enables the network to output probabilities indicating the likelihood of each class.
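A minimal Keras model wiring these components together might look like the following; the filter counts and input size are illustrative only.

```python
# Minimal CNN illustrating the components listed above.
import tensorflow as tf
from tensorflow.keras import layers, models

cnn = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu",
                  input_shape=(224, 224, 3)),   # 1 + 2: convolution with ReLU activation
    layers.MaxPooling2D((2, 2)),                # 3: max pooling
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                           # 4: flattening
    layers.Dense(128, activation="relu"),       # 5: fully connected (dense) layer
    layers.Dense(2, activation="softmax"),      # 6: softmax over the classes
])
cnn.summary()
```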

Transfer learning

It is a machine learning technique where a model trained on one task is reused or adapted as the starting point for a model on a second related task. Instead of starting the learning process from scratch, transfer learning leverages knowledge gained from solving one problem and applies it to a different, but related, problem domain.

VGG model

The VGG model is a convolutional neural network (CNN) architecture proposed by researchers at the Visual Geometry Group (VGG) at the University of Oxford. The key characteristic of the VGG model is its simplicity and uniformity in architecture. It consists of multiple convolutional layers followed by max-pooling layers, with smaller 3x3 convolutional filters used throughout the network. The authors experimented with different depths of the network, ranging from VGG-11 (11 layers) to VGG-19 (19 layers), demonstrating that increasing the depth of the network leads to better performance on image classification tasks.

Data augmentation

It is a technique used to artificially increase the diversity and size of a dataset by applying various transformations to the existing data samples. These transformations modify the original data samples in ways that preserve their semantic content but introduce variations in factors such as appearance, orientation, scale, brightness, and contrast. Data augmentation is commonly used in machine learning and deep learning to improve model generalization, reduce overfitting, and enhance the robustness of models to variations in the input data.

(back to top)

Design Explanation

Determine the quality of the model

The quality of the model is determined by the following measurements:

  1. The validation accuracy reported during model fitting reaches 80% or higher.
  2. The classification accuracy when using the model on the test dataset reaches 80% or higher.
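One quick way to check the second criterion is to evaluate the trained model on the test folder, roughly as sketched below. The `model` variable is assumed to be the trained classifier from the earlier sketches, and the "test_dataset" folder name is hypothetical.

```python
# Sketch: evaluate the trained model against the 80% test-accuracy target.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

test_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "test_dataset", target_size=(224, 224), batch_size=32,
    class_mode="binary", shuffle=False)

test_loss, test_acc = model.evaluate(test_gen)
print(f"Test accuracy: {test_acc:.2%} -> "
      f"{'meets' if test_acc >= 0.80 else 'below'} the 80% target")
```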

Model decision

Among the popular CNN models (AlexNet, Inception, ResNet, and VGG), VGG was chosen for its simplicity and high performance. Testing showed that VGG19 performs better than VGG16 in this application, so VGG19 was chosen as the final base model.

Data augmentation

After testing, the following augmentations are applied randomly to 20% of the image files (see the sketch after this list):

  • Zoom 40%
  • Image rotation
  • Flip horizontal
  • Flip Vertical
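These options map onto ImageDataGenerator arguments roughly as sketched below. The rotation range is an assumed value, and the generator applies the transforms randomly within the given ranges rather than to an exact 20% subset.

```python
# Sketch: augmented training generator mirroring the augmentations listed above.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

aug_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    validation_split=0.2,
    zoom_range=0.4,          # zoom 40%
    rotation_range=30,       # image rotation (degrees, assumed value)
    horizontal_flip=True,    # flip horizontal
    vertical_flip=True,      # flip vertical
)

aug_train_gen = aug_datagen.flow_from_directory(
    "dataset", target_size=(224, 224), batch_size=32,
    class_mode="binary", subset="training")
```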

After performing data augmentation, the evaluation accuracy on the test data increased by 0.43% and the evaluation loss decreased by 32.16%. Although the model's prediction accuracy increased, the gain is not significant. This may be because the model already achieved a good evaluation accuracy (0.803) before data augmentation, so there was little room for improvement.

The improvement in evaluation loss, however, is large. Looking at the line chart of model loss against epoch, training without data augmentation shows larger fluctuations in loss and much more serious overfitting than training with data augmentation.

The execution time increased from 1010 seconds to 1925 seconds, an increase of 90.59%. In other words, data augmentation consumes a considerable amount of compute resources.

Batch normalization

Two batch normalization layers are added to the model.

After applying batch normalization, the new model achieved higher validation accuracy during training than the original model. In the confusion matrix, the new model predicted non-recyclable items correctly 58.95% of the time, compared to 51.04% for the original model. However, the new model's percentage of correctly predicted recyclable items is lower than the original model's.
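One possible placement of the two layers is in the dense head, as sketched below; the exact positions and layer sizes used in the notebook may differ.

```python
# Sketch: VGG19-based classifier with two BatchNormalization layers in the head.
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.VGG19(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False

model_bn = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.BatchNormalization(),          # first batch normalization layer
    layers.Dense(128, activation="relu"),
    layers.BatchNormalization(),          # second batch normalization layer
    layers.Dense(1, activation="sigmoid"),
])
model_bn.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```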

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Contact

Oscar Lee
Email: mail.oscar.lee@gmail.com
LinkedIn: https://www.linkedin.com/in/oscarlee1

(back to top)
