
Image-Processing-Project

This repository contains the image-processing part of my team's project: Face-Mask Detection.
The overall development of the project is divided into three parts:

  • Image Processing
  • Feature Selection
  • Model Training (classification model used: Random Forest)

Motivation

The sole purpose is to help our frontline warriors in any way possible. This model can be used to detect whether a particular person is actually wearing a mask or not. Before actually starting work on the project, we divided it into three parts: Data Augmentation, Face Detection, and Face-Mask Detection. The whole working can be checked here.

Dataset Used

For model development, I have used a dataset with two main classes: with-mask and without-mask.
About the dataset:
The dataset contains around 7,600 images, of which around 3,700 are with masks and 3,800 are without masks.
The dataset is not perfectly balanced; this actually works in our favour, since it helps the model extract only the relevant differences between the classes and train itself accordingly.
You can find the dataset here.

Model Used

I am part of the Face-Mask Detection group. All three of us have implemented the same working model via three different classifiers; I have implemented it with Random Forest.
Why Random Forest?
Random Forest is an ensemble algorithm: the model is built as a forest of decision trees, and the final classification is a vote over those trees. This is why the model works effectively for classification.
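
As a minimal sketch of this idea (toy data via make_classification, not the project's training code), a scikit-learn Random Forest is literally a collection of decision trees, and its classification is a vote across them:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Toy data standing in for the flattened image features.
    X, y = make_classification(n_samples=300, n_features=20, random_state=0)

    # A forest of decision trees; each tree is fit on a bootstrap sample.
    forest = RandomForestClassifier(n_estimators=300, criterion="entropy",
                                    random_state=0).fit(X, y)

    # The fitted forest exposes its individual trees; the predicted class
    # is effectively a vote (averaged probabilities) across them.
    print(len(forest.estimators_))  # 300 decision trees
    print(forest.predict(X[:3]))    # ensemble prediction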

Different Feature Selection techniques with their Working

  • sklearn.feature_selection.SelectFromModel

    Here, feature selection is done by comparing each feature's weight with a threshold value that we provide: features whose weights are less than the threshold are pruned, and all others are kept.
    First we need to provide the estimator, i.e. the classification algorithm with which we are actually developing the model; in our case this is Random Forest. The feature weights are usually small for image-type datasets, so we have set the threshold to 0.0001 (see the SelectFromModel sketch after this list).
    How this feature weighting actually happens
    The Random Forest we have taken for training uses the concept of 'embedded methods', which combines two ideas:
    • Filter: uses correlation to find the importance of features.
    • Wrapper: uses the usefulness of features after actual training has happened; after training, the model assigns weights, i.e. importances, to the features.
    So SelectFromModel takes the weights we get from Random Forest, compares them with the threshold value we set, and prunes all features whose weights fall below it.
  • sklearn.feature_selection.RFE

    This feature elimination uses a greedy search to find the best-performing feature subset. The goal is to select features by recursively considering smaller and smaller feature sets. Here, we first need to give the estimator (Random Forest in our case); the selection is then driven by either coef_ or feature_importances_. We have the flexibility to choose how many features to prune at each step, as well as how many features remain at the end of the process (if not given, the selection reduces the set to half its size). See the RFE sketch after this list.
  • sklearn.feature_selection.RFECV

    Here, we rank features based on recursive feature elimination combined with cross-validated selection of the best number of features. RFECV has cv as an extra parameter on top of RFE; it defaults to 5 folds, and we can pass any number of folds or a splitting strategy such as StratifiedKFold. See the RFECV sketch after this list.
    Cross-validation estimator: an estimator with built-in cross-validation capabilities that automatically selects the best hyper-parameters. The advantage of using such an estimator is that results pre-computed in previous steps of the cross-validation process can be reused, which generally leads to speed improvements.
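
Below are minimal sketches of the three techniques, assuming scikit-learn with Random Forest as the estimator and toy data from make_classification; they illustrate the APIs described above rather than reproduce the project's code. First, SelectFromModel with this project's 0.0001 threshold:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectFromModel

    X, y = make_classification(n_samples=200, n_features=50, random_state=0)

    # Fit the estimator so it assigns a weight (importance) to each feature,
    # then keep only the features whose weight exceeds the threshold.
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=0),
        threshold=0.0001,
    )
    X_selected = selector.fit_transform(X, y)
    print(X.shape, "->", X_selected.shape)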
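
Next, a sketch of RFE; step controls how many features are dropped per round, and n_features_to_select (a made-up 10 here) defaults to half of the features when omitted:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import RFE

    X, y = make_classification(n_samples=200, n_features=50, random_state=0)

    # Recursively refit the estimator, dropping `step` features each round
    # (ranked by feature_importances_) until 10 features remain.
    rfe = RFE(
        RandomForestClassifier(n_estimators=100, random_state=0),
        n_features_to_select=10,
        step=5,
    )
    X_selected = rfe.fit_transform(X, y)
    print(X_selected.shape)    # (200, 10)
    print(rfe.ranking_)        # rank 1 marks a selected feature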
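
Finally, a sketch of RFECV, where cross-validation picks the best number of features automatically (cv defaults to 5-fold):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import RFECV

    X, y = make_classification(n_samples=200, n_features=50, random_state=0)

    # Like RFE, but each candidate number of features is scored by
    # cross-validation and the best-scoring count is kept.
    rfecv = RFECV(
        RandomForestClassifier(n_estimators=100, random_state=0),
        step=5,
        cv=5,
    )
    X_selected = rfecv.fit_transform(X, y)
    print("best number of features:", rfecv.n_features_)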

Implementation

I have divided the working model into three parts, as I have mentioned before.
  1. Image Pre-Processing

    • For image pre-processing, I have first converted my images from RGB to gray-scale. The reason is that, for the model I am trying to build, color does not play a significant role.
    • The image size has been reduced to cut down the overall computation.
  2. Feature Selection

    Feature selection in literal terms means selecting only those features that are actually required for overall model development and discarding all the others. This helps us reduce the computational cost of model training; sometimes, because only the genuinely important features remain, it even leads to better accuracy.
    How is this achieved?
    For the implementation, I have used scikit-learn's built-in feature selection with the threshold set to 0.0001. All feature values less than this threshold are discarded, and those greater than the threshold are kept.
  3. Model Training

    After image pre-processing and feature selection, all we are left with is the actual model training. For this I have used 300 decision trees with criterion 'entropy' (since this criterion works well for information gain). A sketch of the full pipeline follows.
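
Below is a minimal end-to-end sketch of the three steps, under stated assumptions: OpenCV for the image handling, a made-up 64x64 target size, and random stand-in images instead of the real dataset. The gray-scale conversion, the 0.0001 SelectFromModel threshold, and the 300-tree entropy forest follow the description above; this is an illustration, not the repository's actual code.

    import cv2
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectFromModel
    from sklearn.model_selection import train_test_split

    IMG_SIZE = 64  # hypothetical target size, chosen only to cut computation

    def preprocess(img):
        # Step 1, image pre-processing: RGB -> gray-scale (color carries
        # little signal here), downscale, then flatten to a feature vector.
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        small = cv2.resize(gray, (IMG_SIZE, IMG_SIZE))
        return small.flatten()

    # Stand-in data: random color images in place of the real ~7,600-image
    # mask dataset, so the sketch runs on its own.
    rng = np.random.default_rng(0)
    images = rng.integers(0, 256, size=(200, 128, 128, 3), dtype=np.uint8)
    y = rng.integers(0, 2, size=200)  # 1 = with mask, 0 = without mask

    X = np.array([preprocess(img) for img in images])
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Step 2, feature selection: keep features whose forest-assigned
    # importance exceeds the 0.0001 threshold used in this project.
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=0),
        threshold=0.0001,
    )
    X_train_sel = selector.fit_transform(X_train, y_train)
    X_test_sel = selector.transform(X_test)

    # Step 3, model training: 300 decision trees, entropy criterion.
    model = RandomForestClassifier(n_estimators=300, criterion="entropy",
                                   random_state=0)
    model.fit(X_train_sel, y_train)
    print("test accuracy:", model.score(X_test_sel, y_test))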

Further Improvement

Going further, I have developed a web app which one can use to perform the same classification. Visit here to get the code.

Results

I have compared the training accuracy before and after feature selection.
  • Without Feature Selection: 85.84%
  • With Feature Selection (using SelectFromModel): 86.24%
