This is part of an Image Processing project, specifically my team's project: Face-Mask Detection.
The overall development of the project is divided into three parts:
- Image Processing
- Feature Selection
- Model Training (Classification model used: RandomForest Classification)
The sole purpose is to help our frontline warriors in any way possible. This model can be used to detect whether a particular person is actually wearing a mask or not. Before starting work on the project, we divided our part into three groups: Data Augmentation, Face Detection, and Face-Mask Detection. The whole workflow can be checked here. For model development, I used a dataset with two main classes: with-mask and without-mask.
About Dataset:
The dataset contains around 7600 images out of which around 3700 are with masks and 3800 are without masks.
The dataset is not perfectly balanced. In our experience this still worked in favour of the results, since it helps the model extract the distinguishing differences between the two classes and train itself accordingly.
You can find the dataset here. I am part of the Face-Mask Detection group. All three of us have implemented the same working pipeline using three different models; I implemented mine with RandomForest.
Why Random Forest?
Random Forest is an ensemble algorithm. The model is built from a forest of decision trees whose predictions are combined, which makes it effective for classification.
In threshold-based feature selection (SelectFromModel), each feature's weight is compared with a threshold value that we provide: features whose weight is below the threshold are pruned, and all others are kept.
First we need to provide the estimator, that is, the classification algorithm on which we are developing the model. In our case we are using RandomForest, so this will be the estimator. The weights of individual features are usually small for image datasets, because the total importance is spread over a large number of pixel features. So we set the threshold to 0.0001.
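As a minimal sketch of this thresholding step, assuming scikit-learn's `SelectFromModel` and a synthetic dataset standing in for the flattened images (the data shape and the 50-tree forest are illustrative assumptions, not the project's actual settings):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Hypothetical stand-in for flattened grayscale images: 200 samples, 400 "pixels".
X, y = make_classification(
    n_samples=200, n_features=400, n_informative=20, random_state=0
)

# The estimator whose feature importances are compared with the threshold.
forest = RandomForestClassifier(n_estimators=50, random_state=0)

# Keep only features whose importance is at least 0.0001.
selector = SelectFromModel(forest, threshold=0.0001)
selector.fit(X, y)

importances = selector.estimator_.feature_importances_  # one weight per feature
mask = selector.get_support()                            # True for kept features
X_selected = selector.transform(X)

print("features before:", X.shape[1], "after:", X_selected.shape[1])
```

Because the importances of a random forest are normalised to sum to 1, each of several hundred or thousand pixel features can only receive a small share, which is why a small threshold such as 0.0001 is a reasonable choice here.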
How this feature weighting actually happens
The RandomForest that we used for training follows the concept of 'Embedded Methods', which combine two ideas:
- Filter: This uses correlation to find the importance of features.
- Wrapper: This uses the concept of usefulness after actual training happens. After training, the model assigns weights (importances) to the features.
Recursive feature elimination (RFE) uses a greedy search to find the best-performing feature subset. The goal is to select features by recursively considering smaller and smaller sets of features. Here, we first need to give the estimator (RandomForest, in our case); the selection is then done by taking either coef_ or feature_importances_ into consideration. We have the flexibility to set the number of features to prune at each step as well as the number of features to keep at the end of the process (if not given, the selection reduces the feature set to half its size).
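The recursive elimination described above can be sketched with scikit-learn's `RFE`. The synthetic data, the 30-tree forest and `step=2` are assumptions chosen only to keep the example small:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Hypothetical small dataset with 20 features.
X, y = make_classification(
    n_samples=150, n_features=20, n_informative=5, random_state=0
)

forest = RandomForestClassifier(n_estimators=30, random_state=0)

# step=2 prunes the two least important features in each round.
# n_features_to_select is left unset, so half of the features are kept.
rfe = RFE(estimator=forest, step=2)
rfe.fit(X, y)

print("kept features:", rfe.n_features_)
print("ranking (1 = kept):", rfe.ranking_)
```

Since RandomForest exposes `feature_importances_` rather than `coef_`, that attribute is what drives the ranking in this sketch.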
With RFECV, we rank features using recursive feature elimination together with cross-validated selection of the best number of features. Compared with RFE, we have cv as an extra parameter: it defaults to 5 folds, and we can also pass our own number of folds (stratified K-fold is used for classification).
Cross-validation estimator: This is an estimator with built-in cross-validation capabilities to automatically select the best hyper-parameters. The advantage of using such an estimator is that it can reuse results pre-computed in previous steps of the cross-validation process, which generally leads to speed improvements.
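A minimal sketch of the cross-validated variant, assuming scikit-learn's `RFECV` with an explicit `StratifiedKFold` splitter. The dataset, forest size and step are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Hypothetical small dataset with 12 features.
X, y = make_classification(
    n_samples=150, n_features=12, n_informative=4, random_state=0
)

forest = RandomForestClassifier(n_estimators=20, random_state=0)

# Five stratified folds: each fold keeps roughly the same class proportions.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# RFECV repeats the elimination on every fold and picks the number of
# features with the best mean cross-validated accuracy.
rfecv = RFECV(estimator=forest, step=2, cv=cv, scoring="accuracy")
rfecv.fit(X, y)

X_selected = rfecv.transform(X)
print("number of features chosen by cross-validation:", rfecv.n_features_)
```

Unlike plain RFE, the final number of features is not fixed in advance; it is whatever count scored best across the folds.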
- For image pre-processing, I first converted each image from RGB to grayscale. The reason is that, for the model I am trying to build, colors do not play a significant role.
- The image size has been reduced to lower the overall computation.
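The two pre-processing steps above could look roughly like this. The sketch assumes the Pillow and NumPy libraries, and the 64x64 target size and the `preprocess` helper are hypothetical; the project's actual library and size are not stated here:

```python
import numpy as np
from PIL import Image


def preprocess(image, size=(64, 64)):
    """Convert an image to grayscale, shrink it, and flatten it to 1-D."""
    gray = image.convert("L")   # RGB -> single-channel grayscale
    small = gray.resize(size)   # fewer pixels means fewer features to train on
    return np.asarray(small, dtype=np.uint8).flatten()


# A random RGB array stands in for a real photo (illustration only).
rng = np.random.default_rng(0)
fake_rgb = rng.integers(0, 256, size=(200, 150, 3), dtype=np.uint8)

features = preprocess(Image.fromarray(fake_rgb))
print(features.shape)
```

Each image then becomes one row of pixel features, which is the form the feature selection and RandomForest steps operate on.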
Feature selection, in literal terms, means selecting only those features that are actually required for model development and discarding all the others. This helps us reduce the computational cost of model training, and sometimes keeping only the features that are actually important leads to even better accuracy.
How is this achieved?
To implement this, I used the feature selection utilities of the scikit-learn Python library (SelectFromModel). I set the threshold to 0.0001, so all features whose weight is below this threshold are discarded and the others are kept.
- After image pre-processing and feature selection, all that is left is the actual model training. For this I used 300 decision trees with criterion: entropy (this criterion chooses splits by information gain).
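A minimal sketch of the training step with the settings mentioned above (300 trees, entropy criterion), assuming scikit-learn's `RandomForestClassifier`. The synthetic data and the 75/25 split are assumptions for illustration, not the project's actual setup:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the selected image features.
X, y = make_classification(
    n_samples=300, n_features=100, n_informative=15, random_state=0
)

# Hold out a quarter of the data, keeping class proportions equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 300 decision trees; each split is chosen by entropy-based information gain.
model = RandomForestClassifier(
    n_estimators=300, criterion="entropy", random_state=0
)
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.4f}")
```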
I have also developed a web app that can be used to run this classification. Visit here to get the code.
I have compared the training accuracy before and after feature selection.
- Without Feature Selection: 85.84%
- With Feature Selection (using SelectFromModel): 86.24%
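A comparison of this kind can be sketched as follows, assuming scikit-learn and synthetic data. The sketch scores both variants on a held-out split, which may differ from how the figures above were measured, and its numbers will not match the reported ones:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical data: 200 features, only some of them informative.
X, y = make_classification(
    n_samples=300, n_features=200, n_informative=15, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def make_forest():
    # Same settings as described in the text: 300 trees, entropy criterion.
    return RandomForestClassifier(
        n_estimators=300, criterion="entropy", random_state=0
    )


# Variant 1: train on all features.
baseline = make_forest().fit(X_train, y_train)

# Variant 2: prune low-importance features first, then train on the rest.
with_selection = make_pipeline(
    SelectFromModel(make_forest(), threshold=0.0001),
    make_forest(),
).fit(X_train, y_train)

score_without = baseline.score(X_test, y_test)
score_with = with_selection.score(X_test, y_test)
n_kept = int(with_selection[0].get_support().sum())

print(f"without feature selection: {score_without:.4f}")
print(f"with feature selection ({n_kept} features kept): {score_with:.4f}")
```

Putting the selector and the classifier in one pipeline ensures the features are chosen from the training split only, so the held-out score is not influenced by the test data.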