Gender Prediction using CNN

An image classifier machine learning project for gender prediction utilizing convolutional neural networks (CNN).

Project Dataset
Business Requirements
Hypothesis and Validation
The rationale to map the business requirements to the Data Visualizations and ML tasks
- Epics
- User Stories
ML Business Case
Dashboard Design (Streamlit App User Interface)
Deployment
Main Data Analysis & Machine Learning Libraries
Run locally
Credits

Project Dataset

The dataset was provided by Ashish Jangra under the CC BY-NC-SA 4.0 license.

Link to dataset: Gender Classification 203K Images | CelebA
License details: CC BY-NC-SA 4.0 License Deed
Original source of dataset: Large-scale CelebFaces Attributes (CelebA) Dataset

The dataset used for this project is sourced from Kaggle and consists of 200k+ images of people's faces.

The actual data is not included in this repo. If you would like to inspect the dataset, follow the steps in Run locally to download it to your local machine.

Business requirements

The (fictional) business requirement story is as follows:

"A social media platform allows users to register their accounts with the profile information for the 'gender' field set to either 'Male,' 'Female,' or 'Other'.

Recently, the company has observed an increase in users registering or changing their profile information with the 'gender' field set to 'Other,' accompanied by a directly correlated decrease in advertisers on the site.

The marketing team has proposed a possible explanation for this phenomenon: The platform's algorithm for personalizing advertisements becomes less efficient when users have their gender set to 'Other,' resulting in lower conversion rates for advertisements on the site. This has led advertisers to opt for other platforms to meet their marketing needs.

The IT team has suggested a machine learning solution to provide the algorithm with a 'suggested' gender, aiming to enhance its efficiency while still respecting users' privacy and preferences."

In short, the business requirements are the following:

1. The client is interested in a machine learning solution to predict the gender of a person based on a picture of their face
2. The client is interested in a study of the data presented visually in order to understand the data better
3. The client is interested in an API in order to integrate the solution into their own applications

Hypothesis and validation

Hypothesis: Men and women have gender-specific facial features that differentiate them.

How to validate: Conducting an average image study of the male and female face.
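
An average image study can be sketched roughly as follows. This is a minimal illustration with NumPy, not the project's notebook code: the tiny synthetic arrays stand in for real face images, and the helper name `average_image` is hypothetical.

```python
import numpy as np


def average_image(images):
    """Return the pixel-wise mean of a collection of equally sized images."""
    stack = np.asarray(images, dtype=np.float64)
    return stack.mean(axis=0)


# Synthetic 2x2 grayscale stand-ins for real face images (values in [0, 1]).
female_images = [np.full((2, 2), 0.2), np.full((2, 2), 0.4)]
male_images = [np.full((2, 2), 0.6), np.full((2, 2), 0.8)]

avg_female = average_image(female_images)
avg_male = average_image(male_images)

# Large absolute differences mark regions where the two average faces differ.
difference = np.abs(avg_male - avg_female)
```

With real data, the two averages and their difference would be plotted side by side to look for visible patterns.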

The rationale to map the business requirements to the Data Visualisations and ML tasks

Epics:

Data collection and preparation
Data visualization
Model training, optimization and evaluation
Dashboard planning, designing, and development and deployment
API Development and deployment

User stories:

Data Collection and Preparation

User Story: As a developer, I can source and acquire the data to create a reliable and well-prepared dataset for the project.
- Task: Download the dataset and extract the relevant data, save it in a new relevant folder structure.

Data Visualization

User Story: As a developer, I can generate informative visualizations to understand the data, providing valuable insights.
- Task: Choose appropriate visualization techniques, generate visualizations and save them.
User Story: As a developer, I can integrate data visualizations into the dashboard for user-friendly data exploration.
- Task: Design the layout and implement interactive features.

Model Training, Optimization, and Evaluation

User Story: As a developer, I can find the optimal hyperparameters for my model within a set range of parameters.
- Task: Find the optimal parameters using techniques such as Grid Search or Randomized Search.
User Story: As a developer, I can train and fine-tune the machine learning model based on the optimal hyperparameters found.
- Task: Define model architecture and implement a function to build the model based on the found optimal hyperparameters.
User Story: As a developer, I can evaluate my model's performance using a variety of metrics.
- Task: Perform model evaluation using a Machine Learning library, visually represent the results and save the visualizations.
User Story: As a user, I can access model evaluation results, helping me understand the model's performance.
- Task: Provide a user interface for accessing model evaluation reports.

Dashboard Planning, Designing, and Development and Deployment

User Story: As a developer, I can implement Streamlit features, making it interactive and user-friendly.
- Task: Develop and integrate interactive Streamlit features and functionalities into the dashboard.
User Story: As a developer, I can deploy the Streamlit dashboard, ensuring it is accessible to users.
- Task: Deploy the Streamlit app to Heroku and ensure the dashboard is accessible online.
User Story: As a user, I can access and interact with the deployed Streamlit app, enabling me to navigate through the project, explore data visualizations, and make live predictions on the model.
- Task: Provide navigation options, interactive data exploration features and a page for making live predictions with a way to download sample images for making predictions.

API Development and Deployment

User Story: As a user, I can access the API in order to integrate the machine learning solution into my applications.
- Task: Develop an API and provide an endpoint for users to interact with the model.
User Story: As a user, I can access information for usage of the API in order to learn how to use it.
- Task: Provide usage instructions along with example code on how to use the API inside the dashboard.

ML Business Case

We want an ML model that predicts which gender a face image belongs to, based on the image dataset provided. The target variable is categorical and contains two classes, so we consider a classification model: a supervised, two-class, single-label model whose output is 0 (female) or 1 (male).
Our aim is an accuracy of at least 75%.
Our ideal outcome is to provide the company with a dependable solution to provide their algorithm with reliable data.
An API will be required for the company to integrate the solution into their platform in an automated way. The images will be gathered from profile pictures and posts made by users.
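
The output encoding and the accuracy target above can be illustrated with a short sketch. It assumes the model emits a single probability for class 1 (male) and that 0.5 is the decision threshold; both are assumptions, as are the made-up probabilities and the helper names.

```python
LABELS = {0: "female", 1: "male"}  # encoding stated in the business case
TARGET_ACCURACY = 0.75             # minimum accuracy stated in the business case


def to_label(probability, threshold=0.5):
    """Map a predicted probability of class 1 to a class index (assumed threshold)."""
    return 1 if probability >= threshold else 0


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up ground truth and model outputs, for illustration only.
y_true = [0, 1, 1, 0, 1]
probabilities = [0.1, 0.9, 0.4, 0.3, 0.8]

y_pred = [to_label(p) for p in probabilities]
score = accuracy(y_true, y_pred)
meets_target = score >= TARGET_ACCURACY
```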

Dashboard design (Streamlit App User Interface)

Page 1: Home

Title
Link to GitHub profile
Link to GitHub repo

Page 2: Project summary

Project summary
- General info
- Project dataset
- Business requirements

Page 3: Data visualization

Should answer business requirement 2: "The client is interested in a study of the data presented visually in order to understand the data better"

Show sample from available dataset
Show label distribution
Show average images and difference between average images

Page 4: Predict gender

Should answer business requirement 1: "The client is interested in a machine learning solution to predict the gender of a person based on a picture of their face"

Provide a download link for images of faces for live prediction
Provide a file uploader and predict button to interact with the model
Provide a download button for a report of the predictions

Page 5: Hypothesis and validation

State hypothesis
State how to validate
State how the hypothesis was validated

Page 6: ML Metrics

Provide metrics from model training presented visually
Provide metrics from model evaluation presented visually
Provide True Positives (TP), True Negatives (TN), False Positives (FP) and False Negatives (FN) from model evaluation presented visually
Provide scikit-learn's classification report from model evaluation
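
As a rough illustration of what the four confusion-matrix counts on this page mean, the sketch below tallies them in plain Python. It is not the project's evaluation code (which uses scikit-learn); treating class 1 (male) as the positive class is an assumption, and the labels are made up.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary classification problem."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # predicted positive, actually positive
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Made-up labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
precision = counts["TP"] / (counts["TP"] + counts["FP"])
recall = counts["TP"] / (counts["TP"] + counts["FN"])
```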

Page 7: API

Should answer business requirement 3: "The client is interested in an API in order to integrate the solution into their own applications"

Provide the API endpoint
Provide instructions on how to use the API along with example code

API

To meet the business requirement "The client is interested in an API in order to integrate the solution into their own applications", and also to work around the slug size limit set by Heroku, I developed a simple Flask application that interacts with the model via POST requests. This app is hosted in a separate Heroku app instance.

More information about the API can be found in this GitHub repository
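
A client call to such an API might look roughly like the sketch below. The URL is a placeholder, and the request format (raw image bytes in the POST body) is an assumption made for illustration; the actual endpoint and payload format are documented in the API repository.

```python
import urllib.request

# Hypothetical placeholder; replace with the real endpoint from the API docs.
API_URL = "https://example.invalid/predict"


def build_prediction_request(image_bytes, url=API_URL):
    """Build (but do not send) a POST request carrying the image bytes."""
    return urllib.request.Request(
        url,
        data=image_bytes,
        headers={"Content-Type": "application/octet-stream"},  # assumed format
        method="POST",
    )


request = build_prediction_request(b"\xff\xd8 placeholder image bytes")

# Sending the request would then be:
#     with urllib.request.urlopen(request) as response:
#         body = response.read()
```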

Deployment

This project was deployed to Heroku using the following steps:

Log in to Heroku and create an App
At the Deploy tab, select GitHub as the deployment method.
Select your repository name and click Search. Once it is found, click Connect.
Select the branch you want to deploy, then click Deploy Branch.
The deployment process should run smoothly if all deployment files are fully functional. Click the Open App button at the top of the page to access your App.
If the slug size is too large, remove dependencies that are not needed for deployment from 'requirements.txt' and list unnecessary files in a '.slugignore' file in the root directory.

Main Data Analysis & Machine Learning libraries

Main Data analysis libraries used:

Numpy
- For performing calculations on large amounts of data efficiently, mainly pixel data in this case
- Normalizing pixel data
- Calculating means and standard deviations
- Base for other data analysis and ML libraries
Pandas
- Mainly for pandas DataFrames, for easy management of large data (sampling, shuffling, concatenation etc.)
Matplotlib & Seaborn
- For plotting and visualization of data
- Showing images from pixel data
- Metric plots & Histograms

Main Machine Learning libraries used:

TensorFlow & Keras
- Image augmentation
- Model loading
- Defining model architecture
- Training model
- One-hot encoding
- tensorflow.data.Dataset API
scikit-learn
- Hyperparameter optimization using GridSearchCV
- Generating confusion matrices & classification reports
OpenCV
- Reading image pixel data as NumPy arrays
- Resizing images
Scikeras
- For performing GridSearchCV on Keras models, which are not compatible with scikit-learn by default. (KerasClassifier)
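
To show the idea behind an exhaustive grid search, here is a plain-Python sketch. It is a stand-in for illustration, not scikit-learn's GridSearchCV: there is no cross-validation, and the parameter grid and `toy_score` function are hypothetical. In the project, the score would come from training and validating the Keras model.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every parameter combination and return the best one with its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space, for illustration only.
param_grid = {"learning_rate": [0.1, 0.01, 0.001], "dropout": [0.2, 0.5]}


def toy_score(learning_rate, dropout):
    """Stand-in for a validation score; highest at learning_rate=0.01, dropout=0.5."""
    return 1.0 - abs(learning_rate - 0.01) - abs(dropout - 0.5)


best_params, best_score = grid_search(param_grid, toy_score)
```

The number of combinations grows multiplicatively with each added parameter, which is why the search range is kept small.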

Run locally

This repo covers the entire process of creating an ML model, from collecting and processing the data to hyperparameter optimization, data augmentation, and defining and training the model.

To use this repo, follow these steps:

Fork or clone this repository
Install dependencies by running:
```
pip install -r "requirements-dev.txt"
```
Register an account with Kaggle, create a new API token, download the kaggle.json file, and place it in the project's root directory.
Run the notebooks in the jupyter_notebooks folder in the specified order.
- DataCollection.ipynb: Downloads the dataset and extracts specified number of images.
- DataVisualization.ipynb: Conducts studies on the data and saves insightful plots.
- Model.ipynb: Prepares data, performs data augmentation and hyperparameter optimization, defines model architecture, trains, evaluates and saves a ML model.
Start the web app by running:
```
streamlit run Home.py
```
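
Before running the notebooks, it can help to confirm that the Kaggle token from the steps above is in place. The helper below is a hypothetical convenience check, not part of the repo; it relies only on the fact that a Kaggle API token file is JSON with `username` and `key` fields.

```python
import json
from pathlib import Path


def check_kaggle_token(path="kaggle.json"):
    """Return True if the file exists and holds non-empty 'username' and 'key'."""
    token_file = Path(path)
    if not token_file.is_file():
        return False
    try:
        token = json.loads(token_file.read_text())
    except json.JSONDecodeError:
        return False
    return isinstance(token, dict) and all(
        token.get(field) for field in ("username", "key")
    )


# Run from the project's root directory.
token_ok = check_kaggle_token()
```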

If you encounter an error while importing opencv-python (cv2), run the following commands:

sudo apt-get update
sudo apt-get install -y libgl1-mesa-dev

Credits

Churnometer repo by Code Institute: For the Readme template/structure.

Streamlit documentation: For getting the web app up and running.
