Repository created for CS-57300 Data Mining Course at Purdue University.
In this project we compare how deep learning models perform when labelled data is scarce.
We use the Pins Face Recognition dataset, which contains images of 105 celebrity faces. To simulate the low-data problem, we subsample the dataset, keeping only 32 images per class for training.
We compare the following experiments and models on the subsampled dataset:
- Training from Scratch
  - Shallow CNN - ConvNet
  - ResNet-50
  - ResNet-18
- Transfer Learning
  - ResNet-50
- Knowledge Distillation - Feature
  - Train ResNet-18 using pre-trained ResNet-50 as teacher using MSE
  - Train ConvNet using pre-trained ResNet-50 as teacher using MSE
  - Train ResNet-18 using pre-trained ResNet-50 as teacher using BCE
  - Train ConvNet using pre-trained ResNet-50 as teacher using BCE
- One-shot-Learning
The purpose of knowledge distillation is to train a shallow CNN to identify the facial features learned by deep models trained on large datasets.
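As a rough sketch, the MSE variant of feature-level distillation can be written as below. The symbols are assumptions made for illustration and are not taken from the code: $f_T$ is the frozen pre-trained ResNet-50 teacher, $f_S$ is the student (ResNet-18 or ConvNet) with trainable parameters $\theta_S$, $x_i$ is one of $N$ training images, and both networks are assumed to produce feature vectors of the same dimension $d$.

```latex
\mathcal{L}_{\mathrm{MSE}}(\theta_S)
  = \frac{1}{N d} \sum_{i=1}^{N}
    \left\lVert f_S(x_i; \theta_S) - f_T(x_i) \right\rVert_2^2
```

Only $\theta_S$ is updated; the teacher stays fixed. The BCE experiments replace the squared error with binary cross-entropy, which presumably requires both feature vectors to lie in $[0, 1]$ (for example after a sigmoid); the exact preprocessing used in the experiments is not described here.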
Creating Virtual Environment:
- Using Conda:
    conda create --name dm_project python=3.9
    conda activate dm_project
- Using Virtualenv:
    virtualenv venv_dm_project
    source venv_dm_project/bin/activate
Installing Python Dependencies:
    pip install -r requirements.txt
- The code required to get and set up the dataset is provided in the dataset/ directory.
- Run dataset.sh.
- The script downloads the dataset from Kaggle, unzips it and subsamples it as required by the experiments.
- Users can change the subsampling number in the shell script.
- The results in the report were based on experiments with 32 training images per class.
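The subsampling step can be illustrated with a minimal Python sketch. This is not the code in dataset.sh; it assumes a layout with one folder per class, and the function name, arguments and seed are hypothetical.

```python
import random
import shutil
from pathlib import Path


def subsample_per_class(src_root, dst_root, per_class=32, seed=0):
    """Copy at most `per_class` randomly chosen files from each class folder.

    Assumes `src_root` contains one sub-directory per class (hypothetical layout).
    Returns a dict mapping class name to the number of files copied.
    """
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    src_root, dst_root = Path(src_root), Path(dst_root)
    counts = {}
    for class_dir in sorted(p for p in src_root.iterdir() if p.is_dir()):
        images = sorted(f for f in class_dir.iterdir() if f.is_file())
        # Classes with fewer than `per_class` images are copied in full.
        chosen = rng.sample(images, min(per_class, len(images)))
        out_dir = dst_root / class_dir.name
        out_dir.mkdir(parents=True, exist_ok=True)
        for image in chosen:
            shutil.copy2(image, out_dir / image.name)
        counts[class_dir.name] = len(chosen)
    return counts
```

Calling `subsample_per_class("data/full", "data/train_32", per_class=32)` with placeholder paths would mirror the 32-images-per-class setting used for the reported results.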
Model | Download link |
---|---|
resnet50_ft | link |
resnet50_scratch | link |
Download the models into src/saved_models as resnet50_ft_weight.pkl and resnet50_scratch_weights.pkl.
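Before running the experiments, it may help to confirm that both weight files are in place. The helper below is a hypothetical convenience sketch and is not part of the repository; it only uses the directory and file names listed above.

```python
from pathlib import Path

# File names as listed in this README; the directory is relative to the repository root.
EXPECTED_WEIGHTS = ("resnet50_ft_weight.pkl", "resnet50_scratch_weights.pkl")


def missing_weights(model_dir="src/saved_models", expected=EXPECTED_WEIGHTS):
    """Return the expected weight files that are not present in `model_dir`."""
    model_dir = Path(model_dir)
    return [name for name in expected if not (model_dir / name).is_file()]
```

An empty list from `missing_weights()` means both files were found.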
- The code for generating faces is in src/utils/generate_dataset/.
- Run generate.sh on an NVIDIA GPU with CUDA to generate 102400 images of faces.
- The shell script clones the StyleGAN2 repository provided by NVIDIA and runs its generation script to generate the faces.
- Users can generate more or fewer faces as required.
- The results in the report were based on experiments with 102400 faces.
Run the Black code formatter before committing code to the git repository.
Run the following command in the root folder of the repository:
    black .
Using the same code formatter helps prevent errors such as tab-space conversions and keeps the code uniform and readable throughout.
- Raj Jagtap
- Pranav Patil
- Mansi Shinde
- Rucha Deshpande
- This project was completed under the guidance of Dr. Rajiv Khanna (Purdue University).
- The ResNet-50 model code in src/models/resnet.py is taken from VGGFace2-PyTorch.
- The pre-trained ResNet-50 model is taken from the VGGFace2-PyTorch repository.