CS231N Final Project

We are using the dataset from the following Kaggle competition: iMaterialist Challenge (Fashion) at FGVC5.

Setup

Prerequisites

  • Python 3: this project uses Python 3.
  • Pipenv: Python package manager and virtual environment tool. Install it with pip install pipenv.

Initial Setup

The first time you set up the project, run the following commands:

git clone git@github.com:minfawang/cs231n-fashion.git  # Clones repo.
cd cs231n-fashion  # Changes your directory to the root of the repo.
# If you use a conda custom Python binary, then you may use the
# command in the comment below:
# pipenv --python /usr/local/bin/python3 install
pipenv --three install  # Creates a virtual env using Python 3.

# Enter virtual env.
pipenv shell

# Set up custom python kernel with correct binary and dependency.
# https://stackoverflow.com/a/47296960
python -m ipykernel install --user --name=cs231n-fashion

For running the cs231n pre-defined image on a VM instance on Google Cloud, you also need to run this command, per the instructions from the course page:

/home/shared/setup.sh && source ~/.bashrc

Download data

First, download the JSON files from the Kaggle data page into the data/ directory and unzip all of them. Then download the images using the script below:

# Change the max_download parameter in the file before running.
python utils/downloader.py
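
For reference, here is a minimal sketch of what the download step does, assuming the competition JSON lists images with an imageId and a url field (the actual logic, including the max_download parameter, lives in utils/downloader.py):

# Illustrative sketch only; the real logic lives in utils/downloader.py.
# The JSON layout ({"images": [{"imageId": ..., "url": ...}]}) and the output
# file naming are assumptions.
import json
import os
import urllib.request

def download_images(json_path, out_dir, max_download=100):
    os.makedirs(out_dir, exist_ok=True)
    with open(json_path) as f:
        images = json.load(f)["images"]
    for entry in images[:max_download]:
        out_path = os.path.join(out_dir, "{}.jpg".format(entry["imageId"]))
        if os.path.exists(out_path):
            continue  # skip images that were already downloaded
        try:
            urllib.request.urlretrieve(entry["url"], out_path)
        except Exception as e:
            print("Failed to fetch image {}: {}".format(entry["imageId"], e))

if __name__ == "__main__":
    download_images("data/train.json", "data/train")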

Each run

Every time you need to update the project or run the scripts:

pipenv shell  # Enter the virtual env.
# Make updates.
exit

Useful commands

Training
# run training.
python code/keras_model_runner.py --mode=train --fine_tune --reg=0.00001 --steps_per_epoch=2000 --batch_size=64 --initial_epoch=0 --model_dir=model_dir/keras_xception/

Additional flags:

  • --generator_use_weight=1: Assign per-class weights at training time (see the sketch after this list).
  • --generator_use_wad=1: Generate wide-and-deep features.
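
The per-class weighting follows the usual inverse-frequency idea: rare classes receive larger weights. Below is a rough sketch, not the project's exact implementation; the label-matrix shape and the normalization are assumptions.

# Illustrative only: inverse-frequency per-class weights for multi-label
# training, the idea behind --generator_use_weight.
import numpy as np

def class_weights(labels):
    """labels: (num_examples, num_classes) binary indicator matrix."""
    counts = np.maximum(labels.sum(axis=0), 1)               # positives per class, avoid division by zero
    weights = labels.shape[0] / (labels.shape[1] * counts)   # rarer classes get larger weights
    return weights / weights.mean()                          # normalize so the average weight is 1.0
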
Testing
# Run the test and generate the submission file. If pred_threshold is set to a filename, per-class thresholds are used.
python code/model_runner.py --mode=test --model_dir=/home/shared/cs231n-fashion/model_dir/baseline2/ --pred_threshold=0.8
Eval
# Run eval. Quote the threshold list so the shell does not split on ';'.
python code/model_runner.py --mode=eval --model_dir=/home/shared/cs231n-fashion/model_dir/baseline2/ --eval_thresholds="0.3;0.5;0.7;0.75;0.8;0.85;0.9"
Print debug dump
# Print the debug dump. Check the Threshold Selection part in binbin_playground for reference.
# By default this dumps the output for the validation set. You can change this behavior in model_runner.py.
python code/model_runner.py --mode=debug --model_dir=/home/shared/cs231n-fashion/model_dir/baseline2/ --debug_dump_file=model_dir/baseline2/debug_dump.csv
Print debug test dump

Similar to the above; just replace debug with debug_test. The resulting dump can be used to build a model ensemble.

python code/model_runner.py --mode=debug_test --model_dir=/home/shared/cs231n-fashion/model_dir/baseline2/ --debug_dump_file=model_dir/baseline2/debug_test_dump.csv
Threshold selection

Check binbin_playground for reference. Per-class threshold selection can give an extra ~3% boost for a single model.
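
As a rough illustration of the idea (the full procedure is in binbin_playground), the sketch below picks, for each class, the threshold that maximizes F1 on validation predictions; the array shapes, the candidate grid, and the use of scikit-learn are assumptions.

# Illustrative per-class threshold selection: for each class, choose the
# threshold that maximizes F1 on the validation set.
import numpy as np
from sklearn.metrics import f1_score

def select_thresholds(probs, labels, candidates=np.arange(0.1, 0.95, 0.05)):
    """probs, labels: (num_examples, num_classes) arrays of predicted probabilities / ground truth."""
    thresholds = np.zeros(probs.shape[1])
    for c in range(probs.shape[1]):
        scores = [f1_score(labels[:, c], probs[:, c] >= t) for t in candidates]
        thresholds[c] = candidates[int(np.argmax(scores))]
    return thresholds

The selected per-class thresholds can then be written to a file and passed via --pred_threshold in test mode, as noted in the Testing section above.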

Other useful commands
# After logging in, run the following command to monitor memory usage.
sh /home/binbinx/memusg.sh
# This will download test_prediction.csv to your local machine.
gcloud compute scp binbinx@cs231n-fashion-ssd:/home/shared/cs231n-fashion/submission/test_prediction.csv .

Model Ensemble

  • For each model, run the debug_test command above and generate a CSV file.
  • Put all CSV files into a single folder, then run ensemble.py on that folder:
python code/ensemble.py --pred_threshold=0.2 --ensemble_dir=/home/shared/ensemble_dir --ensemble_output=/home/shared/ensemble_output.csv --output_type='prob' --mode='validate'
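
Conceptually, the ensemble averages the per-class probabilities across the individual dumps before thresholding. Below is a minimal sketch under an assumed CSV layout (one row per image id, one probability column per class), which may differ from the actual dump format produced by debug_test.

# Illustrative ensemble: average per-class probabilities across all dumps in a
# folder, then threshold.
import glob
import pandas as pd

def ensemble_probs(ensemble_dir):
    frames = [pd.read_csv(path, index_col=0) for path in glob.glob(ensemble_dir + "/*.csv")]
    return sum(frames) / len(frames)   # element-wise mean over models

def to_labels(mean_probs, pred_threshold=0.2):
    # Keep, for each image, the classes whose averaged probability clears the threshold.
    return {image_id: [col for col, p in row.items() if p >= pred_threshold]
            for image_id, row in mean_probs.iterrows()}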

Scratch Pad

# Model from scratch:
python code/keras_model_runner.py --mode=train --model_dir=model_dir/keras_xception/retrain/ --drop_out_rate=0.5 --reg=0.00001 --gpu_id=0 --batch_size=32 --steps_per_epoch=2500 --epochs=1000 --fine_tune --initial_epoch=65

# Model from scratch with sample weighting:
python code/keras_model_runner.py --mode=train --model_dir=model_dir/keras_xception/retrain_weight/ --drop_out_rate=0.2 --reg=0.00001 --gpu_id=0 --batch_size=32 --steps_per_epoch=2500 --epochs=1000 --fine_tune --generator_use_weight --initial_epoch=60

# WAD model from scratch with sample weighting:
python code/keras_model_runner.py --mode=train --model_dir=model_dir/keras_xception/retrain_weight_wad/ --drop_out_rate=0.2 --reg=0.00001 --gpu_id=0 --batch_size=32 --steps_per_epoch=2500 --epochs=1000  --generator_use_weight --deep_model_dir=model_dir/keras_xception/retrain/ --generator_use_wad
