Kaggle-NDSB

Code for the National Data Science Bowl on Kaggle. Ranked 10th out of 1049.

Summary

An ensemble of deep CNNs trained with real-time data augmentation.

Preprocessing

Center the image, pad it to a square, and invert it to a negative.

(Figure: a source image and the preprocessed destination image.)

Data augmentation

Real-time data augmentation (a random transformation is applied to each minibatch). The transformations include translation, scaling, rotation, perspective cropping, and contrast scaling.

Neural Network Architecture

Three CNN architectures, one for each input rescaling size: cnn_96x96, cnn_72x72, and cnn_48x48.

Normalization

Global Contrast Normalization (GCN)
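
For reference, one common formulation of GCN for a single image with pixels x_1, ..., x_n is sketched below. The scale s, the regularizer \lambda, and the floor \epsilon are assumptions for illustration; this README does not state the values used, and the code in preprocess.lua may differ in detail.

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i,
\qquad
x'_i = s \, \frac{x_i - \bar{x}}
                 {\max\!\left(\epsilon,\ \sqrt{\lambda + \frac{1}{n} \sum_{j=1}^{n} (x_j - \bar{x})^2}\right)}
```

Each image is shifted to zero mean and divided by its own (regularized) standard deviation, so images with different overall brightness and contrast end up on a comparable scale.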

Optimization method

Minibatch SGD with Nesterov momentum.
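
As a reminder of what Nesterov momentum does, a standard form of the update is given below. The symbols (parameters \theta_t, velocity v_t, learning rate \eta, momentum \mu, minibatch loss L) are notation chosen here; the hyperparameter values and the exact variant implemented in minibatch_sgd.lua are not stated in this README.

```latex
v_{t+1} = \mu \, v_t - \eta \, \nabla L(\theta_t + \mu \, v_t),
\qquad
\theta_{t+1} = \theta_t + v_{t+1}
```

The gradient is evaluated at the look-ahead point \theta_t + \mu v_t rather than at \theta_t, which is the only difference from classical momentum.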

Results

Model	Public LB score
cnn_48x48 single model	0.6718
cnn_72x72 single model	0.6487
cnn_96x96 single model	0.6561
cnn_48x48 average of 8 models	0.6507
cnn_72x72 average of 8 models	0.6279
cnn_96x96 average of 8 models	0.6311
ensemble (cnn_48x48(x8) * 0.2292 + cnn_72x72(x8) * 0.3494 + cnn_96x96(x8) * 0.4212 + 9.828e-6)	0.6160
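
Written out, the ensemble row above corresponds to the following weighted sum. This reads "(x8)" as the average over the eight seeds of one architecture, as in the three rows before it; the symbols p_{s,k} (prediction of the model with input size s and seed index k) are notation introduced here.

```latex
\bar{p}_{s} = \frac{1}{8} \sum_{k=1}^{8} p_{s,k},
\qquad
p_{\mathrm{ens}} = 0.2292 \, \bar{p}_{48} + 0.3494 \, \bar{p}_{72} + 0.4212 \, \bar{p}_{96} + 9.828 \times 10^{-6}
```

The three weights sum to 0.9998, so the result is very close to a convex combination of the three averaged predictions.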

Developer Environment

Ubuntu 14.04
16GB RAM
GPU & CUDA (I used an EC2 g2.2xlarge instance)
Torch7
NVIDIA CuDNN
cudnn.torch

Installation

Install CUDA, Torch7, NVIDIA CuDNN, and cudnn.torch.

Checking CUDA environment

th cuda_test.lua

If this script fails, check your Torch7/CUDA environment.

Convert dataset

Place the data files into a subfolder ./data.

ls ./data
test  train  train.txt test.txt classess.txt
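
Before converting, it may help to confirm that every entry listed above is present. The following is a minimal sketch; `check_data_dir` is a hypothetical helper and is not part of this repository.

```shell
# Hypothetical helper: report any expected entry that is missing from
# the data directory (default ./data). Returns 0 only if all are present.
check_data_dir() {
    dir="${1:-./data}"
    missing=0
    # Entry names are taken from the directory listing in this README.
    for name in test train train.txt test.txt classess.txt; do
        if [ ! -e "$dir/$name" ]; then
            echo "missing: $dir/$name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_data_dir ./data && th convert_data.lua` would run the conversion only when nothing is missing.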

th convert_data.lua

Training, Validation, Make submission

Train and validate a single cnn_48x48 model.

th train.lua -model 48 -seed 101
ls -la models/cnn*.t7

Make a submission file.

th predict.lua -model 48 -seed 101
ls -la models/submission*.txt

To use the cnn_72x72 model:

th train.lua -model 72 -seed 101
th predict.lua -model 72 -seed 101

To use the cnn_96x96 model:

th train.lua -model 96 -seed 101
th predict.lua -model 96 -seed 101

Ensemble

This task is very heavy. I used 20 g2.2xlarge instances, and it took 4 days.

(A helper tool can be found in the ./appendix folder.)

th train.lua -model 48 -seed 101
th train.lua -model 48 -seed 102
th train.lua -model 48 -seed 103
th train.lua -model 48 -seed 104
th train.lua -model 48 -seed 105
th train.lua -model 48 -seed 106
th train.lua -model 48 -seed 107
th train.lua -model 48 -seed 108
th train.lua -model 72 -seed 101
th train.lua -model 72 -seed 102
th train.lua -model 72 -seed 103
th train.lua -model 72 -seed 104
th train.lua -model 72 -seed 105
th train.lua -model 72 -seed 106
th train.lua -model 72 -seed 107
th train.lua -model 72 -seed 108
th train.lua -model 96 -seed 101
th train.lua -model 96 -seed 102
th train.lua -model 96 -seed 103
th train.lua -model 96 -seed 104
th train.lua -model 96 -seed 105
th train.lua -model 96 -seed 106
th train.lua -model 96 -seed 107
th train.lua -model 96 -seed 108

th predict.lua -model 48 -seed 101
th predict.lua -model 48 -seed 102
th predict.lua -model 48 -seed 103
th predict.lua -model 48 -seed 104
th predict.lua -model 48 -seed 105
th predict.lua -model 48 -seed 106
th predict.lua -model 48 -seed 107
th predict.lua -model 48 -seed 108
th predict.lua -model 72 -seed 101
th predict.lua -model 72 -seed 102
th predict.lua -model 72 -seed 103
th predict.lua -model 72 -seed 104
th predict.lua -model 72 -seed 105
th predict.lua -model 72 -seed 106
th predict.lua -model 72 -seed 107
th predict.lua -model 72 -seed 108
th predict.lua -model 96 -seed 101
th predict.lua -model 96 -seed 102
th predict.lua -model 96 -seed 103
th predict.lua -model 96 -seed 104
th predict.lua -model 96 -seed 105
th predict.lua -model 96 -seed 106
th predict.lua -model 96 -seed 107
th predict.lua -model 96 -seed 108

th ensemble.lua > submission.txt
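
The command list above follows a regular pattern (two steps, three model sizes, eight seeds), so it can also be generated with a loop. The sketch below only prints the commands and runs nothing by itself; `ensemble_commands` is a hypothetical name, and the repository's own run.sh or the tools in ./appendix may already cover this.

```shell
# Hypothetical helper: print every train, predict, and ensemble command
# in the same order as the list above (all training first, then all
# prediction, then the final ensemble step).
ensemble_commands() {
    for step in train predict; do
        for model in 48 72 96; do
            for seed in 101 102 103 104 105 106 107 108; do
                echo "th $step.lua -model $model -seed $seed"
            done
        done
    done
    echo "th ensemble.lua > submission.txt"
}
```

Calling `ensemble_commands` previews the commands, and `ensemble_commands | sh` would run them one after another on a single machine. On one GPU this would take far longer than the 4 days quoted above, so in practice the training commands need to be split across instances.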
