nagadomi/kaggle-ndsb

Kaggle-NDSB

Code for the National Data Science Bowl competition at Kaggle. Ranked 10th out of 1049.

Summary

An ensemble of deep CNNs trained with real-time data augmentation.

Preprocessing: centering, conversion to a square image by padding, and conversion to a negative image.
(Figure: example source and destination images of the preprocessing step.)
Data augmentation: real-time data augmentation (a new random transformation is applied to each minibatch). The transformations include translation, scaling, rotation, perspective cropping, and contrast scaling.
Neural network architecture: three CNN architectures, one for each input size the images are rescaled to: cnn_96x96, cnn_72x72, cnn_48x48.
Normalization: Global Contrast Normalization (GCN).
Optimization method: minibatch SGD with Nesterov momentum.
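The Preprocessing, Normalization and Optimization rows above can be illustrated with a minimal pure-Python sketch. It is not the repository's Torch7/Lua code, and several details are assumptions: "centering" is read as cropping to the object's bounding box before padding, GCN uses the plain per-image mean/standard-deviation form, and the momentum update is a common Nesterov formulation (the README does not say which variant was used). Rescaling to the 48/72/96 input sizes is omitted.

```python
import math

# Images are lists of rows of grayscale ints in [0, 255].
# NDSB plankton images are dark objects on a white background.


def to_negative(img):
    """Invert intensities so the background becomes 0 and the object is bright."""
    return [[255 - p for p in row] for row in img]


def crop_to_object(img, threshold=0):
    """Crop to the bounding box of pixels brighter than `threshold` (assumed centering step)."""
    rows = [i for i, row in enumerate(img) if any(p > threshold for p in row)]
    cols = [j for j in range(len(img[0])) if any(row[j] > threshold for row in img)]
    if not rows:  # blank image: nothing to crop
        return img
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]


def pad_to_square(img, fill=0):
    """Place the image in the middle of a square canvas filled with `fill`."""
    h, w = len(img), len(img[0])
    size = max(h, w)
    top, left = (size - h) // 2, (size - w) // 2
    out = [[fill] * size for _ in range(size)]
    for i, row in enumerate(img):
        out[top + i][left:left + w] = row
    return out


def preprocess(img):
    """Negative, then center by cropping, then pad to a square with background 0."""
    return pad_to_square(crop_to_object(to_negative(img)))


def global_contrast_normalize(img, eps=1e-8):
    """GCN: subtract the image mean and divide by the image standard deviation."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    return [[(p - mean) / (std + eps) for p in row] for row in img]


def nesterov_sgd_step(params, grads, velocity, lr=0.01, momentum=0.9):
    """One Nesterov-momentum SGD step: v <- mu*v + g; p <- p - lr*(g + mu*v)."""
    velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    params = [p - lr * (g + momentum * v) for p, g, v in zip(params, grads, velocity)]
    return params, velocity
```

Negating first makes the background 0, so the crop can use a simple threshold and the padding fill is already the negative background; negating last with a white (255) fill would give the same square image.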
Results

Model                             Public LB score (multi-class log loss, lower is better)
cnn_48x48, single model           0.6718
cnn_72x72, single model           0.6487
cnn_96x96, single model           0.6561
cnn_48x48, average of 8 models    0.6507
cnn_72x72, average of 8 models    0.6279
cnn_96x96, average of 8 models    0.6311
ensemble (formula below)          0.6160

Ensemble: cnn_48x48(x8) * 0.2292 + cnn_72x72(x8) * 0.3494 + cnn_96x96(x8) * 0.4212 + 9.828e-6, where cnn_NNxNN(x8) is the average of the 8 models of that architecture.
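The ensemble formula can be read as a per-image weighted sum of the three 8-model averages. Below is a minimal sketch of that arithmetic, not the code of ensemble.lua: treating 9.828e-6 as a constant added to every class probability, and the optional renormalization, are assumptions, since the README does not say how ensemble.lua applies them.

```python
# Weights and constant as printed in the results table.
WEIGHTS = {"cnn_48x48": 0.2292, "cnn_72x72": 0.3494, "cnn_96x96": 0.4212}
BIAS = 9.828e-6  # assumed: added to every class probability


def average_models(prob_rows):
    """Average the class-probability vectors of several models (e.g. 8 seeds) for one image."""
    n = len(prob_rows)
    return [sum(col) / n for col in zip(*prob_rows)]


def blend(averaged, weights=WEIGHTS, bias=BIAS, renormalize=False):
    """Weighted sum of the per-architecture averages plus a constant, for one image."""
    num_classes = len(next(iter(averaged.values())))
    out = [bias] * num_classes
    for name, w in weights.items():
        for k, p in enumerate(averaged[name]):
            out[k] += w * p
    if renormalize:  # assumption: not described in the README
        total = sum(out)
        out = [p / total for p in out]
    return out
```

The three weights sum to 0.9998, so the blended rows sum to roughly 1 but not exactly; the `renormalize` flag is there only for callers who want rows that sum to exactly 1.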

Developer Environment

Installation

Install CUDA, Torch7, NVIDIA cuDNN, and cudnn.torch.

Checking the CUDA environment

th cuda_test.lua

If this script fails, check your Torch7/CUDA installation.

Convert dataset

Place the data files in the ./data subfolder:

ls ./data
test  train  train.txt test.txt classess.txt
Then convert the dataset:

th convert_data.lua
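Before running convert_data.lua, it can help to confirm that ./data matches the listing above. The sketch below is a hypothetical helper, not part of the repository; the expected entry names are copied verbatim from the `ls ./data` listing.

```python
from pathlib import Path

# Entry names copied from the README's `ls ./data` listing.
EXPECTED_DIRS = ("train", "test")
EXPECTED_FILES = ("train.txt", "test.txt", "classess.txt")


def missing_data_entries(data_dir="./data"):
    """Return the expected entries that are missing from data_dir (hypothetical helper)."""
    root = Path(data_dir)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

For example, `missing_data_entries()` returning an empty list suggests the layout is complete and `th convert_data.lua` can be run.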

Training, Validation, and Submission

Train and validate a single cnn_48x48 model:

th train.lua -model 48 -seed 101
ls -la models/cnn*.t7

Make a submission file:

th predict.lua -model 48 -seed 101
ls -la models/submission*.txt

To use the cnn_72x72 model:

th train.lua -model 72 -seed 101
th predict.lua -model 72 -seed 101

To use the cnn_96x96 model:

th train.lua -model 96 -seed 101
th predict.lua -model 96 -seed 101

Ensemble

This step is computationally heavy: I used 20 g2.xlarge instances, and it took 4 days.

(A helper tool can be found in the ./appendix folder.)

th train.lua -model 48 -seed 101
th train.lua -model 48 -seed 102
th train.lua -model 48 -seed 103
th train.lua -model 48 -seed 104
th train.lua -model 48 -seed 105
th train.lua -model 48 -seed 106
th train.lua -model 48 -seed 107
th train.lua -model 48 -seed 108
th train.lua -model 72 -seed 101
th train.lua -model 72 -seed 102
th train.lua -model 72 -seed 103
th train.lua -model 72 -seed 104
th train.lua -model 72 -seed 105
th train.lua -model 72 -seed 106
th train.lua -model 72 -seed 107
th train.lua -model 72 -seed 108
th train.lua -model 96 -seed 101
th train.lua -model 96 -seed 102
th train.lua -model 96 -seed 103
th train.lua -model 96 -seed 104
th train.lua -model 96 -seed 105
th train.lua -model 96 -seed 106
th train.lua -model 96 -seed 107
th train.lua -model 96 -seed 108

th predict.lua -model 48 -seed 101
th predict.lua -model 48 -seed 102
th predict.lua -model 48 -seed 103
th predict.lua -model 48 -seed 104
th predict.lua -model 48 -seed 105
th predict.lua -model 48 -seed 106
th predict.lua -model 48 -seed 107
th predict.lua -model 48 -seed 108
th predict.lua -model 72 -seed 101
th predict.lua -model 72 -seed 102
th predict.lua -model 72 -seed 103
th predict.lua -model 72 -seed 104
th predict.lua -model 72 -seed 105
th predict.lua -model 72 -seed 106
th predict.lua -model 72 -seed 107
th predict.lua -model 72 -seed 108
th predict.lua -model 96 -seed 101
th predict.lua -model 96 -seed 102
th predict.lua -model 96 -seed 103
th predict.lua -model 96 -seed 104
th predict.lua -model 96 -seed 105
th predict.lua -model 96 -seed 106
th predict.lua -model 96 -seed 107
th predict.lua -model 96 -seed 108

th ensemble.lua > submission.txt
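The 48 commands above follow one pattern: for each architecture and each seed from 101 to 108, train and then predict with the same arguments. The sketch below generates them as (train, predict) job pairs and spreads the jobs round-robin over several machines. It is a hypothetical helper, not the tool in ./appendix; the sharding scheme and the `dry_run` switch are assumptions for illustration.

```python
import shlex
import subprocess

MODELS = (48, 72, 96)
SEEDS = range(101, 109)


def ensemble_jobs(models=MODELS, seeds=SEEDS):
    """One job per (model, seed): train first, then predict with the same arguments."""
    jobs = []
    for model in models:
        for seed in seeds:
            args = f"-model {model} -seed {seed}"
            jobs.append((f"th train.lua {args}", f"th predict.lua {args}"))
    return jobs


def shard(jobs, num_machines):
    """Round-robin assignment of jobs to machines (hypothetical scheme)."""
    return [jobs[i::num_machines] for i in range(num_machines)]


def run(jobs, dry_run=True):
    """Run each job's train and predict commands in order; only print them when dry_run is set."""
    for train_cmd, predict_cmd in jobs:
        for cmd in (train_cmd, predict_cmd):
            if dry_run:
                print(cmd)
            else:
                subprocess.run(shlex.split(cmd), check=True)


def write_submission(path="submission.txt"):
    """Equivalent of `th ensemble.lua > submission.txt`, once every prediction exists."""
    with open(path, "w") as out:
        subprocess.run(["th", "ensemble.lua"], stdout=out, check=True)
```

Keeping train and predict in the same job means each prediction runs on the machine that produced its model. With 24 jobs and 20 machines, `shard(ensemble_jobs(), 20)` gives four machines two jobs each and the rest one job each; `write_submission()` should only be called after all shards have finished.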
