kaggle-humpback-submission

Code for the 3rd place solution in the Kaggle Humpback Whale Identification Challenge.

For a detailed description of the solution, please refer to the Kaggle post.

Hardware

The following specs were used to create the original solution.

Ubuntu 16.04 LTS
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
2x NVIDIA 1080 Ti

Reproducing Submission

To reproduce my submission without retraining, do the following steps:

Installation
Download Dataset
Download Pretrained models
Inference
Make Submission

Installation

All requirements should be detailed in requirements.txt. Using Anaconda is strongly recommended.

conda create -n humpback python=3.6
source activate humpback
pip install -r requirements.txt

Download dataset

Download train.zip and test.zip and extract them into the data directory. If the Kaggle API is installed, run the following commands.

$ kaggle competitions download -c humpback-whale-identification -f train.zip
$ kaggle competitions download -c humpback-whale-identification -f test.zip
$ unzip train.zip -d data/train
$ unzip test.zip -d data/test

Generate CSV files

You can skip this step. All CSV files are prepared in the data directory.

List of CSV files

filename	description
landmark.{split}.{fold}.csv	predicted landmarks for the train and test set
duplcate_ids.csv	list of duplicate identities
leaks.csv	leaks from post
split.keypoint.{fold}.csv	labels for training bounding box and landmark detector
train.v2.csv	label file in which duplicate ids are merged into a single identity; several new whales are also grouped
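
As a rough illustration of what "grouping duplicate ids into a single identity" can mean, the sketch below merges pairs of duplicate ids with a small union-find and rewrites a label mapping accordingly. It is not the script used to produce train.v2.csv: the function names, the pair format, and the choice of the lexicographically smallest id as the representative are all assumptions made for this example.

```python
def build_canonical_map(duplicate_pairs):
    """Map every id to one representative id per duplicate group (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in duplicate_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Assumption: keep the lexicographically smaller id as representative.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep

    return {x: find(x) for x in parent}


def relabel(labels, canonical):
    """Replace each label by its group representative, if it has one."""
    return {image: canonical.get(whale_id, whale_id)
            for image, whale_id in labels.items()}


# Hypothetical ids and file names, for illustration only.
pairs = [("w_b", "w_a"), ("w_c", "w_b")]
labels = {"1.jpg": "w_c", "2.jpg": "w_a", "3.jpg": "w_z"}
canonical = build_canonical_map(pairs)
merged = relabel(labels, canonical)
```

Chained duplicates (A = B, B = C) end up under one id, while ids that appear in no pair are left unchanged.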

Landmark

To run landmark inference, run the following command.

$ sh inference_landmark.sh

Training

In the configs directory, you can find configurations I used to train my final models.

Train models

To train a model, run the following command.

$ python train.py --config={config_path}

The expected training times are:

Model	GPUs	Image size	Training Epochs	Training Time
densenet121	1x 1080 Ti	320	300	60 hours

Average weights

To average weights, run the following command.

$ python swa.py --config={config_path}

The averaged weights will be located in train_logs/{train_dir}/checkpoint.
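
For readers unfamiliar with weight averaging, here is a minimal sketch of the core idea: take several checkpoints of the same model and compute the element-wise mean of each parameter. It does not reproduce swa.py. Checkpoints are represented as plain dictionaries of float lists instead of framework tensors, and the function name and parameter names are hypothetical. Which checkpoints swa.py selects, and whether it recomputes batch-norm statistics afterwards, is not covered here.

```python
def average_checkpoints(state_dicts):
    """Element-wise mean of several checkpoints with identical parameter names."""
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    averaged = {}
    for name in state_dicts[0]:
        # Group the i-th value of this parameter across all checkpoints.
        columns = zip(*(sd[name] for sd in state_dicts))
        averaged[name] = [sum(values) / len(state_dicts) for values in columns]
    return averaged


# Two toy checkpoints with hypothetical parameter names.
checkpoints = [
    {"fc.weight": [1.0, 2.0], "fc.bias": [0.0]},
    {"fc.weight": [3.0, 4.0], "fc.bias": [1.0]},
]
averaged = average_checkpoints(checkpoints)
```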

Pretrained models

You can download the pretrained models used for my submission from the link, or run the following commands.

$ wget https://www.dropbox.com/s/fdnh29pjk8rpxgs/train_logs.zip
$ unzip train_logs.zip

After extracting them into train_logs, you should see the following structure:

results
  +- densenet121.1st
  |  +- checkpoint
  +- densenet121.2nd
  |  +- checkpoint
  +- densenet121.3rd
  |  +- checkpoint
  +- landmark.0
  |  +- checkpoint
  +- landmark.1
  |  +- checkpoint
  +- landmark.2
  |  +- checkpoint
  +- landmark.3
  |  +- checkpoint
  +- landmark.4
  |  +- checkpoint

Inference

Once the trained weights are in place, you can create files that contain the cosine similarities between images and target whales.

$ python inference_similarity.py \
  --config={config_filepath} \
  --tta_landmark={0 or 1} \
  --tta_flip={0 or 1} \
  --output={output_filepath}
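
To make the notion of a similarity file concrete, the sketch below computes cosine similarities between query embeddings and target embeddings in plain Python. It only illustrates the measure itself. How the repository extracts embeddings, applies the flip and landmark test-time augmentations, or stores the result is not shown, and the function names are made up for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def similarity_matrix(queries, targets):
    """Rows follow the query images, columns follow the target whales."""
    return [[cosine_similarity(q, t) for t in targets] for q in queries]


# Toy 2-D embeddings, for illustration only.
queries = [[1.0, 0.0], [1.0, 1.0]]
targets = [[1.0, 0.0], [0.0, 1.0]]
sims = similarity_matrix(queries, targets)
```

Because the measure depends only on direction, scaling an embedding does not change its similarity to any target.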

To make a submission, you must run inference on both the test and test_val splits and pass the resulting similarity files to make_submission.py. For example:

$ python make_submission.py \
  --input_path={comma-separated list of similarity file paths} \
  --output_path={submission_file_path}

To run inference for all models and make a submission using the pretrained models, simply run sh inference.sh.
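
Since make_submission.py accepts several similarity files, one plausible way to combine them is to average the similarity matrices and keep the highest-scoring ids per image. The sketch below shows that idea under stated assumptions: simple averaging, the helper names, and the top-k selection are illustrative choices, not a description of what make_submission.py actually does (for instance, the role of the test_val split is not modelled here).

```python
def average_matrices(matrices):
    """Element-wise mean of equally shaped similarity matrices (lists of rows)."""
    count = len(matrices)
    return [
        [sum(cells) / count for cells in zip(*rows)]
        for rows in zip(*matrices)
    ]


def top_k_ids(scores, target_ids, k=5):
    """Ids of the k highest-scoring targets, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [target_ids[i] for i in ranked[:k]]


# Toy similarities from two hypothetical models for a single test image.
target_ids = ["w_a", "w_b", "w_c"]
model_a = [[0.9, 0.2, 0.4]]
model_b = [[0.7, 0.6, 0.2]]
mean_sims = average_matrices([model_a, model_b])
best = [top_k_ids(row, target_ids, k=2) for row in mean_sims]
```

In this toy case the second model lifts w_b above w_c, so the averaged ranking differs from what the first model alone would give.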

Post Processing

As you know, there are some duplicate whale ids. For these duplicate ids, the following rules are applied.

Assume that identity A and identity B are duplicates.

If the top-1 prediction is identity A, then I set identity B as the top-2 prediction.
If the size of the test image is equal to the size of one of the images in identity A and is not equal to the size of any image in identity B, then I set the top-1 prediction to identity A.
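
The two rules above can be sketched as a small function operating on a ranked list of predicted ids. This is an interpretation, not the code in post_processing.py. In particular, it assumes that the size rule is only considered when the current top-1 prediction is A or B, that the size rule runs before the top-2 rule, and that image sizes are compared as (width, height) tuples. All names are hypothetical.

```python
def apply_duplicate_rules(preds, id_a, id_b, test_size, sizes_a, sizes_b):
    """Adjust a ranked prediction list for one pair of duplicate ids (A, B)."""
    k = len(preds)
    preds = list(preds)

    # Size rule (assumed to apply only when top-1 is already A or B):
    # the test image size matches an image of A and no image of B.
    if preds and preds[0] in (id_a, id_b):
        if test_size in sizes_a and test_size not in sizes_b:
            preds = [id_a] + [p for p in preds if p != id_a]

    # Top-2 rule: if A is top-1, B becomes top-2.
    if preds and preds[0] == id_a:
        rest = [p for p in preds[1:] if p != id_b]
        preds = [id_a, id_b] + rest

    return preds[:k]  # keep the original number of predictions


# Toy example with made-up ids and image sizes.
fixed = apply_duplicate_rules(
    ["w_b", "w_x", "w_y", "w_z", "new_whale"],
    id_a="w_a", id_b="w_b",
    test_size=(1050, 600),
    sizes_a={(1050, 600)}, sizes_b={(700, 500)},
)
```

Since the text treats A and B as an unordered pair, the function would presumably be called a second time with the roles of A and B swapped.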
