Whale Detector for Kaggle's Right Whale Recognition Challenge
Kaggle's NOAA Right Whale Recognition Challenge aims to develop an algorithm that identifies individual right whales, which are critically endangered. It is a great chance to study machine learning and digital image processing, although it looks like a really hard challenge to me. Anyway, I've developed this method to detect the whale in the photograph, and I'm releasing it in the hope that it may help others.
It takes advantage of the fact that most pictures are pretty plain: almost all of the area is covered by water, and a smaller region of interest corresponds to the whale, so the histogram of most parts of the image will be similar to that of the whole image, except in the region of interest. The algorithm recursively looks for subimages whose HSV histogram is not similar to the original image's histogram, marking those regions in white and everything else in black. It then searches for the biggest continuous white region using contours and places a bounding box around it, assuming it's the whale. The cropped image is called the "extract" and is saved along with the black & white mask. It uses Python 2.7 and OpenCV 3.0.
Running with docker, Python
The Jupyter version (hist_zones.ipynb) works well with a Docker image that contains OpenCV 3 and Python 3 as described here. Just modify the last line of the Dockerfile to "CMD /usr/local/bin/jupyter-notebook --ip=0.0.0.0 --allow-root" to allow the notebook server to run as root.