<h1><center>Humpback Whale Identification</center></h1>

<img src="https://i.ibb.co/NpWmp2n/Whale-identity.jpg" width="750px"/>


### Table of Contents:
* [Introduction](#1)
* [Submission Format](#2)
* [Evaluation metric explained](#3)
* [Whale Classes Distribution](#4)
* [Plotting image For one class at time](#5)
* [Plotting Odd Looking Images](#6)
* [Resolution distribution of whales Images](#7)
* [Color distribution of whales Images](#8)
* [Analysis on Bounding Boxes Images](#9)
* [Visualize upper and lower bound of the Bounding Box Ratio's](#10)
* [Modelling](#11)
* [References](#12)

## Introduction<a id="1"></a>
In this competition, we are challenged to build an algorithm that identifies individual whales in images. Happywhale's database of over 25,000 images, gathered from research institutions and public contributors, is available for building our model. Happywhale is a platform that uses image-processing algorithms to let anyone submit a whale photo and have it automatically identified.

The goal of this competition is to identify individual whales in images. Although a few whales are well represented, most appear in only one or a handful of pictures.

In [None]:
%matplotlib inline
import numpy as np # linear algebra
import pandas as pd
import cv2
import os
import seaborn as sns
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [14, 12]
import collections
from PIL import Image
import matplotlib.image as mpimg
from matplotlib.pyplot import imshow
import matplotlib.patches as patches
import random
DIR = "../input/humpback-whale-identification"

In [None]:
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from keras import layers
from keras.preprocessing import image
from keras.applications.imagenet_utils import preprocess_input
from keras.layers import Input, Dense, Activation, BatchNormalization, Flatten, Conv2D
from keras.layers import AveragePooling2D, MaxPooling2D, Dropout
from keras.models import Model
import keras.backend as K
from keras.models import Sequential

In [None]:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=Warning)

### Let's see what data Kaggle provides

In [None]:
train = pd.read_csv(os.path.join(DIR, "train.csv"))
test = pd.read_csv(os.path.join(DIR, "sample_submission.csv"))
train.shape, test.shape

In [None]:
os.listdir(DIR)

#### There are two CSV files
* train.csv - contains the image name and label for each image in the train folder

> format (Image: image name in the train folder, Id: whale identification name)

* sample_submission.csv - contains the image name and a dummy label for each image in the test folder

> format (Image: image name in the test folder, Id: dummy whale identification name)

#### And two folders containing the images
* train - 25361 images of different whales. Their labels are provided in train.csv
* test - 7960 images of different whales. Their labels are what we need to predict

In [None]:
train.head()

In [None]:
test.head()

## Submission Format for the Competition<a id="2"></a>

### We need to predict 5 labels for each image.
> Take the first row as an example: 00028a005.jpg	new_whale w_23a388d w_9b5109b w_9c506f6 w_0369a5c.

> In the example above, we have predicted the labels new_whale, w_23a388d, w_9b5109b, w_9c506f6 and w_0369a5c for the image named 00028a005.jpg.

If we do not submit in this format for all 7960 rows, we will get an error when submitting our predictions.
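
A minimal sketch of how one submission row could be assembled from model scores. The helper name `format_submission_row` and the `example_scores` dictionary are hypothetical and only illustrate the format; the score values are made up.

```python
def format_submission_row(image_name, scores, k=5):
    """Return (image_name, 'id1 id2 ... idk') with Ids ordered by descending score."""
    # Rank the Ids from highest to lowest score and keep the first k.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return image_name, " ".join(ranked[:k])


# Hypothetical scores for one test image (illustrative values only).
example_scores = {
    "new_whale": 0.40,
    "w_23a388d": 0.25,
    "w_9b5109b": 0.15,
    "w_9c506f6": 0.10,
    "w_0369a5c": 0.06,
    "w_ffffff0": 0.04,
}

row = format_submission_row("00028a005.jpg", example_scores)
print(row)
```

Each row therefore carries the image name and a single space-separated string of five Ids, most likely first.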

## Evaluation Metric Explained<a id="3"></a>

The evaluation metric in the competition's description is Mean Average Precision @ 5 (MAP@5):
$$MAP@5 = {1 \over U} \sum_{u=1}^{U} \sum_{k=1}^{\min(n,5)} P(k) \times rel(k)$$

where `U` is the number of images, `P(k)` is the precision at cutoff `k`, `rel(k)` is an indicator function equal to 1 if the item at rank `k` is a relevant (correct) label and 0 otherwise, and `n` is the number of predictions per image.

> the calculation would stop after the first occurrence of the correct whale, so `P(1) = 1`. So, a prediction that is `correct` `incorrect` `incorrect` `incorrect` `incorrect` also scores `1`.

So we don't have to sum up to 5, only up to the first correct answer. In this competition there is only one correct (`TP`) answer per image, so the possible precision scores per image are either `0` or `P(k)=1/k`.

| true  | predicted   | k  | Image score |
|:-:|:-:|:-:|:-:|
| [x]  | [x, ?, ?, ?, ?]   | 1  | 1.0  |
| [x]  | [?, x, ?, ?, ?]   | 2  | 0 + 1/2 = 0.5 |
| [x]  | [?, ?, x, ?, ?]   | 3  | 0/1 + 0/2 + 1/3  = 0.33 |
| [x]  | [?, ?, ?, x, ?]   | 4  | 0/1 + 0/2 + 0/3 + 1/4  = 0.25 |
| [x]  | [?, ?, ?, ?, x]   | 5  | 0/1 + 0/2 + 0/3 + 0/4 + 1/5  = 0.2 |
| [x]  | [?, ?, ?, ?, ?]   | 5  | 0/1 + 0/2 + 0/3 + 0/4 + 0/5  = 0.0 |

where `x` is the correct prediction and `?` is an incorrect one.

### The final score is simply the average over the scores of the images.
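
The scoring rule above can be sketched in a few lines of plain Python. This is an illustrative implementation under the stated assumption of exactly one correct label per image; the function names are our own and not part of the competition tooling.

```python
def average_precision_at_5(true_label, predictions):
    """Score one image: 1/k if the true label sits at rank k (k <= 5), else 0."""
    for rank, label in enumerate(predictions[:5], start=1):
        if label == true_label:
            # Only the first correct hit counts, so we can stop here.
            return 1.0 / rank
    return 0.0


def map_at_5(true_labels, all_predictions):
    """Average the per-image scores over all images."""
    scores = [
        average_precision_at_5(truth, preds)
        for truth, preds in zip(true_labels, all_predictions)
    ]
    return sum(scores) / len(scores)


truth = ["x", "x", "x"]
preds = [
    ["x", "a", "b", "c", "d"],  # correct at rank 1 -> 1.0
    ["a", "x", "b", "c", "d"],  # correct at rank 2 -> 0.5
    ["a", "b", "c", "d", "e"],  # never correct    -> 0.0
]
print(map_at_5(truth, preds))  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```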

### Let's look at some random whale images from both the train and test folders.

In [None]:
random_train_whales = np.random.choice([os.path.join(DIR+'/train',whale) for whale in train['Image']],3)
random_test_whales = np.random.choice([os.path.join(DIR+'/test',whale) for whale in test['Image']],3)
both_whales = np.concatenate([random_train_whales,random_test_whales])
print('Training Images:')
for i,whale in enumerate(both_whales):
    if i==3:
        print('Test Images:')
    img = Image.open(whale)
    plt.imshow(img)
    plt.show()

## Distribution of images per Whale Class<a id="4"></a>

In [None]:
train['Id'].value_counts()[:5]

In [None]:
print(f"There are {len(os.listdir(DIR+'/train'))} images in train dataset with {train.Id.nunique()} unique classes.")
print(f"There are {len(os.listdir(DIR+'/test'))} images in test dataset.")

In [None]:
for i in range(1, 4):
    print(f'There are {train.Id.value_counts()[train.Id.value_counts().values==i].shape[0]} classes with {i} samples in train data.')

In [None]:
plt.title('Distribution of Classes excluding new_whale');
train.Id.value_counts()[1:].plot(kind='hist', bins=8,figsize=(20,14));

In [None]:
counted = train.groupby("Id").count().rename(columns={"Image":"image_count"})
counted.loc[counted["image_count"] > 80,'image_count'] = 80
plt.figure(figsize=(20,14))
sns.countplot(data=counted, x="image_count")
plt.show()

In [None]:
image_count_for_whale = train.groupby("Id", as_index=False).count().rename(columns={"Image":"image_count"})
whale_count_for_image_count = image_count_for_whale.groupby("image_count", as_index=False).count().rename(columns={"Id":"whale_count"})
whale_count_for_image_count['image_total_count'] = whale_count_for_image_count['image_count'] * whale_count_for_image_count['whale_count']

In [None]:
whale_count_for_image_count[:5]

In [None]:
whale_count_for_image_count[-3:]

## Observation Regarding Class Distribution
There is a huge imbalance in the data. Many classes have only one or a few samples:

1. The total number of classes is 5005
2. 2000+ whales have just one image
3. The single whale with the most images has 73 of them
4. Image distribution:
   1. almost 30% of the images come from whales with 4 or fewer images.
   2. almost 40% come from the 'new_whale' (default) group, around 10k samples.
   3. the remaining 30% come from whales with 5-73 images.
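
The image shares listed above can be computed with a small helper. The sketch below is illustrative: the function name `image_share_by_bucket`, the bucket names and the toy Ids are assumptions made for the example, and in the notebook one would pass the `Id` column of `train` instead.

```python
from collections import Counter


def image_share_by_bucket(ids, rare_max=4, default_id="new_whale"):
    """Split images into default / rare / frequent whales and return each share."""
    per_whale = Counter(ids)
    total = len(ids)
    buckets = {"default": 0, "rare": 0, "frequent": 0}
    for whale_id, n_images in per_whale.items():
        if whale_id == default_id:
            buckets["default"] += n_images
        elif n_images <= rare_max:
            buckets["rare"] += n_images
        else:
            buckets["frequent"] += n_images
    return {name: count / total for name, count in buckets.items()}


# Toy data: 4 default images, one whale with 5 images, one whale with 1 image.
toy_ids = ["new_whale"] * 4 + ["w_a"] * 5 + ["w_b"]
shares = image_share_by_bucket(toy_ids)
print(shares)
```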

## Explore images Based upon Class<a id="5"></a>

### Some image samples of 'new_whale'

In [None]:
fig = plt.figure(figsize = (20, 15))
for idx, img_name in enumerate(train[train['Id'] == 'new_whale']['Image'][:12]):
    y = fig.add_subplot(3, 4, idx+1)
    img = cv2.imread(os.path.join(DIR,"train",img_name))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    y.imshow(img)
plt.show()

### Now some images of whales that have just 1 image

In [None]:
single_whales = train['Id'].value_counts().index[-12:]
fig = plt.figure(figsize = (20, 15))

for widx, whale in enumerate(single_whales):
    for idx, img_name in enumerate(train[train['Id'] == whale]['Image'][:1]):
        axes = widx + idx + 1
        y = fig.add_subplot(3, 4, axes)
        img = cv2.imread(os.path.join(DIR,"train",img_name))
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        y.imshow(img)

plt.show()

## Odd looking Images in the Training Set<a id="6"></a>

In [None]:
def Plot_image_tog(ls,row,col):
    fig = plt.figure(figsize = (20, 15))
    for idx, img_name in enumerate(ls):
        y = fig.add_subplot(row, col, idx+1)
        img = cv2.imread(os.path.join(DIR,"train",img_name))
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        y.imshow(img)
    plt.show()

In [None]:
ls = ['0b75361cd.jpg','0c6772887.jpg','0ef9d37be.jpg','fabc19a85.jpg']
Plot_image_tog(ls,2,2)

###  Images with Text Part

In [None]:
text_ls=["2b96cac5a.jpg",'f110a9721.jpg','0b6e959b8.jpg','0b7aef92f.jpg','00b92e9bf.jpg','f045d7afc.jpg']
Plot_image_tog(text_ls,2,3)

###  Images with Single Fin

In [None]:
single_ls=['f3f2023c6.jpg','f0cfd99be.jpg','ed309eb49.jpg','155116572.jpg','0ac7c6cf0.jpg','fdb27aea3.jpg']
Plot_image_tog(single_ls,2,3)

In [None]:
# train[train["Image"] == "2b96cac5a.jpg"]
# train[train["Id"] == "w_c7bd8e7"]

## Distribution of Resolutions of Whale<a id="7"></a>

In [None]:
imageSizes_train = collections.Counter([Image.open(f'{DIR}/train/{filename}').size
                        for filename in os.listdir(f"{DIR}/train")])
imageSizes_test = collections.Counter([Image.open(f'{DIR}/test/{filename}').size
                        for filename in os.listdir(f"{DIR}/test")])

In [None]:
def isdf(imageSizes):
    imageSizeFrame = pd.DataFrame(list(imageSizes.most_common()),columns = ["imageDim","count"])
    imageSizeFrame['fraction'] = imageSizeFrame['count'] / sum(imageSizes.values())
    imageSizeFrame['count_cum'] = imageSizeFrame['count'].cumsum()
    imageSizeFrame['count_cum_fraction'] = imageSizeFrame['count_cum'] / sum(imageSizes.values())
    return imageSizeFrame

train_isdf = isdf(imageSizes_train)
train_isdf['set'] = 'train'
test_isdf = isdf(imageSizes_test)
test_isdf['set'] = 'test'

In [None]:
isizes = train_isdf.merge(test_isdf, how="outer", on="imageDim")
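# The outer merge leaves NaN where a resolution occurs in only one set; count those as 0
isizes['total_count'] = isizes['count_x'].fillna(0) + isizes['count_y'].fillna(0)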
dims_order = isizes.sort_values('total_count', ascending=False)[['imageDim']]
print('Number of Unique Resolutions Available are: ',len(dims_order))

In [None]:
isizes = pd.concat([train_isdf, test_isdf])
print('Number of Unique Resolutions Available in both train and test are',isizes.shape[0])

In [None]:
isizes.head()

In [None]:
popularSizes = isizes[isizes['fraction'] > 0.002]
popularSizes.shape

In [None]:
plt.figure(figsize=(20,14))
sns.barplot(x='imageDim',y='fraction',data = popularSizes, hue="set")
_ = plt.xticks(rotation=45)

### Observation Regarding Resolution Distribution
1. There are over 7000 unique resolutions.
2. The 39 most popular resolutions cover approximately 45% of the images (in both train and test)
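
A coverage figure like the one above can be derived directly from a `Counter` of image sizes. The following is a minimal sketch on toy data; `top_k_coverage` and `toy_sizes` are hypothetical names used only for illustration.

```python
from collections import Counter


def top_k_coverage(sizes, k):
    """Fraction of images whose resolution is among the k most common ones."""
    counts = Counter(sizes)
    covered = sum(n for _, n in counts.most_common(k))
    return covered / sum(counts.values())


# Toy (width, height) sizes: two popular resolutions and two rare ones.
toy_sizes = [(1050, 700)] * 5 + [(1050, 600)] * 3 + [(700, 500), (640, 480)]
print(top_k_coverage(toy_sizes, 2))  # 8 of 10 images -> 0.8
```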

## Color Scale Distribution of Whale<a id="8"></a>

#### We saw in the EDA above that some of the images are greyscale or red-tinted, unlike typical RGB pictures. Let's explore that.
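
The greyscale check boils down to asking whether all three channels are equal for every pixel. Below is a minimal sketch on plain `(r, g, b)` tuples, with hypothetical helper names; it also illustrates that in Python a chained `r != g != b` is not the negation of `r == g == b`.

```python
def is_grey_pixel(pixel):
    """A pixel is grey when its red, green and blue values are all equal."""
    r, g, b = pixel
    return r == g == b


def is_grey_image(pixels):
    """An image is greyscale when every one of its pixels is grey."""
    return all(is_grey_pixel(p) for p in pixels)


grey = [(0, 0, 0), (128, 128, 128)]
tinted = [(128, 128, 128), (10, 10, 200)]
print(is_grey_image(grey), is_grey_image(tinted))

# Pitfall: `r != g != b` means `(r != g) and (g != b)`.
# For (10, 10, 200) it is False because r == g, although the pixel is coloured.
r, g, b = 10, 10, 200
print(r != g != b, not (r == g == b))
```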

In [None]:
def is_grey_scale(givenImage):
    """Adopted from https://www.kaggle.com/lextoumbourou/humpback-whale-id-data-and-aug-exploration"""
    w,h = givenImage.size
    for i in range(w):
        for j in range(h):
            r,g,b = givenImage.getpixel((i,j))
            # A pixel is grey only when all three channels are equal
            if not (r == g == b): return False
    return True

### Train Color Scale Distribution

In [None]:
sampleFrac = 0.1
#get our sampled images
imageList = [Image.open(f'{DIR}/train/{imageName}').convert('RGB')
            for imageName in train['Image'].sample(frac=sampleFrac)]

In [None]:
isGreyList = [is_grey_scale(givenImage) for givenImage in imageList]

In [None]:
#then get proportion greyscale
np.sum(isGreyList) / len(isGreyList)

### Test Colour Scale Distribution

In [None]:
sampleFrac = 0.1
imageListtest = [Image.open(f'{DIR}/test/{imageName}').convert('RGB')
            for imageName in test['Image'].sample(frac=sampleFrac)]
isGreyListtest = [is_grey_scale(givenImage) for givenImage in imageListtest]

In [None]:
#then get proportion greyscale
np.sum(isGreyListtest) / len(isGreyListtest)

### Get mean intensity for each channel RGB

In [None]:
def get_rgb_mean(row):
    """Return the mean intensity of the R, G and B channels of one training image."""
    img = cv2.imread(DIR + '/train/' + row['Image'])
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    return np.mean(img[:,:,0]), np.mean(img[:,:,1]), np.mean(img[:,:,2])

train['R'], train['G'], train['B'] = zip(*train.apply(get_rgb_mean, axis=1))

### Red images and their Colour Distribution

In [None]:
df = train[(train['B'] < train['R']) & (train['G'] < train['R'])]
num_photos = 6
fig, axr = plt.subplots(num_photos,2,figsize=(15,15))
for i,(_,row) in enumerate(df.iloc[:num_photos].iterrows()):
    img = cv2.imread(DIR + '/train/' + row['Image'])
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    axr[i,0].imshow(img)
    axr[i,0].axis('off')
    axr[i,1].set_title('R={:.0f}, G={:.0f}, B={:.0f} '.format(np.mean(img[:,:,0]), np.mean(img[:,:,1]), np.mean(img[:,:,2]))) 
    # `normed` was removed from np.histogram; `density=True` is the supported equivalent
    x, y = np.histogram(img[:,:,0], bins=255, density=True)
    axr[i,1].bar(y[:-1], x, label='R', alpha=0.8, color='C0')
    x, y = np.histogram(img[:,:,1], bins=255, density=True)
    axr[i,1].bar(y[:-1], x, label='G', alpha=0.8, color='C5')
    x, y = np.histogram(img[:,:,2], bins=255, density=True)
    axr[i,1].bar(y[:-1], x, label='B', alpha=0.8, color='C1')
    axr[i,1].legend()
    axr[i,1].axis('off')

### Blue images and their Colour Distribution

In [None]:
df = train[(train['B'] > train['R']) & (train['B'] > train['G'])]
num_photos = 6
fig, axr = plt.subplots(num_photos,2,figsize=(15,15))
for i,(_,row) in enumerate(df.iloc[:num_photos].iterrows()):
    img = cv2.imread(DIR + '/train/' + row['Image'])
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    axr[i,0].imshow(img)
    axr[i,0].axis('off')
    axr[i,1].set_title('R={:.0f}, G={:.0f}, B={:.0f} '.format(np.mean(img[:,:,0]), np.mean(img[:,:,1]), np.mean(img[:,:,2]))) 
    # `normed` was removed from np.histogram; `density=True` is the supported equivalent
    x, y = np.histogram(img[:,:,0], bins=255, density=True)
    axr[i,1].bar(y[:-1], x, label='R', alpha=0.8, color='C0')
    x, y = np.histogram(img[:,:,1], bins=255, density=True)
    axr[i,1].bar(y[:-1], x, label='G', alpha=0.8, color='C5')
    x, y = np.histogram(img[:,:,2], bins=255, density=True)
    axr[i,1].bar(y[:-1], x, label='B', alpha=0.8, color='C1')
    axr[i,1].legend()
    axr[i,1].axis('off')

### Observation Regarding Colour Distribution
1. Around 31% of the images in the training set are greyscale, compared with 29% in the test set.
2. Some whales have yellow spots and some images are reddish, which may be due to sunset light.
3. This suggests that we need image transformations that are largely agnostic to the RGB spectrum (for example, bumping up the number of greyscale images in the smaller classes).
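
One way to make training less dependent on colour is to convert a random share of images to greyscale. The sketch below works on plain `(r, g, b)` tuples and uses the common ITU-R 601 luma weights; the function names and the probability parameter `p` are assumptions for illustration, not the augmentation used later in this notebook.

```python
import random


def to_grey(pixels):
    """Replace each (r, g, b) pixel by its luma value repeated on all three channels."""
    out = []
    for r, g, b in pixels:
        # Integer form of 0.299 * R + 0.587 * G + 0.114 * B (ITU-R 601 weights).
        y = (299 * r + 587 * g + 114 * b) // 1000
        out.append((y, y, y))
    return out


def random_grey(pixels, p=0.5, rng=random):
    """With probability p return a greyscale copy, otherwise an unchanged copy."""
    return to_grey(pixels) if rng.random() < p else list(pixels)


sample = [(255, 0, 0), (255, 255, 255), (0, 0, 0)]
print(to_grey(sample))
print(random_grey(sample, p=0.5, rng=random.Random(0)))
```

Passing a seeded `random.Random` instance keeps the augmentation reproducible between runs.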

## Analysis on Bounding Boxes Images<a id="9"></a>

In [None]:
##Bounding Boxes for the whale fins only.
bbox = pd.read_csv('../input/bounding-box/bounding_boxes.csv')

In [None]:
bbox.head()

In [None]:
# DIR = '/home/aiml/ml/share/data/all_kagg'
TRAIN = os.path.join(DIR, 'train')
TEST = os.path.join(DIR, 'test')

train_paths = set(os.listdir(TRAIN))  # set: fast membership tests in full_path()
test_paths = os.listdir(TEST)         # list: indexed by position further below

In [None]:
# Show both counts; a cell only displays the value of its last expression
len(train_paths), len(test_paths)

In [None]:
## Create full path for the images
def full_path(row):
    if row in train_paths:
        return TRAIN+'/'+row
    else:
        return TEST+'/'+row

In [None]:
bbox['Full_Path'] = bbox['Image'].apply(lambda row: full_path(row))

In [None]:
##check images are already present in the directory or not.
bbox[bbox['Image'] ==test_paths[0]]

### Visualize Original and Bounding boxed images

In [None]:
# Left column: original image; right column: the same image with its bounding box
fig,ax = plt.subplots(6,2,figsize=(25,20))
for i in range(6):
    img_row = bbox[bbox['Image'] ==test_paths[i]]
    img = cv2.imread(TEST+'/'+img_row['Image'].values[0])
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # fig,ax = plt.subplots(2)
    ax[i,0].imshow(img)
    xmin1 = img_row['x0'].values[0]
    ymin1 = img_row['y0'].values[0]
    xmax = img_row['x1'].values[0]
    ymax = img_row['y1'].values[0]
    rect = patches.Rectangle((xmin1,ymin1),xmax-xmin1,ymax-ymin1,linewidth=1,edgecolor='r',facecolor='none')
    ax[i,1].add_patch(rect)
    ax[i,1].imshow(img)
    # plt.imshow(img)
plt.show()

### Get the Original Images sizes for comparison

In [None]:
def x_orig_img(row):
    if row in train_paths:
        return Image.open(TRAIN+'/'+row).size[0]
    else:
        return Image.open(TEST+'/'+row).size[0]

def y_orig_img(row):
    if row in train_paths:
        return Image.open(TRAIN+'/'+row).size[1]
    else:
        return Image.open(TEST+'/'+row).size[1]

In [None]:
bbox['x_orig'] = bbox['Image'].apply(lambda row: x_orig_img (row))
bbox['y_orig'] = bbox['Image'].apply(lambda row: y_orig_img (row))

In [None]:
bbox['ratio'] = ((bbox['x1']-bbox['x0']) * (bbox['y1']-bbox['y0']))/(bbox['x_orig'] * bbox['y_orig']) * 100

In [None]:
plt.figure(figsize=(20,14))
plt.title("Comparison Of Full and Cropped Images", {'size':'14'})
f = sns.distplot(bbox['ratio'])
f.set_xlabel("Bounding Box Area as Percentage of Original Image", {'size':'14'})
f.set_ylabel("Density", {'size':'14'})

## Visualize upper and lower bound of the Bounding Box Ratio's<a id="10"></a>

In [None]:
bbox[bbox['ratio']<5].sort_values(['ratio']).head(5)

### Lower bound Bounding box ratio

In [None]:
# Left column: original image; right column: the same image with its bounding box
fig,ax = plt.subplots(6,2,figsize=(25,20))
for i in range(6):
    img_row = bbox[bbox['ratio']<5].sort_values(['ratio'],ascending=[False])[i:i+1]
    img = cv2.imread(img_row['Full_Path'].values[0])
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    ax[i,0].imshow(img)
    xmin1 = img_row['x0'].values[0]
    ymin1 = img_row['y0'].values[0]
    xmax = img_row['x1'].values[0]
    ymax = img_row['y1'].values[0]
    rect = patches.Rectangle((xmin1,ymin1),xmax-xmin1,ymax-ymin1,linewidth=1,edgecolor='r',facecolor='none')
    ax[i,1].add_patch(rect)
    ax[i,1].imshow(img)
    # plt.imshow(img)
plt.show()

In [None]:
bbox[bbox['ratio']>95].sort_values(['ratio'],ascending=[False]).head()

### Upper Bound bounding box ratio

In [None]:
# Left column: original image; right column: the same image with its bounding box
fig,ax = plt.subplots(6,2,figsize=(25,20))
for i in range(6):
    img_row = bbox[bbox['ratio']>90].sort_values(['ratio'], ascending=[False])[i:i+1]
#     img_row = bbox[bbox['Image'] ==test_paths[i]]
    img = cv2.imread(img_row['Full_Path'].values[0])
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    ax[i,0].imshow(img)
    xmin1 = img_row['x0'].values[0]
    ymin1 = img_row['y0'].values[0]
    xmax = img_row['x1'].values[0]
    ymax = img_row['y1'].values[0]
    rect = patches.Rectangle((xmin1,ymin1),xmax-xmin1,ymax-ymin1,linewidth=1,edgecolor='r',facecolor='none')
    ax[i,1].add_patch(rect)
    ax[i,1].imshow(img)
plt.show()

### Conclusion
1. Most whale-fin bounding boxes cover between 30% and 50% of the original image; the peak near 90% shows that some images are already tightly cropped.
2. Some images are not bounded correctly, so we cannot blindly reuse the crops from bounding_boxes.csv.
3. Deciding whether to crop is difficult: correctly bounded images should help the score, but some images are identifiable only together with the surrounding sea, where cropping makes the score worse.
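
Since some boxes are too tight or slightly off, one hedge is to pad each box before cropping and clamp it to the image borders. The sketch below is an assumption-laden illustration: `bbox_area_percent` mirrors the ratio computed earlier, while `padded_crop_box` and its `margin_percent` parameter are hypothetical and not part of the bounding box dataset.

```python
def bbox_area_percent(x0, y0, x1, y1, width, height):
    """Bounding-box area as a percentage of the full image area."""
    return 100.0 * (x1 - x0) * (y1 - y0) / (width * height)


def padded_crop_box(x0, y0, x1, y1, width, height, margin_percent=10):
    """Grow the box by margin_percent of its size on each side, clamped to the image."""
    pad_x = (x1 - x0) * margin_percent // 100
    pad_y = (y1 - y0) * margin_percent // 100
    return (
        max(0, x0 - pad_x),
        max(0, y0 - pad_y),
        min(width, x1 + pad_x),
        min(height, y1 + pad_y),
    )


# A 500x300 box inside a 1000x500 image covers 30% of the area.
print(bbox_area_percent(100, 50, 600, 350, 1000, 500))
# Padding by 10% adds 50 px horizontally and 30 px vertically.
print(padded_crop_box(100, 50, 600, 350, 1000, 500))
# A box that already nearly fills the image is clamped to the image borders.
print(padded_crop_box(10, 5, 990, 495, 1000, 500))
```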

## Modelling<a id="11"></a>

In [None]:
def prepareImages(data, m, dataset):
    print("Preparing images")
    X_train = np.zeros((m, 100, 100, 3))
    count = 0
    
    for fig in data['Image']:
        # load each image resized to 100x100; target_size is (height, width), the 3 channels come from RGB mode
        img = image.load_img("../input/"+dataset+"/"+fig, target_size=(100, 100))
        x = image.img_to_array(img)
        x = preprocess_input(x)

        X_train[count] = x
        if (count%500 == 0):
            print("Processing image: ", count+1, ", ", fig)
        count += 1
    
    return X_train

In [None]:
def prepare_labels(y):
    values = np.array(y)
    label_encoder = LabelEncoder()
    integer_encoded = label_encoder.fit_transform(values)
    onehot_encoder = OneHotEncoder(sparse=False)
    integer_encoded = integer_encoded.reshape(len(integer_encoded), 1)
    onehot_encoded = onehot_encoder.fit_transform(integer_encoded)
    y = onehot_encoded
    return y, label_encoder

In [None]:
X = prepareImages(train, train.shape[0], "humpback-whale-identification/train")
X /= 255

In [None]:
y, label_encoder = prepare_labels(train['Id'])

In [None]:
model = Sequential()
model.add(Conv2D(32, (7, 7), strides = (1, 1), name = 'conv0', input_shape = (100, 100, 3)))
model.add(BatchNormalization(axis = 3, name = 'bn0'))
model.add(Activation('relu'))
model.add(MaxPooling2D((2, 2), name='max_pool'))
model.add(Conv2D(64, (3, 3), strides = (1,1), name="conv1"))
model.add(Activation('relu'))
model.add(AveragePooling2D((3, 3), name='avg_pool'))
model.add(Flatten())
model.add(Dense(500, activation="relu", name='rl'))
model.add(Dropout(0.8))
model.add(Dense(y.shape[1], activation='softmax', name='sm'))

model.compile(loss='categorical_crossentropy', optimizer="adam", metrics=['accuracy'])
model.summary()
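
As a sanity check on `model.summary()`, the feature-map sizes can be traced by hand. The sketch below assumes the Keras defaults used above ('valid' padding, pooling stride equal to the pool size); the helper names `conv_out` and `pool_out` are our own.

```python
def conv_out(size, kernel, stride=1):
    """Output length of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1


def pool_out(size, pool):
    """Output length of unpadded pooling whose stride equals the pool size."""
    return (size - pool) // pool + 1


side = 100                  # input images are 100 x 100 x 3
side = conv_out(side, 7)    # conv0, 7x7 kernel
after_conv0 = side
side = pool_out(side, 2)    # max_pool, 2x2
after_max_pool = side
side = conv_out(side, 3)    # conv1, 3x3 kernel
after_conv1 = side
side = pool_out(side, 3)    # avg_pool, 3x3
after_avg_pool = side

flat = side * side * 64             # Flatten: conv1 has 64 filters
dense_params = flat * 500 + 500     # weights plus biases of the 500-unit Dense layer

print(after_conv0, after_max_pool, after_conv1, after_avg_pool)
print(flat, dense_params)
```

Most of the trainable parameters sit in the first dense layer, which is worth keeping in mind given how few images most classes have.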

In [None]:
import gc

history = model.fit(X, y, epochs=15, batch_size=100, verbose=1)
gc.collect()  # release memory that is no longer referenced after training

In [None]:
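# Older Keras versions log the metric as 'acc', newer ones as 'accuracy'
acc_key = 'acc' if 'acc' in history.history else 'accuracy'
plt.plot(history.history[acc_key])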
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.show()

In [None]:
test = os.listdir(os.path.join(DIR, "test"))
print(len(test))

In [None]:
col = ['Image']
test_df = pd.DataFrame(test, columns=col)
test_df['Id'] = ''

In [None]:
X = prepareImages(test_df, test_df.shape[0], "humpback-whale-identification/test")
X /= 255

In [None]:
predictions = model.predict(np.array(X), verbose=1)

In [None]:
for i, pred in enumerate(predictions):
    test_df.loc[i, 'Id'] = ' '.join(label_encoder.inverse_transform(pred.argsort()[-5:][::-1]))

In [None]:
test_df.head(10)
test_df.to_csv('submission.csv', index=False)

## References<a id="12"></a>

In [None]:
## I have used these awesome kernels for whole EDA
##https://www.kaggle.com/pestipeti/explanation-of-map5-scoring-metric
##https://www.kaggle.com/artgor/pytorch-whale-identifier
##https://www.kaggle.com/kretes/eda-distributions-images-and-no-duplicates
##https://www.kaggle.com/cristianpb/on-finding-rgb-or-bgr
##https://www.kaggle.com/suicaokhoailang/generating-whale-bounding-boxes
##https://www.kaggle.com/pestipeti/keras-cnn-starter