# Happy Whalphins

Currently, most research institutions rely on time-intensive, and sometimes inaccurate, manual matching by eye. Thousands of hours go into this work, which involves staring at photos to compare one individual to another, finding matches, and identifying new individuals. While researchers enjoy looking at a whale photo or two, manual matching limits the scope and reach of their research.

In this competition, we are asked to develop a model that matches individual whales and dolphins by unique, but often subtle, characteristics of their natural markings. We pay particular attention to dorsal fins and lateral body views in image sets from a multi-species dataset built by 28 research institutions. The best submissions will suggest photo-ID solutions that are fast and accurate.

# Load the Dataset

In [None]:
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
import seaborn as sns
import cv2
# import imagesize

In [None]:
train_df = pd.read_csv('/kaggle/input/happy-whale-and-dolphin/train.csv')
sample_submission_df = pd.read_csv('/kaggle/input/happy-whale-and-dolphin/sample_submission.csv')

In [None]:
train_df.head()

In [None]:
sample_submission_df.head()

## Whales and Dolphins
Explore the whale and dolphin data.

In [None]:
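# Fix the misspelled species labels first.
train_df['species'] = train_df['species'].str.replace('bottlenose_dolpin', 'bottlenose_dolphin', regex=False)
train_df['species'] = train_df['species'].str.replace('kiler_whale', 'killer_whale', regex=False)
# Give the 'beluga' and 'globis' labels an explicit '_whale' suffix so they are classed as whales below.
train_df.loc[train_df.species.str.contains('beluga'), 'species'] = 'beluga_whale'
train_df.loc[train_df.species.str.contains('globis'), 'species'] = 'globis_whale'
# Derive a coarse class from the cleaned species name.
train_df['class'] = train_df.species.map(lambda x: 'whale' if 'whale' in x else 'dolphin')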
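
The label cleanup above can be checked on a small toy frame before running it on the full training set. This is a minimal sketch: the helper name `normalize_species` and the rows in `toy` are made up for illustration and are not part of the competition data.

```python
import pandas as pd

def normalize_species(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the same label fixes as the cell above to a copy."""
    out = df.copy()
    # Fix the two misspelled labels (plain substring replacement, no regex).
    out['species'] = out['species'].str.replace('bottlenose_dolpin', 'bottlenose_dolphin', regex=False)
    out['species'] = out['species'].str.replace('kiler_whale', 'killer_whale', regex=False)
    # Give 'beluga' and 'globis' an explicit '_whale' suffix.
    out.loc[out['species'].str.contains('beluga'), 'species'] = 'beluga_whale'
    out.loc[out['species'].str.contains('globis'), 'species'] = 'globis_whale'
    # Coarse class: anything with 'whale' in its name is a whale, the rest are dolphins.
    out['class'] = out['species'].map(lambda x: 'whale' if 'whale' in x else 'dolphin')
    return out

# Made-up rows covering each label that the cleanup touches.
toy = pd.DataFrame({'species': ['beluga', 'globis', 'kiler_whale',
                                'bottlenose_dolpin', 'bottlenose_dolphin']})
cleaned = normalize_species(toy)
print(cleaned)
```

Working on a copy keeps the original frame untouched, which makes it easy to compare labels before and after the cleanup.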

In [None]:
temp = train_df.groupby(["class"])["species"].nunique()
df = pd.DataFrame({'Classes': temp.index,
                   'Species': temp.values
                  })
df = df.sort_values(['Species'], ascending=False)
plt.figure(figsize = (6,6))
plt.title('Species distribution - grouped on Dolphins and Whales - train dataset')
sns.set_color_codes("pastel")
s = sns.barplot(x = 'Classes', y="Species", data=df)
plt.xticks(rotation=90)
plt.show()

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(7, 7))
sns.kdeplot(np.log(train_df.loc[train_df["class"]=='whale'].individual_id.value_counts()))
sns.kdeplot(np.log(train_df.loc[train_df["class"]=='dolphin'].individual_id.value_counts()))
ax.legend(labels=['whale', 'dolphin'])
plt.title("Logarithmic distribution of individual_id frequency in images")
plt.show()
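
The quantity behind the density plot above is the number of images per `individual_id`, on a log scale. The sketch below computes it on a made-up toy frame; the helper name `images_per_individual` and the rows are assumptions for illustration only. It also reports the share of individuals with a single image, which is the part of the curve that sits at zero after taking the log.

```python
import numpy as np
import pandas as pd

# Made-up rows: whale 'a' has 4 images, whale 'b' has 1; dolphin 'c' has 2, 'd' has 1.
toy = pd.DataFrame({
    'individual_id': ['a', 'a', 'a', 'a', 'b', 'c', 'c', 'd'],
    'class': ['whale'] * 5 + ['dolphin'] * 3,
})

def images_per_individual(df: pd.DataFrame, cls: str) -> pd.Series:
    """Hypothetical helper: number of images for each individual of one class."""
    return df.loc[df['class'] == cls, 'individual_id'].value_counts()

whale_counts = images_per_individual(toy, 'whale')
log_counts = np.log(whale_counts)             # an individual with one image maps to 0
singleton_share = (whale_counts == 1).mean()  # fraction of individuals seen only once

print(whale_counts)
print(singleton_share)
```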

## Sample Images

In [None]:
def plot_image_samples(species):
    """Show up to 16 training images of the given species in a 4x4 grid."""
    root_path = "/kaggle/input/happy-whale-and-dolphin/"
    images_folder = "train_images/"
    df = train_df[train_df['species'] == species].reset_index(drop=True)

    fig, ax = plt.subplots(4, 4, figsize=(16, 16))
    # Adjust spacing on the figure created here, not on one left over from an earlier cell.
    fig.subplots_adjust(hspace=.1, wspace=.1)

    for i, axis in enumerate(ax.flat):
        axis.axis('off')
        if i >= len(df):
            continue  # fewer than 16 images for this species: leave the cell blank
        file = df.loc[i, 'image']
        identifier = df.loc[i, 'individual_id']
        # OpenCV loads images as BGR; convert to RGB for matplotlib.
        img = cv2.imread(root_path + images_folder + file)
        axis.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        axis.set_title(identifier + " (" + species + ")")

In [None]:
plot_image_samples("bottlenose_dolphin")
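
The plotting function above always takes the first images listed for a species. One possible variation, sketched below, is to draw a random sample and cap it at the number of rows available, so that species with few images do not raise an error. The helper name `sample_image_paths`, its parameters, and the toy rows are hypothetical and not part of the original notebook; no image files are read here.

```python
import os
import pandas as pd

def sample_image_paths(df: pd.DataFrame, species: str, root: str,
                       n: int = 16, seed: int = 0) -> list:
    """Hypothetical helper: paths of up to n randomly chosen images of one species."""
    subset = df[df['species'] == species]
    # Never ask for more rows than exist; a fixed seed keeps the choice reproducible.
    picked = subset.sample(n=min(n, len(subset)), random_state=seed)
    return [os.path.join(root, name) for name in picked['image']]

# Made-up rows standing in for train_df.
toy = pd.DataFrame({
    'image': ['a.jpg', 'b.jpg', 'c.jpg'],
    'species': ['beluga_whale', 'beluga_whale', 'killer_whale'],
})

paths = sample_image_paths(toy, 'beluga_whale', root='train_images')
print(paths)
```

The returned paths can then be passed to `cv2.imread` one by one, as in the function above.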

# Thank you for visiting this notebook!

If you like this notebook, remember that clicking the upvote button is <b>FREE</b>.

If you have any feedback or comments, please leave them in the comment section below.

# References

[Happy Whales and Dolphins](https://www.kaggle.com/gpreda/happy-whales-and-dolphins/notebook)<br>
[Read, Display and Write an Image using OpenCV](https://learnopencv.com/read-display-and-write-an-image-using-opencv/)