# What to localize?
#### To keep the background out of the classification process, we need to localize the whale/dolphin in each image.

But that's not all we want: we also want to crop a normalized fin image and use it like a fingerprint. So do we have to **localize the whale/dolphin**, **localize the fin**, or both? That is where the problem starts. For some species the images show only the fin, so you can't localize the whole whale/dolphin; other species have no fin, so there is nothing to localize a fin by.

I've been drawing bounding boxes to localize whales/dolphins and got confused a lot because of this. The problem is that some species carry individual traits not only on the fin but also on the body. To draw the bounding boxes properly, I categorized the species by whether or not they can be localized by their body.
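
Once a box has been drawn, cropping comes down to slicing the image array. The sketch below is only an illustration, not the annotation pipeline used here: it assumes a box stored as `(x, y, w, h)` in pixels, and `clamp_box` is a hypothetical helper that clips such a box to the image so the slice never runs out of bounds.

```python
def clamp_box(box, img_w, img_h):
    """Clip an (x, y, w, h) box to the image and return (x0, y0, x1, y1).

    The (x, y, w, h) layout is an assumption made for this sketch.
    """
    x, y, w, h = box
    x0 = max(0, min(int(x), img_w))
    y0 = max(0, min(int(y), img_h))
    x1 = max(x0, min(int(x + w), img_w))
    y1 = max(y0, min(int(y + h), img_h))
    return x0, y0, x1, y1


# A box that sticks out of a 64x48 image on the left and at the bottom.
x0, y0, x1, y1 = clamp_box((-10, 20, 100, 500), img_w=64, img_h=48)
print(x0, y0, x1, y1)
```

For an image loaded as a NumPy array (rows first, as `cv2.imread` returns it), the crop would then be `img[y0:y1, x0:x1]`.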


In [None]:
import pandas as pd

train_df = pd.read_csv("../input/happy-whale-and-dolphin/train.csv")
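# Merge duplicate labels and fix misspelled species names.
# Assign the result back rather than calling replace(..., inplace=True) on the
# column, which may not update train_df under pandas copy-on-write.
train_df["species"] = train_df["species"].replace({"globis": "short_finned_pilot_whale",
                                                   "pilot_whale": "short_finned_pilot_whale",
                                                   "kiler_whale": "killer_whale",
                                                   "bottlenose_dolpin": "bottlenose_dolphin"})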

In [None]:
# Map each species name to the Series of its image file names
img_dic = dict()
species_list = train_df['species'].unique()
for species in species_list:
    img_dic[species] = train_df.loc[train_df['species'] == species, 'image']

In [None]:
import random
import cv2
import matplotlib.pyplot as plt

# Randomly display up to plot_col * plot_row images (10 by default) of one species
def show_species(species, plot_col=5, plot_row=2):
    images = img_dic[species]
    num_img = len(images)
    print(f"{species} images: {num_img}")
    # Cap the sample size so a species with few images doesn't raise ValueError
    num_show = min(num_img, plot_col * plot_row)
    idx = random.sample(range(num_img), num_show)
    plt.figure(figsize=(24, 5))
    for i, img_name in enumerate(images.iloc[idx]):
        img_path = f"../input/happy-whale-and-dolphin/train_images/{img_name}"
        img = cv2.imread(img_path)[..., ::-1]  # OpenCV loads BGR; reverse to RGB
        plt.subplot(plot_row, plot_col, i + 1)
        plt.title(img_name)
        plt.axis("off")
        plt.imshow(img)
    plt.show()
    return num_img

# No fin
#### Localizing body is necessary.
Images in this group show no distinct fin.
* beluga: roughly 80-90% of the images are body-localized.
* southern_right_whale: check [here](https://www.andersoncabotcenterforoceanlife.org/rightwhales/right-whales/identifying-right-whales-2/) for more information about identifying right whales.

In [None]:
no_fin_list = ['beluga',
               'gray_whale',
               'southern_right_whale'
              ]

In [None]:
num_all = 0
for species in no_fin_list:
    num_all += show_species(species)
print(f"all images: {num_all}")

# Fin + Body
#### Localizing body would help.
Images in this group show both the fin and the body.
* blue_whale: roughly 80-90% of the images are body-localized.

In [None]:
fin_body_list = ['blue_whale',
                 'cuviers_beaked_whale',
                 'fin_whale',
                 'humpback_whale'
                ]

In [None]:
num_all = 0
for species in fin_body_list:
    num_all += show_species(species)
print(f"all images: {num_all}")

# Fin but body..?
#### Localizing body might not help.
Images in this group show the fin; some also show the body, but unfortunately others do not.

In [None]:
fin_or_body_list = ['brydes_whale',
                    'killer_whale',
                    'minke_whale',
                    'sei_whale'
                   ]

In [None]:
num_all = 0
for species in fin_or_body_list:
    num_all += show_species(species)
print(f"all images: {num_all}")

# Fin
#### Localizing body won't help.
Images in this group show only the fin. (Some might show the body, but it still won't help the classification process.)
* commersons_dolphin: all of the images seem consistent, so cropping might not be necessary
* dusky_dolphin: all of the images seem to be fin-localized

In [None]:
fin_list = ['bottlenose_dolphin',
            'commersons_dolphin',
            'common_dolphin',
            'dusky_dolphin',
            'false_killer_whale',
            'frasiers_dolphin',
            'long_finned_pilot_whale',
            'melon_headed_whale',
            'pantropic_spotted_dolphin',
            'pygmy_killer_whale',
            'rough_toothed_dolphin',
            'short_finned_pilot_whale',
            'spinner_dolphin',
            'spotted_dolphin',
            'white_sided_dolphin'
           ]

In [None]:
num_all = 0
for species in fin_list:
    num_all += show_species(species,plot_row=1)
print(f"all images: {num_all}")
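
To use the four groups when drawing bounding boxes, it may help to turn them into a single lookup from species to localization target, and to check that no species is listed twice or left out. The following is a minimal sketch under assumptions: `build_target_map` and `unassigned_species` are hypothetical helpers, and the target names (`"body"`, `"fin_and_body"`, `"fin"`) are labels made up for the example.

```python
def build_target_map(groups):
    """Map each species to the name of the group it was assigned to.

    `groups` is a dict of {target_name: list_of_species}. Raises ValueError
    if a species appears in more than one group.
    """
    target_map = {}
    for target, species_in_group in groups.items():
        for species in species_in_group:
            if species in target_map:
                raise ValueError(
                    f"{species} is in both {target_map[species]!r} and {target!r}"
                )
            target_map[species] = target
    return target_map


def unassigned_species(all_species, target_map):
    """Return, in sorted order, the species that have no group yet."""
    return sorted(set(all_species) - set(target_map))


# Small demo with a few species taken from the lists above; the target
# names are hypothetical labels chosen for this sketch.
demo_groups = {
    "body": ["beluga", "gray_whale"],
    "fin_and_body": ["blue_whale"],
    "fin": ["dusky_dolphin"],
}
demo_map = build_target_map(demo_groups)
print(demo_map["beluga"])
print(unassigned_species(["beluga", "killer_whale"], demo_map))
```

In this notebook the real input would be the four lists (`no_fin_list`, `fin_body_list`, `fin_or_body_list`, `fin_list`), with `train_df['species'].unique()` passed as `all_species` to see whether any species is still missing from the categorization.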