# **Introduction**

After centuries of intense whaling, recovering whale populations still have a hard time adapting to warming oceans and struggle every day to compete with industrial fishing for food.

To aid whale conservation efforts, scientists use photo surveillance systems to monitor ocean activity. They use the shape of whales’ tails and unique markings found in footage to identify what species of whale they’re analyzing and meticulously log whale pod dynamics and movements. For the past 40 years, most of this work has been done manually by individual scientists, leaving a huge trove of data untapped and underutilized.

In this competition, we’re challenged to build an algorithm to identify individual whales in images. We’ll analyze Happywhale’s database of over 25,000 images, gathered from research institutions and public contributors. By contributing, we’ll help to open rich fields of understanding for marine mammal population dynamics around the globe.

# **Available Data**

This training data contains thousands of images of humpback whale flukes. Individual whales have been identified by researchers and given an Id. The challenge is to predict the whale Id of images in the test set. What makes this such a challenge is that there are only a few examples for each of 3,000+ whale Ids.

# File descriptions

* **train.zip** - a folder containing the training images
* **train.csv** - maps the training Image to the appropriate whale Id. Whales that are not predicted to have a label identified in the training data should be labeled as new_whale.
* **test.zip** - a folder containing the test images to predict the whale Id


# Part 1 - Keras Preprocessing

In [None]:
import numpy as np
import pandas as pd
from PIL import Image
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
from IPython.display import HTML
import os
print(os.listdir("../input"))

%matplotlib inline

df=pd.read_csv('../input/train.csv')
df.head()

In the CSV file, the 'Image' column holds the file name of each photo in train.zip, and the 'Id' column holds the identity label of the whale shown in that image. Whales that do not have an identified label are represented as new_whale.

In [None]:
df.count()

Let's add a new column into the data frame which indicates the path of each file.

In [None]:
df['Path']=df['Image'].map(lambda x:'../input/train/{}'.format(x))
df.head()

The 'Id' column is categorical, since it is the label for each image: it identifies the individual whale that each training image shows. Because machine learning models need numerical data for processing, we have to encode these categorical values as numbers.

In [None]:
df['Id'].nunique()

In [None]:
df['Id'].value_counts().head(20)

Let's open two random whale images from the training data.

In [None]:
random_whale=np.random.choice(df['Path'],2)
for whale in random_whale:
    image=Image.open(whale)
    plt.imshow(image)
    plt.show()

Now let's prepare the data for the Keras CNN. We create x_train and y_train, which will be passed to Keras to train the model. x_train will contain all the images in the training set, and y_train will contain the corresponding Id (label) of each whale image. img_to_array converts a PIL image instance to a NumPy array. All images are resized to the same target size when they are loaded.

In [None]:
from keras.preprocessing import image
from keras.applications.imagenet_utils import preprocess_input

def add_img(dataset,shape,img_size):
    
    x_train = np.zeros((shape, img_size[0], img_size[1], img_size[2]))
    count = 0
    
    for fig in dataset.itertuples():
        
        #load train data images into images of specified size
        img = image.load_img(fig.Path, target_size=img_size)
        x = image.img_to_array(img)
        x = preprocess_input(x)
        x_train[count] = x
        count += 1
    
    return x_train
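
As a rough check on how much memory this preallocation needs, the sketch below computes the size of the array without actually creating it. The image count of 25,000 is an assumed round figure based on the dataset size quoted in the introduction, and `estimate_bytes` is a hypothetical helper that is not part of the notebook.

```python
import numpy as np

def estimate_bytes(n_images, img_size, dtype=np.float64):
    """Bytes needed for an (n_images, H, W, C) array of the given dtype."""
    n_values = n_images * img_size[0] * img_size[1] * img_size[2]
    return n_values * np.dtype(dtype).itemsize

n_images = 25000          # assumed round figure, not the exact row count of train.csv
img_size = (128, 128, 3)  # same target size the notebook uses for training

bytes_f64 = estimate_bytes(n_images, img_size)              # np.zeros defaults to float64
bytes_f32 = estimate_bytes(n_images, img_size, np.float32)  # half the size per value

print("float64: {:.2f} GiB".format(bytes_f64 / 2**30))
print("float32: {:.2f} GiB".format(bytes_f32 / 2**30))
```

If memory is tight, passing `dtype=np.float32` to `np.zeros` in `add_img` would halve the footprint.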

Now let's prepare y_train. It contains the label (the whale Id) of each image in x_train. The labels are categorical values, so they need to be numerically encoded. For this we use LabelEncoder from scikit-learn to map each Id to an integer, and to_categorical from Keras to one-hot encode those integers.

In [None]:
from sklearn.preprocessing import LabelEncoder
from keras.utils.np_utils import to_categorical
def label(y):
    y_train=np.array(y)
    label_encoder = LabelEncoder()
    y_train = label_encoder.fit_transform(y_train)
    y_train = to_categorical(y_train, num_classes = 5005)
    return y_train,label_encoder
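
To make the two encoding steps concrete, here is a minimal NumPy-only sketch of the same idea on a toy list of labels. It is an illustration under assumptions, not the scikit-learn or Keras implementation: the Ids `w_a` and `w_b` are made up, and `np.unique` stands in for the integer encoding while `np.eye` stands in for the one-hot step.

```python
import numpy as np

# Hypothetical whale Ids, for illustration only
ids = np.array(["w_b", "new_whale", "w_a", "w_b"])

# Step 1: integer-encode. `classes` is the sorted list of distinct labels,
# `encoded` holds the index of each label within `classes`.
classes, encoded = np.unique(ids, return_inverse=True)

# Step 2: one-hot encode by picking rows of an identity matrix.
one_hot = np.eye(len(classes))[encoded]

# Decoding reverses both steps: argmax recovers the integer, `classes` the label.
decoded = classes[one_hot.argmax(axis=1)]

print(classes)
print(encoded)
print(one_hot)
```

Decoding with `argmax` here plays the same role that `inverse_transform` plays for the fitted encoder when the submission is built.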

In [None]:
x_train=add_img(df,df.shape[0],(128,128,3))
y_train,encoder=label(df['Id'])
x_train/=255 #Normalizing the data

In [None]:
y_train.shape

In [None]:
# Importing the Keras packages
from keras.models import Sequential
from keras.layers import Convolution2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers.normalization import BatchNormalization
from keras.preprocessing.image import ImageDataGenerator

# Initialising the CNN

In [None]:
classifier = Sequential()

# Step 1 - Convolution

In [None]:
classifier.add(Convolution2D(64, (5, 5), input_shape = (128,128, 3), activation = 'relu'))  # 64 filters with a 5x5 kernel

# Step 2 - Pooling

In [None]:
classifier.add(MaxPooling2D(pool_size = (2, 2)))

# Adding a second convolutional layer

Convolution is the first layer used to extract features from an input image. It preserves the spatial relationship between pixels by learning image features from small squares of input data. It is a mathematical operation that takes two inputs: an image matrix and a filter (or kernel).
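
A minimal NumPy sketch of that operation on a single-channel toy image follows. It assumes no padding and a stride of 1, and, like most deep learning libraries, it slides the kernel without flipping it (strictly speaking a cross-correlation). `conv2d_valid` is a hypothetical helper written for illustration, not a Keras function.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding and sum the elementwise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 "image" whose values grow left to right
kernel = np.array([[1.0, 0.0, -1.0]] * 3)          # 3x3 filter: left column minus right column

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)
print(feature_map)
```

A 5x5 input and a 3x3 kernel give a 3x3 feature map, which is why each convolution in the network shrinks the image slightly.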

Max pooling is a type of operation that is typically added to CNNs following individual convolutional layers. When added to a model, max pooling reduces the dimensionality of images by reducing the number of pixels in the output from the previous convolutional layer.
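
The reduction can be sketched in NumPy as follows, assuming a 2x2 window with a stride of 2 and a single channel. `max_pool_2x2` is a hypothetical helper for illustration only.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; an odd trailing row or column is dropped."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    # Split into (h2, 2, w2, 2) blocks and take the maximum inside each 2x2 block
    blocks = trimmed.reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 5],
                        [0, 1, 7, 2],
                        [3, 6, 2, 8]])

pooled = max_pool_2x2(feature_map)
print(pooled)
```

Each 2x2 block is replaced by its largest value, so the 4x4 input becomes a 2x2 output.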

Dropout is a technique where randomly selected neurons are ignored during training. They are “dropped out” randomly. This means that their contribution to the activation of downstream neurons is temporarily removed on the forward pass, and no weight updates are applied to those neurons on the backward pass. As a neural network learns, neuron weights settle into their context within the network. Weights of neurons are tuned for specific features, providing some specialization. Neighboring neurons come to rely on this specialization, which, if taken too far, can result in a fragile model that is too specialized to the training data. This reliance on context for a neuron during training is referred to as complex co-adaptation. You can imagine that if neurons are randomly dropped out of the network during training, other neurons will have to step in and handle the representation required to make predictions for the missing neurons. This is believed to result in multiple independent internal representations being learned by the network. The effect is that the network becomes less sensitive to the specific weights of individual neurons. This in turn results in a network that generalizes better and is less likely to overfit the training data.
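
The forward pass of dropout can be sketched in a few lines of NumPy. This uses the common "inverted dropout" formulation, in which the surviving activations are scaled up during training so that nothing needs to change at prediction time. `dropout_forward` and the toy activations are assumptions made for illustration.

```python
import numpy as np

def dropout_forward(activations, rate, rng, training=True):
    """Zero each activation with probability `rate` and rescale the survivors."""
    if not training or rate == 0.0:
        return activations  # dropout is a no-op at prediction time
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True where the neuron is kept
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((1000, 100))  # toy activations, all equal to 1

dropped = dropout_forward(activations, rate=0.25, rng=rng)
print("fraction zeroed:", (dropped == 0).mean())
print("mean activation:", dropped.mean())
```

Roughly a quarter of the values are zeroed, while the mean activation stays close to 1 because of the rescaling.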

Flatten() flattens the output of the preceding layers into a one-dimensional vector and feeds it into a fully connected (FC) layer.

In [None]:
classifier.add(Convolution2D(64, (5, 5), activation = 'relu'))  # 64 filters with a 5x5 kernel
classifier.add(MaxPooling2D(pool_size = (2, 2)))
classifier.add(Dropout(0.25))

In [None]:
classifier.add(Convolution2D(32, (3, 3), activation = 'relu'))  # 32 filters with a 3x3 kernel
classifier.add(MaxPooling2D(pool_size = (2, 2)))
classifier.add(Dropout(0.25))

In [None]:
classifier.add(Convolution2D(16, (3, 3), activation = 'relu'))  # 16 filters with a 3x3 kernel
classifier.add(MaxPooling2D(pool_size = (2, 2)))
classifier.add(Dropout(0.3))

# Step 3 - Flattening

In [None]:
classifier.add(Flatten())
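
To see how long the flattened vector is, the sketch below traces the spatial size of a 128x128 input through the four convolution and pooling blocks. It assumes 5x5, 5x5, 3x3 and 3x3 kernels with no padding and a stride of 1, and 2x2 pooling with a stride of 2. `conv_out` and `pool_out` are hypothetical helpers, and `classifier.summary()` remains the authoritative source for the real shapes.

```python
def conv_out(size, kernel):
    """Output width/height of an unpadded, stride-1 convolution."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output width/height of pooling whose stride equals the pool size."""
    return size // pool

size = 128
layers = [(5, 64), (5, 64), (3, 32), (3, 16)]  # (kernel size, number of filters) per block

for kernel, filters in layers:
    size = pool_out(conv_out(size, kernel))
    print("after conv {0}x{0} + pool: {1}x{1}x{2}".format(kernel, size, filters))

flattened = size * size * layers[-1][1]
print("flattened length:", flattened)
```

Under these assumptions the last block outputs a 5x5x16 volume, so Flatten() hands a vector of 400 values to the first Dense layer.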

# Step 4 - Full connection

In [None]:
classifier.add(Dense(units = 240, activation = 'relu'))  # hidden fully connected layer
classifier.add(BatchNormalization())
classifier.add(Dense(units = y_train.shape[1], activation = 'sigmoid'))  # one output per whale Id

# Optimizer And Annealer

The Adam optimization algorithm is an extension to stochastic gradient descent that has recently seen broader adoption for deep learning applications in computer vision and natural language processing.

Stochastic gradient descent maintains a single learning rate (termed alpha) for all weight updates and the learning rate does not change during training.

Adam combines the benefits of two other extensions of stochastic gradient descent:

**Adaptive Gradient Algorithm** (AdaGrad) maintains a per-parameter learning rate, which improves performance on problems with sparse gradients (e.g. natural language and computer vision problems).

**Root Mean Square Propagation** (RMSProp) also maintains per-parameter learning rates, adapted based on the average of recent magnitudes of the gradients for the weight (i.e. how quickly it is changing). This means the algorithm does well on online and non-stationary problems.
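
The update rule behind these ideas can be sketched for a single weight as below. This is a simplified illustration, not the Keras implementation: `adam_step` is a hypothetical helper, and the toy objective (w - 3)^2 and the learning rate of 0.1 are assumptions chosen so the example converges quickly.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-8):
    """One Adam update for a single parameter; t is the 1-based step count."""
    m = beta_1 * m + (1 - beta_1) * grad       # running mean of the gradients
    v = beta_2 * v + (1 - beta_2) * grad ** 2  # running mean of the squared gradients
    m_hat = m / (1 - beta_1 ** t)              # bias correction for the zero initialisation
    v_hat = v / (1 - beta_2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # step size adapts to the gradient scale
    return w, m, v

# Toy problem (assumption): minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.1)

print(w)
```

The division by the root of `v_hat` is what gives every parameter its own effective learning rate.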

In [None]:
from keras.optimizers import Adam
from keras.callbacks import ReduceLROnPlateau

# Define the optimizer
adam_optimizer = Adam(lr = 0.001, beta_1 = 0.9, beta_2 = 0.999)

# Set a learning rate annealer
# Monitors training accuracy, because fit() below is given no validation data
learning_rate = ReduceLROnPlateau(monitor='acc', 
                                            patience=3, 
                                            verbose=1, 
                                            factor=0.5, 
                                            min_lr=0.00001)
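
The logic of such an annealer can be sketched in plain Python. This is a simplified illustration of the idea for a metric that should increase, not the actual Keras callback (which has further options such as a cooldown). `reduce_on_plateau` and the accuracy values are made up for the example.

```python
def reduce_on_plateau(metric_history, lr=0.001, factor=0.5, patience=3, min_lr=0.00001):
    """Return the learning rate in effect after each epoch."""
    best = float("-inf")
    wait = 0  # epochs since the metric last improved
    schedule = []
    for value in metric_history:
        if value > best:
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)  # never go below min_lr
                wait = 0
        schedule.append(lr)
    return schedule

# Hypothetical accuracy per epoch: two plateaus of three non-improving epochs each
history = [0.10, 0.20, 0.20, 0.20, 0.20, 0.25, 0.25, 0.25, 0.25]
schedule = reduce_on_plateau(history)
print(schedule)
```

With `patience=3` and `factor=0.5`, the learning rate is halved after each run of three epochs without improvement.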

# Compiling the CNN

In [None]:
classifier.compile(optimizer = adam_optimizer, loss = 'categorical_crossentropy', metrics = ['accuracy'])

In [None]:
classifier.summary()

# Part 2 - Fitting the CNN to the images

In [None]:
whale_detector = classifier.fit(x_train, y_train, epochs=60, batch_size=1000, verbose=10, callbacks=[learning_rate])

# Let's Predict on the Test Images

os.listdir() returns a list containing the names of the entries in the directory given by the path. We turn that list into a data frame, and the rest is the same as what we did for the training data.

In [None]:
test = os.listdir("../input/test/")
test_df = pd.DataFrame(test, columns=['Image'])
test_df['Path']=test_df['Image'].map(lambda x:'../input/test/{}'.format(x))
x_test=add_img(test_df,test_df.shape[0],(128,128,3)) #Must match the model's input_shape
x_test/=255 #Normalizing the data in place, as for x_train
pred=classifier.predict(x_test,verbose=1) #x_test is already a NumPy array

# Let's Plot the Loss and Accuracy Changes per Epoch

In [None]:
# Plot the loss curve for training
plt.plot(whale_detector.history['loss'], color='r', label="Train Loss")
plt.title("Train Loss")
plt.xlabel("Number of Epochs")
plt.ylabel("Loss")
plt.legend()
plt.show()

In [None]:
# Plot the accuracy curve for training
plt.plot(whale_detector.history['acc'], color='g', label="Train Accuracy")
plt.title("Train Accuracy")
plt.xlabel("Number of Epochs")
plt.ylabel("Accuracy")
plt.legend()
plt.show()

# Submission

In [None]:
test_df['Id']=''
for index,prediction in enumerate(pred):
    test_df.loc[index, 'Id'] = ' '.join(encoder.inverse_transform(prediction.argsort()[-5:][::-1]))
test_df.drop(['Path'],axis=1,inplace=True)
test_df.to_csv('submission.csv', index=False)
test_df.head()
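
The expression `prediction.argsort()[-5:][::-1]` used above picks the five highest-scoring classes, best first. The toy example below walks through it with made-up scores and hypothetical Ids. The `classes` array stands in for the fitted encoder's label list.

```python
import numpy as np

# Hypothetical Ids and prediction scores, for illustration only
classes = np.array(["new_whale", "w_a", "w_b", "w_c", "w_d", "w_e", "w_f"])
prediction = np.array([0.30, 0.05, 0.10, 0.25, 0.02, 0.20, 0.08])

order = prediction.argsort()  # class indices sorted by ascending score
top5 = order[-5:][::-1]       # last five indices, reversed so the best comes first

submission_value = " ".join(classes[top5])
print(top5)
print(submission_value)
```

The resulting space-separated string has the same form as the value written to the `Id` column of each submission row.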