# iWildCam-2019
### Categorize animals in the wild

In this kernel we will build a convolutional neural network (CNN) with Keras to classify images of wild animals. The images are captured by WildCams, which collect photos in large quantities; biologists then use these photos to monitor biodiversity and animal population density.

We will follow the steps below in this kernel:
1. Import libraries
2. Import dataset
3. Create and train model
4. Analyse the results
5. Make predictions and submission

Let's get started.

### 1. Import libraries

In [None]:
# For file manipulation
import os

# For data manipulation
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt

# For our CNN model
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D

### 2. Import dataset

Because the full dataset is large and takes a long time to load, I use the downsized data from another awesome [kernel](https://www.kaggle.com/xhlulu/reducing-image-sizes-to-32x32) by [xhlulu](https://www.kaggle.com/xhlulu). Click "Add Dataset" at the top of the page to add the data to your kernel.

In [None]:
# Loading the train and test data
x_train = np.load('../input/reducing-image-sizes-to-32x32/X_train.npy')
x_test = np.load('../input/reducing-image-sizes-to-32x32/X_test.npy')
y_train = np.load('../input/reducing-image-sizes-to-32x32/y_train.npy')

In [None]:
# Preprocessing the image data
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255.
x_test /= 255.
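
The scaling step above can be tried without the Kaggle files. The sketch below applies the same two operations to a small synthetic batch; the array `fake_images` and its shape are assumptions made for illustration, not part of the dataset.

```python
import numpy as np

# Hypothetical stand-in for the real data: 4 RGB images of 32x32 pixels
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# Same preprocessing as above: cast to float32, then scale to [0, 1]
scaled = fake_images.astype('float32')
scaled /= 255.

print(scaled.dtype, scaled.min(), scaled.max())
```

Casting before dividing matters: an in-place true division on a `uint8` array would raise an error, because the float result cannot be stored back into an integer array.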

In [None]:
# Defining the required variables
batch_size = 64
num_classes = 14
epochs = 30
val_split = 0.1
input_shape = x_train.shape[1:]
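
For intuition about `val_split`: Keras documents `validation_split` as holding out the last fraction of the samples, taken before any shuffling. The sketch below approximates that behaviour with NumPy slicing; `split_last_fraction` is a hypothetical helper written for this example, not a Keras function.

```python
import numpy as np

def split_last_fraction(x, y, val_split):
    """Hold out the last `val_split` fraction of the samples for validation."""
    split_at = int(len(x) * (1.0 - val_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

# Toy data: 100 samples whose value equals their position
x = np.arange(100).reshape(100, 1)
y = np.arange(100)

(x_tr, y_tr), (x_val, y_val) = split_last_fraction(x, y, val_split=0.1)
print(len(x_tr), len(x_val))
```

Because the held-out samples are simply the tail of the array, data that is ordered (for example by class or by capture time) may give a validation set that is not representative; shuffling the arrays beforehand is one way to avoid that.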

### 3. Create and train model

We will use Keras to build a CNN and train it on our training data. Many established architectures would likely give better accuracy on both the training and validation sets. Here, we will use a simple architecture: two convolutional blocks followed by two fully connected layers.

In [None]:
def baseline_model():
    """Build a small CNN: two conv blocks, two dense layers, softmax output."""
    model = Sequential()
    model.add(Conv2D(32, (3, 3), padding='same', input_shape=input_shape))
    model.add(Activation('relu'))
    model.add(Conv2D(32, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    #model.add(Dropout(0.25))

    model.add(Conv2D(64, (3, 3), padding='same'))
    model.add(Activation('relu'))
    model.add(Conv2D(64, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    #model.add(Dropout(0.25))

    model.add(Flatten())
    
    model.add(Dense(1024))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    
    model.add(Dense(1024))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    
    model.add(Dense(num_classes))
    model.add(Activation('softmax'))
    
    return model
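
To see how the feature maps shrink through this architecture, the sketch below traces the spatial size layer by layer in plain Python. It assumes 32x32 inputs (suggested by the dataset name, not verified here), stride-1 convolutions, and non-overlapping 2x2 pooling; `conv_out` and `pool_out` are hypothetical helpers.

```python
def conv_out(size, kernel=3, padding='valid'):
    """Spatial size after a stride-1 convolution."""
    return size if padding == 'same' else size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool

size = 32   # assumed input height and width
trace = []  # spatial size after each conv/pool layer

for block in range(2):
    # Each block: a 'same' conv, a 'valid' conv, then 2x2 max pooling
    for padding in ('same', 'valid'):
        size = conv_out(size, padding=padding)
        trace.append(size)
    size = pool_out(size)
    trace.append(size)

# The second block ends with 64 channels, so Flatten yields size*size*64 values
flattened = size * size * 64
# Weights plus biases of the first Dense(1024) layer
dense_params = flattened * 1024 + 1024

print(trace, flattened, dense_params)
```

Under these assumptions most of the model's parameters sit in the first dense layer, which is one reason dropout is applied after the dense layers.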

In [None]:
model = baseline_model()

In [None]:
# Compiling the model
model.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)
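
The `categorical_crossentropy` loss expects one-hot encoded targets and a probability vector per sample. As a rough illustration of what it computes, here is a NumPy sketch; the function below is a simplified re-implementation written for this example (including the clipping constant `eps`), not the Keras source.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    # Clip to avoid log(0) for confident wrong predictions
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_sample = -np.sum(y_true * np.log(y_pred), axis=1)
    return float(np.mean(per_sample))

# Two toy samples with three classes each
y_true = np.array([[1., 0., 0.],
                   [0., 1., 0.]])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])

loss = categorical_crossentropy(y_true, y_pred)
print(loss)
```

Only the probability assigned to the true class contributes to each sample's loss, so the value here is the average of `-log(0.7)` and `-log(0.8)`.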

In [None]:
# Training the model
hist = model.fit(
    x_train, 
    y_train,
    batch_size=batch_size,
    epochs=epochs,
    validation_split=val_split,
    shuffle=True
)

### 4. Analyse the results

In [None]:
history = hist.history

fig, ax = plt.subplots(2)

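# The accuracy key is 'acc' in older Keras releases and 'accuracy' in newer ones
acc_key = 'acc' if 'acc' in history else 'accuracy'
ax[0].plot(history[acc_key])
ax[0].plot(history['val_' + acc_key])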
ax[0].legend(['training accuracy', 'validation accuracy'])

ax[1].plot(history['loss'])
ax[1].plot(history['val_loss'])
ax[1].legend(['training loss', 'validation loss'])

for axs in ax.flat:
    axs.label_outer()
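
Besides reading the curves by eye, the history dictionary can be queried directly, for example to find the epoch with the lowest validation loss. The sketch below uses a made-up history so that it runs on its own; `best_epoch` and the values in `fake_history` are hypothetical.

```python
def best_epoch(history, key='val_loss'):
    """Return the 1-based epoch with the lowest value of `key`, and that value."""
    values = history[key]
    best = min(range(len(values)), key=values.__getitem__)
    return best + 1, values[best]

# Made-up numbers for illustration only, not results from this model
fake_history = {
    'loss': [1.2, 0.9, 0.7, 0.5],
    'val_loss': [1.3, 1.0, 0.95, 1.1],
}

epoch, value = best_epoch(fake_history)
print(epoch, value)
```

A validation loss that rises while the training loss keeps falling, as in this toy data after epoch 3, is a common sign of overfitting.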

### 5. Make predictions and submission

In [None]:
y_test = model.predict(x_test)

submission_df = pd.read_csv('../input/iwildcam-2019-fgvc6/sample_submission.csv')
submission_df['Predicted'] = y_test.argmax(axis=1)
print(submission_df.shape)
submission_df.head()
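
The `argmax(axis=1)` call turns each row of predicted probabilities into the index of the most likely column. The sketch below shows this on a toy array. It also shows a lookup step that would be needed if the label columns corresponded to non-contiguous category IDs; whether that applies to this dataset is an assumption to check, and `class_ids` is a hypothetical mapping.

```python
import numpy as np

# Toy softmax outputs: 3 samples, 3 classes
probs = np.array([[0.1, 0.7, 0.2],
                  [0.5, 0.3, 0.2],
                  [0.2, 0.2, 0.6]])

# Column index of the highest probability in each row
pred = probs.argmax(axis=1)

# Hypothetical mapping from column position to original category ID
class_ids = np.array([0, 3, 8])
mapped = class_ids[pred]

print(pred, mapped)
```

If the one-hot columns were built from sorted category IDs, keeping that sorted list alongside the model makes this lookup straightforward.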

In [None]:
submission_df.to_csv('submission.csv', index=False)
# history_df.to_csv('history.csv', index=False)

# with open('history.json', 'w') as f:
#     json.dump(hist.history, f)