**Objective**


In this notebook, we are going to predict the presence of invasive species from the pictures taken in a Brazilian national forest.

Check the input files we have

In [1]:
import numpy as np 
import pandas as pd 

from subprocess import check_output
print(check_output(["ls", "../input"]).decode("utf8"))

Let's look at the first 5 rows of the train_labels file

In [2]:
train_labels_df = pd.read_csv("../input/train_labels.csv")
train_labels_df.head()

In [3]:
train_labels_df.tail()

In [4]:
train_labels_df.describe() #describing train_labels_df

In [5]:
train_labels_df.invasive.value_counts() #finding how many invasive and not invasive samples in train data

Find how many images there are in the train and test folders

In [6]:
#Getting image names from both train and test folders
train_images_names = check_output(["ls", "../input/train/"]).decode("utf8")
train_images_names = train_images_names.split("\n")
test_images_names = check_output(["ls", "../input/test/"]).decode("utf8")
test_images_names = test_images_names.split("\n")
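#The ls output ends with a newline, so split("\n") leaves one empty string at the end; count only real names
print("Total train images", len([name for name in train_images_names if name]))
print("Total test images", len([name for name in test_images_names if name]))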
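Splitting the `ls` output on newlines is worth a quick check. The sketch below uses a made-up three-file listing (the file names are placeholders, not the real folder contents) to show the extra empty entry that `split("\n")` produces and that `splitlines()` avoids.

```python
# Simulated `ls` output: one file name per line, ending with a newline.
# The names are hypothetical and only illustrate the string handling.
listing = "1.jpg\n2.jpg\n3.jpg\n"

naive = listing.split("\n")   # keeps an empty string after the final newline
clean = listing.splitlines()  # drops it

print(len(naive), naive)
print(len(clean), clean)
```

Either `splitlines()` or `os.listdir` would give the file count directly, without the off-by-one.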

Create the test dataframe for submission from the test image names

In [7]:
test_df = pd.DataFrame()
test_df["name"] = [test_image.split(".")[0] for test_image in test_images_names]
test_df.head()

Let's look at sample images of both classes now.

**3.jpg** shows an invasive species and **4.jpg** does not.

In [8]:
%pylab inline
import os
import random

#scipy.misc.imread is only available in older SciPy releases (it was removed in SciPy 1.2)
from scipy.misc import imread
print("See train images with invasive and without invasive species")
print("3.jpg - With Invasive species")
img = imread("../input/train/3.jpg")
imshow(img)

In [9]:
print("4.jpg - Non-Invasive species")
img1 = imread("../input/train/4.jpg")
imshow(img1)

In [10]:
#Directories used below (os, random and imread were already imported above)
root_dir = os.path.abspath('.')
data_dir = '../input/'

Randomly select a training image and display it

In [11]:
i = random.choice(train_labels_df.index)

img_name = str(train_labels_df.name[i])+".jpg"
img = imread(os.path.join(data_dir, 'train', img_name))

imshow(img)
print("Image",img_name)
print("Invasive", train_labels_df.invasive[i])

The images are not all the same size, but the network needs a fixed input shape and a single numpy array can only hold equally sized images. So we need to resize all the images to the same size.

Load all the images and resize them into a single numpy array.

In [12]:
#Resizing train images
#scipy.misc.imresize is only available in older SciPy releases (it was removed in SciPy 1.3)
from scipy.misc import imresize

temp = []


for img_name in train_labels_df.name:
    img_path = os.path.join(data_dir, 'train', str(img_name)+".jpg")
    img = imread(img_path)
    img = imresize(img, (32, 32))

    img = img.astype('float32')
    temp.append(img)
train_x = np.stack(temp)
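To illustrate why resizing has to come before stacking, here is a minimal NumPy-only sketch. The helper `resize_nearest` is a hypothetical stand-in that uses nearest-neighbour sampling; it is not what `imresize` does internally, and the two blank images are made-up inputs.

```python
import numpy as np

def resize_nearest(img, size):
    """Nearest-neighbour resize of an (H, W, C) array to (size[0], size[1], C)."""
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]   # source row for each output row
    cols = np.arange(size[1]) * w // size[1]   # source column for each output column
    return img[rows][:, cols]

# Two hypothetical images with different sizes; np.stack would fail on these as-is.
images = [
    np.zeros((40, 60, 3), dtype=np.uint8),
    np.zeros((50, 50, 3), dtype=np.uint8),
]

resized = [resize_nearest(im, (32, 32)).astype("float32") for im in images]
batch = np.stack(resized)
print(batch.shape)
```

Once every image has the same shape, `np.stack` can combine them into one array with a leading sample axis.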

In [13]:
print(test_df.tail()) #The last row has an empty name (left over from the trailing newline in the ls output)
test_df = test_df[test_df.name != ""] #So drop rows with an empty name from the test dataframe
print(test_df.tail())

In [14]:
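#Resizing test images
temp = []
for img_name in test_df.name:
    img_path = os.path.join(data_dir, 'test', str(img_name)+".jpg")
    try:
        img = imread(img_path)
        img = imresize(img, (32, 32))

        img = img.astype('float32')
        temp.append(img)
    except (IOError, OSError) as err:
        #A skipped image would leave test_x shorter than test_df, so report it instead of failing silently
        print("Could not read", img_path, err)
        continue
test_x = np.stack(temp)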

One more step can help us build a better model: normalizing the images. Scaling the pixel values to the range [0, 1] usually makes training faster and more stable.

In [15]:
train_x = train_x / 255
test_x = test_x / 255
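As a small check of what the division does, the sketch below scales a made-up array of pixel values (not real image data) and confirms that the result lies in [0, 1] and stays `float32`.

```python
import numpy as np

# Hypothetical pixel values covering the full 8-bit range.
pixels = np.array([[0, 64, 128, 255]], dtype="float32")

scaled = pixels / 255   # 0 maps to 0.0 and 255 maps to 1.0
print(scaled.min(), scaled.max(), scaled.dtype)
```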

Let's see the distribution of invasive images in our data

In [16]:
train_labels_df.invasive.value_counts(normalize=True)
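The class proportions also give a useful reference point: a model that always predicts the majority class reaches an accuracy equal to the larger proportion. A sketch with made-up labels (not the real training labels):

```python
import numpy as np

# Hypothetical 0/1 labels standing in for the `invasive` column.
labels = np.array([1, 1, 0, 1, 0, 1, 1, 0])

proportions = np.bincount(labels) / len(labels)   # share of class 0 and class 1
baseline_accuracy = proportions.max()             # accuracy of always predicting the majority class
print(proportions, baseline_accuracy)
```

Any trained model should beat this number on held-out data to be considered useful.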

First we should specify all the parameters we will be using in our network

In [17]:
import keras
from sklearn.preprocessing import LabelEncoder

lb = LabelEncoder()
train_y = lb.fit_transform(train_labels_df.invasive)
train_y = keras.utils.to_categorical(train_y) #one-hot encode the labels

input_num_units = (32, 32, 3)
hidden_num_units = 500
output_num_units = 2

epochs = 5
batch_size = 128
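The label preparation above ends with one row per image and one column per class. A minimal NumPy sketch of the same one-hot idea, using made-up labels and not the Keras helper itself:

```python
import numpy as np

labels = np.array([1, 0, 1, 1])   # hypothetical encoded labels (0 = not invasive, 1 = invasive)
num_classes = 2

# Row k of the identity matrix is the one-hot vector for class k.
one_hot = np.eye(num_classes, dtype="float32")[labels]
print(one_hot)
```

This two-column format is what a two-unit softmax output layer with `categorical_crossentropy` expects as targets.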

In [18]:
#Import the necessary keras modules
from keras.models import Sequential
from keras.layers import Dense, Flatten, InputLayer

#Define our network
model = Sequential([
  InputLayer(input_shape=input_num_units),
  Flatten(),
  Dense(units=hidden_num_units, activation='relu'),
  Dense(units=output_num_units, activation='softmax'),
])
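To make the three layers concrete, here is a rough NumPy sketch of the forward pass they describe: flatten, a ReLU dense layer, then a softmax dense layer. The weights are random placeholders and the input is random data, so only the shapes and the softmax property are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Random placeholder weights with the same shapes as the two Dense layers.
W1 = rng.normal(scale=0.01, size=(32 * 32 * 3, 500))
b1 = np.zeros(500)
W2 = rng.normal(scale=0.01, size=(500, 2))
b2 = np.zeros(2)

x = rng.random((4, 32, 32, 3))      # 4 hypothetical normalized images
flat = x.reshape(len(x), -1)        # Flatten: (4, 3072)
hidden = relu(flat @ W1 + b1)       # Dense + ReLU: (4, 500)
probs = softmax(hidden @ W2 + b2)   # Dense + softmax: (4, 2)
print(probs.shape, probs.sum(axis=1))
```

Each output row is a pair of class probabilities that sums to 1.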

To see what our model looks like, let's print a summary

In [19]:
model.summary()
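The parameter counts in the summary can be checked by hand: each dense layer has one weight per input-output pair plus one bias per output unit. A short worked calculation for the layer sizes defined above:

```python
input_units = 32 * 32 * 3   # flattened 32x32 RGB image
hidden_units = 500
output_units = 2

hidden_params = input_units * hidden_units + hidden_units    # weights + biases
output_params = hidden_units * output_units + output_units   # weights + biases
total_params = hidden_params + output_params
print(hidden_params, output_params, total_params)
```

Almost all of the parameters sit in the first dense layer, because every one of the 3072 input values connects to every hidden unit.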

In [20]:
#Compile and train our network
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_x, train_y, batch_size=batch_size,epochs=epochs,verbose=1)
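For intuition about the loss named in `compile`, the sketch below computes categorical cross-entropy by hand for a made-up batch. It is an illustrative NumPy version under simple assumptions (including the clipping constant `eps`), not the Keras implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred)); eps avoids log(0)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[1.0, 0.0], [0.0, 1.0]])      # hypothetical one-hot targets
uncertain = np.array([[0.5, 0.5], [0.5, 0.5]])   # model that cannot tell the classes apart
confident = np.array([[0.9, 0.1], [0.2, 0.8]])   # model that leans the right way

loss_uncertain = categorical_crossentropy(y_true, uncertain)
loss_confident = categorical_crossentropy(y_true, confident)
print(loss_uncertain, loss_confident)
```

A model that outputs 0.5 for both classes scores ln 2 (about 0.693), so a training loss near that value suggests the network has not learned much yet.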

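Let's tweak the code a little to hold out 20% of the training data for validation. Note that this is a single hold-out split rather than k-fold cross-validation, and calling fit again continues training the same model.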

In [21]:
model.fit(train_x, train_y, batch_size=batch_size,epochs=epochs,verbose=1, validation_split=0.2)
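In Keras, `validation_split` holds out the last fraction of the arrays passed to `fit`, taken before any shuffling, rather than a random sample. The sketch below imitates that slicing on a stand-in array of ten sample indices; the variable names are illustrative only.

```python
import numpy as np

validation_split = 0.2
x = np.arange(10)   # stand-in for 10 training samples, in their original order

split_at = int(len(x) * (1 - validation_split))
x_train, x_val = x[:split_at], x[split_at:]
print(x_train, x_val)
```

If the rows of the label file happen to be ordered in some meaningful way, the held-out tail may not be representative, so shuffling the data once before calling `fit` can be worth considering.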

Let's predict on the test set and prepare the submission

In [22]:
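#Take the most probable class for each image (predict_classes is not available in newer Keras releases)
pred = np.argmax(model.predict(test_x), axis=1)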
pred = lb.inverse_transform(pred)
test_df['invasive'] = pred
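The prediction step turns two softmax probabilities per image into a single label. A minimal sketch of that mapping, with made-up probabilities and a `classes` array standing in for what a label encoder fitted on 0/1 labels would hold:

```python
import numpy as np

# Hypothetical softmax outputs for three images: columns are class 0 and class 1.
probs = np.array([[0.90, 0.10],
                  [0.30, 0.70],
                  [0.45, 0.55]])
classes = np.array([0, 1])   # assumed encoder classes, in sorted order

pred_idx = probs.argmax(axis=1)   # index of the most probable class per row
pred_labels = classes[pred_idx]   # map indices back to the original labels
print(pred_labels)
```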

In [23]:
test_df['invasive'].value_counts()

In [24]:
test_df.to_csv('submission.csv', index=False)

Let's check our model's prediction on a random training image

In [26]:
i = random.choice(train_labels_df.index)
img_name = train_labels_df.name[i]

img = imread(os.path.join(data_dir, 'train', str(img_name)+".jpg")).astype('float32')
imshow(imresize(img, (128, 128)))
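#Predict only for the selected image; train_x rows are in the same order as train_labels_df
pred = np.argmax(model.predict(train_x[i:i+1]), axis=1)
print('Original:', train_labels_df.invasive[i], 'Predicted:', lb.inverse_transform(pred)[0])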

In creating this notebook, I referred to the article "Solution for Age Detection Practice Problem" from Analytics Vidhya.

If you find this helpful, please upvote it and encourage me to write more. Thanks.