
<p><img src="https://assets.datacamp.com/production/project_412/img/92_notebook.jpg" alt="honey bee">
<em>A honey bee (Apis).</em></p>

<p><img src="https://assets.datacamp.com/production/project_412/img/20_notebook.jpg" alt="bumble bee">
<em>A bumble bee (Bombus).</em></p>
<p>This notebook walks through building a model that can automatically detect honey bees and bumble bees.</p>

In [None]:
import pickle
from pathlib import Path
from skimage import io

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import keras
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Flatten, Dropout, MaxPooling2D

Using TensorFlow backend.


## 2. Load image labels
<p>Now that we have all of our imports ready, it is time to look at the labels for our data. We will load our <code>labels.csv</code> file into a DataFrame called <code>labels</code>, where the index is the image name (e.g. an index of 1036 refers to an image named 1036.jpg) and the <code>genus</code> column tells us the bee type. <code>genus</code> takes the value of either <code>0.0</code> (Apis or honey bee) or <code>1.0</code> (Bombus or bumble bee).</p>

In [None]:
labels = pd.read_csv('datasets/labels.csv', index_col = 0)

print(labels.genus.value_counts())

y = labels.genus.values

0.0    827
1.0    827
Name: genus, dtype: int64
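
Because the two classes are perfectly balanced, a model that always predicts the same genus would be right half the time. The sketch below works that baseline out from the counts printed above; the variable names are illustrative and not part of the notebook.

```python
# Class counts copied from the value_counts() output above
counts = {0.0: 827, 1.0: 827}

total = sum(counts.values())

# Accuracy of a trivial model that always predicts the most common class
majority_baseline = max(counts.values()) / total

print(total)
print(majority_baseline)
```

Any accuracy we report later should be read against this 50% reference point.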


## Normalize image data
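<p>Now we need to normalize our image data. Normalization is a general term for rescaling data so that its values fall on a consistent scale. Here we apply <code>StandardScaler</code> to each color channel of each image, which rescales every pixel column in that channel to a mean of zero and a standard deviation of one.</p>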


In [None]:
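ss = StandardScaler()

image_list = []
for i in labels.index:
    # Load the image and convert it to floats so it can hold scaled values
    img = io.imread('datasets/{}.jpg'.format(i)).astype(np.float64)

    # Standardize each color channel separately
    for channel in range(img.shape[2]):
        img[:, :, channel] = ss.fit_transform(img[:, :, channel])

    image_list.append(img)

# Stack the images into one array of shape (n_images, height, width, channels)
X = np.array(image_list)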

print(X.shape)

(1654, 50, 50, 3)
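
To make the scaling step concrete, here is a minimal NumPy sketch of what `StandardScaler.fit_transform` does to a single 2D channel. It assumes a randomly generated 50x50 channel rather than a real bee image, and it reproduces the column-wise arithmetic by hand instead of calling scikit-learn.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 50x50 single-channel image with pixel values in [0, 255)
channel = rng.uniform(0, 255, size=(50, 50))

# StandardScaler treats each column as a feature, so the statistics are
# computed down the rows (axis=0): one mean and one std per pixel column.
col_mean = channel.mean(axis=0)
col_std = channel.std(axis=0)  # population std (ddof=0), as StandardScaler uses
scaled = (channel - col_mean) / col_std

print(scaled.mean(axis=0)[:3])  # close to 0 for every column
print(scaled.std(axis=0)[:3])   # close to 1 for every column
```

Note that the statistics are recomputed for every image and every channel, so the scaling is local to each image rather than shared across the dataset.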


## Split into train, test, and evaluation sets


In [None]:
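# Hold out 20% of the images as a final evaluation set
x_interim, x_eval, y_interim, y_eval = train_test_split(X,
                                                        y,
                                                        test_size=0.2,
                                                        random_state=52)

# Split the remaining images into training (60%) and test (40%) sets
x_train, x_test, y_train, y_test = train_test_split(x_interim,
                                                    y_interim,
                                                    test_size=0.4,
                                                    random_state=52)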
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
print(x_eval.shape[0], 'eval samples')

x_train shape: (793, 50, 50, 3)
793 train samples
530 test samples
331 eval samples
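
The sample counts above follow from applying the two fractions in sequence. The sketch below reproduces them with plain arithmetic, assuming the test count is rounded up at each split (which matches the printed sizes); `split_sizes` is a hypothetical helper, not a scikit-learn function.

```python
import math

def split_sizes(n_samples, test_fraction):
    """Return (n_train, n_test), rounding the test count up."""
    n_test = math.ceil(n_samples * test_fraction)
    return n_samples - n_test, n_test

# First split: 20% of all 1654 images go to the evaluation set
n_interim, n_eval = split_sizes(1654, 0.2)
# Second split: 40% of the remaining images go to the test set
n_train, n_test = split_sizes(n_interim, 0.4)

print(n_train, n_test, n_eval)
```

In other words, roughly 48% of the images are used for training, 32% for testing, and 20% for the final evaluation.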


In [None]:

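# One output unit: the sigmoid gives the probability of class 1 (Bombus)
num_classes = 1

model = Sequential()

# Two convolutional layers; the first one declares the 50x50 RGB input shape
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(50, 50, 3)))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))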

In [None]:
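# Downsample by taking the maximum over each 2x2 window
model.add(MaxPooling2D(pool_size=(2, 2)))

# Third convolutional layer, followed by dropout to reduce overfitting
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(Dropout(0.25))

# Flatten the feature maps and pass them through a dense layer
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))

# Single sigmoid unit for the binary prediction
model.add(Dense(num_classes, activation='sigmoid', name='preds'))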

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 48, 48, 32)        896       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 46, 46, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 23, 23, 64)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 21, 21, 64)        36928     
_________________________________________________________________
dropout_1 (Dropout)          (None, 21, 21, 64)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 28224)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               3612800   
__________
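
The output shapes and parameter counts in the rows shown above can be checked by hand. The sketch below assumes the Keras defaults for `Conv2D` (stride 1, no padding); `conv_output` and `conv_params` are illustrative helpers, not Keras functions.

```python
def conv_output(size, kernel=3):
    """Side length after a stride-1 convolution with no padding."""
    return size - kernel + 1

def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

side = conv_output(50)    # 48 after conv2d_1
side = conv_output(side)  # 46 after conv2d_2
side = side // 2          # 23 after 2x2 max pooling
side = conv_output(side)  # 21 after conv2d_3
flat = side * side * 64   # values per image after Flatten

# Parameters of the three convolutional layers
print(conv_params(3, 3, 32), conv_params(3, 32, 64), conv_params(3, 64, 64))
# Flattened size and parameters of the 128-unit dense layer (weights + biases)
print(flat, flat * 128 + 128)
```

Almost all of the parameters sit in the first dense layer, because every one of its 128 units connects to each of the 28224 flattened values.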

## 8. Compile and train model


In [None]:
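model.compile(
    # Binary cross-entropy suits a single sigmoid output
    loss=keras.losses.binary_crossentropy,
    # Plain stochastic gradient descent; newer Keras releases name this argument learning_rate
    optimizer=keras.optimizers.SGD(lr=0.001),
    metrics=['accuracy']
)

# Train for 200 epochs, monitoring loss and accuracy on the test set
model.fit(
    x_train,
    y_train,
    epochs=200,
    verbose=0,
    validation_data=(x_test, y_test)
)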



score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

print("")

eval_score = model.evaluate(x_eval,y_eval, verbose = 0)
print('Eval loss:', eval_score[0])
print('Eval accuracy:', eval_score[1])

Test loss: 0.6423929142502119
Test accuracy: 0.664150944070996

Eval loss: 0.654895001667864
Eval accuracy: 0.649546827434413
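
To put these losses in context, it helps to know what binary cross-entropy looks like for a model that has learned nothing. The sketch below computes the per-example loss by hand, assuming a model that always outputs a probability of 0.5; the `binary_crossentropy` function here is a plain-Python illustration, not the Keras implementation.

```python
import math

def binary_crossentropy(y_true, p):
    """Loss for one example with true label y_true and predicted probability p."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# A model that always answers 0.5 gets the same loss whatever the label is
chance_loss = binary_crossentropy(1, 0.5)

print(round(chance_loss, 4))
```

That chance-level loss equals ln 2, about 0.693, so the test and evaluation losses of roughly 0.64 and 0.65 are only modestly better than guessing.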


## Generate predictions

<p>We now have a deep learning model that can be used to identify honey bees and bumble bees in images, although its accuracy of about 65% on the evaluation set leaves plenty of room for improvement. The next step is to explore transfer learning, which harnesses the predictive power of models that have been trained on far more images than the 1,654 in our dataset.</p>

In [None]:

y_proba = model.predict(x_eval)
print("First five probabilities:")
print(y_proba[:5])
print("")


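# predict_classes was removed from recent Keras releases, so threshold the probabilities at 0.5 instead
y_pred = (y_proba > 0.5).astype('int32')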
print("First five class predictions:")
print(y_pred[:5])
print("")

First five probabilities:
[[0.6641951 ]
 [0.60525185]
 [0.75047696]
 [0.32835406]
 [0.88149595]]

First five class predictions:
[[1]
 [1]
 [1]
 [0]
 [1]]
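
Once we have class predictions, we can compare them with the true labels image by image. The sketch below does this for the five predictions printed above; the true labels are invented purely for illustration, since the notebook does not print the corresponding entries of `y_eval`.

```python
import numpy as np

# Class predictions copied from the output above
y_pred_head = np.array([1, 1, 1, 0, 1])
# Hypothetical true labels, invented for this illustration only
y_true_head = np.array([1, 0, 1, 0, 1])

# Fraction of the five images that were classified correctly
accuracy = (y_pred_head == y_true_head).mean()

# Images predicted as Bombus (1) whose hypothetical label is Apis (0)
false_bombus = int(((y_pred_head == 1) & (y_true_head == 0)).sum())

print(accuracy, false_bombus)
```

Applying the same comparison to the full `y_pred` and `y_eval` arrays would show which kind of mistake the model makes more often.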

