## Case Study

1st case study - Project 1:

This case study uses a dataset from Kaggle.

Link to the Kaggle project site:

https://www.kaggle.com/c/plant-seedlings-classification

The dataset must be downloaded from the Kaggle site linked above.

 Can you differentiate a weed from a crop seedling?

The ability to do so effectively can mean better crop yields and better stewardship of the environment.

The Aarhus University Signal Processing group, in collaboration with University of Southern Denmark, has recently released a dataset containing images of approximately 960 unique plants belonging to 12 species at several growth stages.

### Connect with Google Drive

In [33]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Functions for image augmentation...

In [0]:
import random
from numpy import ndarray  # ndarray is NumPy's array type; import it from NumPy directly
import skimage as sk
from skimage import transform
from skimage import util

def random_rotation(image_array: ndarray):
    # pick a random rotation angle between -25 and +25 degrees
    random_degree = random.uniform(-25, 25)
    return sk.transform.rotate(image_array, random_degree)

def random_noise(image_array: ndarray):
    # add random noise to the image
    return sk.util.random_noise(image_array)

def horizontal_flip(image_array: ndarray):
    # a horizontal flip doesn't need skimage; reversing the column axis of the pixel array is enough
    return image_array[:, ::-1]
  
# dictionary of the transformations we defined earlier
available_transformations = {
    'rotate': random_rotation,
    'noise': random_noise,
    'horizontal_flip': horizontal_flip
}
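
As a quick sanity check on the flip helper above, the sketch below applies the same slicing to a tiny NumPy array standing in for an image. The names `toy_image` and `flip_columns` are illustrative and not part of the notebook.

```python
import numpy as np

def flip_columns(image_array: np.ndarray) -> np.ndarray:
    # Same slicing as horizontal_flip: keep the rows, reverse the column axis.
    return image_array[:, ::-1]

# Hypothetical 2x3 "image" with 3 colour channels, filled with 0..17.
toy_image = np.arange(18).reshape(2, 3, 3)
flipped = flip_columns(toy_image)

# The left-most pixel of each row ends up at the right edge.
print(toy_image[0, 0], "->", flipped[0, 2])
```

Unlike this flip, `sk.transform.rotate` and `sk.util.random_noise` return floating-point images by default, so the dtype of an augmented array depends on which transformation produced it.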

### Read the images and generate the train and test dataset (5 points)

In [0]:
#Import necessary libraries to fetch the train and test data...
import os
from zipfile import ZipFile
import cv2
import pandas as pd
import numpy as np
import tensorflow as tf

tf.set_random_seed(42)

In [0]:
#Change the working directory to make our file path access simple.

os.chdir('/content/drive/My Drive/AIML/Projects/Residency 7/plant-seedlings-classification')

In [0]:
#Extract the train Zip file content

with ZipFile('train.zip', 'r') as z:
  z.extractall()

In [0]:
#Extract the test Zip file content

with ZipFile('test.zip', 'r') as z:
  z.extractall()

In [0]:
#Read the test images into an array...

X_test=[]
X_test_name=[]
os.chdir('/content/drive/My Drive/AIML/Projects/Residency 7/plant-seedlings-classification/test')

for i in os.listdir():
  try:
    dummy = cv2.imread(i)
    dummy = cv2.resize(dummy,(128,128)) #resize to have all the images of same size
    X_test.append(dummy)
    X_test_name.append(i)
  except Exception as e:
    print(e)

In [0]:
#Now let's read the train data...

os.chdir('/content/drive/My Drive/AIML/Projects/Residency 7/plant-seedlings-classification/train')

In [40]:
ls

 [0m[01;34mBlack-grass[0m/        [01;34m'Common wheat'[0m/      [01;34m'Scentless Mayweed'[0m/
 [01;34mCharlock[0m/           [01;34m'Fat Hen'[0m/           [01;34m'Shepherds Purse'[0m/
 [01;34mCleavers[0m/           [01;34m'Loose Silky-bent'[0m/  [01;34m'Small-flowered Cranesbill'[0m/
[01;34m'Common Chickweed'[0m/   [01;34mMaize[0m/              [01;34m'Sugar beet'[0m/


In [41]:
#Train images are classified and placed in folders. The folder name should be taken as target (y_train).
os.listdir()

['Fat Hen',
 'Small-flowered Cranesbill',
 'Cleavers',
 'Black-grass',
 'Sugar beet',
 'Shepherds Purse',
 'Charlock',
 'Loose Silky-bent',
 'Scentless Mayweed',
 'Maize',
 'Common Chickweed',
 'Common wheat']

In [42]:
for i in os.listdir():
  path, dirs, files = next(os.walk(i))
  print (i + " : " + str(len(files)))

Fat Hen : 475
Small-flowered Cranesbill : 496
Cleavers : 287
Black-grass : 263
Sugar beet : 385
Shepherds Purse : 231
Charlock : 390
Loose Silky-bent : 654
Scentless Mayweed : 516
Maize : 221
Common Chickweed : 611
Common wheat : 221
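
The counts above are uneven, which is what motivates the augmentation step that follows. As a rough sketch, the snippet below measures the imbalance and the number of images each class is missing relative to a target of 754 (the value of `num_files_desired` in the next cell). The variable names are illustrative.

```python
# Image counts per class, copied from the listing above.
class_counts = {
    "Fat Hen": 475,
    "Small-flowered Cranesbill": 496,
    "Cleavers": 287,
    "Black-grass": 263,
    "Sugar beet": 385,
    "Shepherds Purse": 231,
    "Charlock": 390,
    "Loose Silky-bent": 654,
    "Scentless Mayweed": 516,
    "Maize": 221,
    "Common Chickweed": 611,
    "Common wheat": 221,
}
target = 754  # per-class target, taken from num_files_desired

# Ratio between the largest and smallest class is a simple imbalance measure.
imbalance_ratio = max(class_counts.values()) / min(class_counts.values())
print(f"total images: {sum(class_counts.values())}")
print(f"largest / smallest class: {imbalance_ratio:.2f}")

# Number of extra images each class needs to reach the target.
shortfall = {name: max(target - count, 0) for name, count in class_counts.items()}
for name, missing in shortfall.items():
    print(f"{name}: {missing} images short of {target}")
```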


In [43]:
#The data is imbalanced among categories. Let's balance it by augmentation...

from skimage import io

#Let's generate new images so that each category has roughly the same number of images, to reduce imbalance and bias.
num_files_desired = 754
num_files = 0

for i in os.listdir():
    if (os.path.isdir(i)):
            Images = []
            folder_path = "/content/drive/My Drive/AIML/Projects/Residency 7/plant-seedlings-classification/train/" + i
            for j in os.listdir(i):
                Images.append(folder_path + "/" + j)
            
            Img_cnt = len(Images)
            Cnt_diff = num_files_desired - Img_cnt
            num_generated_files = 0
            print ("In " + i + ", " + str(Cnt_diff) + " new images will be added by augmentation...")
            while num_generated_files < Cnt_diff:
                # random image from the folder
                image_path = random.choice(Images)
                # read the image as an array of pixels (height x width x channels)
                image_to_transform = sk.io.imread(image_path)

                # random num of transformation to apply
                num_transformations_to_apply = random.randint(1, len(available_transformations))

                num_transformations = 0
                transformed_image = None
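                # note: '<=' makes this loop run num_transformations_to_apply + 1 times; every pass
                # transforms the original image (transformations are not chained) and saves one file,
                # so a class can finish slightly above num_files_desired
                while num_transformations <= num_transformations_to_apply:
                    # pick one random transformation and apply it to the original image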
                    key = random.choice(list(available_transformations))
                    transformed_image = available_transformations[key](image_to_transform)
                    num_transformations += 1

                    new_file_path = '%s/augmented_image_%s.png' % (folder_path, num_files)

                    # write image to the disk
                    io.imsave(new_file_path, transformed_image)
                    num_generated_files += 1
                    num_files += 1

In Fat Hen, 279 new images will be added by augmentation...
In Small-flowered Cranesbill, 258 new images will be added by augmentation...
In Cleavers, 467 new images will be added by augmentation...
In Black-grass, 491 new images will be added by augmentation...
In Sugar beet, 369 new images will be added by augmentation...
In Shepherds Purse, 523 new images will be added by augmentation...
In Charlock, 364 new images will be added by augmentation...
In Loose Silky-bent, 100 new images will be added by augmentation...
In Scentless Mayweed, 238 new images will be added by augmentation...
In Maize, 533 new images will be added by augmentation...
In Common Chickweed, 143 new images will be added by augmentation...
In Common wheat, 533 new images will be added by augmentation...


In [44]:
#Count of Images in each category after augmentation...
for i in os.listdir():
  path, dirs, files = next(os.walk(i))
  print (i + " : " + str(len(files)))

Fat Hen : 754
Small-flowered Cranesbill : 757
Cleavers : 756
Black-grass : 756
Sugar beet : 757
Shepherds Purse : 754
Charlock : 755
Loose Silky-bent : 754
Scentless Mayweed : 755
Maize : 756
Common Chickweed : 754
Common wheat : 755
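
Several classes end up a few images above 754. One plausible explanation lies in the loop counters of the augmentation cell: the inner `while` uses `<=` and saves a file on every pass, so each outer iteration writes between 2 and 4 images. The sketch below reproduces only that counter logic, with no images involved; `simulate_generated` is a hypothetical helper written for this illustration.

```python
import random

def simulate_generated(count_diff, n_transforms=3, rng=None):
    """Mimic the loop counters of the augmentation cell and return files written."""
    if rng is None:
        rng = random.Random()
    generated = 0
    while generated < count_diff:
        to_apply = rng.randint(1, n_transforms)
        applied = 0
        # '<=' means the body runs to_apply + 1 times, writing one file each time.
        while applied <= to_apply:
            applied += 1
            generated += 1
    return generated

rng = random.Random(0)  # fixed seed so the run is repeatable
overshoots = [simulate_generated(279, rng=rng) - 279 for _ in range(1000)]
print("smallest overshoot:", min(overshoots))
print("largest overshoot:", max(overshoots))
```

Under these assumptions the overshoot is never more than 3 images, which is consistent with the per-class totals of 754 to 757 listed above.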


In [45]:
X_train = []
y_train = []

for i in os.listdir():
    print(i)
    if (os.path.isdir(i)):
            for j in os.listdir(i):
                try:
                    dummy = cv2.imread('/content/drive/My Drive/AIML/Projects/Residency 7/plant-seedlings-classification/train/' + i + "/" + j)
                    dummy = cv2.resize(dummy,(128,128)) #resize to have all the images of same size
                    X_train.append(dummy)
                    y_train.append(i)
                except Exception as e:
                    print(e)

Fat Hen
Small-flowered Cranesbill
Cleavers
Black-grass
Sugar beet
Shepherds Purse
Charlock
Loose Silky-bent
Scentless Mayweed
Maize
Common Chickweed
Common wheat


In [46]:
print ("No. of images in X_train: ", len(X_train))
print ("No. of images in X_test: ", len(X_test))
print ("No. of values in y_train: ", len(y_train))

No. of images in X_train:  9063
No. of images in X_test:  794
No. of values in y_train:  9063


In [47]:
print ("Shape of an image in X_train: ", X_train[0].shape)
print ("Shape of an image in X_test: ", X_test[0].shape)

Shape of an image in X_train:  (128, 128, 3)
Shape of an image in X_test:  (128, 128, 3)


In [0]:
#Get label encoding for y_train

from sklearn import preprocessing

le = preprocessing.LabelEncoder()
le.fit(y_train)
y_train = le.transform(y_train)

In [49]:
print("Total Plant categories (Unique Target): ", len(np.unique(y_train)))

Total Plant categories (Unique Target):  12


In [0]:
y_train = np.array(y_train)
X_train = np.array(X_train)

In [0]:
y_train = tf.keras.utils.to_categorical(y_train, num_classes=12)
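
To make the two encoding steps concrete, here is a minimal NumPy-only sketch of what they do to a handful of labels. It uses `np.unique` and an identity matrix as stand-ins for `LabelEncoder` and `to_categorical`; the sample labels are hypothetical, and the sketch assumes the encoder assigns integer codes in sorted order of the class names.

```python
import numpy as np

# Hypothetical sample of folder-name labels.
labels = np.array(["Maize", "Charlock", "Maize", "Black-grass"])

# Sorted unique class names plus one integer code per sample
# (stand-in for LabelEncoder.fit followed by transform).
classes, codes = np.unique(labels, return_inverse=True)

# One-hot encode by picking rows of an identity matrix
# (stand-in for to_categorical).
one_hot = np.eye(len(classes), dtype="float32")[codes]

# argmax undoes the one-hot step; indexing `classes` undoes the label encoding.
recovered = classes[np.argmax(one_hot, axis=1)]

print(classes)
print(codes)
print(one_hot)
```

The same round trip (argmax, then a lookup of class names) is what the prediction cells later rely on when they call `np.argmax` and `le.inverse_transform`.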

### 2. Divide the data set into train and validation data sets

In [52]:
from sklearn.model_selection import train_test_split

X_train2, X_val, y_train2, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=2)
print ("No. of images in train dataset: ", len(X_train2))
print ("No. of images in Validation dataset: ", len(X_val))

No. of images in train dataset:  7250
No. of images in Validation dataset:  1813


In [53]:
print ("X_train2 Shape: ", X_train2.shape)
print ("X_val Shape: ", X_val.shape)
print("y_train2 Shape: ", y_train2.shape)
print("y_val Shape: ", y_val.shape)

X_train2 Shape:  (7250, 128, 128, 3)
X_val Shape:  (1813, 128, 128, 3)
y_train2 Shape:  (7250, 12)
y_val Shape:  (1813, 12)
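
The split sizes follow directly from the sample count. A small worked calculation, assuming that a fractional `test_size` is rounded up for the held-out set and the remainder goes to training:

```python
import math

n_samples = 9063   # images after augmentation, from the count printed earlier
test_size = 0.2    # fraction passed to train_test_split

# Assumption: the held-out count is rounded up, the rest is used for training.
n_val = math.ceil(test_size * n_samples)
n_train = n_samples - n_val

print("train:", n_train)
print("validation:", n_val)
```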


### 3. Initialize & build the model (10 points)

In [0]:
#Import necessary libraries to build the model...
from keras.models import Sequential
from keras.layers import Convolution2D, Dropout, Dense
from keras.layers import BatchNormalization
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.optimizers import adam
from keras.optimizers import sgd
from keras.layers import LeakyReLU

In [55]:
model = Sequential()
model.add(BatchNormalization(input_shape = (128,128,3)))
model.add(Convolution2D(32, (3,3), input_shape = (128, 128, 3), kernel_initializer = 'he_normal')) 
model.add(LeakyReLU(alpha=0.1))
model.add(MaxPooling2D(pool_size=2))
#model.add(Dropout(0.2))

model.add(Convolution2D(filters=64, kernel_size=5, padding='same', kernel_initializer = 'he_normal'))
model.add(LeakyReLU(alpha=0.1))
model.add(MaxPooling2D(pool_size=2))
#model.add(Dropout(0.2))

model.add(Convolution2D(filters=128, kernel_size=4, padding='same', kernel_initializer = 'he_normal'))
model.add(LeakyReLU(alpha=0.1))
model.add(MaxPooling2D(pool_size=2))
#model.add(Dropout(0.2))

model.add(Convolution2D(filters=128, kernel_size=3, padding='same', kernel_initializer = 'he_normal'))
model.add(LeakyReLU(alpha=0.1))
model.add(MaxPooling2D(pool_size=2))
#model.add(Dropout(0.2))

model.add(Convolution2D(filters=128, kernel_size=2, padding='same', kernel_initializer = 'he_normal'))
model.add(LeakyReLU(alpha=0.1))
model.add(MaxPooling2D(pool_size=2))
#model.add(Dropout(0.2))

model.add(Flatten()) 

# fully connected layer
model.add(Dense(units=128, kernel_initializer = 'he_normal'))
model.add(LeakyReLU(alpha=0.1))
model.add(Dense(units = 64, kernel_initializer = 'he_normal'))
model.add(LeakyReLU(alpha=0.1))
#model.add(Dropout(0.3))
model.add(Dense(units = 32, kernel_initializer = 'he_normal'))
model.add(LeakyReLU(alpha=0.1))
#model.add(Dropout(0.3))
model.add(Dense(units = 12, activation = 'softmax')) 


Instructions for updating:
Colocations handled automatically by placer.


In [0]:
#optimizer = adam(lr=0.001)
model.compile(optimizer='adam', loss = 'categorical_crossentropy',metrics = ['accuracy'])

In [57]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
batch_normalization_1 (Batch (None, 128, 128, 3)       12        
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 126, 126, 32)      896       
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 126, 126, 32)      0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 63, 63, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 63, 63, 64)        51264     
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 63, 63, 64)        0         
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 31, 31, 64)        0         
__________
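
The shapes and parameter counts in the visible part of the summary can be checked by hand. The sketch below uses three small helper functions (illustrative names, stride 1 assumed for the convolutions) to reproduce the first two convolution blocks. The 12 parameters of the batch normalization layer are 4 values (gamma, beta, moving mean, moving variance) for each of the 3 input channels.

```python
def conv2d_params(kernel, in_channels, filters):
    # One kernel x kernel x in_channels weight block plus one bias per filter.
    return (kernel * kernel * in_channels + 1) * filters

def conv_output_size(size, kernel, padding):
    # 'valid' shrinks the map by kernel - 1; 'same' keeps it (stride 1 assumed).
    return size if padding == "same" else size - kernel + 1

def pool_output_size(size, pool=2):
    # Pooling with stride equal to the pool size drops any remainder.
    return size // pool

size = conv_output_size(128, 3, "valid")    # conv2d_1 (default padding is 'valid')
print("conv2d_1:", size, conv2d_params(3, 3, 32))
size = pool_output_size(size)               # max_pooling2d_1
print("max_pooling2d_1:", size)
size = conv_output_size(size, 5, "same")    # conv2d_2
print("conv2d_2:", size, conv2d_params(5, 32, 64))
size = pool_output_size(size)               # max_pooling2d_2
print("max_pooling2d_2:", size)
```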

In [58]:
history = model.fit(X_train2,y_train2,
                    epochs=30, 
                    validation_data=(X_val,y_val),
                    verbose = 1,
                    initial_epoch=0)

Instructions for updating:
Use tf.cast instead.
Train on 7250 samples, validate on 1813 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


### 4. Optimize the model (5 points)

#### The best activation function and optimizer are already implemented.
#### Let's optimize the batch size, number of epochs, and weight initialization.

In [65]:
# Tune Network Weight Initialization
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def create_model(init_mode='uniform'):
  model = Sequential()
  model.add(BatchNormalization(input_shape = (128,128,3)))
  model.add(Convolution2D(32, (3,3), input_shape = (128, 128, 3), kernel_initializer = init_mode)) 
  model.add(LeakyReLU(alpha=0.1))
  model.add(MaxPooling2D(pool_size=2))
  #model.add(Dropout(0.2))

  model.add(Convolution2D(filters=64, kernel_size=5, padding='same', kernel_initializer = init_mode))
  model.add(LeakyReLU(alpha=0.1))
  model.add(MaxPooling2D(pool_size=2))
  #model.add(Dropout(0.2))

  model.add(Convolution2D(filters=128, kernel_size=4, padding='same', kernel_initializer = init_mode))
  model.add(LeakyReLU(alpha=0.1))
  model.add(MaxPooling2D(pool_size=2))
  #model.add(Dropout(0.2))

  model.add(Convolution2D(filters=128, kernel_size=3, padding='same', kernel_initializer = init_mode))
  model.add(LeakyReLU(alpha=0.1))
  model.add(MaxPooling2D(pool_size=2))
  #model.add(Dropout(0.2))

  model.add(Convolution2D(filters=128, kernel_size=2, padding='same', kernel_initializer = init_mode))
  model.add(LeakyReLU(alpha=0.1))
  model.add(MaxPooling2D(pool_size=2))
  #model.add(Dropout(0.2))

  model.add(Flatten()) 

  # fully connected layer
  model.add(Dense(units=128, kernel_initializer = init_mode))
  model.add(LeakyReLU(alpha=0.1))
  model.add(Dense(units = 64, kernel_initializer = init_mode))
  model.add(LeakyReLU(alpha=0.1))
  #model.add(Dropout(0.3))
  model.add(Dense(units = 32, kernel_initializer = init_mode))
  model.add(LeakyReLU(alpha=0.1))
  #model.add(Dropout(0.3))
  model.add(Dense(units = 12, activation = 'softmax', kernel_initializer = init_mode)) 
  model.compile(optimizer='adam', loss = 'categorical_crossentropy',metrics = ['accuracy'])
  return model

# create model
modelOp = KerasClassifier(build_fn=create_model, epochs=10, verbose=0)

# define the grid search parameters
init_mode = ['uniform', 'lecun_uniform', 'normal', 'zero', 'glorot_normal', 'glorot_uniform', 'he_normal', 'he_uniform']
param_grid = dict(init_mode=init_mode)

grid = GridSearchCV(estimator=modelOp, param_grid=param_grid, cv=2)
grid_result = grid.fit(X_train2, y_train2)

# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.857655 using {'init_mode': 'lecun_uniform'}
0.803448 (0.000414) with: {'init_mode': 'uniform'}
0.857655 (0.000000) with: {'init_mode': 'lecun_uniform'}
0.851448 (0.004276) with: {'init_mode': 'normal'}
0.078069 (0.002207) with: {'init_mode': 'zero'}
0.813655 (0.024966) with: {'init_mode': 'glorot_normal'}
0.834621 (0.033793) with: {'init_mode': 'glorot_uniform'}
0.844276 (0.005103) with: {'init_mode': 'he_normal'}
0.844552 (0.009793) with: {'init_mode': 'he_uniform'}
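
The `zero` initializer scores about 0.078, close to the chance level of 1/12 for 12 classes. A likely reason is that with all weights at zero the network produces the same output for every image, so there is nothing to tell the classes apart. The sketch below illustrates only that starting point for a single output layer; the feature matrix is random, hypothetical data and `softmax` is a small helper written for this example.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 32))   # hypothetical activations for 5 images
weights = np.zeros((32, 12))          # output layer weights under 'zero' init
bias = np.zeros(12)

# With zero weights the logits are all zero, whatever the input was.
probs = softmax(features @ weights + bias)

print(probs[0])   # identical probability for every class
print(1 / 12)     # chance level for 12 balanced classes
```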


In [0]:
# Tune Batch Size and Number of Epochs
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def create_model(init_mode='uniform'):
  model = Sequential()
  model.add(BatchNormalization(input_shape = (128,128,3)))
  model.add(Convolution2D(32, (3,3), input_shape = (128, 128, 3), kernel_initializer = 'lecun_uniform')) 
  model.add(LeakyReLU(alpha=0.1))
  model.add(MaxPooling2D(pool_size=2))
  #model.add(Dropout(0.2))

  model.add(Convolution2D(filters=64, kernel_size=5, padding='same', kernel_initializer = 'lecun_uniform'))
  model.add(LeakyReLU(alpha=0.1))
  model.add(MaxPooling2D(pool_size=2))
  #model.add(Dropout(0.2))

  model.add(Convolution2D(filters=128, kernel_size=4, padding='same', kernel_initializer = 'lecun_uniform'))
  model.add(LeakyReLU(alpha=0.1))
  model.add(MaxPooling2D(pool_size=2))
  #model.add(Dropout(0.2))

  model.add(Convolution2D(filters=128, kernel_size=3, padding='same', kernel_initializer = 'lecun_uniform'))
  model.add(LeakyReLU(alpha=0.1))
  model.add(MaxPooling2D(pool_size=2))
  #model.add(Dropout(0.2))

  model.add(Convolution2D(filters=128, kernel_size=2, padding='same', kernel_initializer = 'lecun_uniform'))
  model.add(LeakyReLU(alpha=0.1))
  model.add(MaxPooling2D(pool_size=2))
  #model.add(Dropout(0.2))

  model.add(Flatten()) 

  # fully connected layer
  model.add(Dense(units=128, kernel_initializer = 'lecun_uniform'))
  model.add(LeakyReLU(alpha=0.1))
  model.add(Dense(units = 64, kernel_initializer = 'lecun_uniform'))
  model.add(LeakyReLU(alpha=0.1))
  #model.add(Dropout(0.3))
  model.add(Dense(units = 32, kernel_initializer = 'lecun_uniform'))
  model.add(LeakyReLU(alpha=0.1))
  #model.add(Dropout(0.3))
  model.add(Dense(units = 12, activation = 'softmax')) 
  model.compile(optimizer='adam', loss = 'categorical_crossentropy',metrics = ['accuracy'])
  return model

# create model
modelOp = KerasClassifier(build_fn=create_model, verbose=0)

# define the grid search parameters
batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 30, 50, 100]
param_grid = dict(batch_size=batch_size, epochs=epochs)

grid = GridSearchCV(estimator=modelOp, param_grid=param_grid, n_jobs=1, scoring="accuracy", cv=2)
grid_result = grid.fit(X_train2, np.argmax(y_train2, axis=1))

# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
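
This search is considerably more expensive than the initializer search, because every combination of batch size and epoch count is trained once per fold. A rough sketch of the cost, counting fits and total training epochs for the grid defined above (the variable names `combinations`, `n_cv_fits`, and `total_epochs` are illustrative):

```python
from itertools import product

batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 30, 50, 100]
cv_folds = 2

# Every (batch_size, epochs) pair is one candidate configuration.
combinations = list(product(batch_size, epochs))
n_cv_fits = len(combinations) * cv_folds

# Each fit runs for its own number of epochs, so the sum is a rough cost proxy.
total_epochs = sum(e for _, e in combinations) * cv_folds

print("candidates:", len(combinations))
print("cross-validation fits:", n_cv_fits)
print("total training epochs:", total_epochs)
```

With the default `refit=True`, `GridSearchCV` also trains the best configuration once more on the full training set after the cross-validation fits.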

### Final Model

#### Implementing the optimal hyperparameters based on the above results...

In [0]:
model = Sequential()
model.add(BatchNormalization(input_shape = (128,128,3)))
model.add(Convolution2D(32, (3,3), input_shape = (128, 128, 3), kernel_initializer = 'lecun_uniform')) 
model.add(LeakyReLU(alpha=0.1))
model.add(MaxPooling2D(pool_size=2))
#model.add(Dropout(0.2))

model.add(Convolution2D(filters=64, kernel_size=5, padding='same', kernel_initializer = 'lecun_uniform'))
model.add(LeakyReLU(alpha=0.1))
model.add(MaxPooling2D(pool_size=2))
#model.add(Dropout(0.2))

model.add(Convolution2D(filters=128, kernel_size=4, padding='same', kernel_initializer = 'lecun_uniform'))
model.add(LeakyReLU(alpha=0.1))
model.add(MaxPooling2D(pool_size=2))
#model.add(Dropout(0.2))

model.add(Convolution2D(filters=128, kernel_size=3, padding='same', kernel_initializer = 'lecun_uniform'))
model.add(LeakyReLU(alpha=0.1))
model.add(MaxPooling2D(pool_size=2))
#model.add(Dropout(0.2))

model.add(Convolution2D(filters=128, kernel_size=2, padding='same', kernel_initializer = 'lecun_uniform'))
model.add(LeakyReLU(alpha=0.1))
model.add(MaxPooling2D(pool_size=2))
#model.add(Dropout(0.2))

model.add(Flatten()) 

# fully connected layer
model.add(Dense(units=128, kernel_initializer = 'lecun_uniform'))
model.add(LeakyReLU(alpha=0.1))
model.add(Dense(units = 64, kernel_initializer = 'lecun_uniform'))
model.add(LeakyReLU(alpha=0.1))
#model.add(Dropout(0.3))
model.add(Dense(units = 32, kernel_initializer = 'lecun_uniform'))
model.add(LeakyReLU(alpha=0.1))
#model.add(Dropout(0.3))
model.add(Dense(units = 12, activation = 'softmax')) 

In [0]:
#optimizer = adam(lr=0.001)
model.compile(optimizer='adam', loss = 'categorical_crossentropy',metrics = ['accuracy'])

In [68]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
batch_normalization_19 (Batc (None, 128, 128, 3)       12        
_________________________________________________________________
conv2d_91 (Conv2D)           (None, 126, 126, 32)      896       
_________________________________________________________________
leaky_re_lu_145 (LeakyReLU)  (None, 126, 126, 32)      0         
_________________________________________________________________
max_pooling2d_91 (MaxPooling (None, 63, 63, 32)        0         
_________________________________________________________________
conv2d_92 (Conv2D)           (None, 63, 63, 64)        51264     
_________________________________________________________________
leaky_re_lu_146 (LeakyReLU)  (None, 63, 63, 64)        0         
_________________________________________________________________
max_pooling2d_92 (MaxPooling (None, 31, 31, 64)        0         
__________

In [69]:
history = model.fit(X_train2,y_train2,
                    epochs=30, 
                    validation_data=(X_val,y_val),
                    verbose = 1,
                    initial_epoch=0, batch_size=60)

Train on 7250 samples, validate on 1813 samples
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


### 5. Predict the accuracy for both train and validation data (5 points)

In [0]:
predictions_train = model.predict(X_train2)
predictions_val = model.predict(X_val)

In [85]:
from sklearn import metrics
print ("Train Accuracy: ", metrics.accuracy_score(np.argmax(y_train2, axis=1), np.argmax(predictions_train, axis=1)))

Train Accuracy:  0.9911724137931035


In [82]:
from sklearn.metrics import classification_report

print ("Classification Report for train data")
print(classification_report(np.argmax(y_train2, axis=1), np.argmax(predictions_train, axis=1)))

Classification Report for train data
              precision    recall  f1-score   support

           0       0.93      1.00      0.96       591
           1       1.00      1.00      1.00       618
           2       1.00      0.99      1.00       595
           3       0.99      0.99      0.99       602
           4       0.99      0.99      0.99       601
           5       0.99      1.00      0.99       595
           6       1.00      0.94      0.97       622
           7       1.00      1.00      1.00       594
           8       1.00      0.99      0.99       618
           9       1.00      1.00      1.00       599
          10       1.00      1.00      1.00       603
          11       1.00      1.00      1.00       612

   micro avg       0.99      0.99      0.99      7250
   macro avg       0.99      0.99      0.99      7250
weighted avg       0.99      0.99      0.99      7250



In [86]:
print ("Val Accuracy: ", metrics.accuracy_score(np.argmax(y_val, axis=1), np.argmax(predictions_val, axis=1)))

Val Accuracy:  0.915057915057915


In [83]:
print ("Classification Report for Validation data")
print(classification_report(np.argmax(y_val, axis=1), np.argmax(predictions_val, axis=1)))

Classification Report for Validation data
              precision    recall  f1-score   support

           0       0.75      0.90      0.82       165
           1       0.94      0.95      0.95       137
           2       0.94      0.96      0.95       161
           3       0.92      0.92      0.92       152
           4       0.91      0.97      0.94       154
           5       0.91      0.91      0.91       159
           6       0.84      0.67      0.75       132
           7       0.96      0.98      0.97       162
           8       0.96      0.88      0.92       137
           9       0.95      0.93      0.94       155
          10       0.99      0.97      0.98       154
          11       0.96      0.91      0.93       145

   micro avg       0.92      0.92      0.92      1813
   macro avg       0.92      0.91      0.91      1813
weighted avg       0.92      0.92      0.91      1813
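
The F1 column in the report is the harmonic mean of precision and recall. As a quick check, the sketch below recomputes it for the two weakest classes from the rounded precision and recall values printed above; `f1_from` is an illustrative helper, and small differences in the last digit are possible in general because the inputs are already rounded.

```python
def f1_from(precision, recall):
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs read from the validation report.
f1_class_0 = f1_from(0.75, 0.90)
f1_class_6 = f1_from(0.84, 0.67)

print("class 0:", round(f1_class_0, 2))
print("class 6:", round(f1_class_6, 2))
```

For class 6 the low recall (0.67) is what pulls the F1 score down: about a third of its validation images are assigned to other classes.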



### Observations:
#### Categories 0 and 6 have the lowest F1 scores on the validation data (0.82 and 0.75, respectively). The model needs to learn these categories better; more augmentation for them may help.
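
To see which species the numeric categories refer to, the class indices can be mapped back to folder names. The sketch below assumes the label encoder assigned codes in sorted order of the class names; `index_to_name` is an illustrative variable.

```python
# Folder names from the training directory listing earlier in the notebook.
folders = [
    "Fat Hen", "Small-flowered Cranesbill", "Cleavers", "Black-grass",
    "Sugar beet", "Shepherds Purse", "Charlock", "Loose Silky-bent",
    "Scentless Mayweed", "Maize", "Common Chickweed", "Common wheat",
]

# Assumption: integer codes follow the sorted order of the class names.
index_to_name = dict(enumerate(sorted(folders)))

for index in (0, 6):
    print(index, "->", index_to_name[index])
```

In the notebook itself, `le.classes_` holds the same mapping and can be indexed directly.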

### Predict Test and upload the result file to Kaggle

In [0]:
X_test = np.array(X_test)
predictions = model.predict(X_test)

In [0]:
FinalPred = np.argmax(predictions, axis=1)

In [0]:
y_test = le.inverse_transform(FinalPred)

In [0]:
X_test_name = np.array(X_test_name)
y_test = np.array(y_test)
dataset = pd.DataFrame({'file': X_test_name, 'species': y_test}, columns=['file', 'species'])

In [74]:
dataset.head()

| | file | species |
|---|---|---|
| 0 | c0461776c.png | Common Chickweed |
| 1 | 4bbfd1e05.png | Cleavers |
| 2 | 1d0cbd819.png | Loose Silky-bent |
| 3 | 93079d970.png | Sugar beet |
| 4 | 856f2910a.png | Small-flowered Cranesbill |


In [0]:
dataset.to_csv("/content/drive/My Drive/AIML/Projects/Residency 7/plant-seedlings-classification/ResultsGanga.csv", index=False) 

### Upload File to Kaggle...

In [87]:
from googleapiclient.discovery import build
import io, os
from googleapiclient.http import MediaIoBaseDownload
from google.colab import auth
auth.authenticate_user()
drive_service = build('drive', 'v3')
results = drive_service.files().list(
        q="name = 'kaggle.json'", fields="files(id)").execute()
kaggle_api_key = results.get('files', [])
filename = "/root/.kaggle/kaggle.json"  # NOTE: This is different from the Medium post!
os.makedirs(os.path.dirname(filename), exist_ok=True)
request = drive_service.files().get_media(fileId=kaggle_api_key[0]['id'])
fh = io.FileIO(filename, 'wb')
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))
os.chmod(filename, 0o600)  # octal mode: read/write for the owner only

Download 100%.
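
File modes passed to `os.chmod` are conventionally written in octal, and the decimal literal `600` is a different number from the octal literal `0o600`. A minimal sketch of the difference, using a temporary file on a POSIX system (the path is created only for this illustration):

```python
import os
import stat
import tempfile

# The same digits mean different permission bits in decimal and octal.
print(oct(600))     # decimal 600 expressed in octal
print(oct(0o600))   # owner read/write only

# Create a throwaway file, restrict it to the owner, then read the mode back.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    path = tmp.name
os.chmod(path, 0o600)
mode = stat.S_IMODE(os.stat(path).st_mode)
os.remove(path)

print(oct(mode))
```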


In [88]:
!kaggle competitions submit -c plant-seedlings-classification -f "/content/drive/My Drive/AIML/Projects/Residency 7/plant-seedlings-classification/ResultsGanga.csv" -m "GUpload"

100% 21.9k/21.9k [00:00<00:00, 61.7kB/s]
Successfully submitted to Plant Seedlings Classification

### Observations:
### The test predictions achieved 88.99% accuracy when uploaded to Kaggle.