<a href="https://colab.research.google.com/github/pejmanrasti/FormationUnivAngers/blob/main/Jour1/CNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Weeds vs plants Image Classification


## Specific concepts that will be covered:
In the process, we will build practical experience and develop intuition around the following concepts:

* Building _data input pipelines_ using the `tf.keras.preprocessing.image.ImageDataGenerator` class — How can we efficiently work with data on disk to interface with our model?


## We will follow the general machine learning workflow:

1. Examine and understand data
2. Build an input pipeline
3. Build our model
4. Train our model
5. Test our model
6. Improve our model/Repeat the process

<hr>

**Before you begin**

Before running the code in this notebook, reset the runtime by going to **Kernel -> Restart & clear output** in the menu above. If you have been working through several notebooks, this will help you avoid reaching memory limits.


# Importing packages

Let's start by importing required packages:

*   os — to read files and directory structure
*   numpy — for some matrix math outside of TensorFlow
*   matplotlib.pyplot — to plot the graph and display images in our training and validation data

In [None]:
%tensorflow_version 1.x
from __future__ import absolute_import, division, print_function, unicode_literals

In [None]:
import tensorflow as tf

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [None]:
import os                        # Read files and directory structure
from os.path import join as pj   # Shorthand for path operations
import datetime
import pickle                    # Serialize and deserialize Python objects
import random                    # Random number generation (used for shuffling)

import cv2                       # OpenCV: image reading and resizing
import matplotlib.pyplot as plt  # 2D plotting library
import numpy as np               # Scientific computing
from tqdm import tqdm            # Progress bars

# Data Loading

To build our image classifier, we begin by loading the dataset. The dataset we are using consists of images of weeds and plants.

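In this notebook, we will make use of the class `tf.keras.preprocessing.image.ImageDataGenerator`, which reads data from disk. We therefore need the data to be available on the local filesystem: here, the archive is stored on Google Drive, so we mount the drive and unzip it.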

We'll now assign variables with the proper file path for the training and validation sets.

In [None]:
from google.colab import drive
root = '/content/gdrive/'
drive.mount( root )

In [None]:
# create permanent directory in gdrive
data_dir_path = r'/My Drive/FormationUA/'
os.makedirs(root+data_dir_path, exist_ok=True)
os.listdir(root+data_dir_path)

In [None]:
!unzip -q "/content/gdrive/My Drive/FormationUA/data.zip"

In [None]:
ls data/

In [None]:
base_dir = 'data'
train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'Validation')

In [None]:
train_weeds_dir = os.path.join(train_dir, 'weeds')  # directory with our training weed pictures
train_plants_dir = os.path.join(train_dir, 'plants')  # directory with our training plant pictures
validation_weeds_dir = os.path.join(validation_dir, 'weeds')  # directory with our validation weed pictures
validation_plants_dir = os.path.join(validation_dir, 'plants')  # directory with our validation plant pictures

### Understanding our data

Let's look at how many plant and weed images we have in our training and validation directories.

In [None]:
num_weeds_tr = len(os.listdir(train_weeds_dir))
num_plants_tr = len(os.listdir(train_plants_dir))

num_weeds_val = len(os.listdir(validation_weeds_dir))
num_plants_val = len(os.listdir(validation_plants_dir))

total_train = num_weeds_tr + num_plants_tr
total_val = num_weeds_val + num_plants_val

In [None]:
print('total training weeds images:', num_weeds_tr)
print('total training plants images:', num_plants_tr)

print('total validation weeds images:', num_weeds_val)
print('total validation plants images:', num_plants_val)
print("--")
print("Total training images:", total_train)
print("Total validation images:", total_val)
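Note that `len(os.listdir(...))` counts every entry in a directory, including hidden or non-image files. If the dataset folders might contain such files, a stricter count can filter by extension. The sketch below is only an illustration: the helper `count_images`, the extension list, and the file names are assumptions, not part of the original notebook.

```python
import os
import tempfile

# Assumed set of image extensions; adjust to match the actual dataset.
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.bmp')

def count_images(directory):
    """Count the files in `directory` whose extension looks like an image."""
    return sum(
        1 for name in os.listdir(directory)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )

# Demonstration on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ('a.jpg', 'b.PNG', 'notes.txt', '.DS_Store'):
        open(os.path.join(demo_dir, name), 'w').close()
    demo_total = len(os.listdir(demo_dir))
    demo_images = count_images(demo_dir)

print('all entries:', demo_total)   # 4
print('image files:', demo_images)  # 2
```

If both counts agree on the real directories, the simpler `len(os.listdir(...))` is enough.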

# Reading Data

Here, we read the training and validation images from disk and resize them to a common size.

In [None]:
IMG_SIZE_H = 256  # target image height in pixels
IMG_SIZE_W = 256  # target image width in pixels

def read_data(DATADIR):
    input_data = []
    # Sort so that class indices are the same for every directory (0=plants, 1=weeds)
    CATEGORIES = sorted(os.listdir(DATADIR))
    for category in CATEGORIES:  # loop over plants and weeds
        path = os.path.join(DATADIR, category)  # path to this class's images
        class_num = CATEGORIES.index(category)  # integer label for this class

        for img in tqdm(os.listdir(path)):  # iterate over each image in the class
            img_array = cv2.imread(os.path.join(path, img))  # read as a BGR array
            if img_array is None:  # cv2.imread returns None for unreadable files
                continue
            # cv2.resize expects the target size as (width, height)
            new_array = cv2.resize(img_array, (IMG_SIZE_W, IMG_SIZE_H))
            input_data.append([new_array, class_num])  # store the image with its label

    return input_data

In [None]:
training_data = read_data(train_dir)  # Read the training images and labels
Validation_data = read_data(validation_dir)  # Read the validation images and labels

**Preparation of data for feeding into a CNN model**


In [None]:
random.shuffle(training_data)   # Shuffling data 
random.shuffle(Validation_data)   # Shuffling data 
X = []  # An Array for Training images
y = []  # An Array for Training labels
X_val = []  # An Array for Validation images
y_val = []  # An Array for Validation labels

for features,label in training_data:   # Separate images and labels
    X.append(features)
    y.append(label)

print("Total training images:",np.array(X).shape) # Print the size of the database

for features,label in Validation_data:   # Separate images and labels
    X_val.append(features)
    y_val.append(label)
print("Total validation images:", np.array(X_val).shape) # Print the size of the database
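The cell above shuffles a list of `[image, label]` pairs, so each image stays attached to its label. When images and labels already live in separate arrays, an alternative (not what this notebook does) is to index both with a single random permutation. The arrays below are small hypothetical stand-ins, and the fixed seed is only there to make the result reproducible.

```python
import numpy as np

# Hypothetical stand-ins: 5 tiny "images" of shape 2x2x3 and their labels.
images = np.arange(5 * 2 * 2 * 3).reshape(5, 2, 2, 3)
labels = np.array([0, 1, 0, 1, 1])

rng = np.random.RandomState(42)       # fixed seed for reproducibility
order = rng.permutation(len(images))  # one shared ordering for both arrays

# Indexing both arrays with the same permutation keeps pairs aligned.
shuffled_images = images[order]
shuffled_labels = labels[order]

print(order)
print(shuffled_labels)
```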

In [None]:
X = np.array(X).reshape(-1, IMG_SIZE_H, IMG_SIZE_W, 3)  # shape (N, H, W, 3), as Keras expects
X_val = np.array(X_val).reshape(-1, IMG_SIZE_H, IMG_SIZE_W, 3)
y = np.array(y)  # labels as NumPy arrays rather than Python lists
y_val = np.array(y_val)
print(X.shape)  # shape of the training set
print(X_val.shape)  # shape of the validation set
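Note that the cells above resize the images but leave the pixel values as integers in the range 0 to 255. A common optional preprocessing step, which is not part of the original notebook, is to scale the values to the range 0 to 1 before training. A minimal sketch on a hypothetical batch:

```python
import numpy as np

# Hypothetical batch standing in for X: 2 images, 4x4 pixels, 3 channels.
rng = np.random.RandomState(0)
batch_uint8 = rng.randint(0, 256, size=(2, 4, 4, 3)).astype(np.uint8)

# Convert to float32 and scale from [0, 255] to [0, 1].
batch_scaled = batch_uint8.astype(np.float32) / 255.0

print(batch_uint8.dtype, batch_uint8.min(), batch_uint8.max())
print(batch_scaled.dtype, batch_scaled.min(), batch_scaled.max())
```

If this scaling is applied to the training images, the same scaling has to be applied to the validation images and to any image passed to the model later.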

Let's visualize three sample images from the training set.

In [None]:
# plot 3 images in color (OpenCV loads images as BGR, matplotlib expects RGB)
plt.subplot(131)
plt.imshow(cv2.cvtColor(X[0,:,:,:], cv2.COLOR_BGR2RGB))
plt.subplot(132)
plt.imshow(cv2.cvtColor(X[20,:,:,:], cv2.COLOR_BGR2RGB))
plt.subplot(133)
plt.imshow(cv2.cvtColor(X[100,:,:,:], cv2.COLOR_BGR2RGB)) 
# show the plot
plt.show()
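The `cv2.COLOR_BGR2RGB` conversion is needed because OpenCV stores color channels in blue, green, red order, while matplotlib expects red, green, blue. For a three-channel image, the conversion amounts to reversing the last axis, which the following NumPy sketch illustrates on a single hypothetical pixel:

```python
import numpy as np

# A single pure-blue pixel in OpenCV's BGR channel order.
bgr_pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)

# Reversing the channel axis turns BGR into RGB.
rgb_pixel = bgr_pixel[..., ::-1]

print(rgb_pixel[0, 0].tolist())  # [0, 0, 255]
```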

# Setting Model Parameters

# Model Creation

## Define the model

The model consists of five convolution blocks, each made of a convolution layer followed by a max pool layer.

A Dropout layer with a probability of 0.5 can also be placed before the final Dense layers; the code below does not include one. With a rate of 0.5, 50% of the values coming into the Dropout layer are set to zero during training, which helps to prevent overfitting.
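The effect of dropout described above can be sketched in plain NumPy. This is an illustration of the idea, not the Keras implementation: the helper `dropout_train` is hypothetical, and it uses the "inverted dropout" convention, in which the surviving values are scaled by `1 / (1 - rate)` so that the expected sum stays unchanged.

```python
import numpy as np

def dropout_train(values, rate, rng):
    """Zero each value with probability `rate` and rescale the survivors."""
    keep_mask = rng.uniform(size=values.shape) >= rate  # True where a value is kept
    return values * keep_mask / (1.0 - rate)

rng = np.random.RandomState(0)
activations = np.ones(10000, dtype=np.float32)  # hypothetical layer inputs
dropped = dropout_train(activations, rate=0.5, rng=rng)

zero_fraction = float(np.mean(dropped == 0))
print('fraction set to zero:', zero_fraction)  # close to 0.5
print('mean after dropout:', float(dropped.mean()))  # close to 1.0
```

At inference time no values are dropped, which is why the rescaling during training matters.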

Then we have a fully connected layer with 512 units, with a `relu` activation function. The model will output class probabilities for two classes — plants and weeds — using `softmax`. 

In [None]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(256, 256, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),

    tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),

    tf.keras.layers.Conv2D(128, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),

    tf.keras.layers.Conv2D(128, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    
    tf.keras.layers.Conv2D(128, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2),

    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(2, activation='softmax')
])

### Compiling the model

As usual, we will use the `adam` optimizer. Since the model outputs softmax probabilities and our labels are integer class indices (0 or 1), we'll use `sparse_categorical_crossentropy` as the loss function. We would also like to look at training and validation accuracy on each epoch as we train our network, so we are passing in the metrics argument.

In [None]:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
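To build some intuition for this loss, here is a NumPy sketch of softmax followed by sparse categorical crossentropy. It is a simplified illustration (Keras additionally clips probabilities for numerical safety), and the scores and labels are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(labels, probs):
    """Mean of -log(probability assigned to the true class), integer labels."""
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(picked)))

logits = np.array([[2.0, 0.0], [0.0, 0.0]])  # hypothetical raw scores, 2 samples
labels = np.array([0, 1])                    # integer class indices

probs = softmax(logits)
loss = sparse_categorical_crossentropy(labels, probs)

# Same loss computed from one-hot labels (the non-sparse formulation).
one_hot = np.eye(2)[labels]
loss_one_hot = float(np.mean(-np.sum(one_hot * np.log(probs), axis=1)))

print(probs)
print(loss, loss_one_hot)
```

The "sparse" variant gives the same value as the one-hot formulation; it simply avoids building the one-hot matrix.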

### Model Summary

Let's look at all the layers of our network using the `summary` method.

In [None]:
model.summary()
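The shapes and parameter counts reported by `summary` can be checked by hand. The sketch below redoes the arithmetic in plain Python, assuming the Keras defaults used in the model above: 3x3 convolutions with `'valid'` padding and stride 1, and 2x2 max pooling that rounds down. The helper names are hypothetical.

```python
def conv_pool_output(size, kernel=3, pool=2):
    """Spatial size after a 'valid' convolution followed by max pooling."""
    return (size - kernel + 1) // pool

def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of a Conv2D layer."""
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(in_units, out_units):
    """Weights plus biases of a Dense layer."""
    return in_units * out_units + out_units

filters = [32, 64, 128, 128, 128]  # filters per convolution block, as in the model
size, channels, total = 256, 3, 0  # input is 256x256 with 3 channels

for out_channels in filters:
    total += conv_params(channels, out_channels)
    size, channels = conv_pool_output(size), out_channels
    print(f'after block with {out_channels} filters: {size}x{size}x{channels}')

flat = size * size * channels  # number of values after Flatten
total += dense_params(flat, 512) + dense_params(512, 2)

print('flattened features:', flat)     # 4608
print('trainable parameters:', total)  # 2749250
```

Most of the parameters sit in the first Dense layer (4608 x 512 weights), which is typical for this kind of architecture.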

### Train the model

It's time we train our network.

In [None]:
# Log training metrics and display them in TensorBoard
from tensorboardcolab import TensorBoardColab, TensorBoardColabCallback
tbcCNN=TensorBoardColab()

In [None]:
history = model.fit(
    X, y, 
    validation_data=(X_val, y_val),
    epochs=100, batch_size=32,
    verbose=1,
    callbacks=[TensorBoardColabCallback(tbcCNN)])
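Once training has finished, `history.history` holds one list of values per metric. The sketch below shows two small helpers for reading it: how many batches make up one epoch, and which epoch reached the best validation accuracy. The helper names are hypothetical and the numbers are made up purely for illustration; they are not results from this notebook. The key is `val_acc` in TensorFlow 1.x and `val_accuracy` in TensorFlow 2.x, so the helper accepts either.

```python
import math

def batches_per_epoch(num_samples, batch_size):
    """Number of gradient updates in one pass over the data."""
    return math.ceil(num_samples / batch_size)

def best_epoch(history_dict):
    """Return (epoch number, value) of the highest validation accuracy."""
    key = 'val_accuracy' if 'val_accuracy' in history_dict else 'val_acc'
    values = history_dict[key]
    best_index = max(range(len(values)), key=values.__getitem__)
    return best_index + 1, values[best_index]  # epochs are numbered from 1

# Made-up history, standing in for history.history after model.fit.
demo_history = {'val_acc': [0.61, 0.74, 0.72], 'val_loss': [0.66, 0.52, 0.58]}

print(batches_per_epoch(1000, 32))  # 32
print(best_epoch(demo_history))     # (2, 0.74)
```

If validation accuracy peaks early and then declines while training accuracy keeps rising, the model is probably overfitting, which is the cue for the "improve our model" step of the workflow.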