# Plant Pathology
*Identify the category of foliar diseases in apple trees.*

Folder structure for a quick overview:
```bash
.
├───datasets
│   └───train.csv
├───images
│   ├───resized_train_images
│   └───train_images
├───foliar_diseases.ipynb
└───image_resizer.ipynb
```


## 0. Importing modules

In [0]:
import os
os.environ["KERAS_BACKEND"] = "tensorflow"
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.preprocessing import OneHotEncoder
from sklearn import preprocessing

import matplotlib.image as mpimg
from skimage.io import imread, imshow
from skimage import data, color, io, filters, morphology,transform, exposure, feature, util
from scipy import ndimage
# Import Keras through TensorFlow's public API (tensorflow.keras)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.layers import Dense, Dropout, Flatten, BatchNormalization
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras import backend as K
from tensorflow.keras.preprocessing import image
from tensorflow.keras.utils import to_categorical
import cv2
import tensorflow



import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

device_name="/gpu:0"
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

#K.set_image_dim_ordering('tf')

## 1. Reading the data

### Build lists of all images in the dataset
\> list of labels  
\> list of image file names

In [0]:
directory = "images/resized_train_images/"
labels = pd.read_csv("datasets/train.csv")
label = labels["labels"].tolist()
images = labels["image"].tolist()
print(label[0], images[0])
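img = io.imread(directory + images[0])/255
# NOTE: shape[0] is the number of rows and shape[1] the number of columns;
# the names below are used consistently as (rows, columns) in later cells.
width = img.shape[0]
height = img.shape[1]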

### Use LabelEncoder to convert the labels to integers

In [0]:
le = preprocessing.LabelEncoder()
le.fit(np.array(label).reshape(-1,))
y_encoded = le.transform(np.array(label).reshape(-1,))
print(y_encoded)
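
To make the encoding step concrete, here is a minimal NumPy sketch of what a label encoder does: it sorts the distinct labels and replaces each label by its index in that sorted list. The label strings below are made up for illustration and are not taken from `train.csv`.

```python
import numpy as np

# Hypothetical labels, only for illustration
example_labels = np.array(["scab", "healthy", "rust", "scab"])

# np.unique returns the sorted distinct values and, with return_inverse=True,
# the index of each original entry in that sorted array
classes, codes = np.unique(example_labels, return_inverse=True)

print(classes)  # ['healthy' 'rust' 'scab']
print(codes)    # [2 0 1 2]

# Decoding is a lookup into the sorted class array
decoded = classes[codes]
```

`LabelEncoder` exposes the same information through `le.classes_` and `le.inverse_transform(...)`.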

### Reading the images

In [0]:
# Preallocate the image tensor: (samples, rows, columns, channels)
X = np.zeros((len(y_encoded), width, height, 3), dtype=np.float32)

for i, filename in enumerate(images):
    # Scale pixel values to [0, 1]; the files on disk are already resized
    X[i] = io.imread(directory + filename) / 255

### Create the train and test splits

In [0]:
X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.33, random_state=1234)

print(X_train.shape)
print(y_train.shape)
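
The shapes printed above follow from simple arithmetic. The sketch below estimates how many samples end up in each part, given `test_size=0.33` here and the `validation_split=0.2` used in the training calls further down. It assumes scikit-learn rounds the test part up and Keras rounds the fitting part down, and the dataset size is a made-up number.

```python
import math

def split_sizes(n_samples, test_size=0.33, validation_split=0.2):
    """Approximate sample counts for the fit, validation and test parts."""
    # The test part is rounded up; the rest goes to training
    n_test = math.ceil(test_size * n_samples)
    n_train_total = n_samples - n_test
    # The validation samples are taken from the training data
    n_fit = int(n_train_total * (1.0 - validation_split))
    n_val = n_train_total - n_fit
    return n_fit, n_val, n_test

# Hypothetical dataset size, not the real number of images
print(split_sizes(1002))  # (536, 135, 331)
```

So only a little over half of the images are actually used to fit the weights.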

### One-hot encode the target

In [0]:
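# Fix the number of columns so the one-hot width never depends on the split
y_train_onehot = to_categorical(y_train, num_classes=len(le.classes_))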
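
One-hot encoding turns class index `k` into a vector with a single 1 at position `k`. The sketch below is a small NumPy version of that idea, not the Keras implementation. It takes the number of classes explicitly, because `to_categorical` otherwise infers it from the largest label it sees. The helper name `one_hot` and the sample labels are illustrative.

```python
import numpy as np

def one_hot(y, num_classes):
    """Return a (len(y), num_classes) float32 matrix with a single 1 per row."""
    y = np.asarray(y, dtype=int)
    encoded = np.zeros((len(y), num_classes), dtype=np.float32)
    encoded[np.arange(len(y)), y] = 1.0
    return encoded

# Hypothetical integer labels; class 3 happens to be absent from this sample
y_example = [0, 2, 1, 2]
encoded_example = one_hot(y_example, num_classes=4)
print(encoded_example)
```

Without the explicit class count, this sample would produce only three columns and no longer match a four-output softmax layer.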

## 2. Basic CNN model

### Build, compile and train the model

We use a sequential model:

\> *Convolutional*:
 - 32 feature maps (3x3 kernels) + ReLU activation function  
 - Max pooling (2x2)
 - Batch normalization

\> *Neural network*:
 - Flatten (convert the feature maps into a 1D array)  
 - Dense layer (50 units) with ReLU activation function  
 - Dense layer with softmax activation function (12 outputs -> 11 plant diseases + healthy plant)
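
The layer list above fixes the tensor shapes and parameter counts once the input size is known. The following sketch works through that arithmetic for the default Keras settings (no padding and stride 1 in the convolution, non-overlapping pooling). The 128 x 128 input size is an assumption for illustration; the real size depends on the resized images.

```python
def conv_output(size, kernel=3):
    # 'valid' padding, stride 1: the kernel cannot overhang the border
    return size - kernel + 1

def pool_output(size, pool=2):
    # Non-overlapping pooling windows; a leftover row or column is dropped
    return size // pool

def basic_cnn_parameters(rows, cols, channels=3, filters=32, hidden=50, classes=12):
    flat = pool_output(conv_output(rows)) * pool_output(conv_output(cols)) * filters
    return {
        "conv2d": (3 * 3 * channels + 1) * filters,  # weights + one bias per filter
        "batch_norm": 4 * filters,                   # gamma, beta, moving mean, moving variance
        "dense_hidden": (flat + 1) * hidden,
        "dense_output": (hidden + 1) * classes,
        "flattened_size": flat,
    }

# 128 x 128 is an assumed input size, not the real size of the resized images
counts = basic_cnn_parameters(128, 128)
for name, value in counts.items():
    print(f"{name}: {value}")
```

Almost all parameters sit in the first dense layer, because the flattened feature map is large.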

In [0]:
# Architecture

input_shape=(width,height,3)

model = Sequential()

model.add(Conv2D(32, kernel_size=(3, 3), activation='relu',input_shape=(input_shape))) 
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
    
model.add(Flatten()) 
model.add(Dense(50, activation='relu')) 
model.add(Dense(12, activation='softmax'))

# Compile
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
with tf.device(device_name):
    # Train 
    early_stopping =  EarlyStopping(patience=10)

    fit1 = model.fit(X_train, y_train_onehot,batch_size=32, epochs=100,
                        validation_split=0.2,  callbacks=[early_stopping], verbose=1)
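
`EarlyStopping` monitors the validation loss by default and stops once it has not improved for `patience` consecutive epochs. The function below is a simplified model of that rule, not the Keras implementation, and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Made-up validation losses: the best value is reached at epoch 2
losses = [0.9, 0.7, 0.6, 0.65, 0.61, 0.62, 0.7]
print(early_stopping_epoch(losses, patience=3))   # 5
print(early_stopping_epoch(losses, patience=10))  # None
```

With a larger patience the run continues through short plateaus, at the cost of more epochs spent after the best one.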

### Plotting

In [0]:
# Accuracy
plt.plot(fit1.history['acc'],'b', fit1.history['val_acc'],'r')

plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()
# Loss 
plt.plot(fit1.history['loss'],'b', fit1.history['val_loss'],'r')

plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()

In the end we reach an accuracy of 39.61%. **TO EDIT**

## 3. Data augmentation

### Train/validation split

In [0]:
# splitting
X_train2, X_val, y_train2, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=123)

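# one-hot encoding (same number of columns for both splits)
n_classes = len(le.classes_)
y_train2_onehot = to_categorical(y_train2, num_classes=n_classes)
y_val_onehot = to_categorical(y_val, num_classes=n_classes)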

### Build and compile the model

In [0]:
# Model
input_shape=(width,height,3)

def get_model():
    model = Sequential()

    model.add(Conv2D(32, kernel_size=(3, 3), activation='relu',input_shape=(input_shape))) 
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(BatchNormalization())
    
    model.add(Flatten()) 
    model.add(Dense(50, activation='relu')) 
    model.add(Dense(12, activation='softmax'))
    return model

model = get_model() 

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

### Initialize the ImageDataGenerator

In [0]:
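from tensorflow.keras.preprocessing.image import ImageDataGenerator
# NOTE: with default arguments the generator applies no transformations;
# pass options such as rotation_range or horizontal_flip to actually augment.
aug = ImageDataGenerator()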
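
Augmentation means feeding the network randomly transformed copies of the training images. As a rough illustration of the idea, the NumPy sketch below applies a random left-right flip and a small horizontal shift. It is not what `ImageDataGenerator` does internally: the helper name is hypothetical, the shift wraps around the image border, and the input is random noise rather than a real leaf photo. Generator options such as `horizontal_flip` and `width_shift_range` provide comparable transformations.

```python
import numpy as np

def random_flip_and_shift(image, rng, max_shift=4):
    """Return a randomly flipped and horizontally shifted copy of an H x W x C image."""
    augmented = image.copy()
    if rng.random() < 0.5:
        augmented = augmented[:, ::-1, :]          # mirror left-right
    shift = int(rng.integers(-max_shift, max_shift + 1))
    augmented = np.roll(augmented, shift, axis=1)  # wrap-around shift along the width
    return augmented

rng = np.random.default_rng(0)
# Stand-in for one normalized image; random noise instead of a real photo
image = rng.random((32, 32, 3), dtype=np.float32)
augmented = random_flip_and_shift(image, rng)
print(augmented.shape, augmented.dtype)  # (32, 32, 3) float32
```

The label of an augmented image stays the same, which is why only transformations that keep the disease symptoms recognizable are useful here.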
 

### Train the model
For up to 100 epochs; early stopping can end training sooner.

In [0]:
early_stopping =  EarlyStopping(patience=5)
fit2_aug = model.fit_generator(aug.flow(X_train2, y_train2_onehot,batch_size=32), epochs=100,
                    steps_per_epoch=len(X_train2) // 32, callbacks=[early_stopping], verbose=1,validation_data = (X_val,y_val_onehot))
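
The `steps_per_epoch` value above uses floor division, so an epoch consists of slightly fewer batches than a full pass over the data whenever the sample count is not a multiple of the batch size. A small sketch of the two options, with a hypothetical helper and a made-up training-set size:

```python
import math

def steps_per_epoch(n_samples, batch_size, keep_partial_batch=False):
    """Number of generator batches that make up one epoch."""
    if keep_partial_batch:
        return math.ceil(n_samples / batch_size)
    return n_samples // batch_size

# Hypothetical training-set size
n = 1000
print(steps_per_epoch(n, 32))                           # 31
print(steps_per_epoch(n, 32, keep_partial_batch=True))  # 32
```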


### Compare the plot with the one from step 2

In [0]:
plt.plot( fit1.history['val_loss'],'g', fit2_aug.history['val_loss'],'b')

plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['model', 'models_with_aug'], loc='upper left')
plt.show()

**TODO: CONCLUSION**

## 4. Transfer learning via VGG19

### Initialize the VGG19 model

In [0]:
modelVGG19 = tensorflow.keras.applications.vgg19.VGG19(include_top=False, weights='imagenet', input_shape=(width,height,3))
type(modelVGG19)

### Convert the VGG19 model to a sequential model

In [0]:
# Store as a sequential model.

model = Sequential()

for layer in modelVGG19.layers[:]:
    model.add(layer)

model.summary()

### Freeze the layers in the model

In [0]:
for layer in model.layers:
    layer.trainable = False

### Add top layers to the model

 - Flatten  
 - Dense layer (100 units) with ReLU activation function  
 - Dense layer with softmax activation function (12 outputs -> 11 plant diseases + healthy plant)

In [0]:
model.add(Flatten()) 
model.add(Dense(100, activation='relu')) 
model.add(Dense(12, activation='softmax'))
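
Freezing the VGG19 layers means that only the newly added top layers are trained. The sketch below counts both groups of parameters by hand, which should match what `model.summary()` reports as non-trainable and trainable. The 128 x 128 input size is an assumption for illustration.

```python
# Output channels and number of 3x3 convolutions in each of the five VGG19 blocks
VGG19_BLOCKS = [(64, 2), (128, 2), (256, 4), (512, 4), (512, 4)]

def vgg19_conv_parameters(in_channels=3):
    total = 0
    for out_channels, n_layers in VGG19_BLOCKS:
        for _ in range(n_layers):
            total += (3 * 3 * in_channels + 1) * out_channels  # kernels plus biases
            in_channels = out_channels
    return total

def head_parameters(rows, cols, hidden=100, classes=12):
    # Five 2x2 max-pooling layers shrink each spatial dimension by a factor of 32
    flat = (rows // 32) * (cols // 32) * 512
    return (flat + 1) * hidden + (hidden + 1) * classes

frozen = vgg19_conv_parameters()
trainable = head_parameters(128, 128)  # 128 x 128 is an assumed input size
print(frozen, trainable)  # 20024384 820512
```

Under this assumption roughly 4% of the parameters are updated during training, which is why a short patience can be enough for this model.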

### Compile and train the model

In [0]:
#Compile
model.compile(loss='categorical_crossentropy',optimizer ='adam',metrics=['accuracy'])
model.summary()

# Train
batch_size = 32
early_stopping =  EarlyStopping(patience=3)
history = model.fit(X_train, y_train_onehot,batch_size=batch_size, epochs=100,
                    validation_split=0.2, callbacks=[early_stopping])


### Compare with the previous models using the plots
(See the plots from steps 2 and 3)

In [0]:
plt.plot( fit1.history['val_loss'],'g', fit2_aug.history['val_loss'],'r', history.history['val_loss'],'b')

plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['model1', 'model2_aug', 'model3_transfer_learning'], loc='upper left')
plt.show()

**TODO: CONCLUSION**

## 5. Evaluating on the test set

In [0]:
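# predict_classes is not available in newer TensorFlow releases; taking the
# argmax over the softmax output gives the same class indices
test_predict = np.argmax(model.predict(X_test), axis=-1)
print(classification_report(y_test, test_predict))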
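
For reference, the step from softmax outputs to the reported metrics can be sketched in plain NumPy. The probabilities and true labels below are made up, so the numbers say nothing about the real model.

```python
import numpy as np

# Made-up softmax outputs for 4 samples and 3 classes
probabilities = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],
])
y_true = np.array([0, 2, 1, 1])

# The predicted class is the index of the largest probability in each row
y_pred = probabilities.argmax(axis=1)

accuracy = (y_pred == y_true).mean()

# Confusion matrix: rows are true classes, columns are predicted classes
n_classes = probabilities.shape[1]
confusion = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(confusion, (y_true, y_pred), 1)

print(y_pred)    # [0 2 1 0]
print(accuracy)  # 0.75
print(confusion)
```

To report label names instead of integers, `le.classes_` can be passed as `target_names` to `classification_report`.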