# IFQ619. Module 8 - Exercises

## Uncertainty In Flower Data

You've been contracted by a floral research company in Canada to assist with a dilemma they are facing concerning some uncertainty in their data.

You see, they recently held an expedition to count and characterise flowers in a nearby forest, to inform a local honey company on the yield and quality of flowers for the year. However, some of their papers were damaged during the expedition, causing the species types on certain records to be de-identified, and leaving only the measurements of the flowers behind.

Their management team is upset, considering that the expedition was expensive to fund, and they still need to submit the report to the honey company.

**They want to know whether there is some way to confidently determine the flower species of the damaged records.**

## Main Libraries

In [None]:
# You may need to uncomment these lines to install the required libraries

#!pip install pytest
#!pip install pandas_profiling

# For reproducibility:

import numpy as np
import random as rn
import tensorflow as tf
import csv

# Necessary for starting numpy generated random numbers in a well-defined initial state

np.random.seed(515)

# Necessary for starting core Python generated random numbers in a well-defined initial state

rn.seed(515)

# Force TensorFlow to single thread

# Multiple threads are a potential source of non-reproducible research results

session_conf = tf.compat.v1.ConfigProto( intra_op_parallelism_threads=1,
                                          inter_op_parallelism_threads=1 )

# tf.compat.v1.set_random_seed() gives random number generation in the TensorFlow backend
# a well-defined initial state
# more details: https://www.tensorflow.org/api_docs/python/tf/set_random_seed

tf.compat.v1.set_random_seed(515)

In [None]:
from __future__ import absolute_import, division, print_function, unicode_literals

import pandas as pd
import numpy as np
#import pandas_profiling
import seaborn as sns
#sns.set_style('dark')
import re
#sns.set(style="ticks", context="talk")
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

# update for tensorflow

# keras / deep learning libraries

from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.models import model_from_json
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Nadam
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.utils import plot_model

# callbacks

from tensorflow.keras import backend as K
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.callbacks import ReduceLROnPlateau

from matplotlib import pyplot as plt
from matplotlib.pyplot import figure
import matplotlib.image as mpimg
import pylab as pl
from pylab import savefig

plt.style.use('seaborn-deep')  # on matplotlib >= 3.6 this style is named 'seaborn-v0_8-deep'
%matplotlib inline

from sklearn import datasets
from sklearn.model_selection import train_test_split
# very important for feature transformation
from sklearn.preprocessing import OneHotEncoder, StandardScaler, MinMaxScaler


### Aux Functions: Loading Trained Models

In [None]:
# LOAD_MODEL_HISTORY ------------------------------------------
def load_model_history(model_name, path):
    """Load a training history saved as one '<metric>,"[v1, v2, ...]"' line per metric."""
    model_hist_loaded = {}

    # read the history file; the context manager closes it afterwards
    with open(path + model_name + "_hist.csv", "r") as hist_file:
        lines = hist_file.read().split("\n")

    for line in lines:
        if len(line) == 0:
            continue

        metric = line.split(",\"[")[0]                                      # extract metric name
        values_str = line.split(",\"[")[1].replace("]\"", "").split(", ")   # extract the recorded values
        values = [float(val_str) for val_str in values_str]

        model_hist_loaded[metric] = values
    return model_hist_loaded

# LOAD_MODEL ------------------------------------------
def load_model(model_name, path):
    # read the model architecture from JSON; the context manager closes the file
    with open(path + model_name + "_DUO.json", "r") as json_file:
        loaded_model_json = json_file.read()

    # load weights into new model
    loaded_model = model_from_json(loaded_model_json)
    loaded_model.load_weights(path + model_name + "_DUO.h5")
    print("Loaded model from disk")
    return loaded_model

### The Dataset

They provide you with the dataset of the undamaged records. It appears that they were recording the measurements of the flower sizes to indicate the quality of the surveyed flowers. Perhaps we can use this data to inform an analysis that might provide some reassurance under the current uncertainty.

In [None]:


dataset_path = "iris.csv"
class_var = "species"
dataset = pd.read_csv( dataset_path )
dataset

Here we can see the features that will be used to predict the class of each flower (that will come later).

In [None]:
dataset.drop([class_var], axis=1)

Neural networks work on numbers, not text, so we need to convert these labels to numbers. It is better for the classifier to have one output neuron for each class. This means we will have to transform the labels into an M x 3 matrix, where M is the number of samples and each of the 3 columns corresponds to one species.
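
To make the target shape concrete, here is a minimal sketch of one-hot encoding on a handful of made-up labels. The `species` array is an assumption for illustration only and is not read from `iris.csv`. The exercise itself is meant to use the `OneHotEncoder` imported above.

```python
import numpy as np

# Toy labels (illustrative only, not taken from the dataset)
species = np.array(["setosa", "virginica", "versicolor", "setosa"])

# np.unique returns the sorted class names and, for every label, its class index
classes, class_idx = np.unique(species, return_inverse=True)

# Row i of the identity matrix is the one-hot vector of class i
one_hot = np.eye(len(classes))[class_idx]

print(classes)
print(one_hot)
```

Each row contains a single 1 in the column of its class, so M labels become an M x 3 matrix.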

In [None]:
pd.DataFrame(dataset[class_var])

### Checking Class Balance



Here, we check for imbalance in the class labels. A classifier trained on very unequal class counts tends to be biased towards the majority class, so ideally each class has a similar number of instances.

In [None]:
dataset.groupby(class_var).count()
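
As a complement to the count above, the following sketch shows one possible way to quantify imbalance. The `toy` data frame and the `imbalance_ratio` name are hypothetical and only serve the illustration.

```python
import pandas as pd

# Toy label column (illustrative only): two balanced classes and one rare class
toy = pd.DataFrame({"species": ["a"] * 50 + ["b"] * 50 + ["c"] * 5})

# Absolute and relative class frequencies
counts = toy["species"].value_counts()
proportions = toy["species"].value_counts(normalize=True)

# Ratio between the most and least frequent class; 1.0 means perfectly balanced
imbalance_ratio = counts.max() / counts.min()

print(counts)
print(proportions.round(3))
print("imbalance ratio:", imbalance_ratio)
```

A ratio close to 1 suggests that no resampling or class weighting is needed.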

### Feature Transformation

In [None]:
# separate variables into independent variables and dependent variable

feature_names = dataset.columns.to_list()
feature_names.remove(class_var)
labels = dataset[class_var].unique()

# select features from dataset
X = 

# select the class values from dataset
y = 

# general info about number of features, samples, and classes
n_features =  
n_samples = 
n_classes = 

print("There are a total of %d training instances, %d features and a total of %d classes\n" %(n_samples, n_features, n_classes))

In [None]:
# create numerical encoding for attribute species
# each class will be in one neuron, one column in a matrix
# 'setosa' - index 0
# 'versicolor' - index 1 
# 'virginica' - index 2

enc = OneHotEncoder()

# transform the class variable using OneHot encoder 
Y = 
Y[0:10,:]

In [None]:
# Scale each feature to the [0, 1] range,
# which is important for convergence of the neural network

scaler = MinMaxScaler()
# transform features using MinMaxScaler
X_scaled = 

# taking a look
X_scaled[0:10,:]
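
The two scalers imported earlier rescale features in different ways. The sketch below reproduces both transformations by hand in NumPy on a made-up matrix (`X_toy` is an assumption), which may help when checking the output of the cell above.

```python
import numpy as np

# Toy feature matrix (illustrative only): 4 samples, 2 features on different scales
X_toy = np.array([[1.0, 100.0],
                  [2.0, 300.0],
                  [3.0, 200.0],
                  [5.0, 500.0]])

# Min-max scaling (what MinMaxScaler does by default): each column is mapped to [0, 1]
col_min = X_toy.min(axis=0)
col_max = X_toy.max(axis=0)
X_minmax = (X_toy - col_min) / (col_max - col_min)

# Standardisation (what StandardScaler does): each column gets mean 0 and variance 1
X_standard = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

print(X_minmax)
print(X_standard.mean(axis=0), X_standard.std(axis=0))
```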

#### Get some data

In [None]:
# Split the data set into training, testing and validation

X_train, X_test, Y_train, Y_test = 
X_validation, X_test, Y_validation, Y_test = 
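
One common way to obtain three subsets is to call `train_test_split` twice. The sketch below does this on random toy data. The 60/20/20 proportions, the `random_state` value and the `X_hold`/`y_hold` names are assumptions, not requirements of the exercise.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (illustrative only): 150 samples, 4 features, 3 balanced classes
rng = np.random.default_rng(515)
X_toy = rng.random((150, 4))
y_toy = np.repeat([0, 1, 2], 50)

# Step 1: hold out 40% of the data, keeping class proportions (stratify)
X_train, X_hold, y_train, y_hold = train_test_split(
    X_toy, y_toy, test_size=0.4, random_state=515, stratify=y_toy)

# Step 2: split the held-out part evenly into validation and test sets
X_validation, X_test, y_validation, y_test = train_test_split(
    X_hold, y_hold, test_size=0.5, random_state=515, stratify=y_hold)

print(len(X_train), len(X_validation), len(X_test))
```

The validation set is used during training (for example by early stopping), while the test set is kept aside for the final evaluation.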


#### Train Model

In [None]:
# create a Neural Network with
# 3 hidden layers
# 12 neurons in each hidden layer
# activation function is ReLU
# output layer uses softmax

model = 
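
To see what the architecture described in the comments computes, here is a forward pass written in plain NumPy with random, untrained weights. This is not Keras code and not a solution to the cell above. It is only a sketch of the shapes involved, assuming 4 input features and 3 classes.

```python
import numpy as np

rng = np.random.default_rng(515)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # subtract the row-wise maximum for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Layer sizes: 4 inputs -> 12 -> 12 -> 12 -> 3 outputs
sizes = [4, 12, 12, 12, 3]
weights = [rng.normal(scale=0.5, size=(n_in, n_out))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

def forward(X):
    a = X
    # hidden layers use ReLU
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)
    # output layer uses softmax, giving one probability per class
    return softmax(a @ weights[-1] + biases[-1])

probs = forward(rng.random((5, 4)))
n_params = sum(W.size + b.size for W, b in zip(weights, biases))

print(probs.shape, probs.sum(axis=1))
print("trainable parameters:", n_params)
```

Every output row sums to 1, which is why softmax outputs can be read as class probabilities.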



In [None]:
# compile the model with 
# loss function = categorical_crossentropy
# optimization function - nadam
# metrics - accuracy
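
For intuition about the loss named above, this sketch computes categorical cross-entropy by hand in NumPy. The function name mirrors the Keras loss string, but the implementation and the example arrays are illustrative assumptions.

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    # clip to avoid log(0)
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

y_true = np.array([[1, 0, 0],
                   [0, 1, 0]])
confident = np.array([[0.9, 0.05, 0.05],
                      [0.1, 0.8, 0.1]])
unsure = np.array([[0.4, 0.3, 0.3],
                   [0.3, 0.4, 0.3]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

Both sets of predictions are correct in terms of accuracy, but the less confident one receives the higher loss.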



In [None]:
# fit model to data

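patience = 10  # assumed value: epochs without val_loss improvement before stopping; adjust as needed
early_stop = EarlyStopping(monitor='val_loss', patience=patience, verbose=1, mode='min')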
callbacks_list = [early_stop]

history_callback = model.fit(X_train, Y_train, batch_size = 12, epochs = 150,
                                 verbose=0, validation_data=(X_validation, Y_validation), callbacks=callbacks_list)
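
The `patience` argument can be understood with a small pure-Python sketch of the stopping rule. This is a simplified illustration, not the Keras implementation, and `stopping_epoch` and the loss values are hypothetical.

```python
def stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # improvement: remember the new best value and reset the counter
            best = loss
            wait = 0
        else:
            # no improvement: stop once this has happened `patience` times in a row
            wait += 1
            if wait >= patience:
                return epoch
    return None

losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.59, 0.60, 0.61, 0.63]
print(stopping_epoch(losses, patience=2))
print(stopping_epoch(losses, patience=3))
print(stopping_epoch(losses, patience=5))
```

A small patience stops at the first plateau, while a larger one gives the model a chance to recover from a temporary rise in validation loss.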
    



#### Evaluate Model

In [None]:
# evaluate model performance in training data
train_loss,train_acc= 

# evaluate model performance in test data
test_loss,test_acc =

print('[Accuracy] Train: %.3f, Test: %.3f' % (train_acc, test_acc))
print('[Loss] Train: %.3f, Test: %.3f' % (train_loss, test_loss))
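
The accuracy reported above can also be checked by hand from predicted probabilities. The sketch below does so on made-up arrays (`y_true` and `y_prob` are assumptions), taking the class with the highest probability as the prediction.

```python
import numpy as np

# One-hot ground truth and predicted class probabilities (illustrative only)
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.5, 0.3],
                   [0.3, 0.4, 0.3],
                   [0.1, 0.7, 0.2]])

# The predicted class is the column with the highest probability
pred_idx = y_prob.argmax(axis=1)
true_idx = y_true.argmax(axis=1)

accuracy = float(np.mean(pred_idx == true_idx))
print(accuracy)
```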


In [None]:
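# plot the model's training history
# note: best_model_hist_loaded is not defined in this notebook; load it with load_model_history(...)
# or use history_callback.history from the training cell
plt.plot(best_model_hist_loaded['accuracy'], label='train')
plt.plot(best_model_hist_loaded['val_accuracy'], label='validation')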
plt.ylabel('Accuracy')
plt.xlabel('Number of Epochs')
plt.ylim([0, 1])
plt.legend()
plt.show()

plt.plot(best_model_hist_loaded['loss'], label='train')
plt.plot(best_model_hist_loaded['val_loss'], label='validation')
plt.ylabel('Loss')
plt.xlabel('Number of Epochs')
plt.ylim([0, 1])
plt.legend()
plt.show()

### Generating Explanations



In [None]:
import lime
from lime import lime_tabular

In [None]:
feature_names = dataset.columns.to_list()
feature_names.remove(class_var)


In [None]:
# the dataset does not have many features
# let's use all of them
MAX_FEAT = len(feature_names)

In [None]:
# Calling LIME's explainer for Tabular data
explainer = 


In [None]:
# instance to be explained
flower_indx = 0

flower_feat = X_scaled[flower_indx,:]
flower_true_pred = enc.inverse_transform(np.expand_dims(Y[flower_indx,:], 0))[0][0]

print("Flower id: %d \t Groundtruth Label: %s\n" %(flower_indx, flower_true_pred))


#### Generating LIME Explanations

In [None]:
# explain instance using lime
exp = 
exp.show_in_notebook(show_table=True)
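
The idea behind such an explanation is a local surrogate: perturb the instance, query the model, and fit a simple weighted linear model around it. The sketch below illustrates that idea with NumPy and scikit-learn. It is a simplified illustration, not the `lime` library's implementation, and `black_box`, the noise scale and `kernel_width` are all assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(515)

def black_box(X):
    """Stand-in classifier (hypothetical): probability of one class from two features."""
    return 1.0 / (1.0 + np.exp(-(3.0 * X[:, 0] - 1.0 * X[:, 1])))

instance = np.array([0.5, 0.5])

# 1. Perturb the instance with Gaussian noise
samples = instance + rng.normal(scale=0.1, size=(500, 2))

# 2. Query the black-box model on the perturbed points
targets = black_box(samples)

# 3. Weight each sample by its proximity to the instance (RBF kernel)
distances = np.linalg.norm(samples - instance, axis=1)
kernel_width = 0.25
sample_weights = np.exp(-(distances ** 2) / kernel_width ** 2)

# 4. Fit a weighted linear surrogate; its coefficients act as local feature contributions
surrogate = Ridge(alpha=0.01)
surrogate.fit(samples, targets, sample_weight=sample_weights)

contributions = dict(zip(["feature_0", "feature_1"], surrogate.coef_))
print(contributions)
```

The sign of each coefficient indicates whether the feature pushes the prediction towards or away from the class near this particular instance, which is how the bars in a LIME plot are usually read.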

In [None]:
# plot the explanation computed above for the first label it covers
fig = exp.as_pyplot_figure(label=exp.available_labels()[0])

### Generating Many LIME Explanations


In [None]:
%matplotlib notebook

for flower_indx in range(0, len(X_test)):
    
    flower_feat = X_test[flower_indx,:]
    flower_true_pred = enc.inverse_transform(np.expand_dims(Y_test[flower_indx,:], 0))[0][0]
    
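    # note: best_model_loaded is not defined in this notebook; load it with load_model(...)
    # or use the `model` trained above
    pred_good = best_model_loaded.predict(np.expand_dims(flower_feat, 0))
    pred_good = enc.inverse_transform( pred_good )[0][0]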

    # explain instance using a good model
    exp_good = explainer.explain_instance(flower_feat, best_model_loaded.predict, num_features= MAX_FEAT, 
                                 labels=[labels.tolist().index(pred_good)])
    
    exp_good.show_in_notebook(show_table=True)
    exp_good.as_pyplot_figure(label=labels.tolist().index(pred_good))
    print("---------------------------------------------------------------")
    
