## TensorFlow Problem Sheet

These problems relate to the [TensorFlow](https://www.tensorflow.org/) Python library for pattern recognition. This notebook uses the [Iris Data Set](https://archive.ics.uci.edu/ml/datasets/iris).

### 1.  Use TensorFlow to create a model.
Create a model that uses a flower's sepal width / length and petal width / length to predict the species of Iris.

In [1]:
# Adapted from: https://gist.github.com/NiharG15/cd8272c9639941cf8f481a7c4478d525

import numpy as np
import tensorflow as tf
import keras as kr
# SciKit (http://scikit-learn.org/stable/index.html) has good functionality for dealing with datasets - 
# http://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets
import sklearn.datasets as skds
import sklearn.preprocessing as skpp

# Load the dataset and print out the first 5 rows to make sure.
iris = skds.load_iris()   # Inbuilt function - http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html
print('First 5 rows: \n ', iris.data[:5])

x = iris.data # Feature matrix: one row per flower, columns are sepal length/width and petal length/width
y_ = iris.target.reshape(-1, 1) # Class labels (0, 1, 2) reshaped into a single column, since OneHotEncoder expects 2D input

# One Hot Encode - formats data to better fit classification algorithms in machine learning. See the note below.
# http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder
encoder = skpp.OneHotEncoder(sparse=False) # Return a dense array; newer scikit-learn releases rename this argument to sparse_output
y = encoder.fit_transform(y_) # y is now the one hot encoded version of the class labels
#print(y) # Uncomment the print statement to see a fuller example of the table shown in the One Hot Encoding explanation.


Using TensorFlow backend.


First 5 rows: 
  [[ 5.1  3.5  1.4  0.2]
 [ 4.9  3.   1.4  0.2]
 [ 4.7  3.2  1.3  0.2]
 [ 4.6  3.1  1.5  0.2]
 [ 5.   3.6  1.4  0.2]]


**NB:** While searching through examples of classification in TensorFlow/Keras, I came across the term *One Hot Encoding* (OHE) a lot. OHE is a way of transforming categorical features, such as plant species, into a format that works better for machine learning algorithms. One binary column is generated per category, and each entry in the dataset gets a `1` in the column for its category and a `0` in all the others. For example, a randomised iris data set might look like this:

| Plant  | Setosa | Versicolor | Virginica |
|--------|:------:|:----------:|:---------:|
| Plant1 | 0      | 1          | 0         |
| Plant2 | 1      | 0          | 0         |
| Plant3 | 0      | 0          | 1         |

In this table, Plant1 is a versicolor, Plant2 is a setosa, and Plant3 is a virginica.  
See https://www.quora.com/What-is-one-hot-encoding-and-when-is-it-used-in-data-science for a slightly longer explanation.
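
The table above can be reproduced with a few lines of NumPy. This is only a sketch of the idea, not what `OneHotEncoder` does internally; the label order (0 = setosa, 1 = versicolor, 2 = virginica) is assumed to match the table columns.

```python
import numpy as np

# Integer class labels for Plant1..Plant3 from the table:
# versicolor (1), setosa (0), virginica (2)
labels = np.array([1, 0, 2])

# Row i of the 3x3 identity matrix is the one hot vector for class i,
# so indexing the identity matrix with the labels encodes them all at once.
one_hot = np.eye(3)[labels]
print(one_hot)

# argmax recovers the original integer label from each one hot row.
decoded = one_hot.argmax(axis=1)
print(decoded)
```

Decoding with `argmax` is also how a predicted probability row from the softmax output can be turned back into a species index.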

In [2]:
# Actual building of the model
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

model = Sequential() # Sequential models have a linear stack of layers

# Dense = standard densely connected layer
model.add(Dense(10, input_shape=(4,), activation='relu')) # Model will take as input arrays of shape (*, 4) and output arrays of shape (*, 10)
model.add(Dense(10, activation='relu'))  # Don't need to specify input shape after first layer. relu = Rectified Linear Unit
model.add(Dense(3, activation='softmax', name='output')) # Softmax on the final layer turns the 3 outputs into class probabilities that sum to 1 - see https://stackoverflow.com/a/37601915/7232648

# Adam optimizer with learning rate of 0.001 - Adam is a variant of stochastic gradient descent that adapts the step size for each weight
# See https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/ for further reading
optimizer = Adam(lr=0.001) # Newer Keras releases name this argument learning_rate
model.compile(optimizer, loss='categorical_crossentropy', metrics=['accuracy']) # Categorical crossentropy suits multi-class problems with one hot encoded labels, like Iris

print('Model Summary: ')
print(model.summary()) # summary() prints the table itself and returns None, hence the final "None" in the output below

Model Summary: 
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 10)                50        
_________________________________________________________________
dense_2 (Dense)              (None, 10)                110       
_________________________________________________________________
output (Dense)               (None, 3)                 33        
Total params: 193
Trainable params: 193
Non-trainable params: 0
_________________________________________________________________
None
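
The `Param #` column can be checked by hand: a dense layer has one weight for every input/unit pair plus one bias per unit. A minimal sketch, where `dense_params` is a hypothetical helper written for this note and not part of Keras:

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters in a dense layer: weights plus one bias per unit."""
    return n_inputs * n_units + n_units

# Layer widths of the model above: 4 input features -> 10 -> 10 -> 3 classes
layer_sizes = [4, 10, 10, 3]

# Pair each layer's input width with its output width and count parameters.
counts = [dense_params(n_in, n_out)
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)

print(counts, total)
```

The per-layer counts and the total agree with the summary table (50, 110, 33 and 193).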


### 2. Split the data into training and testing sets.
Instructions: Investigate the *best way to do this* - write some code to randomly separate data if desired. Reference relevant material.

I've decided to use the `train_test_split` function from the `model_selection` module in scikit-learn, because it handles splitting arrays into randomised subsets very simply. A full list of parameters can be found [here](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html#sklearn.model_selection.train_test_split), but in this instance we'll use the following two:
1. Arrays - a sequence of indexables with the same length. In this case, `x` (original) and `y` (OHE).
2. `test_size` - the size of the resulting test set. It can be set to either a `float` between 0.0 and 1.0, giving the proportion of the original set to put in the test set, or an `int`, giving the number of entries to put in the test set. Because the test size has been defined, we don't need to define the training size - the remaining entries are automatically put into the training set.
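
For comparison, a random split can also be written by hand with NumPy alone. The sketch below is an illustration of the idea, not scikit-learn's implementation; `manual_split`, its parameters and the toy arrays are all made up for this example.

```python
import numpy as np

def manual_split(features, targets, test_fraction, seed=None):
    """Shuffle row indices, then slice them into a test part and a training part."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))           # random ordering of row indices
    n_test = int(np.ceil(test_fraction * len(features)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    # Index features and targets with the same indices so rows stay paired.
    return (features[train_idx], features[test_idx],
            targets[train_idx], targets[test_idx])

# Toy data: 12 rows of 4 features; row i starts with the value 4 * i and has target i
demo_x = np.arange(48).reshape(12, 4)
demo_y = np.arange(12)

tr_x, te_x, tr_y, te_y = manual_split(demo_x, demo_y, test_fraction=0.25, seed=0)
print(tr_x.shape, te_x.shape)
```

Using one shuffled index array for both `features` and `targets` is what keeps each row matched to its label, which is the same guarantee `train_test_split` gives when several arrays are passed in together.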

In [3]:
import sklearn.model_selection as skms # for splitting a set

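# Split the data for training and testing
# Note: test_size=0.75 holds out 75% of the rows (113 of 150) for testing, so only 37 rows are used for training
train_x, test_x, train_y, test_y = skms.train_test_split(x, y, test_size=0.75)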

# Probably a better way of formatting this but will do for now.
print("Training data:                           ", "Test data:")
for i in range(5):
    print('{0:} {1:} {2:} {3:} {4:}'.format(train_x[i], train_y[i],'     ', test_x[i], test_y[i]))

Training data:                            Test data:
[ 6.7  3.   5.   1.7] [ 0.  1.  0.]       [ 5.5  2.5  4.   1.3] [ 0.  1.  0.]
[ 5.7  2.8  4.1  1.3] [ 0.  1.  0.]       [ 7.1  3.   5.9  2.1] [ 0.  0.  1.]
[ 5.9  3.2  4.8  1.8] [ 0.  1.  0.]       [ 5.4  3.   4.5  1.5] [ 0.  1.  0.]
[ 4.3  3.   1.1  0.1] [ 1.  0.  0.]       [ 5.2  3.5  1.5  0.2] [ 1.  0.  0.]
[ 6.4  3.2  4.5  1.5] [ 0.  1.  0.]       [ 5.8  2.6  4.   1.2] [ 0.  1.  0.]


Each row can be traced back to the original [Iris Data](https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data). For example, the first training row above, `[ 6.7  3.   5.   1.7]`, carries the label `[ 0.  1.  0.]`: the 1 in the second column of the OHE set marks it as versicolor.  
**Note:** `train_test_split` shuffles the data, so the rows shown differ on every run unless the `random_state` parameter is fixed.  
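
Because the split is random, the rows printed above change from run to run. A minimal sketch of one way to make the split repeatable, using the `random_state` parameter of `train_test_split`; the seed value 42 and the `demo_`/`split_` names are arbitrary choices for this example.

```python
import numpy as np
import sklearn.datasets as skds
import sklearn.model_selection as skms

iris = skds.load_iris()
demo_x, demo_labels = iris.data, iris.target

# Passing the same random_state fixes the shuffle, so both calls
# return exactly the same training and test rows.
split_a = skms.train_test_split(demo_x, demo_labels, test_size=0.75, random_state=42)
split_b = skms.train_test_split(demo_x, demo_labels, test_size=0.75, random_state=42)

identical = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print("Identical splits:", identical)

demo_train_x, demo_test_x, _, _ = split_a
print("Training rows:", len(demo_train_x), "Test rows:", len(demo_test_x))
```

With a fixed seed, any commentary that refers to specific rows of the training set stays valid when the notebook is re-run.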

### 3. Train the model.
Train the model using the training set. (The problem sheet says "test set"; the training set is assumed to be what was meant.)

`model.fit` trains the model for a fixed number of epochs - see https://keras.io/models/sequential/ .  
The `verbose` parameter can be set to 0, 1 or 2: 0 means no output, 1 displays progress bars, and 2 displays one line per epoch. I've set it to 0 here for the sake of tidiness.  
The `batch_size` parameter sets the number of samples per gradient update. A smaller batch means more weight updates per epoch, which in my runs gave a lower loss but took longer to run (not much longer but still). I tried 2, which was good (0.8~ loss) but somewhat slow; 32 (the default) was a lot faster but not as good as 2 (loss was 0.17~). 5 seems to run pretty well without much of an effect on loss.
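
The run-time difference comes from the number of gradient updates each batch size produces. A rough sketch of the arithmetic, assuming the 37 training rows left over by `test_size=0.75` on the 150-row data set and the 200 epochs used below:

```python
import math

n_train = 37   # assumed: 150 rows with 113 held out for testing
epochs = 200

updates_per_epoch = {}
for batch_size in (2, 5, 32):
    # The last batch of an epoch may be smaller than batch_size, hence the ceiling.
    updates_per_epoch[batch_size] = math.ceil(n_train / batch_size)

for batch_size, updates in updates_per_epoch.items():
    print('batch_size={:>2}: {:>2} updates per epoch, {} in total'.format(
        batch_size, updates, updates * epochs))
```

A batch size of 2 makes roughly ten times as many updates as the default of 32, which is in line with it being the slowest of the three settings tried.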


In [4]:
model.fit(train_x, train_y, verbose=0, batch_size=5, epochs=200) 

<keras.callbacks.History at 0x1d01b6f95f8>

### 4. Test the model.
Test your model using the testing set. Calculate and display clearly the error rate.  

In general, the lower the loss, the better the model fits the data. Accuracy is the fraction of test samples classified correctly, so the error rate is 1 minus the accuracy.

In [5]:
results = model.evaluate(test_x, test_y)

print('Final test set loss:     {:4f}'.format(results[0]))
print('Final test set accuracy: {:4f}'.format(results[1]))

Final test set loss:     0.090511
Final test set accuracy: 0.973451
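
The task asks for the error rate, which follows directly from the accuracy. A minimal sketch, with the accuracy printed above and the test set size of 113 rows (75% of 150, rounded up) hard-coded as assumptions rather than read from the model:

```python
accuracy = 0.973451   # test set accuracy reported above
n_test = 113          # assumed size of the test set

# Error rate is the fraction of test samples the model got wrong.
error_rate = 1.0 - accuracy
misclassified = round(error_rate * n_test)

print('Final test set error rate: {:.4f}'.format(error_rate))
print('Misclassified samples:     {} of {}'.format(misclassified, n_test))
```

In the notebook itself the same figure could be computed from `results[1]` instead of a hard-coded value.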
