## TensorFlow Problem Sheet

These problems relate to the [TensorFlow](https://www.tensorflow.org/) Python library for pattern recognition. This notebook uses the [Iris Data Set](https://archive.ics.uci.edu/ml/datasets/iris).

### 1. Use TensorFlow to create a model.
Create a model that uses a flower's sepal width / length and petal width / length to predict the species of Iris.

In [8]:
# Adapted from: https://gist.github.com/NiharG15/cd8272c9639941cf8f481a7c4478d525

import numpy as np
import tensorflow as tf
import keras as kr
# scikit-learn has good functionality for dealing with datasets:
# http://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets
import sklearn.datasets as skds
import sklearn.preprocessing as skpp

# Load the dataset and print the first 5 rows as a sanity check.
# load_iris is built in: http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html
iris = skds.load_iris()
print('First 5 rows: \n ', iris.data[:5])

x = iris.data  # Inputs: the four measurements for every flower.
y_ = iris.target.reshape(-1, 1)  # Species labels (0, 1, 2) as a single column.

# One-hot encode the labels so that each species gets its own 0/1 column. See note.
# scikit-learn 1.2 renamed `sparse` to `sparse_output`; on older versions use sparse=False.
encoder = skpp.OneHotEncoder(sparse_output=False)
y = encoder.fit_transform(y_)
#print(y)


First 5 rows: 
  [[ 5.1  3.5  1.4  0.2]
 [ 4.9  3.   1.4  0.2]
 [ 4.7  3.2  1.3  0.2]
 [ 4.6  3.1  1.5  0.2]
 [ 5.   3.6  1.4  0.2]]


**NB:** While searching through examples of classification in TensorFlow/Keras, I came across the term *one-hot encoding* a lot. After some online searching, I found that one-hot encoding is a way of transforming a categorical feature, such as the species of a plant, into a format that works better with machine learning algorithms. From my understanding, one 0/1 column is generated per category, and each entry in the dataset gets a `1` in the column for its own category and a `0` in every other column. For example, a shuffled sample of the iris data set might look like this:

|        | Setosa | Versicolor | Virginica |
|--------|:------:|:----------:|:---------:|
| Plant1 | 0      | 1          | 0         |
| Plant2 | 1      | 0          | 0         |
| Plant3 | 0      | 0          | 1         |

In this table, Plant1 is a versicolor, Plant2 is a setosa, and Plant3 is a virginica.
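The table above can be reproduced with plain NumPy. This is a minimal sketch, assuming the species are numbered 0 = setosa, 1 = versicolor, 2 = virginica; the `labels` array is made up for illustration and is not taken from the notebook.

```python
import numpy as np

# Hypothetical integer labels for Plant1, Plant2, Plant3:
# 0 = setosa, 1 = versicolor, 2 = virginica.
labels = np.array([1, 0, 2])

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing the identity matrix by label encodes every entry at once.
one_hot = np.eye(3, dtype=int)[labels]
print(one_hot)

# Taking the argmax of each row recovers the original integer labels.
decoded = one_hot.argmax(axis=1)
```

The argmax step is also how a predicted row of class probabilities is usually turned back into a single species.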

### 2. Split the data into training and testing sets.
Investigate the *best way to do this* - write some code to randomly separate data if desired. Reference relevant material.

In [None]:
# Shuffle the row indices, then split them into two halves:
# one half for training and one half for testing.
inds = np.random.permutation(len(x))
train_inds, test_inds = np.array_split(inds, 2)
inputs_train, outputs_train = x[train_inds], y[train_inds]
inputs_test,  outputs_test  = x[test_inds],  y[test_inds]

print(inputs_train)
print(outputs_train)
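A purely random split can leave one species over- or under-represented in the test set. One possible alternative is a stratified split, which shuffles within each class so that both sets keep the same class proportions. The sketch below is a hand-rolled illustration: `stratified_split` and its parameters are hypothetical names, and the labels are placeholders shaped like the iris targets. scikit-learn's `train_test_split` with its `stratify` argument offers the same idea ready-made.

```python
import numpy as np

def stratified_split(labels, test_fraction=0.5, seed=0):
    """Return (train_inds, test_inds) with each class split in the same proportion."""
    rng = np.random.default_rng(seed)
    train_parts, test_parts = [], []
    for cls in np.unique(labels):
        # Shuffle the row indices belonging to this class only.
        cls_inds = rng.permutation(np.flatnonzero(labels == cls))
        n_test = int(round(len(cls_inds) * test_fraction))
        test_parts.append(cls_inds[:n_test])
        train_parts.append(cls_inds[n_test:])
    # Shuffle again so the classes are interleaved rather than grouped.
    train_inds = rng.permutation(np.concatenate(train_parts))
    test_inds = rng.permutation(np.concatenate(test_parts))
    return train_inds, test_inds

# Placeholder labels shaped like the iris targets: 50 flowers per species.
labels = np.repeat([0, 1, 2], 50)
train_inds, test_inds = stratified_split(labels, test_fraction=0.2)
print(np.bincount(labels[test_inds]))
```

The returned index arrays can be used to slice the inputs and the one-hot outputs in the same way as `train_inds` and `test_inds` in the cell above.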

### 3. Train the model.
Train the model using the training set.
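As a starting point, a small Keras classifier for four input features and three classes might look like the sketch below. The layer sizes, optimizer, epoch count and the `build_model` helper are assumptions chosen for illustration, not settings from the notebook, and the data here is random placeholder data so that the block runs on its own.

```python
import numpy as np
from tensorflow import keras

def build_model(n_features=4, n_classes=3):
    """Build a small fully connected classifier (layer sizes are illustrative)."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(10, activation="relu"),
        # Softmax turns the outputs into one probability per class.
        keras.layers.Dense(n_classes, activation="softmax"),
    ])
    # Categorical cross-entropy expects one-hot encoded targets.
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Random placeholder data standing in for the training split.
rng = np.random.default_rng(0)
demo_inputs = rng.normal(size=(60, 4)).astype("float32")
demo_labels = rng.integers(0, 3, size=60)
demo_outputs = np.eye(3, dtype="float32")[demo_labels]

model = build_model()
history = model.fit(demo_inputs, demo_outputs, epochs=5, batch_size=10, verbose=0)

# Each predicted row holds three class probabilities.
probs = model.predict(demo_inputs[:5], verbose=0)
```

In the notebook itself, the training inputs and one-hot outputs from the split in section 2 would take the place of `demo_inputs` and `demo_outputs`.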

### 4. Test the model.
Test your model using the testing set. Calculate and clearly display the error rate.
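The error rate is the fraction of test flowers whose predicted species differs from the true species. A minimal sketch, assuming one-hot encoded true labels and one row of class probabilities per flower; `error_rate` is a hypothetical helper, and the arrays are made-up values rather than real model output.

```python
import numpy as np

def error_rate(true_one_hot, predicted_probs):
    """Fraction of rows where the most probable class is not the true class."""
    true_classes = np.argmax(true_one_hot, axis=1)
    predicted_classes = np.argmax(predicted_probs, axis=1)
    return float(np.mean(true_classes != predicted_classes))

# Made-up example: four flowers, three classes.
true_one_hot = np.array([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1],
                         [0, 1, 0]])
predicted_probs = np.array([[0.8, 0.1, 0.1],
                            [0.2, 0.7, 0.1],
                            [0.1, 0.6, 0.3],   # Wrong: true class is 2, predicted 1.
                            [0.3, 0.4, 0.3]])

rate = error_rate(true_one_hot, predicted_probs)
print(f"Error rate: {rate:.1%} ({int(rate * len(true_one_hot))} of {len(true_one_hot)} misclassified)")
```

With a trained model, `predicted_probs` would be the model's predictions for the test inputs and `true_one_hot` the matching test outputs.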