## Predicting Iris Species from Sepal and Petal Length and Width

## Data Set Information: https://archive.ics.uci.edu/ml/datasets/iris
This is perhaps the best known database to be found in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. (See Duda & Hart, for example.) The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other. 

In [21]:
from __future__ import division  # __future__ imports must come before all other statements

import numpy as np
import pandas as pd
import tensorflow as tf
from math import e



## NP softmax_with_cross_entropy Functions:

In [22]:
def vector_stochastic_creator(A_matrix):
    m = A_matrix.shape[0]
    sum_row_vect = np.sum(A_matrix, 1).reshape((m, 1))
    B_matrix = A_matrix / sum_row_vect
    return B_matrix

In [39]:
def cross_entropy(probs, labels):
    # probs: rows of class probabilities (e.g. softmax output), not raw logits
    return -np.sum(labels * np.log(probs), 1)

In [40]:
def softmax_cross_entropy(logits, labels):
    exp_logits = np.exp(logits)  # unnormalized class scores
    softmax_array = vector_stochastic_creator(exp_logits)  # normalize each row to sum to 1
    return cross_entropy(softmax_array, labels)

In [42]:
example = np.array([[.3, .8, .1], [.1, .4, .9]])
ex_labels = np.array([[0, 0, 1], [1, 0, 0]])
softmax_cross_entropy(example, ex_labels)

array([1.44342004, 1.52069407])
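The implementation above exponentiates the raw logits directly, which can overflow for large values. As a minimal sketch (not part of the original notebook), the same quantity can be computed through a log-softmax with the row maximum subtracted first. The function name `stable_softmax_cross_entropy` is an assumption introduced here for illustration.

```python
import numpy as np

def stable_softmax_cross_entropy(logits, labels):
    # Softmax is unchanged by a per-row shift, so subtracting the row maximum
    # keeps np.exp from overflowing without changing the result.
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    # log-softmax = shifted - log(sum(exp(shifted)))
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    return -np.sum(labels * log_probs, axis=1)

example = np.array([[.3, .8, .1], [.1, .4, .9]])
ex_labels = np.array([[0, 0, 1], [1, 0, 0]])

result = stable_softmax_cross_entropy(example, ex_labels)
# Adding a large constant to every logit leaves the loss unchanged and finite.
large = stable_softmax_cross_entropy(example + 1000.0, ex_labels)
print(result, large)
```

On the example inputs this should agree with the output printed above, and it stays finite even when every logit is shifted by 1000.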

In [None]:
example = np.array([[.3, .8, .1], [.1, .4, .9]])
ex_labels = np.array([[0, 1, 0], [0, 0, 1]])
softmax_cross_entropy(example, ex_labels)

## Read Data

In [26]:
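# Note: the data file appears to have no header row, so read_csv treats the first
# sample as the header and it is lost when the columns are renamed (149 rows remain).
# Passing header=None to read_csv would keep all 150 rows.
df = pd.read_csv("iris.data.txt")
df.columns = ["sepal length", "sepal width", "petal length", "petal width", "Class"]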
df.head()

| | sepal length | sepal width | petal length | petal width | Class |
|---|---|---|---|---|---|
| 0 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 2 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 3 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| 4 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |


## Normalize and Shuffle Data

We standardize each feature to zero mean and unit variance so that all four features are on a comparable scale and no single feature dominates the parameter updates during backpropagation. Centering at zero also means that samples which deviate by the same amount above and below the average produce parameter adjustments of comparable weight. We also shuffle the rows: the file is ordered by class, so without shuffling each batch would contain a single label and the model would overfit to whichever label it is currently training on.

In [27]:
# Encode class names as integer labels in one assignment
# (chained indexing such as df[column][mask] = value triggers SettingWithCopyWarning)
df["Class"] = df["Class"].map({"Iris-setosa": 0, "Iris-versicolor": 1, "Iris-virginica": 2})
# Standardize every feature column to zero mean and unit variance
feature_cols = [c for c in df.columns if c != "Class"]
df[feature_cols] = (df[feature_cols] - df[feature_cols].mean()) / df[feature_cols].std()
# Shuffle the rows and rebuild the index
df = df.sample(frac=1).reset_index(drop=True)
df.head()


| | sepal length | sepal width | petal length | petal width | Class |
|---|---|---|---|---|---|
| 0 | 0.545114 | -0.579025 | 0.753276 | 0.387014 | 2 |
| 1 | -1.506555 | 0.343699 | -1.349413 | -1.320609 | 0 |
| 2 | -1.385868 | 0.343699 | -1.235754 | -1.320609 | 0 |
| 3 | -1.023809 | -2.424474 | -0.155995 | -0.269764 | 1 |
| 4 | 0.183054 | -1.963112 | 0.128152 | -0.269764 | 1 |
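As a quick sanity check on the standardization step, the sketch below applies the same formula to a small made-up DataFrame and confirms that each column ends up with mean 0 and sample standard deviation 1. The toy values and the `random_state` seed are assumptions for illustration; the notebook itself does not fix a seed.

```python
import numpy as np
import pandas as pd

# Toy stand-in for two feature columns (values are made up for illustration)
toy = pd.DataFrame({"a": [4.9, 4.7, 6.3, 5.8, 7.1],
                    "b": [3.0, 3.2, 2.5, 2.7, 3.0]})

# Same formula as in the notebook; pandas .std() uses the sample std (ddof=1)
standardized = (toy - toy.mean()) / toy.std()

col_means = standardized.mean().to_numpy()
col_stds = standardized.std().to_numpy()

# Shuffle rows reproducibly; only the row order changes, not the values
shuffled = standardized.sample(frac=1, random_state=0).reset_index(drop=True)
print(col_means, col_stds)
```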


## Load data into NumPy arrays and convert labels into one-hot vectors
We convert the true labels into one-hot vectors so that each sample has an ideal target distribution (probability 1 on the true class, 0 elsewhere) whose dimensions match the output vector of the model.

In [28]:
iris_array = np.asarray(df[['sepal length', 'sepal width', 'petal length', 'petal width']])
class_array = np.asarray(df['Class']).astype(int)
num_rows = class_array.shape[0]
num_classes = 3
# Put a single 1 in each row, in the column given by that row's class label
one_hot = np.zeros((num_rows, num_classes))
one_hot[np.arange(num_rows), class_array] = 1
print(iris_array.shape)
print(one_hot.shape)



(149, 4)
(149, 3)
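An equivalent way to build the one-hot matrix, shown here only as a sketch on made-up labels, is to index the rows of an identity matrix. Taking the argmax of each row recovers the original integer labels.

```python
import numpy as np

labels = np.array([2, 0, 0, 1, 1])  # hypothetical integer class labels
num_classes = 3

# Row k of the identity matrix is the one-hot vector for class k
one_hot_alt = np.eye(num_classes)[labels]

# argmax inverts the encoding
recovered = np.argmax(one_hot_alt, axis=1)
print(one_hot_alt)
```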


## Create tf.data.Dataset:
X holds the features <br>
Y holds the one-hot labels

In [29]:
X = tf.placeholder(tf.float32, shape = (None, 4))
Y = tf.placeholder(tf.float32, [None, 3])

In [30]:
dataset = tf.data.Dataset.from_tensor_slices((X, Y))
dataset = dataset.batch(5)  # batches of 5 rows; the final batch may be smaller

In [31]:
iter = dataset.make_initializable_iterator()
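To make the batching behaviour concrete, here is a plain NumPy sketch of what slicing 149 rows into batches of 5 produces. The helper `iterate_batches` is a hypothetical name, not part of `tf.data`; it only mimics the batch sizes.

```python
import numpy as np

def iterate_batches(features, targets, batch_size):
    """Yield consecutive (features, targets) slices; the last one may be short."""
    for start in range(0, len(features), batch_size):
        yield features[start:start + batch_size], targets[start:start + batch_size]

# Placeholder arrays with the same shapes as iris_array and one_hot
features = np.zeros((149, 4))
targets = np.zeros((149, 3))

batch_sizes = [len(xb) for xb, _ in iterate_batches(features, targets, 5)]
print(len(batch_sizes), batch_sizes[-1])
```

With 149 rows this gives 29 full batches plus one final batch of 4 rows.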

## Create two-layer model with hidden layer size of 50
A width of 50 was chosen because there are only four input features, so a large number of hidden units is not needed. Two hidden layers are sufficient to fit a dataset with only 4 features.

In [32]:
h1 = 50
h2 = 50
W1 = tf.Variable(tf.random_normal([4, h1]))
b1 = tf.Variable(tf.random_normal([1, h1]))
layer1 = tf.nn.relu(tf.matmul(X, W1) + b1)

W2 = tf.Variable(tf.random_normal([h1, h2]))
b2 = tf.Variable(tf.random_normal([1, h2]))
layer2 = tf.nn.relu(tf.matmul(layer1, W2) + b2)

W_out = tf.Variable(tf.random_normal([h2, 3]))
b_out = tf.Variable(tf.random_normal([1, 3]))  # TODO: look into convexity and ReLU
logits = tf.matmul(layer2, W_out) + b_out
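The same forward pass can be traced in plain NumPy to check the shapes at each layer. This is only an illustrative sketch: the seed, the `relu` and `forward` helpers, and the random input batch are assumptions, and the weights are drawn from a standard normal to mirror the defaults of `tf.random_normal`.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
h1, h2 = 50, 50

def relu(z):
    return np.maximum(z, 0.0)

# Standard-normal initial weights and biases
W1, b1 = rng.standard_normal((4, h1)), rng.standard_normal((1, h1))
W2, b2 = rng.standard_normal((h1, h2)), rng.standard_normal((1, h2))
W_out, b_out = rng.standard_normal((h2, 3)), rng.standard_normal((1, 3))

def forward(x):
    layer1 = relu(x @ W1 + b1)       # (batch, 50)
    layer2 = relu(layer1 @ W2 + b2)  # (batch, 50)
    return layer2 @ W_out + b_out    # (batch, 3) unnormalized class scores

x_batch = rng.standard_normal((5, 4))  # made-up batch of 5 samples
logits_np = forward(x_batch)
print(logits_np.shape)
```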

## Make loss function to reduce average of softmax_cross_entropy 
We use the negative sum for our loss function because the log of a probability in (0, 1] is negative or zero, so negating it gives a non-negative loss. We want to increase the probability of the true label: the higher the probability of the true label, the lower the cross-entropy, which makes it a suitable quantity to minimize as our loss function.
$ H(y, p) = - \sum_{i} y_{i} \log(p_{i}) $
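A small numeric sketch of this formula, using made-up probability vectors: with a one-hot $y$, only the true-class term survives, so $H(y, p) = -\log(p_{\text{true}})$. The helper name `cross_entropy_single` is an assumption for illustration.

```python
import numpy as np

def cross_entropy_single(y, p):
    # H(y, p) = -sum_i y_i * log(p_i)
    return -np.sum(y * np.log(p))

y = np.array([0.0, 1.0, 0.0])  # true class is index 1

# Made-up predictions: one confident in the true class, one not
confident = cross_entropy_single(y, np.array([0.05, 0.90, 0.05]))  # equals -log(0.90)
uncertain = cross_entropy_single(y, np.array([0.40, 0.30, 0.30]))  # equals -log(0.30)
print(confident, uncertain)
```

The confident prediction yields the smaller loss, which is the behaviour the paragraph above describes.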



In [33]:
entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y)
loss = tf.reduce_mean(entropy)  # TODO: implement and understand softmax with cross-entropy

In [34]:
optimizer = tf.train.GradientDescentOptimizer(.05).minimize(loss)
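To show what a gradient descent step on this loss does, the sketch below trains a single softmax layer (not the two-layer network above) in NumPy with the same learning rate of 0.05. It relies on the standard result that the gradient of softmax cross-entropy with respect to the logits is `probs - targets`. The data, labels, seed, and step count are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.standard_normal((30, 4))    # made-up inputs
true_labels = rng.integers(0, 3, size=30)  # made-up class labels
targets = np.eye(3)[true_labels]

def loss_and_probs(W, b):
    logits = features @ W + b
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    mean_loss = -np.mean(np.sum(targets * log_probs, axis=1))
    return mean_loss, np.exp(log_probs)

W, b = np.zeros((4, 3)), np.zeros((1, 3))
learning_rate = 0.05

initial_loss, _ = loss_and_probs(W, b)
for _ in range(200):
    _, probs = loss_and_probs(W, b)
    grad_logits = (probs - targets) / len(features)  # d(mean loss) / d(logits)
    W -= learning_rate * features.T @ grad_logits
    b -= learning_rate * grad_logits.sum(axis=0, keepdims=True)
final_loss, _ = loss_and_probs(W, b)
print(initial_loss, final_loss)
```

Starting from zero weights, every class gets probability 1/3, so the initial loss is $\log 3$; the updates then lower it.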

## Convert one_hot back to true label vector and define accuracy
Take the argmax of the predicted probabilities and of the one-hot labels to recover class indices, then define accuracy as the fraction of samples whose predicted class equals the true class.

In [35]:
Y_pred = tf.nn.softmax(logits)
y_pred_cls = tf.argmax(Y_pred,1)
y_cls = tf.argmax(Y, 1)
accuracy = tf.reduce_mean(tf.cast(tf.equal(y_pred_cls, y_cls), tf.float32))
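The same accuracy computation in NumPy, on made-up predictions, as a sketch of what the graph ops above evaluate to:

```python
import numpy as np

# Made-up predicted probabilities for 4 samples
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3],
                  [0.6, 0.3, 0.1]])
# Made-up one-hot true labels
one_hot_true = np.array([[1, 0, 0],
                         [0, 0, 1],
                         [0, 1, 0],
                         [0, 1, 0]])

pred_cls = np.argmax(probs, axis=1)         # predicted classes: 0, 2, 1, 0
true_cls = np.argmax(one_hot_true, axis=1)  # true classes:      0, 2, 1, 1
acc = np.mean(pred_cls == true_cls)         # fraction of matches: 3 of 4
print(acc)
```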

## Create feed forward structure
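Let batch size be 5 <br>
Compute accuracy every 5 epochs on the final partial batch (the 4 rows left over after 29 full batches of 5), which the training loop never uses <br>
Uses tf.data.Dataset to iterate through data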

In [38]:
batch_size = 5
epochs = 50
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())

# Build the get_next op once; calling iter.get_next() inside the loop
# would add a new op to the graph on every iteration.
next_batch = iter.get_next()

n_batches = 149 // batch_size  # 29 full batches; the 30th batch holds the last 4 rows
for i in range(epochs):
    total_loss = 0
    # Re-initialize the iterator at the start of every epoch
    sess.run(iter.initializer, feed_dict={X: iris_array, Y: one_hot})
    for batch in range(n_batches):
        try:
            X_batch, Y_batch = sess.run(next_batch)
            _, l = sess.run([optimizer, loss], feed_dict={X: X_batch, Y: Y_batch})
            total_loss += l
        except tf.errors.OutOfRangeError:
            print("End of dataset")

    print("Epoch {0}: {1}".format(i, total_loss))
    if i % 5 == 0 and i != 0:
        try:
            # The remaining partial batch is never trained on,
            # so it serves as a very small validation set.
            X_val, Y_val = sess.run(next_batch)
            val_accuracy = sess.run(accuracy, feed_dict={X: X_val, Y: Y_val})
            print("\tVal Accuracy {0}".format(val_accuracy))
        except tf.errors.OutOfRangeError:
            print("End of dataset")



Epoch 0: 235.1692228165455
Epoch 1: 24.25648232473325
Epoch 2: 8.87294908204776
Epoch 3: 8.179501727356161
Epoch 4: 5.63910186382331
Epoch 5: 4.720722337679717
	Val Accuracy 1.0
Epoch 6: 5.471416684458102
Epoch 7: 3.5911605264357362
Epoch 8: 3.4908024283752397
Epoch 9: 3.457334449936308
Epoch 10: 2.470090581829009
	Val Accuracy 1.0
Epoch 11: 1.4939517586744842
Epoch 12: 0.9606963034079854
Epoch 13: 0.821691231827117
Epoch 14: 0.7180675856044445
Epoch 15: 0.7144797955162119
	Val Accuracy 1.0
Epoch 16: 0.651300773394837
Epoch 17: 0.6335670737161934
Epoch 18: 0.6216203599907146
Epoch 19: 0.6445922859958841
Epoch 20: 0.5250844906979282
	Val Accuracy 1.0
Epoch 21: 0.5486216386020146
Epoch 22: 0.5115825595623491
Epoch 23: 0.61553482169478
Epoch 24: 0.5086179607915966
Epoch 25: 0.49729455616497376
	Val Accuracy 1.0
Epoch 26: 0.4568351186800612
Epoch 27: 0.4551624329792219
Epoch 28: 0.4161814339269654
Epoch 29: 0.46436872017449105
Epoch 30: 0.3666721815814924
	Val Accuracy 1.0
Epoch 31: 0.3397

In [37]:
sess.close()
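The validation accuracy above is measured on only the 4 rows left over after the full batches, so a value of 1.0 says little about generalization. One possible alternative, sketched here under assumptions not taken from the notebook (a 20% hold-out fraction, a fixed seed, and the hypothetical helper `train_val_split`), is to set aside a larger validation split before training.

```python
import numpy as np

def train_val_split(features, targets, val_fraction=0.2, seed=0):
    """Randomly split rows into disjoint training and validation sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    n_val = int(round(len(features) * val_fraction))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return features[train_idx], targets[train_idx], features[val_idx], targets[val_idx]

# Placeholder arrays with the same shapes as iris_array and one_hot
features = np.arange(149 * 4, dtype=float).reshape(149, 4)
targets = np.eye(3)[np.arange(149) % 3]

X_train, Y_train, X_val, Y_val = train_val_split(features, targets)
print(X_train.shape, X_val.shape)
```

With 149 rows and a 20% fraction this holds out 30 rows and trains on the remaining 119.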
