### Training a model with the Iris dataset


In [1]:
import numpy as np
import tensorflow as tf

#print(tf.__file__ )

In [2]:
# TensorFlow ships with the Iris dataset, which we can use to train our model.
# The dataset contains two arrays:
#     1) data: sepal and petal length and width for each flower sample
#     2) target: a 1-D array of species labels, one per sample.
#        Each species is encoded as 0, 1 or 2 for setosa, versicolor and virginica respectively.
iris_set = tf.contrib.learn.datasets.load_iris()

# print(iris_set.data)
# print(iris_set.target)


In [3]:
# Simple example of how to shuffle the elements of two arrays of the same length
# while keeping the same shuffle order for both.
# Adapted from https://stackoverflow.com/questions/4601373/better-way-to-shuffle-two-numpy-arrays-in-unison

arr1 = [1, 2, 3, 4, 5]
arr2 = [11, 12, 13, 14, 15]

arr1 =  np.array(arr1)
arr2 =  np.array(arr2)
# array of indices, one for each element of the arrays
randomize = np.arange(len(arr1))
print(randomize)
# shuffle it
np.random.shuffle(randomize)

# set a new index order for both arrays
arr1 = arr1[randomize]
arr2 = arr2[randomize]

print(arr1)
print(arr2)

[0 1 2 3 4]
[4 5 2 3 1]
[14 15 12 13 11]
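The same idea can be wrapped in a small reusable function. This is a minimal sketch, not part of the original notebook: the name `shuffle_in_unison` and its `seed` parameter are assumptions introduced here so that a run can be repeated with the same order.

```python
import numpy as np

def shuffle_in_unison(first, second, seed=None):
    """Return shuffled copies of two equal-length arrays, keeping pairs aligned."""
    first = np.asarray(first)
    second = np.asarray(second)
    if len(first) != len(second):
        raise ValueError("both arrays must have the same length")
    # One permutation of the indices is applied to both arrays,
    # so element i of `first` stays paired with element i of `second`.
    rng = np.random.RandomState(seed)  # hypothetical seed, for repeatable runs
    order = rng.permutation(len(first))
    return first[order], second[order]

arr1, arr2 = shuffle_in_unison([1, 2, 3, 4, 5], [11, 12, 13, 14, 15], seed=0)
print(arr1)
print(arr2)
```

Whatever order comes out, each value in the second array is still exactly 10 larger than its partner in the first, which is an easy way to check that the pairing survived.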


In [4]:
# Shuffle data to randomly pick train and test sets 
# to train and test model.

data = iris_set.data
target = iris_set.target
 
randomize = np.arange(len(data))
np.random.shuffle(randomize)
data = data[randomize]
target = target[randomize]

# Split the iris data and target arrays into training and test arrays.
# Adapted from: https://github.com/emerging-technologies/keras-iris/blob/master/iris_nn.py
inds = np.random.permutation(len(data)) 
train_inds, test_inds = np.array_split(inds, 2) 
training_data, training_target = data[train_inds], target[train_inds]
test_data, test_target  = data[test_inds],  target[test_inds]

# Build a second training set with half as much data; the test set stays the same size.
inds2 = np.random.permutation(len(training_data)) 
train_inds2, test_inds2 = np.array_split(inds2, 2)
training_data2, training_target2 = training_data[train_inds2], training_target[train_inds2]


#print(len(training_data2))
# print(len(test_data))

# print(training_data)
# print(test_data)
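A single random permutation is enough to both shuffle and split the data, and the split does not have to be 50/50. The sketch below is an assumption-laden illustration rather than the notebook's own code: `split_train_test`, `test_fraction` and `seed` are hypothetical names, and a synthetic array with the same shape as the Iris data (150 samples, 4 features) stands in for `iris_set.data`.

```python
import numpy as np

def split_train_test(data, target, test_fraction=0.5, seed=None):
    """Shuffle data and target together, then split them into train and test parts."""
    data = np.asarray(data)
    target = np.asarray(target)
    rng = np.random.RandomState(seed)  # hypothetical seed, for repeatable splits
    order = rng.permutation(len(data))
    n_test = int(round(len(data) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return data[train_idx], target[train_idx], data[test_idx], target[test_idx]

# Synthetic stand-in with the same shape as the Iris data: 150 samples, 4 features.
demo_data = np.arange(150 * 4, dtype=np.float32).reshape(150, 4)
demo_target = np.repeat([0, 1, 2], 50)

train_x, train_y, test_x, test_y = split_train_test(
    demo_data, demo_target, test_fraction=0.2, seed=42)
print(train_x.shape, test_x.shape)
```

With `test_fraction=0.2` the model would see 120 samples during training and be evaluated on the remaining 30.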


In [6]:
# Code is adapted from the tutorial: https://www.tensorflow.org/get_started/estimator


# Specify that all features have real-valued data. shape=[4] means that each
# row of the dataset holds 4 values: sepal length, sepal width, petal length and petal width.
feature_columns = [tf.feature_column.numeric_column("x", shape=[4])]

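# Build a DNN with 3 hidden layers of 10, 20 and 10 units respectively.
# No optimizer is passed, so the estimator falls back to its default (Adagrad).
# The tutorial also shows a variant that uses ProximalAdagradOptimizer with regularization.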
classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                      hidden_units=[10, 20, 10],
                                      n_classes=3,
                                      model_dir="./models/iris_model3")
# Define the training inputs
train_input_fn = tf.estimator.inputs.numpy_input_fn(
  x={"x": np.array(training_data)},
  y=np.array(training_target),
  num_epochs=None,
  shuffle=True)

# Train model.
classifier.train(input_fn=train_input_fn, steps=3000)

# Define the test inputs
test_input_fn = tf.estimator.inputs.numpy_input_fn(
  x={"x": np.array(test_data)},
  y=np.array(test_target),
  num_epochs=1,
  shuffle=False)

# Evaluate accuracy.
accuracy_score = classifier.evaluate(input_fn=test_input_fn)["accuracy"]

print("\nTest Accuracy: {0:f}\n".format(accuracy_score))

# Classify new flower samples.
new_samples = np.array(
  [[6.4, 3.2, 4.5, 1.5],
   [5.8, 3.1, 5.0, 1.7],
   [4.6, 3.4, 1.4, 0.3],
   [5.9, 3.0, 5.1, 1.8],
   [4.9, 2.1, 2.3, 1.7],
   [4.9, 2.1, 1.3, 1.8]], dtype=np.float32)

# Define the prediction inputs
predict_input_fn = tf.estimator.inputs.numpy_input_fn(
  x={"x": new_samples},
  num_epochs=1,
  shuffle=False)

# Run the trained model on the new flower samples to get predictions
predictions = list(classifier.predict(input_fn=predict_input_fn))
predicted_classes = [p["classes"] for p in predictions]

print(
  "New Samples, Class Predictions:    {}\n"
  .format(predicted_classes))


INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': './models/iris_model3', '_tf_random_seed': 1, '_save_summary_steps': 100, '_save_checkpoints_secs': 600, '_save_checkpoints_steps': None, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100}
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Restoring parameters from ./models/iris_model3\model.ckpt-2
INFO:tensorflow:Saving checkpoints for 3 into ./models/iris_model3\model.ckpt.
INFO:tensorflow:loss = 118.332, step = 3
INFO:tensorflow:global_step/sec: 388.884
INFO:tensorflow:loss = 11.9692, step = 103 (0.257 sec)
INFO:tensorflow:global_step/sec: 642.229
INFO:tensorflow:loss = 12.444, step = 203 (0.156 sec)
INFO:tensorflow:global_step/sec: 676.692
INFO:tensorflow:loss = 4.24492, step = 303 (0.148 sec)
INFO:tensorflow:global_step/sec: 665.492
INFO:tensorflow:loss = 7.58108, step = 403 (0.166 sec)
INFO:tensorflow:global_step/sec: 

In [7]:
# Transform the predicted class labels into species names
predictions = np.array(predicted_classes).astype(int).ravel()
flowers = ['setosa', 'versicolor', 'virginica']

# Each predicted label (0, 1 or 2) is the index of its species name
report = [flowers[item] for item in predictions]
print(report)

['versicolor', 'virginica', 'setosa', 'virginica', 'versicolor', 'setosa']


### Conclusion

After training the model multiple times, we can see that the loss settles at around 4.5, which is still quite high.
Test accuracy in most cases varies from 96 to 98 percent. In some rare cases accuracy was recorded at 100 percent, or at least above 98.
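These accuracy figures move in coarse steps because the test set is small. Assuming the standard 150-sample Iris dataset, the 50/50 split above leaves 75 test flowers, so every misclassified flower costs about 1.33 percentage points. The short sketch below works through that arithmetic; `accuracy_percent` is a helper introduced here for illustration.

```python
def accuracy_percent(correct, total):
    """Accuracy as a percentage of correctly classified samples."""
    return 100.0 * correct / total

test_size = 75  # assumes 150 Iris samples split 50/50

for wrong in range(4):
    accuracy = accuracy_percent(test_size - wrong, test_size)
    print(wrong, "misclassified ->", round(accuracy, 2), "percent")
```

Read this way, the 96 to 98 percent range corresponds to between one and three misclassified test flowers.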

What can we do to reduce the loss and improve accuracy?

* Increase the size of the training set.
* Increase the size of the neural network.
* Increase the number of training steps from 3k to 20k and see what happens.
* Pick a different optimizer.

Generally speaking, to improve how well an algorithm learns,
you need to experiment with the parameters listed above.
If the results are still not satisfying for the algorithm you picked
for your model, you are free to choose a different algorithm.
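One way to organize that experimentation is to list every combination of settings up front and train one model per combination. The sketch below only builds the list; it does not train anything. `build_experiments` is a hypothetical helper, the `[20, 40, 20]` network and the optimizer names are illustrative values chosen here, and the optimizer entries are plain labels rather than TensorFlow objects.

```python
from itertools import product

def build_experiments(hidden_units_options, steps_options, optimizer_options):
    """Return one settings dict per combination of the given options."""
    return [
        {"hidden_units": units, "steps": steps, "optimizer": optimizer}
        for units, steps, optimizer in product(
            hidden_units_options, steps_options, optimizer_options)
    ]

experiments = build_experiments(
    hidden_units_options=[[10, 20, 10], [20, 40, 20]],  # second network is hypothetical
    steps_options=[3000, 20000],                        # 3k and 20k steps, as suggested above
    optimizer_options=["Adagrad", "Adam"],              # labels only, chosen for illustration
)

for settings in experiments:
    print(settings)
```

Each dictionary could then be used to configure one training run, with the test accuracy recorded next to it so the runs can be compared.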