# Iris Data Analysis with Tensorflow Estimators

TensorFlow Estimators are much easier to use than building a model from scratch, but we sacrifice some ability to customize the model.

In [1]:
import pandas as pd

## Get the Data

In [2]:
df = pd.read_csv('mk028-project_iris_data_analysis_with_tensorflow_and_estimators/iris.csv')
df.head()

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0.0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0.0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0.0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0.0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0.0 |


In TensorFlow, feature column names cannot contain spaces or special characters such as the parentheses in `(cm)`, so we rename the columns to plain identifiers.

In [3]:
df.columns = ['sepal_length',
              'sepal_width',
              'petal_length',
              'petal_width',
              'target']
df.head()

|   | sepal_length | sepal_width | petal_length | petal_width | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0.0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0.0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0.0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0.0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0.0 |
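
Typing the new names by hand works for five columns. As an alternative (not what this notebook does), the names can be derived from the original headers with regular expressions. `clean_column_name` below is a hypothetical helper: it drops a parenthesized unit and replaces every remaining run of non-alphanumeric characters with an underscore.

```python
import re


def clean_column_name(name):
    """Hypothetical helper: 'sepal length (cm)' -> 'sepal_length'."""
    name = re.sub(r"\s*\(.*?\)", "", name)               # drop units such as " (cm)"
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())   # spaces etc. -> "_"
    return name.strip("_").lower()


original = ["sepal length (cm)", "sepal width (cm)",
            "petal length (cm)", "petal width (cm)", "target"]
print([clean_column_name(c) for c in original])
# ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'target']
```

With pandas, the same helper could be applied as `df.columns = [clean_column_name(c) for c in df.columns]`.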


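The `target` column is stored as floats (`0.0`, `1.0`, `2.0`). This is a three-class problem, not a binary one, and `DNNClassifier` expects the labels to be integer class IDs from `0` to `n_classes - 1`, so we convert the column to `int`.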

In [4]:
X = df.drop('target',
            axis=1)
y = df['target'].apply(int)
y.head()

0    0
1    0
2    0
3    0
4    0
Name: target, dtype: int64
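
Without a `label_vocabulary`, `DNNClassifier` expects each label to be an integer in the range `[0, n_classes)`. The notebook converts with `df['target'].apply(int)`, which does not check that range. A minimal stdlib sketch of the same conversion with a sanity check; `to_class_ids` is a hypothetical helper, not part of TensorFlow or pandas.

```python
def to_class_ids(labels, n_classes):
    """Hypothetical helper: convert float labels (0.0, 1.0, ...) to checked int class IDs."""
    ids = []
    for value in labels:
        if value != int(value):
            raise ValueError(f"label {value!r} is not a whole number")
        class_id = int(value)
        if not 0 <= class_id < n_classes:
            raise ValueError(f"label {class_id} is outside [0, {n_classes})")
        ids.append(class_id)
    return ids


print(to_class_ids([0.0, 1.0, 2.0, 2.0], n_classes=3))  # [0, 1, 2, 2]
```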

## Train Test Split

In [5]:
from sklearn.model_selection import train_test_split

In [6]:
X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    y, 
                                                    test_size=0.3)
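
Because the split is purely random, the class counts in the test set can drift from the dataset's balance; the classification report at the end shows 13, 16 and 16 test samples for classes 0, 1 and 2. scikit-learn's `train_test_split` accepts `stratify=y` (and `random_state` for reproducibility) to keep the proportions equal. The sketch below only illustrates the idea of a stratified split in plain Python; `stratified_split` is a hypothetical helper, not scikit-learn's implementation, and the 50-rows-per-class labels assume the standard 150-row iris data.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size, seed=None):
    """Hypothetical helper: return (train_idx, test_idx), splitting each class in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)  # 50 * 0.3 -> 15 per class
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


labels = [0] * 50 + [1] * 50 + [2] * 50  # assumed class balance of iris
train_idx, test_idx = stratified_split(labels, test_size=0.3, seed=42)
print(len(train_idx), len(test_idx))                              # 105 45
print([sum(labels[i] == c for i in test_idx) for c in range(3)])  # [15, 15, 15]
```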

## Estimators

In [7]:
import tensorflow as tf

### Feature Columns

Create `feat_cols`, a list holding one numeric feature column per input feature, for the TensorFlow estimator.

In [8]:
# one float32 numeric feature column per input feature
feat_cols = [tf.feature_column.numeric_column(col) for col in X.columns]
feat_cols

[NumericColumn(key='sepal_length', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='sepal_width', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='petal_length', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='petal_width', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]
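
A `numeric_column` only names a key. When the estimator runs, its input function supplies a dictionary mapping each feature name to a batch of values, and the column with key `'sepal_length'` reads the `'sepal_length'` entry. The plain-Python picture below builds that dictionary shape from the first three rows of the data; `rows_to_features` is a hypothetical helper, and the real input function produces tensors rather than lists.

```python
def rows_to_features(rows, keys):
    """Hypothetical helper: list of row dicts -> {feature name: list of values}."""
    return {key: [float(row[key]) for row in rows] for key in keys}


keys = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
rows = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2},
    {"sepal_length": 4.9, "sepal_width": 3.0, "petal_length": 1.4, "petal_width": 0.2},
    {"sepal_length": 4.7, "sepal_width": 3.2, "petal_length": 1.3, "petal_width": 0.2},
]
features = rows_to_features(rows, keys)
print(features["sepal_length"])  # [5.1, 4.9, 4.7]
```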

### Create the DNN (Deep Neural Network) Estimator

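`hidden_units` defines the number of neurons in each hidden layer: `[10, 20, 10]` gives three hidden layers with 10, 20 and 10 neurons. `n_classes=3` matches the three classes in `target`.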

In [9]:
classifier = tf.estimator.DNNClassifier(hidden_units=[10, 20, 10], 
                                        n_classes=3, 
                                        feature_columns=feat_cols)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'C:\\Users\\MK\\AppData\\Local\\Temp\\tmpdylq1613', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x000001B48526BEB8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
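
To make `hidden_units` concrete: the network maps 4 input features through hidden layers of 10, 20 and 10 units to 3 output logits (one per class, as in the prediction output further down). A fully connected layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The count below is a back-of-the-envelope sketch assuming that standard dense-layer layout; it does not query the estimator.

```python
def dense_param_count(layer_sizes):
    """Weights + biases for a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total


# 4 input features -> hidden_units [10, 20, 10] -> 3 class logits
sizes = [4, 10, 20, 10, 3]
print(dense_param_count(sizes))  # 513
```

Per layer that is 50, 220, 210 and 33 parameters.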


### Input Function

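Create an input function that feeds the training data to the estimator in shuffled batches of 10 rows. `num_epochs` is the number of passes over the training data; after the last pass, the input function stops producing batches.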

In [10]:
tr_input_func = tf.estimator.inputs.pandas_input_fn(x=X_train, 
                                                    y=y_train, 
                                                    batch_size=10, 
                                                    num_epochs=5, 
                                                    shuffle=True)
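
Assuming 105 training rows (the 150-row iris data minus the 45 test rows counted in the report at the end), 5 epochs give 525 examples, i.e. 53 batches of at most 10. The generator below is a plain-Python illustration of one way to stream such batches (shuffle each epoch, repeat, then cut into batches, the same order of operations as `tf.data`'s `shuffle().repeat().batch()`); it is not `pandas_input_fn`'s actual implementation.

```python
import random


def batch_indices(n_rows, batch_size, num_epochs, shuffle=True, seed=None):
    """Yield lists of row indices: shuffle each epoch, repeat, then batch."""
    rng = random.Random(seed)
    stream = []
    for _ in range(num_epochs):
        order = list(range(n_rows))
        if shuffle:
            rng.shuffle(order)  # new random order for every epoch
        stream.extend(order)
    for start in range(0, len(stream), batch_size):
        yield stream[start:start + batch_size]  # last batch may be smaller


batches = list(batch_indices(n_rows=105, batch_size=10, num_epochs=5, seed=0))
print(len(batches), len(batches[-1]))  # 53 5
```

Under this batching scheme, `steps=50` in the `train` call below is reached before the data runs out, which is consistent with the `Saving checkpoints for 50` line in the training log.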

Now we train the `classifier` with the input function. Training runs for `steps=50` batches, or stops earlier if the input function runs out of data.

In [11]:
classifier.train(input_fn=tr_input_func, 
                 steps=50)

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
INFO:tensorflow:Calling model_fn.
Instructions for updating:
Use tf.cast instead.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
INFO:tensorflow:Saving checkpoints for 0 into C:\Users\MK\AppData\Local\Temp\tmpdylq1613\model.ckpt.
INFO:tensorflow:loss = 19.249048, step = 1
INFO:tensorflow:Saving checkpoints for 50 into C:\Users\MK\AppData\Local\Temp\tmpdylq1613\model.ckpt.
INFO:tensorflow:Loss for final step: 4.011321.


<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifier at 0x1b48526b828>

## Model Evaluation

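Now we create an input function for the test data. It is used only for prediction, so it has no `y`; the whole test set goes in a single batch, and `shuffle=False` keeps the predictions in the same order as `y_test`.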

In [12]:
pred_input_func = tf.estimator.inputs.pandas_input_fn(x=X_test, 
                                                      batch_size=len(X_test), 
                                                      shuffle=False)

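Now we run the trained classifier on the test data through the prediction input function. `predict` returns a generator that yields one dictionary per test row, so we collect it into a list.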

In [13]:
predictions = list(classifier.predict(input_fn=pred_input_func))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from C:\Users\MK\AppData\Local\Temp\tmpdylq1613\model.ckpt-50
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


In [14]:
predictions[:4]

[{'logits': array([ 2.9656258 , -0.57708263, -2.31746   ], dtype=float32),
  'probabilities': array([0.9671071 , 0.0279831 , 0.00490975], dtype=float32),
  'class_ids': array([0], dtype=int64),
  'classes': array([b'0'], dtype=object)},
 {'logits': array([-2.548995 ,  1.3202333,  1.4887124], dtype=float32),
  'probabilities': array([0.00946955, 0.45364276, 0.53688776], dtype=float32),
  'class_ids': array([2], dtype=int64),
  'classes': array([b'2'], dtype=object)},
 {'logits': array([-3.7392511,  1.4174407,  2.1217246], dtype=float32),
  'probabilities': array([0.00190239, 0.33023366, 0.6678639 ], dtype=float32),
  'class_ids': array([2], dtype=int64),
  'classes': array([b'2'], dtype=object)},
 {'logits': array([-3.166516 ,  1.5035763,  1.9235064], dtype=float32),
  'probabilities': array([0.00370232, 0.39506537, 0.6012323 ], dtype=float32),
  'class_ids': array([2], dtype=int64),
  'classes': array([b'2'], dtype=object)}]
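
In each prediction dictionary, `probabilities` is the softmax of `logits`, and `class_ids` is the index of the largest probability. A stdlib sketch that reproduces the first prediction above from its logits (up to float32 rounding):

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.9656258, -0.57708263, -2.31746]  # first prediction above
probs = softmax(logits)
class_id = max(range(len(probs)), key=probs.__getitem__)  # argmax
print([round(p, 4) for p in probs], class_id)  # [0.9671, 0.028, 0.0049] 0
```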

In [15]:
# keep only the predicted class ID of each test row
final_predictions = [p["class_ids"][0] for p in predictions]
final_predictions[:5]

[0, 2, 2, 2, 2]

Finally, we evaluate the predictions with a classification report, a confusion matrix, and the overall accuracy.

In [16]:
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

In [17]:
print(classification_report(y_test, final_predictions))
print(confusion_matrix(y_test, final_predictions))
print(accuracy_score(y_test, final_predictions))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        13
           1       1.00      0.06      0.12        16
           2       0.52      1.00      0.68        16

   micro avg       0.67      0.67      0.67        45
   macro avg       0.84      0.69      0.60        45
weighted avg       0.83      0.67      0.57        45

[[13  0  0]
 [ 0  1 15]
 [ 0  0 16]]
0.6666666666666666
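
The report can be checked by hand against the confusion matrix, whose rows are true classes and whose columns are predicted classes (scikit-learn's convention). Precision for class `k` divides the diagonal entry by its column sum, recall divides it by its row sum, and accuracy is the diagonal total over all samples. The sketch below recomputes the printed numbers from the matrix above. It makes the weakness visible: class 0 is classified perfectly, but 15 of the 16 class-1 samples are predicted as class 2, which limits accuracy to 30/45.

```python
def metrics_from_confusion(cm):
    """Per-class (precision, recall, F1) and accuracy from a square confusion matrix.

    cm[i][j] = number of samples with true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    per_class = []
    for k in range(n):
        col_sum = sum(cm[i][k] for i in range(n))  # everything predicted as k
        row_sum = sum(cm[k])                       # everything truly k
        precision = cm[k][k] / col_sum if col_sum else 0.0
        recall = cm[k][k] / row_sum if row_sum else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append((precision, recall, f1))
    return per_class, correct / total


cm = [[13, 0, 0],
      [0, 1, 15],
      [0, 0, 16]]
per_class, accuracy = metrics_from_confusion(cm)
for k, (p, r, f) in enumerate(per_class):
    print(k, round(p, 2), round(r, 2), round(f, 2))
print(round(accuracy, 4))
```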
