# Estimator Steps:
* **Read in Data (normalize if necessary)**
* **Train/Test split the data**
* **Create Estimator Feature Columns**
* **Create Estimator Input Function**
* **Train Estimator Model**
* **Predict with new Test Input Function**

## 1. Read the Data

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('iris.csv')

In [3]:
df.head()

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


* **Column names must not contain spaces, since they are reused as feature column keys later**

In [4]:
df.columns

Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width',
       'species'],
      dtype='object')

In [5]:
df.species.value_counts()

setosa        50
versicolor    50
virginica     50
Name: species, dtype: int64

In [11]:
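def add_target(species):
    # Map each species name to an integer class label
    species = species.lower()
    if species == 'setosa':
        return 0
    elif species == 'versicolor':
        return 1
    else:
        # Any other value is treated as virginica
        return 2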

In [12]:
df['target'] = df['species'].apply(add_target)
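
The `else` branch in `add_target` assigns class 2 to any value that is not `setosa` or `versicolor`, including typos. A stricter variant is sketched below. It assumes a hypothetical `SPECIES_TO_ID` dictionary and `encode_species` helper (neither appears in the notebook) and raises an error for unknown labels instead of guessing.

```python
# Hypothetical lookup table; the names are illustrative, not from the notebook
SPECIES_TO_ID = {'setosa': 0, 'versicolor': 1, 'virginica': 2}

def encode_species(name):
    """Return the integer class label for a species name, case-insensitively."""
    key = name.strip().lower()
    if key not in SPECIES_TO_ID:
        # Fail loudly instead of silently mapping unknown names to a class
        raise ValueError(f'unknown species: {name!r}')
    return SPECIES_TO_ID[key]

labels = [encode_species(s) for s in ['setosa', 'Versicolor', 'virginica']]
print(labels)
```

With pandas, the helper could be applied the same way as `add_target`, for example `df['species'].apply(encode_species)`.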

In [14]:
df.drop('species',inplace=True,axis=1)

In [15]:
df.head()

|   | sepal_length | sepal_width | petal_length | petal_width | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |


In [17]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
sepal_length    150 non-null float64
sepal_width     150 non-null float64
petal_length    150 non-null float64
petal_width     150 non-null float64
target          150 non-null int64
dtypes: float64(4), int64(1)
memory usage: 5.9 KB


## 2. Train/Test Split the data

In [19]:
y = df.target
X = df.drop('target',axis=1)

In [22]:
from sklearn.model_selection import train_test_split

In [23]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
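
With `test_size=0.3` and 150 rows, the split holds out 45 rows for testing and keeps 105 for training, which matches the total support of 45 in the classification report. The sketch below illustrates the idea of a shuffled index split in plain Python. The `split_indices` helper and the fixed seed are assumptions for illustration; the notebook itself relies on `train_test_split` and does not set a seed, so its exact rows differ from run to run.

```python
import math
import random

def split_indices(n_rows, test_size, seed=0):
    """Shuffle row indices and split them into (train, test) lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Assumption: the test share is rounded up to a whole number of rows
    n_test = math.ceil(n_rows * test_size)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(150, 0.3)
print(len(train_idx), len(test_idx))
```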

In [25]:
import tensorflow as tf

## 3. Create Estimator Feature Columns

In [26]:
X.columns

Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], dtype='object')

In [28]:
feat_cols = []
for col in X.columns:
    feat_cols.append(tf.feature_column.numeric_column(col))

In [29]:
feat_cols

[_NumericColumn(key='sepal_length', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='sepal_width', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='petal_length', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 _NumericColumn(key='petal_width', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]

## 4. Create Estimator Input Function

In [30]:
# An epoch is one full pass through the training data
input_func = tf.estimator.inputs.pandas_input_fn(x=X_train,y=y_train,batch_size=5,num_epochs=5,shuffle=True)

### 1. Create the classifier (estimator) object
* **The estimator here is a classifier: a DNN (Deep Neural Network)**

In [43]:
# Three hidden layers with 10 neurons each
classifier = tf.estimator.DNNClassifier(hidden_units=[10,10,10],n_classes=3,feature_columns=feat_cols)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'C:\\Users\\Mir\\AppData\\Local\\Temp\\tmp2cuikr8i', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x00000236F80163C8>, '_task_type': 'worker', '_task_id': 0, '_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


### 2. Train it with the input function for a given number of steps

In [44]:
classifier.train(input_fn=input_func,steps=50)

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into C:\Users\Mir\AppData\Local\Temp\tmp2cuikr8i\model.ckpt.
INFO:tensorflow:loss = 13.423351, step = 1
INFO:tensorflow:Saving checkpoints for 50 into C:\Users\Mir\AppData\Local\Temp\tmp2cuikr8i\model.ckpt.
INFO:tensorflow:Loss for final step: 1.0592998.


<tensorflow.python.estimator.canned.dnn.DNNClassifier at 0x236f8016dd8>
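
The batch size, the number of epochs, and `steps` together determine how much data the model actually sees. The arithmetic below is a rough sketch that assumes 105 training rows (150 rows minus the 30% test share) and that training stops at whichever comes first: the `steps` limit or the end of the input.

```python
import math

n_train = 105      # assumed: 150 rows minus the 30% test share
batch_size = 5     # from the input function
num_epochs = 5     # from the input function
steps = 50         # from classifier.train

# Batches the input function can supply before it runs out of data
batches_available = math.ceil(n_train * num_epochs / batch_size)
# Training ends at the step limit or when the input is exhausted
steps_run = min(steps, batches_available)
examples_seen = steps_run * batch_size
epochs_covered = examples_seen / n_train

print(batches_available, steps_run, examples_seen, round(epochs_covered, 2))
```

Under these assumptions the step limit is reached first, which is consistent with the log above ending at step 50, so the model sees the training set only a little more than twice rather than five times.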

## 5. Predict with new Test Input Function (Evaluate the model)

In [45]:
# For testing, feed all test cases in a single batch
pred_fn = tf.estimator.inputs.pandas_input_fn(x=X_test,batch_size=len(X_test),shuffle=False)

In [46]:
predictions = list(classifier.predict(input_fn=pred_fn))

INFO:tensorflow:Restoring parameters from C:\Users\Mir\AppData\Local\Temp\tmp2cuikr8i\model.ckpt-50


In [55]:
predictions[:5]

[{'logits': array([-2.767083 ,  1.3478318,  3.8570619], dtype=float32),
  'probabilities': array([0.00122653, 0.07512139, 0.92365205], dtype=float32),
  'class_ids': array([2], dtype=int64),
  'classes': array([b'2'], dtype=object)},
 {'logits': array([ 4.8113356 ,  0.02646166, -5.032252  ], dtype=float32),
  'probabilities': array([9.916619e-01, 8.285510e-03, 5.264386e-05], dtype=float32),
  'class_ids': array([0], dtype=int64),
  'classes': array([b'0'], dtype=object)},
 {'logits': array([-3.4274123,  1.720722 ,  3.528216 ], dtype=float32),
  'probabilities': array([8.1823173e-04, 1.4082594e-01, 8.5835576e-01], dtype=float32),
  'class_ids': array([2], dtype=int64),
  'classes': array([b'2'], dtype=object)},
 {'logits': array([-2.6591017,  1.5493202,  1.2794127], dtype=float32),
  'probabilities': array([0.00836172, 0.5623285 , 0.4293098 ], dtype=float32),
  'class_ids': array([1], dtype=int64),
  'classes': array([b'1'], dtype=object)},
 {'logits': array([ 5.02231  , -0.0304554, -4.
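
Each prediction's `probabilities` entry is the softmax of its `logits`, and `class_ids` is the index of the largest probability. The check below recomputes the first prediction in plain Python; the `softmax` helper is written here for illustration and is not part of the notebook.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-2.767083, 1.3478318, 3.8570619]  # first prediction in the output
probs = softmax(logits)
class_id = probs.index(max(probs))  # index of the most probable class
print([round(p, 5) for p in probs], class_id)
```

The result should agree with the `probabilities` array of the first prediction up to float32 rounding.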

In [48]:
# Create a list of predictions
final_pred = []
for pred in predictions:
    final_pred.append(pred['class_ids'][0])

In [53]:
final_pred[:10]

[2, 0, 2, 1, 0, 2, 0, 1, 0, 2]

In [50]:
from sklearn.metrics import classification_report,confusion_matrix

In [51]:
print(confusion_matrix(y_test,final_pred))

[[18  0  0]
 [ 0 12  2]
 [ 0  0 13]]


In [52]:
print(classification_report(y_test,final_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        18
           1       1.00      0.86      0.92        14
           2       0.87      1.00      0.93        13

   micro avg       0.96      0.96      0.96        45
   macro avg       0.96      0.95      0.95        45
weighted avg       0.96      0.96      0.96        45
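
The per-class figures in the report can be derived directly from the confusion matrix, where rows are the true classes and columns are the predicted classes. The sketch below recomputes them in plain Python; the variable names are illustrative.

```python
# Confusion matrix from the notebook: rows = true class, columns = predicted class
cm = [[18, 0, 0],
      [0, 12, 2],
      [0, 0, 13]]

n_classes = len(cm)
total = sum(sum(row) for row in cm)
accuracy = sum(cm[k][k] for k in range(n_classes)) / total

precisions, recalls, supports = [], [], []
for k in range(n_classes):
    predicted_k = sum(cm[i][k] for i in range(n_classes))  # column sum
    actual_k = sum(cm[k])                                   # row sum (support)
    precisions.append(cm[k][k] / predicted_k)
    recalls.append(cm[k][k] / actual_k)
    supports.append(actual_k)

for k in range(n_classes):
    print(k, round(precisions[k], 2), round(recalls[k], 2), supports[k])
print('accuracy', round(accuracy, 2))
```

The two versicolor flowers predicted as virginica account for both imperfect values: recall for class 1 is 12/14 and precision for class 2 is 13/15.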

