# Deep Nets using TF- Abstractions

[Prashant Brahmbhatt](www.github.com/hashbanger)

## Estimator API

_____

### The Data

To compare these various abstractions we'll use a dataset easily available from the SciKit Learn library. The data is comprised of the results of a chemical analysis of wines grown in the same region in Italy by three different cultivators. There are thirteen different
measurements taken for different constituents found in the three types of wine. We will use the various TF Abstractions to classify the wine to one of the 3 possible labels.


In [2]:
from sklearn.datasets import load_wine

In [3]:
wine_data = load_wine()

In [5]:
type(wine_data)

sklearn.utils.Bunch

The sklearn Bunch is kind of a dictionary which contains the data as well as other information related to the data

In [8]:
wine_data.keys()

dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names'])

Taking a look at the description of the data.

In [11]:
print(wine_data['DESCR'])

Wine Data Database

Notes
-----
Data Set Characteristics:
    :Number of Instances: 178 (50 in each of three classes)
    :Number of Attributes: 13 numeric, predictive attributes and the class
    :Attribute Information:
 		- 1) Alcohol
 		- 2) Malic acid
 		- 3) Ash
		- 4) Alcalinity of ash  
 		- 5) Magnesium
		- 6) Total phenols
 		- 7) Flavanoids
 		- 8) Nonflavanoid phenols
 		- 9) Proanthocyanins
		- 10)Color intensity
 		- 11)Hue
 		- 12)OD280/OD315 of diluted wines
 		- 13)Proline
        	- class:
                - class_0
                - class_1
                - class_2
		
    :Summary Statistics:
    
                                   Min   Max   Mean     SD
    Alcohol:                      11.0  14.8    13.0   0.8
    Malic Acid:                   0.74  5.80    2.34  1.12
    Ash:                          1.36  3.23    2.36  0.27
    Alcalinity of Ash:            10.6  30.0    19.5   3.3
    Magnesium:                    70.0 162.0    99.7  14.3
    Total Phenols:     

____

Creating the features and the labels

In [13]:
feat_data = wine_data['data']

In [16]:
labels = wine_data['target']

### Splitting the dataset

In [17]:
from sklearn.model_selection import train_test_split

In [35]:
X_train, X_test, y_train, y_test = train_test_split(feat_data, labels, test_size = 0.3, random_state = 101)

### Scaling the data

In [36]:
from sklearn.preprocessing import MinMaxScaler

In [37]:
scaler = MinMaxScaler()

In [38]:
scaled_x_train = scaler.fit_transform(X_train)

In [39]:
scaled_x_test = scaler.fit_transform(X_test)

_______

## The abstraction

In [40]:
import tensorflow as tf

In [41]:
from tensorflow import estimator

In [42]:
X_train.shape

(124, 13)

Since we have around 13 features we are going to need 13 neurons for input in our Neural Network

In [43]:
feat_cols = [tf.feature_column.numeric_column('x', shape= [13])]

Using the Gradien Descent as our Optimizer

In [44]:
opt = tf.train.GradientDescentOptimizer(learning_rate= 0.01)

In [45]:
deep_model = estimator.DNNClassifier(hidden_units= [13, 13, 13], feature_columns= feat_cols, n_classes= 3, optimizer= opt)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'C:\\Users\\prash\\AppData\\Local\\Temp\\tmpc_xu52fl', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x000002B3E44EF400>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


In [46]:
input_fn = estimator.inputs.numpy_input_fn(x= {'x': scaled_x_train}, y = y_train
                                           , shuffle= True, batch_size= 10, num_epochs= 5)

In [49]:
deep_model.train(input_fn, steps= 500)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\prash\AppData\Local\Temp\tmpc_xu52fl\model.ckpt-124
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 124 into C:\Users\prash\AppData\Local\Temp\tmpc_xu52fl\model.ckpt.
INFO:tensorflow:loss = 2.0404105, step = 125
INFO:tensorflow:Saving checkpoints for 186 into C:\Users\prash\AppData\Local\Temp\tmpc_xu52fl\model.ckpt.
INFO:tensorflow:Loss for final step: 1.8998374.


<tensorflow.python.estimator.canned.dnn.DNNClassifier at 0x2b3e44ef278>

We need to create the input function that we will need to feed to our network model

In [63]:
input_fn_eval = estimator.inputs.numpy_input_fn(x= {'x': scaled_x_test}, y = y_test, shuffle= False)

In [64]:
preds = deep_model.predict(input_fn_eval)

In [65]:
preds

<generator object Estimator.predict at 0x000002B3DFEB7DB0>

since its a generator we have to convert it into a list

In [66]:
preds = list(preds)
preds

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\prash\AppData\Local\Temp\tmpc_xu52fl\model.ckpt-186
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


[{'class_ids': array([0], dtype=int64),
  'classes': array([b'0'], dtype=object),
  'logits': array([ 5.10259  ,  1.1054025, -7.653941 ], dtype=float32),
  'probabilities': array([9.8196131e-01, 1.8035902e-02, 2.8314146e-06], dtype=float32)},
 {'class_ids': array([0], dtype=int64),
  'classes': array([b'0'], dtype=object),
  'logits': array([ 6.3221316,  1.0418949, -9.380457 ], dtype=float32),
  'probabilities': array([9.9493450e-01, 5.0654355e-03, 1.5074632e-07], dtype=float32)},
 {'class_ids': array([2], dtype=int64),
  'classes': array([b'2'], dtype=object),
  'logits': array([-4.1112194,  1.7459702,  4.5026093], dtype=float32),
  'probabilities': array([1.7070574e-04, 5.9702609e-02, 9.4012672e-01], dtype=float32)},
 {'class_ids': array([0], dtype=int64),
  'classes': array([b'0'], dtype=object),
  'logits': array([  7.5365577 ,   0.48793095, -10.545517  ], dtype=float32),
  'probabilities': array([9.9913222e-01, 8.6784706e-04, 1.4017724e-08], dtype=float32)},
 {'class_ids': array([

We can use some list comprehension for collecting our predicted labels

In [69]:
preds_labels = [x['class_ids'][0] for x in preds]

Evaluating using the confusion matrix

In [76]:
from sklearn.metrics import confusion_matrix, classification_report

In [77]:
confusion_matrix(y_true= y_test, y_pred= preds_labels)

array([[19,  0,  0],
       [ 4, 18,  0],
       [ 0,  3, 10]], dtype=int64)

In [79]:
print(classification_report(y_test, preds_labels))

             precision    recall  f1-score   support

          0       0.83      1.00      0.90        19
          1       0.86      0.82      0.84        22
          2       1.00      0.77      0.87        13

avg / total       0.88      0.87      0.87        54



So we got an accuracy floating around 88%

### de nada!