To compare the various abstractions, we'll use a dataset that ships with the scikit-learn library. The data comprises the results of a chemical analysis of wines grown in the same region in Italy by three different cultivators. Thirteen measurements were taken of different constituents found in the three types of wine. We will use various TensorFlow (TF) abstractions to classify each wine into one of the 3 possible labels.

In [1]:
from sklearn.datasets import load_wine
wine_data = load_wine()
type(wine_data)

sklearn.utils.Bunch

A `sklearn.utils.Bunch` object behaves much like a dictionary: its entries can be read by key (`wine_data['data']`) or as attributes (`wine_data.DESCR`).


In [2]:
wine_data.keys()

dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names'])

In [3]:
print(wine_data.DESCR)

Wine Data Database

Notes
-----
Data Set Characteristics:
    :Number of Instances: 178 (50 in each of three classes)
    :Number of Attributes: 13 numeric, predictive attributes and the class
    :Attribute Information:
 		- 1) Alcohol
 		- 2) Malic acid
 		- 3) Ash
		- 4) Alcalinity of ash  
 		- 5) Magnesium
		- 6) Total phenols
 		- 7) Flavanoids
 		- 8) Nonflavanoid phenols
 		- 9) Proanthocyanins
		- 10)Color intensity
 		- 11)Hue
 		- 12)OD280/OD315 of diluted wines
 		- 13)Proline
        	- class:
                - class_0
                - class_1
                - class_2
		
    :Summary Statistics:
    
                                   Min   Max   Mean     SD
    Alcohol:                      11.0  14.8    13.0   0.8
    Malic Acid:                   0.74  5.80    2.34  1.12
    Ash:                          1.36  3.23    2.36  0.27
    Alcalinity of Ash:            10.6  30.0    19.5   3.3
    Magnesium:                    70.0 162.0    99.7  14.3
    Total Phenols:     

We will attempt to do a classification on this data.

In [4]:
feat_data = wine_data['data']
labels = wine_data['target']
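The description printed above says "50 in each of three classes", yet 3 × 50 is 150, not the 178 instances it also reports, so it is worth checking the data directly. The sketch below is one way to do that: it confirms that key-style and attribute-style access return the same array and counts the rows per label. The variable names (`wine`, `class_counts`, and so on) are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()

# A Bunch exposes each entry both as a key and as an attribute.
same_object = wine['data'] is wine.data

# Number of samples and number of features per sample.
n_rows, n_features = wine.data.shape

# Count how many rows carry each integer label (0, 1, 2).
class_counts = np.bincount(wine.target)

print(same_object, n_rows, n_features, class_counts)
```

If the counts turn out to be uneven, per-class precision and recall are more informative than a single overall score.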

# 1. Train Test Split
Because this dataset is small, we'll do a simple 70/30 train/test split and won't set aside a separate holdout set.

In [5]:
from sklearn.model_selection import train_test_split

In [6]:
X_train, X_test, y_train, y_test = train_test_split(feat_data, labels, test_size=0.3, random_state=101)

# 2. Scale the Data

In [7]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

Keep in mind that we fit the scaler only to the training data, because we don't want to assume any knowledge of future test data. The test set is only transformed, never fitted.

In [8]:
scaled_x_train = scaler.fit_transform(X_train)
scaled_x_test = scaler.transform(X_test)
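To make the fit/transform distinction concrete, here is a minimal NumPy sketch of min-max scaling, assuming the default target range of 0 to 1. The helper names `fit_min_max` and `apply_min_max` and the toy arrays are made up for illustration; they are not part of scikit-learn.

```python
import numpy as np

def fit_min_max(train):
    """Learn per-column minimum and range from the training data only."""
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    # Guard against constant columns to avoid division by zero.
    col_range[col_range == 0] = 1.0
    return col_min, col_range

def apply_min_max(data, col_min, col_range):
    """Rescale any data set with statistics learned from the training set."""
    return (data - col_min) / col_range

# Toy data with two features (hypothetical values).
toy_train = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 20.0]])
toy_test = np.array([[2.0, 40.0]])

col_min, col_range = fit_min_max(toy_train)
toy_train_scaled = apply_min_max(toy_train, col_min, col_range)
toy_test_scaled = apply_min_max(toy_test, col_min, col_range)

print(toy_train_scaled)
print(toy_test_scaled)
```

Note that the second test feature (40.0) lies above the training maximum (30.0), so its scaled value is greater than 1. This is expected when the test set is only transformed: the scaler never sees the test range.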

# 3. Abstractions

## 3.1 Estimator API:

In [9]:
import tensorflow as tf
from tensorflow import estimator 



The Estimator API provides ready-made models for deep neural network classification and regression, as well as plain linear classification and linear regression.

In [10]:
#estimator.DNNClassifier
#estimator.DNNRegressor
#estimator.LinearClassifier
#estimator.LinearRegressor

In [11]:
X_train.shape
#this is 70% of the data
#so we have 124 rows with 13 columns

(124, 13)

#### 3.1.1: Feature columns
To use the estimator, we need feature columns. These can be numeric or categorical, but in this case we only have numeric columns.

In [12]:
X_train #all numeric

array([[1.187e+01, 4.310e+00, 2.390e+00, ..., 7.500e-01, 3.640e+00,
        3.800e+02],
       [1.217e+01, 1.450e+00, 2.530e+00, ..., 1.450e+00, 2.230e+00,
        3.550e+02],
       [1.234e+01, 2.450e+00, 2.460e+00, ..., 8.000e-01, 3.380e+00,
        4.380e+02],
       ...,
       [1.272e+01, 1.810e+00, 2.200e+00, ..., 1.160e+00, 3.140e+00,
        7.140e+02],
       [1.412e+01, 1.480e+00, 2.320e+00, ..., 1.170e+00, 2.820e+00,
        1.280e+03],
       [1.247e+01, 1.520e+00, 2.200e+00, ..., 1.160e+00, 2.630e+00,
        9.370e+02]])

In [13]:
feat_cols = [tf.feature_column.numeric_column('x', shape=[13])]
# a single numeric column named 'x' with shape [13] holds all 13 data features

#### 3.1.2: Create the estimator DNN Classifier model:

In [14]:
deep_model = estimator.DNNClassifier(hidden_units=[13,13,13],
                                    feature_columns=feat_cols,
                                    n_classes=3, 
                                    optimizer=tf.train.GradientDescentOptimizer(learning_rate=0.01))
# hidden_units: number of neurons in each hidden layer (three layers of 13 here)
# n_classes: defaults to 2 (binary classification), but in our case we have 3 classes

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/var/folders/2v/xzpgsprx4dgft47fp23gnl_r0000gn/T/tmpy1r4g73z', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x1a19918358>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


#### 3.1.3: Create the input function:

In [15]:
input_fn = estimator.inputs.numpy_input_fn(x = {'x':scaled_x_train}, y=y_train, shuffle=True, batch_size=10, num_epochs=5)

#### 3.1.4: Train the created model with the created input function:

In [16]:
deep_model.train(input_fn=input_fn, steps=500)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1 into /var/folders/2v/xzpgsprx4dgft47fp23gnl_r0000gn/T/tmpy1r4g73z/model.ckpt.
INFO:tensorflow:loss = 10.678273, step = 1
INFO:tensorflow:Saving checkpoints for 62 into /var/folders/2v/xzpgsprx4dgft47fp23gnl_r0000gn/T/tmpy1r4g73z/model.ckpt.
INFO:tensorflow:Loss for final step: 3.058302.


<tensorflow.python.estimator.canned.dnn.DNNClassifier at 0x1a19918ef0>
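The log above shows the final checkpoint at step 62, not 500. A quick back-of-the-envelope check suggests why: the input function runs out of data before the requested step count is reached. The sketch below assumes the input function serves the repeated training data as consecutive batches, with a possibly smaller final batch; the variable names are illustrative.

```python
import math

n_train_rows = 124     # X_train.shape[0]
num_epochs = 5         # passes over the training data (input function)
batch_size = 10        # rows consumed per training step (input function)
requested_steps = 500  # steps argument passed to train()

# Total rows the input function can serve before it is exhausted.
total_rows_served = n_train_rows * num_epochs

# Each step consumes one batch, so this is the most steps the data allows.
steps_available = math.ceil(total_rows_served / batch_size)

# Training stops at whichever limit is hit first.
steps_run = min(requested_steps, steps_available)
print(steps_run)  # 62
```

To actually train for 500 steps, the input function would need more epochs (or `num_epochs=None` to repeat indefinitely).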

#### 3.1.5: Create the test input function for evaluation:
Remember that we want the model to predict y, the output. The test input function therefore takes no `y` argument, and we don't shuffle it, so the predictions stay in the same order as `y_test`.

In [17]:
input_fn_eval = estimator.inputs.numpy_input_fn(x={'x':scaled_x_test}, shuffle=False)

#### 3.1.6: Make the predictions (get the output y):

In [18]:
preds = list(deep_model.predict(input_fn=input_fn_eval))
# we wrap the call in list() because .predict returns a generator

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/2v/xzpgsprx4dgft47fp23gnl_r0000gn/T/tmpy1r4g73z/model.ckpt-62
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


In [19]:
preds #list of dictionary objects
# we just want the class_ids as our output

[{'class_ids': array([0]),
  'classes': array([b'0'], dtype=object),
  'logits': array([ 1.4556081,  1.0666745, -1.4202793], dtype=float32),
  'probabilities': array([0.57665294, 0.39084336, 0.03250368], dtype=float32)},
 {'class_ids': array([0]),
  'classes': array([b'0'], dtype=object),
  'logits': array([ 1.593666 ,  1.3269914, -1.5806112], dtype=float32),
  'probabilities': array([0.55317485, 0.42368895, 0.02313616], dtype=float32)},
 {'class_ids': array([2]),
  'classes': array([b'2'], dtype=object),
  'logits': array([-1.7441597,  1.991466 ,  2.6715727], dtype=float32),
  'probabilities': array([0.00795819, 0.33356166, 0.65848017], dtype=float32)},
 {'class_ids': array([0]),
  'classes': array([b'0'], dtype=object),
  'logits': array([ 2.9118807,  0.6215365, -2.9245722], dtype=float32),
  'probabilities': array([0.90567344, 0.09168278, 0.00264382], dtype=float32)},
 {'class_ids': array([2]),
  'classes': array([b'2'], dtype=object),
  'logits': array([-1.8677185,  2.157586 ,  2.9

In [20]:
predictions = [p['class_ids'][0] for p in preds]
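Each prediction dictionary carries `logits`, `probabilities` and `class_ids`. Assuming the classifier turns logits into probabilities with a softmax (which the numbers in the output are consistent with), the relationship can be checked by hand for the first prediction. The `softmax` helper below is written out for illustration and is not taken from TensorFlow.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtract the maximum first for numerical stability.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Logits of the first prediction, copied from the output above.
first_logits = [1.4556081, 1.0666745, -1.4202793]

probs = softmax(first_logits)

# The predicted class is the index of the largest probability.
class_id = max(range(len(probs)), key=probs.__getitem__)

print(probs, class_id)
```

The result should agree with the `probabilities` and `class_ids` entries of the first dictionary up to floating-point rounding.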

In [21]:
predictions

[0,
 0,
 2,
 0,
 2,
 1,
 2,
 0,
 1,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 1,
 2,
 1,
 1,
 1,
 1,
 2,
 2,
 0,
 0,
 1,
 1,
 2,
 1,
 2,
 1,
 1,
 0,
 1,
 0,
 0,
 1,
 1,
 2,
 1,
 0,
 0,
 0,
 2,
 1,
 1,
 1,
 2,
 1,
 0,
 1,
 1,
 0]

#### 3.1.7: Create the confusion matrix and classification report to get the accuracy:

In [22]:
from sklearn.metrics import confusion_matrix, classification_report

In [23]:
print(classification_report(y_test, predictions))

             precision    recall  f1-score   support

          0       0.89      0.89      0.89        19
          1       0.83      0.91      0.87        22
          2       1.00      0.85      0.92        13

avg / total       0.90      0.89      0.89        54
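The report above lists per-class precision and recall but no single accuracy figure. Since recall is the fraction of each class's true samples that were predicted correctly, overall accuracy can be recovered from the recall and support columns. The sketch below does this with the rounded values printed in the report, so the per-class counts are reconstructions rather than exact outputs of the model.

```python
# (recall, support) per class, copied from the classification report.
report_rows = {
    0: (0.89, 19),
    1: (0.91, 22),
    2: (0.85, 13),
}

# recall * support approximates the number of correctly classified
# samples in each class; round() undoes the two-decimal rounding.
correct_per_class = {
    label: round(recall * support)
    for label, (recall, support) in report_rows.items()
}

total_correct = sum(correct_per_class.values())
total_samples = sum(support for _, support in report_rows.values())
accuracy = total_correct / total_samples

print(correct_per_class, f"{accuracy:.3f}")
```

With the fitted model at hand, `confusion_matrix(y_test, predictions)` (imported above) gives the exact counts directly, with correct predictions on the diagonal.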

