To compare various abstractions we'll use a dataset available from SciKit Learn library. The data is comprised of the results of a chemical analysis of wines grown in the same region in Italy by three different cultivators. There are 13 different measurements taken for different constituents found in the three types of wine. We will use various TF abstractions to classify the wine to one of the 3 possible labels

In [1]:
from sklearn.datasets import load_wine
wine_data = load_wine()
type(wine_data)

sklearn.utils.Bunch

sklearn.utils.Bunch object is very similar to a dictionary


In [2]:
wine_data.keys()

dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names'])

In [3]:
print(wine_data.DESCR)

Wine Data Database

Notes
-----
Data Set Characteristics:
    :Number of Instances: 178 (50 in each of three classes)
    :Number of Attributes: 13 numeric, predictive attributes and the class
    :Attribute Information:
 		- 1) Alcohol
 		- 2) Malic acid
 		- 3) Ash
		- 4) Alcalinity of ash  
 		- 5) Magnesium
		- 6) Total phenols
 		- 7) Flavanoids
 		- 8) Nonflavanoid phenols
 		- 9) Proanthocyanins
		- 10)Color intensity
 		- 11)Hue
 		- 12)OD280/OD315 of diluted wines
 		- 13)Proline
        	- class:
                - class_0
                - class_1
                - class_2
		
    :Summary Statistics:
    
                                   Min   Max   Mean     SD
    Alcohol:                      11.0  14.8    13.0   0.8
    Malic Acid:                   0.74  5.80    2.34  1.12
    Ash:                          1.36  3.23    2.36  0.27
    Alcalinity of Ash:            10.6  30.0    19.5   3.3
    Magnesium:                    70.0 162.0    99.7  14.3
    Total Phenols:     

We will attempt to do a classification on this data.

In [4]:
feat_data = wine_data['data']
labels = wine_data['target']

# 1. Train Test Split
Because this dataset is small, we'll just do a simple 70/30 train test split and we won't have any holdout data set. 

In [5]:
from sklearn.model_selection import train_test_split

In [6]:
X_train, X_test, y_train, y_test = train_test_split(feat_data, labels, test_size=0.3, random_state=101)

# 2. Scale the Data

In [7]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

Keep in mind we only fit the scaler to the training data, we don't want to assume we'll have knowledge of future test data. We just transform the test set, not fit.

In [8]:
scaled_x_train = scaler.fit_transform(X_train)
scaled_x_test = scaler.transform(X_test)

# 3. Abstractions: LAYERS API:

In [9]:
import tensorflow as tf
import pandas as pd

  from ._conv import register_converters as _register_converters


### 3.1 One-hot encoded data: output-y:

In [10]:
onehot_y_train = pd.get_dummies(y_train).as_matrix()
onehot_y_test = pd.get_dummies(y_test).as_matrix()

### 3.2 Define the parameters:

In [11]:
num_feat = 13 #features = columns
num_hidden1 = 13 #number of neurons
num_hidden2 = 13
num_outputs =3
learning_rate = 0.01

In [12]:
from tensorflow.contrib.layers import fully_connected

### 3.3 Placeholders:

In [13]:
X=tf.placeholder(tf.float32, shape=[None, num_feat]) #none, because it's going to be the batch size
y_true = tf.placeholder(tf.float32, shape=[None, 3]) 

### 3.4 Activation function:

In [14]:
actf = tf.nn.relu

### 3.5 Create Layers:

In [15]:
hidden1 = fully_connected(X, num_hidden1, activation_fn=actf) #inputs =x; number of outputs = num_hidden1
hidden2 = fully_connected(hidden1, num_hidden2, activation_fn=actf)
output = fully_connected(hidden2, num_outputs)

### 3.6: Loss Function:

In [16]:
loss = tf.losses.softmax_cross_entropy(onehot_labels=y_true, logits=output)

Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.



### 3.7: Optimizer:

In [17]:
optimizer = tf.train.AdamOptimizer(learning_rate)
train = optimizer.minimize(loss)

### 3.8 Initialize and make predictions:

In [18]:
init = tf.global_variables_initializer()

In [19]:
training_steps = 10
with tf.Session() as sess:
    sess.run(init)
    
    for i in range(training_steps):
        sess.run(train, feed_dict={X:scaled_x_train, y_true:onehot_y_train})
        
    #Get predictions:
    logits = output.eval(feed_dict={X:scaled_x_test})
    preds = tf.argmax(logits, axis=1)
    results = preds.eval()

In [20]:
from sklearn.metrics import classification_report

In [21]:
print(classification_report(results, y_test)) #100% accuracy with 1000 training steps.

             precision    recall  f1-score   support

          0       1.00      0.79      0.88        24
          1       0.73      0.89      0.80        18
          2       0.85      0.92      0.88        12

avg / total       0.87      0.85      0.85        54

