
##Classification

The goal of a classification model is to assign each data point to one of a fixed set of classes.

Here we predict the species of an iris flower from four measurements of its sepals and petals.

In [None]:
## Imports 
from __future__ import absolute_import, division, print_function, unicode_literals

import tensorflow as tf
import pandas as pd

##Dataset

Inputs:
* sepal length
* sepal width
* petal length
* petal width

Outputs:
* Setosa
* Versicolor
* Virginica

In [None]:
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

In [None]:
# Here we load the data differently:
# Keras provides a utility that downloads a file from a URL and caches it locally
train_path = tf.keras.utils.get_file(
    "iris_training.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv")
test_path = tf.keras.utils.get_file(
    "iris_test.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv")

train = pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0)
test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv
Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv


In [None]:
# Looking at our data 
train.head()

Unnamed: 0,SepalLength,SepalWidth,PetalLength,PetalWidth,Species
0,6.4,2.8,5.6,2.2,2
1,5.0,2.3,3.3,1.0,1
2,4.9,2.5,4.5,1.7,2
3,4.9,3.1,1.5,0.1,0
4,5.7,3.8,1.7,0.3,0


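The species are already encoded as integers (0, 1, 2), so we don't need to convert the labels. The integers follow the order of the `SPECIES` list defined above: 0 is Setosa, 1 is Versicolor, and 2 is Virginica.

All lengths and widths are in centimeters.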
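As a small illustration of that encoding, the sketch below decodes the five labels shown in the `train.head()` output back into species names. The `labels` list is copied by hand from that output, and `names` is a variable introduced only for this example.

```python
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

# Species column of the five rows printed by train.head() above
labels = [2, 1, 2, 0, 0]

# Each integer label is an index into SPECIES
names = [SPECIES[label] for label in labels]
print(names)
```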

In [None]:
# Popping off labels 
train_y = train.pop('Species')
test_y = test.pop('Species')

##Input Function

In [None]:
def input_fn(features, labels, 
             training=True, batch_size=256):
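  # Convert the inputs to a Dataset of (features dict, label) pairs
  dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

  # When training, shuffle the examples and repeat them indefinitely
  if training:
    dataset = dataset.shuffle(1000).repeat()

  return dataset.batch(batch_size)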

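Notice that this function takes no epoch count, and it does not build and return an inner input function; it returns the batched `Dataset` directly. In training mode the dataset repeats indefinitely, so the length of training is set by the `steps` argument passed to `train`.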
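To make the shuffle, repeat, and batch behaviour concrete, here is a rough pure-Python analogy. It is not how `tf.data` is implemented; the `batches` helper, its `seed` parameter, and the toy `rows` list are assumptions made for this sketch only.

```python
import itertools
import random

def batches(rows, training=True, batch_size=4, seed=0):
    """Pure-Python analogy of input_fn: shuffle and repeat when training, then batch."""
    rng = random.Random(seed)

    def stream():
        if not training:
            yield from rows      # evaluation: one pass, original order
            return
        while True:              # like repeat(): the stream never runs out
            epoch = list(rows)
            rng.shuffle(epoch)   # like shuffle(): a new order on every pass
            yield from epoch

    examples = stream()
    while True:
        batch = list(itertools.islice(examples, batch_size))
        if not batch:
            return
        yield batch

rows = list(range(6))  # stand-in for six training examples

eval_batches = list(batches(rows, training=False))
# The training stream is infinite, so we must decide how many batches to take
train_batches = list(itertools.islice(batches(rows, training=True), 5))

print(eval_batches)
print([len(b) for b in train_batches])
```

In evaluation mode the stream ends after one pass, so the last batch can be short. In training mode every batch is full, and the caller decides when to stop.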

##Feature Columns

In [None]:
feature_columns = []
for key in train.keys():
  feature_columns.append(tf.feature_column.numeric_column(key=key))

feature_columns

[NumericColumn(key='SepalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='SepalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='PetalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='PetalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]

##Building the Model

TensorFlow provides several pre-made estimators. For classification, we consider two:

* `DNNClassifier` - a deep neural network classifier
* `LinearClassifier` - a linear model, similar to linear regression but used for classification

Since the relationship between the measurements and the species might not be linear, let's use the DNN.

Let's build a model! 

In [None]:
# Let's build a DNN with 2 hidden layers of 30 and 10 nodes, respectively

classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,

    # Sizes of the hidden layers, in order
    hidden_units=[30, 10],

    # Model is choosing between 3 classes
    n_classes=3)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpcs0amb15', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
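To get a feel for how small this network is, the sketch below counts its trainable parameters. It assumes plain fully connected layers with one bias per node, going from the 4 input features through the 30 and 10 hidden nodes to the 3 output classes; `dense_param_count` is a helper written for this example, not part of TensorFlow.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        # One weight per input/output pair, plus one bias per output node
        total += n_in * n_out + n_out
    return total

# 4 input features -> 30 hidden -> 10 hidden -> 3 output classes
layer_sizes = [4, 30, 10, 3]
param_count = dense_param_count(layer_sizes)
print(param_count)  # 150 + 310 + 33 = 493
```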


##Training the Model

In [None]:
classifier.train(
    # train() expects a callable that takes no arguments, so we wrap the call in a lambda
    input_fn=lambda: input_fn(train, train_y, training=True),

    # A step is one batch (one gradient update), not one data point or one epoch.
    # Because the training dataset repeats indefinitely, training stops after
    # 5000 batches rather than after a fixed number of passes over the data.
    steps=5000
)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpcs0amb15/model.ckpt-5000
Instructions for updating:
Use standard file utilities to get mtimes.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 5000...
INFO:tensorflow:Saving checkpoints for 5000 into /tmp/tmpcs0amb15/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 5000...
INFO:tensorflow:loss = 0.4289471, step = 5000
INFO:tensorflow:global_step/sec: 180.524
INFO:tensorflow:loss = 0.43277976, step = 5100 (0.559 sec)
INFO:tensorflow:global_step/sec: 210.841
INFO:tensorflow:loss = 0.4246751, step = 5200 (0.473 sec)
INFO:tensorflow:global_step/sec: 231.727
INFO:tensorflow:loss = 0.4170404, step = 5300 (0.437 sec)
INFO:tensorflow:global_step/sec: 233.808


<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifierV2 at 0x7ff4ad83b810>
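The relationship between steps and passes over the data can be worked out by hand. The sketch below does so under the assumption that `iris_training.csv` holds 120 rows, a number that is not printed anywhere in this notebook; the batch size and step count are the ones used above.

```python
batch_size = 256   # default in input_fn above
steps = 5000       # value passed to classifier.train
train_rows = 120   # assumption: number of rows in iris_training.csv

# Each step consumes one full batch from the repeating dataset
examples_seen = steps * batch_size
passes_over_data = examples_seen / train_rows

print(examples_seen)
print(round(passes_over_data))
```

Under that assumption, one call to `train` shows the model well over a million examples, which corresponds to thousands of passes over such a small dataset.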

In [None]:
# clear_output lets us hide the verbose evaluation logs
from IPython.display import clear_output

# Evaluating the model

eval_result = classifier.evaluate(
    input_fn=lambda:input_fn(test, test_y, training=False),
    steps=5000)

clear_output()
print(f"\nTest set accuracy of the model: {eval_result['accuracy']}")


Test set accuracy of the model: 0.9666666388511658
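An accuracy figure is easier to read as a count of correct predictions. The sketch below converts it, assuming the test file holds 30 rows; that size is an assumption, since the notebook never prints it.

```python
accuracy = 0.9666666388511658  # value printed above
test_rows = 30                 # assumption: number of rows in iris_test.csv

# Accuracy is the fraction of test examples classified correctly
correct = round(accuracy * test_rows)
misclassified = test_rows - correct

print(correct, "correct,", misclassified, "misclassified")
```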


In [None]:
## Let's make predictions for three unlabeled flowers
expected = ['Setosa', 'Versicolor', 'Virginica']
predict_x = {
    'SepalLength': [5.1, 5.9, 6.9],
    'SepalWidth': [3.3, 3.0, 3.1],
    'PetalLength': [1.7, 4.2, 5.4],
    'PetalWidth': [0.5, 1.5, 2.1],
}

# Named predict_input_fn so it does not overwrite the training input_fn defined earlier
def predict_input_fn(features, batch_size=256):
    """An input function for prediction."""
    # Convert the inputs to a Dataset without labels.
    return tf.data.Dataset.from_tensor_slices(dict(features)).batch(batch_size)

predictions = classifier.predict(
    input_fn=lambda: predict_input_fn(predict_x))

for pred_dict, expec in zip(predictions, expected):
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]

    print('Prediction is "{}" ({:.1f}%), expected "{}"'.format(
        SPECIES[class_id], 100 * probability, expec))


INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpcs0amb15/model.ckpt-10000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Prediction is "Setosa" (87.5%), expected "Setosa"
Prediction is "Versicolor" (61.2%), expected "Versicolor"
Prediction is "Virginica" (66.5%), expected "Virginica"
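The printed class and percentage come from a per-class probability vector: the predicted class is the index with the highest probability. The sketch below reproduces that step in plain Python. Only the 0.875 matches the output above; the other two probabilities are invented for illustration.

```python
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

# Hypothetical probability vector for one flower (only 0.875 is taken from the output above)
probabilities = [0.875, 0.100, 0.025]

# The predicted class is the index of the largest probability
class_id = max(range(len(probabilities)), key=lambda i: probabilities[i])

message = 'Prediction is "{}" ({:.1f}%)'.format(
    SPECIES[class_id], 100 * probabilities[class_id])
print(message)
```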
