# TensorFlow

## Module 03: Core Learning Algorithms: Categorization
https://www.freecodecamp.org/learn/machine-learning-with-python/tensorflow/core-learning-algorithms

freecodecamp.org




In this module we'll learn about 4 main algorithms:
1. Linear regression
2. Classification
3. Clustering
4. Hidden Markov Models

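### 2. Categorization
Categorization, more commonly called classification, assigns each data point to one of a fixed set of discrete classes. Where regression predicts a numeric value, classification predicts a label; in this module the label is the species of a flower.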

In [1]:
%tensorflow_version 2.x # this line is not required unless you're in a notebook

Colab only includes TensorFlow 2.x; %tensorflow_version has no effect.


In [2]:
from __future__ import absolute_import, division, print_function, unicode_literals

import numpy as np  # used by input_evaluation_set() below
import pandas as pd
import tensorflow as tf

### Data

The sample program in this document builds and tests a model that classifies Iris flowers into three different species based on the size of their sepals and petals.

This dataset has 3 categories (species) of flowers:
- Setosa
- Versicolor
- Virginica

You will train a model using the Iris data set. The Iris data set contains four features and one label. The four features identify the following botanical characteristics of individual Iris flowers:

- sepal length
- sepal width
- petal length
- petal width

In [3]:
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

In [4]:
train_path = tf.keras.utils.get_file(
    "iris_training.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv")
test_path = tf.keras.utils.get_file(
    "iris_test.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv")

train = pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0)
test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)
# Here we use Keras (a module inside TensorFlow) to download the datasets, then pandas to read each one into a DataFrame

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv
Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv


Looking at our data:

In [5]:
train.head()

|   | SepalLength | SepalWidth | PetalLength | PetalWidth | Species |
|---|---|---|---|---|---|
| 0 | 6.4 | 2.8 | 5.6 | 2.2 | 2 |
| 1 | 5.0 | 2.3 | 3.3 | 1.0 | 1 |
| 2 | 4.9 | 2.5 | 4.5 | 1.7 | 2 |
| 3 | 4.9 | 3.1 | 1.5 | 0.1 | 0 |
| 4 | 5.7 | 3.8 | 1.7 | 0.3 | 0 |
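
The `Species` column holds integer class ids rather than names. Assuming the ids index into the `SPECIES` list defined earlier (which is how the prediction code at the end of this module uses them), the five labels shown by `train.head()` would decode roughly like this:

```python
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

# Species values of the five rows shown by train.head()
labels = [2, 1, 2, 0, 0]

# Look up each integer id in the SPECIES list
names = [SPECIES[i] for i in labels]
print(names)  # ['Virginica', 'Versicolor', 'Virginica', 'Setosa', 'Setosa']
```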


Now we can pop the species column off and use that as our label.

In [6]:
train_y = train.pop("Species")
test_y = test.pop("Species")
train.head()

|   | SepalLength | SepalWidth | PetalLength | PetalWidth |
|---|---|---|---|---|
| 0 | 6.4 | 2.8 | 5.6 | 2.2 |
| 1 | 5.0 | 2.3 | 3.3 | 1.0 |
| 2 | 4.9 | 2.5 | 4.5 | 1.7 |
| 3 | 4.9 | 3.1 | 1.5 | 0.1 |
| 4 | 5.7 | 3.8 | 1.7 | 0.3 |


In [7]:
train.shape

(120, 4)

### Input Function

Remember the input function we created earlier? Fortunately, this one is easier to digest.

In [8]:
def input_evaluation_set():
    features = {'SepalLength': np.array([6.4, 5.0]),
                'SepalWidth':  np.array([2.8, 2.3]),
                'PetalLength': np.array([5.6, 3.3]),
                'PetalWidth':  np.array([2.2, 1.0])}
    labels = np.array([2, 1])
    return features, labels

In [9]:
def input_fn(features, labels, training=True, batch_size=256):
    """An input function for training or evaluating"""
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Shuffle and repeat if you are in training mode.
    if training:
        dataset = dataset.shuffle(1000).repeat()

    return dataset.batch(batch_size)
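
To get a feel for what `shuffle(1000).repeat()` followed by `batch(batch_size)` does, here is a rough plain-Python analogue. It is only a sketch of the idea, not how `tf.data` is implemented, and the helper name `batches` is made up for illustration. Because repeating happens before batching, a training batch can straddle two passes over the data.

```python
import itertools
import random

def batches(rows, batch_size, training=True, seed=0):
    """Plain-Python analogue of shuffle -> repeat -> batch (hypothetical helper)."""
    rng = random.Random(seed)

    def stream():
        if not training:
            yield from rows        # one ordered pass, as in evaluation
            return
        while True:                # repeat(): the stream never runs out
            epoch = list(rows)
            rng.shuffle(epoch)     # shuffle(): a new order on every pass
            yield from epoch

    it = stream()
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        yield batch

rows = list(range(5))              # stand-in for 5 training examples

train_batches = batches(rows, batch_size=4, training=True)
first_three = [next(train_batches) for _ in range(3)]   # infinite, so take 3
eval_batches = list(batches(rows, batch_size=4, training=False))

print(first_three)
print(eval_batches)                # [[0, 1, 2, 3], [4]]
```

In evaluation mode the data is seen exactly once and in order, which is why `input_fn` skips the shuffle and repeat when `training=False`.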

### Feature Columns


In [11]:
# Feature columns describe how to use the input.
my_feature_columns = []
for key in train.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))

print(my_feature_columns)

[NumericColumn(key='SepalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='SepalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='PetalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='PetalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]


### Building the Model

Now we're ready to choose a model. For classification tasks there are a variety of estimators (models) to pick from. Two options are listed below.
- `DNNClassifier` (Deep Neural Network)
- `LinearClassifier`

Either model would work, but the DNN is likely the better choice here because the relationship between the measurements and the species may not be linear.
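
To illustrate why a non-linear model can matter, here is a small sketch on the classic XOR problem rather than on the Iris data. No linear decision rule gets all four XOR points right, while a tiny network with one hidden layer does. The weights are hand-picked for the example, not trained, and the function names are illustrative only.

```python
import itertools

# XOR: the label is 1 when exactly one of the two inputs is 1.
points = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def linear(w1, w2, b):
    """A linear classifier: predicts 1 when w1*x1 + w2*x2 + b > 0."""
    return lambda x: int(w1 * x[0] + w2 * x[1] + b > 0)

def accuracy(model):
    return sum(model(x) == y for x, y in points) / len(points)

# Try a coarse grid of linear models and keep the best accuracy found.
grid = [i / 2 for i in range(-6, 7)]   # -3.0, -2.5, ..., 3.0
best_linear = max(accuracy(linear(w1, w2, b))
                  for w1, w2, b in itertools.product(grid, repeat=3))

def relu(z):
    return max(0.0, z)

def tiny_network(x):
    """Two hidden ReLU units and one output, with hand-picked weights."""
    h1 = relu(x[0] + x[1])
    h2 = relu(x[0] + x[1] - 1)
    return int(h1 - 2 * h2 > 0.5)

print(best_linear)             # 0.75
print(accuracy(tiny_network))  # 1.0
```

Iris is a much friendlier dataset than XOR, so a `LinearClassifier` may still do reasonably well on it; the sketch only shows what hidden layers add.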

So let's build a model!

In [14]:
# Build a DNN with 2 hidden layers of 30 and 10 nodes, respectively.
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    # Two hidden layers of 30 and 10 nodes respectively.
    hidden_units=[30, 10],
    # The model must choose between 3 classes.
    n_classes=3)



### Training
Now it's time to train the model!

In [15]:
# Train the Model.
classifier.train(
    input_fn=lambda: input_fn(train, train_y, training=True),
    steps=5000)

<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifierV2 at 0x7f3c61a5c850>

The only thing to explain here is the **steps** argument. It tells the classifier to run 5000 training steps, where each step processes one batch from the input function. Try modifying this value and see whether your results change. Keep in mind that more is not always better.
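
As a back-of-the-envelope sketch of what 5000 steps means here, assume each step consumes one batch, with the default `batch_size=256` from `input_fn` and the 120 training rows reported by `train.shape`:

```python
steps = 5000        # value passed to classifier.train
batch_size = 256    # default in input_fn
n_train = 120       # rows reported by train.shape

examples_seen = steps * batch_size          # examples processed in total
passes_over_data = examples_seen / n_train  # how often each row is revisited

print(examples_seen)              # 1280000
print(round(passes_over_data))    # 10667
```

Under these assumptions each flower is revisited thousands of times, which is one reason adding more steps does not necessarily help.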

### Evaluation
Now let's see how this trained model does!

In [16]:
eval_result = classifier.evaluate(
    input_fn=lambda: input_fn(test, test_y, training=False))

print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))


Test set accuracy: 0.933
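
Accuracy here is the fraction of test flowers whose predicted class matches the true label. The sketch below uses made-up labels, and the test-set size of 30 is an assumption (it is not shown above); with 30 flowers, a score of 0.933 would correspond to 28 correct predictions.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical labels for 30 test flowers: 28 right, 2 wrong.
actual = [0] * 10 + [1] * 10 + [2] * 10
predicted = list(actual)
predicted[12] = 2   # a Versicolor mistaken for Virginica
predicted[25] = 1   # a Virginica mistaken for Versicolor

print('Test set accuracy: {:0.3f}'.format(accuracy(predicted, actual)))
# Test set accuracy: 0.933
```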



### Making predictions (inferring) from the trained model

You now have a trained model that produces good evaluation results. You can now use the trained model to predict the species of an Iris flower based on some unlabeled measurements. As with training and evaluation, you make predictions using a single function call:

In [17]:
# Generate predictions from the model
expected = ['Setosa', 'Versicolor', 'Virginica']
predict_x = {
    'SepalLength': [5.1, 5.9, 6.9],
    'SepalWidth': [3.3, 3.0, 3.1],
    'PetalLength': [1.7, 4.2, 5.4],
    'PetalWidth': [0.5, 1.5, 2.1],
}

def input_fn(features, batch_size=256):
    """An input function for prediction."""
    # Convert the inputs to a Dataset without labels.
    return tf.data.Dataset.from_tensor_slices(dict(features)).batch(batch_size)

predictions = classifier.predict(
    input_fn=lambda: input_fn(predict_x))

In [19]:
def input_fn(features, batch_size=256):
    """An input function for prediction."""
    # Convert the inputs to a Dataset without labels.
    return tf.data.Dataset.from_tensor_slices(dict(features)).batch(batch_size)

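features = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']
predict = {}

print("Please type numeric values as prompted.")
for feature in features:
  # Keep prompting until the input parses as a number.
  # (str.isdigit() rejects decimals such as "2.4", so we try float() instead.)
  while True:
    val = input(feature + ": ")
    try:
      predict[feature] = [float(val)]
      break
    except ValueError:
      print("Not a number, please try again.")

# Predict on the values just typed in, not on predict_x from the previous cell.
predictions = classifier.predict(input_fn=lambda: input_fn(predict))
for pred_dict in predictions:
  class_id = pred_dict['class_ids'][0]
  probability = pred_dict['probabilities'][class_id]

  print('Prediction is "{}" ({:.1f}%)'.format(
      SPECIES[class_id], 100 * probability))

Please type numeric values as prompted.
SepalLength: 2.4
SepalWidth: 2.6
PetalLength: 6.5
PetalWidth: 6.3

The recorded run of this cell passed `predict_x` (the three sample flowers from the previous cell) to `classifier.predict` instead of the typed values, which is why it printed three predictions:

Prediction is "Setosa" (39.0%)
Prediction is "Versicolor" (51.4%)
Prediction is "Virginica" (69.8%)

With `predict` passed in, as in the code above, the cell prints a single prediction for the typed measurements.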
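
Each item yielded by `classifier.predict` is a dictionary; the loop above reads its `class_ids` and `probabilities` entries. As a rough sketch of where such numbers come from, assume the classifier turns one raw score (logit) per class into probabilities with a softmax and reports the most probable class. The logits below are hypothetical, not values from the trained model.

```python
import math

SPECIES = ['Setosa', 'Versicolor', 'Virginica']

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

logits = [0.2, 1.5, 0.4]        # hypothetical scores for one flower
probabilities = softmax(logits)

# The predicted class is the index with the highest probability.
class_id = max(range(len(probabilities)), key=probabilities.__getitem__)

print('Prediction is "{}" ({:.1f}%)'.format(
    SPECIES[class_id], 100 * probabilities[class_id]))
```

The printed percentage is the model's confidence in the chosen class only; the probabilities of the other two classes make up the remainder.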


## Resources

- [X] [Premade Estimators](https://www.tensorflow.org/tutorials/estimator/premade)