In [0]:
#https://www.tensorflow.org/tutorials/estimator/premade

An Estimator is TensorFlow's high-level representation of a complete model, and it has been designed for easy scaling and asynchronous training.

In [3]:
from __future__ import absolute_import, division, print_function, unicode_literals

# TensorFlow and tf.keras
try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass
import tensorflow as tf
from tensorflow import keras
import numpy as np

TensorFlow 2.x selected.


In [0]:
import pandas as pd

#The data set

The sample program in this document builds and tests a model that classifies Iris flowers into three different species based on the size of their sepals and petals.

You will train a model using the Iris data set. The Iris data set contains four features and one label. The four features identify the following botanical characteristics of individual Iris flowers:

    sepal length
    sepal width
    petal length
    petal width

Based on this information, you can define a few helpful constants for parsing the data:

In [0]:
#columns and labels
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']


Next, download and parse the Iris data set using Keras and Pandas. Note that you keep distinct datasets for training and testing.

In [6]:
train_path = tf.keras.utils.get_file(
    "iris_training.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv")
test_path = tf.keras.utils.get_file(
    "iris_test.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv")

train = pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0) #simple csv reader with pandas
test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)


Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv
Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv


In [7]:
#check data
train.head()


Unnamed: 0,SepalLength,SepalWidth,PetalLength,PetalWidth,Species
0,6.4,2.8,5.6,2.2,2
1,5.0,2.3,3.3,1.0,1
2,4.9,2.5,4.5,1.7,2
3,4.9,3.1,1.5,0.1,0
4,5.7,3.8,1.7,0.3,0


In [8]:
#For each of the datasets, split out the labels, which the model will be trained to predict.
train_y = train.pop('Species')
test_y = test.pop('Species')

# The label column has now been removed from the features.
train.head()


Unnamed: 0,SepalLength,SepalWidth,PetalLength,PetalWidth
0,6.4,2.8,5.6,2.2
1,5.0,2.3,3.3,1.0
2,4.9,2.5,4.5,1.7
3,4.9,3.1,1.5,0.1
4,5.7,3.8,1.7,0.3


#Overview of programming with Estimators

Now that you have the data set up, you can define a model using a TensorFlow Estimator. An Estimator is any class derived from `tf.estimator.Estimator`. TensorFlow provides a collection of pre-made Estimators in `tf.estimator` (for example, `LinearRegressor`) that implement common ML algorithms. You can also write your own custom Estimators.

To write a TensorFlow program based on pre-made Estimators, you must perform the following tasks:

    1. Create one or more input functions.
    2. Define the model's feature columns.
    3. Instantiate an Estimator, specifying the feature columns and various hyperparameters.
    4. Call one or more methods on the Estimator object, passing the appropriate input function as the source of the data.



#Create input functions

You must create input functions to supply data for training, evaluation, and prediction.

An input function is a function that returns a tf.data.Dataset object which outputs the following two-element tuple:

    features - A Python dictionary in which:
        Each key is the name of a feature.
        Each value is an array containing all of that feature's values.
    label - An array containing the values of the label for every example.

Just to demonstrate the format of the input function, here's a simple implementation:

In [0]:
def input_evaluation_set():  # a minimal hand-written example of the expected format
    features = {'SepalLength': np.array([6.4, 5.0]),
                'SepalWidth':  np.array([2.8, 2.3]),
                'PetalLength': np.array([5.6, 3.3]),
                'PetalWidth':  np.array([2.2, 1.0])}
    labels = np.array([2, 1])  # one label per example: two flowers, so two labels
    return features, labels
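
The shape of that two-element tuple can be checked with plain Python lists. This is only an illustrative sketch: `check_features_and_labels` is a hypothetical helper, not part of TensorFlow. It confirms that every feature holds one value per example and that there is exactly one label per example.

```python
def check_features_and_labels(features, labels):
    """Return the number of examples, or raise if the lengths disagree."""
    n_examples = len(labels)
    for name, values in features.items():
        if len(values) != n_examples:
            raise ValueError(
                "feature {!r} has {} values but there are {} labels".format(
                    name, len(values), n_examples))
    return n_examples


# Same values as the demonstration above, as plain lists.
features = {'SepalLength': [6.4, 5.0],
            'SepalWidth':  [2.8, 2.3],
            'PetalLength': [5.6, 3.3],
            'PetalWidth':  [2.2, 1.0]}
labels = [2, 1]

n_examples = check_features_and_labels(features, labels)
print(n_examples)  # 2
```

Each position across the four feature arrays describes one flower, and the label at the same position is that flower's species ID.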

Your input function may generate the features dictionary and label list any way you like. However, we recommend using **TensorFlow's Dataset API**, which can parse all sorts of data.

The Dataset API can handle a lot of common cases for you. For example, using the Dataset API, you can easily read in records from a large collection of files in parallel and join them into a single stream.

To keep things simple in this example you are going to load the data with pandas, and build an input pipeline from this in-memory data:

In [0]:
def input_fn(features, labels, training=True, batch_size=256):
    """An input function for training or evaluating"""
    # Convert the inputs to a Dataset.
    # dict(features) turns the pandas DataFrame into a {column name: Series}
    # mapping, so every element keeps its features by name. Labels can be
    # left out of the tuple, as is done for prediction.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Shuffle and repeat if you are in training mode.
    if training:
        dataset = dataset.shuffle(1000).repeat()

    return dataset.batch(batch_size)  # returns a batched Dataset
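
To see what shuffling, repeating, and batching do to the stream of examples, here is a rough pure-Python imitation. It is a sketch under simplifying assumptions, not how `tf.data` is implemented: `batches` and its `seed` parameter are hypothetical names, and each pass over the data is shuffled in full.

```python
import itertools
import random


def batches(examples, batch_size, training, seed=0):
    """Yield lists of examples, loosely mimicking shuffle().repeat().batch()."""
    if training:
        rng = random.Random(seed)

        def stream():
            while True:               # repeat(): the stream never runs out
                epoch = list(examples)
                rng.shuffle(epoch)    # shuffle(): a new order on each pass
                yield from epoch

        source = stream()
    else:
        source = iter(examples)       # a single pass, in the original order

    while True:
        batch = list(itertools.islice(source, batch_size))
        if not batch:
            return
        yield batch


examples = list(range(10))

# Evaluation mode: one epoch, so the last batch may be smaller.
eval_batches = list(batches(examples, batch_size=4, training=False))
print([len(b) for b in eval_batches])  # [4, 4, 2]

# Training mode: the stream is endless, so the caller decides when to stop.
train_batches = list(itertools.islice(
    batches(examples, batch_size=4, training=True), 5))
print([len(b) for b in train_batches])  # [4, 4, 4, 4, 4]
```

The endless training stream is why a training call needs an explicit step limit, while a single-epoch evaluation stream ends on its own.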


#Define the feature columns

A feature column is an object describing how the model should use raw input data from the features dictionary. 

When you build an Estimator model, you pass it a list of feature columns that describes each of the features you want the model to use. The `tf.feature_column` module provides many options for representing data to the model.

For Iris, the 4 raw features are numeric values, so we'll build a list of feature columns to tell the Estimator model to represent each of the four features as 32-bit floating-point values. Therefore, the code to create the feature column is:

In [0]:
# Feature columns describe how to use the input.
my_feature_columns = []
for key in train.keys():  # train.keys() holds the column names; each key is a plain string such as 'SepalLength'
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))  # add a numeric column for that key


In [12]:
my_feature_columns
#Feature columns can be far more sophisticated than those we're showing here

[NumericColumn(key='SepalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='SepalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='PetalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),
 NumericColumn(key='PetalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]

#Instantiate an estimator

(I think these are just pre-made models, handy as a baseline.)

The Iris problem is a classic classification problem. Fortunately, TensorFlow provides several pre-made classifier Estimators, including:

    tf.estimator.DNNClassifier for deep models that perform multi-class classification.
    tf.estimator.DNNLinearCombinedClassifier for wide & deep models.
    tf.estimator.LinearClassifier for classifiers based on linear models.

For the Iris problem, tf.estimator.DNNClassifier seems like the best choice. Here's how you instantiate this Estimator:

In [13]:
# Build a DNN with 2 hidden layers with 30 and 10 hidden nodes each.
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,  # the 4 numeric columns, one per feature key
    # Two hidden layers of 30 and 10 nodes respectively.
    hidden_units=[30, 10],
    # The model must choose between 3 classes.
    n_classes=3)


INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmp73pdzl9s', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
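
To get a feel for the size of a network with `hidden_units=[30, 10]`, the sketch below counts weights and biases. It assumes plain fully connected layers with one bias per unit, and `count_dense_parameters` is a hypothetical helper, not a TensorFlow function. The real Estimator may also track additional variables, such as the global step.

```python
def count_dense_parameters(n_inputs, hidden_units, n_classes):
    """Count weights and biases in a stack of fully connected layers."""
    sizes = [n_inputs] + list(hidden_units) + [n_classes]
    total = 0
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        total += fan_in * fan_out  # weight matrix
        total += fan_out           # bias vector
    return total


# 4 input features -> 30 units -> 10 units -> 3 classes
n_params = count_dense_parameters(4, [30, 10], 3)
print(n_params)  # 493
```

The count breaks down as (4*30 + 30) + (30*10 + 10) + (10*3 + 3) = 150 + 310 + 33.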


#Train, Evaluate, and Predict

Now that you have an Estimator object, you can call methods to do the following:

    Train the model.
    Evaluate the trained model.
    Use the trained model to make predictions.


#Train the model

Train the model by calling the Estimator's train method as follows:

In [15]:
# Train the Model.
classifier.train(  # classifier is the DNNClassifier Estimator built above
    # The Estimator expects input_fn to be a callable that takes no arguments
    # and returns a Dataset. Wrapping the call in a lambda captures the
    # arguments (train, train_y) while still providing a zero-argument function.
    input_fn=lambda: input_fn(train, train_y, training=True),
    steps=5000)


INFO:tensorflow:Calling model_fn.


To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmp73pdzl9s/model.ckpt-5000
Instructions for updating:
Use standard file utilities to get mtimes.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 5000 into /tmp/tmp73pdzl9s/model.ckpt.
INFO:tensorflow:loss = 0.37135062, step = 5000
INFO:tensorflow:global_step/sec: 256.688
INFO:tensorflow:loss = 0.36242497, step = 5100 (0.395 sec)
INFO:tensorflow:global_step/sec: 324.619
INFO:tensorflow:loss = 0.3580719, step = 5200 (0.30

<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifierV2 at 0x7f8243243860>

Note that you wrap up your input_fn call in a lambda to capture the arguments while providing an input function that takes no arguments, as expected by the Estimator. The steps argument tells the method to stop training after a number of training steps.
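
The lambda pattern is ordinary Python and can be tried without TensorFlow. In this sketch, `make_input` and `run_with_input` are hypothetical stand-ins for the input function and for an Estimator method that calls it with no arguments. `functools.partial` is shown as an equivalent alternative to the lambda.

```python
import functools


def make_input(features, labels, training=True):
    """Stand-in for input_fn: just reports what it was called with."""
    return {"n_features": len(features),
            "n_labels": len(labels),
            "training": training}


def run_with_input(input_fn):
    """Stand-in for an Estimator method: calls input_fn with no arguments."""
    return input_fn()


features = {"SepalLength": [6.4, 5.0], "SepalWidth": [2.8, 2.3]}
labels = [2, 1]

# Both wrappers capture the arguments now and defer the call until later.
via_lambda = run_with_input(
    lambda: make_input(features, labels, training=False))
via_partial = run_with_input(
    functools.partial(make_input, features, labels, training=False))

print(via_lambda == via_partial)  # True
```

Passing `make_input(features, labels)` directly would call the function immediately and hand over its result, which is not callable.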

#Evaluate the trained model

Now that the model has been trained, you can get some statistics on its performance. The following code block evaluates the accuracy of the trained model on the test data:

In [21]:
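eval_result = classifier.evaluate(
    # As with train, the lambda supplies a zero-argument callable. Passing
    # input_fn(test, test_y, training=False) directly would hand the Estimator
    # a Dataset instead of a function.
    input_fn=lambda: input_fn(test, test_y, training=False))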

print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))


INFO:tensorflow:Calling model_fn.


To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2020-01-19T19:02:27Z
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmp73pdzl9s/model.ckpt-10000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.22057s
INFO:tensorflow:Finished evaluation at 2020-01-19-19:02:27
INFO:tensorflow:Saving dict for global step 10000: accuracy = 0.96666664, average_loss = 0.2710706, global_step = 10000, loss = 0.2710706
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 10000: /tmp/tmp73pdzl9s/model.ckpt-10000

Test set accuracy: 0.967

Unlike the call to the train method, you did not pass the steps argument to evaluate. The input_fn for eval only yields a single epoch of data.

The `eval_result` dictionary also contains the `average_loss` (mean loss per sample), the `loss` (mean loss per mini-batch) and the value of the estimator's `global_step` (the number of training iterations it underwent).
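
Because `eval_result` is a plain dictionary, several metrics can be reported with a single `format` call. The sketch below uses a hand-built dictionary holding the values from the evaluation log above, in place of a real `evaluate` call.

```python
# Values copied from the logged evaluation at global step 10000.
eval_result = {
    "accuracy": 0.96666664,
    "average_loss": 0.2710706,
    "loss": 0.2710706,
    "global_step": 10000,
}

# **eval_result unpacks the dictionary into keyword arguments for format().
summary = ("accuracy={accuracy:0.3f}, "
           "average_loss={average_loss:0.3f}, "
           "step={global_step}").format(**eval_result)
print(summary)  # accuracy=0.967, average_loss=0.271, step=10000
```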

#Making predictions (inferring) from the trained model

You now have a trained model that produces good evaluation results. You can now use the trained model to predict the species of an Iris flower based on some unlabeled measurements. As with training and evaluation, you make predictions using a single function call:

In [0]:
# Generate predictions from the model
expected = ['Setosa', 'Versicolor', 'Virginica']  # true species of the three examples below, for comparison
predict_x = {  # measurements of three unlabeled flowers
    'SepalLength': [5.1, 5.9, 6.9],
    'SepalWidth': [3.3, 3.0, 3.1],
    'PetalLength': [1.7, 4.2, 5.4],
    'PetalWidth': [0.5, 1.5, 2.1],
}

def input_fn(features, batch_size=256):  # Note: this redefines the earlier input_fn, which also took labels
    """An input function for prediction."""  # input_fn is also the name of the keyword argument the Estimator methods take
    # Convert the inputs to a Dataset without labels.
    return tf.data.Dataset.from_tensor_slices(dict(features)).batch(batch_size)

predictions = classifier.predict(
    input_fn=lambda: input_fn(predict_x))  # left of "=": predict's keyword argument; right: the function defined above


The predict method returns a Python iterable, yielding a dictionary of prediction results for each example. The following code prints a few predictions and their probabilities:

In [41]:
# For each example, take the most probable of the three species and print it with its probability.
for pred_dict, expec in zip(predictions, expected):
    class_id = pred_dict['class_ids'][0]  # pred_dict is a dictionary; class_ids is a one-element array holding the predicted species ID, so [0] extracts the value
    probability = pred_dict['probabilities'][class_id]

    print('Prediction is "{}" ({:.1f}%), expected "{}"'.format(
        SPECIES[class_id], 100 * probability, expec))


INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmp73pdzl9s/model.ckpt-10000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Prediction is "Setosa" (90.3%), expected "Setosa"
Prediction is "Versicolor" (69.2%), expected "Versicolor"
Prediction is "Virginica" (69.4%), expected "Virginica"
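
One thing to keep in mind when looping over predictions: an iterable that yields results lazily can usually be walked only once. The sketch below shows this with an ordinary Python generator. `fake_predict` is a hypothetical stand-in, and it assumes (as the `tf.estimator` documentation describes) that `predict` yields its results one at a time.

```python
def fake_predict():
    """Stand-in for classifier.predict: yields one dict per example."""
    for class_id in (0, 1, 2):
        yield {"class_ids": [class_id]}


predictions = fake_predict()
first_pass = [p["class_ids"][0] for p in predictions]
second_pass = [p["class_ids"][0] for p in predictions]  # already exhausted

print(first_pass)   # [0, 1, 2]
print(second_pass)  # []

# Materializing the results once makes them reusable.
kept = list(fake_predict())
print(len(kept))    # 3
```

If a second loop over the predictions prints nothing, request fresh predictions or store the first ones in a list.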


In [39]:
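# predict() returns a one-shot iterable, so request fresh predictions in case
# an earlier loop has already consumed them.
predictions = classifier.predict(input_fn=lambda: input_fn(predict_x))
for pred_dict in predictions:
    print(pred_dict['probabilities'])  # probabilities of the three species for this example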

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmp73pdzl9s/model.ckpt-10000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
[9.0284789e-01 9.6867301e-02 2.8477467e-04]
[0.05087036 0.69219446 0.2569351 ]
[0.0044178  0.30163544 0.6939467 ]
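
The link between these probability vectors and the printed predictions is a simple argmax. The sketch below reproduces it in plain Python from the three vectors printed above. `most_likely` is a hypothetical helper, not part of the Estimator API.

```python
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

# Probability vectors copied from the output above, one row per flower.
probabilities = [
    [9.0284789e-01, 9.6867301e-02, 2.8477467e-04],
    [0.05087036, 0.69219446, 0.2569351],
    [0.0044178, 0.30163544, 0.6939467],
]


def most_likely(probs):
    """Return (index, value) of the largest probability."""
    class_id = max(range(len(probs)), key=probs.__getitem__)
    return class_id, probs[class_id]


lines = []
for probs in probabilities:
    class_id, p = most_likely(probs)
    lines.append('{} ({:.1f}%)'.format(SPECIES[class_id], 100 * p))

print(lines)  # ['Setosa (90.3%)', 'Versicolor (69.2%)', 'Virginica (69.4%)']
```

Each row sums to roughly 1, and the largest entry in each row gives the predicted species and the percentage shown.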
