
# TensorFlow Estimators

TensorFlow provides Estimators as a higher-level API for building scalable,
production-oriented solutions. They take care of behind-the-scenes activities such
as creating computational graphs, initializing the variables, training the model,
saving checkpoints, and logging TensorBoard files. TensorFlow provides two types
of Estimators:

• Canned Estimators: These are premade Estimators available in the
TensorFlow estimator module. These are models in a box; you just pass
them the input features and they are ready to use. Some examples are
Linear Classifier, Linear Regressor, DNN Classifier, and so on.

• Custom Estimators: Users can also create their own estimators from the
models they build in TensorFlow Keras. These are user-defined Estimators.

Before using a TensorFlow Estimator, let us understand two important
components of the Estimator pipeline:

## Feature columns

The feature_column module of TensorFlow 2.0 acts as a bridge between your input
data and the model. The input parameters to be used by the estimators for training
are passed as feature columns. They are defined in the TensorFlow feature_column
module and specify how the data is interpreted by the model. To create feature
columns we need to call functions from tensorflow.feature_column. Nine commonly
used functions are:

**• categorical_column_with_identity:** Here the integer input value itself is
used as the category ID, so each category has a unique identity. This can be
used only for integer values in the range [0, num_buckets).

**• categorical_column_with_vocabulary_file:** This is used when the
categorical input is a string and the categories are given in a file. The string is
first converted to a numeric value and then one-hot encoded.

**• categorical_column_with_vocabulary_list:** This is used when the categorical input is a string and the categories are explicitly defined in a list.
The string is first converted to a numeric value and then one-hot encoded.

**• categorical_column_with_hash_bucket:** When the number of categories is very large and one-hot encoding is impractical, each value is hashed into a fixed number of buckets.

**• crossed_column:** Used when we want to combine two columns into one
feature. For example, for geolocation-based data it makes sense to
combine longitude and latitude values as one feature.

**• numeric_column:** Used when the feature is numeric; it can be a single value or even a matrix.

**• indicator_column:** We do not use this directly. Instead, it is used with the categorical column, but only when the number of categories is limited and can be represented as one-hot encoded.

**• embedding_column:** We do not use this directly. Instead, it is used with the categorical column, but only when the number of categories is very large and
cannot be represented as one-hot encoded.

**• bucketized_column:** This is used when, instead of a specific numeric value, we split the data into different categories depending upon its value.

The first five functions inherit from the Categorical Column class, the next three inherit from the Dense Column class, and the last one inherits from both classes.
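
To make the transformations listed above concrete, here is a minimal pure-Python sketch of what vocabulary lookup with one-hot encoding, hashing, bucketizing, and crossing do conceptually. It does not call TensorFlow: the helper names (`one_hot`, `vocabulary_lookup`, `hash_bucket`, `bucketize`, `cross`) and the choice of CRC32 as the hash function are assumptions made for illustration, not TensorFlow's implementation.

```python
import zlib


def one_hot(index, depth):
    """Return a one-hot list of length `depth`; all zeros if `index` is out of range."""
    return [1.0 if i == index else 0.0 for i in range(depth)]


def vocabulary_lookup(value, vocabulary, default_index=-1):
    """Map a string to its position in `vocabulary`, or `default_index` if it is absent."""
    try:
        return vocabulary.index(value)
    except ValueError:
        return default_index


def hash_bucket(value, num_buckets):
    """Map a string to a bucket id with a deterministic hash (CRC32, chosen for illustration)."""
    return zlib.crc32(value.encode("utf-8")) % num_buckets


def bucketize(value, boundaries):
    """Return the index of the bucket `value` falls in, given sorted boundaries."""
    return sum(1 for b in boundaries if value >= b)


def cross(value_a, value_b, num_buckets):
    """Combine two values into one categorical id by hashing the pair together."""
    return hash_bucket(f"{value_a}_X_{value_b}", num_buckets)


vocab = ["bungalow", "apartment"]

# A known category becomes a one-hot vector.
print(one_hot(vocabulary_lookup("apartment", vocab), len(vocab)))  # [0.0, 1.0]

# An out-of-vocabulary category maps to -1 and therefore to all zeros.
print(one_hot(vocabulary_lookup("house", vocab), len(vocab)))  # [0.0, 0.0]

# With boundaries 1000, 2000, 4000 there are four buckets; 1500 lands in bucket 1.
print(bucketize(1500, [1000, 2000, 4000]))  # 1

# Hashing and crossing always produce an id in [0, num_buckets).
print(hash_bucket("some-city", 10))
print(cross(bucketize(12.9, [0, 10, 20]), bucketize(77.6, [0, 50, 100]), 100))
```

Note that in this sketch two different strings can land in the same hash bucket; that collision risk is the price of not storing an explicit vocabulary.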

In the following example we will use the numeric_column and categorical_column_with_vocabulary_list functions.

## Linear regression using the TensorFlow Estimator API

Let us build a simple TensorFlow estimator with a small dataset for a multiple regression problem. We continue with the home price prediction, but now with two features, that is, two independent variables on which we presume the price depends: the area of the house and its type (bungalow or apartment):

In [2]:
# Import the modules

import tensorflow as tf
from tensorflow import feature_column as fc
numeric_column = fc.numeric_column
categorical_column_with_vocabulary = fc.categorical_column_with_vocabulary_list

In [4]:
# define the feature columns to train the regressor

featcols = [
            tf.feature_column.numeric_column('area'),
            tf.feature_column.categorical_column_with_vocabulary_list("type",["bungalow", "apartment"])
]

In [5]:
# define an input function to provide input for training

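def train_input_fn():
  # Note: "house" is not in the vocabulary list ["bungalow", "apartment"],
  # so the type of that example is treated as out-of-vocabulary.
  features = {"area":[1000, 2000, 4000, 1000, 2000, 4000],
              "type":["bungalow","bungalow","house","apartment",
                      "apartment","apartment"]}
  labels = [500, 1000, 1500, 700, 1300, 1900]
  return features, labels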

In [6]:
# use the premade LinearRegressor estimator and fit it on the training dataset

model = tf.estimator.LinearRegressor(featcols)
model.train(train_input_fn, steps = 200)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpp6mn22gg', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and 

<tensorflow_estimator.python.estimator.canned.linear.LinearRegressorV2 at 0x7fd44e0ab048>
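
For reference, the model trained above can be written in equation form. This is a sketch with assumed symbol names (none of them appear in the original): $x_{\text{area}}$ is the area, $t$ is the house type, $w_{\text{area}}$ and $w_{t}$ are learned weights, and $b$ is the bias. It also assumes the default handling of values outside the vocabulary list, under which such a type receives no weight:

$$
\hat{y} = w_{\text{area}} \, x_{\text{area}} + w_{t} + b,
\qquad w_{t} = 0 \ \text{if } t \notin \{\text{bungalow}, \text{apartment}\}
$$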

In [8]:
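# the estimator is trained; let's see the results of the prediction

def predict_input_fn():
  # Note: neither "house" nor "apt" is in the vocabulary list, so both
  # predictions are made with an out-of-vocabulary type.
  features = {"area":[1500, 1800],
              "type":["house", "apt"]}
  return features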

predictions = model.predict(predict_input_fn)

print(next(predictions))
print(next(predictions))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpp6mn22gg/model.ckpt-200
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
{'predictions': array([692.7829], dtype=float32)}
{'predictions': array([830.9035], dtype=float32)}
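
As a rough sanity check on these two numbers, the sketch below backs out the weight and bias they imply. It rests on an assumption not stated in the original: because both `type` values passed to `predict_input_fn` are outside the vocabulary list, they are taken to contribute nothing, so each prediction reduces to `w_area * area + bias`. The variable names are illustrative only.

```python
# Values copied from the prediction output shown above.
area_1, pred_1 = 1500.0, 692.7829
area_2, pred_2 = 1800.0, 830.9035

# Assumption: the out-of-vocabulary "type" values add nothing, so the
# two predictions lie on a straight line in `area`.
w_area = (pred_2 - pred_1) / (area_2 - area_1)
bias = pred_1 - w_area * area_1

print(f"implied weight per unit area: {w_area:.4f}")
print(f"implied bias: {bias:.2f}")
```

Under that assumption, the difference between the two predictions comes entirely from the 300-unit difference in area.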
