#Data types:
<ul>
  <li>Estimator?</li>
  <li>BatchDataset</li>
  <li>feature_column.dense_features_v2.DenseFeatures</li>
  <li>CrossedColumn (combines two or more columns into a single crossed feature)</li>
  <li>IndicatorColumn: this one is one-hot-encoded</li>
</ul>

#Estimators
#Overview of programming with Estimators

Now that you have the data set up, you can define a model using a TensorFlow Estimator. An Estimator is any class derived from `tf.estimator.Estimator`. TensorFlow provides a collection of pre-made Estimators in `tf.estimator` (for example, `LinearRegressor`) that implement common ML algorithms. You can also write custom Estimators.

To write a TensorFlow program based on pre-made Estimators, you must perform the following tasks:

    1. Create one or more input functions.
      #They supply data for training, evaluation, and prediction.
      #An input function is a function that returns a tf.data.Dataset object which outputs the following two-element tuple:
      *features - A Python dictionary in which each key is the name of a feature and each value is an array containing all of that feature's values.
      *label - An array containing the values of the label for every example.
    2. Define the model's feature columns.
      #A feature column is an object describing how the model should use raw input data from the features dictionary.
    3. Instantiate an Estimator, specifying the feature columns and various hyperparameters.
    4. Call one or more methods on the Estimator object, passing the appropriate input function as the source of the data.
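
Step 1 can be sketched roughly as follows. This is a minimal sketch, not the tutorial's exact code: the name `make_input_fn`, its default arguments, and the toy data are assumptions made for illustration. It only relies on `tf.data.Dataset.from_tensor_slices`, `shuffle`, `batch`, and `repeat`.

```python
import tensorflow as tf

def make_input_fn(features, labels, num_epochs=1, shuffle=True, batch_size=2):
    """Return an input function that yields (features, label) batches."""
    def input_fn():
        # Slice the feature dict and the label array along their first axis,
        # so each dataset element is one (features, label) example.
        ds = tf.data.Dataset.from_tensor_slices((dict(features), labels))
        if shuffle:
            ds = ds.shuffle(buffer_size=1000)
        # Group examples into batches and replay the data num_epochs times.
        return ds.batch(batch_size).repeat(num_epochs)
    return input_fn

# Hypothetical toy data: two features and a binary label for four examples.
toy_features = {"age": [22.0, 38.0, 26.0, 35.0],
                "sex": ["male", "female", "female", "male"]}
toy_labels = [0, 1, 1, 0]

train_input_fn = make_input_fn(toy_features, toy_labels, num_epochs=2)
eval_input_fn = make_input_fn(toy_features, toy_labels, shuffle=False)

# Peek at the first evaluation batch.
for feature_batch, label_batch in eval_input_fn().take(1):
    print(feature_batch["age"].numpy(), label_batch.numpy())
```

Returning a function instead of a dataset is deliberate: the Estimator calls the input function itself whenever it needs the data, so the dataset gets built at that point.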


#Feature Engineering for the Model

Estimators use a system called feature columns to describe how the model should interpret each of the raw input features. An Estimator expects a vector of numeric inputs, and feature columns describe how the model should convert each feature.

Selecting and crafting the right set of feature columns is key to learning an effective model. A feature column can be either one of the raw inputs in the original features dict (a base feature column), or a new column created using transformations defined over one or more base columns (a derived feature column).
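
To make the idea of a derived column concrete, here is a plain-Python sketch of what crossing two categorical features amounts to: each pair of values becomes one combined token, which is hashed into a fixed number of buckets. The helper `cross_features`, the `_X_` separator, and the use of `zlib.crc32` are assumptions for illustration, not TensorFlow's own implementation.

```python
import zlib

def cross_features(values_a, values_b, hash_bucket_size):
    """Combine two categorical features into one bucketed feature id per example."""
    crossed = []
    for a, b in zip(values_a, values_b):
        combined = f"{a}_X_{b}"  # one token per value pair
        # crc32 is a deterministic stand-in hash (an assumption, not TensorFlow's hash).
        crossed.append(zlib.crc32(combined.encode("utf-8")) % hash_bucket_size)
    return crossed

# Hypothetical values for two base columns of four passengers.
sex = ["male", "female", "female", "male"]
pclass = ["Third", "First", "Third", "Third"]

ids = cross_features(sex, pclass, hash_bucket_size=100)
print(ids)  # the first and last examples share an id: both are (male, Third)
```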

Feature columns work with all TensorFlow estimators and their purpose is to define the features used for modeling. Additionally, they provide some feature engineering capabilities like one-hot-encoding, normalization, and bucketization.
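
The three transformations named above can be sketched in plain Python to show what each one does to a raw value. The helper names `one_hot`, `normalize`, and `bucketize` are made up for this sketch; feature columns do the equivalent work inside TensorFlow.

```python
from bisect import bisect_right

def one_hot(value, vocabulary):
    """Return a 0/1 list with a single 1 at the position of value in vocabulary."""
    return [1.0 if value == item else 0.0 for item in vocabulary]

def normalize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

def bucketize(value, boundaries):
    """Return the index of the bucket that value falls into."""
    # Buckets are (-inf, b0), [b0, b1), ..., [bn, +inf).
    return bisect_right(boundaries, value)

print(one_hot("Second", ["First", "Second", "Third"]))  # [0.0, 1.0, 0.0]
print(normalize([10.0, 20.0, 30.0]))
print(bucketize(25, [18, 30, 50]))  # 1
```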

**NOTE:** defining the feature columns is a simple process:
List the names of the categorical columns and of the numeric columns separately, then build one feature column per name. (You could also one-hot-encode the categorical columns, as in: https://www.tensorflow.org/tutorials/estimator/boosted_trees)
An example with the Titanic dataset:


```
import tensorflow as tf

# Base feature columns. dftrain is the training DataFrame loaded earlier.
CATEGORICAL_COLUMNS = ['sex', 'n_siblings_spouses', 'parch', 'class', 'deck',
                       'embark_town', 'alone']  # the categorical column names
NUMERIC_COLUMNS = ['age', 'fare']  # the numeric column names

feature_columns = []  # list that collects every feature column
for feature_name in CATEGORICAL_COLUMNS:
  # The vocabulary is every distinct category that appears in the column.
  vocabulary = dftrain[feature_name].unique()
  # Creates a VocabularyListCategoricalColumn with the column name as 'key'
  # and its possible values as the vocabulary.
  feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))

for feature_name in NUMERIC_COLUMNS:
  feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float32))  # a numeric column

```
**USUALLY AFTER BUILDING THE FEATURES, THE DATA IS SHUFFLED AND BATCHED**


#Train and evaluate the model

Below you will do the following steps:

    1. Initialize the model, specifying the features and hyperparameters.
    2. Feed the training data to the model using the train_input_fn and train the model using the train function.
    3. Assess model performance using the evaluation set (in this example, the dfeval DataFrame) and verify that the predictions match the labels from the y_eval array.


Super simple linear example
```
import pandas as pd
from IPython.display import clear_output

# Build the linear model from the feature_columns.
linear_est = tf.estimator.LinearClassifier(feature_columns)  # a simple linear model, like the previous one

# Train model.
linear_est.train(train_input_fn, max_steps=100)

# Evaluation.
result = linear_est.evaluate(eval_input_fn)
clear_output()
print(pd.Series(result))
```
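
Checking that predictions match the labels boils down to an accuracy computation, which is also one of the metrics a classifier's `evaluate` call reports. A plain-Python sketch of that check, assuming a 0.5 decision threshold and made-up probabilities and labels (the names `accuracy`, `predicted_probs`, and `y_eval_toy` are hypothetical):

```python
def accuracy(probabilities, labels, threshold=0.5):
    """Fraction of examples whose thresholded prediction equals the label."""
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    correct = sum(int(pred == label) for pred, label in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical predicted survival probabilities and true labels (y_eval-like).
predicted_probs = [0.9, 0.2, 0.6, 0.4]
y_eval_toy = [1, 0, 0, 0]

print(accuracy(predicted_probs, y_eval_toy))  # 0.75
```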



#Useful functions:
<ul>
  <li>
    <a>tf.data.Dataset.from_tensor_slices</a>: Takes the features and their labels and turns them into a tf.data.Dataset (batching it afterwards with .batch() gives a BatchDataset). Note: you can also call it without the labels.

    example: ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df)) #I think whether dict() is needed depends on the format of your data
    #Note: usually after doing this you shuffle the dataset:
    ds = ds.shuffle(1000) #shuffling the dataset
    #And then create batches:
    ds = ds.batch(batch_size).repeat(num_epochs)
  </li>
  <li><a>tf.feature_column.indicator_column(categorical_column_of_type_VocabularyListCategoricalColumn)</a>.

    This transforms a VocabularyListCategoricalColumn into an IndicatorColumn,
    which is needed to pass the column into a tf.keras.layers.DenseFeatures layer.
  </li>
  <li><a>tf.feature_column.indicator_column(tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))</a>: this wraps a categorical column in an indicator column, which one-hot-encodes it (the feature name is the column you want to encode and the vocabulary is the list of values it can take) (see: https://www.tensorflow.org/tutorials/estimator/boosted_trees)
    
    example = dict(dftrain.head(1)) #get a dictionary of the first passenger
    class_fc = tf.feature_column.indicator_column(tf.feature_column.categorical_column_with_vocabulary_list('class', ('First', 'Second', 'Third')))
    
    print('Feature value: "{}"'.format(example['class'].iloc[0])) #Show the value (non-encoded) of the feature we want
    print('One-hot encoded: ', tf.keras.layers.DenseFeatures([class_fc])(example).numpy()) #showing the one-hot

  </li>

  <li><a>dataset = dataset.shuffle(buffer_size)</a>: shuffles the dataset; the buffer_size argument is required</li>
  <li><a>tf.keras.estimator.model_to_estimator(keras_model=model,model_dir=model_dir)</a> Builds an estimator from a keras_model </li>
  <li><a>clear_output()</a> import it with: from IPython.display import clear_output. It clears the cell output (warnings and the like) after a function has run</li>

</ul>

#NOTES
  *Study (understand) the ROC curve (from the linear model tutorial):
  https://www.youtube.com/watch?v=4jRBRDbJemM
  
  *It seems that Estimators are TensorFlow-only models (no Keras involved); when you build a model from layers with model.add(xxxx), that is Keras.