Handling a Pandas DataFrame as a dictionary with a subclassed model --- 6:10 min
===

* Last modified: May 6, 2022 | [YouTube](https://youtu.be/bol6NQiEHN0)

In [1]:
import os

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

import pandas as pd
import tensorflow as tf

Reading data with Pandas
---

In [2]:
SHUFFLE_BUFFER = 500
BATCH_SIZE = 2

In [3]:
csv_file = tf.keras.utils.get_file(
    "heart.csv",
    "https://storage.googleapis.com/download.tensorflow.org/data/heart.csv",
)

In [4]:
#
# Read the CSV file and separate the label column
#
df = pd.read_csv(csv_file)
target = df.pop("target")

Extracting numeric features
---

In [5]:
numeric_feature_names = [
    "age",
    "thalach",
    "trestbps",
    "chol",
    "oldpeak",
]
numeric_features = df[numeric_feature_names]
numeric_features.head()

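|   | age | thalach | trestbps | chol | oldpeak |
|---|-----|---------|----------|------|---------|
| 0 | 63  | 150     | 145      | 233  | 2.3     |
| 1 | 67  | 108     | 160      | 286  | 1.5     |
| 2 | 67  | 129     | 120      | 229  | 2.6     |
| 3 | 37  | 187     | 130      | 250  | 3.5     |
| 4 | 41  | 172     | 130      | 204  | 1.4     |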


Handling as a dictionary
---

**For training, this model accepts either a dictionary of columns or a dataset whose elements are dictionaries.**

In [6]:
#
# Case 1: dictionary of columns
#
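def stack_dict(inputs, fun=tf.stack):
    # Visit the keys in sorted order so the features always line up the same way.
    values = []
    for key in sorted(inputs.keys()):
        values.append(tf.cast(inputs[key], tf.float32))

    # Combine the per-column tensors along the last axis.
    return fun(values, axis=-1)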


class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()

        self.normalizer = tf.keras.layers.Normalization(axis=-1)

        self.sequential = tf.keras.Sequential(
            [
                self.normalizer,
                tf.keras.layers.Dense(10, activation="relu"),
                tf.keras.layers.Dense(10, activation="relu"),
                tf.keras.layers.Dense(1),
            ]
        )

    def adapt(self, inputs):
        inputs = stack_dict(inputs)
        self.normalizer.adapt(inputs)

    def call(self, inputs):
        inputs = stack_dict(inputs)
        result = self.sequential(inputs)
        return result


model = MyModel()

model.adapt(
    #
    # Dictionary of columns
    #
    dict(numeric_features),
)

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=["accuracy"],
    run_eagerly=True,
)

model.fit(
    #
    # Dictionary of columns
    #
    dict(numeric_features),
    target,
    epochs=5,
    batch_size=BATCH_SIZE,
)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7fca40591bb0>
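
To make the dictionary-of-columns input more concrete, the following minimal sketch repeats the logic of `stack_dict` outside the model on a three-row frame. The frame `sample` is built by hand from the first rows of the table shown earlier; the names `sample`, `columns`, `ordered_keys`, and `stacked` are introduced here for illustration only.

```python
import pandas as pd
import tensorflow as tf

# First three rows of the numeric features shown earlier, typed in by hand.
sample = pd.DataFrame(
    {
        "age": [63, 67, 67],
        "thalach": [150, 108, 129],
        "trestbps": [145, 160, 120],
        "chol": [233, 286, 229],
        "oldpeak": [2.3, 1.5, 2.6],
    }
)

# dict(DataFrame) maps each column name to its pandas Series.
columns = dict(sample)

# Same steps as stack_dict: visit the keys in sorted order, cast each
# column to float32, and stack the columns along the last axis.
ordered_keys = sorted(columns.keys())
stacked = tf.stack(
    [tf.cast(columns[key], tf.float32) for key in ordered_keys],
    axis=-1,
)

print(ordered_keys)
print(stacked.shape)
```

Because the keys are sorted, the feature order in the stacked tensor is alphabetical and does not depend on the column order of the DataFrame.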

In [7]:
#
# Case 2: TF Dataset
#
numeric_dict_ds = tf.data.Dataset.from_tensor_slices(
    (dict(numeric_features), target),
)

numeric_dict_batches = numeric_dict_ds.shuffle(SHUFFLE_BUFFER)
numeric_dict_batches = numeric_dict_batches.batch(BATCH_SIZE)

model.fit(
    numeric_dict_batches,
    epochs=5,
)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7fcad44f13d0>
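
It may help to look at a single element of such a dataset. The sketch below builds a small dataset the same way, without shuffling so the result is reproducible, and inspects the first batch. The `age` and `chol` values come from the table shown earlier; the `labels` list is made up for illustration and is not taken from the heart dataset.

```python
import tensorflow as tf

# Two feature columns from the table above; labels are hypothetical.
features = {
    "age": [63, 67, 67, 37],
    "chol": [233, 286, 229, 250],
}
labels = [1, 0, 0, 1]

dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

# Each element is a (features, labels) pair. The features are still a
# dictionary, but every value now holds one batch of rows.
batch_features, batch_labels = next(iter(dataset))
for name, values in batch_features.items():
    print(name, values.shape)
print("labels", batch_labels.shape)
```

This is why the model's `call` method can treat both cases the same way: in each one it receives a dictionary that maps feature names to tensors.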

In [8]:
#
# Prediction
#
model.predict(dict(numeric_features.iloc[:3]))

array([[[0.2139862 ]],

       [[0.38692302]],

       [[0.62569433]]], dtype=float32)
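
The model ends in a `Dense(1)` layer without activation and is compiled with `BinaryCrossentropy(from_logits=True)`, so the values above are raw logits rather than probabilities. As a minimal sketch, assuming the goal is one probability per input row, a sigmoid can be applied to the returned NumPy array. The logits below are copied from the output above; the names `logits` and `probabilities` are introduced here for illustration.

```python
import numpy as np

# Logits copied from the prediction output above.
logits = np.array(
    [[[0.2139862]], [[0.38692302]], [[0.62569433]]],
    dtype=np.float32,
)

# The sigmoid maps each logit to a value in the open interval (0, 1).
probabilities = 1.0 / (1.0 + np.exp(-logits))

# Drop the singleton axes to get one probability per input row.
probabilities = probabilities.reshape(-1)
print(probabilities)
```

A positive logit corresponds to a probability above 0.5, and a larger logit to a larger probability.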