## The content

1. Set up the development environment
1. Set up MLflow

1. Explain the training pipeline
Diagram: Training Flow
This lesson serves as an introduction to what we'll be building.
This pipeline covers up to model registration.


1. Introduction to Metaflow.
Create a sample Metaflow flow.
Explain how to run it from the command line and from a notebook.
This session is necessary so we can start running every step as part of a flow.


1. Load the data.
We create a function to do this. 
Unit test this function.
Create a flow that does it.


1. Cross-validation.
Diagram: cross-validation process.
Explain why we'll be using cross-validation.
Explain the general structure of a cross-validation process: split, transform, train, evaluate, and then a final evaluation.
Explain Metaflow's foreach.
Set up flow: cross-validation(split) -> transform_fold -> train_fold -> evaluate_fold -> evaluation
Implement the split part.


1. Transform the data. 
Explain steps to transform the data.
Integrate transforming the data into the flow.


1. Training.
Explain how to train the model.
Integrate in the flow.
Use Keras with the JAX backend.
Explain Metaflow environment variables.


1. Introduction to experiment tracking
Update the training step to track metrics in MLflow.


1. Evaluating a fold
Explain how to evaluate the model.
Integrate this in the flow.
Log metrics in MLflow.


1. Evaluate the cross-validated model
Explain how to do this.
Integrate this in the flow.
Log metrics in MLflow.


1. Train the final model using all of the data
Add a step that transforms the entire dataset to the flow.
Add a step that trains the model to the flow.
Log metrics in MLflow.


1. Introduction to model versioning
Implement registration in the flow.


1. The need for a custom model
Implement the custom model class
Update registration to use this class
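
The cross-validation structure listed in the outline above (split, transform, train, evaluate, then a final evaluation) can be sketched without any framework. The snippet below is only an illustration under assumptions: it uses synthetic data, a nearest-centroid classifier as a stand-in for the real model, and helper names (`kfold_split`, `fit_scaler`, `train_centroids`, `cross_validate`) that are hypothetical and not part of the course code.

```python
import numpy as np


def kfold_split(n_samples, n_splits, seed=42):
    """Split: yield (train_idx, val_idx) so every sample is validated exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_splits)
    for i in range(n_splits):
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != i])
        yield train_idx, folds[i]


def fit_scaler(x):
    """Transform: learn the mean and standard deviation on the training fold only."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero on constant columns
    return mean, std


def train_centroids(x, y):
    """Train: a nearest-centroid classifier standing in for the real model."""
    classes = np.unique(y)
    centroids = np.stack([x[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(model, x):
    classes, centroids = model
    distances = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return classes[distances.argmin(axis=1)]


def cross_validate(x, y, n_splits=5):
    scores = []
    for train_idx, val_idx in kfold_split(len(x), n_splits):
        # The scaler never sees the validation fold, which avoids data leakage.
        mean, std = fit_scaler(x[train_idx])
        x_train = (x[train_idx] - mean) / std
        x_val = (x[val_idx] - mean) / std

        model = train_centroids(x_train, y[train_idx])

        # Evaluate: accuracy on the held-out fold.
        scores.append(float((predict(model, x_val) == y[val_idx]).mean()))
    return scores


# Synthetic data: three well-separated classes with four numeric features.
data_rng = np.random.default_rng(0)
x = np.concatenate(
    [data_rng.normal(loc=c * 5.0, scale=1.0, size=(30, 4)) for c in range(3)]
)
y = np.repeat(np.arange(3), 30)

scores = cross_validate(x, y, n_splits=5)

# Final evaluation: aggregate the per-fold scores.
print(f"Fold accuracies: {scores}")
print(f"Mean accuracy: {np.mean(scores):.3f}")
```

Each iteration of the loop corresponds to one branch of a foreach in the flow; the aggregation at the end corresponds to the final evaluation step.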


## Considerations:

1. Have an EDA flow.
1. Create a new flow for hyperparameter tuning. The result of this flow will be the best set of hyperparameters.

In [31]:
%load_ext autoreload
%autoreload 2
%load_ext dotenv
%dotenv

import os
from pathlib import Path

os.environ["KERAS_BACKEND"] = "tensorflow"

import sys

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
The dotenv extension is already loaded. To reload it, use:
  %reload_ext dotenv


In [32]:
CODE_FOLDER = Path("code")
CODE_FOLDER.mkdir(exist_ok=True)
sys.path.extend([f"./{CODE_FOLDER}"])

In [33]:
%%writefile code/load.py

from io import StringIO
from pathlib import Path

import pandas as pd
from metaflow import S3


def load_data_from_s3(location: str):
    """Load the dataset from an S3 location.

    This function will concatenate every CSV file in the given location
    and return a single DataFrame.
    """
    print(f"Loading dataset from location {location}")

    with S3(s3root=location) as s3:
        files = s3.get_all()

        print(f"Found {len(files)} file(s) in remote location")

        raw_data = [pd.read_csv(StringIO(file.text)) for file in files]
        return pd.concat(raw_data)


def load_data_from_file():
    """Load the dataset from a local file.

    This function is useful to test the pipeline locally
    without having to access the data remotely.
    """
    location = Path("../../penguins.csv")
    print(f"Loading dataset from location {location.as_posix()}")
    return pd.read_csv(location)


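def load_data(bucket, dataset_location, debug=False):
    """Load the dataset and return it shuffled.

    When `debug` is True, the data comes from the local file;
    otherwise, it comes from `s3://{bucket}/{dataset_location}`.
    """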
    if debug:
        df = load_data_from_file()
    else:
        location = f"s3://{bucket}/{dataset_location}"
        df = load_data_from_s3(location)

    # Shuffle the data
    data = df.sample(frac=1, random_state=42)

    print(f"Loaded dataset with {len(data)} samples")

    return data

Overwriting code/load.py


In [None]:
from load import load_data

data = load_data("mlschool", "penguins.csv", debug=True)
data
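
The outline calls for unit-testing the loading function. A minimal sketch of such a test follows. It assumes a hypothetical helper, `load_and_shuffle`, that mirrors the read-then-shuffle logic of `load_data` but accepts the file path as an argument, since `load_data_from_file` hardcodes its location. The test data is made up for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_and_shuffle(location, seed=42):
    """Hypothetical stand-in for load_data(debug=True) with a configurable path."""
    df = pd.read_csv(location)
    # Same shuffling strategy as load_data: sample every row with a fixed seed.
    return df.sample(frac=1, random_state=seed)


def test_load_and_shuffle_keeps_every_row():
    sample = pd.DataFrame(
        {
            "species": ["Adelie", "Gentoo", "Chinstrap", "Adelie"],
            "body_mass_g": [3750.0, 5000.0, 3550.0, 3800.0],
        }
    )

    with tempfile.TemporaryDirectory() as directory:
        location = Path(directory) / "sample.csv"
        sample.to_csv(location, index=False)

        data = load_and_shuffle(location)
        again = load_and_shuffle(location)

    # Shuffling must not drop or duplicate rows.
    assert len(data) == len(sample)
    assert sorted(data.index) == [0, 1, 2, 3]

    # A fixed seed makes the shuffle reproducible across calls.
    assert data.equals(again)


test_load_and_shuffle_keeps_every_row()
```

With pytest installed, the final call can be dropped and the test discovered automatically.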

In [34]:
import os
from pathlib import Path
from metaflow import FlowSpec, NBRunner, Parameter, pypi_base, step, pypi


PACKAGES = {
    "python": "3.10.14",
    "packages": {
        "python-dotenv": "1.0.1",
        "pandas": "2.2.2",
        "numpy": "1.26.4",
    },
}


# @pypi_base(**PACKAGES)
class MyFlow2(FlowSpec):
    @step
    def start(self):
        self.next(self.load_data)

    @pypi(packages={"pandas": "2.2.2"})
    @step
    def load_data(self):
        from load import load_data

        data = load_data("mlschool", "penguins.csv", debug=True)
        print(data.head())

        self.next(self.end)

    @step
    def end(self):
        print("the end")


run = NBRunner(MyFlow2, base_dir="code", environment="pypi").nbrun()

Metaflow 2.12.3 executing MyFlow2 for user:svpino
Validating your flow...
    The graph looks good!
Running pylint...
    Pylint not found, so extra checks are disabled.
Bootstrapping virtual environment(s) ...
Virtual environment(s) bootstrapped!
2024-07-18 13:01:58.127 Workflow starting (run-id 1721300518120064):
2024-07-18 13:01:58.162 [1721300518120064/start/1 (pid 87062)] Task is starting.
2024-07-18 13:01:58.345 [1721300518120064/start/1 (pid 87062)] Task finished successfully.
2024-07-18 13:01:58.375 [1721300518120064/load_data/2 (pid 87073)] Task is starting.
2024-07-18 13:01:58.673 [1721300518120064/load_data/2 (pid 87073)] Loading dataset from location ../../penguins.csv
2024-07-18 13:01:58.675 [1721300518120064/load_data/2 (pid 87073)] Loaded dataset with 344 samples
2024-07-18 13:01:58.678 [1721300518120064/load_data/2 (pid 87073)] species  island  ...  body_mass_g     sex
2024-07-18 13:01:58.717 [1721300518120064/load_data/2 (pid 87073)] 194  Chinstrap   Dream  ...       3

In [45]:
from metaflow import FlowSpec, NBRunner, Parameter, step, pypi


class MyFlow2(FlowSpec):
    @step
    def start(self):
        self.next(self.load_data)

    @pypi(packages={"pandas": "2.2.2"})
    @step
    def load_data(self):
        from load import load_data

        data = load_data("mlschool", "penguins.csv", debug=True)
        print(data.head())

        self.next(self.step2)

    @step
    def step2(self):
        print("Second step here!")
        self.next(self.end)

    @step
    def end(self):
        print("the end")


run = NBRunner(MyFlow2, base_dir="code", environment="pypi").nbrun()

Metaflow 2.12.3 executing MyFlow2 for user:svpino
Validating your flow...
    The graph looks good!
Running pylint...
    Pylint not found, so extra checks are disabled.
Bootstrapping virtual environment(s) ...
Virtual environment(s) bootstrapped!
2024-07-18 13:15:29.103 Workflow starting (run-id 1721301329101247):
2024-07-18 13:15:29.136 [1721301329101247/start/1 (pid 5624)] Task is starting.
2024-07-18 13:15:29.292 [1721301329101247/start/1 (pid 5624)] Task finished successfully.
2024-07-18 13:15:29.321 [1721301329101247/load_data/2 (pid 5629)] Task is starting.
2024-07-18 13:15:29.643 [1721301329101247/load_data/2 (pid 5629)] Loading dataset from location ../../penguins.csv
2024-07-18 13:15:29.644 [1721301329101247/load_data/2 (pid 5629)] Loaded dataset with 344 samples
2024-07-18 13:15:29.648 [1721301329101247/load_data/2 (pid 5629)] species  island  ...  body_mass_g     sex
2024-07-18 13:15:29.694 [1721301329101247/load_data/2 (pid 5629)] 194  Chinstrap   Dream  ...       3550.0  

In [46]:
from metaflow import Runner

with Runner("training.py", environment="pypi", env={"KERAS_BACKEND": "jax"}).run(
    max_workers=1
) as running:
    print(f"{running.run}")

Metaflow 2.12.3 executing TrainingFlow for user:svpino
Validating your flow...
    The graph looks good!
Running pylint...
    Pylint not found, so extra checks are disabled.
Bootstrapping virtual environment(s) ...
Virtual environment(s) bootstrapped!
2024-07-18 20:17:15.307 Workflow starting (run-id 1721326635302940):
2024-07-18 20:17:15.344 [1721326635302940/start/1 (pid 10409)] Task is starting.
2024-07-18 20:17:16.375 [1721326635302940/start/1 (pid 10409)] Task finished successfully.
2024-07-18 20:17:16.407 [1721326635302940/load_data/2 (pid 10459)] Task is starting.
2024-07-18 20:17:17.102 [1721326635302940/load_data/2 (pid 10459)] Loading dataset from location s3://mlschool/metaflow/data/
2024-07-18 20:17:18.646 [1721326635302940/load_data/2 (pid 10459)] Found 1 file(s) in remote location
2024-07-18 20:17:18.649 [1721326635302940/load_data/2 (pid 10459)] Loaded dataset with 344 samples
2024-07-18 20:17:18.736 [1721326635302940/load_data/2 (pid 10459)] Task finished successfully.

In [15]:
from pathlib import Path

import pandas as pd

location = Path("../penguins.csv")
df = pd.read_csv(location)
df.head()

Unnamed: 0,species,island,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_g,sex
0,Adelie,Torgersen,39.1,18.7,181.0,3750.0,MALE
1,Adelie,Torgersen,39.5,17.4,186.0,3800.0,FEMALE
2,Adelie,Torgersen,40.3,18.0,195.0,3250.0,FEMALE
3,Adelie,Torgersen,,,,,
4,Adelie,Torgersen,36.7,19.3,193.0,3450.0,FEMALE


In [16]:
import numpy as np

labels = df.pop("species")

In [17]:
labels = labels.to_numpy().reshape(-1, 1)
labels.shape

(344, 1)

In [18]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder

label_transformer = ColumnTransformer(
    transformers=[("species", OrdinalEncoder(), [0])],
)

In [19]:
x = label_transformer.fit_transform(labels)

array([[0.],
       [0.],
       [0.],
       ...

In [46]:
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_transformer = make_pipeline(
    SimpleImputer(strategy="mean"),
    StandardScaler(),
)

categorical_transformer = make_pipeline(
    SimpleImputer(strategy="most_frequent"),
    OneHotEncoder(handle_unknown="ignore"),
)

transformer = ColumnTransformer(
    transformers=[
        (
            "numeric",
            numeric_transformer,
            make_column_selector(dtype_exclude="object"),
        ),
        (
            "categorical",
            categorical_transformer,
            ["island"],
        ),
    ],
)

In [47]:
td = transformer.fit_transform(df)
td[0]

array([-0.88708123,  0.78774251, -1.42248782, -0.56578921,  0.        ,
        0.        ,  1.        ])

In [48]:
df2 = pd.DataFrame(
    [["Torgersen2", 39.1, 18.7, 181.0, 3750.0, "MALE"]],
    columns=list(df.columns),
)

In [49]:
transformer.transform(df2)

array([[-0.88708123,  0.78774251, -1.42248782, -0.56578921,  0.        ,
         0.        ,  0.        ]])

In [348]:
df["sex"] = df["sex"].replace(".", np.nan)
df = df.dropna()

In [345]:
test_df = df.sample(frac=0.2, random_state=42)
train_df = df.drop(test_df.index)

In [346]:
train_df.head()

Unnamed: 0,species,island,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_g,sex
0,Adelie,Torgersen,39.1,18.7,181.0,3750.0,MALE
1,Adelie,Torgersen,39.5,17.4,186.0,3800.0,FEMALE
2,Adelie,Torgersen,40.3,18.0,195.0,3250.0,FEMALE
4,Adelie,Torgersen,36.7,19.3,193.0,3450.0,FEMALE
6,Adelie,Torgersen,38.9,17.8,181.0,3625.0,FEMALE


In [349]:
from sklearn.model_selection import KFold

kfold = KFold(n_splits=2, shuffle=True)
kfold_indices = list(enumerate(kfold.split(train_df)))

In [351]:
kfold_indices[0]

(0,
 (array([  0,   1,   3,   4,   5,   7,  10,  14,  16,  17,  18,  23,  24,
          25,  26,  28,  29,  30,  31,  32,  35,  36,  37,  38,  40,  41,
          43,  45,  46,  47,  50,  52,  53,  59,  60,  61,  65,  66,  67,
          68,  69,  71,  72,  78,  79,  80,  81,  82,  84,  95,  98,  99,
         101, 102, 103, 104, 108, 109, 111, 115, 116, 117, 119, 122, 130,
         132, 133, 136, 137, 139, 140, 141, 143, 144, 145, 147, 149, 150,
         152, 153, 154, 155, 160, 162, 166, 172, 173, 174, 175, 177, 178,
         179, 187, 188, 194, 195, 196, 197, 200, 201, 202, 203, 205, 215,
         216, 217, 219, 220, 221, 222, 224, 225, 227, 228, 230, 234, 235,
         238, 239, 242, 244, 245, 246, 247, 248, 249, 251, 254, 255, 256,
         261, 263, 265, 267, 268, 271, 274]),
  array([  2,   6,   8,   9,  11,  12,  13,  15,  19,  20,  21,  22,  27,
          33,  34,  39,  42,  44,  48,  49,  51,  54,  55,  56,  57,  58,
          62,  63,  64,  70,  73,  74,  75,  76,  77,  83,  85

In [352]:
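# KFold.split(train_df) yields positions relative to train_df, so index train_df (not df).
fold_train_indices, fold_test_indices = kfold_indices[0][1]
train_df, test_df = train_df.iloc[fold_train_indices], train_df.iloc[fold_test_indices]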

In [353]:
import tensorflow as tf
from keras.layers import StringLookup

label_lookup = StringLookup(
    # the order here is important since the first index will be encoded as 0
    vocabulary=["Adelie", "Chinstrap", "Gentoo"],
    num_oov_indices=0,
)


def encode_label(x, y):
    encoded_y = label_lookup(y)
    return x, encoded_y


def dataframe_to_dataset(df):
    df = df.copy()
    labels = df.pop("species")
    ds = tf.data.Dataset.from_tensor_slices((dict(df), labels))
    ds = ds.map(encode_label, num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.shuffle(buffer_size=len(df))
    return ds


train_dataset = dataframe_to_dataset(train_df)
test_dataset = dataframe_to_dataset(test_df)

In [354]:
for x, y in train_dataset.take(1):
    print("Input:", x)
    print("Target:", y)

Input: {'island': <tf.Tensor: shape=(), dtype=string, numpy=b'Biscoe'>, 'culmen_length_mm': <tf.Tensor: shape=(), dtype=float64, numpy=42.8>, 'culmen_depth_mm': <tf.Tensor: shape=(), dtype=float64, numpy=14.2>, 'flipper_length_mm': <tf.Tensor: shape=(), dtype=float64, numpy=209.0>, 'body_mass_g': <tf.Tensor: shape=(), dtype=float64, numpy=4700.0>, 'sex': <tf.Tensor: shape=(), dtype=string, numpy=b'FEMALE'>}
Target: tf.Tensor(2, shape=(), dtype=int64)


2024-06-08 17:21:07.476200: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence


In [119]:
train_dataset = train_dataset.batch(32)
test_dataset = test_dataset.batch(32)

In [271]:
for x, y in train_dataset.take(1):
    print("Input:", x)

Input: {'island': <tf.Tensor: shape=(32,), dtype=string, numpy=
array([b'Biscoe', b'Torgersen', b'Biscoe', b'Dream', b'Biscoe', b'Biscoe',
       b'Torgersen', b'Torgersen', b'Dream', b'Biscoe', b'Dream',
       b'Biscoe', b'Dream', b'Biscoe', b'Biscoe', b'Biscoe', b'Dream',
       b'Biscoe', b'Biscoe', b'Biscoe', b'Biscoe', b'Biscoe', b'Biscoe',
       b'Biscoe', b'Biscoe', b'Biscoe', b'Dream', b'Dream', b'Dream',
       b'Torgersen', b'Dream', b'Biscoe'], dtype=object)>, 'culmen_length_mm': <tf.Tensor: shape=(32,), dtype=float64, numpy=
array([44. , 41.5, 49.6, 49.8, 35.5, 43.3, 36.2, 36.2, 43.5, 39.7, 47.6,
       45.6, 51.3, 49.9, 49.5, 46.8, 45.6, 45.2, 37.7, 47.5, 40.6, 50.8,
       49.4, 50. , 36.5, 49.3, 50.8, 39.7, 49. , 39.7, 35.6, 42. ])>, 'culmen_depth_mm': <tf.Tensor: shape=(32,), dtype=float64, numpy=
array([13.6, 18.3, 15. , 17.3, 16.2, 14. , 17.2, 16.1, 18.1, 17.7, 18.3,
       20.3, 19.9, 16.1, 16.1, 14.3, 19.4, 13.8, 16. , 14.2, 18.6, 15.7,
       15.8, 15.9, 16.6, 15

2024-06-08 16:08:53.143833: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence


In [120]:
from keras.utils import FeatureSpace

feature_space = FeatureSpace(
    features={
        "sex": FeatureSpace.string_categorical(num_oov_indices=0),
        "island": "string_categorical",
        "culmen_length_mm": "float_normalized",
        "culmen_depth_mm": "float_normalized",
        "flipper_length_mm": "float_normalized",
        "body_mass_g": "float_normalized",
    },
    output_mode="concat",
)

In [121]:
train_ds_with_no_labels = train_dataset.map(lambda x, _: x)
feature_space.adapt(train_ds_with_no_labels)

2024-06-08 14:52:46.673085: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence

In [122]:
for x, _ in train_dataset.take(1):
    preprocessed_x = feature_space(x)
    print("preprocessed_x.shape:", preprocessed_x.shape)
    print("preprocessed_x.dtype:", preprocessed_x.dtype)

preprocessed_x.shape: (32, 10)
preprocessed_x.dtype: <dtype: 'float32'>
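
One way to account for the 10 columns reported above is to add up the width each feature contributes to the concatenated output. This is a sketch under assumptions: string features are one-hot encoded with one slot per vocabulary term plus `num_oov_indices` extra slots, the default `num_oov_indices` is 1 when the feature is declared by name, and `sex` contains only `MALE` and `FEMALE` after cleaning.

```python
# Width of each feature in the concatenated FeatureSpace output.
feature_widths = {
    # Two categories, and num_oov_indices=0 was passed explicitly.
    "sex": 2 + 0,
    # Three islands plus one out-of-vocabulary slot (assumed default).
    "island": 3 + 1,
    # Each normalized float contributes a single column.
    "culmen_length_mm": 1,
    "culmen_depth_mm": 1,
    "flipper_length_mm": 1,
    "body_mass_g": 1,
}

total_width = sum(feature_widths.values())
print(f"Expected number of encoded columns: {total_width}")
```

Under these assumptions the total matches the second dimension of `preprocessed_x.shape`.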


2024-06-08 14:52:47.613255: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence


In [123]:
preprocessed_train_ds = train_dataset.map(
    lambda x, y: (feature_space(x), y),
    num_parallel_calls=tf.data.AUTOTUNE,
)
preprocessed_train_ds = preprocessed_train_ds.prefetch(tf.data.AUTOTUNE)

preprocessed_test_ds = test_dataset.map(
    lambda x, y: (feature_space(x), y),
    num_parallel_calls=tf.data.AUTOTUNE,
)
preprocessed_test_ds = preprocessed_test_ds.prefetch(tf.data.AUTOTUNE)

In [124]:
dict_inputs = feature_space.get_inputs()
encoded_features = feature_space.get_encoded_features()

In [125]:
dict_inputs

{'island': <KerasTensor shape=(None, 1), dtype=string, sparse=None, name=island>,
 'sex': <KerasTensor shape=(None, 1), dtype=string, sparse=None, name=sex>,
 'culmen_length_mm': <KerasTensor shape=(None, 1), dtype=float32, sparse=None, name=culmen_length_mm>,
 'culmen_depth_mm': <KerasTensor shape=(None, 1), dtype=float32, sparse=None, name=culmen_depth_mm>,
 'flipper_length_mm': <KerasTensor shape=(None, 1), dtype=float32, sparse=None, name=flipper_length_mm>,
 'body_mass_g': <KerasTensor shape=(None, 1), dtype=float32, sparse=None, name=body_mass_g>}

In [126]:
encoded_features

<KerasTensor shape=(None, 10), dtype=float32, sparse=False, name=keras_tensor_110>

In [127]:
from keras import Model
from keras.layers import Dense
from keras.optimizers import SGD

x = Dense(10, activation="relu")(encoded_features)
x = Dense(8, activation="relu")(x)
outputs = Dense(3, activation="softmax")(x)

model = Model(inputs=encoded_features, outputs=outputs)
model.compile(
    optimizer=SGD(learning_rate=0.01),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

In [128]:
inference_model = Model(inputs=dict_inputs, outputs=outputs)

In [129]:
model.fit(
    preprocessed_train_ds,
    epochs=50,
    validation_data=preprocessed_test_ds,
    verbose=2,
)

Epoch 1/50


9/9 - 0s - 23ms/step - accuracy: 0.1873 - loss: 1.2161 - val_accuracy: 0.2879 - val_loss: 1.1632
Epoch 2/50
9/9 - 0s - 5ms/step - accuracy: 0.2322 - loss: 1.1728 - val_accuracy: 0.3636 - val_loss: 1.1302
Epoch 3/50
9/9 - 0s - 5ms/step - accuracy: 0.2996 - loss: 1.1336 - val_accuracy: 0.3636 - val_loss: 1.1041
Epoch 4/50
9/9 - 0s - 5ms/step - accuracy: 0.3408 - loss: 1.1017 - val_accuracy: 0.4242 - val_loss: 1.0803
Epoch 5/50
9/9 - 0s - 5ms/step - accuracy: 0.3858 - loss: 1.0732 - val_accuracy: 0.4545 - val_loss: 1.0581
Epoch 6/50
9/9 - 0s - 5ms/step - accuracy: 0.4120 - loss: 1.0464 - val_accuracy: 0.5152 - val_loss: 1.0365
Epoch 7/50
9/9 - 0s - 5ms/step - accuracy: 0.4494 - loss: 1.0204 - val_accuracy: 0.5152 - val_loss: 1.0140
Epoch 8/50
9/9 - 0s - 5ms/step - accuracy: 0.4906 - loss: 0.9946 - val_accuracy: 0.5606 - val_loss: 0.9917
Epoch 9/50
9/9 - 0s - 5ms/step - accuracy: 0.5243 - loss: 0.9701 - val_accuracy: 0.6364 - val_loss: 0.9700
Epoch 10/50
9/9 - 0s - 5ms/step - accuracy: 0.5

<keras.src.callbacks.history.History at 0x3a63ae4a0>

In [304]:
sample = {
    "island": ["Biscoe", "Torgersen", "Torgersen"],
    "culmen_length_mm": [48.6, 44.1, 39.1],
    "culmen_depth_mm": [16.0, 18.0, 18.7],
    "flipper_length_mm": [230.0, 210.0, 181.0],
    "body_mass_g": [5800.0, 4000.0, 3750.0],
    "sex": ["MALE", "FEMALE", "MALE"],
}

input_dict = {name: tf.convert_to_tensor(value) for name, value in sample.items()}
input_dict

{'island': <tf.Tensor: shape=(3,), dtype=string, numpy=array([b'Biscoe', b'Torgersen', b'Torgersen'], dtype=object)>,
 'culmen_length_mm': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([48.6, 44.1, 39.1], dtype=float32)>,
 'culmen_depth_mm': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([16. , 18. , 18.7], dtype=float32)>,
 'flipper_length_mm': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([230., 210., 181.], dtype=float32)>,
 'body_mass_g': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([5800., 4000., 3750.], dtype=float32)>,
 'sex': <tf.Tensor: shape=(3,), dtype=string, numpy=array([b'MALE', b'FEMALE', b'MALE'], dtype=object)>}

In [314]:
result = inference_model.predict(input_dict)
result

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step


array([[0.03257361, 0.14812116, 0.81930524],
       [0.3940005 , 0.28193894, 0.32406056],
       [0.94649315, 0.02197206, 0.0315347 ]], dtype=float32)

In [315]:
from keras.layers import Lambda


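def pred(p):
    # tf.stack defaults to axis=0, so the result has shape (2, batch_size):
    # row 0 holds the predicted class indices and row 1 holds the confidences.
    return tf.stack(
        [
            tf.cast(tf.math.argmax(p, axis=1), dtype=tf.float32),
            tf.math.reduce_max(p, axis=1),
        ],
    )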


prediction = Lambda(pred)(outputs)

inference_model2 = Model(inputs=dict_inputs, outputs=prediction)

In [316]:
result2 = inference_model2.predict(input_dict)
result2

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 50ms/step


array([[2.        , 0.        , 0.        ],
       [0.81930524, 0.3940005 , 0.94649315]], dtype=float32)
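
The result above has one row per field (class indices first, confidences second) because the stack happens along the first axis. The sketch below uses NumPy as a stand-in for TensorFlow so it runs without a model; `np.stack` and `tf.stack` both default to `axis=0`. It reuses the probabilities printed earlier and contrasts the two layouts. Whether a per-sample layout is preferable depends on how the predictions are consumed, so treat this as an illustration rather than a recommendation.

```python
import numpy as np

# Probabilities copied from the output of inference_model.predict above.
probabilities = np.array(
    [
        [0.03257361, 0.14812116, 0.81930524],
        [0.3940005, 0.28193894, 0.32406056],
        [0.94649315, 0.02197206, 0.0315347],
    ],
    dtype=np.float32,
)

class_index = probabilities.argmax(axis=1).astype(np.float32)
confidence = probabilities.max(axis=1)

# Default axis=0: one row per field, shape (2, batch_size).
by_field = np.stack([class_index, confidence])

# axis=1: one row per sample, shape (batch_size, 2).
by_sample = np.stack([class_index, confidence], axis=1)

print("by_field shape:", by_field.shape)
print("by_sample shape:", by_sample.shape)
print(by_sample)
```

The two arrays hold the same values; `by_sample` is the transpose of `by_field`.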

In [317]:
label_lookup.get_vocabulary()

['Adelie', 'Chinstrap', 'Gentoo']

In [318]:
decoder = StringLookup(
    vocabulary=label_lookup.get_vocabulary(),
    invert=True,
    num_oov_indices=0,
)

In [320]:
decoder(np.argmax(result, axis=1)).numpy()

array([b'Gentoo', b'Adelie', b'Adelie'], dtype=object)

In [321]:
np.argmax(result, axis=1), np.max(result, axis=1)

(array([2, 0, 0]), array([0.81930524, 0.3940005 , 0.94649315], dtype=float32))

In [12]:
import numpy as np

classes = ["Adelie", "Chinstrap", "Gentoo"]

prediction = np.array([0, 2, 1, 1])
confidence = np.array([0.6, 0.9, 0.8, 0.7])

prediction = np.vectorize(lambda x: classes[x])(prediction)

[
    {"prediction": p, "confidence": c}
    for p, c in zip(prediction, confidence, strict=True)
]

[{'prediction': 'Adelie', 'confidence': 0.6},
 {'prediction': 'Gentoo', 'confidence': 0.9},
 {'prediction': 'Chinstrap', 'confidence': 0.8},
 {'prediction': 'Chinstrap', 'confidence': 0.7}]
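
The last few cells can be combined into a single helper that goes from raw class probabilities to the list of prediction dictionaries. This is a sketch, not the course's implementation: `format_predictions` is a hypothetical name, and the class order is assumed to match the `StringLookup` vocabulary used for training.

```python
import numpy as np

# Assumed to match the vocabulary order used to encode the labels.
CLASSES = ["Adelie", "Chinstrap", "Gentoo"]


def format_predictions(probabilities, classes=CLASSES):
    """Turn a (batch_size, n_classes) probability matrix into a list of dicts."""
    probabilities = np.asarray(probabilities)

    # The most likely class and its probability for every sample.
    indices = probabilities.argmax(axis=1)
    confidences = probabilities.max(axis=1)

    return [
        {"prediction": classes[index], "confidence": float(confidence)}
        for index, confidence in zip(indices, confidences)
    ]


# Probabilities copied from the output of inference_model.predict above.
sample_probabilities = [
    [0.03257361, 0.14812116, 0.81930524],
    [0.3940005, 0.28193894, 0.32406056],
    [0.94649315, 0.02197206, 0.0315347],
]

formatted = format_predictions(sample_probabilities)
for item in formatted:
    print(item)
```

Converting the confidence to a built-in `float` keeps the result serializable with the standard `json` module.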

In [7]:
predictions

['Adelie', 'Gentoo', 'Chinstrap', 'Chinstrap']