# Keras Overview

An overview of using Keras 3 for building an autoencoder and its application in anomaly detection.

Neural networks provide a powerful and flexible framework for learning complex patterns in data. They represent data with general structures called tensors and learn by iteratively adjusting internal parameters through a process called training. This allows them to build sophisticated representations of the underlying data distribution.

**Structure and Training:**

A neural network is structured as a series of interconnected layers. Each layer is composed of nodes (or neurons) connected to the nodes in preceding and subsequent layers. During training, the network learns optimal values for these connections (weights) and biases associated with each node, enabling it to map inputs to desired outputs.

**Frameworks and Keras:**

Several popular frameworks facilitate neural network development, including TensorFlow, PyTorch, and the increasingly adopted JAX. Keras 3 offers a user-friendly, high-level API that sits atop these frameworks, enabling you to build and train models that can leverage the strengths of each backend. This means you can write your code once in Keras and potentially switch the backend (TensorFlow, PyTorch, or JAX) depending on your needs or performance considerations.

**Autoencoders and Anomaly Detection:**

Autoencoders are a specific type of neural network architecture that learn to reconstruct their input data. By forcing the network to learn a compressed representation of the data (encoding) and then reconstruct it (decoding), autoencoders can capture essential features. This makes them useful for tasks like dimensionality reduction and, importantly, anomaly detection. Anomalies, being significantly different from the learned patterns, typically result in higher reconstruction errors, allowing for their identification.
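
The reconstruction-error idea can be sketched with NumPy. This is a minimal illustration and not the model built in this workflow: the `reconstruct` function is a hypothetical stand-in for a trained autoencoder, the data is synthetic, and the 99th-percentile threshold is an assumed choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "normal" records: 1,000 rows of 5 features centered at 0
normal = rng.normal(loc=0.0, scale=1.0, size=(1000, 5))
# Hypothetical anomalous records: shifted far from the normal pattern
anomalies = rng.normal(loc=8.0, scale=1.0, size=(10, 5))

# Stand-in for a trained autoencoder (assumption): always reconstruct the
# mean of the normal data, the only pattern this toy "model" has learned
learned_pattern = normal.mean(axis=0)

def reconstruct(x):
    return np.broadcast_to(learned_pattern, x.shape)

def reconstruction_error(x):
    # Mean absolute error per row between the input and its reconstruction
    return np.mean(np.abs(x - reconstruct(x)), axis=-1)

# Assumed rule: flag rows whose error exceeds the 99th percentile of the
# errors observed on normal data
threshold = np.quantile(reconstruction_error(normal), 0.99)
flagged = reconstruction_error(anomalies) > threshold
print(f"threshold={threshold:.3f}, flagged {flagged.sum()} of {len(flagged)} anomalies")
```

With a real autoencoder, `reconstruct` would be replaced by the model's prediction, and the threshold would be tuned on validation data.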

**What This Workflow Covers:**
- Train an autoencoder with the Keras API
    - Use a TensorFlow `tf.data.Dataset` to manage reading data
    - Format data for training
    - Preprocess data for training with feature engineering steps
    - Train the model
    - Evaluate the model
    - Predict with the model
- Enhance the autoencoder
    - Include preprocessing within the model
    - Name the model inputs to help prevent training/serving skew
    - Customize the model outputs
    - Export and import the model for serving

---
## Colab Setup

When running this notebook in [Colab](https://colab.google/) or [Colab Enterprise](https://cloud.google.com/colab/docs/introduction), this section will authenticate to GCP (follow prompts in the popup) and set the current project for the session.

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
except Exception:
    pass

---
## Installs and API Enablement

The client packages may need to be installed in this environment.

### Installs (If Needed)

In [3]:
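# tuples of (import name, install name[, min_version])
packages = [
    ('google.cloud.aiplatform', 'google-cloud-aiplatform'),
    ('google.cloud.bigquery', 'google-cloud-bigquery'),
    ('jax', 'jax'),
    ('keras', 'keras', '3.6.0'),
    ('tensorflow', 'tensorflow'),
    ('pydot', 'pydot')
]

import importlib.metadata
import importlib.util
# packaging is assumed to be present (it ships with most pip/Jupyter environments)
from packaging.version import Version

install = False
for package in packages:
    if not importlib.util.find_spec(package[0]):
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # look up the installed version by distribution (install) name and compare
        # parsed versions; plain string comparison misorders e.g. '3.10.0' and '3.6.0'
        if Version(importlib.metadata.version(package[1])) < Version(package[2]):
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user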
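
A minimum-version check depends on how the versions are compared. The sketch below uses only the standard library to show why raw string comparison is unreliable. The `version_tuple` helper is hypothetical and assumes plain numeric `X.Y.Z` strings; `packaging.version.Version` is the more general tool because it also handles pre-release suffixes.

```python
def version_tuple(version):
    """Parse a plain 'X.Y.Z' version string into a tuple of integers."""
    return tuple(int(part) for part in version.split('.'))

installed, minimum = '3.10.0', '3.6.0'

# Lexicographic comparison: the character '1' sorts before '6',
# so 3.10.0 looks older than 3.6.0
string_says_older = installed < minimum

# Numeric comparison: (3, 10, 0) is correctly newer than (3, 6, 0)
tuple_says_older = version_tuple(installed) < version_tuple(minimum)

print(string_says_older, tuple_says_older)
```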

### Graphviz Install

Plotting the [model structure with Keras](https://keras.io/api/utils/model_plotting_utils/) uses [Graphviz](https://graphviz.org/download/).

This code checks for Graphviz and installs it if it is missing.

In [4]:
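check = !dot -V
if check[0].startswith('dot'):
    print(f'Graphviz installed with version: {check[0]}')
else:
    print('Installing Graphviz...')
    # capture the output in its own variable so the `install` flag
    # checked by the kernel restart cell is not overwritten
    graphviz_install = !sudo apt-get install graphviz --assume-yes
    print('Completed')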

Graphviz installed with version: dot - graphviz version 2.43.0 (0)


### API Enablement

In [5]:
!gcloud services enable aiplatform.googleapis.com

### Restart Kernel (If Installs Occurred)

After a kernel restart, execution can resume from the cell following this one.

In [6]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)
    IPython.display.display(IPython.display.Markdown("""<div class=\"alert alert-block alert-warning\">
        <b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. The previous cells do not need to be run again⚠️</b>
        </div>"""))

---
## Setup

Inputs

In [7]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [8]:
REGION = 'us-central1'
SERIES = 'frameworks-keras'
EXPERIMENT = 'overview'

# Data source for this series of notebooks: described in the 'Review Source Data' section below
BQ_SOURCE = 'bigquery-public-data.ml_datasets.ulb_fraud_detection'

# make this the BigQuery Project / Dataset / Table prefix to store results
BQ_PROJECT = PROJECT_ID
BQ_DATASET = SERIES.replace('-', '_')
BQ_TABLE = EXPERIMENT
BQ_REGION = REGION[0:2] # use a multi region

Packages

In [9]:
import os

# import keras, set backend prior to first import of keras
os.environ['KERAS_BACKEND'] = 'jax'
import keras
import tensorflow as tf

# Vertex AI
from google.cloud import aiplatform

# BigQuery
from google.cloud import bigquery



Check Versions of Packages:

In [10]:
aiplatform.__version__

'1.71.0'

In [11]:
keras.__version__

'3.6.0'

In [12]:
tf.__version__

'2.18.0'

Clients

In [13]:
# vertex ai clients
aiplatform.init(project = PROJECT_ID, location = REGION)

# bigquery client
bq = bigquery.Client(project = PROJECT_ID)

---
## Review Source Data

This is a BigQuery public table of 284,807 credit card transactions classified as fraudulent or normal in the column `Class`.
- The data can be researched further at this [Kaggle link](https://www.kaggle.com/mlg-ulb/creditcardfraud).
- Read more about BigQuery public datasets [here](https://cloud.google.com/bigquery/public-data).

To protect confidentiality, the original features have been transformed using [principal component analysis (PCA)](https://en.wikipedia.org/wiki/Principal_component_analysis) into 28 features named `V1, V2, ... V28` (float). Two descriptive features are provided without transformation by PCA:
- `Time` (integer) is the number of seconds elapsed between the transaction and the earliest transaction in the table
- `Amount` (float) is the value of the transaction
 

### Review BigQuery table:

In [14]:
source_data = bq.query(f'SELECT * FROM `{BQ_SOURCE}` LIMIT 5').to_dataframe()
source_data

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,282.0,-0.356466,0.725418,1.971749,0.831343,0.369681,-0.107776,0.75161,-0.120166,-0.420675,...,0.020804,0.424312,-0.015989,0.466754,-0.809962,0.657334,-0.04315,-0.046401,0.0,0
1,14332.0,1.07195,0.340678,1.784068,2.846396,-0.751538,0.403028,-0.73492,0.205807,1.092726,...,-0.169632,-0.113604,0.067643,0.468669,0.223541,-0.112355,0.014015,0.021504,0.0,0
2,32799.0,1.153477,-0.047859,1.358363,1.48062,-1.222598,-0.48169,-0.654461,0.128115,0.907095,...,0.125514,0.480049,-0.025964,0.701843,0.417245,-0.257691,0.060115,0.035332,0.0,0
3,35799.0,-0.769798,0.622325,0.242491,-0.586652,0.527819,-0.104512,0.209909,0.669861,-0.304509,...,0.152738,0.255654,-0.130237,-0.660934,-0.493374,0.331855,-0.011101,0.049089,0.0,0
4,36419.0,1.04796,0.145048,1.624573,2.932652,-0.726574,0.690451,-0.627288,0.278709,0.318434,...,0.078499,0.658942,-0.06781,0.476882,0.52683,0.219902,0.070627,0.028488,0.0,0


In [15]:
source_data.dtypes

Time      float64
V1        float64
V2        float64
V3        float64
V4        float64
V5        float64
V6        float64
V7        float64
V8        float64
V9        float64
V10       float64
V11       float64
V12       float64
V13       float64
V14       float64
V15       float64
V16       float64
V17       float64
V18       float64
V19       float64
V20       float64
V21       float64
V22       float64
V23       float64
V24       float64
V25       float64
V26       float64
V27       float64
V28       float64
Amount    float64
Class       Int64
dtype: object

---
## Prepare Data Source

The data preparation includes adding splits for machine learning with a column named `splits` with 80% for training (`TRAIN`), 10% for validation (`VALIDATE`) and 10% for testing (`TEST`). Additionally, a unique identifier was added to each transaction, `transaction_id`. 

>These steps could be done locally at training but are instead done in the source system, BigQuery in this case, which provides several advantages:
>
>-   **Single Source of Truth:** A single data preparation can benefit multiple model training jobs for different architectures or even different team members working on the same model. This ensures consistency and avoids duplication of effort.
>-   **Leverage BigQuery's Power:** BigQuery is highly optimized for large-scale data processing. Performing these operations directly in BigQuery leverages its distributed processing capabilities, making the preparation significantly faster and more efficient than local processing, especially for massive datasets.
>-   **Reduced Data Movement:** Preparing the data in BigQuery reduces the amount of data that needs to be moved out of BigQuery and into the training environment. This minimizes latency and potential bottlenecks associated with data transfer.
>-   **Data Versioning and Reproducibility:** By preparing the splits and unique ID in BigQuery, the specific dataset used for training can be easily tracked and versioned. This enhances the reproducibility of experiments and makes it easier to understand the provenance of the data used in a particular model.
>-   **Simplified Training Pipeline:** The training pipeline becomes simpler because it can directly read pre-split data from BigQuery, eliminating the need for complex splitting logic within the training code.
>-   **Pre-calculated Joins and Features:** BigQuery can be used to pre-calculate joins and engineer new features that are beneficial for the model. This can improve model performance and further reduce the workload during the training phase.
>
>**Further Considerations:**
>
>-   **Data Governance and Security:** BigQuery offers robust data governance and security features. Performing data preparation within BigQuery allows you to maintain control over access and ensure data quality.
>-   **Scalability:** This approach is highly scalable. As your dataset grows, BigQuery can handle the increased workload without requiring significant changes to your data preparation pipeline.
>-   **Cost Optimization:** While moving large amounts of data out of BigQuery can incur costs, performing the preparation steps within BigQuery and only extracting the necessary data for training can often be more cost-effective.
>
>By preparing the data in BigQuery, you create a streamlined, efficient, and reproducible workflow (pipeline) that leverages the strengths of the platform and sets your machine learning models up for success.


### Create/Recall Dataset

In [16]:
dataset = bigquery.Dataset(f"{BQ_PROJECT}.{BQ_DATASET}")
dataset.location = BQ_REGION
bq_dataset = bq.create_dataset(dataset, exists_ok = True)

### Create/Recall Table With Preparation For ML

Copy the data from the source while adding columns:
- `transaction_id` as a unique identifier for the row
    - Use the `GENERATE_UUID()` function
- `splits` column to randomly assign rows to "TRAIN", "VALIDATE" and "TEST" groups
    - Stratified sampling within the levels of `class`: first assign random row numbers within each level of `class`, then use a `CASE` statement on the row number to assign the `splits` level.
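
The same stratified assignment can be sketched locally in plain Python. This is an illustrative analogue of the SQL logic and not part of the workflow: the `stratified_splits` function and the label counts are hypothetical, and the 0.8 and 0.9 cutoffs mirror the thresholds used in the query.

```python
import random
from collections import Counter, defaultdict

def stratified_splits(labels, train_cut=0.8, validate_cut=0.9, seed=0):
    """Assign 'TRAIN', 'VALIDATE', or 'TEST' to each row, stratified by label."""
    rng = random.Random(seed)
    # Group row positions by class (the PARTITION BY step)
    rows_by_class = defaultdict(list)
    for position, label in enumerate(labels):
        rows_by_class[label].append(position)

    splits = [None] * len(labels)
    for rows in rows_by_class.values():
        rng.shuffle(rows)  # random order within the class (ORDER BY RAND())
        n = len(rows)
        for rn, position in enumerate(rows, start=1):  # rn mirrors ROW_NUMBER()
            if rn <= train_cut * n:
                splits[position] = 'TRAIN'
            elif rn <= validate_cut * n:
                splits[position] = 'VALIDATE'
            else:
                splits[position] = 'TEST'
    return splits

# Hypothetical imbalanced labels: 1,000 normal rows and 20 fraud rows
labels = [0] * 1000 + [1] * 20
splits = stratified_splits(labels)
counts = Counter(zip(labels, splits))
print(counts)
```

Because the cutoffs are applied within each class, the rare class keeps the same 80/10/10 proportions as the common one.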

In [17]:
job = bq.query(f"""
#CREATE OR REPLACE TABLE
CREATE TABLE IF NOT EXISTS 
    `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}` AS
WITH
    add_id AS (
        SELECT *,
            GENERATE_UUID() transaction_id,
            ROW_NUMBER() OVER (PARTITION BY class ORDER BY RAND()) as rn
            FROM `{BQ_SOURCE}`
    )
SELECT * EXCEPT(rn),
    CASE 
        WHEN rn <= 0.8 * COUNT(*) OVER (PARTITION BY class) THEN 'TRAIN'
        WHEN rn <= 0.9 * COUNT(*) OVER (PARTITION BY class) THEN 'VALIDATE'
        ELSE 'TEST'
    END AS splits
FROM add_id
""")
job.result()
(job.ended-job.started).total_seconds()

0.354

In [18]:
raw_sample = bq.query(f'SELECT * FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}` LIMIT 5').to_dataframe()
raw_sample

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V23,V24,V25,V26,V27,V28,Amount,Class,transaction_id,splits
0,43330.0,-1.510308,0.780684,2.085747,3.123133,0.589557,1.301681,-1.300991,-2.345333,0.148722,...,-0.341324,-0.370869,-0.201199,0.423964,0.210998,-0.058616,0.0,0,34a1f550-443c-45f0-858d-94cf0baebfe8,TEST
1,141955.0,-1.585532,1.240411,-0.558349,-0.841657,0.397502,-1.532723,0.388818,0.575896,-0.27837,...,-0.249493,0.001462,-0.045791,-0.110667,0.305249,0.015271,0.0,0,a2dfc3ff-aa20-4742-bb1c-c6f67440e34a,TEST
2,58813.0,-2.524342,-0.313784,2.304085,3.043016,5.495686,-2.600347,-3.779679,-1.042154,-1.014503,...,-6.185491,0.44008,-1.409641,-0.209465,-0.111065,0.116788,0.0,0,d7910b3a-fc5e-417c-8b61-5516cc4dc981,TEST
3,139716.0,-2.006121,2.048709,-0.963971,-1.046216,-0.118273,-1.40003,0.401296,0.728337,0.095944,...,0.059168,-0.113542,0.066162,-0.143905,-0.218363,-0.091904,0.0,0,4de5ecd4-7c82-40dd-a52a-4b5e8b1e8ba3,TEST
4,59183.0,1.175167,-0.0879,0.278759,-0.279214,-0.166236,0.058772,-0.232681,0.192241,-0.014614,...,0.165047,-0.253526,-0.016523,0.890504,-0.0594,-0.013996,0.0,0,ec122bde-5780-4223-9e35-7f26de453512,TEST


### Review the number of records for each level of `Class` for each of the data splits:

In [19]:
bq.query(f"""
SELECT splits, class,
    count(*) as count,
    ROUND(count(*) * 100.0 / SUM(count(*)) OVER (PARTITION BY class), 2) AS percentage
FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}`
GROUP BY splits, class
""").to_dataframe()

| | splits | class | count | percentage |
|---|---|---|---|---|
| 0 | TEST | 1 | 50 | 10.16 |
| 1 | TRAIN | 1 | 393 | 79.88 |
| 2 | VALIDATE | 1 | 49 | 9.96 |
| 3 | TEST | 0 | 28432 | 10.0 |
| 4 | TRAIN | 0 | 227452 | 80.0 |
| 5 | VALIDATE | 0 | 28431 | 10.0 |


---
## Training An Autoencoder

### Source Data

Each data split is read from BigQuery into a pandas DataFrame here.

Link to alternative sources

In [20]:
train_ds = bq.query(f"SELECT * EXCEPT(splits) FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}` WHERE splits = 'TRAIN'").to_dataframe()
test_ds = bq.query(f"SELECT * EXCEPT(splits) FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}` WHERE splits = 'TEST'").to_dataframe()
validate_ds = bq.query(f"SELECT * EXCEPT(splits) FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}` WHERE splits = 'VALIDATE'").to_dataframe()

In [21]:
train_ds.columns

Index(['Time', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8', 'V9', 'V10',
       'V11', 'V12', 'V13', 'V14', 'V15', 'V16', 'V17', 'V18', 'V19', 'V20',
       'V21', 'V22', 'V23', 'V24', 'V25', 'V26', 'V27', 'V28', 'Amount',
       'Class', 'transaction_id'],
      dtype='object')

In [22]:
# Define column categories
var_class = ['Class']
var_omit = ['transaction_id']
var_numeric = [x for x in train_ds.columns.tolist() if x not in var_class + var_omit]

### Record Reader

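A mechanism for reading records of data for training, validation, and testing.

This workflow uses [`tf.data.Dataset`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset), built here from the in-memory pandas DataFrames.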

In [23]:
def reader(ds):
    return tf.data.Dataset.from_tensor_slices(dict(ds))

In [24]:
train_read = reader(train_ds)
validate_read = reader(validate_ds)
test_read = reader(test_ds)



In [25]:
batch = next(iter(train_read.batch(10).take(1)))
batch.keys()

dict_keys(['Time', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8', 'V9', 'V10', 'V11', 'V12', 'V13', 'V14', 'V15', 'V16', 'V17', 'V18', 'V19', 'V20', 'V21', 'V22', 'V23', 'V24', 'V25', 'V26', 'V27', 'V28', 'Amount', 'Class', 'transaction_id'])

In [26]:
batch['Time']

<tf.Tensor: shape=(10,), dtype=float64, numpy=
array([ 83533., 113822.,  39200.,  61781., 148437., 123042., 127907.,
         1888., 117742., 163935.])>

### Shaping Records With The Reader

In [27]:
def prep_batch(source):
    for k in var_omit + var_class:
        source.pop(k, None)
    numeric_values = tf.stack([source[col] for col in var_numeric], axis=-1)
    return numeric_values, numeric_values

In [28]:
# Inputs and targets are identical for an autoencoder, so only one copy is kept
batch, _ = next(iter(train_read.map(prep_batch).batch(10).take(1)))

In [29]:
batch.shape

TensorShape([10, 30])

In [30]:
batch[0]

<tf.Tensor: shape=(30,), dtype=float64, numpy=
array([ 8.35330000e+04,  1.12552389e+00,  1.89853946e-01,  1.50865969e+00,
        2.71993690e+00, -8.82341741e-01,  1.28390258e-01, -6.22650925e-01,
        2.19444433e-01,  2.55370444e-01,  5.21744641e-01, -1.11755689e+00,
       -3.42581768e-01, -9.18035487e-01, -2.24182825e-01, -1.98108066e-01,
        6.65628330e-01, -3.60429717e-01, -8.31641571e-02, -8.35711759e-01,
       -2.25172417e-01, -9.75838741e-02, -1.72683342e-01,  5.75558670e-02,
        3.60859312e-01,  2.65443187e-01, -4.04174990e-02,  3.66861302e-02,
        3.36852774e-02,  0.00000000e+00])>

### Preprocessing Layers

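See the [Keras preprocessing layers documentation](https://keras.io/api/layers/preprocessing_layers/). A `Normalization` layer is adapted to the training data, and a second `Normalization` layer with `invert = True` reuses its mean and variance to map values back to the original scale.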

In [31]:
# Normalization
normalizer = keras.layers.Normalization(axis=-1, name='normalization')

# Adapt the normalizer 
numeric_feature_reader = train_read.map(prep_batch).batch(1000).prefetch(tf.data.AUTOTUNE)
normalizer.adapt(numeric_feature_reader.map(lambda x, _: x))

# Denormalization using normalizer parameters
denormalizer = keras.layers.Normalization(axis = -1, name = 'denormalize', invert = True, mean = normalizer.mean, variance = normalizer.variance)
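
The relationship between the two layers can be illustrated with NumPy. This is a conceptual sketch and not the Keras implementation: the batch values are hypothetical, and `normalize` and `denormalize` are stand-in functions for what the adapted layer and its inverted counterpart compute.

```python
import numpy as np

# Hypothetical batch: 4 rows, 2 features on very different scales
x = np.array([[100.0, 0.1],
              [200.0, 0.2],
              [300.0, 0.3],
              [400.0, 0.4]])

# Per-feature statistics, analogous to what adapt() learns from the training data
mean = x.mean(axis=0)
variance = x.var(axis=0)

def normalize(values):
    # Center each feature and scale it to unit variance
    return (values - mean) / np.sqrt(variance)

def denormalize(values):
    # Invert the transformation using the same statistics
    return values * np.sqrt(variance) + mean

z = normalize(x)
restored = denormalize(z)
print(np.allclose(restored, x))
```

Sharing the mean and variance between the two functions is what makes the round trip exact. The Keras layer additionally guards against zero variance with a small epsilon.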

### Autoencoder

#### Model Layers

In [42]:
autoencoder_input = keras.Input(shape = (len(var_numeric),), name = "autoencoder_input")

encoder = keras.layers.Dense(16, activation='relu', name='enc_dense1', kernel_regularizer = keras.regularizers.l2(0.001))(autoencoder_input)
encoder = keras.layers.Dropout(0.4, name='enc_dropout1')(encoder)
encoder = keras.layers.Dense(8, activation='relu', name='enc_dense2', kernel_regularizer = keras.regularizers.l2(0.001))(encoder)
encoder = keras.layers.Dropout(0.4, name='enc_dropout2')(encoder)

latent = keras.layers.Dense(4, activation='relu', name='latent', kernel_regularizer = keras.regularizers.l2(0.001))(encoder)

decoder = keras.layers.Dense(8, activation='relu', name='dec_dense1', kernel_regularizer = keras.regularizers.l2(0.001))(latent)
decoder = keras.layers.Dropout(0.4, name='dec_dropout1')(decoder)
decoder = keras.layers.Dense(16, activation='relu', name='dec_dense2', kernel_regularizer = keras.regularizers.l2(0.001))(decoder)
decoder = keras.layers.Dropout(0.4, name='dec_dropout2')(decoder)

reconstructed = keras.layers.Dense(len(var_numeric), activation='linear', name='reconstructed')(decoder)

In [43]:
autoencoder = keras.Model(autoencoder_input, reconstructed, name = 'autoencoder')

#### Define Loss Function

The loss is the mean absolute error (MAE), computed per record across the features.
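
Written out for a single record, the quantity averaged along the last axis is shown below. The symbols are notation introduced here for illustration: $y$ is the (normalized) input, $\hat{y}$ is its reconstruction, and $d$ is the number of features (30 in this dataset).

$$
\mathrm{MAE}(y, \hat{y}) = \frac{1}{d} \sum_{j=1}^{d} \left| y_j - \hat{y}_j \right|
$$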

In [44]:
import jax.numpy as jnp
def custom_loss(y_true, y_pred):
    return jnp.mean(jnp.abs(y_true - y_pred), axis=-1)

#### Compile The Model

Include metrics to report during training and evaluation.

In [45]:
autoencoder.compile(
    optimizer = keras.optimizers.Adam(learning_rate = 0.0005),
    loss = custom_loss, #keras.losses.MeanAbsoluteError(),
    metrics = [
        keras.metrics.RootMeanSquaredError(name = 'rmse'),
        keras.metrics.MeanSquaredError(name = 'mse'),
        keras.metrics.MeanAbsoluteError(name = 'mae'),
        keras.metrics.MeanSquaredLogarithmicError(name = 'msle')
    ]
)

#### Review The Model

In [46]:
autoencoder.summary()

#### Read And Prepare Training/Validation Data

Use the reader and:
- Shuffle
- Shape
- Normalize
- Batch
- Reshape
- Prefetch

In [47]:
# Ensure each batch has shape (batch_size, number of features)
def reshape_data(x, y):
    return tf.reshape(x, (-1, len(var_numeric))), tf.reshape(y, (-1, len(var_numeric)))


# Data preparation for training
train_dataset = train_read \
    .shuffle(buffer_size = len(train_ds), reshuffle_each_iteration = True) \
    .map(prep_batch) \
    .map(lambda x, _: (normalizer(x), normalizer(x))) \
    .batch(100) \
    .map(reshape_data) \
    .prefetch(tf.data.AUTOTUNE)
val_dataset = validate_read \
    .map(prep_batch) \
    .map(lambda x, _: (normalizer(x), normalizer(x))) \
    .batch(100) \
    .map(reshape_data) \
    .prefetch(tf.data.AUTOTUNE)

#### Define Early Stopping Criteria

In [48]:
early_stopping = keras.callbacks.EarlyStopping(
    monitor = "val_loss",  # Monitor validation loss
    patience = 5,          # Number of epochs with no improvement before stopping
    restore_best_weights = True, # using the model that generalized best to the validation set, not the overfitted model from the last epoch
)
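
The patience rule can be sketched in plain Python. This is a simplified illustration of the stopping logic and not the Keras implementation: the `early_stopping_epoch` function and the loss values are hypothetical, and any improvement, however small, is assumed to reset the counter.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses.

    Epochs are 1-indexed; stop_epoch is None if training is never stopped early.
    """
    best_loss = float('inf')
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Hypothetical validation losses: improvement stalls after epoch 3
losses = [0.70, 0.65, 0.63, 0.64, 0.66, 0.65, 0.64, 0.67, 0.63]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=5)
print(stop_epoch, best_epoch)
```

With `restore_best_weights = True`, the weights kept after stopping are those from `best_epoch`, not those from the final epoch.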

#### Train

In [49]:
history = autoencoder.fit(
    train_dataset,
    epochs = 10,
    validation_data = val_dataset,
    callbacks = [early_stopping]
)

Epoch 1/10
2279/2279 ━━━━━━━━━━━━━━━━━━━━ 34s 12ms/step - loss: 0.7071 - mae: 0.6744 - mse: 1.0619 - msle: 0.1462 - rmse: 1.0299 - val_loss: 0.6453 - val_mae: 0.6401 - val_mse: 0.9728 - val_msle: 0.1374 - val_rmse: 0.9863
Epoch 2/10
2279/2279 ━━━━━━━━━━━━━━━━━━━━ 30s 11ms/step - loss: 0.6452 - mae: 0.6407 - mse: 0.9751 - msle: 0.1353 - rmse: 0.9874 - val_loss: 0.6308 - val_mae: 0.6274 - val_mse: 0.9560 - val_msle: 0.1309 - val_rmse: 0.9777
Epoch 3/10
2279/2279 ━━━━━━━━━━━━━━━━━━━━ 32s 12ms/step - loss: 0.6386 - mae: 0.6352 - mse: 0.9739 - msle: 0.1327 - rmse: 0.9868 - val_loss: 0.6309 - val_mae: 0.6276 - val_mse: 0.9564 - val_msle: 0.1312 - val_rmse: 0.9780
Epoch 4/10
2279/2279 ━━━━━━━━━━━━━━━━━━━━ 31s 11ms/step - loss: 0.6374 - mae: 0.6341 - mse: 0.9689 - msle: 0.1325 - rmse: 0.9843 - val_loss: 0.6289 - val_mae: 0.6256 - val_mse: 0.9541 - val_msle: 0.

#### Predict

In [50]:
test_instance = next(iter(test_read.map(prep_batch).batch(2).take(1)))[0] # get first element of tuple for prediction
test_instance

<tf.Tensor: shape=(2, 30), dtype=float64, numpy=
array([[ 4.33300000e+04, -1.51030791e+00,  7.80683602e-01,
         2.08574736e+00,  3.12313283e+00,  5.89556504e-01,
         1.30168094e+00, -1.30099101e+00, -2.34533311e+00,
         1.48722209e-01,  1.06300824e+00, -1.88035726e+00,
         3.10971711e-01,  9.54298638e-01, -1.10646280e+00,
        -7.86015840e-01, -5.55324333e-01,  3.35236557e-01,
         1.85637688e-01,  1.54380629e+00, -6.44864027e-01,
         2.22981820e+00, -4.60620924e-02, -3.41323962e-01,
        -3.70869344e-01, -2.01199298e-01,  4.23963954e-01,
         2.10998182e-01, -5.86163935e-02,  0.00000000e+00],
       [ 1.41955000e+05, -1.58553155e+00,  1.24041103e+00,
        -5.58349101e-01, -8.41657245e-01,  3.97502038e-01,
        -1.53272269e+00,  3.88817678e-01,  5.75896042e-01,
        -2.78369582e-01, -8.56928209e-01, -1.11892547e+00,
         7.37223588e-01,  4.79262560e-01,  7.95002104e-01,
        -3.96344845e-01, -3.62277829e-01,  4.99583980e-02,
      

In [51]:
denormalizer(autoencoder.predict(normalizer(test_instance)))

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 83ms/step


Array([[ 6.63753750e+04, -3.61446470e-01,  1.20810561e-01,
         6.60756111e-01,  3.48414816e-02, -2.17960656e-01,
        -1.63819164e-01, -3.55948396e-02,  7.73874894e-02,
        -1.21010855e-01, -1.23573966e-01,  4.41530533e-02,
         1.33108526e-01,  5.87938819e-03,  2.61603110e-02,
         2.04240218e-01,  8.09435993e-02, -2.89056599e-02,
        -2.40718555e-02, -9.41007305e-03, -3.17713767e-02,
        -5.38093783e-02, -7.44783580e-02, -3.18264663e-02,
         5.11006042e-02,  9.79991332e-02, -6.49984479e-02,
         1.39425742e-02,  2.20378246e-02,  2.30595932e+01],
       [ 1.13567859e+05,  3.81074160e-01,  1.34964641e-02,
        -3.09076011e-01, -1.11767516e-01,  6.75490499e-02,
        -3.35904121e-01,  6.61588833e-02, -1.31024774e-02,
         6.80452585e-03, -6.42847568e-02, -1.08968854e-01,
         1.41189605e-01, -1.10273408e-02,  5.18945828e-02,
        -5.56073189e-02,  6.95755258e-02, -1.03504792e-01,
         2.97438521e-02,  6.64557563e-04, -8.50816667e-

---
## Enhance Autoencoder

### Combine Preprocessing and Model

With Keras, models can be used as layers in new models. Here, a new model combines the preprocessing layers (the normalizer and denormalizer) with the autoencoder into a single model object, called a stacked model.

#### Stacked Model

In [52]:
type(normalizer), type(autoencoder), type(denormalizer)

(keras.src.layers.preprocessing.normalization.Normalization,
 keras.src.models.functional.Functional,
 keras.src.layers.preprocessing.normalization.Normalization)

In [53]:
# Create a new sequential model
stacked_model = keras.Sequential([
    normalizer,
    autoencoder,
    denormalizer
], name='stacked_autoencoder')

#### Predict

In [54]:
test_instance = next(iter(test_read.map(prep_batch).batch(3).take(1)))[0]
stacked_model.predict(test_instance)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 89ms/step


array([[ 6.63753750e+04, -3.61446410e-01,  1.20810553e-01,
         6.60756171e-01,  3.48414816e-02, -2.17960671e-01,
        -1.63819164e-01, -3.55948396e-02,  7.73874819e-02,
        -1.21010855e-01, -1.23573966e-01,  4.41530533e-02,
         1.33108526e-01,  5.87938819e-03,  2.61603128e-02,
         2.04240218e-01,  8.09435993e-02, -2.89056599e-02,
        -2.40718555e-02, -9.41007212e-03, -3.17713767e-02,
        -5.38093783e-02, -7.44783580e-02, -3.18264663e-02,
         5.11006042e-02,  9.79991332e-02, -6.49984479e-02,
         1.39425742e-02,  2.20378246e-02,  2.30595913e+01],
       [ 1.13567867e+05,  3.81074220e-01,  1.34964511e-02,
        -3.09076190e-01, -1.11767538e-01,  6.75490722e-02,
        -3.35904121e-01,  6.61588907e-02, -1.31024858e-02,
         6.80454215e-03, -6.42847493e-02, -1.08968861e-01,
         1.41189605e-01, -1.10273417e-02,  5.18945865e-02,
        -5.56073450e-02,  6.95755258e-02, -1.03504792e-01,
         2.97438577e-02,  6.64558378e-04, -8.50816667e-

In [55]:
# Reverse the feature order to see how a positional model responds to mis-ordered inputs
reversed_test_instance = tf.reverse(test_instance, axis=[-1])
stacked_model.predict(reversed_test_instance)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step


array([[ 6.63649531e+04, -3.61633986e-01,  1.20835721e-01,
         6.60983503e-01,  3.48673873e-02, -2.18024790e-01,
        -1.63774595e-01, -3.56186368e-02,  7.74103850e-02,
        -1.21039025e-01, -1.23589650e-01,  4.41869609e-02,
         1.33104444e-01,  5.88490441e-03,  2.61502825e-02,
         2.04296768e-01,  8.09488893e-02, -2.88888421e-02,
        -2.40808893e-02, -9.41366330e-03, -3.17589156e-02,
        -5.38235717e-02, -7.45193362e-02, -3.18376683e-02,
         5.11051789e-02,  9.80379283e-02, -6.50051609e-02,
         1.39465081e-02,  2.20450107e-02,  2.30601044e+01],
       [ 6.63469375e+04, -3.61958086e-01,  1.20879248e-01,
         6.61376417e-01,  3.49121541e-02, -2.18135610e-01,
        -1.63697556e-01, -3.56597640e-02,  7.74499401e-02,
        -1.21087708e-01, -1.23616755e-01,  4.42455597e-02,
         1.33097395e-01,  5.89443743e-03,  2.61329543e-02,
         2.04394519e-01,  8.09580237e-02, -2.88597811e-02,
        -2.40964945e-02, -9.41986870e-03, -3.17373872e-

#### Review The Model

In [56]:
stacked_model.summary()

In [57]:
stacked_model.summary(expand_nested=True)

### Adding Named Inputs

Named inputs pass the model a dictionary of features:
- the model takes an input dictionary, concatenates the features, and passes the result to the stacked model

Why?
- User-friendly: callers refer to features by name rather than by position
- Prevents training/serving skew in which a different feature order at serving time causes serving errors


Using an ordered dictionary for the input layers ensures that the inputs reach the concatenation layer in a consistent order. The inputs supplied to the model can be in any order; the ordering of the input layers enforces the consistency needed.
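
The effect of name-based ordering can be sketched without Keras. This is an illustrative analogue and not the model code: `feature_order` is a hypothetical three-feature subset, and `to_feature_vector` is a hypothetical helper that plays the role of the ordered input layers feeding the concatenation.

```python
# Hypothetical subset of the training feature order
feature_order = ['Time', 'V1', 'Amount']

def to_feature_vector(record, order=feature_order):
    """Arrange a dict of named features into the order the model was trained on."""
    missing = [name for name in order if name not in record]
    if missing:
        raise KeyError(f'missing features: {missing}')
    # Extra keys are ignored; the order of keys in `record` does not matter
    return [record[name] for name in order]

in_order = {'Time': 282.0, 'V1': -0.356466, 'Amount': 0.0}
shuffled = {'Amount': 0.0, 'extra': 1, 'V1': -0.356466, 'Time': 282.0}

print(to_feature_vector(in_order))
print(to_feature_vector(shuffled))
```

Both records produce the same vector because each feature is looked up by name, so a caller cannot silently swap two columns.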

#### Input Layer

In [76]:
from collections import OrderedDict

input_dict = OrderedDict()
for key in var_numeric:
    input_dict[key] = keras.layers.Input(shape=(1,), name=key)

In [77]:
concatenated_inputs = keras.layers.Concatenate(axis = -1)(list(input_dict.values()))

#### Model

In [78]:
final_model = keras.Model(inputs = input_dict, outputs = stacked_model(concatenated_inputs))

#### Data Mapping

In [79]:
def prep_batch_dict(source):
    # Keep only the numeric features, in the fixed order given by var_numeric;
    # the omitted and class columns are dropped by not selecting them
    return OrderedDict((key, source[key]) for key in var_numeric)

In [80]:
test_instance = next(iter(test_read.map(prep_batch_dict).batch(3).take(1)))

#### Predict

In [81]:
final_model.predict(test_instance)

TypeError: 'NoneType' object is not callable

In [60]:
test_instance

OrderedDict([('Time',
              <tf.Tensor: shape=(3,), dtype=float64, numpy=array([ 43330., 141955.,  58813.])>),
             ('V1',
              <tf.Tensor: shape=(3,), dtype=float64, numpy=array([-1.51030791, -1.58553155, -2.52434169])>),
             ('V2',
              <tf.Tensor: shape=(3,), dtype=float64, numpy=array([ 0.7806836 ,  1.24041103, -0.3137836 ])>),
             ('V3',
              <tf.Tensor: shape=(3,), dtype=float64, numpy=array([ 2.08574736, -0.5583491 ,  2.30408519])>),
             ('V4',
              <tf.Tensor: shape=(3,), dtype=float64, numpy=array([ 3.12313283, -0.84165725,  3.04301624])>),
             ('V5',
              <tf.Tensor: shape=(3,), dtype=float64, numpy=array([0.5895565 , 0.39750204, 5.49568553])>),
             ('V6',
              <tf.Tensor: shape=(3,), dtype=float64, numpy=array([ 1.30168094, -1.53272269, -2.60034741])>),
             ('V7',
              <tf.Tensor: shape=(3,), dtype=float64, numpy=array([-1.30099101,  0.38881768

#### Review The Model

In [61]:
final_model.summary(expand_nested = True)

### Adding Detailed Outputs

## Model Saving And Serving

---
## WIP

---
Customizing the outputs:
- reconstructed values: normalized, denormalized
- errors in reconstructed values: normalized, denormalized
- embedding layer
- error metrics: MAE, MSE, RMSE
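
The per-record error metrics in this list can be sketched with NumPy. This is an illustration of the arithmetic only and not the model's output layer: the `error_metrics` function and the two example records are hypothetical.

```python
import numpy as np

def error_metrics(original, reconstructed):
    """Per-record reconstruction error metrics over the feature axis."""
    diff = original - reconstructed
    mse = np.mean(np.square(diff), axis=-1)
    return {
        'mae': np.mean(np.abs(diff), axis=-1),
        'mse': mse,
        'rmse': np.sqrt(mse),
    }

# Hypothetical records (rows) with three features each
original = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
reconstructed = np.array([[1.0, 0.0, 3.0], [3.0, 0.0, 4.0]])

metrics = error_metrics(original, reconstructed)
print(metrics)
```

Each metric has one value per record, which is the shape needed to rank or threshold individual transactions.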

In [63]:
# Get encoder output: the bottleneck layer is named 'latent' in the autoencoder defined above
encoder_output = autoencoder.get_layer('latent').output

In [66]:
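# L2 normalize the embedding using a Lambda layer with output_shape
# keras.ops keeps the computation backend-agnostic (the backend here is JAX, not TensorFlow);
# the small floor on the norm avoids dividing by zero for an all-zero embedding
normalized_embedding = keras.layers.Lambda(
    lambda x: x / keras.ops.maximum(keras.ops.sqrt(keras.ops.sum(keras.ops.square(x), axis=-1, keepdims=True)), 1e-12),
    output_shape=lambda input_shape: input_shape
)(encoder_output)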
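
L2 normalization itself can be sketched with NumPy. This is a standalone illustration and not the Keras layer: the `l2_normalize` function, the `epsilon` floor, and the example embeddings are assumptions made for the example.

```python
import numpy as np

def l2_normalize(embeddings, epsilon=1e-12):
    """Scale each row to unit Euclidean length; all-zero rows stay zero."""
    norms = np.sqrt(np.sum(np.square(embeddings), axis=-1, keepdims=True))
    return embeddings / np.maximum(norms, epsilon)

# Hypothetical 4-dimensional embeddings, including an all-zero row, which a
# ReLU-activated latent layer can produce
embeddings = np.array([[3.0, 4.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0, 0.0]])

unit = l2_normalize(embeddings)
print(unit)
```

Without the `epsilon` floor, the all-zero row would divide by zero and produce NaN values.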

In [67]:
# Connect the concatenated inputs to the stacked model
denormalized_output = stacked_model(concatenated_inputs)

In [70]:
# Create outputs for the model
normalized_inputs = normalizer(concatenated_inputs)
# the autoencoder was trained on normalized inputs
normalized_reconstruction = autoencoder(normalized_inputs)
denormalized_reconstruction = denormalized_output
# Calculate reconstruction errors using Lambda layers and backend-agnostic keras.ops
normalized_reconstruction_error = keras.layers.Lambda(
    lambda x: keras.ops.abs(x[0] - x[1]),
    output_shape=lambda input_shape: input_shape[0]
)([normalized_inputs, normalized_reconstruction])
# the raw inputs are already on the original scale
denormalized_reconstruction_error = keras.layers.Lambda(
    lambda x: keras.ops.abs(x[0] - x[1]),
    output_shape=lambda input_shape: input_shape[0]
)([concatenated_inputs, denormalized_reconstruction])

In [71]:
# Create the final model with structured output
final_model = keras.Model(
    inputs=input_dict,
    outputs={
        'reconstruction': {
            'normalized': normalized_reconstruction,
            'denormalized': denormalized_reconstruction,
        },
        'reconstruction_errors': {
            'normalized': normalized_reconstruction_error,
            'denormalized': denormalized_reconstruction_error,
        },
        'embedding': {
            'encoding': encoder_output,
            'normalized_encoding': normalized_embedding,
        }
    }
)

In [72]:
test_instance = next(iter(test_read.map(prep_batch_dict).batch(3).take(1)))

In [73]:
final_model.predict(test_instance)

AttributeError: 'str' object has no attribute '_error_repr'

In [75]:
# 1. Prepare a Single Batch with Correct Structure
test_read_dict = test_read.map(prep_batch_dict)
test_batch = next(iter(test_read_dict.batch(3)))  # No need for .take(1)

# The batch is already a dictionary of feature name -> tensor
test_instance_dict = test_batch

# 2. Perform Prediction
predictions = final_model.predict(test_instance_dict)

TypeError: 'NoneType' object is not callable

Notes:
- outputs customized
- more epochs but using the full model
    - or the inner model - how does it update the full
- specific
    - rename middle layer: embedding
    - output normalized version of this
- export/import example
- for dict input convert test instance to actual dictionary:
    - show inference
    - show mis-ordered inference
    - add extra keys, show inference
- supervised (maybe another workflow)
- semi-supervised with pre-train and fine-tune (maybe another workflow)
- vae (another workflow)
- serving with continual/incremental learning using mini batch update
- pipeline to train - include experiments, artifacts
    - set hyperparameters
    - train in component
    - train as training job
    - hyperparameter tuning to find best parameters with vizier.  Prefer pipeline and direct api usage if possible.
- build autoencoder variations
    - fine tune as semi-supervised
    - fine tune as vae
    - fine tune as semi-supervised vae
- introduce timeseries autoencoder
    - fine tune as transformer using rows/time/series as sequence