## Homework 2: TensorFlow Data, Feature, and Keras API Basics
### Erin Akinjide

Goals: 
- Convert a “real” raw dataset into the TFRecord format so it can be used easily with the TensorFlow Data API.
- Use Keras preprocessing layers to handle basic feature preprocessing, including embedding categorical features and normalizing numerical features.
- Build custom Keras preprocessing layers.
- Build, tune, and save a small Keras model to infer a “real” quantity of interest from the dataset.

## Runs
Check that the scripts run without errors.

In [6]:
import tensorflow as tf
import pandas as pd
import numpy as np
from customImputerLayerDefinition import ImputerLayer
from buildAndTrainModel import parse_example, wrap_to_tf_dataset
import os


## Dataset 
My createSavedDataset.py script creates the TFRecord file (dataset.tfrecords). The cell below checks that the file exists and parses one example from it.

In [2]:
# Check if TFRecord file exists
assert os.path.exists("dataset.tfrecords"), "TFRecord file not found"
print("TFRecord file found")

# Load a few samples to inspect
raw_dataset = tf.data.TFRecordDataset("dataset.tfrecords")
parsed_dataset = raw_dataset.map(parse_example)

# Show one example
for features, label in parsed_dataset.take(1):
    print("Sample features:", {k: v.numpy().shape for k, v in features.items()})
    print("Sample label:", label.numpy())


TFRecord file found
Sample features: {'hour': (), 'month': (), 'tickers': (188,), 'weekday': ()}
Sample label: 12


2025-07-31 14:46:21.093988: I tensorflow/core/framework/local_rendezvous.cc:407] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
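For reference, here is a minimal sketch of how one record with the feature layout printed above could be written and parsed back. It is not the code in createSavedDataset.py or buildAndTrainModel.py: the helper names `serialize_row` and `parse_example_sketch`, the `"label"` key, and the sample values are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

N_TICKERS = 188  # matches the 'tickers' shape printed above


def serialize_row(tickers, weekday, hour, month, label):
    """Pack one row into a serialized tf.train.Example (key names are assumed)."""
    def int_feature(value):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=[int(value)]))

    feature = {
        "tickers": tf.train.Feature(
            float_list=tf.train.FloatList(value=[float(t) for t in tickers])
        ),
        "weekday": int_feature(weekday),
        "hour": int_feature(hour),
        "month": int_feature(month),
        "label": int_feature(label),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()


def parse_example_sketch(serialized):
    """Decode one serialized Example into a (features, label) pair."""
    spec = {
        "tickers": tf.io.FixedLenFeature([N_TICKERS], tf.float32),
        "weekday": tf.io.FixedLenFeature([], tf.int64),
        "hour": tf.io.FixedLenFeature([], tf.int64),
        "month": tf.io.FixedLenFeature([], tf.int64),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized, spec)
    label = parsed.pop("label")
    return parsed, label


# Write a single made-up row to a temporary file, then read it back
path = os.path.join(tempfile.mkdtemp(), "sketch.tfrecords")
with tf.io.TFRecordWriter(path) as writer:
    writer.write(serialize_row(np.random.rand(N_TICKERS), weekday=2, hour=14, month=7, label=12))

dataset = tf.data.TFRecordDataset(path).map(parse_example_sketch)
features, label = next(iter(dataset))
print({k: v.shape for k, v in features.items()}, int(label.numpy()))
```

Fixed-length features keep the parsed shapes static, which is what lets the `tickers` vector feed directly into a `shape=(188,)` Keras input.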


## Imputer

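My customImputerLayerDefinition.py code replaces NaNs with a per-column value learned in adapt(). In the test below, each NaN is replaced by the minimum of the observed values in its column.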

In [3]:
# Create fake input with NaNs
data = np.array([[1.0, np.nan, 3.0], [0.5, 2.0, np.nan], [np.nan, 1.0, 4.0]], dtype=np.float32)
imputer = ImputerLayer()
imputer.adapt(data)
imputed = imputer(data)

print(" Original data:\n", data)
print(" Imputed data:\n", imputed.numpy())


 Original data:
 [[1.  nan 3. ]
 [0.5 2.  nan]
 [nan 1.  4. ]]
 Imputed data:
 [[1.  1.  3. ]
 [0.5 2.  3. ]
 [0.5 1.  4. ]]
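The output above is consistent with filling each NaN by its column's minimum. As a minimal sketch of that behavior (not the actual `ImputerLayer` from customImputerLayerDefinition.py; the class name `MinImputerSketch` and the attribute `fill_values` are hypothetical), a layer with an `adapt` step could look like this:

```python
import numpy as np
import tensorflow as tf


class MinImputerSketch(tf.keras.layers.Layer):
    """Hypothetical stand-in: fills NaNs with per-column minima learned in adapt()."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.fill_values = None  # set by adapt()

    def adapt(self, data):
        # np.nanmin ignores NaNs, so each minimum comes from observed values only
        minima = np.nanmin(np.asarray(data, dtype=np.float32), axis=0)
        self.fill_values = tf.constant(minima, dtype=tf.float32)

    def call(self, inputs):
        inputs = tf.convert_to_tensor(inputs, dtype=tf.float32)
        # tf.where broadcasts the per-column fill values across the batch
        return tf.where(tf.math.is_nan(inputs), self.fill_values, inputs)


data = np.array(
    [[1.0, np.nan, 3.0], [0.5, 2.0, np.nan], [np.nan, 1.0, 4.0]], dtype=np.float32
)
sketch = MinImputerSketch()
sketch.adapt(data)
filled = sketch(data)
print(np.asarray(filled))
```

Computing the statistic once in `adapt` and only applying it in `call` means the same fill values are reused at inference time instead of being recomputed per batch.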


## Loads Data
My buildAndTrainModel.py code loads the data from the TFRecord file and processes it into a dataset. The cell below reloads the exported SavedModel and evaluates it on the test data.

In [None]:
# Load the exported SavedModel as a layer
reloaded_layer = tf.keras.layers.TFSMLayer("mySavedModel", call_endpoint='serving_default')

# Reconstruct the input structure (must match original model)
tickers_input = tf.keras.layers.Input(shape=(188,), dtype=tf.float32, name="tickers")
weekday_input = tf.keras.layers.Input(shape=(), dtype=tf.int64, name="weekday")
hour_input = tf.keras.layers.Input(shape=(), dtype=tf.int64, name="hour")
month_input = tf.keras.layers.Input(shape=(), dtype=tf.int64, name="month")

# Feed the inputs into the reloaded SavedModel layer
outputs = reloaded_layer({
    "tickers": tickers_input,
    "weekday": weekday_input,
    "hour": hour_input,
    "month": month_input
})

# Now wrap it in a new Model that can be evaluated
reloaded_model = tf.keras.Model(
    inputs=[tickers_input, weekday_input, hour_input, month_input],
    outputs=outputs
)

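# X_test and y_test must already be defined (the held-out split); they are not created in this notebook
test_data = wrap_to_tf_dataset(X_test, y_test)

# A Model must be compiled before evaluate(). The loss and metric here are
# assumptions and should match the ones used in buildAndTrainModel.py.
reloaded_model.compile(loss="sparse_categorical_crossentropy", metrics=["accuracy"])
loss, acc = reloaded_model.evaluate(test_data)
print(f"✅ Reloaded model test accuracy: {acc:.4f}")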


AttributeError: 'TFSMLayer' object has no attribute 'evaluate'