This notebook trains a surrogate model using TensorFlow. The trained model is saved and converted to TensorFlow.js format, so that it can be loaded on a webpage and used to make predictions on new data.

To run this notebook, use the `ws-env` virtual environment, which can be built from the `environment.yml` file located in the same directory as this notebook.

In [1]:
# imports
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split

In [18]:
# Read in training data
df = pd.read_csv('ExampleData.csv')
# Split the X and y variables
X = df[["age_effect", "initial_effect", "final_effect", "mort_effect", "prod_effect", "fert_effect", "discount_rate"]].values
y = df[["NPV"]].values
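The model assumes the feature columns arrive in a fixed order, so it can be worth checking the CSV before training. The sketch below uses a hypothetical helper, `check_training_frame` (not part of the original notebook). It assumes the column names used above and reports any that are missing, non-numeric, or contain missing values.

```python
import numpy as np
import pandas as pd

# Feature and target columns as used in this notebook
FEATURES = ["age_effect", "initial_effect", "final_effect", "mort_effect",
            "prod_effect", "fert_effect", "discount_rate"]
TARGET = "NPV"


def check_training_frame(df: pd.DataFrame) -> list:
    """Return a list of problems found in df (an empty list means none were found)."""
    problems = []
    missing = [c for c in FEATURES + [TARGET] if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems
    for col in FEATURES + [TARGET]:
        if not pd.api.types.is_numeric_dtype(df[col]):
            problems.append(f"non-numeric column: {col}")
        elif df[col].isna().any():
            problems.append(f"missing values in column: {col}")
    return problems


# Usage with a tiny made-up frame (not the real ExampleData.csv)
demo = pd.DataFrame(
    [[40, 10, 10, 0.5, 0.0, 0.0, 0.02, 100.0]],
    columns=FEATURES + [TARGET],
)
print(check_training_frame(demo))  # []
print(check_training_frame(demo.drop(columns=["NPV"])))
```

With the notebook's data this would be called as `check_training_frame(df)` right after `pd.read_csv`.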

In [19]:
# train-test split for model evaluation
# X_train, X_test, y_train, y_test = train_test_split(
#     X, y, train_size=0.7, shuffle=True
# )
# In this case, don't split: the whole sample is used for training so
# that the outer edges of the parameter space are covered (a random
# hold-out could remove some of those boundary points)
X_train = X
y_train = y
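The reason for skipping the split can be checked numerically. The sketch below uses synthetic uniform data as a stand-in for `X`, which is an assumption since the real sample may be a grid or otherwise structured. It mimics `train_test_split(..., train_size=0.7, shuffle=True)` with a NumPy permutation, then counts how many held-out rows fall outside the per-feature training range. Each such row would force the network to extrapolate rather than interpolate. `count_outside_range` is a hypothetical helper.

```python
import numpy as np


def count_outside_range(X_train: np.ndarray, X_test: np.ndarray) -> int:
    """Count test rows with at least one feature outside the training min/max."""
    lo = X_train.min(axis=0)
    hi = X_train.max(axis=0)
    outside = (X_test < lo) | (X_test > hi)  # elementwise, shape (n_test, n_features)
    return int(outside.any(axis=1).sum())


rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(200, 7))  # synthetic stand-in for X
idx = rng.permutation(len(X_demo))             # shuffle, like shuffle=True
n_train = int(0.7 * len(X_demo))               # like train_size=0.7
X_tr, X_te = X_demo[idx[:n_train]], X_demo[idx[n_train:]]
n_out = count_outside_range(X_tr, X_te)
print(f"{n_out} of {len(X_te)} held-out rows lie outside the training range")
```

With the real data, `count_outside_range(X_train, X_test)` applied to the commented-out split above would show whether a hold-out actually removes boundary points.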

In [20]:
# Layer sizes
num_input = X.shape[1]
num_hidden1 = 10 * num_input
num_hidden2 = 10 * num_input
num_output = y.shape[1]
layers_dim = [num_input, num_hidden1, num_hidden2, num_output]
print("Dimensions of each layer are {}".format(layers_dim))

Dimensions of each layer are [7, 70, 70, 1]
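A `Dense` layer with `n_in` inputs and `n_out` units holds an `n_in × n_out` weight matrix plus `n_out` biases. The short functions below apply that rule. The widths in the last usage line assume the seven-hidden-layer network defined in the next cell, whereas the `layers_dim` list printed above shows only two hidden layers. The function names are illustrative.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Trainable parameters of a Dense layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


def total_params(widths: list) -> int:
    """Sum Dense-layer parameters over consecutive widths [input, hidden..., output]."""
    return sum(dense_params(a, b) for a, b in zip(widths[:-1], widths[1:]))


print(dense_params(7, 70))                 # 560, first hidden layer
print(dense_params(70, 70))                # 4970, each later hidden layer
print(total_params([7] + [70] * 7 + [1]))  # 30451, whole network
```

The first two values match the `Param #` column in the model summary below.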


In [38]:
# Define the neural network
# We use [Keras](https://www.tensorflow.org/guide/keras) to define the
# neural network.
# Create a normalization layer that standardizes each input feature
# separately (axis=-1). The commented-out line would instead use a single
# mean and variance across all features (axis=None).
# norm_layer = tf.keras.layers.Normalization(input_shape=[num_input,], axis=None)
norm_layer = tf.keras.layers.Normalization(input_shape=[num_input,], axis=-1)
# Adapt the layer to the training data (computes per-feature mean and variance)
norm_layer.adapt(X_train)
# Initialize the weights with He uniform initialization
initializer = tf.keras.initializers.HeUniform()

nn = tf.keras.Sequential(
    [
        keras.layers.Input(shape=(num_input,)),
        # norm_layer,  # uncomment to normalize the inputs inside the model
        keras.layers.Dense(num_hidden1, activation="gelu", kernel_initializer=initializer),
        keras.layers.Dense(num_hidden2, activation="gelu", kernel_initializer=initializer),
        keras.layers.Dense(num_hidden2, activation="gelu", kernel_initializer=initializer),
        keras.layers.Dense(num_hidden2, activation="gelu", kernel_initializer=initializer),
        keras.layers.Dense(num_hidden2, activation="gelu", kernel_initializer=initializer),
        keras.layers.Dense(num_hidden2, activation="gelu", kernel_initializer=initializer),
        keras.layers.Dense(num_hidden2, activation="gelu", kernel_initializer=initializer),
        keras.layers.Dense(num_output, kernel_initializer=initializer),
    ]
)
# summary() prints the table itself; wrapping it in print() would also print "None"
nn.summary()

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_32 (Dense)            (None, 70)                560       
                                                                 
 dense_33 (Dense)            (None, 70)                4970      
                                                                 
 dense_34 (Dense)            (None, 70)                4970      
                                                                 
 dense_35 (Dense)            (None, 70)                4970      
                                                                 
 dense_36 (Dense)            (None, 70)                4970      
                                                                 
 dense_37 (Dense)            (None, 70)                4970      
                                                                 
 dense_38 (Dense)            (None, 70)               
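The `Normalization` layer created above with `axis=-1` standardizes each input column separately. `adapt` records the per-column mean and variance, and the layer then outputs roughly `(x - mean) / sqrt(var)`. This may matter here because the features sit on very different scales, for example an age effect of 40 next to a discount rate of 0.02. The NumPy sketch below illustrates the same arithmetic. It assumes, as far as we know of the Keras implementation, a population variance and a small epsilon floor (1e-7, the Keras default) on the standard deviation to guard against constant columns.

```python
import numpy as np


def fit_normalizer(X: np.ndarray):
    """Per-column mean and population variance, as adapt() computes for axis=-1."""
    return X.mean(axis=0), X.var(axis=0)


def normalize(X: np.ndarray, mean: np.ndarray, var: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Standardize each column; eps floors the std so zero-variance columns stay finite."""
    return (X - mean) / np.maximum(np.sqrt(var), eps)


X_demo = np.array([[40.0, 0.02], [20.0, 0.04], [60.0, 0.06]])  # made-up rows
mean, var = fit_normalizer(X_demo)
Z = normalize(X_demo, mean, var)
print(Z.mean(axis=0), Z.std(axis=0))  # approximately [0, 0] and [1, 1]
```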



In [39]:
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.legacy.Adam(learning_rate=0.001)
nn.compile(optimizer=optimizer,
           loss=loss_fn,
           metrics=['mean_squared_error'])
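For reference, the Adam update that the optimizer applies to each weight can be written in a few lines of NumPy. This is a sketch of the standard textbook form. It uses the `learning_rate=0.001` above and the Keras default moment coefficients `beta_1=0.9` and `beta_2=0.999`, with `epsilon=1e-7`. It is not the Keras source: as far as we know, Keras folds the bias correction into the learning rate, so epsilon enters slightly differently. `adam_step` is an illustrative name.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-7):
    """One Adam update. t is the 1-based step count. Returns (param, m, v)."""
    m = beta_1 * m + (1 - beta_1) * grad       # moving average of gradients
    v = beta_2 * v + (1 - beta_2) * grad ** 2  # moving average of squared gradients
    m_hat = m / (1 - beta_1 ** t)              # bias correction for the zero init
    v_hat = v / (1 - beta_2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


p, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
p, m, v = adam_step(p, np.array([2.0]), m, v, t=1)
print(p)  # the first step moves the weight by about lr, whatever the gradient's size
```

The first-step behavior illustrates why Adam is fairly insensitive to the scale of the loss. The step size is governed mainly by `learning_rate`.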

In [40]:
# Train for at least 10,000 epochs to get a good fit; 20,000 are used here
nn.fit(X_train, y_train, epochs=20_000)

<keras.src.callbacks.History at 0x3bd9261d0>
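`fit` returns the `History` object shown above, and its `history` dict holds the per-epoch training loss under the key `"loss"`. Capturing it, for example with `history = nn.fit(...)`, makes it possible to check whether 20,000 epochs actually brought the loss to a plateau. The helper below is a hypothetical sketch that compares the mean loss over the last `window` epochs with the preceding window. The synthetic loss curve is a made-up stand-in for `history.history["loss"]`.

```python
import numpy as np


def relative_improvement(losses, window=1000):
    """Fractional drop in mean loss from the previous window to the last window."""
    losses = np.asarray(losses, dtype=float)
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window epochs of loss history")
    prev = losses[-2 * window:-window].mean()
    last = losses[-window:].mean()
    return (prev - last) / prev


# Synthetic stand-in for history.history["loss"]: exponential decay to a floor of 1.0
epochs = np.arange(20_000)
fake_losses = 1.0 + 100.0 * np.exp(-epochs / 1_000)
print(f"{relative_improvement(fake_losses):.2e}")  # tiny value: the loss has plateaued
```

A value that is still large at the end of training would suggest more epochs (or a different learning rate) could help.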

In [36]:
# Evaluate on the training data (no hold-out set was created above)
nn.evaluate(X_train, y_train, verbose=2)

89/89 - 0s - loss: 0.9868 - mean_squared_error: 0.9868 - 48ms/epoch - 544us/step


[0.9868072867393494, 0.9868072867393494]
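The `89/89` in the evaluation log counts batches, not samples. `fit` and `evaluate` use `batch_size=32` by default, and the notebook does not override it. The log therefore implies between 2,817 and 2,848 training rows, and each of the 20,000 epochs performs 89 gradient updates. The arithmetic is below. The function names are illustrative.

```python
import math


def steps_per_epoch(n_samples: int, batch_size: int = 32) -> int:
    """Number of batches Keras runs per epoch (the last batch may be partial)."""
    return math.ceil(n_samples / batch_size)


def sample_range_for_steps(steps: int, batch_size: int = 32) -> tuple:
    """Smallest and largest sample counts that produce exactly `steps` batches."""
    return (steps - 1) * batch_size + 1, steps * batch_size


print(sample_range_for_steps(89))  # (2817, 2848)
print(89 * 20_000)                 # 1780000 gradient updates over the full run
```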

In [37]:
# Compare the model's predictions with the actual NPV values
# in the dataset
predictions = nn.predict(X)
# add predictions to original df
df["NPV_pred"] = predictions[:, 0]
df["NPV_diff"] = df["NPV"] - df["NPV_pred"]
print('The maximum difference between the actual NPV and the predicted NPV is: {}'.format(df["NPV_diff"].max()))
print('The minimum difference between the actual NPV and the predicted NPV is: {}'.format(df["NPV_diff"].min()))
print('The mean absolute difference between the actual NPV and the predicted NPV is: {}'.format(np.absolute(df["NPV_diff"]).mean()))
print('The S.D. in the predicted value is: {}'.format(df["NPV_pred"].std()))

The maximum difference between the actual NPV and the predicted NPV is: 6.4286822802606025
The minimum difference between the actual NPV and the predicted NPV is: -7.19815594749344
The mean absolute difference between the actual NPV and the predicted NPV is: 0.6546459950993563
The S.D. in the predicted value is: 64.36851501464844
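The printed statistics can be complemented by scale-aware metrics. As a rough reading of the numbers above, a mean absolute error of about 0.65 against a predicted-value standard deviation of about 64 is roughly 1% of the spread. The sketch below computes MAE, RMSE, and R² from actual and predicted arrays. `summarize_fit` is a hypothetical helper. With the notebook's data it would be called as `summarize_fit(df["NPV"].values, df["NPV_pred"].values)`.

```python
import numpy as np


def summarize_fit(y_true, y_pred) -> dict:
    """Error summary for a regression fit: MAE, RMSE, and R^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred  # same sign convention as NPV_diff
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "mae": float(np.mean(np.abs(resid))),
        "rmse": float(np.sqrt(np.mean(resid ** 2))),
        "r2": float(1.0 - ss_res / ss_tot),
    }


print(summarize_fit([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```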


In [None]:
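# Predict NPV for a single input row. Values follow the column order of X:
# age_effect, initial_effect, final_effect, mort_effect, prod_effect,
# fert_effect, discount_rate
y_pred = nn.predict(np.array([40, 10, 10, 0.5, 0.0, 0.0, 0.02]).reshape(1, num_input))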
print(y_pred)
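The network only sees a positional array, so the input row must follow the column order of `X`. One way to make that explicit is a hypothetical helper, `make_input_row`, that builds the `(1, 7)` array from a dict keyed by feature name. The feature list is copied from the cell that defines `X`, and the values match the example above.

```python
import numpy as np

FEATURES = ["age_effect", "initial_effect", "final_effect", "mort_effect",
            "prod_effect", "fert_effect", "discount_rate"]


def make_input_row(values: dict) -> np.ndarray:
    """Return a (1, n_features) float array in FEATURES order; reject missing/unknown keys."""
    missing = set(FEATURES) - set(values)
    extra = set(values) - set(FEATURES)
    if missing or extra:
        raise KeyError(f"missing={sorted(missing)}, unexpected={sorted(extra)}")
    return np.array([[values[name] for name in FEATURES]], dtype=float)


row = make_input_row({
    "age_effect": 40, "initial_effect": 10, "final_effect": 10, "mort_effect": 0.5,
    "prod_effect": 0.0, "fert_effect": 0.0, "discount_rate": 0.02,
})
print(row.shape)  # (1, 7)
# y_pred = nn.predict(row)  # with the trained model from this notebook
```

The same ordering has to be reproduced on the webpage when the converted model is called.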

In [29]:
# Save the full model (architecture and weights), not just the weights
nn.save("SL_model_full.h5", save_format="h5")


In [30]:
!tensorflowjs_converter --input_format=keras SL_model_full.h5 ./tf_model/
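The converter writes a `model.json` file into `./tf_model/`, containing the model topology and a manifest of weight files, alongside one or more binary weight shards. The webpage loads `model.json` and fetches the shards from the same location. Before deploying, it can help to confirm that every shard listed in the manifest exists. The sketch below assumes the layers-model JSON layout, with a top-level `weightsManifest` list whose entries carry a `paths` list. It builds a tiny fake directory so it runs on its own. With the real output, you would call `missing_shards("./tf_model")`. `missing_shards` is a hypothetical helper.

```python
import json
import os
import tempfile


def missing_shards(model_dir: str) -> list:
    """List weight files named in model.json's weightsManifest that are absent from model_dir."""
    with open(os.path.join(model_dir, "model.json")) as f:
        manifest = json.load(f)["weightsManifest"]
    paths = [p for group in manifest for p in group["paths"]]
    return [p for p in paths if not os.path.exists(os.path.join(model_dir, p))]


# Self-contained demo with a fake, minimal model.json (not real converter output)
with tempfile.TemporaryDirectory() as d:
    fake = {"weightsManifest": [{"paths": ["group1-shard1of1.bin"], "weights": []}]}
    with open(os.path.join(d, "model.json"), "w") as f:
        json.dump(fake, f)
    print(missing_shards(d))  # ['group1-shard1of1.bin'] (shard not written yet)
    open(os.path.join(d, "group1-shard1of1.bin"), "wb").close()
    print(missing_shards(d))  # []
```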