## Feature Engineering with Keras Lambda Layers for a Complete Training Pipeline

Structured data problems often end up using several libraries for preprocessing and feature engineering. A full ML training pipeline might use Pandas to read the data and engineer features, and sklearn to encode them, for example with one-hot encoding and normalization. The estimator might be an sklearn classifier, an xgboost model, or a Keras model. In the last case we end up with one set of artifacts for feature engineering and encoding and a separate set for the saved model. The pipeline is also disconnected: an extra step is needed to feed the encoded data to the Keras model, for example by converting a dataframe to a tf.data.Dataset or a NumPy array.

In this post we will implement a training pipeline natively in Keras/TensorFlow, starting from loading the data with tf.data. As the title suggests, we will use Lambda layers for feature engineering. These engineered features are stateless. For stateful preprocessing we could use something like Keras preprocessing layers. We end up with a training pipeline in which feature engineering is part of the network architecture, so the model can be persisted and loaded for inference as a standalone artifact.

Keep in mind that tf.keras.layers.Lambda layers have (de)serialization limitations: they are saved by serializing the Python bytecode of the wrapped function, which is not portable across environments.
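One way around this limitation, sketched here as an option and not something the rest of this post uses, is to write the same stateless transformation as a small `tf.keras.layers.Layer` subclass. The class name `Ratio` and the sample values are hypothetical.

```python
import tensorflow as tf


class Ratio(tf.keras.layers.Layer):
    """Hypothetical stateless layer: element-wise ratio of two inputs."""

    def call(self, inputs):
        numerator, denominator = inputs
        return numerator / denominator


ratio_layer = Ratio(name="trest_chol_ratio")
result = ratio_layer((tf.constant([[120.0]]), tf.constant([[240.0]])))

# The config holds plain values such as the layer name, not Python bytecode.
config = ratio_layer.get_config()
restored = Ratio.from_config(config)
print(float(result[0][0]), restored.name)
```

The trade-off is that a saved model containing such a class usually needs the class to be available again at load time, for example through the `custom_objects` argument of `tf.keras.models.load_model`.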

Steps we will follow:
- Load data with tf.data
- Create Input layer
- Create feature layer using Lambda layers
- Train model

### Example

For the example below we will use the heart disease dataset. Let's import TensorFlow and read in the data:

In [1]:
import tensorflow as tf
  
binary_features = ['sex', 'fbs', 'exang']
numeric_features =  ['trestbps', 'chol', 'thalach', 'oldpeak', 'slope', 'cp', 'restecg', 'ca']
categoric_features = ['thal']

dtype_mapper = {
        'age': tf.float32,
        'sex': tf.float32,
        'cp': tf.float32,
        'trestbps': tf.float32,
        'chol': tf.float32,
        'fbs': tf.float32,
        'restecg': tf.float32,
        'thalach': tf.float32,
        'exang': tf.float32,
        'oldpeak': tf.float32,
        'slope': tf.float32,
        'ca': tf.float32,
        'thal': tf.string
}

heart_dir = tf.keras.utils.get_file("heart.csv", origin="http://storage.googleapis.com/download.tensorflow.org/data/heart.csv")

dataset = tf.data.experimental.make_csv_dataset(
      heart_dir,
      batch_size=64,
      label_name='target',
      num_epochs=10
)
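To get a feel for what `make_csv_dataset` yields, the sketch below builds a dataset from a tiny made-up CSV written to a temporary file, so it runs without downloading anything. The file name, the three columns, and the four rows are assumptions chosen for illustration and are not the real heart dataset.

```python
import os
import tempfile

import tensorflow as tf

# Made-up rows that only mimic the shape of the real data.
csv_text = (
    "age,thal,target\n"
    "63,fixed,0\n"
    "67,normal,1\n"
    "41,reversible,0\n"
    "56,normal,1\n"
)
toy_path = os.path.join(tempfile.mkdtemp(), "toy_heart.csv")
with open(toy_path, "w") as f:
    f.write(csv_text)

toy_dataset = tf.data.experimental.make_csv_dataset(
    toy_path,
    batch_size=2,
    label_name="target",
    num_epochs=1,
    shuffle=False,  # keep row order so the first batch is predictable
)

# Each element is a (features, labels) pair; features is a dict keyed by column.
toy_features, toy_labels = next(iter(toy_dataset))
for name, tensor in toy_features.items():
    print(name, tensor.dtype.name, tensor.shape)
```

Because the batches are dictionaries keyed by column name, Keras can match each column to an Input with the same name.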



#### Create a dictionary of Input objects for each feature:


In [2]:
def create_inputs(data_type_mapper):
    """Create model inputs
    Args:
        data_type_mapper (dict): Dictionary with feature as key and dtype as value
                                 For example {'age': tf.float32, ...}
    Returns:
        (dict): Keras inputs for each feature
    """
    return {feature: tf.keras.Input(shape=(1,), name=feature, dtype=dtype)
            for feature, dtype in data_type_mapper.items()}

feature_layer_inputs = create_inputs(dtype_mapper)

#### We will be using Lambda layers in this example for feature engineering. Below are the functions that we will use to create our engineered features.

In [4]:
# Define functions for engineered features

def square(x):
  """compute the square of a feature"""
  return x ** 2

def ratio(x):
  """compute the ratio between two numeric features"""
  return x[0] / x[1]

def cross_feature(x):
  """compute the crossing of two features"""
  return tf.cast(x[0] * x[1], dtype = tf.float32)

def age_and_gender(x):
  """check if age gt 50 and if gender is male"""
  return tf.cast(
    tf.math.logical_and(x[0] > 50, x[1] == 1), dtype = tf.float32
  )

def is_fixed(x):
  """encode categoric feature if value is equal to fixed"""
  return tf.cast(x == 'fixed', dtype = tf.float32)

def is_reversible(x):
  """encode categoric feature if value is equal to reversible"""
  return tf.cast(x == 'reversible', dtype = tf.float32)

def is_normal(x):
  """encode categoric feature if value is equal to normal"""
  return tf.cast(x == 'normal', dtype = tf.float32)
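Since these are plain functions, they can be checked eagerly on small tensors before being wrapped in Lambda layers. The sketch below repeats three of them so that it runs on its own; the input values are made up for illustration.

```python
import tensorflow as tf


def ratio(x):
    """compute the ratio between two numeric features"""
    return x[0] / x[1]


def age_and_gender(x):
    """check if age gt 50 and if gender is male"""
    return tf.cast(
        tf.math.logical_and(x[0] > 50, x[1] == 1), dtype=tf.float32
    )


def is_fixed(x):
    """encode categoric feature if value is equal to fixed"""
    return tf.cast(x == 'fixed', dtype=tf.float32)


# Two made-up rows, shaped (batch, 1) like the model inputs.
age = tf.constant([[63.0], [45.0]])
sex = tf.constant([[1.0], [1.0]])
trestbps = tf.constant([[120.0], [130.0]])
chol = tf.constant([[240.0], [260.0]])
thal = tf.constant([["fixed"], ["normal"]])

ratio_values = ratio((trestbps, chol)).numpy()
age_gender_values = age_and_gender((age, sex)).numpy()
fixed_values = is_fixed(thal).numpy()
print(ratio_values, age_gender_values, fixed_values)
```

Each function returns a tensor with the same `(batch, 1)` shape as its inputs, which is what allows the engineered features to be concatenated later.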

#### Now that we have our functions, let's create the features as Lambda layers:

In [5]:
# The features based on thal are similar to one-hot encoding.
# Here we only illustrate using Lambda layers

is_fixed = tf.keras.layers.Lambda(is_fixed)(
   feature_layer_inputs['thal']
)

is_normal = tf.keras.layers.Lambda(is_normal)(
   feature_layer_inputs['thal']
)

is_reversible = tf.keras.layers.Lambda(is_reversible)(
   feature_layer_inputs['thal']
)

age_and_gender = tf.keras.layers.Lambda(age_and_gender)(
    (feature_layer_inputs['age'], feature_layer_inputs['sex'])
)

age = tf.keras.layers.Lambda(lambda x: tf.cast(x > 50, dtype = tf.float32))(
    feature_layer_inputs['age']
)

trest_chol_ratio = tf.keras.layers.Lambda(ratio, name='trest_chol_ratio')(
   (feature_layer_inputs['trestbps'], feature_layer_inputs['chol'])
)

trest_cross_thalach = tf.keras.layers.Lambda(cross_feature)(
   (feature_layer_inputs['trestbps'], feature_layer_inputs['thalach'])
)

#### All our engineered feature layers are created and we can now combine them with our other features

In [6]:
# concat all newly created features into one layer
lambda_feature_layer = tf.keras.layers.concatenate(
    [is_fixed, is_normal, is_reversible, 
     age, age_and_gender, trest_chol_ratio, trest_cross_thalach]
)

numeric_feature_layer = tf.keras.layers.concatenate(
    [feature_layer_inputs[feature] for feature in numeric_features]
)

binary_feature_layer = tf.keras.layers.concatenate(
    [feature_layer_inputs[feature] for feature in binary_features]
)

# Add the rest of features
feature_layer = tf.keras.layers.concatenate(
    [lambda_feature_layer, numeric_feature_layer, binary_feature_layer]
)
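To see what such a combined feature layer produces, one option is to wrap a reduced version of it in a model and call that model on a single row. The sketch below uses only two numeric inputs and one Lambda feature; the name `feature_model` and the sample row are assumptions for illustration.

```python
import tensorflow as tf


def ratio(x):
    """compute the ratio between two numeric features"""
    return x[0] / x[1]


# Reduced set of inputs, created the same way as in create_inputs.
inputs = {
    name: tf.keras.Input(shape=(1,), name=name, dtype="float32")
    for name in ["trestbps", "chol"]
}

ratio_feature = tf.keras.layers.Lambda(ratio, name="trest_chol_ratio")(
    (inputs["trestbps"], inputs["chol"])
)

# Engineered feature first, then the raw inputs.
combined = tf.keras.layers.concatenate(
    [ratio_feature, inputs["trestbps"], inputs["chol"]]
)

# A model that stops at the feature layer, useful for inspection only.
feature_model = tf.keras.Model(inputs=inputs, outputs=combined)

row = {"trestbps": tf.constant([[120.0]]), "chol": tf.constant([[240.0]])}
engineered = feature_model(row)
print(engineered.numpy())
```

The output has one column per concatenated feature, so the full feature layer in this post would have one column for each engineered, numeric, and binary feature.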

#### Our last step is to create and fit our Keras model. For this example we will use a simple model architecture. We will persist the model and load it for inference.

In [8]:
# set up the model; this is essentially logistic regression
x = tf.keras.layers.BatchNormalization()(feature_layer)
output = tf.keras.layers.Dense(1, activation='sigmoid')(x)
model = tf.keras.Model(inputs=feature_layer_inputs, outputs=output)
model.compile(
  loss=tf.keras.losses.BinaryCrossentropy(),
  optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
  metrics=[tf.keras.metrics.BinaryAccuracy(name='accuracy'), 
           tf.keras.metrics.AUC(name='auc')]
)

model.fit(dataset, epochs=10)

# save model
tf.keras.models.save_model(model, "lambda_layered_model")

# load model for inference
loaded_model = tf.keras.models.load_model("lambda_layered_model")
loaded_model.evaluate(dataset)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10




INFO:tensorflow:Assets written to: lambda_layered_model/assets


[0.2968518137931824, 0.867986798286438, 0.9341182708740234]

#### To conclude, we built a model using Keras Lambda layers, saved it, and loaded it for inference. The feature engineering is part of the saved model (its architecture), and everything is done natively in Keras.