# Custom metrics with `Keras`
In this notebook, we implement a `CustomPrecision` metric by subclassing the base `Metric` class in `keras`.

`keras` has a `keras.metrics.Precision` class, which we will use to check our results.


In [1]:
import keras
from keras.metrics import Metric
import tensorflow as tf
import numpy as np


## Define custom metric classes

In [2]:
# Quick check: a state variable created by the metric accumulates across assign_add calls
m = Metric()

l = m.add_weight(initializer="zeros")
l.assign_add(1)
l.assign_add(5)


<tf.Tensor: shape=(), dtype=float32, numpy=6.0>

For custom metrics, the key method is `update_state`, which expects `y_true` and `y_pred` as its inputs.

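It's important to verify that these two inputs have compatible shapes. If they do not (for example, if `y_true` has shape `(batch,)` while `y_pred` has shape `(batch, 1)`), element-wise comparisons between them broadcast to a `(batch, batch)` tensor and the metric silently counts the wrong thing.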
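To illustrate why the shapes matter, here is a small NumPy sketch. NumPy follows the same broadcasting rules as TensorFlow in this case, and the array values are made up for illustration. A `(4,)` label vector compared against a `(4, 1)` prediction column broadcasts to a `(4, 4)` matrix, which inflates the true-positive count:

```python
import numpy as np

# Made-up labels and predictions, for illustration only
y_true = np.array([1, 0, 1, 1])          # shape (4,), e.g. raw labels
y_pred = np.array([[1], [1], [0], [1]])  # shape (4, 1), e.g. output of Dense(1)

# Mismatched shapes: the comparison broadcasts to (4, 4)
bad_mask = (y_pred == 1) & (y_true == 1)
print(bad_mask.shape)  # (4, 4)
print(bad_mask.sum())  # 9

# Matching shapes: reshape the labels to a column first
good_mask = (y_pred == 1) & (y_true.reshape(-1, 1) == 1)
print(good_mask.shape)  # (4, 1)
print(good_mask.sum())  # 2
```

The mismatched version reports 9 "true positives" from only 4 samples, and no error is raised.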

In [3]:
# from keras.src.metrics.metrics_utils import squeeze_or_expand_to_same_rank

class CustomPrecision(keras.metrics.Metric):
    def __init__(self, name="custom_precision", **kwargs):
        super().__init__(name=name, **kwargs)
        self.pred_pos = self.add_variable(shape=(), initializer="zeros")
        self.tp = self.add_variable(shape=(), initializer="zeros")
    
    def update_state(self, y_true, y_pred, sample_weight=None):
        # Print the shape to debug if necessary
        # print("y_pred shape", y_pred.shape)
        # print("y_true shape", y_true.shape)
        # y_pred, y_true = squeeze_or_expand_to_same_rank(y_pred, y_true)

        # Threshold probabilities at 0.5 to get binary predictions
        # (raw logits would need a threshold of 0 instead)
        y_pred = tf.cast(y_pred > 0.5, tf.int32)

        # Update pred_pos and tp, assuming positive = 1 and negative = 0.
        # Note: sample_weight is ignored in this implementation.
        self.pred_pos.assign_add(tf.reduce_sum(y_pred))

        # Cast to int because tf.reduce_sum does not accept boolean tensors
        true_pos = tf.cast((y_pred == 1) & (y_true == 1), tf.int32)
        self.tp.assign_add(tf.reduce_sum(true_pos))
    
    def result(self):
        return tf.cast(
            tf.math.divide_no_nan(self.tp, self.pred_pos), tf.float32)

    # No reset_state override is needed: the base class resets every
    # variable created with add_variable to zero.

### Test custom precision with simple input

In [4]:
vals = [0, 1]
seed = 17
rng = np.random.default_rng(seed=seed)
test_true = rng.choice(vals, size=5000)
test_pred = rng.choice(vals, size=5000)

In [5]:
cp = CustomPrecision(name="test")
cp.update_state(test_true, test_pred)
cp.result()

<tf.Tensor: shape=(), dtype=float32, numpy=0.5076121687889099>

In [6]:
rp = keras.metrics.Precision(name="real_precision")
rp.update_state(test_true, test_pred)
rp.result()

<tf.Tensor: shape=(), dtype=float32, numpy=0.5076121687889099>

In [7]:
test_true2 = rng.choice(vals, size=5000)
test_pred2 = rng.choice(vals, size=5000)


In [8]:
cp.update_state(test_true2, test_pred2)
rp.update_state(test_true2, test_pred2)

In [9]:
cp.result()

<tf.Tensor: shape=(), dtype=float32, numpy=0.49677419662475586>

In [10]:
rp.result()

<tf.Tensor: shape=(), dtype=float32, numpy=0.49677419662475586>
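
The second call to `update_state` does not replace the first result. Both metrics keep running totals, so `result()` reports precision over all 10,000 samples seen so far. A plain NumPy sketch of that accumulation follows; the class name `StreamingPrecision` and the small arrays are hypothetical and are not part of `keras`.

```python
import numpy as np

class StreamingPrecision:
    """Hypothetical NumPy analogue of the accumulate-then-divide pattern."""

    def __init__(self):
        self.tp = 0        # running count of true positives
        self.pred_pos = 0  # running count of predicted positives

    def update_state(self, y_true, y_pred):
        y_true = np.asarray(y_true)
        y_pred = np.asarray(y_pred)
        self.pred_pos += int(np.sum(y_pred == 1))
        self.tp += int(np.sum((y_pred == 1) & (y_true == 1)))

    def result(self):
        # Return 0.0 when nothing has been predicted positive yet
        return self.tp / self.pred_pos if self.pred_pos else 0.0

sp = StreamingPrecision()
sp.update_state([1, 0, 1, 1], [1, 1, 0, 1])  # batch 1: tp=2, pred_pos=3
sp.update_state([0, 0], [1, 0])              # batch 2: tp=0, pred_pos=1
print(sp.result())  # 0.5, i.e. (2 + 0) / (3 + 1)
```

The pooled value (0.5) differs from the mean of the per-batch precisions (2/3 and 0), which is why the counts are stored as state rather than the ratio.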

## Test with simple classification model

We will use the MNIST dataset. To turn it into a binary classification problem, we will classify whether a handwritten digit is greater than or equal to 5.

In [11]:
# Load data
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()

# Relabel digits: 1 if the digit is greater than or equal to 5, else 0
y_train = np.where(y_train >= 5, 1, 0)
y_test = np.where(y_test >= 5, 1, 0)

# Scale pixel values to [0, 1] (this also converts them to float)
X_train = X_train / 255.
X_test = X_test / 255.

# Reshape X input
X_train = X_train.reshape(-1, 28*28)
X_test = X_test.reshape(-1, 28*28)

# Reshape y to (n, 1). This is an important step: it makes y_true match the
# (n, 1) output of the model, so the custom metric compares compatible shapes.
y_train = y_train.reshape(-1,1)
y_test = y_test.reshape(-1,1)


In [12]:
nn = keras.models.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid")
])


In [13]:
nn.compile(
    loss="binary_crossentropy",
    optimizer="sgd",
    metrics=[CustomPrecision(), keras.metrics.Precision()])

I find that setting `verbose=2` gives more reliable training metrics than `verbose="auto"`. In particular, the training metrics printed on screen are then consistent with those saved in `history`.

In [14]:
history = nn.fit(
    X_train, y_train,
    epochs=5,
    validation_split=0.2,
    verbose=2)

Epoch 1/5
1500/1500 - 3s - 2ms/step - custom_precision: 0.8390 - loss: 0.3545 - precision: 0.8390 - val_custom_precision: 0.9254 - val_loss: 0.2022 - val_precision: 0.9254
Epoch 2/5
1500/1500 - 3s - 2ms/step - custom_precision: 0.9375 - loss: 0.1683 - precision: 0.9375 - val_custom_precision: 0.9565 - val_loss: 0.1291 - val_precision: 0.9565
Epoch 3/5
1500/1500 - 3s - 2ms/step - custom_precision: 0.9563 - loss: 0.1212 - precision: 0.9563 - val_custom_precision: 0.9691 - val_loss: 0.1055 - val_precision: 0.9691
Epoch 4/5
1500/1500 - 2s - 2ms/step - custom_precision: 0.9641 - loss: 0.1000 - precision: 0.9641 - val_custom_precision: 0.9739 - val_loss: 0.0980 - val_precision: 0.9739
Epoch 5/5
1500/1500 - 3s - 2ms/step - custom_precision: 0.9686 - loss: 0.0859 - precision: 0.9686 - val_custom_precision: 0.9736 - val_loss: 0.0865 - val_precision: 0.9736


We see that our `custom_precision` metric matches the built-in `precision` metric in `keras` in every epoch, on both the training and validation sets, to the four decimal places shown.
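
Rather than comparing the log by eye, the two columns can be compared programmatically. In the sketch below, `logged` is a stand-in for `history.history`, filled with the rounded values printed in the training log so that the snippet runs on its own. In the notebook you would pass `history.history` instead. The helper name `metrics_agree` is hypothetical.

```python
import numpy as np

def metrics_agree(hist, key_a, key_b, atol=1e-6):
    """Return True if two logged metrics match in every epoch.

    Also compares the "val_"-prefixed entries when both are present.
    """
    pairs = [(key_a, key_b), ("val_" + key_a, "val_" + key_b)]
    return all(
        np.allclose(hist[a], hist[b], atol=atol)
        for a, b in pairs
        if a in hist and b in hist
    )

# Stand-in for history.history, using the rounded values from the log
logged = {
    "custom_precision": [0.8390, 0.9375, 0.9563, 0.9641, 0.9686],
    "precision": [0.8390, 0.9375, 0.9563, 0.9641, 0.9686],
    "val_custom_precision": [0.9254, 0.9565, 0.9691, 0.9739, 0.9736],
    "val_precision": [0.9254, 0.9565, 0.9691, 0.9739, 0.9736],
}

print(metrics_agree(logged, "custom_precision", "precision"))  # True
```

With the real `history.history`, this compares the unrounded values, which is a stricter check than reading four decimal places from the log.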