##### Copyright 2018 The TensorFlow Constrained Optimization Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

> http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

## Problem Setup

This is a simple example of recall-constrained optimization on simulated data: we seek a classifier that minimizes the average hinge loss while constraining recall to be at least 90%.
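Written out, the problem looks roughly as follows. The symbols are notation introduced here for illustration only: $f(x) = \langle w, x \rangle + b$ is the linear model, $y_i \in \{0, 1\}$ are the labels, and $\tilde{y}_i = 2 y_i - 1$ are the corresponding signed labels.

```latex
\min_{w,\, b} \;\; \frac{1}{n} \sum_{i=1}^{n} \max\bigl\{ 0,\; 1 - \tilde{y}_i \, f(x_i) \bigr\}
\qquad \text{subject to} \qquad
\frac{\sum_{i \,:\, y_i = 1} \mathbf{1}\bigl[ f(x_i) > 0 \bigr]}{\bigl|\{ i : y_i = 1 \}\bigr|} \;\ge\; 0.9
```

The objective is the average hinge loss, and the left-hand side of the constraint is the recall: the fraction of positively-labeled examples that receive a positive prediction.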

We'll start with the required imports. Notice the definition of `tfco`:

In [None]:
import math
import numpy as np
import tensorflow as tf

# Use the GitHub version of TFCO
!pip install git+https://github.com/google-research/tensorflow_constrained_optimization
import tensorflow_constrained_optimization as tfco

We'll next create a simple simulated dataset by sampling 1000 random 10-dimensional feature vectors from a Gaussian, finding their labels using a random "ground truth" linear model, and then adding noise by randomly flipping 200 labels.

In [None]:
# Create a simulated 10-dimensional training dataset consisting of 1000 labeled
# examples, of which 800 are labeled correctly and 200 are mislabeled.
num_examples = 1000
num_mislabeled_examples = 200
dimension = 10
# We will constrain the recall to be at least 90%.
recall_lower_bound = 0.9

# Create random "ground truth" parameters for a linear model.
ground_truth_weights = np.random.normal(size=dimension) / math.sqrt(dimension)
ground_truth_threshold = 0

# Generate a random set of features for each example.
features = np.random.normal(size=(num_examples, dimension)).astype(
    np.float32) / math.sqrt(dimension)
# Compute the labels from these features given the ground truth linear model.
labels = (np.matmul(features, ground_truth_weights) >
          ground_truth_threshold).astype(np.float32)
# Add noise by randomly flipping num_mislabeled_examples labels.
mislabeled_indices = np.random.choice(
    num_examples, num_mislabeled_examples, replace=False)
labels[mislabeled_indices] = 1 - labels[mislabeled_indices]

# Constant Tensors containing the labels and features.
constant_labels = tf.constant(labels, dtype=tf.float32)
constant_features = tf.constant(features, dtype=tf.float32)
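As a quick sanity check on the label-flipping step, the sketch below repeats the same recipe with NumPy alone and counts how many labels differ from the noise-free ones. The fixed seed and the names `rng`, `clean_labels`, and `noisy_labels` are assumptions made for this illustration and do not appear in the cell above.

```python
import math
import numpy as np

# Fixed seed (an assumption) so that the check is repeatable.
rng = np.random.default_rng(0)
num_examples, num_mislabeled_examples, dimension = 1000, 200, 10

# Same recipe as above: random linear "ground truth" model, Gaussian features.
weights = rng.normal(size=dimension) / math.sqrt(dimension)
features = rng.normal(size=(num_examples, dimension)).astype(
    np.float32) / math.sqrt(dimension)
clean_labels = (features @ weights > 0).astype(np.float32)

# Flip a fixed number of distinct labels (replace=False avoids double flips).
noisy_labels = clean_labels.copy()
flipped = rng.choice(num_examples, num_mislabeled_examples, replace=False)
noisy_labels[flipped] = 1 - noisy_labels[flipped]

num_disagreements = int(np.sum(noisy_labels != clean_labels))
print("Labels that differ from the ground truth:", num_disagreements)
```

Because the flipped indices are sampled without replacement, each one is flipped exactly once, so the number of disagreements equals `num_mislabeled_examples`.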

We're almost ready to construct and train our model, but first we'll create a couple of functions to measure performance. We're interested in two quantities: the average hinge loss (which we seek to minimize), and the recall (which we constrain).

In [None]:
def average_hinge_loss(labels, predictions):
  # Recall that the labels are binary (0 or 1).
  signed_labels = (labels * 2) - 1
  return np.mean(np.maximum(0.0, 1.0 - signed_labels * predictions))

def recall(labels, predictions):
  # Recall that the labels are binary (0 or 1).
  positive_count = np.sum(labels)
  true_positives = labels * (predictions > 0)
  true_positive_count = np.sum(true_positives)
  return true_positive_count / positive_count
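To see what these two functions compute, here is a small worked example on four hand-picked points. The toy arrays are made up for illustration, and the two functions are repeated so that the snippet runs on its own.

```python
import numpy as np

def average_hinge_loss(labels, predictions):
  # Map binary labels (0 or 1) to signed labels (-1 or +1).
  signed_labels = (labels * 2) - 1
  return np.mean(np.maximum(0.0, 1.0 - signed_labels * predictions))

def recall(labels, predictions):
  # Fraction of positively-labeled examples with a positive prediction.
  positive_count = np.sum(labels)
  true_positive_count = np.sum(labels * (predictions > 0))
  return true_positive_count / positive_count

toy_labels = np.array([1.0, 1.0, 0.0, 0.0])
toy_predictions = np.array([2.0, -0.5, -3.0, 0.5])

# Per-example hinge losses are [0, 1.5, 0, 1.5], so the average is 0.75.
toy_loss = average_hinge_loss(toy_labels, toy_predictions)
# Only the first of the two positive examples is predicted positive: 1 / 2.
toy_recall = recall(toy_labels, toy_predictions)
print(toy_loss, toy_recall)
```

Note that the fourth example (a negative with a positive prediction) contributes to the hinge loss but has no effect on the recall, which only looks at positively-labeled examples.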

## Constructing and Optimizing the Model

The first step is to create the [KerasPlaceholder](https://github.com/google-research/tensorflow_constrained_optimization/tree/master/tensorflow_constrained_optimization/python/rates/keras.py)s that we'll need. Even in eager mode, these objects act similarly to graph-mode placeholders, in that they initially contain no values, but will be filled in later (when the Keras loss function is called).

They're parameterized by a function that takes the same parameters as a Keras loss function (the labels and the predictions, in that order), and returns the Tensor that the placeholder should represent. In this case, tfco_predictions returns the predictions themselves, and tfco_labels returns the labels themselves, but in more complex settings, one might need to extract multiple different quantities (e.g. protected class information, the predictions of a baseline model, etc.) from the labels.

In [None]:
tfco_predictions = tfco.KerasPlaceholder(lambda _, y_pred: y_pred)
tfco_labels = tfco.KerasPlaceholder(lambda y_true, _: y_true)
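To make the "more complex settings" concrete, here is a minimal NumPy-only sketch of extractor functions that pull several quantities out of a single labels array. The column layout (column 0 holds the label, column 1 a group-membership flag) and the function names are hypothetical. The functions only share the `(y_true, y_pred)` signature of the lambdas above, and whether such a packing suits a particular Keras pipeline is not verified here.

```python
import numpy as np

# Hypothetical packing: column 0 = binary label, column 1 = group flag.
packed_y_true = np.array([[1.0, 0.0],
                          [0.0, 1.0],
                          [1.0, 1.0]], dtype=np.float32)
y_pred = np.array([[0.3], [-1.2], [2.0]], dtype=np.float32)

def extract_predictions(y_true, y_pred):
  # Ignores the labels and returns the predictions unchanged.
  return y_pred

def extract_labels(y_true, y_pred):
  # Picks the label column out of the packed array.
  return y_true[:, 0]

def extract_group(y_true, y_pred):
  # Picks the group-membership column out of the packed array.
  return y_true[:, 1]

extracted_labels = extract_labels(packed_y_true, y_pred)
extracted_group = extract_group(packed_y_true, y_pred)
print(extracted_labels, extracted_group)
```

Each extractor receives the same two arguments and returns one quantity, which is the shape of function that a placeholder is parameterized by.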

The main motivation of TFCO is to make it easy to create and optimize constrained problems written in terms of linear combinations of *rates*, where a "rate" is the proportion of training examples on which an event occurs (e.g. the false positive rate, which is the number of negatively-labeled examples on which the model makes a positive prediction, divided by the number of negatively-labeled examples). Our current example (minimizing a hinge relaxation of the error rate subject to a recall constraint) is such a problem.
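As a concrete instance of a rate, the snippet below computes the false positive rate described above directly with NumPy. The six data points are made up for illustration, and this is plain counting, not TFCO's own implementation.

```python
import numpy as np

example_labels = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0])
example_predictions = np.array([0.7, -0.2, 1.5, -1.0, 0.4, -0.3])

# The "event" is a positive prediction; the subset is the negative examples.
is_negative = example_labels == 0
predicted_positive = example_predictions > 0

negative_count = np.sum(is_negative)
false_positive_count = np.sum(is_negative & predicted_positive)
false_positive_rate = false_positive_count / negative_count

# Two of the four negatively-labeled examples are predicted positive.
print(false_positive_rate)
```

Recall is a rate in the same sense: the event is again a positive prediction, but the proportion is taken over the positively-labeled examples.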

Using the placeholders defined above, we are now able to define the problem to optimize. The [KerasLayer](https://github.com/google-research/tensorflow_constrained_optimization/tree/master/tensorflow_constrained_optimization/python/rates/keras.py) interface is similar to the [RateMinimizationProblem](https://github.com/google-research/tensorflow_constrained_optimization/tree/master/tensorflow_constrained_optimization/python/rates/rate_minimization_problem.py) interface, in that its two main parameters are the expression to minimize, and a list of constraints. Unlike a [RateMinimizationProblem](https://github.com/google-research/tensorflow_constrained_optimization/tree/master/tensorflow_constrained_optimization/python/rates/rate_minimization_problem.py), however, it also requires a list of all placeholders that are required by its inputs.

In [None]:
context = tfco.rate_context(predictions=tfco_predictions, labels=tfco_labels)
tfco_layer = tfco.KerasLayer(
    tfco.error_rate(context), [tfco.recall(context) >= recall_lower_bound],
    placeholders=[tfco_predictions, tfco_labels])

A [KerasLayer](https://github.com/google-research/tensorflow_constrained_optimization/tree/master/tensorflow_constrained_optimization/python/rates/keras.py) plays two roles.


1.   It defines the optimization problem, in terms of an objective and constraints. To this end, it also contains the loss function that should be passed to Keras' Model.compile() method.
2.   It also contains the internal state needed by TFCO. For this reason, it must be included somewhere in the Keras model. It doesn't matter *where* it's included, since from the perspective of the model, it's an identity function. However, it must be included *somewhere*, so that the internal TFCO state will be updated during optimization.

We now construct our model. As in [README.md](https://github.com/google-research/tensorflow_constrained_optimization/tree/master/README.md), we're using a linear model with a bias. Notice that we include tfco_layer in the Sequential model, which ensures that the TFCO internal state will be updated during optimization. We also pass tfco_layer.loss to the Model.compile() method, which causes us to optimize the correct constrained objective. The placeholders that we constructed earlier will be filled in when tfco_layer.loss() is called.

In [None]:
# You can put the tfco.KerasLayer anywhere in the sequence--its only purpose is
# to contain the slack variables, denominators, Lagrange multipliers, and loss.
# It's a NO-OP (more accurately, an identity function) as far as the model is
# concerned.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1, activation=None, input_shape=(dimension,)),
    tfco_layer
])

# Notice that we take the loss function from the tfco.KerasLayer, instead of
# using tf.keras.losses.Hinge(), as the unconstrained model below does.
model.compile(
    optimizer=tf.keras.optimizers.Adagrad(learning_rate=1.0),
    loss=tfco_layer.loss)
model.fit(constant_features, constant_labels, epochs=1000, verbose=0)

trained_predictions = np.ndarray.flatten(model.predict(features))
print("Constrained average hinge loss = %f" %
      average_hinge_loss(labels, trained_predictions))
print("Constrained recall = %f" % recall(labels, trained_predictions))

Constrained average hinge loss = 0.745828
Constrained recall = 0.902299


As we hoped, the recall is extremely close to 90%. Moreover, because the optimizer uses a (hinge) proxy constraint only when needed, and the actual (zero-one) constraint whenever possible, this is the *true* recall, not a hinge approximation.
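To illustrate why the distinction matters, the sketch below compares the zero-one recall with one possible hinge-based surrogate on a few made-up scores. The particular surrogate (one minus the hinge loss on the positive examples) is an assumption chosen for illustration, and it is not claimed to match what TFCO computes internally.

```python
import numpy as np

# Hypothetical model outputs on four positively-labeled examples.
predictions_on_positives = np.array([2.0, 0.4, 0.1, -0.5])

# True (zero-one) recall: fraction of positives with a positive prediction.
zero_one_recall = np.mean(predictions_on_positives > 0)

# Hinge-based surrogate: 1 - max(0, 1 - z) never exceeds the indicator
# 1[z > 0], so its average is a lower bound on the zero-one recall.
hinge_surrogate_recall = np.mean(
    1.0 - np.maximum(0.0, 1.0 - predictions_on_positives))

print(zero_one_recall, hinge_surrogate_recall)
```

Here the zero-one recall is 0.75 while the surrogate is only about 0.25, so a constraint imposed on the surrogate alone would be considerably more conservative than one on the quantity we actually care about.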

### Unconstrained Model

For comparison, let's try optimizing the same problem *without* the recall constraint:

In [None]:
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1, activation=None, input_shape=(dimension,))
])

model.compile(
    optimizer=tf.keras.optimizers.Adagrad(learning_rate=1.0),
    loss=tf.keras.losses.Hinge())
model.fit(constant_features, constant_labels, epochs=1000, verbose=0)

trained_predictions = np.ndarray.flatten(model.predict(features))
print("Unconstrained average hinge loss = %f" % average_hinge_loss(
    labels, trained_predictions))
print("Unconstrained recall = %f" % recall(labels, trained_predictions))

Unconstrained average hinge loss = 0.630985
Unconstrained recall = 0.781609


Because there is no constraint, the unconstrained model does a better job of minimizing the average hinge loss (about 0.63 versus 0.75), but its recall of roughly 78% falls well short of the 90% target.