##### Copyright 2021 The TensorFlow Constrained Optimization Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

> http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

## Problem Setup

In this colab, we'll show how to use the TF Constrained Optimization (TFCO) library with canned TF estimators. We demonstrate this on a simple recall-constrained optimization problem on simulated data: we seek a classifier that minimizes the error rate, approximated during training by the average hinge loss, while constraining recall to be at least 90%.
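
The `problem_fn` defined later in this colab uses the error rate as its objective. With that choice, and assuming (as notation not in the original) a linear score $f_\theta(x)$ thresholded at zero and labels $y_i \in \{0, 1\}$, the problem can be sketched as:

```latex
\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\!\left[ \mathbf{1}[f_\theta(x_i) > 0] \neq y_i \right]
\quad \text{subject to} \quad
\frac{\sum_{i : y_i = 1} \mathbf{1}[f_\theta(x_i) > 0]}{\left|\{ i : y_i = 1 \}\right|} \;\geq\; 0.9
```

The objective is the error rate and the left-hand side of the constraint is the recall. Both are built from indicator functions, which are not differentiable, so gradient-based training has to rely on a surrogate such as the hinge loss.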

We'll start with the required imports:

In [18]:
import math
import numpy as np
import os
import shutil
import tensorflow.compat.v2 as tf

In [None]:
# Tensorflow constrained optimization library
!pip install git+https://github.com/google-research/tensorflow_constrained_optimization
import tensorflow_constrained_optimization as tfco

We'll next create a simple simulated dataset by sampling 1000 random 10-dimensional feature vectors from a Gaussian, finding their labels using a random "ground truth" linear model, and then adding noise by randomly flipping 200 labels.

In [11]:
# Create a simulated 10-dimensional training dataset consisting of 1000 labeled
# examples, of which 800 are labeled correctly and 200 are mislabeled.
num_examples = 1000
num_mislabeled_examples = 200
dimension = 10
# We will constrain the recall to be at least 90%.
recall_lower_bound = 0.9

# Create random "ground truth" parameters for a linear model.
ground_truth_weights = np.random.normal(size=dimension) / math.sqrt(dimension)
ground_truth_threshold = 0

# Generate a random set of features for each example.
features = np.random.normal(size=(num_examples, dimension)).astype(
    np.float32) / math.sqrt(dimension)
# Compute the labels from these features given the ground truth linear model.
labels = (np.matmul(features, ground_truth_weights) >
          ground_truth_threshold).astype(np.float32)
# Add noise by randomly flipping num_mislabeled_examples labels.
mislabeled_indices = np.random.choice(
    num_examples, num_mislabeled_examples, replace=False)
labels[mislabeled_indices] = 1 - labels[mislabeled_indices]

# Constant Tensors containing the labels and features.
constant_labels = tf.constant(labels, dtype=tf.float32)
constant_features = tf.constant(features, dtype=tf.float32)
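
As a sanity check on the noise model, the following dependency-free sketch mirrors the label-flipping step in plain Python. The helper `flip_labels` and the variable names are hypothetical and not part of the colab; the point is only that sampling indices without replacement flips exactly 200 distinct labels, giving a 20% noise rate.

```python
import random

def flip_labels(labels, num_flips, rng):
    """Returns a copy of `labels` with exactly `num_flips` entries flipped."""
    flipped = list(labels)
    # Sampling without replacement guarantees that no label is flipped twice.
    for index in rng.sample(range(len(labels)), num_flips):
        flipped[index] = 1 - flipped[index]
    return flipped

rng = random.Random(0)
clean_labels = [rng.randint(0, 1) for _ in range(1000)]
noisy_labels = flip_labels(clean_labels, 200, rng)

# Count how many noisy labels disagree with the clean ones.
num_disagreements = sum(c != n for c, n in zip(clean_labels, noisy_labels))
noise_rate = num_disagreements / len(clean_labels)
print(num_disagreements, noise_rate)
```

One consequence is that even the "ground truth" linear model disagrees with 200 of the 1000 noisy labels, so its accuracy on this dataset is 80%.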

## Training with a Canned Estimator

We now show how to train a canned `LinearEstimator` using the custom head that TFCO provides. For this, we'll need to create a function that takes `logits`, `labels`, `features`, and an (optional) `weight_column`, and returns a `RateMinimizationProblem`.

In [12]:
def problem_fn(logits, labels, features, weight_column=None):
  # Minimize error rate s.t. recall >= recall_lower_bound.
  del features, weight_column
  context = tfco.rate_context(logits, labels)
  objective = tfco.error_rate(context)
  constraints = [tfco.recall(context) >= recall_lower_bound]
  return tfco.RateMinimizationProblem(objective, constraints)
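
To make the two rates in `problem_fn` concrete, here is a small plain-Python sketch of what they measure on a batch. The helpers `empirical_error_rate` and `empirical_recall` are hypothetical illustrations, not TFCO functions, and they assume that a positive logit is read as a positive prediction.

```python
def empirical_error_rate(logits, labels):
    """Fraction of examples whose thresholded prediction differs from the label."""
    predictions = [1 if logit > 0 else 0 for logit in logits]
    mistakes = sum(p != y for p, y in zip(predictions, labels))
    return mistakes / len(labels)

def empirical_recall(logits, labels):
    """Fraction of positively labeled examples that are predicted positive."""
    positive_logits = [logit for logit, y in zip(logits, labels) if y == 1]
    true_positives = sum(logit > 0 for logit in positive_logits)
    return true_positives / len(positive_logits)

example_logits = [2.0, -1.0, 0.5, -0.3, 1.2]
example_labels = [1, 1, 0, 0, 1]
print(empirical_error_rate(example_logits, example_labels))
print(empirical_recall(example_logits, example_labels))
```

In this toy batch two of the five predictions are wrong and two of the three positives are recovered, so a 90% recall constraint would be violated.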

Next, we create a custom `tfco.HeadV2` that wraps around an existing binary classification head.

In [13]:
binary_head = tf.estimator.BinaryClassHead()
head = tfco.HeadV2(binary_head, problem_fn)

All that remains is to set up the input pipeline. We first create `feature_columns` to convert the dataset into a format that can be processed by an estimator.

In [14]:
feature_columns = []
for ii in range(features.shape[-1]):
  feature_columns.append(
      tf.feature_column.numeric_column(str(ii), dtype=tf.float32))

We next construct the input functions that return the data to be used by the estimator for training and evaluation.

In [15]:
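def make_input_fn(
    data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):
  # Builds an input function over the given features (data_df) and labels
  # (label_df), instead of silently reading the global arrays.
  def input_fn():
    features_dict = {
        str(ii): data_df[:, ii] for ii in range(data_df.shape[-1])}
    ds = tf.data.Dataset.from_tensor_slices((features_dict, label_df))
    if shuffle:
      # Use a shuffle buffer that covers the whole dataset.
      ds = ds.shuffle(len(label_df))
    ds = ds.batch(batch_size).repeat(num_epochs)
    return ds
  return input_fn

train_input_fn = make_input_fn(features, labels, num_epochs=1000)
# For simplicity, we evaluate on the same simulated data used for training.
test_input_fn = make_input_fn(features, labels, num_epochs=1, shuffle=False)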
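
The batching arithmetic implied by this pipeline can be checked with a few lines of plain Python. This is a sketch that assumes each training step consumes one batch, and it takes the value 1000 for `train_steps` from the `steps` argument passed to `estimator.train` later in this colab.

```python
import math

num_examples = 1000
batch_size = 32
train_steps = 1000  # the `steps` argument passed to `estimator.train` later on

# Batching happens before `repeat`, so batches never straddle an epoch
# boundary and the final batch of every epoch is a partial one.
batches_per_epoch = math.ceil(num_examples / batch_size)
last_batch_size = num_examples - (batches_per_epoch - 1) * batch_size
epochs_trained = train_steps / batches_per_epoch
print(batches_per_epoch, last_batch_size, epochs_trained)
```

Under these assumptions training stops after roughly 31 passes over the data, far fewer than the `num_epochs=1000` that the training input function allows, so the step limit is what ends training.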

We are now ready to train the estimator. We'll pass the `ProxyLagrangianOptimizer` that TFCO provides to the estimator, although a standard TF optimizer could be used here instead.

In [16]:
# Create a temporary model directory.
model_dir = "tfco_tmp"
if os.path.exists(model_dir):
  shutil.rmtree(model_dir)

# Train estimator with TFCO's custom optimizer.
optimizer = tfco.ProxyLagrangianOptimizer(
    tf.keras.optimizers.Adagrad(1))
estimator = tf.estimator.LinearEstimator(
    head, feature_columns, model_dir=model_dir,
    optimizer=optimizer)
estimator.train(train_input_fn, steps=1000)

<tensorflow_estimator.python.estimator.canned.linear.LinearEstimatorV2 at 0x7fbbb02682b0>
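
For intuition about how a constraint can be enforced during gradient-based training, the sketch below runs simultaneous gradient descent-ascent on the ordinary Lagrangian of a toy problem: minimize x² subject to x ≥ 1. This is not TFCO's proxy-Lagrangian formulation, which is more elaborate; the function `lagrangian_descent_ascent` and its parameters are hypothetical and only illustrate the general idea of a multiplier that grows while the constraint is violated.

```python
def lagrangian_descent_ascent(step_size=0.05, num_steps=2000):
    """Minimizes x**2 subject to x >= 1 via the Lagrangian x**2 + lam * (1 - x)."""
    x, lam = 0.0, 0.0
    for _ in range(num_steps):
        grad_x = 2.0 * x - lam  # derivative of the Lagrangian w.r.t. x
        grad_lam = 1.0 - x      # derivative w.r.t. lam: the constraint violation
        # Simultaneous update: descend in x, ascend in lam, and keep lam >= 0.
        x, lam = x - step_size * grad_x, max(0.0, lam + step_size * grad_lam)
    return x, lam

x_star, lam_star = lagrangian_descent_ascent()
print(x_star, lam_star)
```

The iterates settle at the constrained optimum x = 1 with multiplier 2, the point where the pull of the objective toward zero is exactly balanced by the penalty for violating the constraint.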

In [17]:
estimator.evaluate(test_input_fn)

{'accuracy': 0.664,
 'accuracy_baseline': 0.503,
 'auc': 0.8110252,
 'auc_precision_recall': 0.7720304,
 'average_loss': 0.6005209,
 'label/mean': 0.503,
 'loss': 0.59834015,
 'precision': 0.6129905,
 'prediction/mean': 0.635723,
 'recall': 0.90059644,
 'global_step': 1000}

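Notice that the recall (about 0.9006) is close to the 90% lower bound, as desired. Keep in mind that this is measured on the same simulated data that was used for training.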
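
The reported metrics can be cross-checked by reconstructing the confusion matrix they imply. This sketch uses only the numbers printed above and assumes that all 1000 examples are weighted equally and that precision, recall, and accuracy were computed at the same decision threshold; the variable names are illustrative.

```python
num_examples = 1000
label_mean = 0.503              # 'label/mean' from the evaluation output
reported_recall = 0.90059644    # 'recall'
reported_precision = 0.6129905  # 'precision'

# Recover integer counts from the reported ratios.
actual_positives = round(label_mean * num_examples)
true_positives = round(reported_recall * actual_positives)
predicted_positives = round(true_positives / reported_precision)
false_positives = predicted_positives - true_positives
false_negatives = actual_positives - true_positives
true_negatives = num_examples - actual_positives - false_positives

implied_accuracy = (true_positives + true_negatives) / num_examples
print(true_positives, false_positives, false_negatives, true_negatives)
print(implied_accuracy)
```

The implied accuracy agrees with the reported 0.664. The counts also show the cost of the constraint: to miss only 50 of the 503 positives, the classifier predicts the positive class for 739 examples, 286 of which are false positives.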