In [1]:
!pip install adanet

Collecting adanet
  Downloading https://files.pythonhosted.org/packages/04/c4/11ac106b2f8946ebe1940ebe26ef4dd212d655c4a2e28bbcc3b5312268e4/adanet-0.3.0-py2.py3-none-any.whl (65kB)
Installing collected packages: adanet
Successfully installed adanet-0.3.0


##### Copyright 2018 The AdaNet Authors.

In [0]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# The AdaNet objective

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/adanet/blob/master/adanet/examples/tutorials/adanet_objective.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/adanet/blob/master/adanet/examples/tutorials/adanet_objective.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

One of the key contributions from *AdaNet: Adaptive Structural Learning of Neural
Networks* [[Cortes et al., ICML 2017](https://arxiv.org/abs/1607.01097)] is
defining an algorithm that aims to directly minimize the DeepBoost
generalization bound from *Deep Boosting*
[[Cortes et al., ICML 2014](http://proceedings.mlr.press/v32/cortesb14.pdf)]
when applied to neural networks. This algorithm, called **AdaNet**, adaptively
grows a neural network as an ensemble of subnetworks that minimizes the AdaNet
objective (a.k.a. AdaNet loss):

$$F(w) = \frac{1}{m} \sum_{i=1}^{m} \Phi \left(\sum_{j=1}^{N}w_jh_j(x_i), y_i \right) + \sum_{j=1}^{N} \left(\lambda r(h_j) + \beta \right) |w_j| $$

where $w$ is the set of mixture weights, one per subnetwork $h$,
$\Phi$ is a surrogate loss function such as logistic loss or MSE, $r$ is a
function for measuring a subnetwork's complexity, and $\lambda$ and $\beta$
are hyperparameters.
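
To make the two terms concrete, here is a minimal pure-Python sketch of the objective. It assumes squared error as the surrogate loss $\Phi$. The function name `adanet_objective`, its argument layout, and the toy numbers are illustrative assumptions and are not part of the AdaNet API.

```python
def adanet_objective(weights, subnetwork_outputs, labels, complexities,
                     adanet_lambda=0.0, adanet_beta=0.0):
    """Evaluates F(w) with squared error standing in for Phi.

    subnetwork_outputs[j][i] holds h_j(x_i), complexities[j] holds r(h_j).
    """
    num_examples = len(labels)
    # First term: average surrogate loss of the weighted ensemble.
    surrogate = 0.0
    for i in range(num_examples):
        ensemble_output = sum(
            w * outputs[i] for w, outputs in zip(weights, subnetwork_outputs))
        surrogate += (ensemble_output - labels[i]) ** 2
    surrogate /= num_examples
    # Second term: complexity-weighted L1 penalty on the mixture weights.
    penalty = sum((adanet_lambda * r + adanet_beta) * abs(w)
                  for w, r in zip(weights, complexities))
    return surrogate + penalty


# Hypothetical toy values: two subnetworks evaluated on two examples.
subnetwork_outputs = [[1.0, 2.0], [3.0, 4.0]]
labels = [2.0, 3.0]
weights = [0.5, 0.5]
complexities = [1.0, 2.0]

loss = adanet_objective(weights, subnetwork_outputs, labels, complexities,
                        adanet_lambda=0.1, adanet_beta=0.01)
print(loss)  # approximately 0.16
```

In this toy case the weighted ensemble fits the labels exactly, so the whole value comes from the penalty term, and the subnetwork with the larger complexity contributes the larger share.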

## Mixture weights

So what are mixture weights? When forming an ensemble $f$ of subnetworks $h$,
we need to somehow combine their predictions. This is done by multiplying
the outputs of subnetwork $h_j$ with mixture weight $w_j$, and summing the
results:

$$f(x) = \sum_{j=1}^{N}w_jh_j(x)$$

In practice, the most commonly used set of mixture weights is **uniform average
weighting**:

$$f(x) = \frac{1}{N}\sum_{j=1}^{N}h_j(x)$$

However, we can also solve a convex optimization problem to learn the mixture
weights that minimize the loss function $\Phi$:

$$F(w) = \frac{1}{m} \sum_{i=1}^{m} \Phi \left(\sum_{j=1}^{N}w_jh_j(x_i), y_i \right)$$

This is the first term in the AdaNet objective. The second term applies L1
regularization to the mixture weights:

$$\sum_{j=1}^{N} \left(\lambda r(h_j) + \beta \right) |w_j|$$

When $\lambda > 0$ this penalty serves to prevent the optimization from
assigning too much weight to more complex subnetworks according to the
complexity measure function $r$.
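
As a rough illustration of learning mixture weights, the sketch below runs full-batch subgradient descent on the objective with squared error as $\Phi$. It is a stand-in under stated assumptions, not how AdaNet trains: the helper `learn_mixture_weights`, the learning rate, the step count, and the toy data are all hypothetical.

```python
def learn_mixture_weights(outputs, labels, complexities, adanet_lambda=0.0,
                          adanet_beta=0.0, learning_rate=0.05, steps=500):
    """Full-batch subgradient descent on the objective with MSE loss.

    outputs[j][i] is the frozen prediction h_j(x_i).
    """
    num_subnetworks = len(outputs)
    num_examples = len(labels)
    # Start from uniform average weighting.
    weights = [1.0 / num_subnetworks] * num_subnetworks
    for _ in range(steps):
        residuals = [
            sum(w * h[i] for w, h in zip(weights, outputs)) - labels[i]
            for i in range(num_examples)
        ]
        new_weights = []
        for w, h, r in zip(weights, outputs, complexities):
            # Gradient of the MSE term with respect to this mixture weight.
            grad = 2.0 / num_examples * sum(
                res * h_i for res, h_i in zip(residuals, h))
            # Subgradient of the L1 penalty (lambda * r + beta) * |w|.
            sign = (w > 0) - (w < 0)
            grad += (adanet_lambda * r + adanet_beta) * sign
            new_weights.append(w - learning_rate * grad)
        weights = new_weights
    return weights


# Hypothetical frozen subnetwork outputs; labels equal 0.7 * h_1 + 0.3 * h_2.
outputs = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
labels = [1.6, 2.0, 2.4]
complexities = [1.0, 2.0]

unregularized = learn_mixture_weights(outputs, labels, complexities)
regularized = learn_mixture_weights(
    outputs, labels, complexities, adanet_lambda=0.05)
print(unregularized)  # close to [0.7, 0.3]
print(regularized)    # second weight is smaller than without the penalty
```

With $\lambda = 0$ the weights recover the combination that generated the labels. With $\lambda > 0$ the weight on the subnetwork with the larger complexity value shrinks, which is the effect the penalty is meant to have.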

## How AdaNet uses the objective

This objective function serves two purposes:

1.  To **learn to scale/transform the outputs of each subnetwork $h$** as part
    of the ensemble.
2.  To **select the best candidate subnetwork $h$** at each AdaNet iteration
    to include in the ensemble.

Effectively, when learning mixture weights $w$, AdaNet solves a convex
optimization problem over the outputs of the frozen subnetworks $h$. For
$\lambda > 0$, AdaNet penalizes more complex subnetworks with greater L1
regularization on their mixture weights, and is therefore less likely to select
more complex subnetworks to add to the ensemble at each iteration.
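
The selection step can be sketched as follows. This is a simplified illustration rather than AdaNet's implementation: it counts only the penalty on each candidate's own mixture weight (assuming the terms for the frozen subnetworks are the same for every candidate), and every name and number below is hypothetical.

```python
def adanet_loss(train_loss, complexity, mixture_weight, adanet_lambda,
                adanet_beta=0.0):
    """Training loss plus the L1 penalty on the candidate's mixture weight."""
    penalty = (adanet_lambda * complexity + adanet_beta) * abs(mixture_weight)
    return train_loss + penalty


def select_best_candidate(candidates, adanet_lambda):
    """Returns the name of the candidate with the lowest AdaNet loss."""
    losses = {
        name: adanet_loss(train_loss, complexity, weight, adanet_lambda)
        for name, (train_loss, complexity, weight) in candidates.items()
    }
    return min(losses, key=losses.get)


# Hypothetical candidates: (training loss, complexity r(h), mixture weight).
candidates = {
    "1_layer_dnn": (0.0210, 1.0, 0.5),
    "2_layer_dnn": (0.0205, 1.5, 0.5),
}
print(select_best_candidate(candidates, adanet_lambda=0.0))   # 2_layer_dnn
print(select_best_candidate(candidates, adanet_lambda=0.01))  # 1_layer_dnn
```

Without regularization the slightly better-fitting, more complex candidate wins. A small positive $\lambda$ is enough to flip the choice when the gap in training loss is narrow.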

In this tutorial, you will observe the benefits of using AdaNet to learn the
ensemble's mixture weights and to perform candidate selection.



In [2]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import functools

import adanet
import tensorflow as tf

# The random seed to use.
RANDOM_SEED = 42

## Boston Housing dataset

In this example, we will solve a regression task on the [Boston Housing dataset](https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html): predicting the price of suburban houses in Boston, MA in the 1970s. There are 13 numerical features, the labels are in thousands of dollars, and there are only 506 examples.


## Download the data
Conveniently, the data is available via Keras:

In [3]:
(x_train, y_train), (x_test, y_test) = (
    tf.keras.datasets.boston_housing.load_data())

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/boston_housing.npz


In [11]:
print(x_test.shape)
print(x_test[0])
print(y_test.shape)
print(y_test[0])

(102, 13)
[ 18.0846   0.      18.1      0.       0.679    6.434  100.       1.8347
  24.     666.      20.2     27.25    29.05  ]
(102,)
7.2


## Supply the data in TensorFlow

Our first task is to supply the data in TensorFlow. Using the
tf.estimator.Estimator convention, we will define a function that returns an
input_fn which returns feature and label Tensors.

We will also use the tf.data.Dataset API to feed the data into our models.

Also, as a preprocessing step, we will apply `tf.log1p` to log-scale the
features and labels for improved numerical stability during training. To recover
the model's predictions in the correct scale, you can apply `tf.math.expm1` to the
prediction.
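
The round trip can be checked with Python's standard `math` module, which provides the same two functions for scalars. This small sketch is independent of TensorFlow and uses the first test label printed above. Since the labels are log-scaled, the losses reported during training should be read as errors in the log-scaled space rather than in thousands of dollars.

```python
import math

label = 7.2  # y_test[0] from the output above, in thousands of dollars

log_label = math.log1p(label)      # log(1 + 7.2): the scale the model trains on
recovered = math.expm1(log_label)  # exp(log_label) - 1: back to the original scale

print(log_label, recovered)
```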

In [12]:
FEATURES_KEY = "x"


def input_fn(partition, training, batch_size):
  """Generate an input function for the Estimator."""

  def _input_fn():

    if partition == "train":
      dataset = tf.data.Dataset.from_tensor_slices(({
          FEATURES_KEY: tf.log1p(x_train)
      }, tf.log1p(y_train)))
    else:
      dataset = tf.data.Dataset.from_tensor_slices(({
          FEATURES_KEY: tf.log1p(x_test)
      }, tf.log1p(y_test)))

    # We call repeat after shuffling, rather than before, to prevent separate
    # epochs from blending together.
    if training:
      dataset = dataset.shuffle(10 * batch_size, seed=RANDOM_SEED).repeat()

    dataset = dataset.batch(batch_size)
    iterator = dataset.make_one_shot_iterator()
    features, labels = iterator.get_next()
    return features, labels

  return _input_fn

## Define the subnetwork generator

Let's define a subnetwork generator similar to the one in
[[Cortes et al., ICML 2017](https://arxiv.org/abs/1607.01097)] and in
`simple_dnn.py` which creates two candidate fully-connected neural networks at
each iteration with the same width, but one has an additional hidden layer. To make
our generator *adaptive*, each subnetwork will have at least the same number
of hidden layers as the most recently added subnetwork to the
`previous_ensemble`.

We define the complexity measure function $r$ to be $r(h) = \sqrt{d(h)}$, where
$d$ is the number of hidden layers in the neural network $h$, to approximate the
Rademacher bounds from
[[Golowich et al., 2017](https://arxiv.org/abs/1712.06541)]. So subnetworks
with more hidden layers, and therefore more capacity, will have more heavily
regularized mixture weights.
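
As a quick worked example, the sketch below tabulates $r(h) = \sqrt{d(h)}$ and the resulting L1 coefficient $\lambda r(h) + \beta$ for depths 0 through 5. The helper names and the value $\lambda = 0.5$ are assumptions chosen only for illustration.

```python
import math


def complexity(num_hidden_layers):
    """r(h) = sqrt(d(h)), where d(h) is the number of hidden layers."""
    return math.sqrt(num_hidden_layers)


def l1_coefficient(num_hidden_layers, adanet_lambda, adanet_beta=0.0):
    """Coefficient (lambda * r(h) + beta) applied to |w| in the objective."""
    return adanet_lambda * complexity(num_hidden_layers) + adanet_beta


# Hypothetical lambda; depth 0 corresponds to the linear model.
for depth in range(6):
    print(depth, complexity(depth), l1_coefficient(depth, adanet_lambda=0.5))
```

Note that a linear model (zero hidden layers) has $r(h) = 0$ under this measure, so only $\beta$ regularizes its mixture weight.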

In [13]:
_NUM_LAYERS_KEY = "num_layers"


class _SimpleDNNBuilder(adanet.subnetwork.Builder):
  """Builds a DNN subnetwork for AdaNet."""

  def __init__(self, optimizer, layer_size, num_layers, learn_mixture_weights,
               seed):
    """Initializes a `_SimpleDNNBuilder`.

    Args:
      optimizer: An `Optimizer` instance for training both the subnetwork and
        the mixture weights.
      layer_size: The number of nodes to output at each hidden layer.
      num_layers: The number of hidden layers.
      learn_mixture_weights: Whether to solve a learning problem to find the
        best mixture weights, or use their default value according to the
        mixture weight type. When `False`, the subnetworks will return a no_op
        for the mixture weight train op.
      seed: A random seed.

    Returns:
      An instance of `_SimpleDNNBuilder`.
    """

    self._optimizer = optimizer
    self._layer_size = layer_size
    self._num_layers = num_layers
    self._learn_mixture_weights = learn_mixture_weights
    self._seed = seed

  def build_subnetwork(self,
                       features,
                       logits_dimension,
                       training,
                       iteration_step,
                       summary,
                       previous_ensemble=None):
    """See `adanet.subnetwork.Builder`."""

    input_layer = tf.to_float(features[FEATURES_KEY])
    kernel_initializer = tf.glorot_uniform_initializer(seed=self._seed)
    last_layer = input_layer
    for _ in range(self._num_layers):
      last_layer = tf.layers.dense(
          last_layer,
          units=self._layer_size,
          activation=tf.nn.relu,
          kernel_initializer=kernel_initializer)
    logits = tf.layers.dense(
        last_layer,
        units=logits_dimension,
        kernel_initializer=kernel_initializer)

    persisted_tensors = {_NUM_LAYERS_KEY: tf.constant(self._num_layers)}
    return adanet.Subnetwork(
        last_layer=last_layer,
        logits=logits,
        complexity=self._measure_complexity(),
        persisted_tensors=persisted_tensors)

  def _measure_complexity(self):
    """Approximates Rademacher complexity as the square-root of the depth."""
    return tf.sqrt(tf.to_float(self._num_layers))

  def build_subnetwork_train_op(self, subnetwork, loss, var_list, labels,
                                iteration_step, summary, previous_ensemble):
    """See `adanet.subnetwork.Builder`."""
    return self._optimizer.minimize(loss=loss, var_list=var_list)

  def build_mixture_weights_train_op(self, loss, var_list, logits, labels,
                                     iteration_step, summary):
    """See `adanet.subnetwork.Builder`."""

    if not self._learn_mixture_weights:
      return tf.no_op()
    return self._optimizer.minimize(loss=loss, var_list=var_list)

  @property
  def name(self):
    """See `adanet.subnetwork.Builder`."""

    if self._num_layers == 0:
      # A DNN with no hidden layers is a linear model.
      return "linear"
    return "{}_layer_dnn".format(self._num_layers)


class SimpleDNNGenerator(adanet.subnetwork.Generator):
  """Generates two DNN subnetworks at each iteration.

  The first DNN has an identical shape to the most recently added subnetwork
  in `previous_ensemble`. The second has the same shape plus one more dense
  layer on top. This is similar to the adaptive network presented in Figure 2 of
  [Cortes et al. ICML 2017](https://arxiv.org/abs/1607.01097), without the
  connections to hidden layers of networks from previous iterations.
  """

  def __init__(self,
               optimizer,
               layer_size=32,
               learn_mixture_weights=False,
               seed=None):
    """Initializes a DNN `Generator`.

    Args:
      optimizer: An `Optimizer` instance for training both the subnetwork and
        the mixture weights.
      layer_size: Number of nodes in each hidden layer of the subnetwork
        candidates. Note that this parameter is ignored in a DNN with no hidden
        layers.
      learn_mixture_weights: Whether to solve a learning problem to find the
        best mixture weights, or use their default value according to the
        mixture weight type. When `False`, the subnetworks will return a no_op
        for the mixture weight train op.
      seed: A random seed.

    Returns:
      An instance of `Generator`.
    """

    self._seed = seed
    self._dnn_builder_fn = functools.partial(
        _SimpleDNNBuilder,
        optimizer=optimizer,
        layer_size=layer_size,
        learn_mixture_weights=learn_mixture_weights)

  def generate_candidates(self, previous_ensemble, iteration_number,
                          previous_ensemble_reports, all_reports):
    """See `adanet.subnetwork.Generator`."""

    num_layers = 0
    seed = self._seed
    if previous_ensemble:
      num_layers = tf.contrib.util.constant_value(
          previous_ensemble.weighted_subnetworks[
              -1].subnetwork.persisted_tensors[_NUM_LAYERS_KEY])
    if seed is not None:
      seed += iteration_number
    return [
        self._dnn_builder_fn(num_layers=num_layers, seed=seed),
        self._dnn_builder_fn(num_layers=num_layers + 1, seed=seed),
    ]

## Train and evaluate

Next we create an `adanet.Estimator` using the `SimpleDNNGenerator` we just defined.

In this section we will show the effects of two hyperparameters: **learning mixture weights** and **complexity regularization**.

On the right-hand side you will be able to play with the hyperparameters of this model. Until you reach the end of this section, we ask that you not change them.

At first we will not learn the mixture weights, using their default initial value. Here they will be scalars initialized to $1/N$ where $N$ is the number of subnetworks in the ensemble, effectively creating a **uniform average ensemble**.

In [14]:
#@title AdaNet parameters
LEARNING_RATE = 0.001  #@param {type:"number"}
TRAIN_STEPS = 100000  #@param {type:"integer"}
BATCH_SIZE = 32  #@param {type:"integer"}

LEARN_MIXTURE_WEIGHTS = False  #@param {type:"boolean"}
ADANET_LAMBDA = 0  #@param {type:"number"}
BOOSTING_ITERATIONS = 5  #@param {type:"integer"}


def train_and_evaluate(learn_mixture_weights=LEARN_MIXTURE_WEIGHTS,
                       adanet_lambda=ADANET_LAMBDA):
  """Trains an `adanet.Estimator` to predict housing prices."""

  estimator = adanet.Estimator(
      # Since we are predicting housing prices, we'll use a regression
      # head that optimizes for MSE.
      head=tf.contrib.estimator.regression_head(
          loss_reduction=tf.losses.Reduction.SUM_OVER_BATCH_SIZE),

      # Define the generator, which defines our search space of subnetworks
      # to train as candidates to add to the final AdaNet model.
      subnetwork_generator=SimpleDNNGenerator(
          optimizer=tf.train.RMSPropOptimizer(learning_rate=LEARNING_RATE),
          learn_mixture_weights=learn_mixture_weights,
          seed=RANDOM_SEED),

      # Lambda is the strength of complexity regularization. A larger
      # value will penalize more complex subnetworks.
      adanet_lambda=adanet_lambda,

      # The number of train steps per iteration.
      max_iteration_steps=TRAIN_STEPS // BOOSTING_ITERATIONS,

      # The evaluator will evaluate the model on the full training set to
      # compute the overall AdaNet loss (train loss + complexity
      # regularization) to select the best candidate to include in the
      # final AdaNet model.
      evaluator=adanet.Evaluator(
          input_fn=input_fn("train", training=False, batch_size=BATCH_SIZE)),

      # Configuration for Estimators.
      config=tf.estimator.RunConfig(
          save_checkpoints_steps=50000,
          save_summary_steps=50000,
          tf_random_seed=RANDOM_SEED))

  # Train and evaluate using the tf.estimator tooling.
  train_spec = tf.estimator.TrainSpec(
      input_fn=input_fn("train", training=True, batch_size=BATCH_SIZE),
      max_steps=TRAIN_STEPS)
  eval_spec = tf.estimator.EvalSpec(
      input_fn=input_fn("test", training=False, batch_size=BATCH_SIZE),
      steps=None)
  return tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)


def ensemble_architecture(result):
  """Extracts the ensemble architecture from evaluation results."""

  architecture = result["architecture/adanet/ensembles"]
  # The architecture is a serialized Summary proto for TensorBoard.
  summary_proto = tf.summary.Summary.FromString(architecture)
  return summary_proto.value[0].tensor.string_val[0]


results, _ = train_and_evaluate()
print("Loss:", results["average_loss"])
print("Architecture:", ensemble_architecture(results))

INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_experimental_distribute': None, '_service': None, '_task_id': 0, '_is_chief': True, '_master': '', '_evaluation_master': '', '_train_distribute': None, '_model_dir': '/tmp/tmplcezpthw', '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f8074e7df28>, '_keep_checkpoint_every_n_hours': 10000, '_global_id_in_cluster': 0, '_keep_checkpoint_max': 5, '_save_checkpoints_steps': 50000, '_tf_random_seed': 42, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_protocol': None, '_device_fn': None, '_save_summary_steps': 50000, '_num_ps_replicas': 0, '_eval_distribute': None, '_num_worker_replicas': 1, '_log_step_count_steps': 100, '_task_type': 'worker'}
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkp

INFO:tensorflow:global_step/sec: 393.635
INFO:tensorflow:loss = 0.020565277, step = 6701 (0.254 sec)
INFO:tensorflow:global_step/sec: 451.012
INFO:tensorflow:loss = 0.020751141, step = 6801 (0.220 sec)
INFO:tensorflow:global_step/sec: 471.442
INFO:tensorflow:loss = 0.016550886, step = 6901 (0.213 sec)
INFO:tensorflow:global_step/sec: 378.272
INFO:tensorflow:loss = 0.026294494, step = 7001 (0.264 sec)
INFO:tensorflow:global_step/sec: 452.051
INFO:tensorflow:loss = 0.025629986, step = 7101 (0.221 sec)
INFO:tensorflow:global_step/sec: 468.433
INFO:tensorflow:loss = 0.017876476, step = 7201 (0.213 sec)
INFO:tensorflow:global_step/sec: 457.513
INFO:tensorflow:loss = 0.048108123, step = 7301 (0.219 sec)
INFO:tensorflow:global_step/sec: 341.219
INFO:tensorflow:loss = 0.025422232, step = 7401 (0.296 sec)
INFO:tensorflow:global_step/sec: 395.504
INFO:tensorflow:loss = 0.025654687, step = 7501 (0.252 sec)
INFO:tensorflow:global_step/sec: 366.815
INFO:tensorflow:loss = 0.012629356, step = 7601 (0

INFO:tensorflow:global_step/sec: 476.546
INFO:tensorflow:loss = 0.027412493, step = 14801 (0.210 sec)
INFO:tensorflow:global_step/sec: 500.457
INFO:tensorflow:loss = 0.042520046, step = 14901 (0.203 sec)
INFO:tensorflow:global_step/sec: 496.861
INFO:tensorflow:loss = 0.026533308, step = 15001 (0.200 sec)
INFO:tensorflow:global_step/sec: 482.884
INFO:tensorflow:loss = 0.01399735, step = 15101 (0.206 sec)
INFO:tensorflow:global_step/sec: 473.638
INFO:tensorflow:loss = 0.024882462, step = 15201 (0.210 sec)
INFO:tensorflow:global_step/sec: 493.16
INFO:tensorflow:loss = 0.024065059, step = 15301 (0.204 sec)
INFO:tensorflow:global_step/sec: 471.448
INFO:tensorflow:loss = 0.010977234, step = 15401 (0.212 sec)
INFO:tensorflow:global_step/sec: 479.813
INFO:tensorflow:loss = 0.020695545, step = 15501 (0.208 sec)
INFO:tensorflow:global_step/sec: 478.489
INFO:tensorflow:loss = 0.023899324, step = 15601 (0.209 sec)
INFO:tensorflow:global_step/sec: 483.478
INFO:tensorflow:loss = 0.026056474, step = 

INFO:tensorflow:Saving 'checkpoint_path' summary for global step 20000: /tmp/tmplcezpthw/model.ckpt-20000
INFO:tensorflow:Loss for final step: 0.033959076.
INFO:tensorflow:Starting ensemble evaluation for iteration 0
INFO:tensorflow:Restoring parameters from /tmp/tmplcezpthw/model.ckpt-20000
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
INFO:tensorflow:Encountered end of input after 14 evaluations
INFO:tensorflow:Computed ensemble metrics: adanet_loss/linear = 0.035089, adanet_loss/1_layer_dnn = 0.020768
INFO:tensorflow:Finished ensemble evaluation for iteration 0
INFO:tensorflow:'1_layer_dnn' at index 1 is moving onto the next iteration
INFO:tensorflow:Freezing best ensemble to /tmp/tmplcezpthw/frozen/ensemble-0.meta
INFO:tensorflow:Restoring parameters from /tmp/tmplcezpthw/model.ckpt-20000
INFO:tensorflow:Importing frozen ensemble from /tmp/tmplcezpthw/frozen/ensemble-0.meta with features: ['x'].
INFO:tensorflow:Overwriting checkpoint with new gr

INFO:tensorflow:global_step/sec: 360.759
INFO:tensorflow:loss = 0.015690364, step = 25901 (0.279 sec)
INFO:tensorflow:global_step/sec: 377.939
INFO:tensorflow:loss = 0.022406287, step = 26001 (0.263 sec)
INFO:tensorflow:global_step/sec: 415.156
INFO:tensorflow:loss = 0.025258576, step = 26101 (0.244 sec)
INFO:tensorflow:global_step/sec: 440.884
INFO:tensorflow:loss = 0.02043756, step = 26201 (0.224 sec)
INFO:tensorflow:global_step/sec: 453.824
INFO:tensorflow:loss = 0.014347989, step = 26301 (0.220 sec)
INFO:tensorflow:global_step/sec: 406.719
INFO:tensorflow:loss = 0.026024658, step = 26401 (0.249 sec)
INFO:tensorflow:global_step/sec: 421.758
INFO:tensorflow:loss = 0.0362145, step = 26501 (0.235 sec)
INFO:tensorflow:global_step/sec: 437.915
INFO:tensorflow:loss = 0.019093065, step = 26601 (0.230 sec)
INFO:tensorflow:global_step/sec: 447.336
INFO:tensorflow:loss = 0.013939275, step = 26701 (0.222 sec)
INFO:tensorflow:global_step/sec: 402.339
INFO:tensorflow:loss = 0.01612249, step = 26

INFO:tensorflow:global_step/sec: 406.421
INFO:tensorflow:loss = 0.014301172, step = 34001 (0.243 sec)
INFO:tensorflow:global_step/sec: 394.751
INFO:tensorflow:loss = 0.016645, step = 34101 (0.253 sec)
INFO:tensorflow:global_step/sec: 398.683
INFO:tensorflow:loss = 0.016692562, step = 34201 (0.254 sec)
INFO:tensorflow:global_step/sec: 391.976
INFO:tensorflow:loss = 0.012835465, step = 34301 (0.256 sec)
INFO:tensorflow:global_step/sec: 388.637
INFO:tensorflow:loss = 0.014207549, step = 34401 (0.256 sec)
INFO:tensorflow:global_step/sec: 399.82
INFO:tensorflow:loss = 0.011994822, step = 34501 (0.250 sec)
INFO:tensorflow:global_step/sec: 389.151
INFO:tensorflow:loss = 0.020384066, step = 34601 (0.254 sec)
INFO:tensorflow:global_step/sec: 438.082
INFO:tensorflow:loss = 0.021298494, step = 34701 (0.228 sec)
INFO:tensorflow:global_step/sec: 394.358
INFO:tensorflow:loss = 0.022692181, step = 34801 (0.254 sec)
INFO:tensorflow:global_step/sec: 448.772
INFO:tensorflow:loss = 0.019764349, step = 34

INFO:tensorflow:Saving candidate '2_layer_dnn' dict for global step 40000: architecture/adanet/ensembles = b"\nr\n;adanet/iteration_1/ensemble_2_layer_dnn/architecture/adanetB)\x08\x07\x12\x00B#| b'1_layer_dnn' | b'2_layer_dnn' |J\x08\n\x06\n\x04text", average_loss/adanet/adanet_weighted_ensemble = 0.03574329, average_loss/adanet/subnetwork = 0.03454547, average_loss/adanet/uniform_average_ensemble = 0.03574329, label/mean/adanet/adanet_weighted_ensemble = 3.1049454, label/mean/adanet/subnetwork = 3.1049454, label/mean/adanet/uniform_average_ensemble = 3.1049454, loss/adanet/adanet_weighted_ensemble = 0.04744883, loss/adanet/subnetwork = 0.04466034, loss/adanet/uniform_average_ensemble = 0.04744883, prediction/mean/adanet/adanet_weighted_ensemble = 3.148061, prediction/mean/adanet/subnetwork = 3.1383395, prediction/mean/adanet/uniform_average_ensemble = 3.148061
INFO:tensorflow:Finished evaluation at 2018-11-19-07:26:07
INFO:tensorflow:Saving dict for global step 40000: average_loss = 

INFO:tensorflow:global_step/sec: 379.79
INFO:tensorflow:loss = 0.010589315, step = 44201 (0.263 sec)
INFO:tensorflow:global_step/sec: 368.266
INFO:tensorflow:loss = 0.008432313, step = 44301 (0.272 sec)
INFO:tensorflow:global_step/sec: 344.976
INFO:tensorflow:loss = 0.009699082, step = 44401 (0.291 sec)
INFO:tensorflow:global_step/sec: 425.395
INFO:tensorflow:loss = 0.010581549, step = 44501 (0.234 sec)
INFO:tensorflow:global_step/sec: 416.583
INFO:tensorflow:loss = 0.01785604, step = 44601 (0.236 sec)
INFO:tensorflow:global_step/sec: 414.673
INFO:tensorflow:loss = 0.013925884, step = 44701 (0.246 sec)
INFO:tensorflow:global_step/sec: 364.821
INFO:tensorflow:loss = 0.009575358, step = 44801 (0.269 sec)
INFO:tensorflow:global_step/sec: 406.694
INFO:tensorflow:loss = 0.019161768, step = 44901 (0.246 sec)
INFO:tensorflow:global_step/sec: 428.92
INFO:tensorflow:loss = 0.0071926797, step = 45001 (0.233 sec)
INFO:tensorflow:global_step/sec: 423.532
INFO:tensorflow:loss = 0.013663843, step = 

INFO:tensorflow:loss = 0.010002465, step = 52201 (0.246 sec)
INFO:tensorflow:global_step/sec: 379.137
INFO:tensorflow:loss = 0.013323818, step = 52301 (0.261 sec)
INFO:tensorflow:global_step/sec: 431.415
INFO:tensorflow:loss = 0.009106889, step = 52401 (0.232 sec)
INFO:tensorflow:global_step/sec: 436.398
INFO:tensorflow:loss = 0.01559157, step = 52501 (0.229 sec)
INFO:tensorflow:global_step/sec: 391.383
INFO:tensorflow:loss = 0.009623263, step = 52601 (0.258 sec)
INFO:tensorflow:global_step/sec: 382.939
INFO:tensorflow:loss = 0.014402029, step = 52701 (0.259 sec)
INFO:tensorflow:global_step/sec: 388.51
INFO:tensorflow:loss = 0.018671855, step = 52801 (0.258 sec)
INFO:tensorflow:global_step/sec: 378.029
INFO:tensorflow:loss = 0.010615724, step = 52901 (0.265 sec)
INFO:tensorflow:global_step/sec: 417.32
INFO:tensorflow:loss = 0.022925321, step = 53001 (0.240 sec)
INFO:tensorflow:global_step/sec: 441.391
INFO:tensorflow:loss = 0.0099629145, step = 53101 (0.228 sec)
INFO:tensorflow:global_

INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmplcezpthw/model.ckpt-60000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving candidate 'previous_ensemble' dict for global step 60000: architecture/adanet/ensembles = b"\nc\n,adanet/previous_ensemble/architecture/adanetB)\x08\x07\x12\x00B#| b'1_layer_dnn' | b'2_layer_dnn' |J\x08\n\x06\n\x04text", average_loss/adanet/adanet_weighted_ensemble = 0.03574329, average_loss/adanet/subnetwork = 0.03454547, average_loss/adanet/uniform_average_ensemble = 0.03574329, label/mean/adanet/adanet_weighted_ensemble = 3.1049454, label/mean/adanet/subnetwork = 3.1049454, label/mean/adanet/uniform_average_ensemble = 3.1049454, loss/adanet/adanet_weighted_ensemble = 0.04744883, loss/adanet/subnetwork = 0.04466034, loss/adanet/uniform_average_ensemble = 0.04744883, prediction/mean/adanet/adanet_weighted_ensemble = 3.148061, prediction/mean/adanet/subnetwork = 3.138339

INFO:tensorflow:loss = 0.014049293, step = 62201 (0.262 sec)
INFO:tensorflow:global_step/sec: 403.624
INFO:tensorflow:loss = 0.012692098, step = 62301 (0.246 sec)
INFO:tensorflow:global_step/sec: 421.775
INFO:tensorflow:loss = 0.009777548, step = 62401 (0.237 sec)
INFO:tensorflow:global_step/sec: 421.404
INFO:tensorflow:loss = 0.015690956, step = 62501 (0.237 sec)
INFO:tensorflow:global_step/sec: 383.937
INFO:tensorflow:loss = 0.008675326, step = 62601 (0.261 sec)
INFO:tensorflow:global_step/sec: 386.342
INFO:tensorflow:loss = 0.013752129, step = 62701 (0.259 sec)
INFO:tensorflow:global_step/sec: 421.059
INFO:tensorflow:loss = 0.010698263, step = 62801 (0.237 sec)
INFO:tensorflow:global_step/sec: 436.797
INFO:tensorflow:loss = 0.010378689, step = 62901 (0.229 sec)
INFO:tensorflow:global_step/sec: 382.163
INFO:tensorflow:loss = 0.009665085, step = 63001 (0.263 sec)
INFO:tensorflow:global_step/sec: 396.868
INFO:tensorflow:loss = 0.0077294107, step = 63101 (0.253 sec)
INFO:tensorflow:glob

INFO:tensorflow:global_step/sec: 424.886
INFO:tensorflow:loss = 0.01797392, step = 70301 (0.236 sec)
INFO:tensorflow:global_step/sec: 412.407
INFO:tensorflow:loss = 0.011372922, step = 70401 (0.245 sec)
INFO:tensorflow:global_step/sec: 436.412
INFO:tensorflow:loss = 0.005085878, step = 70501 (0.229 sec)
INFO:tensorflow:global_step/sec: 431.976
INFO:tensorflow:loss = 0.008556084, step = 70601 (0.230 sec)
INFO:tensorflow:global_step/sec: 396.757
INFO:tensorflow:loss = 0.009044853, step = 70701 (0.252 sec)
INFO:tensorflow:global_step/sec: 427.659
INFO:tensorflow:loss = 0.0126294475, step = 70801 (0.236 sec)
INFO:tensorflow:global_step/sec: 433.657
INFO:tensorflow:loss = 0.013419818, step = 70901 (0.228 sec)
INFO:tensorflow:global_step/sec: 426.989
INFO:tensorflow:loss = 0.0065733306, step = 71001 (0.237 sec)
INFO:tensorflow:global_step/sec: 412.139
INFO:tensorflow:loss = 0.011789247, step = 71101 (0.241 sec)
INFO:tensorflow:global_step/sec: 408.635
INFO:tensorflow:loss = 0.008844627, step

INFO:tensorflow:loss = 0.013359522, step = 78301 (0.274 sec)
INFO:tensorflow:global_step/sec: 389.34
INFO:tensorflow:loss = 0.011084627, step = 78401 (0.257 sec)
INFO:tensorflow:global_step/sec: 390.246
INFO:tensorflow:loss = 0.010920718, step = 78501 (0.255 sec)
INFO:tensorflow:global_step/sec: 382.927
INFO:tensorflow:loss = 0.010683222, step = 78601 (0.261 sec)
INFO:tensorflow:global_step/sec: 409.348
INFO:tensorflow:loss = 0.008939888, step = 78701 (0.245 sec)
INFO:tensorflow:global_step/sec: 364.25
INFO:tensorflow:loss = 0.008002801, step = 78801 (0.276 sec)
INFO:tensorflow:global_step/sec: 412.499
INFO:tensorflow:loss = 0.01741761, step = 78901 (0.242 sec)
INFO:tensorflow:global_step/sec: 429.949
INFO:tensorflow:loss = 0.009349164, step = 79001 (0.231 sec)
INFO:tensorflow:global_step/sec: 383.631
INFO:tensorflow:loss = 0.011067858, step = 79101 (0.261 sec)
INFO:tensorflow:global_step/sec: 426.309
INFO:tensorflow:loss = 0.009572687, step = 79201 (0.236 sec)
INFO:tensorflow:global_s

INFO:tensorflow:loss = 0.012014208, step = 80201 (0.242 sec)
INFO:tensorflow:global_step/sec: 364.276
INFO:tensorflow:loss = 0.008635065, step = 80301 (0.275 sec)
INFO:tensorflow:global_step/sec: 366.59
INFO:tensorflow:loss = 0.012590489, step = 80401 (0.273 sec)
INFO:tensorflow:global_step/sec: 418.17
INFO:tensorflow:loss = 0.008656274, step = 80501 (0.239 sec)
INFO:tensorflow:global_step/sec: 391.518
INFO:tensorflow:loss = 0.0081290435, step = 80601 (0.256 sec)
INFO:tensorflow:global_step/sec: 365.457
INFO:tensorflow:loss = 0.007441818, step = 80701 (0.275 sec)
INFO:tensorflow:global_step/sec: 397.09
INFO:tensorflow:loss = 0.011507606, step = 80801 (0.250 sec)
INFO:tensorflow:global_step/sec: 330.144
INFO:tensorflow:loss = 0.012954185, step = 80901 (0.306 sec)
INFO:tensorflow:global_step/sec: 412.795
INFO:tensorflow:loss = 0.014048508, step = 81001 (0.241 sec)
INFO:tensorflow:global_step/sec: 406.829
INFO:tensorflow:loss = 0.013463169, step = 81101 (0.247 sec)
INFO:tensorflow:global_

INFO:tensorflow:global_step/sec: 437.248
INFO:tensorflow:loss = 0.013460955, step = 88301 (0.227 sec)
INFO:tensorflow:global_step/sec: 401.017
INFO:tensorflow:loss = 0.007128556, step = 88401 (0.253 sec)
INFO:tensorflow:global_step/sec: 380.569
INFO:tensorflow:loss = 0.005910028, step = 88501 (0.263 sec)
INFO:tensorflow:global_step/sec: 378.059
INFO:tensorflow:loss = 0.009571411, step = 88601 (0.263 sec)
INFO:tensorflow:global_step/sec: 284.202
INFO:tensorflow:loss = 0.009896141, step = 88701 (0.350 sec)
INFO:tensorflow:global_step/sec: 266.092
INFO:tensorflow:loss = 0.017198602, step = 88801 (0.376 sec)
INFO:tensorflow:global_step/sec: 351.454
INFO:tensorflow:loss = 0.022983724, step = 88901 (0.285 sec)
INFO:tensorflow:global_step/sec: 382.425
INFO:tensorflow:loss = 0.019097468, step = 89001 (0.261 sec)
INFO:tensorflow:global_step/sec: 382.878
INFO:tensorflow:loss = 0.011339415, step = 89101 (0.262 sec)
INFO:tensorflow:global_step/sec: 380.846
INFO:tensorflow:loss = 0.010781945, step 

INFO:tensorflow:loss = 0.005483456, step = 96301 (0.353 sec)
INFO:tensorflow:global_step/sec: 391.645
INFO:tensorflow:loss = 0.010100526, step = 96401 (0.255 sec)
INFO:tensorflow:global_step/sec: 412.007
INFO:tensorflow:loss = 0.011245234, step = 96501 (0.242 sec)
INFO:tensorflow:global_step/sec: 383.114
INFO:tensorflow:loss = 0.006841069, step = 96601 (0.261 sec)
INFO:tensorflow:global_step/sec: 338.838
INFO:tensorflow:loss = 0.01420257, step = 96701 (0.298 sec)
INFO:tensorflow:global_step/sec: 332.931
INFO:tensorflow:loss = 0.008054491, step = 96801 (0.298 sec)
INFO:tensorflow:global_step/sec: 396.436
INFO:tensorflow:loss = 0.006289351, step = 96901 (0.249 sec)
INFO:tensorflow:global_step/sec: 369.889
INFO:tensorflow:loss = 0.012853891, step = 97001 (0.273 sec)
INFO:tensorflow:global_step/sec: 375.088
INFO:tensorflow:loss = 0.013619079, step = 97101 (0.264 sec)
INFO:tensorflow:global_step/sec: 389.18
INFO:tensorflow:loss = 0.00617637, step = 97201 (0.260 sec)
INFO:tensorflow:global_s

These hyperparameters produce a model that achieves **0.0348** MSE on the test
set. Notice that the ensemble is composed of 5 subnetworks, each one a hidden
layer deeper than the previous. The most complex subnetwork is made of 5 hidden
layers.

`SimpleDNNGenerator` produces subnetworks of varying complexity, and our model
gives each one an equal weight. At each iteration, AdaNet therefore selected the
subnetwork that most lowered the ensemble's training loss. That was likely the
one with the most hidden layers, since it has the most capacity and we aren't
penalizing more complex subnetworks (yet).
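
The selection rule described above can be sketched on toy data. This is only an
illustration, not the `adanet.Estimator` implementation: the helper names
(`uniform_ensemble`, `select_candidate`), the candidate names, and the
prediction arrays are all hypothetical. It shows that a candidate is judged by
the loss of the uniformly averaged ensemble it would join, not by its own loss.

```python
import numpy as np


def uniform_ensemble(predictions):
    """Average a list of per-subnetwork prediction arrays with equal weight."""
    return np.mean(predictions, axis=0)


def select_candidate(ensemble_preds, candidates, labels):
    """Pick the candidate whose addition gives the lowest ensemble training MSE.

    ensemble_preds: list of prediction arrays from already-selected subnetworks.
    candidates: dict mapping a (hypothetical) candidate name to its predictions.
    """
    losses = {}
    for name, preds in candidates.items():
        combined = uniform_ensemble(ensemble_preds + [preds])
        losses[name] = float(np.mean((combined - labels) ** 2))
    best = min(losses, key=losses.get)
    return best, losses


# Toy data: one frozen subnetwork that overshoots every label by 0.5.
labels = np.array([1.0, 2.0, 3.0, 4.0])
ensemble_preds = [np.array([1.5, 2.5, 3.5, 4.5])]
candidates = {
    "shallow": np.array([1.4, 2.4, 3.4, 4.4]),  # overshoots by 0.4
    "deep": np.array([0.6, 1.6, 2.6, 3.6]),     # undershoots by 0.4
}

best, losses = select_candidate(ensemble_preds, candidates, labels)
print(best, losses)
```

Both toy candidates have the same individual MSE, but only one of them cancels
the existing ensemble's error, so it is the one selected.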

Next, instead of assigning equal weight to each subnetwork, let's learn the
mixture weights as a convex optimization problem using SGD:

In [15]:
#@test {"skip": true}
results, _ = train_and_evaluate(learn_mixture_weights=True)
print("Loss:", results["average_loss"])
print("Uniform average loss:", results["average_loss/adanet/uniform_average_ensemble"])
print("Architecture:", ensemble_architecture(results))

INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_experimental_distribute': None, '_service': None, '_task_id': 0, '_is_chief': True, '_master': '', '_evaluation_master': '', '_train_distribute': None, '_model_dir': '/tmp/tmpsbdccn23', '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f8029968a90>, '_keep_checkpoint_every_n_hours': 10000, '_global_id_in_cluster': 0, '_keep_checkpoint_max': 5, '_save_checkpoints_steps': 50000, '_tf_random_seed': 42, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_protocol': None, '_device_fn': None, '_save_summary_steps': 50000, '_num_ps_replicas': 0, '_eval_distribute': None, '_num_worker_replicas': 1, '_log_step_count_steps': 100, '_task_type': 'worker'}
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkp

INFO:tensorflow:global_step/sec: 281.343
INFO:tensorflow:loss = 0.020708136, step = 6701 (0.350 sec)
INFO:tensorflow:global_step/sec: 354.588
INFO:tensorflow:loss = 0.020778513, step = 6801 (0.286 sec)
INFO:tensorflow:global_step/sec: 318.526
INFO:tensorflow:loss = 0.016706605, step = 6901 (0.313 sec)
INFO:tensorflow:global_step/sec: 387.289
INFO:tensorflow:loss = 0.026352338, step = 7001 (0.262 sec)
INFO:tensorflow:global_step/sec: 195.744
INFO:tensorflow:loss = 0.02617799, step = 7101 (0.508 sec)
INFO:tensorflow:global_step/sec: 221.335
INFO:tensorflow:loss = 0.017944645, step = 7201 (0.449 sec)
INFO:tensorflow:global_step/sec: 337.755
INFO:tensorflow:loss = 0.04815356, step = 7301 (0.296 sec)
INFO:tensorflow:global_step/sec: 396.037
INFO:tensorflow:loss = 0.025712956, step = 7401 (0.259 sec)
INFO:tensorflow:global_step/sec: 330.685
INFO:tensorflow:loss = 0.025668332, step = 7501 (0.297 sec)
INFO:tensorflow:global_step/sec: 284.466
INFO:tensorflow:loss = 0.01265048, step = 7601 (0.35

INFO:tensorflow:global_step/sec: 165.709
INFO:tensorflow:loss = 0.027445987, step = 14801 (0.603 sec)
INFO:tensorflow:global_step/sec: 207.162
INFO:tensorflow:loss = 0.04248728, step = 14901 (0.485 sec)
INFO:tensorflow:global_step/sec: 170.31
INFO:tensorflow:loss = 0.026710147, step = 15001 (0.590 sec)
INFO:tensorflow:global_step/sec: 205.071
INFO:tensorflow:loss = 0.014114473, step = 15101 (0.483 sec)
INFO:tensorflow:global_step/sec: 238.418
INFO:tensorflow:loss = 0.02495965, step = 15201 (0.420 sec)
INFO:tensorflow:global_step/sec: 203.796
INFO:tensorflow:loss = 0.024134407, step = 15301 (0.492 sec)
INFO:tensorflow:global_step/sec: 268.184
INFO:tensorflow:loss = 0.010985184, step = 15401 (0.380 sec)
INFO:tensorflow:global_step/sec: 216.299
INFO:tensorflow:loss = 0.020854292, step = 15501 (0.453 sec)
INFO:tensorflow:global_step/sec: 166.083
INFO:tensorflow:loss = 0.024032414, step = 15601 (0.606 sec)
INFO:tensorflow:global_step/sec: 317.255
INFO:tensorflow:loss = 0.026188534, step = 1

INFO:tensorflow:Saving 'checkpoint_path' summary for global step 20000: /tmp/tmpsbdccn23/model.ckpt-20000
INFO:tensorflow:Loss for final step: 0.034430638.
INFO:tensorflow:Starting ensemble evaluation for iteration 0
INFO:tensorflow:Restoring parameters from /tmp/tmpsbdccn23/model.ckpt-20000
INFO:tensorflow:Encountered end of input after 14 evaluations
INFO:tensorflow:Computed ensemble metrics: adanet_loss/linear = 0.035082, adanet_loss/1_layer_dnn = 0.021031
INFO:tensorflow:Finished ensemble evaluation for iteration 0
INFO:tensorflow:'1_layer_dnn' at index 1 is moving onto the next iteration
INFO:tensorflow:Freezing best ensemble to /tmp/tmpsbdccn23/frozen/ensemble-0.meta
INFO:tensorflow:Restoring parameters from /tmp/tmpsbdccn23/model.ckpt-20000
INFO:tensorflow:Importing frozen ensemble from /tmp/tmpsbdccn23/frozen/ensemble-0.meta with features: ['x'].
INFO:tensorflow:Overwriting checkpoint with new graph for iteration 1 to /tmp/tmpsbdccn23/model.ckpt-20000
INFO:tensorflow:Restoring 

INFO:tensorflow:global_step/sec: 356.343
INFO:tensorflow:loss = 0.021942817, step = 26201 (0.281 sec)
INFO:tensorflow:global_step/sec: 412.273
INFO:tensorflow:loss = 0.015928082, step = 26301 (0.243 sec)
INFO:tensorflow:global_step/sec: 422.135
INFO:tensorflow:loss = 0.028541323, step = 26401 (0.236 sec)
INFO:tensorflow:global_step/sec: 352.565
INFO:tensorflow:loss = 0.0317541, step = 26501 (0.284 sec)
INFO:tensorflow:global_step/sec: 381.905
INFO:tensorflow:loss = 0.01571558, step = 26601 (0.265 sec)
INFO:tensorflow:global_step/sec: 409.229
INFO:tensorflow:loss = 0.013409937, step = 26701 (0.241 sec)
INFO:tensorflow:global_step/sec: 389.545
INFO:tensorflow:loss = 0.0139035545, step = 26801 (0.257 sec)
INFO:tensorflow:global_step/sec: 412.059
INFO:tensorflow:loss = 0.008198794, step = 26901 (0.242 sec)
INFO:tensorflow:global_step/sec: 397.527
INFO:tensorflow:loss = 0.01690805, step = 27001 (0.252 sec)
INFO:tensorflow:global_step/sec: 417.433
INFO:tensorflow:loss = 0.011334573, step = 2

INFO:tensorflow:global_step/sec: 368.627
INFO:tensorflow:loss = 0.013047135, step = 34301 (0.271 sec)
INFO:tensorflow:global_step/sec: 380.667
INFO:tensorflow:loss = 0.011771475, step = 34401 (0.263 sec)
INFO:tensorflow:global_step/sec: 381.095
INFO:tensorflow:loss = 0.012609817, step = 34501 (0.262 sec)
INFO:tensorflow:global_step/sec: 453.402
INFO:tensorflow:loss = 0.021411832, step = 34601 (0.221 sec)
INFO:tensorflow:global_step/sec: 387.492
INFO:tensorflow:loss = 0.020396, step = 34701 (0.258 sec)
INFO:tensorflow:global_step/sec: 424.78
INFO:tensorflow:loss = 0.022956783, step = 34801 (0.239 sec)
INFO:tensorflow:global_step/sec: 349.204
INFO:tensorflow:loss = 0.022739397, step = 34901 (0.283 sec)
INFO:tensorflow:global_step/sec: 303.931
INFO:tensorflow:loss = 0.025612412, step = 35001 (0.332 sec)
INFO:tensorflow:global_step/sec: 383.001
INFO:tensorflow:loss = 0.012663094, step = 35101 (0.258 sec)
INFO:tensorflow:global_step/sec: 422.752
INFO:tensorflow:loss = 0.013372776, step = 35

INFO:tensorflow:Finished evaluation at 2018-11-19-07:34:14
INFO:tensorflow:Saving dict for global step 40000: average_loss = 0.034577098, average_loss/adanet/adanet_weighted_ensemble = 0.034577098, average_loss/adanet/subnetwork = 0.03454547, average_loss/adanet/uniform_average_ensemble = 0.03574329, global_step = 40000, label/mean = 3.1049454, label/mean/adanet/adanet_weighted_ensemble = 3.1049454, label/mean/adanet/subnetwork = 3.1049454, label/mean/adanet/uniform_average_ensemble = 3.1049454, loss = 0.044068314, loss/adanet/adanet_weighted_ensemble = 0.044068314, loss/adanet/subnetwork = 0.04466034, loss/adanet/uniform_average_ensemble = 0.04744883, prediction/mean = 3.1163297, prediction/mean/adanet/adanet_weighted_ensemble = 3.1163297, prediction/mean/adanet/subnetwork = 3.1383395, prediction/mean/adanet/uniform_average_ensemble = 3.148061
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 40000: /tmp/tmpsbdccn23/model.ckpt-40000
INFO:tensorflow:Loss for final step: 

INFO:tensorflow:global_step/sec: 335.553
INFO:tensorflow:loss = 0.009013201, step = 45101 (0.297 sec)
INFO:tensorflow:global_step/sec: 404.226
INFO:tensorflow:loss = 0.012831824, step = 45201 (0.249 sec)
INFO:tensorflow:global_step/sec: 377.237
INFO:tensorflow:loss = 0.019404039, step = 45301 (0.266 sec)
INFO:tensorflow:global_step/sec: 367.002
INFO:tensorflow:loss = 0.010275149, step = 45401 (0.271 sec)
INFO:tensorflow:global_step/sec: 345.575
INFO:tensorflow:loss = 0.008305512, step = 45501 (0.293 sec)
INFO:tensorflow:global_step/sec: 412.571
INFO:tensorflow:loss = 0.016967151, step = 45601 (0.238 sec)
INFO:tensorflow:global_step/sec: 374.589
INFO:tensorflow:loss = 0.010462762, step = 45701 (0.270 sec)
INFO:tensorflow:global_step/sec: 428.88
INFO:tensorflow:loss = 0.010056531, step = 45801 (0.231 sec)
INFO:tensorflow:global_step/sec: 378.849
INFO:tensorflow:loss = 0.010550333, step = 45901 (0.263 sec)
INFO:tensorflow:global_step/sec: 300.686
INFO:tensorflow:loss = 0.010025362, step =

INFO:tensorflow:loss = 0.0075753564, step = 53101 (0.254 sec)
INFO:tensorflow:global_step/sec: 443.121
INFO:tensorflow:loss = 0.008639535, step = 53201 (0.226 sec)
INFO:tensorflow:global_step/sec: 366.811
INFO:tensorflow:loss = 0.0059426376, step = 53301 (0.272 sec)
INFO:tensorflow:global_step/sec: 436.38
INFO:tensorflow:loss = 0.009628052, step = 53401 (0.233 sec)
INFO:tensorflow:global_step/sec: 369.474
INFO:tensorflow:loss = 0.009919619, step = 53501 (0.268 sec)
INFO:tensorflow:global_step/sec: 442.452
INFO:tensorflow:loss = 0.010228084, step = 53601 (0.225 sec)
INFO:tensorflow:global_step/sec: 370.581
INFO:tensorflow:loss = 0.0070839766, step = 53701 (0.270 sec)
INFO:tensorflow:global_step/sec: 425.726
INFO:tensorflow:loss = 0.013984772, step = 53801 (0.236 sec)
INFO:tensorflow:global_step/sec: 387.864
INFO:tensorflow:loss = 0.009667996, step = 53901 (0.258 sec)
INFO:tensorflow:global_step/sec: 411.66
INFO:tensorflow:loss = 0.0065857605, step = 54001 (0.243 sec)
INFO:tensorflow:glo

INFO:tensorflow:Saving candidate '2_layer_dnn' dict for global step 60000: architecture/adanet/ensembles = b"\n\x83\x01\n;adanet/iteration_2/ensemble_2_layer_dnn/architecture/adanetB:\x08\x07\x12\x00B4| b'1_layer_dnn' | b'2_layer_dnn' | b'2_layer_dnn' |J\x08\n\x06\n\x04text", average_loss/adanet/adanet_weighted_ensemble = 0.032477677, average_loss/adanet/subnetwork = 0.035940513, average_loss/adanet/uniform_average_ensemble = 0.033833675, label/mean/adanet/adanet_weighted_ensemble = 3.1049454, label/mean/adanet/subnetwork = 3.1049454, label/mean/adanet/uniform_average_ensemble = 3.1049454, loss/adanet/adanet_weighted_ensemble = 0.038081136, loss/adanet/subnetwork = 0.038228974, loss/adanet/uniform_average_ensemble = 0.042276174, prediction/mean/adanet/adanet_weighted_ensemble = 3.1002867, prediction/mean/adanet/subnetwork = 3.1267877, prediction/mean/adanet/uniform_average_ensemble = 3.14097
INFO:tensorflow:Saving candidate '3_layer_dnn' dict for global step 60000: architecture/adanet/

INFO:tensorflow:global_step/sec: 344.91
INFO:tensorflow:loss = 0.015341707, step = 63301 (0.292 sec)
INFO:tensorflow:global_step/sec: 388.371
INFO:tensorflow:loss = 0.011125417, step = 63401 (0.255 sec)
INFO:tensorflow:global_step/sec: 385.791
INFO:tensorflow:loss = 0.015757501, step = 63501 (0.261 sec)
INFO:tensorflow:global_step/sec: 382.056
INFO:tensorflow:loss = 0.0055276924, step = 63601 (0.261 sec)
INFO:tensorflow:global_step/sec: 381.476
INFO:tensorflow:loss = 0.0087307785, step = 63701 (0.261 sec)
INFO:tensorflow:global_step/sec: 377.094
INFO:tensorflow:loss = 0.007943563, step = 63801 (0.265 sec)
INFO:tensorflow:global_step/sec: 373.001
INFO:tensorflow:loss = 0.0052700844, step = 63901 (0.268 sec)
INFO:tensorflow:global_step/sec: 382.688
INFO:tensorflow:loss = 0.009482314, step = 64001 (0.262 sec)
INFO:tensorflow:global_step/sec: 379.242
INFO:tensorflow:loss = 0.008546544, step = 64101 (0.263 sec)
INFO:tensorflow:global_step/sec: 360.529
INFO:tensorflow:loss = 0.0058548674, st

INFO:tensorflow:loss = 0.0076406007, step = 71301 (0.217 sec)
INFO:tensorflow:global_step/sec: 425.033
INFO:tensorflow:loss = 0.0035454957, step = 71401 (0.235 sec)
INFO:tensorflow:global_step/sec: 377.817
INFO:tensorflow:loss = 0.012230123, step = 71501 (0.265 sec)
INFO:tensorflow:global_step/sec: 389.877
INFO:tensorflow:loss = 0.0046327766, step = 71601 (0.257 sec)
INFO:tensorflow:global_step/sec: 416.979
INFO:tensorflow:loss = 0.009797904, step = 71701 (0.240 sec)
INFO:tensorflow:global_step/sec: 426.299
INFO:tensorflow:loss = 0.0057702614, step = 71801 (0.234 sec)
INFO:tensorflow:global_step/sec: 434.612
INFO:tensorflow:loss = 0.0057235556, step = 71901 (0.230 sec)
INFO:tensorflow:global_step/sec: 388.485
INFO:tensorflow:loss = 0.0064652134, step = 72001 (0.260 sec)
INFO:tensorflow:global_step/sec: 374.645
INFO:tensorflow:loss = 0.004923028, step = 72101 (0.264 sec)
INFO:tensorflow:global_step/sec: 397.817
INFO:tensorflow:loss = 0.0047397236, step = 72201 (0.251 sec)
INFO:tensorflo

INFO:tensorflow:global_step/sec: 411.164
INFO:tensorflow:loss = 0.007632213, step = 79401 (0.243 sec)
INFO:tensorflow:global_step/sec: 429.003
INFO:tensorflow:loss = 0.0043778038, step = 79501 (0.235 sec)
INFO:tensorflow:global_step/sec: 405.753
INFO:tensorflow:loss = 0.005868679, step = 79601 (0.246 sec)
INFO:tensorflow:global_step/sec: 394.006
INFO:tensorflow:loss = 0.0047539934, step = 79701 (0.255 sec)
INFO:tensorflow:global_step/sec: 387.249
INFO:tensorflow:loss = 0.005021378, step = 79801 (0.256 sec)
INFO:tensorflow:global_step/sec: 386.278
INFO:tensorflow:loss = 0.008445743, step = 79901 (0.258 sec)
INFO:tensorflow:Saving checkpoints for 80000 into /tmp/tmpsbdccn23/model.ckpt.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Importing frozen ensemble from /tmp/tmpsbdccn23/frozen/ensemble-2.meta with features: ['x'].
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-11-19-07:36:38
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring par

INFO:tensorflow:global_step/sec: 416.172
INFO:tensorflow:loss = 0.0038198205, step = 81301 (0.239 sec)
INFO:tensorflow:global_step/sec: 425.783
INFO:tensorflow:loss = 0.002871193, step = 81401 (0.236 sec)
INFO:tensorflow:global_step/sec: 420.426
INFO:tensorflow:loss = 0.0057152724, step = 81501 (0.238 sec)
INFO:tensorflow:global_step/sec: 402.228
INFO:tensorflow:loss = 0.0043174643, step = 81601 (0.250 sec)
INFO:tensorflow:global_step/sec: 413.499
INFO:tensorflow:loss = 0.007462681, step = 81701 (0.241 sec)
INFO:tensorflow:global_step/sec: 414.319
INFO:tensorflow:loss = 0.006305181, step = 81801 (0.241 sec)
INFO:tensorflow:global_step/sec: 390.071
INFO:tensorflow:loss = 0.00810397, step = 81901 (0.256 sec)
INFO:tensorflow:global_step/sec: 411.099
INFO:tensorflow:loss = 0.0071171797, step = 82001 (0.243 sec)
INFO:tensorflow:global_step/sec: 420.092
INFO:tensorflow:loss = 0.010425732, step = 82101 (0.239 sec)
INFO:tensorflow:global_step/sec: 382.505
INFO:tensorflow:loss = 0.006138864, st

INFO:tensorflow:loss = 0.0045325924, step = 89301 (0.261 sec)
INFO:tensorflow:global_step/sec: 417.166
INFO:tensorflow:loss = 0.005176927, step = 89401 (0.239 sec)
INFO:tensorflow:global_step/sec: 411.981
INFO:tensorflow:loss = 0.0048831245, step = 89501 (0.243 sec)
INFO:tensorflow:global_step/sec: 416.779
INFO:tensorflow:loss = 0.005080943, step = 89601 (0.242 sec)
INFO:tensorflow:global_step/sec: 412.321
INFO:tensorflow:loss = 0.0035263128, step = 89701 (0.241 sec)
INFO:tensorflow:global_step/sec: 404.429
INFO:tensorflow:loss = 0.005478317, step = 89801 (0.246 sec)
INFO:tensorflow:global_step/sec: 408.066
INFO:tensorflow:loss = 0.006956035, step = 89901 (0.248 sec)
INFO:tensorflow:global_step/sec: 362.109
INFO:tensorflow:loss = 0.0063518365, step = 90001 (0.277 sec)
INFO:tensorflow:global_step/sec: 414.001
INFO:tensorflow:loss = 0.0044877417, step = 90101 (0.241 sec)
INFO:tensorflow:global_step/sec: 421.714
INFO:tensorflow:loss = 0.007858607, step = 90201 (0.234 sec)
INFO:tensorflow:

INFO:tensorflow:global_step/sec: 421.408
INFO:tensorflow:loss = 0.002677832, step = 97401 (0.238 sec)
INFO:tensorflow:global_step/sec: 402.995
INFO:tensorflow:loss = 0.0071991016, step = 97501 (0.250 sec)
INFO:tensorflow:global_step/sec: 420.571
INFO:tensorflow:loss = 0.0049621644, step = 97601 (0.238 sec)
INFO:tensorflow:global_step/sec: 415.601
INFO:tensorflow:loss = 0.005003189, step = 97701 (0.241 sec)
INFO:tensorflow:global_step/sec: 415.16
INFO:tensorflow:loss = 0.008433397, step = 97801 (0.241 sec)
INFO:tensorflow:global_step/sec: 357.071
INFO:tensorflow:loss = 0.004794498, step = 97901 (0.280 sec)
INFO:tensorflow:global_step/sec: 419.041
INFO:tensorflow:loss = 0.009021459, step = 98001 (0.237 sec)
INFO:tensorflow:global_step/sec: 408.839
INFO:tensorflow:loss = 0.004788899, step = 98101 (0.245 sec)
INFO:tensorflow:global_step/sec: 389.398
INFO:tensorflow:loss = 0.0073380973, step = 98201 (0.257 sec)
INFO:tensorflow:global_step/sec: 353.001
INFO:tensorflow:loss = 0.00816357, step

Learning the mixture weights produces a model with **0.0449** MSE, a bit worse
than the uniform average model, which the `adanet.Estimator` always computes as
a baseline. The mixture weights were learned without regularization, so they
likely overfit to the training set.
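
To see why learned weights can look better on the training set than a uniform
average, here is a minimal sketch that fits mixture weights over fixed
subnetwork outputs. It is not the notebook's training code: the data is
synthetic, the names `H`, `w`, and `mse` are hypothetical, and it uses
full-batch gradient descent instead of SGD so that the result is deterministic.
Because the squared loss is convex in the weights, gradient descent started
from the uniform weights with a small enough step cannot increase the training
loss.

```python
import numpy as np

rng = np.random.RandomState(42)
m, k = 200, 3  # number of training examples, number of subnetworks

# Synthetic labels and hypothetical frozen subnetwork outputs: each column is
# the label plus noise, with a different noise level per subnetwork.
y = rng.normal(size=m)
H = y[:, None] + rng.normal(scale=[0.2, 0.5, 1.0], size=(m, k))


def mse(w):
    """Training MSE of the ensemble H @ w."""
    return float(np.mean((H @ w - y) ** 2))


w_uniform = np.full(k, 1.0 / k)
w = w_uniform.copy()

# Step size 1/L, where L is the largest eigenvalue of the Hessian of the MSE.
hessian = 2.0 / m * H.T @ H
step = 1.0 / np.linalg.eigvalsh(hessian).max()

for _ in range(500):
    grad = 2.0 / m * H.T @ (H @ w - y)
    w -= step * grad

print("uniform train MSE:", mse(w_uniform))
print("learned train MSE:", mse(w))
print("learned weights:", w)
```

The comparison here is on training data only. Nothing in the sketch constrains
or regularizes `w`, which is the situation the paragraph above describes.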

Observe that AdaNet learned the same ensemble composition as the previous run.
Without complexity regularization, AdaNet will favor more complex subnetworks,
which may have worse generalization despite improving the empirical error.
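
A small sketch shows how a complexity penalty can change which candidate wins.
It rests on assumptions that are not confirmed in this section: that each
subnetwork contributes a penalty of the form $(\lambda r(h) + \beta)|w|$ to the
objective, that $r(h)$ is the square root of the number of hidden layers, and
that the mixture weight has magnitude 1. The function names and the
`adanet_beta` parameter name are hypothetical. The two loss values are the
iteration-0 `adanet_loss` values logged for the unregularized run above.

```python
import math


def penalized_loss(empirical_loss, num_layers, mixture_weight=1.0,
                   adanet_lambda=0.0, adanet_beta=0.0):
    """Empirical loss plus an assumed complexity penalty for one subnetwork."""
    complexity = math.sqrt(num_layers)  # assumption: r(h) = sqrt(depth)
    penalty = (adanet_lambda * complexity + adanet_beta) * abs(mixture_weight)
    return empirical_loss + penalty


# (logged adanet_loss at iteration 0 without regularization, hidden layers)
candidates = {
    "linear": (0.035082, 0),
    "1_layer_dnn": (0.021031, 1),
}


def best_candidate(adanet_lambda):
    """Name of the candidate with the lowest penalized loss."""
    return min(
        candidates,
        key=lambda name: penalized_loss(
            candidates[name][0], candidates[name][1],
            adanet_lambda=adanet_lambda),
    )


print(best_candidate(0.0))    # no penalty
print(best_candidate(0.015))  # with a complexity penalty
```

Under these assumptions the linear candidate pays no penalty because it has no
hidden layers, so a modest $\lambda$ is enough to change the ranking.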

Finally, let's apply some **complexity regularization** by using $\lambda > 0$.
Since this will penalize more complex subnetworks, AdaNet will select the
candidate subnetwork that most improves the objective for its marginal
complexity:

In [16]:
#@test {"skip": true}
results, _ = train_and_evaluate(learn_mixture_weights=True, adanet_lambda=.015)
print("Loss:", results["average_loss"])
print("Uniform average loss:", results["average_loss/adanet/uniform_average_ensemble"])
print("Architecture:", ensemble_architecture(results))

INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_experimental_distribute': None, '_service': None, '_task_id': 0, '_is_chief': True, '_master': '', '_evaluation_master': '', '_train_distribute': None, '_model_dir': '/tmp/tmpyxwongpm', '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f802a6f6668>, '_keep_checkpoint_every_n_hours': 10000, '_global_id_in_cluster': 0, '_keep_checkpoint_max': 5, '_save_checkpoints_steps': 50000, '_tf_random_seed': 42, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_protocol': None, '_device_fn': None, '_save_summary_steps': 50000, '_num_ps_replicas': 0, '_eval_distribute': None, '_num_worker_replicas': 1, '_log_step_count_steps': 100, '_task_type': 'worker'}
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkp

INFO:tensorflow:global_step/sec: 359.77
INFO:tensorflow:loss = 0.033137664, step = 6701 (0.278 sec)
INFO:tensorflow:global_step/sec: 360.24
INFO:tensorflow:loss = 0.021211509, step = 6801 (0.277 sec)
INFO:tensorflow:global_step/sec: 377.291
INFO:tensorflow:loss = 0.01806557, step = 6901 (0.268 sec)
INFO:tensorflow:global_step/sec: 379.734
INFO:tensorflow:loss = 0.03138727, step = 7001 (0.260 sec)
INFO:tensorflow:global_step/sec: 376.72
INFO:tensorflow:loss = 0.03288175, step = 7101 (0.266 sec)
INFO:tensorflow:global_step/sec: 360.23
INFO:tensorflow:loss = 0.015384585, step = 7201 (0.280 sec)
INFO:tensorflow:global_step/sec: 422.461
INFO:tensorflow:loss = 0.07852742, step = 7301 (0.233 sec)
INFO:tensorflow:global_step/sec: 372.544
INFO:tensorflow:loss = 0.037094302, step = 7401 (0.271 sec)
INFO:tensorflow:global_step/sec: 368.76
INFO:tensorflow:loss = 0.054601498, step = 7501 (0.270 sec)
INFO:tensorflow:global_step/sec: 347.916
INFO:tensorflow:loss = 0.020195387, step = 7601 (0.289 sec)

INFO:tensorflow:loss = 0.027374016, step = 14801 (0.281 sec)
INFO:tensorflow:global_step/sec: 360.692
INFO:tensorflow:loss = 0.042642456, step = 14901 (0.283 sec)
INFO:tensorflow:global_step/sec: 373.764
INFO:tensorflow:loss = 0.04404347, step = 15001 (0.267 sec)
INFO:tensorflow:global_step/sec: 380.937
INFO:tensorflow:loss = 0.014351687, step = 15101 (0.262 sec)
INFO:tensorflow:global_step/sec: 373.372
INFO:tensorflow:loss = 0.025546595, step = 15201 (0.264 sec)
INFO:tensorflow:global_step/sec: 373.91
INFO:tensorflow:loss = 0.029243704, step = 15301 (0.268 sec)
INFO:tensorflow:global_step/sec: 354.819
INFO:tensorflow:loss = 0.020585781, step = 15401 (0.283 sec)
INFO:tensorflow:global_step/sec: 367.96
INFO:tensorflow:loss = 0.020710247, step = 15501 (0.273 sec)
INFO:tensorflow:global_step/sec: 387.022
INFO:tensorflow:loss = 0.0501779, step = 15601 (0.257 sec)
INFO:tensorflow:global_step/sec: 347.191
INFO:tensorflow:loss = 0.02641895, step = 15701 (0.290 sec)
INFO:tensorflow:global_step

INFO:tensorflow:Saving 'checkpoint_path' summary for global step 20000: /tmp/tmpyxwongpm/model.ckpt-20000
INFO:tensorflow:Loss for final step: 0.050165728.
INFO:tensorflow:Starting ensemble evaluation for iteration 0
INFO:tensorflow:Restoring parameters from /tmp/tmpyxwongpm/model.ckpt-20000
INFO:tensorflow:Encountered end of input after 14 evaluations
INFO:tensorflow:Computed ensemble metrics: adanet_loss/linear = 0.035082, adanet_loss/1_layer_dnn = 0.035733
INFO:tensorflow:Finished ensemble evaluation for iteration 0
INFO:tensorflow:'linear' at index 0 is moving onto the next iteration
INFO:tensorflow:Freezing best ensemble to /tmp/tmpyxwongpm/frozen/ensemble-0.meta
INFO:tensorflow:Restoring parameters from /tmp/tmpyxwongpm/model.ckpt-20000
INFO:tensorflow:Importing frozen ensemble from /tmp/tmpyxwongpm/frozen/ensemble-0.meta with features: ['x'].
INFO:tensorflow:Overwriting checkpoint with new graph for iteration 1 to /tmp/tmpyxwongpm/model.ckpt-20000
INFO:tensorflow:Restoring param

INFO:tensorflow:global_step/sec: 338.478
INFO:tensorflow:loss = 0.037047483, step = 26201 (0.295 sec)
INFO:tensorflow:global_step/sec: 346.557
INFO:tensorflow:loss = 0.042897377, step = 26301 (0.289 sec)
INFO:tensorflow:global_step/sec: 378.775
INFO:tensorflow:loss = 0.0555079, step = 26401 (0.260 sec)
INFO:tensorflow:global_step/sec: 357.643
INFO:tensorflow:loss = 0.04718653, step = 26501 (0.283 sec)
INFO:tensorflow:global_step/sec: 350.334
INFO:tensorflow:loss = 0.020758241, step = 26601 (0.286 sec)
INFO:tensorflow:global_step/sec: 355.577
INFO:tensorflow:loss = 0.032562613, step = 26701 (0.279 sec)
INFO:tensorflow:global_step/sec: 359.237
INFO:tensorflow:loss = 0.02193155, step = 26801 (0.278 sec)
INFO:tensorflow:global_step/sec: 362.032
INFO:tensorflow:loss = 0.013505356, step = 26901 (0.279 sec)
INFO:tensorflow:global_step/sec: 341.729
INFO:tensorflow:loss = 0.024709681, step = 27001 (0.297 sec)
INFO:tensorflow:global_step/sec: 345.474
INFO:tensorflow:loss = 0.022348955, step = 27

INFO:tensorflow:global_step/sec: 252.9
INFO:tensorflow:loss = 0.035877682, step = 34301 (0.389 sec)
INFO:tensorflow:global_step/sec: 285.33
INFO:tensorflow:loss = 0.019192742, step = 34401 (0.352 sec)
INFO:tensorflow:global_step/sec: 329.339
INFO:tensorflow:loss = 0.027362991, step = 34501 (0.303 sec)
INFO:tensorflow:global_step/sec: 364.933
INFO:tensorflow:loss = 0.035525072, step = 34601 (0.276 sec)
INFO:tensorflow:global_step/sec: 343.012
INFO:tensorflow:loss = 0.03035288, step = 34701 (0.289 sec)
INFO:tensorflow:global_step/sec: 81.626
INFO:tensorflow:loss = 0.04075265, step = 34801 (1.226 sec)
INFO:tensorflow:global_step/sec: 96.802
INFO:tensorflow:loss = 0.05653507, step = 34901 (1.050 sec)
INFO:tensorflow:global_step/sec: 134.492
INFO:tensorflow:loss = 0.036090188, step = 35001 (0.725 sec)
INFO:tensorflow:global_step/sec: 158.743
INFO:tensorflow:loss = 0.016563129, step = 35101 (0.631 sec)
INFO:tensorflow:global_step/sec: 160.766
INFO:tensorflow:loss = 0.023504052, step = 35201 

INFO:tensorflow:Finished evaluation at 2018-11-19-08:58:19
INFO:tensorflow:Saving dict for global step 40000: average_loss = 0.04485139, average_loss/adanet/adanet_weighted_ensemble = 0.04485139, average_loss/adanet/subnetwork = 0.04407932, average_loss/adanet/uniform_average_ensemble = 0.043675844, global_step = 40000, label/mean = 3.1049454, label/mean/adanet/adanet_weighted_ensemble = 3.1049454, label/mean/adanet/subnetwork = 3.1049454, label/mean/adanet/uniform_average_ensemble = 3.1049454, loss = 0.059158117, loss/adanet/adanet_weighted_ensemble = 0.059158117, loss/adanet/subnetwork = 0.061847888, loss/adanet/uniform_average_ensemble = 0.0585649, prediction/mean = 3.1251307, prediction/mean/adanet/adanet_weighted_ensemble = 3.1251307, prediction/mean/adanet/subnetwork = 3.1570365, prediction/mean/adanet/uniform_average_ensemble = 3.131466
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 40000: /tmp/tmpyxwongpm/model.ckpt-40000
INFO:tensorflow:Loss for final step: 0

INFO:tensorflow:global_step/sec: 268.541
INFO:tensorflow:loss = 0.017422926, step = 45101 (0.372 sec)
INFO:tensorflow:global_step/sec: 314.142
INFO:tensorflow:loss = 0.015257321, step = 45201 (0.318 sec)
INFO:tensorflow:global_step/sec: 289.481
INFO:tensorflow:loss = 0.041434586, step = 45301 (0.342 sec)
INFO:tensorflow:global_step/sec: 306.188
INFO:tensorflow:loss = 0.015478523, step = 45401 (0.325 sec)
INFO:tensorflow:global_step/sec: 305.61
INFO:tensorflow:loss = 0.018797157, step = 45501 (0.328 sec)
INFO:tensorflow:global_step/sec: 300.935
INFO:tensorflow:loss = 0.021855002, step = 45601 (0.335 sec)
INFO:tensorflow:global_step/sec: 322.617
INFO:tensorflow:loss = 0.023638379, step = 45701 (0.308 sec)
INFO:tensorflow:global_step/sec: 325.89
INFO:tensorflow:loss = 0.019294918, step = 45801 (0.308 sec)
INFO:tensorflow:global_step/sec: 315.821
INFO:tensorflow:loss = 0.008947721, step = 45901 (0.315 sec)
INFO:tensorflow:global_step/sec: 321.599
INFO:tensorflow:loss = 0.019638894, step = 

INFO:tensorflow:global_step/sec: 350.063
INFO:tensorflow:loss = 0.011961583, step = 53201 (0.285 sec)
INFO:tensorflow:global_step/sec: 318.362
INFO:tensorflow:loss = 0.014506684, step = 53301 (0.319 sec)
INFO:tensorflow:global_step/sec: 276.802
INFO:tensorflow:loss = 0.02157983, step = 53401 (0.358 sec)
INFO:tensorflow:global_step/sec: 268.414
INFO:tensorflow:loss = 0.025674623, step = 53501 (0.377 sec)
INFO:tensorflow:global_step/sec: 210.473
INFO:tensorflow:loss = 0.011091793, step = 53601 (0.475 sec)
INFO:tensorflow:global_step/sec: 217.217
INFO:tensorflow:loss = 0.010901217, step = 53701 (0.460 sec)
INFO:tensorflow:global_step/sec: 289.767
INFO:tensorflow:loss = 0.034340452, step = 53801 (0.342 sec)
INFO:tensorflow:global_step/sec: 294.66
INFO:tensorflow:loss = 0.011284155, step = 53901 (0.340 sec)
INFO:tensorflow:global_step/sec: 258.428
INFO:tensorflow:loss = 0.0117264055, step = 54001 (0.386 sec)
INFO:tensorflow:global_step/sec: 264.069
INFO:tensorflow:loss = 0.01872945, step = 

INFO:tensorflow:Saving candidate '1_layer_dnn' dict for global step 60000: architecture/adanet/ensembles = b"\n~\n;adanet/iteration_2/ensemble_1_layer_dnn/architecture/adanetB5\x08\x07\x12\x00B/| b'linear' | b'1_layer_dnn' | b'1_layer_dnn' |J\x08\n\x06\n\x04text", average_loss/adanet/adanet_weighted_ensemble = 0.042682763, average_loss/adanet/subnetwork = 0.038611967, average_loss/adanet/uniform_average_ensemble = 0.040760666, label/mean/adanet/adanet_weighted_ensemble = 3.1049454, label/mean/adanet/subnetwork = 3.1049454, label/mean/adanet/uniform_average_ensemble = 3.1049454, loss/adanet/adanet_weighted_ensemble = 0.055231333, loss/adanet/subnetwork = 0.053511024, loss/adanet/uniform_average_ensemble = 0.05558739, prediction/mean/adanet/adanet_weighted_ensemble = 3.098613, prediction/mean/adanet/subnetwork = 3.1419017, prediction/mean/adanet/uniform_average_ensemble = 3.1349444
INFO:tensorflow:Saving candidate '2_layer_dnn' dict for global step 60000: architecture/adanet/ensembles = 

INFO:tensorflow:loss = 0.018950937, step = 63301 (0.274 sec)
INFO:tensorflow:global_step/sec: 381.861
INFO:tensorflow:loss = 0.0125065185, step = 63401 (0.262 sec)
INFO:tensorflow:global_step/sec: 387.235
INFO:tensorflow:loss = 0.025653698, step = 63501 (0.258 sec)
INFO:tensorflow:global_step/sec: 386.667
INFO:tensorflow:loss = 0.009902426, step = 63601 (0.259 sec)
INFO:tensorflow:global_step/sec: 384.128
INFO:tensorflow:loss = 0.010109541, step = 63701 (0.260 sec)
INFO:tensorflow:global_step/sec: 388.901
INFO:tensorflow:loss = 0.007783378, step = 63801 (0.257 sec)
INFO:tensorflow:global_step/sec: 391.62
INFO:tensorflow:loss = 0.008886563, step = 63901 (0.256 sec)
INFO:tensorflow:global_step/sec: 390.313
INFO:tensorflow:loss = 0.01669297, step = 64001 (0.256 sec)
INFO:tensorflow:global_step/sec: 385.505
INFO:tensorflow:loss = 0.012598669, step = 64101 (0.259 sec)
INFO:tensorflow:global_step/sec: 384.781
INFO:tensorflow:loss = 0.0065435776, step = 64201 (0.260 sec)
INFO:tensorflow:globa

INFO:tensorflow:global_step/sec: 319.912
INFO:tensorflow:loss = 0.007563833, step = 71401 (0.313 sec)
INFO:tensorflow:global_step/sec: 335.905
INFO:tensorflow:loss = 0.018885966, step = 71501 (0.298 sec)
INFO:tensorflow:global_step/sec: 340.835
INFO:tensorflow:loss = 0.0063192933, step = 71601 (0.293 sec)
INFO:tensorflow:global_step/sec: 336.149
INFO:tensorflow:loss = 0.012419142, step = 71701 (0.298 sec)
INFO:tensorflow:global_step/sec: 337.271
INFO:tensorflow:loss = 0.009384869, step = 71801 (0.297 sec)
INFO:tensorflow:global_step/sec: 388.592
INFO:tensorflow:loss = 0.00930639, step = 71901 (0.257 sec)
INFO:tensorflow:global_step/sec: 378.326
INFO:tensorflow:loss = 0.011872329, step = 72001 (0.264 sec)
INFO:tensorflow:global_step/sec: 395.484
INFO:tensorflow:loss = 0.005632778, step = 72101 (0.253 sec)
INFO:tensorflow:global_step/sec: 389.649
INFO:tensorflow:loss = 0.00506723, step = 72201 (0.259 sec)
INFO:tensorflow:global_step/sec: 387.76
INFO:tensorflow:loss = 0.008731108, step = 

INFO:tensorflow:loss = 0.009111306, step = 79401 (0.318 sec)
INFO:tensorflow:global_step/sec: 327.153
INFO:tensorflow:loss = 0.008054039, step = 79501 (0.309 sec)
INFO:tensorflow:global_step/sec: 339.574
INFO:tensorflow:loss = 0.006679391, step = 79601 (0.292 sec)
INFO:tensorflow:global_step/sec: 350.033
INFO:tensorflow:loss = 0.008028477, step = 79701 (0.286 sec)
INFO:tensorflow:global_step/sec: 322.011
INFO:tensorflow:loss = 0.008896752, step = 79801 (0.311 sec)
INFO:tensorflow:global_step/sec: 320.338
INFO:tensorflow:loss = 0.012979875, step = 79901 (0.312 sec)
INFO:tensorflow:Saving checkpoints for 80000 into /tmp/tmpyxwongpm/model.ckpt.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Importing frozen ensemble from /tmp/tmpyxwongpm/frozen/ensemble-2.meta with features: ['x'].
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-11-19-09:01:01
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpyxwongpm/model.ckpt-80

INFO:tensorflow:loss = 0.016247045, step = 81301 (0.524 sec)
INFO:tensorflow:global_step/sec: 281.409
INFO:tensorflow:loss = 0.0046351785, step = 81401 (0.355 sec)
INFO:tensorflow:global_step/sec: 184.101
INFO:tensorflow:loss = 0.0068879593, step = 81501 (0.540 sec)
INFO:tensorflow:global_step/sec: 292.058
INFO:tensorflow:loss = 0.0076613384, step = 81601 (0.345 sec)
INFO:tensorflow:global_step/sec: 274.88
INFO:tensorflow:loss = 0.012942525, step = 81701 (0.361 sec)
INFO:tensorflow:global_step/sec: 233.896
INFO:tensorflow:loss = 0.012133781, step = 81801 (0.431 sec)
INFO:tensorflow:global_step/sec: 198.759
INFO:tensorflow:loss = 0.01579272, step = 81901 (0.503 sec)
INFO:tensorflow:global_step/sec: 310.894
INFO:tensorflow:loss = 0.010960525, step = 82001 (0.323 sec)
INFO:tensorflow:global_step/sec: 282.843
INFO:tensorflow:loss = 0.019725045, step = 82101 (0.349 sec)
INFO:tensorflow:global_step/sec: 348.46
INFO:tensorflow:loss = 0.008978224, step = 82201 (0.291 sec)
INFO:tensorflow:globa

INFO:tensorflow:global_step/sec: 283.085
INFO:tensorflow:loss = 0.0064634783, step = 89401 (0.353 sec)
INFO:tensorflow:global_step/sec: 299.229
INFO:tensorflow:loss = 0.015360342, step = 89501 (0.332 sec)
INFO:tensorflow:global_step/sec: 299.435
INFO:tensorflow:loss = 0.010400895, step = 89601 (0.334 sec)
INFO:tensorflow:global_step/sec: 311.035
INFO:tensorflow:loss = 0.005599436, step = 89701 (0.322 sec)
INFO:tensorflow:global_step/sec: 301.039
INFO:tensorflow:loss = 0.010744687, step = 89801 (0.331 sec)
INFO:tensorflow:global_step/sec: 295.358
INFO:tensorflow:loss = 0.012388031, step = 89901 (0.344 sec)
INFO:tensorflow:global_step/sec: 319.242
INFO:tensorflow:loss = 0.01192112, step = 90001 (0.309 sec)
INFO:tensorflow:global_step/sec: 348.833
INFO:tensorflow:loss = 0.0075354977, step = 90101 (0.289 sec)
INFO:tensorflow:global_step/sec: 368.975
INFO:tensorflow:loss = 0.012779966, step = 90201 (0.270 sec)
INFO:tensorflow:global_step/sec: 347.696
INFO:tensorflow:loss = 0.014670095, step

INFO:tensorflow:loss = 0.007828134, step = 97401 (0.514 sec)
INFO:tensorflow:global_step/sec: 224.604
INFO:tensorflow:loss = 0.014394507, step = 97501 (0.441 sec)
INFO:tensorflow:global_step/sec: 296.158
INFO:tensorflow:loss = 0.013275066, step = 97601 (0.338 sec)
INFO:tensorflow:global_step/sec: 196.911
INFO:tensorflow:loss = 0.008199433, step = 97701 (0.509 sec)
INFO:tensorflow:global_step/sec: 308.738
INFO:tensorflow:loss = 0.012882975, step = 97801 (0.326 sec)
INFO:tensorflow:global_step/sec: 279.348
INFO:tensorflow:loss = 0.009654939, step = 97901 (0.355 sec)
INFO:tensorflow:global_step/sec: 220.2
INFO:tensorflow:loss = 0.015698953, step = 98001 (0.454 sec)
INFO:tensorflow:global_step/sec: 270.433
INFO:tensorflow:loss = 0.010378662, step = 98101 (0.371 sec)
INFO:tensorflow:global_step/sec: 197.8
INFO:tensorflow:loss = 0.011898065, step = 98201 (0.504 sec)
INFO:tensorflow:global_step/sec: 232.261
INFO:tensorflow:loss = 0.013942307, step = 98301 (0.431 sec)
INFO:tensorflow:global_st

Learning the mixture weights with $\lambda > 0$ produces a model with **0.0320**
MSE. Notice that this is even better than the uniform average ensemble produced
from the chosen subnetworks with **0.0345** MSE.

Inspecting the ensemble architecture demonstrates the effects of complexity
regularization on candidate selection. The selected subnetworks are relatively
less complex: unlike in previous runs, the simplest subnetwork is a linear model
and the deepest subnetwork has only 3 hidden layers.
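
To make the effect of $\lambda$ on candidate selection concrete, here is a minimal sketch in plain Python. It is not the AdaNet library's implementation. It assumes the regularized objective has the form of a training loss plus the penalty $\sum_j (\lambda r(h_j) + \beta)\,|w_j|$, that a subnetwork's complexity $r(h_j)$ is the square root of its number of hidden layers, and that each candidate is scored with a single mixture weight of 1. The candidate names and loss values are made up for illustration and are not results from this notebook.

```python
import math


def adanet_objective(empirical_loss, mixture_weights, complexities,
                     adanet_lambda=0.0, adanet_beta=0.0):
    """Training loss plus a complexity-weighted L1 penalty on the mixture weights."""
    penalty = sum(
        (adanet_lambda * r + adanet_beta) * abs(w)
        for w, r in zip(mixture_weights, complexities)
    )
    return empirical_loss + penalty


def complexity(num_hidden_layers):
    # Assumption: complexity grows with the square root of the depth.
    return math.sqrt(num_hidden_layers)


# Hypothetical candidates: name -> (hidden layers, illustrative training loss).
candidates = {
    "linear": (0, 0.50),
    "3_layer": (3, 0.45),
    "5_layer": (5, 0.40),
}


def best_candidate(adanet_lambda):
    """Return the candidate with the lowest regularized objective."""
    scores = {
        name: adanet_objective(loss, [1.0], [complexity(depth)], adanet_lambda)
        for name, (depth, loss) in candidates.items()
    }
    return min(scores, key=scores.get)


print(best_candidate(adanet_lambda=0.0))
print(best_candidate(adanet_lambda=0.1))
```

With $\lambda = 0$ the deepest candidate wins because only the training loss counts. With a large enough $\lambda$, the penalty on the deeper candidates outweighs their lower loss and the linear model is selected instead.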

In general, learning to combine subnetwork outputs with optimal hyperparameters
should be at least as good as assigning uniform average weights.
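
The reasoning behind this claim can be sketched with NumPy: uniform weights are just one point in the space of possible mixture weights, so weights fitted by least squares can never have a higher training MSE than the uniform average. The sketch below uses synthetic data and three stand-in "subnetwork" predictions, all hypothetical, and fits unregularized weights ($\lambda = 0$, $\beta = 0$) in closed form rather than with AdaNet's training loop. The guarantee holds only on the data used to fit the weights; on held-out data the learned weights can still overfit, which is where the regularization comes in.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression problem (illustrative only).
x = rng.uniform(-1.0, 1.0, size=200)
y = np.sin(3.0 * x)

# Stand-in subnetwork outputs: one column per hypothetical subnetwork.
subnetwork_outputs = np.stack(
    [
        x,                                              # a linear model
        np.sin(3.0 * x) + 0.3 * rng.normal(size=x.shape),  # a noisy predictor
        np.sin(3.0 * x) + 0.1 * rng.normal(size=x.shape),  # a less noisy predictor
    ],
    axis=1,
)


def mse(weights):
    """Training MSE of the weighted combination of subnetwork outputs."""
    predictions = subnetwork_outputs @ weights
    return float(np.mean((predictions - y) ** 2))


# Uniform average: every subnetwork gets the same weight.
num_subnetworks = subnetwork_outputs.shape[1]
uniform_weights = np.full(num_subnetworks, 1.0 / num_subnetworks)

# Learned weights: the least-squares minimizer of the training MSE.
learned_weights, *_ = np.linalg.lstsq(subnetwork_outputs, y, rcond=None)

uniform_mse = mse(uniform_weights)
learned_mse = mse(learned_weights)
print("uniform MSE:", uniform_mse)
print("learned MSE:", learned_mse)
```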

## Conclusion

In this tutorial, you explored training an AdaNet model's mixture weights with
$\lambda \ge 0$. You also compared the result against an ensemble formed by
always choosing the best candidate subnetwork at each iteration, based on its
ability to improve the ensemble's loss on the training set, and averaging the
subnetworks' outputs.

Uniform average ensembles work unreasonably well in practice, yet learning the
mixture weights with well-chosen values of $\lambda$ and $\beta$ should produce
a model that is at least as good, and typically better when candidates have
varying complexity. However, this requires additional hyperparameter tuning. In
practice, you can first train an AdaNet with the default mixture weights and
$\lambda=0$, and once you have confirmed that the subnetworks are training
correctly, tune the mixture weight hyperparameters.

While this example explored a regression task, these observations also apply to
using AdaNet on other tasks such as binary classification and multi-class
classification.