<span style="color:red; font-family:Helvetica Neue, Helvetica, Arial, sans-serif; font-size:2em;">An Exception was encountered at '<a href="#papermill-error-cell">In [14]</a>'.</span>

## Trial-level early stopping in Ax

This tutorial illustrates how to add a trial-level early stopping strategy to an Ax hyper-parameter optimization (HPO) loop. The goal of trial-level early stopping is to monitor the results of expensive evaluations and terminate those that are unlikely to produce promising results, freeing up resources to explore more configurations.

Most of this tutorial is adapted from the [PyTorch Ax Multiobjective NAS Tutorial](https://pytorch.org/tutorials/intermediate/ax_multiobjective_nas_tutorial.html). The training job is different from the original in that we do not optimize `batch_size` or `epochs`. This was done for illustrative purposes, as each validation curve now has the same number of points. The companion training file `mnist_train_nas.py` has also been altered to log to Tensorboard during training.

NOTE: Although the original NAS tutorial is for a multi-objective problem, this tutorial focuses on a single objective (validation accuracy) problem. Early stopping currently does not support \"true\" multi-objective stopping, although one can use [logical compositions of early stopping strategies](https://github.com/facebook/Ax/blob/main/ax/early_stopping/strategies/logical.py) to target multiple objectives separately. Early stopping for the multi-objective case is currently a work in progress.

In [1]:
import os
import tempfile

from pathlib import Path

import torchx

from ax.core import Experiment, Objective, ParameterType, RangeParameter, SearchSpace
from ax.core.optimization_config import OptimizationConfig

from ax.early_stopping.strategies import PercentileEarlyStoppingStrategy
from ax.metrics.tensorboard import TensorboardMetric

from ax.modelbridge.dispatch_utils import choose_generation_strategy

from ax.runners.torchx import TorchXRunner

from ax.service.scheduler import Scheduler, SchedulerOptions
from ax.service.utils.report_utils import exp_to_df

from tensorboard.backend.event_processing import plugin_event_multiplexer as event_multiplexer

from torchx import specs
from torchx.components import utils

from matplotlib import pyplot as plt


%matplotlib inline

In [2]:
SMOKE_TEST = os.environ.get("SMOKE_TEST")

## Defining the TorchX App

Our goal is to optimize the PyTorch Lightning training job defined in
[mnist_train_nas.py](https://github.com/pytorch/tutorials/tree/master/intermediate_source/mnist_train_nas.py)_.
To do this using TorchX, we write a helper function that takes in
the values of the architcture and hyperparameters of the training
job and creates a [TorchX AppDef](https://pytorch.org/torchx/latest/basics.html)_
with the appropriate settings.



In [3]:
if SMOKE_TEST:
    epochs = 3
else:
    epochs = 10

In [4]:
def trainer(
    log_path: str,
    hidden_size_1: int,
    hidden_size_2: int,
    learning_rate: float,
    dropout: float,
    trial_idx: int = -1,
) -> specs.AppDef:

    # define the log path so we can pass it to the TorchX AppDef
    if trial_idx >= 0:
        log_path = Path(log_path).joinpath(str(trial_idx)).absolute().as_posix()

    batch_size = 32

    return utils.python(
        # command line args to the training script
        "--log_path",
        log_path,
        "--hidden_size_1",
        str(hidden_size_1),
        "--hidden_size_2",
        str(hidden_size_2),
        "--learning_rate",
        str(learning_rate),
        "--epochs",
        str(epochs),
        "--dropout",
        str(dropout),
        "--batch_size",
        str(batch_size),
        # other config options
        name="trainer",
        script="tutorials/early_stopping/mnist_train_nas.py",
        image=torchx.version.TORCHX_IMAGE,
    )

## Setting up the Runner

Ax’s [Runner](https://ax.dev/api/core.html#ax.core.runner.Runner)
abstraction allows writing interfaces to various backends.
Ax already comes with Runner for TorchX, so we just need to
configure it. For the purpose of this tutorial, we run jobs locally
in a fully asynchronous fashion. In order to launch them on a cluster, you can instead specify a
different TorchX scheduler and adjust the configuration appropriately.
For example, if you have a Kubernetes cluster, you just need to change the
scheduler from ``local_cwd`` to ``kubernetes``.

The training job launched by this runner will log partial results to Tensorboard, which will then be monitored by the early stopping strategy. We will show how this is done using an Ax 
[TensorboardMetric](https://ax.dev/api/metrics.html#module-ax.metrics.tensorboard) below.

In [5]:
# Make a temporary dir to log our results into
log_dir = tempfile.mkdtemp()

ax_runner = TorchXRunner(
    tracker_base="/tmp/",
    component=trainer,
    # NOTE: To launch this job on a cluster instead of locally you can
    # specify a different scheduler and adjust args appropriately.
    scheduler="local_cwd",
    component_const_params={"log_path": log_dir},
    cfg={},
)

## Setting up the SearchSpace

First, we define our search space. Ax supports both range parameters
of type integer and float as well as choice parameters which can have
non-numerical types such as strings.
We will tune the hidden sizes, learning rate, and dropout parameters.

In [6]:
parameters = [
    # NOTE: In a real-world setting, hidden_size_1 and hidden_size_2
    # should probably be powers of 2, but in our simple example this
    # would mean that num_params can't take on that many values, which
    # in turn makes the Pareto frontier look pretty weird.
    RangeParameter(
        name="hidden_size_1",
        lower=16,
        upper=128,
        parameter_type=ParameterType.INT,
        log_scale=True,
    ),
    RangeParameter(
        name="hidden_size_2",
        lower=16,
        upper=128,
        parameter_type=ParameterType.INT,
        log_scale=True,
    ),
    RangeParameter(
        name="learning_rate",
        lower=1e-4,
        upper=1e-2,
        parameter_type=ParameterType.FLOAT,
        log_scale=True,
    ),
    RangeParameter(
        name="dropout",
        lower=0.0,
        upper=0.5,
        parameter_type=ParameterType.FLOAT,
    ),
]

search_space = SearchSpace(
    parameters=parameters,
    # NOTE: In practice, it may make sense to add a constraint
    # hidden_size_2 <= hidden_size_1
    parameter_constraints=[],
)

## Setting up Metrics

Ax has the concept of a Metric that defines properties of outcomes and how observations are obtained for these outcomes. This allows e.g. encodig how data is fetched from some distributed execution backend and post-processed before being passed as input to Ax.

We will optimize the validation accuracy, which is a `TensorboardMetric` that points to the logging directory assigned above. Note that we have set `is_available_while_running`, allowing for the metric to be queried as the trial progresses. This is critical for the early stopping strategy to monitor partial results.

In [7]:
class MyTensorboardMetric(TensorboardMetric):

    # NOTE: We need to tell the new Tensorboard metric how to get the id /
    # file handle for the tensorboard logs from a trial. In this case
    # our convention is to just save a separate file per trial in
    # the pre-specified log dir.
    def _get_event_multiplexer_for_trial(self, trial):
        mul = event_multiplexer.EventMultiplexer(max_reload_threads=20)
        mul.AddRunsFromDirectory(Path(log_dir).joinpath(str(trial.index)).as_posix(), None)
        mul.Reload()

        return mul

    # This indicates whether the metric is queryable while the trial is
    # still running. This is required for early stopping to monitor the
    # progress of the running trial.ArithmeticError
    @classmethod
    def is_available_while_running(cls):
        return True

In [8]:
val_acc = MyTensorboardMetric(
    name="val_acc",
    tag="val_acc",
    lower_is_better=False,
)

## Setting up the OptimizationConfig

The `OptimizationConfig` specifies the objective for Ax to optimize.

In [9]:
opt_config = OptimizationConfig(
    objective=Objective(
        metric=val_acc,
        minimize=False,
    )
)

## Defining an Early Stopping Strategy

A `PercentileEarlyStoppingStrategy` is a simple method that stops a trial if its performance falls below a certain percentile of other trials at the same step (e.g., when `percentile_threshold` is 50, at a given point in time, if a trial ranks in the bottom 50% of trials, it is stopped). 
- We make use of `normalize_progressions` which normalizes the progression column (e.g. timestamp, epochs, training data used) to be in [0, 1]. This is useful because one doesn't need to know the maximum progression values of the curve (which might be, e.g., the total number of data points in the training dataset).
- The `min_progression` parameter specifies that trials should only be considered for stopping if the latest progression value is greater than this threshold.
- The `min_curves` parameter specifies the minimum number of completed curves (i.e., fully completed training jobs) before early stopping will be considered. This should be larger than zero if `normalize_progression` is used. In general, we want a few completed curves to have a baseline for comparison.

Note that `PercentileEarlyStoppingStrategy` does not make use of learning curve modeling or prediction. More sophisticated model-based methods will be available in future versions of Ax.

In [10]:
percentile_early_stopping_strategy = PercentileEarlyStoppingStrategy(
    # stop if in bottom 70% of runs at the same progression
    percentile_threshold=70,
    # the trial must have passed `min_progression` steps before early stopping is initiated
    # note that we are using `normalize_progressions`, so this is on a scale of [0, 1]
    min_progression=0.3,
    # there must be `min_curves` completed trials and `min_curves` trials reporting data in
    # order for early stopping to be applicable
    min_curves=5,
    # specify, e.g., [0, 1] if the first two trials should never be stopped
    trial_indices_to_ignore=None,
    # check for new data every 10 seconds
    seconds_between_polls=10,
    normalize_progressions=True,
)

## Creating the Ax Experiment

In Ax, the Experiment object is the object that stores all the information about the problem setup.

In [11]:
experiment = Experiment(
    name="torchx_mnist",
    search_space=search_space,
    optimization_config=opt_config,
    runner=ax_runner,
)

## Choosing the GenerationStrategy

A [GenerationStrategy](https://ax.dev/api/modelbridge.html#ax.modelbridge.generation_strategy.GenerationStrategy)
is the abstract representation of how we would like to perform the
optimization. While this can be customized (if you’d like to do so, see
[this tutorial](https://ax.dev/tutorials/generation_strategy.html)),
in most cases Ax can automatically determine an appropriate strategy
based on the search space, optimization config, and the total number
of trials we want to run.

Typically, Ax chooses to evaluate a number of random configurations
before starting a model-based Bayesian Optimization strategy.

We remark that in Ax, generation strategies and early stopping strategies are separate, a design decision motivated by ease-of-use. However, we should acknowledge that jointly considering generation and stopping using a single strategy would likely be the "proper" formulation.

In [12]:
if SMOKE_TEST:
    total_trials = 6
else:
    total_trials = 15  # total evaluation budget

gs = choose_generation_strategy(
    search_space=experiment.search_space,
    optimization_config=experiment.optimization_config,
    num_trials=total_trials,
)

[INFO 09-30 16:59:48] ax.modelbridge.dispatch_utils: Using Models.BOTORCH_MODULAR since there is at least one ordered parameter and there are no unordered categorical parameters.


[INFO 09-30 16:59:48] ax.modelbridge.dispatch_utils: Calculating the number of remaining initialization trials based on num_initialization_trials=None max_initialization_trials=None num_tunable_parameters=4 num_trials=15 use_batch_trials=False


[INFO 09-30 16:59:48] ax.modelbridge.dispatch_utils: calculated num_initialization_trials=5


[INFO 09-30 16:59:48] ax.modelbridge.dispatch_utils: num_completed_initialization_trials=0 num_remaining_initialization_trials=5


[INFO 09-30 16:59:48] ax.modelbridge.dispatch_utils: `verbose`, `disable_progbar`, and `jit_compile` are not yet supported when using `choose_generation_strategy` with ModularBoTorchModel, dropping these arguments.


[INFO 09-30 16:59:48] ax.modelbridge.dispatch_utils: Using Bayesian Optimization generation strategy: GenerationStrategy(name='Sobol+BoTorch', steps=[Sobol for 5 trials, BoTorch for subsequent trials]). Iterations after 5 will take longer to generate due to model-fitting.


## Configuring the Scheduler

The `Scheduler` acts as the loop control for the optimization.
It communicates with the backend to launch trials, check their status, retrieve (partial) results, and importantly for this tutorial, calls the early stopping strategy. If the early stopping strategy suggests a trial to be the stopped, the `Scheduler` communicates with the backend to terminate the trial.

The ``Scheduler`` requires the ``Experiment`` and the ``GenerationStrategy``.
A set of options can be passed in via ``SchedulerOptions``. Here, we
configure the number of total evaluations as well as ``max_pending_trials``,
the maximum number of trials that should run concurrently. In our
local setting, this is the number of training jobs running as individual
processes, while in a remote execution setting, this would be the number
of machines you want to use in parallel.


In [13]:
scheduler = Scheduler(
    experiment=experiment,
    generation_strategy=gs,
    options=SchedulerOptions(
        total_trials=total_trials,
        max_pending_trials=5,
        early_stopping_strategy=percentile_early_stopping_strategy,
    ),
)

[INFO 09-30 16:59:48] Scheduler: `Scheduler` requires experiment to have immutable search space and optimization config. Setting property immutable_search_space_and_opt_config to `True` on experiment.


<span id="papermill-error-cell" style="color:red; font-family:Helvetica Neue, Helvetica, Arial, sans-serif; font-size:2em;">Execution using papermill encountered an exception here and stopped:</span>

In [14]:
%%time
scheduler.run_all_trials()

[INFO 09-30 16:59:48] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 16:59:48] ax.early_stopping.strategies.base: PercentileEarlyStoppingStrategy received empty data. Not stopping any trials.


  warn("Encountered exception in computing model fit quality: " + str(e))
[INFO 09-30 16:59:48] Scheduler: Running trials [0]...


  warn("Encountered exception in computing model fit quality: " + str(e))
[INFO 09-30 16:59:49] Scheduler: Running trials [1]...


  warn("Encountered exception in computing model fit quality: " + str(e))
[INFO 09-30 16:59:50] Scheduler: Running trials [2]...


  warn("Encountered exception in computing model fit quality: " + str(e))
[INFO 09-30 16:59:51] Scheduler: Running trials [3]...


  warn("Encountered exception in computing model fit quality: " + str(e))
[INFO 09-30 16:59:52] Scheduler: Running trials [4]...




[INFO 09-30 16:59:53] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 16:59:53] Scheduler: Fetching data for trials: 0 - 4 because some metrics on experiment are available while trials are running.


[INFO 09-30 16:59:53] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd01850>")


[INFO 09-30 16:59:53] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd01850>")


[INFO 09-30 16:59:53] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd01850>")


[INFO 09-30 16:59:53] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd01850>")


[INFO 09-30 16:59:53] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd01850>")


[ERROR 09-30 16:59:53] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd01850>"). Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 16:59:53] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd01850>"). Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 16:59:53] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd01850>"). Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 16:59:53] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd01850>"). Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 16:59:53] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd01850>"). Ignoring for now -- will retry query on next call to fetch.




[INFO 09-30 16:59:53] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 0 is still RUNNING continuing the experiment and retrying on next poll...




[INFO 09-30 16:59:53] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 1 is still RUNNING continuing the experiment and retrying on next poll...




[INFO 09-30 16:59:53] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 2 is still RUNNING continuing the experiment and retrying on next poll...




[INFO 09-30 16:59:53] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 3 is still RUNNING continuing the experiment and retrying on next poll...




[INFO 09-30 16:59:53] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...


[INFO 09-30 16:59:53] ax.early_stopping.strategies.base: PercentileEarlyStoppingStrategy received empty data. Not stopping any trials.


[INFO 09-30 16:59:53] Scheduler: Waiting for completed trials (for 10 sec, currently running trials: 5).


[INFO 09-30 17:00:03] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 17:00:03] Scheduler: Fetching data for trials: [0, 1, 3, 4] because some metrics on experiment are available while trials are running.


[INFO 09-30 17:00:03] Scheduler: Retrieved FAILED trials: [2].


[INFO 09-30 17:00:03] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd44fe0>")


[INFO 09-30 17:00:03] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30f74d460>")


[INFO 09-30 17:00:03] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30f74d460>")


[INFO 09-30 17:00:03] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30f74d460>")


[ERROR 09-30 17:00:03] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd44fe0>"). Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:00:03] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30f74d460>"). Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:00:03] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30f74d460>"). Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:00:03] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30f74d460>"). Ignoring for now -- will retry query on next call to fetch.




[INFO 09-30 17:00:03] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 0 is still RUNNING continuing the experiment and retrying on next poll...




[INFO 09-30 17:00:03] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 1 is still RUNNING continuing the experiment and retrying on next poll...




[INFO 09-30 17:00:03] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 3 is still RUNNING continuing the experiment and retrying on next poll...




[INFO 09-30 17:00:03] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...


[INFO 09-30 17:00:03] ax.early_stopping.strategies.base: PercentileEarlyStoppingStrategy received empty data. Not stopping any trials.


  warn("Encountered exception in computing model fit quality: " + str(e))
[INFO 09-30 17:00:03] Scheduler: Running trials [5]...




[INFO 09-30 17:00:04] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 17:00:04] Scheduler: Fetching data for trials: [0, 1, 3, 4, 5] because some metrics on experiment are available while trials are running.


[INFO 09-30 17:00:04] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd7b680>")


[INFO 09-30 17:00:04] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd7b680>")


[INFO 09-30 17:00:04] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd7b680>")


[INFO 09-30 17:00:04] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd7b680>")


[INFO 09-30 17:00:04] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x12350de80>")


[ERROR 09-30 17:00:04] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd7b680>"). Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:00:04] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd7b680>"). Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:00:04] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd7b680>"). Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:00:04] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd7b680>"). Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:00:04] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x12350de80>"). Ignoring for now -- will retry query on next call to fetch.




[INFO 09-30 17:00:04] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 0 is still RUNNING continuing the experiment and retrying on next poll...




[INFO 09-30 17:00:04] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 1 is still RUNNING continuing the experiment and retrying on next poll...




[INFO 09-30 17:00:04] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 3 is still RUNNING continuing the experiment and retrying on next poll...




[INFO 09-30 17:00:04] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...




[INFO 09-30 17:00:04] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 5 is still RUNNING continuing the experiment and retrying on next poll...


[INFO 09-30 17:00:04] ax.early_stopping.strategies.base: PercentileEarlyStoppingStrategy received empty data. Not stopping any trials.


[INFO 09-30 17:00:04] Scheduler: Waiting for completed trials (for 10 sec, currently running trials: 5).


[INFO 09-30 17:00:14] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 17:00:14] Scheduler: Fetching data for trials: [0, 1, 3, 4, 5] because some metrics on experiment are available while trials are running.


[INFO 09-30 17:00:14] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd46060>")


[INFO 09-30 17:00:14] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fc7eea0>")


[INFO 09-30 17:00:14] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fc7eea0>")


[INFO 09-30 17:00:14] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fc7eea0>")


[INFO 09-30 17:00:14] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fc7eea0>")


[ERROR 09-30 17:00:14] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd46060>"). Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:00:14] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fc7eea0>"). Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:00:14] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fc7eea0>"). Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:00:14] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fc7eea0>"). Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:00:14] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fc7eea0>"). Ignoring for now -- will retry query on next call to fetch.




[INFO 09-30 17:00:14] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 0 is still RUNNING continuing the experiment and retrying on next poll...




[INFO 09-30 17:00:14] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 1 is still RUNNING continuing the experiment and retrying on next poll...




[INFO 09-30 17:00:14] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 3 is still RUNNING continuing the experiment and retrying on next poll...




[INFO 09-30 17:00:14] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...




[INFO 09-30 17:00:14] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 5 is still RUNNING continuing the experiment and retrying on next poll...


[INFO 09-30 17:00:14] ax.early_stopping.strategies.base: PercentileEarlyStoppingStrategy received empty data. Not stopping any trials.


[INFO 09-30 17:00:14] Scheduler: Waiting for completed trials (for 10 sec, currently running trials: 5).


[INFO 09-30 17:00:24] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 17:00:24] Scheduler: Fetching data for trials: [0, 1, 3, 4, 5] because some metrics on experiment are available while trials are running.


[INFO 09-30 17:00:25] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x12350d7c0>")


[INFO 09-30 17:00:25] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd474d0>")


[INFO 09-30 17:00:25] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd474d0>")


[INFO 09-30 17:00:25] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fa47230>")


[INFO 09-30 17:00:25] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fa47230>")


[ERROR 09-30 17:00:25] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x12350d7c0>"). Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:00:25] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd474d0>"). Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:00:25] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd474d0>"). Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:00:25] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fa47230>"). Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:00:25] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fa47230>"). Ignoring for now -- will retry query on next call to fetch.




[INFO 09-30 17:00:25] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 0 is still RUNNING continuing the experiment and retrying on next poll...




[INFO 09-30 17:00:25] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 1 is still RUNNING continuing the experiment and retrying on next poll...




[INFO 09-30 17:00:25] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 3 is still RUNNING continuing the experiment and retrying on next poll...




[INFO 09-30 17:00:25] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...




[INFO 09-30 17:00:25] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 5 is still RUNNING continuing the experiment and retrying on next poll...


[INFO 09-30 17:00:25] ax.early_stopping.strategies.base: PercentileEarlyStoppingStrategy received empty data. Not stopping any trials.


[INFO 09-30 17:00:25] Scheduler: Waiting for completed trials (for 10 sec, currently running trials: 5).


[INFO 09-30 17:00:35] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 17:00:35] Scheduler: Fetching data for trials: [0, 1, 3, 4, 5] because some metrics on experiment are available while trials are running.


[INFO 09-30 17:00:35] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:00:35] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:00:35] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd46ff0>")


[ERROR 09-30 17:00:35] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:00:35] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:00:35] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fd46ff0>"). Ignoring for now -- will retry query on next call to fetch.


  df = pd.concat(
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:00:35] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 0 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:00:35] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...




[INFO 09-30 17:00:35] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 5 is still RUNNING continuing the experiment and retrying on next poll...


  df = pd.concat(
[INFO 09-30 17:00:35] ax.early_stopping.strategies.base: The number of completed trials (0) is less than the minimum number of curves needed for early stopping (5). Not early stopping.


[INFO 09-30 17:00:35] Scheduler: Waiting for completed trials (for 10 sec, currently running trials: 5).


[INFO 09-30 17:00:45] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 17:00:45] Scheduler: Fetching data for trials: [0, 1, 3, 4, 5] because some metrics on experiment are available while trials are running.


[INFO 09-30 17:00:45] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:00:45] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[ERROR 09-30 17:00:45] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:00:45] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


  df = pd.concat(
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:00:45] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 0 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:00:45] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...


  df = pd.concat(
[INFO 09-30 17:00:45] ax.early_stopping.strategies.base: The number of completed trials (0) is less than the minimum number of curves needed for early stopping (5). Not early stopping.


[INFO 09-30 17:00:45] Scheduler: Waiting for completed trials (for 10 sec, currently running trials: 5).


[INFO 09-30 17:00:55] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 17:00:55] Scheduler: Fetching data for trials: [0, 1, 3, 4, 5] because some metrics on experiment are available while trials are running.


[INFO 09-30 17:00:55] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:00:55] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[ERROR 09-30 17:00:55] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:00:55] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


  df = pd.concat(
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:00:55] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 0 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:00:55] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...


  df = pd.concat(
[INFO 09-30 17:00:55] ax.early_stopping.strategies.base: The number of completed trials (0) is less than the minimum number of curves needed for early stopping (5). Not early stopping.


[INFO 09-30 17:00:55] Scheduler: Waiting for completed trials (for 10 sec, currently running trials: 5).


[INFO 09-30 17:01:05] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 17:01:05] Scheduler: Fetching data for trials: [0, 1, 3, 4, 5] because some metrics on experiment are available while trials are running.


[INFO 09-30 17:01:05] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:01:05] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[ERROR 09-30 17:01:05] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:01:05] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


  df = pd.concat(
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:01:05] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 0 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:01:05] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...


  df = pd.concat(
[INFO 09-30 17:01:05] ax.early_stopping.strategies.base: The number of completed trials (0) is less than the minimum number of curves needed for early stopping (5). Not early stopping.


[INFO 09-30 17:01:05] Scheduler: Waiting for completed trials (for 10 sec, currently running trials: 5).


[INFO 09-30 17:01:15] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 17:01:15] Scheduler: Fetching data for trials: [0, 1, 3, 4, 5] because some metrics on experiment are available while trials are running.


[INFO 09-30 17:01:15] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:01:15] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[ERROR 09-30 17:01:15] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:01:15] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


  df = pd.concat(
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:01:15] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 0 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:01:15] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...


  df = pd.concat(
[INFO 09-30 17:01:15] ax.early_stopping.strategies.base: The number of completed trials (0) is less than the minimum number of curves needed for early stopping (5). Not early stopping.


[INFO 09-30 17:01:15] Scheduler: Waiting for completed trials (for 10 sec, currently running trials: 5).


[INFO 09-30 17:01:25] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 17:01:25] Scheduler: Fetching data for trials: [0, 1, 3, 4, 5] because some metrics on experiment are available while trials are running.


[INFO 09-30 17:01:25] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:01:25] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[ERROR 09-30 17:01:25] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:01:25] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


  df = pd.concat(
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:01:25] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 0 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:01:25] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...


  df = pd.concat(
[INFO 09-30 17:01:25] ax.early_stopping.strategies.base: The number of completed trials (0) is less than the minimum number of curves needed for early stopping (5). Not early stopping.


[INFO 09-30 17:01:25] Scheduler: Waiting for completed trials (for 10 sec, currently running trials: 5).


[INFO 09-30 17:01:35] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 17:01:35] Scheduler: Fetching data for trials: [0, 1, 3, 4, 5] because some metrics on experiment are available while trials are running.


[INFO 09-30 17:01:35] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:01:35] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:01:36] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[ERROR 09-30 17:01:36] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:01:36] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:01:36] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


  df = pd.concat(
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:01:36] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 0 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:01:36] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 1 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:01:36] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...


  df = pd.concat(
[INFO 09-30 17:01:36] ax.early_stopping.strategies.base: The number of completed trials (0) is less than the minimum number of curves needed for early stopping (5). Not early stopping.


[INFO 09-30 17:01:36] Scheduler: Waiting for completed trials (for 10 sec, currently running trials: 5).


[INFO 09-30 17:01:46] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 17:01:46] Scheduler: Fetching data for trials: [0, 1, 3, 4, 5] because some metrics on experiment are available while trials are running.


[INFO 09-30 17:01:46] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:01:46] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:01:46] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[ERROR 09-30 17:01:46] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:01:46] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:01:46] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


  df = pd.concat(
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:01:46] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 0 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:01:46] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 1 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:01:46] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...


  df = pd.concat(
[INFO 09-30 17:01:46] ax.early_stopping.strategies.base: The number of completed trials (0) is less than the minimum number of curves needed for early stopping (5). Not early stopping.


[INFO 09-30 17:01:46] Scheduler: Waiting for completed trials (for 10 sec, currently running trials: 5).


[INFO 09-30 17:01:56] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 17:01:56] Scheduler: Fetching data for trials: [0, 1, 3, 4, 5] because some metrics on experiment are available while trials are running.


[INFO 09-30 17:01:56] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:01:56] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:01:56] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[ERROR 09-30 17:01:56] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:01:56] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:01:56] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


  df = pd.concat(
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:01:56] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 0 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:01:56] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 1 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:01:56] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...


  df = pd.concat(
[INFO 09-30 17:01:56] ax.early_stopping.strategies.base: The number of completed trials (0) is less than the minimum number of curves needed for early stopping (5). Not early stopping.


[INFO 09-30 17:01:56] Scheduler: Waiting for completed trials (for 10 sec, currently running trials: 5).


[INFO 09-30 17:02:06] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 17:02:06] Scheduler: Fetching data for trials: [0, 1, 3, 4, 5] because some metrics on experiment are available while trials are running.


[INFO 09-30 17:02:06] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:02:06] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:02:06] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[ERROR 09-30 17:02:06] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:02:06] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:02:06] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


  df = pd.concat(
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:02:06] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 0 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:02:06] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 1 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:02:06] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...


  df = pd.concat(
[INFO 09-30 17:02:06] ax.early_stopping.strategies.base: The number of completed trials (0) is less than the minimum number of curves needed for early stopping (5). Not early stopping.


[INFO 09-30 17:02:06] Scheduler: Waiting for completed trials (for 10 sec, currently running trials: 5).


[INFO 09-30 17:02:16] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 17:02:16] Scheduler: Fetching data for trials: [0, 1, 3, 4, 5] because some metrics on experiment are available while trials are running.


[INFO 09-30 17:02:16] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:02:16] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:02:16] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[ERROR 09-30 17:02:16] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:02:16] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:02:16] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


  df = pd.concat(
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:02:16] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 0 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:02:16] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 1 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:02:16] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...


  df = pd.concat(
[INFO 09-30 17:02:16] ax.early_stopping.strategies.base: The number of completed trials (0) is less than the minimum number of curves needed for early stopping (5). Not early stopping.


[INFO 09-30 17:02:16] Scheduler: Waiting for completed trials (for 10 sec, currently running trials: 5).


[INFO 09-30 17:02:26] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 17:02:26] Scheduler: Fetching data for trials: [0, 1, 3, 4, 5] because some metrics on experiment are available while trials are running.


[INFO 09-30 17:02:26] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:02:26] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:02:26] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[ERROR 09-30 17:02:26] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:02:26] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:02:26] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


  df = pd.concat(
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:02:26] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 0 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:02:26] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 1 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:02:26] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...


  df = pd.concat(
[INFO 09-30 17:02:26] ax.early_stopping.strategies.base: The number of completed trials (0) is less than the minimum number of curves needed for early stopping (5). Not early stopping.


[INFO 09-30 17:02:26] Scheduler: Waiting for completed trials (for 10 sec, currently running trials: 5).


[INFO 09-30 17:02:36] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 17:02:36] Scheduler: Fetching data for trials: [0, 1, 3, 4, 5] because some metrics on experiment are available while trials are running.


[INFO 09-30 17:02:36] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:02:37] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:02:37] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[ERROR 09-30 17:02:37] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:02:37] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:02:37] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


  df = pd.concat(
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:02:37] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 0 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:02:37] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 1 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:02:37] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...


  df = pd.concat(
[INFO 09-30 17:02:37] ax.early_stopping.strategies.base: The number of completed trials (0) is less than the minimum number of curves needed for early stopping (5). Not early stopping.


[INFO 09-30 17:02:37] Scheduler: Waiting for completed trials (for 10 sec, currently running trials: 5).


[INFO 09-30 17:02:47] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 17:02:47] Scheduler: Fetching data for trials: [0, 1, 3, 4, 5] because some metrics on experiment are available while trials are running.


[INFO 09-30 17:02:47] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:02:47] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:02:47] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[ERROR 09-30 17:02:47] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:02:47] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:02:47] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


  df = pd.concat(
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:02:47] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 0 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:02:47] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 1 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:02:47] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...


  df = pd.concat(
[INFO 09-30 17:02:47] ax.early_stopping.strategies.base: The number of completed trials (0) is less than the minimum number of curves needed for early stopping (5). Not early stopping.


[INFO 09-30 17:02:47] Scheduler: Waiting for completed trials (for 10 sec, currently running trials: 5).


[INFO 09-30 17:02:57] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 17:02:57] Scheduler: Fetching data for trials: [0, 1, 3, 4, 5] because some metrics on experiment are available while trials are running.


[INFO 09-30 17:02:57] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:02:57] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:02:57] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[ERROR 09-30 17:02:57] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:02:57] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:02:57] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


  df = pd.concat(
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:02:57] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 0 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:02:57] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 1 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:02:57] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...


  df = pd.concat(
[INFO 09-30 17:02:57] ax.early_stopping.strategies.base: The number of completed trials (0) is less than the minimum number of curves needed for early stopping (5). Not early stopping.


[INFO 09-30 17:02:57] Scheduler: Waiting for completed trials (for 10 sec, currently running trials: 5).


[INFO 09-30 17:03:07] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 17:03:07] Scheduler: Fetching data for trials: [0, 1, 3, 4, 5] because some metrics on experiment are available while trials are running.


[INFO 09-30 17:03:07] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:03:07] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:03:07] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[ERROR 09-30 17:03:07] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:03:07] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:03:07] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


  df = pd.concat(
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:03:07] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 0 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:03:07] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 1 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:03:07] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...


  df = pd.concat(
[INFO 09-30 17:03:07] ax.early_stopping.strategies.base: The number of completed trials (0) is less than the minimum number of curves needed for early stopping (5). Not early stopping.


[INFO 09-30 17:03:07] Scheduler: Waiting for completed trials (for 10 sec, currently running trials: 5).


[INFO 09-30 17:03:17] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 17:03:17] Scheduler: Fetching data for trials: [0, 1, 3, 4, 5] because some metrics on experiment are available while trials are running.


[INFO 09-30 17:03:17] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:03:17] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:03:17] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[ERROR 09-30 17:03:17] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:03:17] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:03:17] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


  df = pd.concat(
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:03:17] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 0 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:03:17] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 1 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:03:17] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...


  df = pd.concat(
[INFO 09-30 17:03:17] ax.early_stopping.strategies.base: The number of completed trials (0) is less than the minimum number of curves needed for early stopping (5). Not early stopping.


[INFO 09-30 17:03:17] Scheduler: Waiting for completed trials (for 10 sec, currently running trials: 5).


[INFO 09-30 17:03:27] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 17:03:27] Scheduler: Fetching data for trials: [0, 1, 3, 4, 5] because some metrics on experiment are available while trials are running.


[INFO 09-30 17:03:27] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:03:27] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:03:27] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[ERROR 09-30 17:03:27] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:03:27] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:03:27] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


  df = pd.concat(
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:03:27] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 0 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:03:27] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 1 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:03:27] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...


  df = pd.concat(
[INFO 09-30 17:03:27] ax.early_stopping.strategies.base: The number of completed trials (0) is less than the minimum number of curves needed for early stopping (5). Not early stopping.


[INFO 09-30 17:03:27] Scheduler: Waiting for completed trials (for 10 sec, currently running trials: 5).


[INFO 09-30 17:03:37] Scheduler: Fetching data for newly completed trials: [1, 3].


[INFO 09-30 17:03:37] Scheduler: Fetching data for trials: [0, 4, 5] because some metrics on experiment are available while trials are running.


[INFO 09-30 17:03:37] Scheduler: Retrieved COMPLETED trials: [1, 3].


[INFO 09-30 17:03:37] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:03:37] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:03:37] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[ERROR 09-30 17:03:37] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:03:37] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:03:37] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


  df = pd.concat(
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:03:37] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 0 is still RUNNING continuing the experiment and retrying on next poll...


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.




with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:03:37] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...


  df = pd.concat(
[INFO 09-30 17:03:37] ax.early_stopping.strategies.base: The number of completed trials (1) is less than the minimum number of curves needed for early stopping (5). Not early stopping.


  df = pd.concat(
  warn("Encountered exception in computing model fit quality: " + str(e))
[INFO 09-30 17:03:38] Scheduler: Running trials [6]...


[INFO 09-30 17:03:39] Scheduler: Generated all trials that can be generated currently. Model requires more data to generate more trials.




[INFO 09-30 17:03:39] Scheduler: Fetching data for newly completed trials: [0].


[INFO 09-30 17:03:39] Scheduler: Fetching data for trials: 4 - 6 because some metrics on experiment are available while trials are running.


[INFO 09-30 17:03:39] Scheduler: Retrieved COMPLETED trials: [0].


[INFO 09-30 17:03:39] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:03:39] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:03:39] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x171d56cc0>")


[ERROR 09-30 17:03:39] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:03:39] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:03:39] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x171d56cc0>"). Ignoring for now -- will retry query on next call to fetch.


  df = pd.concat(
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.




with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:03:39] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...




[INFO 09-30 17:03:39] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 6 is still RUNNING continuing the experiment and retrying on next poll...


  df = pd.concat(
[INFO 09-30 17:03:39] ax.early_stopping.strategies.base: The number of completed trials (1) is less than the minimum number of curves needed for early stopping (5). Not early stopping.


  df = pd.concat(
  warn("Encountered exception in computing model fit quality: " + str(e))
[INFO 09-30 17:03:39] Scheduler: Running trials [7]...


[INFO 09-30 17:03:39] Scheduler: Generated all trials that can be generated currently. Model requires more data to generate more trials.




[INFO 09-30 17:03:39] Scheduler: Fetching data for newly completed trials: [].


[INFO 09-30 17:03:39] Scheduler: Fetching data for trials: 4 - 7 because some metrics on experiment are available while trials are running.


[INFO 09-30 17:03:39] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:03:39] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x171d3a480>")


[INFO 09-30 17:03:39] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x171d3a480>")


[ERROR 09-30 17:03:39] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:03:39] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x171d3a480>"). Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:03:39] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x171d3a480>"). Ignoring for now -- will retry query on next call to fetch.


  df = pd.concat(
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.


[INFO 09-30 17:03:39] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 4 is still RUNNING continuing the experiment and retrying on next poll...




[INFO 09-30 17:03:39] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 6 is still RUNNING continuing the experiment and retrying on next poll...




[INFO 09-30 17:03:39] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 7 is still RUNNING continuing the experiment and retrying on next poll...


  df = pd.concat(
[INFO 09-30 17:03:39] ax.early_stopping.strategies.base: The number of completed trials (1) is less than the minimum number of curves needed for early stopping (5). Not early stopping.


[INFO 09-30 17:03:39] Scheduler: Waiting for completed trials (for 10 sec, currently running trials: 4).


[INFO 09-30 17:03:49] Scheduler: Fetching data for newly completed trials: 4 - 5.


[INFO 09-30 17:03:49] Scheduler: Fetching data for trials: 6 - 7 because some metrics on experiment are available while trials are running.


[INFO 09-30 17:03:49] Scheduler: Retrieved COMPLETED trials: 4 - 5.


[INFO 09-30 17:03:49] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:03:49] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data



[INFO 09-30 17:03:49] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fa46cc0>")


[INFO 09-30 17:03:49] ax.core.metric: MetricFetchE INFO: Initialized MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fa46cc0>")


[ERROR 09-30 17:03:49] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:03:49] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="Failed to fetch data for val_acc", exception=Found NaNs or Infs in data)
with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
. Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:03:49] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fa46cc0>"). Ignoring for now -- will retry query on next call to fetch.


[ERROR 09-30 17:03:49] ax.core.experiment: Discovered Metric fetching Err while attaching data MetricFetchE(message="No 'scalar' data found for trial in multiplexer mul=<tensorboard.backend.event_processing.plugin_event_multiplexer.EventMultiplexer object at 0x30fa46cc0>"). Ignoring for now -- will retry query on next call to fetch.


with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.




with Traceback:
 Traceback (most recent call last):
  File "/Users/cristianlara/Projects/Ax-1.0/ax/metrics/tensorboard.py", line 173, in bulk_fetch_trial_data
    raise ValueError("Found NaNs or Infs in data")
ValueError: Found NaNs or Infs in data
.






[INFO 09-30 17:03:49] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 6 is still RUNNING continuing the experiment and retrying on next poll...




[INFO 09-30 17:03:49] Scheduler: MetricFetchE INFO: Because val_acc is available_while_running and trial 7 is still RUNNING continuing the experiment and retrying on next poll...


  df = pd.concat(
[INFO 09-30 17:03:49] ax.early_stopping.strategies.base: The number of completed trials (1) is less than the minimum number of curves needed for early stopping (5). Not early stopping.




FailureRateExceededError: Failure rate exceeds the tolerated trial failure rate of 0.5 (at least 5 out of first 6 trials failed or were abandoned). Checks are triggered both at the end of a optimization and if at least 5 trials have either failed, or have been abandoned, potentially automatically due to issues with the trial.

## Results

First, we examine the data stored on the experiment. This shows that each trial is associated with an entire learning curve, represented by the column "steps".

In [15]:
experiment.lookup_data().map_df.head(n=10)

  df = pd.concat(


Unnamed: 0,arm_name,metric_name,mean,sem,trial_index,step
0,1_0,val_acc,-5.518561e+29,,1,1874.0
1,1_0,val_acc,3.6392360000000003e+31,,1,3749.0
2,1_0,val_acc,1.32168e+31,,1,5624.0
3,3_0,val_acc,-5.522221e+29,,3,1874.0
4,3_0,val_acc,-4.508694e+29,,3,3749.0
5,3_0,val_acc,-4.2754979999999995e+29,,3,5624.0
6,3_0,val_acc,-5.0481579999999995e+29,,3,7499.0
7,3_0,val_acc,-5.3328929999999994e+29,,3,9374.0
8,3_0,val_acc,-2.8559e+29,,3,11249.0
9,3_0,val_acc,-4.429684e+29,,3,13124.0


Below is a summary of the experiment, showing that a portion of trials have been early stopped.

In [16]:
exp_to_df(experiment)

  df = pd.concat(


Unnamed: 0,trial_index,arm_name,trial_status,generation_method,val_acc,hidden_size_1,hidden_size_2,learning_rate,dropout
0,0,0_0,FAILED,Sobol,,22,85,0.009309,0.212433
1,1,1_0,FAILED,Sobol,1.32168e+31,92,23,0.000652,0.397744
2,2,2_0,FAILED,Sobol,,67,59,0.0026,0.113261
3,3,3_0,COMPLETED,Sobol,-4.844701e+29,29,34,0.000182,0.309786
4,4,4_0,FAILED,Sobol,,42,56,0.000362,0.338057
5,5,5_0,FAILED,Sobol,2.957741e+30,47,36,0.004476,0.020423
6,6,6_0,RUNNING,Sobol,,100,118,0.00013,0.484715
7,7,7_0,RUNNING,Sobol,,20,17,0.001605,0.186138


We can give a very rough estimate of the amount of computational savings due to early stopping, by looking at the total number of steps used when early stopping is used versus the number of steps used if we ran all trials to completion. Note to do a true comparison, one should run full HPO loops with and without early stopping (as early stopping will influence the model and future points selected by the generation strategy). 

In [17]:
map_df = experiment.lookup_data().map_df
trial_to_max_steps = map_df.groupby("trial_index")["step"].max()
completed_trial_steps = trial_to_max_steps.iloc[0]
savings = 1.0 - trial_to_max_steps.sum() / (
    completed_trial_steps * len(trial_to_max_steps)
)
# TODO format nicer
print(f"A rough estimate of the computational savings is {100 * savings}%.")

A rough estimate of the computational savings is -144.4760550023708%.


  df = pd.concat(


## Visualizations

Finally, we show a visualization of learning curves versus actual elapsed wall time. This helps to illustrate that stopped trials make room for additional trials to be run.

In [18]:
# helper function for getting trial start times
def time_started(row):
    trial_index = row["trial_index"]
    return experiment.trials[trial_index].time_run_started


# helper function for getting trial completion times
def time_completed(row):
    trial_index = row["trial_index"]
    return experiment.trials[trial_index].time_completed


# helper function for getting relevant data from experiment
# with early stopping into useful dfs
def early_stopping_exp_to_df(experiment):
    trials_df = exp_to_df(experiment)
    curve_df = experiment.lookup_data().map_df
    training_row_df = (
        curve_df.groupby("trial_index").max().reset_index()[["trial_index", "steps"]]
    )
    trials_df = trials_df.merge(training_row_df, on="trial_index")
    trials_df["time_started"] = trials_df.apply(func=time_started, axis=1)
    trials_df["time_completed"] = trials_df.apply(func=time_completed, axis=1)
    start_time = trials_df["time_started"].min()
    trials_df["time_started_rel"] = (
        trials_df["time_started"] - start_time
    ).dt.total_seconds()
    trials_df["time_completed_rel"] = (
        trials_df["time_completed"] - start_time
    ).dt.total_seconds()
    return trials_df, curve_df


def plot_curves_by_wall_time(trials_df, curve_df):
    trials = set(curve_df["trial_index"])
    fig, ax = plt.subplots(1, 1, figsize=(10, 6))
    ax.set(xlabel="seconds since start", ylabel="validation accuracy")
    for trial_index in trials:
        this_trial_df = curve_df[curve_df["trial_index"] == trial_index]
        start_time_rel = trials_df["time_started_rel"].iloc[trial_index]
        completed_time_rel = trials_df["time_completed_rel"].iloc[trial_index]
        total_steps = trials_df.loc[trial_index, "steps"]
        smoothed_curve = this_trial_df["mean"].rolling(window=3).mean()
        x = (
            start_time_rel
            + (completed_time_rel - start_time_rel)
            / total_steps
            * this_trial_df["steps"]
        )
        ax.plot(
            x,
            smoothed_curve,
            label=f"trial #{trial_index}" if trial_index % 2 == 1 else None,
        )
    ax.legend()

In [19]:
# wrap in try/except in case of flaky I/O issues
try:
    trials_df, curve_df = early_stopping_exp_to_df(experiment)
    plot_curves_by_wall_time(trials_df, curve_df)
except Exception as e:
    print(f"Encountered exception while plotting results: {e}")

  df = pd.concat(


Encountered exception while plotting results: "['steps'] not in index"


  df = pd.concat(
