[tune] tune.choice, tune.qrandint not compatible with keras #11434

gregoruar · 2020-10-16T11:14:21Z

tune.choice, tune.qrandint not compatible with keras

I wanted to do hparam search based on tune functionalities. Unfortunately, usage of tune built-in methods did not work, for neurons it raised TypeError: int() argument must be a string, a bytes-like object or a number, not 'Integer', when I replaced function with simple int, debugger stumbled upon tune.choice next: TypeError: Expected float32, got <ray.tune.sample.Categorical object at 0x7fe7d1835a90> of type 'Categorical' instead.

config = {
    "learning_rate": tune.qloguniform(1e-4, 1e-1, 5e-5),
    "batch_size": tune.choice([32, 64, 128, 256]),
    "neurons1": tune.qrandint(32, 1024, 32),
    "neurons2": tune.qrandint(32, 1024, 32),
    "dropout": tune.choice([0.1, 0.2, 0.3,]),
}

The error reproduced on the script with mnist data, which I specially prepared and in the clean environment with newly installed ray. However, everything worked, when I launched hyperparameter hp for config creation:

from hyperopt import hp
config = {
    "learning_rate": hp.choice("learning_rate", [0.001, 0.0001]),
    "batch_size": tune.choice("batch_size", [32, 64, 128, 256]),
    "neurons1": hp.choice("neurons1", [32, 64]),
    "neurons2": hp.choice("neurons2", [32, 64]),
    "dropout": hp.choice("dropout", [0.1, 0.2, 0.3,]),
}

[mnist.txt](https://github.com/ray-project/ray/files/5391092/mnist.txt)

Here is the script:

import os

os.environ["CUDA_VISIBLE_DEVICES"] = "3,4,5"
os.environ["TF_XLA_FLAGS"] = "--tf_xla_cpu_global_jit"
# loglevel : 0 all printed, 1 I not printed, 2 I and W not printed, 3 nothing printed
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'

from tensorflow import keras

import ray
from ray import tune
from ray.tune.schedulers import AsyncHyperBandScheduler
from ray.tune.suggest.hyperopt import HyperOptSearch
from ray.tune.suggest import ConcurrencyLimiter

ray.init(configure_logging=False)

EPOCHS = 20
num_samples = 100
experiment_name = "test_1"

config = {
    "learning_rate": tune.qloguniform(1e-4, 1e-1, 5e-5),
    "batch_size": tune.choice([32, 64, 128, 256]),
    "neurons1": tune.qrandint(32, 1024, 32),
    "neurons2": tune.qrandint(32, 1024, 32),
    "dropout": tune.choice([0.1, 0.2, 0.3,]),
}

class TuneReporter(keras.callbacks.Callback):
    """Tune Callback for Keras."""
    def on_epoch_end(self, epoch, logs=None):
        tune.report(keras_info=logs, val_loss=logs['val_loss'], val_accuracy=logs["val_accuracy"])



def trainer(config):

  # Load MNIST dataset as NumPy arrays
  (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

  # Preprocess the data
  x_train = x_train.reshape(-1, 784).astype('float32') / 255
  x_test = x_test.reshape(-1, 784).astype('float32') / 255

  model = keras.Sequential([
    keras.layers.Dense(config["neurons1"], input_shape=(784,), activation='relu', name='dense_1'),
    keras.layers.Dropout(config['dropout']),
    keras.layers.Dense(config["neurons2"], activation='relu', name='dense_2'),
    keras.layers.Dense(10, activation='softmax', name='predictions'),
  ])

  model.compile(optimizer=optimizers.Adam(learning_rate = config['learning_rate']),
          loss=keras.losses.SparseCategoricalCrossentropy(),
          metrics=['accuracy'])

  earlystopping = keras.callbacks.EarlyStopping(monitor="val_loss", 
    patience=10,  
    min_delta=1e-4, 
    mode='min', 
    restore_best_weights=True, 
    verbose=1)

  tunerrep = TuneReporter()
  callbacks_ = [earlystopping, tunerrep,]



  history = model.fit(
    x_train,
    y_train,
    batch_size=config["batch_size"],
    validation_data=(x_test, y_test),
    epochs=EPOCHS,
    callbacks=callbacks_)

  return history


scheduler = AsyncHyperBandScheduler(time_attr='training_iteration',
                    metric="val_loss",
                    mode="min",
                    grace_period=10)

#Use bayesian optimisation with TPE implemented by hyperopt
search_alg = HyperOptSearch(config,
                metric="val_loss",
                mode="min",
                )

search_alg = ConcurrencyLimiter(search_alg, max_concurrent=4)

analysis = tune.run(trainer, 
          verbose=1,
          local_dir="ray_results",
          name=experiment_name, 
          num_samples=num_samples,
          search_alg=search_alg,
          scheduler=scheduler,
          raise_on_failed_trial=False,
          resources_per_trial={"cpu": 2, "gpu": 1},
          log_to_file=("stdout.log", "stderr.log"),
          fail_fast=True,
          )

best_config = analysis.get_best_config(metric="val_loss", mode='min')
print(f'Best config: {best_config}')

Here is the partial log of two errors:

Traceback (most recent call last):
(pid=82997)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/threading.py", line 916, in _bootstrap_inner
(pid=82997)     self.run()
(pid=82997)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/ray/tune/function_runner.py", line 246, in run
(pid=82997)     raise e
(pid=82997)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/ray/tune/function_runner.py", line 227, in run
(pid=82997)     self._entrypoint()
(pid=82997)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/ray/tune/function_runner.py", line 290, in entrypoint
(pid=82997)     self._status_reporter.get_checkpoint())
(pid=82997)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/ray/tune/function_runner.py", line 497, in _trainable_func
(pid=82997)     output = train_func(config)
(pid=82997)   File "test_ray.py", line 49, in trainer
(pid=82997)     keras.layers.Dense(config["neurons1"], input_shape=(784,), activation='relu', name='dense_1'),
(pid=82997)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/layers/core.py", line 1081, in __init__
(pid=82997)     self.units = int(units) if not isinstance(units, int) else units
(pid=82997) TypeError: int() argument must be a string, a bytes-like object or a number, not 'Integer'
------------------------------------------------------------
(pid=84324) Traceback (most recent call last):
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/threading.py", line 916, in _bootstrap_inner
(pid=84324)     self.run()
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/ray/tune/function_runner.py", line 246, in run
(pid=84324)     raise e
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/ray/tune/function_runner.py", line 227, in run
(pid=84324)     self._entrypoint()
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/ray/tune/function_runner.py", line 290, in entrypoint
(pid=84324)     self._status_reporter.get_checkpoint())
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/ray/tune/function_runner.py", line 497, in _trainable_func
(pid=84324)     output = train_func(config)
(pid=84324)   File "test_ray.py", line 50, in trainer
(pid=84324)     keras.layers.Dense(10, activation='softmax', name='predictions'),
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/training/tracking/base.py", line 457, in _method_wrapper
(pid=84324)     result = method(self, *args, **kwargs)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/sequential.py", line 116, in __init__
(pid=84324)     self.add(layer)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/training/tracking/base.py", line 457, in _method_wrapper
(pid=84324)     result = method(self, *args, **kwargs)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/sequential.py", line 203, in add
(pid=84324)     output_tensor = layer(self.outputs[0])
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 773, in __call__
(pid=84324)     outputs = call_fn(cast_inputs, *args, **kwargs)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/layers/core.py", line 183, in call
(pid=84324)     lambda: array_ops.identity(inputs))
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/utils/tf_utils.py", line 59, in smart_cond
(pid=84324)     pred, true_fn=true_fn, false_fn=false_fn, name=name)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/framework/smart_cond.py", line 59, in smart_cond
(pid=84324)     name=name)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
(pid=84324)     return func(*args, **kwargs)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/ops/control_flow_ops.py", line 1174, in cond
(pid=84324)     return cond_v2.cond_v2(pred, true_fn, false_fn, name)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/ops/cond_v2.py", line 83, in cond_v2
(pid=84324)     op_return_value=pred)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py", line 978, in func_graph_from_py_func
(pid=84324)     func_outputs = python_func(*func_args, **func_kwargs)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/layers/core.py", line 179, in dropped_inputs
(pid=84324)     rate=self.rate)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
(pid=84324)     return func(*args, **kwargs)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/ops/nn_ops.py", line 4289, in dropout
(pid=84324)     return dropout_v2(x, rate, noise_shape=noise_shape, seed=seed, name=name)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/ops/nn_ops.py", line 4383, in dropout_v2
(pid=84324)     rate, dtype=x.dtype, name="rate")
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1314, in convert_to_tensor
(pid=84324)     ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 317, in _constant_tensor_conversion_function
(pid=84324)     return constant(v, dtype=dtype, name=name)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 258, in constant
(pid=84324)     allow_broadcast=True)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 296, in _constant_impl
(pid=84324)     allow_broadcast=allow_broadcast))
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_util.py", line 451, in make_tensor_proto
(pid=84324)     _AssertCompatible(values, dtype)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_util.py", line 331, in _AssertCompatible
(pid=84324)     (dtype.name, repr(mismatch), type(mismatch).__name__))
(pid=84324) TypeError: Expected float32, got <ray.tune.sample.Categorical object at 0x7fe7d1835a90> of type 'Categorical' instead.

Here is the conda yml env:

name: ray-test
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _tflow_select=2.3.0=mkl
  - absl-py=0.10.0=py36_0
  - aiohttp=3.6.3=py36h7b6447c_0
  - astor=0.8.1=py36_0
  - async-timeout=3.0.1=py36_0
  - attrs=20.2.0=py_0
  - blas=1.0=mkl
  - blinker=1.4=py36_0
  - brotlipy=0.7.0=py36h7b6447c_1000
  - c-ares=1.16.1=h7b6447c_0
  - ca-certificates=2020.10.14=0
  - cachetools=4.1.1=py_0
  - certifi=2020.6.20=py36_0
  - cffi=1.14.3=py36he30daa8_0
  - chardet=3.0.4=py36_1003
  - click=7.1.2=py_0
  - cryptography=3.1.1=py36h1ba5d50_0
  - dataclasses=0.7=py36_0
  - gast=0.2.2=py36_0
  - google-auth=1.22.1=py_0
  - google-auth-oauthlib=0.4.1=py_2
  - google-pasta=0.2.0=py_0
  - grpcio=1.31.0=py36hf8bcb03_0
  - h5py=2.10.0=py36hd6299e0_1
  - hdf5=1.10.6=hb1b8bf9_0
  - idna=2.10=py_0
  - idna_ssl=1.1.0=py36_0
  - importlib-metadata=2.0.0=py_1
  - intel-openmp=2020.2=254
  - keras-applications=1.0.8=py_1
  - keras-preprocessing=1.1.0=py_1
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libedit=3.1.20191231=h14c3975_1
  - libffi=3.3=he6710b0_2
  - libgcc-ng=9.1.0=hdf63c60_0
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libprotobuf=3.13.0.1=hd408876_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - markdown=3.3.1=py36_0
  - mkl=2020.2=256
  - mkl-service=2.3.0=py36he904b0f_0
  - mkl_fft=1.2.0=py36h23d657b_0
  - mkl_random=1.1.1=py36h0573a6f_0
  - multidict=4.7.6=py36h7b6447c_1
  - ncurses=6.2=he6710b0_1
  - numpy=1.19.1=py36hbc911f0_0
  - numpy-base=1.19.1=py36hfa32c7d_0
  - oauthlib=3.1.0=py_0
  - openssl=1.1.1h=h7b6447c_0
  - opt_einsum=3.1.0=py_0
  - pandas=1.1.3=py36he6710b0_0
  - pip=20.2.3=py36_0
  - protobuf=3.13.0.1=py36he6710b0_1
  - pyasn1=0.4.8=py_0
  - pyasn1-modules=0.2.8=py_0
  - pycparser=2.20=py_2
  - pyjwt=1.7.1=py36_0
  - pyopenssl=19.1.0=py_1
  - pysocks=1.7.1=py36_0
  - python=3.6.12=hcff3b4d_2
  - python-dateutil=2.8.1=py_0
  - pytz=2020.1=py_0
  - readline=8.0=h7b6447c_0
  - requests=2.24.0=py_0
  - requests-oauthlib=1.3.0=py_0
  - rsa=4.6=py_0
  - scipy=1.5.2=py36h0b6359f_0
  - setuptools=50.3.0=py36hb0f4dca_1
  - six=1.15.0=py_0
  - sqlite=3.33.0=h62c20be_0
  - tensorboard=2.2.1=pyh532a8cf_0
  - tensorboard-plugin-wit=1.6.0=py_0
  - tensorflow=2.1.0=mkl_py36h23468d9_0
  - tensorflow-base=2.1.0=mkl_py36h6d63fb7_0
  - tensorflow-estimator=2.1.0=pyhd54b08b_0
  - termcolor=1.1.0=py36_1
  - tk=8.6.10=hbc83047_0
  - typing_extensions=3.7.4.3=py_0
  - urllib3=1.25.10=py_0
  - werkzeug=1.0.1=py_0
  - wheel=0.35.1=py_0
  - wrapt=1.12.1=py36h7b6447c_1
  - xz=5.2.5=h7b6447c_0
  - zipp=3.3.0=py_0
  - zlib=1.2.11=h7b6447c_3
  - pip:
    - aiohttp-cors==0.7.0
    - aioredis==1.3.1
    - beautifulsoup4==4.9.3
    - blessings==1.7
    - cloudpickle==1.6.0
    - colorama==0.4.4
    - colorful==0.5.4
    - contextvars==2.4
    - decorator==4.4.2
    - filelock==3.0.12
    - future==0.18.2
    - google==3.0.0
    - google-api-core==1.22.4
    - googleapis-common-protos==1.52.0
    - gpustat==0.6.0
    - hiredis==1.1.0
    - hyperopt==0.2.5
    - immutables==0.14
    - jsonschema==3.2.0
    - msgpack==1.0.0
    - networkx==2.5
    - nvidia-ml-py3==7.352.0
    - opencensus==0.7.11
    - opencensus-context==0.1.2
    - prometheus-client==0.8.0
    - psutil==5.7.2
    - py-spy==0.3.3
    - pyrsistent==0.17.3
    - pyyaml==5.3.1
    - ray==1.0.0
    - redis==3.4.1
    - soupsieve==2.0.1
    - tabulate==0.8.7
    - tensorboardx==2.1
    - tqdm==4.50.2
    - yarl==1.5.1

Ray version and other system information (Python version, TensorFlow version, OS):

ray==1.0.0
tensorflow=2.1.0=gpu_py36h2e5cdaa_0
python=3.6.10=h0371630_0
-os = CentOS Linux 7 (Core)

Reproduction (REQUIRED)

Please provide a script that can be run to reproduce the issue. The script should have no external library dependencies (i.e., use fake or mock data / environments):

If we cannot run your script, we cannot fix your issue.

I have verified my script runs in a clean environment and reproduces the issue.
I have verified the issue also occurs with the latest wheels.

The text was updated successfully, but these errors were encountered:

krfricke · 2020-10-20T15:55:35Z

Hi @gregoruar - first thank you for raising this issue and for the exhaustive reproduction script!

The error is due to passing a tune search space directly to HyperOptSearch - until now we only allowed automatic conversion of tune search space when passed as the tune.run(config) parameter.

So with

search_alg = HyperOptSearch(
                metric="val_loss",
                mode="min",
                )

search_alg = ConcurrencyLimiter(search_alg, max_concurrent=4)

analysis = tune.run(trainer, 
          config=config,
          verbose=1,
          local_dir="ray_results",
          name=experiment_name, 
          num_samples=num_samples,
          search_alg=search_alg,
          scheduler=scheduler,
          raise_on_failed_trial=False,
          resources_per_trial={"cpu": 2, "gpu": 1},
          log_to_file=("stdout.log", "stderr.log"),
          fail_fast=True,
          )

it should work (or run into the next error, which is that HyperOpt does not support lower bounds other than 0 for random integers...).

We addressed this in #11503 however, and once that PR is merged, it will be possible to pass tune search spaces directly to search algorithms.

Let me know if you need any more help.

gregoruar added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Oct 16, 2020

gregoruar changed the title ~~tune.choice, tune.qrandint not compatible with keras [tune]~~ [tune] tune.choice, tune.qrandint not compatible with keras Oct 16, 2020

richardliaw added P1 Issue that should be fixed within a few weeks tune Tune-related issues and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Oct 19, 2020

richardliaw assigned amogkam and krfricke and unassigned amogkam Oct 19, 2020

krfricke mentioned this issue Oct 20, 2020

[tune] allow tune search spaces to be passed to search algorithms #11503

Merged

6 tasks

richardliaw closed this as completed in #11503 Oct 26, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[tune] tune.choice, tune.qrandint not compatible with keras #11434

[tune] tune.choice, tune.qrandint not compatible with keras #11434

gregoruar commented Oct 16, 2020

krfricke commented Oct 20, 2020

[tune] tune.choice, tune.qrandint not compatible with keras #11434

[tune] tune.choice, tune.qrandint not compatible with keras #11434

Comments

gregoruar commented Oct 16, 2020

tune.choice, tune.qrandint not compatible with keras

Reproduction (REQUIRED)

krfricke commented Oct 20, 2020