Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tune] tune.choice, tune.qrandint not compatible with keras #11434

Closed
2 tasks done
gregoruar opened this issue Oct 16, 2020 · 1 comment · Fixed by #11503
Closed
2 tasks done

[tune] tune.choice, tune.qrandint not compatible with keras #11434

gregoruar opened this issue Oct 16, 2020 · 1 comment · Fixed by #11503
Assignees
Labels
bug Something that is supposed to be working; but isn't P1 Issue that should be fixed within a few weeks tune Tune-related issues

Comments

@gregoruar
Copy link

tune.choice, tune.qrandint not compatible with keras

I wanted to do hparam search based on tune functionalities. Unfortunately, usage of tune built-in methods did not work, for neurons it raised TypeError: int() argument must be a string, a bytes-like object or a number, not 'Integer', when I replaced function with simple int, debugger stumbled upon tune.choice next: TypeError: Expected float32, got <ray.tune.sample.Categorical object at 0x7fe7d1835a90> of type 'Categorical' instead.

config = {
    "learning_rate": tune.qloguniform(1e-4, 1e-1, 5e-5),
    "batch_size": tune.choice([32, 64, 128, 256]),
    "neurons1": tune.qrandint(32, 1024, 32),
    "neurons2": tune.qrandint(32, 1024, 32),
    "dropout": tune.choice([0.1, 0.2, 0.3,]),
}

The error reproduced on the script with mnist data, which I specially prepared and in the clean environment with newly installed ray. However, everything worked, when I launched hyperparameter hp for config creation:

from hyperopt import hp
config = {
    "learning_rate": hp.choice("learning_rate", [0.001, 0.0001]),
    "batch_size": tune.choice("batch_size", [32, 64, 128, 256]),
    "neurons1": hp.choice("neurons1", [32, 64]),
    "neurons2": hp.choice("neurons2", [32, 64]),
    "dropout": hp.choice("dropout", [0.1, 0.2, 0.3,]),
}

[mnist.txt](https://github.com/ray-project/ray/files/5391092/mnist.txt)

Here is the script:

import os

os.environ["CUDA_VISIBLE_DEVICES"] = "3,4,5"
os.environ["TF_XLA_FLAGS"] = "--tf_xla_cpu_global_jit"
# loglevel : 0 all printed, 1 I not printed, 2 I and W not printed, 3 nothing printed
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'

from tensorflow import keras

import ray
from ray import tune
from ray.tune.schedulers import AsyncHyperBandScheduler
from ray.tune.suggest.hyperopt import HyperOptSearch
from ray.tune.suggest import ConcurrencyLimiter

ray.init(configure_logging=False)

EPOCHS = 20
num_samples = 100
experiment_name = "test_1"

config = {
    "learning_rate": tune.qloguniform(1e-4, 1e-1, 5e-5),
    "batch_size": tune.choice([32, 64, 128, 256]),
    "neurons1": tune.qrandint(32, 1024, 32),
    "neurons2": tune.qrandint(32, 1024, 32),
    "dropout": tune.choice([0.1, 0.2, 0.3,]),
}

class TuneReporter(keras.callbacks.Callback):
    """Tune Callback for Keras."""
    def on_epoch_end(self, epoch, logs=None):
        tune.report(keras_info=logs, val_loss=logs['val_loss'], val_accuracy=logs["val_accuracy"])



def trainer(config):

  # Load MNIST dataset as NumPy arrays
  (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

  # Preprocess the data
  x_train = x_train.reshape(-1, 784).astype('float32') / 255
  x_test = x_test.reshape(-1, 784).astype('float32') / 255

  model = keras.Sequential([
    keras.layers.Dense(config["neurons1"], input_shape=(784,), activation='relu', name='dense_1'),
    keras.layers.Dropout(config['dropout']),
    keras.layers.Dense(config["neurons2"], activation='relu', name='dense_2'),
    keras.layers.Dense(10, activation='softmax', name='predictions'),
  ])

  model.compile(optimizer=optimizers.Adam(learning_rate = config['learning_rate']),
          loss=keras.losses.SparseCategoricalCrossentropy(),
          metrics=['accuracy'])

  earlystopping = keras.callbacks.EarlyStopping(monitor="val_loss", 
    patience=10,  
    min_delta=1e-4, 
    mode='min', 
    restore_best_weights=True, 
    verbose=1)

  tunerrep = TuneReporter()
  callbacks_ = [earlystopping, tunerrep,]



  history = model.fit(
    x_train,
    y_train,
    batch_size=config["batch_size"],
    validation_data=(x_test, y_test),
    epochs=EPOCHS,
    callbacks=callbacks_)

  return history


scheduler = AsyncHyperBandScheduler(time_attr='training_iteration',
                    metric="val_loss",
                    mode="min",
                    grace_period=10)

#Use bayesian optimisation with TPE implemented by hyperopt
search_alg = HyperOptSearch(config,
                metric="val_loss",
                mode="min",
                )

search_alg = ConcurrencyLimiter(search_alg, max_concurrent=4)

analysis = tune.run(trainer, 
          verbose=1,
          local_dir="ray_results",
          name=experiment_name, 
          num_samples=num_samples,
          search_alg=search_alg,
          scheduler=scheduler,
          raise_on_failed_trial=False,
          resources_per_trial={"cpu": 2, "gpu": 1},
          log_to_file=("stdout.log", "stderr.log"),
          fail_fast=True,
          )

best_config = analysis.get_best_config(metric="val_loss", mode='min')
print(f'Best config: {best_config}')

Here is the partial log of two errors:

Traceback (most recent call last):
(pid=82997)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/threading.py", line 916, in _bootstrap_inner
(pid=82997)     self.run()
(pid=82997)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/ray/tune/function_runner.py", line 246, in run
(pid=82997)     raise e
(pid=82997)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/ray/tune/function_runner.py", line 227, in run
(pid=82997)     self._entrypoint()
(pid=82997)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/ray/tune/function_runner.py", line 290, in entrypoint
(pid=82997)     self._status_reporter.get_checkpoint())
(pid=82997)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/ray/tune/function_runner.py", line 497, in _trainable_func
(pid=82997)     output = train_func(config)
(pid=82997)   File "test_ray.py", line 49, in trainer
(pid=82997)     keras.layers.Dense(config["neurons1"], input_shape=(784,), activation='relu', name='dense_1'),
(pid=82997)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/layers/core.py", line 1081, in __init__
(pid=82997)     self.units = int(units) if not isinstance(units, int) else units
(pid=82997) TypeError: int() argument must be a string, a bytes-like object or a number, not 'Integer'
------------------------------------------------------------
(pid=84324) Traceback (most recent call last):
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/threading.py", line 916, in _bootstrap_inner
(pid=84324)     self.run()
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/ray/tune/function_runner.py", line 246, in run
(pid=84324)     raise e
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/ray/tune/function_runner.py", line 227, in run
(pid=84324)     self._entrypoint()
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/ray/tune/function_runner.py", line 290, in entrypoint
(pid=84324)     self._status_reporter.get_checkpoint())
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/ray/tune/function_runner.py", line 497, in _trainable_func
(pid=84324)     output = train_func(config)
(pid=84324)   File "test_ray.py", line 50, in trainer
(pid=84324)     keras.layers.Dense(10, activation='softmax', name='predictions'),
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/training/tracking/base.py", line 457, in _method_wrapper
(pid=84324)     result = method(self, *args, **kwargs)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/sequential.py", line 116, in __init__
(pid=84324)     self.add(layer)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/training/tracking/base.py", line 457, in _method_wrapper
(pid=84324)     result = method(self, *args, **kwargs)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/sequential.py", line 203, in add
(pid=84324)     output_tensor = layer(self.outputs[0])
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 773, in __call__
(pid=84324)     outputs = call_fn(cast_inputs, *args, **kwargs)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/layers/core.py", line 183, in call
(pid=84324)     lambda: array_ops.identity(inputs))
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/utils/tf_utils.py", line 59, in smart_cond
(pid=84324)     pred, true_fn=true_fn, false_fn=false_fn, name=name)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/framework/smart_cond.py", line 59, in smart_cond
(pid=84324)     name=name)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
(pid=84324)     return func(*args, **kwargs)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/ops/control_flow_ops.py", line 1174, in cond
(pid=84324)     return cond_v2.cond_v2(pred, true_fn, false_fn, name)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/ops/cond_v2.py", line 83, in cond_v2
(pid=84324)     op_return_value=pred)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/framework/func_graph.py", line 978, in func_graph_from_py_func
(pid=84324)     func_outputs = python_func(*func_args, **func_kwargs)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/keras/layers/core.py", line 179, in dropped_inputs
(pid=84324)     rate=self.rate)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
(pid=84324)     return func(*args, **kwargs)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/ops/nn_ops.py", line 4289, in dropout
(pid=84324)     return dropout_v2(x, rate, noise_shape=noise_shape, seed=seed, name=name)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/ops/nn_ops.py", line 4383, in dropout_v2
(pid=84324)     rate, dtype=x.dtype, name="rate")
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1314, in convert_to_tensor
(pid=84324)     ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 317, in _constant_tensor_conversion_function
(pid=84324)     return constant(v, dtype=dtype, name=name)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 258, in constant
(pid=84324)     allow_broadcast=True)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 296, in _constant_impl
(pid=84324)     allow_broadcast=allow_broadcast))
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_util.py", line 451, in make_tensor_proto
(pid=84324)     _AssertCompatible(values, dtype)
(pid=84324)   File "/home/gsukhorukov/.conda/envs/tf2/lib/python3.6/site-packages/tensorflow_core/python/framework/tensor_util.py", line 331, in _AssertCompatible
(pid=84324)     (dtype.name, repr(mismatch), type(mismatch).__name__))
(pid=84324) TypeError: Expected float32, got <ray.tune.sample.Categorical object at 0x7fe7d1835a90> of type 'Categorical' instead.

Here is the conda yml env:

name: ray-test
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _tflow_select=2.3.0=mkl
  - absl-py=0.10.0=py36_0
  - aiohttp=3.6.3=py36h7b6447c_0
  - astor=0.8.1=py36_0
  - async-timeout=3.0.1=py36_0
  - attrs=20.2.0=py_0
  - blas=1.0=mkl
  - blinker=1.4=py36_0
  - brotlipy=0.7.0=py36h7b6447c_1000
  - c-ares=1.16.1=h7b6447c_0
  - ca-certificates=2020.10.14=0
  - cachetools=4.1.1=py_0
  - certifi=2020.6.20=py36_0
  - cffi=1.14.3=py36he30daa8_0
  - chardet=3.0.4=py36_1003
  - click=7.1.2=py_0
  - cryptography=3.1.1=py36h1ba5d50_0
  - dataclasses=0.7=py36_0
  - gast=0.2.2=py36_0
  - google-auth=1.22.1=py_0
  - google-auth-oauthlib=0.4.1=py_2
  - google-pasta=0.2.0=py_0
  - grpcio=1.31.0=py36hf8bcb03_0
  - h5py=2.10.0=py36hd6299e0_1
  - hdf5=1.10.6=hb1b8bf9_0
  - idna=2.10=py_0
  - idna_ssl=1.1.0=py36_0
  - importlib-metadata=2.0.0=py_1
  - intel-openmp=2020.2=254
  - keras-applications=1.0.8=py_1
  - keras-preprocessing=1.1.0=py_1
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libedit=3.1.20191231=h14c3975_1
  - libffi=3.3=he6710b0_2
  - libgcc-ng=9.1.0=hdf63c60_0
  - libgfortran-ng=7.3.0=hdf63c60_0
  - libprotobuf=3.13.0.1=hd408876_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - markdown=3.3.1=py36_0
  - mkl=2020.2=256
  - mkl-service=2.3.0=py36he904b0f_0
  - mkl_fft=1.2.0=py36h23d657b_0
  - mkl_random=1.1.1=py36h0573a6f_0
  - multidict=4.7.6=py36h7b6447c_1
  - ncurses=6.2=he6710b0_1
  - numpy=1.19.1=py36hbc911f0_0
  - numpy-base=1.19.1=py36hfa32c7d_0
  - oauthlib=3.1.0=py_0
  - openssl=1.1.1h=h7b6447c_0
  - opt_einsum=3.1.0=py_0
  - pandas=1.1.3=py36he6710b0_0
  - pip=20.2.3=py36_0
  - protobuf=3.13.0.1=py36he6710b0_1
  - pyasn1=0.4.8=py_0
  - pyasn1-modules=0.2.8=py_0
  - pycparser=2.20=py_2
  - pyjwt=1.7.1=py36_0
  - pyopenssl=19.1.0=py_1
  - pysocks=1.7.1=py36_0
  - python=3.6.12=hcff3b4d_2
  - python-dateutil=2.8.1=py_0
  - pytz=2020.1=py_0
  - readline=8.0=h7b6447c_0
  - requests=2.24.0=py_0
  - requests-oauthlib=1.3.0=py_0
  - rsa=4.6=py_0
  - scipy=1.5.2=py36h0b6359f_0
  - setuptools=50.3.0=py36hb0f4dca_1
  - six=1.15.0=py_0
  - sqlite=3.33.0=h62c20be_0
  - tensorboard=2.2.1=pyh532a8cf_0
  - tensorboard-plugin-wit=1.6.0=py_0
  - tensorflow=2.1.0=mkl_py36h23468d9_0
  - tensorflow-base=2.1.0=mkl_py36h6d63fb7_0
  - tensorflow-estimator=2.1.0=pyhd54b08b_0
  - termcolor=1.1.0=py36_1
  - tk=8.6.10=hbc83047_0
  - typing_extensions=3.7.4.3=py_0
  - urllib3=1.25.10=py_0
  - werkzeug=1.0.1=py_0
  - wheel=0.35.1=py_0
  - wrapt=1.12.1=py36h7b6447c_1
  - xz=5.2.5=h7b6447c_0
  - zipp=3.3.0=py_0
  - zlib=1.2.11=h7b6447c_3
  - pip:
    - aiohttp-cors==0.7.0
    - aioredis==1.3.1
    - beautifulsoup4==4.9.3
    - blessings==1.7
    - cloudpickle==1.6.0
    - colorama==0.4.4
    - colorful==0.5.4
    - contextvars==2.4
    - decorator==4.4.2
    - filelock==3.0.12
    - future==0.18.2
    - google==3.0.0
    - google-api-core==1.22.4
    - googleapis-common-protos==1.52.0
    - gpustat==0.6.0
    - hiredis==1.1.0
    - hyperopt==0.2.5
    - immutables==0.14
    - jsonschema==3.2.0
    - msgpack==1.0.0
    - networkx==2.5
    - nvidia-ml-py3==7.352.0
    - opencensus==0.7.11
    - opencensus-context==0.1.2
    - prometheus-client==0.8.0
    - psutil==5.7.2
    - py-spy==0.3.3
    - pyrsistent==0.17.3
    - pyyaml==5.3.1
    - ray==1.0.0
    - redis==3.4.1
    - soupsieve==2.0.1
    - tabulate==0.8.7
    - tensorboardx==2.1
    - tqdm==4.50.2
    - yarl==1.5.1

Ray version and other system information (Python version, TensorFlow version, OS):

  • ray==1.0.0
  • tensorflow=2.1.0=gpu_py36h2e5cdaa_0
  • python=3.6.10=h0371630_0
    -os = CentOS Linux 7 (Core)

Reproduction (REQUIRED)

Please provide a script that can be run to reproduce the issue. The script should have no external library dependencies (i.e., use fake or mock data / environments):

If we cannot run your script, we cannot fix your issue.

  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.
@gregoruar gregoruar added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Oct 16, 2020
@gregoruar gregoruar changed the title tune.choice, tune.qrandint not compatible with keras [tune] [tune] tune.choice, tune.qrandint not compatible with keras Oct 16, 2020
@richardliaw richardliaw added P1 Issue that should be fixed within a few weeks tune Tune-related issues and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Oct 19, 2020
@richardliaw richardliaw assigned amogkam and krfricke and unassigned amogkam Oct 19, 2020
@krfricke
Copy link
Contributor

Hi @gregoruar - first thank you for raising this issue and for the exhaustive reproduction script!

The error is due to passing a tune search space directly to HyperOptSearch - until now we only allowed automatic conversion of tune search space when passed as the tune.run(config) parameter.

So with

search_alg = HyperOptSearch(
                metric="val_loss",
                mode="min",
                )

search_alg = ConcurrencyLimiter(search_alg, max_concurrent=4)

analysis = tune.run(trainer, 
          config=config,
          verbose=1,
          local_dir="ray_results",
          name=experiment_name, 
          num_samples=num_samples,
          search_alg=search_alg,
          scheduler=scheduler,
          raise_on_failed_trial=False,
          resources_per_trial={"cpu": 2, "gpu": 1},
          log_to_file=("stdout.log", "stderr.log"),
          fail_fast=True,
          )

it should work (or run into the next error, which is that HyperOpt does not support lower bounds other than 0 for random integers...).

We addressed this in #11503 however, and once that PR is merged, it will be possible to pass tune search spaces directly to search algorithms.

Let me know if you need any more help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something that is supposed to be working; but isn't P1 Issue that should be fixed within a few weeks tune Tune-related issues
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants