
[RLlib] DreamerV3 fails on environments with continuous action spaces #39751

Closed
n-mat opened this issue Sep 19, 2023 · 2 comments · Fixed by #39786
Labels
bug Something that is supposed to be working; but isn't P0 Issues that should be fixed in short order rllib RLlib related issues

Comments

n-mat commented Sep 19, 2023

What happened + What you expected to happen

Both the overview of algorithms and the README.md of DreamerV3 state that the algorithm supports continuous action spaces. However, when the algorithm is applied to an environment with an action space of the form spaces.Box(-2.0, 2.0, (1,), np.float32), it fails with AttributeError: 'Box' object has no attribute 'n' (see the log below).

(base) ray@rllib:/tmp/ray/rllib/tests$ python run_regression_tests.py --dir /tmp/ray/rllib/tuned_examples/dreamerv3/pendulum.py 
/home/ray/anaconda3/lib/python3.8/site-packages/tensorflow_probability/python/__init__.py:57: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if (distutils.version.LooseVersion(tf.__version__) <
/home/ray/anaconda3/lib/python3.8/site-packages/tensorflow_probability/python/__init__.py:58: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  distutils.version.LooseVersion(required_tensorflow_version)):
2023-09-19 12:50:08,529	WARNING deprecation.py:50 -- DeprecationWarning: `DirectStepOptimizer` has been deprecated. This will raise an error in the future!
rllib dir=.
Will run the following regression tests:
-> /tmp/ray/rllib/tuned_examples/dreamerv3/pendulum.py
/home/ray/anaconda3/lib/python3.8/site-packages/gymnasium/spaces/box.py:130: UserWarning: WARN: Box bound precision lowered by casting to float32
  gym.logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
/home/ray/anaconda3/lib/python3.8/site-packages/gymnasium/utils/passive_env_checker.py:164: UserWarning: WARN: The obs returned by the `reset()` method was expecting numpy array dtype to be float32, actual type: float64
  logger.warn(
/home/ray/anaconda3/lib/python3.8/site-packages/gymnasium/utils/passive_env_checker.py:188: UserWarning: WARN: The obs returned by the `reset()` method is not within the observation space.
  logger.warn(f"{pre} is not within the observation space.")
2023-09-19 12:50:10,420	WARNING services.py:1889 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 2147483648 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=9.78gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2023-09-19 12:50:10,467	INFO worker.py:1633 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 
2023-09-19 12:50:11,339	WARNING deprecation.py:50 -- DeprecationWarning: `build_tf_policy` has been deprecated. This will raise an error in the future!
2023-09-19 12:50:11,340	WARNING deprecation.py:50 -- DeprecationWarning: `build_policy_class` has been deprecated. This will raise an error in the future!
2023-09-19 12:50:11,363	WARNING algorithm_config.py:2578 -- Setting `exploration_config={}` because you set `_enable_rl_module_api=True`. When RLModule API are enabled, exploration_config can not be set. If you want to implement custom exploration behaviour, please modify the `forward_exploration` method of the RLModule at hand. On configs that have a default exploration config, this must be done with `config.exploration_config={}`.
2023-09-19 12:50:11,381	WARNING algorithm_config.py:2578 -- Setting `exploration_config={}` because you set `_enable_rl_module_api=True`. When RLModule API are enabled, exploration_config can not be set. If you want to implement custom exploration behaviour, please modify the `forward_exploration` method of the RLModule at hand. On configs that have a default exploration config, this must be done with `config.exploration_config={}`.
2023-09-19 12:50:11,392	INFO tune.py:654 -- [output] This will use the new output engine with verbosity 2. To disable the new output and use the legacy output engine, set the environment variable RAY_AIR_NEW_OUTPUT=0. For more information, please see https://github.com/ray-project/ray/issues/36949
/home/ray/anaconda3/lib/python3.8/site-packages/jupyter_client/connect.py:27: DeprecationWarning: Jupyter is migrating its paths to use standard platformdirs
given by the platformdirs library.  To remove this warning and
see the appropriate new directories, set the environment variable
`JUPYTER_PLATFORM_DIRS=1` and then run `jupyter --paths`.
The use of platformdirs will be the default in `jupyter_core` v6
  from jupyter_core.paths import jupyter_data_dir
/home/ray/anaconda3/lib/python3.8/site-packages/comet_ml/monkey_patching.py:19: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
╭─────────────────────────────────────────────────────────────────────────────╮
│ Configuration for experiment     default_a483ec138ed44940a5a313c85cf72477   │
├─────────────────────────────────────────────────────────────────────────────┤
│ Search algorithm                 BasicVariantGenerator                      │
│ Scheduler                        FIFOScheduler                              │
│ Number of trials                 1                                          │
╰─────────────────────────────────────────────────────────────────────────────╯

View detailed results here: /home/ray/ray_results/default_a483ec138ed44940a5a313c85cf72477
To visualize your results with TensorBoard, run: `tensorboard --logdir /home/ray/ray_results/default_a483ec138ed44940a5a313c85cf72477`

Trial status: 1 PENDING
Current time: 2023-09-19 12:50:11. Total running time: 0s
Logical resource usage: 1.0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
╭──────────────────────────────────────────────╮
│ Trial name                          status   │
├──────────────────────────────────────────────┤
│ DreamerV3_Pendulum-v1_be5be_00000   PENDING  │
╰──────────────────────────────────────────────╯
(pid=6059) /home/ray/anaconda3/lib/python3.8/site-packages/tensorflow_probability/python/__init__.py:57: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
(pid=6059)   if (distutils.version.LooseVersion(tf.__version__) <
(pid=6059) /home/ray/anaconda3/lib/python3.8/site-packages/tensorflow_probability/python/__init__.py:58: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
(pid=6059)   distutils.version.LooseVersion(required_tensorflow_version)):
(pid=6059) DeprecationWarning: `DirectStepOptimizer` has been deprecated. This will raise an error in the future!
(pid=6059) /home/ray/anaconda3/lib/python3.8/site-packages/gymnasium/spaces/box.py:130: UserWarning: WARN: Box bound precision lowered by casting to float32
(pid=6059)   gym.logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
(pid=6059) /home/ray/anaconda3/lib/python3.8/site-packages/gymnasium/utils/passive_env_checker.py:164: UserWarning: WARN: The obs returned by the `reset()` method was expecting numpy array dtype to be float32, actual type: float64
(pid=6059)   logger.warn(
(pid=6059) /home/ray/anaconda3/lib/python3.8/site-packages/gymnasium/utils/passive_env_checker.py:188: UserWarning: WARN: The obs returned by the `reset()` method is not within the observation space.
(pid=6059)   logger.warn(f"{pre} is not within the observation space.")
(DreamerV3 pid=6059) 2023-09-19 12:50:15,907	WARNING algorithm_config.py:672 -- Cannot create DreamerV3Config from given `config_dict`! Property __stdout_file__ not supported.
2023-09-19 12:50:16,104	ERROR tune_controller.py:1502 -- Trial task failed for trial DreamerV3_Pendulum-v1_be5be_00000
Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
    result = ray.get(future)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/_private/worker.py", line 2549, in get
    raise value
  File "python/ray/_raylet.pyx", line 1999, in ray._raylet.task_execution_handler
  File "python/ray/_raylet.pyx", line 1894, in ray._raylet.execute_task_with_cancellation_handler
  File "python/ray/_raylet.pyx", line 1558, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 1559, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 1791, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 910, in ray._raylet.store_task_errors
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::DreamerV3.__init__() (pid=6059, ip=192.168.178.38, actor_id=91d535a66288bcf6e02e675301000000, repr=DreamerV3)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/rllib/algorithms/algorithm.py", line 517, in __init__
    super().__init__(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/tune/trainable/trainable.py", line 185, in __init__
    self.setup(copy.deepcopy(self.config))
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/rllib/algorithms/dreamerv3/dreamerv3.py", line 504, in setup
    super().setup(config)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/rllib/algorithms/algorithm.py", line 767, in setup
    self.learner_group = learner_group_config.build()
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/rllib/core/learner/learner_group_config.py", line 102, in build
    return self.learner_group_class(learner_spec)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/rllib/core/learner/learner_group.py", line 97, in __init__
    self._learner.build()
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/rllib/core/learner/tf/tf_learner.py", line 406, in build
    super().build()
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/rllib/core/learner/learner.py", line 982, in build
    self._module = self._make_module()
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/rllib/core/learner/learner.py", line 1564, in _make_module
    module = self._module_spec.build()
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/rllib/core/rl_module/marl_module.py", line 462, in build
    module = self.marl_module_class(module_config)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/rllib/core/rl_module/rl_module.py", line 377, in new_init
    previous_init(self, *args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/rllib/core/rl_module/marl_module.py", line 58, in __init__
    super().__init__(config or MultiAgentRLModuleConfig())
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/rllib/core/rl_module/rl_module.py", line 369, in __init__
    self.setup()
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/rllib/core/rl_module/marl_module.py", line 65, in setup
    self._rl_modules[module_id] = module_spec.build()
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/rllib/core/rl_module/rl_module.py", line 104, in build
    module = self.module_class(module_config)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/rllib/core/rl_module/rl_module.py", line 377, in new_init
    previous_init(self, *args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/rllib/core/rl_module/rl_module.py", line 377, in new_init
    previous_init(self, *args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/rllib/core/rl_module/tf/tf_rl_module.py", line 18, in __init__
    RLModule.__init__(self, *args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/rllib/core/rl_module/rl_module.py", line 369, in __init__
    self.setup()
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/rllib/algorithms/dreamerv3/dreamerv3_rl_module.py", line 54, in setup
    self.world_model = WorldModel(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/rllib/algorithms/dreamerv3/tf/models/world_model.py", line 151, in __init__
    self.sequence_model = SequenceModel(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/rllib/algorithms/dreamerv3/tf/models/components/sequence_model.py", line 90, in __init__
    tf.TensorSpec(shape=[None, action_space.n], dtype=dl_type),
AttributeError: 'Box' object has no attribute 'n'

Trial DreamerV3_Pendulum-v1_be5be_00000 errored after 0 iterations at 2023-09-19 12:50:16. Total running time: 4s
Error file: /home/ray/ray_results/default_a483ec138ed44940a5a313c85cf72477/DreamerV3_Pendulum-v1_be5be_00000_0_2023-09-19_12-50-11/error.txt

Trial status: 1 ERROR
Current time: 2023-09-19 12:50:16. Total running time: 4s
Logical resource usage: 0/16 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
╭──────────────────────────────────────────────╮
│ Trial name                          status   │
├──────────────────────────────────────────────┤
│ DreamerV3_Pendulum-v1_be5be_00000   ERROR    │
╰──────────────────────────────────────────────╯

Number of errored trials: 1
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Trial name                            # failures   error file                                                                                                                       │
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ DreamerV3_Pendulum-v1_be5be_00000              1   /home/ray/ray_results/default_a483ec138ed44940a5a313c85cf72477/DreamerV3_Pendulum-v1_be5be_00000_0_2023-09-19_12-50-11/error.txt │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

2023-09-19 12:50:17,545	WARNING algorithm_config.py:2578 -- Setting `exploration_config={}` because you set `_enable_rl_module_api=True`. When RLModule API are enabled, exploration_config can not be set. If you want to implement custom exploration behaviour, please modify the `forward_exploration` method of the RLModule at hand. On configs that have a default exploration config, this must be done with `config.exploration_config={}`.
2023-09-19 12:50:17,547	WARNING algorithm_config.py:2578 -- Setting `exploration_config={}` because you set `_enable_rl_module_api=True`. When RLModule API are enabled, exploration_config can not be set. If you want to implement custom exploration behaviour, please modify the `forward_exploration` method of the RLModule at hand. On configs that have a default exploration config, this must be done with `config.exploration_config={}`.
Traceback (most recent call last):
  File "run_regression_tests.py", line 259, in <module>
    trials = run_experiments(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/tune/tune.py", line 1255, in run_experiments
    return run(
  File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/tune/tune.py", line 1137, in run
    raise TuneError("Trials did not complete", incomplete_trials)
ray.tune.error.TuneError: ('Trials did not complete', [DreamerV3_Pendulum-v1_be5be_00000])

Versions / Dependencies

I use the Docker image rayproject/ray-ml:2.7.0-gpu and the repository files at the ray-2.7.0 tag.

Reproduction script

To reproduce, run the following command, as suggested in l. 13 of the tuned example pendulum.py:
python run_regression_tests.py --dir /tmp/ray/rllib/tuned_examples/dreamerv3/pendulum.py

Issue Severity

High: It blocks me from completing my task.

@n-mat n-mat added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Sep 19, 2023
@n-mat n-mat changed the title [RLlib] DreamerV3 fails on envs with continuous action spaces [RLlib] DreamerV3 fails on environments with continuous action spaces Sep 19, 2023
@sven1977 sven1977 self-assigned this Sep 20, 2023
@sven1977 sven1977 added P0 Issues that should be fixed in short order rllib RLlib related issues and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Sep 20, 2023
sven1977 (Contributor) commented:

PR in flight:
#39772

lyzyn commented Oct 19, 2023

I have also encountered this problem. Have you resolved it?
2023-09-19 12:50:17,545 WARNING algorithm_config.py:2578 -- Setting exploration_config={} because you set _enable_rl_module_api=True. When RLModule API are enabled, exploration_config can not be set. If you want to implement custom exploration behaviour, please modify the forward_exploration method of the RLModule at hand. On configs that have a default exploration config, this must be done with config.exploration_config={}.
