[rllib] Custom environment observations always have the dtype float32 #7946
Comments
… for both observations and actions. This saves memory. Fix bug where actions of other environments are always included. NB: This version requires a ray bug to be patched: ray-project/ray#7946
It seems that incorrect sample dtypes are used in other places as well:
Hi, I'm a bot from the Ray team :) To help human contributors to focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months. If there is no further activity in the 14 days, the issue will be closed!
You can always ask for help on our discussion forum or Ray's public slack channel.
Comment to remove stale label.
This comment is to remove the stale label.
Closing because, after running the repro script, it looks like we chased this bug away.
ray[rllib] 0.8.3
Python 3.6
tensorflow-gpu 2.1.0
What is the problem?
When a custom environment declares an observation space with a dtype other than float32 (e.g. uint8), the observations are converted to float32 before they reach the model. This causes models that expect other data types to fail.
This happens due to a hardcoded float32 in dynamic_tf_policy.py.
When using a Dict observation space (a set of simpler gym spaces), it also happens in preprocessors.py.
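To make the symptom concrete, here is a minimal sketch that uses only NumPy. It does not call RLlib: `cast_like_hardcoded_float32` and `uint8_only_model` are hypothetical stand-ins for the hardcoded cast and for a model that expects uint8 input, and the 84x84 frame shape is an arbitrary assumption.

```python
import numpy as np


def cast_like_hardcoded_float32(obs):
    """Hypothetical stand-in for the hardcoded float32 cast (not RLlib code)."""
    return np.asarray(obs, dtype=np.float32)


def uint8_only_model(obs):
    """Hypothetical model that accepts only uint8 observations."""
    if obs.dtype != np.uint8:
        raise TypeError(f"expected uint8 observations, got {obs.dtype}")
    # Scale to [0, 1] inside the model, as an image model might do.
    return obs.astype(np.float32) / 255.0


# Observation as the environment emits it: an assumed 84x84 grayscale frame.
env_obs = np.zeros((84, 84), dtype=np.uint8)

# What the model would receive after the cast described above.
policy_obs = cast_like_hardcoded_float32(env_obs)

print("environment:", env_obs.dtype, env_obs.nbytes, "bytes")
print("after cast: ", policy_obs.dtype, policy_obs.nbytes, "bytes")

try:
    uint8_only_model(policy_obs)
except TypeError as err:
    print("model rejected input:", err)
```

Besides breaking dtype-sensitive models, the cast makes each stored observation four times larger, which matters when many frames are kept in sample batches.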
Reproduction
Note: You will have to set the breakpoints in the affected lines yourself. I wanted to provide a simple custom model that takes in only uint8, but couldn't get it to work.