[rllib] Custom environment observations always have the dtype float32 #7946

internetcoffeephone · 2020-04-09T00:32:41Z

ray[rllib] 0.8.3
Python 3.6
tensorflow-gpu 2.1.0

What is the problem?

When a custom environment declares an observation space with a dtype other than float32 (e.g. uint8), RLlib converts the observations to float32. This breaks models that expect other data types.

This happens due to a hardcoded float32 in dynamic_tf_policy.py.
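A minimal numpy sketch of the effect described above. It assumes the hardcoded float32 behaves like an unconditional cast. The real code in dynamic_tf_policy.py builds TensorFlow inputs rather than numpy arrays, and `FakeBox`, `cast_hardcoded` and `cast_from_space` are hypothetical names used only for illustration.

```python
import numpy as np


class FakeBox:
    """Hypothetical stand-in for gym's Box: only the attributes used below."""

    def __init__(self, shape, dtype):
        self.shape = shape
        self.dtype = np.dtype(dtype)


def cast_hardcoded(obs):
    # Mimics the reported behaviour: the dtype is fixed to float32,
    # whatever the environment's observation space declares.
    return np.asarray(obs, dtype=np.float32)


def cast_from_space(obs, space):
    # Dtype-preserving alternative: take the dtype from the space itself.
    return np.asarray(obs, dtype=space.dtype)


space = FakeBox(shape=(1, 1), dtype=np.uint8)
obs = np.full(space.shape, 255, dtype=np.uint8)

print(cast_hardcoded(obs).dtype)          # float32
print(cast_from_space(obs, space).dtype)  # uint8
```

Reading the dtype from the observation space, instead of fixing it in code, is what would let a uint8 observation reach the model unchanged.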

With a Dict observation space (a dictionary of simpler gym spaces), the same float32 conversion also happens in preprocessors.py.
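A similar sketch for the Dict case, assuming the preprocessor flattens all sub-observations into a single vector of one fixed dtype. `flatten_hardcoded` and `flatten_preserving` are hypothetical helpers, not RLlib's preprocessor API; keys are sorted only to get a deterministic order.

```python
import numpy as np


def flatten_hardcoded(obs_dict):
    # Concatenate every sub-observation (sorted by key) into one float32 vector,
    # discarding the sub-spaces' own dtypes.
    parts = [np.asarray(obs_dict[k]).ravel() for k in sorted(obs_dict)]
    return np.concatenate(parts).astype(np.float32)


def flatten_preserving(obs_dict):
    # Alternative: promote all parts to their common dtype instead of float32.
    parts = [np.asarray(obs_dict[k]).ravel() for k in sorted(obs_dict)]
    common = np.result_type(*parts)
    return np.concatenate(parts).astype(common)


obs = {"curr_obs": np.array([[255]], dtype=np.uint8)}
print(flatten_hardcoded(obs).dtype)   # float32
print(flatten_preserving(obs).dtype)  # uint8
```

With `np.result_type`, a Dict containing only uint8 boxes stays uint8, while mixing uint8 and float32 sub-spaces still promotes to float32, so no information is lost either way.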

Reproduction

import numpy as np
import ray
from gym.spaces import Box, Dict, Discrete
from ray import tune
from ray.rllib.env import BaseEnv
from ray.rllib.utils import try_import_tf
from ray.tune.registry import register_env

tf = try_import_tf()


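class TestEnv(BaseEnv):
    """Minimal BaseEnv whose only observation is a uint8 Box inside a Dict."""

    def poll(self):
        # BaseEnv.poll returns five dicts:
        # obs, rewards, dones, infos, off_policy_actions (all empty here).
        return {}, {}, {}, {}, {}

    def send_actions(self, action_dict):
        pass

    # The declared dtype is uint8, but the model ends up receiving float32.
    observation_space = Dict({"curr_obs": Box(low=0, high=255, shape=(1, 1), dtype=np.uint8)})
    action_space = Discrete(1)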


def env_creator(_):
    return TestEnv()


env_name = "test_env"
register_env(env_name, env_creator)

ray.init(local_mode=True)
tune.run("PPO", config={"env": env_name})

Note: You will have to set the breakpoints in the affected lines yourself. I wanted to provide a simple custom model that takes in only uint8, but couldn't get it to work.
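On the model side, one hypothetical workaround (not something RLlib provides) is to convert the float32 input back to uint8 inside the model. The sketch below uses numpy and a made-up helper `restore_uint8`; in a TensorFlow model the equivalent step would be a `tf.cast` on the input tensor. It relies on float32 representing every integer in [0, 255] exactly, so the round trip loses nothing.

```python
import numpy as np


def restore_uint8(obs):
    """Hypothetical helper: undo a float32 cast applied to uint8 data."""
    obs = np.asarray(obs)
    if obs.dtype == np.uint8:
        return obs  # nothing to undo
    if obs.size == 0:
        return obs.astype(np.uint8)
    # Reject anything that cannot have come from a uint8 array.
    is_integral = np.all(obs == np.round(obs))
    if not (is_integral and obs.min() >= 0 and obs.max() <= 255):
        raise ValueError("observation is not a float-cast uint8 array")
    return obs.astype(np.uint8)


original = np.array([[0, 17, 255]], dtype=np.uint8)
as_float = original.astype(np.float32)  # what the model actually receives
restored = restore_uint8(as_float)
print(restored.dtype, np.array_equal(restored, original))  # uint8 True
```

This only masks the symptom and costs an extra conversion per batch; it does not stop RLlib from storing the observations as float32.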


… for both observations and actions. This saves memory. Fix bug where actions of other environments are always included. NB: This version requires a ray bug to be patched: ray-project/ray#7946


internetcoffeephone · 2020-05-13T20:27:28Z

It seems that incorrect sample dtypes are used in other places as well:

stale · 2020-11-12T12:19:26Z

Hi, I'm a bot from the Ray team :)

To help human contributors to focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
If you'd like to get more attention to the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public slack channel.

internetcoffeephone · 2020-11-13T10:12:19Z

Comment to remove stale label.

stale · 2021-03-13T16:59:20Z

Hi, I'm a bot from the Ray team :)

To help human contributors to focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
If you'd like to get more attention to the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public slack channel.

internetcoffeephone · 2021-03-13T18:14:13Z

This comment is to remove the stale label.

stale · 2021-10-30T20:23:26Z

Hi, I'm a bot from the Ray team :)

To help human contributors to focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
If you'd like to get more attention to the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public slack channel.

internetcoffeephone · 2021-10-31T11:09:34Z

This comment is to remove the stale label.

avnishn · 2022-02-07T19:50:20Z

Closing because, after running the repro script, it looks like we chased this bug away.

internetcoffeephone added the bug label Apr 9, 2020

internetcoffeephone added a commit to internetcoffeephone/sequential_social_dilemma_games that referenced this issue Apr 14, 2020

Add ray patch script for incorrect hardcoded float32 values. Temporary fix for ray-project/ray#7946

3f6b8a5

internetcoffeephone mentioned this issue Jul 2, 2020

Social Curiosity Module implementation and MOA fixes eugenevinitsky/sequential_social_dilemma_games#179

Merged

stale bot added the stale label Nov 12, 2020

stale bot removed the stale label Nov 13, 2020

stale bot added the stale label Mar 13, 2021

stale bot removed the stale label Mar 13, 2021

rkooo567 added the rllib label Jul 2, 2021

stale bot added the stale label Oct 30, 2021

stale bot removed the stale label Oct 31, 2021

Rohan138 mentioned this issue Dec 2, 2021

Fix tf.random.uniform dtype #20843

Closed


avnishn closed this as completed Feb 7, 2022
