
[rllib] Cannot run the custom environment sample code in Python #8754

Closed
SarahBelieve opened this issue Jun 2, 2020 · 4 comments
Labels: question (Just a question :)), stale (The issue is stale. It will be closed within 7 days unless there is further conversation.)

Comments

SarahBelieve commented Jun 2, 2020

What is the problem?

I tried the custom environment example with PyTorch, but it fails with this error: "RuntimeError: Expected object of scalar type Float but got scalar type Long for argument #2 'mat1' in call to _th_addmm".

When I switch the framework to TensorFlow, it runs properly. I can't find any documentation describing what changes are needed to run a custom environment with PyTorch.

System environment:

Ray: latest pip version
Python: 3.7.7
PyTorch: 1.5.0
OS: Ubuntu 18

Reproduction

import gym
from gym.spaces import Discrete, Box
from ray import tune

class SimpleCorridor(gym.Env):
    def __init__(self, config):
        self.end_pos = config["corridor_length"]
        self.cur_pos = 0
        self.action_space = Discrete(2)
        self.observation_space = Box(0.0, self.end_pos, shape=(1, ))

    def reset(self):
        self.cur_pos = 0
        return [self.cur_pos]

    def step(self, action):
        if action == 0 and self.cur_pos > 0:
            self.cur_pos -= 1
        elif action == 1:
            self.cur_pos += 1
        done = self.cur_pos >= self.end_pos
        return [self.cur_pos], 1 if done else 0, done, {}

tune.run(
    "PPO",
    config={
        "env": SimpleCorridor,
        "num_workers": 1,
        "env_config": {"corridor_length": 5},
        "use_pytorch": True,
    })
SarahBelieve added the question label Jun 2, 2020
sven1977 (Contributor) commented Jun 3, 2020

Could you upgrade to the latest Ray wheel and then run this script with the --torch command-line flag? It runs fine for me. Could you also post the entire stack trace?
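
As pasted above, the reproduction script does not parse any command-line arguments, so --torch only takes effect when running RLlib's bundled example script. A hypothetical, minimal way to honor the flag in the standalone script (replacing the tune.run call above) would be:

import argparse

# Hypothetical sketch: map a --torch flag onto the "use_pytorch" config key.
parser = argparse.ArgumentParser()
parser.add_argument("--torch", action="store_true")
args = parser.parse_args()

tune.run(
    "PPO",
    config={
        "env": SimpleCorridor,
        "num_workers": 1,
        "env_config": {"corridor_length": 5},
        "use_pytorch": args.torch,
    })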

SarahBelieve (Author)

@sven1977 Thanks for your quick reply!
I upgraded Ray with pip install ray[rllib] --upgrade, then ran the script with python sample_corridor.py --torch.
Here is what I got:

2020-06-03 14:05:41,553	INFO resource_spec.py:212 -- Starting Ray with 3.32 GiB memory available for workers and up to 1.67 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-06-03 14:05:41,907	INFO services.py:1170 -- View the Ray dashboard at localhost:8265
== Status ==
Memory usage on this node: 11.8/15.5 GiB
Using FIFO scheduling algorithm.
Resources requested: 2/4 CPUs, 0/0 GPUs, 0.0/3.32 GiB heap, 0.0/1.12 GiB objects
Result logdir: /home/hcr/ray_results/PPO
Number of trials: 1 (1 RUNNING)
+--------------------------+----------+-------+
| Trial name               | status   | loc   |
|--------------------------+----------+-------|
| PPO_SimpleCorridor_00000 | RUNNING  |       |
+--------------------------+----------+-------+


(pid=16576) 2020-06-03 14:05:44,568	INFO trainer.py:580 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=16576) 2020-06-03 14:05:44,598	INFO trainable.py:217 -- Getting current IP.
(pid=16576) 2020-06-03 14:05:44,599	WARNING util.py:37 -- Install gputil for GPU system monitoring.
2020-06-03 14:05:45,504	ERROR trial_runner.py:519 -- Trial PPO_SimpleCorridor_00000: Error processing event.
Traceback (most recent call last):
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 467, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 431, in fetch_result
    result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/ray/worker.py", line 1515, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::PPO.train() (pid=16576, ip=192.168.1.27)
  File "python/ray/_raylet.pyx", line 463, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 417, in ray._raylet.execute_task.function_executor
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 495, in train
    raise e
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 484, in train
    result = Trainable.train(self)
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/ray/tune/trainable.py", line 261, in train
    result = self._train()
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 151, in _train
    fetches = self.optimizer.step()
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/ray/rllib/optimizers/sync_samples_optimizer.py", line 59, in step
    for e in self.workers.remote_workers()
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/ray/rllib/utils/memory.py", line 32, in ray_get_and_free
    return ray.get(object_ids)
ray.exceptions.RayTaskError(RuntimeError): ray::RolloutWorker.sample() (pid=16575, ip=192.168.1.27)
  File "python/ray/_raylet.pyx", line 463, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 417, in ray._raylet.execute_task.function_executor
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 510, in sample
    batches = [self.input_reader.next()]
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 54, in next
    batches = [self.get_data()]
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 98, in get_data
    item = next(self.rollout_provider)
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 358, in _env_runner
    active_episodes)
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 610, in _do_policy_eval
    timestep=policy.global_timestep)
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/ray/rllib/policy/torch_policy.py", line 151, in compute_actions
    input_dict, state_batches, seq_lens)
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/ray/rllib/models/modelv2.py", line 164, in __call__
    res = self.forward(restored, state or [], seq_lens)
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/ray/rllib/models/torch/fcnet.py", line 92, in forward
    features = self._hidden_layers(obs.reshape(obs.shape[0], -1))
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/ray/rllib/models/torch/misc.py", line 108, in forward
    return self._model(x)
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/torch/nn/functional.py", line 1610, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: Expected object of scalar type Float but got scalar type Long for argument #2 'mat1' in call to _th_addmm
== Status ==
Memory usage on this node: 11.9/15.5 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/4 CPUs, 0/0 GPUs, 0.0/3.32 GiB heap, 0.0/1.12 GiB objects
Result logdir: /home/hcr/ray_results/PPO
Number of trials: 1 (1 ERROR)
+--------------------------+----------+-------+
| Trial name               | status   | loc   |
|--------------------------+----------+-------|
| PPO_SimpleCorridor_00000 | ERROR    |       |
+--------------------------+----------+-------+
Number of errored trials: 1
+--------------------------+--------------+--------------------------------------------------------------------------------------+
| Trial name               |   # failures | error file                                                                           |
|--------------------------+--------------+--------------------------------------------------------------------------------------|
| PPO_SimpleCorridor_00000 |            1 | /home/hcr/ray_results/PPO/PPO_SimpleCorridor_0_2020-06-03_14-05-436wiesl7d/error.txt |
+--------------------------+--------------+--------------------------------------------------------------------------------------+

Traceback (most recent call last):
  File "sample_corridor.py", line 30, in <module>
    "use_pytorch":True})
  File "/home/hcr/python-envs/jh-warehouse/lib/python3.7/site-packages/ray/tune/tune.py", line 347, in run
    raise TuneError("Trials did not complete", incomplete_trials)
ray.tune.error.TuneError: ('Trials did not complete', [PPO_SimpleCorridor_00000])
(pid=16575) /pytorch/torch/csrc/utils/tensor_numpy.cpp:141: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program.
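
The failing call at the bottom of the trace is torch.addmm inside nn.Linear, which rejects integer input. A standalone snippet that reproduces the same dtype mismatch outside of RLlib, and the cast that avoids it, might look like this (the exact error wording varies between PyTorch versions):

import torch
import torch.nn as nn

layer = nn.Linear(1, 4)        # weights and bias are float32
obs = torch.tensor([[3]])      # int64 (Long) tensor, like the batched corridor observation
try:
    layer(obs)                 # raises the Float-vs-Long RuntimeError
except RuntimeError as err:
    print(err)
layer(obs.float())             # casting the input to float32 makes the forward pass work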

stale bot commented Nov 11, 2020

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

  • If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
  • If you'd like to draw more attention to the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public Slack channel.

stale bot added the stale label Nov 11, 2020
stale bot commented Nov 25, 2020

Hi again! This issue will be closed because there has been no further activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!

stale bot closed this as completed Nov 25, 2020