[RLlib] RuntimeError: Expected scalars to be on CPU, got cuda:0 instead #34159
Comments
Can you share your script? |
@xwjiang2010 I think I'm facing the same issue: after PBT/PB2 perturbs one of my trials and loads it from a checkpoint (the load itself succeeds), the next training step fails. OS: Linux 6.1. Code that reproduces the issue every time I run it on my setup:
```python
import ray
from ray.rllib.algorithms.ppo import PPOConfig
from ray import tune
from ray.tune.schedulers.pb2 import PB2
from ray import air

ray.init(address="auto")

config = (
    PPOConfig()
    .framework("torch")
    .environment("BipedalWalker-v3")
    .training(
        lr=1e-5,
        model={"fcnet_hiddens": [128, 128]},
        train_batch_size=1024,
    )
    .rollouts(num_rollout_workers=5, num_envs_per_worker=4)
    .resources(num_gpus=1.0 / 4)
)

perturbation_interval = 20
pb2 = PB2(
    time_attr="training_iteration",
    perturbation_interval=perturbation_interval,
    hyperparam_bounds={"lr": [1e-3, 1e-7], "train_batch_size": [128, 1024 * 8]},
)

param_space = {**config.to_dict(), **{"checkpoint_interval": perturbation_interval}}
tuner = tune.Tuner(
    "PPO",
    param_space=param_space,
    run_config=air.RunConfig(
        stop={"training_iteration": 1e9},
        verbose=1,
    ),
    tune_config=tune.TuneConfig(
        scheduler=pb2, metric="episode_reward_mean", mode="max", num_samples=4
    ),
)
results = tuner.fit()
```
Piece of the logs:
|
I think I have a lead on this. I'll do some more testing, but I think it has to do with num_gpus and rollout workers and how they are set. When I initially ran the tuner I hadn't set the GPU resources correctly, so it trained on the CPU instead. Later I fixed that and went back to load an older checkpoint (saved with the improper GPU setting), and it tried loading it onto the GPU instead of the CPU, causing this issue. I'll see if I can recreate it more consistently. |
Actually, are people all running into this issue with RLlib Algorithms? It may have something to do with how Algorithms save and load checkpoints (basically they should be consistent: either both on GPU or both on CPU). Related: https://discuss.ray.io/t/runtimeerror-expected-scalars-to-be-on-cpu-got-cuda-0-instead/9998
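To illustrate the consistency point with plain PyTorch (a sketch of the general mechanism only, not RLlib's checkpoint code): where a checkpoint's tensors land on load is controlled by `map_location`, and a mismatch between the device used at save time and the device assumed at load time is what typically produces device-mismatch errors like the one in this issue.
```python
import torch
import torch.nn as nn

# Save a state dict from a model that lives on the CPU.
model = nn.Linear(4, 2)
torch.save(model.state_dict(), "/tmp/ckpt.pt")

# Force everything onto the CPU on load, regardless of which device the
# tensors lived on when the checkpoint was written.
state = torch.load("/tmp/ckpt.pt", map_location="cpu")
model.load_state_dict(state)
```
|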
I've reproduced this with both PPO and SAC; I didn't check the rest. |
I also ran into the same issue when restoring and training a PPO agent with Ray 2.3.1. Is there any temporary solution? I used run_experiment() to train. |
Hey @DenysAshikhin, I don't have much visibility into this issue, but if you can create a minimal repro script that we can use to debug, it would be a great starting point. For example, a script that trains a PPO agent on CUDA for one iteration and then tries to restore it (for inference or continuation of training) but fails with the error message you showed. Thanks. |
Also tagging @perduta @WeihaoTan to provide a repro script. Thanks!
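For reference, a minimal script of the kind being requested might look like the following (a sketch, not from this thread; it assumes a machine with one GPU and uses CartPole as a stand-in env):
```python
from ray.rllib.algorithms.ppo import PPOConfig

# Train a torch PPO agent on the GPU for one iteration and checkpoint it.
config = (
    PPOConfig()
    .environment("CartPole-v1")
    .framework("torch")
    .resources(num_gpus=1)
)
algo = config.build()
algo.train()
checkpoint_path = algo.save()
algo.stop()

# Restoring into a fresh Algorithm and continuing to train is where the
# "Expected scalars to be on CPU, got cuda:0 instead" error reportedly appears.
restored = config.build()
restored.restore(checkpoint_path)
restored.train()
```
|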
Hi @kouroshHakha @xwjiang2010, here is the repro script. If you run it on a machine with 1 GPU, it works perfectly. If you run it on a machine with multiple GPUs, the bug appears.
config.yaml
After training is finished, uncomment "restore" and put the trained checkpoint path in it. Increase timesteps_total and re-run train.py; the bug will appear. Something like:
|
@xwjiang2010 Is this enough? If not, what else should I attach? |
For my side (it doesn't even require multiple GPUs):
```python
import ray
from ray.rllib.env import PolicyServerInput
from ray.rllib.algorithms.ppo import PPOConfig
import numpy as np
import argparse
from gymnasium.spaces import MultiDiscrete, Box

ray.init(num_cpus=9, num_gpus=1, log_to_driver=False, configure_logging=False)

ppo_config = PPOConfig()

parser = argparse.ArgumentParser(description='Optional app description')
parser.add_argument('-ip', type=str, help='IP of this device')
parser.add_argument('-checkpoint', type=str, help='location of checkpoint to restore from')
args = parser.parse_args()


def _input(ioctx):
    # We are a remote worker, or we are the local worker with num_workers=0:
    # Create a PolicyServerInput.
    if ioctx.worker_index > 0 or ioctx.worker.num_workers == 0:
        return PolicyServerInput(
            ioctx,
            args.ip,
            55556 + ioctx.worker_index - (1 if ioctx.worker_index > 0 else 0),
        )
    # No InputReader (PolicyServerInput) needed.
    else:
        return None


x = 320
y = 240

# kl_coeff -> default 0.2
# ppo_config.gamma = 0.01  # vf_loss_coeff used to be 0.01??
# "entropy_coeff": 0.00005,
# "clip_param": 0.1,
ppo_config.gamma = 0.998  # default 0.99
ppo_config.lambda_ = 0.99  # default 1.0???
ppo_config.kl_target = 0.01  # default 0.01
ppo_config.rollout_fragment_length = 128
# ppo_config.train_batch_size = 8500
# ppo_config.train_batch_size = 10000
ppo_config.train_batch_size = 12000
ppo_config.sgd_minibatch_size = 512
# ppo_config.num_sgd_iter = 2  # default 30???
ppo_config.num_sgd_iter = 7  # default 30???
# ppo_config.lr = 3.5e-5  # 5e-5
ppo_config.lr = 9e-5  # 5e-5
ppo_config.model = {
    # Share layers for value function. If you set this to True, it's
    # important to tune vf_loss_coeff.
    "vf_share_layers": True,
    "use_lstm": True,
    "max_seq_len": 32,
    "lstm_cell_size": 128,
    "lstm_use_prev_action": True,
    "conv_filters": [
        # 240 x 320
        [16, [5, 5], 3],
        [32, [5, 5], 3],
        [64, [5, 5], 3],
        [128, [3, 3], 2],
        [256, [3, 3], 2],
        [512, [3, 3], 2],
    ],
    "conv_activation": "relu",
    "post_fcnet_hiddens": [512],
    "post_fcnet_activation": "relu",
}
ppo_config.batch_mode = "complete_episodes"
ppo_config.simple_optimizer = True
# ppo_config["remote_worker_envs"] = True
ppo_config.env = None
ppo_config.observation_space = Box(low=0, high=1, shape=(y, x, 1), dtype=np.float32)
ppo_config.action_space = MultiDiscrete(
    [
        2,  # W
        2,  # A
        2,  # S
        2,  # D
        2,  # Space
        2,  # H
        2,  # J
        2,  # K
        2,  # L
    ]
)
ppo_config.env_config = {
    "sleep": True,
    'replayOn': False,
}
ppo_config.rollouts(num_rollout_workers=2, enable_connectors=False)
ppo_config.offline_data(input_=_input)
ppo_config.framework_str = 'torch'
ppo_config.log_sys_usage = False
ppo_config.compress_observations = True
ppo_config.shuffle_sequences = False
ppo_config.num_gpus = 0.5
ppo_config.num_gpus_per_worker = 0.25
ppo_config.num_cpus_per_worker = 2
ppo_config.num_cpus_per_learner_worker = 2
ppo_config.num_gpus_per_learner_worker = 0.5

tempyy = ppo_config.to_dict()
print(tempyy)

from ray import tune

name = "" + args.checkpoint
print(f"Starting: {name}")

tune.run(
    "PPO",
    resume='AUTO',
    # param_space=config,
    config=tempyy,
    name=name,
    keep_checkpoints_num=None,
    checkpoint_score_attr="episode_reward_mean",
    max_failures=1,
    checkpoint_freq=5,
    checkpoint_at_end=True,
)
```
You can of course substitute the env with a random env. In my case, after letting it create some checkpoints, I press Ctrl+C in the cmd window. After it gracefully saves, running the script again gives the same error as outlined above. |
My issue is occurring on a single GPU machine as well.
|
Got the same problem when trying to resume an experiment on Colab with exactly the same configuration as before; the error appears at the end of the iteration. |
@cheadrian what is the |
I get the same error with or without specifying it. |
Hello, I'm using SAC and...
Please ignore that point; no matter what settings I use, the issue happens the same as for the others. I had confused some parameters 😅 |
Same error, and no solution yet?
@kouroshHakha Could you triage this for the RL team?
@gunewar I get the feeling that it's some new configuration that got missed in automated builds.
I removed num_gpus from the config, so it's learning for now! :)
So you trained initially with num_gpus set (thus training on the GPU), and for subsequent runs you turned off num_gpus? Is it training on your CPU or GPU then?
I was able to successfully restore the model that was trained with num_gpus=1 after removing that setting (i.e., num_gpus=0).
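In config terms, the workaround described here is roughly the following (a sketch; `checkpoint_path` and the env are placeholders, not from this thread):
```python
from ray.rllib.algorithms.ppo import PPOConfig

checkpoint_path = "/path/to/gpu_trained_checkpoint"  # placeholder

# The original run used .resources(num_gpus=1); restoring with num_gpus=0
# keeps the restored weights and optimizer state on the CPU.
config = (
    PPOConfig()
    .environment("CartPole-v1")  # stand-in for the real env
    .framework("torch")
    .resources(num_gpus=0)
)
algo = config.build()
algo.restore(checkpoint_path)
algo.train()
```
Note the trade-off discussed just below: this restores successfully but means training continues on the CPU.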
Interesting, that didn't work for me. Can you confirm whether it's still training on your GPU, or whether it switched to your CPU after restoring?
I just checked the training; I'm working on another project now. I left all the resource-related configs at their base values and didn't use the Ray Tuner.
I'm happy to know there's kind of a workaround for some. However, it still didn't work for me, and it doesn't change the fact that I need to train on my GPU (as I'm sure others do as well).
Apparently the torch optimizer param_group values should not be moved to CUDA devices when restoring optimizer states. There will be a PR addressing this issue in an upcoming release.
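A minimal sketch of the kind of fix being described (not RLlib's actual patch): when moving a restored optimizer state dict to the GPU, only the per-parameter state tensors should be moved; scalar entries such as Adam's "step" counter, and the plain-Python values in `param_groups`, must stay on the CPU, or the next `optimizer.step()` raises exactly the error in this issue's title.
```python
import torch

def load_optimizer_state(optimizer, state_dict, device):
    # Move real state tensors (e.g. Adam's exp_avg / exp_avg_sq) to the
    # target device ...
    for param_state in state_dict["state"].values():
        for key, value in param_state.items():
            # ... but leave 0-dim scalars such as "step" on the CPU, and
            # don't touch "param_groups" (lr, betas, ... are plain scalars).
            if torch.is_tensor(value) and value.dim() > 0:
                param_state[key] = value.to(device)
    optimizer.load_state_dict(state_dict)
```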
@kouroshHakha |
Hey @DenysAshikhin, so here is the core change that needs to happen in RLlib's torch policy. If it's urgent, you can make these changes in your local installation of Ray. If it can wait a few days, you can either install the nightly or use master once this PR is merged. If you need reliability, you'll have to wait for the released version.
@kouroshHakha As such, I'll need to add this manually, since I don't want to redo the rest for now. I'll report back soon to confirm whether the linked PR fixes it for me.
@DenysAshikhin I have pinged the owner of the other PR about the possibility of a merge. Thanks for your valuable input on our library.
What happened + What you expected to happen
Hi all,
I am trying to load in a previously trained model to continue training it, except I get the following error:
Relevant code:
Versions / Dependencies
OS: Win11
Python: 3.10
Ray: latest nightly Windows wheel
Reproduction script
n/a
Issue Severity
High: It blocks me from completing my task.