[RLLib] RuntimeError: Expected scalars to be on CPU, got cuda:0 instead #34159

Closed
DenysAshikhin opened this issue Apr 7, 2023 · 32 comments · Fixed by #35024
Labels: bug (Something that is supposed to be working; but isn't), P0 (Issue that must be fixed in short order), rllib (RLlib related issues)

Comments

@DenysAshikhin

What happened + What you expected to happen

How severe does this issue affect your experience of using Ray?

High: It blocks me from completing my task.

Hi all,

I am trying to load in a previously trained model to continue training it, but I get the following error:

Failure # 1 (occurred at 2023-03-31_14-54-08)
ray::PPO.train() (pid=5616, ip=127.0.0.1, repr=PPO)
  File "python\ray\_raylet.pyx", line 875, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 879, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 819, in ray._raylet.execute_task.function_executor
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\_private\function_manager.py", line 674, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\tune\trainable\trainable.py", line 384, in train
    raise skipped from exception_cause(skipped)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\tune\trainable\trainable.py", line 381, in train
    result = self.step()
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 794, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 2810, in _run_one_training_iteration
    results = self.training_step()
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\algorithms\ppo\ppo.py", line 420, in training_step
    train_results = train_one_step(self, train_batch)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\execution\train_ops.py", line 52, in train_one_step
    info = do_minibatch_sgd(
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\utils\sgd.py", line 129, in do_minibatch_sgd
    local_worker.learn_on_batch(
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1029, in learn_on_batch
    info_out[pid] = policy.learn_on_batch(batch)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\utils\threading.py", line 24, in wrapper
    return func(self, *a, **k)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 663, in learn_on_batch
    self.apply_gradients(_directStepOptimizerSingleton)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 880, in apply_gradients
    opt.step()
  File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\optimizer.py", line 33, in _use_grad
    ret = func(self, *args, **kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\adam.py", line 141, in step
    adam(
  File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\adam.py", line 281, in adam
    func(params,
  File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\adam.py", line 449, in _multi_tensor_adam
    torch._foreach_addcmul_(device_exp_avg_sqs, device_grads, device_grads, 1 - beta2)
RuntimeError: Expected scalars to be on CPU, got cuda:0 instead.

Relevant code:

tune.run("PPO",
         resume='AUTO',
         # param_space=config,
         config=ppo_config.to_dict(),
         name=name, keep_checkpoints_num=None, checkpoint_score_attr="episode_reward_mean",
         max_failures=1,
         # restore="C:\\Users\\denys\\ray_results\\mediumbrawl-attention-256Att-128MLP-L2\\PPOTrainer_RandomEnv_1e882_00000_0_2022-06-02_15-13-44\\checkpoint_000028\\checkpoint-28",
         checkpoint_freq=5, checkpoint_at_end=True)

Versions / Dependencies

OS: Win11
Python: 3.10
Ray: latest nightly windows wheel

Reproduction script

n/a

Issue Severity

High: It blocks me from completing my task.

@DenysAshikhin DenysAshikhin added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Apr 7, 2023
@xwjiang2010 xwjiang2010 added the needs-repro-script Issue needs a runnable script to be reproduced label Apr 7, 2023
@xwjiang2010
Contributor

Can you share your script?

@xwjiang2010 xwjiang2010 added air question Just a question :) and removed bug Something that is supposed to be working; but isn't labels Apr 7, 2023
@hora-anyscale hora-anyscale added @external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission. P2 Important issue, but not time-critical and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Apr 7, 2023
@perduta

perduta commented Apr 8, 2023

@xwjiang2010 I think I'm facing the same issue: after PBT/PB2 perturbs one of my trials and loads it from a checkpoint (the load succeeds), the next opt.step() seems to fail. I am trying to debug this - if you have any clues, I'm all ears.

OS: Linux 6.1
Python: 3.10
Ray: both on 2.3.1 and latest nightly (1b5b2f8)

Code that reproduces the issue every time I run it on my setup:

import ray
from ray.rllib.algorithms.ppo import PPOConfig
from ray import tune
from ray.tune.schedulers.pb2 import PB2
from ray import air

ray.init(address="auto")

config = (
    PPOConfig()
    .framework("torch")
    .environment("BipedalWalker-v3")
    .training(
        lr=1e-5,
        model={"fcnet_hiddens": [128, 128]},
        train_batch_size=1024,
    )
    .rollouts(num_rollout_workers=5, num_envs_per_worker=4)
    .resources(num_gpus=1.0 / 4)
)

perturbation_interval = 20
pb2 = PB2(
    time_attr="training_iteration",
    perturbation_interval=perturbation_interval,
    hyperparam_bounds={"lr": [1e-3, 1e-7], "train_batch_size": [128, 1024 * 8]},
)

param_space = {**config.to_dict(), **{"checkpoint_interval": perturbation_interval}}

tuner = tune.Tuner(
    "PPO",
    param_space=param_space,
    run_config=air.RunConfig(
        stop={"training_iteration": 1e9},
        verbose=1,
    ),
    tune_config=tune.TuneConfig(
        scheduler=pb2, metric="episode_reward_mean", mode="max", num_samples=4
    ),
)

results = tuner.fit()

A snippet of the logs:

(RolloutWorker pid=189049) 2023-04-08 20:06:19,348	WARNING env.py:155 -- Your env doesn't have a .spec.max_episode_steps attribute. Your horizon will default to infinity, and your environment will not be reset.
(PPO pid=188910) 2023-04-08 20:06:19,348	WARNING env.py:155 -- Your env doesn't have a .spec.max_episode_steps attribute. Your horizon will default to infinity, and your environment will not be reset.
(PPO pid=188910) 2023-04-08 20:06:19,348	WARNING env.py:155 -- Your env doesn't have a .spec.max_episode_steps attribute. Your horizon will default to infinity, and your environment will not be reset.
(RolloutWorker pid=189049) pybullet build time: Apr  4 2023 02:40:04 [repeated 2x across cluster]
== Status ==
Current time: 2023-04-08 20:06:20 (running for 00:03:11.58)
PopulationBasedTraining: 2 checkpoints, 1 perturbs
Logical resource usage: 24.0/24 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:G)
Current best trial: 9e53a_00003 with episode_reward_mean=-809.2682569878904 and parameters={'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0.25, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, '_fake_gpus': False, 'num_learner_workers': 0, 'num_gpus_per_learner_worker': 0, 'num_cpus_per_learner_worker': 1, 'local_gpu_idx': 0, 'custom_resources_per_worker': {}, 'placement_strategy': 'PACK', 'eager_tracing': False, 'eager_max_retraces': 20, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'env': 'quadcopter-env-v0', 'env_config': {}, 'observation_space': None, 'action_space': None, 'env_task_fn': None, 'render_env': False, 'clip_rewards': None, 'normalize_actions': True, 'clip_actions': False, 'disable_env_checking': False, 'is_atari': False, 'auto_wrap_old_gym_envs': True, 'num_envs_per_worker': 4, 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>, 'sample_async': False, 'enable_connectors': True, 'rollout_fragment_length': 'auto', 'batch_mode': 'truncate_episodes', 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'validate_workers_after_construction': True, 'preprocessor_pref': 'deepmind', 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'compress_observations': False, 'enable_tf1_exec_eagerly': False, 'sampler_perf_stats_ema_coef': None, 'gamma': 0.99, 'lr': 1e-05, 'train_batch_size': 1024, 'model': {'_disable_preprocessor_api': False, '_disable_action_flattening': False, 'fcnet_hiddens': [128, 128], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': False, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'framestack': True, 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'encoder_latent_dim': None, 'lstm_use_prev_action_reward': -1, '_use_default_native_models': -1}, 'optimizer': {}, 'max_requests_in_flight_per_sampler_worker': 2, 'learner_class': None, '_enable_learner_api': False, '_learner_hps': PPOLearnerHPs(kl_coeff=0.2, kl_target=0.01, use_critic=True, clip_param=0.3, vf_clip_param=10.0, entropy_coeff=0.0, vf_loss_coeff=1.0, lr_schedule=None, entropy_coeff_schedule=None), 'explore': True, 'exploration_config': {'type': 'StochasticSampling'}, 'policy_states_are_swappable': False, 'input_config': {}, 'actions_in_input_normalized': False, 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_config': {}, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 'offline_sampling': False, 'evaluation_interval': None, 'evaluation_duration': 10, 
'evaluation_duration_unit': 'episodes', 'evaluation_sample_timeout_s': 180.0, 'evaluation_parallel_to_training': False, 'evaluation_config': None, 'off_policy_estimation_methods': {}, 'ope_split_batch_by_episode': True, 'evaluation_num_workers': 0, 'always_attach_evaluation_results': False, 'enable_async_evaluation': False, 'in_evaluation': False, 'sync_filters_on_rollout_workers_timeout_s': 60.0, 'keep_per_episode_custom_metrics': False, 'metrics_episode_collection_timeout_s': 60.0, 'metrics_num_episodes_for_smoothing': 100, 'min_time_s_per_iteration': None, 'min_train_timesteps_per_iteration': 0, 'min_sample_timesteps_per_iteration': 0, 'export_native_model_files': False, 'checkpoint_trainable_policies_only': False, 'logger_creator': None, 'logger_config': None, 'log_level': 'WARN', 'log_sys_usage': True, 'fake_sampler': False, 'seed': None, 'worker_cls': None, 'ignore_worker_failures': False, 'recreate_failed_workers': False, 'max_num_worker_restarts': 1000, 'delay_between_worker_restarts_s': 60.0, 'restart_failed_sub_environments': False, 'num_consecutive_worker_failures_tolerance': 100, 'worker_health_probe_timeout_s': 60, 'worker_restore_timeout_s': 1800, 'rl_module_spec': None, '_enable_rl_module_api': False, '_validate_exploration_conf_and_rl_modules': True, '_tf_policy_handles_more_than_one_loss': False, '_disable_preprocessor_api': False, '_disable_action_flattening': False, '_disable_execution_plan_api': True, 'simple_optimizer': False, 'replay_sequence_length': None, 'horizon': -1, 'soft_horizon': -1, 'no_done_at_end': -1, 'lr_schedule': None, 'use_critic': True, 'use_gae': True, 'kl_coeff': 0.2, 'sgd_minibatch_size': 128, 'num_sgd_iter': 30, 'shuffle_sequences': True, 'vf_loss_coeff': 1.0, 'entropy_coeff': 0.0, 'entropy_coeff_schedule': None, 'clip_param': 0.3, 'vf_clip_param': 10.0, 'grad_clip': None, 'kl_target': 0.01, 'vf_share_layers': -1, 'checkpoint_interval': 20, '__stdout_file__': None, '__stderr_file__': None, 'lambda': 1.0, 'input': 'sampler', 'multiagent': {'policies': {'default_policy': (None, None, None, None)}, 'policy_mapping_fn': <function AlgorithmConfig.DEFAULT_POLICY_MAPPING_FN at 0x7f4db05ea830>, 'policies_to_train': None, 'policy_map_capacity': 100, 'policy_map_cache': -1, 'count_steps_by': 'env_steps', 'observation_fn': None}, 'callbacks': <class 'ray.rllib.algorithms.callbacks.DefaultCallbacks'>, 'create_env_on_driver': False, 'custom_eval_function': None, 'framework': 'torch', 'num_cpus_for_driver': 1, 'num_workers': 5}
Result logdir: /home/pp/ray_results/PPO
Number of trials: 4/4 (4 RUNNING)


(PPO pid=188910) 2023-04-08 20:06:20,350	WARNING checkpoints.py:109 -- No `rllib_checkpoint.json` file found in checkpoint directory /tmp/checkpoint_tmp_a68e78c04f114e399acd38a05eff6631/.! Trying to extract checkpoint info from other files found in that dir.
(PPO pid=188910) 2023-04-08 20:06:20,391	INFO trainable.py:915 -- Restored on 192.168.178.20 from checkpoint: /tmp/checkpoint_tmp_a68e78c04f114e399acd38a05eff6631
(PPO pid=188910) 2023-04-08 20:06:20,392	INFO trainable.py:924 -- Current state after restoring: {'_iteration': 80, '_timesteps_total': None, '_time_total': 146.15634059906006, '_episodes_total': 155}
2023-04-08 20:06:20,958	ERROR trial_runner.py:1485 -- Trial PPO_quadcopter-env-v0_9e53a_00003: Error happened when processing _ExecutorEventType.TRAINING_RESULT.
ray.exceptions.RayTaskError(RuntimeError): ray::PPO.train() (pid=188912, ip=192.168.178.20, repr=PPO)
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 386, in train
    raise skipped from exception_cause(skipped)
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 383, in train
    result = self.step()
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 792, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2811, in _run_one_training_iteration
    results = self.training_step()
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo.py", line 432, in training_step
    train_results = multi_gpu_train_one_step(self, train_batch)
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/rllib/execution/train_ops.py", line 163, in multi_gpu_train_one_step
    results = policy.learn_on_loaded_batch(
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 825, in learn_on_loaded_batch
    self.apply_gradients(_directStepOptimizerSingleton)
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 885, in apply_gradients
    opt.step()
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 33, in _use_grad
    ret = func(self, *args, **kwargs)
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/torch/optim/adam.py", line 141, in step
    adam(
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/torch/optim/adam.py", line 281, in adam
    func(params,
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/torch/optim/adam.py", line 449, in _multi_tensor_adam
    torch._foreach_addcmul_(device_exp_avg_sqs, device_grads, device_grads, 1 - beta2)
RuntimeError: Expected scalars to be on CPU, got cuda:0 instead.

@DenysAshikhin
Author

@xwjiang2010 @perduta

I think I have a lead on this. I'll do some more testing, but I think it has to do with num_gpus and the rollout workers and how they are set. When I initially ran tune.run, I hadn't set the GPU resources correctly, so it trained on the CPU instead. Later I fixed that and went back to load an older checkpoint (which had the improper GPU setting), and it tried loading it onto the GPU instead of the CPU, causing this issue. I'll see if I can recreate it more consistently.

@xwjiang2010
Contributor

xwjiang2010 commented Apr 10, 2023

Actually, is everyone running into this issue with RLlib Algorithms? It may have something to do with how Algorithm saves and loads checkpoints (basically they should be consistent - either both on GPU or both on CPU).

cc @kouroshHakha

related: https://discuss.ray.io/t/runtimeerror-expected-scalars-to-be-on-cpu-got-cuda-0-instead/9998
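
One way to check the device-consistency mentioned here is to inspect where a restored optimizer's tensors actually live. Below is a minimal diagnostic sketch, assuming a torch-based RLlib policy whose optimizers are reachable (e.g. via the internal `policy._optimizers` attribute; treat that attribute as an implementation detail, not a public API):

import torch

def report_optimizer_devices(optimizer: torch.optim.Optimizer) -> None:
    """Print the device of every tensor held in an optimizer's param_groups and state."""
    for i, group in enumerate(optimizer.param_groups):
        for key, value in group.items():
            if key != "params" and torch.is_tensor(value):
                print(f"param_groups[{i}][{key!r}]: tensor on {value.device}")
    for state in optimizer.state.values():
        for key, value in state.items():
            if torch.is_tensor(value):
                print(f"state[{key!r}]: tensor on {value.device}")

# Hypothetical usage on a restored algorithm (attribute names are assumptions):
# policy = algo.get_policy()
# for opt in policy._optimizers:
#     report_optimizer_devices(opt)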

@xwjiang2010 xwjiang2010 added rllib RLlib related issues and removed air labels Apr 10, 2023
@perduta

perduta commented Apr 10, 2023

Actually, is everyone running into this issue with RLlib Algorithms? It may have something to do with how Algorithm saves and loads checkpoints (basically they should be consistent - either both on GPU or both on CPU).

cc @kouroshHakha

I've reproduced this with both PPO and SAC, didn't check the rest.

@WeihaoTan

I also ran into the same issue when restoring and training a PPO agent with Ray 2.3.1. Is there any temporary workaround? I used run_experiment() to train.

@kouroshHakha
Contributor

Hey @DenysAshikhin, I don't have much visibility into this issue, but if you can create a minimal repro script that we can use to debug it, that would be a great starting point.

For example, a script that trains a PPO agent on CUDA for one iteration and then tries to restore it (for inference or to continue training) but fails with the error message you showed.

Thanks.
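
For reference, a minimal sketch of the kind of script being asked for here (an illustration under assumptions, not a confirmed repro: it assumes Ray ~2.3, a CUDA GPU, the CartPole-v1 env, and that Algorithm.save() returns a checkpoint directory path in that version):

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.algorithms.algorithm import Algorithm

config = (
    PPOConfig()
    .framework("torch")
    .environment("CartPole-v1")
    .resources(num_gpus=1)   # train on the GPU
)

algo = config.build()
algo.train()                  # one training iteration on CUDA
checkpoint_dir = algo.save()  # returns a checkpoint path in Ray ~2.3
algo.stop()

# Restoring and continuing training is where the error reportedly appears.
restored = Algorithm.from_checkpoint(checkpoint_dir)
restored.train()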

@xwjiang2010
Contributor

Also tagging @perduta and @WeihaoTan to provide a repro script. Thanks!

@WeihaoTan

WeihaoTan commented Apr 12, 2023

Hi @kouroshHakha @xwjiang2010, here is the repro script. If you run it on a machine with 1 GPU, it works perfectly. If you run it on a machine with multiple GPUs, the bug appears.

train.py

import argparse
import yaml

import ray
from ray.tune.experiment.config_parser import _make_parser
from ray.tune.progress_reporter import CLIReporter
from ray.tune.tune import run_experiments
from ray.tune.registry import register_trainable, register_env
from ray.tune.schedulers import create_scheduler
from ray.rllib.models import ModelCatalog
from ray.rllib.utils.framework import try_import_torch

from algorithms.registry import ALGORITHMS, get_algorithm_class
from envs.registry import ENVIRONMENTS, get_env_class, POLICY_MAPPINGS, CALLBACKS
from models.registry import MODELS, get_model_class, ACTION_DISTS, get_action_dist_class


EXAMPLE_USAGE = """
python train.py -f config.yaml
"""

# Try to import both backends for flag checking/warnings.
torch, _ = try_import_torch()


def create_parser(parser_creator=None):
    parser = _make_parser(
        parser_creator=parser_creator,
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description="Train a reinforcement learning agent.",
        epilog=EXAMPLE_USAGE,
    )

    # See also the base parser definition in ray/tune/experiment/__config_parser.py
    parser.add_argument(
        "--ray-address",
        default=None,
        type=str,
        help="Connect to an existing Ray cluster at this address instead "
        "of starting a new one.",
    )
    parser.add_argument(
        "--ray-ui", action="store_true", help="Whether to enable the Ray web UI."
    )
    parser.add_argument(
        "--local-mode",
        action="store_true",
        help="Run ray in local mode for easier debugging.",
    )
    parser.add_argument(
        "--ray-num-cpus",
        default=None,
        type=int,
        help="--num-cpus to use if starting a new cluster.",
    )
    parser.add_argument(
        "--ray-num-gpus",
        default=None,
        type=int,
        help="--num-gpus to use if starting a new cluster.",
    )
    parser.add_argument(
        "--ray-num-nodes",
        default=None,
        type=int,
        help="Emulate multiple cluster nodes for debugging.",
    )
    parser.add_argument(
        "--ray-object-store-memory",
        default=None,
        type=int,
        help="--object-store-memory to use if starting a new cluster.",
    )
    parser.add_argument(
        "--resume",
        action="store_true",
        help="Whether to attempt to resume previous Tune experiments.",
    )
    parser.add_argument(
        "-f",
        "--config-file",
        default="config.yaml",
        type=str,
        help="If specified, use config options from this file. Note that this "
             "overrides any trial-specific options set via flags above.",
    )

    return parser


def run(args, parser):
    assert args.config_file is not None, "Must specify a config file"
    with open(args.config_file) as f:
        experiments = yaml.safe_load(f)
    verbose = 1
    for exp in experiments.values():
        metric_columns = exp.pop("metric_columns", None)
        if not exp.get("run"):
            parser.error("the following arguments are required: --run")
        if not exp.get("env") and not exp.get("config", {}).get("env"):
            parser.error("the following arguments are required: --env")
        if exp["config"].get("multiagent"):
            policy_mapping_name = exp["config"]["multiagent"].get("policy_mapping_fn")
            if isinstance(policy_mapping_name, str):
                exp["config"]["multiagent"]["policy_mapping_fn"] = POLICY_MAPPINGS[policy_mapping_name]
        if exp["config"].get("callbacks"):
            callback_name = exp["config"].get("callbacks")
            if isinstance(callback_name, str):
                exp["config"]["callbacks"] = CALLBACKS[callback_name]

    if args.ray_num_nodes:
        from ray.cluster_utils import Cluster

        cluster = Cluster()
        for _ in range(args.ray_num_nodes):
            cluster.add_node(
                num_cpus=args.ray_num_cpus or 1,
                num_gpus=args.ray_num_gpus or 0,
                object_store_memory=args.ray_object_store_memory,
            )
        ray.init(address=cluster.address)
    else:
        ray.init(
            include_dashboard=args.ray_ui,
            address=args.ray_address,
            object_store_memory=args.ray_object_store_memory,
            num_cpus=args.ray_num_cpus,
            num_gpus=args.ray_num_gpus,
            local_mode=args.local_mode,
        )

    progress_reporter = CLIReporter(
        print_intermediate_tables=verbose >= 1,
        metric_columns=metric_columns,
    )

    trials = run_experiments(
        experiments,
        scheduler=create_scheduler(args.scheduler, **args.scheduler_config),
        resume=args.resume,
        verbose=verbose,
        progress_reporter=progress_reporter,
        concurrent=True,
    )
    ray.shutdown()

    checkpoints = []
    for trial in trials:
        if trial.checkpoint.dir_or_data:
            checkpoints.append(trial.checkpoint.dir_or_data)

    if checkpoints:
        from rich import print
        from rich.panel import Panel

        print("\nYour training finished.")

        print("Best available checkpoint for each trial:")
        for cp in checkpoints:
            print(f"  {cp}")

        print(
            "\nYou can now evaluate your trained algorithm from any "
            "checkpoint, e.g. by running:"
        )
        print(Panel(f"[green]  rllib evaluate {checkpoints[0]} "))



def main():
    parser = create_parser()
    args = parser.parse_args()
    run(args, parser)


if __name__ == "__main__":
    main()

config.yaml

test:
  run: PPO
  checkpoint_config:
    checkpoint_frequency: 50
    checkpoint_at_end: true
    num_to_keep: 5
  local_dir: ray_results
  stop:
    timesteps_total: 1000
  #restore: checkpoint

  config:
    framework: torch 
    env: CartPole-v1

    num_workers: 4
    num_cpus_for_driver: 1
    num_envs_per_worker: 1
    num_cpus_per_worker: 2
    num_gpus: 1

    disable_env_checking: true

After training finishes, uncomment "restore" and set it to the trained checkpoint path. Increase timesteps_total and re-run train.py; the bug will appear, with an error like:

    torch._foreach_addcmul_(device_exp_avg_sqs, device_grads, device_grads, 1 - beta2)
RuntimeError: Expected scalars to be on CPU, got cuda:0 instead.

@perduta

perduta commented Apr 13, 2023

(Quoting my Apr 8 comment above, with the same repro script and logs.)

@xwjiang2010 is this enough? If not, what else should I attach?
Should we add the bug label back? I don't know of any workaround, and I am currently unable to train any RL model with Ray.

@DenysAshikhin
Author

DenysAshikhin commented Apr 13, 2023

On my side (it doesn't even require multiple GPUs):

import ray
from ray.rllib.env import PolicyServerInput
from ray.rllib.algorithms.ppo import PPOConfig

import numpy as np
import argparse
from gymnasium.spaces import MultiDiscrete, Box

ray.init(num_cpus=9, num_gpus=1, log_to_driver=False, configure_logging=False)

ppo_config = PPOConfig()

parser = argparse.ArgumentParser(description='Optional app description')
parser.add_argument('-ip', type=str, help='IP of this device')

parser.add_argument('-checkpoint', type=str, help='location of checkpoint to restore from')

args = parser.parse_args()


def _input(ioctx):
    # We are remote worker, or we are local worker with num_workers=0:
    # Create a PolicyServerInput.
    if ioctx.worker_index > 0 or ioctx.worker.num_workers == 0:
        return PolicyServerInput(
            ioctx,
            args.ip,
            55556 + ioctx.worker_index - (1 if ioctx.worker_index > 0 else 0),
        )
    # No InputReader (PolicyServerInput) needed.
    else:
        return None


x = 320
y = 240

# kl_coeff, ->default 0.2
# ppo_config.gamma = 0.01  # vf_loss_coeff used to be 0.01??
# "entropy_coeff": 0.00005,
# "clip_param": 0.1,
ppo_config.gamma = 0.998  # default 0.99
ppo_config.lambda_ = 0.99  # default 1.0???
ppo_config.kl_target = 0.01  # default 0.01
ppo_config.rollout_fragment_length = 128
# ppo_config.train_batch_size = 8500
# ppo_config.train_batch_size = 10000
ppo_config.train_batch_size = 12000
ppo_config.sgd_minibatch_size = 512
# ppo_config.num_sgd_iter = 2  # default 30???
ppo_config.num_sgd_iter = 7  # default 30???
# ppo_config.lr = 3.5e-5  # 5e-5
ppo_config.lr = 9e-5  # 5e-5

ppo_config.model = {
    # Share layers for value function. If you set this to True, it's
    # important to tune vf_loss_coeff.
    "vf_share_layers": True,

    "use_lstm": True,
    "max_seq_len": 32,
    "lstm_cell_size": 128,
    "lstm_use_prev_action": True,

    "conv_filters": [
     
        # 240 X 320
        [16, [5, 5], 3],
        [32, [5, 5], 3],
        [64, [5, 5], 3],
        [128, [3, 3], 2],
        [256, [3, 3], 2],
        [512, [3, 3], 2],
    ],
    "conv_activation": "relu",
    "post_fcnet_hiddens": [512],
    "post_fcnet_activation": "relu"
}
ppo_config.batch_mode = "complete_episodes"
ppo_config.simple_optimizer = True

# ppo_config["remote_worker_envs"] = True



ppo_config.env = None
ppo_config.observation_space = Box(low=0, high=1, shape=(y, x, 1), dtype=np.float32)
ppo_config.action_space = MultiDiscrete(
    [
        2,  # W
        2,  # A
        2,  # S
        2,  # D
        2,  # Space
        2,  # H
        2,  # J
        2,  # K
        2  # L
    ]
)
ppo_config.env_config = {
    "sleep": True,
    'replayOn': False
}

ppo_config.rollouts(num_rollout_workers=2, enable_connectors=False)
ppo_config.offline_data(input_=_input)

ppo_config.framework_str = 'torch'
ppo_config.log_sys_usage = False
ppo_config.compress_observations = True
ppo_config.shuffle_sequences = False
ppo_config.num_gpus = 0.35
ppo_config.num_gpus_per_worker = 0.1
ppo_config.num_cpus_per_worker = 2
ppo_config.num_cpus_per_learner_worker = 2
ppo_config.num_gpus_per_learner_worker = 0.35

tempyy = ppo_config.to_dict()

print(tempyy)

from ray import tune

name = "" + args.checkpoint
print(f"Starting: {name}")

tune.run("PPO",
         resume='AUTO',
         # param_space=config,
         config=tempyy,
         name=name, keep_checkpoints_num=None, checkpoint_score_attr="episode_reward_mean",
         max_failures=1,
         checkpoint_freq=5, checkpoint_at_end=True)

You can of course substitute the env with a random env. In my case, after letting it make some checkpoints, I press Ctrl+C in the cmd window. After it gracefully saves, running the script again gives the same error as outlined above.

@perduta

perduta commented Apr 13, 2023 via email

@cheadrian

Got the same problem when trying to resume an experiment on Colab with exactly the same configuration as before.

tuner = Tuner.restore("saved_models", trainer, resume_unfinished = True, resume_errored=True)
tuner.fit()

The error appears at the end of an iteration.
Single-GPU setup.
Python 3.9.16, torch 2.0.0+cu118, ray 2.3.1

@xwjiang2010
Contributor

@cheadrian what is the trainer in your case?

@cheadrian

cheadrian commented Apr 17, 2023

trainer = RLTrainer(
    run_config=run_config,
    scaling_config=ScalingConfig(
        num_workers=2, use_gpu=True,
        trainer_resources={"CPU": 0.0}, 
        resources_per_worker={"CPU": 1.0}),
    algorithm="PPO",
    config=config_cf,
    resume_from_checkpoint=rl_checkpoint,
)

I get the same error with or without specifying it.

@solnox99

Hello,

I'm using SAC, and when I load a model trained with Tune via Algorithm.from_checkpoint(), I get the same error.

@DenysAshikhin
Author

@kouroshHakha

Please ignore that point; no matter what settings I use, the issue happens the same as for the others. I had confused some parameters 😅

@gunewar

gunewar commented Apr 21, 2023

Same error and no solution yet?
python3.9/site-packages/torch/optim/adam.py", line 449, in _multi_tensor_adam
    torch._foreach_addcmul_(device_exp_avg_sqs, device_grads, device_grads, 1 - beta2)
RuntimeError: Expected scalars to be on CPU, got cuda:0 instead.

@xwjiang2010 xwjiang2010 removed the @external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission. label Apr 21, 2023
@xwjiang2010 xwjiang2010 added triage Needs triage (eg: priority, bug/not-bug, and owning component) and removed needs-repro-script Issue needs a runnable script to be reproduced P2 Important issue, but not time-critical labels Apr 21, 2023
@xwjiang2010
Contributor

@kouroshHakha Could you triage this for the RL team?

@DenysAshikhin
Author

@gunewar
I'm sure the team is aware of the issue at this point - let's just give them some time to find a way to reproduce this on their end, at which point it should be a simple fix (hopefully).

I get the feeling that it's some new configuration that got missed in the automated builds.

@woosangbum

woosangbum commented Apr 24, 2023

Hello,

I'm using SAC, and when I load a model trained with Tune via Algorithm.from_checkpoint(), I get the same error.

After removing num_gpus from the config, it's training for now! :)

@DenysAshikhin
Author

Hello,
I'm using SAC, and when I load a model trained with Tune via Algorithm.from_checkpoint(), I get the same error.

After removing num_gpus from the config, it's training for now! :)

So you initially trained with num_gpus set (thus training on the GPU), and for subsequent runs you turned num_gpus off? Is it training on your CPU or GPU then?

@woosangbum

Hello,
I'm using SAC, and when I load a model trained with Tune via Algorithm.from_checkpoint(), I get the same error.

After removing num_gpus from the config, it's training for now! :)

So you initially trained with num_gpus set (thus training on the GPU), and for subsequent runs you turned num_gpus off? Is it training on your CPU or GPU then?

I was able to successfully restore a model that was trained with num_gpus=1 by removing num_gpus (i.e. leaving it at 0) in the config.
After doing this, there were no errors and it was possible to continue training the model.
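
For reference, a minimal sketch of this workaround (shown with PPO and CartPole-v1 for concreteness; the checkpoint path is a placeholder, and whether it applies to your setup is an assumption): build the algorithm without a GPU, then restore the checkpoint produced by the earlier num_gpus=1 run.

from ray.rllib.algorithms.ppo import PPOConfig

# Build the algorithm without requesting a GPU, then restore the checkpoint that
# was written by the earlier num_gpus=1 run ("path/to/checkpoint" is a placeholder).
config = (
    PPOConfig()
    .framework("torch")
    .environment("CartPole-v1")
    .resources(num_gpus=0)
)
algo = config.build()
algo.restore("path/to/checkpoint")
algo.train()  # reportedly proceeds without the optimizer-device error, but on CPU only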

@DenysAshikhin
Author

DenysAshikhin commented Apr 24, 2023

Interesting, it didn't work for me. Can you confirm whether it's still training on your GPU or whether it switched to your CPU after restoring?

@woosangbum

woosangbum commented Apr 24, 2023

Interesting, it didn't work for me. Can you confirm whether it's still training on your GPU or whether it switched to your CPU after restoring?

I just checked the training, and I'm now working on another project.

I left all the resource-related configs at their default values and didn't use the Ray Tuner.

@cheadrian

cheadrian commented Apr 24, 2023

(Quoting my earlier comment above: the Tuner.restore call and the RLTrainer config with use_gpu=True.)

Tune Status
Resources requested: 2.0/2 CPUs, 1.0/1 GPUs, 0.0/10.85 GiB heap, 0.0/0.19 GiB objects

Changing use_gpu=True to use_gpu=False lets the training continue, but only on the CPU.

Tune Status
Resources requested: 2.0/2 CPUs, 0/1 GPUs, 0.0/10.85 GiB heap, 0.0/0.19 GiB objects

@DenysAshikhin
Author

(Quoting @cheadrian's comment above.)

I'm happy to know there's something of a workaround for some people. However, it still didn't work for me, and it doesn't address the fact that I need to train on my GPU (as I'm sure others do as well).

@hora-anyscale hora-anyscale changed the title RuntimeError: Expected scalars to be on CPU, got cuda:0 instead [RLLib] RuntimeError: Expected scalars to be on CPU, got cuda:0 instead Apr 28, 2023
@hora-anyscale hora-anyscale added P2 Important issue, but not time-critical and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Apr 28, 2023
@kouroshHakha kouroshHakha added P0 Issue that must be fixed in short order bug Something that is supposed to be working; but isn't and removed P2 Important issue, but not time-critical question Just a question :) labels May 3, 2023
@kouroshHakha
Contributor

Apparently the torch optimizer param_group values should not be moved to CUDA devices when restoring optimizer states. There will be a PR addressing this issue in an upcoming release.
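
For anyone curious about the mechanism, here is a minimal pure-PyTorch sketch (assuming torch 2.0.x and an available CUDA device) that imitates a restore which converts the param_group hyperparameters into CUDA tensors, which is what the comment above describes:

import torch

model = torch.nn.Linear(4, 2).cuda()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One normal update so the optimizer has state to "restore".
model(torch.randn(8, 4, device="cuda")).sum().backward()
opt.step()

# Simulate a restore that converts param_group values (here the betas) into CUDA tensors.
state = opt.state_dict()
for group in state["param_groups"]:
    group["betas"] = torch.tensor(group["betas"], device="cuda")
opt.load_state_dict(state)

# The next update fails inside _multi_tensor_adam: 1 - beta2 is now a CUDA tensor,
# and the foreach addcmul kernel expects its scalars tensor to live on the CPU, so
# it raises: RuntimeError: Expected scalars to be on CPU, got cuda:0 instead.
opt.zero_grad()
model(torch.randn(8, 4, device="cuda")).sum().backward()
opt.step()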

@DenysAshikhin
Author

@kouroshHakha
Thanks for the update. Is there a link to the PR so I can incorporate it on my end until it lands officially?

@kouroshHakha
Contributor

Hey @DenysAshikhin, here is the core change that needs to happen in RLlib's torch policy. If it's urgent, you can make these changes in your local installation of Ray. If it can wait a few days, you can either install the nightly or use master once this PR is merged. If you need reliability, you'll have to wait for the released version.
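
Until that change ships, one possible stop-gap is sketched below. This is only a sketch, not the official fix: it assumes the restored policy's torch optimizers are reachable (e.g. through RLlib's internal `policy._optimizers` list) and simply converts any tensor-valued param_group entries back to plain Python numbers before the next opt.step().

import torch

def scalarize_param_groups(optimizer: torch.optim.Optimizer) -> None:
    """Turn tensor-valued hyperparameters in param_groups back into Python numbers."""
    for group in optimizer.param_groups:
        for key, value in list(group.items()):
            if key == "params":
                continue
            if torch.is_tensor(value):
                # e.g. lr / eps as 0-dim tensors, or betas as a length-2 tensor.
                group[key] = value.item() if value.numel() == 1 else tuple(value.tolist())
            elif isinstance(value, (list, tuple)) and value and all(torch.is_tensor(v) for v in value):
                group[key] = type(value)(v.item() for v in value)

# Hypothetical usage right after restoring (attribute names are assumptions):
# policy = algo.get_policy()
# for opt in policy._optimizers:
#     scalarize_param_groups(opt)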

@DenysAshikhin
Author

@kouroshHakha
Again, thank you for the prompt response. Unfortunately, I have manually applied a different PR that fixes a memory leak in PolicyServerInput, and I'd wager it won't be merged anytime soon (source: #31400).

As such, I will need to apply this change manually as well, since I don't want to redo the other one for now. I'll report back soon to confirm whether the linked PR fixes it for me.

@kouroshHakha
Contributor

@DenysAshikhin I have pinged the owner of the other PR about the possibility of merging it. Thanks for your valuable input on our library.
