[RLLib] RuntimeError: Expected scalars to be on CPU, got cuda:0 instead #34159

Closed
DenysAshikhin opened this issue Apr 7, 2023 · 32 comments · Fixed by #35024
Labels: bug (Something that is supposed to be working; but isn't), P0 (Issue that must be fixed in short order), rllib (RLlib related issues)

Comments

@DenysAshikhin

What happened + What you expected to happen

How severe does this issue affect your experience of using Ray?

High: It blocks me from completing my task.

Hi all,

I am trying to load in a previously trained model to continue training it, but I get the following error:

Failure # 1 (occurred at 2023-03-31_14-54-08)
ray::PPO.train() (pid=5616, ip=127.0.0.1, repr=PPO)
  File "python\ray\_raylet.pyx", line 875, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 879, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 819, in ray._raylet.execute_task.function_executor
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\_private\function_manager.py", line 674, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\tune\trainable\trainable.py", line 384, in train
    raise skipped from exception_cause(skipped)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\tune\trainable\trainable.py", line 381, in train
    result = self.step()
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 794, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 2810, in _run_one_training_iteration
    results = self.training_step()
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\algorithms\ppo\ppo.py", line 420, in training_step
    train_results = train_one_step(self, train_batch)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\execution\train_ops.py", line 52, in train_one_step
    info = do_minibatch_sgd(
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\utils\sgd.py", line 129, in do_minibatch_sgd
    local_worker.learn_on_batch(
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1029, in learn_on_batch
    info_out[pid] = policy.learn_on_batch(batch)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\utils\threading.py", line 24, in wrapper
    return func(self, *a, **k)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 663, in learn_on_batch
    self.apply_gradients(_directStepOptimizerSingleton)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 880, in apply_gradients
    opt.step()
  File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\optimizer.py", line 33, in _use_grad
    ret = func(self, *args, **kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\adam.py", line 141, in step
    adam(
  File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\adam.py", line 281, in adam
    func(params,
  File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\adam.py", line 449, in _multi_tensor_adam
    torch._foreach_addcmul_(device_exp_avg_sqs, device_grads, device_grads, 1 - beta2)
RuntimeError: Expected scalars to be on CPU, got cuda:0 instead.

Relevant code:

tune.run("PPO",
         resume='AUTO',
         # param_space=config,
         config=ppo_config.to_dict(),
         name=name, keep_checkpoints_num=None, checkpoint_score_attr="episode_reward_mean",
         max_failures=1,
         # restore="C:\\Users\\denys\\ray_results\\mediumbrawl-attention-256Att-128MLP-L2\\PPOTrainer_RandomEnv_1e882_00000_0_2022-06-02_15-13-44\\checkpoint_000028\\checkpoint-28",
         checkpoint_freq=5, checkpoint_at_end=True)

Versions / Dependencies

OS: Win11
Python: 3.10
Ray: latest nightly windows wheel

Reproduction script

n/a

Issue Severity

High: It blocks me from completing my task.

@DenysAshikhin DenysAshikhin added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Apr 7, 2023
@xwjiang2010 xwjiang2010 added the needs-repro-script Issue needs a runnable script to be reproduced label Apr 7, 2023
@xwjiang2010
Contributor

Can you share your script?

@xwjiang2010 xwjiang2010 added air question Just a question :) and removed bug Something that is supposed to be working; but isn't labels Apr 7, 2023
@hora-anyscale hora-anyscale added @external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission. P2 Important issue, but not time-critical and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Apr 7, 2023
@perduta

perduta commented Apr 8, 2023

@xwjiang2010 I think I'm facing the same issue: after PBT/PB2 perturbs one of my trials and loads it from a checkpoint (the load succeeds), the next opt.step() seems to fail. I am trying to debug this - if you have any clues, I'm all ears.

OS: Linux 6.1
Python: 3.10
Ray: both on 2.3.1 and latest nightly (1b5b2f8)

Code that reproduces the issue every time I run it on my setup:

import ray
from ray.rllib.algorithms.ppo import PPOConfig
from ray import tune
from ray.tune.schedulers.pb2 import PB2
from ray import air

ray.init(address="auto")

config = (
    PPOConfig()
    .framework("torch")
    .environment("BipedalWalker-v3")
    .training(
        lr=1e-5,
        model={"fcnet_hiddens": [128, 128]},
        train_batch_size=1024,
    )
    .rollouts(num_rollout_workers=5, num_envs_per_worker=4)
    .resources(num_gpus=1.0 / 4)
)

perturbation_interval = 20
pb2 = PB2(
    time_attr="training_iteration",
    perturbation_interval=perturbation_interval,
    hyperparam_bounds={"lr": [1e-3, 1e-7], "train_batch_size": [128, 1024 * 8]},
)

param_space = {**config.to_dict(), **{"checkpoint_interval": perturbation_interval}}

tuner = tune.Tuner(
    "PPO",
    param_space=param_space,
    run_config=air.RunConfig(
        stop={"training_iteration": 1e9},
        verbose=1,
    ),
    tune_config=tune.TuneConfig(
        scheduler=pb2, metric="episode_reward_mean", mode="max", num_samples=4
    ),
)

results = tuner.fit()

A snippet of the logs:

(RolloutWorker pid=189049) 2023-04-08 20:06:19,348	WARNING env.py:155 -- Your env doesn't have a .spec.max_episode_steps attribute. Your horizon will default to infinity, and your environment will not be reset.
(PPO pid=188910) 2023-04-08 20:06:19,348	WARNING env.py:155 -- Your env doesn't have a .spec.max_episode_steps attribute. Your horizon will default to infinity, and your environment will not be reset.
(PPO pid=188910) 2023-04-08 20:06:19,348	WARNING env.py:155 -- Your env doesn't have a .spec.max_episode_steps attribute. Your horizon will default to infinity, and your environment will not be reset.
(RolloutWorker pid=189049) pybullet build time: Apr  4 2023 02:40:04 [repeated 2x across cluster]
== Status ==
Current time: 2023-04-08 20:06:20 (running for 00:03:11.58)
PopulationBasedTraining: 2 checkpoints, 1 perturbs
Logical resource usage: 24.0/24 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:G)
Current best trial: 9e53a_00003 with episode_reward_mean=-809.2682569878904 and parameters={'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0.25, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, '_fake_gpus': False, 'num_learner_workers': 0, 'num_gpus_per_learner_worker': 0, 'num_cpus_per_learner_worker': 1, 'local_gpu_idx': 0, 'custom_resources_per_worker': {}, 'placement_strategy': 'PACK', 'eager_tracing': False, 'eager_max_retraces': 20, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'env': 'quadcopter-env-v0', 'env_config': {}, 'observation_space': None, 'action_space': None, 'env_task_fn': None, 'render_env': False, 'clip_rewards': None, 'normalize_actions': True, 'clip_actions': False, 'disable_env_checking': False, 'is_atari': False, 'auto_wrap_old_gym_envs': True, 'num_envs_per_worker': 4, 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>, 'sample_async': False, 'enable_connectors': True, 'rollout_fragment_length': 'auto', 'batch_mode': 'truncate_episodes', 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'validate_workers_after_construction': True, 'preprocessor_pref': 'deepmind', 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'compress_observations': False, 'enable_tf1_exec_eagerly': False, 'sampler_perf_stats_ema_coef': None, 'gamma': 0.99, 'lr': 1e-05, 'train_batch_size': 1024, 'model': {'_disable_preprocessor_api': False, '_disable_action_flattening': False, 'fcnet_hiddens': [128, 128], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': False, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'framestack': True, 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'encoder_latent_dim': None, 'lstm_use_prev_action_reward': -1, '_use_default_native_models': -1}, 'optimizer': {}, 'max_requests_in_flight_per_sampler_worker': 2, 'learner_class': None, '_enable_learner_api': False, '_learner_hps': PPOLearnerHPs(kl_coeff=0.2, kl_target=0.01, use_critic=True, clip_param=0.3, vf_clip_param=10.0, entropy_coeff=0.0, vf_loss_coeff=1.0, lr_schedule=None, entropy_coeff_schedule=None), 'explore': True, 'exploration_config': {'type': 'StochasticSampling'}, 'policy_states_are_swappable': False, 'input_config': {}, 'actions_in_input_normalized': False, 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_config': {}, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 'offline_sampling': False, 'evaluation_interval': None, 'evaluation_duration': 10, 
'evaluation_duration_unit': 'episodes', 'evaluation_sample_timeout_s': 180.0, 'evaluation_parallel_to_training': False, 'evaluation_config': None, 'off_policy_estimation_methods': {}, 'ope_split_batch_by_episode': True, 'evaluation_num_workers': 0, 'always_attach_evaluation_results': False, 'enable_async_evaluation': False, 'in_evaluation': False, 'sync_filters_on_rollout_workers_timeout_s': 60.0, 'keep_per_episode_custom_metrics': False, 'metrics_episode_collection_timeout_s': 60.0, 'metrics_num_episodes_for_smoothing': 100, 'min_time_s_per_iteration': None, 'min_train_timesteps_per_iteration': 0, 'min_sample_timesteps_per_iteration': 0, 'export_native_model_files': False, 'checkpoint_trainable_policies_only': False, 'logger_creator': None, 'logger_config': None, 'log_level': 'WARN', 'log_sys_usage': True, 'fake_sampler': False, 'seed': None, 'worker_cls': None, 'ignore_worker_failures': False, 'recreate_failed_workers': False, 'max_num_worker_restarts': 1000, 'delay_between_worker_restarts_s': 60.0, 'restart_failed_sub_environments': False, 'num_consecutive_worker_failures_tolerance': 100, 'worker_health_probe_timeout_s': 60, 'worker_restore_timeout_s': 1800, 'rl_module_spec': None, '_enable_rl_module_api': False, '_validate_exploration_conf_and_rl_modules': True, '_tf_policy_handles_more_than_one_loss': False, '_disable_preprocessor_api': False, '_disable_action_flattening': False, '_disable_execution_plan_api': True, 'simple_optimizer': False, 'replay_sequence_length': None, 'horizon': -1, 'soft_horizon': -1, 'no_done_at_end': -1, 'lr_schedule': None, 'use_critic': True, 'use_gae': True, 'kl_coeff': 0.2, 'sgd_minibatch_size': 128, 'num_sgd_iter': 30, 'shuffle_sequences': True, 'vf_loss_coeff': 1.0, 'entropy_coeff': 0.0, 'entropy_coeff_schedule': None, 'clip_param': 0.3, 'vf_clip_param': 10.0, 'grad_clip': None, 'kl_target': 0.01, 'vf_share_layers': -1, 'checkpoint_interval': 20, '__stdout_file__': None, '__stderr_file__': None, 'lambda': 1.0, 'input': 'sampler', 'multiagent': {'policies': {'default_policy': (None, None, None, None)}, 'policy_mapping_fn': <function AlgorithmConfig.DEFAULT_POLICY_MAPPING_FN at 0x7f4db05ea830>, 'policies_to_train': None, 'policy_map_capacity': 100, 'policy_map_cache': -1, 'count_steps_by': 'env_steps', 'observation_fn': None}, 'callbacks': <class 'ray.rllib.algorithms.callbacks.DefaultCallbacks'>, 'create_env_on_driver': False, 'custom_eval_function': None, 'framework': 'torch', 'num_cpus_for_driver': 1, 'num_workers': 5}
Result logdir: /home/pp/ray_results/PPO
Number of trials: 4/4 (4 RUNNING)


(PPO pid=188910) 2023-04-08 20:06:20,350	WARNING checkpoints.py:109 -- No `rllib_checkpoint.json` file found in checkpoint directory /tmp/checkpoint_tmp_a68e78c04f114e399acd38a05eff6631/.! Trying to extract checkpoint info from other files found in that dir.
(PPO pid=188910) 2023-04-08 20:06:20,391	INFO trainable.py:915 -- Restored on 192.168.178.20 from checkpoint: /tmp/checkpoint_tmp_a68e78c04f114e399acd38a05eff6631
(PPO pid=188910) 2023-04-08 20:06:20,392	INFO trainable.py:924 -- Current state after restoring: {'_iteration': 80, '_timesteps_total': None, '_time_total': 146.15634059906006, '_episodes_total': 155}
2023-04-08 20:06:20,958	ERROR trial_runner.py:1485 -- Trial PPO_quadcopter-env-v0_9e53a_00003: Error happened when processing _ExecutorEventType.TRAINING_RESULT.
ray.exceptions.RayTaskError(RuntimeError): ray::PPO.train() (pid=188912, ip=192.168.178.20, repr=PPO)
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 386, in train
    raise skipped from exception_cause(skipped)
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 383, in train
    result = self.step()
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 792, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2811, in _run_one_training_iteration
    results = self.training_step()
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo.py", line 432, in training_step
    train_results = multi_gpu_train_one_step(self, train_batch)
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/rllib/execution/train_ops.py", line 163, in multi_gpu_train_one_step
    results = policy.learn_on_loaded_batch(
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 825, in learn_on_loaded_batch
    self.apply_gradients(_directStepOptimizerSingleton)
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 885, in apply_gradients
    opt.step()
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 33, in _use_grad
    ret = func(self, *args, **kwargs)
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/torch/optim/adam.py", line 141, in step
    adam(
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/torch/optim/adam.py", line 281, in adam
    func(params,
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/torch/optim/adam.py", line 449, in _multi_tensor_adam
    torch._foreach_addcmul_(device_exp_avg_sqs, device_grads, device_grads, 1 - beta2)
RuntimeError: Expected scalars to be on CPU, got cuda:0 instead.

@DenysAshikhin
Author

@xwjiang2010 @perduta

I think I have a lead on this. I'll do some more testing, but I think it has to do with num_gpus and the rollout workers and how they are set. When I initially ran tune.run, I hadn't set the GPU resources correctly, so it trained on the CPU instead. Later I fixed that and went back to load an older checkpoint (which had the improper GPU setting), and it tried loading it onto the GPU instead of the CPU, causing this issue. I'll see if I can recreate it more consistently.

@xwjiang2010
Contributor

xwjiang2010 commented Apr 10, 2023

Actually, is everyone running into this issue with RLlib Algorithms? It may have something to do with how Algorithm saves and loads checkpoints (basically they should be consistent - either both on GPU or both on CPU).

cc @kouroshHakha

related: https://discuss.ray.io/t/runtimeerror-expected-scalars-to-be-on-cpu-got-cuda-0-instead/9998
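
One way to check the device-consistency mentioned here is to inspect where a restored optimizer's tensors actually live. Below is a minimal diagnostic sketch, assuming a torch-based RLlib policy whose optimizers are reachable (e.g. via the internal `policy._optimizers` attribute; treat that attribute as an implementation detail, not a public API):

import torch

def report_optimizer_devices(optimizer: torch.optim.Optimizer) -> None:
    """Print the device of every tensor held in an optimizer's param_groups and state."""
    for i, group in enumerate(optimizer.param_groups):
        for key, value in group.items():
            if key != "params" and torch.is_tensor(value):
                print(f"param_groups[{i}][{key!r}]: tensor on {value.device}")
    for state in optimizer.state.values():
        for key, value in state.items():
            if torch.is_tensor(value):
                print(f"state[{key!r}]: tensor on {value.device}")

# Hypothetical usage on a restored algorithm (attribute names are assumptions):
# policy = algo.get_policy()
# for opt in policy._optimizers:
#     report_optimizer_devices(opt)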

@xwjiang2010 xwjiang2010 added rllib RLlib related issues and removed air labels Apr 10, 2023
@perduta

perduta commented Apr 10, 2023

Actually, is everyone running into this issue with RLlib Algorithms? It may have something to do with how Algorithm saves and loads checkpoints (basically they should be consistent - either both on GPU or both on CPU).

cc @kouroshHakha

I've reproduced this with both PPO and SAC, didn't check the rest.

@WeihaoTan

I also ran into the same issue when restoring and training a PPO agent with Ray 2.3.1. Is there any temporary workaround? I used run_experiment() to train.

@kouroshHakha
Contributor

Hey @DenysAshikhin, I don't have much visibility into this issue, but if you can create a minimal repro script that we can use to debug it, that would be a great starting point.

For example, a script that trains a PPO agent on CUDA for one iteration and then tries to restore it (for inference or to continue training) but fails with the error message you showed.

Thanks.
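
For reference, a minimal sketch of the kind of script being asked for here (an illustration under assumptions, not a confirmed repro: it assumes Ray ~2.3, a CUDA GPU, the CartPole-v1 env, and that Algorithm.save() returns a checkpoint directory path in that version):

from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.algorithms.algorithm import Algorithm

config = (
    PPOConfig()
    .framework("torch")
    .environment("CartPole-v1")
    .resources(num_gpus=1)   # train on the GPU
)

algo = config.build()
algo.train()                  # one training iteration on CUDA
checkpoint_dir = algo.save()  # returns a checkpoint path in Ray ~2.3
algo.stop()

# Restoring and continuing training is where the error reportedly appears.
restored = Algorithm.from_checkpoint(checkpoint_dir)
restored.train()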

@xwjiang2010
Contributor

Also tagging @perduta and @WeihaoTan to provide a repro script. Thanks!

@WeihaoTan

WeihaoTan commented Apr 12, 2023

Hi @kouroshHakha @xwjiang2010, here is the repro script. If you run it on a machine with 1 GPU, it works perfectly. If you run it on a machine with multiple GPUs, the bug appears.

train.py

import argparse
import yaml

import ray
from ray.tune.experiment.config_parser import _make_parser
from ray.tune.progress_reporter import CLIReporter
from ray.tune.tune import run_experiments
from ray.tune.registry import register_trainable, register_env
from ray.tune.schedulers import create_scheduler
from ray.rllib.models import ModelCatalog
from ray.rllib.utils.framework import try_import_torch

from algorithms.registry import ALGORITHMS, get_algorithm_class
from envs.registry import ENVIRONMENTS, get_env_class, POLICY_MAPPINGS, CALLBACKS
from models.registry import MODELS, get_model_class, ACTION_DISTS, get_action_dist_class


EXAMPLE_USAGE = """
python train.py -f config.yaml
"""

# Try to import both backends for flag checking/warnings.
torch, _ = try_import_torch()


def create_parser(parser_creator=None):
    parser = _make_parser(
        parser_creator=parser_creator,
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description="Train a reinforcement learning agent.",
        epilog=EXAMPLE_USAGE,
    )

    # See also the base parser definition in ray/tune/experiment/__config_parser.py
    parser.add_argument(
        "--ray-address",
        default=None,
        type=str,
        help="Connect to an existing Ray cluster at this address instead "
        "of starting a new one.",
    )
    parser.add_argument(
        "--ray-ui", action="store_true", help="Whether to enable the Ray web UI."
    )
    parser.add_argument(
        "--local-mode",
        action="store_true",
        help="Run ray in local mode for easier debugging.",
    )
    parser.add_argument(
        "--ray-num-cpus",
        default=None,
        type=int,
        help="--num-cpus to use if starting a new cluster.",
    )
    parser.add_argument(
        "--ray-num-gpus",
        default=None,
        type=int,
        help="--num-gpus to use if starting a new cluster.",
    )
    parser.add_argument(
        "--ray-num-nodes",
        default=None,
        type=int,
        help="Emulate multiple cluster nodes for debugging.",
    )
    parser.add_argument(
        "--ray-object-store-memory",
        default=None,
        type=int,
        help="--object-store-memory to use if starting a new cluster.",
    )
    parser.add_argument(
        "--resume",
        action="store_true",
        help="Whether to attempt to resume previous Tune experiments.",
    )
    parser.add_argument(
        "-f",
        "--config-file",
        default="config.yaml",
        type=str,
        help="If specified, use config options from this file. Note that this "
             "overrides any trial-specific options set via flags above.",
    )

    return parser


def run(args, parser):
    assert args.config_file is not None, "Must specify a config file"
    with open(args.config_file) as f:
        experiments = yaml.safe_load(f)
    verbose = 1
    for exp in experiments.values():
        metric_columns = exp.pop("metric_columns", None)
        if not exp.get("run"):
            parser.error("the following arguments are required: --run")
        if not exp.get("env") and not exp.get("config", {}).get("env"):
            parser.error("the following arguments are required: --env")
        if exp["config"].get("multiagent"):
            policy_mapping_name = exp["config"]["multiagent"].get("policy_mapping_fn")
            if isinstance(policy_mapping_name, str):
                exp["config"]["multiagent"]["policy_mapping_fn"] = POLICY_MAPPINGS[policy_mapping_name]
        if exp["config"].get("callbacks"):
            callback_name = exp["config"].get("callbacks")
            if isinstance(callback_name, str):
                exp["config"]["callbacks"] = CALLBACKS[callback_name]

    if args.ray_num_nodes:
        from ray.cluster_utils import Cluster

        cluster = Cluster()
        for _ in range(args.ray_num_nodes):
            cluster.add_node(
                num_cpus=args.ray_num_cpus or 1,
                num_gpus=args.ray_num_gpus or 0,
                object_store_memory=args.ray_object_store_memory,
            )
        ray.init(address=cluster.address)
    else:
        ray.init(
            include_dashboard=args.ray_ui,
            address=args.ray_address,
            object_store_memory=args.ray_object_store_memory,
            num_cpus=args.ray_num_cpus,
            num_gpus=args.ray_num_gpus,
            local_mode=args.local_mode,
        )

    progress_reporter = CLIReporter(
        print_intermediate_tables=verbose >= 1,
        metric_columns=metric_columns,
    )

    trials = run_experiments(
        experiments,
        scheduler=create_scheduler(args.scheduler, **args.scheduler_config),
        resume=args.resume,
        verbose=verbose,
        progress_reporter=progress_reporter,
        concurrent=True,
    )
    ray.shutdown()

    checkpoints = []
    for trial in trials:
        if trial.checkpoint.dir_or_data:
            checkpoints.append(trial.checkpoint.dir_or_data)

    if checkpoints:
        from rich import print
        from rich.panel import Panel

        print("\nYour training finished.")

        print("Best available checkpoint for each trial:")
        for cp in checkpoints:
            print(f"  {cp}")

        print(
            "\nYou can now evaluate your trained algorithm from any "
            "checkpoint, e.g. by running:"
        )
        print(Panel(f"[green]  rllib evaluate {checkpoints[0]} "))



def main():
    parser = create_parser()
    args = parser.parse_args()
    run(args, parser)


if __name__ == "__main__":
    main()

config.yaml

test:
  run: PPO
  checkpoint_config:
    checkpoint_frequency: 50
    checkpoint_at_end: true
    num_to_keep: 5
  local_dir: ray_results
  stop:
    timesteps_total: 1000
  #restore: checkpoint

  config:
    framework: torch 
    env: CartPole-v1

    num_workers: 4
    num_cpus_for_driver: 1
    num_envs_per_worker: 1
    num_cpus_per_worker: 2
    num_gpus: 1

    disable_env_checking: true

After training finishes, uncomment "restore" and set it to the trained checkpoint path. Increase timesteps_total and re-run train.py; the bug will appear, with an error like:

    torch._foreach_addcmul_(device_exp_avg_sqs, device_grads, device_grads, 1 - beta2)
RuntimeError: Expected scalars to be on CPU, got cuda:0 instead.

@perduta

perduta commented Apr 13, 2023

(Quoting my Apr 8 comment above, with the same repro script and logs.)

@xwjiang2010 is this enough? If not, what else should I attach?
Should we add the bug label back? I don't know of any workaround, and I am currently unable to train any RL model with Ray.

@DenysAshikhin
Author

DenysAshikhin commented Apr 13, 2023

On my side (it doesn't even require multiple GPUs):

import ray
from ray.rllib.env import PolicyServerInput
from ray.rllib.algorithms.ppo import PPOConfig

import numpy as np
import argparse
from gymnasium.spaces import MultiDiscrete, Box

ray.init(num_cpus=9, num_gpus=1, log_to_driver=False, configure_logging=False)

ppo_config = PPOConfig()

parser = argparse.ArgumentParser(description='Optional app description')
parser.add_argument('-ip', type=str, help='IP of this device')

parser.add_argument('-checkpoint', type=str, help='location of checkpoint to restore from')

args = parser.parse_args()


def _input(ioctx):
    # We are remote worker, or we are local worker with num_workers=0:
    # Create a PolicyServerInput.
    if ioctx.worker_index > 0 or ioctx.worker.num_workers == 0:
        return PolicyServerInput(
            ioctx,
            args.ip,
            55556 + ioctx.worker_index - (1 if ioctx.worker_index > 0 else 0),
        )
    # No InputReader (PolicyServerInput) needed.
    else:
        return None


x = 320
y = 240

# kl_coeff, ->default 0.2
# ppo_config.gamma = 0.01  # vf_loss_coeff used to be 0.01??
# "entropy_coeff": 0.00005,
# "clip_param": 0.1,
ppo_config.gamma = 0.998  # default 0.99
ppo_config.lambda_ = 0.99  # default 1.0???
ppo_config.kl_target = 0.01  # default 0.01
ppo_config.rollout_fragment_length = 128
# ppo_config.train_batch_size = 8500
# ppo_config.train_batch_size = 10000
ppo_config.train_batch_size = 12000
ppo_config.sgd_minibatch_size = 512
# ppo_config.num_sgd_iter = 2  # default 30???
ppo_config.num_sgd_iter = 7  # default 30???
# ppo_config.lr = 3.5e-5  # 5e-5
ppo_config.lr = 9e-5  # 5e-5

ppo_config.model = {
    # Share layers for value function. If you set this to True, it's
    # important to tune vf_loss_coeff.
    "vf_share_layers": True,

    "use_lstm": True,
    "max_seq_len": 32,
    "lstm_cell_size": 128,
    "lstm_use_prev_action": True,

    "conv_filters": [
     
        # 240 X 320
        [16, [5, 5], 3],
        [32, [5, 5], 3],
        [64, [5, 5], 3],
        [128, [3, 3], 2],
        [256, [3, 3], 2],
        [512, [3, 3], 2],
    ],
    "conv_activation": "relu",
    "post_fcnet_hiddens": [512],
    "post_fcnet_activation": "relu"
}
ppo_config.batch_mode = "complete_episodes"
ppo_config.simple_optimizer = True

# ppo_config["remote_worker_envs"] = True



ppo_config.env = None
ppo_config.observation_space = Box(low=0, high=1, shape=(y, x, 1), dtype=np.float32)
ppo_config.action_space = MultiDiscrete(
    [
        2,  # W
        2,  # A
        2,  # S
        2,  # D
        2,  # Space
        2,  # H
        2,  # J
        2,  # K
        2  # L
    ]
)
ppo_config.env_config = {
    "sleep": True,
    'replayOn': False
}

ppo_config.rollouts(num_rollout_workers=2, enable_connectors=False)
ppo_config.offline_data(input_=_input)

ppo_config.framework_str = 'torch'
ppo_config.log_sys_usage = False
ppo_config.compress_observations = True
ppo_config.shuffle_sequences = False
ppo_config.num_gpus = 0.35
ppo_config.num_gpus_per_worker = 0.1
ppo_config.num_cpus_per_worker = 2
ppo_config.num_cpus_per_learner_worker = 2
ppo_config.num_gpus_per_learner_worker = 0.35

tempyy = ppo_config.to_dict()

print(tempyy)

from ray import tune

name = "" + args.checkpoint
print(f"Starting: {name}")

tune.run("PPO",
         resume='AUTO',
         # param_space=config,
         config=tempyy,
         name=name, keep_checkpoints_num=None, checkpoint_score_attr="episode_reward_mean",
         max_failures=1,
         checkpoint_freq=5, checkpoint_at_end=True)

You can of course substitute the env with a random env. In my case, after letting it make some checkpoints, I press Ctrl+C in the cmd window. After it gracefully saves, running the script again gives the same error as outlined above.

@perduta

perduta commented Apr 13, 2023 via email

@cheadrian

Got the same problem when trying to resume an experiment on Colab with exactly the same configuration as before.

tuner = Tuner.restore("saved_models", trainer, resume_unfinished = True, resume_errored=True)
tuner.fit()

The error appears at the end of an iteration.
Single-GPU setup.
Python 3.9.16, torch 2.0.0+cu118, ray 2.3.1

@xwjiang2010
Contributor

@cheadrian what is the trainer in your case?

@cheadrian

cheadrian commented Apr 17, 2023

trainer = RLTrainer(
    run_config=run_config,
    scaling_config=ScalingConfig(
        num_workers=2, use_gpu=True,
        trainer_resources={"CPU": 0.0}, 
        resources_per_worker={"CPU": 1.0}),
    algorithm="PPO",
    config=config_cf,
    resume_from_checkpoint=rl_checkpoint,
)

I get the same error with or without specifying it.

@solnox99

Hello,

I'm using SAC, and when I load a model trained with Tune via Algorithm.from_checkpoint(), I get the same error.

@DenysAshikhin
Author

@kouroshHakha

Please ignore that point; no matter what settings I use, the issue happens the same as for the others. I had confused some parameters 😅

@gunewar

gunewar commented Apr 21, 2023

Same error and no solution yet?
python3.9/site-packages/torch/optim/adam.py", line 449, in _multi_tensor_adam
    torch._foreach_addcmul_(device_exp_avg_sqs, device_grads, device_grads, 1 - beta2)
RuntimeError: Expected scalars to be on CPU, got cuda:0 instead.

@xwjiang2010 xwjiang2010 removed the @external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission. label Apr 21, 2023
@xwjiang2010 xwjiang2010 added triage Needs triage (eg: priority, bug/not-bug, and owning component) and removed needs-repro-script Issue needs a runnable script to be reproduced P2 Important issue, but not time-critical labels Apr 21, 2023
@xwjiang2010
Contributor

@kouroshHakha Could you triage this for the RL team?

@DenysAshikhin
Author

@gunewar
I'm sure the team is aware of the issue at this point - let's just give them some time to find a way to reproduce this on their end, at which point it should be a simple fix (hopefully).

I get the feeling that it's some new configuration that got missed in the automated builds.

@woosangbum

woosangbum commented Apr 24, 2023

Hello,

I'm using SAC, and when I load a model trained with Tune via Algorithm.from_checkpoint(), I get the same error.

After removing num_gpus from the config, it's training for now! :)

@DenysAshikhin
Author

Hello,
I'm using SAC, and when I load a model trained with Tune via Algorithm.from_checkpoint(), I get the same error.

After removing num_gpus from the config, it's training for now! :)

So you initially trained with num_gpus set (thus training on the GPU), and for subsequent runs you turned num_gpus off? Is it training on your CPU or GPU then?

@woosangbum

Hello,
I'm using SAC, and when I load a model trained with Tune via Algorithm.from_checkpoint(), I get the same error.

After removing num_gpus from the config, it's training for now! :)

So you initially trained with num_gpus set (thus training on the GPU), and for subsequent runs you turned num_gpus off? Is it training on your CPU or GPU then?

I was able to successfully restore a model that was trained with num_gpus=1 by removing num_gpus (i.e. leaving it at 0) in the config.
After doing this, there were no errors and it was possible to continue training the model.
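
For reference, a minimal sketch of this workaround (shown with PPO and CartPole-v1 for concreteness; the checkpoint path is a placeholder, and whether it applies to your setup is an assumption): build the algorithm without a GPU, then restore the checkpoint produced by the earlier num_gpus=1 run.

from ray.rllib.algorithms.ppo import PPOConfig

# Build the algorithm without requesting a GPU, then restore the checkpoint that
# was written by the earlier num_gpus=1 run ("path/to/checkpoint" is a placeholder).
config = (
    PPOConfig()
    .framework("torch")
    .environment("CartPole-v1")
    .resources(num_gpus=0)
)
algo = config.build()
algo.restore("path/to/checkpoint")
algo.train()  # reportedly proceeds without the optimizer-device error, but on CPU only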

@DenysAshikhin
Author

DenysAshikhin commented Apr 24, 2023

Interesting, it didn't work for me. Can you confirm whether it's still training on your GPU or whether it switched to your CPU after restoring?

@woosangbum

woosangbum commented Apr 24, 2023

Interesting, it didn't work for me. Can you confirm whether it's still training on your GPU or whether it switched to your CPU after restoring?

I just checked the training, and I'm now working on another project.

I left all the resource-related configs at their default values and didn't use the Ray Tuner.

@cheadrian

cheadrian commented Apr 24, 2023

(Quoting my earlier comment above: the Tuner.restore call and the RLTrainer config with use_gpu=True.)

Tune Status
Resources requested: 2.0/2 CPUs, 1.0/1 GPUs, 0.0/10.85 GiB heap, 0.0/0.19 GiB objects

Changing use_gpu=True to use_gpu=False lets the training continue, but only on the CPU.

Tune Status
Resources requested: 2.0/2 CPUs, 0/1 GPUs, 0.0/10.85 GiB heap, 0.0/0.19 GiB objects

@DenysAshikhin
Author

(Quoting @cheadrian's comment above.)

I'm happy to know there's something of a workaround for some people. However, it still didn't work for me, and it doesn't address the fact that I need to train on my GPU (as I'm sure others do as well).

@hora-anyscale hora-anyscale changed the title RuntimeError: Expected scalars to be on CPU, got cuda:0 instead [RLLib] RuntimeError: Expected scalars to be on CPU, got cuda:0 instead Apr 28, 2023
@hora-anyscale hora-anyscale added P2 Important issue, but not time-critical and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Apr 28, 2023
@kouroshHakha kouroshHakha added P0 Issue that must be fixed in short order bug Something that is supposed to be working; but isn't and removed P2 Important issue, but not time-critical question Just a question :) labels May 3, 2023
@kouroshHakha
Contributor

Apparently the torch optimizer param_group values should not be moved to CUDA devices when restoring optimizer states. There will be a PR addressing this issue in an upcoming release.
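
For anyone curious about the mechanism, here is a minimal pure-PyTorch sketch (assuming torch 2.0.x and an available CUDA device) that imitates a restore which converts the param_group hyperparameters into CUDA tensors, which is what the comment above describes:

import torch

model = torch.nn.Linear(4, 2).cuda()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One normal update so the optimizer has state to "restore".
model(torch.randn(8, 4, device="cuda")).sum().backward()
opt.step()

# Simulate a restore that converts param_group values (here the betas) into CUDA tensors.
state = opt.state_dict()
for group in state["param_groups"]:
    group["betas"] = torch.tensor(group["betas"], device="cuda")
opt.load_state_dict(state)

# The next update fails inside _multi_tensor_adam: 1 - beta2 is now a CUDA tensor,
# and the foreach addcmul kernel expects its scalars tensor to live on the CPU, so
# it raises: RuntimeError: Expected scalars to be on CPU, got cuda:0 instead.
opt.zero_grad()
model(torch.randn(8, 4, device="cuda")).sum().backward()
opt.step()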

@DenysAshikhin
Author

@kouroshHakha
Thanks for the update. Is there a link to the PR so I can incorporate it on my end until it lands officially?

@kouroshHakha
Contributor

Hey @DenysAshikhin, here is the core change that needs to happen in RLlib's torch policy. If it's urgent, you can make these changes in your local installation of Ray. If it can wait a few days, you can either install the nightly or use master once this PR is merged. If you need reliability, you'll have to wait for the released version.
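
Until that change ships, one possible stop-gap is sketched below. This is only a sketch, not the official fix: it assumes the restored policy's torch optimizers are reachable (e.g. through RLlib's internal `policy._optimizers` list) and simply converts any tensor-valued param_group entries back to plain Python numbers before the next opt.step().

import torch

def scalarize_param_groups(optimizer: torch.optim.Optimizer) -> None:
    """Turn tensor-valued hyperparameters in param_groups back into Python numbers."""
    for group in optimizer.param_groups:
        for key, value in list(group.items()):
            if key == "params":
                continue
            if torch.is_tensor(value):
                # e.g. lr / eps as 0-dim tensors, or betas as a length-2 tensor.
                group[key] = value.item() if value.numel() == 1 else tuple(value.tolist())
            elif isinstance(value, (list, tuple)) and value and all(torch.is_tensor(v) for v in value):
                group[key] = type(value)(v.item() for v in value)

# Hypothetical usage right after restoring (attribute names are assumptions):
# policy = algo.get_policy()
# for opt in policy._optimizers:
#     scalarize_param_groups(opt)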

@DenysAshikhin
Author

@kouroshHakha
Again, thank you for the prompt response. Unfortunately, I have manually applied a different PR that fixes a memory leak in PolicyServerInput, and I'd wager it won't be merged anytime soon (source: #31400).

As such, I will need to apply this change manually as well, since I don't want to redo the other one for now. I'll report back soon to confirm whether the linked PR fixes it for me.

@kouroshHakha
Contributor

@DenysAshikhin I have pinged the owner of the other PR about the possibility of merging it. Thanks for your valuable input on our library.
