Errors on new install #67

Closed
jarlva opened this issue Aug 12, 2023 · 1 comment

jarlva commented Aug 12, 2023

After following the steps here (installing the DI-engine dependency) on a new install, I ran python3 -u zoo/classic_control/cartpole/config/cartpole_muzero_config.py and now get:

/home/user/miniconda3/envs/light/lib/python3.9/site-packages/gym/wrappers/step_api_compatibility.py:39: DeprecationWarning: WARN: Initializing environment in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future.
  deprecation(
/home/user/miniconda3/envs/light/lib/python3.9/site-packages/gym/wrappers/step_api_compatibility.py:39: DeprecationWarning: WARN: Initializing environment in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future.
  deprecation(
/home/user/miniconda3/envs/light/lib/python3.9/site-packages/gym/core.py:268: DeprecationWarning: WARN: Function `env.seed(seed)` is marked as deprecated and will be removed in the future. Please use `env.reset(seed=seed)` instead.
  deprecation(
/home/user/miniconda3/envs/light/lib/python3.9/site-packages/gym/core.py:268: DeprecationWarning: WARN: Function `env.seed(seed)` is marked as deprecated and will be removed in the future. Please use `env.reset(seed=seed)` instead.
  deprecation(
Traceback (most recent call last):
  File "/home/user/py/LightZero/zoo/classic_control/cartpole/config/cartpole_muzero_config.py", line 93, in <module>
    train_muzero([main_config, create_config], seed=0, max_env_step=max_env_step)
  File "/home/user/py/LightZero/lzero/entry/train_muzero.py", line 158, in train_muzero
    new_data = collector.collect(train_iter=learner.train_iter, policy_kwargs=collect_kwargs)
  File "/home/user/py/LightZero/lzero/worker/muzero_collector.py", line 383, in collect
    policy_output = self._policy.forward(stack_obs, action_mask, temperature, to_play, epsilon)
  File "/home/user/py/LightZero/lzero/policy/muzero.py", line 520, in _forward_collect
    network_output = self._collect_model.initial_inference(data)
  File "/home/user/py/LightZero/lzero/model/muzero_model_mlp.py", line 170, in initial_inference
    latent_state = self._representation(obs)
  File "/home/user/py/LightZero/lzero/model/muzero_model_mlp.py", line 218, in _representation
    latent_state = self.representation_network(observation)
  File "/home/user/miniconda3/envs/light/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/py/LightZero/lzero/model/common.py", line 280, in forward
    return self.fc_representation(x)
  File "/home/user/miniconda3/envs/light/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/miniconda3/envs/light/lib/python3.9/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/user/miniconda3/envs/light/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/miniconda3/envs/light/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception ignored in: <function MuZeroCollector.__del__ at 0x7f59b7722310>
Traceback (most recent call last):
  File "/home/user/py/LightZero/lzero/worker/muzero_collector.py", line 181, in __del__
    self.close()
  File "/home/user/py/LightZero/lzero/worker/muzero_collector.py", line 171, in close
    self._env.close()
  File "/home/user/py/DI-engine/ding/envs/env_manager/subprocess_env_manager.py", line 635, in close
    p.send(['close', None, None])
  File "/home/user/miniconda3/envs/light/lib/python3.9/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/user/miniconda3/envs/light/lib/python3.9/multiprocessing/connection.py", line 411, in _send_bytes
    self._send(header + buf)
  File "/home/user/miniconda3/envs/light/lib/python3.9/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <function MuZeroEvaluator.__del__ at 0x7f59b7722af0>
Traceback (most recent call last):
  File "/home/user/py/LightZero/lzero/worker/muzero_evaluator.py", line 170, in __del__
    self.close()
  File "/home/user/py/LightZero/lzero/worker/muzero_evaluator.py", line 160, in close
    self._env.close()
  File "/home/user/py/DI-engine/ding/envs/env_manager/subprocess_env_manager.py", line 635, in close
    p.send(['close', None, None])
  File "/home/user/miniconda3/envs/light/lib/python3.9/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/user/miniconda3/envs/light/lib/python3.9/multiprocessing/connection.py", line 411, in _send_bytes
    self._send(header + buf)
  File "/home/user/miniconda3/envs/light/lib/python3.9/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
@puyuan1996
Collaborator

Hello,

This error might be caused by a mismatch between your installed torch build (and the CUDA version it was compiled against) and your GPU hardware. You can try the following command to install torch together with its matching CUDA version:

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

If similar errors persist, please provide details on your hardware device and torch version so we can better pinpoint the issue.
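To gather the details requested above, a short diagnostic script along these lines may help. It is only a sketch: the helper names `parse_arch`, `capability_supported`, and `report` are made up for illustration, and the compatibility rule it applies (an `sm_XY` binary runs on a GPU with the same major version and an equal or higher minor version; a `compute_XY` PTX entry can be JIT-compiled for any equal or higher capability) is a simplification of CUDA's compatibility model, so treat the verdict as a hint rather than a definitive answer.

```python
import re


def parse_arch(entry):
    """Split an entry such as 'sm_86' or 'compute_80' into (kind, major, minor).

    Returns None for entries that do not follow that pattern.
    """
    match = re.fullmatch(r"(sm|compute)_(\d+)(\d)", entry)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), int(match.group(3))


def capability_supported(capability, arch_list):
    """Roughly check whether a GPU capability is covered by a torch build.

    capability: (major, minor) tuple, e.g. (8, 6).
    arch_list:  architecture strings, e.g. ['sm_70', 'sm_80'].
    """
    major, minor = capability
    for entry in arch_list:
        parsed = parse_arch(entry)
        if parsed is None:
            continue
        kind, arch_major, arch_minor = parsed
        # Binary kernels: same major version, minor not newer than the GPU.
        if kind == "sm" and arch_major == major and arch_minor <= minor:
            return True
        # PTX: can be JIT-compiled for any equal or newer capability.
        if kind == "compute" and (arch_major, arch_minor) <= (major, minor):
            return True
    return False


def report():
    """Print the torch version, CUDA version, GPU, and a compatibility hint."""
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
        return
    print("torch version:", torch.__version__)
    print("built with CUDA:", torch.version.cuda)
    if not torch.cuda.is_available():
        print("CUDA is not available to torch")
        return
    capability = torch.cuda.get_device_capability(0)
    arch_list = torch.cuda.get_arch_list()
    print("GPU:", torch.cuda.get_device_name(0))
    print("GPU compute capability:", capability)
    print("architectures in this torch build:", arch_list)
    print("likely compatible:", capability_supported(capability, arch_list))


if __name__ == "__main__":
    report()
```

If the last line prints `False`, the installed torch wheel probably contains no kernels for the GPU, which is consistent with the "no kernel image is available" error. Posting the full output of the script in the issue should make the mismatch easier to pinpoint.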

Best wishes.
