CUDA out of memory #16

Closed
khcf123 opened this issue Jan 1, 2022 · 4 comments

khcf123 commented Jan 1, 2022

Hi, I'm running on Windows 10 with the latest StarCraft II. The error shown below appears when I run "python run.py":

(torch_1_5) PS C:\Users\alexa\Downloads\mini-AlphaStar-main> python run.py
pygame 2.1.2 (SDL 2.0.18, Python 3.7.11)
Hello from the pygame community. https://www.pygame.org/contribute.html
run init
cudnn available
cudnn version 7604
initialed player
initialed teacher
start_time before training: 2022-01-01 18:11:32
map name: Simple64
player.name: MainPlayer
player.race: Race.protoss
start_time before reset: 2022-01-01 18:13:12
total_episodes: 1
start_episode_time before is_final: 2022-01-01 18:13:13
ActorLoop.run() Exception cause return, Detials of the Exception: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 2.00 GiB total capacity; 1.27 GiB already allocated; 0 bytes free; 1.33 GiB reserved in total by PyTorch)
Traceback (most recent call last):
File "C:\Users\alexa\Downloads\mini-AlphaStar-main\alphastarmini\core\rl\rl_vs_computer_wo_replay.py", line 213, in run
player_step = self.player.agent.step_from_state(state, player_memory)
File "C:\Users\alexa\Downloads\mini-AlphaStar-main\alphastarmini\core\rl\alphastar_agent.py", line 235, in step_from_state
hidden_state=hidden_state)
File "C:\Users\alexa\Downloads\mini-AlphaStar-main\alphastarmini\core\arch\agent.py", line 299, in action_logits_by_state
return_logits = True)
File "C:\Users\alexa\Downloads\mini-AlphaStar-main\alphastarmini\core\arch\arch_model.py", line 134, in forward
entity_embeddings, embedded_entity, entity_nums = self.entity_encoder(state.entity_state)
File "C:\Users\alexa\anaconda3\envs\torch_1_5\lib\site-packages\torch\nn\modules\module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "C:\Users\alexa\Downloads\mini-AlphaStar-main\alphastarmini\core\arch\entity_encoder.py", line 390, in forward
unit_types_one = torch.nonzero(batch, as_tuple=True)[-1]
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 2.00 GiB total capacity; 1.27 GiB already allocated; 0 bytes free; 1.33 GiB reserved in total by PyTorch)

run over

Can I know what the problem is here and what the solution is? Thanks.

liuruoze (Owner) commented Jan 1, 2022

Yes, this is because your GPU card does not have enough memory.

To fix the problem, try decreasing the numbers in line #199 of

alphastarmini/lib/hyper_parameters.py

MiniStar_Arch_Hyper_Parameters = ArchHyperParameters(batch_size=int(32 * 1.5 / P.Batch_Scale), sequence_length=int(32 * 8 / P.Seq_Scale),

The batch_size and sequence_length can be set to smaller numbers to fit your GPU memory (you should also check the values of Batch_Scale and Seq_Scale defined in param.py).
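
For example, on a 2 GB card you could start from something like the line below (a minimal sketch with hypothetical values; the remaining keyword arguments of ArchHyperParameters are assumed to stay exactly as in the repository, and you may need to reduce the numbers further):

# hypothetical smaller values for a 2 GB card (the original uses 32 * 1.5 and 32 * 8)
MiniStar_Arch_Hyper_Parameters = ArchHyperParameters(batch_size=int(8 / P.Batch_Scale), sequence_length=int(64 / P.Seq_Scale),

If Batch_Scale and Seq_Scale are both 1, this gives a batch size of 8 and a sequence length of 64 instead of the default 48 and 256.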

Alternatively, you can run the program on the CPU on your laptop and switch to the GPU when moving to a server.

To change from GPU to CPU, change the value in line #2 of

run.py

USED_DEVICES = "0"

to

USED_DEVICES = "-1"
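
For reference, this usually works because the value is exported as CUDA_VISIBLE_DEVICES before torch is imported, so "-1" hides every GPU and PyTorch falls back to the CPU. A minimal sketch of that pattern (the exact code in run.py may differ):

import os

USED_DEVICES = "-1"  # "-1" hides all GPUs; "0" would expose GPU 0
os.environ["CUDA_VISIBLE_DEVICES"] = USED_DEVICES  # must be set before torch is imported

import torch

# With no visible GPU, cuda.is_available() is False and the device resolves to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)  # prints "cpu"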

Hope this will solve your problem.

khcf123 (Author) commented Jan 1, 2022

Thanks for your reply.

My laptop GPU has 2 GB of memory.
What batch_size and sequence_length should I set to fit my GPU memory (and what values of Batch_Scale and Seq_Scale should be defined in param.py)?

Can you please advise me, thanks!!

khcf123 (Author) commented Jan 1, 2022

Yes, I got it to work after restarting the laptop and changing USED_DEVICES = "0" to USED_DEVICES = "-1".

But another error comes up:

ActorLoop.run() Exception cause return, Detials of the Exception: The game didn't advance to the expected game loop. Expected: 2712, got: 2709
Traceback (most recent call last):
File "C:\Users\alexa\Downloads\mini-AlphaStar-main\alphastarmini\core\rl\rl_vs_computer_wo_replay.py", line 253, in run
timesteps = env.step(env_actions, step_mul=STEP_MUL) # STEP_MUL step_mul
File "C:\Users\alexa\anaconda3\envs\torch_1_5\lib\site-packages\pysc2\lib\stopwatch.py", line 212, in _stopwatch
return func(*args, **kwargs)
File "C:\Users\alexa\anaconda3\envs\torch_1_5\lib\site-packages\pysc2\env\sc2_env.py", line 548, in step
return self._step(step_mul)
File "C:\Users\alexa\anaconda3\envs\torch_1_5\lib\site-packages\pysc2\env\sc2_env.py", line 565, in _step
return self._observe(target_game_loop=target_game_loop)
File "C:\Users\alexa\anaconda3\envs\torch_1_5\lib\site-packages\pysc2\env\sc2_env.py", line 670, in _observe
self._get_observations(target_game_loop)
File "C:\Users\alexa\anaconda3\envs\torch_1_5\lib\site-packages\pysc2\env\sc2_env.py", line 645, in _get_observations
"Expected: %s, got: %s") % (target_game_loop, game_loop))
ValueError: The game didn't advance to the expected game loop. Expected: 2712, got: 2709

run over

liuruoze (Owner) commented Jan 2, 2022

Yes, this is a problem that occasionally happens with SC2 on Windows (it is rare, and I actually don't know the reason). However, this problem is not related to the current issue and should be discussed separately. Please open a new issue; I will close the current one for you.

liuruoze closed this as completed Jan 2, 2022