CUDA out of memory #16

Closed
khcf123 opened this issue Jan 1, 2022 · 4 comments

khcf123 commented Jan 1, 2022

Hi, I'm running on Windows 10 with the latest StarCraft II. The error shown below appears when I run "python run.py":

(torch_1_5) PS C:\Users\alexa\Downloads\mini-AlphaStar-main> python run.py
pygame 2.1.2 (SDL 2.0.18, Python 3.7.11)
Hello from the pygame community. https://www.pygame.org/contribute.html
run init
cudnn available
cudnn version 7604
initialed player
initialed teacher
start_time before training: 2022-01-01 18:11:32
map name: Simple64
player.name: MainPlayer
player.race: Race.protoss
start_time before reset: 2022-01-01 18:13:12
total_episodes: 1
start_episode_time before is_final: 2022-01-01 18:13:13
ActorLoop.run() Exception cause return, Detials of the Exception: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 2.00 GiB total capacity; 1.27 GiB already allocated; 0 bytes free; 1.33 GiB reserved in total by PyTorch)
Traceback (most recent call last):
File "C:\Users\alexa\Downloads\mini-AlphaStar-main\alphastarmini\core\rl\rl_vs_computer_wo_replay.py", line 213, in run
player_step = self.player.agent.step_from_state(state, player_memory)
File "C:\Users\alexa\Downloads\mini-AlphaStar-main\alphastarmini\core\rl\alphastar_agent.py", line 235, in step_from_state
hidden_state=hidden_state)
File "C:\Users\alexa\Downloads\mini-AlphaStar-main\alphastarmini\core\arch\agent.py", line 299, in action_logits_by_state
return_logits = True)
File "C:\Users\alexa\Downloads\mini-AlphaStar-main\alphastarmini\core\arch\arch_model.py", line 134, in forward
entity_embeddings, embedded_entity, entity_nums = self.entity_encoder(state.entity_state)
File "C:\Users\alexa\anaconda3\envs\torch_1_5\lib\site-packages\torch\nn\modules\module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "C:\Users\alexa\Downloads\mini-AlphaStar-main\alphastarmini\core\arch\entity_encoder.py", line 390, in forward
unit_types_one = torch.nonzero(batch, as_tuple=True)[-1]
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 2.00 GiB total capacity; 1.27 GiB already allocated; 0 bytes free; 1.33 GiB reserved in total by PyTorch)

run over

Can I know what the problem is here and what the solution is? Thanks.

liuruoze (Owner) commented Jan 1, 2022

Yes, this is because your GPU card does not have enough memory.

To fix the problem, try decreasing the numbers in line #199 of

alphastarmini/lib/hyper_parameters.py

MiniStar_Arch_Hyper_Parameters = ArchHyperParameters(batch_size=int(32 * 1.5 / P.Batch_Scale), sequence_length=int(32 * 8 / P.Seq_Scale),

The batch_size and sequence_length can be set to smaller numbers to fit your GPU memory (you should also check the values of Batch_Scale and Seq_Scale defined in param.py).
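
For example, on a 2 GB card you could start from something like the line below (a minimal sketch with hypothetical values; the remaining keyword arguments of ArchHyperParameters are assumed to stay exactly as in the repository, and you may need to reduce the numbers further):

# hypothetical smaller values for a 2 GB card (the original uses 32 * 1.5 and 32 * 8)
MiniStar_Arch_Hyper_Parameters = ArchHyperParameters(batch_size=int(8 / P.Batch_Scale), sequence_length=int(64 / P.Seq_Scale),

If Batch_Scale and Seq_Scale are both 1, this gives a batch size of 8 and a sequence length of 64 instead of the default 48 and 256.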

Alternatively, you can run the program on the CPU on your laptop and switch to the GPU when moving to a server.

To change from GPU to CPU, change the value in line #2 of

run.py

USED_DEVICES = "0"

to

USED_DEVICES = "-1"
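
For reference, this usually works because the value is exported as CUDA_VISIBLE_DEVICES before torch is imported, so "-1" hides every GPU and PyTorch falls back to the CPU. A minimal sketch of that pattern (the exact code in run.py may differ):

import os

USED_DEVICES = "-1"  # "-1" hides all GPUs; "0" would expose GPU 0
os.environ["CUDA_VISIBLE_DEVICES"] = USED_DEVICES  # must be set before torch is imported

import torch

# With no visible GPU, cuda.is_available() is False and the device resolves to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)  # prints "cpu"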

Hope this will solve your problem.

khcf123 (Author) commented Jan 1, 2022

Thanks for your reply.

My laptop GPU has 2 GB of memory.
What batch_size and sequence_length should I set to fit my GPU memory (and what values of Batch_Scale and Seq_Scale should be defined in param.py)?

Can you please advise me, thanks!!

khcf123 (Author) commented Jan 1, 2022

Yes, I got it to work after restarting the laptop and changing USED_DEVICES = "0" to USED_DEVICES = "-1".

But another error comes up:

ActorLoop.run() Exception cause return, Detials of the Exception: The game didn't advance to the expected game loop. Expected: 2712, got: 2709
Traceback (most recent call last):
File "C:\Users\alexa\Downloads\mini-AlphaStar-main\alphastarmini\core\rl\rl_vs_computer_wo_replay.py", line 253, in run
timesteps = env.step(env_actions, step_mul=STEP_MUL) # STEP_MUL step_mul
File "C:\Users\alexa\anaconda3\envs\torch_1_5\lib\site-packages\pysc2\lib\stopwatch.py", line 212, in _stopwatch
return func(*args, **kwargs)
File "C:\Users\alexa\anaconda3\envs\torch_1_5\lib\site-packages\pysc2\env\sc2_env.py", line 548, in step
return self._step(step_mul)
File "C:\Users\alexa\anaconda3\envs\torch_1_5\lib\site-packages\pysc2\env\sc2_env.py", line 565, in _step
return self._observe(target_game_loop=target_game_loop)
File "C:\Users\alexa\anaconda3\envs\torch_1_5\lib\site-packages\pysc2\env\sc2_env.py", line 670, in _observe
self._get_observations(target_game_loop)
File "C:\Users\alexa\anaconda3\envs\torch_1_5\lib\site-packages\pysc2\env\sc2_env.py", line 645, in _get_observations
"Expected: %s, got: %s") % (target_game_loop, game_loop))
ValueError: The game didn't advance to the expected game loop. Expected: 2712, got: 2709

run over

liuruoze (Owner) commented Jan 2, 2022

Yes, this is a problem that occasionally happens with SC2 on Windows (it is rare, and I actually don't know the reason). However, this problem is not related to the current issue and should be discussed separately. Please open a new issue; I will close the current one for you.

liuruoze closed this as completed Jan 2, 2022