[BUG]: Memory consumption by fp16 is not normal when using Engine. #1095
Comments
Thanks! Give me some time to verify this. :) |
Hi @powermano , thanks for helping to spot this tricky bug; it has been fixed in #1096. |
You're welcome, I'm looking forward to your new ZeRO. |
I will close this issue for now. Do keep us informed if you have further questions. :) |
@FrankLeeeee Sorry to bother you, but have you tested with my training code? After modifying a few lines of code, memory usage on my side dropped from 8.5G to 7.5G, but it still can't reach the 5.8G I see without the Engine. |
Hi, I don't know which model/optimizer/loss/dataloader you use, so in issue #1096 I used my own experiment configuration, but the major code is the same as yours. |
If you wish, you can share your full script with me, I can go test it. |
the config is
train_debug.py
|
Thanks, give me some time. I will keep you updated :) |
the command is
|
Hi @powermano , I have run your code with and without the Engine. However, I do not observe any difference. I logged the memory usage like below:

```python
if global_step % LOGGING_FREQUNCE == 0:
    logger.info(
        f"global_step {global_step} -train loss: {train_loss:.5}, lr: {lr_scheduler.get_last_lr()[0]:.5g}",
        ranks=[0])
    logger.info(get_mem_info(), ranks=[0])
```

I have added this logging. The results are:
|
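For readers without the ColossalAI codebase at hand: `get_mem_info` above is a ColossalAI utility that reports current memory usage. A minimal stand-in for the same idea (assuming no GPU access here, so the byte counts are passed in rather than read from `torch.cuda.memory_allocated()` / `torch.cuda.memory_reserved()`; `format_bytes` and `mem_report` are hypothetical names) could look like:

```python
# Hypothetical stand-in for a memory-report helper such as the
# get_mem_info used in the thread: formats byte counts into a
# human-readable summary suitable for periodic training logs.

def format_bytes(num_bytes: int) -> str:
    """Render a byte count as a GB string, e.g. 1 GiB -> '1.00 GB'."""
    return f"{num_bytes / 1024**3:.2f} GB"

def mem_report(cuda_allocated: int, cuda_reserved: int) -> str:
    # In real training code these numbers would come from
    # torch.cuda.memory_allocated() and torch.cuda.memory_reserved().
    return (f"CUDA allocated: {format_bytes(cuda_allocated)}, "
            f"CUDA reserved: {format_bytes(cuda_reserved)}")

print(mem_report(5 * 1024**3, 8 * 1024**3))
# → CUDA allocated: 5.00 GB, CUDA reserved: 8.00 GB
```

Logging both allocated and reserved memory is useful in cases like this one, since a difference in the cached (reserved) pool can explain gaps between two runs that allocate the same tensors.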
I downloaded the latest version and the test is normal; it must have been a problem with my version.
The results are: with Engine
without Engine:
|
I will now close this issue. |
Great, thanks! |
🐛 Describe the bug
When using colossalai.amp.convert_to_torch_amp to wrap the model, optimizer and criterion, and then training normally, the run consumes only 4700M of memory. But if colossalai.initialize is used instead, it consumes 7700M. We did confirm that the initialization code of colossalai.initialize reads the fp16 field from the config and performs the same colossalai.amp.convert_to_torch_amp conversion; yet when we then train with the Engine, it still consumes 7700M. This is where I get confused.

Environment
No response
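The two code paths compared in this issue can be sketched as follows. This is a hedged outline reconstructed from the thread, not the reporter's actual script; it uses ColossalAI's documented config convention (`fp16 = dict(mode=AMP_TYPE.TORCH)`), and exact names and signatures may differ across versions.

```python
# Path 1: wrap manually with torch AMP, then train WITHOUT the Engine
# (reported ~4700M):
#
#   from colossalai.amp import convert_to_torch_amp
#   model, optimizer, criterion = convert_to_torch_amp(model, optimizer, criterion)
#
# Path 2: declare fp16 in the config file and let colossalai.initialize
# perform the same convert_to_torch_amp conversion internally, then train
# through the returned Engine (reported ~7700M):
#
#   # config.py
#   from colossalai.amp import AMP_TYPE
#   fp16 = dict(mode=AMP_TYPE.TORCH)
#
#   # train script
#   engine, train_dataloader, _, _ = colossalai.initialize(
#       model, optimizer, criterion, train_dataloader)
```

Since both paths are meant to apply the same AMP conversion, the memory gap the reporter observed pointed to extra state held by the Engine path, which is what #1096 addressed.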