HuggingFace Bert Phase2 Training is not able to use the same batch size between Eager Mode and TorchDynamo Usage #468
Closed
Description
Since this might be commit dependent as things improve, I checked out commit 5fb502660e52a2e1f93ab0f148fd8776e1b56297.
As of this commit I am still seeing TorchDynamo exceed eager mode memory requirements. Eager mode consumes around 35,584 MiB on an A100 40GB card for HuggingFace BERT-Large Phase 2 pretraining. This model uses a batch size of 16 and a sequence length of 512. You can view instantaneous memory usage via nvidia-smi dmon -s m.
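For a per-process view of the peak, PyTorch's own allocator counters can be read around a training step. A minimal sketch (the report_peak_memory helper and its label argument are mine, not part of the repro script):

import torch

def report_peak_memory(step_fn, label):
    # Reset the allocator's peak counter, run one step, and print the peak
    # so an eager run and a TorchDynamo run can be compared directly.
    torch.cuda.reset_peak_memory_stats()
    step_fn()
    torch.cuda.synchronize()
    peak_mib = torch.cuda.max_memory_allocated() / (1024 ** 2)
    print(f"{label}: peak allocated {peak_mib:.0f} MiB")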
You can reproduce this with the following instructions:
git clone https://github.com/kevinstephano/simple_dl_models.git
cd simple_dl_models
python huggingface_bert_phase2.py --torchdynamo --amp
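For context, the script exercises roughly the loop sketched below. This is only my approximation of the scenario (BERT-Large pretraining head, batch size 16, sequence length 512, AMP, TorchDynamo), using a trivial pass-through compiler_fn rather than whatever backend the harness actually selects, so the repo above remains the authoritative repro:

import torch
import torchdynamo  # standalone TorchDynamo package, as at this commit
from transformers import BertConfig, BertForPreTraining

config = BertConfig.from_pretrained("bert-large-uncased")
model = BertForPreTraining(config).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

# Phase 2 shape from the issue: batch size 16, sequence length 512.
input_ids = torch.randint(0, config.vocab_size, (16, 512), device="cuda")
labels = input_ids.clone()
next_sentence_label = torch.zeros(16, dtype=torch.long, device="cuda")

def passthrough_backend(gm, example_inputs):
    # Trivial compiler_fn: run the captured FX graph unmodified, so any extra
    # memory comes from graph capture rather than a particular backend.
    return gm.forward

@torchdynamo.optimize(passthrough_backend)
def train_step():
    with torch.cuda.amp.autocast():
        loss = model(input_ids=input_ids, labels=labels,
                     next_sentence_label=next_sentence_label).loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    optimizer.zero_grad(set_to_none=True)
    return loss

for _ in range(3):
    train_step()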
This is the error I see, for reference.
Traceback (most recent call last):
File "huggingface_bert_phase2.py", line 42, in <module>
final_results += runner.run(sys.argv, 'BertForPreTraining_P2_bert-large-uncased_[seqs=16,seql=512]', BertForPreTraining(config), optim_func, bert_p2_input_func, None)
File "/workspace/simple_dl_models/execution/runner.py", line 79, in run
result_records.append(execution_loop.execute(args, name, model_name, model, optim_func, input_func, grad_func, eager_record))
File "/workspace/simple_dl_models/execution/execution_loop.py", line 100, in execute
loss = model(*batch)
File "/opt/pytorch/pytorch/torch/nn/modules/module.py", line 1147, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 1069, in forward
@add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
File "/opt/pytorch/pytorch/torch/nn/modules/module.py", line 1147, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 1018, in forward
encoder_outputs = self.encoder(
File "/opt/pytorch/pytorch/torch/nn/modules/module.py", line 1147, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/torchdynamo/eval_frame.py", line 142, in catch_errors
return callback(frame, cache_size)
File "/opt/conda/lib/python3.8/site-packages/torchdynamo/convert_frame.py", line 340, in _convert_frame
result = inner_convert(frame, cache_size)
File "/opt/conda/lib/python3.8/site-packages/torchdynamo/convert_frame.py", line 119, in _fn
return fn(*args, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/torchdynamo/convert_frame.py", line 295, in _convert_frame_assert
code = transform_code_object(frame.f_code, transform)
File "/opt/conda/lib/python3.8/site-packages/torchdynamo/bytecode_transformation.py", line 338, in transform_code_object
transformations(instructions, code_options)
File "/opt/conda/lib/python3.8/site-packages/torchdynamo/convert_frame.py", line 271, in transform
tracer.run()
File "/opt/conda/lib/python3.8/site-packages/torchdynamo/symbolic_convert.py", line 310, in run
and self.step()
File "/opt/conda/lib/python3.8/site-packages/torchdynamo/symbolic_convert.py", line 288, in step
getattr(self, inst.opname)(inst)
File "/opt/conda/lib/python3.8/site-packages/torchdynamo/symbolic_convert.py", line 1324, in RETURN_VALUE
self.output.compile_subgraph(self)
File "/opt/conda/lib/python3.8/site-packages/torchdynamo/output_graph.py", line 286, in compile_subgraph
self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
File "/opt/conda/lib/python3.8/site-packages/torchdynamo/output_graph.py", line 327, in compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)
File "/opt/conda/lib/python3.8/site-packages/torchdynamo/output_graph.py", line 350, in call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e) from e
torchdynamo.exc.BackendCompilerFailed: ? raised RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 39.59 GiB total capacity; 38.23 GiB already allocated; 8.19 MiB free; 38.39 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
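For completeness, the allocator message itself suggests max_split_size_mb as a fragmentation mitigation. That is only an experiment, not a fix for the underlying memory gap, and the 128 below is an arbitrary trial value:

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python huggingface_bert_phase2.py --torchdynamo --amp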