This repository was archived by the owner on Aug 1, 2025. It is now read-only.

HuggingFace Bert Phase2 Training is not able to use the same batch size between Eager Mode and TorchDynamo Usage #468

@kevinstephano

Description

Since this might be commit-dependent as things improve, I checked out commit 5fb502660e52a2e1f93ab0f148fd8776e1b56297.

As of this commit I am still seeing TorchDynamo exceed eager mode's memory requirements. Eager mode consumes around 35,584 MiB on an A100 40GB card for HuggingFace Bert-Large Phase2 pretraining. This model uses a batch size of 16 and a sequence length of 512. You can view instantaneous memory usage via nvidia-smi dmon -s m.
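
For a more precise, scriptable check than nvidia-smi, PyTorch's allocator counters can be queried directly. A minimal sketch (only the torch.cuda calls are the actual API; where to place them around a training step is up to you):

import torch

torch.cuda.reset_peak_memory_stats()
# ... run one training iteration here ...
peak_mib = torch.cuda.max_memory_allocated() / 2**20  # bytes -> MiB
print(f"peak allocated: {peak_mib:.0f} MiB")

Note these counters only cover memory held by PyTorch's caching allocator, so they will read lower than nvidia-smi, which also includes the CUDA context.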

You can reproduce this with the following instructions:

git clone https://github.com/kevinstephano/simple_dl_models.git
cd simple_dl_models
python huggingface_bert_phase2.py --torchdynamo --amp

For reference, this is the error I see:

Traceback (most recent call last):
  File "huggingface_bert_phase2.py", line 42, in <module>
    final_results += runner.run(sys.argv, 'BertForPreTraining_P2_bert-large-uncased_[seqs=16,seql=512]', BertForPreTraining(config), optim_func, bert_p2_input_func, None)
  File "/workspace/simple_dl_models/execution/runner.py", line 79, in run
    result_records.append(execution_loop.execute(args, name, model_name, model, optim_func, input_func, grad_func, eager_record))
  File "/workspace/simple_dl_models/execution/execution_loop.py", line 100, in execute
    loss = model(*batch)
  File "/opt/pytorch/pytorch/torch/nn/modules/module.py", line 1147, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 1069, in forward
    @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
  File "/opt/pytorch/pytorch/torch/nn/modules/module.py", line 1147, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 1018, in forward
    encoder_outputs = self.encoder(
  File "/opt/pytorch/pytorch/torch/nn/modules/module.py", line 1147, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torchdynamo/eval_frame.py", line 142, in catch_errors
    return callback(frame, cache_size)
  File "/opt/conda/lib/python3.8/site-packages/torchdynamo/convert_frame.py", line 340, in _convert_frame
    result = inner_convert(frame, cache_size)
  File "/opt/conda/lib/python3.8/site-packages/torchdynamo/convert_frame.py", line 119, in _fn
    return fn(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torchdynamo/convert_frame.py", line 295, in _convert_frame_assert
    code = transform_code_object(frame.f_code, transform)
  File "/opt/conda/lib/python3.8/site-packages/torchdynamo/bytecode_transformation.py", line 338, in transform_code_object
    transformations(instructions, code_options)
  File "/opt/conda/lib/python3.8/site-packages/torchdynamo/convert_frame.py", line 271, in transform
    tracer.run()
  File "/opt/conda/lib/python3.8/site-packages/torchdynamo/symbolic_convert.py", line 310, in run
    and self.step()
  File "/opt/conda/lib/python3.8/site-packages/torchdynamo/symbolic_convert.py", line 288, in step
    getattr(self, inst.opname)(inst)
  File "/opt/conda/lib/python3.8/site-packages/torchdynamo/symbolic_convert.py", line 1324, in RETURN_VALUE
    self.output.compile_subgraph(self)
  File "/opt/conda/lib/python3.8/site-packages/torchdynamo/output_graph.py", line 286, in compile_subgraph
    self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
  File "/opt/conda/lib/python3.8/site-packages/torchdynamo/output_graph.py", line 327, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
  File "/opt/conda/lib/python3.8/site-packages/torchdynamo/output_graph.py", line 350, in call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e) from e
torchdynamo.exc.BackendCompilerFailed: ? raised RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 39.59 GiB total capacity; 38.23 GiB already allocated; 8.19 MiB free; 38.39 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
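
For what it's worth, the error text suggests max_split_size_mb, but here reserved (38.39 GiB) is nearly equal to allocated (38.23 GiB), so fragmentation is unlikely to be the problem; the extra memory appears to be genuinely live. For completeness, the allocator knob can still be tried via the environment (the 128 MiB value below is just an example, not a recommendation):

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python huggingface_bert_phase2.py --torchdynamo --amp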
