
🐛 [Bug] Compilation failure for HuggingFace T5-base Model #1583

Closed
gs-olive opened this issue Jan 10, 2023 · 2 comments · Fixed by #1584
Assignees: gs-olive
Labels: bug (Something isn't working)

Comments

@gs-olive (Collaborator)

Bug Description

When compiling the T5-base network (https://huggingface.co/t5-base), the following error is encountered:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)

To Reproduce

Steps to reproduce the behavior (a sketch of these steps follows the list):

  1. Run torch_tensorrt.compile with the t5-base model as input, using fp32 precision.
  2. Choose two fixed-size inputs, each of shape [1, 128], and enable truncate_long_and_double with a 12 GB workspace.
  3. Pass in model keyword args to disable attention and hidden-state outputs.
  4. Run inference using the compiled model on two sample inputs.
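
The issue does not include the exact script, so the following is a minimal sketch of the four steps above. It assumes the TorchScript frontend, transformers' T5Model, and a hypothetical T5Wrapper module to route the two inputs by keyword (T5Model.forward's second positional argument is attention_mask):

```python
import torch
import torch_tensorrt
from transformers import T5Model

# Hypothetical wrapper so the traced module takes exactly the two [1, 128]
# inputs from step 2, routed to the right keyword arguments.
class T5Wrapper(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, input_ids, decoder_input_ids):
        return self.model(input_ids=input_ids,
                          decoder_input_ids=decoder_input_ids)

# Step 3: disable attention and hidden-state outputs via model kwargs;
# torchscript=True makes the model return tuples, which tracing requires.
model = T5Model.from_pretrained(
    "t5-base",
    torchscript=True,
    output_attentions=False,
    output_hidden_states=False,
    use_cache=False,
).eval()

# Step 2: two fixed-size [1, 128] inputs (encoder and decoder token IDs).
input_ids = torch.randint(0, model.config.vocab_size, (1, 128))
decoder_input_ids = torch.randint(0, model.config.vocab_size, (1, 128))

traced = torch.jit.trace(T5Wrapper(model), (input_ids, decoder_input_ids))

# Step 1: compile in fp32 with truncate_long_and_double and a 12 GB workspace.
trt_model = torch_tensorrt.compile(
    traced,
    inputs=[
        torch_tensorrt.Input(shape=[1, 128], dtype=torch.int32),
        torch_tensorrt.Input(shape=[1, 128], dtype=torch.int32),
    ],
    enabled_precisions={torch.float32},
    truncate_long_and_double=True,
    workspace_size=12 << 30,  # 12 GB, in bytes
)

# Step 4: run inference on the compiled model with two sample inputs.
# With the model left on CPU (the default here), compilation fails with
# the device-mismatch RuntimeError quoted above.
out = trt_model(input_ids.cuda(), decoder_input_ids.cuda())
```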

Expected behavior

The model should compile successfully with Torch-TRT. Specifically, internal device-mismatch issues should either be surfaced with a warning at compile time or should otherwise not cause errors.

Environment

  • Torch-TensorRT Version: 1.4.0.dev0+f43be5b6
  • PyTorch Version: 1.14.0.dev20221114+cu116
  • CPU Architecture: Intel Xeon CPU
  • OS: Ubuntu 20.04
  • How you installed PyTorch: pip
  • Build command you used: python setup.py develop
  • Are you using local sources or building from archives: local
  • Python version: 3.8.13
  • CUDA version: 11.6

Additional context

The problem seems related to #1416, which was intended to address device-mismatch issues of this sort. Since this case is not caught by that PR, it likely arises in a different area, for example as a result of an internal computation in a Torch block.

@gs-olive gs-olive added the bug Something isn't working label Jan 10, 2023
@gs-olive gs-olive self-assigned this Jan 10, 2023
@gs-olive (Collaborator, Author)

The root cause is that various model-internal auxiliary tensors are initialized on the CPU. Running model.cuda() and placing both input tensors on the GPU resolves the compilation issue (see the sketch below).
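
A minimal sketch of that workaround, reusing the hypothetical names from the reproduction sketch above:

```python
# Initialize the model's internal auxiliary tensors on GPU by moving the
# whole module, and keep both sample inputs on the same device before
# tracing and compiling.
model = model.cuda().eval()
input_ids = input_ids.cuda()
decoder_input_ids = decoder_input_ids.cuda()
```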

This model is one operator away from full TensorRT support (requiring only aten::full_like). However, full compilation is not currently functional, since the model outputs are in Tuple form, which Torch-TensorRT does not yet support; this could warrant a new feature, as in #629.

@Christina-Young-NVIDIA (Collaborator)

Dheeraj assigned changes to George.
