
[T5] Fix speed degradation bug t5 #10496

Merged
merged 3 commits into from Mar 3, 2021

Conversation

@patrickvonplaten (Contributor) commented on Mar 3, 2021

What does this PR do?

Checking every value of a tensor for inf is expensive. The check was added to T5 to make fp16 training possible, but it should only run when the model is actually in fp16, so that it does not slow down normal fp32 mode.
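
The pattern described above can be sketched as follows. This is an illustration based on the PR description, not the exact merged diff; the helper name `clamp_inf_if_fp16` is made up for the example. The key point is that the `torch.isinf` scan is gated on the dtype, so fp32 inference skips it entirely:

import torch

def clamp_inf_if_fp16(hidden_states: torch.Tensor) -> torch.Tensor:
    # Only run the expensive isinf scan when activations are fp16,
    # where overflow to inf can actually occur during training/inference.
    if hidden_states.dtype == torch.float16 and torch.isinf(hidden_states).any():
        # Replace inf with a large finite value just below the fp16 max.
        clamp_value = torch.finfo(hidden_states.dtype).max - 1000
        hidden_states = torch.clamp(hidden_states, min=-clamp_value, max=clamp_value)
    return hidden_states

# fp32 tensors pass through untouched (no scan, no clamp)
x32 = torch.tensor([1.0, float("inf")])
assert torch.isinf(clamp_inf_if_fp16(x32)).any()

# fp16 tensors with inf get clamped to a finite value
x16 = torch.tensor([1.0, float("inf")], dtype=torch.float16)
assert not torch.isinf(clamp_inf_if_fp16(x16)).any()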

Using @dsgissin's script (imports added, and the undefined version variables replaced with `transformers.__version__` / `torch.__version__`):

import time

import numpy as np
import torch
import transformers
from transformers import T5ForConditionalGeneration, T5TokenizerFast

device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
print(f"Using device: {device}")

t5_tokenizer = T5TokenizerFast.from_pretrained('t5-base')
t5_model = T5ForConditionalGeneration.from_pretrained('t5-base')
t5_model = t5_model.to(device)

t5_input_ids = t5_tokenizer("summarize: studies have shown that owning a dog is good for you ", return_tensors="pt").input_ids  # Batch size 1
t5_input_ids = t5_input_ids.to(device)

N = 100
times = []
for _ in range(N):
    start = time.time()
    t5_outputs = t5_model.generate(t5_input_ids)
    end = time.time()
    times.append(end - start)

print(f"transformers version: {transformers.__version__}")
print(f"torch version: {torch.__version__}")
print(f"{1000*np.mean(times):.0f} ms ± {1000*np.std(times):.2f} ms per loop (mean ± std of {N} runs)")

with:

  • Python 3.8.5
  • PyTorch 1.7.1
  • CUDA 11.1 on an NVIDIA V100 GPU

The time improved from
441 ms ± 41.67 ms per loop (mean ± std of 100 runs)
to
388 ms ± 44.75 ms per loop (mean ± std of 100 runs),
roughly a 12% speedup for fp32 generation.

@patrickvonplaten patrickvonplaten linked an issue Mar 3, 2021 that may be closed by this pull request
@patrickvonplaten patrickvonplaten mentioned this pull request Mar 3, 2021
@patil-suraj (Contributor) left a comment:


Looks good to me!

Some of the other library models also use this trick (BART-like models), we should also investigate those.

@patrickvonplaten (Contributor, Author) replied, quoting the comment above:

> Looks good to me!
>
> Some of the other library models also use this trick (BART-like models), we should also investigate those.

Good point - yeah, let me fix this in this PR actually

@patrickvonplaten patrickvonplaten merged commit 2d2ed2c into huggingface:master Mar 3, 2021
@patrickvonplaten patrickvonplaten deleted the pspeed_tup_t5 branch March 3, 2021 09:42
Successfully merging this pull request may close these issues:

T5 GPU Runtime Degradation