
[T5] Fix speed degradation bug t5 #10496

Merged
merged 3 commits into from Mar 3, 2021

Conversation

@patrickvonplaten (Contributor) commented on Mar 3, 2021

What does this PR do?

Checking every value of a tensor for inf is expensive. The check was added to T5 to make fp16 training possible, but it should only run when the model is actually in fp16, so that it does not slow down normal fp32 mode.
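
The pattern described above can be sketched as follows. This is an illustration based on the PR description, not the exact merged diff; the helper name `clamp_inf_if_fp16` is made up for the example. The key point is that the `torch.isinf` scan is gated on the dtype, so fp32 inference skips it entirely:

import torch

def clamp_inf_if_fp16(hidden_states: torch.Tensor) -> torch.Tensor:
    # Only run the expensive isinf scan when activations are fp16,
    # where overflow to inf can actually occur during training/inference.
    if hidden_states.dtype == torch.float16 and torch.isinf(hidden_states).any():
        # Replace inf with a large finite value just below the fp16 max.
        clamp_value = torch.finfo(hidden_states.dtype).max - 1000
        hidden_states = torch.clamp(hidden_states, min=-clamp_value, max=clamp_value)
    return hidden_states

# fp32 tensors pass through untouched (no scan, no clamp)
x32 = torch.tensor([1.0, float("inf")])
assert torch.isinf(clamp_inf_if_fp16(x32)).any()

# fp16 tensors with inf get clamped to a finite value
x16 = torch.tensor([1.0, float("inf")], dtype=torch.float16)
assert not torch.isinf(clamp_inf_if_fp16(x16)).any()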

Using @dsgissin's script (imports added, and the undefined version variables replaced with `transformers.__version__` / `torch.__version__`):

import time

import numpy as np
import torch
import transformers
from transformers import T5ForConditionalGeneration, T5TokenizerFast

device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
print(f"Using device: {device}")

t5_tokenizer = T5TokenizerFast.from_pretrained('t5-base')
t5_model = T5ForConditionalGeneration.from_pretrained('t5-base')
t5_model = t5_model.to(device)

t5_input_ids = t5_tokenizer("summarize: studies have shown that owning a dog is good for you ", return_tensors="pt").input_ids  # Batch size 1
t5_input_ids = t5_input_ids.to(device)

N = 100
times = []
for _ in range(N):
    start = time.time()
    t5_outputs = t5_model.generate(t5_input_ids)
    end = time.time()
    times.append(end - start)

print(f"transformers version: {transformers.__version__}")
print(f"torch version: {torch.__version__}")
print(f"{1000*np.mean(times):.0f} ms ± {1000*np.std(times):.2f} ms per loop (mean ± std of {N} runs)")

with:

  • Python 3.8.5
  • PyTorch 1.7.1
  • CUDA 11.1 on an NVIDIA V100 GPU

The time improved from
441 ms ± 41.67 ms per loop (mean ± std of 100 runs)
to
388 ms ± 44.75 ms per loop (mean ± std of 100 runs),
roughly a 12% speedup for fp32 generation.

@patrickvonplaten patrickvonplaten linked an issue Mar 3, 2021 that may be closed by this pull request
@patrickvonplaten patrickvonplaten mentioned this pull request Mar 3, 2021
@patil-suraj (Contributor) left a comment:


Looks good to me!

Some of the other library models also use this trick (BART-like models), we should also investigate those.

@patrickvonplaten (Contributor, Author) replied, quoting the comment above:

> Looks good to me!
>
> Some of the other library models also use this trick (BART-like models), we should also investigate those.

Good point - yeah, let me fix this in this PR actually

@patrickvonplaten patrickvonplaten merged commit 2d2ed2c into huggingface:master Mar 3, 2021
@patrickvonplaten patrickvonplaten deleted the pspeed_tup_t5 branch March 3, 2021 09:42
Successfully merging this pull request may close these issues:

T5 GPU Runtime Degradation