
T5 speed #23

Closed
JiushengChen opened this issue Sep 1, 2020 · 6 comments · Fixed by #30
@JiushengChen
Contributor

T5 speed on the latest code is lower than expected (run inside Docker), which caused the benchmark test to fail.

```shell
CUDA_VISIBLE_DEVICES=3 bash models/hf_t5.sh
```

| Util | Model | Task | Split | BatchSize | Bleu | Throughput(samples/s) | Expected |
| --- | --- | --- | --- | --- | --- | --- | --- |
| transformers_v3.0.2 | t5-base | wmt_en_ro/raw | val | 64 | 27.44 | 5 | 5~5.5 |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 64 | 27.43 | 6.3 | 7~7.5 |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 128 | 27.42 | 7.2 | 7.9~8.4 |

Not sure if it is due to docker.

@feihugis
Contributor

feihugis commented Sep 1, 2020

The expected scores were set based on results obtained without Docker. Let me re-run with Docker.

@feihugis
Contributor

feihugis commented Sep 1, 2020

@JiushengChen I got the following results by running with Docker on gpu4. They are still different from yours. Which version of Python are you using?

  • Using Python 3.8.3

| Util | Model | Task | Split | BatchSize | Samples | Tokens | Bleu | Rouge | Loss | Perplexity | Runtime(seconds) | Throughput(samples/s) | Throughput(tokens/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| transformers_v3.0.2 | t5-base | wmt_en_ro/raw | val | 64 | 1999 | NA | 27.44 | NA\|NA\|NA | NA | NA | 378 | 5.3 | NA |
| transformers_v3.0.2 | t5-base | wmt_en_ro/raw | val | 64 | 1999 | NA | 27.44 | NA\|NA\|NA | NA | NA | 367 | 5.4 | NA |
| transformers_v3.0.2 | t5-base | wmt_en_ro/raw | val | 64 | 1999 | NA | 27.44 | NA\|NA\|NA | NA | NA | 369 | 5.4 | NA |

| Util | Model | Task | Split | BatchSize | Samples | Tokens | Bleu | Rouge | Loss | Perplexity | Runtime(seconds) | Throughput(samples/s) | Throughput(tokens/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 64 | 1999 | NA | 27.43 | NA\|NA\|NA | NA | NA | 275 | 7.3 | NA |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 128 | 1999 | NA | 27.42 | NA\|NA\|NA | NA | NA | 246 | 8.1 | NA |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 64 | 1999 | NA | 27.43 | NA\|NA\|NA | NA | NA | 279 | 7.2 | NA |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 128 | 1999 | NA | 27.42 | NA\|NA\|NA | NA | NA | 247 | 8.1 | NA |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 64 | 1999 | NA | 27.43 | NA\|NA\|NA | NA | NA | 285 | 7.0 | NA |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 128 | 1999 | NA | 27.42 | NA\|NA\|NA | NA | NA | 259 | 7.7 | NA |
  • Using the default Python (3.6.9) in the Docker image

| Util | Model | Task | Split | BatchSize | Samples | Tokens | Bleu | Rouge | Loss | Perplexity | Runtime(seconds) | Throughput(samples/s) | Throughput(tokens/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| transformers_v3.0.2 | t5-base | wmt_en_ro/raw | val | 64 | 1999 | NA | 27.44 | NA\|NA\|NA | NA | NA | 407 | 4.9 | NA |
| transformers_v3.0.2 | t5-base | wmt_en_ro/raw | val | 64 | 1999 | NA | 27.44 | NA\|NA\|NA | NA | NA | 407 | 4.9 | NA |
| transformers_v3.0.2 | t5-base | wmt_en_ro/raw | val | 64 | 1999 | NA | 27.44 | NA\|NA\|NA | NA | NA | 406 | 4.9 | NA |

| Util | Model | Task | Split | BatchSize | Samples | Tokens | Bleu | Rouge | Loss | Perplexity | Runtime(seconds) | Throughput(samples/s) | Throughput(tokens/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 64 | 1999 | NA | 27.43 | NA\|NA\|NA | NA | NA | 318 | 6.3 | NA |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 128 | 1999 | NA | 27.42 | NA\|NA\|NA | NA | NA | 296 | 6.8 | NA |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 64 | 1999 | NA | 27.43 | NA\|NA\|NA | NA | NA | 317 | 6.3 | NA |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 128 | 1999 | NA | 27.42 | NA\|NA\|NA | NA | NA | 298 | 6.7 | NA |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 64 | 1999 | NA | 27.43 | NA\|NA\|NA | NA | NA | 330 | 6.1 | NA |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 128 | 1999 | NA | 27.42 | NA\|NA\|NA | NA | NA | 294 | 6.8 | NA |

I guess the cause is that @replace(...) is not fully compatible with Python 3.6.9 yet, so some optimizations did not take effect. This issue is on my list; I will fix it once I finish the logging issue reported by the user.
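For context, a replace-style decorator typically rebinds an original function or class in its defining namespace so that all callers pick up the optimized version; behavior like this can be version-sensitive because it relies on interpreter internals and import-time state. A minimal sketch, assuming a simplified design (the `replace` helper and `PATCHED` registry below are illustrative, not FastSeq's actual implementation):

```python
# Minimal sketch of a replace-style patching decorator.
# `replace` and `PATCHED` are illustrative names, not FastSeq's API.
PATCHED = {}  # original name -> replacement, kept for inspection

def replace(original):
    """Return a decorator that rebinds `original`'s name in the
    namespace where it was defined, so existing callers that look
    the name up at call time get the replacement."""
    def decorator(replacement):
        original.__globals__[original.__name__] = replacement
        PATCHED[original.__name__] = replacement
        return replacement
    return decorator

# Example: swap a baseline implementation for an "optimized" one.
def generate(n):
    return [i * i for i in range(n)]

@replace(generate)
def generate_fast(n):
    return [i * i for i in range(n)]  # stand-in for a faster version
```

After decoration, the module-level name `generate` refers to `generate_fast`, so code calling `generate(...)` transparently uses the patched version. If the rebinding silently fails on some Python version, callers keep the slow path, which would match the throughput gap seen above.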

@JiushengChen
Contributor Author

I am using the Docker image built from docker/Dockerfile; it uses Python 3.6.9 :: Anaconda, Inc. Good to know the root cause has been identified.

@feihugis
Contributor

feihugis commented Sep 2, 2020

Benchmark results with Docker

  • Python 3.6.9

| Util | Model | Task | Split | BatchSize | Samples | Tokens | Bleu | Rouge | Loss | Perplexity | Runtime(seconds) | Throughput(samples/s) | Throughput(tokens/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| transformers_v3.0.2 | t5-base | wmt_en_ro/raw | val | 64 | 1999 | NA | 27.44 | NA\|NA\|NA | NA | NA | 402 | 5.0 | NA |
| transformers_v3.0.2 | t5-base | wmt_en_ro/raw | val | 64 | 1999 | NA | 27.44 | NA\|NA\|NA | NA | NA | 403 | 5.0 | NA |
| transformers_v3.0.2 | t5-base | wmt_en_ro/raw | val | 64 | 1999 | NA | 27.44 | NA\|NA\|NA | NA | NA | 399 | 5.0 | NA |

| Util | Model | Task | Split | BatchSize | Samples | Tokens | Bleu | Rouge | Loss | Perplexity | Runtime(seconds) | Throughput(samples/s) | Throughput(tokens/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 64 | 1999 | NA | 27.43 | NA\|NA\|NA | NA | NA | 326 | 6.1 | NA |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 128 | 1999 | NA | 27.42 | NA\|NA\|NA | NA | NA | 278 | 7.2 | NA |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 64 | 1999 | NA | 27.43 | NA\|NA\|NA | NA | NA | 314 | 6.4 | NA |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 128 | 1999 | NA | 27.42 | NA\|NA\|NA | NA | NA | 280 | 7.1 | NA |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 64 | 1999 | NA | 27.43 | NA\|NA\|NA | NA | NA | 312 | 6.4 | NA |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 128 | 1999 | NA | 27.42 | NA\|NA\|NA | NA | NA | 278 | 7.2 | NA |
  • Python 3.8.3

| Util | Model | Task | Split | BatchSize | Samples | Tokens | Bleu | Rouge | Loss | Perplexity | Runtime(seconds) | Throughput(samples/s) | Throughput(tokens/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| transformers_v3.0.2 | t5-base | wmt_en_ro/raw | val | 64 | 1999 | NA | 27.44 | NA\|NA\|NA | NA | NA | 388 | 5.2 | NA |
| transformers_v3.0.2 | t5-base | wmt_en_ro/raw | val | 64 | 1999 | NA | 27.44 | NA\|NA\|NA | NA | NA | 385 | 5.2 | NA |
| transformers_v3.0.2 | t5-base | wmt_en_ro/raw | val | 64 | 1999 | NA | 27.44 | NA\|NA\|NA | NA | NA | 379 | 5.3 | NA |

| Util | Model | Task | Split | BatchSize | Samples | Tokens | Bleu | Rouge | Loss | Perplexity | Runtime(seconds) | Throughput(samples/s) | Throughput(tokens/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 64 | 1999 | NA | 27.43 | NA\|NA\|NA | NA | NA | 297 | 6.7 | NA |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 128 | 1999 | NA | 27.42 | NA\|NA\|NA | NA | NA | 268 | 7.5 | NA |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 64 | 1999 | NA | 27.43 | NA\|NA\|NA | NA | NA | 286 | 7.0 | NA |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 128 | 1999 | NA | 27.42 | NA\|NA\|NA | NA | NA | 252 | 7.9 | NA |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 64 | 1999 | NA | 27.43 | NA\|NA\|NA | NA | NA | 282 | 7.1 | NA |
| transformers_v3.0.2+fastseq_v0.0.3 | t5-base | wmt_en_ro/raw | val | 128 | 1999 | NA | 27.42 | NA\|NA\|NA | NA | NA | 260 | 7.7 | NA |
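Throughput(samples/s) in these tables is just Samples divided by Runtime(seconds), rounded to one decimal; a quick sanity check against a few rows reported above:

```python
# Verify Throughput(samples/s) == Samples / Runtime(seconds) for a few
# rows taken from the benchmark tables above.
samples = 1999

rows = [  # (runtime_seconds, reported_samples_per_s)
    (402, 5.0),  # transformers_v3.0.2, Python 3.6.9
    (326, 6.1),  # +fastseq, batch 64, Python 3.6.9
    (388, 5.2),  # transformers_v3.0.2, Python 3.8.3
    (297, 6.7),  # +fastseq, batch 64, Python 3.8.3
]

for runtime, reported in rows:
    assert round(samples / runtime, 1) == reported, (runtime, reported)
```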

@JiushengChen
Contributor Author

Thanks for the study! I also checked with a few users: they don't care much about the Python version, but they use popular Docker images provided by PyTorch, NVIDIA, etc., so I think most are on Python 3.6. Considering that the difference above is not large, I suggest we keep our existing Docker image, which uses Python 3.6.9.

@feihugis
Contributor

feihugis commented Sep 2, 2020

@JiushengChen Sounds good! Let's use Python-3.6.9 for our benchmarks and tests.
