[Pytorch] AMD GPUs benchmarks #328
For comparison, these are TensorFlow benchmarks: http://blog.gpueater.com/en/2018/04/23/00011_tech_cifar10_bench_on_tf13/
Thanks for the report! We are working on performance optimizations at the moment. If you are interested in helping, could you …
How long is "long" on an RX 580? Some hours? A day? A week?
Typically a few hours for a single network. It depends on how many unique convolution configs are missing from our internal database.
I ran this command with this benchmark: a whole lot of tests were running. I waited overnight for it to finish. When I came back this morning the task was frozen, with radeontop showing every metric at 100% utilization (which I doubt). I looked for any result, but none was produced. I'll try another, maybe lighter, benchmark.
Not sure if it's relevant for you, but I can't detect my GPU now. This is the log from when I killed the first benchmark after it froze. Working on it.

This is problematic as well.
A couple of questions: … Thanks!
(I'm working with @hyperfraise.)
@skylt Thanks, this is helpful. There is no … The instability looks like a kernel driver issue on …
I noticed that sometimes hcc is launched before training begins; the compilation step seems to disappear after the first script run. You can find the results of the test I ran prior to tuning MIOpen below. The numbers are based on 90 observations (batches), with the first 10 discarded.
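The discard-the-first-iterations pattern described above can be sketched generically; the workload function and the `benchmark` helper here are my own stand-ins, not part of the project's scripts:

```python
import statistics
import time

def benchmark(step, n_iters=90, n_warmup=10):
    """Time step() n_iters times, discarding the first n_warmup runs.

    The first calls may include one-off costs (e.g. kernel compilation),
    so they are excluded from the reported statistics.
    """
    times = []
    for _ in range(n_iters):
        t0 = time.perf_counter()
        step()
        times.append(time.perf_counter() - t0)
    kept = times[n_warmup:]
    return {"mean_s": statistics.mean(kept), "stdev_s": statistics.stdev(kept)}

# Example with a dummy workload standing in for one training batch:
result = benchmark(lambda: sum(i * i for i in range(10_000)))
```

Discarding warm-up iterations matters especially here, since the thread notes that kernels are compiled on first use and cached afterwards.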
Concerning your questions: we do compile some kernels the first time we run them and subsequently fetch them from cache, so you are correct that the second invocation of these kernels is what you want to time. I think the performance you observe makes sense. I'd expect resnet18 to improve as you tune MIOpen, and in general I've observed that larger batch sizes help; judging from your memory consumption numbers, you could increase the batch size for resnet18 on the 580 by 2x (maybe even 4x). Please attach your performance database here once you are done so that I can make sure these configs get tuned in a future MIOpen release.
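The tuning step mentioned above is typically driven through MIOpen's environment variables; a minimal sketch, where the database path and script name are hypothetical and the exact variable values should be checked against your MIOpen version's documentation:

```shell
# Ask MIOpen to exhaustively search for the best convolution kernels and
# record the winners in a user-writable performance database.
export MIOPEN_FIND_ENFORCE=SEARCH_DB_UPDATE   # check numeric/string form in the MIOpen docs
export MIOPEN_USER_DB_PATH=/tmp/miopen-perfdb # hypothetical path

# Run the workload once to populate the database (slow), then rerun
# without MIOPEN_FIND_ENFORCE to benchmark with the tuned kernels.
python benchmark.py                            # hypothetical script name
```

The resulting user database is the "performance database" the maintainer asks to have attached to the issue.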
I wonder how you got such a nice result.
Single GPU. I don't know where the differences could come from; I didn't do anything in particular.
I used the benchmark script in …
@Delaunay this is very interesting data, thanks for posting it here! Which PyTorch commit did you use for your test? Your fp16 data looks correct for gfx803. @geekboood could you post a link to the benchmark you were running? I'd be interested in having a look at it. Thanks!
The commit I used was …
Thanks, very good! It may be interesting to rerun this benchmark after …
@iotamudelta The ResNet one is already in the link above.
Hi. Thanks for this work, guys.
I was curious whether you had been able to benchmark the framework on AMD GPUs. I've successfully built PyTorch with ROCm support following your instructions, but the benchmarks I got don't seem right. I'm testing with a Radeon RX 580, which should have roughly half the performance of a 1080 Ti, yet I'm seeing more like a 9-10x drop in performance on convolutions. The TensorFlow benchmarks already show that the gap shouldn't be that wide.
Is this expected for the moment?
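Before timing anything, it's worth sanity-checking that the ROCm build actually sees the GPU. A minimal sketch: ROCm builds of PyTorch expose HIP devices through the `torch.cuda` API, and the `gpu_report` helper name is my own:

```python
def gpu_report():
    """Return a small dict describing whether PyTorch can see a GPU."""
    report = {"torch_installed": False, "available": None, "device_name": None}
    try:
        import torch  # ROCm builds still use the torch.cuda namespace
    except ImportError:
        return report  # PyTorch not installed in this environment
    report["torch_installed"] = True
    report["available"] = torch.cuda.is_available()
    if report["available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report

if __name__ == "__main__":
    print(gpu_report())
```

If `available` comes back False (as in the frozen-GPU report earlier in the thread), the problem is at the driver level and no amount of benchmark tuning will help.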