Winograd F(4x4,3x3) #1643 has decreased performance on my machine #1646

Closed
kuba97531 opened this issue Jul 24, 2018 · 6 comments

Comments

@kuba97531
Contributor

After the changes in #1643, I observe a slowdown rather than a speedup on my machine.
I compared builds from commits 6333b66 and 7cfbb72, and the new version is consistently slower.
System Information:
OS: Windows 10
GPU: GTX 1080 Ti
Build: Release build from Visual Studio 2017

The table below shows the benchmark results (n/s, five runs each) after running the full tuner for both versions.
I ran the benchmark with two commands:
leelaz.exe -w 153 --benchmark
leelaz.exe -w 153 --benchmark -t 4

| build | n/s | n/s (-t 4) |
| --- | --- | --- |
| 7cfbb72 (new) | 446, 451, 451, 452, 451 | 664, 656, 659, 659, 665 |
| 6333b66 (old) | 454, 454, 454, 459, 458 | 676, 695, 700, 691, 699 |

The results don't differ by much, but every test I run shows a small, consistent slowdown.
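To put a number on the slowdown, here is a minimal Python sketch that averages the five runs per build and reports the relative drop. The values are copied from the results reported above; the dictionary layout and the helper name `slowdown_percent` are only illustrative and are not part of leelaz.

```python
from statistics import mean

# n/s values copied from the benchmark results in this issue.
runs = {
    "7cfbb72 (new)": {"default": [446, 451, 451, 452, 451],
                      "-t 4": [664, 656, 659, 659, 665]},
    "6333b66 (old)": {"default": [454, 454, 454, 459, 458],
                      "-t 4": [676, 695, 700, 691, 699]},
}


def slowdown_percent(old, new):
    """Relative drop in mean n/s going from the old build to the new one."""
    return 100.0 * (mean(old) - mean(new)) / mean(old)


for mode in ("default", "-t 4"):
    old = runs["6333b66 (old)"][mode]
    new = runs["7cfbb72 (new)"][mode]
    # True when even the best new run is slower than the worst old run.
    no_overlap = max(new) < min(old)
    print(f"{mode}: old {mean(old):.1f} n/s, new {mean(new):.1f} n/s, "
          f"slowdown {slowdown_percent(old, new):.1f}%, "
          f"ranges separated: {no_overlap}")
```

With these numbers the drop is about 1.2% for the default run and about 4.6% with -t 4, and in both modes the best run of the new build is still below the worst run of the old one.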

In case it matters, I have attached my tuning file.
tuning.txt

Is it possible I am doing something wrong?

@kuba97531
Contributor Author

I've also uploaded the two compiled .exe files
https://drive.google.com/open?id=1BtiXRye_UpaU4-hpjmhSD_EX5CY03yQt

@gcp
Member

gcp commented Jul 24, 2018

These results are, quite frankly, terrible.

This is a GTX 1060, quite a bit less than half the peak TFLOPS of your 1080 Ti:
374 n/s default, 501 n/s t=4

Your card is benchmarking only 20-30% faster.
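The "20-30%" figure can be checked with a short Python sketch. It assumes the comparison is against the mean of the reported 7cfbb72 (new build) runs on the 1080 Ti; which build the GTX 1060 numbers came from is not stated in the thread, and the helper name `speedup_percent` is only illustrative.

```python
from statistics import mean

# GTX 1060 figures as quoted in this comment (n/s).
gtx_1060 = {"default": 374, "-t 4": 501}

# GTX 1080 Ti figures: mean of the five reported runs of the 7cfbb72 build.
# Choosing this build for the comparison is an assumption.
gtx_1080_ti = {
    "default": mean([446, 451, 451, 452, 451]),
    "-t 4": mean([664, 656, 659, 659, 665]),
}


def speedup_percent(baseline, other):
    """How much faster `other` is than `baseline`, in percent."""
    return 100.0 * (other / baseline - 1.0)


for mode in ("default", "-t 4"):
    pct = speedup_percent(gtx_1060[mode], gtx_1080_ti[mode])
    print(f"{mode}: 1080 Ti is {pct:.1f}% faster than the 1060")
```

Under that assumption the gap comes out at roughly 20% for the default run and roughly 32% with -t 4.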

@tterava
Contributor

tterava commented Jul 24, 2018

LZ tends to work best when you use the same -t parameter in tuning as you do in normal use. I get about 950 n/s on a GTX 1080 using -t 4. It seems there's some other issue with your system.

@kuba97531
Contributor Author

It seems that the problem might be with my build. The benchmark on the official leelaz.exe 0.15 gives about 600 n/s, and about 900 n/s with -t 4, which is more in line with expectations.

Do you have any idea what I might be doing wrong in the build procedure?
Are the official Windows binaries built with Visual Studio or with some other tool?

@gcp
Member

gcp commented Jul 24, 2018

I build with MSVC2017, latest release. I do use a newer OpenBLAS that is not available in NuGet by building that from source with mingw64 (I just noticed it's missing Ryzen 2 support, so I'll have to update it again). The DLL is in the release package. Maybe your build is not set to optimize, or is missing a define like NDEBUG or something?

I benchmarked on a GTX 1070 on Windows 10 and got ~530 n/s with t=4. That's not great, but it's still clearly better than the older versions, which did ~488 n/s.

So there's probably something wrong with your build, but comparing to Linux benchmarks is probably going to show some difference too because of Windows' terribly slow GPU computing support.

@kuba97531
Contributor Author

I'm closing the issue until I resolve my local problems and can reproduce it.

3 participants