NVIDIA Kepler results (GTX 760M, driver version 419.67) #14
Some thoughts and comparison with newer NVIDIA architectures
RGB32 loads
I've also tried the RGB32 (float3) format for typed buffer loads and textures. The results were different from my previous experiments. The current test configuration shows a 2/3 rate (somewhat strange) for buffers and linear texture access, and a 1/3 rate for random texture reads. My previous experiments showed a 1/3 rate for linear buffer loads (similar to raw buffer Load3) and a somewhat slower rate for random loads. The 1/3 rate seemed reasonable: it is consistent with the assumption that NVIDIA TMU hardware falls back to 32-bit fetches on unaligned access (otherwise a 1/2 rate would be expected, as with RGBA32). I experimented with the test configuration a little, tuning the thread group size and loop iteration count, and got varying results: close to a 1/3 rate, or much slower in some cases. Maybe cache bank conflicts start to appear.
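The rate argument above can be written down as a toy model. This is a sketch under stated assumptions, not documented NVIDIA behavior: an aligned typed load is assumed to be split into 64-bit fetches (a size chosen so that RGBA32 lands at the 1/2 rate mentioned above), an unaligned load is assumed to fall back to 32-bit fetches, and throughput is assumed to be inversely proportional to the number of fetches. The helper `relative_rate` is hypothetical.

```python
# Toy throughput model for the RGB32 (float3) discussion.
# ASSUMPTIONS (hypothetical, not confirmed by NVIDIA):
#   - aligned typed loads are split into 64-bit (8-byte) fetches
#   - unaligned loads fall back to 32-bit (4-byte) fetches
#   - throughput is inversely proportional to the fetch count
import math
from fractions import Fraction


def relative_rate(bytes_per_element: int, aligned: bool) -> Fraction:
    """Rate relative to a single-fetch (full-rate) load under the model."""
    fetch_bytes = 8 if aligned else 4
    fetches = math.ceil(bytes_per_element / fetch_bytes)
    return Fraction(1, fetches)


cases = {
    "R32 (4 B, aligned)": relative_rate(4, aligned=True),
    "RGBA32 (16 B, aligned)": relative_rate(16, aligned=True),
    "RGB32 (12 B, if it were fetched aligned)": relative_rate(12, aligned=True),
    "RGB32 (12 B, unaligned 32-bit fallback)": relative_rate(12, aligned=False),
}
for name, rate in cases.items():
    print(f"{name}: {rate}")
```

Under these assumptions an unaligned 12-byte RGB32 load needs three fetches (1/3 rate), while an aligned one would need two (1/2 rate), which is the distinction the comment draws. The model does not explain the 2/3 rate seen in the current configuration.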
Thanks! Added Kepler results. This confirms that Nvidia's uniform load driver optimization affects Kepler too.
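The effect of the uniform load optimization can be checked directly against the timings in the PerfTest output of this issue by dividing each test's random-access time by its uniform-access time. A minimal Python sketch follows; the dictionary labels are hypothetical (the captured output lost the per-format names, so repeated entries are told apart by position), and the timings are copied from the log.

```python
# Timings in ms copied from the PerfTest output in this issue: (uniform, random).
# Labels are hypothetical; "1st entry" means the first block with that name.
samples = {
    "Buffer.Load (1st entry)": (3.073, 197.022),
    "ByteAddressBuffer.Load": (3.758, 216.928),
    "ByteAddressBuffer.Load3": (572.822, 570.361),
    "ByteAddressBuffer.Load4": (752.691, 763.638),
    "StructuredBuffer.Load (1st entry)": (3.103, 196.991),
    "Texture2D.Load (1st entry)": (3.384, 204.327),
}


def uniform_speedup(uniform_ms: float, random_ms: float) -> float:
    """How many times faster the uniform-address variant ran than the random one."""
    return random_ms / uniform_ms


for name, (u, r) in samples.items():
    print(f"{name}: {uniform_speedup(u, r):.1f}x")
```

Most entries show a speedup of roughly 55x to 65x, while in this run the ByteAddressBuffer.Load3 and Load4 uniform variants show essentially none.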
PerfTest
To select adapter, use: PerfTest.exe [ADAPTER_INDEX]
Adapters found:
0: NVIDIA GeForce GTX 760M
1: Intel(R) HD Graphics 4600
2: Microsoft Basic Render Driver
Using adapter 0
Running 30 warm-up frames and 30 benchmark frames:
.............................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Performance compared to Buffer.Load random
Buffer.Load uniform: 3.073ms 62.440x
Buffer.Load linear: 195.662ms 0.981x
Buffer.Load random: 197.022ms 0.974x
Buffer.Load uniform: 3.227ms 59.465x
Buffer.Load linear: 195.179ms 0.983x
Buffer.Load random: 196.785ms 0.975x
Buffer.Load uniform: 3.598ms 53.329x
Buffer.Load linear: 193.676ms 0.991x
Buffer.Load random: 191.866ms 1.000x
Buffer.Load uniform: 3.031ms 63.308x
Buffer.Load linear: 195.622ms 0.981x
Buffer.Load random: 197.009ms 0.974x
Buffer.Load uniform: 3.025ms 63.434x
Buffer.Load linear: 195.135ms 0.983x
Buffer.Load random: 196.860ms 0.975x
Buffer.Load uniform: 3.443ms 55.728x
Buffer.Load linear: 193.744ms 0.990x
Buffer.Load random: 191.929ms 1.000x
Buffer.Load uniform: 2.970ms 64.605x
Buffer.Load linear: 195.751ms 0.980x
Buffer.Load random: 197.141ms 0.973x
Buffer.Load uniform: 3.175ms 60.425x
Buffer.Load linear: 195.351ms 0.982x
Buffer.Load random: 196.911ms 0.974x
Buffer.Load uniform: 3.621ms 52.985x
Buffer.Load linear: 350.658ms 0.547x
Buffer.Load random: 350.633ms 0.547x
ByteAddressBuffer.Load uniform: 3.758ms 51.055x
ByteAddressBuffer.Load linear: 191.898ms 1.000x
ByteAddressBuffer.Load random: 216.928ms 0.884x
ByteAddressBuffer.Load2 uniform: 4.682ms 40.977x
ByteAddressBuffer.Load2 linear: 390.852ms 0.491x
ByteAddressBuffer.Load2 random: 442.053ms 0.434x
ByteAddressBuffer.Load3 uniform: 572.822ms 0.335x
ByteAddressBuffer.Load3 linear: 568.316ms 0.338x
ByteAddressBuffer.Load3 random: 570.361ms 0.336x
ByteAddressBuffer.Load4 uniform: 752.691ms 0.255x
ByteAddressBuffer.Load4 linear: 758.795ms 0.253x
ByteAddressBuffer.Load4 random: 763.638ms 0.251x
ByteAddressBuffer.Load2 unaligned uniform: 4.199ms 45.692x
ByteAddressBuffer.Load2 unaligned linear: 391.542ms 0.490x
ByteAddressBuffer.Load2 unaligned random: 442.574ms 0.434x
ByteAddressBuffer.Load4 unaligned uniform: 752.793ms 0.255x
ByteAddressBuffer.Load4 unaligned linear: 758.698ms 0.253x
ByteAddressBuffer.Load4 unaligned random: 763.679ms 0.251x
StructuredBuffer.Load uniform: 3.103ms 61.827x
StructuredBuffer.Load linear: 195.674ms 0.981x
StructuredBuffer.Load random: 196.991ms 0.974x
StructuredBuffer.Load uniform: 3.301ms 58.120x
StructuredBuffer.Load linear: 195.167ms 0.983x
StructuredBuffer.Load random: 196.749ms 0.975x
StructuredBuffer.Load uniform: 3.846ms 49.882x
StructuredBuffer.Load linear: 350.461ms 0.547x
StructuredBuffer.Load random: 350.494ms 0.547x
cbuffer{float4} load uniform: 4.478ms 42.844x
cbuffer{float4} load linear: 9217.404ms 0.021x
cbuffer{float4} load random: 3333.476ms 0.058x
Texture2D.Load uniform: 3.384ms 56.695x
Texture2D.Load linear: 202.197ms 0.949x
Texture2D.Load random: 204.327ms 0.939x
Texture2D.Load uniform: 3.731ms 51.424x
Texture2D.Load linear: 198.542ms 0.966x
Texture2D.Load random: 211.881ms 0.906x
Texture2D.Load uniform: 4.306ms 44.558x
Texture2D.Load linear: 196.088ms 0.978x
Texture2D.Load random: 195.847ms 0.980x
Texture2D.Load uniform: 3.419ms 56.118x
Texture2D.Load linear: 202.264ms 0.949x
Texture2D.Load random: 204.311ms 0.939x
Texture2D.Load uniform: 3.673ms 52.243x
Texture2D.Load linear: 198.553ms 0.966x
Texture2D.Load random: 211.917ms 0.905x
Texture2D.Load uniform: 4.115ms 46.626x
Texture2D.Load linear: 196.084ms 0.978x
Texture2D.Load random: 350.561ms 0.547x
Texture2D.Load uniform: 3.517ms 54.547x
Texture2D.Load linear: 202.339ms 0.948x
Texture2D.Load random: 204.392ms 0.939x
Texture2D.Load uniform: 3.705ms 51.783x
Texture2D.Load linear: 198.537ms 0.966x
Texture2D.Load random: 350.591ms 0.547x
Texture2D.Load uniform: 4.028ms 47.637x
Texture2D.Load linear: 350.589ms 0.547x
Texture2D.Load random: 350.519ms 0.547x
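The last column of the output is the baseline time divided by each test's time, where the baseline is the Buffer.Load random entry reported as 1.000x (191.866 ms). A small Python sketch that parses result lines, recomputes that column and snaps each ratio to the nearest 1/n rate makes the 1/2, 1/3 and 1/4 rates of the Load2, Load3 and Load4 rows easy to read off. The helper names are hypothetical, and recomputed values can differ from the reported ones in the last digit because the printed times are rounded.

```python
import re
from fractions import Fraction

# A few result lines copied verbatim from the PerfTest output above.
LOG = """\
Buffer.Load random: 191.866ms 1.000x
Buffer.Load random: 350.633ms 0.547x
ByteAddressBuffer.Load2 linear: 390.852ms 0.491x
ByteAddressBuffer.Load3 linear: 568.316ms 0.338x
ByteAddressBuffer.Load4 linear: 758.795ms 0.253x
"""

# "<name>: <time>ms <ratio>x"
LINE_RE = re.compile(r"^(?P<name>.+?): (?P<ms>[\d.]+)ms (?P<ratio>[\d.]+)x$")

BASELINE_MS = 191.866  # the entry reported as 1.000x


def parse(log: str):
    """Yield (name, milliseconds, reported_ratio) for each result line."""
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            yield m["name"], float(m["ms"]), float(m["ratio"])


def nearest_rate(ratio: float, max_den: int = 4) -> Fraction:
    """Snap a measured ratio to the closest 1/n with n <= max_den."""
    return min(
        (Fraction(1, n) for n in range(1, max_den + 1)),
        key=lambda f: abs(float(f) - ratio),
    )


for name, ms, reported in parse(LOG):
    ratio = BASELINE_MS / ms
    print(f"{name}: {ratio:.3f}x (reported {reported:.3f}x, ~{nearest_rate(ratio)} rate)")
```

Snapping only makes sense for throughput-bound rows; outliers such as the cbuffer{float4} linear and random loads (0.021x and 0.058x) are far from any simple 1/n rate.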