Update residual_forward to use packed input #299
Merged
Update residual_forward to use 128-bit packed input, with floatX.
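The following rough sketch shows what "128-bit packed input" means for the indexing. It emulates the kernel on the host in Python. Each "thread" loads one 16-byte pack from each input, adds the packs lane by lane, and stores one pack, so a thread covers several elements instead of one. The code assumes floatX is either a 4-byte (fp32) or a 2-byte (e.g. bf16) type. `PACK_BYTES`, `pack_size` and `residual_forward_packed` are hypothetical names for this sketch, not identifiers from the PR. The nested loops stand in for the CUDA grid, so the sketch illustrates indexing only, not GPU performance.

```python
import math

PACK_BYTES = 16  # one 128-bit load/store

def pack_size(elem_bytes):
    """Number of floatX elements that fit in one 128-bit pack (hypothetical helper)."""
    assert PACK_BYTES % elem_bytes == 0
    return PACK_BYTES // elem_bytes

def residual_forward_packed(inp1, inp2, elem_bytes, block_size):
    """Emulate out[i] = inp1[i] + inp2[i] with one 128-bit pack per thread.

    Assumes len(inp1) is a multiple of the pack size, as a vectorized
    kernel without a scalar tail loop would require.
    """
    n = len(inp1)
    size = pack_size(elem_bytes)
    assert n == len(inp2) and n % size == 0
    # Each thread covers `size` elements, so fewer threads are launched.
    grid_size = math.ceil(n / (block_size * size))
    out = [0.0] * n
    for block_idx in range(grid_size):          # stands in for blockIdx.x
        for thread_idx in range(block_size):    # stands in for threadIdx.x
            idx = (block_idx * block_size + thread_idx) * size
            if idx >= n:
                continue  # out-of-range thread in the last block does nothing
            # "Load" one pack from each input, add lane-wise, "store" one pack.
            a = inp1[idx:idx + size]
            b = inp2[idx:idx + size]
            out[idx:idx + size] = [x + y for x, y in zip(a, b)]
    return out, grid_size
```

With a 4-byte floatX each pack holds 4 values, and with a 2-byte floatX it holds 8. In both cases one 128-bit transaction per input replaces several narrower ones.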
Previous Kernel:
block_size 32 | time 0.1498 ms | bandwidth 503.99 GB/s
block_size 64 | time 0.0760 ms | bandwidth 993.32 GB/s
block_size 128 | time 0.0490 ms | bandwidth 1540.78 GB/s
block_size 256 | time 0.0487 ms | bandwidth 1548.88 GB/s
block_size 512 | time 0.0487 ms | bandwidth 1548.88 GB/s
block_size 1024 | time 0.0497 ms | bandwidth 1518.38 GB/s
total average iteration time: 39.030942 ms
New Kernel:
block_size 32 | time 0.0219 ms | bandwidth 3440.86 GB/s
block_size 64 | time 0.0214 ms | bandwidth 3522.09 GB/s
block_size 128 | time 0.0223 ms | bandwidth 3392.29 GB/s
block_size 256 | time 0.0225 ms | bandwidth 3357.22 GB/s
block_size 512 | time 0.0226 ms | bandwidth 3333.70 GB/s
block_size 1024 | time 0.0225 ms | bandwidth 3352.64 GB/s
total average iteration time: 38.639469 ms
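The short script below compares the two logs side by side. It divides the previous kernel time by the new kernel time at each block size and compares each kernel at its own fastest block size. It also computes the relative change in total average iteration time. The numbers are copied from the output above, and the helper names are only for this sketch.

```python
# Kernel times in ms, copied from the benchmark output above.
BLOCK_SIZES = [32, 64, 128, 256, 512, 1024]
PREV_MS = [0.1498, 0.0760, 0.0490, 0.0487, 0.0487, 0.0497]
NEW_MS = [0.0219, 0.0214, 0.0223, 0.0225, 0.0226, 0.0225]

# Total average iteration time in ms, before and after the change.
PREV_ITER_MS = 39.030942
NEW_ITER_MS = 38.639469

def speedups(prev, new):
    """Per-block-size speedup of the new kernel over the previous one."""
    return [p / n for p, n in zip(prev, new)]

def best_to_best(prev, new):
    """Speedup comparing each kernel at its own fastest block size."""
    return min(prev) / min(new)

def iteration_saving_pct(prev_ms, new_ms):
    """Relative reduction in total average iteration time, in percent."""
    return 100.0 * (prev_ms - new_ms) / prev_ms

if __name__ == "__main__":
    for bs, s in zip(BLOCK_SIZES, speedups(PREV_MS, NEW_MS)):
        print(f"block_size {bs:4d} | speedup {s:.2f}x")
    print(f"best vs best: {best_to_best(PREV_MS, NEW_MS):.2f}x")
    print(f"iteration time saved: {iteration_saving_pct(PREV_ITER_MS, NEW_ITER_MS):.2f}%")
```

From these numbers, the new kernel is about 2.2x faster at block sizes of 128 and above, and about 6.8x faster at block_size 32. Comparing each kernel at its fastest block size gives about 2.28x. The total average iteration time drops by about 1%. That smaller figure is plausibly because residual_forward is only a small part of a full iteration.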