Is gr-lfast now faster than gr-clenabled? #1
No, the Costas loop on GPUs actually performs pretty poorly because the
algorithm's calculations are sequential: each sample's phase correction
depends on the error from the previous sample. The only way I've found to
code it for OpenCL is as a single task-based kernel call, so it really only
executes like a standard CPU routine on 1 GPU core, not in parallel like one
would like. The performance is pretty low for that block: it drops to less
than 2 Msps even on an NVIDIA 1070 card, versus 34+ Msps for the gr-lfast
version on an i7-6700, and the OpenCL performance didn't change much when
varying the data size. So far the best performance I've gotten out of the
Costas loop is in gr-lfast with the optimized code.
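To illustrate why this block resists parallelization, here is a minimal sketch of a second-order BPSK Costas loop in plain Python. It is not the gr-lfast or gr-clenabled code; the function name `costas_loop_bpsk`, the default `loop_bw`, and the gain formulas (a commonly used second-order loop parameterization) are assumptions chosen for illustration. The point is that `phase` and `freq` computed for one sample are needed before the next sample can be processed.

```python
import cmath
import math


def costas_loop_bpsk(samples, loop_bw=0.05, damping=math.sqrt(2) / 2):
    """Illustrative second-order BPSK Costas loop (hypothetical helper)."""
    # Loop gains derived from bandwidth and damping (assumed parameterization).
    denom = 1.0 + 2.0 * damping * loop_bw + loop_bw * loop_bw
    alpha = (4.0 * damping * loop_bw) / denom
    beta = (4.0 * loop_bw * loop_bw) / denom

    phase = 0.0  # current phase estimate, radians
    freq = 0.0   # current frequency estimate, radians per sample
    out = []
    for x in samples:
        # De-rotate the sample using the phase estimate from the PREVIOUS iteration.
        y = x * cmath.exp(-1j * phase)
        # BPSK phase error detector: zero when the constellation lies on the real axis.
        error = y.real * y.imag
        # Update the loop state; the next sample cannot start until this is done.
        freq += beta * error
        phase += freq + alpha * error
        phase = math.remainder(phase, 2.0 * math.pi)  # wrap to [-pi, pi]
        out.append(y)
    return out, phase, freq
```

For example, feeding it BPSK symbols rotated by a constant 0.4 rad offset (`rx = [s * cmath.exp(1j * 0.4) for s in symbols]`) should let the returned `phase` settle close to 0.4. Because every iteration reads state written by the one before it, the loop body cannot simply be spread across GPU work-items the way an element-wise operation can.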
For gr-clenabled, there's a tool that installs called test-clenabled and
you can pass it a parameter for the data size and it'll take the timing
measurements for both the OpenCL version and CPU version so you can run
tests on your hardware with any sizes you'd like to test.
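The exact command-line syntax of test-clenabled is not shown in this thread, so the following is not that tool. It is only a rough sketch of the same idea: timing a processing routine at several block sizes and reporting throughput in Msps. The names `measure_msps` and `passthrough` are hypothetical, and the placeholder workload stands in for whatever block is being measured.

```python
import time


def measure_msps(process, block_size, repeats=5):
    """Return the best observed throughput of process(block) in Msps."""
    block = [complex(1.0, 0.0)] * block_size
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        process(block)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading from a coarse timer.
    return block_size / max(best, 1e-12) / 1e6


def passthrough(block):
    # Placeholder workload (assumption): scale every sample by 0.5.
    return [x * 0.5 for x in block]


if __name__ == "__main__":
    # Block sizes mentioned in this thread: 8192, 24576 and 2^20.
    for size in (8192, 24576, 2 ** 20):
        print(f"{size:>8} samples: {measure_msps(passthrough, size):.2f} Msps")
```

Taking the best of several runs reduces the influence of one-off scheduling delays; numbers from pure Python will of course be far below those of the compiled blocks.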
Also, when you get gr-clenabled running it'll create 2 separate GNU Radio
block groups. The OpenCL-Accelerated group contains the blocks that actually
run faster on GPUs, since their calculations can be done in parallel.
Those in the OpenCL-Enabled group function in OpenCL, but their performance
is generally worse than the native CPU blocks.
I'm also pushing some updates to it tonight to clean up some of the
processing, but there are no major performance updates in this pass.
On Thu, Apr 27, 2017 at 1:52 PM, kurtulmehtap wrote:
With the new improvements, is gr-lfast now faster than gr-clenabled for
the Costas loop for block sizes larger than 8192?
Can you add performance measurements for extremely large blocks (like 2^20)?
I see. Page 40 of your article "Study on Implementing OpenCL in Common GNURadio Blocks" showed 378 Msps performance for a block size of 24576. I assume that figure did not include the Costas loop, only the Quadrature Demod block for PSK.