cufft: use CUFFT_COMPATIBILITY_FFTW_PADDING instead of CUFFT_COMPATIBILITY_NATIVE #52

Closed
godsic opened this issue Nov 6, 2015 · 6 comments

Comments

@godsic
Contributor

godsic commented Nov 6, 2015

CUFFT_COMPATIBILITY_NATIVE mode has been broken since CUDA 7.0. Because CUFFT_COMPATIBILITY_NATIVE has been marked as deprecated since CUDA 6.0, the NVIDIA developers are not willing to fix it. Therefore mumax3 should switch to the default CUFFT_COMPATIBILITY_FFTW_PADDING mode. This is a rather big change.
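
To illustrate why the switch is invasive, here is a minimal Python sketch of two ways to address the real input rows of an in-place r2c buffer. It assumes that NATIVE mode stores rows packed back-to-back, while FFTW_PADDING pads each row of length n to 2*(n/2+1) reals, as FFTW does for in-place real transforms. The function names are hypothetical and belong to neither cuFFT nor mumax3.

```python
def packed_offset(row, i, n):
    """Offset of real element i of a row when rows are stored back-to-back (assumed NATIVE layout)."""
    return row * n + i


def fftw_padded_offset(row, i, n):
    """Offset of real element i of a row in an FFTW-style padded in-place r2c buffer.

    Each row is padded to 2*(n//2 + 1) reals so that the n//2 + 1 complex
    outputs of that row fit in the same storage after the transform.
    """
    padded_row = 2 * (n // 2 + 1)
    return row * padded_row + i


# For an even row length n = 64, padding adds 2 reals per row, so every row
# after the first starts at a different offset in the two layouts.
n = 64
print(packed_offset(1, 0, n))       # 64
print(fftw_padded_offset(1, 0, n))  # 66
```

Any code that indexes such a buffer with the packed formula reads the wrong elements once the padded layout is in effect, which is why every routine touching the FFT buffers would need review.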

Since NVIDIA now recommends out-of-place r2c and c2r transforms (see page 20 of the "CUFFT LIBRARY USER'S GUIDE DU-06707-001_v7.5"), I would prefer to take this route. This will marginally increase mumax3's memory consumption, but it is far less invasive than, say, touching the mighty demag kernel code.
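
As a rough way to size the extra memory, the Python sketch below counts float32 elements for the buffers of a single r2c transform in the two layouts. It assumes FFTW-style padding of the last axis in the in-place case; the function name and the 64×64 example are illustrative only and not taken from mumax3.

```python
def r2c_float_counts(shape):
    """float32 counts for the buffers of one real-to-complex FFT.

    shape is a row-major real array shape whose last axis (length n) varies
    fastest. Hermitian symmetry means only n//2 + 1 complex values per row are
    stored, each taking 2 floats (re, im).
    """
    *outer, n = shape
    rows = 1
    for d in outer:
        rows *= d
    complex_floats = rows * 2 * (n // 2 + 1)
    real_floats = rows * n
    return {
        # in place: the padded real input shares storage with the complex output
        "in_place_padded": complex_floats,
        # out of place: unpadded real input plus a separate complex output buffer
        "out_of_place": real_floats + complex_floats,
    }


counts = r2c_float_counts((1, 64, 64))
print(counts)  # {'in_place_padded': 4224, 'out_of_place': 8320}
```

Per transform, the out-of-place variant needs roughly twice the FFT scratch space. How much that adds to mumax3's total footprint depends on what else it keeps on the GPU, which this sketch does not model.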

@barnex any thoughts?

@godsic godsic self-assigned this Nov 6, 2015
@barnex barnex assigned barnex and godsic and unassigned godsic and barnex Nov 7, 2015
@barnex
Member

barnex commented Nov 7, 2015

Yes, I agree. I don't want to touch the kernel layout at any cost.
I'll give it a shot this weekend.

@barnex barnex removed the enhancement label Nov 7, 2015
@godsic
Contributor Author

godsic commented Nov 7, 2015

If you are busy, I am happy to work on this next week.


Mykola

@barnex
Member

barnex commented Nov 7, 2015

Sounds good. Let me know if you need any help.
e0c3a28 re-introduces the convolution self-test (against brute force, for a sparse random magnetization). It is enabled by the -paranoid flag and is on in all tests. This should help in testing the fix.
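
The idea behind such a self-test can be shown with a small, hypothetical Python sketch: convolve a sparse random input once through the Fourier domain and once by direct summation, and flag any disagreement. It is 1D and uses a naive DFT as a stand-in for cuFFT; mumax3's actual test works on the 3D demag convolution, and none of these names come from its code.

```python
import cmath
import random


def dft(x, sign=-1):
    """Naive O(n^2) DFT: sign=-1 is the forward transform, sign=+1 the unnormalized inverse."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * t * k / n) for t in range(n))
        for k in range(n)
    ]


def fft_convolve(m, kernel):
    """Circular convolution via the convolution theorem (stand-in for the cuFFT path)."""
    n = len(m)
    prod = [a * b for a, b in zip(dft(m), dft(kernel))]
    return [v.real / n for v in dft(prod, sign=+1)]


def brute_convolve(m, kernel):
    """Direct circular convolution: out[i] = sum_t m[t] * kernel[(i - t) mod n]."""
    n = len(m)
    return [sum(m[t] * kernel[(i - t) % n] for t in range(n)) for i in range(n)]


def self_test(n=32, fill=0.1, tol=1e-9, seed=1):
    """Compare both paths on a sparse random input; raise if they disagree."""
    rng = random.Random(seed)
    m = [rng.uniform(-1, 1) if rng.random() < fill else 0.0 for _ in range(n)]
    kernel = [rng.uniform(-1, 1) for _ in range(n)]
    fast, slow = fft_convolve(m, kernel), brute_convolve(m, kernel)
    err = max(abs(a - b) for a, b in zip(fast, slow))
    if err > tol:
        raise AssertionError(f"convolution mismatch: max error {err:g}")
    return err


print(f"max error: {self_test():.2e}")
```

A layout bug in the FFT path, such as reading a padded buffer with packed indices, would typically show up as a large mismatch in a check of this kind.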

@godsic
Contributor Author

godsic commented Nov 7, 2015

Thanks, this should help indeed.

@godsic
Copy link
Contributor Author

godsic commented Nov 11, 2015

de985bf
7313693

@godsic godsic closed this as completed Nov 11, 2015
@barnex
Member

barnex commented Nov 12, 2015

Thanks @godsic! All tests pass on my 680.
