Not a number in L2 error #1
Comments
Managing 15 students doesn't give me much time to code or debug anymore.
OK, I'll try to see if I can find it and report back soon.
The data are well initialized, no problem there. I tried a smaller configuration (1024 bodies), running on a K80 (sm_37). Analyzing the problem turns out to be quite hard.
Running a larger configuration (still with debug flags) gives mostly good results, i.e. FMM and direct agree, except at some locations where the FMM potential takes absurdly large values. I also don't know whether I can trust cuda-memcheck: when running the executable (built with -g -G -O0), cuda-memcheck sometimes reports an out-of-bounds error in the buildOctant kernel, and sometimes the execution never finishes! I'd like to determine whether there really is a problem in buildOctant (possibly related to the recursive call inside the kernel, i.e. CUDA dynamic parallelism). Would you recommend a configuration (number of particles, NCRIT, ...) better suited to analyzing this behavior?
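One way to narrow down the "mostly good, except at some locations" symptom is to rank bodies by how far the FMM value is from the direct one. The following is only a sketch: `worstBodies` and `BodyError` are hypothetical names, and it assumes both potential arrays have already been copied back to the host as `std::vector<float>` in the same body order.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper type: index of a body and its relative error.
struct BodyError {
    std::size_t index;
    float relErr;
};

// Returns the k bodies with the largest relative error, worst first.
// Non-finite values are treated as infinitely wrong so they sort to the top.
std::vector<BodyError> worstBodies(const std::vector<float>& fmm,
                                   const std::vector<float>& direct,
                                   std::size_t k) {
    assert(fmm.size() == direct.size());
    std::vector<BodyError> errs;
    errs.reserve(fmm.size());
    for (std::size_t i = 0; i < fmm.size(); ++i) {
        float e;
        if (!std::isfinite(fmm[i]) || !std::isfinite(direct[i])) {
            e = std::numeric_limits<float>::infinity();
        } else {
            // Guard against division by zero for bodies with a zero reference value.
            const float denom = std::max(std::fabs(direct[i]), 1e-30f);
            e = std::fabs(fmm[i] - direct[i]) / denom;
        }
        errs.push_back({i, e});
    }
    k = std::min(k, errs.size());
    std::partial_sort(errs.begin(), errs.begin() + k, errs.end(),
                      [](const BodyError& a, const BodyError& b) {
                          return a.relErr > b.relErr;
                      });
    errs.resize(k);
    return errs;
}
```

If the worst indices cluster in a few cells of the tree, that would point at tree construction rather than at the kernels that evaluate the expansions, but this is a heuristic and not a proof.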
The code was working at some point. Perhaps you could check out older revisions to see if they work?
Also, this is the original code.
I just made some changes to the code.
--- FMM Profiling ----------------
For large sizes, though, I run into CUDA launch timeouts, as I have an active display.
Hi,
I have tested several GPU architectures (sm_35, sm_50) with different CUDA versions (8.0 and 10.0), and I keep getting this in the output log:
--- FMM vs. direct ---------------
Rel. L2 Error (pot) : nan
Rel. L2 Error (acc) : nan
The bodyAcc arrays contain lots of NaNs.
cuda-memcheck does not complain.
I don't know where this problem could originate.
Can anyone help identify the problem, or confirm or refute this behavior?
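Since a single NaN is enough to turn the whole relative L2 error into nan, it may help to count the non-finite entries and compute the error over the remaining ones. The sketch below is not the project's own error routine: `relativeL2` and `L2Report` are hypothetical names, and it assumes the arrays have been copied to the host as flat `std::vector<float>`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical report type for a host-side sanity check.
struct L2Report {
    std::size_t nonFinite;  // number of NaN/Inf entries in the FMM result
    long firstBad;          // index of the first non-finite entry, -1 if none
    double relL2;           // relative L2 error over the finite entries only
};

L2Report relativeL2(const std::vector<float>& fmm,
                    const std::vector<float>& direct) {
    assert(fmm.size() == direct.size());
    L2Report r{0, -1, 0.0};
    double diff2 = 0.0;  // sum of squared differences
    double ref2 = 0.0;   // sum of squared reference values
    for (std::size_t i = 0; i < fmm.size(); ++i) {
        if (!std::isfinite(fmm[i])) {
            if (r.nonFinite == 0) r.firstBad = static_cast<long>(i);
            ++r.nonFinite;
            continue;  // skip, otherwise the sums become NaN
        }
        const double d = static_cast<double>(fmm[i]) - static_cast<double>(direct[i]);
        diff2 += d * d;
        ref2 += static_cast<double>(direct[i]) * static_cast<double>(direct[i]);
    }
    r.relL2 = ref2 > 0.0 ? std::sqrt(diff2 / ref2) : 0.0;
    return r;
}
```

A small `nonFinite` count with an otherwise small `relL2` would suggest a localized fault, whereas a large count would suggest something wrong earlier in the pipeline.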