[BUG] _Segmentation_Fault_from_npair_half_bin_newton.cpp_ #3835
Comments
This is not really a bug in LAMMPS, but an issue with your fix code: when requesting a custom neighbor list cutoff, you also have to make certain that you can actually reach atoms that are that far away. Thus you have to check whether your custom cutoff plus the neighbor list skin distance is smaller than or equal to the communication cutoff. By default, the communication cutoff is set to the neighbor list skin plus the largest pair style cutoff.

In your input, you have a neighbor list skin of 13 angstrom and are requesting a custom cutoff of 21 angstrom for your fix neighbor list, thus the communication cutoff must be at least 34 angstrom. This is not the case. You have a pair style cutoff of 4.5 angstrom, thus your communication cutoff is set to 17.5 angstrom. Adding the line comm_modify cutoff 34 eliminates the memory access issue and thus the segfaults.

That said, a neighbor list skin of 13 angstrom is extremely wasteful. It adds many unwanted pairs to the neighbor list and thus can result in a significant slowdown. For standard molecular systems, it is best to stick with the default neighbor list settings (2 angstrom skin, delay 0, every 1, check yes). Depending on the dynamics of the system, the fastest moving atoms, and the choice of timestep, you may improve performance by using an even smaller skin (1.5 or 1.0) and delay 1 or delay 2 (beyond delay 2, the performance gains from increasing the delay quickly diminish relative to the risk of missing neighbors). So with the default 2 angstrom skin, adding comm_modify cutoff 23 would be sufficient.

You can look at fix srp for some prototype code to check whether the communication cutoff is sufficiently large. Look at the setup_pre_force() function around line 248.

Please also note that LAMMPS prints an important warning about the large neighbor list skin for your system that must not be ignored. |
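The arithmetic in the comment above can be written out as a small check. This is only a sketch with plain doubles: the function names are hypothetical and none of this is LAMMPS API. It assumes, as the comment states, that the default communication cutoff is the skin plus the largest pair style cutoff, and that a custom list needs its cutoff plus the skin.

```cpp
#include <cassert>

// Hypothetical helper names; plain doubles stand in for LAMMPS internals.

// Default communication cutoff as described in the comment:
// neighbor list skin plus the largest pair style cutoff.
double default_comm_cutoff(double max_pair_cutoff, double skin) {
    assert(max_pair_cutoff >= 0.0 && skin >= 0.0);
    return max_pair_cutoff + skin;
}

// Ghost atom range that a custom neighbor list needs:
// the custom cutoff plus the neighbor list skin.
double required_comm_cutoff(double custom_cutoff, double skin) {
    assert(custom_cutoff >= 0.0 && skin >= 0.0);
    return custom_cutoff + skin;
}

// True if the default communication cutoff already covers the custom list.
bool custom_cutoff_reachable(double custom_cutoff, double max_pair_cutoff,
                             double skin) {
    return required_comm_cutoff(custom_cutoff, skin) <=
           default_comm_cutoff(max_pair_cutoff, skin);
}
```

With the numbers from this issue (skin 13, pair cutoff 4.5, custom cutoff 21), the default comes out as 17.5 and the requirement as 34, so the check fails; with a skin of 2 the requirement drops to 23.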
Dear Axel,
Thank you very much for looking into this issue. First of all, I totally
agree that a skin distance of 13 is ridiculous, and during actual simulations
I use either 1 or 2. I also do not extend the pair cutoff by 21 on top of
that (usually more like by ~10). I did get a similar error when using more
reasonable numbers. This is pretty much a synthetic example to highlight
the issue that I am facing. In my module I dynamically find the most
appropriate cutoff distance for the pair list used within this fix
depending on the input parameters, which sometimes can be larger than the
communication cutoff set within the input file as "the neighbor list skin
plus the largest pair style cutoff". However, in my understanding, when
requesting a pairlist with larger cutoff within the code as I did, that
should result in extension of communication cutoff either automatically or
by means of additional request within the code. I looked for the latter
within existing code, but could not find a corresponding example. I would
really appreciate it if you could direct me to an appropriate solution.
Hope this makes sense.
Sincerely,
…--
Aram Davtyan
On Thu, Jun 29, 2023 at 4:06 PM Axel Kohlmeyer ***@***.***> wrote:
This is not really a bug in LAMMPS, but an issue with your fix code: when
requesting a custom neighbor list cutoff, you also have to make certain
that you can actually *reach* atoms that are that far away and thus you
have to check whether your custom cutoff plus neighbor list skin distance
is smaller or equal to the communication cutoff. This is by default set to
the neighbor list skin plus the largest pair style cutoff.
In your input, you have a neighbor list skin of 13 angstrom plus are
requesting a custom cutoff for your fix neighbor list of 21 angstrom, thus
the communication cutoff must be at least 34 angstrom. This is not the
case. You have a pair style cutoff of 4.5 angstrom, thus your communication
cutoff is set to 17.5. Adding the line comm_modify cutoff 34 eliminates
the memory access issue and thus the segfaults.
That said, a neighbor list skin of 13 angstrom is *extremely* wasteful.
It adds many unwanted pairs to the neighbor list and thus can result in a
significant slowdown. For standard molecular systems, it should be best to
stick with the default neighbor list settings (2 angstrom skin, delay 0,
every 1, check yes). Depending on the dynamics of the system and the
fastest moving atoms as well as the choice of timestep, you may improve
performance by using an even smaller skin (1.5 or 1.0) and use delay 1 or
delay 2 (after delay 2 the performance gains from increasing the delay are
quickly diminishing versus the risk of missing neighbors).
So for this case adding comm_modify cutoff 23 would be sufficient.
You can look at fix srp for some prototype code to check whether the
communication cutoff is sufficiently large. Look at the setup_pre_force()
function around line 248.
Please also note that LAMMPS prints an important warning with the large
neighbor list skin for your system that must not be ignored.
|
Plus, shouldn't this raise an error or an exception rather than result in a segfault (or prompt me to do something else), given that all I do is change the largest pair list cutoff using the standard functions provided for that? |
No. There is no precedent for this and you are mistaken. The only case where the communication cutoff is automatically extended is when a pair style cutoff requires it. In all other cases, it is the job of the individual code to ensure that the communication cutoff requirements are fulfilled. An example of code that supports requesting a neighbor list with a custom cutoff is compute rdf. It checks whether a custom communication cutoff is set, or otherwise uses the default communication cutoff, and then requires that the custom cutoff for the g(r) computation is sufficiently small; if not, it stops with an error suggesting to change the communication cutoff as needed. You can extend the communication cutoff only outside of a run. Similarly, you cannot change a neighbor list request during a run. If you would rather extend the communication cutoff on demand, that also needs to be done by your code during the init() processing. I have not seen this done anywhere, so I cannot guarantee that it will work. |
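The kind of check described for compute rdf could look roughly like the sketch below. It is self-contained and deliberately does not use the LAMMPS classes: the CutoffSettings struct, the function names, and the error text are all hypothetical, and taking the larger of the pair-derived and user-set cutoffs is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the values a fix or compute would read in init().
struct CutoffSettings {
    double skin;              // neighbor list skin
    double max_pair_cutoff;   // largest pair style cutoff, 0 if no pair style
    double user_comm_cutoff;  // value set via comm_modify cutoff, 0 if unset
};

// Assumption: the effective ghost atom range is the larger of the default
// (pair cutoff plus skin) and the user-set communication cutoff.
double effective_comm_cutoff(const CutoffSettings &s) {
    assert(s.skin >= 0.0);
    const double from_pair =
        (s.max_pair_cutoff > 0.0) ? s.max_pair_cutoff + s.skin : 0.0;
    return std::max(from_pair, s.user_comm_cutoff);
}

// Stop with an error instead of requesting a list that cannot be filled.
void check_custom_cutoff(const CutoffSettings &s, double custom_cutoff) {
    const double needed = custom_cutoff + s.skin;
    if (needed > effective_comm_cutoff(s)) {
        throw std::runtime_error(
            "Custom neighbor list cutoff exceeds ghost atom range; "
            "use comm_modify cutoff " + std::to_string(needed) + " or larger");
    }
}
```

Calling check_custom_cutoff() from a module's init() stage with the settings of this issue would throw until the user-set cutoff is raised to at least 34, which turns the segfault into a readable error.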
No, no, just to clarify: I have no intention of changing the cutoff during the run, only at the very beginning when the simulation input parameters are given. Also, to make sure I understood correctly: is there no way to change the communication cutoff from within the code, only from the input (*.in) file? |
No. The general philosophy in LAMMPS is that programmers are not supposed to make unreasonable choices when writing their code. Requesting a neighbor list with a cutoff larger than the communication cutoff is an unreasonable choice. It is near impossible and would be a huge effort to protect all function arguments from bad choices. There are plenty of cases where what is a reasonable or unreasonable choice depends on the context and sometimes the base code has no knowledge of that. |
You are misunderstanding again! When you look through the |
Thanks for all the clarifications. I will try to find an example where |
Sorry, can I double check something to make sure that I understand, in compute_rdf.cpp and compute_adf.cpp, the pair list cutoffs are set to Or in other words, when setting pair list cutoff in my fix, should I set it to the required cutoff (user_cutoff) or user_cutoff+skin? |
No. But the |
The difference is between perpetual and occasional neighbor lists. |
Thank you very much! That explains a lot! |
Summary
When using a custom (fix) module that requests a half pair list and sets a cutoff for the list using the set_cutoff() function, a segmentation fault occurs either every time or occasionally, depending on the system. The segmentation fault is thrown on line 128 (delx = xtmp - x[j][0];) due to an ill-defined value of j that originates from the assignment j = binhead[ibin+stencil[k]] in the for statement for (j = binhead[ibin+stencil[k]]; j >= 0; j = bins[j]) on line 124. This is most likely due to memory corruption, as far as I can tell.
I have reverted all other changes and stripped my module of any code except for the pair list request and the call to set_cutoff() to make sure that the memory corruption does not occur within the module.
It is also important to note that the latest stable version does not seem to have the issue.
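For readers unfamiliar with the loop quoted above, the following self-contained sketch shows the linked-list bin traversal pattern and one plausible way it can read an invalid j: a stencil offset that lands outside the binhead array. It is an illustration only, not the LAMMPS implementation and not a diagnosis of this crash. The function name and the bounds check are hypothetical additions; only the names binhead, bins, stencil, and ibin come from the quoted line.

```cpp
#include <cassert>
#include <vector>

// Count the atoms reachable from bin `ibin` through all stencil offsets.
// binhead[b] is the first atom in bin b (-1 if empty); bins[j] is the next
// atom in the same bin after atom j (-1 at the end of the list).
// Returns -1 if a stencil offset points outside binhead, the situation in
// which an unchecked loop would read a garbage value for j.
int count_stencil_atoms(const std::vector<int> &binhead,
                        const std::vector<int> &bins,
                        const std::vector<int> &stencil, int ibin) {
    const int nbins = static_cast<int>(binhead.size());
    const int natoms = static_cast<int>(bins.size());
    int count = 0;
    for (int offset : stencil) {
        const int b = ibin + offset;
        if (b < 0 || b >= nbins) return -1;  // out-of-range bin index
        for (int j = binhead[b]; j >= 0; j = bins[j]) {
            assert(j < natoms);  // a valid list only holds atom indices
            ++count;
        }
    }
    return count;
}
```

The unchecked form of this loop has undefined behavior as soon as ibin + stencil[k] leaves the array, which is consistent with a crash that appears always on one platform and only occasionally on another.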
LAMMPS Version and Platform
LAMMPS version: LAMMPS (28 Mar 2023 - Update 1) (the release version that was cloned from github at the beginning of June)
The error always occurs on a Linux system (#40~20.04.1-Ubuntu SMP Mon Mar 7 09:18:32 UTC 2022) with an 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz.
It also occasionally manifests on macOS Ventura with an M2 processor (approximately once in 20 runs).
Steps to Reproduce
To provide more information and reproduce the error, I prepared a directory with the most basic simulation setup (subfolder simulation_setup), additional code that requests the pair list and sets the cutoff (using one of the values that gives the error on my side, though it may occur at smaller values as well) placed in the subdirectory src, and log files from LAMMPS itself and a Valgrind debug run (in the error_logs subdirectory; please also note that line numbers after 128 may be shifted due to a print statement that I added on my side to investigate the issue). The latter clearly reports the issues in npair_half_bin_newton.cpp that originate on line 124.
Further Information, Files, and Links
The archive of the directory described above is attached.
bug_report_npair_half_bin_newton.zip