[BUG] _Segmentation_Fault_from_npair_half_bin_newton.cpp_ #3835

aramdbs · 2023-06-29T10:54:55Z

Summary

When using a custom (fix) module that requests half pair list and sets a cutoff for the list using set_cutoff() function a segmentation fault occurs either every time or occasionally depending on the system. The segmentation fault is thrown on line 128 (delx = xtmp - x[j][0];), due to ill defined value of j that originates from assignment j = binhead[ibin+stencil[k]] in for command for (j = binhead[ibin+stencil[k]]; j >= 0; j = bins[j]) on line 124. This is most likely due to memory corruption as far as I can tell.

I have reversed any changes and stripped my module of any code except for pair list request and call of set_cutoff() to make sure that the memory corruption does not occure from within the module.

It is also important to note that the latest stable version does not seem to have the issue.

LAMMPS Version and Platform

LAMMPS version: LAMMPS (28 Mar 2023 - Update 1) (the release version that was cloned from github at the beginning of June)
The error always occurs on linux system (#40~20.04.1-Ubuntu SMP Mon Mar 7 09:18:32 UTC 2022) with 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
It also occasionally manifests on MacOS Ventura with M2 processor (approximately once in 20 times)

Steps to Reproduce

To provide more information and reproduce the error I prepared a directory with most basic simulation setup (subfolder simulation_setup), additional code that requests the pair list and sets cutoff (using one of the values the gives error on my side, though it may occurs at smaller values as well) placed in subdirectory src, and log files from LAMMPS itself and Valgrind debug run (in error_logs subdirectory, also please note that line numbers after 128 may be shifted due to print that I added on my side to investigate the issue). The latter clearly reports the issues in npair_half_bin_newton.cpp that originate on line 124.

Further Information, Files, and Links

The archive of the directory described above is attached.
bug_report_npair_half_bin_newton.zip

The text was updated successfully, but these errors were encountered:

akohlmey · 2023-06-29T12:06:26Z

This is not really a bug in LAMMPS, but an issue with your fix code: when requesting a custom neighbor list cutoff, you also have to make certain that you can actually reach atoms that are that far away and thus you have to check whether your custom cutoff plus neighbor list skin distance is smaller or equal to the communication cutoff. This is by default set to the neighbor list skin plus the largest pair style cutoff.

In your input, you have a neighbor list skin of 13 angstrom plus are requesting a custom cutoff for your fix neighbor list of 21 angstrom, thus the communication cutoff must be at least 34 angstrom. This is not the case. You have a pair style cutoff of 4.5 angstrom, thus your communication cutoff is set to 17.5. Adding the line comm_modify cutoff 34 eliminates the memory access issue and thus the segfaults.

That said, a neighbor list skin of 13 angstrom is extremely wasteful. It adds many unwanted pairs to the neighbor list and thus can result in a significant slowdown. For standard molecular systems, it should be best to stick with the default neighbor list settings (2 angstrom skin, delay 0, every 1, check yes). Depending on the dynamics of the system and the fastest moving atoms as well as the choice of timestep, you may improve performance by using an even smaller skin (1.5 or 1.0) and use delay 1 or delay 2 (after delay 2 the performance gains from increasing the delay are quickly diminishing versus the risk of missing neighbors).
So for this case adding comm_modify cutoff 23 would be sufficient.

You can look at fix srp for some prototype code to check whether the communication cutoff is sufficiently large. Look at the setup_pre_force() function around line 248.

Please also not that LAMMPS prints an important warning with the large neighbor list skin for your system that must not be ignored.

aramdbs · 2023-06-29T12:25:06Z

Dear Alex, Thank you very much for looking into this issue. First of all, I totally agree that skin distance of 13 is ridiculous and during actual simulations I use either 1 or 2. I also do not extend the pair cutoff by 21 on top of that (usually more like by ~10). I did get a similar error when using more reasonable numbers. This is pretty much a synthetic example to highlight the issue that I am facing. In my module I dynamically find the most appropriate cutoff distance for the pair list used within this fix depending on the input parameters, which sometimes can be larger than the communication cutoff set within the input file as "the neighbor list skin plus the largest pair style cutoff". However, in my understanding, when requesting a pairlist with larger cutoff within the code as I did, that should result in extension of communication cutoff either automatically or by means of additional request within the code. I looked for the latter within existing code, but could not find a corresponding example. I would really appreciate it if you could direct me to an appropriate solution. Hope this makes sense. Sincerely,

…

-- Aram Davtyan

On Thu, Jun 29, 2023 at 4:06 PM Axel Kohlmeyer ***@***.***> wrote: This is not really a bug in LAMMPS, but an issue with your fix code: when requesting a custom neighbor list cutoff, you also have to make certain that you can actually *reach* atoms that are that far away and thus you have to check whether your custom cutoff plus neighbor list skin distance is smaller or equal to the communication cutoff. This is by default set to the neighbor list skin plus the largest pair style cutoff. In your input, you have a neighbor list skin of 13 angstrom plus are requesting a custom cutoff for your fix neighbor list of 21 angstrom, thus the communication cutoff must be at least 34 angstrom. This is not the case. You have a pair style cutoff of 4.5 angstrom, thus your communication cutoff is set to 17.5. Adding the line comm_modify cutoff 34 eliminates the memory access issue and thus the segfaults. That said, a neighbor list skin of 13 angstrom is *extremely* wasteful. It adds many unwanted pairs to the neighbor list and thus can result in a significant slowdown. For standard molecular systems, it should be best to stick with the default neighbor list settings (2 angstrom skin, delay 0, every 1, check yes). Depending on the dynamics of the system and the fastest moving atoms as well as the choice of timestep, you may improve performance by using an even smaller skin (1.5 or 1.0) and use delay 1 or delay 2 (after delay 2 the performance gains from increasing the delay are quickly diminishing versus the risk of missing neighbors). So for this case adding comm_modify cutoff 23 would be sufficient. You can look at fix srp for some prototype code to check whether the communication cutoff is sufficiently large. Look at the setup_pre_force() function around line 248. Please also not that LAMMPS prints an important warning with the large neighbor list skin for your system that must not be ignored. — Reply to this email directly, view it on GitHub <#3835 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/A5F3DY56PPDRHVANZ37E6L3XNVVU5ANCNFSM6AAAAAAZYK5P7Q> . You are receiving this because you authored the thread.Message ID: ***@***.***>

aramdbs · 2023-06-29T12:35:56Z

Plus, wouldn't this raise an error or an exception rather than resulting in seg fault (or promote me to do something else), given that all I do is to change the largest pair list cutoff using provided standard functions intended for that?

akohlmey · 2023-06-29T12:37:17Z

However, in my understanding, when requesting a pairlist with larger cutoff within the code as I did, that should result in extension of communication cutoff either automatically or by means of additional request within the code.

No. There is no precedence for this and you are mistaken. The only case where the communication cutoff is automatically extended is when a pair style cutoff requires it. In all other cases, it is the job of the individual code to ensure the communication cutoff requirements are fulfilled. An example for a case that supports requesting a neighbor list with a custom cutoff is compute rdf. That checks for whether a custom communication cutoff is set or uses the default communication cutoff and then requires that the custom cutoff for the g(r) computation is sufficiently small or stops with an error suggesting to change the communication cutoff as needed.

You can extend the communication cutoff only outside of a run. Similarly, you cannot change a neighbor list request during a run. If you want to rather extend the communication cutoff on demand, it also needs to be done by your code during the init() processing. I have not seen this done anywhere so I cannot guarantee that this will work.

aramdbs · 2023-06-29T12:41:09Z

no no, just to clarify I have no intension to change the cutoff during the run, just at the very beginning when simulation input parameters are given.

Also, just to clarify that I understood correctly, there is no way to change the communication cutoff from within the code, only from the input (*.in) file?

akohlmey · 2023-06-29T12:41:22Z

Plus, wouldn't this raise an error or an exception rather than resulting in seg fault (or promote me to do something else), given that all I do is to change the largest pair list cutoff using provided standard functions intended for that?

No. The general philosophy in LAMMPS is that programmers are not supposed to make unreasonable choices when writing their code. Requesting a neighbor list with a cutoff larger than the communication cutoff is an unreasonable choice. It is near impossible and would be a huge effort to protect all function arguments from bad choices. There are plenty of cases where what is a reasonable or unreasonable choice depends on the context and sometimes the base code has no knowledge of that.

akohlmey · 2023-06-29T12:46:25Z

Also, just to clarify that I understood correctly, there is no way to change the communication cutoff from within the code, only from the input (*.in) file?

You are misunderstanding again!

When you look through the Input class, you can see the the command comm_modify calls the function Comm::modify_params(). In that function, the class member Comm::cutghostuser is set to the requested value.
There are several cases, where this value is changed, e.g. some of the GPU package pair styles for TIP4P water.

aramdbs · 2023-06-29T12:52:31Z

Thanks for all the clarifications. I will try to find an example where Comm::cutghostuser is changed to see if that may work in my case.

aramdbs · 2023-06-30T00:11:26Z

Sorry, can I double check something to make sure that I understand, in compute_rdf.cpp and compute_adf.cpp, the pair list cutoffs are set to mycutneigh = cutoff_user + skin and then, the following check is performed:
mycutneigh<MAX(force->pair->cutforce+skin, comm->cutghostuser). However, according to how you came up with communication cutoff of 34 above, shouldn't it really be mycutneigh+skin<MAX(force->pair->cutforce+skin,comm->cutghostuser)? Or is it somehow different for compute and fix styles?

Or in other words, when setting pair list cutoff in my fix, should I set it to the required cutoff (user_cutoff) or user_cutoff+skin?

akohlmey · 2023-06-30T00:17:44Z

shouldn't it really be mycutneigh+skin<MAX(force->pair->cutforce+skin,comm->cutghostuser)?

No. But the request->set_cutoff() call may be using the wrong argument. I need to check that.

akohlmey · 2023-06-30T00:36:54Z

Or is it somehow different for compute and fix styles?

The difference is between perpetual and occasional neighbor lists.
For perpetual neighbor lists, they are automatically rebuilt regularly during the run due to the settings of the neighbor and neigh_modify command. For those the neighbor list skin is added automatically. In compute rdf you have an occasional neighbor list request where the neighbor list is built explicitly with a call to Neighbor::build_one(). At that point, no neighbor list skin distance would be needed unless the programmer want to use an additional buffer in the neighbor list to make certain there are no rounding or other similar issues to would exclude a single pair. For a force computation, these rarely make a difference, since the forces at the cutoff are small and mostly cancel for homogeneous systems. However, compute rdf is more sensitive, specifically the coordination number to missing single pairs of atoms. Thus adding a buffer for safety is a good idea and the neighbor list skin is a good value for that, since it is usually adjusted to the active unit settings. Using a fixed number could lead to a very inefficient choice.

aramdbs · 2023-06-30T01:04:55Z

Thank you very much! That explains a lot!

aramdbs added the bug label Jun 29, 2023

akohlmey added this to the Stable Release Summer 2023 milestone Jun 29, 2023

akohlmey added the invalid label Jun 29, 2023

akohlmey self-assigned this Jun 29, 2023

akohlmey mentioned this issue Jun 30, 2023

Collected small changes and fixes #3830

Merged

10 tasks

akohlmey closed this as completed in #3830 Jul 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] _Segmentation_Fault_from_npair_half_bin_newton.cpp_ #3835

[BUG] _Segmentation_Fault_from_npair_half_bin_newton.cpp_ #3835

aramdbs commented Jun 29, 2023 •

edited

akohlmey commented Jun 29, 2023

aramdbs commented Jun 29, 2023 via email

aramdbs commented Jun 29, 2023

akohlmey commented Jun 29, 2023

aramdbs commented Jun 29, 2023

akohlmey commented Jun 29, 2023

akohlmey commented Jun 29, 2023

aramdbs commented Jun 29, 2023

aramdbs commented Jun 30, 2023 •

edited

akohlmey commented Jun 30, 2023

akohlmey commented Jun 30, 2023

aramdbs commented Jun 30, 2023

[BUG] _Segmentation_Fault_from_npair_half_bin_newton.cpp_ #3835

[BUG] _Segmentation_Fault_from_npair_half_bin_newton.cpp_ #3835

Comments

aramdbs commented Jun 29, 2023 • edited

akohlmey commented Jun 29, 2023

aramdbs commented Jun 29, 2023 via email

aramdbs commented Jun 29, 2023

akohlmey commented Jun 29, 2023

aramdbs commented Jun 29, 2023

akohlmey commented Jun 29, 2023

akohlmey commented Jun 29, 2023

aramdbs commented Jun 29, 2023

aramdbs commented Jun 30, 2023 • edited

akohlmey commented Jun 30, 2023

akohlmey commented Jun 30, 2023

aramdbs commented Jun 30, 2023

aramdbs commented Jun 29, 2023 •

edited

aramdbs commented Jun 30, 2023 •

edited