Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Order bonds, angles, and dihedrals consistently for lammpsdata files #1070

Closed
jaclark5 opened this issue Dec 10, 2022 · 9 comments · Fixed by #1071
Closed

Order bonds, angles, and dihedrals consistently for lammpsdata files #1070

jaclark5 opened this issue Dec 10, 2022 · 9 comments · Fixed by #1071

Comments

@jaclark5
Copy link
Contributor

Describe the behavior you would like added to mBuild
The way that mBuild currently functions appears to be that bond, angle, and dihedral types are ordered according to the order that the molecules are packed (although atom types are consistent). This becomes an annoyance when comparing the properties of independent boxes and the identifiers are inconsistent. This is also an issue when using TIP4P water and the bond and angle numbers change.

Describe the solution you'd like
I'd like the bond and angles to be written out in the same order given the same atom type ordering.

@chrisiacovella
Copy link
Contributor

Can you give some more specific examples. E.g., actual output from the different lammps files being? We were all just talking about this and it wasn't completely clear to us what "order" you were referring to (because there are so many ways in which "order" comes up in putting things into the lammps data format that casts atom types as integers).

@jaclark5
Copy link
Contributor Author

So in lammps files you have a list of bond and angle types, and then all the bonds and angles are assigned to those types. I made this change because I generated three independent boxes with the same chemical system using tip4p water and some other stuff. After generating the boxes in the same exact way (I wrote a function), the bond type ids and angle type ids were different for each box, thus I went to each file to find the TIP4P bond and angle ids to make a slightly different submission script for each box with the following lines.


Box 1:
variable Wbond equal 11
variable Wangle equal 11
pair_style      lj/cut/tip4p/long ${OW} ${HW} ${Wbond} ${Wangle} 0.125 12.0

Box 2:
variable Wbond equal 16
variable Wangle equal 8
pair_style      lj/cut/tip4p/long ${OW} ${HW} ${Wbond} ${Wangle} 0.125 12.0

Box 3:
variable Wbond equal 4
variable Wangle equal 21
pair_style      lj/cut/tip4p/long ${OW} ${HW} ${Wbond} ${Wangle} 0.125 12.0

As you can imagine this was a point of detail that I continuously had to check when recycling lammps submission scripts in subsequent steps and admittedly neglected to do on occasion which was a source of frustration.

I understand the the type ids will change if the system changes, but it would be really nice if when running the same chemical system, the ids were consistently representing the same bonds and angles. I would imagine that this would also be useful in post-processing with Ovito if one were to compare bond length distributions for the same chemical system in different simulations. Otherwise I go back to the *.lmp file that mbuild outputs and look for the bond type with the atoms I'm interested in for each box.

@chrisiacovella
Copy link
Contributor

Thanks for clarifying...it will definitely make it easier to review the PR. We've had a lot of recent discussions about how to handle conversions to formats that are non-unique (e.g., integers like in lammps, or 4 character representations for other formats), and the pain of the non-uniqueness when it comes to analysis and visualization, so this is timely.

@bc118
Copy link
Contributor

bc118 commented Dec 15, 2022

OK. I think what needs to be done to resolve this issue is essentially the below, outlined in a simple example:

We need to do this for bonded types/classes as an example, with the numbers being the atom_types/classes:

orig_all bonds = [[3,2], [2,3], [5,4] [1,2], [2,1], [3,2], [2,3], [5,4]] (no guarantee this is ever the same)

sort inner list order and eliminate backward duplicates

new = [[2,3], [4,5], [1,2]]

sort overall order:

Final_order = [[1,2], [2,3], [4,5]]

This type of sorting can be done for all the bonded parameters (bonds, angles, dihedrals, and impropers) and non-bonded parameters. This will ensure the same system (chemical components and FFs) will have the same atom_type numbers, bond types, angle_type, dihedral_type, and improper_type numbers; thereby, allowing only 1 LAMMPS control file to be written. This should work regardless of the concentration or system size as long as chemical components and FFs are the same.

@jaclark5
Copy link
Contributor Author

jaclark5 commented Dec 15, 2022

That will work for bond types, but you can't change the atom order in the angle or dihedral types in your first step like that. The referenced PR offers a method of sorting those.

@bc118
Copy link
Contributor

bc118 commented Dec 15, 2022

@jaclark5 yeah, you are correct. it is custom to each bonded/non-bonded parameter, in its individual format. I just wanted to show the simplest case easily.

@jaclark5
Copy link
Contributor Author

Ok yes, I think we are on the same page then.

@CalCraven
Copy link
Contributor

Thank you all for your discussion. This is really helpful when writing a fix. I'm in the process of writing some tests cases for this PR that will reflect the sorting and ordering scheme described by @bc118. Since it does seem like a nice and consistent way for this output if used in other softwares, I think this should be the default method of writing. There are two concerns with this which are mentioned below.

1. This will change a lot of the expected tests in test_lammpsdata.py. This shouldn't be a huge problem to change manually, but still worth mentioning.
2. This change may reduce the speed of writing for large (10,000 atom +) systems. I'm going to test this a bit, but it may necessitate some sort of ordering function, where the user may need to specify if they want to a faster, or more ordered method of writing.

Lastly, I want to point all users to to this PR #701 which looks to use GMSO as the basis for writing out a lammps data file. These same issues will be addressed for this writer, which is currently in a testing phase, and may provide both better speed and better flexibility for lammps parameterization styles and unit styles. Since it may be helpful, I'm linking a brief blurb of code for any users interested in writing out that way.

from gmso.external import from_mbuild
gmso_topology = from_mbuild(MY_MBUILD_CPD) #conversion to gmso

from gmso.core import ForceField
forcefield = ForceField(MY_FOYER_OR_GMSO_FORCEFIELD_FILE_PATH.xml) # load forcefield into memory

from gmso.parameterization import apply
gmso_topology_param = apply(gmso_topology, forcefield, identify_connections=True) #apply the forcefield to the topology

gmso_topology_param.save("my_topology.lammps", unit_style='real') #write output into real units

@jaclark5
Copy link
Contributor Author

I already corrected the tests and this won't have a scaling issue since I'm reordering the types, the code that assigns types to specific bonds/angles/dihedrals hasn't been touched.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants