Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to keep parity on the residue & chain name assignment behaviour with v0.11 breaking change? #1764

Open
hjuinj opened this issue Nov 8, 2023 · 4 comments

Comments

@hjuinj
Copy link
Collaborator

hjuinj commented Nov 8, 2023

With release 0.11.0 the Topology class was changed.
https://docs.openforcefield.org/projects/toolkit/en/stable/releasehistory.html#atom-metadata-and-hierarchy-schemes-for-iterating-over-residues-chains-etc

In particular if I called off_mol.to_topology().to_openmm() prior to this version,
the toolkit will automatically try to find a residue name in the mol_to_residues dictionary and if not found, assign off_mol.name to be the residue name. Similar automation is done for chain assignment.

Beyond 0.11.0, the residue name is set to 'UNK' and chain id to 'X'

for atom in molecule.atoms:
# If the residue name is undefined, assume a default of "UNK"
if "residue_name" in atom.metadata:
atom_residue_name = atom.metadata["residue_name"]
else:
atom_residue_name = "UNK"
# If the residue number is undefined, assume a default of "0"
if "residue_number" in atom.metadata:
atom_residue_number = atom.metadata["residue_number"]
else:
atom_residue_number = "0"
# If the insertion code is undefined, assume a default of " "
if "insertion_code" in atom.metadata:
atom_insertion_code = atom.metadata["insertion_code"]
else:
atom_insertion_code = " "
# If the chain ID is undefined, assume a default of "X"
if "chain_id" in atom.metadata:
atom_chain_id = atom.metadata["chain_id"]
else:
atom_chain_id = "X"

What is the best way to keep parity with this behaviour beyond version 0.11? Do I have to iterate through each atom in the off_mol and update the atom metadata field? Or is there a more elegant way? Thank you

@mattwthompson
Copy link
Member

If I understand correctly, you're trying to ensure the openmm.app.Topology has residues named according to the various Molecule.names? Mapping molecule names (which don't have standardized meanings) to residue names (which certainly do) is something of a transformation that arguably shouldn't have been done previously. I'm not sure if logic like below could be considered

                if "residue_name" in atom.metadata:
                    atom_residue_name = atom.metadata["residue_name"]
                else:
-                    atom_residue_name = "UNK"
+                    atom_residue_name = "UNK" if atom.molecule.name is not None else atom.moleucle.name

Do I have to iterate through each atom ...

This is the only way I know how to do it (which isn't to say there isn't another way!) since residue information is stored on the atoms themselves. I previously found this to be less-than-ideal, but I forget if we discussed a different way of updating residue information.

The main author of this functionality (@j-wags) is out until December; I'd like his feedback before modifying any of this behavior. For now I think you're stuck iterating through the atoms, but in the future it might be smoother.

@hjuinj
Copy link
Collaborator Author

hjuinj commented Nov 8, 2023

Thank you for the prompt reply Matt, this sounds good, it will be great if this can be improved in the future.

@mattwthompson
Copy link
Member

Thanks for your patience!

I think the documentation could be improved, as tracked in the linked issue. Two features I'd like to get public 👍/👎 on from Jeff when he returns are

  1. Setting residues (or other hierarchy information) from the Topology/Molecule APIs
  2. Falling back to Molecule.name, if present and if atoms do not have residue names, in the above OpenMM conversion(s).

If either of these are good ideas, let's open separate tickets for them.

@j-wags
Copy link
Member

j-wags commented Dec 7, 2023

@mattwthompson is right - The only way to do it currently is to iterate over all atoms and update their metadata.

Setting residues (or other hierarchy information) from the Topology/Molecule APIs

👍 Good idea, I'd accept a PR with these changes

Falling back to Molecule.name, if present and if atoms do not have residue names, in the above OpenMM conversion(s).

👎 I agree that this was an unnecessary behavior change in 0.11, which I wish I'd caught. But now this is a behavior that users who DID promptly update to 0.11 (or who started using OFFTK since then) have built around for 16 months, and to revert it after all this time would effectively be a second behavior change. Considered as a feature request against the current behavior, the benefit of this change is ambiguous. Most molecule names are more than 3 characters, which leads to confusion when writing PDBs or other formats that expect 3-character resnames. And the automagic "sometimes we use the data from here, but other times we use the data from there" pattern is a common cause of user confusion/bugs in workflows that involve OFFTK. So overall I don't think I'd welcome this change, though maybe I'm overlooking something.

@j-wags j-wags removed the help wanted label Dec 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants