Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Paramaterising with openff issues #389

Closed
annamherz opened this issue Oct 3, 2022 · 13 comments
Closed

Paramaterising with openff issues #389

annamherz opened this issue Oct 3, 2022 · 13 comments

Comments

@annamherz
Copy link
Contributor

Hello! I have had issues with parameterising molecules using openff. I am currently trying to paramaterise the attached molecules with sage ('openff_unconstrained-2.0.0'). I'm not sure if it is partially related to this issue? Currently, for the import openff toolkit part of _openforcefield.py I get the following message:

WARNING:openff.toolkit.utils.toolkit_registry:Warning: Unable to load toolkit 'OpenEye Toolkit'. The Open Force Field Toolkit does not require the OpenEye Toolkits, and can use RDKit/AmberTools instead. However, if you have a valid license for the OpenEye Toolkits, consider installing them for faster performance and additional file format support: https://docs.eyesopen.com/toolkits/python/quickstart-python/linuxosx.html OpenEye offers free Toolkit licenses for academics: https://www.eyesopen.com/academic-licensing

However I don't think this is causing the issues.

When carrying out the parameterisation with sdf files here in the code, I get the following error message when setting verbose to True for ejm31 (files: ejm31.zip):

Parameterisation failed! Last error: 'Unable to create OpenFF Molecule!: UndefinedStereochemistryError('Unable to make OFFMol from RDMol: Unable to make OFFMol from RDMol: RDMol has unspecified stereochemistry. RDMol name: Undefined chiral centers are:\n - Atom C (index 1)\n - Atom C (index 2)\n - Atom C (index 3)\n - Atom C (index 5)\n - Atom C (index 6)\n - Atom C (index 7)\n - Atom C (index 9)\n - Atom C (index 14)\n - Atom C (index 15)\n - Atom C (index 17)\n')'

Before this, the sdf file is read okay, is saved as a pdb without issue, and then read and saved again as an sdf molecule. For ejm31 this sdf molecule in the tmp file is:

     RDKit          3D

 32 33  0  0  0  0  0  0  0  0999 V2000
   -6.7910   -6.0370  -15.4770 C   0  0  0  0  0  0  0  0  0  0  0  0
   -7.3870   -4.7670  -15.5750 C   0  0  0  0  0  0  0  0  0  0  0  0
   -6.5970   -3.6380  -15.8240 C   0  0  0  0  0  0  0  0  0  0  0  0
   -8.7710   -9.1870  -16.0530 C   0  0  0  0  0  0  0  0  0  0  0  0
   -7.6340   -7.1860  -15.1430 C   0  0  0  0  0  0  0  0  0  0  0  0
   -9.5610   -9.5170  -14.9660 C   0  0  0  0  0  0  0  0  0  0  0  0
  -10.4280  -10.6090  -15.0860 C   0  0  0  0  0  0  0  0  0  0  0  0
   -5.4010   -6.1660  -15.6940 C   0  0  0  0  0  0  0  0  0  0  0  0
   -5.2150   -3.7560  -15.9990 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.6230   -5.0220  -15.9380 C   0  0  0  0  0  0  0  0  0  0  0  0
   -9.1230   -4.5800  -15.4220 Cl  0  0  0  0  0  0  0  0  0  0  0  0
   -8.1580   -7.2420  -14.0360 O   0  0  0  0  0  0  0  0  0  0  0  0
   -7.8970   -8.1220  -16.1170 N   0  0  0  0  0  0  0  0  0  0  0  0
  -10.5100  -11.3670  -16.1970 N   0  0  0  0  0  0  0  0  0  0  0  0
   -9.7160  -11.0520  -17.2180 C   0  0  0  0  0  0  0  0  0  0  0  0
   -8.8550   -9.9670  -17.1820 C   0  0  0  0  0  0  0  0  0  0  0  0
   -9.8320  -11.8470  -18.3300 N   0  0  0  0  0  0  0  0  0  0  0  0
   -8.8840  -11.9480  -19.3160 C   0  0  0  0  0  0  0  0  0  0  0  0
   -7.8360  -11.3290  -19.2340 O   0  0  0  0  0  0  0  0  0  0  0  0
   -4.6010   -7.7220  -15.6600 Cl  0  0  0  0  0  0  0  0  0  0  0  0
   -7.0680   -2.6600  -15.8790 H   0  0  0  0  0  0  0  0  0  0  0  0
   -9.5210   -8.9330  -14.0510 H   0  0  0  0  0  0  0  0  0  0  0  0
  -11.0660  -10.8420  -14.2370 H   0  0  0  0  0  0  0  0  0  0  0  0
   -4.6110   -2.8690  -16.1900 H   0  0  0  0  0  0  0  0  0  0  0  0
   -3.5470   -5.1130  -16.0790 H   0  0  0  0  0  0  0  0  0  0  0  0
   -7.4570   -7.9660  -17.0170 H   0  0  0  0  0  0  0  0  0  0  0  0
   -8.2640   -9.7200  -18.0540 H   0  0  0  0  0  0  0  0  0  0  0  0
  -10.6740  -12.4030  -18.4100 H   0  0  0  0  0  0  0  0  0  0  0  0
   -9.1110  -12.8760  -20.5060 C   0  0  0  0  0  0  0  0  0  0  0  0
   -9.9910  -12.5620  -21.0680 H   0  0  0  0  0  0  0  0  0  0  0  0
   -8.2470  -12.8670  -21.1750 H   0  0  0  0  0  0  0  0  0  0  0  0
   -9.2680  -13.9040  -20.1880 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  1  5  1  0
  1  8  1  0
  2  3  1  0
  2 11  1  0
  3  9  1  0
  3 21  1  0
  4  6  1  0
  4 13  1  0
  4 16  1  0
  5 12  1  0
  5 13  1  0
  6  7  1  0
  6 22  1  0
  7 14  1  0
  7 23  1  0
  8 10  1  0
  8 20  1  0
  9 10  1  0
  9 24  1  0
 10 25  1  0
 13 26  1  0
 14 15  1  0
 15 16  1  0
 15 17  1  0
 16 27  1  0
 17 18  1  0
 17 28  1  0
 18 19  1  0
 18 29  1  0
 29 30  1  0
 29 31  1  0
 29 32  1  0
M  END
$$$$

This is different from the sdf file read in, and I think causing the error when it is then read by openff, as the error shows up after this conversion.
When running this without the rdkit step, so just directly using the original input sdf with _OpenFFMolecule.from_file this is able to be read. I think during the stage of converting the BSS molecule into an sdf, important structural information is lost which prevents it from being opened by openff. Paramaterisation with openff proceeds okay then following this with the original sdf file.

Afterwards, the next issue occurs with loading the molecule into a parmed structure. When I load the original sdf file into BSS and then save it as a pdb (similarly to how is done already), the following error occurs:
cannot reshape array of size 15 into shape (32,3)
And then saving of the actual parametrised molecule is not able to proceed.
When loading the attached pdb for ejm31 (which is the pdb obtained when I initially created the ligand and also saved the sdf), these coordinates were input okay into parmed and the structure could be saved and loaded into BSS again.

I was wondering if there was some better way to convert the file formats past just loading them into BSS as a molecule and then outputting them as sdf / pdbs? As this is the point where the errors seem to originate.

@lohedges
Copy link
Member

lohedges commented Oct 4, 2022

Hi @annamherz. I think the issue is that I have forgotten to avoid going via the intermediate PDB file now that we have a native SDF parser. (I had updated this in the network generation code following another issue that you raised, but apparently not in the OpenFF parameterisation wrapper.) The current approach is a hack (recommended by the OpenFF folks) to attempt to recover stereochemistry. I just need to check the file format associated with the molecule, then write directly to SDF if it was loaded from that format in the first place. I'll check with your example to see if it works.

For reference, you can safely ignore the OpenEye warning. OpenFF can use functionality from the toolkit if it is installed, but can't specify it as a hard dependency since it is proprietary software that needs a license. If it's not installed, it will just use an alternative backend, e.g. RDKit. (OpenFF is interoperable in many places.)

@lohedges
Copy link
Member

lohedges commented Oct 4, 2022

Okay, I've fixed the first bit, but ParmEd still seems unhappy. I'll try to figure out if the issue is with the OpenMM System, Topoolgy, or the positions from the PDB file.

@annamherz
Copy link
Contributor Author

For reference, you can safely ignore the OpenEye warning. OpenFF can use functionality from the toolkit if it is installed, but can't specify it as a hard dependency since it is proprietary software that needs a license. If it's not installed, it will just use an alternative backend, e.g. RDKit. (OpenFF is interoperable in many places.)

ah okay that makes sense!

@lohedges
Copy link
Member

lohedges commented Oct 4, 2022

Right, I've now fixed it. I was using positions from the PDB file when creating the ParmEd structure. (This was following an old OpenFF tutorial.) I actually want to use the positions from the first conformer of the OpenFF molecule, which might differ to those in the PDB. I'll tidy up a few things then push a fix.

@annamherz
Copy link
Contributor Author

thank you so much!

@annamherz
Copy link
Contributor Author

The paramaterising is working well now, however I'm still getting one error. After running
lig_p = BSS.Parameters.parameterise(ligand, "openff_unconstrained-2.0.0")
when I try to get the molecule from this:
lig_p.getMolecule()
I get the following error:

 File "/home/anna/BioSimSpace/python/BioSimSpace/Parameters/_process.py", line 227, in getMolecule
   raise _ParameterisationError("Parameterisation failed! Last error: '%s'" % str(self._last_error))
BioSimSpace._Exceptions._exceptions.ParameterisationError: Parameterisation failed! Last error: 'Unable to convert OpenMM System to ParmEd structure!: ImportError('You must have at least OpenMM 6.3 installed')'

This is a bit strange, as I've checked the openmm version of my conda environment, and also when printing it in the python script for this, the version is 7.7 ?

@annamherz annamherz reopened this Oct 4, 2022
@lohedges
Copy link
Member

lohedges commented Oct 4, 2022

Can you try again in a completely clean conda environment? I'm not seeing this locally. I wonder if you have some environment variables set that mean that OpenFF is picking up a different version of OpenMM. Alternatively, it might be failing to extract the version number for some reason.

Could you confirm which OS you are using and how you installed BioSImSpace, i.e. source, or via conda?

@lohedges
Copy link
Member

lohedges commented Oct 4, 2022

For what it's worth, I'm very surprised that it could install an OpenMM version less than 6.3. Assuming it pulls in the dependency via conda-forge (which it should, given our instructions and channel priority), then it only goes back to 7.4. Looking at the Omnia version here, 6.2 was released over 7 years ago! This would be completely incompatible with Sire and there is absolutely no way this could be pulled in via any of our current build or installation methods 🤷‍♂️

@lohedges
Copy link
Member

lohedges commented Oct 4, 2022

The ParmEd function that is failing is here. This has no check of the OpenMM version number, so it's not clear to me where the check is being performed.

@lohedges
Copy link
Member

lohedges commented Oct 4, 2022

I've also grepped the entire BioSimSpace conda environment for "OpenMM 6.3" and don't see anything, so it doesn't look like the exception is anything that would be raised by the standard packages that should be installed. Perhaps you have some outdated versions of certain packages installed elsewhere.

Could you try something like the following in your environment:

import openmm
print(openmm.__file__)
/home/lester/.conda/envs/sire-dev/lib/python3.9/site-packages/openmm/__init__.py

import openff.toolkit
print(openff.toolkit.__file__)
/home/lester/.conda/envs/sire-dev/lib/python3.9/site-packages/openff/toolkit/__init__.py

import parmed
print(parmed.__file__)
/home/lester/.conda/envs/sire-dev/lib/python3.9/site-packages/parmed/__init__.py

That will check the location of the modules that are required for the functionality that is failing.

@annamherz
Copy link
Contributor Author

For what it's worth, I'm very surprised that it could install an OpenMM version less than 6.3. Assuming it pulls in the dependency via conda-forge (which it should, given our instructions and channel priority), then it only goes back to 7.4. Looking at the Omnia version here, 6.2 was released over 7 years ago! This would be completely incompatible with Sire and there is absolutely no way this could be pulled in via any of our current build or installation methods man_shrugging

Yes, I'm fairly certain that the openmm version isn't less than 6.3 at all, as I don't think that has ever been installed anywhere on the machine I'm using? I'm running Ubuntu 22. BioSimSpace is installed via source and then used via the PYTHONPATH variable, as I'm changing between the branches, but I was getting this error for both the devel branch and the feature-amber-fep. I'm using Sire from the conda environment. I installed a clean conda environment using this and then updated sire to 2023 using conda install -c conda-forge -c michellab/label/dev sire=2023.0.0 and I still got the same error.

Trying the above:

/home/anna/anaconda3/envs/biosimspace-dev-clean/lib/python3.9/site-packages/openmm/__init__.py
/home/anna/anaconda3/envs/biosimspace-dev-clean/lib/python3.9/site-packages/openff/toolkit/__init__.py
/home/anna/amber22/lib/python3.9/site-packages/parmed/__init__.py

I think the problem is this, so it is taking parmed from the amber22 install. This does check the version.

@annamherz
Copy link
Contributor Author

annamherz commented Oct 4, 2022

I think the problem is this, so it is taking parmed from the amber22 install. This does check the version.

In the conda install, parmed is version 3.4.3, whilst in the amber install it is only 3.4.1.

This doesn't happen when I start python from the terminal whilst the conda environment is active. In that case, I get:
/home/anna/anaconda3/envs/biosimspace-dev-clean/lib/python3.9/site-packages/parmed/__init__.py
I think it is because in the overall pipeline script I'm running, Amber is sourced for later MD runs for preparing the parameterised ligands so that the correct version of amber is used. If I run the ligand parameterisation script in a fresh terminal without sourcing amber before, the version is 3.4.3 and from the conda install, and it is able to proceed okay. So I guess in this case I just can't source the amber in the overall bash script before. This does make running the MD runs a bit awkward in the sense that I then can't use the default $AMBERHOME variable for the exe path, but I can circumvent that by setting the amber path to a different variable, so I think it should be fine now!

@lohedges
Copy link
Member

lohedges commented Oct 4, 2022

Ah, great sleuthing. That's annoying that non-conda ambertools bundles its own version of parmed. It's good that a workaround is easy enough, but it would be good to figure out something else so that it doesn't trip other users up in future. Perhaps we can check the module path for parmed and raise a warning if it lies outside of the active conda environment. Perhaps there's a way to directly import the correct one (assuming we know the path).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants