Add OBMol::CopySubstructure #1811

Merged
merged 10 commits into from Apr 26, 2018

@baoilleach
Member

commented Apr 5, 2018

Here is an implementation of OBMol::CopySubstructure. It copies a substructure of a molecule to another one, and has a bunch of optional parameters that hopefully cover every possible use case (or at least, make it possible for the user to do so). OBMol::NextFragment() already had some of this code; I've taken it out and expanded it, and now NextFragment() just calls this new function.

This is needed to implement reactions as OBMols - e.g. if someone wants reagent 1. Also, we really should have it in any case - I've missed it before, and it's not the sort of thing that's simple for the user to throw together. One use case is that it simplifies work with matched pairs, where you are chopping up molecules.

@baoilleach

Member Author

commented Apr 5, 2018

Here's the doxygen docs (only a bit mangled):

Copy part of a molecule to another molecule.

This function copies a substructure of a molecule to another molecule. The key information needed is an OBBitVec indicating which atoms to include and (optionally) an OBBitVec indicating which bonds to exclude. By default, only bonds joining included atoms are copied.

When an atom is copied, but not all of its bonds are, by default hydrogen counts are adjusted to account for the missing bonds. That is, given the SMILES "CF", if we copy the two atoms but exclude the bond, we will end up with "C.F". This behavior can be changed by specifying a value other than 1 for the correctvalence parameter. A value of 0 will yield "[C].[F]" while 2 will yield "C*.F*" (see correctvalence below for more information).
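The three modes can be illustrated with a toy model that does not depend on Open Babel. `ToyAtom` and `applyMissingBonds` are hypothetical names invented for this sketch; they are not part of the Open Babel API, and the sketch assumes that every missing bond is a single bond.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an atom; this is NOT the Open Babel OBAtom class.
struct ToyAtom {
    std::string symbol;   // element symbol, "*" for a dummy atom
    unsigned implicitH;   // implicit hydrogen count
};

// Hypothetical helper mirroring the three documented modes for an atom
// that lost `missing` single bonds when the substructure was copied:
//   0: leave the implicit hydrogen count unchanged
//   1: add one implicit hydrogen per missing bond
//   2: attach one dummy atom per missing bond
// Returns the copied atom followed by any dummy atoms attached to it.
std::vector<ToyAtom> applyMissingBonds(ToyAtom atom, unsigned missing,
                                       unsigned correctvalence) {
    assert(correctvalence <= 2);
    std::vector<ToyAtom> result;
    if (correctvalence == 1)
        atom.implicitH += missing;          // compensate with hydrogens
    result.push_back(atom);
    if (correctvalence == 2)
        for (unsigned i = 0; i < missing; ++i)
            result.push_back(ToyAtom{"*", 0});  // one dummy per lost bond
    return result;
}
```

For the carbon of "CF" (three implicit hydrogens, one missing bond), mode 1 in this sketch gives a carbon with four hydrogens, and mode 2 gives the original carbon plus one dummy atom.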

Aromaticity is preserved as present in the original OBMol. If this is not desired, the user should call OBMol::UnsetAromaticPerceived() on the new OBMol.

Stereochemistry is only preserved if the corresponding elements are wholly present in the substructure. For example, all four atoms and bonds of a tetrahedral stereocenter must be copied.

Here is an example of using this method to copy ring systems to a new molecule. Given the molecule represented by the SMILES string, "FC1CC1c2ccccc2I", we will end up with a new molecule represented by the SMILES string, "C1CC1.c2ccccc2".
    // Select every ring atom for copying
    OBBitVec atoms(mol.NumAtoms() + 1); // the maximum size needed
    FOR_ATOMS_OF_MOL(atom, mol) {
      if(atom->IsInRing())
        atoms.SetBitOn(atom->Idx());
    }
    // Exclude every non-ring bond, so that the two ring systems are separated
    OBBitVec excludebonds(mol.NumBonds()); // the maximum size needed
    FOR_BONDS_OF_MOL(bond, mol) {
      if(!bond->IsInRing())
        excludebonds.SetBitOn(bond->Idx());
    }
    OBMol newmol;
    mol.CopySubstructure(&newmol, &atoms, &excludebonds);

When used from Python, note that "None" may be used to specify an empty value for the excludebonds parameter.

Remarks
Some alternatives to using this function, which may be preferred in some instances for efficiency or convenience, are:

    Copying the entire OBMol, and then deleting the unwanted parts
    Modifying the original OBMol, and then restoring it
    Using the SMILES writer option -xf to specify fragment atom idxs

Returns
A boolean indicating success or failure. Currently failure is only reported if one of the specified atoms is not present.

Parameters
newmol: The molecule to which to add the substructure. Note that atoms are appended to this molecule.
atoms: An OBBitVec, indexed by atom Idx, specifying which atoms to copy.
excludebonds: An OBBitVec, indexed by bond Idx, specifying a list of bonds to exclude. By default, all bonds between the specified atoms are included; this parameter overrides that.
correctvalence: A value of 0, 1 (default) or 2 that indicates how atoms with missing bonds are handled:
    0 - Leave the implicit hydrogen count unchanged
    1 - Adjust the implicit hydrogen count to correct for the missing bonds
    2 - Replace the missing bonds with bonds to dummy atoms
atomorder: Record the Idxs of the original atoms. That is, the first element in this vector will be the Idx of the atom in the original OBMol that corresponds to the first atom in the new OBMol. Note that the information is appended to this vector.
bondorder: Record the Idxs of the original bonds. See atomorder above.
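
How the atoms, excludebonds, atomorder and bondorder parameters interact can be sketched with a toy model that does not depend on Open Babel. `ToyMol` and `copySubstructure` are hypothetical names made up for this illustration, and `std::vector<bool>` stands in for OBBitVec; this is not the actual implementation, only one plausible reading of the documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy molecule; this is NOT the Open Babel OBMol class.
// Atoms are numbered from 1 and bonds from 0.
struct ToyMol {
    unsigned numAtoms = 0;
    std::vector<std::pair<unsigned, unsigned>> bonds;  // (begin, end) atom numbers
};

// Hypothetical sketch of the selection logic. Atoms and bonds are appended
// to newmol, and their original numbers are appended to atomorder/bondorder.
// excludebonds, atomorder and bondorder may be null.
bool copySubstructure(const ToyMol& mol, ToyMol& newmol,
                      const std::vector<bool>& atoms,
                      const std::vector<bool>* excludebonds,
                      std::vector<unsigned>* atomorder,
                      std::vector<unsigned>* bondorder) {
    // Report failure if a requested atom is not present in the source.
    for (std::size_t idx = 0; idx < atoms.size(); ++idx)
        if (atoms[idx] && (idx == 0 || idx > mol.numAtoms))
            return false;

    // Map original atom number -> number in newmol (0 means "not copied").
    std::vector<unsigned> newIdx(mol.numAtoms + 1, 0);
    for (unsigned idx = 1; idx <= mol.numAtoms; ++idx) {
        if (idx < atoms.size() && atoms[idx]) {
            newIdx[idx] = ++newmol.numAtoms;
            if (atomorder) atomorder->push_back(idx);
        }
    }

    for (unsigned b = 0; b < mol.bonds.size(); ++b) {
        unsigned a1 = newIdx[mol.bonds[b].first];
        unsigned a2 = newIdx[mol.bonds[b].second];
        bool excluded = excludebonds && b < excludebonds->size()
                        && (*excludebonds)[b];
        // An excluded bond that joins atoms outside the substructure is
        // skipped like any other bond; it is not treated as an error.
        if (a1 == 0 || a2 == 0 || excluded)
            continue;
        newmol.bonds.emplace_back(a1, a2);
        if (bondorder) bondorder->push_back(b);
    }
    return true;
}
```

For a four-atom chain 1-2-3-4 with atoms 2, 3 and 4 selected and bonds 0 and 2 excluded, this sketch produces a three-atom molecule with a single bond, records {2, 3, 4} in atomorder and {1} in bondorder, and returns true even though bond 0 touches an atom that was not selected.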

baoilleach added 2 commits Apr 5, 2018
@mwojcikowski

Contributor

commented Apr 5, 2018

I really like the functionality, but I think that you didn't copy the residue information. Do you think it might be feasible? I think such an optional feature would be useful for extracting pockets of proteins, etc.

@baoilleach

Member Author

commented Apr 5, 2018

I'll look into it.

@baoilleach

Member Author

commented Apr 6, 2018

Done. Expect an updated PR asap.

@baoilleach

Member Author

commented Apr 6, 2018

Here's the new docs regarding residue information. Hopefully it's not too confusing:

Residue information is preserved if the original OBMol is marked as having
its residues perceived. If this is not desired, either call
OBMol::UnsetChainsPerceived() in advance on the original OBMol to avoid copying
the residues (and then reset it afterwards), or else call it on the new OBMol so
that residue information will be reperceived (when requested).

Oops - I've just noticed that there is no UnsetChainsPerceived(). I'm not very keen on these convenience functions - three (Set/Unset/Has) for each bit of a flag. Anyway, let's assume that an UnsetChainsPerceived() will magically appear in the near future via a PR.

@baoilleach

Member Author

commented Apr 12, 2018

This is ready to go assuming you're happy with #1813.

@ghutchis

Member

commented Apr 12, 2018

I'll wait for reaction from @mwojcikowski but it looks fine.

@mwojcikowski

Contributor

commented Apr 13, 2018

My only suggestion would be to test for the list of residue names/IDs instead of just the number of them. Other than that, LGTM.

@baoilleach

Member Author

commented Apr 13, 2018

Will do.

baoilleach added 2 commits Apr 16, 2018
Minor tweak. The function should not return false if a bond is specified for exclusion, even if it joins atoms not in the substructure. Consider the case of excluding all non-ring bonds, and including ring atoms.
@baoilleach

Member Author

commented Apr 26, 2018

Good to go?

@mwojcikowski

Contributor

commented Apr 26, 2018

LGTM

@ghutchis ghutchis merged commit 725078f into openbabel:master Apr 26, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed