TautomerCanonicalizer gives unexpected/forbidden form of phosphoric acid #20

benbowen · 2018-02-08T19:54:17Z

I'm converting all the molecules in my database to canonical tautomers and noticed that things like NADH looked weird. You can see it most plainly for phosphoric acid: I didn't expect the hydrogen on the phosphorus. Is this the correct/expected behavior?

from rdkit import Chem
from rdkit.Chem import Draw
from molvs.tautomer import TautomerCanonicalizer

# Phosphoric acid
original_smiles = 'OP(=O)(O)O'

original_mol = Chem.MolFromSmiles(original_smiles)
tautomerized_mol = TautomerCanonicalizer().canonicalize(original_mol)

# Draw the input and the canonical tautomer side by side
Draw.MolsToGridImage([original_mol, tautomerized_mol],
                     molsPerRow=3, subImgSize=(200, 200),
                     legends=['original', 'tautomer'])

benbowen · 2018-02-08T19:56:54Z

NADH looks like this:

original_smiles = 'NC(=O)C1=CN([C@@H]2O[C@H](COP(=O)(O)OP(=O)(O)OC[C@H]3O[C@@H](N4C=NC5=C4N=CN=C5N)[C@H](O)[C@@H]3O)[C@@H](O)[C@H]2O)C=CC1'

original_mol = Chem.MolFromSmiles(original_smiles)
tautomerized_mol = TautomerCanonicalizer().canonicalize(original_mol)

Draw.MolsToGridImage([original_mol,tautomerized_mol],
                     molsPerRow=1,subImgSize=(600,300),
                     legends=['original','tautomer'])

mcs07 · 2018-02-09T14:04:58Z

I think this is caused by the phosphonic acid rules: https://github.com/mcs07/MolVS/blob/master/molvs/tautomer.py#L130

It can probably be fixed by making the SMARTS pattern more strict to match only the intended target:
https://en.wikipedia.org/wiki/Phosphorous_acid
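
To illustrate what "more strict" could mean, here is a minimal sketch that uses only RDKit substructure matching. The loose pattern `[OH]-[PH0]` is an assumption standing in for the existing rule (check the linked line for the actual SMARTS). The strict pattern adds degree (`D3`, explicit connections) and connectivity (`X3`, total connections) constraints so that only three-coordinate phosphorus matches. The helper name `matches` is made up for this example.

```python
from rdkit import Chem

# Assumed stand-in for the current rule; the real SMARTS lives in molvs/tautomer.py.
LOOSE = Chem.MolFromSmarts('[OH]-[PH0]')
# Stricter variant: phosphorus must have exactly 3 explicit (D3)
# and 3 total (X3) connections, and carry no hydrogen (H0).
STRICT = Chem.MolFromSmarts('[OH]-[PD3X3H0]')

def matches(smiles):
    """Return (loose_match, strict_match) for a SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    return mol.HasSubstructMatch(LOOSE), mol.HasSubstructMatch(STRICT)

print(matches('OP(=O)(O)O'))  # phosphoric acid: four-coordinate phosphorus
print(matches('CP(O)O'))      # methylphosphonous acid: three-coordinate phosphorus
```

Under these assumptions, the loose pattern matches both molecules, while the strict one should leave phosphoric acid alone and still match the three-coordinate case.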

benbowen · 2018-02-09T17:57:56Z

You are correct, removing that rule stops that moiety from being modified. When you say "more strict", do you mean specifying an explicit number of bonds on the phosphorus in the SMARTS pattern?

Why does RDKit allow 7 bonds on the phosphorus? RDKit is a vast package, but looking at the definition of phosphorus, it has a maximum of 5 bonds.

If I run SanitizeMol, the hydrogen stays put. When I paste the structure into ChemDraw, it's not valid.
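
One way to look at the bond count directly is to ask RDKit for the total valence of each phosphorus atom. This is a rough sketch, not a statement about how RDKit decides which valences are legal. The helper name `phosphorus_valences` is made up, and the second SMILES is written by hand to mimic a structure with a hydrogen on phosphorus; it is not copied from MolVS output. Parsing with `sanitize=False` and calling `UpdatePropertyCache(strict=False)` computes valences without raising on unusual ones.

```python
from rdkit import Chem

def phosphorus_valences(smiles):
    """Return the total valence of every phosphorus atom in a SMILES string."""
    # Skip sanitization so that unusual valences do not abort parsing.
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    # Compute valence information without strict valence checking.
    mol.UpdatePropertyCache(strict=False)
    return [atom.GetTotalValence() for atom in mol.GetAtoms()
            if atom.GetSymbol() == 'P']

print(phosphorus_valences('OP(=O)(O)O'))      # ordinary phosphoric acid
print(phosphorus_valences('O=[PH](=O)(O)O'))  # hand-written form with a P-H bond
```

A check like this could be run over a tautomerized database to flag any phosphorus whose valence exceeds 5.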


Updates the SMARTS definitions for phosphinic acids. The patterns now require 3 explicit (D3) and 3 total (X3) connections on phosphorus before tautomerizing phosphinic acids. The new behavior properly handles compounds with 4 connections (e.g., phosphates, phosphonic acids).

```python
from rdkit import Chem
from molvs.tautomer import TautomerCanonicalizer, TautomerTransform
import pandas as pd

# Stricter replacements for the phosphonic acid rules:
# D3 = 3 explicit connections, X3 = 3 total connections on phosphorus.
my_transforms = (
    TautomerTransform('phosphonic acid f', '[OH]-[PD3X3H0]', bonds='='),
    TautomerTransform('phosphonic acid r', '[PD3X3H1]=[O]', bonds='-')
)

cpds = ['methylphosphinic acid', 'methylphosphonous acid',
        'methylphosphonic acid', 'NADPH']
smiles = ['CP(=O)O',
          'CP(O)O',
          'CP(=O)(O)O',
          'NC(=O)C1=CN([C@@H]2O[C@H](COP(=O)(O)OP(=O)(O)OC[C@H]3O[C@@H](N4C=NC5=C4N=CN=C5N)[C@H](O)[C@@H]3O)[C@@H](O)[C@H]2O)C=CC1']

mols = [Chem.MolFromSmiles(smi) for smi in smiles]
# Canonicalize using only the two custom transforms above
can_taut = [TautomerCanonicalizer(transforms=my_transforms).canonicalize(mol) for mol in mols]
smiles_taut = [Chem.MolToSmiles(mol) for mol in can_taut]
df = pd.DataFrame({'cpd': cpds, 'smi': smiles, 'taut_smi': smiles_taut})

# Resulting df:
#    cpd                     smi         taut_smi
# 0  methylphosphinic acid   CP(=O)O     C[PH](=O)O
# 1  methylphosphonous acid  CP(O)O      C[PH](=O)O
# 2  methylphosphonic acid   CP(=O)(O)O  CP(=O)(O)O
# 3  NADPH
#      smi:      NC(=O)C1=CN([C@@H]2O[C@H](COP(=O)(O)OP(=O)(O)OC[C@H]3O[C@@H](N4C=NC5=C4N=CN=C5N)[C@H](O)[C@@H]3O)[C@@H](O)[C@H]2O)C=CC1
#      taut_smi: NC(=O)C1=CN([C@@H]2O[C@H](COP(=O)(O)OP(=O)(O)OC[C@H]3O[C@@H](n4cnc5c(N)ncnc54)[C@H](O)[C@@H]3O)[C@@H](O)[C@H]2O)C=CC1
```

mcs07 added the bug label Feb 9, 2018
