Modifying molecules in python? #1249

Closed
kovasap opened this Issue Jan 4, 2017 · 14 comments

Projects

None yet

2 participants

@kovasap
kovasap commented Jan 4, 2017

I'm interested in using RDKit to run reactions with general ('R') groups on both sides. For example, I would like to take the following reaction:
image
and run molecules like this one:
image
through it. In this case, I would form this molecule:
image
I have tried using the chemical reaction functionality rdkit has to do this, but have been unable to figure out how to use it in conjunction with the AdjustQueryParameters class, which allows me to set some important restrictions on how molecules are matched in the reaction process. I have a good idea of how I could implement this reaction behavior myself - I just need to find a way to more directly interact with a molecule's bonds and atoms. Is this possible/smart to do in python?

I have no experience using C++, but would be willing to try implementing my desired behavior there if I had a little guidance for how to get started.

@greglandrum greglandrum added the question label Jan 5, 2017
@greglandrum
Member

Does this not do what you want?

In [3]: rxn = AllChem.ReactionFromSmarts('[*:5]-[CH2:3]-[CH1:2](-[*:1])-[OH]>>[*:5]-[*:3]=[*:2]-[*:1]')

In [4]: ps = rxn.RunReactants((Chem.MolFromSmiles('CCCC(O)C'),))

In [5]: ps
Out[5]: ((<rdkit.Chem.rdchem.Mol at 0x10448ec38>,),)

In [6]: Chem.MolToSmiles(ps[0][0],isomericSmiles=True)
Out[6]: 'CC=CCC'

Note that you probably don't need the dummies (=R groups) in this case.

@kovasap
kovasap commented Jan 5, 2017

Let me look at it again... I was having trouble making that work. What do you mean by "Note that you probably don't need the dummies (=R groups) in this case."?

@kovasap
kovasap commented Jan 5, 2017

I want to ensure that my reactants match only in such a way where the dummies in the general reaction make up for their extra components - I don't just want a substructure match!

@greglandrum
Member

I think my proposed solution will get you what you are looking for as long as you include the H queries the way I did. You can always continue to use the R groups, but they just complicate things.

@kovasap
kovasap commented Jan 6, 2017

I'm still a little confused... doesn't your solution use R (*) groups? I'm not sure we are thinking of the same thing.

@greglandrum
Member

Sorry I was unclear.
The suggestion in the comment above has the R groups explicit in the reaction, as you did originally.
However, given that the queries in the reaction definition include hydrogen counts, the R groups aren't actually needed. This should work the same way:

In [7]: rxn = AllChem.ReactionFromSmarts('[CH2:3]-[CH1:2]-[OH]>>[*:3]=[*:2]')

In [9]: ps = rxn.RunReactants((Chem.MolFromSmiles('CCCC(O)C'),))

In [10]: Chem.MolToSmiles(ps[0][0],isomericSmiles=True)
Out[10]: 'CC=CCC'
@kovasap
kovasap commented Jan 9, 2017

That does work, however maybe there is more to my issue than I originally explained. The initial reaction encoding I'm working with looks like this:

[H:9][O:4][C:2]([H:6])([*:1])[C:3]([H:7])([H:8])[*:5]>>[H:8][C:3]([*:5])=[C:2]([H:6])[*:1].[H:9][O:4][H:7]

Ideally I would like to use this to run reactants so that the atom numbering scheme is preserved at least for the specified atoms. The dummy atoms (*) I would like to represent general groups. I could also remove them as you suggested - that doesn't matter much to me. Running the reaction string I specified does not return any products using the reactant you specified or my own.

@kovasap
kovasap commented Jan 9, 2017

I have tried both removing the water and the * groups. I read it in as AllChem.ReactionFromSmarts, even though I think this is smiles?

@greglandrum
Member

I think I need to see some sample code, including the inputs you are providing, in order to be able to help more here.

@kovasap
kovasap commented Jan 10, 2017
import rdkit.Chem as Chem
import rdkit.Chem.AllChem as AllChem
import rdkit.Chem.Draw as Draw
from rdkit.Chem.Draw import IPythonConsole
from IPython.display import display
print(Chem.rdBase.rdkitVersion)

def draw_rxn(rxn):
    reactant_mols = [r for r in rxn.GetReactants()]
    product_mols = [r for r in rxn.GetProducts()]
    all_mols = reactant_mols + [None] + product_mols
    return Draw.MolsToGridImage(all_mols, molsPerRow=len(all_mols))

ro1_smiles = "[H:9][O:4][C:2]([H:6])([*:1])[C:3]([H:7])([H:8])[*:5]>>[H:8][C:3]([*:5])=[C:2]([H:6])[*:1].[H:9][O:4][H:7]"
ro1_smiles_no_water = "[H:9][O:4][C:2]([H:6])[C:3]([H:7])([H:8])>>[H:8][C:3]=[C:2]([H:6])"
for s in [ro1_smiles, ro1_smiles_no_water]:
    rxn = AllChem.ReactionFromSmarts(s)
    ps = rxn.RunReactants((Chem.MolFromSmiles('CCCC(O)C'),))
    display(draw_rxn(rxn))
    print(ps)

gives:

screen shot 2017-01-10 at 11 36 11 am

@greglandrum
Member

What's happening here is that you have Hs present in the graph of the SMARTS queries that define your reactants, but no Hs in the graph of the molecule used as a reactant. RDKit molecules don't have Hs present in the graph by default.
If you really want to include Hs in your SMARTS queries (and there's normally no reason to do so), then you will need to make sure that you call Chem.AddHs() on the molecules you are using as reactants.
Alternatively (and with the same effect) you can include the Hs as part of the query atom they are attached to; this is what I did in the reaction I proposed. This will be much faster and results in simpler reaction definitions.

@kovasap
kovasap commented Jan 11, 2017

That worked! Thanks for being patient with me.

Last issue: Is there a way to carry through any atom mapping numbers found in the reaction string to the resulting product molecules? Ideally, I'd like to generate a new complete reaction smiles string with atom mappings from the initial (more abstract) query string I am using to build the reaction string, using the reactant(s) I ran to fill in the missing parts.

@greglandrum
Member

Glad that helped.
And, no, there's currently no way to carry the mappings through. This is a reasonably simple change to make on the C++ side, but it's not there yet.

@kovasap
kovasap commented Jan 11, 2017

Ok good to know. I think i may just try moving the atom numbers over via some substructure matching. Thanks for your help!

@kovasap kovasap closed this Jan 11, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment