Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vectorizing GaussianInt speed up #1

Closed
ljmartin opened this issue Nov 9, 2021 · 4 comments
Closed

Vectorizing GaussianInt speed up #1

ljmartin opened this issue Nov 9, 2021 · 4 comments

Comments

@ljmartin
Copy link
Contributor

ljmartin commented Nov 9, 2021

Hi, thanks for this great package!

I wanted to speed it up for use in large-scale screening, so vectorized the gaussian integration step:

def GetIntegralsViaGaussians(prbCoor,
It gets about a 100x speedup, which is more obviously useful on larger molecules.

Here's a minimal-ish example, taken from the scripts dir:
setup:

## Setup:

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole

from espsim import EmbedAlignConstrainedScore,ConstrainedEmbedMultipleConfs,GetEspSim, helpers

import numpy as np
from scipy.spatial.distance import cdist


## set up molecules:

refSmiles=['C1=CC=C(C=C1)C(C(=O)O)O','CCC(C(=O)O)O','OC(C(O)=O)c1ccc(Cl)cc1','C1=CC(=CC=C1C(C(=O)O)O)O','COc1ccc(cc1)C(O)C(O)=O','OC(C(O)=O)c1ccc(cc1)[N+]([O-])=O','CCCC(C(=O)O)O','CCC(C)C(C(=O)O)O','CC(C(=O)O)O']
prbSmile='C(C(C(=O)O)O)O'
refMols=[Chem.AddHs(Chem.MolFromSmiles(x)) for x in refSmiles]
prbMol=Chem.AddHs(Chem.MolFromSmiles(prbSmile))

patt=Chem.MolFromSmiles("[H]OC([H])(C)C(=O)O[H]",sanitize=False)
helper=Chem.AddHs(Chem.MolFromSmiles("[H]OC([H])(C)C(=O)O[H]"))

AllChem.EmbedMolecule(helper,AllChem.ETKDG()) #Embed first reference molecule, create one conformer
AllChem.UFFOptimizeMolecule(helper) #Optimize the coordinates of the conformer
core = AllChem.DeleteSubstructs(AllChem.ReplaceSidechains(helper,patt),Chem.MolFromSmiles('*')) #Create core molecule with 3D coordinates
core.UpdatePropertyCache()

# align the molecules:
simShape,simEsp=EmbedAlignConstrainedScore(prbMol,refMols,core)

vectorized integration function (should just be a drop-in for GaussInt):

## Define vectorized gaussian integration functions:
def VecGI(dist, charge1,charge2,):
    
    #These are precomputed coefficients:
    a=np.array([[ 15.90600036,   3.9534831 ,  17.61453176],
                [  3.9534831 ,   5.21580206,   1.91045387],
                [ 17.61453176,   1.91045387, 238.75820253]])
    b=np.array([[-0.02495   , -0.04539319, -0.00247124],
                [-0.04539319, -0.2513    , -0.00258662],
                [-0.00247124, -0.00258662, -0.0013    ]])
    
    a_flat = a.flatten()
    b_flat = b.flatten()
    dist = (dist**2).flatten()
    charges = (charge1[:,None]*charge2).flatten()
    
    return ((a_flat[:,None] * np.exp(dist * b_flat[:,None])).sum(0) * charges).sum()


def vecSim(refCoor, prbCoor, refCharge, prbCharge, metric):
    distPrbPrb = cdist(prbCoor,prbCoor)
    distPrbRef = cdist(prbCoor,refCoor)
    distRefRef = cdist(refCoor,refCoor)
    
    intPrbPrb= VecGI(distPrbPrb,prbCharge,prbCharge)
    intPrbRef= VecGI(distPrbRef,prbCharge,refCharge)
    intRefRef= VecGI(distRefRef,refCharge,refCharge)
    return SimilarityMetric(intPrbPrb,intRefRef,intPrbRef,metric)

test equivalence:

prbCoor = prbMol.GetConformer(0).GetPositions()
prbCharge = np.array([a.GetDoubleProp('_GasteigerCharge') for a in prbMol.GetAtoms()])

simEsp_vectorized = []

for refMol in refMols:
    refCoor = refMol.GetConformer(0).GetPositions()
    refCharge = np.array([a.GetDoubleProp('_GasteigerCharge') for a in refMol.GetAtoms()])
    
    metric = 'tanimoto'
    tanimoto_similarity = GetEspSim(prbMol, refMol, metric=metric, partialCharges='gasteiger')
    vectorized_tanimoto_similarity = vecSim(refCoor, prbCoor, refCharge, prbCharge,metric)

    print(np.isclose(tanimoto_similarity, vectorized_tanimoto_similarity), tanimoto_similarity, vectorized_tanimoto_similarity)
    

output:

True 0.6460256327061207 0.6460256327061223
True 0.6882976768462027 0.6882976768461678
True 0.6464181000546018 0.6464181000546553
True 0.4060487492735462 0.4060487492735308
True 0.30699421499052204 0.3069942149905257
True 0.23382614291885506 0.23382614291884982
True 0.6689556377160998 0.668955637716057
True 0.7182699415095307 0.718269941509474
True 0.7087896849268721 0.7087896849268774

These take ~6ms vs 100µs. The VecGI should just be a drop in replacement for GaussInt. Seems to work for Carbo and Tanimoto, so if it's of interest Im happy to submit a PR.
cheers
Lewis

@hesther
Copy link
Owner

hesther commented Nov 9, 2021

Hi @ljmartin , this is a great suggestion (and vectorizing is always a good option to speed up code). Feel free to submit a PR anytime! Thanks for contributing to this project!

@hesther
Copy link
Owner

hesther commented Nov 30, 2021

Hi @ljmartin, what is the tentative time-scale for your PR? I am planning on some updates that might affect that part of the code, and wanted to wait for the PR first. So if you are currently unable to submit it, I might just go ahead and introduce the changes you supposed. Upside: You don't need to worry about it. Downside: GitHub will not give you credit - so I just wanted to check in with you and ask what you think about it.

@ljmartin
Copy link
Contributor Author

ljmartin commented Dec 1, 2021

Hi @hesther, thanks for the ping - yes Ill submit a PR tomorrow! I was thinking of adding a test first to make sure it does what I think it does, but Ill just test it manually before submitting.
Cheers

@hesther
Copy link
Owner

hesther commented Dec 1, 2021

Thanks! Yeah, I can add a test later- actually there aren't any tests yet (the current dummy test only loads the package) for any code yet, this is also on my todo list.

@hesther hesther closed this as completed Dec 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants