#### Organic molecule from a SMILES string: CSEARCH performs conformational sampling with RDKit and QPREP creates Gaussian input files

###### Step 1: CSEARCH conformational sampling (creates SDF files)

In [1]:
import os, glob
from pathlib import Path
from aqme.csearch import csearch
from aqme.qprep import qprep

# set working directory, SDF creation folder, name and SMILES string
name = 'quinine'
smi = 'COC1=CC2=C(C=CN=C2C=C1)[C@H]([C@@H]3C[C@@H]4CCN3C[C@@H]4C=C)O'
w_dir_main = Path(os.getcwd())
sdf_path = w_dir_main.joinpath(name)

# run CSEARCH conformational sampling, specifying:
# 1) PATH to create the new SDF files (destination=sdf_path)
# 2) RDKit sampling (program='rdkit')
# 3) SMILES string (smi=smi)
# 4) Name for the output SDF files (name=name)
csearch(destination=sdf_path,program='rdkit',smi=smi,name=name)

[14:39:14] Enabling RDKit 2019.09.3 jupyter extensions


True
AQME v 1.4.0 2022/12/07 14:39:14 
Citation: AQME v 1.4.0, Alegre-Requena, J. V.; Sowndarya, S.; Perez-Soto, R.; Alturaifi, T. M.; Paton, R. S., 2022. https://github.com/jvalegre/aqme



Starting CSEARCH with 1 job(s) (SDF, XYZ, CSV, etc. files might contain multiple jobs/structures inside)



   ----- quinine -----


o  Applying filters to initial conformers


Time CSEARCH: 28.12 seconds




<aqme.csearch.base.csearch at 0x7f2de1ef9310>
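
Before moving on to QPREP, it can be useful to check how many conformers ended up in each SDF file. The sketch below relies only on the SDF convention that every record ends with a `$$$$` line; it uses the standard library rather than any AQME or RDKit function, and the helper name `count_sdf_records` is hypothetical.

```python
from pathlib import Path


def count_sdf_records(sdf_file):
    """Count the records in an SDF file (each record ends with a '$$$$' line)."""
    text = Path(sdf_file).read_text()
    return sum(1 for line in text.splitlines() if line.strip() == "$$$$")


# Assumption: the SDF files are in the folder passed to CSEARCH as `destination`
sdf_dir = Path.cwd() / "quinine"
if sdf_dir.is_dir():
    for sdf_file in sorted(sdf_dir.glob("*.sdf")):
        print(f"{sdf_file.name}: {count_sdf_records(sdf_file)} conformer(s)")
```

Each conformer written by CSEARCH is one record, so the count is the number of structures that QPREP will convert.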

###### Step 2: Writing Gaussian input files with the SDF obtained from CSEARCH

In [2]:
# set SDF filenames and directory where the new com files will be created
com_path = sdf_path.joinpath('com_files')
sdf_rdkit_files = glob.glob(f'{sdf_path}/*.sdf')

# run QPREP input files generator, with:
# 1) PATH to create the new COM files (destination=com_path)
# 2) Files to convert (files=sdf_rdkit_files)
# 3) QM program for the input (program='gaussian')
# 4) Keyword line for the Gaussian inputs (qm_input='wb97xd/6-31+G* opt freq')
# 5) Memory to use in the calculations (mem='24GB')
# 6) Processors to use in the calcs (nprocs=8)
qprep(destination=com_path,files=sdf_rdkit_files,program='gaussian',
        qm_input='wb97xd/6-31+G* opt freq',mem='24GB',nprocs=8)
 

True
AQME v 1.4.0 2022/12/07 14:39:14 
Citation: AQME v 1.4.0, Alegre-Requena, J. V.; Sowndarya, S.; Perez-Soto, R.; Alturaifi, T. M.; Paton, R. S., 2022. https://github.com/jvalegre/aqme


o  quinine_rdkit successfully processed at /home/svss/Project-DBcg-Molecules/aqme/Example_workflows/CSEARCH_CMIN_conformer_generation/quinine/com_files


Time QPREP: 0.03 seconds




<aqme.qprep.qprep at 0x7f2b98fecc90>
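
To see how the `qm_input`, `mem` and `nprocs` options map onto a Gaussian input file, here is a minimal sketch of the standard file layout (Link 0 lines, route line, title, charge and multiplicity, geometry). It is not QPREP's implementation: the function `build_gaussian_input` is hypothetical, and the water geometry is only an illustrative placeholder.

```python
def build_gaussian_input(title, keywords, atoms, charge=0, mult=1,
                         mem="24GB", nprocs=8):
    """Return the text of a Gaussian input file for a single structure.

    atoms: list of (element, x, y, z) tuples with coordinates in Angstrom.
    """
    lines = [
        f"%mem={mem}",            # memory request (Link 0 section)
        f"%nprocshared={nprocs}",  # number of shared-memory processors
        f"# {keywords}",          # route line with the calculation keywords
        "",
        title,
        "",
        f"{charge} {mult}",       # charge and spin multiplicity
    ]
    for element, x, y, z in atoms:
        lines.append(f"{element:<2} {x:>12.6f} {y:>12.6f} {z:>12.6f}")
    lines.append("")  # a blank line terminates the geometry block
    return "\n".join(lines) + "\n"


# Illustrative geometry only (approximate water molecule)
water = [("O", 0.0, 0.0, 0.117), ("H", 0.0, 0.757, -0.469), ("H", 0.0, -0.757, -0.469)]
print(build_gaussian_input("water", "wb97xd/6-31+G* opt freq", water))
```

Opening one of the generated COM files in a text editor should show the same sections, which makes it easy to confirm that the keyword line, memory and processor settings are the intended ones before submitting jobs.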

###### Bonus 1: using ORCA instead of Gaussian in QPREP

In [3]:
# Only the qm_input and program options need to change.
# Multiple lines are allowed in qm_input. For example, this is the input of a TS calculation:
ORCA_input = 'BP86 def2-SVP def2/J\n'
ORCA_input += '%geom\n'
ORCA_input += 'Calc_Hess true\n'
ORCA_input += 'Recalc_Hess 5\n'
ORCA_input += 'end'

qprep(destination=com_path,files=sdf_rdkit_files,program='orca',
        qm_input=ORCA_input,mem='4GB',nprocs=8)

True
AQME v 1.4.0 2022/12/07 14:39:14 
Citation: AQME v 1.4.0, Alegre-Requena, J. V.; Sowndarya, S.; Perez-Soto, R.; Alturaifi, T. M.; Paton, R. S., 2022. https://github.com/jvalegre/aqme


o  quinine_rdkit successfully processed at /home/svss/Project-DBcg-Molecules/aqme/Example_workflows/CSEARCH_CMIN_conformer_generation/quinine/com_files


Time QPREP: 0.04 seconds




<aqme.qprep.qprep at 0x7f2dbd566350>
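
For longer ORCA inputs, the multi-line keyword string may be easier to maintain as a list of lines joined with newlines. This is only a stylistic alternative to the `+=` concatenation in the cell above; it produces the same string and uses no AQME functions.

```python
# One entry per line of the ORCA keyword section
orca_lines = [
    "BP86 def2-SVP def2/J",
    "%geom",
    "Calc_Hess true",
    "Recalc_Hess 5",
    "end",
]

# Join with newlines; no trailing newline, matching the concatenated version
ORCA_input = "\n".join(orca_lines)
print(ORCA_input)
```

The resulting `ORCA_input` can be passed to `qprep` through `qm_input` exactly as before.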

###### Bonus 2: using a CSV with many SMILES

In [4]:
import os, glob
from pathlib import Path
from aqme.csearch import csearch
from aqme.qprep import qprep

# Ideal for ML or big data projects: just replace smi and name with a CSV input
csv_input = 'ML_test.csv'
sdf_folder = 'ML_test'
w_dir_main = Path(os.getcwd())
sdf_path = w_dir_main.joinpath(sdf_folder)

# create conformers for all the entries in the CSV
csearch(destination=sdf_path,program='rdkit',input=csv_input)

# set SDF filenames and directory where the new com files will be created
com_path = sdf_path.joinpath('com_files')
sdf_rdkit_files = glob.glob(f'{sdf_path}/*.sdf')

# create COM files
qprep(destination=com_path,files=sdf_rdkit_files,program='gaussian',
        qm_input='wb97xd/6-31+G* opt freq',mem='24GB',nprocs=8)

True
AQME v 1.4.0 2022/12/07 14:39:14 
Citation: AQME v 1.4.0, Alegre-Requena, J. V.; Sowndarya, S.; Perez-Soto, R.; Alturaifi, T. M.; Paton, R. S., 2022. https://github.com/jvalegre/aqme



Starting CSEARCH with 5 job(s) (SDF, XYZ, CSV, etc. files might contain multiple jobs/structures inside)



   ----- Me -----


o  Applying filters to initial conformers


   ----- Et -----


o  Applying filters to initial conformers


   ----- Prop -----


o  Applying filters to initial conformers


   ----- Bu -----


o  Applying filters to initial conformers


   ----- Pent -----


o  Applying filters to initial conformers


Time CSEARCH: 1.44 seconds


True
AQME v 1.4.0 2022/12/07 14:39:14 
Citation: AQME v 1.4.0, Alegre-Requena, J. V.; Sowndarya, S.; Perez-Soto, R.; Alturaifi, T. M.; Paton, R. S., 2022. https://github.com/jvalegre/aqme


o  Me_rdkit successfully processed at /home/svss/Project-DBcg-Molecules/aqme/Example_workflows/CSEARCH_CMIN_conformer_generation/ML_test/com_files

o  Et_rdki

<aqme.qprep.qprep at 0x7f2b98fef210>
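
The contents of `ML_test.csv` are not shown in this notebook. As a sketch of how such an input could be prepared, the block below writes a small CSV with the standard library. The column names `code_name` and `SMILES` are an assumption based on the AQME documentation (check them against your installed version), and the file name `example_input.csv` and the three molecules are hypothetical, not the entries used above.

```python
import csv

# Hypothetical entries for illustration; these are NOT the contents of ML_test.csv
molecules = [
    {"code_name": "ethanol", "SMILES": "CCO"},
    {"code_name": "propane", "SMILES": "CCC"},
    {"code_name": "benzene", "SMILES": "c1ccccc1"},
]


def write_smiles_csv(path, rows):
    """Write one molecule per row; column names are assumed, see the note above."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["code_name", "SMILES"])
        writer.writeheader()
        writer.writerows(rows)


write_smiles_csv("example_input.csv", molecules)
```

The resulting file would then be passed to CSEARCH through the `input` option, as in the cell above.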

###### Bonus 3: running the same functions with a YAML file that stores all the variables

In [None]:
# to load the variables from a YAML file, use the varfile option
csearch(varfile='FILENAME.yaml')

# for each option, specify it in the YAML file as follows:
# program='rdkit' --> program: 'rdkit'
# name='quinine' --> name: 'quinine'
# etc
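
The `key: 'value'` mapping described above can also be generated from a Python dictionary. The following is a minimal sketch that writes such a file with the standard library only; the helper `write_varfile` and the file name `csearch_options.yaml` are hypothetical, and the sketch handles only flat string and numeric options.

```python
def write_varfile(path, options):
    """Write each option as a 'key: value' line, single-quoting strings."""
    lines = []
    for key, value in options.items():
        if isinstance(value, str):
            # In single-quoted YAML strings, a literal ' is escaped by doubling it
            escaped = value.replace("'", "''")
            lines.append(f"{key}: '{escaped}'")
        else:
            lines.append(f"{key}: {value}")
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")


options = {
    "program": "rdkit",
    "name": "quinine",
    "smi": "COC1=CC2=C(C=CN=C2C=C1)[C@H]([C@@H]3C[C@@H]4CCN3C[C@@H]4C=C)O",
}
write_varfile("csearch_options.yaml", options)
```

Quoting the SMILES string matters here, because characters such as `[`, `]` and `@` have special meaning in unquoted YAML values.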

###### Bonus 4: using the same functions from the command line

In [None]:
csearch(destination=sdf_path,smi=smi,name='quinine',program='rdkit')

# for each option, specify it in the command line as follows:
# program='rdkit' --> --program rdkit
# name='quinine' --> --name quinine
# etc
# quote the SMILES string so the shell does not interpret its parentheses and brackets
# for example: python -m aqme --csearch --program rdkit --smi "COC1=CC2=C(C=CN=C2C=C1)[C@H]([C@@H]3C[C@@H]4CCN3C[C@@H]4C=C)O" --name quinine
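
SMILES strings contain characters such as parentheses that a shell treats as syntax, so the command line needs careful quoting. The sketch below assembles the command from keyword options and lets `shlex.join` (Python 3.8+) do the quoting. The helper `build_aqme_command` is hypothetical, and the `--csearch` module selector is an assumption taken from the AQME command-line documentation; verify it against your installed version.

```python
import shlex


def build_aqme_command(module_flag, **options):
    """Turn keyword options into a shell-safe 'python -m aqme ...' command string."""
    args = ["python", "-m", "aqme", module_flag]
    for key, value in options.items():
        args += [f"--{key}", str(value)]
    # shlex.join quotes every argument that contains shell-special characters
    return shlex.join(args)


smi = "COC1=CC2=C(C=CN=C2C=C1)[C@H]([C@@H]3C[C@@H]4CCN3C[C@@H]4C=C)O"
command = build_aqme_command("--csearch", program="rdkit", smi=smi, name="quinine")
print(command)
```

The printed string can be pasted into a POSIX shell. To run it from Python instead, passing the unjoined argument list to `subprocess.run` avoids the quoting step altogether.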