Estimating Nucleophilicity and Electrophilicity with Automated Quantum Chemistry-Based Computations of Methyl Affinities

jensengroup/ESNUEL

ESNUEL (EStimating NUcleophilicity and ELectrophilicity) is a fully automated quantum chemistry (QM)-based workflow that identifies nucleophilic and electrophilic sites and computes methyl cation affinities (MCAs) and methyl anion affinities (MAAs) to estimate nucleophilicity and electrophilicity, respectively.

TRY ESNUEL: https://www.esnuel.org

GitHub repository for our atom-based ML models: https://github.com/jensengroup/ESNUEL_ML

Installation

We recommend using conda to install all the necessary dependencies:

conda env create -f environment.yml && conda activate esnuel

Then download the binaries of xtb version 6.5.1:

mkdir dep; cd dep; wget https://github.com/grimme-lab/xtb/releases/download/v6.5.1/xtb-6.5.1-linux-x86_64.tar.xz; tar -xvf ./xtb-6.5.1-linux-x86_64.tar.xz; cd ..

Furthermore, ORCA version 5.0.1 must be installed following the instructions found here: https://sites.google.com/site/orcainputlibrary/setting-up-orca

Note:

  1. The path to ORCA must be modified in "src/esnuel/run_orca.py".
  2. The number of available CPUs and memory must be modified to match your hardware.

Usage

An example of usage via CLI command:

# Create predictions for a test molecule (NOTE: only names without "_" are allowed):
python src/esnuel/calculator.py --smiles 'Cn1c(C(C)(C)N)nc(C(=O)NCc2ccc(F)cc2)c(O)c1=O' --name 'testmol' &

The calculations are saved in a "./calculations" folder along with a graphical output of the results (in .html format). The graphical output highlights the most electrophilic and nucleophilic sites, i.e. those within 3 kcal/mol (≈ 12.6 kJ/mol) of the top-ranked site.
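The 3 kcal/mol highlighting window can be illustrated with a minimal sketch. It assumes that a larger affinity marks a more reactive site; the function name, the atom indices, and the MCA values below are hypothetical and are not taken from the ESNUEL code base.

```python
KCAL_TO_KJ = 4.184  # 1 kcal = 4.184 kJ (thermochemical calorie)


def sites_within_window(affinities, window_kcal=3.0):
    """Return the sites whose affinity lies within `window_kcal` of the largest one.

    Assumption: a larger affinity means a more reactive site.
    """
    best = max(affinities.values())
    return sorted(
        site for site, value in affinities.items() if best - value <= window_kcal
    )


# Hypothetical MCA values (kcal/mol) keyed by atom index, for illustration only.
example_mcas = {3: 210.4, 7: 208.1, 12: 201.9}

highlighted = sites_within_window(example_mcas)
window_kj = 3.0 * KCAL_TO_KJ

print(highlighted)          # [3, 7]
print(round(window_kj, 1))  # 12.6
```

Atom 12 lies 8.5 kcal/mol below the top-ranked site and is therefore left out of the highlighted set.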

An example of using ESNUEL in batch mode:

# Create predictions for a small dataset (example/testmols.csv):
python src/esnuel/calculator.py -b example/testmols.csv -n 'testmols'

The calculations are saved in a "./calculations" folder, and a dataframe containing the results is found in "submitit_results/testmols/*_result.pkl".
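One way to collect the pickled results after a batch run is sketched below. The helper name `load_results` and the wildcard pattern are assumptions, and the layout of the stored objects is not documented here, so inspect them before relying on specific columns. Unpickling a dataframe requires pandas to be installed in the active environment, and pickle files should only be loaded from trusted sources.

```python
import glob
import pickle


def load_results(pattern="submitit_results/*/*_result.pkl"):
    """Load every pickled result file matching `pattern`, keyed by file path."""
    results = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as handle:
            results[path] = pickle.load(handle)
    return results


if __name__ == "__main__":
    # Prints the path and Python type of each result object that was found.
    for path, result in load_results().items():
        print(path, type(result))
```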

The SLURM commands can be modified via the following command line arguments:

  • '--partition': The SLURM partition that you submit to, default='kemi1'.
  • '--parallel_calcs': The number of parallel molecule calculations (the total number of CPU cores requested for each SLURM job = parallel_calcs*cpus_per_calc), default=2.
  • '--cpus_per_calc': The number of CPU cores per molecule calculation (the total number of CPU cores requested for each SLURM job = parallel_calcs*cpus_per_calc), default=4.
  • '--mem_gb': The total memory usage in gigabytes (gb) requested for each SLURM job, default=20.
  • '--timeout_min': The total allowed duration in minutes (min) of each SLURM job, default=6000.
  • '--slurm_array_parallelism': The maximum number of parallel SLURM jobs to run simultaneously (taking one molecule at a time in batch mode), default=25.
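To see what a given set of arguments requests from the cluster, the arithmetic described above can be sketched as follows. The function names are hypothetical and only restate the formula from the list (cores per job = parallel_calcs*cpus_per_calc); the peak figure assumes that every allowed SLURM job is running at once.

```python
def cores_per_job(parallel_calcs=2, cpus_per_calc=4):
    """CPU cores requested for a single SLURM job."""
    return parallel_calcs * cpus_per_calc


def peak_cores(parallel_calcs=2, cpus_per_calc=4, slurm_array_parallelism=25):
    """Upper bound on cores in use when all allowed jobs run simultaneously."""
    return cores_per_job(parallel_calcs, cpus_per_calc) * slurm_array_parallelism


def mem_gb_per_core(mem_gb=20, parallel_calcs=2, cpus_per_calc=4):
    """Memory available per core within one job, in gigabytes."""
    return mem_gb / cores_per_job(parallel_calcs, cpus_per_calc)


# With the default values listed above:
print(cores_per_job())    # 8
print(peak_cores())       # 200
print(mem_gb_per_core())  # 2.5
```

With the defaults, a full batch can therefore occupy up to 200 cores, so it may be worth lowering '--slurm_array_parallelism' on a shared partition.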

For the QM calculations, the molecular charge is defined by the formal charge of the molecule using RDKit, and the spin is hardcoded to S=0 (multiplicity=1), as we focus on closed-shell molecules. This can be modified in the "calculateEnergy" function in "src/esnuel/calculator.py".

Citation

Our work is published open access in Digital Discovery, where more information is available.

@article{ree2024esnuel,
  title = {Automated quantum chemistry for estimating nucleophilicity and electrophilicity with applications to retrosynthesis and covalent inhibitors},
  ISSN = {2635-098X},
  url = {http://dx.doi.org/10.1039/D3DD00224A},
  DOI = {10.1039/d3dd00224a},
  journal = {Digital Discovery},
  publisher = {Royal Society of Chemistry (RSC)},
  author = {Nicolai Ree and Andreas H. G\"{o}ller and Jan H. Jensen},
  year = {2024}
}
