benchmarking

This repository benchmarks the force fields generated by my valence-fitting repo using Matt’s ibstore package.

Usage

Environment

Initialize the conda environment with

mamba env create -f env.yaml

Then cd to wherever you cloned ibstore and run

pip install -e .

to add the ibstore package to your environment.
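
Putting these steps together, a full setup might look like the following. The environment name benchmarking and the ibstore path are assumptions; check the name field in env.yaml and use whatever directory you cloned ibstore into.

mamba env create -f env.yaml
mamba activate benchmarking   # assumed environment name; see the name field in env.yaml
cd path/to/ibstore            # wherever you cloned ibstore
pip install -e .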

main.py

The central functionality can be accessed by running main.py directly:

python main.py

This is the same as passing the following values for each of the flags:

python main.py \
    --forcefield force-field.offxml \
    --dataset datasets/industry.json \
    --sqlite-file tmp.sqlite \
    --out-dir . \
    --procs 16

In both cases, the forcefield to benchmark is taken from force-field.offxml in the current directory; the dataset is the charge-filtered version of Lily’s copy of the OpenFF Industry Benchmark Season 1 v1.0 in the datasets directory; the molecule database is stored in a file named tmp.sqlite; and the output CSV and PNG files are written to the current directory.
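
For example, to benchmark one of the stock force fields from the forcefields directory with its own database and output directory, you might run something like this (the paths here are illustrative, not prescribed by main.py):

mkdir -p output/industry/sage-2.1.0
python main.py \
    --forcefield forcefields/sage-2.1.0.offxml \
    --dataset datasets/industry.json \
    --sqlite-file output/industry/sage-2.1.0/tmp.sqlite \
    --out-dir output/industry/sage-2.1.0 \
    --procs 16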

Makefile

The Makefile can automate this process and stitch the resulting images together with ImageMagick, using something like:

make output/industry/out.png

More creatively, you can run the industry benchmarks on a custom forcefield with something like:

make output/industry/sage/out.png TARGET=sage

This looks for a forcefield named sage.offxml in the root directory, then runs main.py and the ImageMagick commands to generate the final output. It looks a bit repetitive, but, for now at least, the subdirectory in output/industry/* and the TARGET variable must match. This also works for any other dataset in the datasets directory, for example:

make output/full-opt/sage/out.png TARGET=sage
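
For reference, a target like output/industry/sage/out.png corresponds roughly to the following commands; the intermediate file names and the exact ImageMagick invocation are assumptions, not taken verbatim from the Makefile:

mkdir -p output/industry/sage
python main.py \
    --forcefield sage.offxml \
    --dataset datasets/industry.json \
    --sqlite-file output/industry/sage/tmp.sqlite \
    --out-dir output/industry/sage \
    --procs 16
# combine the individual plots into one summary image (assumed montage usage)
montage output/industry/sage/*.png -geometry +2+2 output/industry/sage/out.png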

Slurm

Similarly, scripts/industry.sh simply calls the make command above, after activating the conda environment from env.yaml. So if everything is set up, you should be able to run

sbatch scripts/industry.sh

and come back around 24 hours later to a summary image like the one shown in the Results section below.
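
As a rough sketch, assuming typical Slurm directives and the environment name from env.yaml (both assumptions), scripts/industry.sh amounts to something like:

#!/bin/bash
#SBATCH --job-name=industry-bench   # assumed directives; adjust for your cluster
#SBATCH --ntasks=16
#SBATCH --time=24:00:00

source ~/.bashrc                    # or however mamba/conda is initialized on the node
mamba activate benchmarking         # assumed environment name from env.yaml

make output/industry/out.png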

Results

OpenFF Full Optimization Benchmark 1

output/full-opt/out.png

OpenFF Industry Benchmark Season 1 v1.0

output/industry/out.png

Files

| Dir             | File               | Purpose                                                    |
|-----------------|--------------------|------------------------------------------------------------|
| .               | main.py            | Benchmarking script using ibstore                          |
| .               | refilter.py        | Script to refilter the industry dataset for charge issues  |
| .               | env.yaml           | Conda environment to run the script                        |
| forcefields     | tm.offxml          | FB-optimized, really-filtered torsion-multiplicity FF      |
| forcefields     | sage-tm.offxml     | FB-optimized Sage 2.1.0 with torsion-multiplicity data     |
| forcefields     | sage-2.1.0.offxml  | Sage 2.1.0 dumped from the toolkit                         |
| forcefields     | eps-tors-10.offxml | Espaloma torsion values with Δ > 10.0 kcal/mol             |
| forcefields     | sage-sage.offxml   | FB-optimized Sage 2.1.0 with “original” Sage data          |
| sage            | env.yaml           | Conda environment from Sage 2.1.0                          |
| sage            | 01-setup.py        | Setup script from openff-sage                              |
| sage            | 02-b-minimize.py   | Minimize all the structures, also from openff-sage         |
| scripts         | fetch_industry.sh  | Try to download the industry dataset (not working)         |
| scripts         | industry.sh        | Run the benchmarks on the industry dataset                 |
| scripts         | refilter.sh        | Refilter the industry dataset                              |
| scripts         | submit.sh          | Run the benchmarks on the full-opt dataset                 |
| full-opt-output | *                  | Benchmark output on the full-opt dataset                   |

Changelog

  • 2024-05-01 cp /pub/amcisaac/sage-2.2.0/05_benchmark_forcefield/datasets/OpenFF-Industry-Benchmark-Season-1-v1.1-filtered-charge-coverage-cache.json datasets/cache/industry.json
    • copied Lexie’s re-filtered, cached dataset over my previous cache