This repository is for benchmarking the force fields generated by my valence-fitting repo with Matt’s `ibstore` package.
Initialize the conda environment with

```shell
mamba env create -f env.yaml
```
Then `cd` to wherever you cloned `ibstore` and run

```shell
pip install -e .
```

to add the `ibstore` package to your environment.
The central functionality can be accessed by running `main.py` directly:

```shell
python main.py
```
This is the same as passing the following values for each of the flags:

```shell
python main.py \
    --forcefield force-field.offxml \
    --dataset datasets/industry.json \
    --sqlite-file tmp.sqlite \
    --out-dir . \
    --procs 16
```
In both cases, the force field to benchmark is read from `force-field.offxml` in the current directory, the dataset is the charge-filtered version of Lily’s version of the OpenFF Industry Benchmark Season 1 v1.0 in the `datasets` directory, the molecule database is stored in a file named `tmp.sqlite`, and the output CSV and PNG files are written to the current directory.
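For reference, the defaults above could be declared with `argparse` roughly as follows. This is a hypothetical sketch, not the actual contents of `main.py`: the flag names and default values come from this README, while the help strings, the `int` type for `--procs`, and the `build_parser` name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of main.py's command-line interface.

    Flag names and defaults follow the README; help text and types are assumed.
    """
    parser = argparse.ArgumentParser(description="Benchmark a force field with ibstore")
    parser.add_argument("--forcefield", default="force-field.offxml",
                        help="force field (.offxml) to benchmark")
    parser.add_argument("--dataset", default="datasets/industry.json",
                        help="dataset JSON file")
    parser.add_argument("--sqlite-file", default="tmp.sqlite",
                        help="file holding the molecule database")
    parser.add_argument("--out-dir", default=".",
                        help="directory for the output CSV and PNG files")
    # Assumed to be the number of parallel worker processes.
    parser.add_argument("--procs", type=int, default=16,
                        help="number of processes to use")
    return parser
```

Calling `build_parser().parse_args([])` returns the defaults, and `argparse` exposes `--sqlite-file` and `--out-dir` as `args.sqlite_file` and `args.out_dir`.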
The Makefile can automate this process and also stitch the resulting images together with ImageMagick, using something like:

```shell
make output/industry/out.png
```
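The stitching step can be pictured with a short Python sketch that builds an ImageMagick `montage` command from the PNG files in an output directory. The helper names, the two-column layout, and the choice of `montage` (rather than, say, `convert ... +append`) are assumptions; the Makefile's actual ImageMagick commands may differ.

```python
from pathlib import Path


def pngs_in(out_dir: str, exclude: str = "out.png") -> list[str]:
    """Collect the PNGs in out_dir, skipping the stitched summary itself (hypothetical helper)."""
    return sorted(str(p) for p in Path(out_dir).glob("*.png") if p.name != exclude)


def stitch_command(pngs: list[str], out: str, columns: int = 2) -> list[str]:
    """Build an ImageMagick `montage` argv that tiles `pngs` into a single image.

    Hypothetical helper: the repository's Makefile may use other commands or options.
    """
    if not pngs:
        raise ValueError("no PNG files to stitch")
    return [
        "montage",
        *pngs,
        "-tile", f"{columns}x",  # `columns` images per row, as many rows as needed
        "-geometry", "+0+0",     # keep each image's size, no padding between tiles
        out,
    ]
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)` on a machine with ImageMagick installed.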
More flexibly, you can run the industry benchmarks on a custom force field with something like:

```shell
make output/industry/sage/out.png TARGET=sage
```
This looks for a force field named `sage.offxml` in the root directory and runs `main.py` followed by the ImageMagick commands that generate the final output. It looks a bit repetitive, but, for now at least, the subdirectory name under `output/industry/` and the `TARGET` variable must be the same. This also works for any other dataset in the `datasets` directory, for example:

```shell
make output/full-opt/sage/out.png TARGET=sage
```
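To make the naming convention concrete, here is a small Python sketch that maps a make goal to the inputs it implies. The helper `paths_for_target` is hypothetical and is not taken from the Makefile; it assumes each dataset lives at `datasets/<name>.json`, which matches `datasets/industry.json` but is an assumption for the other datasets.

```python
from pathlib import PurePosixPath


def paths_for_target(make_target: str, target: str) -> dict[str, str]:
    """Map a goal like output/<dataset>/<TARGET>/out.png to its inputs (hypothetical helper)."""
    parts = PurePosixPath(make_target).parts
    # Expect exactly: ("output", dataset, target_dir, "out.png")
    if len(parts) != 4 or parts[0] != "output" or parts[3] != "out.png":
        raise ValueError(f"unexpected make target: {make_target}")
    dataset, target_dir = parts[1], parts[2]
    # The README requires the output subdirectory name and TARGET to match.
    if target_dir != target:
        raise ValueError(f"directory {target_dir!r} does not match TARGET={target!r}")
    return {
        "forcefield": f"{target}.offxml",       # force field in the repository root
        "dataset": f"datasets/{dataset}.json",  # assumed dataset file layout
        "out_dir": f"output/{dataset}/{target}",
    }
```

For example, `paths_for_target("output/full-opt/sage/out.png", "sage")` yields `sage.offxml`, `datasets/full-opt.json`, and `output/full-opt/sage`, while a mismatched `TARGET` raises a `ValueError` instead of silently benchmarking the wrong force field.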
Similarly, `scripts/industry.sh` simply calls the `make` command above after activating the conda environment from `env.yaml`. So, if everything is set up, you should be able to run

```shell
sbatch scripts/industry.sh
```

and come back around 24 hours later to a summary image like the one shown in the Results section below.
| Dir | File | Purpose |
|---|---|---|
| . | main.py | Benchmarking script using ibstore |
| | refilter.py | Script to refilter the industry dataset for charge issues |
| | env.yaml | Conda environment to run the script |
| forcefields | tm.offxml | FB-optimized, really-filtered torsion-multiplicity FF |
| | sage-tm.offxml | FB-optimized Sage 2.1.0 with torsion-multiplicity data |
| | sage-2.1.0.offxml | Sage 2.1.0 dumped from the toolkit |
| | eps-tors-10.offxml | Espaloma torsion values with Δ > 10.0 kcal/mol |
| | sage-sage.offxml | FB-optimized Sage 2.1.0 with “original” Sage data |
| sage | env.yaml | Conda environment from Sage 2.1.0 |
| | 01-setup.py | Setup script from openff-sage |
| | 02-b-minimize.py | Minimize all the structures, also from openff-sage |
| scripts | fetch_industry.sh | Try to download the industry dataset (not working) |
| | industry.sh | Run the benchmarks on the industry dataset |
| | refilter.sh | Refilter the industry dataset |
| | submit.sh | Run the benchmarks on the full-opt dataset |
| full-opt-output | * | Benchmark output on the full-opt dataset |
- 2024-05-01: copied Lexie’s re-filtered, cached dataset over my previous cache with `cp /pub/amcisaac/sage-2.2.0/05_benchmark_forcefield/datasets/OpenFF-Industry-Benchmark-Season-1-v1.1-filtered-charge-coverage-cache.json datasets/cache/industry.json`