This release contains the minimal code needed to run the paper's three benchmark families with our method:
- NewtonBench
- ActiveSciBench-Chem
- ActiveSciBench-GRN
The packaged tree includes:
- autoscilab/: method, loops, acquisition, and oracle implementations
- configs/: fixed manifests used by the benchmark runners
- newtonbench_vendor/: minimal vendored Newton benchmark modules required by the Newton oracle
- scripts/: benchmark entry points plus a convenience launcher
Use Python 3.11 or newer.
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Required environment variables:
- OPENAI_API_KEY: required for the default gpt-4o-mini runs
- TOGETHER_API_KEY: required only if you use Together-hosted non-OpenAI models
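The requirements above (Python version and API keys) can be checked before launching a run. The following is a minimal sketch, not part of the release: the function name `missing_prerequisites` and the `use_together` switch are hypothetical and exist only for illustration.

```python
import os
import sys
from collections.abc import Mapping


def missing_prerequisites(
    env: Mapping[str, str],
    version: tuple[int, int],
    use_together: bool = False,
) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    if version < (3, 11):
        problems.append(f"Python 3.11+ required, found {version[0]}.{version[1]}")
    # OPENAI_API_KEY is needed for the default gpt-4o-mini runs.
    required = ["OPENAI_API_KEY"]
    # TOGETHER_API_KEY only matters for Together-hosted non-OpenAI models.
    if use_together:
        required.append("TOGETHER_API_KEY")
    for name in required:
        if not env.get(name):
            problems.append(f"{name} is not set")
    return problems


if __name__ == "__main__":
    # Report problems on stderr without aborting, so this stays a soft check.
    for problem in missing_prerequisites(os.environ, sys.version_info[:2]):
        print(f"preflight: {problem}", file=sys.stderr)
```

Passing the environment and version in as arguments keeps the check easy to test; an empty string is treated the same as an unset variable.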
Optional overrides:
- --main-url: point the main model at an OpenAI-compatible local or remote endpoint
- --ensemble-url: ChemBench ensemble endpoint, default http://localhost:8001/v1
The simplest interface is the wrapper:
python scripts/run_all_benchmarks.py --benchmark all --workers 1 --limit 1

That runs a small manifest slice from all three benchmark families and writes results under:
results/paper_release_runs/
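If you prefer to drive the wrapper from Python rather than a shell, the command line can be assembled as an argument list. This is a minimal sketch: the helper `wrapper_command` is hypothetical, and it uses only the `--benchmark`, `--workers`, and `--limit` flags and the four `--benchmark` values that appear in this README.

```python
import shlex
import sys
from typing import Optional

# Values passed to --benchmark in the commands shown in this README.
FAMILIES = ("all", "newton", "chem", "grn")


def wrapper_command(
    benchmark: str, workers: int = 1, limit: Optional[int] = None
) -> list[str]:
    """Build the argv list for scripts/run_all_benchmarks.py."""
    if benchmark not in FAMILIES:
        raise ValueError(f"unknown benchmark family: {benchmark!r}")
    cmd = [
        sys.executable,
        "scripts/run_all_benchmarks.py",
        "--benchmark", benchmark,
        "--workers", str(workers),
    ]
    # --limit is appended only when a manifest slice is requested.
    if limit is not None:
        cmd += ["--limit", str(limit)]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(wrapper_command("all", workers=1, limit=1)))
```

To execute the command, pass the returned list to `subprocess.run(..., check=True)` from the release root so that the relative script path resolves.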
Run a single family:
python scripts/run_all_benchmarks.py --benchmark newton --workers 1
python scripts/run_all_benchmarks.py --benchmark chem --workers 1
python scripts/run_all_benchmarks.py --benchmark grn --workers 1

Newton:
python scripts/run_newton_llm_autoscilab_budget.py \
--model gpt-4o-mini \
--budgets 10 20 50 \
--workers 4

Chem:
python scripts/run_chembench_llm_autoscilab_budget.py \
--main-model gpt-4o-mini \
--budgets 40 60 80 \
--workers 4

GRN:
python scripts/run_grn_prompt_budget.py \
--main-model gpt-4o-mini \
--budgets 10 20 50 \
--workers 4

- The Newton oracle expects the bundled newtonbench_vendor/ directory to remain adjacent to autoscilab/.
- The packaged ChemBench default path runs without requiring a local ensemble server. The --ensemble-url flag only matters if you explicitly enable or adapt an ensemble-backed configuration.
- Results are written as JSON summaries in benchmark-specific subdirectories under results/.
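Since results are JSON summaries in subdirectories of results/, they can be collected with a short recursive scan. The sketch below is an illustration only: the summary schema is not described here, so it assumes nothing about the fields and reports only the top-level shape of each file. The function name `load_summaries` is hypothetical.

```python
import json
from pathlib import Path


def load_summaries(results_root: str = "results") -> dict[str, object]:
    """Map each JSON file path under results_root to its parsed content."""
    summaries = {}
    for path in sorted(Path(results_root).rglob("*.json")):
        try:
            summaries[str(path)] = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            # Skip files that are partially written or are not valid JSON.
            continue
    return summaries


if __name__ == "__main__":
    for name, data in load_summaries().items():
        # Report top-level keys for objects, otherwise just the value's type.
        shape = sorted(data) if isinstance(data, dict) else type(data).__name__
        print(name, shape)
```

Run it from the release root after a benchmark run; if results/ does not exist yet, the scan finds nothing and prints nothing.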
The packaged runners were smoke-tested from this release tree by checking the CLI entry points and the wrapper launcher:
python scripts/run_newton_llm_autoscilab_budget.py --help
python scripts/run_chembench_llm_autoscilab_budget.py --help
python scripts/run_grn_prompt_budget.py --help
python scripts/run_all_benchmarks.py --help

For a lightweight live run, use:
python scripts/run_all_benchmarks.py --benchmark all --workers 1 --limit 1