How to test HELM locally

Install the NeurIPS client

Install HELM: pip install git+https://github.com/stanford-crfm/helm.git

Set up an HTTP server

Follow the instructions in toy-submission to set up a simple HTTP server that you can use for local tests.
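Before moving on, it can help to confirm that the server actually answers requests. The sketch below wraps a single curl call in a helper function. The function name check_local_server is hypothetical, and the default URL (http://localhost:8080/process) and the JSON payload with a "prompt" field are assumptions about how the toy-submission server is set up; adjust them to match your own server.

```shell
# Hypothetical helper: send one test prompt to the local HTTP server.
# ASSUMPTIONS: the server listens on port 8080 and exposes a POST /process
# endpoint that accepts a JSON body with a "prompt" field.
check_local_server() {
    url="${1:-http://localhost:8080/process}"
    curl --fail --silent --show-error \
        -X POST \
        -H "Content-Type: application/json" \
        -d '{"prompt": "The capital of France is "}' \
        "$url"
}

# Usage (run once your server is up):
#   check_local_server
#   check_local_server http://localhost:9000/process
```

A non-zero exit status means the request failed, which usually points at the server not running or listening on a different port.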

Configure HELM

You can configure which datasets HELM runs on by editing a run_specs.conf file. To run your model on a large set of datasets, take a look at https://github.com/stanford-crfm/helm/blob/main/src/helm/benchmark/presentation/run_specs_lite.conf for some inspiration.

Here's how you can create a simple one to make sure that your Dockerfile works:

echo 'entries: [{description: "mmlu:model=neurips/local,subject=college_computer_science", priority: 4}]' > run_specs.conf
helm-run --conf-paths run_specs.conf --suite v1 --max-eval-instances 1000
helm-summarize --suite v1
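If you want to smoke-test more than one scenario, the same file can hold several entries. The following is a minimal sketch that writes a two-entry run_specs.conf with a heredoc. The second subject (abstract_algebra) is only an illustrative choice and is an assumption, not something this guide prescribes; the entry format is copied from the one-line example above.

```shell
# Write a run_specs.conf with two MMLU entries, one per line.
# The quoted 'EOF' delimiter keeps the shell from expanding anything inside.
# ASSUMPTION: abstract_algebra is used purely as an example second subject.
cat > run_specs.conf <<'EOF'
entries: [
  {description: "mmlu:model=neurips/local,subject=college_computer_science", priority: 4}
  {description: "mmlu:model=neurips/local,subject=abstract_algebra", priority: 4}
]
EOF

# Show the generated file so you can check it before running HELM.
cat run_specs.conf
```

After the file is written, run the same helm-run and helm-summarize commands as above against it.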

Analyze your results

You can launch a web server to visually inspect the results of your run. helm-summarize can also print the results as text in your terminal, but we've found the web server to be useful.

helm-server

This will launch a server on your localhost. If you're working on a remote machine, you might need to set up port forwarding. If everything worked correctly, you should see a page showing the results of your run.
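For the remote-machine case, one common approach is an SSH local port forward. The sketch below defines a small helper for it. The function name forward_helm_port and the login user@remote-host are hypothetical, and port 8000 is an assumption about where helm-server listens; check the port printed when the server starts and pass it explicitly if it differs.

```shell
# Hypothetical helper: forward a local port to the same port on a remote
# machine over SSH, so the remote helm-server is reachable from a local browser.
# ASSUMPTION: helm-server listens on port 8000 unless told otherwise.
forward_helm_port() {
    remote="$1"          # e.g. user@remote-host (placeholder)
    port="${2:-8000}"    # port to forward, defaults to the assumed 8000
    # -N: do not run a remote command, only forward the port.
    # -L: bind local $port to localhost:$port as seen from the remote machine.
    ssh -N -L "${port}:localhost:${port}" "$remote"
}

# Usage (run on your local machine, not on the remote one):
#   forward_helm_port user@remote-host
#   forward_helm_port user@remote-host 8001
```

While the forward is running, opening http://localhost:8000 in a local browser should reach the server on the remote machine.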
