Commit

Save model predictions for a variety of scenarios for later analysis. (#1772)

* Save model predictions for a variety of scenarios for later analysis.

* use Jupyter to compare data dumped into computed_scenarios.json

---------

Co-authored-by: Zachary McCord <zjmccord@gmail.com>
zack112358 and zackmccord committed Jan 31, 2023
1 parent 6eb8941 commit 3a2f4b0
Showing 6 changed files with 627 additions and 0 deletions.
10 changes: 10 additions & 0 deletions .gitignore
@@ -48,3 +48,13 @@ __pycache__/

# emacs
\#*

# vim
**.swp
**.swo

# test output
computed_scenarios*.json

# jupyter
.ipynb_checkpoints
32 changes: 32 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,32 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v2.3.0
    hooks:
      - id: check-yaml
      - id: end-of-file-fixer
      - id: trailing-whitespace
  - repo: https://github.com/psf/black
    rev: 22.3.0
    hooks:
      - id: black
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: 'v0.991'
    hooks:
      - id: mypy
        additional_dependencies:
          - types-python-dateutil
          - types-requests
  - repo: local
    hooks:
      - id: jupyter-nb-clear-output
        name: jupyter-nb-clear-output
        files: \.ipynb$
        stages: [commit]
        language: system
        entry: python -m nbconvert --ClearOutputPreprocessor.enabled=True --inplace
  - repo: https://github.com/nbQA-dev/nbQA
    rev: 1.6.1
    hooks:
      - id: nbqa-black
        additional_dependencies: [black==22.3.0]
      - id: nbqa-isort
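The local `jupyter-nb-clear-output` hook asks nbconvert to strip outputs from notebooks before they are committed, which keeps notebook diffs small. As a rough illustration only, the following standard-library sketch approximates that effect on a notebook's JSON. The `clear_outputs` helper and the example notebook are hypothetical, and nbconvert's actual preprocessor may do more (for example, removing some cell metadata).

```python
import json


def clear_outputs(notebook: dict) -> dict:
    """Return a copy of a notebook dict with code-cell outputs removed."""
    # Round-tripping through JSON is a simple deep copy for JSON-only data.
    cleaned = json.loads(json.dumps(notebook))
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Hypothetical two-cell notebook, for illustration only.
example = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hi\n"]}
            ],
            "source": ["print('hi')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned = clear_outputs(example)
print(cleaned["cells"][1]["outputs"])          # []
print(cleaned["cells"][1]["execution_count"])  # None
```

The input dict is left untouched, so the sketch can be run against a loaded notebook without risk of modifying it in memory.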
6 changes: 6 additions & 0 deletions README.md
@@ -77,3 +77,9 @@ $ yarn fix
```

If there are errors it can't fix, please fix them manually before committing.

## Validating model changes
One of the tests generates `computed_scenarios.json`, a JSON file of inputs and
outputs for a variety of scenarios. Use
[`model_comparison.ipynb`](model_comparison.ipynb) to compare the model's
behavior before and after a change.
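For a quick check without Jupyter, the core of the comparison can be sketched with the standard library alone. This is a minimal sketch, not the notebook's implementation. It assumes each record carries the `scenario`, `loc`, and `vaccination` keys the notebook joins on, plus a nested `result.expectedValue`; that shape is inferred from the notebook's column names. The helper names and the sample values are hypothetical.

```python
import json

# Same join keys the notebook uses to line up records from the two files.
JOIN_KEYS = ("scenario", "loc", "vaccination")


def index_by_keys(records):
    """Map (scenario, loc, vaccination) -> record for quick lookup."""
    return {tuple(record[key] for key in JOIN_KEYS): record for record in records}


def expected_value_ratios(reference_records, test_records):
    """Return test/reference ratios of result.expectedValue for shared keys."""
    reference = index_by_keys(reference_records)
    test = index_by_keys(test_records)
    ratios = {}
    for key in sorted(reference.keys() & test.keys()):
        ref_value = reference[key]["result"]["expectedValue"]
        test_value = test[key]["result"]["expectedValue"]
        if ref_value:  # skip zero or missing reference values
            ratios[key] = test_value / ref_value
    return ratios


# Hypothetical records; real data would come from json.load() on each file.
reference_records = json.loads(
    '[{"scenario": "dinner", "loc": "A", "vaccination": "none",'
    ' "result": {"expectedValue": 200.0}},'
    ' {"scenario": "walk", "loc": "A", "vaccination": "none",'
    ' "result": {"expectedValue": 10.0}}]'
)
test_records = json.loads(
    '[{"scenario": "dinner", "loc": "A", "vaccination": "none",'
    ' "result": {"expectedValue": 300.0}},'
    ' {"scenario": "walk", "loc": "A", "vaccination": "none",'
    ' "result": {"expectedValue": 10.0}}]'
)

ratios = expected_value_ratios(reference_records, test_records)
for key, ratio in ratios.items():
    print(key, ratio)
```

A ratio of 1.0 means the scenario is unchanged; anything else points at a scenario worth inspecting in the notebook's tables.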
197 changes: 197 additions & 0 deletions model_comparison.ipynb
@@ -0,0 +1,197 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "03835d1a",
"metadata": {},
"source": [
"# To compare model outputs\n",
"This notebook compares the `computed_scenarios.json` outputs from different model versions of microCovid. For example, you might want to compare the model output before and after a pull request is merged. To do that, run the tests before and after the change, save both versions of `computed_scenarios.json` under different names, and set the file names in the next cell: `reference_file` is intended for the output from before the change and `test_file` for the output from after it.\n",
"\n",
"If you want to see or change the \"test\" that generates the `computed_scenarios.json` file, look at `src/data/__tests__/scenarios.test.ts`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d2ec30ab",
"metadata": {},
"outputs": [],
"source": [
"reference_file = \"computed_scenarios_recent.json\"\n",
"test_file = \"computed_scenarios_1421.json\""
]
},
{
"cell_type": "markdown",
"id": "0a4cb0eb",
"metadata": {},
"source": [
"Then run the rest of this notebook to explore the differences between the two files."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bbb04652",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import re\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"from IPython.display import HTML\n",
"from pandas import NA"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "00d271a1",
"metadata": {},
"outputs": [],
"source": [
"def load(filename):\n",
"    with open(filename) as file:\n",
"        blob = json.load(file)\n",
"    table = pd.json_normalize(blob)\n",
"    return table.convert_dtypes()[sorted(table.columns)]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "09cb12bf",
"metadata": {},
"outputs": [],
"source": [
"reference = load(reference_file)\n",
"test = load(test_file)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3b0bc1c6",
"metadata": {},
"outputs": [],
"source": [
"JOIN_KEYS = [\"scenario\", \"loc\", \"vaccination\"]\n",
"combined = test.join(\n",
"    reference.set_index(JOIN_KEYS), lsuffix=\".test\", rsuffix=\".reference\", on=JOIN_KEYS, how=\"outer\"\n",
")\n",
"combined = combined[sorted(combined.columns)]\n",
"combined[\"ratio\"] = combined[\"result.expectedValue.test\"] / combined[\"result.expectedValue.reference\"]\n",
"# Insert zero-width spaces (U+200B) so long column names can wrap in the rendered tables.\n",
"combined.rename(columns=lambda s: re.sub(r\"([a-z])([.A-Z])\", r\"\\1\u200b\\2\", s), inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b5a18f33",
"metadata": {},
"outputs": [],
"source": [
"def group_by_ratio(df, only_first):\n",
"    by_ratio = df.sort_values(\"ratio\", kind=\"stable\")\n",
"    if only_first:\n",
"        by_ratio = by_ratio.groupby(\"ratio\").first()\n",
"        by_ratio[\"ratio\"] = by_ratio.index\n",
"        by_ratio = by_ratio[[\"ratio\", *(col for col in by_ratio.columns if col != \"ratio\")]]\n",
"    return (\n",
"        by_ratio.set_index([\"loc\", \"scenario\", \"vaccination\"])\n",
"        .sort_index(kind=\"stable\")\n",
"        .sort_values(\"ratio\", kind=\"stable\")\n",
"    )\n",
"\n",
"\n",
"def show_only_changes(df):\n",
"    df = df.copy()\n",
"    for col in df.columns:\n",
"        if col.endswith(\".test\"):\n",
"            test_col = col\n",
"            ref_col = col.replace(\".test\", \".reference\")\n",
"        else:\n",
"            continue\n",
"        for row in df.index:\n",
"            test_val = df[test_col][row]\n",
"            ref_val = df[ref_col][row]\n",
"            if test_val is not NA and ref_val is not NA and test_val == ref_val:\n",
"                df[test_col][row] = NA\n",
"                df[ref_col][row] = NA\n",
"        if all(value is pd.NA for col in (test_col, ref_col) for value in df[col]):\n",
"            df.drop(test_col, axis=1, inplace=True)\n",
"            df.drop(ref_col, axis=1, inplace=True)\n",
"    return df\n",
"\n",
"\n",
"def blank_na(df):\n",
"    return HTML(df.style.to_html().replace(\"<NA>\", \"\"))"
]
},
{
"cell_type": "markdown",
"id": "bd443b48",
"metadata": {},
"source": [
"# The next table will show highlights of what's changed.\n",
"The \"ratio\" column holds the ratio between the test and reference \"result.expectedValue\" outputs. This table shows\n",
"one row per distinct ratio, sorted by that ratio, giving an overview of which kinds of scenarios changed and by how much."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bef69c42",
"metadata": {},
"outputs": [],
"source": [
"pd.set_option(\"display.max_colwidth\", 40)\n",
"pd.set_option(\"display.max_columns\", 100)\n",
"blank_na(show_only_changes(group_by_ratio(combined, only_first=True)))"
]
},
{
"cell_type": "markdown",
"id": "6c8f8879",
"metadata": {},
"source": [
"# The next table will show all changes.\n",
"As above, but all rows are included."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "03449e37",
"metadata": {},
"outputs": [],
"source": [
"blank_na(show_only_changes(group_by_ratio(combined, only_first=False)))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
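Two details of the notebook are easy to miss when reading its JSON: the dotted column names such as `result.expectedValue` come from flattening nested records, and the `rename` call inserts invisible zero-width spaces so long column headers can wrap. The standard-library sketch below illustrates both under stated assumptions. `flatten` is a hypothetical stand-in for what `pd.json_normalize` does with nested dicts, `add_break_hints` is a hypothetical name for the notebook's inline lambda, and the sample record is made up.

```python
import re

ZWSP = "\u200b"  # zero-width space: invisible, but permits a line break


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted keys, similar to pd.json_normalize."""
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def add_break_hints(column_name):
    """Put a zero-width space between a lowercase letter and a following '.' or capital."""
    return re.sub(r"([a-z])([.A-Z])", r"\1" + ZWSP + r"\2", column_name)


# Hypothetical record, shaped like the columns the notebook refers to.
record = {"scenario": "dinner", "result": {"expectedValue": 200.0}}
columns = sorted(flatten(record))
print(columns)  # ['result.expectedValue', 'scenario']

renamed = add_break_hints("result.expectedValue.test")
print(renamed.split(ZWSP))        # ['result', '.expected', 'Value', '.test']
print(renamed.endswith(".test"))  # True
```

Because the zero-width space is inserted before the dot, renamed columns still end in `.test` or `.reference`, which is what lets `show_only_changes` keep pairing them by suffix after the rename.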
13 changes: 13 additions & 0 deletions requirements-manual.txt
@@ -8,3 +8,16 @@ pytest==7.1.2
coverage==5.5
# lxml used by mypy for coverage reporting
lxml==4.9.2
# Tools for Jupyter visualization
jupyter==1.0.0
jupyter-console==6.4.4
jupyter-events==0.6.3
jupyter_client==7.4.9
jupyter_core==5.1.3
jupyter_server==2.1.0
jupyter_server_terminals==0.4.4
jupyterlab-pygments==0.2.2
jupyterlab-widgets==3.0.5
matplotlib==3.6.3
matplotlib-inline==0.1.6
pandas==1.5.3