Merge pull request #78 from rmnldwg/release-1.0.0.rc2

Release 1.0.0.rc2
rmnldwg · Mar 6, 2024 · ecdfc54
2 parents 645e67b + c25c586
commit ecdfc54
Showing 28 changed files with 753 additions and 462 deletions.
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -35,7 +35,7 @@ repos:
     - build       # changes of the build system or dependencies
     - change      # commit alters the implementation of an existing feature
     - chore       # technical or maintenance task not related to feature or user story
-    - ci          # edits to the cintinuous integration scripts/configuration
+    - ci          # edits to the continuous integration scripts/configuration
     - deprecate   # a feature or functionality will be deprecated
     - docs        # add, update or revise the documentation
     - feat        # a new feature was implemented (bump MINOR version)

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,6 +2,83 @@
 
 All notable changes to this project will be documented in this file.
 
+<a name="1.0.0.rc2"></a>
+## [1.0.0.rc2] - 2024-03-06
+
+Implementing the [lymixture] brought to light a shortcoming in the way the data and diagnose matrices are computed and stored. As mentioned in issue [#77], their rows are now aligned with the patient data, which may have some advantages for different use cases.
+
+Also, since this is probably the last pre-release, I took the liberty to go over some method names once more and make them clearer.
+
+
+### Bug Fixes
+
+- Don't use fake T-stage for BN model. Related [#77].\
+  Since we now have access to the full diagnose matrix by default, there
+  is no need for the Bayesian network T-stage fix anymore.
+- (**uni**) Reload data when modalities change.\
+  Because we only store those diagnoses that are relevant to the model
+  under the "_model" header in the `patient_data` table, we need to reload
+  the patient data whenever we modify the modalities.
+
+
+### Documentation
+
+- Update to slightly changed API.
+- (**bi**) Add bilateral quickstart to docs.
+
+### Features
+
+- (**mod**) Add utils to check for modality changes.
+
+### Performance
+
+- (**uni**) Make data & diagnose matrices faster. Related [#77].\
+  The last change caused a dramatic slowdown (factor 500) of the data and
+  diagnose matrix access, because it needed to index them from a
+  `DataFrame`. Now, I implemented a basic caching scheme with a patient
+  data cache version that brought back the original speed.
+  Also, apparently `del dataframe[column]` is much slower than
+  `dataframe.drop(columns)`. I replaced the former with the latter and now
+  the tests are fast again.
+
+
+### Refactor
+
+- ⚠ **BREAKING** Rename methods for brevity & clarity.\
+  Method names have been changed, e.g., `comp_dist_evolution()` has been
+  renamed to `state_dist_evo()`, which is both shorter and (imho) clearer.
+- (**uni**) Move data/diag matrix generation.
+
+### Testing
+
+- Update to slightly changed API.
+- (**uni**) Check reset of data on modality change.\
+  Added a test to make sure the patient data gets reloaded when the
+  modalities change. This test is still failing.
+- Finally suppress all `PerformanceWarnings`.
+
+### Change
+
+- ⚠ **BREAKING** Store data & diagnose matrices in data. Fixes [#77].\
+  Instead of weird, dedicated `UserDict`s, I simply use the patient data
+  to store the data encoding and diagnose probabilities for each patient.
+  This has the advantage that the entire matrix (irrespective of T-stage)
+  is aligned with the patients.
+- ⚠ **BREAKING** (**bi**) Shorten kwargs.\
+  The `(uni|ipsi|contra)lateral_kwargs` in the `Bilateral` constructor
+  were shortened by removing the "lateral".
+
+
+### Merge
+
+- Branch 'main' into 'dev'.
+- Branch '77-diagnose-matrices-not-aligned-with-data' into 'dev'.
+
+### Remove
+
+- Unused helpers.
+
+
 <a name="1.0.0.rc1"></a>
 ## [1.0.0.rc1] - 2024-03-04
 
@@ -490,7 +567,8 @@ Almost the entire API has changed. I'd therefore recommend to have a look at the
 - add pre-commit hook to check commit msg
 
 
-[Unreleased]: https://github.com/rmnldwg/lymph/compare/1.0.0.rc1...HEAD
+[Unreleased]: https://github.com/rmnldwg/lymph/compare/1.0.0.rc2...HEAD
+[1.0.0.rc2]: https://github.com/rmnldwg/lymph/compare/1.0.0.rc1...1.0.0.rc2
 [1.0.0.rc1]: https://github.com/rmnldwg/lymph/compare/1.0.0.a6...1.0.0.rc1
 [1.0.0.a6]: https://github.com/rmnldwg/lymph/compare/1.0.0.a5...1.0.0.a6
 [1.0.0.a5]: https://github.com/rmnldwg/lymph/compare/1.0.0.a4...1.0.0.a5
@@ -504,6 +582,7 @@ Almost the entire API has changed. I'd therefore recommend to have a look at the
 [0.4.1]: https://github.com/rmnldwg/lymph/compare/0.4.0...0.4.1
 [0.4.0]: https://github.com/rmnldwg/lymph/compare/0.3.10...0.4.0
 
+[#77]: https://github.com/rmnldwg/lymph/issues/77
 [#74]: https://github.com/rmnldwg/lymph/issues/74
 [#72]: https://github.com/rmnldwg/lymph/issues/72
 [#69]: https://github.com/rmnldwg/lymph/issues/69
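
The performance entry in the changelog above mentions a basic caching scheme built around a patient data cache version. The following is a minimal sketch of that idea, not the code from this PR: the class `PatientStore`, its attributes, and the toy `data_matrix` computation are all hypothetical. It only illustrates how a version counter can invalidate a cached result whenever the underlying data is reloaded.

```python
class PatientStore:
    """Hypothetical container; not the lymph implementation."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._cache_version = 0      # bumped whenever the data changes
        self._cached = None          # (version, matrix) pair or None
        self.num_computations = 0    # only here to make cache hits visible

    def load(self, rows):
        """Replace the data and mark everything derived from it as stale."""
        self._rows = list(rows)
        self._cache_version += 1

    @property
    def data_matrix(self):
        """Return a derived matrix, recomputing only if the data changed."""
        if self._cached is None or self._cached[0] != self._cache_version:
            self.num_computations += 1
            matrix = [[float(value) for value in row] for row in self._rows]
            self._cached = (self._cache_version, matrix)
        return self._cached[1]


store = PatientStore([[1, 0], [0, 1]])
first = store.data_matrix
second = store.data_matrix          # served from the cache
store.load([[1, 1]])                # new data, so the cached matrix is stale
third = store.data_matrix           # recomputed from the new rows
```

Repeated access between two `load()` calls returns the same cached object, so the expensive computation runs once per data version rather than once per access.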

diff --git a/docs/source/api.rst b/docs/source/api.rst
@@ -1,7 +1,3 @@
-.. module: api
-
-.. _api:
-
 Detailed API
 ============
 

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -18,7 +18,7 @@ Documentation
     :caption: Content
 
     install
-    quickstart_unilateral
+    quickstart
     api
     license
 

diff --git a/docs/source/quickstart.rst b/docs/source/quickstart.rst
@@ -0,0 +1,24 @@
+Quickstart
+==========
+
+We have written some short notebooks to show you how the models in this package come together.
+
+
+Unilateral
+----------
+
+.. toctree::
+    :maxdepth: 2
+    :caption: Content
+
+    quickstart_unilateral
+
+
+Bilateral
+----------
+
+.. toctree::
+    :maxdepth: 2
+    :caption: Content
+
+    quickstart_bilateral
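
The bilateral quickstart notebook added in this PR observes that the ipsi- and contralateral log-likelihoods do not add up to the bilateral one. A toy calculation can illustrate why, assuming both sides share a single diagnose time that is marginalized out and are independent only once that time is fixed. All numbers and variable names below are made up for illustration and do not come from the lymph package.

```python
import math

# Hypothetical prior over three diagnose time steps.
time_prior = [0.2, 0.5, 0.3]
# Hypothetical probabilities of the observed diagnosis at each time step.
ipsi_given_t = [0.1, 0.4, 0.7]
contra_given_t = [0.05, 0.1, 0.3]

# Each side on its own: marginalize its likelihood over the diagnose time.
ipsi_llh = math.log(sum(p * i for p, i in zip(time_prior, ipsi_given_t)))
contra_llh = math.log(sum(p * c for p, c in zip(time_prior, contra_given_t)))

# Both sides together: multiply per time step first, then marginalize.
joint_llh = math.log(
    sum(p * i * c for p, i, c in zip(time_prior, ipsi_given_t, contra_given_t))
)

print(f"separate: {ipsi_llh + contra_llh:.2f}, joint: {joint_llh:.2f}")
```

Summing the two separate log-likelihoods treats the sides as if each had its own diagnose time, whereas the joint term ties them to the same one, so the two values generally differ.
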
diff --git a/docs/source/quickstart_bilateral.ipynb b/docs/source/quickstart_bilateral.ipynb
@@ -1,5 +1,20 @@
 {
  "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Modelling Bilateral Lymphatic Spread\n",
+    "\n",
+    "In the [quickstart guide](./quickstart_unilateral.ipynb) for this package, we have shown how to model the lymphatic tumor progression in head and neck cancer, but only _unilaterally_. Depending on the lateralization of the primary tumor, however, we may see spread not only to the _ipsilateral_ side (i.e., the side where the tumor is located), but also to the _contralateral_ (i.e., the other) side.\n",
+    "\n",
+    "To capture this, we have developed an extension that is implemented in the `lymph.models.Bilateral` class. It shares much of its API with the `lymph.models.Unilateral` class but also has some peculiarities. Let's have a look:\n",
+    "\n",
+    "## Importing\n",
+    "\n",
+    "Nothing new here, we just import the package."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -9,6 +24,15 @@
     "import lymph"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Graph\n",
+    "\n",
+    "As before, we define a graph structure. Note that you only need to define this for one side. The other side's graph is automatically mirrored. If you explicitly want to make the two sides asymmetric, you may do this by providing different graphs to the `ipsi_kwargs` and `contra_kwargs` in the constructor."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -24,6 +48,13 @@
     "}"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For a more detailed explanation of how this graph should be defined, look at the [unilateral quickstart guide](./quickstart_unilateral.ipynb)."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -34,14 +65,30 @@
     "model"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Parameterization\n",
+    "\n",
+    "Since we now need to distribute the parameters to both sides, the assignment gets a little trickier: If we want to set the spread rate for, e.g., the ipsilateral edge from LNL `II` to `III`, we now need to pass it as follows:"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
-    "model.set_params(spread=0.123)\n",
-    "model.contra.graph.edges[\"ItoII\"].get_params(\"spread\")"
+    "model.set_params(ipsi_IItoIII_spread=0.123)\n",
+    "model.get_params()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "So, prefixing a parameter with `ipsi_` or `contra_` causes it to be sent to only the respective side. Of course, you can still set all parameters at once:"
    ]
   },
   {
@@ -50,8 +97,40 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "model.set_params(ipsi_TtoIII_spread=0.234)\n",
-    "model.get_params(as_dict=True)"
+    "model.set_params(spread=0.234)\n",
+    "model.get_params()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Any conceivable combination of setting groups of parameters is possible: all ipsilateral params at once, all tumor spreads at once, all contralateral LNL spreads together, and so on..."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model.set_params(ipsi_spread=0.77)\n",
+    "model.set_lnl_spread_params(spread=0.543)\n",
+    "model.get_params()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    ":::{note}\n",
+    "\n",
+    "Did you notice the LNL spread parameters are not prefixed with `ipsi_` and `contra_`? This is because we set the LNL spread to be symmetric via the `is_symmetric[\"lnl_spread\"] = True` parameter in the constructor of the class. If you change this, the model will have separate parameters for the two sides.\n",
+    ":::\n",
+    "\n",
+    "## Modalities\n",
+    "\n",
+    "Setting the modalities works exactly as in the `Unilateral` case. The `Bilateral` class provides the same API for getting and setting the modalities and delegates this to the two sides."
    ]
   },
   {
@@ -74,6 +153,15 @@
     "print(model.get_modality(\"PET\").confusion_matrix)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Data / Observations\n",
+    "\n",
+    "The data loading API is also the same as in the `Unilateral` class. The only difference is that one no longer needs to specify which `side` to load, since the `ipsi` and `contra` sides are loaded automatically."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -130,6 +218,15 @@
     "model.contra.patient_data[\"_model\"]"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Distribution over Diagnose Times\n",
+    "\n",
+    "Just as with the modalities, the distributions over diagnose times are delegated to the two sides via the exact same API as in the `Unilateral` model:"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -138,11 +235,12 @@
    "source": [
     "import numpy as np\n",
     "import scipy as sp\n",
-    "import matplotlib.pyplot as plt\n",
+    "\n",
+    "rng = np.random.default_rng(42)\n",
     "\n",
     "max_time = model.max_time\n",
     "time_steps = np.arange(max_time+1)\n",
-    "p = 0.4\n",
+    "p = 0.3\n",
     "\n",
     "early_prior = sp.stats.binom.pmf(time_steps, max_time, p)\n",
     "model.set_distribution(\"early\", early_prior)"
@@ -170,13 +268,28 @@
     "params_dict"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Notice the additional parameter `late_p` that determines the shape of the late diagnose time distribution.\n",
+    "\n",
+    ":::{note}\n",
+    "\n",
+    "You cannot set the diagnose time distributions asymmetrically! With the modalities this may make sense (although it is not really supported, you may try), but for the diagnose times, this will surely break!\n",
+    ":::\n",
+    "\n",
+    "## Likelihood\n",
+    "\n",
+    "And again we have the same API as before:"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
-    "rng = np.random.default_rng(42)\n",
     "test_probabilities = {p: rng.random() for p in params_dict}\n",
     "\n",
     "llh = model.likelihood(given_params=test_probabilities, log=True)\n",
@@ -185,6 +298,13 @@
     "\n",
     "print(f\"log-likelihood is {ipsi_llh:.2f} + {contra_llh:.2f} = {llh:.2f}\")"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Note that the two sides' likelihoods do not add up exactly to the total. This is expected! A patient's ipsi- and contralateral diagnoses were made _at the same time_, not separately. Their joint likelihood therefore differs from what we would get if the two sides had been observed independently."
+   ]
   }
  ],
  "metadata": {