Skip to content

Commit

Permalink
Merge pull request #78 from rmnldwg/release-1.0.0.rc2
Browse files Browse the repository at this point in the history
Release 1.0.0.rc2
  • Loading branch information
rmnldwg committed Mar 6, 2024
2 parents 645e67b + c25c586 commit ecdfc54
Show file tree
Hide file tree
Showing 28 changed files with 753 additions and 462 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ repos:
- build # changes of the build system or dependencies
- change # commit alters the implementation of an existing feature
- chore # technical or maintenance task not related to feature or user story
- ci # edits to the cintinuous integration scripts/configuration
- ci # edits to the continuous integration scripts/configuration
- deprecate # a feature or functionality will be deprecated
- docs # add, update of revise the documentation
- feat # a new feature was implemented (bump MINOR version)
Expand Down
81 changes: 80 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,83 @@

All notable changes to this project will be documented in this file.

<a name="1.0.0.rc2"></a>
## [1.0.0.rc2] - 2024-03-06

Implementing the [lymixture] brought to light a shortcoming in the way the data and diagnose matrices are computed and stored. As mentioned in issue [#77], their rows are now aligned with the patient data, which may have some advantages for different use cases.

Also, since this is probably the last pre-release, I took the liberty to go over some method names once more and make them clearer.


### Bug Fixes

- Don't use fake T-stage for BN model. Related [#77].\
Since we now have access to the full diagnose matrix by default, there
is no need for the Bayesian network T-stage fix anymore.
- (**uni**) Reload data when modalities change.\
Because we only store those diagnoses that are relevant to the model
under the "_model" header in the `patient_data` table, we need to reload
the patient data whenever we modify the modalities.


### Documentation

- Update to slightly changed API.
- (**bi**) Add bilateral quickstart to docs.

### Features

- (**mod**) Add utils to check for modality changes.

### Performance

- (**uni**) Make data & diagnose matrices faster. Related [#77].\
The last change caused a dramatic slowdown (factor 500) of the data and
diagnose matrix access, because it needed to index them from a
`DataFrame`. Now, I implemented a basic caching scheme with a patient
data cache version that brought back the original speed.
Also, apparently `del dataframe[column]` is much slower than
`dataframe.drop(columns)`. I replaced the former with the latter and now
the tests are fast again.


### Refactor

-**BREAKING** Rename methods for brevity & clarity.\
Method names have been changed, e.g `comp_dist_evolution()` has been
renamed to `state_dist_evo()` which is both shorter and (imho) clearer.
- (**uni**) Move data/diag matrix generation.

### Testing

- Update to slightly changed API.
- (**uni**) Check reset of data on modality change.\
Added a test to make sure the patient data gets reloaded when the
modalities change. This test is still failing.
- Finally suppress all `PerformanceWarnings`.

### Change

-**BREAKING** Store data & diagnose matrices in data. Fixes [#77].\
Instead of weird, dedicated `UserDict`s, I simply use the patient data
to store the data encoding and diagnose probabilities for each patient.
This has the advantage that the entire matrix (irrespective of T-stage)
is aligned with the patients.
-**BREAKING** (**bi**) Shorten kwargs.\
The `(uni|ipsi|contra)lateral_kwargs` in the `Bilateral` constructor
were shortened by removing the "lateral".


### Merge

- Branch 'main' into 'dev'.
- Branch '77-diagnose-matrices-not-aligned-with-data' into 'dev'.

### Remove

- Unused helpers.


<a name="1.0.0.rc1"></a>
## [1.0.0.rc1] - 2024-03-04

Expand Down Expand Up @@ -490,7 +567,8 @@ Almost the entire API has changed. I'd therefore recommend to have a look at the
- add pre-commit hook to check commit msg


[Unreleased]: https://github.com/rmnldwg/lymph/compare/1.0.0.rc1...HEAD
[Unreleased]: https://github.com/rmnldwg/lymph/compare/1.0.0.rc2...HEAD
[1.0.0.rc2]: https://github.com/rmnldwg/lymph/compare/1.0.0.rc1...1.0.0.rc2
[1.0.0.rc1]: https://github.com/rmnldwg/lymph/compare/1.0.0.a6...1.0.0.rc1
[1.0.0.a6]: https://github.com/rmnldwg/lymph/compare/1.0.0.a5...1.0.0.a6
[1.0.0.a5]: https://github.com/rmnldwg/lymph/compare/1.0.0.a4...1.0.0.a5
Expand All @@ -504,6 +582,7 @@ Almost the entire API has changed. I'd therefore recommend to have a look at the
[0.4.1]: https://github.com/rmnldwg/lymph/compare/0.4.0...0.4.1
[0.4.0]: https://github.com/rmnldwg/lymph/compare/0.3.10...0.4.0

[#77]: https://github.com/rmnldwg/lymph/issues/77
[#74]: https://github.com/rmnldwg/lymph/issues/74
[#72]: https://github.com/rmnldwg/lymph/issues/72
[#69]: https://github.com/rmnldwg/lymph/issues/69
Expand Down
4 changes: 0 additions & 4 deletions docs/source/api.rst
Original file line number Diff line number Diff line change
@@ -1,7 +1,3 @@
.. module: api
.. _api:

Detailed API
============

Expand Down
2 changes: 1 addition & 1 deletion docs/source/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ Documentation
:caption: Content

install
quickstart_unilateral
quickstart
api
license

Expand Down
24 changes: 24 additions & 0 deletions docs/source/quickstart.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
Quickstart
==========

We have written some short notebooks to show you how the models in this package come together.


Unilateral
----------

.. toctree::
:maxdepth: 2
:caption: Content

quickstart_unilateral


Bilateral
----------

.. toctree::
:maxdepth: 2
:caption: Content

quickstart_bilateral
134 changes: 127 additions & 7 deletions docs/source/quickstart_bilateral.ipynb
Original file line number Diff line number Diff line change
@@ -1,5 +1,20 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Modelling Bilateral Lymphatic Spread\n",
"\n",
"In the [quickstart guide](./quickstart_unilateral.ipynb) for this package, we have shown how to model the lymphatic tumr progression in head and neck cancer. But we have done so only _unilaterally_. However, depending on the lateralization of the primary tumor, we may not only see _ipsilateral_ (i.e., to the side where the tumor is located), but also to the _contralateral_ (i.e., the other) side.\n",
"\n",
"To capture this, we have developed an extension that is implemented in the `lymph.models.Bilateral` class. It shares a lot of the same API with the `lymph.models.Unilateral` class but also has some specialties. Let's have a look:\n",
"\n",
"## Importing\n",
"\n",
"Nothing new here, we just import the package."
]
},
{
"cell_type": "code",
"execution_count": null,
Expand All @@ -9,6 +24,15 @@
"import lymph"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Graph\n",
"\n",
"As before, we define a graph structure. Note that you only need to define this for one side. The other side's graph is automatically mirrored. If you explicitly want to make the two sides asymmetric, you may do this by providing different graphs to the `ipsi_kwargs` and `contra_kwargs` in the constructor."
]
},
{
"cell_type": "code",
"execution_count": null,
Expand All @@ -24,6 +48,13 @@
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a more detailed explanation of how this graph should be defined, look at the [unilateral quickstart guide](./quickstart_unilateral.ipynb)."
]
},
{
"cell_type": "code",
"execution_count": null,
Expand All @@ -34,14 +65,30 @@
"model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Parameterization\n",
"\n",
"Since we now need to distribute the parameters to both sides, the assignment gets a little more tricky: If we want to set the spread rate for e.g. the ipsilateral edge from LNL `II` to `III`, we now need to pass it as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.set_params(spread=0.123)\n",
"model.contra.graph.edges[\"ItoII\"].get_params(\"spread\")"
"model.set_params(ipsi_IItoIII_spread=0.123)\n",
"model.get_params()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So, prefixing a parameter with `ipsi_` or `contra_` causes it to be sent to only the respective side. Of course, you can still set all parameters at once:"
]
},
{
Expand All @@ -50,8 +97,40 @@
"metadata": {},
"outputs": [],
"source": [
"model.set_params(ipsi_TtoIII_spread=0.234)\n",
"model.get_params(as_dict=True)"
"model.set_params(spread=0.234)\n",
"model.get_params()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Any thinkable combination of setting groups of parameters is possible: All ipsilateral params at once, all tumor spreads at once, all contralateral lnl spreads together, and so on..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.set_params(ipsi_spread=0.77)\n",
"model.set_lnl_spread_params(spread=0.543)\n",
"model.get_params()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
":::{note}\n",
"\n",
"Did you notice the LNL spread parameters are not prefixed with `ipsi_` and `contra_`? This is because we set the LNL spread to be symmetric via the `is_symmetric[\"lnl_spread\"] = True` parameter in the constructor of the class. If you change this, the model will have separate parameters for the two sides.\n",
":::\n",
"\n",
"## Modalities\n",
"\n",
"Setting the modalities works exactly as in the `Unilateral` case. The `Bilateral` class provides the same API for getting and setting the modalities and delegates this to the two sides."
]
},
{
Expand All @@ -74,6 +153,15 @@
"print(model.get_modality(\"PET\").confusion_matrix)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data / Observations\n",
"\n",
"The data loading APi is also the same compared to the `Unilateral` class. The only difference is that one now does not need to specify which `side` to load, since it will automatically load the `ipsi` and `contra` side."
]
},
{
"cell_type": "code",
"execution_count": null,
Expand Down Expand Up @@ -130,6 +218,15 @@
"model.contra.patient_data[\"_model\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Distribution over Diagnose Times\n",
"\n",
"Just as with the modalities, the distributions over diagnose times are delegated to the two sides via the exact same API as in the `Unilateral` model:"
]
},
{
"cell_type": "code",
"execution_count": null,
Expand All @@ -138,11 +235,12 @@
"source": [
"import numpy as np\n",
"import scipy as sp\n",
"import matplotlib.pyplot as plt\n",
"\n",
"rng = np.random.default_rng(42)\n",
"\n",
"max_time = model.max_time\n",
"time_steps = np.arange(max_time+1)\n",
"p = 0.4\n",
"p = 0.3\n",
"\n",
"early_prior = sp.stats.binom.pmf(time_steps, max_time, p)\n",
"model.set_distribution(\"early\", early_prior)"
Expand Down Expand Up @@ -170,13 +268,28 @@
"params_dict"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice the additional parameter `late_p` that determines the shape of the late diagnse time distribution.\n",
"\n",
":::{note}\n",
"\n",
"You cannot set the diagnose time distributions asymmetrically! With the modalities this may make sense (although it is not really supported, you may try), but for the diagnose times, this will surely break!\n",
":::\n",
"\n",
"## Likelihood\n",
"\n",
"And again we have the same API as before:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"rng = np.random.default_rng(42)\n",
"test_probabilities = {p: rng.random() for p in params_dict}\n",
"\n",
"llh = model.likelihood(given_params=test_probabilities, log=True)\n",
Expand All @@ -185,6 +298,13 @@
"\n",
"print(f\"log-likelihood is {ipsi_llh:.2f} + {contra_llh:.2f} = {llh:.2f}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the two side's likelihoods do not perfectly sum up. This is expected! A patient's ipsi- and a contralateral diagnosis were diagnosed _at the same time_, not separately. They are thus not equally likely as if they were observed independently."
]
}
],
"metadata": {
Expand Down
Loading

0 comments on commit ecdfc54

Please sign in to comment.