Merge pull request #83 from rmnldwg/release-1.2.0
Release 1.2.0
rmnldwg committed Mar 29, 2024
2 parents 6559f11 + b64a1c8 commit b7f453a
Showing 22 changed files with 594 additions and 331 deletions.
45 changes: 44 additions & 1 deletion CHANGELOG.md
@@ -2,6 +2,48 @@

All notable changes to this project will be documented in this file.

<a name="1.2.0"></a>
## [1.2.0] - 2024-03-29

### Bug Fixes

- (**mid**) `obs_dist` may return 3D array.


### Documentation

- Fix unknown version in title.
- Add missing blank before list.
- (**mid**) Add comment about midext marginalizing.


### Features

- (**mid**) Add `posterior_state_dist()` method.\
The `Midline` model now has a `posterior_state_dist()` method, too.
- (**types**) Base `Model` has state dist methods.\
Both `state_dist()` and `posterior_state_dist()` have been added to the
`types.Model` base class.
- Add `marginalize()` method.\
With this new method, one can marginalize a (prior or posterior) state
distribution over all states that match a provided involvement.\
It is used e.g. to refactor the code of the `risk()` methods.
- (**types**) Add `obs_dist` and `marginalize`.\
The `types.Model` abstract base class now also has the methods
`obs_dist` and `marginalize` for better autocomplete support in editors.
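
As a rough illustration of the `marginalize()` entry above, the following sketch sums a state distribution over all hidden states that match a given involvement. It is not the package's implementation: the name `marginalize_sketch`, the ordering of the states, and the convention that `None` marks an unknown LNL are assumptions made only for this example.

```python
import itertools

import numpy as np


def marginalize_sketch(state_dist, lnls, involvement):
    """Sum the probabilities of all states compatible with ``involvement``.

    Hypothetical helper: states are assumed to be enumerated in the order
    produced by ``itertools.product([False, True], repeat=len(lnls))``.
    """
    state_dist = np.asarray(state_dist, dtype=float)
    states = list(itertools.product([False, True], repeat=len(lnls)))
    # A state matches if every LNL with a known involvement agrees with it.
    matches = np.array([
        all(
            involvement.get(lnl) is None or state[i] == involvement[lnl]
            for i, lnl in enumerate(lnls)
        )
        for state in states
    ])
    return float(state_dist[matches].sum())


# Example: two LNLs, so four states (FF, FT, TF, TT).
example_lnls = ["I", "II"]
example_dist = [0.4, 0.3, 0.2, 0.1]
risk_of_II = marginalize_sketch(example_dist, example_lnls, {"II": True})
```

The same function works for a prior or a posterior state distribution, since it only needs a probability per state. That is presumably why a shared helper of this kind is convenient for refactoring `risk()`-style methods.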


### Testing

- Remove plain test risk.


### Change

- (**types**) Improve type hints for inv. pattern.
- Rename "diagnose" to "diagnosis" when noun.\
When used as a noun, "diagnosis" is correct, not "diagnose".


<a name="1.1.0"></a>
## [1.1.0] - 2024-03-20
@@ -626,7 +668,8 @@ Almost the entire API has changed. I'd therefore recommend to have a look at the
- add pre-commit hook to check commit msg


[Unreleased]: https://github.com/rmnldwg/lymph/compare/1.1.0...HEAD
[Unreleased]: https://github.com/rmnldwg/lymph/compare/1.2.0...HEAD
[1.2.0]: https://github.com/rmnldwg/lymph/compare/1.1.0...1.2.0
[1.1.0]: https://github.com/rmnldwg/lymph/compare/1.0.0...1.1.0
[1.0.0]: https://github.com/rmnldwg/lymph/compare/1.0.0.rc2...1.0.0
[1.0.0.rc2]: https://github.com/rmnldwg/lymph/compare/1.0.0.rc1...1.0.0.rc2
4 changes: 2 additions & 2 deletions README.rst
@@ -26,13 +26,13 @@ HNSCC spreads though the lymphatic system of the neck and forms metastases in re

To account for this microscopic involvement, parts of the lymphatic system are often irradiated electively to increase tumor control. Which parts are included in this elective clinical target volume is currently decided based on guidelines [1]_ [2]_ [3]_ [4]_. These in turn are derived from reports of the prevalence of involvement per lymph node level (LNL), i.e. the portion of patients that were diagnosed with metastases in any given LNL, stratified by primary tumor location. It is recommended to include a LNL in the elective target volume if 10 - 15% of patients showed involvement in that particular level.

However, while the prevalence of involvement has been reported in the literature [5]_ [6]_, and the general lymph drainage pathways are understood well, the detailed progression patterns of HNSCC remain poorly quantified. We believe that the risk for microscopic involvement in an LNL depends highly on the specific diagnose of a particular patient and their treatment can hence be personalized if the progression patterns were better quantified.
However, while the prevalence of involvement has been reported in the literature [5]_ [6]_, and the general lymph drainage pathways are understood well, the detailed progression patterns of HNSCC remain poorly quantified. We believe that the risk for microscopic involvement in an LNL depends highly on the specific diagnosis of a particular patient and their treatment can hence be personalized if the progression patterns were better quantified.


Our Goal
========

With this Python package we want to provide a framework to accurately predict the risk for microscopic metastases in any lymph node level for the specific diagnose a particular patient presents with.
With this Python package we want to provide a framework to accurately predict the risk for microscopic metastases in any lymph node level for the specific diagnosis a particular patient presents with.

The implemented model is highly interpretable and was developed together with clinicians to accurately represent the anatomy of the lymphatic drainage. It can be trained with data that reports the patterns of lymphatic progression in detail, like the `dataset(s) <https://github.com/rmnldwg/lydata>`_ we collected at our institution, the University Hospital Zurich (USZ).

4 changes: 2 additions & 2 deletions docs/source/components.rst
@@ -18,10 +18,10 @@ Diagnostic Modalities
:show-inheritance:


Marginalization over Diagnose Times
Marginalization over Diagnosis Times
-----------------------------------

.. automodule:: lymph.diagnose_times
.. automodule:: lymph.diagnosis_times
:members:
:special-members: __init__, __hash__
:show-inheritance:
25 changes: 4 additions & 21 deletions docs/source/conf.py
@@ -3,36 +3,19 @@
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/main/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys

from pkg_resources import DistributionNotFound, get_distribution

sys.path.insert(0, os.path.abspath('../..'))

try:
__version__ = get_distribution("lymph").version
except DistributionNotFound:
__version__ = "unknown version"

import lymph

# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

project = 'lymph'
copyright = '2022, Roman Ludwig'
author = 'Roman Ludwig'
gh_username = 'rmnldwg'

version = __version__
version = lymph.__version__
# The full version, including alpha/beta/rc tags
release = __version__
release = lymph.__version__


# -- General configuration ---------------------------------------------------
6 changes: 3 additions & 3 deletions docs/source/quickstart_bilateral.ipynb
@@ -222,9 +222,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Distribution over Diagnose Times\n",
"## Distribution over Diagnosis Times\n",
"\n",
"Just as with the modalities, the distributions over diagnose times are delegated to the two sides via the exact same API as in the `Unilateral` model:"
"Just as with the modalities, the distributions over diagnosis times are delegated to the two sides via the exact same API as in the `Unilateral` model:"
]
},
{
@@ -276,7 +276,7 @@
"\n",
":::{note}\n",
"\n",
"You cannot set the diagnose time distributions asymmetrically! With the modalities this may make sense (although it is not really supported, you may try), but for the diagnose times, this will surely break!\n",
"You cannot set the diagnosis time distributions asymmetrically! With the modalities this may make sense (although it is not really supported, you may try), but for the diagnosis times, this will surely break!\n",
":::\n",
"\n",
"## Likelihood\n",
20 changes: 10 additions & 10 deletions docs/source/quickstart_unilateral.ipynb
@@ -6,7 +6,7 @@
"source": [
"# Getting started\n",
"\n",
"A lot of people get diagnosed with squamous cell carcinoma in the head & neck region ([HNSCC](https://en.wikipedia.org/wiki/Head_and_neck_cancer)), which frequently metastasizes via the lymphatic system. We set out to develop a methodology to predict the risk of a new patient having metastases in so-called lymph node levels (LNLs), based on their personal diagnose (e.g. findings from a CT scan) and information of previously diagnosed and treated patients. And that's exactly what this code enables you to do as well.\n",
"A lot of people get diagnosed with squamous cell carcinoma in the head & neck region ([HNSCC](https://en.wikipedia.org/wiki/Head_and_neck_cancer)), which frequently metastasizes via the lymphatic system. We set out to develop a methodology to predict the risk of a new patient having metastases in so-called lymph node levels (LNLs), based on their personal diagnosis (e.g. findings from a CT scan) and information of previously diagnosed and treated patients. And that's exactly what this code enables you to do as well.\n",
"\n",
"As mentioned, this package is meant to be a relatively simple-to-use frontend. The math is done under the hood and one does not need to worry about it a lot. But let's have a quick look at what we're doing here.\n",
"\n",
@@ -152,7 +152,7 @@
"source": [
"## Diagnostic Modalities\n",
"\n",
"To ultimately compute the likelihoods of observations, we need to fix the sensitivities and specificities of the obtained diagnoses. And since we might have multiple diagnostic modalities available, we need to tell the system which of them comes with which specificity and sensitivity. We do this by adding specificity/sensitivity pairs to our model:"
"To ultimately compute the likelihoods of observations, we need to fix the sensitivities and specificities of the obtained diagnosis. And since we might have multiple diagnostic modalities available, we need to tell the system which of them comes with which specificity and sensitivity. We do this by adding specificity/sensitivity pairs to our model:"
]
},
{
@@ -256,7 +256,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"To feed the dataset into the system, we assign the dataset to the attribute `patient_data`. What the system then does here is creating a diagnose matrix for every T-stage in the data."
"To feed the dataset into the system, we assign the dataset to the attribute `patient_data`. What the system then does here is creating a diagnosis matrix for every T-stage in the data."
]
},
{
@@ -275,17 +275,17 @@
"source": [
":::{note}\n",
"\n",
"The data now has an additional top-level header `\"_model\"` which stores only the information the model actually needs. In this case, it only stores the ipsilateral CT diagnoses of the LNLs I, II, III, and IV, as well as the mapped T-stage of the patients. Note that from the original T-stages 1, 2, 3, and 4, only \"early\" and \"late\" are left. This is the default transformation, but it can be changed by providing a function to the `mapping` keyword argument in the `load_patient_data()` method.\n",
"The data now has an additional top-level header `\"_model\"` which stores only the information the model actually needs. In this case, it only stores the ipsilateral CT diagnosis of the LNLs I, II, III, and IV, as well as the mapped T-stage of the patients. Note that from the original T-stages 1, 2, 3, and 4, only \"early\" and \"late\" are left. This is the default transformation, but it can be changed by providing a function to the `mapping` keyword argument in the `load_patient_data()` method.\n",
":::"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Distribution over Diagnose Times\n",
"## Distribution over Diagnosis Times\n",
"\n",
"The last ingredient to set up (at least when using the hidden Markov model) would now be the distribution over diagnose times. Our dataset contains two different T-stages \"early\" and \"late\". One of the underlying assumptions with our model is that earlier T-stage patients have been - on average - diagnosed at an earlier time-point, compared to late T-stage patients. We can reflect that using distributions over the diagnosis time:"
"The last ingredient to set up (at least when using the hidden Markov model) would now be the distribution over diagnosis times. Our dataset contains two different T-stages \"early\" and \"late\". One of the underlying assumptions with our model is that earlier T-stage patients have been - on average - diagnosed at an earlier time-point, compared to late T-stage patients. We can reflect that using distributions over the diagnosis time:"
]
},
{
@@ -313,7 +313,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now set a fixed prior for the distribution over diagnose times of early T-stage patients (i.e., patients with T1 and T2 tumors)."
"We can now set a fixed prior for the distribution over diagnosis times of early T-stage patients (i.e., patients with T1 and T2 tumors)."
]
},
{
@@ -330,11 +330,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's define a parametrized PMF over diagnose times for patients with late T-stage tumors (T3 and T4) to show this functionality. For that, we first define a parametrized function with the signature\n",
"Let's define a parametrized PMF over diagnosis times for patients with late T-stage tumors (T3 and T4) to show this functionality. For that, we first define a parametrized function with the signature\n",
"\n",
"```python\n",
"def distribution(support: list[float] | np.ndarray, a=1, b=2, c=3, ...) -> np.ndarray:\n",
" \"\"\"PMF over diagnose times (``support``) with parameters ``a``, ``b``, and ``c``.\"\"\"\n",
" \"\"\"PMF over diagnosis times (``support``) with parameters ``a``, ``b``, and ``c``.\"\"\"\n",
" ...\n",
" return result\n",
"```\n",
@@ -405,7 +405,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Note how the set of adjustable parameters now also contains the `p` parameter for the late T-stage's distribution over diagnose times. For the early T-stage, it is not present, because that one was provided as a fixed array."
"Note how the set of adjustable parameters now also contains the `p` parameter for the late T-stage's distribution over diagnosis times. For the early T-stage, it is not present, because that one was provided as a fixed array."
]
},
{
4 changes: 2 additions & 2 deletions lymph/__init__.py
@@ -16,11 +16,11 @@

# nopycln: file

from lymph import diagnose_times, graph, matrix, models
from lymph import diagnosis_times, graph, matrix, models
from lymph.utils import clinical, pathological

__all__ = [
"diagnose_times", "matrix",
"diagnosis_times", "matrix",
"graph", "models",
"clinical", "pathological",
]
Expand Down
14 changes: 7 additions & 7 deletions lymph/diagnose_times.py → lymph/diagnosis_times.py
@@ -1,5 +1,5 @@
"""
Module for marginalizing over diagnose times.
Module for marginalizing over diagnosis times.
The hidden Markov model we implement assumes that every patient started off with a
healthy neck, meaning no lymph node levels harboured any metastases. This is a valid
@@ -33,22 +33,22 @@ class SupportError(Exception):


class Distribution:
"""Class that provides a way of storing distributions over diagnose times."""
"""Class that provides a way of storing distributions over diagnosis times."""
def __init__(
self,
distribution: Iterable[float] | callable,
max_time: int | None = None,
**kwargs,
) -> None:
"""Initialize a distribution over diagnose times.
"""Initialize a distribution over diagnosis times.
This object can either be created by passing a parametrized function (e.g.,
``scipy.stats`` distribution) or by passing a list of probabilities for each
diagnose time.
diagnosis time.
The signature of the function must be ``func(support, **kwargs)``, where
``support`` is the support of the distribution from 0 to ``max_time``. The
function must return a list of probabilities for each diagnose time.
function must return a list of probabilities for each diagnosis time.
Note:
All arguments except ``support`` must have default values and if some
@@ -214,7 +214,7 @@ def get_params(
"""If updateable, return the dist's ``param`` value or all params in a dict.
See Also:
:py:meth:`lymph.diagnose_times.DistributionsUserDict.get_params`
:py:meth:`lymph.diagnosis_times.DistributionsUserDict.get_params`
:py:meth:`lymph.graph.Edge.get_params`
:py:meth:`lymph.models.Unilateral.get_params`
:py:meth:`lymph.models.Bilateral.get_params`
@@ -264,7 +264,7 @@ def draw_diag_times(
rng: np.random.Generator | None = None,
seed: int = 42,
) -> np.ndarray:
"""Draw ``num`` samples of diagnose times from the stored PMF.
"""Draw ``num`` samples of diagnosis times from the stored PMF.
A random number generator can be provided as ``rng``. If ``None``, a new one
is initialized with the given ``seed`` (or ``42``, by default).
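
The behaviour this docstring describes (draw from a stored PMF, falling back to a freshly seeded generator) can be sketched with plain NumPy as below. This is only an illustration under assumptions: the standalone function `draw_diag_times_sketch` and its `pmf` argument are hypothetical and do not reproduce the method's actual body.

```python
import numpy as np


def draw_diag_times_sketch(pmf, num=None, rng=None, seed=42):
    """Draw ``num`` diagnosis times from a PMF over the time-steps 0..len(pmf)-1.

    Hypothetical stand-in for the documented method: if no ``rng`` is given,
    a new generator is created from ``seed`` so that results are reproducible.
    """
    if rng is None:
        rng = np.random.default_rng(seed)
    pmf = np.asarray(pmf, dtype=float)
    support = np.arange(len(pmf))
    # ``p`` must sum to one; ``size=None`` returns a single scalar draw.
    return rng.choice(support, size=num, p=pmf)


# Two calls with the default seed yield the same samples.
first = draw_diag_times_sketch([0.1, 0.2, 0.3, 0.4], num=5)
second = draw_diag_times_sketch([0.1, 0.2, 0.3, 0.4], num=5)
```

Passing an existing `rng` instead lets a caller share one random stream across several draws, at the cost of results depending on the call order.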
4 changes: 2 additions & 2 deletions lymph/graph.py
@@ -406,8 +406,8 @@ def get_params(
"""Return the value of the parameter ``param`` or all params in a dict.
See Also:
:py:meth:`lymph.diagnose_times.Distribution.get_params`
:py:meth:`lymph.diagnose_times.DistributionsUserDict.get_params`
:py:meth:`lymph.diagnosis_times.Distribution.get_params`
:py:meth:`lymph.diagnosis_times.DistributionsUserDict.get_params`
:py:meth:`lymph.models.Unilateral.get_params`
:py:meth:`lymph.models.Bilateral.get_params`
"""
14 changes: 8 additions & 6 deletions lymph/matrix.py
@@ -11,9 +11,9 @@
import numpy as np
import pandas as pd

from lymph import graph
from lymph.utils import get_state_idx_matrix, row_wise_kron, tile_and_repeat
from lymph import graph, types
from lymph.modalities import Modality
from lymph.utils import get_state_idx_matrix, row_wise_kron, tile_and_repeat


@lru_cache(maxsize=128)
@@ -94,17 +94,18 @@ def generate_observation(

def compute_encoding(
lnls: list[str],
pattern: pd.Series | dict[str, bool | int | str],
pattern: pd.Series | dict[str, types.InvolvementIndicator],
base: int = 2,
) -> np.ndarray:
"""Compute the encoding of a particular ``pattern`` of involvement.
A ``pattern`` holds information about the involvement of each LNL and the function
transforms this into a binary encoding which is ``True`` for all possible complete
states/diagnoses that are compatible with the given ``pattern``.
states/diagnosis that are compatible with the given ``pattern``.
In the binary case (``base=2``), the value behind ``pattern[lnl]`` can be one of
the following things:
- ``False``: The LNL is healthy.
- ``"healthy"``: The LNL is healthy.
- ``True``: The LNL is involved.
@@ -113,6 +114,7 @@ def compute_encoding(
In the trinary case (``base=3``), the value behind ``pattern[lnl]`` can be one of
these things:
- ``False``: The LNL is healthy.
- ``"healthy"``: The LNL is healthy.
- ``True``: The LNL is involved (micro- or macroscopic).
@@ -211,12 +213,12 @@ def generate_data_encoding(
if modality_name not in patient_row:
warnings.warn(f"Modality {modality_name} not in data. Skipping.")
continue
diagnose_encoding = compute_encoding(
diagnosis_encoding = compute_encoding(
lnls=lnls,
pattern=patient_row[modality_name],
base=2, # observations are always binary!
)
patient_encoding = np.kron(patient_encoding, diagnose_encoding)
patient_encoding = np.kron(patient_encoding, diagnosis_encoding)

result[:,i] = patient_encoding
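
To make the two steps in this hunk easier to follow, here is a simplified sketch of the idea: a pattern is turned into a boolean vector that is `True` for every complete state compatible with it, and the vectors of several modalities are combined with `np.kron`. The helper `encode_pattern_sketch`, the state ordering, the modality names, and the restriction to `True`/`False`/`None` values are assumptions for illustration. The real `compute_encoding` accepts more indicator values (e.g. `"healthy"`) and a trinary base.

```python
import itertools

import numpy as np


def encode_pattern_sketch(lnls, pattern):
    """Return a boolean vector marking all binary states compatible with ``pattern``.

    Hypothetical simplification: only ``True``, ``False`` and ``None`` (unknown)
    are supported, and states follow the order of ``itertools.product``.
    """
    states = itertools.product([False, True], repeat=len(lnls))
    return np.array([
        all(
            pattern.get(lnl) is None or bool(pattern[lnl]) == involved
            for lnl, involved in zip(lnls, state)
        )
        for state in states
    ])


lnls = ["I", "II"]
# Illustrative observations from two made-up modalities.
patient_row = {
    "CT": {"I": True, "II": None},
    "MRI": {"I": False, "II": True},
}

# Start from a neutral element and fold in one modality after another.
patient_encoding = np.ones(shape=1, dtype=bool)
for modality_name in ["CT", "MRI"]:
    diagnosis_encoding = encode_pattern_sketch(lnls, patient_row[modality_name])
    patient_encoding = np.kron(patient_encoding, diagnosis_encoding)
```

With two LNLs and two modalities, the combined vector has 4 × 4 = 16 entries, one per joint observation of both modalities.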
