# Computing MS progression from longitudinal data

This tutorial illustrates how to use the `pymsprog` package to study the progression of disability in multiple sclerosis (MS) based on repeated assessments of an outcome measure (EDSS, NHPT, T25FW, or SDMT) over time, and on the dates of acute episodes (if present).

In [1]:
from pymsprog import MSprog, compute_delta, load_toy_data

## Input data

The data must be organised in a `pandas` `DataFrame` containing (at least) the following columns:

* Subject IDs;
* Visit dates;
* Outcome values.

The visits should be listed in chronological order (if they are not, `MSprog` will sort them).

For relapsing-remitting MS patients, an additional `DataFrame` with the dates of relapses is needed to correctly assess progression and characterise progression events as relapse-associated or relapse-independent. The dataset should contain (at least) the following columns:

* Subject IDs;
* Relapse dates.

In this tutorial, we will use toy data with artificially generated EDSS and SDMT assessments and relapse dates for four patients:

In [2]:
toydata_visits, toydata_relapses = load_toy_data()

print('\nVisits:')
print(toydata_visits.head())
print('\nRelapses:')
print(toydata_relapses.head())


Visits:
   id       date  EDSS  SDMT
0   1 2021-09-23   4.5    50
1   1 2021-11-03   4.5    50
2   1 2022-01-19   4.5    51
3   1 2022-04-27   4.5    57
4   1 2022-07-12   5.5    55

Relapses:
   id       date
0   2 2021-06-12
1   2 2022-10-25
2   3 2022-12-01


## Minimal example

Given data on visits and relapses in the form specified above, the `MSprog` function analyses the disability progression for each subject. Default `outcome` is `'edss'`.

In [3]:
summary, results = MSprog(toydata_visits, # data on visits
                         subj_col='id', value_col='EDSS', date_col='date', # specify column names
                         relapse=toydata_relapses) # data on relapses


---
Outcome: EDSS
Confirmation at: [3]mm (-30dd, +30dd)
Baseline: fixed
Relapse influence: 30dd
Events detected: firstprog
---
Total subjects: 4
---
Progressed: 2 (PIRA: 1; RAW: 1)


The function prints concise info (the argument `verbose` can be used to control the amount of printed info), and generates the following two `DataFrame`s.

<br />

1. A summary of the event sequence detected for each subject:

In [4]:
print(summary)

  event_sequence  progression  RAW  PIRA  undefined_prog
1           PIRA            1    0     1               0
2            RAW            1    1     0               0
3                           0    0     0               0
4                           0    0     0               0


where: `event_sequence` specifies the order of the events; the other columns count the events of each kind (improvement; progression; relapse-associated worsening, RAW; progression independent of relapse activity, PIRA; and progression that could not be classified as either RAW or PIRA with the available information). See [[1](#lublin2014), [2](#kappos2018), [3](#silent)].

<br />

2. Extended info on each event for all subjects:

In [5]:
print(results)

   id  nevent event_type time2event conf3 sust_days sust_last
0   1       1       PIRA        292     1       125         1
1   2       1        RAW        198     1         0         0


where: `nevent` is the cumulative event count for each subject; `event_type` characterises the event; `time2event` is the number of days from baseline to event; `conf3` reports whether the event was confirmed; `sust_days` is the number of days for which the event was sustained *after confirmation*; `sust_last` reports whether the event was sustained until the last visit.
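
As a quick sanity check on `time2event`, the day counts reported above can be recomputed from the dates in the toy data. The sketch below assumes that, with the default fixed baseline, the baseline of subjects `1` and `2` is their first visit (2021-09-23 and 2020-11-26 respectively) and that the events fall on 2022-07-12 and 2021-06-12. The helper `days_between` is introduced here for illustration and is not part of the package.

```python
from datetime import date


def days_between(baseline, event):
    """Number of days elapsed from the baseline visit to the event visit."""
    return (event - baseline).days


# Subject 1: baseline at the first visit, EDSS worsening detected on 2022-07-12
pira_days = days_between(date(2021, 9, 23), date(2022, 7, 12))

# Subject 2: baseline at the first visit, EDSS worsening detected on 2021-06-12
raw_days = days_between(date(2020, 11, 26), date(2021, 6, 12))

print(pira_days, raw_days)  # 292 198, matching the time2event column
```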

<br />

Several qualitative and quantitative options for computing the progression are given as optional arguments of `MSprog` that can be set by the user. In order to ensure reproducibility, the results should always be complemented by the specific settings used to obtain them. In the following sections we will go into more detail about usage and best practices for each of the options.


## Valid changes in the outcome measure

The `MSprog` function detects the events sequentially by scanning the outcome values in chronological order, and each value is tested for its difference from the current reference value. The `compute_delta` function returns the minimum difference $\delta$ in the chosen outcome measure that is accepted as a valid change from a given reference value $x$ (value at baseline). The criterion varies based on the test under analysis, see [[4](#lorscheider2016), [5](#bosma2010), [6](#kalinowski2022), [7](#strober2019)].

* Expanded Disability Status Scale (EDSS): $\delta(x)=\begin{cases} 1.5 \quad \text{ if } x=0\\1 \quad\;\;\; \text{ if } 0 < x \leq 5\\0.5 \quad \text{ if } 5<x\leq 10\end{cases}$;

* Nine-Hole Peg Test (NHPT), for either the dominant or the non-dominant hand: $\delta(x) = \frac{x}{5}$;

* Timed 25-Foot Walk (T25FW): $\delta(x) = \frac{x}{5}$;

* Symbol Digit Modalities Test (SDMT): $\delta(x) = \min\left(3, \frac{x}{10}\right)$.


For example:

In [6]:
print('Minimum valid change from baseline EDSS=4: ', compute_delta(4)) # default outcome measure is 'edss'
print('Minimum valid change from baseline T25FW=10: ', compute_delta(10, outcome='t25fw'))

Minimum valid change from baseline EDSS=4:  1.0
Minimum valid change from baseline T25FW=10:  2.0
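
For readers who want to see the thresholds listed above written out, here is a minimal standalone sketch. It only transcribes the rules stated in this section; the function name `min_valid_change` is hypothetical, and the code is not the package's own implementation of `compute_delta`.

```python
def min_valid_change(x, outcome="edss"):
    """Minimum change accepted as valid from reference value x (illustrative sketch)."""
    if outcome == "edss":
        if x == 0:
            return 1.5
        return 1.0 if x <= 5 else 0.5
    if outcome in ("nhpt", "t25fw"):
        return x / 5  # 20% of the reference value
    if outcome == "sdmt":
        return min(3, x / 10)  # the smaller of 3 points and 10% of the reference
    raise ValueError(f"unknown outcome: {outcome!r}")


print(min_valid_change(4))                     # 1.0
print(min_valid_change(10, outcome="t25fw"))   # 2.0
print(min_valid_change(50, outcome="sdmt"))    # 3
```

The three printed values agree with the `compute_delta` outputs shown in this tutorial for the same inputs.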


The `compute_delta` function is used as default `delta_fun` in the `MSprog` function to assess progression. Alternatively, a custom `delta_fun` can be provided. To change the minimum $\delta$ for SDMT to, say, "either 4 points or 20% of the reference value", we would define:

In [7]:
def my_sdmt_delta(x):
    return min(4, x/5)
print('CUSTOM minimum valid change from baseline SDMT=50: ', my_sdmt_delta(50)) # my delta
print('DEFAULT minimum valid change from baseline SDMT=50: ', compute_delta(50, outcome='sdmt')) # default delta

CUSTOM minimum valid change from baseline SDMT=50:  4
DEFAULT minimum valid change from baseline SDMT=50:  3


To use our custom function, we can then set `delta_fun=my_sdmt_delta` in `MSprog` when computing the progression.
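
To get a feel for what a different threshold changes in practice, the sketch below compares the default SDMT rule stated above with the custom one on a single hypothetical pair of values. The helper `is_valid_change` is an illustration only (it is not part of the package) and it ignores the direction of the change, confirmation, and relapses.

```python
def default_sdmt_delta(x):
    """Default SDMT rule as stated in this tutorial: min(3, x/10)."""
    return min(3, x / 10)


def my_sdmt_delta(x):
    """Custom rule from the example above: min(4, x/5)."""
    return min(4, x / 5)


def is_valid_change(reference, value, delta_fun):
    """True if |value - reference| reaches the minimum delta for that reference."""
    return abs(value - reference) >= delta_fun(reference)


# Hypothetical scores: reference SDMT of 50, follow-up of 46.5 (a 3.5-point change)
print(is_valid_change(50, 46.5, default_sdmt_delta))  # True:  3.5 >= 3
print(is_valid_change(50, 46.5, my_sdmt_delta))       # False: 3.5 <  4
```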

## Baseline scheme

The baseline scheme can be set by using the `baseline` argument in `MSprog`. Two main baseline schemes can be adopted:

* Fixed baseline (`baseline='fixed'`, default): the reference value is set to be the first outcome value found outside the influence of an acute event.
* Roving baseline (`baseline='roving'`): the reference value is initially set as the first outcome value out of relapse influence, then updated after each event to the last confirmed outcome value (out of relapse influence). **This scheme is recommended in a "multiple events" setting** [[2](#kappos2018)] (see example below). The re-baseline procedure can be made finer by setting `sub_threshold=True` in `MSprog`: this moves the reference value after *any* confirmed change, even if the difference from the current reference is smaller than the minimum $\delta$.


For example, extracting multiple EDSS events for subject `4` from `toydata_visits` with a fixed baseline would result in the following.

In [8]:
print('\nData:')
print(toydata_visits.loc[toydata_visits['id']==4, ['date', 'EDSS']]) # EDSS visits

_, results = MSprog(toydata_visits, 'id', 'EDSS', 'date', relapse=toydata_relapses, subjects=[4],
                outcome='edss', conf_months=3, event='multiple', baseline='fixed', 
                include_dates=True, verbose=0)
print('\nResults with fixed baseline:')
print(results.T) # results


Data:
         date  EDSS
25 2021-09-18   4.5
26 2021-12-04   3.5
27 2022-03-12   3.5
28 2022-07-19   5.0
29 2022-10-05   5.0
30 2023-01-16   5.5
31 2023-04-27   5.0

Results with fixed baseline:
                     0
id                   4
nevent               1
event_type        impr
bldate      2021-09-18
date        2021-12-04
time2event          77
conf3                1
sust_days            0
sust_last            0


Since the reference value was kept fixed at the first visit (EDSS = 4.5), the EDSS progression at visit 4 (EDSS = 5) was not detected. On the other hand, with a roving baseline scheme, the baseline is moved to visit 3 after the confirmed improvement and the progression event is correctly detected:

In [9]:
_, results = MSprog(toydata_visits, 'id', 'EDSS', 'date', relapse=toydata_relapses, subjects=[4],
                outcome='edss', conf_months=3, event='multiple', baseline='roving', 
                include_dates=True, verbose=0)
print('\nResults with roving baseline:')
print(results.T) # results


Results with roving baseline:
                     0           1
id                   4           4
nevent               1           2
event_type        impr        PIRA
bldate      2021-09-18  2022-03-12
date        2021-12-04  2022-07-19
time2event          77         129
conf3                1           1
sust_days            0         204
sust_last            0           1
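
The difference between the two schemes can also be traced by hand with a deliberately simplified sketch. The assumptions are strong and are not how the package works internally: a change counts as confirmed when it is still present at the very next visit, dates and relapses are ignored, and under the roving scheme the reference moves to the confirmation visit. The names `edss_delta` and `toy_events` are hypothetical.

```python
def edss_delta(x):
    """Minimum valid EDSS change from reference x (rule stated in this tutorial)."""
    if x == 0:
        return 1.5
    return 1.0 if x <= 5 else 0.5


def toy_events(values, roving):
    """Return (visit_number, kind) for changes still present at the next visit."""
    events = []
    bl = 0  # index of the current baseline visit
    i = 1
    while i < len(values) - 1:
        ref = values[bl]
        delta = edss_delta(ref)
        kind = None
        if values[i] - ref >= delta and values[i + 1] - ref >= delta:
            kind = "prog"
        elif ref - values[i] >= delta and ref - values[i + 1] >= delta:
            kind = "impr"
        if kind is None:
            i += 1
            continue
        events.append((i + 1, kind))  # visit numbers are 1-based
        if roving:
            bl = i + 1  # move the reference to the confirmation visit
        i += 2  # resume the search after the confirmation visit


    return events


edss_subject_4 = [4.5, 3.5, 3.5, 5.0, 5.0, 5.5, 5.0]
print(toy_events(edss_subject_4, roving=False))  # [(2, 'impr')]
print(toy_events(edss_subject_4, roving=True))   # [(2, 'impr'), (4, 'prog')]
```

With the fixed reference of 4.5, the value 5.0 at visit 4 is only 0.5 above baseline, so nothing is detected there. Once the reference moves to 3.5, the same value is 1.5 above it and the worsening is picked up.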


<br />
Finally, on top of the chosen baseline scheme, *post-relapse re-baseline* can be applied by setting `relapse_rebl=True` in `MSprog`. If this is enabled, outcome values are scanned once more from the beginning by resetting the baseline after each relapse (first visit out of relapse influence) to classify potential events left undefined as PIRA.

## Confirmation period

An event is only validated if it is *confirmed*, that is, if the value change from baseline is maintained **up to** a subsequent visit falling within a pre-specified confirmation period [[8](#ontaneda2017)]. The chosen confirmation period depends on the type of study and on the frequency of visits, and can be set in `MSprog` by using the argument `conf_months`. If multiple values are specified (e.g., `conf_months=[3,6]`), events are retained if confirmed by at least one visit falling within one of the specified periods (here, 3 or 6 months $\pm$ `conf_tol_days`) **(\*)**. The results table will report whether an event was confirmed in each of the specified periods.

**(\*)** *An event is only confirmed if the value change from baseline is maintained **at all visits up to the confirmation visit**. Consequently, an event can be confirmed at 6 months without being confirmed at 3 months only if no valid confirmation visit falls within the 3-month window.*


Let's look at subject `2` from `toydata_visits`:

In [10]:
print('\nVisits:')
print(toydata_visits.loc[toydata_visits['id']==2, ['date', 'EDSS']]) # EDSS visits

print('\nRelapses:')
print(toydata_relapses[toydata_relapses['id']==2]) # relapses


Visits:
         date  EDSS
8  2020-11-26   4.0
9  2020-12-30   4.0
10 2021-03-24   4.5
11 2021-06-12   5.5
12 2021-09-04   5.0
13 2021-12-02   4.5
14 2022-02-23   4.5
15 2022-05-19   6.0
16 2022-08-28   6.0
17 2022-11-26   6.0

Relapses:
   id       date
0   2 2021-06-12
1   2 2022-10-25


The following code detects 3- or 6-month confirmed events for subject `2`.

In [11]:
_, results = MSprog(toydata_visits, 'id', 'EDSS', 'date', relapse=toydata_relapses, subjects=[2],
                outcome='edss', conf_months=[3,6], event='multiple', baseline='roving', 
                verbose=0)
print('\nResults:')
print(results.T) # results


Results:
               0     1
id             2     2
nevent         1     2
event_type   RAW  PIRA
time2event   198   257
conf3          1     1
conf6          0     1
PIRA_conf6  None     0
sust_days      0     0
sust_last      0     1


The validated events included in the results are a 3-month-confirmed RAW and a 3-month-confirmed PIRA. The RAW event was not confirmed at 6 months (`conf6` is `0`). The PIRA event was also confirmed at 6 months (`conf6` is `1`). However, since a relapse occurred before the 6-month confirmation, the event cannot be classified as a 6-month-confirmed PIRA (`PIRA_conf6` is `0`) but only as a 6-month-confirmed progression.

The RAW event found for subject `2` constitutes a *transient* accumulation of disability. Such events can be excluded from the `MSprog` output by requiring that each event be sustained for at least a certain amount of time, specified by argument `require_sust_months`. For instance:

In [12]:
_, results = MSprog(toydata_visits, 'id', 'EDSS', 'date', relapse=toydata_relapses, subjects=[2],
                outcome='edss', conf_months=[3,6], event='multiple', baseline='roving', 
                require_sust_months=6, verbose=0)
print('\nResults with require_sust_months=6:')
print(results.T) # results


Results with require_sust_months=6:
               0
id             2
nevent         1
event_type  prog
time2event   539
conf3          1
conf6          1
PIRA_conf6  None
sust_days      0
sust_last      1


In this context, as the transient EDSS accumulation was not classified as an event, it did not trigger a re-baseline according to the roving baseline scheme. As a consequence, the PIRA event is classified as an undefined progression due to the presence of a relapse between baseline and confirmation. This can be handled by enabling *post-relapse re-baseline* (`relapse_rebl=True`) to force a re-baseline after each relapse:

In [13]:
_, results = MSprog(toydata_visits, 'id', 'EDSS', 'date', relapse=toydata_relapses, subjects=[2],
                outcome='edss', conf_months=[3,6], event='multiple', baseline='roving', 
                require_sust_months=6, relapse_rebl=True, verbose=0)
print('\nResults with require_sust_months=6 and relapse_rebl=True:')
print(results.T) # results


Results with require_sust_months=6 and relapse_rebl=True:
               0
id             2
nevent         1
event_type  PIRA
time2event   257
conf3          1
conf6          1
PIRA_conf6     0
sust_days      0
sust_last      1


The event is now correctly classified as PIRA.

A more detailed report of the event detection process in each of the three cases examined can be visualized by re-running the above code snippets with `verbose=2`, see next section.  

<br />

Finally, the tolerance for the confirmation visit date can be set using the argument `conf_tol_days`. If a single number is specified (e.g., `conf_tol_days=45`), a symmetric tolerance interval is used: if the confirmation period is, say, 3 months, any visit within $[3\text{mm} - 45\text{dd}, 3\text{mm} + 45\text{dd}]$ will be a valid confirmation visit. Different tolerances on the left and on the right can be set by specifying two values (e.g., `conf_tol_days=[30, 365]` will generate a window $[3\text{mm} - 30\text{dd}, 3\text{mm} + 365\text{dd}]$). Further, the argument `conf_left` makes the window unbounded on the right, so that any visit *after* a certain amount of time is valid (e.g., `conf_tol_days=45` with `conf_left=True` will result in the window $[3\text{mm} - 45\text{dd}, +\infty)$).
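
The window arithmetic can be illustrated with a small standalone sketch. Two things here are assumptions rather than documented behaviour of the package: one month is converted to 30.44 days, and the helper names (`confirmation_window`, `visits_in_window`, and the `unbounded_right` flag standing in for `conf_left`) are hypothetical. The dates are those of subject `2`, taking the EDSS change on 2021-06-12 as the event.

```python
import math
from datetime import date

DAYS_PER_MONTH = 30.44  # assumption of this sketch, not taken from the package


def confirmation_window(conf_months, tol_days=30, unbounded_right=False):
    """Return (lower, upper) bounds of the confirmation window, in days after the event."""
    if isinstance(tol_days, (int, float)):
        left = right = tol_days  # symmetric tolerance
    else:
        left, right = tol_days  # (left, right) tolerance pair
    centre = conf_months * DAYS_PER_MONTH
    upper = math.inf if unbounded_right else centre + right
    return centre - left, upper


def visits_in_window(event_date, visit_dates, window):
    """Visits whose distance from the event (in days) falls inside the window."""
    lower, upper = window
    return [d for d in visit_dates if lower <= (d - event_date).days <= upper]


event = date(2021, 6, 12)
later_visits = [date(2021, 9, 4), date(2021, 12, 2), date(2022, 2, 23)]

in_3m = visits_in_window(event, later_visits, confirmation_window(3))
in_6m = visits_in_window(event, later_visits, confirmation_window(6))
open_ended = visits_in_window(
    event, later_visits, confirmation_window(3, 45, unbounded_right=True)
)

print(in_3m)       # only 2021-09-04 (84 days after the event)
print(in_6m)       # only 2021-12-02 (173 days after the event)
print(open_ended)  # all three visits
```

Falling inside the window only makes a visit a *candidate* for confirmation; the change from baseline still has to be maintained at that visit and at all visits before it.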

## Printing progress info

The `MSprog` function scans the outcome values of each subject in chronological order to detect the events. It is possible to visualize an extended log of the ongoing computations by setting `verbose=2`. See the example below.

In [14]:
summary, results = MSprog(toydata_visits, 
                     subj_col='id', value_col='EDSS', date_col='date', 
                     event='multiple', baseline='roving', 
                     relapse=toydata_relapses, verbose=2)


Subject #1: 8 visits, 0 relapses
EDSS change at visit no.5 (2022-07-12); potential confirmation visits available: no.[6]
EDSS progression[PIRA] (visit no.5, 2022-07-12) confirmed at [3] months, sustained up to visit no.8 (2023-03-11)
New settings: baseline at visit no.8, searching for events from visit no.- on
No EDSS change in any subsequent visit: end process
Event sequence: PIRA

Subject #2: 10 visits, 2 relapses
Visits not listed in chronological order: sorting them.
EDSS change at visit no.3 (2021-03-24); potential confirmation visits available: no.[]
Change not confirmed: proceed with search
EDSS change at visit no.4 (2021-06-12); potential confirmation visits available: no.[5]
EDSS progression[RAW] (visit no.4, 2021-06-12) confirmed at [3] months, sustained up to visit no.5 (2021-09-04)
New settings: baseline at visit no.5, searching for events from visit no.6 on
EDSS change at visit no.6 (2021-12-02); potential confirmation visits available: no.[7]
Change not confirmed: proceed with search
[...]

## References

<a id="lublin2014">[1]</a> 
Lublin FD, Reingold SC, Cohen JA, Cutter GR, Sørensen PS, Thompson AJ, et al. Defining the clinical course of multiple sclerosis. Neurology. 2014;83:278–86.

<a id="kappos2018">[2]</a> 
Kappos L, Butzkueven H, Wiendl H, Spelman T, Pellegrini F, Chen Y, et al. Greater sensitivity to multiple sclerosis disability worsening and progression events using a roving versus a fixed reference value in a prospective cohort study. Mult Scler. 2018;24:963–73.

<a id="silent">[3]</a> 
University of California, San Francisco MS-EPIC Team, Cree BAC, Hollenbach JA, Bove R, Kirkish G, Sacco S, et al. Silent progression in disease activity–free relapsing multiple sclerosis. Ann Neurol. 2019;85:653–66.

<a id="lorscheider2016">[4]</a> 
Lorscheider J, Buzzard K, Jokubaitis V, Spelman T, Havrdova E, Horakova D, et al. Defining secondary progressive multiple sclerosis. Brain. 2016;139:2395–405.

<a id="bosma2010">[5]</a>
Bosma LVAE, Kragt JJ, Brieva L, Khaleeli Z, Montalban X, Polman CH, et al. Progression on the multiple sclerosis functional composite in multiple sclerosis: What is the optimal cut-off for the three components? Mult Scler. 2010;16:862–7.

<a id="kalinowski2022">[6]</a>
Kalinowski A, Cutter G, Bozinov N, Hinman JA, Hittle M, Motl R, et al. The timed 25-foot walk in a large cohort of multiple sclerosis patients. Mult Scler. 2022;28:289–99.

<a id="strober2019">[7]</a>
Strober L, DeLuca J, Benedict RH, Jacobs A, Cohen JA, Chiaravalloti N, et al. Symbol digit modalities test: A valid clinical trial endpoint for measuring cognition in multiple sclerosis. Mult Scler. 2019;25:1781–90.

<a id="ontaneda2017">[8]</a>
Ontaneda D, Thompson AJ, Fox RJ, Cohen JA. Progressive multiple sclerosis: Prospects for disease therapy, repair, and restoration of function. Lancet. 2017;389:1357–66.