# Multi-State Model first example

## In this notebook

This notebook provides a simple setting which illustrates basic usage of the model.

## Typical settings

In a typical setting of modelling patient illness trajectories, there are multiple sources of complexity:

1. There could be many states (mild, severe, recovered, released from hospital, death, etc.).
2. The probability of each transition and the duration of the stay in each state depend on patient covariates.
3. Patient covariates can change over time, possibly in a manner which depends on the states visited.

To introduce the multi-state model, we shall use a much simpler setting: the data comes from a simple three-state model, and the covariates neither change over time nor affect the probabilities of transitions between states.

## A Simple Multi-State Setting

Patients start at state 1, state 3 shall be a terminal state and states 1,2 shall be identical in the sense that from both:

1. With probability 1/2 you transition to state 3 within 1 day.
2. With probability 1/2 you transition to state 2 or 1 (depending on the present state), within t∼exp(λ)

For this setting, one can show that the expected time until reaching a terminal state is 1 + 1/λ (see the proof at the end of this notebook).
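
As an informal check of this expectation, the toy process can be simulated directly. The sketch below is not part of pymsm. It assumes that "within 1 day" means exactly 1 day and that exp(λ) denotes an exponential distribution with rate λ (mean 1/λ); the function name `simulate_time_to_terminal` is made up for illustration. Because states 1 and 2 behave identically, the simulation does not need to track which of the two is the current state.

```python
import random


def simulate_time_to_terminal(lambda_param, rng):
    """Simulate one path of the toy model and return the total time until state 3."""
    total_time = 0.0
    while True:
        if rng.random() < 0.5:
            # Move to the terminal state; assumed here to take exactly 1 day.
            return total_time + 1.0
        # Otherwise switch between states 1 and 2 after an Exp(rate=lambda) wait.
        total_time += rng.expovariate(lambda_param)


rng = random.Random(0)  # fixed seed so the run is reproducible
lambda_param = 2
n_paths = 100_000
mean_time = sum(simulate_time_to_terminal(lambda_param, rng) for _ in range(n_paths)) / n_paths
print(f"simulated mean: {mean_time:.3f}, theoretical 1 + 1/lambda: {1 + 1 / lambda_param:.3f}")
```

With enough simulated paths, the two printed numbers should agree to within sampling error.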

## The Dataset Structure

Let’s load the dataset, which was constructed based on the graph above:

In [1]:
from pymsm.examples.first_example_utils import create_toy_setting_dataset
dataset = create_toy_setting_dataset(lambda_param=2)
print('dataset type: {}'.format(type(dataset)))
print('elements type: {}'.format(type(dataset[0])))

dataset type: <class 'list'>
elements type: <class 'pymsm.multi_state_competing_risks_model.PathObject'>


The dataset is a list of objects of class PathObject. Each PathObject in the list corresponds to the observed path of a single sample (i.e., a “patient”). Let’s look at one such object in detail:

In [2]:
# TODO: add a printing function to PathObject?

Each PathObject has the following attributes:

1. covariates: These are the sample’s covariates. In this case they were randomly generated and do not affect the state transitions, but for a patient this could be a numerical vector with entries such as:
    “age in years”
    “is male”
    “number of days that have passed since hospitalization”
    etc.
2. states: These are the observed states the sample visited, encoded as positive integers. Here we can see the back and forth between states 1 and 2, ending with the only terminal state (state 3).
3. time_at_each_state: These are the observed times spent at each state.
4. id: (optional) a unique identifier of the patient.

Note: if the last state is a terminal state, then the vector of times should be shorter than the vector of states by 1. Conversely, if the last state is not a terminal state, then the length of the vector of times should be the same as that of the states. In such a case, the sample is inferred to be right censored.
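
The length rule in the note above can be expressed as a small helper. This is only a sketch: `describe_path` is a hypothetical function, not a pymsm method, and the `SimpleNamespace` objects are stand-ins that mimic the four attributes listed above rather than real PathObject instances. The attribute names are taken from the list above and are assumed to match the class.

```python
from types import SimpleNamespace


def describe_path(path, terminal_states):
    """Summarize a path-like object and infer whether it is right censored."""
    states = list(path.states)
    times = list(path.time_at_each_state)
    ends_in_terminal = bool(states) and states[-1] in terminal_states
    # A path ending in a terminal state has one fewer time than states;
    # otherwise the two lengths match and the path is right censored.
    expected_n_times = len(states) - 1 if ends_in_terminal else len(states)
    if len(times) != expected_n_times:
        raise ValueError(
            f"expected {expected_n_times} times for {len(states)} states, got {len(times)}"
        )
    status = "ended in a terminal state" if ends_in_terminal else "right censored"
    return "\n".join([
        f"id: {getattr(path, 'id', None)}",
        f"covariates: {path.covariates}",
        f"states: {states}",
        f"time_at_each_state: {times}",
        f"status: {status}",
    ])


# Hypothetical stand-ins that only mimic the four attributes listed above.
complete_path = SimpleNamespace(id=1, covariates=[0.3, -1.2],
                                states=[1, 2, 1, 3], time_at_each_state=[0.4, 0.9, 1.0])
censored_path = SimpleNamespace(id=2, covariates=[1.1, 0.5],
                                states=[1, 2], time_at_each_state=[0.4, 0.7])

print(describe_path(complete_path, terminal_states=[3]))
print(describe_path(censored_path, terminal_states=[3]))
```

If the attribute names hold, the same helper could be applied to an element of the dataset, for example `describe_path(dataset[0], terminal_states=[3])`.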

## Updating Covariates Over Time

To update the patient covariates over time, we need to define a function that is applied at each state transition.
In this simple case, the covariates do not change and the function is trivial:

In [3]:
def default_update_covariates_function(covariates_entering_origin_state, origin_state=None, target_state=None,
                                       time_at_origin=None, abs_time_entry_to_target_state=None):
    return covariates_entering_origin_state

You can define any function, as long as it receives the following parameter types (in this order):
1. pandas Series (sample covariates when entering the origin state)
2. int (origin state number)
3. int (target state number)
4. float (time spent at origin state)
5. float (absolute time of entry to target state)

If some of the parameters are not used in the function, use a default value of None, as in the example above.
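
For contrast with the trivial function, here is a sketch of an update function that does change a covariate. It is not used in this notebook, whose covariates are static. The covariate name `days_since_hospitalization` and the function name are hypothetical, chosen to echo the example covariates mentioned earlier. The function relies only on `.copy()` and item assignment, both of which a pandas Series supports; a plain dict stands in for the Series in the demo to keep the sketch free of dependencies.

```python
def update_days_since_hospitalization(covariates_entering_origin_state, origin_state=None,
                                      target_state=None, time_at_origin=None,
                                      abs_time_entry_to_target_state=None):
    """Advance a hypothetical time-dependent covariate by the time spent at the origin state."""
    # Copy first so the covariates recorded on entry to the origin state stay untouched.
    updated = covariates_entering_origin_state.copy()
    if time_at_origin is not None:
        updated['days_since_hospitalization'] += time_at_origin
    return updated


# Demo with a dict standing in for a pandas Series (hypothetical covariate values).
covariates_at_entry = {'age_in_years': 70, 'days_since_hospitalization': 3.0}
covariates_after = update_days_since_hospitalization(covariates_at_entry, origin_state=1,
                                                     target_state=2, time_at_origin=2.5,
                                                     abs_time_entry_to_target_state=5.5)
print(covariates_after)
```

Returning a copy rather than mutating the input keeps the covariates of earlier states intact, which matters if the same object is reused across transitions.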

## Defining terminal states

In [4]:
terminal_states = [3]

## Fitting the model

Import and initialize the model:

In [5]:
from pymsm.multi_state_competing_risks_model import MultiStateModel
multi_state_model = MultiStateModel(dataset, terminal_states, default_update_covariates_function, 
                                    ['covariate_1', 'covariate_2'])

Fit the model:

In [6]:
multi_state_model.fit()

Fitting Model at State: 1
>>> Fitting Transition to State: 3, n events: 643


>>> Fitting Transition to State: 2, n events: 723
Fitting Model at State: 2
>>> Fitting Transition to State: 3, n events: 357
>>> Fitting Transition to State: 1, n events: 366



In [8]:
import numpy as np
multi_state_model._one_monte_carlo_run(np.array([0,0]), 1, 20)

  probability_for_each_t_given_next_state = probability_for_each_t/probability_for_each_t.max()


KeyError: None