# Estimating transition probabilities

## Objective

The goal of this notebook is to estimate the probability of moving from one state to another for each demographic group of the population.

We distinguish six states:

1. asymptomatic
2. symptomatic
3. hospitalized
4. intensive care (ICU)
5. deceased
6. convalescent

To do so, we use a dataframe extracted from the data of the paper *Estimating the burden of SARS-CoV-2 in France* (Salje et al., 2020).

In [2]:
import pandas as pd

In [3]:
df_transitions = pd.read_csv('../data/df_transitions.csv')
df_transitions

Unnamed: 0,infected_hosp,hosp_icu,hosp_death,infected_death,demography
0,0.001,0.175,0.012,0.0,h_0
1,0.0009,0.085,1e-05,0.0,f_0
2,0.006,0.122,0.013,7e-05,h_20
3,0.005,0.068,0.014,7e-05,f_20
4,0.012,0.172,0.025,0.0003,h_30
5,0.009,0.104,0.016,0.0001,f_30
6,0.016,0.243,0.039,0.0006,h_40
7,0.013,0.143,0.032,0.0004,f_40
8,0.032,0.317,0.075,0.002,h_50
9,0.025,0.19,0.064,0.002,f_50


The *demography* column gives the population group: for example, *h_0* stands for men aged 0 to 20, *f_20* for women aged 20 to 30, and so on.

The other columns hold transition probabilities: for example, *hosp_icu* is the probability of moving from the "hospitalized" state to the "intensive care" state.
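
The group codes can be split into a sex letter and the lower bound of the age range. A minimal sketch, assuming the `<sex>_<age>` pattern seen in the table holds for every row (`parse_demography` is a hypothetical helper, not part of the notebook):

```python
def parse_demography(code):
    """Split a code such as 'h_20' into ('h', 20)."""
    sex, age_min = code.split('_')
    return sex, int(age_min)


# A few codes taken from the `demography` column
parsed = [parse_demography(code) for code in ['h_0', 'f_20', 'h_50']]
```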


## Estimating the transition from "intensive care" to "death"
The table does not include the probability of moving from "intensive care" (*ICU*) to "deceased", so we try to infer it.

We assume that everyone up to age 80 who dies goes through intensive care first, so for them:

$ p(death |hosp) = p(death|icu) \times p(icu|hosp) $

in other words:

$ p(death | icu) = \frac{p(death | hosp)}{p(icu | hosp)} $
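
As a worked example, the formula can be applied by hand to the `h_50` row of the table above (`hosp_death` = 0.075, `hosp_icu` = 0.317):

```python
# Values for the h_50 group, copied from the table above
p_death_given_hosp = 0.075
p_icu_given_hosp = 0.317

# p(death | icu) = p(death | hosp) / p(icu | hosp)
p_death_given_icu = p_death_given_hosp / p_icu_given_hosp
print(round(p_death_given_icu, 6))
```

The result, about 0.2366, is the value that the `icu_death` column holds for that group.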

In [4]:
df_transitions['icu_death'] = df_transitions['hosp_death'] / df_transitions['hosp_icu']

df_transitions

Unnamed: 0,infected_hosp,hosp_icu,hosp_death,infected_death,demography,icu_death
0,0.001,0.175,0.012,0.0,h_0,0.068571
1,0.0009,0.085,1e-05,0.0,f_0,0.000118
2,0.006,0.122,0.013,7e-05,h_20,0.106557
3,0.005,0.068,0.014,7e-05,f_20,0.205882
4,0.012,0.172,0.025,0.0003,h_30,0.145349
5,0.009,0.104,0.016,0.0001,f_30,0.153846
6,0.016,0.243,0.039,0.0006,h_40,0.160494
7,0.013,0.143,0.032,0.0004,f_40,0.223776
8,0.032,0.317,0.075,0.002,h_50,0.236593
9,0.025,0.19,0.064,0.002,f_50,0.336842


For people over 70 (except men under 80), the computed "probability" is greater than 1. This reflects the fact that some of them go from "hospitalized" to "deceased" without passing through "intensive care".
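
Such rows can be flagged automatically. A sketch on made-up numbers (the two rows below are illustrative and are not taken from the dataset), where the second group has more hospital deaths than ICU admissions:

```python
import pandas as pd

# Illustrative values only, not the real data
toy = pd.DataFrame({
    'demography': ['group_a', 'group_b'],
    'hosp_icu': [0.20, 0.05],
    'hosp_death': [0.10, 0.30],
})
toy['icu_death'] = toy['hosp_death'] / toy['hosp_icu']

# A ratio above 1 cannot be a probability: for these groups the
# "every death goes through ICU" assumption does not hold
invalid = toy.loc[toy['icu_death'] > 1, 'demography'].tolist()
```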

To work around this, we roughly estimate by eye (based on the progression across age groups, etc.) the probability of death in intensive care for the over-70s, and write these values into `df_transitions`:

In [5]:
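import numpy as np

# Hand-picked estimates for the four oldest groups (rows 12 onwards)
inferred_values = np.array([.65, .84, .78, .95])

# A single label-based assignment is enough; no loop is needed
df_transitions.loc[12:, 'icu_death'] = inferred_values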

df_transitions

Unnamed: 0,infected_hosp,hosp_icu,hosp_death,infected_death,demography,icu_death
0,0.001,0.175,0.012,0.0,h_0,0.068571
1,0.0009,0.085,1e-05,0.0,f_0,0.000118
2,0.006,0.122,0.013,7e-05,h_20,0.106557
3,0.005,0.068,0.014,7e-05,f_20,0.205882
4,0.012,0.172,0.025,0.0003,h_30,0.145349
5,0.009,0.104,0.016,0.0001,f_30,0.153846
6,0.016,0.243,0.039,0.0006,h_40,0.160494
7,0.013,0.143,0.032,0.0004,f_40,0.223776
8,0.032,0.317,0.075,0.002,h_50,0.236593
9,0.025,0.19,0.064,0.002,f_50,0.336842
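
Note that `.loc[12:, ...]` slices by label and keeps every row from label 12 to the end. A sketch on a dummy frame, assuming the full table has 16 rows with a default integer index (the printed output above stops at row 9, so the row count is an assumption):

```python
import numpy as np
import pandas as pd

# Dummy frame standing in for df_transitions (16 rows is an assumption)
dummy = pd.DataFrame({'icu_death': np.zeros(16)})
inferred = np.array([.65, .84, .78, .95])

# Labels 12, 13, 14 and 15 are selected, so the 4 values fit exactly
dummy.loc[12:, 'icu_death'] = inferred
tail = dummy['icu_death'].tolist()[-4:]
```

If the number of selected rows did not match the length of the array, pandas would raise an error instead of assigning silently.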


Next, we add a column *hosp_death_direct*: the probability of moving from the *hospitalized* state to the *deceased* state without passing through intensive care. We take this probability to be 0 for patients under 70, and for those over 70:

$ p(death, \neg icu | hosp) = p(death | hosp) - p(death | icu) \times p(icu | hosp) $
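
This follows from splitting hospital deaths into those that go through intensive care and those that do not. A short derivation, writing $p_{direct}$ (a notation introduced here) for the left-hand side of the formula above:

$ p(death | hosp) = p(death | icu) \times p(icu | hosp) + p_{direct} $

$ p_{direct} = p(death | hosp) - p(death | icu) \times p(icu | hosp) $

When every death goes through intensive care, the first term accounts for all of $p(death | hosp)$ and $p_{direct}$ is 0.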

In [6]:
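# Start from 0.0 (float) so the column can hold probabilities
df_transitions['hosp_death_direct'] = 0.0

# Over-70 groups: hospital deaths not explained by the ICU pathway
over_70 = df_transitions.loc[12:]
df_transitions.loc[12:, 'hosp_death_direct'] = over_70['hosp_death'] - over_70['hosp_icu'] * over_70['icu_death']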

df_transitions

Unnamed: 0,infected_hosp,hosp_icu,hosp_death,infected_death,demography,icu_death,hosp_death_direct
0,0.001,0.175,0.012,0.0,h_0,0.068571,0.0
1,0.0009,0.085,1e-05,0.0,f_0,0.000118,0.0
2,0.006,0.122,0.013,7e-05,h_20,0.106557,0.0
3,0.005,0.068,0.014,7e-05,f_20,0.205882,0.0
4,0.012,0.172,0.025,0.0003,h_30,0.145349,0.0
5,0.009,0.104,0.016,0.0001,f_30,0.153846,0.0
6,0.016,0.243,0.039,0.0006,h_40,0.160494,0.0
7,0.013,0.143,0.032,0.0004,f_40,0.223776,0.0
8,0.032,0.317,0.075,0.002,h_50,0.236593,0.0
9,0.025,0.19,0.064,0.002,f_50,0.336842,0.0


According to these estimates, slightly more than 25% of patients aged 70 to 80 die in hospital without going through intensive care; this figure rises to 90% for patients over 80.

## Consistency check
In principle, for the groups where every death goes through intensive care, we should have:
$ p(death | infected) = p(death | icu) \times p(icu | hosp) \times p(hosp | infected) $

Let's check:

In [7]:
df_transitions['check'] = df_transitions['icu_death'] * df_transitions['hosp_icu'] * df_transitions['infected_hosp']

df_transitions[['check', 'infected_death']]

Unnamed: 0,check,infected_death
0,1.2e-05,0.0
1,9e-09,0.0
2,7.8e-05,7e-05
3,7e-05,7e-05
4,0.0003,0.0003
5,0.000144,0.0001
6,0.000624,0.0006
7,0.000416,0.0004
8,0.0024,0.002
9,0.0016,0.002


The values match (not exactly, but closely enough).
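
To put a number on "closely enough", the two displayed columns can be compared directly. A minimal sketch using the ten rows printed above; the tolerance of 5e-4 is an arbitrary choice for this illustration:

```python
import numpy as np

# Values copied from the two columns displayed above
check = np.array([1.2e-05, 9e-09, 7.8e-05, 7e-05, 0.0003,
                  0.000144, 0.000624, 0.000416, 0.0024, 0.0016])
infected_death = np.array([0.0, 0.0, 7e-05, 7e-05, 0.0003,
                           0.0001, 0.0006, 0.0004, 0.002, 0.002])

# Largest absolute gap between the recomputed and the source values
max_abs_diff = np.max(np.abs(check - infected_death))

# Absolute comparison only (rtol=0), since some source values are 0
all_close = np.allclose(check, infected_death, rtol=0, atol=5e-4)
```

For these rows the largest gap is about 4e-4, reached by the two 50+ groups.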

## Finishing touches
We keep only the columns we are interested in:

In [8]:
df_transitions_finale = df_transitions.loc[:,['demography', 'infected_hosp', 'hosp_icu', 'icu_death', 'hosp_death_direct']]
df_transitions_finale = df_transitions_finale.rename(columns={'hosp_death_direct': 'hosp_death'})

We are still missing the transition probabilities from "asymptomatic" to "infected". As a first approximation, we can consider that they are equal to 0.5 for all age groups, since they were 0.5 both on the aircraft carrier Charles de Gaulle (young population) and on the cruise ship *Diamond Princess* (older population). Note that the next cell sets this value to 0.1 rather than 0.5:

In [9]:
df_transitions_finale['asymptomatic_infected'] = .1

In [10]:
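# Transitions that always happen (probability 1)
df_transitions_finale['recovercont_recovered'] = 1
df_transitions_finale['asympcont_infected'] = 1
df_transitions_finale['healthy_healthy'] = 1
# Route the asymptomatic -> infected probability through 'asympcont'
df_transitions_finale['asymptomatic_asympcont'] = df_transitions_finale['asymptomatic_infected']
# Keyword form: the positional `axis` argument was removed in pandas 2.0
df_transitions_finale = df_transitions_finale.drop(columns=['asymptomatic_infected'])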

Here is the result, which we save:

In [11]:
df_transitions_finale.to_csv('../data/df_transitions_finale.csv', index=False)
df_transitions_finale

Unnamed: 0,demography,infected_hosp,hosp_icu,icu_death,hosp_death,recovercont_recovered,asympcont_infected,healthy_healthy,asymptomatic_asympcont
0,h_0,0.001,0.175,0.068571,0.0,1,1,1,0.1
1,f_0,0.0009,0.085,0.000118,0.0,1,1,1,0.1
2,h_20,0.006,0.122,0.106557,0.0,1,1,1,0.1
3,f_20,0.005,0.068,0.205882,0.0,1,1,1,0.1
4,h_30,0.012,0.172,0.145349,0.0,1,1,1,0.1
5,f_30,0.009,0.104,0.153846,0.0,1,1,1,0.1
6,h_40,0.016,0.243,0.160494,0.0,1,1,1,0.1
7,f_40,0.013,0.143,0.223776,0.0,1,1,1,0.1
8,h_50,0.032,0.317,0.236593,0.0,1,1,1,0.1
9,f_50,0.025,0.19,0.336842,0.0,1,1,1,0.1


## Post-processing: generating the transition matrices

In [12]:
import numpy as np

In [13]:
states = ['healthy', 'asymptomatic', 'asympcont', 'infected', 'hosp', 'icu', 'death', 'recovercont', 'recovered']
states2id = {state: i for i, state in enumerate(states)}


states2id 

{'healthy': 0,
 'asymptomatic': 1,
 'asympcont': 2,
 'infected': 3,
 'hosp': 4,
 'icu': 5,
 'death': 6,
 'recovercont': 7,
 'recovered': 8}

In [14]:
n_states = len(states)
# Every column except 'demography' is named '<from>_<to>'
cols_transition = [c for c in df_transitions_finale.columns if c != 'demography']
id_recovercont = states2id['recovercont']

transitions = {}
for _, row in df_transitions_finale.iterrows():
    transition = np.zeros((n_states, n_states))
    for col in cols_transition:
        st_from, st_to = col.split('_')
        # Direct lookup so that an unknown state name raises a KeyError
        transition[states2id[st_from], states2id[st_to]] = row[col]
    # The remaining probability mass of each row goes to 'recovercont'
    # (this also applies to rows with no outgoing transition, e.g. 'death')
    transition[:, id_recovercont] = 1 - np.sum(transition, axis=1)
    transitions[row['demography']] = transition
    

In [15]:
from pprint import pprint

pprint(transitions)

{'f_0': array([[1.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00],
       [0.00000000e+00, 0.00000000e+00, 1.00000000e-01, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 9.00000000e-01,
        0.00000000e+00],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        9.00000000e-04, 0.00000000e+00, 0.00000000e+00, 9.99100000e-01,
        0.00000000e+00],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 8.50000000e-02, 0.00000000e+00, 9.15000000e-01,
        0.00000000e+00],
       [0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
        0.00000000e+00, 0.00000000e+00, 1.17647059e-04, 9.99882353e-01,
   

In [16]:
import os

fdir = os.path.join('..', 'data')

for key, matrix in transitions.items():
    # One .npy file per demographic group, e.g. ../data/h_0.npy
    fpath = os.path.join(fdir, f'{key}.npy')
    np.save(fpath, matrix)
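
As a sanity check on matrices built this way, each row should sum to 1, and multiplying a state distribution by the matrix advances it by one step. A self-contained sketch for the `f_0` group, using the rounded values displayed in the final table (so `icu_death` is 0.000118 here, not the unrounded value):

```python
import numpy as np

states = ['healthy', 'asymptomatic', 'asympcont', 'infected', 'hosp',
          'icu', 'death', 'recovercont', 'recovered']
states2id = {state: i for i, state in enumerate(states)}

# Transition probabilities of the f_0 group, as displayed (rounded)
probs = {
    'infected_hosp': 0.0009,
    'hosp_icu': 0.085,
    'icu_death': 0.000118,
    'hosp_death': 0.0,
    'recovercont_recovered': 1,
    'asympcont_infected': 1,
    'healthy_healthy': 1,
    'asymptomatic_asympcont': 0.1,
}

matrix = np.zeros((len(states), len(states)))
for name, value in probs.items():
    st_from, st_to = name.split('_')
    matrix[states2id[st_from], states2id[st_to]] = value
# Same residual rule as the notebook: the remaining mass goes to 'recovercont'
matrix[:, states2id['recovercont']] = 1 - matrix.sum(axis=1)

# Every row of a transition matrix must sum to 1
rows_ok = np.allclose(matrix.sum(axis=1), 1)

# One step starting from a population that is entirely 'infected'
start = np.zeros(len(states))
start[states2id['infected']] = 1
after_one_step = start @ matrix
```

After one step, 0.09% of this population is in `hosp` and the rest is in `recovercont`, which matches the `infected` row of the printed matrix.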

In [None]:


df = 


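# Unfinished draft: `split_pop` and `res` are not defined in this notebook
inds_til_70 = np.array(split_pop.index[split_pop['agegroup'] < 70])
# One Bernoulli draw (p = 0.11) per individual under 70, as a boolean mask
mask = np.random.binomial(n=1, p=.11, size=len(inds_til_70)).astype(bool)
# Assumes `res` is a 2-D array: set columns 4 and 5 for the selected rows
res[np.ix_(inds_til_70[mask], [4, 5])] = 1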