# Data on belief and serology in long Covid

In [1]:
from pathlib import Path

import numpy as np
import pandas as pd

Here are the data from the end of eTable 4, copy-pasted into a `.csv` file:

In [2]:
orig_tab_e4 = Path('original') / 'matta_table_e4.csv'
df = pd.read_csv(orig_tab_e4).set_index('Self-rated health')
df

Self-rated health,"Belief-,Serology-","Belief+,Serology-","Belief-,Serology+","Belief+,Serology+"
1,3901,60,102,72
2,12547,234,346,226
3,5454,114,121,94
4,1627,23,33,33
5,776,11,15,11
6,523,10,9,12
7,192,3,6,3
8,61,1,0,0


The table corresponds to the following number of participants with valid data
for self-rated health:

In [3]:
np.sum(np.array(df))

26620

Next we reconstruct a data frame with the underlying data for the individual
subjects:

In [4]:
# Expand each (health, belief, serology) cell of the table into `count`
# identical rows, one row per participant.
dfs = []
char2code = {'-': 0, '+': 1}
for health in range(1, 9):
    for belief_char in '-+':
        belief_code = char2code[belief_char]
        for sero_char in '-+':
            sero_code = char2code[sero_char]
            # Column labels in the table look like 'Belief-,Serology+'.
            col = f'Belief{belief_char},Serology{sero_char}'
            count = int(df.loc[health, col])
            dfs.append(pd.DataFrame({'health_2019': [health] * count,
                                     'belief': [belief_code] * count,
                                     'serology': [sero_code] * count}))
patients = pd.concat(dfs).reset_index(drop=True).astype(int)
patients

,health_2019,belief,serology
0,1,0,0
1,1,0,0
2,1,0,0
3,1,0,0
4,1,0,0
...,...,...,...
26615,8,0,0
26616,8,0,0
26617,8,0,0
26618,8,0,0
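
The reconstruction can also be checked with a round trip. The following is a minimal sketch, not part of the original analysis: it re-enters the counts from the table above as an array, expands them with `Index.repeat` instead of the nested loops, and cross-tabulates the result to confirm that the counts come back unchanged. The names `cells`, `expanded`, `group` and `recovered` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Counts copied from the table above. Rows are self-rated health 1..8;
# columns follow the table order, with '-' coded as 0 and '+' as 1.
groups = [(0, 0), (1, 0), (0, 1), (1, 1)]  # (belief, serology)
counts = np.array([
    [3901, 60, 102, 72],
    [12547, 234, 346, 226],
    [5454, 114, 121, 94],
    [1627, 23, 33, 33],
    [776, 11, 15, 11],
    [523, 10, 9, 12],
    [192, 3, 6, 3],
    [61, 1, 0, 0],
])

# One row per table cell, in the same row-major order as counts.ravel().
cells = pd.DataFrame(
    [(health, belief, sero)
     for health in range(1, 9)
     for belief, sero in groups],
    columns=['health_2019', 'belief', 'serology'],
)

# Repeat each cell row by its count to get one row per participant.
expanded = cells.loc[cells.index.repeat(counts.ravel())].reset_index(drop=True)

# Encode the four groups as 0..3 in the table's column order, then
# cross-tabulate to recover the original counts.
group = expanded['belief'] + 2 * expanded['serology']
recovered = pd.crosstab(expanded['health_2019'], group)

assert len(expanded) == counts.sum()
assert (recovered.to_numpy() == counts).all()
```

If both assertions pass, the expanded frame carries exactly the information in the table, just in long form.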


Here is a basic breakdown of mean self-rated health (reported in 2019) by
belief and serology (assessed in 2020 and later):

In [5]:
patients.groupby(['belief', 'serology']).mean()

belief,serology,health_2019
0,0,2.420757
0,1,2.310127
1,0,2.425439
1,1,2.407982


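Note that lower health scores correspond to better health, so the
belief-negative, serology-positive group reported the best average health in
2019, although all four means lie within about 0.12 of each other.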
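
The group means can also be computed directly from the table of counts, without expanding to one row per participant. This is a sketch under the assumption that the counts are re-entered by hand as below; the names `group_sizes` and `group_means` are illustrative. Each mean is the count-weighted average of the health scores 1 to 8, and the values should reproduce the `groupby` output above.

```python
import numpy as np

health = np.arange(1, 9)  # self-rated health scores 1..8
labels = ['Belief-,Serology-', 'Belief+,Serology-',
          'Belief-,Serology+', 'Belief+,Serology+']
# Counts copied from the table at the top, one column per label.
counts = np.array([
    [3901, 60, 102, 72],
    [12547, 234, 346, 226],
    [5454, 114, 121, 94],
    [1627, 23, 33, 33],
    [776, 11, 15, 11],
    [523, 10, 9, 12],
    [192, 3, 6, 3],
    [61, 1, 0, 0],
])

# Number of participants in each belief / serology group.
group_sizes = counts.sum(axis=0)
# Weighted mean: sum over scores of (score * count), divided by group size.
group_means = (health @ counts) / group_sizes

for label, n, mean in zip(labels, group_sizes, group_means):
    print(f'{label}: n={n}, mean health={mean:.6f}')
```

The group sizes are worth keeping in view when reading the means: the three smaller groups each contain only a few hundred participants.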

Save the reconstructed per-participant data to a processed `.csv` file:

In [6]:
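out_path = Path('processed') / 'long_covid_health.csv'
# Make sure the output directory exists before writing.
out_path.parent.mkdir(parents=True, exist_ok=True)
# index=False omits the row index from the file.
patients.to_csv(out_path, index=False)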

Read the file back as a smoke test:

In [7]:
pd.read_csv(out_path).head()

,health_2019,belief,serology
0,1,0,0
1,1,0,0
2,1,0,0
3,1,0,0
4,1,0,0
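
A stricter version of this smoke test compares the whole frame instead of inspecting the first rows by eye. The sketch below uses a small illustrative frame and a temporary directory, both of which are assumptions made for the example and not the notebook's actual data or paths; the same pattern would apply to `patients` and `out_path`.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Small illustrative frame with the same columns as the processed file.
sample = pd.DataFrame({'health_2019': [1, 1, 8],
                       'belief': [0, 1, 1],
                       'serology': [0, 1, 0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    tmp_path = Path(tmp_dir) / 'round_trip.csv'
    # Write without the row index, then read straight back.
    sample.to_csv(tmp_path, index=False)
    restored = pd.read_csv(tmp_path)

# Raises AssertionError if values, column names or dtypes differ.
pd.testing.assert_frame_equal(restored, sample)
```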
