# COVID19 severe (hospitalized) cases in Brazil


## Aim and data

Publicly available data on COVID19 patients hospitalized in Brazil are used to perform a retrospective, cross-sectional observational study.
The goal is to infer the parameters driving the outcome (death or survival) from data gathered from hospitals in Brazil during the periods dominated by different COVID19 variants.

> **Data source**: all data were taken from the Brazilian Ministry of Health: https://opendatasus.saude.gov.br/organization/ministerio-da-saude
>
> The source is in Portuguese and requires translation to English.

## Time period and variants

We group cases with a positive PCR or antigen test by time period. A period is assigned to a variant when more than 80% of the sequences submitted to the GISAID database during that period matched the variant.

| Variant | Time period |
|-|-|
| Delta | October 12th 2021 to December 19th 2021 |
| BA.1.X | January 1st 2022 to March 20th 2022 |
| BA.2.X | April 11th 2022 to May 29th 2022 |
| BA.4/5.X | July 18th 2022 to October 2nd 2022 |
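
The table above can be written as Python structures for use in later cells. The sketch below is an assumption about how `variants_name` and `variants_period` (used further down but not defined in the cells shown here) might be laid out. The dates come from the table; the helper `variant_for_date` is hypothetical and only illustrates the lookup.

```python
import pandas as pd

# Assumed layout: one name per variant and one (start, end) pair per variant,
# stored in the same order so that they can be matched by position.
variants_name = ['Delta', 'BA.1.X', 'BA.2.X', 'BA.4/5.X']
variants_period = [
    (pd.Timestamp('2021-10-12'), pd.Timestamp('2021-12-19')),
    (pd.Timestamp('2022-01-01'), pd.Timestamp('2022-03-20')),
    (pd.Timestamp('2022-04-11'), pd.Timestamp('2022-05-29')),
    (pd.Timestamp('2022-07-18'), pd.Timestamp('2022-10-02')),
]


def variant_for_date(date):
    """Hypothetical helper: return the variant whose period contains `date`, or None."""
    date = pd.Timestamp(date)
    for name, (start, end) in zip(variants_name, variants_period):
        if start <= date <= end:
            return name
    return None
```

Dates that fall between two periods, when no variant exceeded the 80% threshold, return `None` and are therefore left out of every group.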


## Python modules used in this notebook

In [1]:
import pandas as pd
import scipy.stats as scst
import sys
import numpy as np
import matplotlib.pyplot as plt
import os

## Parameters

In [4]:
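data_path_severe = '#your_path#'  # placeholder: replace with the folder holding the extracted files

# The data extracted from the source are stored as one csv file per year since the outbreak.
# These parameters keep the code as general as possible, provided it is run after
# the notebook that extracts the raw data.
# os.listdir returns names in arbitrary order, so sort them first
# (this assumes the file names sort chronologically by year).
allfiles = sorted(os.listdir(data_path_severe))
allfiles_severe = [x for x in allfiles if "severe" in x]

# Exclude year 2020 (the first file) from the analysis
files_severe = allfiles_severe[1:]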
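
Selecting files by position depends on how the folder listing is ordered. As an alternative, the sketch below picks files by the year found in their name. It assumes each file name contains a four-digit year; the function `severe_files_by_year` and the example file names are hypothetical and not taken from the source data.

```python
import re


def severe_files_by_year(filenames, exclude_years=(2020,)):
    """Map year -> file name for the 'severe' files, skipping excluded years."""
    by_year = {}
    for name in filenames:
        if 'severe' not in name:
            continue
        # Assumption: the year appears in the file name as 20xx
        match = re.search(r'(20\d{2})', name)
        if match is None:
            continue
        year = int(match.group(1))
        if year not in exclude_years:
            by_year[year] = name
    return by_year


# Hypothetical file names, for illustration only
example = ['severe_case_2022.csv', 'all_symptom_2021.pq',
           'severe_case_2020.csv', 'severe_case_2021.csv']
files = severe_files_by_year(example)
```

With such a mapping, the file for a variant could be looked up by year (for example `files[2021]` for Delta) instead of by list index.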

## Pick only useful data and create new files

In [36]:
# variants_name, variants_period and file_name are expected to be defined in an earlier cell
for variant in variants_name:
    idx = variants_name.index(variant)
    # Variant Delta was in 2021, while Omicron was in 2022
    if variant == 'Delta':
        file = files_severe[0]
    else:
        file = files_severe[1]
    data = pd.read_csv(os.path.join(data_path_severe, file),
                       usecols=['DT_SIN_PRI','CLASSI_FIN','PCR_SARS2','AN_SARS2','CS_SEXO','VACINA_COV',
                                'NU_IDADE_N','TP_IDADE','DOSE_1_COV','DOSE_2_COV','DOSE_REF',
                                'FATOR_RISC','CARDIOPATI','HEMATOLOGI','SIND_DOWN','HEPATICA','ASMA',
                                'DIABETES','NEUROLOGIC','PNEUMOPATI','IMUNODEPRE','RENAL','OBESIDADE',
                                'OBES_IMC','OUT_MORBI','SUPORT_VEN','UTI','DT_ENTUTI','DT_SAIDUTI',
                                'DT_INTERNA','EVOLUCAO','DT_EVOLUCA','CS_GESTANT',
                                'CS_RACA','PAC_COCBO','PUERPERA','SG_UF_NOT','ID_MUNICIP',
                                'CO_MUN_NOT','ID_REGIONA','CO_REGIONA','ID_UNIDADE','CO_UNI_NOT'])
    # The symptom file is generated by another script (to be added to this folder in the near future)
    symptoms = pd.read_parquet(os.path.join(data_path_severe,
                                            file.replace('severe_case','all_symptom').replace('csv','pq')))
    data = data.join(symptoms)
    # Keep cases whose first-symptom date (DT_SIN_PRI) falls within the variant period
    data['dates_selection'] = pd.to_datetime(data.DT_SIN_PRI,dayfirst=True,errors='coerce')
    date_min = variants_period[idx][0]
    date_max = variants_period[idx][1]
    data = data[(data.dates_selection>=date_min) & (data.dates_selection<=date_max)]
    # Keep cases classified as COVID19 (CLASSI_FIN == 5) or, if unclassified, with a positive PCR or antigen test.
    # Each comparison is parenthesised because & binds more tightly than ==.
    data = data[(data.CLASSI_FIN==5) | (data.CLASSI_FIN.isnull() & (data.PCR_SARS2==1)) | (data.CLASSI_FIN.isnull() & (data.AN_SARS2==1))]
    data = data.drop(columns=['dates_selection'])
    data.to_parquet(file_name[idx]+'_severe_CRF_data.pq')
    print('Data for variant '+variant+' extracted')

Data for variant Delta extracted
Data for variant BA.1.X extracted
Data for variant BA.2.X extracted
Data for variant BA.4/5.X extracted
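
The selection logic of the loop can be checked on a small table before running it on the full files. The sketch below is a minimal illustration, assuming the same column names as the source data; the rows of `toy` are made up for the example and the function `select_cases` is hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up rows covering each branch of the selection
toy = pd.DataFrame({
    'DT_SIN_PRI': ['15/10/2021', '01/11/2021', '20/11/2021',
                   '05/12/2021', '01/01/2022', 'not a date'],
    'CLASSI_FIN': [5, np.nan, np.nan, 4, 5, 5],
    'PCR_SARS2':  [np.nan, 1, np.nan, 1, 1, 1],
    'AN_SARS2':   [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
})


def select_cases(df, date_min, date_max):
    """Keep rows inside the period that are classified as COVID19 (CLASSI_FIN == 5)
    or are unclassified with a positive PCR or antigen test."""
    # Unparseable dates become NaT and fail both comparisons below
    dates = pd.to_datetime(df['DT_SIN_PRI'], dayfirst=True, errors='coerce')
    in_period = (dates >= date_min) & (dates <= date_max)
    unclassified = df['CLASSI_FIN'].isnull()
    confirmed = (
        (df['CLASSI_FIN'] == 5)
        | (unclassified & (df['PCR_SARS2'] == 1))
        | (unclassified & (df['AN_SARS2'] == 1))
    )
    return df[in_period & confirmed]


# Delta period from the table above
selected = select_cases(toy, pd.Timestamp('2021-10-12'), pd.Timestamp('2021-12-19'))
```

Only the first two rows are kept: the third has neither a classification nor a positive test, the fourth has a different classification, the fifth falls outside the period, and the last has an unparseable date.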
