# COVID19 severe (hospitalized) cases in Brazil


## Aim and data

This notebook uses publicly available data on COVID19 cases hospitalized in Brazil to perform a retrospective cross-sectional observational study.
The goal is to infer the parameters driving the outcome (death or survival) from data gathered in Brazilian hospitals for different COVID19 variants.

> **Data source**: all the data comes from the Brazilian Ministry of Health: https://opendatasus.saude.gov.br/organization/ministerio-da-saude
>
> The source is in Portuguese and requires translation into English.

The code below shows how to extract the raw data.

## Python modules used in this notebook

In [1]:
import pandas as pd
from urllib.error import HTTPError
from urllib.error import URLError
import os
import datetime
import multiprocessing as mp

## Parameters

In [2]:
######################## Initializing parameters ########################
#URL where the data can be found
url_severe = "https://s3.sa-east-1.amazonaws.com/ckan.saude.gov.br/SRAG/"
#Prefix of the files to be downloaded
severe_case_file_prefix_name = 'INFLUD'
#List of years for which data must be searched (2020 up to the current year)
stop_year = pd.to_datetime('today').year
year_list = list(range(2020, stop_year + 1))
#########################################################################

## Function to download severe cases file for a given year

The URL of each file depends on the date of its last update, which is not known in advance. The function therefore starts from today's date and goes back one day at a time until it finds a working URL.

In [3]:
################### Function to download severe cases ###################
def download_severe_cases(year):
    #Flag to stop searching once the file for the given year has been downloaded
    url_found = False
    #Number of days to go back from today when building the candidate URL
    nb_days = 0
    while not url_found:
        #Candidate date of the last update
        update_date = pd.to_datetime('today') - pd.Timedelta(nb_days, "D")
        #Stop the search once the candidate date falls before the requested year: no file can exist for it
        if update_date.year < year:
            break
        file_name = severe_case_file_prefix_name + str(year)[-2:] + '-' + update_date.strftime('%d-%m-%Y') + '.csv'
        try:
            #Try to read the file: this raises HTTPError if the URL does not exist
            severe_case_data = pd.read_csv(url_severe + str(year) + '/' + file_name, sep=';')
            print('Successfully downloaded data for ' + str(year))
            url_found = True
            severe_case_data.to_csv('severe_case_data_' + str(year) + '.csv', index=False)
            del severe_case_data
        except HTTPError:
            #URL does not exist: go back one more day and try again
            #(HTTPError is a subclass of URLError, so it must be caught first)
            nb_days += 1
        #About the two errors below: the connection may fail, so the same URL is retried until the data is downloaded
        except URLError:
            print('URLError for ' + str(year) + ' data, trying again')
        except ConnectionResetError:
            print('ConnectionResetError for ' + str(year) + ' data, trying again')
#########################################################################
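
The backward search described above can be separated from the download itself, which makes it easier to check offline. The following is a minimal sketch, not the notebook's implementation: `build_candidate_url`, `find_latest_url`, the `url_exists` callback and the `max_days_back` limit are hypothetical names introduced for illustration, and the file naming pattern is the one assumed by the function above.

```python
import datetime

# Same base URL and file prefix as in the parameters cell
URL_SEVERE = "https://s3.sa-east-1.amazonaws.com/ckan.saude.gov.br/SRAG/"
FILE_PREFIX = "INFLUD"


def build_candidate_url(year, update_date):
    """Build the URL of the file for `year`, assuming it was last updated on `update_date`."""
    return f"{URL_SEVERE}{year}/{FILE_PREFIX}{str(year)[-2:]}-{update_date:%d-%m-%Y}.csv"


def find_latest_url(year, url_exists, today=None, max_days_back=365):
    """Go back one day at a time from `today` until `url_exists(url)` is true.

    `url_exists` is a hypothetical callback that tells whether a URL is reachable.
    Returns None if no URL is found within `max_days_back` days, or if the
    candidate date falls before the requested year.
    """
    today = today or datetime.date.today()
    for days_back in range(max_days_back + 1):
        candidate_date = today - datetime.timedelta(days=days_back)
        if candidate_date.year < year:
            return None
        url = build_candidate_url(year, candidate_date)
        if url_exists(url):
            return url
    return None


# Offline demonstration with a fake checker: pretend the 2021 file
# was last updated on 3 April 2023 and search from 10 April 2023.
published = {build_candidate_url(2021, datetime.date(2023, 4, 3))}
print(find_latest_url(2021, published.__contains__, today=datetime.date(2023, 4, 10)))
```

Bounding the search with `max_days_back` is a design choice of this sketch: it guarantees that the loop ends even if no file was ever published for a given year.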

## Distribute workload on available cores

In [None]:
########################## Distribute on proc ###########################
if __name__ == '__main__':
    ncores = mp.cpu_count()
    # input list: one download task per year
    inputs = year_list
    # multiprocessing pool object; the context manager releases
    # the worker processes once map() has returned
    with mp.Pool(ncores) as pool:
        # map the download function over the list of years
        pool.map(download_severe_cases, inputs)
#########################################################################

ConnectionResetError for 2020 data, trying again
ConnectionResetError for 2021 data, trying again
Successfully downloaded data for 2023
ConnectionResetError for 2022 data, trying again
ConnectionResetError for 2021 data, trying again
URLError for 2020 data, trying again
URLError for 2020 data, trying again
Successfully downloaded data for 2020
Successfully downloaded data for 2022
Successfully downloaded data for 2021
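
The messages above appear out of year order because the workers run in parallel and finish at different times. Since each task spends most of its time waiting on the network, a thread pool may be a lighter alternative to a process pool. The sketch below illustrates that idea under stated assumptions: `run_in_threads` and `fake_download` are hypothetical names, and `fake_download` only stands in for `download_severe_cases` so that the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_download(year):
    """Stand-in for download_severe_cases: returns a file name instead of fetching data."""
    return f"severe_case_data_{year}.csv"


def run_in_threads(task, years, max_workers=4):
    """Run `task` once per year in a thread pool and collect the results by year."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit one task per year and remember which future belongs to which year
        futures = {executor.submit(task, year): year for year in years}
        # Results are gathered in completion order, which may differ from the input order
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


results = run_in_threads(fake_download, [2020, 2021, 2022, 2023])
print(sorted(results))
```

To try this with the real download, `fake_download` would be replaced by `download_severe_cases` and the year list by `year_list`.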
