### History of the COVID-19 pandemic in Ceará, Brazil

This notebook shows how to fetch the number of COVID-19 cases and deaths from Ceará's [Integrasus API](https://indicadores.integrasus.saude.ce.gov.br/) and turn them into QuickStatements commands that add the data to Wikidata.

In [1]:
import pandas as pd
import requests
from datetime import date, datetime, timedelta
import numpy as np

Many thanks to [André Campos](https://github.com/andreloc) for his workshop at Open Data Day - Fortaleza, which I attended, on how to use the Integrasus API. The following code snippet is his creation.

In [2]:
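def get_dataframe(api_url, data='', id_municipio=''):
    # Build the query string; idMunicipio is only sent when a municipality is given.
    # The `data` parameter is accepted but not used.
    api_url = api_url + '?'
    if id_municipio != '':
        api_url = '{}idMunicipio={}&'.format(api_url, id_municipio)

    # Fetch the JSON payload and load it into a DataFrame.
    result = requests.get(api_url)
    result.raise_for_status()  # fail early on HTTP errors instead of parsing an error body
    result = pd.DataFrame.from_dict(result.json())

    # Tag every row with the municipality it was requested for.
    if id_municipio != '':
        result.insert(0, 'idMunicipio', id_municipio)

    return result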
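
The query string that `get_dataframe` assembles by hand can also be produced with the standard library. The sketch below is only an illustration: `build_url` is a hypothetical helper, not part of the workshop code, and unlike the original it leaves out the trailing `?` or `&`. The municipality id is one that appears in the output further down.

```python
from urllib.parse import urlencode


def build_url(api_url, id_municipio=''):
    """Return api_url, with an idMunicipio query parameter when one is given."""
    params = {}
    if id_municipio != '':
        params['idMunicipio'] = id_municipio
    query = urlencode(params)
    # Only append '?' when there is something to send.
    return f"{api_url}?{query}" if query else api_url


endpoint = "https://indicadores.integrasus.saude.ce.gov.br/api/coronavirus/qtd-por-dia-tipo"
print(build_url(endpoint, 230010))
```

With `requests`, passing the same dictionary as `params=` to `requests.get` has a similar effect without building the URL manually.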

In [3]:
api_endpoint = "https://indicadores.integrasus.saude.ce.gov.br/api/coronavirus/qtd-por-dia-tipo"
municipios = get_dataframe('https://indicadores.integrasus.saude.ce.gov.br/api/municipio')

qtd_por_dia_municipio = [get_dataframe(api_endpoint, id_municipio=idm) for idm in municipios.id]

In [4]:
qtd_por_dia = pd.concat(qtd_por_dia_municipio, axis=0, ignore_index=True, sort=True)
qtd_por_dia['data'] = pd.to_datetime(qtd_por_dia['data'], format='%d/%m/%Y')
qtd_por_dia

| | data | idMunicipio | quantidade | tipo |
|---|---|---|---|---|
| 0 | 2020-03-24 | 230010 | 1.0 | Suspeito |
| 1 | 2020-04-07 | 230010 | 1.0 | Suspeito |
| 2 | 2020-03-22 | 230015 | 1.0 | Suspeito |
| 3 | 2020-03-23 | 230015 | 1.0 | Suspeito |
| 4 | 2020-03-28 | 230015 | 1.0 | Suspeito |
| ... | ... | ... | ... | ... |
| 1468 | 2020-04-07 | 231410 | 1.0 | Suspeito |
| 1469 | 2020-04-09 | 231410 | 1.0 | Suspeito |
| 1470 | NaT | 231410 | 1.0 | Suspeito |
| 1471 | 2020-03-30 | 231300 | 1.0 | Suspeito |


Filtering by the [notified start date](https://g1.globo.com/ce/ceara/noticia/2020/03/15/tres-primeiros-casos-de-coronavirus-no-ceara-sao-confirmados-pela-secretaria-da-saude.ghtml), since the API seems to return some values dated before the actual beginning of the outbreak. Rows without a date are dropped as well.

In [5]:
qtd_por_dia.dropna(inplace=True)
inicio = datetime(year=2020,month=3, day=14)
fim    = datetime.today()
qtd_por_dia = qtd_por_dia[(qtd_por_dia['data'] >= inicio) & (qtd_por_dia['data'] <= fim)]
qtd_por_dia

| | data | idMunicipio | quantidade | tipo |
|---|---|---|---|---|
| 0 | 2020-03-24 | 230010 | 1.0 | Suspeito |
| 1 | 2020-04-07 | 230010 | 1.0 | Suspeito |
| 2 | 2020-03-22 | 230015 | 1.0 | Suspeito |
| 3 | 2020-03-23 | 230015 | 1.0 | Suspeito |
| 4 | 2020-03-28 | 230015 | 1.0 | Suspeito |
| ... | ... | ... | ... | ... |
| 1467 | 2020-04-06 | 231410 | 4.0 | Suspeito |
| 1468 | 2020-04-07 | 231410 | 1.0 | Suspeito |
| 1469 | 2020-04-09 | 231410 | 1.0 | Suspeito |
| 1471 | 2020-03-30 | 231300 | 1.0 | Suspeito |


In [6]:
qtd_por_dia['tipo'].value_counts()

Suspeito      1240
Confirmado     150
Óbito           35
Name: tipo, dtype: int64

In [7]:
qtd_nosuspect = qtd_por_dia.query(" tipo == ['Confirmado', 'Óbito'] ")
qtd_nosuspect

| | data | idMunicipio | quantidade | tipo |
|---|---|---|---|---|
| 50 | 2020-04-06 | 230075 | 1.0 | Confirmado |
| 76 | 2020-03-16 | 230100 | 1.0 | Confirmado |
| 77 | 2020-03-21 | 230100 | 5.0 | Confirmado |
| 78 | 2020-03-23 | 230100 | 2.0 | Confirmado |
| 79 | 2020-03-26 | 230100 | 2.0 | Confirmado |
| ... | ... | ... | ... | ... |
| 1420 | 2020-03-27 | 231340 | 1.0 | Óbito |
| 1421 | 2020-03-27 | 231340 | 1.0 | Confirmado |
| 1422 | 2020-04-10 | 231340 | 1.0 | Confirmado |
| 1423 | 2020-04-11 | 231340 | 1.0 | Confirmado |


In [8]:
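# Passing the axis positionally (drop('idMunicipio', 1)) is no longer accepted by recent pandas.
no_cities = qtd_nosuspect.drop(columns='idMunicipio')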

In [9]:
# One row per date, one column per tipo, summing quantidade across municipalities.
no_cities = no_cities.pivot_table(index='data', columns='tipo', aggfunc='sum')
no_cities

| data | quantidade (Confirmado) | quantidade (Óbito) |
|---|---|---|
| 2020-03-14 | 1.0 | NaN |
| 2020-03-16 | 4.0 | NaN |
| 2020-03-17 | 19.0 | NaN |
| 2020-03-18 | 4.0 | NaN |
| 2020-03-19 | 28.0 | NaN |
| 2020-03-20 | 26.0 | NaN |
| 2020-03-21 | 33.0 | NaN |
| 2020-03-22 | 36.0 | NaN |
| 2020-03-23 | 30.0 | NaN |
| 2020-03-24 | 38.0 | 2.0 |


In [10]:
no_cities.columns = no_cities.columns.get_level_values(1)
no_cities.reset_index(level='data', col_level=1, inplace=True)

In [11]:
no_cities['ntotal_Conf'] = no_cities['Confirmado'].cumsum()
no_cities['ntotal_Ob'] = no_cities['Óbito'].cumsum()
no_cities

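| tipo | data | Confirmado | Óbito | ntotal_Conf | ntotal_Ob |
|---|---|---|---|---|---|
| 0 | 2020-03-14 | 1.0 | NaN | 1.0 | NaN |
| 1 | 2020-03-16 | 4.0 | NaN | 5.0 | NaN |
| 2 | 2020-03-17 | 19.0 | NaN | 24.0 | NaN |
| 3 | 2020-03-18 | 4.0 | NaN | 28.0 | NaN |
| 4 | 2020-03-19 | 28.0 | NaN | 56.0 | NaN |
| 5 | 2020-03-20 | 26.0 | NaN | 82.0 | NaN |
| 6 | 2020-03-21 | 33.0 | NaN | 115.0 | NaN |
| 7 | 2020-03-22 | 36.0 | NaN | 151.0 | NaN |
| 8 | 2020-03-23 | 30.0 | NaN | 181.0 | NaN |
| 9 | 2020-03-24 | 38.0 | 2.0 | 219.0 | 2.0 |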
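
One detail of `cumsum` is worth keeping in mind: it skips `NaN` when accumulating, but leaves `NaN` in the result at those positions. A date that has deaths but no confirmed cases would therefore get a missing `ntotal_Conf`. Whether the API ever returns such a day is an assumption. The sketch below uses made-up numbers to show one possible way to handle it: fill gaps with 0 for cases, and carry the last known total forward for deaths.

```python
import numpy as np
import pandas as pd

# Toy data (not from the API): the second day has deaths but no confirmed cases.
daily = pd.DataFrame({
    "data": pd.to_datetime(["2020-03-14", "2020-03-16", "2020-03-17"]),
    "Confirmado": [1.0, np.nan, 19.0],
    "Óbito": [np.nan, 2.0, np.nan],
})

# Plain cumsum leaves NaN where the daily value is missing.
plain = daily["Confirmado"].cumsum()

# Treat a missing daily count as zero new cases.
daily["ntotal_Conf"] = daily["Confirmado"].fillna(0).cumsum()

# Keep NaN before the first death, then carry the running total forward.
daily["ntotal_Ob"] = daily["Óbito"].cumsum().ffill()

print(daily)
```

With the forward fill, every date after the first death carries a death total, so a later export step would emit a deaths statement for each of those dates rather than only for days on which deaths were reported.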


In [12]:
#no_cities.to_csv('CE-cases_by_date.csv')

Converting to Wikidata's date format, as expected by QuickStatements. The `/11` suffix sets the precision to a single day.

In [13]:
# Format every date as a QuickStatements time value with day precision (/11).
no_cities['wdt_dates'] = no_cities['data'].dt.strftime("+%Y-%m-%dT00:00:00Z/11")
no_cities.head()

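| tipo | data | Confirmado | Óbito | ntotal_Conf | ntotal_Ob | wdt_dates |
|---|---|---|---|---|---|---|
| 0 | 2020-03-14 | 1.0 | NaN | 1.0 | NaN | +2020-03-14T00:00:00Z/11 |
| 1 | 2020-03-16 | 4.0 | NaN | 5.0 | NaN | +2020-03-16T00:00:00Z/11 |
| 2 | 2020-03-17 | 19.0 | NaN | 24.0 | NaN | +2020-03-17T00:00:00Z/11 |
| 3 | 2020-03-18 | 4.0 | NaN | 28.0 | NaN | +2020-03-18T00:00:00Z/11 |
| 4 | 2020-03-19 | 28.0 | NaN | 56.0 | NaN | +2020-03-19T00:00:00Z/11 |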
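
For a single value, the same conversion can be sketched with the standard library alone. `to_quickstatements_day` is a hypothetical helper name used here for illustration only.

```python
from datetime import date


def to_quickstatements_day(d):
    """Format a date as a QuickStatements time value with day precision (/11)."""
    return d.strftime("+%Y-%m-%dT00:00:00Z/11")


print(to_quickstatements_day(date(2020, 3, 14)))
# → +2020-03-14T00:00:00Z/11
```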


In [14]:
source_url = "https://indicadores.integrasus.saude.ce.gov.br/indicadores"

# Item header: label, description and the statements that do not change per day.
print("CREATE\n"
      'LAST|Len|"COVID-19 pandemic in the state of Ceará"\n'
      'LAST|Den|"ongoing viral pandemic in Ceará, Brazil"\n'
      "LAST|P31|Q3241045|P642|Q84263196|P3005|Q40123\n"
      "LAST|P361|Q86597695\n"
      "LAST|P17|Q155\n"
      "LAST|P276|Q40123\n"
      "LAST|P580|+2020-03-14T00:00:00Z/11")

# One cumulative-cases statement per date, plus cumulative deaths when available.
for index, row in no_cities.iterrows():
    print(f'LAST|P1603|{int(row["ntotal_Conf"])}|P585|{row["wdt_dates"]}|S854|"{source_url}"')
    if not np.isnan(row['ntotal_Ob']):
        print(f'LAST|P1120|{int(row["ntotal_Ob"])}|P585|{row["wdt_dates"]}|S854|"{source_url}"')
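
Each per-day line follows the same pattern: a property (`P1603` for number of cases, `P1120` for number of deaths), the cumulative value, a `P585` (point in time) qualifier, and an `S854` (reference URL) source. A minimal sketch of that pattern as reusable functions follows; `statement` and `daily_statements` are hypothetical names introduced here, not part of the notebook.

```python
import math

SOURCE_URL = "https://indicadores.integrasus.saude.ce.gov.br/indicadores"


def statement(prop, total, wdt_date, source_url=SOURCE_URL):
    """Build one QuickStatements line: value, point-in-time qualifier, reference URL."""
    return f'LAST|{prop}|{int(total)}|P585|{wdt_date}|S854|"{source_url}"'


def daily_statements(wdt_date, total_cases, total_deaths):
    """Return the lines for one date; deaths are skipped while the total is NaN."""
    lines = [statement("P1603", total_cases, wdt_date)]
    if not math.isnan(total_deaths):
        lines.append(statement("P1120", total_deaths, wdt_date))
    return lines


for line in daily_statements("+2020-03-24T00:00:00Z/11", 219.0, 2.0):
    print(line)
```

Keeping the line format in one function means a change to the qualifier or source layout only has to be made once.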

CREATE
LAST|Len|"COVID-19 pandemic in the state of Ceará"
LAST|Den|"ongoing viral pandemic in Ceará, Brazil"
LAST|P31|Q3241045|P642|Q84263196|P3005|Q40123
LAST|P361|Q86597695
LAST|P17|Q155
LAST|P276|Q40123
LAST|P580|+2020-03-14T00:00:00Z/11
LAST|P1603|1|P585|+2020-03-14T00:00:00Z/11|S854|"https://indicadores.integrasus.saude.ce.gov.br/indicadores"
LAST|P1603|5|P585|+2020-03-16T00:00:00Z/11|S854|"https://indicadores.integrasus.saude.ce.gov.br/indicadores"
LAST|P1603|24|P585|+2020-03-17T00:00:00Z/11|S854|"https://indicadores.integrasus.saude.ce.gov.br/indicadores"
LAST|P1603|28|P585|+2020-03-18T00:00:00Z/11|S854|"https://indicadores.integrasus.saude.ce.gov.br/indicadores"
LAST|P1603|56|P585|+2020-03-19T00:00:00Z/11|S854|"https://indicadores.integrasus.saude.ce.gov.br/indicadores"
LAST|P1603|82|P585|+2020-03-20T00:00:00Z/11|S854|"https://indicadores.integrasus.saude.ce.gov.br/indicadores"
LAST|P1603|115|P585|+2020-03-21T00:00:00Z/11|S854|"https://indicadores.integrasus.saude.ce.gov.br/ind