# Exponential predictive model

The purpose of this notebook is to build a predictive model of the form:

$N_{t} = \alpha_{t-1} \cdot N_{t-1}$

That is, a simple exponential model with a time-varying coefficient. We propose to model $\alpha_t$, the _infection rate_ variable, as a function of a set of demographic and behavioral factors, which would ideally include:

* Demographic data:
    * Population / population density
    * GDP per capita
    * Human Development Index
    * Urbanization
* Epidemic-specific data:
    * Days in quarantine
    * Type of quarantine in force (national, state, local)
    * Existence of a universal healthcare system
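
As a minimal sketch of the recurrence above, the observed coefficients can be recovered from a case series as $\alpha_{t-1} = N_t / N_{t-1}$. The helper names `growth_factors` and `predict_next` are hypothetical and not part of the notebook, and mapping a zero count to NaN is an assumption. The series is the Ghana case counts from the infection data loaded in this notebook.

```python
import math

def growth_factors(counts):
    """Return [alpha_0, alpha_1, ...] with alpha_{t-1} = N_t / N_{t-1}.

    Assumption: a zero count has no defined growth factor, so it maps to NaN.
    """
    return [
        curr / prev if prev > 0 else math.nan
        for prev, curr in zip(counts, counts[1:])
    ]

def predict_next(last_count, alpha):
    """One step of the model: N_t = alpha_{t-1} * N_{t-1}."""
    return alpha * last_count

# Ghana's cumulative case counts, as they appear in the infection data.
ghana = [3, 6, 6, 7, 7, 11, 16, 19, 23, 27, 53]
alphas = growth_factors(ghana)

# Replaying the observed coefficients from N_0 walks through the same series.
n = ghana[0]
for alpha in alphas:
    n = predict_next(n, alpha)
print(round(n))
```

The observed coefficients are only the training target: the goal stated above is to predict $\alpha_t$ from the listed factors rather than read it off the data.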

In [74]:
import pandas as pd
import json

demographic_data = pd.read_csv("../data/world_demographics.tsv", sep="\t").set_index("Country")
display(demographic_data.head())

quarantine_data = pd.read_csv("../data/quarantine.tsv", sep="\t").fillna("")
for col in ['Start date', 'End date']:
    quarantine_data[col] = pd.to_datetime(quarantine_data[col])
quarantine_data = quarantine_data.groupby('Country').agg(
    start=('Start date', 'min'),
    end=('End date', 'max'),
    level=('Level', 'first')
)
display(quarantine_data.head())

with open("../data/paises-info-dias.json") as fp:
    infection_data = json.load(fp)['paises']
    infection_data = pd.DataFrame([{'Country': k, 'Data': v} for k,v in infection_data.items()]).set_index('Country')
    
display(infection_data.head())

Country,Population,Yearly change,Net change,Density,Land area,Migrants,Fertility,Med. age,Urban,World share
China,1439323776,0.39,5540090,153,9388211,-348399.0,1.7,38,61,18.47
India,1380004385,0.99,13586631,464,2973190,-532687.0,2.2,28,35,17.7
United States,331002651,0.59,1937734,36,9147420,954806.0,1.8,38,83,4.25
Indonesia,273523615,1.07,2898047,151,1811570,-98955.0,2.3,30,56,3.51
Pakistan,220892340,2.0,4327022,287,770880,-233379.0,3.6,23,35,2.83


Country,start,end,level
Argentina,2020-03-19,2020-03-31,National
Australia,2020-03-23,2020-09-22,National
Austria,2020-03-16,2020-04-13,National
Azad Kashmir,2020-03-24,2020-04-07,Administrative
Balochistan,2020-03-24,2020-04-07,Province


Country,Data
Saint Lucia,"[1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3]"
Guinea,"[1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 4, 4]"
Liechtenstein,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 4, 4, 7, 28,..."
Ghana,"[3, 6, 6, 7, 7, 11, 16, 19, 23, 27, 53]"
Qatar,"[1, 3, 3, 7, 8, 8, 8, 8, 15, 18, 24, 262, 262,..."


In [67]:
def extract_global_features(country: str, demographics_data: dict):
    # Static per-country features; a country missing from the data yields None values.
    return dict(
        demographics_population=demographics_data.get(country, {}).get('Population'),
        demographics_density=demographics_data.get(country, {}).get('Density'),
        demographics_fertility=demographics_data.get(country, {}).get('Fertility'),
        demographics_mid_age=demographics_data.get(country, {}).get('Med. age'),
        demographics_urbanization=demographics_data.get(country, {}).get('Urban'),
    )

extract_global_features('Cuba', demographic_data.to_dict('index'))

{'demographics_population': 11326616,
 'demographics_density': 106,
 'demographics_fertility': 1.6,
 'demographics_mid_age': 42,
 'demographics_urbanization': 78}
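
The chained `.get(country, {})` lookups mean that an unknown country produces `None` for every feature instead of raising a `KeyError`. A compact variant that makes this explicit might look like the sketch below. The names `FEATURE_COLUMNS` and `extract_features_from_mapping` are hypothetical, and the sample dictionary only reuses Cuba's values from the output above.

```python
# Hypothetical mapping from feature name to source column.
FEATURE_COLUMNS = {
    'demographics_population': 'Population',
    'demographics_density': 'Density',
    'demographics_fertility': 'Fertility',
    'demographics_mid_age': 'Med. age',
    'demographics_urbanization': 'Urban',
}

def extract_features_from_mapping(country, demographics_data):
    # Look the country up once; an unknown country falls back to an empty row.
    row = demographics_data.get(country, {})
    return {name: row.get(column) for name, column in FEATURE_COLUMNS.items()}

# Cuba's values as shown in the output above.
demo = {
    'Cuba': {'Population': 11326616, 'Density': 106, 'Fertility': 1.6,
             'Med. age': 42, 'Urban': 78},
}
known = extract_features_from_mapping('Cuba', demo)
missing = extract_features_from_mapping('Atlantis', demo)  # not in the data
print(known)
print(missing)
```

Rows with `None` values would need to be dropped or imputed before fitting any model.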

In [68]:
def extract_timeseries_features(country: str, quarantine_data: dict, infection_data: dict, last_date):
    # Placeholder: time-dependent features are not implemented yet.
    pass
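
One of the epidemic-specific features listed at the top of this notebook is the number of days in quarantine. A minimal sketch of how that value could be derived from the `start` and `end` columns of `quarantine_data` follows. The helper `days_in_quarantine` is hypothetical, it uses standard-library dates, and it assumes that a missing date is passed as `None`. How `extract_timeseries_features` will actually consume it is not specified here.

```python
from datetime import date

def days_in_quarantine(start, end, on_date):
    """Days of quarantine elapsed on `on_date`, counted from `start`.

    Assumptions: `start` is None when no quarantine was declared, and
    `end` is None when the quarantine is still open-ended.
    """
    if start is None or on_date < start:
        return 0
    # Stop counting once the quarantine has ended.
    effective = on_date if end is None else min(on_date, end)
    return (effective - start).days

# Argentina's dates as shown in the quarantine table.
start, end = date(2020, 3, 19), date(2020, 3, 31)
print(days_in_quarantine(start, end, date(2020, 3, 25)))  # 6
print(days_in_quarantine(start, end, date(2020, 4, 10)))  # 12
```

Because the infection data is stored as a plain list of counts per country, aligning each count with a calendar date would need an extra assumption about the date of the first reported case.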

In [76]:
infection_data.to_dict('index')

{'Saint Lucia': {'Data': [1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3]},
 'Guinea': {'Data': [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 4, 4]},
 'Liechtenstein': {'Data': [1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   4,
   4,
   4,
   7,
   28,
   28,
   28,
   37,
   37,
   51,
   51]},
 'Ghana': {'Data': [3, 6, 6, 7, 7, 11, 16, 19, 23, 27, 53]},
 'Qatar': {'Data': [1,
   3,
   3,
   7,
   8,
   8,
   8,
   8,
   15,
   18,
   24,
   262,
   262,
   320,
   337,
   401,
   439,
   439,
   452,
   460,
   470,
   481,
   494,
   501,
   526]},
 'Iraq': {'Data': [1,
   1,
   5,
   7,
   7,
   13,
   19,
   26,
   32,
   35,
   35,
   40,
   54,
   60,
   60,
   71,
   71,
   71,
   101,
   110,
   116,
   124,
   154,
   164,
   192,
   208,
   214,
   233,
   266,
   316]},
 'Central African Republic': {'Data': [1, 1, 1, 1, 1, 3, 3, 3, 3, 3]},
 'Guyana': {'Data': [1, 1, 1, 4, 4, 7, 7, 7, 7, 7, 19, 20, 5]},
 'Spain': {'Data': [1,
   1,
   1,
   1,
   1,
   1,
   1,
   1,
   2,
   2,
   2,
   2,
