# 1. Data Extraction

In this notebook we extract our data and analyse it. For that purpose, we import our own library, where we define the
```bcrp_dataframe``` function. This function allows us to use the API of the Central Reserve Bank of Peru (BCRP) to automatically build a pandas dataframe from a list of series codes.

## 1.1 Libraries

We import the necessary libraries, including our own functions from the ```modules``` package.

In [1]:
# Warnings
import warnings
warnings.filterwarnings("ignore")

# Basic Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import requests
import seaborn as sns
from scipy import stats

# Statsmodels
import statsmodels.api as sm
import pmdarima as pmd
from pmdarima.arima import auto_arima
from statsmodels.tsa.api import VAR
from statsmodels.tsa.vector_ar.var_model import VARResults
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose

# Machine Learning models
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV, TimeSeriesSplit
from sklearn.linear_model import Ridge, Lasso, ElasticNet, ElasticNetCV, LinearRegression
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    mean_absolute_percentage_error,
    median_absolute_error,
    r2_score,
    precision_score,
)

from xgboost import XGBRegressor



In [2]:
# We import our own functions
import sys
sys.path.append('../')  # Move one level up to the project root
from modules.functions import *
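
The path setup above depends on the directory the notebook is launched from. A minimal sketch of a slightly more defensive variant follows, assuming the notebook sits exactly one level below the project root; the name `project_root` is introduced here for illustration only.

```python
import sys
from pathlib import Path

# Assumption: the notebook lives one level below the project root,
# so the parent of the current working directory is the root.
project_root = Path.cwd().parent.resolve()

# Add the root only once so re-running the cell does not grow sys.path.
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))

# An explicit import documents which helper is used (needs the project's package):
# from modules.functions import bcrp_dataframe
```

Importing `bcrp_dataframe` by name rather than with a wildcard makes it easier to see where the function comes from and avoids shadowing names from the libraries imported earlier.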

## 1.2 Extraction
We define our inputs and pass them to the ```bcrp_dataframe``` function to obtain a pandas dataframe with the corresponding series.

We define the following inputs:

    series     = the codes of the series we are going to extract
    start_date = the start date, September 2003, when the BCRP started using the interest rate as a policy instrument
    end_date   = the end date, December 2019
    freq       = the frequency of the series (monthly)
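
The body of ```bcrp_dataframe``` is not shown in this notebook. The sketch below only illustrates how such a helper might turn these inputs into a request and parse the response. Everything in it is an assumption: the base URL and path layout, the JSON layout (`config.series[].name`, `periods[].name`, `periods[].values`), and the helper names `build_bcrp_url` and `payload_to_frame`. The sample payload is hand-written and does not contain real BCRP data.

```python
import pandas as pd

# Assumed base URL of the BCRP statistics API (not shown in this notebook).
BCRP_API = "https://estadisticas.bcrp.gob.pe/estadisticas/series/api"


def build_bcrp_url(series, start_date, end_date, fmt="json"):
    """Join the series codes with '-' and append format and date range (assumed layout)."""
    codes = "-".join(series)
    return f"{BCRP_API}/{codes}/{fmt}/{start_date}/{end_date}"


def payload_to_frame(payload):
    """Turn an (assumed) JSON payload into a dataframe with one column per series."""
    columns = [s["name"] for s in payload["config"]["series"]]
    index = [p["name"] for p in payload["periods"]]
    rows = [p["values"] for p in payload["periods"]]
    frame = pd.DataFrame(rows, index=index, columns=columns)
    # Values are assumed to arrive as strings; non-numeric entries become NaN.
    return frame.apply(pd.to_numeric, errors="coerce")


# Hand-written payload in the assumed layout (illustrative values only).
sample = {
    "config": {"series": [{"name": "Series A"}, {"name": "Series B"}]},
    "periods": [
        {"name": "2003-09", "values": ["0.5", "2.75"]},
        {"name": "2003-10", "values": ["n.d.", "2.75"]},
    ],
}

url = build_bcrp_url(["PN01271PM", "PD04722MM"], "2003-09", "2019-12")
demo = payload_to_frame(sample)
# A real call would look like: payload = requests.get(url).json()
```

Keeping URL construction separate from parsing means the parsing step can be checked against a saved response without network access.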

### DF1
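We can now create the first dataframe with the ```bcrp_dataframe``` function. This dataframe contains the Consumer Price Index variables and the wholesale price index in monthly % change, as well as the reserve requirement rate (Tasa de Encaje) and the monetary policy reference rate (Tasa de Referencia).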

In [3]:
series     = ['PN01271PM', 'PN01280PM', 'PN01282PM', 'PN01278PM', 'PN09817PM','PN09816PM', 'PN01276PM', 'PN01313PM', 'PN01314PM',  
             'PN01315PM', 'PN09818PM','PN01286PM', 'PN00493MM', 'PD04722MM']
start_date = '2003-09'
end_date   = '2019-12'
freq       = 'Mensual'

In [4]:
df_1 = bcrp_dataframe( series , start_date , end_date , freq )
df_1.head()

Fecha,Índice de precios Lima Metropolitana (var% mensual) - IPC,Índice de precios Lima Metropolitana (var% mensual) - IPC Transables,Índice de precios Lima Metropolitana (var% mensual) - IPC No Transables,Índice de precios Lima Metropolitana (var% mensual) - IPC Subyacente,Índice de precios Lima Metropolitana (var% mensual) - IPC No Subyacente,Índice de precios Lima Metropolitana (var% mensual) - IPC Alimentos y Energía,Índice de precios Lima Metropolitana (var% mensual) - IPC Sin Alimentos y Energía,Índice de precios Lima Metropolitana (var% mensual) - IPC Alimentos y Bebidas,Índice de precios Lima Metropolitana (var% mensual) - IPC sin Alimentos y Bebidas,Índice de precios Lima Metropolitana (var% mensual) - IPC Subyacente Sin Alimentos y Bebidas,Índice de precios Lima Metropolitana (var% mensual) - IPC Importado,Índice de precios Lima Metropolitana (var% mensual) - Índice de Precios al por Mayor,Tasas de interés del Banco Central de Reserva - Tasa de Encaje,Tasas de interés del Banco Central de Reserva - Tasa de Referencia de la Política Monetaria
2003-09-01,0.558598,0.205933,0.822993,0.005689,1.41685,1.024345,-0.017156,0.900901,0.22377,0.048987,1.140655,0.669709,11.01492,2.75
2003-10-01,0.049032,-0.035055,0.096153,-0.040038,0.16473,0.193699,-0.070649,0.198413,-0.040003,-0.132041,0.183697,0.171749,10.349944,2.75
2003-11-01,0.167685,0.243529,0.12095,0.125742,0.237966,0.256361,0.005988,0.39604,-0.082393,0.08893,0.166302,0.190676,11.690608,2.5
2003-12-01,0.563951,0.594507,0.534926,0.127343,1.196907,0.898519,0.231768,0.986193,0.242505,0.078004,0.306233,0.649838,11.391178,2.5
2004-01-01,0.537447,0.265543,0.708509,-0.055834,1.379067,1.132403,-0.141462,1.074219,0.076551,-0.081012,3.494166,0.54559,10.63403,2.5


### DF2
We create the second dataframe with the ```bcrp_dataframe``` function. This dataframe contains monetary variables. We apply ```pct_change``` to these level series to obtain their monthly change, expressed as a fraction of the previous month's value.

In [5]:
series     = ['PN00495MM', 'PN06481IM', 'PN02125PM']
start_date = '2003-09'
end_date   = '2019-12'
freq       = 'Mensual'

In [6]:
df_2 = bcrp_dataframe( series , start_date , end_date , freq )
df_2 = df_2.pct_change()
df_2 = df_2.dropna()
df_2.head()

Fecha,Emisión primaria y multiplicador (millones S/) - Circulante Desestacionalizado,Liquidez internacional del BCRP - RIN - Reservas Internacionales Netas (millones US$),Remuneraciones - Remuneración Mínima Vital - Índice Real (base 1994 = 100)
2003-10-01,0.010225,0.005542,0.056953
2003-11-01,0.011445,0.050288,-0.001674
2003-12-01,0.016607,-0.010532,-0.005608
2004-01-01,0.01949,0.036233,-0.005346
2004-02-01,0.017003,-0.005743,-0.010744
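
To make the transformation concrete, the following toy example (made-up numbers, not BCRP data) shows what ```pct_change``` followed by ```dropna``` does to a level series. Note that the result is a fraction, so it has to be multiplied by 100 to be comparable with series already reported in percent.

```python
import pandas as pd

# Toy level series (made-up numbers) indexed by month start.
levels = pd.Series(
    [100.0, 102.0, 99.96],
    index=pd.date_range("2003-09-01", periods=3, freq="MS"),
)

# pct_change computes (x_t - x_{t-1}) / x_{t-1}; the first month has no
# predecessor, so it becomes NaN and is removed by dropna.
growth = levels.pct_change().dropna()

# Multiply by 100 to express the change in percent.
growth_pct = growth * 100
```

Here `growth` starts in October 2003, one month later than `levels`, which is why the second dataframe begins at ```2003-10-01```.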


In [7]:
df = df_1.join(df_2)
df.dropna(inplace=True)
df.head()

Fecha,Índice de precios Lima Metropolitana (var% mensual) - IPC,Índice de precios Lima Metropolitana (var% mensual) - IPC Transables,Índice de precios Lima Metropolitana (var% mensual) - IPC No Transables,Índice de precios Lima Metropolitana (var% mensual) - IPC Subyacente,Índice de precios Lima Metropolitana (var% mensual) - IPC No Subyacente,Índice de precios Lima Metropolitana (var% mensual) - IPC Alimentos y Energía,Índice de precios Lima Metropolitana (var% mensual) - IPC Sin Alimentos y Energía,Índice de precios Lima Metropolitana (var% mensual) - IPC Alimentos y Bebidas,Índice de precios Lima Metropolitana (var% mensual) - IPC sin Alimentos y Bebidas,Índice de precios Lima Metropolitana (var% mensual) - IPC Subyacente Sin Alimentos y Bebidas,Índice de precios Lima Metropolitana (var% mensual) - IPC Importado,Índice de precios Lima Metropolitana (var% mensual) - Índice de Precios al por Mayor,Tasas de interés del Banco Central de Reserva - Tasa de Encaje,Tasas de interés del Banco Central de Reserva - Tasa de Referencia de la Política Monetaria,Emisión primaria y multiplicador (millones S/) - Circulante Desestacionalizado,Liquidez internacional del BCRP - RIN - Reservas Internacionales Netas (millones US$),Remuneraciones - Remuneración Mínima Vital - Índice Real (base 1994 = 100)
2003-10-01,0.049032,-0.035055,0.096153,-0.040038,0.16473,0.193699,-0.070649,0.198413,-0.040003,-0.132041,0.183697,0.171749,10.349944,2.75,0.010225,0.005542,0.056953
2003-11-01,0.167685,0.243529,0.12095,0.125742,0.237966,0.256361,0.005988,0.39604,-0.082393,0.08893,0.166302,0.190676,11.690608,2.5,0.011445,0.050288,-0.001674
2003-12-01,0.563951,0.594507,0.534926,0.127343,1.196907,0.898519,0.231768,0.986193,0.242505,0.078004,0.306233,0.649838,11.391178,2.5,0.016607,-0.010532,-0.005608
2004-01-01,0.537447,0.265543,0.708509,-0.055834,1.379067,1.132403,-0.141462,1.074219,0.076551,-0.081012,3.494166,0.54559,10.63403,2.5,0.01949,0.036233,-0.005346
2004-02-01,1.086085,0.987935,1.165375,0.239166,2.322395,1.881973,0.150952,2.028986,0.250387,0.192297,2.25327,1.274425,10.718295,2.5,0.017003,-0.005743,-0.010744
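
The merged dataframe starts in October 2003 even though the first dataframe starts in September. A small sketch with toy frames (made-up values and hypothetical column names `cpi` and `money`) shows why: ```join``` aligns on the index and keeps every row of the left frame, and ```dropna``` then removes the row for which the right frame has no value.

```python
import pandas as pd

idx = pd.date_range("2003-09-01", periods=3, freq="MS")

# Left frame covers all three months; right frame lacks the first month,
# as happens after pct_change().dropna().
left = pd.DataFrame({"cpi": [0.5, 0.1, 0.2]}, index=idx)
right = pd.DataFrame({"money": [0.01, 0.02]}, index=idx[1:])

# DataFrame.join performs a left join on the index by default.
joined = left.join(right)

# The September row has NaN in "money" and is dropped.
clean = joined.dropna()
```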


In [8]:
df.tail()

Fecha,Índice de precios Lima Metropolitana (var% mensual) - IPC,Índice de precios Lima Metropolitana (var% mensual) - IPC Transables,Índice de precios Lima Metropolitana (var% mensual) - IPC No Transables,Índice de precios Lima Metropolitana (var% mensual) - IPC Subyacente,Índice de precios Lima Metropolitana (var% mensual) - IPC No Subyacente,Índice de precios Lima Metropolitana (var% mensual) - IPC Alimentos y Energía,Índice de precios Lima Metropolitana (var% mensual) - IPC Sin Alimentos y Energía,Índice de precios Lima Metropolitana (var% mensual) - IPC Alimentos y Bebidas,Índice de precios Lima Metropolitana (var% mensual) - IPC sin Alimentos y Bebidas,Índice de precios Lima Metropolitana (var% mensual) - IPC Subyacente Sin Alimentos y Bebidas,Índice de precios Lima Metropolitana (var% mensual) - IPC Importado,Índice de precios Lima Metropolitana (var% mensual) - Índice de Precios al por Mayor,Tasas de interés del Banco Central de Reserva - Tasa de Encaje,Tasas de interés del Banco Central de Reserva - Tasa de Referencia de la Política Monetaria,Emisión primaria y multiplicador (millones S/) - Circulante Desestacionalizado,Liquidez internacional del BCRP - RIN - Reservas Internacionales Netas (millones US$),Remuneraciones - Remuneración Mínima Vital - Índice Real (base 1994 = 100)
2019-08-01,0.060977,0.250759,-0.044706,0.156376,-0.130717,-0.03219,0.140292,0.113872,0.025915,0.20065,0.481725,0.312812,6.446612,2.5,0.007826,0.022711,-0.000609
2019-09-01,0.006383,0.078882,-0.033024,0.102758,-0.18609,0.00645,0.006326,0.055675,-0.02632,0.057833,-0.230232,-0.014327,5.935527,2.5,0.008478,-0.005991,-6.4e-05
2019-10-01,0.110725,0.032737,0.154407,0.079994,0.172928,-0.027688,0.228356,-0.231726,0.338114,0.067785,0.049101,0.121414,6.617785,2.5,0.003873,0.001857,-0.001106
2019-11-01,0.10891,-0.034633,0.188704,0.060313,0.206176,0.191502,0.038899,0.010629,0.173799,0.048416,-0.031762,-0.143466,6.069958,2.25,0.006119,-0.017871,-0.001088
2019-12-01,0.214521,-0.067394,0.371368,0.05462,0.536559,0.071104,0.336279,0.097885,0.291403,0.044864,-0.173694,-0.043946,6.699655,2.25,0.029592,0.023139,-0.002141


## 1.3 Data Inspection
We inspect ```df```. We first verify that it contains no null values. Then, we apply the ```describe``` function to obtain summary statistics for each variable.

In [9]:
df.isna().sum()

Índice de precios Lima Metropolitana (var% mensual) - IPC                                       0
Índice de precios Lima Metropolitana (var% mensual) - IPC Transables                            0
Índice de precios Lima Metropolitana (var% mensual) - IPC No Transables                         0
Índice de precios Lima Metropolitana (var% mensual) - IPC Subyacente                            0
Índice de precios Lima Metropolitana (var% mensual) - IPC No Subyacente                         0
Índice de precios Lima Metropolitana (var% mensual) - IPC Alimentos y Energía                   0
Índice de precios Lima Metropolitana (var% mensual) - IPC Sin Alimentos y Energía               0
Índice de precios Lima Metropolitana (var% mensual) - IPC Alimentos y Bebidas                   0
Índice de precios Lima Metropolitana (var% mensual) - IPC sin Alimentos y Bebidas               0
Índice de precios Lima Metropolitana (var% mensual) - IPC Subyacente Sin Alimentos y Bebidas    0
Índice de precios Li

In [10]:
df.describe()

Unnamed: 0,Índice de precios Lima Metropolitana (var% mensual) - IPC,Índice de precios Lima Metropolitana (var% mensual) - IPC Transables,Índice de precios Lima Metropolitana (var% mensual) - IPC No Transables,Índice de precios Lima Metropolitana (var% mensual) - IPC Subyacente,Índice de precios Lima Metropolitana (var% mensual) - IPC No Subyacente,Índice de precios Lima Metropolitana (var% mensual) - IPC Alimentos y Energía,Índice de precios Lima Metropolitana (var% mensual) - IPC Sin Alimentos y Energía,Índice de precios Lima Metropolitana (var% mensual) - IPC Alimentos y Bebidas,Índice de precios Lima Metropolitana (var% mensual) - IPC sin Alimentos y Bebidas,Índice de precios Lima Metropolitana (var% mensual) - IPC Subyacente Sin Alimentos y Bebidas,Índice de precios Lima Metropolitana (var% mensual) - IPC Importado,Índice de precios Lima Metropolitana (var% mensual) - Índice de Precios al por Mayor,Tasas de interés del Banco Central de Reserva - Tasa de Encaje,Tasas de interés del Banco Central de Reserva - Tasa de Referencia de la Política Monetaria,Emisión primaria y multiplicador (millones S/) - Circulante Desestacionalizado,Liquidez internacional del BCRP - RIN - Reservas Internacionales Netas (millones US$),Remuneraciones - Remuneración Mínima Vital - Índice Real (base 1994 = 100)
count,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0
mean,0.234205,0.204398,0.252626,0.232706,0.231447,0.288372,0.180427,0.293818,0.187083,0.191971,0.215506,0.200536,10.745239,3.670513,0.011586,0.010424,0.001721
std,0.28602,0.263378,0.419191,0.174543,0.641638,0.49554,0.211563,0.538407,0.242511,0.231106,0.660103,0.473149,4.267221,1.107363,0.008612,0.028325,0.018369
min,-0.527952,-0.712465,-0.881608,-0.055834,-1.646689,-1.033528,-0.169771,-1.184983,-0.695181,-0.144771,-3.036376,-1.548025,5.935527,1.25,-0.006819,-0.108487,-0.012878
25%,0.041525,0.040325,-0.031931,0.12812,-0.147959,-0.026361,0.043057,-0.038394,0.039752,0.073217,-0.097806,-0.06941,7.87699,3.0,0.004665,-0.00614,-0.003851
50%,0.206464,0.217753,0.220724,0.199137,0.237957,0.245902,0.138358,0.273423,0.140809,0.145616,0.205797,0.197346,9.34555,3.75,0.010576,0.00603,-0.001863
75%,0.390471,0.348837,0.492204,0.295662,0.611524,0.581936,0.268568,0.606469,0.311335,0.224841,0.488997,0.461375,11.933199,4.25,0.017573,0.022355,-0.00014
max,1.304558,1.149207,1.821458,0.903711,2.322395,1.881973,1.020916,2.116936,0.987893,1.323068,3.494166,1.693057,26.380399,6.5,0.038067,0.11008,0.130963


We have 195 observations ranging from ```2003-10-01``` to ```2019-12-01```. The mean monthly % change of the price index variables lies between roughly 0.18 and 0.29. The mean of the reserve requirement rate and the policy reference rate is 10.7% and 3.67%, respectively. The three monetary variables show small average monthly changes: around 0.01 (about 1%) for currency in circulation and net international reserves, and around 0.002 (about 0.2%) for the real minimum wage index.
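
The observation count can be cross-checked against the calendar: October to December 2003 gives 3 months, and 2004 to 2019 gives 16 × 12 = 192 months, for a total of 195. The sketch below rebuilds that monthly index and defines a small hypothetical helper, `is_complete_monthly`, that could be applied to `df` to confirm there are no gaps or missing values; it is demonstrated here on a placeholder frame.

```python
import pandas as pd

# Month-start dates from October 2003 to December 2019, inclusive.
expected_index = pd.date_range("2003-10-01", "2019-12-01", freq="MS")
n_obs = len(expected_index)  # 3 months in 2003 + 16 years * 12 months


def is_complete_monthly(frame):
    """True if the frame has exactly the expected monthly index and no NaN."""
    same_dates = frame.index.equals(expected_index)
    no_missing = not frame.isna().any().any()
    return bool(same_dates and no_missing)


# Placeholder frame standing in for df (values are arbitrary).
toy = pd.DataFrame({"x": range(n_obs)}, index=expected_index)
toy_ok = is_complete_monthly(toy)

# Dropping one month makes the check fail.
gap_ok = is_complete_monthly(toy.drop(expected_index[10]))
```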

# 1.4 Data E