# 1. Data Extraction

In this notebook we extract our data and analyse it. For that purpose, we import from our own library the
```bcrp_dataframe``` function. This function allows us to use the API of the Central Reserve Bank of Peru (BCRP) to automatically create a pandas dataframe from the corresponding series codes.

## 1.1 Libraries

We import the necessary libraries, including our own functions from the ```modules``` folder.

In [1]:
# Warnings
import warnings
warnings.filterwarnings("ignore")

# Basic Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import requests
import seaborn as sns
from scipy import stats
from functools import reduce

# Statsmodels
import statsmodels.api as sm
import pmdarima as pmd
from pmdarima.arima import auto_arima
from statsmodels.tsa.api import VAR
from statsmodels.tsa.vector_ar.var_model import VARResults
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import STL

# Machine Learning models
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV, TimeSeriesSplit
from sklearn.linear_model import Ridge, Lasso, ElasticNet, ElasticNetCV, LinearRegression
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    mean_absolute_percentage_error,
    median_absolute_error,
    r2_score,
    precision_score,
)

from xgboost import XGBRegressor



In [2]:
# We import our own functions
import sys
sys.path.append('../../..')  # Move three levels up to the project root
from modules.functions import *

## 1.2 Extraction
We define our inputs and pass them to the ```bcrp_dataframe``` function in order to obtain a pandas dataframe with the corresponding series.

We define the following inputs:

    series     = the code of the series we are going to extract
    start_date = the starting date, when the BCRP starts using the interest rate as a policy measure
    end_date   = December 2019
    freq       = Monthly frequency
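
The notebook does not show how ```bcrp_dataframe``` is implemented, so the following is only a minimal sketch of how such inputs could be turned into a request and a dataframe. The endpoint layout, the JSON field names (```config```, ```series```, ```periods```, ```values```) and the helpers ```build_bcrp_url``` and ```payload_to_frame``` are assumptions made for illustration; the actual function in ```modules.functions``` may work differently, and the layout should be checked against the official BCRP API guide.

```python
import pandas as pd

# Assumed endpoint layout of the public BCRP statistics API.
BASE_URL = "https://estadisticas.bcrp.gob.pe/estadisticas/series/api"


def build_bcrp_url(series, start_date, end_date, fmt="json"):
    """Hypothetical helper: join the codes with '-' and append format and dates."""
    codes = "-".join(series)
    return f"{BASE_URL}/{codes}/{fmt}/{start_date}/{end_date}"


def payload_to_frame(payload):
    """Hypothetical helper: one row per period, one column per series."""
    names = [s["name"] for s in payload["config"]["series"]]
    rows = {p["name"]: p["values"] for p in payload["periods"]}
    frame = pd.DataFrame.from_dict(rows, orient="index", columns=names)
    # Values arrive as strings; anything non-numeric becomes NaN.
    return frame.apply(pd.to_numeric, errors="coerce")


# Hand-written payload in the assumed shape (period labels are placeholders).
sample = {
    "config": {"series": [{"name": "Tasa de Encaje"}, {"name": "Tasa de Referencia"}]},
    "periods": [
        {"name": "period-1", "values": ["11.01492", "2.75"]},
        {"name": "period-2", "values": ["10.349944", "n.d."]},
    ],
}

url = build_bcrp_url(["PN00493MM", "PD04722MM"], "2003-09", "2019-12")
frame = payload_to_frame(sample)
```

Separating URL construction from parsing keeps the parsing step testable without network access.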

### df_1
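We can now create the first dataframe with the ```bcrp_dataframe``` function. This dataframe contains the Consumer Price Index variables and the Wholesale Price Index in monthly % change. These variables are then seasonally adjusted using an STL decomposition with a 12-month period, of which we keep the trend component.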

In [3]:
series     = ['PN01271PM', 'PN01280PM', 'PN01282PM', 'PN01278PM', 'PN09817PM','PN09816PM', 'PN01276PM', 'PN01313PM', 'PN01314PM',  
             'PN01315PM', 'PN09818PM','PN01286PM']
start_date = '2003-09'
end_date   = '2019-12'
freq       = 'Mensual'

In [4]:
df_1 = bcrp_dataframe( series , start_date , end_date , freq )
df_1.head()

Fecha,Índice de precios Lima Metropolitana (var% mensual) - IPC,Índice de precios Lima Metropolitana (var% mensual) - IPC Transables,Índice de precios Lima Metropolitana (var% mensual) - IPC No Transables,Índice de precios Lima Metropolitana (var% mensual) - IPC Subyacente,Índice de precios Lima Metropolitana (var% mensual) - IPC No Subyacente,Índice de precios Lima Metropolitana (var% mensual) - IPC Alimentos y Energía,Índice de precios Lima Metropolitana (var% mensual) - IPC Sin Alimentos y Energía,Índice de precios Lima Metropolitana (var% mensual) - IPC Alimentos y Bebidas,Índice de precios Lima Metropolitana (var% mensual) - IPC sin Alimentos y Bebidas,Índice de precios Lima Metropolitana (var% mensual) - IPC Subyacente Sin Alimentos y Bebidas,Índice de precios Lima Metropolitana (var% mensual) - IPC Importado,Índice de precios Lima Metropolitana (var% mensual) - Índice de Precios al por Mayor
2003-09-01,0.558598,0.205933,0.822993,0.005689,1.41685,1.024345,-0.017156,0.900901,0.22377,0.048987,1.140655,0.669709
2003-10-01,0.049032,-0.035055,0.096153,-0.040038,0.16473,0.193699,-0.070649,0.198413,-0.040003,-0.132041,0.183697,0.171749
2003-11-01,0.167685,0.243529,0.12095,0.125742,0.237966,0.256361,0.005988,0.39604,-0.082393,0.08893,0.166302,0.190676
2003-12-01,0.563951,0.594507,0.534926,0.127343,1.196907,0.898519,0.231768,0.986193,0.242505,0.078004,0.306233,0.649838
2004-01-01,0.537447,0.265543,0.708509,-0.055834,1.379067,1.132403,-0.141462,1.074219,0.076551,-0.081012,3.494166,0.54559


In [5]:
def get_trend(df, period=12):
    trend_df = pd.DataFrame()

    for col in df.columns:
        stl_result = STL(df[col], period=period).fit()
        trend_df[col] = stl_result.trend

    return trend_df

df_1 = get_trend(df_1)
df_1.head()

Fecha,Índice de precios Lima Metropolitana (var% mensual) - IPC,Índice de precios Lima Metropolitana (var% mensual) - IPC Transables,Índice de precios Lima Metropolitana (var% mensual) - IPC No Transables,Índice de precios Lima Metropolitana (var% mensual) - IPC Subyacente,Índice de precios Lima Metropolitana (var% mensual) - IPC No Subyacente,Índice de precios Lima Metropolitana (var% mensual) - IPC Alimentos y Energía,Índice de precios Lima Metropolitana (var% mensual) - IPC Sin Alimentos y Energía,Índice de precios Lima Metropolitana (var% mensual) - IPC Alimentos y Bebidas,Índice de precios Lima Metropolitana (var% mensual) - IPC sin Alimentos y Bebidas,Índice de precios Lima Metropolitana (var% mensual) - IPC Subyacente Sin Alimentos y Bebidas,Índice de precios Lima Metropolitana (var% mensual) - IPC Importado,Índice de precios Lima Metropolitana (var% mensual) - Índice de Precios al por Mayor
2003-09-01,0.518834,0.470988,0.552064,0.08957,1.146795,0.97494,-0.019098,1.065302,0.030469,0.029603,1.096327,0.676146
2003-10-01,0.492247,0.46073,0.514069,0.091267,1.078211,0.916599,-0.008282,0.987035,0.050294,0.031116,1.056286,0.646277
2003-11-01,0.465462,0.449419,0.476486,0.092842,1.009369,0.858081,0.00234,0.909263,0.069318,0.032549,1.014146,0.615219
2003-12-01,0.438413,0.437075,0.439183,0.094296,0.940109,0.79928,0.012736,0.831909,0.087472,0.033944,0.970034,0.582957
2004-01-01,0.411138,0.423839,0.402122,0.09562,0.870541,0.740242,0.022936,0.754927,0.104864,0.03536,0.924542,0.549642


### df_2
We create the second dataframe with the ```bcrp_dataframe``` function. This dataframe contains rate variables: the reserve requirement rate and the monetary policy reference rate. We use these variables in levels; it is not necessary to difference them.

In [6]:
series     = ['PN00493MM', 'PD04722MM']
start_date = '2003-09'
end_date   = '2019-12'
freq       = 'Mensual'

In [7]:
df_2 = bcrp_dataframe( series , start_date , end_date , freq )
df_2.head()

Fecha,Tasas de interés del Banco Central de Reserva - Tasa de Encaje,Tasas de interés del Banco Central de Reserva - Tasa de Referencia de la Política Monetaria
2003-09-01,11.01492,2.75
2003-10-01,10.349944,2.75
2003-11-01,11.690608,2.5
2003-12-01,11.391178,2.5
2004-01-01,10.63403,2.5


### df_3
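We create the third dataframe with the ```bcrp_dataframe``` function. This dataframe contains monetary variables, the real minimum wage index and commodity prices. We transform these variables with ```pct_change``` to obtain their monthly change, expressed as a fraction (0.01 = 1%), and drop the first row, which has no previous value.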

In [8]:
series     = ['PN00495MM', 'PN06481IM', 'PN02125PM', 'PN01661XM','PN01662XM','PN01664XM','PN01660XM']
start_date = '2003-09'
end_date   = '2019-12'
freq       = 'Mensual'

In [9]:
df_3 = bcrp_dataframe( series , start_date , end_date , freq )
df_3 = df_3.pct_change()
df_3 = df_3.dropna()
df_3.head()

Fecha,Emisión primaria y multiplicador (millones S/) - Circulante Desestacionalizado,Liquidez internacional del BCRP - RIN - Reservas Internacionales Netas (millones US$),Remuneraciones - Remuneración Mínima Vital - Índice Real (base 1994 = 100),Cotizaciones de productos (promedio del periodo) - Trigo - EEUU (US$ por toneladas),Cotizaciones de productos (promedio del periodo) - Maíz - EEUU (US$ por toneladas),Cotizaciones de productos (promedio del periodo) - Aceite Soya - EEUU (US$ por toneladas),Cotizaciones de productos (promedio del periodo) - Petróleo - WTI (US$ por barriles)
2003-10-01,0.010225,0.005542,0.056953,0.016588,-0.027808,0.181232,0.065797
2003-11-01,0.011445,0.050288,-0.001674,0.115205,0.075366,0.017022,0.022715
2003-12-01,0.016607,-0.010532,-0.005608,0.037146,0.048843,0.051254,0.037815
2004-01-01,0.01949,0.036233,-0.005346,0.003546,0.062558,0.02678,0.061517
2004-02-01,0.017003,-0.005743,-0.010744,-0.019994,0.081887,0.087963,0.015738
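
As a small illustration of the transformation, ```pct_change``` returns a fraction rather than a percentage, and its first value is always missing. The numbers below are toy values, not BCRP data.

```python
import pandas as pd

levels = pd.Series(
    [100.0, 102.0, 99.96],
    index=pd.date_range("2003-09-01", periods=3, freq="MS"),
    name="level",
)

# Fractional change: x_t / x_{t-1} - 1. The first month has no predecessor.
change = levels.pct_change()

# Multiply by 100 to put it on the same scale as the "var% mensual" CPI series.
change_in_percent = change * 100

# Dropping the missing first row shortens the sample by one month.
change_clean = change.dropna()
```

This is why the differenced dataframe starts in ```2003-10-01``` instead of ```2003-09-01```, and why its values are roughly one hundred times smaller than the CPI variables.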


In [10]:
df = df_1.join(df_2).join(df_3)
df.dropna(inplace=True)
df.head()

Fecha,Índice de precios Lima Metropolitana (var% mensual) - IPC,Índice de precios Lima Metropolitana (var% mensual) - IPC Transables,Índice de precios Lima Metropolitana (var% mensual) - IPC No Transables,Índice de precios Lima Metropolitana (var% mensual) - IPC Subyacente,Índice de precios Lima Metropolitana (var% mensual) - IPC No Subyacente,Índice de precios Lima Metropolitana (var% mensual) - IPC Alimentos y Energía,Índice de precios Lima Metropolitana (var% mensual) - IPC Sin Alimentos y Energía,Índice de precios Lima Metropolitana (var% mensual) - IPC Alimentos y Bebidas,Índice de precios Lima Metropolitana (var% mensual) - IPC sin Alimentos y Bebidas,Índice de precios Lima Metropolitana (var% mensual) - IPC Subyacente Sin Alimentos y Bebidas,...,Índice de precios Lima Metropolitana (var% mensual) - Índice de Precios al por Mayor,Tasas de interés del Banco Central de Reserva - Tasa de Encaje,Tasas de interés del Banco Central de Reserva - Tasa de Referencia de la Política Monetaria,Emisión primaria y multiplicador (millones S/) - Circulante Desestacionalizado,Liquidez internacional del BCRP - RIN - Reservas Internacionales Netas (millones US$),Remuneraciones - Remuneración Mínima Vital - Índice Real (base 1994 = 100),Cotizaciones de productos (promedio del periodo) - Trigo - EEUU (US$ por toneladas),Cotizaciones de productos (promedio del periodo) - Maíz - EEUU (US$ por toneladas),Cotizaciones de productos (promedio del periodo) - Aceite Soya - EEUU (US$ por toneladas),Cotizaciones de productos (promedio del periodo) - Petróleo - WTI (US$ por barriles)
2003-10-01,0.492247,0.46073,0.514069,0.091267,1.078211,0.916599,-0.008282,0.987035,0.050294,0.031116,...,0.646277,10.349944,2.75,0.010225,0.005542,0.056953,0.016588,-0.027808,0.181232,0.065797
2003-11-01,0.465462,0.449419,0.476486,0.092842,1.009369,0.858081,0.00234,0.909263,0.069318,0.032549,...,0.615219,11.690608,2.5,0.011445,0.050288,-0.001674,0.115205,0.075366,0.017022,0.022715
2003-12-01,0.438413,0.437075,0.439183,0.094296,0.940109,0.79928,0.012736,0.831909,0.087472,0.033944,...,0.582957,11.391178,2.5,0.016607,-0.010532,-0.005608,0.037146,0.048843,0.051254,0.037815
2004-01-01,0.411138,0.423839,0.402122,0.09562,0.870541,0.740242,0.022936,0.754927,0.104864,0.03536,...,0.549642,10.63403,2.5,0.01949,0.036233,-0.005346,0.003546,0.062558,0.02678,0.061517
2004-02-01,0.3837,0.409852,0.365307,0.096795,0.800843,0.681065,0.032967,0.678293,0.121634,0.03684,...,0.515502,10.718295,2.5,0.017003,-0.005743,-0.010744,-0.019994,0.081887,0.087963,0.015738
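
The effect of joining frames that start in different months can be reproduced on toy data. In this sketch the frame names and values are illustrative only.

```python
import pandas as pd

dates = pd.date_range("2003-09-01", periods=3, freq="MS")
left = pd.DataFrame({"a": [1.0, 2.0, 3.0]}, index=dates)

# The differenced frame starts one month later.
right = pd.DataFrame({"b": [0.1, 0.2]}, index=dates[1:])

# DataFrame.join aligns on the index (left join by default),
# so 2003-09 gets NaN in column "b".
joined = left.join(right)

# Keep only the months observed in every frame.
complete = joined.dropna()
```

The same mechanism explains why the joined ```df``` begins in ```2003-10-01```: the first month exists in the CPI and rate frames but not in the differenced one.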


In [11]:
df.tail()

Fecha,Índice de precios Lima Metropolitana (var% mensual) - IPC,Índice de precios Lima Metropolitana (var% mensual) - IPC Transables,Índice de precios Lima Metropolitana (var% mensual) - IPC No Transables,Índice de precios Lima Metropolitana (var% mensual) - IPC Subyacente,Índice de precios Lima Metropolitana (var% mensual) - IPC No Subyacente,Índice de precios Lima Metropolitana (var% mensual) - IPC Alimentos y Energía,Índice de precios Lima Metropolitana (var% mensual) - IPC Sin Alimentos y Energía,Índice de precios Lima Metropolitana (var% mensual) - IPC Alimentos y Bebidas,Índice de precios Lima Metropolitana (var% mensual) - IPC sin Alimentos y Bebidas,Índice de precios Lima Metropolitana (var% mensual) - IPC Subyacente Sin Alimentos y Bebidas,...,Índice de precios Lima Metropolitana (var% mensual) - Índice de Precios al por Mayor,Tasas de interés del Banco Central de Reserva - Tasa de Encaje,Tasas de interés del Banco Central de Reserva - Tasa de Referencia de la Política Monetaria,Emisión primaria y multiplicador (millones S/) - Circulante Desestacionalizado,Liquidez internacional del BCRP - RIN - Reservas Internacionales Netas (millones US$),Remuneraciones - Remuneración Mínima Vital - Índice Real (base 1994 = 100),Cotizaciones de productos (promedio del periodo) - Trigo - EEUU (US$ por toneladas),Cotizaciones de productos (promedio del periodo) - Maíz - EEUU (US$ por toneladas),Cotizaciones de productos (promedio del periodo) - Aceite Soya - EEUU (US$ por toneladas),Cotizaciones de productos (promedio del periodo) - Petróleo - WTI (US$ por barriles)
2019-08-01,0.155238,0.107169,0.182206,0.172176,0.120649,0.117414,0.187309,0.076524,0.207174,0.194464,...,-0.013811,6.446612,2.5,0.007826,0.022711,-0.000609,-0.071854,-0.127216,0.016693,-0.044064
2019-09-01,0.151966,0.103574,0.179086,0.168668,0.117677,0.111047,0.186615,0.065826,0.208745,0.190797,...,-0.029031,5.935527,2.5,0.008478,-0.005991,-6.4e-05,-0.028875,-0.044102,0.019052,0.037436
2019-10-01,0.148757,0.100074,0.176011,0.165254,0.114708,0.104757,0.185969,0.055385,0.210242,0.187204,...,-0.044027,6.617785,2.5,0.003873,0.001857,-0.001106,0.040636,0.066893,0.042443,-0.050909
2019-11-01,0.145714,0.096671,0.17314,0.161942,0.112035,0.098706,0.185426,0.045367,0.211728,0.183707,...,-0.05882,6.069958,2.25,0.006119,-0.017871,-0.001088,0.072377,-0.035344,0.0262,0.057743
2019-12-01,0.142886,0.093373,0.170545,0.158729,0.109809,0.09298,0.185001,0.035862,0.213221,0.180317,...,-0.073404,6.699655,2.25,0.029592,0.023139,-0.002141,0.032523,0.007789,0.057206,0.04904


## 1.3 Data Inspection
We inspect ```df```. We first verify that there are no null values. Then, we apply the ```describe``` function to see the summary statistics of each variable.

In [12]:
df.isna().sum()

Índice de precios Lima Metropolitana (var% mensual) - IPC                                       0
Índice de precios Lima Metropolitana (var% mensual) - IPC Transables                            0
Índice de precios Lima Metropolitana (var% mensual) - IPC No Transables                         0
Índice de precios Lima Metropolitana (var% mensual) - IPC Subyacente                            0
Índice de precios Lima Metropolitana (var% mensual) - IPC No Subyacente                         0
Índice de precios Lima Metropolitana (var% mensual) - IPC Alimentos y Energía                   0
Índice de precios Lima Metropolitana (var% mensual) - IPC Sin Alimentos y Energía               0
Índice de precios Lima Metropolitana (var% mensual) - IPC Alimentos y Bebidas                   0
Índice de precios Lima Metropolitana (var% mensual) - IPC sin Alimentos y Bebidas               0
Índice de precios Lima Metropolitana (var% mensual) - IPC Subyacente Sin Alimentos y Bebidas    0
Índice de precios Li

In [13]:
df.describe()

Unnamed: 0,Índice de precios Lima Metropolitana (var% mensual) - IPC,Índice de precios Lima Metropolitana (var% mensual) - IPC Transables,Índice de precios Lima Metropolitana (var% mensual) - IPC No Transables,Índice de precios Lima Metropolitana (var% mensual) - IPC Subyacente,Índice de precios Lima Metropolitana (var% mensual) - IPC No Subyacente,Índice de precios Lima Metropolitana (var% mensual) - IPC Alimentos y Energía,Índice de precios Lima Metropolitana (var% mensual) - IPC Sin Alimentos y Energía,Índice de precios Lima Metropolitana (var% mensual) - IPC Alimentos y Bebidas,Índice de precios Lima Metropolitana (var% mensual) - IPC sin Alimentos y Bebidas,Índice de precios Lima Metropolitana (var% mensual) - IPC Subyacente Sin Alimentos y Bebidas,...,Índice de precios Lima Metropolitana (var% mensual) - Índice de Precios al por Mayor,Tasas de interés del Banco Central de Reserva - Tasa de Encaje,Tasas de interés del Banco Central de Reserva - Tasa de Referencia de la Política Monetaria,Emisión primaria y multiplicador (millones S/) - Circulante Desestacionalizado,Liquidez internacional del BCRP - RIN - Reservas Internacionales Netas (millones US$),Remuneraciones - Remuneración Mínima Vital - Índice Real (base 1994 = 100),Cotizaciones de productos (promedio del periodo) - Trigo - EEUU (US$ por toneladas),Cotizaciones de productos (promedio del periodo) - Maíz - EEUU (US$ por toneladas),Cotizaciones de productos (promedio del periodo) - Aceite Soya - EEUU (US$ por toneladas),Cotizaciones de productos (promedio del periodo) - Petróleo - WTI (US$ por barriles)
count,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,...,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0
mean,0.236449,0.204956,0.256093,0.23363,0.23554,0.292706,0.180074,0.298515,0.186851,0.193813,...,0.198892,10.745239,3.670513,0.011586,0.010424,0.001721,0.004657,0.005585,0.003622,0.007555
std,0.105236,0.104879,0.123443,0.091512,0.204664,0.184651,0.070393,0.205851,0.058026,0.073718,...,0.217626,4.267221,1.107363,0.008612,0.028325,0.018369,0.071171,0.075505,0.060651,0.085053
min,0.056201,-0.015198,0.027347,0.091163,-0.140622,-0.013365,-0.008282,-0.086504,0.04079,0.031116,...,-0.384462,5.935527,1.25,-0.006819,-0.108487,-0.012878,-0.206207,-0.260785,-0.244475,-0.287431
25%,0.155856,0.132928,0.165486,0.163598,0.128311,0.151125,0.119568,0.144196,0.163795,0.137828,...,0.095354,7.87699,3.0,0.004665,-0.00614,-0.003851,-0.031976,-0.03136,-0.032933,-0.043734
50%,0.240839,0.18936,0.260533,0.236332,0.206382,0.288969,0.187309,0.293743,0.198244,0.201458,...,0.15849,9.34555,3.75,0.010576,0.00603,-0.001863,0.000386,0.005082,0.005193,0.015738
75%,0.292942,0.265015,0.325186,0.299551,0.304751,0.381868,0.219777,0.387526,0.219919,0.245549,...,0.302821,11.933199,4.25,0.017573,0.022355,-0.00014,0.038601,0.046492,0.037092,0.06105
max,0.508985,0.473341,0.610059,0.449803,1.078211,0.916599,0.36181,0.987035,0.300834,0.320968,...,0.77524,26.380399,6.5,0.038067,0.11008,0.130963,0.302896,0.277119,0.181232,0.24912


We have 195 observations ranging from ```2003-10-01``` to ```2019-12-01```. The mean monthly % change of the CPI variables shown is roughly between 0.2 and 0.3. The means of the reserve requirement rate and the monetary policy rate are 10.7% and 3.67%, respectively. Currency in circulation and net international reserves have a small mean monthly change of around 0.01 (about 1%, since these series are expressed as fractions), and the real minimum wage index of around 0.002.
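
The observation count can be cross-checked against the calendar. The sketch below builds the expected month-start index for the sample and includes a small hypothetical helper, ```missing_months```, that lists any month absent from a given index.

```python
import pandas as pd

# Every month start from October 2003 to December 2019, inclusive.
expected_index = pd.date_range("2003-10-01", "2019-12-01", freq="MS")
n_months = len(expected_index)  # 3 months of 2003 + 16 full years = 195


def missing_months(index, expected=expected_index):
    """Hypothetical helper: months in `expected` that are absent from `index`."""
    return expected.difference(index)
```

Calling ```missing_months(df.index)``` on the joined dataframe should return an empty index if no month was lost during extraction, assuming its index holds month-start timestamps as displayed in the outputs.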

## 1.4 Data adjustment
We rename the columns for easier identification of the variables. We also create a new dataframe that keeps the contemporaneous ```CPI``` and the first and second lags of all other variables.

In [14]:
# New column names
columns = {
    'Índice de precios Lima Metropolitana (var% mensual) - IPC': 'CPI',
    'Índice de precios Lima Metropolitana (var% mensual) - IPC Transables': 'CPI Tradable',
    'Índice de precios Lima Metropolitana (var% mensual) - IPC No Transables': 'CPI Non-Tradable',
    'Índice de precios Lima Metropolitana (var% mensual) - IPC Subyacente': 'CPI Core',
    'Índice de precios Lima Metropolitana (var% mensual) - IPC No Subyacente': 'CPI Non-Core',
    'Índice de precios Lima Metropolitana (var% mensual) - IPC Alimentos y Energía': 'CPI Food and Energy',
    'Índice de precios Lima Metropolitana (var% mensual) - IPC Sin Alimentos y Energía': 'CPI Excluding Food and Energy',
    'Índice de precios Lima Metropolitana (var% mensual) - IPC Alimentos y Bebidas': 'CPI Food and Beverages',
    'Índice de precios Lima Metropolitana (var% mensual) - IPC sin Alimentos y Bebidas': 'CPI Excluding Food and Beverages',
    'Índice de precios Lima Metropolitana (var% mensual) - IPC Subyacente Sin Alimentos y Bebidas': 'CPI Core Excluding Food and Beverages',
    'Índice de precios Lima Metropolitana (var% mensual) - IPC Importado': 'CPI Imported',
    'Índice de precios Lima Metropolitana (var% mensual) - Índice de Precios al por Mayor': 'Wholesale Price Index',
    'Tasas de interés del Banco Central de Reserva  - Tasa de Encaje': 'Reserve Requirement Rate',
    'Tasas de interés del Banco Central de Reserva  - Tasa de Referencia de la Política Monetaria': 'Monetary Policy Rate',
    'Emisión primaria y multiplicador (millones S/) - Circulante Desestacionalizado': 'Circulating Currency Seasonally Adjusted (mill S/)',
    'Liquidez internacional del BCRP - RIN - Reservas Internacionales Netas (millones US$)': 'Net International Reserves (mill $)',
    'Remuneraciones - Remuneración Mínima Vital - Índice Real (base 1994 = 100)': 'Real Minimum Wage (Index)',
    'Cotizaciones de productos (promedio del periodo) - Trigo - EEUU (US$ por toneladas)': 'Wheat (US$ per ton)',
    'Cotizaciones de productos (promedio del periodo) - Maíz - EEUU (US$ por toneladas)': 'Corn  (US$ per ton)',
    'Cotizaciones de productos (promedio del periodo) - Aceite Soya - EEUU (US$ por toneladas)': 'Soybean oil (US$ per ton)',
    'Cotizaciones de productos (promedio del periodo) - Petróleo - WTI (US$ por barriles)': 'Crude oil (US$ per barrel)'  
}

# We rename the columns so they are easier to analyse
df.rename(columns=columns, inplace=True)

In [15]:
df_lags = df.copy()

for variable in df_lags.columns[1:]:
    df_lags[f'{variable}_lag_1'] = df_lags[variable].shift()
    df_lags[f'{variable}_lag_2'] = df_lags[variable].shift(2)

In [16]:
# We delete the contemporaneous variables, keeping CPI and the lagged columns
df_lags.drop(columns = ['CPI Tradable', 'CPI Non-Tradable', 'CPI Core', 'CPI Non-Core', 'CPI Food and Energy', 'CPI Excluding Food and Energy',
       'CPI Food and Beverages', 'CPI Excluding Food and Beverages','CPI Core Excluding Food and Beverages', 'CPI Imported',
       'Wholesale Price Index', 'Reserve Requirement Rate','Monetary Policy Rate','Circulating Currency Seasonally Adjusted (mill S/)',
       'Net International Reserves (mill $)', 'Real Minimum Wage (Index)', 'Wheat (US$ per ton)', 'Corn  (US$ per ton)', 
       'Soybean oil (US$ per ton)', 'Crude oil (US$ per barrel)'], inplace = True)

df_lags = df_lags.dropna()
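
The same lag construction can be written without listing every column to drop. This is a sketch on toy data; ```add_lags``` is a hypothetical helper, not part of ```modules.functions```. As in the loop above, the target itself is not lagged.

```python
import pandas as pd


def add_lags(frame, target, lags=(1, 2)):
    """Return `target` plus lagged copies of every other column."""
    predictors = [col for col in frame.columns if col != target]
    lagged = {
        f"{col}_lag_{k}": frame[col].shift(k) for col in predictors for k in lags
    }
    out = pd.concat([frame[[target]], pd.DataFrame(lagged)], axis=1)
    # The first max(lags) rows have no lagged values and are dropped.
    return out.dropna()


toy = pd.DataFrame(
    {"CPI": [0.5, 0.4, 0.3, 0.2], "Monetary Policy Rate": [2.75, 2.75, 2.5, 2.5]},
    index=pd.date_range("2003-10-01", periods=4, freq="MS"),
)
toy_lags = add_lags(toy, target="CPI")
```

With two lags, the first two months are lost, so a sample that starts in ```2003-10-01``` yields a lagged dataframe that starts in ```2003-12-01```.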

## 1.5 Save Results
We save both dataframes to the ```input``` folder, where they can be used for the forecasting in the next notebook.

In [17]:
df.to_csv('../../../input/df_raw_h19.csv')

In [18]:
df_lags.to_csv('../../../input/df_lags_h19.csv')
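
When the files are read back, the date index has to be restored explicitly. The sketch below shows a round trip through an in-memory buffer instead of a file; it assumes the index is named ```Fecha```, as the table outputs suggest.

```python
import io

import numpy as np
import pandas as pd

frame = pd.DataFrame(
    {"CPI": [0.155238, 0.151966]},
    index=pd.DatetimeIndex(["2019-08-01", "2019-09-01"], name="Fecha"),
)

# Write to an in-memory buffer; to_csv stores the index as the first column.
buffer = io.StringIO()
frame.to_csv(buffer)
buffer.seek(0)

# index_col restores the index and parse_dates converts it back to timestamps.
restored = pd.read_csv(buffer, index_col="Fecha", parse_dates=True)
```

Without ```index_col``` and ```parse_dates```, the dates would come back as an ordinary string column and time-based operations in the next notebook would not work as expected.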

In [19]:
df_lags.tail()

Fecha,CPI,CPI Tradable_lag_1,CPI Tradable_lag_2,CPI Non-Tradable_lag_1,CPI Non-Tradable_lag_2,CPI Core_lag_1,CPI Core_lag_2,CPI Non-Core_lag_1,CPI Non-Core_lag_2,CPI Food and Energy_lag_1,...,Real Minimum Wage (Index)_lag_1,Real Minimum Wage (Index)_lag_2,Wheat (US$ per ton)_lag_1,Wheat (US$ per ton)_lag_2,Corn (US$ per ton)_lag_1,Corn (US$ per ton)_lag_2,Soybean oil (US$ per ton)_lag_1,Soybean oil (US$ per ton)_lag_2,Crude oil (US$ per barrel)_lag_1,Crude oil (US$ per barrel)_lag_2
2019-08-01,0.155238,0.110871,0.114674,0.185235,0.188145,0.175751,0.179341,0.12343,0.126055,0.123753,...,-0.002029,0.000863,-0.078909,0.071044,0.011487,0.168096,0.006573,0.042708,0.048935,-0.099863
2019-09-01,0.151966,0.107169,0.110871,0.182206,0.185235,0.172176,0.175751,0.120649,0.12343,0.117414,...,-0.000609,-0.002029,-0.071854,-0.078909,-0.127216,0.011487,0.016693,0.006573,-0.044064,0.048935
2019-10-01,0.148757,0.103574,0.107169,0.179086,0.182206,0.168668,0.172176,0.117677,0.120649,0.111047,...,-6.4e-05,-0.000609,-0.028875,-0.071854,-0.044102,-0.127216,0.019052,0.016693,0.037436,-0.044064
2019-11-01,0.145714,0.100074,0.103574,0.176011,0.179086,0.165254,0.168668,0.114708,0.117677,0.104757,...,-0.001106,-6.4e-05,0.040636,-0.028875,0.066893,-0.044102,0.042443,0.019052,-0.050909,0.037436
2019-12-01,0.142886,0.096671,0.100074,0.17314,0.176011,0.161942,0.165254,0.112035,0.114708,0.098706,...,-0.001088,-0.001106,0.072377,0.040636,-0.035344,0.066893,0.0262,0.042443,0.057743,-0.050909


In [20]:
df.tail()

Fecha,CPI,CPI Tradable,CPI Non-Tradable,CPI Core,CPI Non-Core,CPI Food and Energy,CPI Excluding Food and Energy,CPI Food and Beverages,CPI Excluding Food and Beverages,CPI Core Excluding Food and Beverages,...,Wholesale Price Index,Reserve Requirement Rate,Monetary Policy Rate,Circulating Currency Seasonally Adjusted (mill S/),Net International Reserves (mill $),Real Minimum Wage (Index),Wheat (US$ per ton),Corn (US$ per ton),Soybean oil (US$ per ton),Crude oil (US$ per barrel)
2019-08-01,0.155238,0.107169,0.182206,0.172176,0.120649,0.117414,0.187309,0.076524,0.207174,0.194464,...,-0.013811,6.446612,2.5,0.007826,0.022711,-0.000609,-0.071854,-0.127216,0.016693,-0.044064
2019-09-01,0.151966,0.103574,0.179086,0.168668,0.117677,0.111047,0.186615,0.065826,0.208745,0.190797,...,-0.029031,5.935527,2.5,0.008478,-0.005991,-6.4e-05,-0.028875,-0.044102,0.019052,0.037436
2019-10-01,0.148757,0.100074,0.176011,0.165254,0.114708,0.104757,0.185969,0.055385,0.210242,0.187204,...,-0.044027,6.617785,2.5,0.003873,0.001857,-0.001106,0.040636,0.066893,0.042443,-0.050909
2019-11-01,0.145714,0.096671,0.17314,0.161942,0.112035,0.098706,0.185426,0.045367,0.211728,0.183707,...,-0.05882,6.069958,2.25,0.006119,-0.017871,-0.001088,0.072377,-0.035344,0.0262,0.057743
2019-12-01,0.142886,0.093373,0.170545,0.158729,0.109809,0.09298,0.185001,0.035862,0.213221,0.180317,...,-0.073404,6.699655,2.25,0.029592,0.023139,-0.002141,0.032523,0.007789,0.057206,0.04904
