![](https://www.ibge.gov.br/templates/novo_portal_internas/imagens/logo_mobile.png)

## Brazilian Institute of Geography and Statistics (IBGE)

The Brazilian Institute of Geography and Statistics, or IBGE (Portuguese: Instituto Brasileiro de Geografia e Estatística), is the agency responsible for the official collection of statistical, geographic, cartographic, geodetic and environmental information in Brazil. IBGE carries out a decennial national census, whose questionnaires collect information such as age, household income, literacy, education, occupation and hygiene levels.

### Getting statistics from [IBGE](https://www.ibge.gov.br/en/np-statistics/full-list-statistics.html)

In [None]:
import numpy as np
import pandas as pd 
import os, glob

In [None]:
!wget -q -r ftp://ftp.ibge.gov.br/Precos_Indices_de_Precos_ao_Consumidor/IPCA_15/Resultados_por_Subitem/2017/
!wget -q -r ftp://ftp.ibge.gov.br/Precos_Indices_de_Precos_ao_Consumidor/IPCA_15/Resultados_por_Subitem/2018/
!mkdir cache
!for z in ftp.ibge.gov.br/Precos_Indices_de_Precos_ao_Consumidor/IPCA_15/Resultados_por_Subitem/2017/*.zip; do unzip -qq "$z" -d cache; done
!for z in ftp.ibge.gov.br/Precos_Indices_de_Precos_ao_Consumidor/IPCA_15/Resultados_por_Subitem/2018/*.zip; do unzip -qq "$z" -d cache; done
!rm -r ftp.ibge.gov.br
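The two shell `for ... unzip` loops above can also be written in plain Python with the standard `zipfile` module, which avoids depending on `unzip` being installed. This is a minimal sketch; `extract_all` is a hypothetical helper name, and the glob pattern in the usage line assumes the directory layout that `wget -r` creates.

```python
import glob
import os
import zipfile


def extract_all(pattern, dest="cache"):
    """Extract every zip archive matching the glob `pattern` into `dest`.

    Python equivalent of: for z in <pattern>; do unzip -qq "$z" -d cache; done
    Returns the sorted list of member names that were extracted.
    """
    os.makedirs(dest, exist_ok=True)          # like `mkdir cache`, but idempotent
    extracted = []
    for path in sorted(glob.glob(pattern)):   # sorted for a reproducible order
        with zipfile.ZipFile(path) as zf:
            zf.extractall(dest)
            extracted.extend(zf.namelist())
    return sorted(extracted)
```

Usage (assumed path, mirroring the wget output above): `extract_all('ftp.ibge.gov.br/Precos_Indices_de_Precos_ao_Consumidor/IPCA_15/Resultados_por_Subitem/201[78]/*.zip')`.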

In [None]:
# Sort so that filenames[0] is deterministic (glob order is arbitrary)
filenames = sorted(glob.glob('./cache/*.xls'))

### Extended National Consumer Price Index 15 - IPCA-15
The National System of Consumer Price Indexes (SNIPC) continuously and systematically produces consumer price indexes. Its data-collection units are commercial and service-providing establishments, public-service concessionaires, and the Internet.

In [None]:
ipca15 = pd.read_excel(filenames[0], index_col=[0], header=[4], skiprows=[5])

In [None]:
# Row labels (index groups) to select; kept exactly as written, including the leading space
cols_cat = [' ÍNDICE GERAL',              # 'GENERAL INDEX'
            ' ALIMENTAÇÃO E BEBIDAS',     # 'FOOD AND BEVERAGES'
            ' HABITAÇÃO',                 # 'HOUSING'
            ' IOGURTE E BEBIDAS LÁCTEAS', # 'YOGURT AND DAIRY BEVERAGES'
            ' ARTIGOS DE RESIDÊNCIA',     # 'HOUSEHOLD ARTICLES'
            ' VESTUÁRIO',                 # 'CLOTHING'
            ' TRANSPORTES',               # 'TRANSPORTATION'
            ' SAÚDE E CUIDADOS PESSOAIS', # 'HEALTH AND PERSONAL CARE'
            ' DESPESAS PESSOAIS',         # 'PERSONAL EXPENSES'
            ' EDUCAÇÃO',                  # 'EDUCATION'
            ' COMUNICAÇÃO',               # 'COMMUNICATION'
            ]

In [None]:
mth = filenames[0].split('_')[1][:6]   # YYYYMM part of the file name
print(f"Month: {mth[:4]}-{mth[4:6]}")
# Columns: monthly variation by group (%) for the national aggregate ('NACIONAL') and for
# Rio de Janeiro, Porto Alegre, Belo Horizonte, Recife,
# São Paulo, Brasília, Belém, Fortaleza, Salvador, Curitiba, Goiânia
ipca15.drop_duplicates().loc[cols_cat, :]
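`drop_duplicates()` in the cell above removes rows whose *values* are identical; it does not guarantee one row per label, so a label that appears twice with different values would still come back twice from `.loc`. A label-based alternative is `Index.duplicated`. The sketch below is a hypothetical helper built on two assumptions, both flagged in its docstring: whitespace around labels is not significant, and the first occurrence of a repeated label is the one to keep.

```python
import pandas as pd


def select_groups(table, labels):
    """Return the rows of `table` whose index label matches one of `labels`.

    Assumptions (not from the original notebook): surrounding whitespace in
    labels is not significant, and when a label occurs more than once the
    first occurrence is the one to keep.
    """
    out = table.copy()
    out.index = out.index.astype(str).str.strip()     # ' HABITAÇÃO' -> 'HABITAÇÃO'
    out = out[~out.index.duplicated(keep="first")]     # one row per label
    wanted = [str(label).strip() for label in labels]
    return out.loc[wanted]
```

Usage: `select_groups(ipca15, cols_cat)`.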

In [None]:
# Collect the 'NACIONAL' (national) column of every file into one DataFrame, one row per month.
# pd.period_range replaces the DatetimeIndex(start=..., end=...) constructor, which newer pandas removed.
IPCA15 = pd.DataFrame(index=pd.period_range('2017-01', '2018-12', freq='M'),
                      columns=cols_cat)

for file in filenames:
    mth = file.split('_')[1][:6]   # YYYYMM part of the file name
    idx = pd.to_datetime(mth[:4]+'-'+mth[-2:]).to_period('M')
    tmp = pd.read_excel(file, index_col=[0], header=[4], skiprows=[5])
    IPCA15.loc[idx, :] = tmp.drop_duplicates().loc[cols_cat, 'NACIONAL'].values
IPCA15[cols_cat] = IPCA15[cols_cat].astype(np.float32)
IPCA15.index.name = 'eval month'
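The loop above reads the month with `file.split('_')[1][:6]`, which breaks if an extra underscore appears anywhere in the path. A slightly more defensive sketch searches the base name for a `YYYYMM` run instead. `month_from_filename` is a hypothetical helper, and the assumption that every IBGE file name contains such a run is the same one the original split relies on; check it against the files actually in `./cache`.

```python
import os
import re
from typing import Optional

import pandas as pd

# Six digits shaped like YYYYMM (years 2000-2099, months 01-12).
# Assumption: IBGE file names embed the reference month this way.
_YYYYMM = re.compile(r"(20\d{2})(0[1-9]|1[0-2])")


def month_from_filename(path: str) -> Optional[pd.Period]:
    """Return the reference month encoded in the file name, or None if absent."""
    match = _YYYYMM.search(os.path.basename(path))
    if match is None:
        return None
    year, month = match.groups()
    return pd.Period(year=int(year), month=int(month), freq="M")
```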

In [None]:
IPCA15.head()
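Each cell of `IPCA15` is a monthly variation in percent, so variations over several months compound rather than add: the accumulated variation is $\left(\prod_i (1 + v_i/100) - 1\right) \times 100$. A minimal sketch of that formula (`accumulated_variation` is a hypothetical name):

```python
from typing import Iterable


def accumulated_variation(monthly_pct: Iterable[float]) -> float:
    """Compound monthly % variations into one accumulated % variation.

    Computes (1 + v1/100) * (1 + v2/100) * ... - 1, expressed in percent.
    """
    factor = 1.0
    for v in monthly_pct:
        factor *= 1.0 + v / 100.0
    return (factor - 1.0) * 100.0
```

For example, two consecutive months of 1% give about 2.01%, not 2%. On the table above this could be applied as `accumulated_variation(IPCA15[' ÍNDICE GERAL'].dropna())`.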

In [None]:
# Read the competition data
# (infer_datetime_format is dropped: it is deprecated and parse_dates already infers the format)
df = pd.read_csv('../input/train.csv', parse_dates=['first_active_month'])
trns = pd.read_csv('../input/historical_transactions.csv', parse_dates=['purchase_date'])

In [None]:
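cols = ['card_id', 'month_lag', 'purchase_date']
# First transaction row per card (file order); merge on the card_id column only,
# since combining `on=` with `left_index=True` raises a MergeError in pandas
df = pd.merge(df, trns[cols].groupby('card_id').first().reset_index(), on='card_id')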

In [None]:
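# Reference month = purchase month minus month_lag calendar months.
# Period arithmetic is exact; the old astype('timedelta64[M]') used an average month length
# and is no longer supported by recent pandas.
df['eval month'] = df['purchase_date'].dt.to_period('M') - df['month_lag']
df.drop(['month_lag', 'purchase_date'], axis=1, inplace=True)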
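`month_lag` counts whole calendar months, so the shift is plain integer arithmetic on (year, month). A fixed-length "month" timedelta (numpy's is about 30.44 days) can land in the neighbouring month near month boundaries. The sketch below shows the exact arithmetic; `reference_month` is a hypothetical helper, and the sign convention (reference month = purchase month minus `month_lag`) is the one the cell above implies.

```python
from typing import Tuple


def reference_month(year: int, month: int, month_lag: int) -> Tuple[int, int]:
    """Shift (year, month) by -month_lag whole calendar months.

    month_lag = 0 means the purchase month is the reference month itself;
    month_lag = -1 means the reference month is one month after the purchase.
    """
    index = year * 12 + (month - 1) - month_lag   # months counted from year 0
    return index // 12, index % 12 + 1
```

For example, a December 2017 purchase with `month_lag = -1` maps to January 2018.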

In [None]:
df.head()

In [None]:
df_stats = pd.merge(df, IPCA15.reset_index(), on='eval month')

In [None]:
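# Correlation of each index group with the target: most likely coincidental, but possibly useful.
# Select by name rather than by position; numeric_only skips card_id and the date/period columns.
df_stats.corr(numeric_only=True).loc[cols_cat, ['target']]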
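Every card with the same `eval month` receives identical index values, so these features take at most one distinct value per month. That is one reason a sizeable correlation here may well be coincidental. A small hypothetical helper that correlates named columns with the target via `DataFrame.corrwith`, independent of column positions:

```python
import pandas as pd


def corr_with_target(frame: pd.DataFrame, feature_cols, target: str = "target") -> pd.Series:
    """Pearson correlation of each column in `feature_cols` with `target`, largest first."""
    return frame[list(feature_cols)].corrwith(frame[target]).sort_values(ascending=False)
```

Usage: `corr_with_target(df_stats, cols_cat)`.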

In [None]:
IPCA15.to_csv('IPCA15.csv')

In [None]:
!rm -r cache
!wget -q -r -l 1 ftp://ftp.ibge.gov.br/Precos_Indices_de_Precos_ao_Consumidor/INPC/Resultados_por_Subitem/ -A zip
!wget -q -r ftp://ftp.ibge.gov.br/Precos_Indices_de_Precos_ao_Consumidor/INPC/Resultados_por_Subitem/2017    
!mkdir cache
!for z in ftp.ibge.gov.br/Precos_Indices_de_Precos_ao_Consumidor/INPC/Resultados_por_Subitem/2017/*.zip; do unzip -qq "$z" -d cache; done
!for z in ftp.ibge.gov.br/Precos_Indices_de_Precos_ao_Consumidor/INPC/Resultados_por_Subitem/*.zip; do unzip -qq "$z" -d cache; done
!rm -r ftp.ibge.gov.br

### National Consumer Price Index - INPC
The National System of Consumer Price Indexes (SNIPC) continuously and systematically produces the National Consumer Price Index (INPC), which is used to correct the purchasing power of salaries by measuring price changes in the consumption basket of the lowest-income salaried population. That income range was defined so as to cover 50% of the urban families in the areas covered by the SNIPC whose reference person is salaried.
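To make "correcting purchasing power" concrete: an amount observed at the end of a period is expressed in start-of-period prices by dividing it by the compounded monthly variations of the index. A minimal sketch with a hypothetical `deflate` helper; the inputs are illustrative numbers, not INPC data.

```python
from typing import Iterable


def deflate(nominal: float, monthly_pct: Iterable[float]) -> float:
    """Express `nominal` in the prices of the start of the period.

    Divides by the compounded price factor (1 + v1/100) * (1 + v2/100) * ...
    """
    factor = 1.0
    for v in monthly_pct:
        factor *= 1.0 + v / 100.0
    return nominal / factor
```

For example, a salary of 102.01 after two months of 1% inflation buys what 100 bought at the start.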

In [None]:
filenames = sorted(glob.glob('./cache/*.xls'))
# Collect the 'NACIONAL' (national) column of every file into one DataFrame, one row per month
INPC = pd.DataFrame(index=pd.period_range('2017-01', '2018-12', freq='M'),
                    columns=cols_cat)

for file in filenames:
    mth = file.split('_')[1][:6]   # YYYYMM part of the file name
    idx = pd.to_datetime(mth[:4]+'-'+mth[-2:]).to_period('M')
    tmp = pd.read_excel(file, index_col=[0], header=[4], skiprows=[5])
    INPC.loc[idx, :] = tmp.drop_duplicates().loc[cols_cat, 'NACIONAL'].values
INPC[cols_cat] = INPC[cols_cat].astype(np.float32)
INPC.index.name = 'eval month'
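The IPCA-15 and INPC cells repeat the same collection loop. One way to factor it out is a function that takes the month parser and the spreadsheet reader as arguments, which also makes it testable without the Excel files. This is a sketch, not the notebook's method: `collect_national` is hypothetical, it keeps the first row per label instead of calling `drop_duplicates()`, and it only includes months that have a file. `.reindex(pd.period_range('2017-01', '2018-12', freq='M'))` restores the fixed grid if needed.

```python
from typing import Callable, Iterable, Optional

import pandas as pd


def collect_national(files: Iterable[str],
                     labels: list,
                     month_of: Callable[[str], Optional[pd.Period]],
                     read: Callable[[str], pd.DataFrame]) -> pd.DataFrame:
    """Build a month-by-label table from the 'NACIONAL' column of each file.

    `month_of` maps a file name to a monthly pd.Period (None skips the file);
    `read` loads one spreadsheet into a DataFrame indexed by group label.
    """
    rows = {}
    for path in files:
        month = month_of(path)
        if month is None:
            continue
        table = read(path)
        table = table[~table.index.duplicated(keep="first")]   # one row per label
        rows[month] = table.loc[labels, "NACIONAL"].astype("float32")
    months = sorted(rows)
    return pd.DataFrame([rows[m].to_numpy() for m in months],
                        index=pd.PeriodIndex(months, freq="M", name="eval month"),
                        columns=labels)
```

In the notebook this could be called as `collect_national(filenames, cols_cat, month_of=..., read=lambda f: pd.read_excel(f, index_col=[0], header=[4], skiprows=[5]))`, where `month_of` reproduces the `split('_')` parsing above.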

In [None]:
df_stats = pd.merge(df, INPC.reset_index(), on='eval month')

In [None]:
# Correlation of each INPC group with the target, selected by name
df_stats.corr(numeric_only=True).loc[cols_cat, ['target']]

In [None]:
INPC.to_csv('INPC.csv')

In [None]:
!rm -r cache

## If you find a worthwhile way to use these statistics, please let me know!