## Data Processing of Banking Dataset (Banco Central)

### 1. Requirements
Install python libraries dependency for the project

In [None]:
%pip install unidecode
%pip install pandas
%pip install numpy
%pip install scipy

### 2. Import Functions
Import etl function of src folder

In [2]:
import sys
sys.path.append('../src')

from extraction import *
from load import *
from transformation import *
from data_model import *

ModuleNotFoundError: No module named 'sqlalchemy'

### 3. Extracting Data
This section aims to extract Bank Information (CSV) and Bank Fees (API) dataset from Banco Central channel and provide it at data raw layer.


3.1 Original Banks Information dataset are splited in csv files by year and period. This section aims to merge the splited files into a colosolidated csv of 2020 to 2021 period.

In [None]:
bancos_df = merge_bank_info_csv()

3.2 This sections will perform url requests to an API using bank key (CNPJ) as variable parameter. 
The result of each request is merged into consolidated json which will further trasformed into pandas dataset.

In [7]:
lista_tarifas_df = get_bank_fees_api(bancos_df['CNPJ IF'])

### 4. Transforming Data
In this section it will be performed a data transformation over 'Bank Info'  and 'Bank Fees' dataset to normalize it and for a better data quality by applying outliers exclusion with Z-score. The trasformed data will be available at data trusted layer to further SQL ingestion at Postgress.

4.1 Normalizing Bank Information dataset (Informações Banco) by applynig colluns normalization, removing accents and coverting texts to lower case.

In [5]:
bancos_df_truted = normalize_bank_info()

4.2 Normalizing Bank Fees dataset (Tarifas Bancos) by aaplying colluns normalization, removing accents, coverting texts to lower case and removing outrliers.

In [None]:
lista_tarifas_df_trusted = normalize_bank_fees()

### 5. Uploading Data

In [None]:
db_manager = DatabaseManager()
db_manager.create_schema("trusted")

In [None]:
db_manager.create_table_with_pandas_df(bancos_df, 'bancos', 'trusted')
db_manager.create_table_with_pandas_df(lista_tarifas_df, 'lista_tarifas', 'trusted')

### 6. Generating Multi Dimensional Data Model (Star Schema)

In [None]:
create_star_schema(db_manager)