
# Setup

In [1]:
#@title Load the base libraries
import pandas as pd
import zipfile
import os

Quotes for all assets traded on [B3](http://www.b3.com.br) are available at this [link](http://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/historico/mercado-a-vista/series-historicas/) and must be parsed according to these [instructions](http://www.b3.com.br/en_us/market-data-and-indices/data-services/market-data/historical-data/equities/historical-quote-data/).
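The files are fixed-width text: each 245-character line is a record whose first two characters give its type (`00` header, `01` quote, `99` trailer), and every field sits at a fixed position. As a minimal illustration, the sketch below slices one quote record with the standard library. The field positions are taken from B3's layout document linked above and should be checked against it; `LAYOUT`, `parse_quote` and `SAMPLE` are hypothetical names, and only the fields used later in this notebook are extracted. `SAMPLE` rebuilds the `AALR3` record shown in the data sample below, with explicit widths so each field lands at the right offset.

```python
from datetime import datetime

# Selected fields of a quote record (TIPREG "01") as (start, end) positions,
# 1-based and inclusive, following B3's COTAHIST layout document.
LAYOUT = {
    "TIPREG": (1, 2),      # record type
    "DATPRG": (3, 10),     # trading date, YYYYMMDD
    "CODNEG": (13, 24),    # ticker, right-padded with spaces
    "TPMERC": (25, 27),    # market type code
    "PREULT": (109, 121),  # last price, 2 implied decimals
    "PREEXE": (189, 201),  # strike price, 2 implied decimals
    "DATVEN": (203, 210),  # expiry date, 99991231 when not applicable
}


def parse_quote(line: str) -> dict:
    """Extract and convert the LAYOUT fields from one COTAHIST line."""
    raw = {name: line[start - 1:end] for name, (start, end) in LAYOUT.items()}
    if raw["TIPREG"] != "01":
        raise ValueError("not a quote record (header '00' or trailer '99')")
    return {
        "DATPRG": datetime.strptime(raw["DATPRG"], "%Y%m%d").date(),
        "CODNEG": raw["CODNEG"].rstrip(),
        "TPMERC": int(raw["TPMERC"]),
        "PREULT": int(raw["PREULT"]) / 100,
        "PREEXE": int(raw["PREEXE"]) / 100,
        "DATVEN": datetime.strptime(raw["DATVEN"], "%Y%m%d").date(),
    }


# The AALR3 record from COTAHIST_A2020.TXT, rebuilt field by field.
PRICES = [1829, 1900, 1828, 1868, 1900, 1899, 1901]  # PREABE..PREOFV, in cents
SAMPLE = "".join([
    "01", "20200102", "02", "AALR3".ljust(12), "010",       # TIPREG..TPMERC
    "ALLIAR".ljust(12), "ON".ljust(8) + "NM", " " * 3, "R$".ljust(4),
    *(f"{p:013d}" for p in PRICES),
    f"{2443:05d}", f"{585800:018d}", f"{1094619600:018d}",  # TOTNEG, QUATOT, VOLTOT
    f"{0:013d}", "0", "99991231", f"{1:07d}",               # PREEXE, INDOPC, DATVEN, FATCOT
    f"{0:013d}", "BRAALRACNOR6", "101",                     # PTOEXE, CODISI, DISMES
])
assert len(SAMPLE) == 245  # every record has the same fixed width

quote = parse_quote(SAMPLE)  # last price 19.0, strike 0.0, expiry 9999-12-31
```

Slicing by position like this is what any COTAHIST parser has to do; the `bovespa` package used below takes care of it for the full layout.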

This task uses the [bovespa](https://github.com/fernandofsilva/bovespa) package, my updated version of a package developed five years ago ([link](https://github.com/thypad/bovespa)).

In [2]:
#@title Install the bovespa package
!pip install git+https://github.com/fernandofsilva/bovespa

Collecting git+https://github.com/fernandofsilva/bovespa
  Cloning https://github.com/fernandofsilva/bovespa to /tmp/pip-req-build-acjegd5x
  Running command git clone -q https://github.com/fernandofsilva/bovespa /tmp/pip-req-build-acjegd5x
Building wheels for collected packages: bovespa
  Building wheel for bovespa (setup.py) ... done
  Created wheel for bovespa: filename=bovespa-0.1.0-cp36-none-any.whl size=9211 sha256=8cc1c55455269e17f246398b3af55e4008a618c7da35142938e109dfee5c2a19
  Stored in directory: /tmp/pip-ephem-wheel-cache-1rn9stne/wheels/cb/3c/9b/27644c70e14f3c2a2dd611dcfcf8f4f6856e4577f3f881eff4
Successfully built bovespa
Installing collected packages: bovespa
Successfully installed bovespa-0.1.0


The data are stored in my Google Drive, but they are also available from the B3 (Bovespa) website and in the project's [GitHub](https://github.com/fernandofsilva/LSTM_Option_Pricing/tree/main/data) repository.

In [3]:
#@title Unzip the files
# Assumes Google Drive is already mounted at /content/drive
# (e.g. with google.colab.drive.mount)

# Path to the files
path = '/content/drive/My Drive/Mestrado/data/'

# Keep only the zipped quotation files (e.g. COTAHIST_A2020.zip)
files = [
    file for file in os.listdir(path)
    if 'COTAHIST' in file and file.endswith('.zip')
]

# Loop over the files and extract them into the current working directory
for file in files:
    with zipfile.ZipFile(os.path.join(path, file), 'r') as zip_ref:
        zip_ref.extractall('.')

In [4]:
#@title Data sample
!head -10 COTAHIST_A2020.TXT

00COTAHIST.2020BOVESPA 20201113                                                                                                                                                                                                                      
012020010202AALR3       010ALLIAR      ON      NM   R$  000000000182900000000019000000000001828000000000186800000000019000000000001899000000000190102443000000000000585800000000001094619600000000000000009999123100000010000000000000BRAALRACNOR6101
012020010202AAPL34      010APPLE       DRN          R$  000000001200000000000121340000000012000000000001207300000000121340000000010550000000001213400009000000000000012700000000000153338000000000000000009999123100000010000000000000BRAAPLBDR004131
012020010202ABCB4       010ABC BRASIL  PN  EJ  N2   R$  000000000200000000000020300000000001982000000000200500000000020300000000002008000000000203003979000000000000870400000000001745787800000000000000009999123100000010000000000000BRABCBACNPR4133
012020010202

# Parsing the data

Parse the data and save it in CSV format.
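For readers who prefer not to depend on `bovespa`, a whole file can also be read with `pandas.read_fwf`, given the field positions from B3's layout document linked above. This is a minimal sketch, not how `bovespa` works internally: `FIELDS` and `read_cotahist` are hypothetical names, only the columns used later in this notebook are kept, and the Latin-1 encoding is an assumption about the files.

```python
import pandas as pd

# Subset of the COTAHIST layout: (name, start, end) with 1-based,
# inclusive positions as listed in B3's layout document.
FIELDS = [
    ("TIPREG", 1, 2),      # record type: 00 header, 01 quote, 99 trailer
    ("DATPRG", 3, 10),     # trading date, YYYYMMDD
    ("CODNEG", 13, 24),    # ticker
    ("TPMERC", 25, 27),    # market type code
    ("PREULT", 109, 121),  # last price, 2 implied decimals
    ("PREEXE", 189, 201),  # strike price, 2 implied decimals
    ("DATVEN", 203, 210),  # expiry date, YYYYMMDD (99991231 if none)
]


def read_cotahist(path: str) -> pd.DataFrame:
    """Read selected fields of a COTAHIST_A<year>.TXT file (sketch)."""
    raw = pd.read_fwf(
        path,
        colspecs=[(start - 1, end) for _, start, end in FIELDS],  # 0-based, half-open
        names=[name for name, _, _ in FIELDS],
        header=None,
        dtype=str,           # keep leading zeros until converting explicitly
        encoding="latin-1",  # assumption: the files are not UTF-8
    )
    # Keep only quote records; this drops the header (00) and trailer (99) lines.
    quotes = raw[raw["TIPREG"] == "01"].copy()
    quotes["DATPRG"] = pd.to_datetime(quotes["DATPRG"], format="%Y%m%d")
    quotes["TPMERC"] = quotes["TPMERC"].astype(int)
    for col in ("PREULT", "PREEXE"):
        quotes[col] = quotes[col].astype(int) / 100
    # DATVEN stays as text: its 99991231 sentinel is outside the range of
    # nanosecond pandas timestamps, so it is handled later, as in this notebook.
    return quotes.drop(columns="TIPREG").reset_index(drop=True)
```

Usage would mirror the parser loop, e.g. `read_cotahist('COTAHIST_A2020.TXT').to_csv('COTAHIST_A2020.csv')`, although the column set is smaller than what `bovespa` produces.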

In [5]:
#@title Parser
import bovespa

# Map each .zip name to the .TXT file extracted from it
files = list(map(lambda file: file.replace('.zip', '.TXT'), files))

# Loop over the files
for file in files:

    # Parse files
    bf = bovespa.stock_history([file])

    # Save Csv
    bf.to_csv(f'{file[:-4]}.csv')

OrderedDict([('TIPREG', '00'), ('NOMARQ', 'COTAHIST.2019'), ('CODORI', 'BOVESPA'), ('DATGER', datetime.date(2019, 12, 30)), ('RESERV', '')])
OrderedDict([('TIPREG', '00'), ('NOMARQ', 'COTAHIST.2020'), ('CODORI', 'BOVESPA'), ('DATGER', datetime.date(2020, 11, 13)), ('RESERV', '')])
OrderedDict([('TIPREG', '00'), ('NOMARQ', 'COTAHIST.2018'), ('CODORI', 'BOVESPA'), ('DATGER', datetime.date(2018, 12, 28)), ('RESERV', '')])
OrderedDict([('TIPREG', '00'), ('NOMARQ', 'COTAHIST.2017'), ('CODORI', 'BOVESPA'), ('DATGER', datetime.date(2017, 12, 29)), ('RESERV', '')])
OrderedDict([('TIPREG', '00'), ('NOMARQ', 'COTAHIST.2016'), ('CODORI', 'BOVESPA'), ('DATGER', datetime.date(2016, 12, 29)), ('RESERV', '')])
OrderedDict([('TIPREG', '00'), ('NOMARQ', 'COTAHIST.2015'), ('CODORI', 'BOVESPA'), ('DATGER', datetime.date(2015, 12, 30)), ('RESERV', '')])


# Select columns and save as CSV

In [6]:
#@title Read the files
# Map each .TXT name to the .csv file written by the parser
files = list(map(lambda file: file.replace('.TXT', '.csv'), files))

# Cols to filter
cols = ['DATPRG', 'CODNEG', 'PREULT', 'PREEXE', 'DATVEN', 'TPMERC']

# Instantiate list
dataframe_list = []

# Loop over the files
for file in files:
    dataframe_list.append(pd.read_csv(file, usecols=cols))

# Concatenate the yearly dataframes into one table with a fresh index
data = pd.concat(dataframe_list, axis=0, ignore_index=True)

In [7]:
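#@title Format the data
# Instruments without an expiry (e.g. stocks) carry the sentinel DATVEN
# 9999-12-31; replace it with the trading date before converting to datetime
no_expiry = data['DATVEN'] == '9999-12-31'
data.loc[no_expiry, 'DATVEN'] = data.loc[no_expiry, 'DATPRG']
data[['DATPRG', 'DATVEN']] = data[['DATPRG', 'DATVEN']].apply(pd.to_datetime)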

# Rename columns
dict_ref = {
    'DATPRG': 'date',
    'CODNEG': 'symbol',
    'PREULT': 'price',
    'PREEXE': 'strike',
    'DATVEN': 'expire',
    'TPMERC': 'market'
}

# Rename the columns
data = data.rename(columns=dict_ref)

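# Market type codes (TPMERC) and their B3 names, kept in Portuguese
market = {
    10: 'VISTA',                             # cash (spot) market
    12: 'EXERCÍCIO DE OPÇÕES DE COMPRA',     # exercise of call options
    13: 'EXERCÍCIO DE OPÇÕES DE VENDA',      # exercise of put options
    17: 'LEILÃO',                            # auction
    20: 'FRACIONÁRIO',                       # odd lot
    30: 'TERMO',                             # forward
    50: 'FUTURO COM RETENÇÃO DE GANHO',      # future with gain retention
    60: 'FUTURO COM MOVIMENTAÇÃO CONTÍNUA',  # future with continuous movement
    70: 'OPÇÕES DE COMPRA',                  # call options
    80: 'OPÇÕES DE VENDA',                   # put options
}

# Replace the market codes with their names
data['market'] = data['market'].replace(market)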

data.head()

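|   | date | symbol | market | price | strike | expire |
|---|------|--------|--------|-------|--------|--------|
| 0 | 2019-01-02 | AALR3 | VISTA | 13.25 | 0.0 | 2019-01-02 |
| 1 | 2019-01-02 | AAPL34 | VISTA | 60.41 | 0.0 | 2019-01-02 |
| 2 | 2019-01-02 | ABBV34 | VISTA | 342.84 | 0.0 | 2019-01-02 |
| 3 | 2019-01-02 | ABCB4 | VISTA | 17.12 | 0.0 | 2019-01-02 |
| 4 | 2019-01-02 | ABEV3 | VISTA | 16.15 | 0.0 | 2019-01-02 |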
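The formatting steps above can be checked on a couple of rows. The sketch below applies the same sequence (sentinel replacement, date conversion, market-code mapping, renaming) to two made-up rows; `format_quotes`, the tickers and the prices are hypothetical, and only three market codes are included. Replacing the sentinel before calling `pd.to_datetime` matters because pandas' nanosecond timestamps end in 2262 (`pd.Timestamp.max`), so `9999-12-31` cannot be represented at that resolution.

```python
import pandas as pd

SENTINEL = "9999-12-31"  # DATVEN value for instruments without an expiry
# Subset of the TPMERC code table used in the formatting cell.
MARKET = {10: "VISTA", 70: "OPÇÕES DE COMPRA", 80: "OPÇÕES DE VENDA"}
COLUMNS = {"DATPRG": "date", "CODNEG": "symbol", "PREULT": "price",
           "PREEXE": "strike", "DATVEN": "expire", "TPMERC": "market"}


def format_quotes(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the notebook's formatting cell."""
    out = df.copy()
    # Swap the sentinel for the trading date *before* parsing dates.
    no_expiry = out["DATVEN"] == SENTINEL
    out["DATVEN"] = out["DATVEN"].where(~no_expiry, out["DATPRG"])
    out["DATPRG"] = pd.to_datetime(out["DATPRG"])
    out["DATVEN"] = pd.to_datetime(out["DATVEN"])
    # Map market codes to names; codes missing from MARKET are left as-is.
    out["TPMERC"] = out["TPMERC"].replace(MARKET)
    return out.rename(columns=COLUMNS)


# Two made-up rows in the CSV schema written by the parser:
# a stock (code 10, no expiry) and a call option (code 70).
raw = pd.DataFrame({
    "DATPRG": ["2020-01-02", "2020-01-02"],
    "CODNEG": ["STOCK3", "STOCKA20"],
    "TPMERC": [10, 70],
    "PREULT": [19.00, 0.45],
    "PREEXE": [0.00, 20.00],
    "DATVEN": ["9999-12-31", "2020-01-20"],
})
formatted = format_quotes(raw)  # the stock's expire equals its trading date
```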


The final file is saved to my Google Drive; a copy is also available at this [link](https://github.com/fernandofsilva/LSTM_Option_Pricing/tree/main/data).

In [8]:
#@title Save CSV
data.to_csv(path + 'quotation.csv.gz', index=False, compression='gzip')
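Later steps need to read this file back. A minimal sketch, assuming the column names produced above; `load_quotes` is a hypothetical helper. CSV stores dates as text, so the two date columns are parsed explicitly, and the gzip compression is inferred from the `.gz` suffix.

```python
import pandas as pd


def load_quotes(path: str) -> pd.DataFrame:
    """Read back the gzip-compressed quotation file (sketch)."""
    # compression='infer' (the default) picks gzip from the ".gz" suffix;
    # parse_dates turns the ISO date strings back into datetime columns.
    return pd.read_csv(path, parse_dates=["date", "expire"])
```

For example, `quotes = load_quotes(path + 'quotation.csv.gz')` followed by `quotes[quotes['market'] == 'OPÇÕES DE COMPRA']` would select the call options.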