# Process Description: Obtaining Fundamentus Share Data

## 1. Web Scraping with Selenium
- Utilize Selenium, a web automation tool, to navigate to the Fundamentus website.
- Identify and locate the necessary HTML elements containing the shares data.
- Use Selenium to interact with the webpage and extract the required information, such as stock prices, financial metrics, and other relevant data.

## 2. Downloading Files
- Identify and locate any downloadable files on the Fundamentus website, such as the general balance report of each company.
- Use Selenium to simulate the process of clicking download buttons or links.
- Implement logic to wait for file downloads to complete before proceeding.
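Waiting for downloads can be done by polling the download folder instead of sleeping for a fixed time. Below is a minimal sketch. It assumes that Chrome writes in-progress downloads as `*.crdownload` files, which is worth checking on your Chrome version. `wait_for_download` is a hypothetical helper and is not part of the original code:

```python
import time
from pathlib import Path


def wait_for_download(folder, expected_name, timeout=60.0, poll=0.5):
    """Block until `expected_name` exists in `folder` and no partial download remains.

    Assumption: Chrome marks unfinished downloads with a .crdownload extension.
    Returns the path of the finished file, or raises TimeoutError.
    """
    folder = Path(folder)
    target = folder / expected_name
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Finished = target file present and no temporary .crdownload file left
        partial = list(folder.glob("*.crdownload"))
        if target.exists() and not partial:
            return target
        time.sleep(poll)
    raise TimeoutError(f"{expected_name} did not finish downloading within {timeout} s")
```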

## 3. Extracting from ZIP
- If the downloaded files are in ZIP format, use Python's zipfile module to programmatically extract the contents.
- Identify the target files within the ZIP archive that contain the shares data.
- Extract the relevant files to a specified directory for further processing.
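A single known member can be extracted under a new name in one step. The sketch below assumes that each archive holds the sheet as `balanco.xls`, which is the member name used later in `extract_files`. It streams that member through `ZipFile.open` instead of extracting it and then renaming it. `extract_member_as` is a hypothetical helper:

```python
import shutil
import zipfile
from pathlib import Path


def extract_member_as(zip_path, member, dest_dir, new_name):
    """Copy one archive member straight to dest_dir/new_name (overwriting it if present).

    Writing through ZipFile.open() skips the extract-then-rename step, so no shared
    intermediate file (such as balanco.xls) is left in dest_dir.
    """
    dest = Path(dest_dir) / new_name
    dest.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        if member not in zf.namelist():
            raise KeyError(f"{member!r} not found in {zip_path}")
        # Stream the member's bytes into the destination file
        with zf.open(member) as src, open(dest, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return dest
```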

## 4. Data Processing
- Once the shares data has been downloaded and extracted, carry out any necessary data cleaning, transformation, or analysis.
- Utilize Python libraries such as Pandas for efficient data manipulation.

## 5. Data Storage or Analysis
- Depending on the project requirements, store the processed shares data in a database, XLS file, or other data storage solutions.
- Conduct further analysis, visualization, or reporting as needed.
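For the storage step, one option is to write each company's DataFrame to SQLite with `DataFrame.to_sql`, which accepts a plain `sqlite3` connection. This is only a sketch. The function name `store_frames`, the `account` index label and the in-memory database are illustrative assumptions:

```python
import sqlite3

import pandas as pd


def store_frames(frames, conn):
    """Write each company's DataFrame to its own SQLite table, replacing older data.

    `frames` maps a ticker (e.g. 'ABEV3') to a DataFrame. The index is stored as a
    column named 'account' so the rows stay identifiable.
    """
    for ticker, df in frames.items():
        df.to_sql(ticker, conn, if_exists="replace", index=True, index_label="account")


# Usage with an in-memory database and a tiny made-up frame
conn = sqlite3.connect(":memory:")
demo = {"ABEV3": pd.DataFrame({"30/09/2023": [1.0, 2.0]}, index=["Ativo Total", "Ativo Circulante"])}
store_frames(demo, conn)
rows = conn.execute('SELECT account, "30/09/2023" FROM ABEV3').fetchall()
```

Using `if_exists="replace"` means a rerun overwrites each table. `"append"` would instead accumulate duplicate rows.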

### So, let's code it!

## Import the needed libraries

In [1]:
# Selenium for web automation
from selenium import webdriver  # pip install selenium or pip install --upgrade selenium
# Webdriver Manager for managing browser drivers
from webdriver_manager.chrome import ChromeDriverManager  # pip install webdriver-manager
# Selenium service for Chrome browser
from selenium.webdriver.chrome.service import Service
# Selenium common locator strategies (By)
from selenium.webdriver.common.by import By
# Time module for handling time-related operations
import time
# NumPy for numerical operations
import numpy as np
# PyAutoGUI for automating mouse and keyboard interactions
import pyautogui
# Selenium Keys for keyboard interactions
from selenium.webdriver.common.keys import Keys
# OS module for interacting with the operating system
import os
# shutil for high-level file operations
import shutil
# sys for accessing Python interpreter variables
import sys
# zipfile for working with ZIP archives
import zipfile
# Custom module that stores the paths and the list of companies
import config
# pandas for data manipulation
import pandas as pd
# yfinance to download Yahoo Finance quotes
import yfinance as yf 
# math module for numeric helpers such as NaN checks
import math

## Steps 1, 2 and 3 (Web Scraping with Selenium, Downloading Files and Extracting from ZIP)

### Automate finding and downloading the files, moving them to the desired folder and extracting the XLS files

In [2]:
# Class definition for FundamentusBot
class FundamentusBot:
    # Constructor to configure the browser
    def __init__(self):
        # Configure Chrome options with download preferences (always ask where to save the file)
        chrome_options = webdriver.ChromeOptions()
        chrome_options.add_experimental_option("prefs", {
            "download.prompt_for_download": True,
            "safebrowsing.enabled": True
        })
        #create the service
        service = Service(ChromeDriverManager().install())
        #create and open the browser
        self.browser = webdriver.Chrome(service=service, options=chrome_options)

    # Method to open the Fundamentus website and navigate to the desired data section
    def open_site(self):
        #navigate to initial page of Fundamentus
        self.browser.get("https://fundamentus.com.br/")
        time.sleep(2) #wait 2 seconds 
        #click to open the Fundamentus Mobile
        self.browser.find_element(By.XPATH, '/html/body/div[1]/div[1]/div[2]/ul[1]/li[5]/a').click()
        time.sleep(3) #wait
        #call the method open_data
        self.open_data()

    # Method to navigate to the data section and process data for a list of companies
    def open_data(self):
        #list of companies of interest
        companies = config.companies
        #iterate the companies list
        for company in companies:
            #get company
            self.company = company
            # call the method that downloads the company's information
            self.dwnld_company_info()
        time.sleep(5) #wait
        # call the method that move the .zip files from 'source_path_download' to 'destination_path_zip'
        self.move_files()
        # call the method that extract the XLS files from the .zip and save in 'destination_path_xls'
        self.extract_files()
        # close the browser
        self.browser.quit()

    # Method to download company information from the Fundamentus website
    def dwnld_company_info(self):
        # try to run the sequence of actions
        try:
            # find the search bar and type the company ticker
            self.browser.find_element(By.XPATH, '/html/body/div[1]/div[1]/div[2]/form/input[1]').send_keys(self.company)
            time.sleep(2) #wait
            # press ENTER to open the company's data
            self.browser.find_element(By.XPATH, '/html/body/div[1]/div[1]/div[2]/form/input[1]').send_keys(Keys.ENTER)
            time.sleep(2) #wait
            # click on 'Informações' (Information)
            self.browser.find_element(By.XPATH, '/html/body/div[2]/div[2]/div/div/div[2]/ul/li[3]/span').click()
            time.sleep(2) #wait
            # click on 'Balanços em EXCEL' (balance sheets in Excel)
            self.browser.find_element(By.XPATH, '/html/body/div[2]/div[2]/div/div/div[2]/ul/li[3]/ul/li[10]').click()
            time.sleep(3) #wait
            # read the company's ticker from the page header (h1)
            paper = self.browser.find_element(By.XPATH, '/html/body/div[2]/div[1]/div/div/div[1]/h1').text
            time.sleep(3) #wait
            # click on 'baixar' (download) to download the file
            self.browser.find_element(By.XPATH, '//*[@id="form-planilha"]/a').click()
            time.sleep(3) #wait
            # type the file name in the "Save As" dialog (opened because prompt_for_download is True)
            pyautogui.typewrite(f'bal_{paper}.zip')
            time.sleep(3) #wait
            # press ENTER to confirm and complete the download
            pyautogui.press('enter')
        # if any step in the try block fails, report it and move on to the next company
        except Exception as e:
            # show the error
            print(f'Error: {e}')
            # identify which company failed
            print(f'There was an error with the company: {self.company}')

    # Method to move downloaded files to a specified destination
    def move_files(self):
        # list all regular files in the download folder
        files_list = [f for f in os.listdir(config.source_path_download) if os.path.isfile(os.path.join(config.source_path_download, f))]
        # iterate over all the files
        for file in files_list:
            # only handle the balance archives saved as 'bal_<ticker>.zip'
            if file.startswith('bal'):
                destination = os.path.join(config.destination_path_zip, file)
                # remove any older copy first, otherwise the move can fail
                if os.path.exists(destination):
                    os.remove(destination)
                # move the file to the desired destination
                shutil.move(os.path.join(config.source_path_download, file), destination)


    # Method to extract files from ZIP archives
    def extract_files(self):
        # list all regular files in the ZIP folder
        files_list = [f for f in os.listdir(config.destination_path_zip) if os.path.isfile(os.path.join(config.destination_path_zip, f))]
        # iterate over each file
        for file in files_list:
            # only handle the balance archives saved as 'bal_<ticker>.zip'
            if file.startswith('bal'):
                # target name: same stem as the archive, with an .xls extension
                new_file_path = os.path.join(config.destination_path_xls, f'{file[:-4]}.xls')
                # remove an older extraction so the rename below does not fail
                if os.path.exists(new_file_path):
                    os.remove(new_file_path)
                # open the ZIP archive for reading ('r' mode)
                with zipfile.ZipFile(os.path.join(config.destination_path_zip, file), 'r') as zip_ref:
                    # the sheet inside each archive is named 'balanco.xls'; extract it to the XLS folder
                    old_file_path = zip_ref.extract('balanco.xls', config.destination_path_xls)
                # rename 'balanco.xls' to 'bal_<ticker>.xls' so the next archive does not overwrite it
                os.rename(old_file_path, new_file_path)
                
# Create an instance of FundamentusBot and execute the process
bot = FundamentusBot()
bot.open_site()


NoSuchWindowException: Message: no such window: target window already closed
from unknown error: web view not found
  (Session info: chrome=120.0.6099.72)
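`dwnld_company_info` uses `pyautogui` to type the file name into Chrome's Save As dialog. That only works while the dialog has keyboard focus. Below is a sketch of a dialog-free alternative. It assumes Chrome's `download.default_directory` and `download.prompt_for_download` preferences, so that downloads land directly in a known folder and are renamed afterwards. Both helper names are hypothetical:

```python
import os
from pathlib import Path


def chrome_download_prefs(download_dir):
    """Chrome preferences that save downloads straight to `download_dir` without a Save As dialog."""
    return {
        "download.default_directory": str(Path(download_dir).resolve()),
        "download.prompt_for_download": False,
        "safebrowsing.enabled": True,
    }


def rename_latest_zip(download_dir, new_name):
    """Rename the most recently modified .zip in `download_dir` to `new_name`."""
    zips = sorted(Path(download_dir).glob("*.zip"), key=lambda p: p.stat().st_mtime)
    if not zips:
        raise FileNotFoundError(f"no .zip files in {download_dir}")
    target = zips[-1].with_name(new_name)
    # os.replace overwrites an existing file with the same name
    os.replace(zips[-1], target)
    return target
```

In the constructor, this would look like `chrome_options.add_experimental_option("prefs", chrome_download_prefs(config.source_path_download))`. After clicking 'baixar' and waiting for the download to finish, you would call `rename_latest_zip(config.source_path_download, f'bal_{paper}.zip')`. Picking the newest file by modification time assumes that only one download runs at a time.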


## Step 4 (Data Processing)

### Getting the data from XLS files to process with pandas

For now, let's load just one sample (one balance sheet and one income statement, stopping the loop with `break`) to understand which changes are needed before inserting the data into a dictionary.

In [3]:
# list all the available XLS files
files = os.listdir(config.destination_path_xls)
# iterate over each file
for file in files:
    # extract the ticker from 'bal_<ticker>.xls' to use as the dictionary key
    if '11' in file: # some tickers end in 11 and are one character longer
        file_name = file[-10:-4] # get the ticker
    else:
        file_name = file[-9:-4] # get the ticker
    # only keep companies listed in config.companies (lets us analyze a subset)
    if file_name in config.companies:
        # read the company balance sheet (first sheet)
        balance = pd.read_excel(os.path.join(config.destination_path_xls, file), sheet_name=0)
        # read the company income statement (second sheet)
        income = pd.read_excel(os.path.join(config.destination_path_xls, file), sheet_name=1)
        break # stop after the first match: we only want one sample
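The slicing above assumes that tickers have five characters, or six when the file name contains '11'. Below is a sketch of a stricter parser. It assumes the files are named `bal_<TICKER>.xls`, as produced by `extract_files`. `ticker_from_filename` is a hypothetical helper:

```python
import re

# Hypothetical pattern: assumes files are named bal_<TICKER>.xls, as written by extract_files()
_BALANCE_FILE = re.compile(r"^bal_([A-Z0-9]+)\.xls$")


def ticker_from_filename(filename):
    """Return the ticker embedded in a 'bal_<TICKER>.xls' name, or None if the name does not match."""
    match = _BALANCE_FILE.match(filename)
    return match.group(1) if match else None
```

With this helper, `file_name = ticker_from_filename(file)` would replace the if/else. Names that do not match return `None`, and the `in config.companies` check then skips them.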




### Let's look at the head of both tables to understand what needs to be done first

In [4]:
balance.head()

|  | XLSWrite 1.34 Copyright(c) 1999,2000 Axolot Data | Balanço Patrimonial - AMBEV S/A | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | ... | Unnamed: 35 | Unnamed: 36 | Unnamed: 37 | Unnamed: 38 | Unnamed: 39 | Unnamed: 40 | Unnamed: 41 | Unnamed: 42 | Unnamed: 43 | Unnamed: 44 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  | 31/12/2012 | 31/03/2013 | 30/06/2013 | 30/09/2013 | 31/12/2013 | 31/03/2014 | 30/06/2014 | 30/09/2014 | 31/12/2014 | ... | 30/06/2021 | 30/09/2021 | 31/12/2021 | 31/03/2022 | 30/06/2022 | 30/09/2022 | 31/12/2022 | 31/03/2023 | 30/06/2023 | 30/09/2023 |
| 1 | Ativo Total | 1346301.056 | 1340374.016 | 58739269.632 | 59618975.744 | 68674015.232 | 63297044.48 | 62989045.76 | 65125920.768 | 72143200.256 | ... | 124440133.632 | 135133249.536 | 138602479.616 | 127399919.616 | 136633409.536 | 142063960.064 | 137958080.512 | 135466721.28 | 133294415.872 | 137914204.16 |
| 2 | Ativo Circulante | 71641 | 77552 | 12057052.16 | 12478373.888 | 20470011.904 | 16352306.176 | 15773268.992 | 15446576.128 | 20728420.352 | ... | 32705665.024 | 38197080.064 | 38627139.584 | 34479796.224 | 38238560.256 | 41556963.328 | 37816713.216 | 35378688 | 34324092.928 | 37552668.672 |
| 3 | Caixa e Equivalentes de Caixa | 48155 | 74204 | 4482174.976 | 4835169.792 | 11285832.704 | 7296176.128 | 6273862.144 | 5748115.968 | 9722066.944 | ... | 13269346.304 | 17956171.776 | 16627697.664 | 12887921.664 | 14129258.496 | 17712654.336 | 14926435.328 | 12214085.632 | 12117013.504 | 17413906.432 |
| 4 | Aplicações Financeiras | 0 | 0 | 486132.992 | 612489.024 | 288604 | 410172.992 | 379937.984 | 526788.992 | 712958.016 | ... | 1245607.04 | 2044573.952 | 1914606.976 | 1345730.048 | 1535714.048 | 1347216 | 454496.992 | 365284 | 313504 | 227164 |


In [5]:
income.head(5)

|  | XLSWrite 1.34 Copyright(c) 1999,2000 Axolot Data | Demonstrativo de Resultado - AMBEV S/A - Trimestres Isolados | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 | Unnamed: 8 | Unnamed: 9 | ... | Unnamed: 35 | Unnamed: 36 | Unnamed: 37 | Unnamed: 38 | Unnamed: 39 | Unnamed: 40 | Unnamed: 41 | Unnamed: 42 | Unnamed: 43 | Unnamed: 44 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  |  | 31/03/2013 | 30/06/2013 | 30/09/2013 | 31/12/2013 | 31/03/2014 | 30/06/2014 | 30/09/2014 | 31/12/2014 | ... | 30/06/2021 | 30/09/2021 | 31/12/2021 | 31/03/2022 | 30/06/2022 | 30/09/2022 | 31/12/2022 | 31/03/2023 | 30/06/2023 | 30/09/2023 |
| 1 | Receita Bruta de Vendas e/ou Serviços |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 2 | Deduções da Receita Bruta |  |  |  |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 3 | Receita Líquida de Vendas e/ou Serviços |  | 10617 | 7503133.184 | 8462602.752 | 18815037.44 | 9045071.872 | 8177433.088 | 8624396.288 | 12232882.176 | ... | 15711140.864 | 18492608.512 | 22010836.992 | 18439151.616 | 17988995.072 | 20587642.88 | 22693033.984 | 20531744.768 | 18898114.56 | 20317763.584 |
| 4 | Custo de Bens e/ou Serviços Vendidos |  | 0 | -2626988.032 | -2834846.976 | -5935966.208 | -3008314.112 | -3040666.112 | -2955760.128 | -3809847.552 | ... | -7965268.992 | -9253070.848 | -10496073.728 | -9414486.016 | -9374254.08 | -10648073.216 | -10985254.912 | -10131684.352 | -9635608.576 | -10223017.984 |


### In both DataFrames we need to:

- Insert the company code in the first column (header)
- Turn the first row into the header
- Turn the first column into the index
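You can try these three steps on a tiny hand-made frame that mimics the raw sheet layout, with the dates in row 0 and the account names in column 0. The values and the `tidy_sheet` name are made up for illustration:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the raw sheet: row 0 holds the dates, column 0 the account names
raw = pd.DataFrame(
    [[np.nan, "31/03/2023", "30/06/2023"],
     ["Ativo Total", 10.0, 11.0],
     ["Ativo Circulante", 4.0, 5.0]],
    columns=["XLSWrite", "Balanço", "Unnamed: 2"],
)


def tidy_sheet(raw, ticker):
    """Label the corner cell, promote row 0 to the header and index by column 0."""
    df = raw.copy()
    df.iloc[0, 0] = ticker        # 1. company code in the corner cell
    df.columns = df.iloc[0]       # 2. first row becomes the header...
    df = df.iloc[1:]              #    ...and is dropped from the data
    return df.set_index(ticker)   # 3. first column becomes the index


tidy = tidy_sheet(raw, "ABEV3")
```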

In [6]:
# Insert the company code in the first column (header)
balance.iloc[0,0] = file_name
income.iloc[0,0] = file_name

In [7]:
# Turn the first row into the header
balance.columns = balance.iloc[0] # use the first row as the header
balance = balance[1:] # keep rows from 1 onward (row 0 is now the header)
income.columns = income.iloc[0] # use the first row as the header
income = income[1:] # keep rows from 1 onward (row 0 is now the header)

In [8]:
# Transform first column into index
balance = balance.set_index(file_name)
income = income.set_index(file_name)

### Let's take a look at the columns

In [9]:
#show columns
print(balance.columns)
print(income.columns)

Index(['31/12/2012', '31/03/2013', '30/06/2013', '30/09/2013', '31/12/2013',
       '31/03/2014', '30/06/2014', '30/09/2014', '31/12/2014', '31/03/2015',
       '30/06/2015', '30/09/2015', '31/12/2015', '31/03/2016', '30/06/2016',
       '30/09/2016', '31/12/2016', '31/03/2017', '30/06/2017', '30/09/2017',
       '31/12/2017', '31/03/2018', '30/06/2018', '30/09/2018', '31/12/2018',
       '31/03/2019', '30/06/2019', '30/09/2019', '31/12/2019', '31/03/2020',
       '30/06/2020', '30/09/2020', '31/12/2020', '31/03/2021', '30/06/2021',
       '30/09/2021', '31/12/2021', '31/03/2022', '30/06/2022', '30/09/2022',
       '31/12/2022', '31/03/2023', '30/06/2023', '30/09/2023'],
      dtype='object', name=0)
Index([         nan, '31/03/2013', '30/06/2013', '30/09/2013', '31/12/2013',
       '31/03/2014', '30/06/2014', '30/09/2014', '31/12/2014', '31/03/2015',
       '30/06/2015', '30/09/2015', '31/12/2015', '31/03/2016', '30/06/2016',
       '30/09/2016', '31/12/2016', '31/03/2017', '30/06/201

The balance sheet starts on 31/12/2012. The first column of the income statement has a NaN header, so its data starts on 31/03/2013.

So let's drop the first column of each table so that both cover the same period. We will handle the complete dataset later, once we have the big picture of the time ranges.
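Dropping the first column by position works for this sample because the extra column happens to be first. Below is a sketch of a label-based alternative that keeps only the dates present in both statements. `align_periods` is a hypothetical helper:

```python
import pandas as pd


def align_periods(balance, income):
    """Keep only the date columns present in both frames, in the balance sheet's order.

    Unlike dropping the first column by position, this also discards a NaN-labelled
    column and any date that appears in only one of the two statements.
    """
    common = balance.columns.intersection(income.columns, sort=False)
    common = common[common.notna()]
    return balance[common], income[common]
```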

In [10]:
# drop the first column of both tables: '31/12/2012' in balance and the NaN-labelled column in income
# (selecting by position with iloc works even when the column label is NaN)
balance = balance.iloc[:, 1:]
income = income.iloc[:, 1:]

In [11]:
#show columns
print(balance.columns)
print(income.columns)

Index(['31/03/2013', '30/06/2013', '30/09/2013', '31/12/2013', '31/03/2014',
       '30/06/2014', '30/09/2014', '31/12/2014', '31/03/2015', '30/06/2015',
       '30/09/2015', '31/12/2015', '31/03/2016', '30/06/2016', '30/09/2016',
       '31/12/2016', '31/03/2017', '30/06/2017', '30/09/2017', '31/12/2017',
       '31/03/2018', '30/06/2018', '30/09/2018', '31/12/2018', '31/03/2019',
       '30/06/2019', '30/09/2019', '31/12/2019', '31/03/2020', '30/06/2020',
       '30/09/2020', '31/12/2020', '31/03/2021', '30/06/2021', '30/09/2021',
       '31/12/2021', '31/03/2022', '30/06/2022', '30/09/2022', '31/12/2022',
       '31/03/2023', '30/06/2023', '30/09/2023'],
      dtype='object', name=0)
Index(['31/03/2013', '30/06/2013', '30/09/2013', '31/12/2013', '31/03/2014',
       '30/06/2014', '30/09/2014', '31/12/2014', '31/03/2015', '30/06/2015',
       '30/09/2015', '31/12/2015', '31/03/2016', '30/06/2016', '30/09/2016',
       '31/12/2016', '31/03/2017', '30/06/2017', '30/09/2017', '31/12/201

### Now we are ready to collect the data of all the companies and insert it into the dictionary

In [12]:
# create a dictionary to store the data, keyed by ticker
fundamentus = {}
# list all the available XLS files
files = os.listdir(config.destination_path_xls)
# iterate over each file
for file in files:
    # extract the ticker from 'bal_<ticker>.xls' to use as the dictionary key
    if '11' in file: # some tickers end in 11 and are one character longer
        file_name = file[-10:-4] # get the ticker
    else:
        file_name = file[-9:-4] # get the ticker
    # only keep companies listed in config.companies (lets us analyze a subset)
    if file_name in config.companies:
        file_path = os.path.join(config.destination_path_xls, file)
        # read the balance sheet (first sheet) and the income statement (second sheet)
        balance = pd.read_excel(file_path, sheet_name=0)
        income = pd.read_excel(file_path, sheet_name=1)
        # insert the company code in the first column (header)
        balance.iloc[0,0] = file_name
        income.iloc[0,0] = file_name
        # turn the first row into the header and drop it from the data
        balance.columns = balance.iloc[0]
        balance = balance[1:]
        income.columns = income.iloc[0]
        income = income[1:]
        # turn the first column into the index
        balance = balance.set_index(file_name)
        income = income.set_index(file_name)
        # stack both statements; pd.concat aligns them on the date columns
        combined = pd.concat([balance, income])
        # drop every column whose header is NaN (e.g. the empty first column of the income sheet)
        fundamentus[file_name] = combined.loc[:, combined.columns.notna()]





### Getting the historical prices of shares

In [13]:
fundamentus[config.companies[1]].columns[-1]

'30/09/2023'

In [14]:
# the most recent quarter in the statements is the end of the price window
from datetime import datetime, timedelta
end_date = fundamentus[config.companies[1]].columns[-1]
end_date_dt = datetime.strptime(end_date, "%d/%m/%Y")
# start of the window: roughly five years earlier (5 * 365 days)
start_date_dt = end_date_dt - timedelta(days=5 * 365)
# yfinance expects dates as 'YYYY-MM-DD' strings
end_date_str = end_date_dt.strftime("%Y-%m-%d")
start_date_str = start_date_dt.strftime("%Y-%m-%d")
print(f'Start date: {start_date_str}')
print(f'End date: {end_date_str}')

Start date: 2018-10-01
End date: 2023-09-30
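The start date prints as 2018-10-01 rather than 2018-09-30 because `5 * 365` days ignores the leap day in 2020. If you prefer an exact calendar offset, `pd.DateOffset` provides one. A small comparison:

```python
from datetime import timedelta

import pandas as pd

end = pd.Timestamp("2023-09-30")
by_days = end - timedelta(days=5 * 365)   # fixed 1825 days, one short of five calendar years here
by_years = end - pd.DateOffset(years=5)   # calendar-aware: same day, five years earlier
print(by_days.date(), by_years.date())    # 2018-10-01 2018-09-30
```

Either date works as a yfinance `start`. The difference only matters if the window must match the quarter boundaries exactly.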


In [15]:
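# download daily quotes from Yahoo Finance (B3 tickers take the '.SA' suffix)
cotation = {}
for company in ["ABEV3", "AZUL4", "B3SA3"]:
    cotation_df = yf.download(f"{company}.SA", start=start_date_str, end=end_date_str, progress=False)
    cotation_df['Company'] = company
    cotation[company] = cotation_df

# keep an independent backup: a plain assignment would only add a second name for the same dict
cotation_bkp = {company: df.copy() for company, df in cotation.items()}

In [16]:
# restore the quotes from the backup (copying again so the backup stays untouched)
cotation = {company: df.copy() for company, df in cotation_bkp.items()}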

In [17]:
fundamentus['ABEV3'].columns

Index(['31/12/2012', '31/03/2013', '30/06/2013', '30/09/2013', '31/12/2013',
       '31/03/2014', '30/06/2014', '30/09/2014', '31/12/2014', '31/03/2015',
       '30/06/2015', '30/09/2015', '31/12/2015', '31/03/2016', '30/06/2016',
       '30/09/2016', '31/12/2016', '31/03/2017', '30/06/2017', '30/09/2017',
       '31/12/2017', '31/03/2018', '30/06/2018', '30/09/2018', '31/12/2018',
       '31/03/2019', '30/06/2019', '30/09/2019', '31/12/2019', '31/03/2020',
       '30/06/2020', '30/09/2020', '31/12/2020', '31/03/2021', '30/06/2021',
       '30/09/2021', '31/12/2021', '31/03/2022', '30/06/2022', '30/09/2022',
       '31/12/2022', '31/03/2023', '30/06/2023', '30/09/2023'],
      dtype='object', name=0)

In [18]:
# each value in the quote index is a numpy.datetime64
date = cotation['ABEV3'].index.values[1]
# Convert to an ISO date string ('D' = day precision)
data_datetime = np.datetime_as_string(date, unit='D')
# Reformat the date as dd/mm/YYYY, matching the Fundamentus column headers
data_formatada = datetime.strptime(data_datetime, "%Y-%m-%d").strftime("%d/%m/%Y")
print(data_formatada)

02/10/2018


In [19]:
cols = fundamentus['ABEV3'].columns
new_cols = []
for col in cols:
    # Parse the 'dd/mm/YYYY' string into a datetime object
    col = datetime.strptime(col, "%d/%m/%Y")
    # Convert the datetime to numpy.datetime64 so it compares directly with the quote index
    col = np.datetime64(col)
    new_cols.append(col)


In [20]:
# lower bound of the quote history as numpy.datetime64, comparable with new_cols
start_date_np = np.datetime64(start_date_dt)
# collect the quarter-end dates that fall before the quote history starts
new_cols_remove = []
for col in new_cols:
    if col < start_date_np:
        new_cols_remove.append(col)

In [21]:
new_cols = [item for item in new_cols if item not in new_cols_remove]
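The cells above parse the column headers, compute the lower bound and filter the dates. Pandas' vectorized date parsing can do the same work. Below is a sketch in which `quarter_ends_since` is a hypothetical helper and the column labels are sample values:

```python
import pandas as pd


def quarter_ends_since(columns, start):
    """Parse 'dd/mm/YYYY' column labels and keep those on or after `start`."""
    dates = pd.to_datetime(pd.Index(columns), format="%d/%m/%Y")
    return dates[dates >= pd.Timestamp(start)]


cols = ["30/09/2018", "31/12/2018", "31/03/2019"]
print(list(quarter_ends_since(cols, "2018-10-01").strftime("%Y-%m-%d")))
# ['2018-12-31', '2019-03-31']
```

With the notebook's variables, this would be `quarter_ends_since(fundamentus['ABEV3'].columns, start_date_str)`. The result is a `DatetimeIndex` rather than a list of `numpy.datetime64`.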

In [22]:
display(new_cols)

[numpy.datetime64('2018-12-31T00:00:00.000000'),
 numpy.datetime64('2019-03-31T00:00:00.000000'),
 numpy.datetime64('2019-06-30T00:00:00.000000'),
 numpy.datetime64('2019-09-30T00:00:00.000000'),
 numpy.datetime64('2019-12-31T00:00:00.000000'),
 numpy.datetime64('2020-03-31T00:00:00.000000'),
 numpy.datetime64('2020-06-30T00:00:00.000000'),
 numpy.datetime64('2020-09-30T00:00:00.000000'),
 numpy.datetime64('2020-12-31T00:00:00.000000'),
 numpy.datetime64('2021-03-31T00:00:00.000000'),
 numpy.datetime64('2021-06-30T00:00:00.000000'),
 numpy.datetime64('2021-09-30T00:00:00.000000'),
 numpy.datetime64('2021-12-31T00:00:00.000000'),
 numpy.datetime64('2022-03-31T00:00:00.000000'),
 numpy.datetime64('2022-06-30T00:00:00.000000'),
 numpy.datetime64('2022-09-30T00:00:00.000000'),
 numpy.datetime64('2022-12-31T00:00:00.000000'),
 numpy.datetime64('2023-03-31T00:00:00.000000'),
 numpy.datetime64('2023-06-30T00:00:00.000000'),
 numpy.datetime64('2023-09-30T00:00:00.000000')]

In [27]:
cotation['ABEV3']

| Date | Open | High | Low | Close | Adj Close | Volume | Company |
|---|---|---|---|---|---|---|---|
| 2018-10-01 | 18.350000 | 18.549999 | 18.350000 | 18.430000 | 15.599095 | 6236800 | ABEV3 |
| 2018-10-02 | 18.790001 | 18.799999 | 17.980000 | 18.250000 | 15.446741 | 26709600 | ABEV3 |
| 2018-10-03 | 18.740000 | 18.740000 | 18.030001 | 18.129999 | 15.345175 | 21377300 | ABEV3 |
| 2018-10-04 | 18.150000 | 18.200001 | 17.660000 | 17.870001 | 15.125111 | 12303600 | ABEV3 |
| 2018-10-05 | 18.040001 | 18.080000 | 17.520000 | 17.650000 | 14.938905 | 16164800 | ABEV3 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2023-09-25 | 13.150000 | 13.330000 | 13.110000 | 13.220000 | 13.220000 | 20229100 | ABEV3 |
| 2023-09-26 | 13.190000 | 13.230000 | 13.000000 | 13.040000 | 13.040000 | 29094600 | ABEV3 |
| 2023-09-27 | 13.080000 | 13.100000 | 12.860000 | 12.970000 | 12.970000 | 32045700 | ABEV3 |
| 2023-09-28 | 13.000000 | 13.130000 | 12.930000 | 12.980000 | 12.980000 | 43949200 | ABEV3 |


In [24]:
cotation['ABEV3'].loc[date_sub_x_days].values

NameError: name 'date_sub_x_days' is not defined

In [34]:
x = 0  # counts the quarter-end dates missing from the quote table
for col in new_cols:
    if col in cotation['ABEV3'].index:
        print('OK')
    else:
        print(f'Need to add {col} in the cotation table')
        x += 1
        new_line_data = None  # reset so a previous iteration's row is never reused
        # walk back up to 9 days to find the last trading day before the quarter end
        for sub_days in range(1, 10):
            # subtract sub_days days
            days = np.timedelta64(sub_days, 'D')
            date_sub_x_days = col - days
            if date_sub_x_days in cotation['ABEV3'].index:
                # copy that day's quote and label it with the quarter-end date
                new_line_data = pd.Series(name=col, data=cotation['ABEV3'].loc[date_sub_x_days])
                print(new_line_data)
                break
        if new_line_data is not None:
            # pd.concat replaces the private DataFrame._append; infer_objects restores numeric dtypes
            new_row = new_line_data.to_frame().T.infer_objects().rename_axis(cotation['ABEV3'].index.name)
            cotation['ABEV3'] = pd.concat([cotation['ABEV3'], new_row])
            print('Added')
print(x)

Need to add 2018-12-31T00:00:00.000000 in the cotation table
Open             15.23
High             15.67
Low               15.2
Close            15.38
Adj Close    13.285298
Volume        15498800
Company          ABEV3
Name: 2018-12-31T00:00:00.000000, dtype: object
Added
Need to add 2019-03-31T00:00:00.000000 in the cotation table
Open         16.870001
High         16.969999
Low          16.700001
Close            16.83
Adj Close    14.537812
Volume        15839700
Company          ABEV3
Name: 2019-03-31T00:00:00.000000, dtype: object
Added
Need to add 2019-06-30T00:00:00.000000 in the cotation table
Open             18.16
High         18.200001
Low          17.780001
Close        17.889999
Adj Close    15.453443
Volume        17249500
Company          ABEV3
Name: 2019-06-30T00:00:00.000000, dtype: object
Added
OK
Need to add 2019-12-31T00:00:00.000000 in the cotation table
Open         19.200001
High             19.35
Low              18.67
Close            18.67
Adj Close    16.
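The loop above fills each missing quarter end by walking back up to nine days and appending a copy of the last quote. Below is a sketch of the same lookup that leaves the daily table untouched. It uses `DataFrame.reindex` with `method="ffill"` and a `tolerance`. `quotes_at_quarter_ends` is a hypothetical helper, and the prices in the test are made-up values:

```python
import pandas as pd


def quotes_at_quarter_ends(quotes, quarter_ends, max_gap_days=9):
    """For each quarter end, take the last available quote on or before that date.

    `quotes` must have a sorted DatetimeIndex. A date with no quote in the previous
    `max_gap_days` days comes back as NaN instead of silently reusing an older row.
    """
    return quotes.reindex(
        pd.DatetimeIndex(quarter_ends),
        method="ffill",
        tolerance=pd.Timedelta(days=max_gap_days),
    )
```

With `cotation['ABEV3']` as downloaded (its index is sorted by date) and `new_cols` as the targets, `quotes_at_quarter_ends(cotation['ABEV3'], new_cols)` would return one row per quarter end. Keeping that result as a separate frame avoids appending out-of-order rows to the daily history. The 9-day limit mirrors `range(1, 10)` in the loop.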

In [31]:
pd.concat([cotation['ABEV3'], new_line_data.to_frame().T.infer_objects()])

| Date | Open | High | Low | Close | Adj Close | Volume | Company |
|---|---|---|---|---|---|---|---|
| 2018-10-01 | 18.350000 | 18.549999 | 18.350000 | 18.430000 | 15.599095 | 6236800 | ABEV3 |
| 2018-10-02 | 18.790001 | 18.799999 | 17.980000 | 18.250000 | 15.446741 | 26709600 | ABEV3 |
| 2018-10-03 | 18.740000 | 18.740000 | 18.030001 | 18.129999 | 15.345175 | 21377300 | ABEV3 |
| 2018-10-04 | 18.150000 | 18.200001 | 17.660000 | 17.870001 | 15.125111 | 12303600 | ABEV3 |
| 2018-10-05 | 18.040001 | 18.080000 | 17.520000 | 17.650000 | 14.938905 | 16164800 | ABEV3 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2023-09-26 | 13.190000 | 13.230000 | 13.000000 | 13.040000 | 13.040000 | 29094600 | ABEV3 |
| 2023-09-27 | 13.080000 | 13.100000 | 12.860000 | 12.970000 | 12.970000 | 32045700 | ABEV3 |
| 2023-09-28 | 13.000000 | 13.130000 | 12.930000 | 12.980000 | 12.980000 | 43949200 | ABEV3 |
| 2023-09-29 | 13.080000 | 13.170000 | 13.050000 | 13.110000 | 13.110000 | 22876500 | ABEV3 |


In [35]:
cotation['ABEV3']

| Date | Open | High | Low | Close | Adj Close | Volume | Company |
|---|---|---|---|---|---|---|---|
| 2018-10-01 | 18.350000 | 18.549999 | 18.350000 | 18.430000 | 15.599095 | 6236800 | ABEV3 |
| 2018-10-02 | 18.790001 | 18.799999 | 17.980000 | 18.250000 | 15.446741 | 26709600 | ABEV3 |
| 2018-10-03 | 18.740000 | 18.740000 | 18.030001 | 18.129999 | 15.345175 | 21377300 | ABEV3 |
| 2018-10-04 | 18.150000 | 18.200001 | 17.660000 | 17.870001 | 15.125111 | 12303600 | ABEV3 |
| 2018-10-05 | 18.040001 | 18.080000 | 17.520000 | 17.650000 | 14.938905 | 16164800 | ABEV3 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2019-12-31 | 19.200001 | 19.350000 | 18.670000 | 18.670000 | 16.550779 | 17430500 | ABEV3 |
| 2020-12-31 | 16.049999 | 16.100000 | 15.650000 | 15.650000 | 14.240406 | 19432700 | ABEV3 |
| 2021-12-31 | 15.580000 | 15.590000 | 15.390000 | 15.420000 | 14.648190 | 11819300 | ABEV3 |
| 2022-12-31 | 14.670000 | 14.670000 | 14.430000 | 14.520000 | 14.520000 | 22326300 | ABEV3 |


In [29]:
cotation['ABEV3'][-8:]

| Date | Open | High | Low | Close | Adj Close | Volume | Company |
|---|---|---|---|---|---|---|---|
| 2023-09-20 | 13.47 | 13.56 | 13.45 | 13.47 | 13.47 | 15729400 | ABEV3 |
| 2023-09-21 | 13.32 | 13.4 | 13.2 | 13.24 | 13.24 | 29875400 | ABEV3 |
| 2023-09-22 | 13.25 | 13.28 | 13.1 | 13.12 | 13.12 | 16325200 | ABEV3 |
| 2023-09-25 | 13.15 | 13.33 | 13.11 | 13.22 | 13.22 | 20229100 | ABEV3 |
| 2023-09-26 | 13.19 | 13.23 | 13.0 | 13.04 | 13.04 | 29094600 | ABEV3 |
| 2023-09-27 | 13.08 | 13.1 | 12.86 | 12.97 | 12.97 | 32045700 | ABEV3 |
| 2023-09-28 | 13.0 | 13.13 | 12.93 | 12.98 | 12.98 | 43949200 | ABEV3 |
| 2023-09-29 | 13.08 | 13.17 | 13.05 | 13.11 | 13.11 | 22876500 | ABEV3 |


In [43]:
cotation['ABEV3'].loc[[np.datetime64('2023-09-29'),np.datetime64('2023-09-30')]]

| Date | Open | High | Low | Close | Adj Close | Volume | Company |
|---|---|---|---|---|---|---|---|
| 2023-09-29 | 13.08 | 13.17 | 13.05 | 13.11 | 13.11 | 22876500 | ABEV3 |
| 2023-09-30 | 13.08 | 13.17 | 13.05 | 13.11 | 13.11 | 22876500 | ABEV3 |


In [None]:
cotation['ABEV3'].loc[cotation['ABEV3'].index[3]]

Open             18.15
High         18.200001
Low              17.66
Close        17.870001
Adj Close    15.125112
Volume        12303600
Company          ABEV3
Name: 2018-10-04 00:00:00, dtype: object