# Process Description: Obtaining Fundamentus Share Data

## 1. Web Scraping with Selenium
- Use Selenium, a browser automation tool, to open the Fundamentus website.
- Locate the HTML elements that contain the share data.
- Use Selenium to interact with the page and extract the required information, such as stock prices and financial metrics.

## 2. Downloading Files
- Locate any downloadable files on the Fundamentus website, such as each company's balance sheet report.
- Use Selenium to simulate the process of clicking download buttons or links.
- Implement logic to wait for file downloads to complete before proceeding.
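
The waiting step above could be sketched as a small polling loop. This is a minimal sketch, not the method used later in this notebook: `wait_for_download` is a hypothetical helper, and it assumes the browser is Chrome, which writes in-progress downloads with a `.crdownload` suffix and renames them when they finish.

```python
import time
from pathlib import Path


def wait_for_download(folder, filename, timeout=60.0, poll=0.5):
    """Block until `filename` exists in `folder` and no partial download remains.

    Assumption: Chrome marks unfinished downloads with a `.crdownload` suffix.
    """
    folder = Path(folder)
    target = folder / filename
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Any leftover *.crdownload file means a download is still running
        partial = list(folder.glob("*.crdownload"))
        if target.exists() and not partial:
            return target
        time.sleep(poll)
    raise TimeoutError(f"{filename} did not finish downloading within {timeout} s")
```

Polling for the finished file adapts to slow connections, whereas a fixed `time.sleep` either wastes time or gives up too early.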

## 3. Extracting from ZIP
- If the downloaded files are in ZIP format, use Python's zipfile module to programmatically extract the contents.
- Identify the target files within the ZIP archive that contain the share data.
- Extract the relevant files to a specified directory for further processing.
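
The extraction steps above might look like the following sketch, which uses only the standard library. The helper name `extract_member` and its parameters are assumptions made for illustration. It copies a single archive member directly to its final file name, so no separate rename is needed.

```python
import shutil
import zipfile
from pathlib import Path


def extract_member(zip_path, member, dest_dir, new_name):
    """Extract one member of a ZIP archive into dest_dir under new_name."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / new_name
    with zipfile.ZipFile(zip_path) as archive:
        # Fail early, before creating any file, if the member is missing
        if member not in archive.namelist():
            raise FileNotFoundError(f"{member} not found in {zip_path}")
        # Stream the member to its final name, overwriting any old copy
        with archive.open(member) as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return target
```

For example, `extract_member("bal_ABEV3.zip", "balanco.xls", "xls", "bal_ABEV3.xls")` would write `xls/bal_ABEV3.xls`, assuming the archive contains a member named `balanco.xls`.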

## 4. Data Processing
- Once the share data has been downloaded and extracted, perform any necessary cleaning, transformation, or analysis.
- Use Python libraries such as pandas for efficient data manipulation.

## 5. Data Storage or Analysis
- Depending on the project requirements, store the processed share data in a database, an XLS file, or another storage solution.
- Conduct further analysis, visualization, or reporting as needed.
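
As one possible reading of the database option above, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `balances`, its three columns, and the helpers `store_rows` and `load_rows` are assumptions made for illustration; they are not part of the original project.

```python
import sqlite3
from contextlib import closing


def store_rows(db_path, rows):
    """Insert (ticker, metric, value) tuples, replacing rows with the same key."""
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                """CREATE TABLE IF NOT EXISTS balances (
                       ticker TEXT NOT NULL,
                       metric TEXT NOT NULL,
                       value  REAL,
                       PRIMARY KEY (ticker, metric)
                   )"""
            )
            conn.executemany(
                "INSERT OR REPLACE INTO balances VALUES (?, ?, ?)", rows
            )


def load_rows(db_path, ticker):
    """Return all (metric, value) pairs stored for one ticker, sorted by metric."""
    with closing(sqlite3.connect(db_path)) as conn:
        cursor = conn.execute(
            "SELECT metric, value FROM balances WHERE ticker = ? ORDER BY metric",
            (ticker,),
        )
        return cursor.fetchall()
```

The composite primary key lets the scraper be rerun without creating duplicate rows, because `INSERT OR REPLACE` overwrites the earlier value for the same ticker and metric.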

### So, let's code it!

## Import the needed libraries

In [1]:
# Selenium for web automation
from selenium import webdriver  # pip install selenium or pip install --upgrade selenium
# Webdriver Manager for managing browser drivers
from webdriver_manager.chrome import ChromeDriverManager  # pip install webdriver-manager
# Selenium service for Chrome browser
from selenium.webdriver.chrome.service import Service
# Selenium common locator strategies (By)
from selenium.webdriver.common.by import By
# Time module for handling time-related operations
import time
# NumPy for numerical operations
import numpy as np
# PyAutoGUI for automating mouse and keyboard interactions
import pyautogui
# Selenium Keys for keyboard interactions
from selenium.webdriver.common.keys import Keys
# OS module for interacting with the operating system
import os
# shutil for high-level file operations
import shutil
# sys for accessing Python interpreter variables
import sys
# zipfile for working with ZIP archives
import zipfile
# Custom module that stores the paths (source_path_download, destination_path_zip, destination_path_xls)
import my_paths

## Steps 1, 2 and 3 (Web Scraping with Selenium, Downloading Files and Extracting from ZIP)

### Automate finding and downloading the files, moving them to the desired folder, and extracting the XLS files

In [2]:
# Class definition for FundamentusBot
class FundamentusBot:
    # Constructor to configure the browser
    def __init__(self):
        # Configure Chrome options with download preferences (always ask where to save the file)
        chrome_options = webdriver.ChromeOptions()
        chrome_options.add_experimental_option("prefs", {
            "download.prompt_for_download": True,
            "safebrowsing.enabled": True
        })
        #create the service
        service = Service(ChromeDriverManager().install())
        #create and open the browser
        self.browser = webdriver.Chrome(service=service, options=chrome_options)

    # Method to open the Fundamentus website and navigate to the desired data section
    def open_site(self):
        #navigate to initial page of Fundamentus
        self.browser.get("https://fundamentus.com.br/")
        time.sleep(2) #wait 2 seconds 
        #click to open the Fundamentus Mobile
        self.browser.find_element(By.XPATH, '/html/body/div[1]/div[1]/div[2]/ul[1]/li[5]/a').click()
        time.sleep(3) #wait
        #call the method open_data
        self.open_data()

    # Method to navigate to the data section and process data for a list of companies
    def open_data(self):
        #list of companies of interest
        companies = ["ABEV3", "AZUL4", "B3SA3", "BBSE3", "BRML3", "BBDC4", "BRAP4", "BBAS3", "BRKM5", "BRFS3", "BPAC11", "CRFB3", "CCRO3", "CMIG4", "HGTX3", "CIEL3", "COGN3", "CPLE6", "CSAN3", "CPFE3", "CVCB3", "CYRE3", "ECOR3", "ELET6", "EMBR3", "ENBR3", "ENGI11", "ENEV3", "EGIE3", "EQTL3", "EZTC3", "FLRY3", "GGBR4", "GOAU4", "GOLL4", "NTCO3", "HAPV3", "HYPE3", "IGTA3", "GNDI3", "ITSA4", "ITUB4", "JBSS3", "JHSF3", "KLBN11", "RENT3", "LCAM3", "LAME4", "LREN3", "MGLU3", "MRFG3", "BEEF3", "MRVE3", "MULT3", "PCAR3", "PETR4", "BRDT3", "PRIO3", "QUAL3", "RADL3", "RAIL3", "SBSP3", "SANB11", "CSNA3", "SULA11", "SUZB3", "TAEE11", "VIVT3", "TIMS3", "TOTS3", "UGPA3", "USIM5", "VALE3", "VVAR3", "WEGE3", "YDUQ3"]
        #iterate the companies list
        for company in companies:
            #get company
            self.company = company
            #call the method that downloads the information
            self.dwnld_company_info()
        time.sleep(5) #wait
        # call the method that moves the .zip files from 'source_path_download' to 'destination_path_zip'
        self.move_files()
        # call the method that extracts the XLS files from the .zip archives and saves them in 'destination_path_xls'
        self.extract_files()
        # close the browser
        self.browser.quit()

    # Method to download company information from the Fundamentus website
    def dwnld_company_info(self):
        #try to do the sequence of actions
        try:
            # find the search bar and type the company
            self.browser.find_element(By.XPATH, '/html/body/div[1]/div[1]/div[2]/form/input[1]').send_keys(self.company)
            time.sleep(2) #wait
            # press enter to open the company's data
            self.browser.find_element(By.XPATH, '/html/body/div[1]/div[1]/div[2]/form/input[1]').send_keys(Keys.ENTER)
            time.sleep(2) #wait
            # click on 'Informações'
            self.browser.find_element(By.XPATH, '/html/body/div[2]/div[2]/div/div/div[2]/ul/li[3]/span').click()
            time.sleep(2) #wait
            # click on "Balanços em EXCEL"
            self.browser.find_element(By.XPATH, '/html/body/div[2]/div[2]/div/div/div[2]/ul/li[3]/ul/li[10]').click()
            time.sleep(3) #wait
            # get the company's code from the page heading
            paper = self.browser.find_element(By.XPATH, '/html/body/div[2]/div[1]/div/div/div[1]/h1').text
            time.sleep(3) #wait
            # click on "baixar" to download the file
            self.browser.find_element(By.XPATH, '//*[@id="form-planilha"]/a').click()
            time.sleep(3) #wait
            # type a new name for the file in the save dialog
            pyautogui.typewrite(f'bal_{paper}.zip')
            time.sleep(3) #wait
            # press ENTER to complete the download
            pyautogui.press('enter')
        # if any action in the try block fails, handle the exception here
        except Exception as e:
            # show the error
            print(f'Error: {e}')
            # report which company failed
            print(f'There was an error with the company: {self.company}')

    # Method to move downloaded files to a specified destination
    def move_files(self):
        # list every regular file in the download folder
        files_list = [f for f in os.listdir(my_paths.source_path_download) if os.path.isfile(os.path.join(my_paths.source_path_download, f))]
        # iterate over the files
        for file in files_list:
            # only handle the ZIP archives saved by this bot as 'bal_<code>.zip'
            if file.startswith('bal') and file.endswith('.zip'):
                destination = os.path.join(my_paths.destination_path_zip, file)
                # remove any stale copy so the move does not fail on an existing file
                if os.path.exists(destination):
                    os.remove(destination)
                # move the file to the desired destination
                shutil.move(os.path.join(my_paths.source_path_download, file), destination)


    # Method to extract files from ZIP archives
    def extract_files(self):
        # list every regular file in the ZIP folder
        files_list = [f for f in os.listdir(my_paths.destination_path_zip) if os.path.isfile(os.path.join(my_paths.destination_path_zip, f))]
        # iterate over the files
        for file in files_list:
            # only handle the ZIP archives saved by this bot as 'bal_<code>.zip'
            if file.startswith('bal') and file.endswith('.zip'):
                # desired file name: same as the archive, with an .xls extension
                new_file_path = os.path.join(my_paths.destination_path_xls, f'{file[:-4]}.xls')
                # open the ZIP archive for reading; 'with' closes it automatically
                with zipfile.ZipFile(os.path.join(my_paths.destination_path_zip, file), 'r') as zip_ref:
                    # extract 'balanco.xls' to the XLS folder; extract() returns the path it created
                    extracted_path = zip_ref.extract('balanco.xls', my_paths.destination_path_xls)
                # rename the file; os.replace overwrites an existing target, so no prior removal is needed
                os.replace(extracted_path, new_file_path)
                
# Create an instance of FundamentusBot and execute the process
bot = FundamentusBot()
bot.open_site()


Error: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/div[2]/div[2]/div/div/div[2]/ul/li[3]/span"}
  (Session info: chrome=120.0.6099.72); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception
Stacktrace:
	GetHandleVerifier [0x005B6E73+174291]
	(No symbol) [0x004E0AC1]
	(No symbol) [0x001F6FF6]
	(No symbol) [0x00229876]
	(No symbol) [0x00229C2C]
	(No symbol) [0x0025BD42]
	(No symbol) [0x00247054]
	(No symbol) [0x0025A104]
	(No symbol) [0x00246DA6]
	(No symbol) [0x00221034]
	(No symbol) [0x00221F8D]
	GetHandleVerifier [0x006549CC+820268]
	sqlite3_dbdata_init [0x00714EBE+652494]
	sqlite3_dbdata_init [0x007148D9+650985]
	sqlite3_dbdata_init [0x0070962C+605244]
	sqlite3_dbdata_init [0x0071586B+654971]
	(No symbol) [0x004EFEBC]
	(No symbol) [0x004E8428]
	(No symbol) [0x004E854D]
	(No symbol) [0x004D5858]
	BaseThreadInitThunk [0x750DFA29+25]
	RtlGetAppConta

FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C:/Users/felip/Documents/GitHub/StockMarketPrediction/balances/bal_ABEV3.zip'
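
A `no such element` error like the one above can occur when `find_element` runs before the page has finished rendering. It can also mean that the absolute XPath no longer matches the page, in which case waiting will not help. For the timing case, a minimal sketch of polling instead of fixed `time.sleep` calls follows. `wait_until` is a hypothetical helper written with the standard library only; Selenium's own `WebDriverWait` class serves the same purpose.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5, ignored=(Exception,)):
    """Call condition() repeatedly until it returns a truthy value.

    Exceptions listed in `ignored` are treated as "not ready yet".
    Raises TimeoutError if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    last_error = None
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored as exc:
            last_error = exc
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s") from last_error
        time.sleep(poll)


# Hypothetical usage inside the bot, where `xpath` is one of the XPaths above:
# element = wait_until(lambda: self.browser.find_element(By.XPATH, xpath))
# element.click()
```

Because `find_element` raises an exception until the element exists, the lambda is retried until the element appears or the timeout is reached.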

In [4]:
# list every regular file in the ZIP folder
files_list = [f for f in os.listdir(my_paths.destination_path_zip) if os.path.isfile(os.path.join(my_paths.destination_path_zip, f))]
# iterate over the files
for file in files_list:
    # only handle the ZIP archives saved by the bot as 'bal_<code>.zip'
    if file.startswith('bal') and file.endswith('.zip'):
        # desired file name: same as the archive, with an .xls extension
        new_file_path = os.path.join(my_paths.destination_path_xls, f'{file[:-4]}.xls')
        # open the ZIP archive for reading; 'with' closes it automatically
        with zipfile.ZipFile(os.path.join(my_paths.destination_path_zip, file), 'r') as zip_ref:
            # extract 'balanco.xls' to the XLS folder; extract() returns the path it created
            extracted_path = zip_ref.extract('balanco.xls', my_paths.destination_path_xls)
        # rename the file; os.replace overwrites an existing target, so no prior removal is needed
        os.replace(extracted_path, new_file_path)

## Step 4 (Data Processing)