# Exercise 1

Perform web scraping on a Madrid Stock Exchange page (https://www.bolsamadrid.es) using BeautifulSoup and Selenium.

In [18]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
# https://www.youtube.com/watch?v=rhnMvvmfBFI



We will scrape the "Acciones" (shares) page of bolsamadrid.es, which lists the latest trading data for each share in the IBEX 35 index. First we use BeautifulSoup, and then we do the same with Selenium.

In [3]:
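#download the page and parse its HTML
url = "https://www.bolsamadrid.es/esp/aspx/Mercados/Precios.aspx?indice=ESI100000000"
page = requests.get(url, timeout = 30)
page.raise_for_status()  #stop here if the server returned an error status
soup = BeautifulSoup(page.content, "html.parser")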

In [21]:
#find the table we want to scrape
table = soup.find("table", id = "ctl00_Contenido_tblAcciones")

In [34]:
#get the content
cont = table.find_all("td")

content = list()

for i in cont:
    content.append(i.text)
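
Collecting every `td` into one flat list works as long as every data row has exactly nine cells. A row-by-row alternative keeps the cells of each `tr` together, so one short or malformed row cannot shift the columns of all the rows after it. The sketch below runs on a small hypothetical HTML fragment that only imitates the shape of the real table (a header row of `th` cells and two made-up companies); the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the scraped table; the real markup may differ.
demo_html = """
<table id="ctl00_Contenido_tblAcciones">
  <tr><th>Name</th><th>Last</th></tr>
  <tr><td>COMPANY A</td><td>10,5000</td></tr>
  <tr><td>COMPANY B</td><td>2,2500</td></tr>
</table>
"""

demo_soup = BeautifulSoup(demo_html, "html.parser")
demo_table = demo_soup.find("table", id="ctl00_Contenido_tblAcciones")

demo_rows = []
for tr in demo_table.find_all("tr"):
    # text of the data cells in this row, with surrounding whitespace removed
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:  # the header row has only <th> cells, so it yields an empty list
        demo_rows.append(cells)

print(demo_rows)
# [['COMPANY A', '10,5000'], ['COMPANY B', '2,2500']]
```

Applied to the real `table`, the resulting list of rows could be passed directly to `pd.DataFrame(rows, columns=columns)`, with the column names defined in the next step, instead of slicing a flat list.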

In [12]:
#name the columns
columns = ["Name", "Last", "% Diff", "Max", "Min", "Volum", "Cash", "Date", "Time"]

In [25]:
#create a dictionary
dic = {}

#each row has 9 cells, so content[i::9] picks column i: every 9th cell starting at index i
for i in range(9):
    dic[columns[i]] = content[i::9]
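
The slice `content[i::9]` starts at position `i` and then steps nine positions at a time, so it collects column `i` from every row. A toy version of the same idea, with three hypothetical columns and two rows:

```python
# Flat list of cells read row by row: two rows of three hypothetical columns.
cells = ["AAA", "10,50", "0,20",
         "BBB", "2,25", "-1,10"]
names = ["Name", "Last", "% Diff"]
n_cols = len(names)

# The trick only works if the list holds complete rows.
assert len(cells) % n_cols == 0, "the flat list does not contain complete rows"

# cells[i::n_cols] starts at index i and steps by n_cols, i.e. it picks column i.
by_column = {names[i]: cells[i::n_cols] for i in range(n_cols)}
print(by_column)
# {'Name': ['AAA', 'BBB'], 'Last': ['10,50', '2,25'], '% Diff': ['0,20', '-1,10']}
```

If a single cell were missing, every value after it would land in the wrong column; the length check catches that case because the list would no longer be a multiple of the column count.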

In [31]:
#create a dataframe from the dictionary
df = pd.DataFrame(data = dic)
df

| | Name | Last | % Diff | Max | Min | Volum | Cash | Date | Time |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ACCIONA | 180,0000 | 0,06 | 181,7000 | 179,7000 | 10.187 | 1.838,67 | 31/05/2022 | 09:28:39 |
| 1 | ACERINOX | 11,9550 | 0,08 | 12,0000 | 11,8000 | 218.437 | 2.603,18 | 31/05/2022 | 09:29:00 |
| 2 | ACS | 26,8500 | 0,60 | 26,8900 | 26,6200 | 58.228 | 1.557,05 | 31/05/2022 | 09:29:07 |
| 3 | AENA | 141,4500 | -0,32 | 142,3500 | 140,8500 | 8.867 | 1.256,01 | 31/05/2022 | 09:28:02 |
| 4 | ALMIRALL | 10,5300 | -0,19 | 10,6400 | 10,5300 | 28.031 | 296,72 | 31/05/2022 | 09:20:13 |
| 5 | AMADEUS | 59,3400 | -2,14 | 59,9600 | 58,9200 | 47.648 | 2.831,24 | 31/05/2022 | 09:29:13 |
| 6 | ARCELORMIT. | 30,3000 | -0,66 | 30,4550 | 30,1600 | 42.289 | 1.280,79 | 31/05/2022 | 09:28:49 |
| 7 | B.SANTANDER | 3,0545 | -0,76 | 3,0730 | 3,0470 | 2.526.699 | 7.720,93 | 31/05/2022 | 09:29:08 |
| 8 | BA.SABADELL | 0,8412 | -0,05 | 0,8500 | 0,8390 | 1.741.257 | 1.467,58 | 31/05/2022 | 09:28:52 |
| 9 | BANKINTER | 5,9620 | -0,10 | 5,9880 | 5,9360 | 158.504 | 944,26 | 31/05/2022 | 09:29:02 |


In [33]:
df.to_csv("precios_sesion_bf.csv", index = False)

We save the DataFrame to a CSV file so that it can be reloaded later without scraping the page again. Next, we repeat the same steps with the Selenium library.

In [None]:
# https://www.scrapingbee.com/blog/web-scraping-101-with-python/#4-web-crawling-frameworks
# https://www.scrapingbee.com/blog/web-scraping-with-scrapy/
# https://www.scrapingbee.com/blog/selenium-python/

In [7]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

#the path where chromedriver is
driver_path = "D:/users/ciberintegra_03/chromedriver_win32/chromedriver"

#run Chrome headless so it doesn't visibly open a window
options = Options()
options.add_argument("--headless")

#start the driver (Selenium 4 takes the driver path through a Service object) and read the url
driver = webdriver.Chrome(service = Service(executable_path = driver_path), options = options)
driver.get("https://www.bolsamadrid.es/esp/aspx/Mercados/Precios.aspx?indice=ESI100000000")

In [11]:
from selenium.webdriver.common.by import By

#get the table to scrape
table = driver.find_element(By.ID, "ctl00_Contenido_tblAcciones")

In [14]:
results = []

#get the content
for i in table.find_elements(By.TAG_NAME, "td"):
    results.append(i.text)    

In [16]:
#create a dictionary
dicc = {}

#each row has 9 cells, so results[i::9] picks column i: every 9th cell starting at index i
#using the same column names as with BeautifulSoup
for i in range(9):
    dicc[columns[i]] = results[i::9]

In [19]:
#create a dataframe from the dictionary
df2 = pd.DataFrame(data = dicc)
df2

| | Name | Last | % Diff | Max | Min | Volum | Cash | Date | Time |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ACCIONA | 187,7000 | 4,57 | 188,1000 | 183,1000 | 199.330 | 36.650,10 | 01/06/2022 | 12:19:35 |
| 1 | ACERINOX | 12,0350 | -0,25 | 12,2900 | 11,9150 | 911.697 | 11.061,11 | 01/06/2022 | 12:15:01 |
| 2 | ACS | 26,5000 | 0,38 | 26,8400 | 26,4700 | 130.203 | 3.473,50 | 01/06/2022 | 12:19:06 |
| 3 | AENA | 141,9500 | -0,14 | 144,6500 | 141,9500 | 20.217 | 2.886,76 | 01/06/2022 | 12:12:15 |
| 4 | ALMIRALL | 10,4300 | 0,68 | 10,4900 | 10,3800 | 87.577 | 914,25 | 01/06/2022 | 12:18:21 |
| 5 | AMADEUS | 57,5000 | -0,59 | 58,4800 | 57,4800 | 151.740 | 8.808,56 | 01/06/2022 | 12:18:27 |
| 6 | ARCELORMIT. | 29,4550 | -1,75 | 30,2850 | 29,2450 | 173.968 | 5.168,24 | 01/06/2022 | 12:17:05 |
| 7 | B.SANTANDER | 3,0210 | 0,22 | 3,0585 | 3,0195 | 4.764.177 | 14.452,70 | 01/06/2022 | 12:19:24 |
| 8 | BA.SABADELL | 0,8380 | 0,60 | 0,8494 | 0,8368 | 8.113.274 | 6.846,74 | 01/06/2022 | 12:19:24 |
| 9 | BANKINTER | 5,9340 | 0,27 | 6,0000 | 5,9280 | 3.342.838 | 19.968,91 | 01/06/2022 | 12:19:10 |


In [21]:
df2.to_csv("precios_sesion_se.csv", index = False)

In [22]:
#end the browser session and stop chromedriver
driver.quit()

This DataFrame is also saved to a CSV file so that it can be reused later.

# Exercise 2

Document the generated dataset with the kind of information that Kaggle datasets provide.

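This dataset was built by web scraping the Bolsa de Madrid (Madrid Stock Exchange) website, bolsamadrid.es, with BeautifulSoup and Selenium. It contains the trading figures of the companies in the IBEX 35, the benchmark stock market index of the Bolsa de Madrid, made up of the 35 most liquid Spanish stocks, as they stood at the moment the page was scraped. There are two snapshots: precios_sesion_bf.csv (scraped with BeautifulSoup on 31/05/2022) and precios_sesion_se.csv (scraped with Selenium on 01/06/2022).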

These are the nine variables the dataset contains. The values are kept exactly as shown on the site, so numbers use the Spanish format (dot as thousands separator, comma as decimal separator):
* Name - the name of the company
* Last - the last traded price of one share, in euros
* % Diff - the percentage change of the last price relative to the previous session's closing price
* Max - the highest price of one share so far in the session
* Min - the lowest price of one share so far in the session
* Volum - the number of shares traded so far in the session
* Cash - the total value of the shares traded so far in the session, in thousands of euros
* Date - the date of the session
* Time - the time when the data was last updated
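
Because the values are stored as strings in Spanish number format, they need converting before any arithmetic. A minimal sketch of a converter, assuming every numeric field uses a dot only as thousands separator and a comma only as decimal separator (the helper name `parse_es_number` is hypothetical). It also checks, on a made-up row, the relation suggested by the scraped data: if Cash is the traded value in thousands of euros, Volum × Last / 1000 should come close to it, though not exactly, because trades happen at different prices during the session.

```python
import math


def parse_es_number(text: str) -> float:
    """Convert a number written in Spanish format (e.g. '1.838,67') to a float."""
    # Drop the thousands separators first, then turn the decimal comma into a point.
    return float(text.replace(".", "").replace(",", "."))


print(parse_es_number("1.838,67"))   # 1838.67
print(parse_es_number("2.526.699"))  # 2526699.0
print(parse_es_number("-0,76"))      # -0.76

# Made-up row, used only to illustrate the Volum x Last ~ Cash relation.
row = {"Last": "12,5000", "Volum": "2.000", "Cash": "25,10"}
estimate = parse_es_number(row["Volum"]) * parse_es_number(row["Last"]) / 1000
print(estimate, math.isclose(estimate, parse_es_number(row["Cash"]), rel_tol=0.05))
# 25.0 True
```

On the scraped DataFrame, a whole column could then be converted with, for example, `df["Cash"].map(parse_es_number)`.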