<h1>Extracting Stock Data Using Web Scraping</h1>


Not all stock data is available through an API. In this assignment we use web scraping to obtain financial data: with `BeautifulSoup` and `requests`, we extract historical stock data from a web page.

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ul>
        <li>Downloading the web page with the requests library</li>
        <li>Parsing the page's HTML with BeautifulSoup</li>
        <li>Extracting the data and building a DataFrame</li>
    </ul>
</div>

<hr>


In [1]:
!pip install pandas
!pip install requests
!pip install bs4
!pip install html5lib
!pip install plotly

Collecting plotly
  Downloading plotly-4.14.3-py2.py3-none-any.whl (13.2 MB)
Collecting retrying>=1.3.3
  Downloading retrying-1.3.3.tar.gz (10 kB)
Building wheels for collected packages: retrying
  Building wheel for retrying (setup.py): started
  Building wheel for retrying (setup.py): finished with status 'done'
  Created wheel for retrying: filename=retrying-1.3.3-py3-none-any.whl size=11434 sha256=03a9cb7700431c3918069a7139163cc15327df3edfcc702354a4f00f13c30489
  Stored in directory: c:\users\cyber\appdata\local\pip\cache\wheels\c4\a7\48\0a434133f6d56e878ca511c0e6c38326907c0792f67b476e56
Successfully built retrying
Installing collected packages: retrying, plotly
Successfully installed plotly-4.14.3 retrying-1.3.3


In [2]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

## Extracting Stock Data Using Web Scraping


Using the `requests` library, we download the web page [https://finance.yahoo.com/quote/AMZN/history?period1=1451606400&period2=1612137600&interval=1mo&filter=history&frequency=1mo&includeAdjustedClose=true](https://finance.yahoo.com/quote/AMZN/history?period1=1451606400&period2=1612137600&interval=1mo&filter=history&frequency=1mo&includeAdjustedClose=true). We save the response text in a variable named `html_data`.


In [3]:
url="https://finance.yahoo.com/quote/AMZN/history?period1=1451606400&period2=1612137600&interval=1mo&filter=history&frequency=1mo&includeAdjustedClose=true"
html_data=requests.get(url).text
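
A plain `requests.get(url).text` returns whatever the server sends back, including an error page, and some sites may reject requests that lack a browser-like `User-Agent` header. The following is a minimal sketch of a more defensive download. The helper name `fetch_html`, the header value, and the 10-second timeout are illustrative assumptions, not values required by the page above.

```python
import requests

# Illustrative header; the exact User-Agent string is an assumption.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-scraper)"}


def fetch_html(url, timeout=10, get=requests.get):
    """Return the HTML text of `url`, raising on HTTP error status codes.

    `get` defaults to requests.get; it is a parameter only so the helper
    can be exercised without network access.
    """
    response = get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.text
```

With this helper, the download step could be written as `html_data = fetch_html(url)`.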

We parse the HTML data using `BeautifulSoup`.


In [4]:
soup = BeautifulSoup(html_data,"html5lib")

To get the content of the <code>title</code> tag:


In [6]:
soup.title

<title>Amazon.com, Inc. (AMZN) Stock Historical Prices &amp; Data - Yahoo Finance</title>
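
`soup.title` returns the whole tag, markup included. As a small illustration on an inline HTML string, the tag's text alone is available through `.string`. The snippet and the built-in `html.parser` backend are assumptions made so the example runs without a download.

```python
from bs4 import BeautifulSoup

# Tiny stand-in document; not the real Yahoo Finance page.
sample_html = "<html><head><title>Example Stock Page</title></head><body></body></html>"
sample_soup = BeautifulSoup(sample_html, "html.parser")

title_tag = sample_soup.title           # the <title> Tag object
title_text = sample_soup.title.string   # only the text inside the tag

print(title_tag)
print(title_text)
```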

Using BeautifulSoup, we extract the table of historical stock prices and store it in a DataFrame named `amazon_data`. The DataFrame should have the columns Date, Open, High, Low, Close, Adj Close, and Volume.


### Building the DataFrame

In [9]:

# Collect one dict per table row, then build the DataFrame in a single step
# (DataFrame.append was removed in pandas 2.0).
rows = []

for row in soup.find("tbody").find_all("tr"):
    col = row.find_all("td")
    if len(col) < 7:
        # Skip rows without the full set of price columns.
        continue
    rows.append({
        "Date": col[0].text,
        "Open": col[1].text,
        "High": col[2].text,
        "Low": col[3].text,
        "Close": col[4].text,
        "Adj Close": col[5].text,
        "Volume": col[6].text,
    })

# "Adj Close" is listed explicitly; the original column list omitted it.
amazon_data = pd.DataFrame(rows, columns=["Date", "Open", "High", "Low", "Close", "Volume", "Adj Close"])
amazon_data

|  | Date | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|---|
| 0 | Jan 01, 2021 | 3270.00 | 3363.89 | 3086.00 | 3206.20 | 71528900 | 3206.20 |
| 1 | Dec 01, 2020 | 3188.50 | 3350.65 | 3072.82 | 3256.93 | 77567800 | 3256.93 |
| 2 | Nov 01, 2020 | 3061.74 | 3366.80 | 2950.12 | 3168.04 | 90810500 | 3168.04 |
| 3 | Oct 01, 2020 | 3208.00 | 3496.24 | 3019.00 | 3036.15 | 116242300 | 3036.15 |
| 4 | Sep 01, 2020 | 3489.58 | 3552.25 | 2871.00 | 3148.73 | 115943500 | 3148.73 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 56 | May 01, 2016 | 663.92 | 724.23 | 656.00 | 722.79 | 90614500 | 722.79 |
| 57 | Apr 01, 2016 | 590.49 | 669.98 | 585.25 | 659.59 | 78464200 | 659.59 |
| 58 | Mar 01, 2016 | 556.29 | 603.24 | 538.58 | 593.64 | 94009500 | 593.64 |
| 59 | Feb 01, 2016 | 578.15 | 581.80 | 474.00 | 552.52 | 124144800 | 552.52 |
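
The same row-by-row extraction can be tried offline. The sketch below runs the loop on a small hand-written table. The HTML string is a stand-in whose two rows borrow values from the output above, written with thousands separators as an assumption about how the page formats numbers, and `html.parser` is used instead of `html5lib` only to avoid an extra dependency.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hand-written stand-in for the page's history table.
sample_html = """
<table>
  <tbody>
    <tr><td>Jan 01, 2021</td><td>3,270.00</td><td>3,363.89</td><td>3,086.00</td>
        <td>3,206.20</td><td>3,206.20</td><td>71,528,900</td></tr>
    <tr><td>Dec 01, 2020</td><td>3,188.50</td><td>3,350.65</td><td>3,072.82</td>
        <td>3,256.93</td><td>3,256.93</td><td>77,567,800</td></tr>
  </tbody>
</table>
"""

names = ["Date", "Open", "High", "Low", "Close", "Adj Close", "Volume"]
sample_soup = BeautifulSoup(sample_html, "html.parser")

records = []
for tr in sample_soup.find("tbody").find_all("tr"):
    cells = [td.text.strip() for td in tr.find_all("td")]
    if len(cells) == len(names):  # ignore rows that are not full price rows
        records.append(dict(zip(names, cells)))

sample_data = pd.DataFrame(records, columns=names)
print(sample_data)
```

Building the list of records first and creating the DataFrame once avoids growing a DataFrame inside the loop.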


We print the first five rows of the `amazon_data` DataFrame.


In [18]:
amazon_data.head(5)

|  | Date | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|---|
| 0 | Jan 01, 2021 | 3270.0 | 3363.89 | 3086.0 | 3206.2 | 71528900 | 3206.2 |
| 1 | Dec 01, 2020 | 3188.5 | 3350.65 | 3072.82 | 3256.93 | 77567800 | 3256.93 |
| 2 | Nov 01, 2020 | 3061.74 | 3366.8 | 2950.12 | 3168.04 | 90810500 | 3168.04 |
| 3 | Oct 01, 2020 | 3208.0 | 3496.24 | 3019.0 | 3036.15 | 116242300 | 3036.15 |
| 4 | Sep 01, 2020 | 3489.58 | 3552.25 | 2871.0 | 3148.73 | 115943500 | 3148.73 |


To get the column names of the DataFrame:


In [20]:
amazon_data.columns.values

array(['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close'],
      dtype=object)

To get the stock's opening price for June 1, 2019 from the DataFrame, we write:

In [57]:
amazon_data.loc[(amazon_data.Date=='Jun 01, 2019'),'Open']

19    1,760.01
Name: Open, dtype: object
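
The result has `dtype: object` because every scraped cell is a string, here with a thousands separator (`1,760.01`). Below is a minimal sketch of one way to convert such columns to numbers and dates. It uses a one-row stand-in frame built from the lookup above rather than the scraped data, and the helper name `clean_prices` is hypothetical.

```python
import pandas as pd


def clean_prices(df):
    """Return a copy with numeric value columns and a datetime Date column."""
    out = df.copy()
    for name in out.columns.drop("Date"):
        # Remove thousands separators, then convert the strings to numbers.
        out[name] = pd.to_numeric(out[name].str.replace(",", "", regex=False))
    # Dates on the page look like "Jun 01, 2019".
    out["Date"] = pd.to_datetime(out["Date"], format="%b %d, %Y")
    return out


# Stand-in frame holding only the values shown in the lookup above.
raw = pd.DataFrame({"Date": ["Jun 01, 2019"], "Open": ["1,760.01"]})
clean = clean_prices(raw)

# After cleaning, the lookup compares against a real date and yields a float.
june_open = clean.loc[clean["Date"] == pd.Timestamp("2019-06-01"), "Open"]
print(june_open)
```

Once the columns are numeric, operations such as sorting by price or computing averages work as expected, which they would not on the raw strings.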