# Extracting and Visualizing Stock Data

### Description

Extracting the relevant data from a dataset and displaying it is a core part of data science: it lets people make decisions based on data. This notebook shows how to extract financial data in two different ways:
- the *yfinance* library to retrieve data from Yahoo Finance
- web scraping with the *BeautifulSoup* library

In [None]:
#Install the required libraries
!pip install yfinance
!pip install pandas
!pip install requests
!pip install bs4
!pip install plotly

In [None]:
#Import the libraries you need
import yfinance as yf 
import pandas as pd
import requests
from bs4 import BeautifulSoup
import plotly.graph_objects as go
from plotly.subplots import make_subplots

### Extracting Stock Data using yfinance

yfinance was created to help the programs and users that relied on the historical **Yahoo Finance API** after it was shut down. It lets users download market data with Python, and it offers features that make it convenient for stock data analysis. Beyond **stock price data**, yfinance also gives access to **the financial data of a company** since its listing on the stock market. It is easy to use and widely used for financial data analysis.

The `Ticker` object gives access to the data of a ticker symbol from Python. See all features and attributes at [yfinance](https://pypi.org/project/yfinance/).

In [None]:
tesla = yf.Ticker('TSLA')
gme = yf.Ticker('GME')

Call the `.history()` method on your ticker object to extract stock information and save it in a DataFrame. Set the time range with the *period* parameter, or with *start* and *end*.

In [None]:
tesla_data = tesla.history(period = 'max')
gme_data = gme.history(start = '2002-02-13', end = '2021-06-14')

In [None]:
#Display the first rows of your DataFrames
print(tesla_data.head())
print(gme_data.head())

In [None]:
#Reset the index of your DataFrames
tesla_data.reset_index(inplace = True)
gme_data.reset_index(inplace = True)

### Webscraping to extract Revenue Data with BeautifulSoup

BeautifulSoup is a Python library for pulling data out of HTML and XML files. A BeautifulSoup object exposes many useful attributes and methods, such as `.prettify()`, `.title`, `.find_all()` ...
See all features at [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/).
In this part, we use BeautifulSoup to extract revenue data from the *macrotrends.net* website.
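
As a quick illustration of the attributes mentioned above, here is a minimal sketch on a tiny hand-written HTML string. The HTML content and the variable names (`html`, `page_title`, `paragraphs`, `intro`) are made up for this example and are not taken from the macrotrends pages.

```python
from bs4 import BeautifulSoup

# Tiny hand-written HTML document, used only for illustration
html = """
<html>
  <head><title>Demo page</title></head>
  <body>
    <p class="intro">First paragraph</p>
    <p>Second paragraph</p>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# .title gives the <title> tag, .text its content
page_title = soup.title.text
# .find_all() returns every matching tag
paragraphs = [p.text for p in soup.find_all("p")]
# .find() returns only the first match; class_ filters on the CSS class
intro = soup.find("p", class_="intro").text

print(page_title)
print(paragraphs)
print(intro)
# .prettify() returns the parsed tree as an indented string
print(soup.prettify())
```

`.find_all()` is the method the rest of this notebook relies on to locate the revenue tables.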

Use the requests library to download the webpages:
- https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue
- https://www.macrotrends.net/stocks/charts/GME/gamestop/revenue

In [None]:
url_tesla = 'https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue'
url_gme = 'https://www.macrotrends.net/stocks/charts/GME/gamestop/revenue'

#The requests library collects the HTML Data
tesla_html_data = requests.get(url_tesla)
gme_html_data = requests.get(url_gme)

In [None]:
# If you want to use another parser for the following part, you might need to download it
# !pip install html5lib

In [None]:
#Parse the html data (.text to use the content) using BeautifulSoup
tesla_soup = BeautifulSoup(tesla_html_data.text, 'html.parser')
gme_soup = BeautifulSoup(gme_html_data.text, 'html.parser')

We want to retrieve the "Quarterly Revenue" table from the webpages and store it in a DataFrame.\
(Basic knowledge of HTML is required)
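
Before working on the real pages, the extraction logic can be tried on a small hand-written table. This is only a sketch: the helper name `table_to_frame`, the sample HTML, and its values are hypothetical, and the layout (a header cell in `<thead>`, data rows in `<tbody>`) is an assumption about how such a table is structured.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hand-written table that mimics the assumed layout; the values are made up
sample_html = """
<table>
  <thead><tr><th>Example Quarterly Revenue(Millions of US $)</th></tr></thead>
  <tbody>
    <tr><td>2021-03-31</td><td>$1,000</td></tr>
    <tr><td>2020-12-31</td><td>$2,500</td></tr>
    <tr><td>2020-09-30</td><td></td></tr>
  </tbody>
</table>
"""

def table_to_frame(soup, header_text):
    """Return the rows of the first table whose first header cell equals header_text."""
    for table in soup.find_all("table"):
        header = table.find("th")
        if header is not None and header.text.strip() == header_text:
            rows = []
            for tr in table.find("tbody").find_all("tr"):
                cells = tr.find_all("td")
                # Strip the $ sign and the thousands separator
                revenue = cells[1].text.replace("$", "").replace(",", "")
                rows.append({"Date": cells[0].text, "Revenue": revenue})
            return pd.DataFrame(rows, columns=["Date", "Revenue"])
    raise ValueError(f"No table with header {header_text!r} found")

sample_soup = BeautifulSoup(sample_html, "html.parser")
sample_revenue = table_to_frame(sample_soup, "Example Quarterly Revenue(Millions of US $)")
print(sample_revenue)
```

Collecting the rows in a list and building the DataFrame once avoids growing a DataFrame row by row, and raising an error when no table matches makes a changed page layout visible immediately.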

In [None]:
#Search for the "Quarterly Revenue" table in the soup object
tesla_qr = None
for table in tesla_soup.find_all("table"):
    header = table.find("th")
    if header is not None and header.text == "Tesla Quarterly Revenue(Millions of US $)":
        tesla_qr = table

#Collect all the rows of the table in a list
#(DataFrame.append was removed in pandas 2.0)
rows = []
for row in tesla_qr.find('tbody').find_all('tr'):
    col = row.find_all('td')
    date = col[0].text
    revenue = col[1].text

    #Remove the $ sign and the comma from the values
    rows.append({"Date" : date, "Revenue" : revenue.replace('$', '').replace(',', '')})

#Create a DataFrame to store the values
tesla_revenue = pd.DataFrame(rows, columns = ["Date", "Revenue"])

tesla_revenue

Some values appear to be missing on rows 45 and 48. We remove those rows from the DataFrame.

In [None]:
#Collect the index of the rows with an empty Revenue value
index_to_drop = []
for i in range(tesla_revenue.shape[0]):
    if tesla_revenue['Revenue'][i] == '':
        index_to_drop.append(i)

tesla_revenue = tesla_revenue.drop(index_to_drop)

#Alternative version (much more efficient but less intuitive)
tesla_revenue = tesla_revenue[tesla_revenue['Revenue'] != ""]

#Reset the index (drop = True discards the old index instead of keeping it as a column)
tesla_revenue.reset_index(drop = True, inplace = True)
#Display the last rows of the DataFrame
tesla_revenue.tail()
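
The cleaning steps above (stripping the `$` sign and the commas, then dropping the empty rows) can also be written with vectorized pandas operations. The sketch below uses a small made-up DataFrame named `raw`, not the scraped data, and converts the column to numbers with `pd.to_numeric`, so that empty strings become `NaN` and can be removed with `dropna`.

```python
import pandas as pd

# Made-up values in the same raw format as the scraped table
raw = pd.DataFrame({
    "Date": ["2021-03-31", "2020-12-31", "2020-09-30"],
    "Revenue": ["$1,000", "", "$2,500"],
})

cleaned = raw.copy()
# Remove every $ sign and comma in one pass
cleaned["Revenue"] = cleaned["Revenue"].str.replace(r"[$,]", "", regex=True)
# Convert to numbers; values that cannot be parsed (here the empty string) become NaN
cleaned["Revenue"] = pd.to_numeric(cleaned["Revenue"], errors="coerce")
# Drop the rows without a revenue value and renumber the index
cleaned = cleaned.dropna(subset=["Revenue"]).reset_index(drop=True)

print(cleaned)
```

A side effect of this variant is that `Revenue` is already numeric, whereas the notebook keeps it as text and converts it later, at plotting time.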

In [None]:
#Repeat the process for GME
gme_qr = None
for table in gme_soup.find_all('table'):
    header = table.find('th')
    if header is not None and header.text == "GameStop Quarterly Revenue(Millions of US $)":
        gme_qr = table

#Collect the rows in a list (DataFrame.append was removed in pandas 2.0)
rows = []
for row in gme_qr.find('tbody').find_all('tr'):
    col = row.find_all('td')
    date = col[0].text
    revenue = col[1].text

    #Remove the $ sign and the comma from the values
    rows.append({"Date" : date, "Revenue" : revenue.replace('$', '').replace(',', '')})

gme_revenue = pd.DataFrame(rows, columns = ["Date", "Revenue"])

#Change the displaying options of pandas to see the full DataFrame
pd.set_option("display.max_rows", None, "display.max_columns", None)
gme_revenue

It seems like there are no missing values. 
We are done extracting the Data.

### Visualizing Stock Data

We define a simple `plot_graph` function to visualize our data. It takes two DataFrames as inputs, `stock_data` and `revenue_data`, as well as the name of the stock, which is used as the figure title.

In [None]:
def plot_graph(stock_data, revenue_data, stock):
    fig = make_subplots(rows=2, cols=1, shared_xaxes=True,
                        subplot_titles=("Historical Share Price", "Historical Revenue"),
                        vertical_spacing=.3)
    #infer_datetime_format is deprecated since pandas 2.0, so it is not passed here
    fig.add_trace(go.Scatter(x=pd.to_datetime(stock_data.Date),
                             y=stock_data.Close.astype("float"),
                             name="Share Price"), row=1, col=1)
    fig.add_trace(go.Scatter(x=pd.to_datetime(revenue_data.Date),
                             y=revenue_data.Revenue.astype("float"),
                             name="Revenue"), row=2, col=1)
    fig.update_xaxes(title_text="Date", row=1, col=1)
    fig.update_xaxes(title_text="Date", row=2, col=1)
    fig.update_yaxes(title_text="Price ($US)", row=1, col=1)
    fig.update_yaxes(title_text="Revenue ($US Millions)", row=2, col=1)
    fig.update_layout(showlegend=False,
                      height=900,
                      title=stock,
                      xaxis_rangeslider_visible=True)
    fig.show()

In [None]:
#Plot the Tesla Stock and Revenue graphs
plot_graph(tesla_data, tesla_revenue, 'Tesla Stock Data')

In [None]:
#Plot the GME Stock and Revenue graphs
plot_graph(gme_data, gme_revenue, 'GME Stock Data')

#### Change Log

| Date (YYYY-MM-DD) | Version | Changed By    | Change Description        |
| ----------------- | ------- | ------------- | ------------------------- |
| 2021-06-15        | 1.0     | Louis Sichel-Dulong | Added to Github       |
