## Extracting and Visualizing Stock Data

### Description
Extracting essential data from a dataset and displaying it is a necessary part of data science, because it lets individuals make correct decisions based on the data. In this assignment, you will extract some stock data and then display it in a graph.

<h2>Table of Contents</h2>
<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ul>
        <li>Define a Function that Makes a Graph</li>
        <li>Question 1: Use yfinance to Extract Stock Data</li>
        <li>Question 2: Use Webscraping to Extract Tesla Revenue Data</li>
        <li>Question 3: Use yfinance to Extract Stock Data</li>
        <li>Question 4: Use Webscraping to Extract GME Revenue Data</li>
        <li>Question 5: Plot Tesla Stock Graph</li>
        <li>Question 6: Plot GameStop Stock Graph</li>
    </ul>
</div>

<hr>


In [2]:
!pip install yfinance
!pip install bs4
!pip install nbformat
!pip install --upgrade plotly



In [3]:
import yfinance as yf
import pandas as pd
import requests
from bs4 import BeautifulSoup
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio
import warnings

In [4]:
# Set default renderer for Plotly
pio.renderers.default = "iframe"

In [5]:
def make_graph(stock_data, revenue_data, stock):
    fig = make_subplots(
        rows=2,
        cols=1,
        shared_xaxes=True,
        subplot_titles=("Historical Share Price", "Historical Revenue"),
        vertical_spacing=0.3
    )

    # Filter to show data up to mid‑2021
    stock_data_specific = stock_data[stock_data.Date <= "2021-06-14"]
    revenue_data_specific = revenue_data[revenue_data.Date <= "2021-04-30"]

    # Top panel: closing share price over time
    fig.add_trace(
        go.Scatter(
            x=pd.to_datetime(stock_data_specific.Date),
            y=stock_data_specific.Close.astype("float"),
            name="Share Price"
        ),
        row=1,
        col=1
    )

    # Bottom panel: quarterly revenue over time
    fig.add_trace(
        go.Scatter(
            x=pd.to_datetime(revenue_data_specific.Date),
            y=revenue_data_specific.Revenue.astype("float"),
            name="Revenue"
        ),
        row=2,
        col=1
    )

    fig.update_xaxes(title_text="Date", row=1, col=1)
    fig.update_xaxes(title_text="Date", row=2, col=1)
    fig.update_yaxes(title_text="Price ($US)", row=1, col=1)
    fig.update_yaxes(title_text="Revenue ($US Millions)", row=2, col=1)

    fig.update_layout(
        showlegend=False,
        height=900,
        title=stock,
        xaxis_rangeslider_visible=True
    )

    fig.show()
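
The two date filters inside `make_graph` can be tried in isolation. The following is a minimal sketch on made-up data (the frames `stock` and `revenue` and their values are assumptions for illustration, not real quotes). It assumes the stock dates are timezone-aware, as in the yfinance output shown in this notebook, and that the revenue dates are ISO-formatted strings.

```python
import pandas as pd

# Made-up prices with tz-aware dates, similar in shape to yfinance output.
stock = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2021-06-11", "2021-06-14", "2021-06-15"]).tz_localize(
            "America/New_York"
        ),
        "Close": [1.0, 2.0, 3.0],
    }
)
# Made-up revenue rows whose dates are ISO-formatted strings.
revenue = pd.DataFrame(
    {"Date": ["2021-06-30", "2021-03-31", "2020-12-31"], "Revenue": [30.0, 20.0, 10.0]}
)

# An explicit, tz-aware cutoff makes the comparison unambiguous.
stock_cutoff = pd.Timestamp("2021-06-14", tz="America/New_York")
stock_specific = stock[stock["Date"] <= stock_cutoff]

# ISO "YYYY-MM-DD" strings sort chronologically, so a plain string comparison works here.
revenue_specific = revenue[revenue["Date"] <= "2021-04-30"]

print(len(stock_specific), len(revenue_specific))
```

Both filters keep two of the three sample rows. The string comparison relies on the dates being zero-padded and in year-month-day order.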

### Use yfinance to Extract Stock Data

In [7]:
# Create a Ticker object for Tesla (TSLA)
tesla = yf.Ticker("TSLA")

In [8]:
# Get historical stock data for the maximum available time
tesla_data = tesla.history(period="max")

In [9]:
# Reset the index so Date becomes a column
tesla_data.reset_index(inplace=True)

In [10]:
# Display the first five rows
tesla_data.head()

| | Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
|---|---|---|---|---|---|---|---|---|
| 0 | 2010-06-29 00:00:00-04:00 | 1.266667 | 1.666667 | 1.169333 | 1.592667 | 281494500 | 0.0 | 0.0 |
| 1 | 2010-06-30 00:00:00-04:00 | 1.719333 | 2.028 | 1.553333 | 1.588667 | 257806500 | 0.0 | 0.0 |
| 2 | 2010-07-01 00:00:00-04:00 | 1.666667 | 1.728 | 1.351333 | 1.464 | 123282000 | 0.0 | 0.0 |
| 3 | 2010-07-02 00:00:00-04:00 | 1.533333 | 1.54 | 1.247333 | 1.28 | 77097000 | 0.0 | 0.0 |
| 4 | 2010-07-06 00:00:00-04:00 | 1.333333 | 1.333333 | 1.055333 | 1.074 | 103003500 | 0.0 | 0.0 |
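
To see what the reset step does without calling the network, here is a small sketch on synthetic data. The frame `prices` and its values are made up; it only mimics the shape of a history table whose index is a timezone-aware `Date`.

```python
import pandas as pd

# Synthetic price data (not real quotes) indexed by a tz-aware DatetimeIndex.
index = pd.DatetimeIndex(
    ["2010-06-29", "2010-06-30", "2010-07-01"], tz="America/New_York", name="Date"
)
prices = pd.DataFrame({"Close": [1.59, 1.58, 1.46]}, index=index)

# Before: Date is the index, so it is not listed among the columns.
columns_before = list(prices.columns)

# reset_index() moves the index into a regular column and installs a 0..n-1 index.
prices = prices.reset_index()
columns_after = list(prices.columns)

print(columns_before)
print(columns_after)
```

After the reset, `Date` can be accessed like any other column (for example `prices.Date`), which is what `make_graph` relies on.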


### Use Webscraping to Extract Tesla Revenue Data

In [12]:
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/revenue.htm"

In [17]:
# Get the HTML
html_data = requests.get(url).text

In [21]:
len(tesla_revenue), tesla_revenue.head()

(54,
   Tesla Quarterly Revenue (Millions of US $)  \
 0                                 2022-09-30   
 1                                 2022-06-30   
 2                                 2022-03-31   
 3                                 2021-12-31   
 4                                 2021-09-30   
 
   Tesla Quarterly Revenue (Millions of US $).1  
 0                                      $21,454  
 1                                      $16,934  
 2                                      $18,756  
 3                                      $17,719  
 4                                      $13,757  )

In [23]:
from io import StringIO

tables = pd.read_html(StringIO(html_data))


In [27]:
# Tesla revenue is in the second table (index 1)
tesla_revenue = tables[1]
tesla_revenue.columns = ["Date", "Revenue"]

In [29]:
# Clean the Revenue column: remove $ and commas
tesla_revenue["Revenue"] = (
    tesla_revenue["Revenue"]
    .astype(str)
    .str.replace(",", "", regex=False)
    .str.replace("$", "", regex=False)
)

In [31]:
# Remove empty strings and NaN
tesla_revenue = tesla_revenue[tesla_revenue["Revenue"] != ""]
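# Reassign instead of using inplace=True on a filtered slice,
# which can trigger a SettingWithCopyWarning
tesla_revenue = tesla_revenue.dropna()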

In [33]:
# Check last rows
tesla_revenue.tail()

| | Date | Revenue |
|---|---|---|
| 49 | 2010-06-30 | 28.0 |
| 50 | 2010-03-31 | 21.0 |
| 51 | 2009-12-31 | |
| 52 | 2009-09-30 | 46.0 |
| 53 | 2009-06-30 | 27.0 |
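
The tail output above still lists row 51 (2009-12-31) with no revenue value. One possible way to make the cleanup stricter is to convert the column to numbers and drop whatever fails to convert. This is a sketch on made-up rows, not the notebook's own method; the names `raw` and `cleaned` are assumptions for illustration.

```python
import pandas as pd

# Made-up rows in the same shape as the scraped table; None stands in for a missing cell.
raw = pd.DataFrame(
    {
        "Date": ["2010-03-31", "2009-12-31", "2009-09-30"],
        "Revenue": ["$1,021", None, "$46"],
    }
)

cleaned = raw.copy()
# Strip "$" and "," while values are still strings; missing values stay missing.
cleaned["Revenue"] = cleaned["Revenue"].str.replace(r"[$,]", "", regex=True)
# Convert to numbers; anything that cannot be parsed becomes NaN.
cleaned["Revenue"] = pd.to_numeric(cleaned["Revenue"], errors="coerce")
# Drop rows without a usable revenue figure and renumber the index.
cleaned = cleaned.dropna(subset=["Revenue"]).reset_index(drop=True)

print(cleaned)
```

Skipping `.astype(str)` matters in this variant: casting first would turn a missing value into the text `"nan"`, which neither an empty-string filter nor `dropna()` removes.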


### Use yfinance to Extract Stock Data

In [36]:
# Create a Ticker object for GameStop (GME)
gme = yf.Ticker("GME")

In [38]:
# Get historical stock data for the maximum available time
gme_data = gme.history(period="max")

In [40]:
# Reset the index so Date becomes a column
gme_data.reset_index(inplace=True)

In [42]:
# Display the first five rows
gme_data.head()

| | Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
|---|---|---|---|---|---|---|---|---|
| 0 | 2002-02-13 00:00:00-05:00 | 1.620129 | 1.69335 | 1.603296 | 1.691667 | 76216000 | 0.0 | 0.0 |
| 1 | 2002-02-14 00:00:00-05:00 | 1.712707 | 1.716074 | 1.670626 | 1.68325 | 11021600 | 0.0 | 0.0 |
| 2 | 2002-02-15 00:00:00-05:00 | 1.68325 | 1.687458 | 1.658002 | 1.674834 | 8389600 | 0.0 | 0.0 |
| 3 | 2002-02-19 00:00:00-05:00 | 1.666418 | 1.666418 | 1.578047 | 1.607504 | 7410400 | 0.0 | 0.0 |
| 4 | 2002-02-20 00:00:00-05:00 | 1.61592 | 1.662209 | 1.603296 | 1.662209 | 6892800 | 0.0 | 0.0 |


### Use Webscraping to Extract GME Revenue Data

In [49]:
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/stock.html"

In [51]:
# Download the HTML
html_data_2 = requests.get(url).text

In [55]:
tables = pd.read_html(StringIO(html_data_2))

In [57]:
# Table with GameStop Quarterly Revenue 
gme_revenue = tables[1]
gme_revenue.columns = ["Date", "Revenue"]

In [59]:
# Remove $ and commas from Revenue
gme_revenue["Revenue"] = gme_revenue["Revenue"].replace(r"[\$,]", "", regex=True)

In [61]:
# Remove empty strings and NaN
gme_revenue = gme_revenue[gme_revenue["Revenue"] != ""]
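# Reassign instead of using inplace=True on a filtered slice,
# which can trigger a SettingWithCopyWarning
gme_revenue = gme_revenue.dropna()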

In [63]:
# Inspect last rows
gme_revenue.tail()

| | Date | Revenue |
|---|---|---|
| 57 | 2006-01-31 | 1667 |
| 58 | 2005-10-31 | 534 |
| 59 | 2005-07-31 | 416 |
| 60 | 2005-04-30 | 475 |
| 61 | 2005-01-31 | 709 |
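
The Tesla and GameStop sections strip the currency formatting in two different ways: chained `.str.replace` calls and a single regex `Series.replace`. The sketch below, on made-up values, suggests that both routes give the same cleaned strings when every cell is text; the variable names are assumptions for illustration.

```python
import pandas as pd

# Made-up revenue strings in the scraped format.
revenue = pd.Series(["$1,667", "$534", "$416"])

# Route 1: Series.replace with regex=True substitutes inside each string value.
via_replace = revenue.replace(r"[\$,]", "", regex=True)

# Route 2: the .str accessor applies the same regular expression element-wise.
via_str = revenue.str.replace(r"[\$,]", "", regex=True)

# Either result can then be cast to a numeric dtype for plotting.
as_numbers = via_replace.astype(float)

print(via_replace.tolist())
print(as_numbers.tolist())
```

The two routes differ mainly in how they treat non-string cells: `Series.replace` leaves them untouched, while `.str.replace` returns a missing value for them.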


### Plot Tesla Stock Graph

In [66]:
make_graph(tesla_data, tesla_revenue, "Tesla")




In [68]:
tesla_data.head()
tesla_revenue.head()
tesla_data.columns   
tesla_revenue.columns

Index(['Date', 'Revenue'], dtype='object')

### Plot GameStop Stock Graph

In [71]:
make_graph(gme_data, gme_revenue, 'GameStop')


