<h1>Extracting and Visualizing Stock Data</h1>
<h2>Description</h2>


Extracting essential data from a dataset and displaying it is a necessary part of data science; therefore individuals can make correct decisions based on the data. In this assignment, you will extract some stock data, you will then display this data in a graph.


In [1]:
#!pip install yfinance==0.2.38
#!pip install pandas==2.2.2
#!pip install nbformat

In [2]:
!pip install yfinance
!pip install bs4
!pip install nbformat

Collecting bs4
  Downloading bs4-0.0.2-py2.py3-none-any.whl.metadata (411 bytes)
Downloading bs4-0.0.2-py2.py3-none-any.whl (1.2 kB)
Installing collected packages: bs4
Successfully installed bs4-0.0.2


In [3]:
import yfinance as yf
import pandas as pd
import requests
from bs4 import BeautifulSoup
import plotly.graph_objects as go
from plotly.subplots import make_subplots

In [4]:
import warnings
# Ignore all warnings
warnings.filterwarnings("ignore", category=FutureWarning)

## Define Graphing Function


In this section, we define the function `make_graph`. **It takes a dataframe with stock data (dataframe must contain Date and Close columns), a dataframe with revenue data (dataframe must contain Date and Revenue columns), and the name of the stock.**


In [5]:
def make_graph(stock_data, revenue_data, stock):
    fig = make_subplots(rows=2, cols=1, shared_xaxes=True, subplot_titles=("Historical Share Price", "Historical Revenue"), vertical_spacing = .3)
    stock_data_specific = stock_data[stock_data.Date <= '2021--06-14']
    revenue_data_specific = revenue_data[revenue_data.Date <= '2021-04-30']
    fig.add_trace(go.Scatter(x=pd.to_datetime(stock_data_specific.Date), y=stock_data_specific.Close.astype("float"), name="Share Price"), row=1, col=1)
    fig.add_trace(go.Scatter(x=pd.to_datetime(revenue_data_specific.Date), y=revenue_data_specific.Revenue.astype("float"), name="Revenue"), row=2, col=1)
    fig.update_xaxes(title_text="Date", row=1, col=1)
    fig.update_xaxes(title_text="Date", row=2, col=1)
    fig.update_yaxes(title_text="Price ($US)", row=1, col=1)
    fig.update_yaxes(title_text="Revenue ($US Millions)", row=2, col=1)
    fig.update_layout(showlegend=False,
    height=900,
    title=stock,
    xaxis_rangeslider_visible=True)
    fig.show()

## 1: ```yfinance``` to Extract Stock Data

`Ticker` function has been used to extract the data of the required stock. For example, the stock is Tesla and its ticker symbol is `TSLA`.

In [6]:
tesla = yf.Ticker("TSLA")

`ticker.history` extract stock information and save it in a dataframe named `tesla_data`. Set the `period` parameter to ` "max" ` so that we get information for the maximum amount of time.


In [None]:
tesla_data = tesla.history(period="max")

YFRateLimitError: Too Many Requests. Rate limited. Try after a while.

In [None]:
# I was facing a problem to download the data from yfinance due to rate limiting, so I used a fallback to Stooq via pandas-datareader with the help of ChatGPT.
%pip install pandas-datareader
import os
import pandas as pd

CACHE_PATH = "tesla_data.csv"

def load_from_cache(path=CACHE_PATH):
    if os.path.exists(path):
        df = pd.read_csv(path, index_col=0, parse_dates=True)
        if not df.empty:
            print("Loaded TSLA from cache.")
            return df
    return None

def save_cache(df, path=CACHE_PATH):
    df.to_csv(path)
    print(f"Saved TSLA to {path}")

tesla_data = load_from_cache()
if tesla_data is None:
    try:
        import yfinance as yf
        tesla = yf.Ticker("TSLA")
        # Try ONCE — if rate-limited, bail out to fallback
        tesla_data = tesla.history(period="max", timeout=30)
        if tesla_data is None or tesla_data.empty:
            raise RuntimeError("Empty response from yfinance; using fallback.")
        print("Downloaded TSLA via yfinance.")
    except Exception as e:
        print(f"yfinance failed ({e}). Using Stooq fallback...")
        from pandas_datareader import data as pdr
        tesla_data = pdr.DataReader("TSLA", "stooq").sort_index()
        print("Downloaded TSLA via Stooq.")

    save_cache(tesla_data)

print(tesla_data.head())


Collecting pandas-datareader
  Downloading pandas_datareader-0.10.0-py3-none-any.whl.metadata (2.9 kB)
Downloading pandas_datareader-0.10.0-py3-none-any.whl (109 kB)
Installing collected packages: pandas-datareader
Successfully installed pandas-datareader-0.10.0
Note: you may need to restart the kernel to use updated packages.
yfinance failed (Too Many Requests. Rate limited. Try after a while.). Using Stooq fallback...
Downloaded TSLA via Stooq.
Saved TSLA to tesla_data.csv
                Open      High       Low     Close     Volume
Date                                                         
2020-08-10   96.5333   97.1667   92.3900   94.5700  112833960
2020-08-11   93.0667   94.6667   91.0000   91.6267  129387510
2020-08-12   98.0000  105.6670   95.6667  103.6500  328482510
2020-08-13  107.4000  110.0800  104.4830  108.0670  306379620
2020-08-14  111.0000  111.2530  108.4430  110.0470  188664210


In [19]:
tesla_data.reset_index(inplace=True)
tesla_data.head()

Unnamed: 0,index,Date,Open,High,Low,Close,Volume
0,0,2020-08-10,96.5333,97.1667,92.39,94.57,112833960
1,1,2020-08-11,93.0667,94.6667,91.0,91.6267,129387510
2,2,2020-08-12,98.0,105.667,95.6667,103.65,328482510
3,3,2020-08-13,107.4,110.08,104.483,108.067,306379620
4,4,2020-08-14,111.0,111.253,108.443,110.047,188664210


## 2: Webscraping to Extract Tesla Revenue Data

Use the `requests` library to download the webpage.

In [20]:
html_data = requests.get("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/revenue.htm")

Using `beautiful_soup` the html data has been parse (parser `html.parser`).

In [21]:
soup = BeautifulSoup(html_data.content, 'html.parser')

Using `BeautifulSoup` or the `read_html` function extract the table with `Tesla Revenue` and store it into a dataframe named `tesla_revenue`. The dataframe should have columns `Date` and `Revenue`.


```
1. Find All Tables: using `soup.find_all('table')`.
2. Identify the Relevant Table: then loops through each table. If a table contains the text “Tesla Quarterly Revenue,”, select that table.
3. Initialize a DataFrame: Pandas DataFrame called `tesla_revenue` with columns “Date” and “Revenue.”
4. Loop Through Rows: For each row in the relevant table, extract the data from the first and second columns (date and revenue).
5. Clean Revenue Data: Remove dollar signs and commas from the revenue value.
6. Add Rows to DataFrame: Create a new row in the DataFrame with the extracted date and cleaned revenue values.
7. Repeat for All Rows: Continue this process for all rows in the table.
```

In [22]:
revenue_table = None
for table in soup.find_all('table'):
    if "Tesla Quarterly Revenue" in table.text:
        revenue_table = table
        break
tesla_revenue = pd.DataFrame(columns=["Date", "Revenue"])

for row in revenue_table.find("tbody").find_all("tr"):
    col = row.find_all("td")
    date = col[0].text
    revenue = col[1].text
    
    tesla_revenue = pd.concat([tesla_revenue, pd.DataFrame({"Date": [date], "Revenue": [revenue]})], ignore_index=True)

In [23]:
tesla_revenue["Revenue"] = tesla_revenue['Revenue'].str.replace(',|\$',"", regex=True)

In [24]:
tesla_revenue.dropna(inplace=True)
tesla_revenue = tesla_revenue[tesla_revenue['Revenue'] != ""]

In [25]:
tesla_revenue.tail(5)

Unnamed: 0,Date,Revenue
48,2010-09-30,31
49,2010-06-30,28
50,2010-03-31,21
52,2009-09-30,46
53,2009-06-30,27


## 3: Use ```yfinance``` to Extract Stock Data

In [26]:
gamestop = yf.Ticker("GME")

In [27]:
gme_data = gamestop.history(period='max')

YFRateLimitError: Too Many Requests. Rate limited. Try after a while.

In [28]:
# I was facing a problem to download the data from yfinance due to rate limiting, so I used a fallback to Stooq via pandas-datareader with the help of ChatGPT.

CACHE_PATH = "gme_data.csv"

def load_from_cache(path=CACHE_PATH):
    if os.path.exists(path):
        df = pd.read_csv(path, index_col=0, parse_dates=True)
        if not df.empty:
            print("Loaded GME from cache.")
            return df
    return None

def save_cache(df, path=CACHE_PATH):
    df.to_csv(path)
    print(f"Saved GME to {path}")

gme_data = load_from_cache()
if gme_data is None:
    try:
        import yfinance as yf
        gme = yf.Ticker("GME")
        # Try ONCE — if rate-limited, bail out to fallback
        gme_data = tesla.history(period="max", timeout=30)
        if gme_data is None or gme_data.empty:
            raise RuntimeError("Empty response from yfinance; using fallback.")
        print("Downloaded TSLA via yfinance.")
    except Exception as e:
        print(f"yfinance failed ({e}). Using Stooq fallback...")
        from pandas_datareader import data as pdr
        gme_data = pdr.DataReader("TSLA", "stooq").sort_index()
        print("Downloaded TSLA via Stooq.")

    save_cache(gme_data)

print(gme_data.head())

yfinance failed (Too Many Requests. Rate limited. Try after a while.). Using Stooq fallback...
Downloaded TSLA via Stooq.
Saved GME to gme_data.csv
                Open      High       Low     Close     Volume
Date                                                         
2020-08-10   96.5333   97.1667   92.3900   94.5700  112833960
2020-08-11   93.0667   94.6667   91.0000   91.6267  129387510
2020-08-12   98.0000  105.6670   95.6667  103.6500  328482510
2020-08-13  107.4000  110.0800  104.4830  108.0670  306379620
2020-08-14  111.0000  111.2530  108.4430  110.0470  188664210


In [29]:
gme_data.reset_index(inplace=True)
gme_data.head()

Unnamed: 0,Date,Open,High,Low,Close,Volume
0,2020-08-10,96.5333,97.1667,92.39,94.57,112833960
1,2020-08-11,93.0667,94.6667,91.0,91.6267,129387510
2,2020-08-12,98.0,105.667,95.6667,103.65,328482510
3,2020-08-13,107.4,110.08,104.483,108.067,306379620
4,2020-08-14,111.0,111.253,108.443,110.047,188664210


## 4: Use Webscraping to Extract GME Revenue Data

In [30]:
html_data_2 = requests.get("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/stock.html")

In [31]:
soup = BeautifulSoup(html_data_2.content, 'html.parser')

In [32]:
gme_revenue_table = None
for table in soup.find_all('table'):
    if "GameStop Quarterly Revenue" in table.text:
        gme_revenue_table =  table
        break

gme_revenue = pd.DataFrame(columns=["Date", "Revenue"])

for row in gme_revenue_table.find('tbody').find_all('tr'):
    col = row.find_all('td')
    date = col[0].text
    revenue = col[1].text
    
    gme_revenue = pd.concat([gme_revenue, pd.DataFrame({"Date": [date], "Revenue": [revenue]})], ignore_index=True)

gme_revenue["Revenue"] = gme_revenue['Revenue'].str.replace(',|\$',"", regex=True)

In [33]:
gme_revenue.tail(5)

Unnamed: 0,Date,Revenue
57,2006-01-31,1667
58,2005-10-31,534
59,2005-07-31,416
60,2005-04-30,475
61,2005-01-31,709


## 5: Plot Tesla Stock Graph


In [34]:
make_graph(tesla_data, tesla_revenue, 'Tesla')

## 6: Plot GameStop Stock Graph


In [35]:
make_graph(gme_data, gme_revenue, 'GameStop')