<p style="text-align:center">
    <a href="https://skills.network/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkPY0220ENSkillsNetwork900-2022-01-01" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="200" alt="Skills Network Logo">
    </a>
</p>


<h1>Extracting and Visualizing Stock Data</h1>
<h2>Description</h2>


Extracting essential data from a dataset and displaying it is a necessary part of data science, because it lets individuals make sound decisions based on the data. In this assignment, you will extract some stock data and then display it in a graph.


<h2>Table of Contents</h2>
<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ul>
        <li>Define a Function that Makes a Graph</li>
        <li>Question 1: Use yfinance to Extract Stock Data</li>
        <li>Question 2: Use Webscraping to Extract Tesla Revenue Data</li>
        <li>Question 3: Use yfinance to Extract Stock Data</li>
        <li>Question 4: Use Webscraping to Extract GME Revenue Data</li>
        <li>Question 5: Plot Tesla Stock Graph</li>
        <li>Question 6: Plot GameStop Stock Graph</li>
    </ul>
<p>
    Estimated Time Needed: <strong>30 min</strong></p>
</div>

<hr>


***Note***: If you are working locally using Anaconda, please uncomment the following code and execute it.


In [None]:
#!pip install yfinance==0.2.38
#!pip install pandas==2.2.2
#!pip install nbformat

In [None]:
!pip install yfinance==0.1.67
!mamba install bs4==4.10.0 -y
!pip install nbformat==4.2.0

In [None]:
import yfinance as yf
import pandas as pd
import requests
from bs4 import BeautifulSoup
import plotly.graph_objects as go
from plotly.subplots import make_subplots

In Python, you can ignore warnings using the warnings module. You can use the filterwarnings function to filter or ignore specific warning messages or categories.


In [None]:
import warnings
# Ignore warnings of the FutureWarning category only
warnings.filterwarnings("ignore", category=FutureWarning)
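
To make the effect of the category filter concrete, here is a minimal sketch using only the standard library. The functions `old_api` and `noisy_api` are hypothetical stand-ins for library calls that emit warnings; they are not part of this lab.

```python
import warnings

def old_api():
    # Hypothetical function that emits a FutureWarning
    warnings.warn("this behavior will change", FutureWarning)
    return "done"

def noisy_api():
    # Hypothetical function that emits a UserWarning
    warnings.warn("check your input", UserWarning)
    return "done"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")                            # start by recording every warning
    warnings.filterwarnings("ignore", category=FutureWarning)  # then hide only FutureWarning
    old_api()
    noisy_api()

# Only the UserWarning gets through the filter
caught_categories = [w.category.__name__ for w in caught]
print(caught_categories)
```

The filter is selective: warnings of other categories are still reported.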

## Define Graphing Function


In this section, we define the function `make_graph`. **You don't have to know how the function works; you only need to care about the inputs. It takes a dataframe with stock data (which must contain Date and Close columns), a dataframe with revenue data (which must contain Date and Revenue columns), and the name of the stock.**


In [None]:
def make_graph(stock_data, revenue_data, stock):
    fig = make_subplots(rows=2, cols=1, shared_xaxes=True, subplot_titles=("Historical Share Price", "Historical Revenue"), vertical_spacing = .3)
    stock_data_specific = stock_data[stock_data.Date <= '2021-06-14']
    revenue_data_specific = revenue_data[revenue_data.Date <= '2021-04-30']
    fig.add_trace(go.Scatter(x=pd.to_datetime(stock_data_specific.Date, infer_datetime_format=True), y=stock_data_specific.Close.astype("float"), name="Share Price"), row=1, col=1)
    fig.add_trace(go.Scatter(x=pd.to_datetime(revenue_data_specific.Date, infer_datetime_format=True), y=revenue_data_specific.Revenue.astype("float"), name="Revenue"), row=2, col=1)
    fig.update_xaxes(title_text="Date", row=1, col=1)
    fig.update_xaxes(title_text="Date", row=2, col=1)
    fig.update_yaxes(title_text="Price ($US)", row=1, col=1)
    fig.update_yaxes(title_text="Revenue ($US Millions)", row=2, col=1)
    fig.update_layout(showlegend=False,
    height=900,
    title=stock,
    xaxis_rangeslider_visible=True)
    fig.show()

Use the make_graph function that we’ve already defined. You’ll need to invoke it in questions 5 and 6 to display the graphs and create the dashboard.
> **Note: You don’t need to redefine the function for plotting graphs anywhere else in this notebook; just use the existing function.**
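
As a rough illustration of the inputs `make_graph` expects, the sketch below builds two small dataframes with the required columns and applies the same date cut-offs the function uses. The names `toy_stock` and `toy_revenue` and all values are hypothetical; the real inputs come from the questions that follow.

```python
import pandas as pd

# Hypothetical stock data: must contain Date and Close columns
toy_stock = pd.DataFrame({
    "Date": pd.to_datetime(["2021-06-01", "2021-06-14", "2021-07-01"]),
    "Close": [100.0, 105.0, 110.0],
})

# Hypothetical revenue data: must contain Date and Revenue columns
toy_revenue = pd.DataFrame({
    "Date": ["2021-03-31", "2021-06-30"],
    "Revenue": ["1234", "5678"],
})

# The same cut-offs make_graph applies before plotting
stock_cut = toy_stock[toy_stock.Date <= "2021-06-14"]
revenue_cut = toy_revenue[toy_revenue.Date <= "2021-04-30"]

print(len(stock_cut), len(revenue_cut))
# Revenue must be convertible to float, so it cannot contain "$" or ","
print(revenue_cut.Revenue.astype("float").tolist())
```

Rows dated after the cut-offs are dropped, which is why the graphs stop in mid-2021 even when the downloaded data is more recent.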


## Question 1: Use yfinance to Extract Stock Data


Using the `Ticker` function, enter the ticker symbol of the stock we want to extract data on to create a ticker object. The stock is Tesla and its ticker symbol is `TSLA`.


Using the ticker object and the function `history` extract stock information and save it in a dataframe named `tesla_data`. Set the `period` parameter to ` "max" ` so we get information for the maximum amount of time.


**Reset the index** using the `reset_index(inplace=True)` function on the tesla_data DataFrame and display the first five rows of the `tesla_data` dataframe using the `head` function. Take a screenshot of the results and code from the beginning of Question 1 to the results below.


In [None]:

import yfinance as yf


tesla_ticker = yf.Ticker("TSLA")


tesla_data = tesla_ticker.history(period="max")


tesla_data.reset_index(inplace=True)

print(tesla_data.head())
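
The effect of `reset_index(inplace=True)` can be seen on a small stand-in frame. This sketch assumes, as the reset step implies, that the dates start out in the index rather than in a column; `toy_history` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for a price history: dates in the index, prices in a column
toy_history = pd.DataFrame(
    {"Close": [1.5, 1.6]},
    index=pd.DatetimeIndex(["2010-06-29", "2010-06-30"], name="Date"),
)

columns_before = list(toy_history.columns)
toy_history.reset_index(inplace=True)   # moves the Date index into a regular column
columns_after = list(toy_history.columns)

print(columns_before)
print(columns_after)
```

After the reset, `Date` is an ordinary column, which is what `make_graph` needs when it accesses `stock_data.Date`.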


## Question 2: Use Webscraping to Extract Tesla Revenue Data


Use the `requests` library to download the webpage https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/revenue.htm Save the text of the response as a variable named `html_data`.


Parse the HTML data using `BeautifulSoup` with a parser such as `html5lib` or `html.parser`.


Using `BeautifulSoup` or the `read_html` function extract the table with `Tesla Revenue` and store it into a dataframe named `tesla_revenue`. The dataframe should have columns `Date` and `Revenue`.


<details><summary>Step-by-step instructions</summary>

```

Here are the step-by-step instructions:

1. Create an Empty DataFrame
2. Find the Relevant Table
3. Check for the Tesla Quarterly Revenue Table
4. Iterate Through Rows in the Table Body
5. Extract Data from Columns
6. Append Data to the DataFrame

```
</details>


<details><summary>Click here if you need help locating the table</summary>

```
    
Below is the code to isolate the table, you will now need to loop through the rows and columns like in the previous lab
    
soup.find_all("tbody")[1]
    
If you want to use the read_html function the table is located at index 1

We are focusing on quarterly revenue in the lab.
```

</details>
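
The step-by-step instructions above can be sketched on a miniature page. Everything here is hypothetical (the HTML in `sample_html`, the figures, and the names `soup_sample` and `sample_revenue`); it only shows the shape of the loop, not the contents of the real page. Rows are collected in a list and converted to a dataframe at the end, which avoids appending to a dataframe inside the loop.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical page with two tables; the second <tbody> holds the quarterly figures
sample_html = """
<table><tbody>
  <tr><td>2021</td><td>$9,999</td></tr>
</tbody></table>
<table><tbody>
  <tr><td>2021-03-31</td><td>$1,234</td></tr>
  <tr><td>2020-12-31</td><td>$5,678</td></tr>
</tbody></table>
"""

soup_sample = BeautifulSoup(sample_html, "html.parser")

rows = []
# Pick the second table body, then walk its rows and cells
for tr in soup_sample.find_all("tbody")[1].find_all("tr"):
    cells = tr.find_all("td")
    rows.append({
        "Date": cells[0].get_text(strip=True),
        "Revenue": cells[1].get_text(strip=True),
    })

sample_revenue = pd.DataFrame(rows, columns=["Date", "Revenue"])
print(sample_revenue)
```

The extracted values still contain `$` and `,` at this point; they are cleaned in a later step.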


In [None]:
# Import required libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Step 1: Download the webpage
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/revenue.htm"
response = requests.get(url)

# Step 2: Save the text of the response
html_data = response.text

# Step 3: Parse the HTML data using BeautifulSoup
soup = BeautifulSoup(html_data, "html.parser")

# Step 4: Extract the Tesla Revenue table
from io import StringIO

# read_html returns a list of all tables on the page
tables = pd.read_html(StringIO(html_data))

# The quarterly revenue table is located at index 1 (see the hint above)
tesla_revenue = tables[1]

# Ensure the dataframe has columns 'Date' and 'Revenue'
tesla_revenue.columns = ["Date", "Revenue"]

# Display the first few rows of the dataframe
print(tesla_revenue.head())


Execute the following line to remove the comma and dollar sign from the `Revenue` column.


In [None]:
tesla_revenue["Revenue"] = tesla_revenue['Revenue'].str.replace(r',|\$', "", regex=True)
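
To see what the pattern matches, it can be tried on plain strings with the standard `re` module. This is a small sketch with hypothetical sample values; the pattern reads as "a comma, or a literal dollar sign".

```python
import re

# Hypothetical raw values as they might appear in a scraped table
raw_values = ["$1,234", "$567", "89"]

# Remove every comma and every dollar sign
cleaned = [re.sub(r",|\$", "", value) for value in raw_values]
print(cleaned)

# Once cleaned, the strings convert to numbers without errors
as_numbers = [float(value) for value in cleaned]
print(as_numbers)
```

The backslash before `$` matters: an unescaped `$` means "end of string" in a regular expression, so it would not remove the dollar sign.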

Execute the following lines to remove any null values or empty strings in the Revenue column.


In [None]:
tesla_revenue.dropna(inplace=True)

tesla_revenue = tesla_revenue[tesla_revenue['Revenue'] != ""]
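
The two lines handle different cases, which a small sketch can show. The frame `toy` and its values are hypothetical: one row has a missing value and one has an empty string.

```python
import pandas as pd

# Hypothetical frame with one valid row, one missing value, and one empty string
toy = pd.DataFrame({
    "Date": ["2021-03-31", "2020-12-31", "2020-09-30"],
    "Revenue": ["100", None, ""],
})

toy.dropna(inplace=True)         # drops the row whose Revenue is missing
rows_after_dropna = len(toy)

toy = toy[toy["Revenue"] != ""]  # drops the row whose Revenue is an empty string
print(rows_after_dropna, len(toy))
```

`dropna` does not treat an empty string as missing, so the second filter is still needed.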

Display the last 5 rows of the `tesla_revenue` dataframe using the `tail` function. Take a screenshot of the results.


In [1]:
# Remove commas and dollar signs from the Revenue column
tesla_revenue["Revenue"] = tesla_revenue['Revenue'].str.replace(r',|\$', "", regex=True)

# Drop rows with null values in the Revenue column
tesla_revenue.dropna(inplace=True)

# Remove rows with empty strings in the Revenue column
tesla_revenue = tesla_revenue[tesla_revenue['Revenue'] != ""]

# Display the last 5 rows of the dataframe
print(tesla_revenue.tail())



## Question 3: Use yfinance to Extract Stock Data


Using the `Ticker` function, enter the ticker symbol of the stock we want to extract data on to create a ticker object. The stock is GameStop and its ticker symbol is `GME`.


In [None]:
# Import the required library
import yfinance as yf

# Create a Ticker object for GameStop
gamestop_ticker = yf.Ticker("GME")

# Display the Ticker object to confirm
print(gamestop_ticker)


Using the ticker object and the function `history` extract stock information and save it in a dataframe named `gme_data`. Set the `period` parameter to ` "max" ` so we get information for the maximum amount of time.


In [None]:
# Extract GameStop stock data for the maximum available period
gme_data = gamestop_ticker.history(period="max")

# Display the first few rows of the dataframe to confirm
print(gme_data.head())


**Reset the index** using the `reset_index(inplace=True)` function on the gme_data DataFrame and display the first five rows of the `gme_data` dataframe using the `head` function. Take a screenshot of the results and code from the beginning of Question 3 to the results below.


In [None]:
# Step 1: Create a Ticker object for GameStop
import yfinance as yf
gamestop_ticker = yf.Ticker("GME")

# Step 2: Extract GameStop stock data for the maximum available period
gme_data = gamestop_ticker.history(period="max")

# Step 3: Reset the index
gme_data.reset_index(inplace=True)

# Step 4: Display the first five rows of the dataframe
print(gme_data.head())


## Question 4: Use Webscraping to Extract GME Revenue Data


Use the `requests` library to download the webpage https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/stock.html. Save the text of the response as a variable named `html_data_2`.


In [None]:

# Import the requests library
import requests

# Define the URL of the webpage
url_2 = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/stock.html"

# Download the webpage
response_2 = requests.get(url_2)

# Save the text of the response as html_data_2
html_data_2 = response_2.text

# Display the first 500 characters of html_data_2 to confirm
print(html_data_2[:500])


Parse the HTML data using `BeautifulSoup` with a parser such as `html5lib` or `html.parser`.


In [None]:
# Import the BeautifulSoup library
from bs4 import BeautifulSoup

# Parse the HTML data using BeautifulSoup with the 'html.parser'
soup_2 = BeautifulSoup(html_data_2, "html.parser")

# Optionally, display the first few elements of the parsed HTML to confirm
print(soup_2.prettify()[:500])


Using `BeautifulSoup` or the `read_html` function, extract the table with `GameStop Revenue` and store it in a dataframe named `gme_revenue`. The dataframe should have columns `Date` and `Revenue`. Make sure the commas and dollar signs are removed from the `Revenue` column.


> **Note: Use the method similar to what you did in question 2.**  


<details><summary>Click here if you need help locating the table</summary>

```
    
Below is the code to isolate the table, you will now need to loop through the rows and columns like in the previous lab
    
soup.find_all("tbody")[1]
    
If you want to use the read_html function the table is located at index 1


```

</details>


In [None]:
# Step 1: Extract all tables from the parsed HTML using read_html
import pandas as pd
from io import StringIO

tables = pd.read_html(StringIO(str(soup_2)))

# Step 2: Select the GameStop revenue table, located at index 1 (see the hint above)
gme_revenue = tables[1]

# Step 3: Rename the columns to 'Date' and 'Revenue'
gme_revenue.columns = ["Date", "Revenue"]

# Step 4: Remove commas and dollar signs from the Revenue column
gme_revenue["Revenue"] = gme_revenue["Revenue"].str.replace(r",|\$", "", regex=True)

# Step 5: Display the first few rows to confirm
print(gme_revenue.head())


Display the last five rows of the `gme_revenue` dataframe using the `tail` function. Take a screenshot of the results.


In [None]:
# Display the last 5 rows of the gme_revenue dataframe
print(gme_revenue.tail())



## Question 5: Plot Tesla Stock Graph


Use the `make_graph` function to graph the Tesla Stock Data, and provide a title for the graph. Note that the graph will only show data up to June 2021.


<details><summary>Hint</summary>

```

You just need to invoke the make_graph function with the required parameters to print the graphs. The structure to call the `make_graph` function is `make_graph(tesla_data, tesla_revenue, 'Tesla')`.

```
    
</details>


In [None]:
# Plot Tesla share price and revenue using the make_graph function defined earlier.
# The function itself limits the data shown to June 2021.
make_graph(tesla_data, tesla_revenue, 'Tesla')


## Question 6: Plot GameStop Stock Graph


Use the `make_graph` function to graph the GameStop Stock Data, and provide a title for the graph. The structure to call the `make_graph` function is `make_graph(gme_data, gme_revenue, 'GameStop')`. Note that the graph will only show data up to June 2021.


<details><summary>Hint</summary>

```

You just need to invoke the make_graph function with the required parameters to print the graphs. The structure to call the `make_graph` function is `make_graph(gme_data, gme_revenue, 'GameStop')`.

```
    
</details>


In [None]:
# Assuming the make_graph function is defined, we call it with the required arguments.
make_graph(gme_data, gme_revenue, 'GameStop')


# Exercise: use webscraping to extract stock data


Use the `requests` library to download the webpage [https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/amazon_data_webpage.html](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/amazon_data_webpage.html). Save the text of the response as a variable named `html_data`.


In [None]:
# Import the requests library
import requests

# Define the URL of the webpage
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/amazon_data_webpage.html"

# Download the webpage
response = requests.get(url)

# Save the text of the response as html_data
html_data = response.text

# Display the first 500 characters of html_data to confirm
print(html_data[:500])


Parse the HTML data using `BeautifulSoup`.


In [None]:
# Import the BeautifulSoup library
from bs4 import BeautifulSoup

# Parse the HTML data using BeautifulSoup with the 'html.parser'
soup = BeautifulSoup(html_data, "html.parser")

# Optionally, display the first 500 characters of the parsed HTML to confirm
print(soup.prettify()[:500])


<b>Question 1:</b> What is the content of the title attribute?


In [None]:
# Find the <title> element of the page
title_attribute = soup.find('title')

# Display the content of the title element
if title_attribute:
    print(title_attribute.get_text())
else:
    print("Title not found.")
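
The word "title" can refer to two different things in HTML, so a short sketch may help. The page in `sample_page` is hypothetical: it has a `<title>` element in the head and a separate `title` attribute on a link.

```python
from bs4 import BeautifulSoup

# Hypothetical page with both a <title> element and a title attribute
sample_page = (
    '<html><head><title>Sample Page</title></head>'
    '<body><a href="#" title="tooltip text">link</a></body></html>'
)
sample_soup = BeautifulSoup(sample_page, "html.parser")

# Text inside the <title> element, which is what the cell above prints
title_text = sample_soup.find("title").get_text()

# Value of a title attribute on a tag, read with .get()
link_title = sample_soup.find("a").get("title")

print(title_text)
print(link_title)
```

The cell above reads the `<title>` element, which is the page title shown in a browser tab.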


Using BeautifulSoup, extract the table with historical share prices and store it into a data frame named `amazon_data`. The data frame should have columns Date, Open, High, Low, Close, Adj Close, and Volume. Fill in each variable with the correct data from the list `col`.


In [None]:
amazon_data = pd.DataFrame(columns=["Date", "Open", "High", "Low", "Close", "Adj Close", "Volume"])

for row in soup.find("tbody").find_all("tr"):
    col = row.find_all("td")
    date = #ADD_CODE
    Open = #ADD_CODE
    high = #ADD_CODE
    low = #ADD_CODE
    close = #ADD_CODE
    adj_close = #ADD_CODE
    volume = #ADD_CODE

    # DataFrame.append was removed in pandas 2.0, so add the row with pd.concat
    amazon_data = pd.concat([amazon_data, pd.DataFrame([{"Date":date, "Open":Open, "High":high, "Low":low, "Close":close, "Adj Close":adj_close, "Volume":volume}])], ignore_index=True)

Print out the first five rows of the `amazon_data` data frame you created.


In [None]:
# Import pandas
import pandas as pd

# Collect one dictionary per table row (DataFrame.append was removed in pandas 2.0)
rows = []

# Iterate through each row in the table
for row in soup.find("tbody").find_all("tr"):
    col = row.find_all("td")

    # Extract the data for each column
    date = col[0].get_text().strip()  # Extract the Date
    Open = col[1].get_text().strip()  # Extract the Open
    high = col[2].get_text().strip()  # Extract the High
    low = col[3].get_text().strip()   # Extract the Low
    close = col[4].get_text().strip()  # Extract the Close
    adj_close = col[5].get_text().strip()  # Extract the Adj Close
    volume = col[6].get_text().strip()  # Extract the Volume

    # Store the row for later conversion to a DataFrame
    rows.append({"Date": date, "Open": Open, "High": high, "Low": low, "Close": close, "Adj Close": adj_close, "Volume": volume})

# Build the DataFrame with the specified columns
amazon_data = pd.DataFrame(rows, columns=["Date", "Open", "High", "Low", "Close", "Adj Close", "Volume"])

# Print the first five rows of the DataFrame
print(amazon_data.head())


<b>Question 2:</b> What are the names of the columns in the data frame?


In [None]:
# Print the names of the columns in the amazon_data DataFrame
print(amazon_data.columns)


<b>Question 3:</b> What is the `Open` of the last row of the amazon_data data frame?


In [None]:
# Get the 'Open' value from the last row
last_row_open = amazon_data.iloc[-1]['Open']

# Print the value
print(last_row_open)
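
For reference, negative positions in `iloc` count from the end of the frame. The sketch below uses a hypothetical frame named `toy_prices` with made-up values.

```python
import pandas as pd

# Hypothetical frame with an Open column stored as strings, as scraped text would be
toy_prices = pd.DataFrame({"Open": ["10.0", "11.5", "12.25"]})

# Two equivalent ways to read the Open value of the last row
last_open_by_row = toy_prices.iloc[-1]["Open"]
last_open_by_column = toy_prices["Open"].iloc[-1]

print(last_open_by_row, last_open_by_column)
```

Selecting the column first and then the position avoids building an intermediate row object, but both forms give the same value.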


<h2>About the Authors:</h2>

<a href="https://www.linkedin.com/in/joseph-s-50398b136/">Joseph Santarcangelo</a> has a PhD in Electrical Engineering. His research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD.

Azim Hirjani


## Change Log

| Date (YYYY-MM-DD) | Version | Changed By    | Change Description        |
| ----------------- | ------- | ------------- | ------------------------- |
| 2022-02-28        | 1.2     | Lakshmi Holla | Changed the URL of GameStop |
| 2020-11-10        | 1.1     | Malika Singla | Deleted the Optional part |
| 2020-08-27        | 1.0     | Malika Singla | Added lab to GitLab       |

<hr>

<h3 align="center"> © IBM Corporation 2020. All rights reserved. </h3>

