<a href="https://colab.research.google.com/github/kivvgsr/data-science/blob/main/extracting%20and%20visualizing%20stock%20data%20of%20tesla%20and%20gamestop%20company.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Extracting and Visualizing Stock Data**

In this section, we define the function **make_graph**. You don't need to know how the function works; only its inputs matter. It takes a dataframe with stock data (which must contain Date and Close columns), a dataframe with revenue data (which must contain Date and Revenue columns), and the name of the stock.

In [3]:
import yfinance as yf
import pandas as pd
import requests
from bs4 import BeautifulSoup
import plotly.graph_objects as go
from plotly.subplots import make_subplots

In [4]:
def make_graph(stock_data, revenue_data, stock):
    # Two stacked subplots sharing the x-axis: share price on top, revenue below
    fig = make_subplots(rows=2, cols=1, shared_xaxes=True, subplot_titles=("Historical Share Price", "Historical Revenue"), vertical_spacing=.3)
    # Keep only the rows on or before the fixed cutoff dates
    stock_data_specific = stock_data[stock_data.Date <= '2021-06-14']
    revenue_data_specific = revenue_data[revenue_data.Date <= '2021-04-30']
    fig.add_trace(go.Scatter(x=pd.to_datetime(stock_data_specific.Date), y=stock_data_specific.Close.astype("float"), name="Share Price"), row=1, col=1)
    fig.add_trace(go.Scatter(x=pd.to_datetime(revenue_data_specific.Date), y=revenue_data_specific.Revenue.astype("float"), name="Revenue"), row=2, col=1)
    fig.update_xaxes(title_text="Date", row=1, col=1)
    fig.update_xaxes(title_text="Date", row=2, col=1)
    fig.update_yaxes(title_text="Price ($US)", row=1, col=1)
    fig.update_yaxes(title_text="Revenue ($US Millions)", row=2, col=1)
    # All layout options must be passed inside the update_layout call
    fig.update_layout(showlegend=False,
                      height=900,
                      title=stock,
                      xaxis_rangeslider_visible=True)
    fig.show()
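
Since only the inputs matter, the following is a minimal sketch of the two dataframes **make_graph** expects. The sample values and the helper `has_required_columns` are hypothetical and exist only for illustration; they are not part of the original lab.

```python
import pandas as pd

# Made-up sample data, only to illustrate the expected shape of the inputs
sample_stock = pd.DataFrame({
    "Date": ["2021-06-10", "2021-06-11", "2021-06-14"],
    "Close": [100.0, 101.5, 99.8],
})
sample_revenue = pd.DataFrame({
    "Date": ["2020-12-31", "2021-03-31"],
    "Revenue": ["100", "120"],  # strings are fine: make_graph casts Revenue to float
})

def has_required_columns(df, required):
    """Return True if every column name in `required` is present in `df`."""
    return set(required).issubset(df.columns)

stock_ok = has_required_columns(sample_stock, ["Date", "Close"])
revenue_ok = has_required_columns(sample_revenue, ["Date", "Revenue"])
print(stock_ok, revenue_ok)
```

A check like this can be run before plotting to catch a missing Date column early, for example when the index has not been reset yet.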

Using the **Ticker** function, enter the ticker symbol of the stock we want to extract data on to create a ticker object. The stock is Tesla and its ticker symbol is **TSLA**.

In [5]:
# Define the ticker symbol
ticker_symbol = "TSLA"
# Create a ticker object using yfinance
ticker = yf.Ticker(ticker_symbol)

In [6]:
# Extract stock data; period="max" retrieves the maximum available history
stock_data = ticker.history(period="max")
print(stock_data)

                                 Open        High         Low       Close  \
Date                                                                        
2010-06-29 00:00:00-04:00    1.266667    1.666667    1.169333    1.592667   
2010-06-30 00:00:00-04:00    1.719333    2.028000    1.553333    1.588667   
2010-07-01 00:00:00-04:00    1.666667    1.728000    1.351333    1.464000   
2010-07-02 00:00:00-04:00    1.533333    1.540000    1.247333    1.280000   
2010-07-06 00:00:00-04:00    1.333333    1.333333    1.055333    1.074000   
...                               ...         ...         ...         ...   
2023-07-24 00:00:00-04:00  255.850006  269.850006  254.119995  269.059998   
2023-07-25 00:00:00-04:00  272.380005  272.899994  265.000000  265.279999   
2023-07-26 00:00:00-04:00  263.250000  268.040009  261.750000  264.350006   
2023-07-27 00:00:00-04:00  268.309998  269.130005  255.300003  255.710007   
2023-07-28 00:00:00-04:00  259.859985  267.250000  258.230011  266.440002   

### **What the code above does**

The code above is a Python script that uses the yfinance library to extract historical stock data for Tesla (ticker symbol: TSLA).

Here's a breakdown of the code:

1. Import the yfinance library: This library lets us fetch historical stock data from Yahoo Finance.

2. Define the ticker symbol: The ticker symbol is set to "TSLA", which identifies Tesla as the company whose stock data we want to fetch.

3. Create a ticker object: The yf.Ticker function creates a ticker object for the given ticker symbol ("TSLA").

4. Extract stock data: The history method of the ticker object retrieves historical stock data for Tesla. The argument period="max" specifies that we want data for the maximum available time period.

In [7]:
stock_data.reset_index(inplace=True)
print(stock_data)

                          Date        Open        High         Low  \
0    2010-06-29 00:00:00-04:00    1.266667    1.666667    1.169333   
1    2010-06-30 00:00:00-04:00    1.719333    2.028000    1.553333   
2    2010-07-01 00:00:00-04:00    1.666667    1.728000    1.351333   
3    2010-07-02 00:00:00-04:00    1.533333    1.540000    1.247333   
4    2010-07-06 00:00:00-04:00    1.333333    1.333333    1.055333   
...                        ...         ...         ...         ...   
3288 2023-07-24 00:00:00-04:00  255.850006  269.850006  254.119995   
3289 2023-07-25 00:00:00-04:00  272.380005  272.899994  265.000000   
3290 2023-07-26 00:00:00-04:00  263.250000  268.040009  261.750000   
3291 2023-07-27 00:00:00-04:00  268.309998  269.130005  255.300003   
3292 2023-07-28 00:00:00-04:00  259.859985  267.250000  258.230011   

           Close     Volume  Dividends  Stock Splits  
0       1.592667  281494500        0.0           0.0  
1       1.588667  257806500        0.0           





The reset_index() method can be used in two ways:

1. DataFrame.reset_index(): returns a new DataFrame with the index reset, while the original DataFrame remains unchanged.

2. DataFrame.reset_index(inplace=True): modifies the DataFrame in place, resetting the index of the original DataFrame without creating a new one.
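
The difference between the two forms can be sketched on a small made-up dataframe. The values are illustrative only and merely mimic the shape returned by history().

```python
import pandas as pd

# Made-up prices indexed by date, mimicking the shape returned by history()
prices = pd.DataFrame(
    {"Close": [1.59, 1.58]},
    index=pd.Index(["2010-06-29", "2010-06-30"], name="Date"),
)

# Without inplace: a new DataFrame is returned and the original keeps its Date index
flat = prices.reset_index()
print(list(flat.columns))   # Date is now a regular column in the copy
print(prices.index.name)    # the original is still indexed by Date

# With inplace=True: the original is modified and the call returns None
result = prices.reset_index(inplace=True)
print(result, list(prices.columns))
```

Because the in-place form returns None, assigning its result back to the variable (for example `prices = prices.reset_index(inplace=True)`) would discard the dataframe.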





# **Step 2**

### **Use Webscraping to Extract Tesla Revenue Data**

Use the **requests** library to download the webpage https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/revenue.htm and save the text of the response as a variable named **html_data**.

In [None]:
url ="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/revenue.htm"
html_data = requests.get(url).text
print(html_data)


In [43]:
# Requires soup, which is created in the parsing cell below (In [9]); run that cell first
title_content = soup.title.text

print("tesla revenue:", title_content)

tesla revenue: GameStop Revenue 2006-2020 | GME | MacroTrends


Parse the HTML data using **BeautifulSoup**.

In [9]:
soup = BeautifulSoup(html_data, "html.parser")

Using **BeautifulSoup** or the **read_html** function, extract the table with **Tesla Revenue** and store it in a dataframe named **tesla_revenue**. The dataframe should have columns **Date** and **Revenue**.

In [25]:

tbody_elements = soup.find_all("tbody")

In [47]:
tbody_elements = soup.find_all("tbody")
if len(tbody_elements) >= 2:
    # Extract the rows manually from the second tbody element
    data = []
    for row in tbody_elements[1].find_all("tr"):
        data_row = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        data.append(data_row)

    # Create the DataFrame from the extracted data
    tesla_revenue = pd.DataFrame(data, columns=["Date", "Revenue"])

    # Drop any rows with missing data
    tesla_revenue.dropna(inplace=True)

    # Display the first few rows of the tesla_revenue dataframe
    print(tesla_revenue.head())
else:
    print("Table with Tesla Revenue not found.")

         Date Revenue
0  2020-04-30  $1,021
1  2020-01-31  $2,194
2  2019-10-31  $1,439
3  2019-07-31  $1,286
4  2019-04-30  $1,548


Execute the following line to remove the comma and dollar sign from the Revenue column.

In [32]:
tesla_revenue["Revenue"] = tesla_revenue["Revenue"].str.replace(",", "", regex=False).str.replace("$", "", regex=False)
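
As a small illustration of what this line does, here is the same chain applied to a few sample strings in the scraped format. The final `pd.to_numeric` step is an optional addition that is not part of the original cell.

```python
import pandas as pd

# Sample strings in the same format as the scraped Revenue column
revenue = pd.Series(["$1,021", "$2,194", "$31"])

# regex=False treats "$" as a literal character rather than the regex end-of-string anchor
cleaned = revenue.str.replace(",", "", regex=False).str.replace("$", "", regex=False)

# Optional extra step (an assumption, not in the original): convert the strings to numbers
as_numbers = pd.to_numeric(cleaned)
print(cleaned.tolist())
print(as_numbers.tolist())
```

Passing regex=False matters here because "$" has a special meaning in regular expressions and would otherwise not match the literal dollar sign.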


Execute the following lines to remove any null or empty strings in the **Revenue** column.

In [42]:
tesla_revenue.dropna(inplace=True)

tesla_revenue = tesla_revenue[tesla_revenue['Revenue'] != ""]
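
The two steps handle different cases: dropna removes missing values, while the comparison removes empty strings, which dropna does not treat as missing. A minimal sketch on a hypothetical dataframe:

```python
import pandas as pd

# Hypothetical frame with one empty string and one missing value in Revenue
df = pd.DataFrame({
    "Date": ["2010-03-31", "2009-12-31", "2009-09-30", "2009-06-30"],
    "Revenue": ["21", "", None, "27"],
})

df = df.dropna()                # drops the row whose Revenue is missing
df = df[df["Revenue"] != ""]    # drops the row whose Revenue is an empty string
print(df.index.tolist())
print(df["Revenue"].tolist())
```

The index labels of dropped rows are not reused, so gaps in the index after filtering are expected.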

Display the last 5 rows of the **tesla_revenue** dataframe using the **tail** function.

In [33]:
print(tesla_revenue.tail())

          Date Revenue
48  2010-09-30      31
49  2010-06-30      28
50  2010-03-31      21
52  2009-09-30      46
53  2009-06-30      27


# **Use yfinance to Extract Stock Data**

Using the **Ticker** function, enter the ticker symbol of the stock we want to extract data on to create a ticker object. The stock is GameStop and its ticker symbol is **GME**.

In [34]:
ticker_symbol = "GME"
ticker = yf.Ticker(ticker_symbol)

Using the ticker object and the function **history**, extract stock information and save it in a dataframe named **gme_data**. Set the **period** parameter to **max** so we get information for the maximum amount of time.

In [35]:
gme_data = ticker.history(period="max")
print(gme_data.head())

                               Open      High       Low     Close    Volume  \
Date                                                                          
2002-02-13 00:00:00-05:00  1.620128  1.693350  1.603296  1.691666  76216000   
2002-02-14 00:00:00-05:00  1.712707  1.716074  1.670626  1.683250  11021600   
2002-02-15 00:00:00-05:00  1.683250  1.687458  1.658001  1.674834   8389600   
2002-02-19 00:00:00-05:00  1.666418  1.666418  1.578047  1.607504   7410400   
2002-02-20 00:00:00-05:00  1.615920  1.662210  1.603296  1.662210   6892800   

                           Dividends  Stock Splits  
Date                                                
2002-02-13 00:00:00-05:00        0.0           0.0  
2002-02-14 00:00:00-05:00        0.0           0.0  
2002-02-15 00:00:00-05:00        0.0           0.0  
2002-02-19 00:00:00-05:00        0.0           0.0  
2002-02-20 00:00:00-05:00        0.0           0.0  


**Reset the index** using the reset_index(inplace=True) function on the gme_data dataframe and display the first five rows of the gme_data dataframe using the head function. Take a screenshot of the results and code from the beginning of Question 3 to the results below.

In [36]:
# Reset the index of gme_data DataFrame in place
gme_data.reset_index(inplace=True)

# Display the first five rows of the gme_data DataFrame using the head function
print(gme_data.head())


                       Date      Open      High       Low     Close    Volume  \
0 2002-02-13 00:00:00-05:00  1.620128  1.693350  1.603296  1.691666  76216000   
1 2002-02-14 00:00:00-05:00  1.712707  1.716074  1.670626  1.683250  11021600   
2 2002-02-15 00:00:00-05:00  1.683250  1.687458  1.658001  1.674834   8389600   
3 2002-02-19 00:00:00-05:00  1.666418  1.666418  1.578047  1.607504   7410400   
4 2002-02-20 00:00:00-05:00  1.615920  1.662210  1.603296  1.662210   6892800   

   Dividends  Stock Splits  
0        0.0           0.0  
1        0.0           0.0  
2        0.0           0.0  
3        0.0           0.0  
4        0.0           0.0  


# **Use Webscraping to Extract GME Revenue Data**

Use the **requests** library to download the webpage https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/stock.html and save the text of the response as a variable named **html_data**.

In [37]:
# URL of the webpage to download
url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0220EN-SkillsNetwork/labs/project/stock.html"

# Use requests to download the webpage and save the text of the response as html_data
html_data = requests.get(url).text

Parse the HTML data using **BeautifulSoup**.

In [38]:
soup = BeautifulSoup(html_data, "html.parser")

Using **BeautifulSoup** or the **read_html** function, extract the table with **GameStop Revenue** and store it in a dataframe named **gme_revenue**. The dataframe should have columns **Date** and **Revenue**. Make sure the comma and dollar sign are removed from the Revenue column using a method similar to what you did in Question 2.

In [39]:
gme_table = soup.find_all("table")[1]

# Extract the data from the table and create the DataFrame
data = []
for row in gme_table.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) == 2:  # Ensure we have both Date and Revenue cells
        date = cells[0].text.strip()
        revenue = cells[1].text.strip().replace(",", "").replace("$", "")  # Remove comma and dollar sign
        data.append({"Date": date, "Revenue": revenue})
gme_revenue = pd.DataFrame(data)
print(gme_revenue.head())

gme_revenue["Revenue"] = gme_revenue["Revenue"].str.replace(",", "", regex=False).str.replace("$", "", regex=False)

gme_revenue.dropna(inplace=True)

gme_revenue = gme_revenue[gme_revenue['Revenue'] != ""]

         Date Revenue
0  2020-04-30    1021
1  2020-01-31    2194
2  2019-10-31    1439
3  2019-07-31    1286
4  2019-04-30    1548


Display the last five rows of the **gme_revenue** dataframe using the **tail** function. Take a screenshot of the results.

In [40]:
print(gme_revenue.tail())

          Date Revenue
57  2006-01-31    1667
58  2005-10-31     534
59  2005-07-31     416
60  2005-04-30     475
61  2005-01-31     709


# **Plot Tesla Stock Graph**

Use the **make_graph** function to graph the **Tesla Stock Data**, and provide a **title** for the graph. The structure to call the **make_graph** function is **make_graph(tesla_data, tesla_revenue, 'Tesla')**. Note that the graph will only show data up to June 2021.

In [66]:
# Sort the stock data by the Date column in ascending order
# (the index was reset earlier, so Date is a regular column)
stock_data.sort_values("Date", inplace=True)

# Convert the Date column of tesla_revenue to datetime so it can be compared and plotted
tesla_revenue = tesla_revenue.assign(Date=pd.to_datetime(tesla_revenue["Date"]))

# Call the make_graph function to plot the graph
make_graph(stock_data, tesla_revenue, 'Tesla')
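
The June 2021 limit comes from the date filter inside **make_graph**, which keeps only rows whose Date is on or before a fixed cutoff. A minimal sketch of that filter on made-up rows (the prices are illustrative only):

```python
import pandas as pd

# Made-up rows around the cutoff date used for the share price
stock = pd.DataFrame({
    "Date": pd.to_datetime(["2021-06-11", "2021-06-14", "2021-06-15"]),
    "Close": [100.0, 101.0, 102.0],
})

cutoff = "2021-06-14"
# The string is parsed as a date for the comparison; rows after the cutoff are dropped
visible = stock[stock["Date"] <= cutoff]
print(visible["Close"].tolist())
```

Data newer than the cutoff is still present in the dataframe passed in; it is simply not plotted.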



# **Plot GameStop Stock Graph**

Use the **make_graph** function to graph the GameStop Stock Data, and provide a title for the graph. The structure to call the **make_graph** function is **make_graph(gme_data, gme_revenue, 'GameStop')**. Note that the graph will only show data up to June 2021.

In [65]:
print(gme_revenue.columns)
# Sort the stock data by the Date column in ascending order
# (the index was reset earlier, so Date is a regular column)
gme_data.sort_values("Date", inplace=True)

# Convert the Date column of gme_revenue to datetime so it can be compared and plotted
gme_revenue = gme_revenue.assign(Date=pd.to_datetime(gme_revenue["Date"]))

# Call the make_graph function to plot the graph
make_graph(gme_data, gme_revenue, 'GameStop')








Index(['Revenue'], dtype='object')
