<h1>Cookies and Code: Advanced Stock Analysis with Python</h1>
<h3>By: Samuel Kellum</h3>

<h3>Web Scraping</h3>

If you look at quotes for various securities on Yahoo Finance, you might notice that the URL is always in the following format ```https://finance.yahoo.com/quote/{symbol}```. We can use this knowledge to quickly collect information for any security on Yahoo Finance.

The code below shows how you can build the URL for any ticker with a Python f-string.

In [None]:
ticker = 
url = f'https://finance.yahoo.com/quote/{ticker}'
print(url)
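Some symbols contain characters that should not be placed in a URL as-is, such as the caret at the start of index symbols. Here is a minimal sketch, assuming a helper named `build_quote_url` (a hypothetical name used only for illustration), that percent-encodes the symbol with the standard library before inserting it into the URL. The two symbols are just examples:

```python
from urllib.parse import quote

def build_quote_url(symbol):
    # build_quote_url is a hypothetical helper for illustration
    # quote() percent-encodes characters such as "^"; safe="" also encodes "/"
    return f"https://finance.yahoo.com/quote/{quote(symbol, safe='')}"

print(build_quote_url("AAPL"))
print(build_quote_url("^GSPC"))
```

Plain letter-only tickers pass through unchanged, so the helper can be used for every symbol.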


Now, to collect data from Yahoo Finance, we need to import a couple of libraries that give us access to the methods (or commands) we will use to collect information from our desired URLs. (Note: if an imported library is not installed on your machine, you may need to uncomment one or more of the pip commands by removing ONLY the hashtag, not the percent sign.)

In [None]:
#Import libraries
#%pip install requests
#%pip install bs4
import requests
from bs4 import BeautifulSoup

The first thing we need to do when using Python to collect data from a website is to perform an HTTP GET request on the URL.

To check whether or not the GET request was successful, we can look at the status code of the response.

If the HTTP response status code is 200-299, that means the GET request was successful.

More info on status codes: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#successful_responses
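The status code ranges can be sketched as a small function. This is only an illustration: `describe_status` is a hypothetical helper, not part of `requests`, and it works on plain integers so it can be tried without making a request.

```python
def describe_status(status_code):
    # describe_status is a hypothetical helper, not part of requests
    if 200 <= status_code < 300:
        return "success"
    if 300 <= status_code < 400:
        return "redirect"
    if 400 <= status_code < 500:
        return "client error"
    if 500 <= status_code < 600:
        return "server error"
    return "informational or unknown"

for code in (200, 404, 503):
    print(code, describe_status(code))
```

With a real response `r`, the same check would be `describe_status(r.status_code)`. The `requests` library also offers `r.raise_for_status()`, which raises an exception for 4xx and 5xx codes.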

Some websites block requests whose User-Agent header does not identify a valid browser, so we can pass a <a href="https://www.google.com/search?q=what+is+my+user+agent&oq=what+is+my+user&aqs=chrome.1.69i57j35i39j69i59j0i512j0i20i263i512j0i512l5.2060j0j7&sourceid=chrome&ie=UTF-8">user-agent</a> to the GET request through its `headers` parameter.

In [None]:
#Identify the request as coming from a browser so it is less likely to be blocked
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"}
r = requests.get(url, headers=headers)
print(url, r.status_code)

At this point, we will use Beautiful Soup to parse the data from a particular `div` element whose class is <i>"D(ib) Mend(20px)"</i> on Yahoo Finance. The `fin-streamer` tags inside it hold the stock price, change, and percent change:

In [None]:
soup = BeautifulSoup(r.text, 'html.parser')
data = soup.find('div', {'class': 'D(ib) Mend(20px)'}).find_all('fin-streamer')
data

At this point, the data looks pretty messy, but we can use ```.text``` to pull out the text between each tag's `>` and `<` symbols, which is the information displayed on the website.

With a little bit of Python code, we can extract all of the information we are interested in.

In [None]:
print(ticker)
#Iterate over the tags directly instead of indexing by position
for item in data:
    print(item.text)
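To see what `.text` returns without making a network request, here is a minimal sketch that parses a small hand-written HTML snippet. The snippet only imitates the structure described above. Its attribute names and values are made up for illustration and are not copied from Yahoo Finance.

```python
from bs4 import BeautifulSoup

# Hand-written HTML that imitates the price container (values are made up)
html = """
<div class="D(ib) Mend(20px)">
  <fin-streamer data-field="price">101.50</fin-streamer>
  <fin-streamer data-field="change"><span>+1.50</span></fin-streamer>
  <fin-streamer data-field="percent"><span>(+1.50%)</span></fin-streamer>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", {"class": "D(ib) Mend(20px)"})

# .text returns the visible text of a tag, including text in nested tags
values = [tag.text for tag in container.find_all("fin-streamer")]
print(values)
```

Note that `.text` also reaches into the nested `span` tags, which is why no extra step is needed for the change and percent change.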

Now, let's combine everything we learned above to create a function that can get the data for any security on Yahoo Finance.

In [None]:
def getData(symbol):
    #Send a browser-like User-Agent so the request is less likely to be blocked
    user_agent = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.88 Safari/537.36"}
    url = f'https://finance.yahoo.com/quote/{symbol}'

    r = requests.get(url, headers=user_agent)
    if 200 <= r.status_code < 300:
        soup = BeautifulSoup(r.text, 'html.parser')
        #Look up the price container once and reuse it for all three fields
        streamers = soup.find('div', {'class': 'D(ib) Mend(20px)'}).find_all('fin-streamer')

        stock = {
            'symbol': symbol,
            'price': streamers[0].text,
            'change': streamers[1].text,
            'percent change': streamers[2].text
        }

        return stock

    else:
        return ("Error:", r.status_code)

This function works for any symbol that exists on Yahoo Finance, as shown below:

In [None]:
getData("")

In [None]:
getData("")

In [None]:
getData("")

<h3>Financial Analysis</h3>

Now, let's move on to the fun part: the financial analysis!

<h4>Import Libraries and Load Data</h4>

Import libraries:

In [None]:
#%pip install pandas
#%pip install matplotlib
#%pip install numpy
#%pip install yfinance
#datetime is part of the Python standard library, so it does not need to be installed
#%pip install mplfinance

from mplfinance.original_flavor import candlestick_ohlc
import yfinance as yf
from datetime import date
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.dates as dates
%matplotlib inline
matplotlib.style.use('fivethirtyeight')

Now, we can pick which three (or more) stocks we want to compare!

The `yfinance` library allows us to download data from Yahoo Finance's API for any security within a date range we specify.

In [None]:
stock1 = yf.download("stock1", start="2017-01-01", end=str(date.today()))
stock2 = yf.download("stock2", start="2017-01-01", end=str(date.today()))
stock3 = yf.download("stock3", start="2017-01-01", end=str(date.today()))

We can also specify which <a href="https://htmlcolorcodes.com/">hex code</a> should represent each stock's color on our plots. This will help us later, so we don't have to copy and paste the hex codes manually every time:

In [None]:
stock1_color = "#"
stock2_color = "#"
stock3_color = "#"

We have now created three `DataFrames`, or tables, containing the daily trading information for each company, as shown below.

In [None]:
stock1

In [None]:
stock2

In [None]:
stock3

<h4>Stock Price and Company Value</h4>

Let's first create a plot comparing the opening prices for each company:

In [None]:
stock1["Open"].plot(label="Stock 1", color=stock1_color, linewidth=0.95, figsize=(15,7))
stock2["Open"].plot(label="Stock 2", color=stock2_color, linewidth=0.95)
stock3["Open"].plot(label="Stock 3", color=stock3_color, linewidth=0.95)
plt.legend()
plt.title("Stock Prices of Stock 1, Stock 2, and Stock 3")
plt.ylabel("Stock Price")
plt.show()

On its own, the stock price is largely irrelevant, because it does not account for how many shares exist. Let's multiply each company's price by its total number of shares outstanding to compare each company's value.

To find the total number of shares outstanding for a company, let's create a function that takes a ticker symbol and returns the number of shares outstanding:

In [None]:
def getSharesOutstanding(ticker):
    #Fetch the info dictionary once; .get() avoids a KeyError if the field is missing
    shares = yf.Ticker(ticker).info.get("sharesOutstanding")
    if shares:
        return shares
    else:
        return "Could not find shares outstanding"

Now, we can apply the function to each of our companies:

In [None]:
stock1_num_shares = getSharesOutstanding("stock1")
print("stock1", stock1_num_shares)
stock2_num_shares = getSharesOutstanding("stock2")
print("stock2", stock2_num_shares)
stock3_num_shares = getSharesOutstanding("stock3")
print("stock3", stock3_num_shares)

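We should create a new column that calculates the company value for each day, which is the product of the stock price and the total number of shares outstanding. Note that this applies the current share count to every historical day, so the earlier values are approximations.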

In [None]:
stock1["Company Value"] = stock1["Open"] * stock1_num_shares
stock2["Company Value"] = stock2["Open"] * stock2_num_shares
stock3["Company Value"] = stock3["Open"] * stock3_num_shares

Now, we can make a plot that compares the values of our three companies:

In [None]:
stock1["Company Value"].plot(label="Stock 1", color=stock1_color, linewidth=0.95, figsize=(15,7))
stock2["Company Value"].plot(label="Stock 2", color=stock2_color, linewidth=0.95)
stock3["Company Value"].plot(label="Stock 3", color=stock3_color, linewidth=0.95)
plt.legend()
plt.title("Value of Stock 1, Stock 2, and Stock 3")
plt.ylabel("Value ($)")
plt.show()

<h4>Cumulative Return</h4>

A stock's return on investment represents the percent gain or loss in the value of the stock. Let's create a visualization of each stock's cumulative return, or its return on investment for each day since our beginning date.

Let's create a new column representing the cumulative return, which represents a ratio between the current (for each day) value and the initial value:

In [None]:
first_day = stock1["Open"].iloc[0]
stock1["Cumulative Return"] = stock1["Open"] / first_day
first_day = stock2["Open"].iloc[0]
stock2["Cumulative Return"] = stock2["Open"] / first_day
first_day = stock3["Open"].iloc[0]
stock3["Cumulative Return"] = stock3["Open"] / first_day
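To make the ratio concrete, here is a minimal sketch of the same calculation on a tiny series of made-up opening prices (the numbers are for illustration only):

```python
import pandas as pd

# Made-up opening prices for illustration only
prices = pd.Series([100.0, 110.0, 99.0, 125.0])

# Divide every price by the first price to get the return multiplier
cumulative_return = prices / prices.iloc[0]
print(cumulative_return.tolist())

# Subtract 1 and scale by 100 to express the multiplier as a percent gain or loss
percent_change = ((cumulative_return - 1) * 100).round(2)
print(percent_change.tolist())
```

A multiplier of 1.25 on the last day means the price is 25% above where it started, while 0.99 means it is 1% below.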

Now, we can create a plot to compare the cumulative returns for each company. Keep in mind that every line should start at 1, since the first value is the initial price divided by itself.

In [None]:
stock1["Cumulative Return"].plot(label="Stock 1", color=stock1_color, linewidth=0.95, figsize=(15,7))
stock2["Cumulative Return"].plot(label="Stock 2", color=stock2_color, linewidth=0.95)
stock3["Cumulative Return"].plot(label="Stock 3", color=stock3_color, linewidth=0.95)
plt.legend()
plt.title("Cumulative Return of Stock 1, Stock 2, and Stock 3")
plt.ylabel("Return Multiplier")
plt.show()

<h4>Volume</h4>

In addition to price and cumulative return, volume is an important metric to consider, since spikes in volume often accompany significant price changes.

Let's plot a visualization of the volume for each company:

In [None]:
stock1["Volume"].plot(label="Stock 1", color=stock1_color, linewidth=0.95, figsize=(17,5))
stock2["Volume"].plot(label="Stock 2", color=stock2_color, linewidth=0.95)
stock3["Volume"].plot(label="Stock 3", color=stock3_color, linewidth=0.95)
plt.legend()
plt.title("Stock 1, Stock 2, and Stock 3 Daily Volume")
plt.ylabel("Daily volume (100 millions)")
plt.show()

Let's look into the tallest spike!

In [None]:
#Find row with highest volume
print(stockN["Volume"].argmax())
stockN.iloc[[stockN["Volume"].argmax()]]
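The cell above uses `argmax()`, which returns the integer position of the largest value. As a small sketch on made-up data, pandas also provides `idxmax()`, which returns the index label (here, the date) instead:

```python
import pandas as pd

# Made-up volumes indexed by date, for illustration only
volume = pd.Series(
    [120, 450, 300],
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
)

position = volume.argmax()   # integer position of the largest value
label = volume.idxmax()      # index label (the date) of the largest value

print(position)              # 1
print(label.date())          # 2020-01-03
```

The position is what `.iloc` expects, while the label is what `.loc` expects.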

What happened around this time period? Let's do some investigation!

In [None]:
#Plot the 25 trading days before and after the highest-volume day
peak = stockN["Volume"].argmax()
window = stockN.iloc[max(peak - 25, 0):peak + 25]
window["Open"].plot(linewidth=0.95)
window["High"].plot(linewidth=0.95)
window["Low"].plot(linewidth=0.95)
window["Close"].plot(linewidth=0.95)
plt.legend()
plt.show()

Volume does not take into account price, so let's compare the total dollar amount traded for each company by creating a new column:

In [None]:
stock1["Dollar Volume"] = stock1["Open"] * stock1["Volume"]
stock2["Dollar Volume"] = stock2["Open"] * stock2["Volume"]
stock3["Dollar Volume"] = stock3["Open"] * stock3["Volume"]

In [None]:
stock1["Dollar Volume"].plot(label="Stock 1", color=stock1_color, linewidth=0.95, figsize=(15,7))
stock2["Dollar Volume"].plot(label="Stock 2", color=stock2_color, linewidth=0.95)
stock3["Dollar Volume"].plot(label="Stock 3", color=stock3_color, linewidth=0.95)
plt.legend()
plt.title("Dollar Amount Volume of Stock 1, Stock 2, and Stock 3")
plt.ylabel("Volume ($100 billions)")
plt.show()

When we factor in share price, what changes? If the peak has moved, let's investigate the new peak. Otherwise, we could investigate the peak of a different company:

In [None]:
#Find row with highest dollar volume
stockN.iloc[[stockN["Dollar Volume"].argmax()]]

<h4>Moving Average</h4>

When analyzing a security for long-term investment, you might be skeptical of its true value because the data is seemingly noisy, full of random price fluctuations that are meaningless in the long term.

Therefore, we can smooth the sharp edges with a moving average, which represents the average price over a fixed number of preceding days.

Let's check out a stock's moving average:

In [None]:
#Evenly Weighted Moving Average (# days) and Stock N Price
stockN["Moving Average 200"] = stockN["Open"].rolling(200).mean()
stockN["Moving Average 200"].plot(label="Moving Average 200 Days", linewidth=0.95, figsize=(15,7))
stockN["Moving Average 50"] = stockN["Open"].rolling(50).mean()
stockN["Moving Average 50"].plot(label="Moving Average 50 Days", linewidth=0.95)
stockN["Open"].plot(linewidth=0.9)
plt.legend()
plt.show()

Let's also check out an exponentially weighted moving average, which gives more recent dates a higher weight, balanced out with a lower weight for the earlier days:

In [None]:
#Exponentially Weighted Moving Average (# days) and Stock 3 Price
#Select the Open column by name so the result does not depend on column order
stock3["Exp Moving Average 100"] = stock3["Open"].ewm(span=100,adjust=False).mean()
stock3["Exp Moving Average 100"].plot(label="Exp Moving Average 100 Days", linewidth=0.95, figsize=(15,7))
stock3["Exp Moving Average 30"] = stock3["Open"].ewm(span=30,adjust=False).mean()
stock3["Exp Moving Average 30"].plot(label="Exp Moving Average 30 Days", linewidth=0.95)
stock3["Open"].plot(linewidth=0.9)
plt.legend()
plt.show()

You might have noticed that the more days in a moving average, the flatter the line. For example, a 100-day moving average is the average price of the previous 100 days, whereas a 25-day moving average is the average price of the previous 25 days. Averages over larger windows are less sensitive to small fluctuations.
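The difference between the two kinds of average is easier to see on a handful of numbers. Here is a minimal sketch on made-up prices, using a window and span of 3 so the arithmetic can be checked by hand:

```python
import pandas as pd

# Made-up prices for illustration only
prices = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0])

# Simple moving average: each window of 3 prices is weighted equally;
# the first two entries are NaN because a full window is not yet available
sma = prices.rolling(3).mean()

# Exponentially weighted moving average with span=3;
# adjust=False uses the recursive form y[t] = (1 - a) * y[t-1] + a * x[t],
# where a = 2 / (span + 1), which is 0.5 here
ema = prices.ewm(span=3, adjust=False).mean()

print(sma.tolist())
print(ema.tolist())
```

With a smoothing factor of 0.5, each exponentially weighted value is halfway between the previous average and the newest price, so recent prices always count the most.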

<h4>Stock Price Correlation</h4>

When we are analyzing a series of stocks, one important thing to keep in mind is the relationship between different stock prices. Is there a linear correlation between various stock prices?

To compare the opening prices for our three companies, we will create a new `DataFrame`, or table, by concatenating the opening price columns of the different companies.

In [None]:
comparison = pd.concat([stock1["Open"],stock2["Open"],stock3["Open"]], axis=1)
comparison.columns = ["Stock 1 Open", "Stock 2 Open", "Stock 3 Open"]

Now, we will make a scatter matrix to compare the relationships between each combination of companies:

In [None]:
pd.plotting.scatter_matrix(comparison, hist_kwds={'bins':50}, figsize=(9,9))
plt.show()
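The scatter matrix shows the relationships visually. To put a number on each one, pandas can compute the Pearson correlation coefficient for every pair of columns with `DataFrame.corr()`. Here is a small sketch on made-up data, where the column names `A`, `B`, and `C` are placeholders:

```python
import pandas as pd

# Made-up opening prices for illustration only
toy = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],   # moves exactly with A
    "C": [4.0, 3.0, 2.0, 1.0],   # moves exactly against A
})

# Pearson correlation between every pair of columns
corr = toy.corr()
print(corr.round(2))
```

A value near 1 means two columns rise and fall together, a value near -1 means they move in opposite directions, and a value near 0 means there is no linear relationship. On the real data, the equivalent call would be `comparison.corr()`.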

<h4>Volatility</h4>

Volatility is very important when comparing stocks, because it is much more difficult to gain back losses (in terms of percentage). For example, a 50% decrease requires a 100% increase to return to the original value.
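The arithmetic behind that example generalizes: after losing a fraction `loss` of its value, a price must be multiplied by `1 / (1 - loss)` to get back to where it started. A minimal sketch, where `required_gain` is a hypothetical helper written for illustration:

```python
def required_gain(loss):
    # required_gain is a hypothetical helper; loss is a fraction, e.g. 0.5 for 50%
    if not 0 <= loss < 1:
        raise ValueError("loss must be in the range [0, 1)")
    return 1 / (1 - loss) - 1

for loss in (0.1, 0.25, 0.5):
    print(f"{loss:.0%} loss needs a {required_gain(loss):.1%} gain")
```

The required gain grows faster than the loss itself, which is why large drawdowns are so costly.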

For each company, let's calculate the daily return for each day:

In [None]:
stock1["daily_return"] = (stock1["Close"] / stock1["Close"].shift(1)) - 1
stock2["daily_return"] = (stock2["Close"] / stock2["Close"].shift(1)) - 1
stock3["daily_return"] = (stock3["Close"] / stock3["Close"].shift(1)) - 1
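As a quick check of the formula, here is a small sketch on made-up closing prices. It also shows that pandas' built-in `pct_change()` gives the same result as dividing by the shifted column:

```python
import pandas as pd

# Made-up closing prices for illustration only
close = pd.Series([100.0, 102.0, 99.96])

manual = close / close.shift(1) - 1   # same formula as the cell above
builtin = close.pct_change()          # pandas shortcut for the same calculation

print(manual.round(4).tolist())
print(builtin.round(4).tolist())
```

The first entry is `NaN` in both cases because the first day has no previous close to compare against.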

Now, let's plot a histogram of the distribution of daily returns for one company.

In [None]:
stockN["daily_return"].plot.hist(bins=50, color=stockN_color)

In [None]:
stockN["daily_return"].describe()

In [None]:
#Annualized return: compound the mean daily return over 253 trading days, (1 + mean)^253 - 1
mean = stockN["daily_return"].mean()
np.power(1 + mean, 253) - 1
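The calculation above compounds the mean daily return rather than simply multiplying it by the number of trading days. Here is a minimal sketch of the difference, where `annualize` is a hypothetical helper and the 0.1% daily return is a made-up figure:

```python
def annualize(mean_daily_return, trading_days=253):
    # annualize is a hypothetical helper; 253 matches the figure used in this notebook
    return (1 + mean_daily_return) ** trading_days - 1

daily = 0.001  # made-up mean daily return of 0.1%
compounded = annualize(daily)
simple = daily * 253

print(f"compounded: {compounded:.2%}")
print(f"simple:     {simple:.2%}")
```

Because each day's gain is earned on top of the previous days' gains, the compounded figure comes out higher than the simple product whenever the mean daily return is positive.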

Let's compare all three daily return histograms:

In [None]:
stock1["daily_return"].plot.hist(bins=100, label="Stock 1", alpha=0.5, color=stock1_color, figsize=(13,6))
stock2["daily_return"].plot.hist(bins=100, label="Stock 2", alpha=0.5, color=stock2_color)
stock3["daily_return"].plot.hist(bins=100, label="Stock 3", alpha=0.5, color=stock3_color)
plt.legend()
plt.show()

To confirm our findings, let's make a <a href="https://en.wikipedia.org/wiki/Kernel_density_estimation">kernel density estimation</a> plot, which smooths each distribution into a curve with a total area of 1 so the companies can be compared on the same scale, as shown below:

In [None]:
#Kernel Density Estimation
stock1["daily_return"].plot(kind="kde", label="Stock 1", color=stock1_color, figsize=(15,6))
stock2["daily_return"].plot(kind="kde", label="Stock 2", color=stock2_color)
stock3["daily_return"].plot(kind="kde", label="Stock 3", color=stock3_color)
plt.legend()
plt.show()

Let's also make some boxplots to compare the price volatilities of each company:

In [None]:
box_df = pd.concat([stock1["daily_return"], stock2["daily_return"], stock3["daily_return"]], axis=1)
box_df.columns=["Stock 1 Daily Returns", "Stock 2 Daily Returns", "Stock 3 Daily Returns"]
box_df.plot(kind="box", figsize=(16,6))

The boxplots confirm the findings from the Kernel density estimation. Now, let's make a scatter matrix to compare the daily returns between each combination of companies:

In [None]:
pd.plotting.scatter_matrix(box_df, hist_kwds={"bins":50}, alpha=0.25, figsize=(8,8))
plt.show()

<h4>Candlestick Graphing</h4>

Day traders use candlestick graphs to track movement in a stock price throughout the day.

<img src="candlestick.png"></img>

If you are interested in learning more about candlesticks, check out this article: https://www.investopedia.com/trading/candlestick-charting-what-is-it/

In [None]:
stock2_day_prices = yf.download(tickers="stock2",
                            period="1d",
                            interval="5m",
                            auto_adjust=True)
stock2_day_prices = stock2_day_prices.reset_index()
stock2_day_prices

In [None]:
#Convert each timestamp to Matplotlib's numeric date format for the x-axis
stock2_day_prices["date_ax"] = stock2_day_prices["Datetime"].apply(lambda d: dates.date2num(d))
stock2_tuple = [tuple(vals) for vals in stock2_day_prices[["date_ax", "Open", "High", "Low", "Close"]].values]


fig, ax = plt.subplots()
candlestick_ohlc(ax, stock2_tuple, width=0.0015, colorup="g")
plt.xticks([])
plt.title("Stock Candlestick Most Recent Trading Day")
plt.ylabel("Price")
plt.show()

From this candlestick graph, we can recognize the price movement of the security for that day. 
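Each candle is drawn from four numbers: the open, high, low, and close for that interval. As a rough sketch of how those numbers map onto the shape of a candle, here is a hypothetical helper named `describe_candle`, applied to made-up prices:

```python
def describe_candle(open_, high, low, close):
    # describe_candle is a hypothetical helper for illustration
    return {
        "direction": "up" if close >= open_ else "down",
        "body": abs(close - open_),               # thick part of the candle
        "upper_wick": high - max(open_, close),   # thin line above the body
        "lower_wick": min(open_, close) - low,    # thin line below the body
    }

print(describe_candle(open_=100.0, high=104.0, low=99.0, close=103.0))
```

In the chart above, candles where the close is above the open are drawn in green because of the `colorup="g"` argument.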

Special thanks to the following YouTube videos for providing some of the code that I used in this notebook:
<ol>
    <li><a href="https://www.youtube.com/watch?v=7sFCOunKL_Y">https://www.youtube.com/watch?v=7sFCOunKL_Y</a></li>
    <li><a href="https://www.youtube.com/watch?v=57qAxRV577c">https://www.youtube.com/watch?v=57qAxRV577c</a></li>
</ol>