<h1>Cookies and Code: Advanced Stock Analysis with Python</h1>
<h3>By: Samuel Kellum</h3>

<h3>Web Scraping</h3>

If you look at quotes for various securities on Yahoo Finance, you might notice that the URL is always in the following format ```https://finance.yahoo.com/quote/{symbol}```. We can use this knowledge to quickly collect information for any security on Yahoo Finance.

The code below shows how you can build this URL for any ticker with a Python f-string.

In [None]:
ticker = ""
url = f'https://finance.yahoo.com/quote/{ticker}'
print(url)

In [None]:
ticker = ""
url = f'https://finance.yahoo.com/quote/{ticker}'
print(url)
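
Some symbols contain characters that are not safe to place directly in a URL, such as the caret in index symbols like `^GSPC`. As a minimal sketch, the hypothetical helper `build_quote_url` below percent-encodes the symbol with the standard library before building the URL. The helper name and the example tickers are illustrative assumptions, not part of the original notebook.

```python
from urllib.parse import quote

def build_quote_url(symbol):
    """Return the Yahoo Finance quote URL for a symbol, percent-encoding special characters."""
    return f"https://finance.yahoo.com/quote/{quote(symbol)}"

print(build_quote_url("AAPL"))   # plain symbols pass through unchanged
print(build_quote_url("^GSPC"))  # the caret is encoded as %5E
```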

Now, to collect data from Yahoo Finance, we need to import a couple of libraries that give us access to the functions we will use to collect information from our desired URLs. (Note: if an imported library is not installed on your machine, uncomment the matching pip command by removing ONLY the hashtag, not the percent sign.)

In [None]:
#Import libraries
#%pip install requests
#%pip install bs4
import requests
from bs4 import BeautifulSoup

The first thing we need to do when using Python to collect data from a website is to perform an HTTP GET request on the URL.

To check whether or not the GET request was successful, we can print the status code of the response.

If the HTTP response status code is 200-299, that means the GET request was successful.

More info on status codes: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#successful_responses
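
The 200-299 rule can be written as a one-line check. The sketch below uses a hypothetical helper named `is_success` and a few sample status codes chosen for illustration. Note that the upper bound is exclusive, so 300 does not count as a success.

```python
def is_success(status_code):
    """Return True for any status code in the 200-299 (successful) range."""
    return 200 <= status_code < 300

# Sample codes for illustration only
for code in (200, 204, 301, 404, 500):
    print(code, is_success(code))
```

The `requests` library also offers `r.ok`, which is a looser check: it is true for any status code below 400, so it also accepts redirects.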

Some websites block requests that do not identify themselves as coming from a browser, so we can pass a browser User-Agent string in the headers of the GET request.

In [None]:
#You can update user-agent by searching "What is my user agent?" on Google.
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"}
r = requests.get(url, headers=headers)
print(r.status_code)

At this point, we will use Beautiful Soup to parse the data from a particular HTML `div` element with the class <i>"D(ib) Mend(20px)"</i> on Yahoo Finance. Its `fin-streamer` children hold the stock price, change, and percent change, which we collect in a `List`:

In [None]:
soup = BeautifulSoup(r.text, 'html.parser')
data = soup.find('div', {'class': 'D(ib) Mend(20px)'}).find_all('fin-streamer')
data

At this point, the data looks pretty messy, but we can use ```.text``` to get the text between each tag's `>` and `<` symbols, which is the information displayed on the website.

With a little bit of Python code, we can extract all of the information we are interested in.

In [None]:
print(ticker)
for i in range(len(data)):
    print(data[i].text)
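
The values returned by ```.text``` are strings, so they need to be converted before you can do any arithmetic with them. Here is a minimal sketch of a hypothetical helper, `to_number`. It assumes the scraped text looks like `1,234.56`, `+2.10`, or `(-0.48%)`; the exact formatting on the page is an assumption and may differ.

```python
def to_number(text):
    """Convert scraped text such as '1,234.56' or '(-0.48%)' to a float."""
    # Drop surrounding whitespace and parentheses, then thousands separators and percent signs
    cleaned = text.strip().strip("()").replace(",", "").replace("%", "")
    return float(cleaned)

# Made-up sample strings in the assumed formats
for sample in ("1,234.56", "+2.10", "(-0.48%)"):
    print(sample, "->", to_number(sample))
```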

Now, let's combine everything we learned above to create a function that can get the data for any security.

In [None]:
def getData(symbol):
    headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"}
    url = f'https://finance.yahoo.com/quote/{symbol}'

    r = requests.get(url, headers=headers)
    #Successful status codes are 200-299, so 300 itself is excluded
    if 200 <= r.status_code < 300:
        soup = BeautifulSoup(r.text, 'html.parser')
        #Look up the price container once instead of once per field
        data = soup.find('div', {'class': 'D(ib) Mend(20px)'}).find_all('fin-streamer')

        stock = {
            'symbol': symbol,
            'price' : data[0].text,
            'change' : data[1].text,
            'percent change': data[2].text
        }
        return stock
    else:
        return ("Error:", r.status_code)
    

This function works for any symbol that exists on Yahoo Finance, as shown below:

In [None]:
getData("")

In [None]:
getData("")

In [None]:
getData("")

<h3>Financial Analysis</h3>

Now, let's move on to the fun part: the financial analysis!

<h4>Load Data</h4>

Import libraries:

In [None]:
#%pip install pandas
#%pip install pandas_datareader
#datetime is part of the Python standard library, so it needs no pip install
#%pip install matplotlib
#%pip install numpy
import pandas_datareader.data as web
import datetime
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline

To decide what date range we want to analyze, we can set the start and end dates with the `datetime` library.

In [None]:
start = datetime.datetime(2017,1,1) #Format: (YYYY, M, D)
end = datetime.date.today()
#Uncomment the line below and comment out the line above if you want to specify an earlier end date
#end = datetime.datetime(2022,1,1)

Now, we can pick which three (or more) stocks we want to compare!

The `web.DataReader` function will allow us to access Yahoo Finance data for any security within the date range we specify.

In [None]:
stock1 = web.DataReader("stock1", "yahoo", start, end)
stock2 = web.DataReader("stock2", "yahoo", start, end)
stock3 = web.DataReader("stock3", "yahoo", start, end)

We have now created three `DataFrames`, or tables, containing the daily trading information for each company, as shown below.

In [None]:
stock1

In [None]:
stock2

In [None]:
stock3

<h4>Price</h4>

Let's first create a plot comparing the opening prices for each company, as shown below:

In [None]:
stock1["Open"].plot(label="Stock 1", figsize=(15,7), color="")
stock2["Open"].plot(label="Stock 2", color="")
stock3["Open"].plot(label="Stock 3", color="")
plt.legend()
plt.title("Stock Prices of Stock 1, Stock 2, and Stock 3")
plt.ylabel("Stock Price")
plt.show()

<h4>Cumulative Return</h4>

Since it is difficult to compare securities with different stock prices, let's calculate the cumulative return for each security: each day's opening price divided by the opening price on the first day.

In [None]:
first_day = stock1["Open"].iloc[0]
stock1["Cumulative Return"] = stock1["Open"] / first_day
first_day = stock2["Open"].iloc[0]
stock2["Cumulative Return"] = stock2["Open"] / first_day
first_day = stock3["Open"].iloc[0]
stock3["Cumulative Return"] = stock3["Open"] / first_day
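
To see what this division does, here is the same calculation on a plain Python list. The function name `cumulative_return` and the three opening prices are made up for illustration.

```python
def cumulative_return(opens):
    """Divide every opening price by the first one, so each series starts at 1.0."""
    first_day = opens[0]
    return [price / first_day for price in opens]

opens = [100.0, 110.0, 95.0]  # made-up opening prices
print(cumulative_return(opens))
```

A value of 1.1 means the price is 10% above where it started, and 0.95 means it is 5% below, regardless of the actual dollar price of the stock.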

Let's create a plot comparing the cumulative return for each company.

In [None]:
stock1["Cumulative Return"].plot(label="Stock 1", figsize=(15,7), color="")
stock2["Cumulative Return"].plot(label="Stock 2", color="")
stock3["Cumulative Return"].plot(label="Stock 3", color="")
plt.legend()
plt.title("Cumulative Return of Stock 1, Stock 2, and Stock 3")
plt.ylabel("Return Multiplier")
plt.show()

<h4>Volume</h4>

In addition to price and cumulative return, volume is an important metric to consider, since spikes in volume generally accompany significant price movement.

Let's plot a visualization of the volume for each company:

In [None]:
stock1["Volume"].plot(label="Stock 1", figsize=(17,5), color="")
stock2["Volume"].plot(label="Stock 2", color="")
stock3["Volume"].plot(label="Stock 3", color="")
plt.legend()
plt.title("Stock 1, Stock 2, and Stock 3 Daily Volume")
plt.ylabel("Daily volume")
plt.show()

Let's check out what happened on the day of the biggest spike!

In [None]:
#Find row with highest volume
print(stockN["Volume"].argmax())
stockN.iloc[[stockN["Volume"].argmax()]]

What happened around this time period? Let's do some investigation!

In [None]:
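#Replace 'beg' and 'end' with integer row positions around the spike (no quotes), since iloc slices by position
stockN.iloc['beg':'end']["Open"].plot()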
plt.show()

Volume does not take into account price, so let's compare the total dollar amount traded for each company:

In [None]:
stock1["Dollar Volume"] = stock1["Open"] * stock1["Volume"]
stock2["Dollar Volume"] = stock2["Open"] * stock2["Volume"]
stock3["Dollar Volume"] = stock3["Open"] * stock3["Volume"]

In [None]:
stock1["Dollar Volume"].plot(label="Stock 1", figsize=(15,7), color="orange")
stock2["Dollar Volume"].plot(label="Stock 2", color="")
stock3["Dollar Volume"].plot(label="Stock 3", color="")
plt.legend()
plt.title("Dollar Amount Volume of Stock 1, Stock 2, and Stock 3")
plt.ylabel("Volume ($100 billions)")
plt.show()

Is there a new largest spike? If so, let's find the date with the highest dollar volume and explore what happened!

In [None]:
#Find row with highest dollar volume
print(stockN["Dollar Volume"].argmax())
stockN.iloc[[stockN["Dollar Volume"].argmax()]]

<h4>Moving Average</h4>

When analyzing a security, you might get overwhelmed by all of the little spikes and drops, or noise, which are meaningless over a longer period of time.

Therefore, we can smooth the noisy curve with a moving average, which replaces each day's price with the average price over a fixed number of the most recent days.

Let's check out a stock's moving average:

In [None]:
stockN["Open"].plot(figsize=(15,7))
stockN["Moving Average 50"] = stockN["Open"].rolling(50).mean()
stockN["Moving Average 50"].plot(label="Moving Average 50 Days")
stockN["Moving Average 200"] = stockN["Open"].rolling(200).mean()
stockN["Moving Average 200"].plot(label="Moving Average 200 Days")
plt.legend()
plt.show()

As you can see, the longer the moving-average window, the smoother the curve. A 200 day moving average is the average price over the most recent 200 trading days, whereas a 50 day moving average covers the most recent 50 trading days.
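
To make the windowing concrete, here is a minimal pure-Python sketch of what `rolling(window).mean()` computes. The function name `rolling_mean` and the sample prices are made up for illustration, and `None` stands in for the `NaN` values pandas shows before a full window is available.

```python
def rolling_mean(values, window):
    """Average of the most recent `window` values, ending at each position."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)  # not enough data yet (pandas shows NaN here)
        else:
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result

prices = [1, 2, 3, 4, 5]  # made-up prices
print(rolling_mean(prices, 3))
```

This is also why the 200 day line starts later on the plot than the 50 day line: it needs 200 rows of data before it can produce its first value.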

<h4>Stock Price Correlation</h4>

When we are analyzing a series of stocks, one important thing to keep in mind is the relationship between different stock prices. Is there a linear correlation between various stock prices?

To compare the opening prices for our three companies, we will create a new `DataFrame`, or table, by concatenating the opening price columns of the different companies.

In [None]:
comparison = pd.concat([stock1["Open"],stock2["Open"],stock3["Open"]], axis=1)
comparison.columns = ["Stock 1 Open", "Stock 2 Open", "Stock 3 Open"]

Now, we will make a scatter matrix to compare the relationships between each combination of companies:

In [None]:
pd.plotting.scatter_matrix(comparison, figsize=(9,9), hist_kwds={'bins':50})
plt.show()
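
A scatter matrix shows the relationships visually. To put a number on the strength of a linear relationship, one common choice is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand on made-up price lists; the function name `pearson` and the data are illustrative assumptions.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up prices: b moves exactly with a, c moves exactly against a
a = [10.0, 11.0, 12.0, 13.0]
b = [20.0, 22.0, 24.0, 26.0]
c = [8.0, 7.0, 6.0, 5.0]
print(pearson(a, b), pearson(a, c))
```

With the `comparison` DataFrame, `comparison.corr()` computes this same coefficient for every pair of columns.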

<h4>Volatility</h4>

Volatility is a very important metric when comparing stocks, because it is much more difficult to gain back losses (in terms of percentage). For example, a 50% decrease requires a 100% increase to return to the original value.
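
The asymmetry between losses and gains follows from simple arithmetic: after losing a fraction `loss`, you hold `1 - loss` of the original value, so you need to multiply by `1 / (1 - loss)` to get back to where you started. A short sketch, with the hypothetical function name `required_gain`:

```python
def required_gain(loss):
    """Fractional gain needed to recover from a fractional loss (0 <= loss < 1)."""
    return 1 / (1 - loss) - 1

for loss in (0.10, 0.25, 0.50):
    print(f"{loss:.0%} loss -> {required_gain(loss):.1%} gain to break even")
```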

For each company, let's calculate the daily return for each day:

In [None]:
stock1["daily_return"] = (stock1["Close"] / stock1["Close"].shift(1)) - 1
stock2["daily_return"] = (stock2["Close"] / stock2["Close"].shift(1)) - 1
stock3["daily_return"] = (stock3["Close"] / stock3["Close"].shift(1)) - 1

Now, let's plot a histogram of the distribution of daily returns for one company.

In [None]:
stockN["daily_return"].plot.hist(bins=50, color="")

In [None]:
stockN["daily_return"].describe()

In [None]:
#Annualized return, assuming 253 trading days per year: (1 + mean)^253 - 1
mean = stockN["daily_return"].mean()
np.power(1 + mean, 253) - 1
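
The cell above turns the average daily return into an approximate yearly return by compounding it over 253 trading days. Here is the same formula as a small function with a made-up mean daily return. The name `annualize` is hypothetical, and 253 is the day count used in this notebook (252 is another common convention).

```python
def annualize(mean_daily_return, trading_days=253):
    """Compound a mean daily return over a year of trading days."""
    return (1 + mean_daily_return) ** trading_days - 1

mean_daily = 0.001  # made-up value: 0.1% per day
print(f"Compounded: {annualize(mean_daily):.2%}")
print(f"Simple sum: {mean_daily * 253:.2%}")
```

For a positive mean, the compounded figure comes out higher than simply multiplying the daily mean by 253, because each day's gain builds on the gains before it.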

Let's compare all three daily return histograms:

In [None]:
stock1["daily_return"].plot.hist(bins=100, label="Stock 1", alpha=0.5, color="", figsize=(13,6))
stock2["daily_return"].plot.hist(bins=100, label="Stock 2", alpha=0.5, color="")
stock3["daily_return"].plot.hist(bins=100, label="Stock 3", alpha=0.5, color="")
plt.legend()
plt.show()

In [None]:
#Kernel Density Estimation
stock1["daily_return"].plot(kind="kde", label="Stock 1", figsize=(15,6), color="")
stock2["daily_return"].plot(kind="kde", label="Stock 2", color="")
stock3["daily_return"].plot(kind="kde", label="Stock 3", color="")
plt.legend()
plt.show()

From this graph, we can tell that the tallest and thinnest density plot belongs to the least volatile security, whereas the shortest and widest density plot belongs to the most volatile one.
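
One common way to put a number on the width of these curves is the standard deviation of the daily returns: the wider the distribution, the larger the standard deviation. The sketch below uses two made-up return series, named `calm` and `jumpy` for illustration only.

```python
from statistics import stdev

# Made-up daily returns for a quiet stock and a volatile one
calm = [0.001, -0.002, 0.0015, -0.001, 0.002]
jumpy = [0.03, -0.04, 0.025, -0.035, 0.05]

print("calm: ", stdev(calm))
print("jumpy:", stdev(jumpy))
```

For the DataFrames in this notebook, `stock1["daily_return"].std()` gives the equivalent figure.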

Let's make some boxplots to compare the price volatilities of each ticker:

In [None]:
box_df = pd.concat([stock1["daily_return"], stock2["daily_return"], stock3["daily_return"]], axis=1)
box_df.columns=["Stock 1 Daily Returns", "Stock 2 Daily Returns", "Stock 3 Daily Returns"]
box_df.plot(kind="box", figsize=(16,6))

The boxplots confirm the findings from the kernel density estimation: the longer boxes and whiskers belong to the more volatile stocks. Now, let's make a scatter matrix to compare the daily returns between each combination of companies:

In [None]:
pd.plotting.scatter_matrix(box_df,figsize=(8,8), hist_kwds={"bins":50}, alpha=0.25)
plt.show()

<h4>Candlestick Graphing</h4>

Day traders use candlestick graphs to track movement in a stock price throughout the day.

<img src="candlestick.png"></img>

If you are interested in learning more about candlesticks, check out this article: https://www.investopedia.com/trading/candlestick-charting-what-is-it/
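
Each candlestick is drawn from four prices for one time interval: open, high, low, and close. As a rough sketch of how those four numbers map onto the shape, the hypothetical helper `describe_candle` below splits a candle into its body and its two wicks. The sample prices are made up.

```python
def describe_candle(open_, high, low, close):
    """Break one OHLC bar into the parts drawn on a candlestick chart."""
    return {
        "direction": "up" if close >= open_ else "down",  # up candles close at or above the open
        "body": abs(close - open_),                        # thick part: open to close
        "upper_wick": high - max(open_, close),            # thin line above the body
        "lower_wick": min(open_, close) - low,             # thin line below the body
    }

print(describe_candle(10.0, 12.0, 9.0, 11.0))
```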

In [None]:
#%pip install mplfinance
from mplfinance.original_flavor import candlestick_ohlc
import matplotlib.dates as dates

#%pip install yfinance
import yfinance as yf

stock_day_prices = yf.download(tickers="",
                            period="1d",
                            interval="5m",
                            auto_adjust=True)
stock_day_prices = stock_day_prices.reset_index()
stock_day_prices

In [None]:
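#Convert timestamps to matplotlib date numbers (assumes reset_index() named the timestamp column "Datetime")
stock_day_prices["date_ax"] = stock_day_prices["Datetime"].apply(dates.date2num)
stock_day_values = [tuple(vals) for vals in stock_day_prices[["date_ax", "Open", "High", "Low", "Close"]].values]

fig, ax = plt.subplots()
candlestick_ohlc(ax, stock_day_values, width=0.001, colorup="g")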
plt.xticks([])
plt.title("Stock Candlestick Most Recent Trading Day")
plt.ylabel("Price")
plt.show()

From this candlestick graph, we can recognize the price movement of the security for that day. 

Special thanks to the following YouTube videos for providing some of the code that I used in this notebook:
<ol>
    <li><a href="https://www.youtube.com/watch?v=7sFCOunKL_Y">https://www.youtube.com/watch?v=7sFCOunKL_Y</a></li>
    <li><a href="https://www.youtube.com/watch?v=57qAxRV577c">https://www.youtube.com/watch?v=57qAxRV577c</a></li>
</ol>