# Financial Data Analysis with Python - a Deep Dive

In [None]:
import pandas as pd
# import Yahoo finance API
import yfinance as yf

In [None]:
# set start and end date for time series
start = "2014-10-01"
end = "2021-05-31"

In [None]:
# set ticker symbol of financial instrument
# BA = The Boeing Company
# ticker symbols can be looked up at https://finance.yahoo.com/
symbol = "BA"

In [None]:
# download historical price data for Boeing
# Boeing pays dividends, so the adjusted close is lower than the close
# open, high, low and close prices do not account for dividend payouts
# daily trading volume is in units (number of shares traded on that day)
df = yf.download(symbol, start = start, end = end)
df

In [None]:
# get some meta information about the data frame
df.info()

In [None]:
# downloading data for several ticker symbols at once
symbol = ["BA", "MSFT", "^DJI", "EURUSD=X", "GC=F", "BTC-USD"]

Ticker Symbols:
- **BA:** Boeing (US Stock)
- **MSFT:** Microsoft Corp (US Stock)
- **^DJI:** Dow Jones Industrial Average (US Stock Index)
- **EURUSD=X:** Exchange Rate for Currency Pair EUR/USD (Forex)
- **GC=F:** Gold Price (Precious Metal / Commodity)
- **BTC-USD:** Bitcoin in USD (Cryptocurrency)

In [None]:
df = yf.download(symbol, start, end)
df

In [None]:
df.info()

In [None]:
# save the pandas data frame locally as a CSV file
df.to_csv("../../Assets/Data-Files/multi_assets.csv")

## Initial Inspection and Visualization

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
pd.options.display.float_format = '{:.4f}'.format
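# the "seaborn" style was renamed to "seaborn-v0_8" in matplotlib 3.6 (old name removed in 3.8)
plt.style.use("seaborn-v0_8")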

Some more (advanced) techniques to load data into Pandas: https://blog.udemy.com/how-to-create-pandas-dataframes-a-hands-on-guide/
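
The CSV file written earlier has two header rows, because the downloaded data frame has two column levels (price field and ticker). The following is a minimal sketch of that round trip on a hypothetical two-ticker frame with made-up values, using an in-memory buffer instead of a file on disk:

```python
import io

import pandas as pd

# Hypothetical miniature of the downloaded data (values are made up)
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["BA", "MSFT"]])
index = pd.DatetimeIndex(["2015-01-05", "2015-01-06"], name="Date")
toy = pd.DataFrame(
    [[130.0, 46.0, 1000, 2000], [129.0, 45.5, 1100, 2100]],
    index=index,
    columns=columns,
)

# Write to an in-memory buffer instead of a file
buffer = io.StringIO()
toy.to_csv(buffer)
buffer.seek(0)

# header=[0, 1] rebuilds both column levels,
# index_col=0 and parse_dates=[0] rebuild the datetime index
restored = pd.read_csv(buffer, header=[0, 1], index_col=0, parse_dates=[0])
print(restored)
```

Without `header=[0, 1]` the ticker row would be read as a row of data rather than as a second column level.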

In [None]:
# load the data back into pandas
# header = [0, 1] uses the first two rows as column headers
# index_col = 0 moves the date column to the index
# parse_dates = [0] converts the date strings to a DatetimeIndex
df = pd.read_csv("../../Assets/Data-Files/multi_assets.csv", header = [0, 1], index_col = 0, parse_dates = [0])
df

In [None]:
df.info()

In [None]:
# select data element in the outer index level
df.Close

In [None]:
# slice the data frame further and select a single column
df.Close.BA # returns a one-dimensional labeled array (pandas Series)

In [None]:
# alternative way to select a single column
# DataFrame.loc accesses a group of rows and columns by label(s) or a boolean array
# https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html
df.loc[:, ("Close", "BA")]

In [None]:
# select single row for all columns
df.loc["2015-01-07"]

In [None]:
# select single year
df.loc["2015"]

In [None]:
# swap the levels
df = df.swaplevel(axis = "columns").sort_index(axis = "columns") # swap outer and inner index
df

In [None]:
# after the swap, some outer column labels contain special characters
# so they can't be retrieved with dot notation
df["EURUSD=X"]

In [None]:
df["BTC-USD"]
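
If swapping the levels back and forth feels heavy, `DataFrame.xs` can pull all price fields for one ticker directly from the inner level. A minimal sketch on a hypothetical one-row frame (values are made up), compared against the swap approach used in this notebook:

```python
import pandas as pd

# Hypothetical frame with the same column layout as the download (made-up values)
columns = pd.MultiIndex.from_product([["Close", "Open"], ["BA", "EURUSD=X"]])
toy = pd.DataFrame(
    [[130.0, 1.26, 129.0, 1.25]],
    index=pd.to_datetime(["2015-01-05"]),
    columns=columns,
)

# Cross-section on the inner level: all price fields for one ticker
eurusd = toy.xs("EURUSD=X", axis="columns", level=1)

# Same selection via the level swap
swapped = toy.swaplevel(axis="columns").sort_index(axis="columns")
eurusd_swapped = swapped["EURUSD=X"]

print(eurusd)
print(eurusd_swapped)
```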

In [None]:
# swap the levels back
df = df.swaplevel(axis = "columns").sort_index(axis = "columns") # swap outer and inner index
df

In [None]:
# for further analysis we're only interested in the close price
# select daily close prices only and create a separate data frame
close = df.Close.copy()
close

In [None]:
# get summary statistics
close.describe()

In [None]:
# create price chart for a single instrument
# dropna() drops missing values
# create chart with plot method
close.BA.dropna().plot(figsize = (15, 8), fontsize = 13)
plt.legend(fontsize = 13)
plt.show()

In [None]:
# create price chart for all six instruments in one single chart
close.dropna().plot(figsize = (15, 8), fontsize = 13)
plt.legend(fontsize = 13)
plt.show()

**Take Home: Absolute Prices are absolutely meaningless/useless (in most cases)**

- Prices that are in a different scale are hard to compare
- A higher price does not imply a higher value or a better performance
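
A small illustration of the second point, using two hypothetical instruments with made-up prices: the one with the higher price level has the weaker return over the period.

```python
import pandas as pd

# Hypothetical start and end prices, not market data
prices = pd.DataFrame(
    {"Expensive": [1000.0, 1050.0], "Cheap": [10.0, 15.0]},
    index=pd.to_datetime(["2020-01-01", "2020-12-31"]),
)

# Simple return over the whole period: last price / first price - 1
total_return = prices.iloc[-1] / prices.iloc[0] - 1

print(prices.iloc[-1].idxmax())  # instrument with the highest price
print(total_return.idxmax())     # instrument with the highest return
```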

## Normalizing Financial Time Series to Base Value (100)

Normalizing to **Base Value** means that all instruments start at the very same level.
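
In other words, each price is divided by the instrument's first price and multiplied by 100. A minimal sketch on hypothetical prices (the instrument names `A` and `B` and all values are made up):

```python
import pandas as pd

# Hypothetical prices on very different scales, not market data
prices = pd.DataFrame(
    {"A": [20.0, 22.0, 25.0], "B": [400.0, 380.0, 420.0]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
)

base_value = 100

# Divide every column by its own first price, then scale to the base value
normalized = prices.div(prices.iloc[0]).mul(base_value)
print(normalized)
```

A normalized value reads directly as a percentage of the starting price: 125 means the instrument is up 25% since the first date.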

In [None]:
# the data frame to work with
close

In [None]:
# normalizing for one instrument (Boeing)
close.iloc[0,0] # first price of BA

In [None]:
# divide all prices of BA by the very first price
# which creates a base value of 1
# then multiply with 100 to get base value of 100
close.BA.div(close.iloc[0,0]).mul(100)

In [None]:
close.iloc[0] # first price of all tickers

In [None]:
norm = close.div(close.iloc[0]).mul(100)
norm
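
Dividing by `close.iloc[0]` assumes every instrument has a price on the first date. If one does not, its whole normalized column becomes NaN. The sketch below shows this on hypothetical data and one possible workaround, using each column's first available price as its base. Note that the instruments then start at 100 on different dates, so this is an alternative, not a drop-in equivalent.

```python
import numpy as np
import pandas as pd

# Hypothetical prices: instrument "B" has no quote on the first date
close_toy = pd.DataFrame(
    {"A": [50.0, 55.0, 60.0], "B": [np.nan, 20.0, 25.0]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
)

# Dividing by the first row propagates the NaN through all of column "B"
norm_first_row = close_toy.div(close_toy.iloc[0]).mul(100)

# Alternative: back-fill, then take the first row to get each column's first valid price
first_valid = close_toy.bfill().iloc[0]
norm_first_valid = close_toy.div(first_valid).mul(100)

print(norm_first_row)
print(norm_first_valid)
```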

In [None]:
# visualizing the normalized data
norm.dropna().plot(figsize = (15, 8), fontsize = 13, logy = False)
plt.legend(fontsize = 13)
plt.show()

The normalized data is still difficult to compare because Bitcoin's gains dwarf those of the other five instruments and compress their lines toward the bottom of the chart. To make the series easier to compare, the y-axis can be switched to a logarithmic scale.
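
The reason a logarithmic axis helps is that equal percentage changes cover equal vertical distances, regardless of the price level. A minimal numeric sketch with two hypothetical series that both grow 10% per step from different levels:

```python
import numpy as np

# Hypothetical series: same 10% growth per step, different starting levels
low = 100 * 1.10 ** np.arange(4)
high = 1000 * 1.10 ** np.arange(4)

# On a linear axis the step sizes differ by a factor of ten
linear_steps_low = np.diff(low)
linear_steps_high = np.diff(high)

# On a log axis every 10% step has the same size, log10(1.1)
log_steps_low = np.diff(np.log10(low))
log_steps_high = np.diff(np.log10(high))

print(linear_steps_low, linear_steps_high)
print(log_steps_low, log_steps_high)
```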

In [None]:
# changing the scale of the y-axis to logarithmic scale with logy = True
norm.dropna().plot(figsize = (15, 8), fontsize = 13, logy = True)
plt.legend(fontsize = 13)
plt.show()

**Take Home: Normalized prices help to compare financial instruments...**

**...but they are limited when it comes to measuring/comparing performance in more detail**

In [None]:
close.to_csv("../../Assets/Data-Files/close.csv")