# Data Analysis - Financial Time Series

**Author**: [Gabriele Pompa](https://www.linkedin.com/in/gabrielepompa/): gabriele.pompa@unisi.com

# Table of contents

[Executive Summary](#executive-summary)

1. [Introduction to yfinance library](#introduction-to-yfinance-library)\
    1.1. [Installing yfinance](#installing-yfinance)\
    1.2. [`yfinance` module basic usage](#yfinance-module-basic-usage)\
&nbsp; &nbsp; &nbsp; &nbsp; 1.2.1. [How to look up the Yahoo! Finance ticker of a security](#how-to-lookup-for-a-ticker-of-a-security)\
&nbsp; &nbsp; &nbsp; &nbsp; 1.2.2. [How to get market and meta data for a security: `yf.Ticker` class](#get-market-and-meta-data:-yf.ticker()-module)\
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 1.2.2.1. [Multiple securities simultaneously: `yf.Tickers` class](#multiple-securities-simultaneously:-yf.tickers()-module)\
&nbsp; &nbsp; &nbsp; &nbsp; 1.2.3. [Mass download of market data: `yf.download()` function](#get-market-and-meta-data:-yf.download()-function)\
2. [Data Analysis](#data-analysis)\
    2.1. [_Focus on:_ buy-and-hold Portfolio of Stocks](#focus-on:-buy-and-hold-portfolio-of-stocks)\
    2.2. [Summary Statistics](#summary-statistics)\
    2.3. [Returns](#returns)\
&nbsp; &nbsp; &nbsp; &nbsp; 2.3.1. [Simple Returns: `.pct_change()` method](#simple-returns:-.pct_change()-method)\
&nbsp; &nbsp; &nbsp; &nbsp; 2.3.2. [_Focus on:_ Simple Returns of Equally Weighted Portfolio](#focus-on:-simple-returns-of-equally-weighted-portfolio)\
&nbsp; &nbsp; &nbsp; &nbsp; 2.3.3. [Log-Returns: `.shift()` method](#log-returns:-.shift()-method)\
    2.4. [Resampling: `.resample()` method](#resampling:-.resample()-method)\
&nbsp; &nbsp; &nbsp; &nbsp; 2.4.1. [Resampling Prices](#resampling-prices)\
&nbsp; &nbsp; &nbsp; &nbsp; 2.4.2. [Resampling log-Returns](#resampling-log-returns)\
&nbsp; &nbsp; &nbsp; &nbsp; 2.4.3. [_Focus on:_ graphical tests of S&P500 Returns Normality](#focus-on:-graphical-tests-of-s&p500-normality-of-returns)\
    2.5. [Rolling Statistics: `.rolling()` method](#rolling-statistics:-.rolling()-method)\
&nbsp; &nbsp; &nbsp; &nbsp; 2.5.1. [Rolling Correlation Matrix](#rolling-correlation-matrix)\
    2.6. [_Focus on:_ S&P500 - VIX correlation analysis](#focus-on:-s&p500-vix-correlation-analysis)\
&nbsp; &nbsp; &nbsp; &nbsp; 2.6.1. [Regression Analysis: $VIX = \alpha + \beta SPX$](#regression-analysis)\
&nbsp; &nbsp; &nbsp; &nbsp; 2.6.2. [Correlation Analysis](#correlation-analysis)

### **Resources**: 

**TODO**

# Executive Summary <a name="executive-summary"></a>

**TODO**

These are the basic imports we need to work with NumPy and pandas and to plot data with Matplotlib:

In [None]:
# for NumPy arrays
import numpy as np

# for Pandas Series and DataFrame
import pandas as pd

# for Matplotlib plotting
import matplotlib.pyplot as plt

# to do inline plots in the Notebook
%matplotlib inline

# for Operating System operations
import os

# 1. Introduction to yfinance library <a name="introduction-to-yfinance-library"></a>

When you need to process data using a programming language, either your data are stored in a file or database (as we saw in the previous lesson) or you need to get them from a data provider, such as Bloomberg, Reuters, etc.

Typically, data providers store their data securely on remote servers and expose interfaces to the public: these are called [Application Programming Interfaces](https://en.wikipedia.org/wiki/Application_programming_interface) (APIs). For our purposes, an API is a piece of code that allows you to get data from a data provider.

There are plenty of APIs that manage the interface between Python code and financial data, like the [Reuters Eikon Data API](https://developers.refinitiv.com/eikon-apis/eikon-data-api). Since providing data is a business in itself, most APIs for retrieving financial data are not free.

Luckily for us, and thanks to people like [Ran Aroussi](https://aroussi.com/), we have a Python API, called [yfinance](https://github.com/ranaroussi/yfinance), which is 100% free:

- Github page for yfinance library: [https://github.com/ranaroussi/yfinance](https://github.com/ranaroussi/yfinance);
- Blog post from Ran Aroussi with a yfinance tutorial: [https://aroussi.com/post/python-yahoo-finance](https://aroussi.com/post/python-yahoo-finance).

In a nutshell, yfinance, named after _Yahoo! Finance_ (whose official API is now decommissioned), is a reliable Python API to retrieve market data.

The yfinance API comes in the form of a Python module, `yfinance`. Let's now see how to install it in our conda environment.

## 1.1. Installing yfinance <a name="installing-yfinance"></a>

To install the yfinance library:

- (If not done already) In Anaconda Navigator, switch to the class conda environment `ITForBusAndFin2020_env` (see [Figure 1](#anaconda_nav_and_env)). Mac users working in the `base (root)` environment can stay there.

| ![](../images/anaconda_nav_and_env.PNG) <a name="anaconda_nav_and_env"></a>| 
|:--:| 
| _**Figure 1**: in Anaconda Navigator, switch to the class conda environment_ |

- (If not done already) Open a terminal window (the usual black command-line window) using the _CMD.exe Prompt_ app or the _console_shortcut_ app in Anaconda Navigator. Both apps work: which one you see depends on whether you have already updated Anaconda Navigator (see [Figure 2](#open_terminal)).

| ![](../images/CMD_exe.PNG) | 
|:--:| 
| ![](../images/console_shortcut.PNG) | 
| _**Figure 2**: Open a Terminal window using the CMD.exe Prompt app or console_shortcut app in Anaconda Navigator_ <a name="open_terminal"></a>|

- (If not done already) In the terminal window, change directory to your local class folder by typing `cd` followed by the complete path to the class folder, like `C:\Users\gabri\Projects\IT_For_Business_And_Finance_2019_20` on my local machine (see [Figure 3](#yfinance_png) below);


- In the terminal window type the command to install `yfinance` (see [Figure 3](#yfinance_png)):
  ```
  pip install yfinance --upgrade --no-cache-dir
  ```
  
| ![](../images/yfinance.png) <a name="yfinance_png"></a>| 
|:--:| 
| _**Figure 3**: change directory to the class folder and install `yfinance` package in conda_ |



- Always type `y` when asked for installation confirmation;


- You can check that `yfinance` is now among the packages available in your conda environment by typing
  ```
  conda list 
  ```
  which lists all the packages installed (see resulting screen from `conda list` command in [Figure 4](#conda_list_yfinance))

| ![](../images/conda_list_yfinance.png) <a name="conda_list_yfinance"></a>| 
|:--:| 
| _**Figure 4**: check that `yfinance` is installed, typing `conda list`_ |


## 1.2. `yfinance` module basic usage <a name="yfinance-module-basic-usage"></a>

To use the yfinance library, we import the corresponding `yfinance` Python module with the `yf` alias.

In [None]:
import yfinance as yf

For details on `yfinance` usage, see the [dedicated blog post](https://aroussi.com/post/python-yahoo-finance) from Ran Aroussi. Broadly speaking, `yfinance` allows you to:

- get market and meta data for one security, using the `yf.Ticker` class (or for several securities at once, using the `yf.Tickers` class);
- do a mass download of market data, using the `yf.download()` function.

Let's reuse our utility function to delete files:

In [None]:
def removeFile(fileName):
    """
    removeFile(fileName) removes the file 'fileName', if it exists, and prints a success/failure message.
    
    Parameters:
        fileName (str): name of the file, relative to the data folder 'dataFolderPath'
                        (a global variable that must be defined before calling the function)
        
    Returns:
        None
    """

    fullPath = os.path.join(dataFolderPath, fileName)

    if os.path.isfile(fullPath):
        os.remove(fullPath)

        # double-check whether the file still exists
        if os.path.isfile(fullPath):
            print("Failure: file {} still exists...".format(fileName))
        else:
            print("Success: file {} successfully removed!".format(fileName))
            
    else:
        print("File {} does not exist (already removed?).".format(fileName))

In [None]:
downloadData = False  # True: download fresh data from Yahoo! Finance; False: load previously saved files from the Data folder

### 1.2.1. How to look up the Yahoo! Finance ticker of a security <a name="how-to-lookup-for-a-ticker-of-a-security"></a>

To get market data and information for a security, we use the `yf.Ticker` class, which takes the security's symbol as input:

```python
yf.Ticker(SymbolString)
```

where `SymbolString` is the Python String representing the symbol ticker of the desired security (like 'AAPL', 'MSFT', etc.).

Most symbols are well known from the financial news (like 'AAPL' for Apple Inc. or 'MSFT' for Microsoft Corporation). If you know the public name of a company or security but don't remember its symbol, you can use the [Symbol Lookup from Yahoo Finance](https://finance.yahoo.com/lookup/).

Suppose you want to look up the Fiat Chrysler Automobiles symbol. Start typing the public name of the company in the search bar and you get back its symbol: **'FCAU'** (see [Figure 5](#yahoo_symbol_lookup)).

| ![](../images/yahoo_symbol_lookup.png) <a name="yahoo_symbol_lookup"></a>| 
|:--:| 
| _**Figure 5**: Symbol Lookup from Yahoo Finance_ |

Here is the list of tickers that we will use in this notebook:

Yahoo! Finance ticker | Name
:---: | :---
`AAPL` | Apple Inc. Stock
`GOOG` | Alphabet Inc. Stock
`FB` | Facebook, Inc. Stock
`MSFT` | Microsoft Corporation Stock
`INTC` | Intel Corporation Stock
`AMZN` | Amazon.com, Inc. Stock
`BABA` | Alibaba Group Holding Limited Stock
`NFLX` | Netflix, Inc. Stock
`DIS` | The Walt Disney Company Stock
`GE` | General Electric Company Stock
`GS` | The Goldman Sachs Group, Inc. Stock
`DB` | Deutsche Bank Aktiengesellschaft Stock
`^GSPC` | S&P 500 Index
`^VIX` | CBOE Volatility Index
`EURUSD=X` | EUR/USD Exchange Rate
`EURCHF=X` | EUR/CHF Exchange Rate
`EURGBP=X` | EUR/GBP Exchange Rate
`FCAU` | Fiat Chrysler Automobiles N.V. Stock
`E` | Eni S.p.A. Stock
`ENIA` | Enel Americas S.A. Stock

### 1.2.2. How to get market and meta data for a security: `yf.Ticker` class <a name="get-market-and-meta-data:-yf.ticker()-module"></a>

In [None]:
aapl = yf.Ticker("AAPL")
aapl

In [None]:
import json

In [None]:
dataFolderPath = "../Data"

In [None]:
filePath = os.path.join(dataFolderPath, "AAPL_Stock_Info.json")

In [None]:
if downloadData:
    %time aapl_info = aapl.info

else:
    with open(filePath, 'r') as file:
        %time aapl_info = json.load(file)

aapl_info

In [None]:
aapl_info['longBusinessSummary']

In [None]:
aapl_info['regularMarketPreviousClose']

In [None]:
if downloadData:
    with open(filePath, 'w') as file:
        %time json.dump(aapl_info, file, indent="\t")

In [None]:
# removeFile(filePath)

In [None]:
aapl.actions

In [None]:
ax = aapl.actions.plot(secondary_y="Dividends", figsize=(10,6))

ax.set_ylabel("Number of Stock Splits")
ax.right_ax.set_ylabel("Dividends (USD)")

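Historical market data for a security are retrieved with the `.history()` method of its `yf.Ticker` object, called either as

```python
.history(start[, end, interval])
```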

where:
    
- `start` parameter is the `"YYYY-MM-DD"` Python string representing the first date for which we query data;
- `end` parameter, optional, is the `"YYYY-MM-DD"` Python string representing the end of the query period, which is exclusive (data are returned up to the day before `end`). It defaults to the last available date, usually today or the last business day;
- `interval` parameter, optional, is the Python string representing the data frequency. It defaults to `"1d"`, that is daily frequency. Accepted values are:
    
`interval` parameter | data frequency
:---: | :---
`"1m"` | every 1 minute
`"2m"` | every 2 minutes
`"5m"` | every 5 minutes
`"15m"` | every 15 minutes
`"30m"` | every 30 minutes
`"60m"` | every 60 minutes
`"90m"` | every 90 minutes
`"1h"` | every 1 hour
`"1d"` | every 1 day
`"5d"` | every 5 days
`"1wk"` | every 1 week
`"1mo"` | every 1 month
`"3mo"` | every 3 months

Alternatively, it can be called as

```python
.history(period[, interval])
```

where the optional `period` parameter is the length of the most recent time window for which we query data. For example, `"ytd"` (acronym for year to date) is the period beginning on the first day of the current year up to the current date. If neither `start` nor `period` is given, `.history()` defaults to `period="1mo"`. Accepted values are:

`period` parameter | data period
:---: | :---
`"1d"` | last 1 day
`"5d"` | last 5 days
`"1mo"` | last 1 month
`"3mo"` | last 3 months
`"6mo"` | last 6 months
`"1y"` | last 1 year
`"2y"` | last 2 years
`"5y"` | last 5 years
`"10y"` | last 10 years
`"ytd"` | year to date
`"max"` | maximum available
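The `"YYYY-MM-DD"` strings for `start` and `end` can also be computed rather than hard-coded, for example the `"2018-04-08"` start date used below for the last two years of data. The following is a minimal sketch: the helper name `startDateString` is hypothetical, and the reference date `2020-04-08` is an assumption matching the dates used in this notebook.

```python
import pandas as pd

def startDateString(referenceDate, years):
    """Hypothetical helper: 'YYYY-MM-DD' string of the date `years` years before `referenceDate`."""
    # pd.DateOffset(years=...) shifts by calendar years, keeping month and day when possible
    start = pd.Timestamp(referenceDate) - pd.DateOffset(years=years)
    return start.strftime("%Y-%m-%d")

# assumed reference date, matching the "last 2 years" windows used below
print(startDateString("2020-04-08", years=2))  # 2018-04-08
```

For example, `nflx.history(start=startDateString("2020-04-08", years=2))` would request the same window as `nflx.history(start="2018-04-08")` below.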

In [None]:
aapl_history = aapl.history(period="max", interval="1wk")
aapl_history

In [None]:
ax = aapl_history["Close"].plot(figsize=(10,6))

ax.set_title("AAPL")
ax.set_ylabel("Close Price (USD)")

In [None]:
aapl_history_close_last_2y = aapl_history.loc["2018-04-08":, "Close"]
aapl_history_close_last_2y

In [None]:
ax = aapl_history_close_last_2y.plot(figsize=(10,6))

ax.set_title("AAPL")
ax.set_ylabel("Close Price (USD)")

In [None]:
ax = aapl_history_close_last_2y["2020-01-01":].plot(figsize=(10,6))

ax.set_title("AAPL")
ax.set_ylabel("Close Price (USD)")

#### 1.2.2.1. Multiple securities simultaneously: `yf.Tickers` class <a name="multiple-securities-simultaneously:-yf.tickers()-module"></a>

In [None]:
securities = yf.Tickers('FB AMZN NFLX GOOG')
securities

In [None]:
fb   = securities.tickers.FB
amzn = securities.tickers.AMZN
nflx = securities.tickers.NFLX
goog = securities.tickers.GOOG

In [None]:
filePath = os.path.join(dataFolderPath, "FANG_Stocks_Info.json")

In [None]:
if downloadData:
    %time info_dict = {'FB': fb.info, 'AMZN': amzn.info, 'NFLX': nflx.info, 'GOOG': goog.info}
    
else:
    with open(filePath, 'r') as file:
        %time info_dict = json.load(file)

info_dict

In [None]:
goog_info = info_dict['GOOG']
goog_info

In [None]:
goog_info['longName']

In [None]:
if downloadData:
    with open(filePath, 'w') as file:
        %time json.dump(info_dict, file, indent="\t")

In [None]:
# removeFile(filePath)

In [None]:
nflx_history_last_two_years = nflx.history(start="2018-04-08")
nflx_history_last_two_years

In [None]:
ax = nflx_history_last_two_years["Close"].plot(figsize=(10,6))

ax.set_title("NFLX")
ax.set_ylabel("Close Price (USD)")

In [None]:
ax = nflx_history_last_two_years.loc["2020-01-01":, "Close"].plot(figsize=(10,6))

ax.set_title("NFLX")
ax.set_ylabel("Close Price (USD)")

### 1.2.3. Mass download of market data: `yf.download()` function <a name="get-market-and-meta-data:-yf.download()-function"></a>

In [None]:
securityTickerToNameDict = {
    'AAPL':     "Apple Inc. Stock",
    'GOOG':     "Alphabet Inc. Stock",
    'FB':       "Facebook, Inc. Stock",
    'MSFT':     "Microsoft Corporation Stock",
    'INTC':     "Intel Corporation Stock",
    'AMZN':     "Amazon.com, Inc. Stock",
    'BABA':     "Alibaba Group Holding Limited Stock",
    'NFLX':     "Netflix, Inc. Stock",
    'DIS':      "The Walt Disney Company Stock",
    'GE':       "General Electric Company Stock",
    'GS':       "The Goldman Sachs Group, Inc. Stock",
    'DB':       "Deutsche Bank Aktiengesellschaft Stock",
    '^GSPC':    "S&P 500 Index",
    '^VIX':     "CBOE Volatility Index",
    'EURUSD=X': "EUR/USD Exchange Rate",
    'EURCHF=X': "EUR/CHF Exchange Rate",
    'EURGBP=X': "EUR/GBP Exchange Rate",
    'FCAU':     "Fiat Chrysler Automobiles N.V.",
    'E':        "Eni S.p.A. Stock",
    'ENIA':     "Enel Americas S.A. Stock"
}

In [None]:
tickerList = list(securityTickerToNameDict.keys())
tickerList

In [None]:
tickerListString = ' '.join(tickerList)
tickerListString

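To download market data for many securities at once, we use the `yf.download()` function, which can be called either as

```python
yf.download(tickers, start[, end, interval])
```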

where:
    
- `tickers` parameter is the list of Yahoo! Finance tickers for which we query data, passed either as a Python list or as a single space-separated string (like `tickerListString` above);
- `start` parameter is the `"YYYY-MM-DD"` Python string representing the first date for which we query data;
- `end` parameter, optional, is the `"YYYY-MM-DD"` Python string representing the end of the query period (exclusive). It defaults to the last available date, usually today or the last business day;
- `interval` parameter, optional, is the Python string representing the data frequency. It defaults to `"1d"`, that is daily frequency. Accepted values are the same as for `.history(..., interval, ...)`.

Alternatively,

```python
yf.download(tickers, period[, interval])
```

where the optional `period` parameter is the length of the most recent time window for which we query data, with the same accepted values as for `.history(..., period, ...)`. The default `period` differs between `.history()` and `yf.download()` (and across yfinance versions), so it is safest to always pass either `start` or `period` explicitly.
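With the default column grouping, `yf.download()` returns a DataFrame whose columns form a two-level MultiIndex: the first level is the price field (like `Close`) and the second level is the ticker. This is why `dataFull["Close"]` below gives a DataFrame of close prices with one column per ticker. The sketch below only mimics that layout with made-up numbers (no download involved); the exact fields of a real download may differ between yfinance versions.

```python
import pandas as pd

# made-up values for two tickers over two days, laid out like yf.download() output
dates = pd.to_datetime(["2020-04-07", "2020-04-08"])
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["AAPL", "MSFT"]])
data = pd.DataFrame([[10.0, 20.0, 1000.0, 2000.0],
                     [11.0, 21.0, 1100.0, 2100.0]],
                    index=dates, columns=columns)

# selecting on the first column level gives one column per ticker
closeOnly = data["Close"]
print(closeOnly)

# a single value: close price of AAPL on a given date
print(closeOnly.loc[pd.Timestamp("2020-04-08"), "AAPL"])
```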

In [None]:
if downloadData:
    %time dataFull = yf.download(tickers=tickerListString, start="1985-01-01")
    # a bare expression inside an if-block is not echoed by Jupyter: display it explicitly
    display(dataFull)

In [None]:
if downloadData:
    print(dataFull["Adj Close"].loc["2020-04-08", "AAPL"])

In [None]:
filePath = os.path.join(dataFolderPath, "Securities_Close_Price_Dataset.csv")

In [None]:
if downloadData:
    closePrice = dataFull["Close"]
    
else:
    %time closePrice = pd.read_csv(filepath_or_buffer = filePath, index_col = 0, parse_dates = True)
    
closePrice

In [None]:
if downloadData:
    %time closePrice.to_csv(path_or_buf = filePath)

In [None]:
# removeFile(filePath)

# 2. Data Analysis <a name="data-analysis"></a>

In [None]:
closePrice.head()

In [None]:
closePrice.tail()

In [None]:
closePrice.plot(figsize=(10,20), subplots=True)

## 2.1. _Focus on:_ buy-and-hold Portfolio of Stocks <a name="focus-on:-buy-and-hold-portfolio-of-stocks"></a>

In [None]:
stockTickers = ['AAPL', 'GOOG','FB', 'MSFT', 'INTC', 'AMZN', 'BABA', 'NFLX', 'DIS', 'GE', 'GS', 'DB', 'FCAU', 'E', 'ENIA']

In [None]:
stocksClosePrice = closePrice.loc[:, stockTickers]

In [None]:
stocksClosePrice.head()

In [None]:
stocksClosePrice.tail()

In [None]:
stocksClosePrice.shape

Let $\text{shares}_{i,t}$ denote the number of shares of security $i$ held at time $t$ and let $P_{i,t}$ be the corresponding price per share. The value of a portfolio of $N$ securities is then

\begin{equation}
\begin{aligned}
V^{p}_{t} = \sum^{N}_{i=1} \text{shares}_{i,t}  \times P_{i,t}
\end{aligned}
\end{equation}

For a buy-and-hold portfolio, the number of shares is determined at time $t=0$ and held fixed thereafter, $\text{shares}_{i,t} = \text{shares}_{i,0}$, so that the value becomes

\begin{equation}
\begin{aligned}
V^{p \text{, buy-n-hold}}_{t} = \sum^{N}_{i=1} \text{shares}_{i,0}  \times P_{i,t}
\end{aligned}
\end{equation}

Notice that, for a buy-and-hold portfolio, the dollar amount invested in each security $i$, $\text{shares}_{i,0} \times P_{i,t}$, is not constant over time: even though the number of shares is fixed at $\text{shares}_{i,0}$, the dollar value of the position in security $i$ changes with the share price $P_{i,t}$.

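We consider the particular case of one share per stock, $\text{shares}_{i,0}=1$ for each security. That is, we buy one share of each stock on December 31st, 1984 and hold it until today. Notice that many of these stocks were listed much later (e.g. GOOG in 2004 and FB in 2012): their prices are `NaN` before listing and, since `.sum(axis=1)` skips `NaN` values, each stock effectively enters the portfolio on its first trading day.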
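The sketch below uses made-up prices for two hypothetical stocks, `"A"` and `"B"`, where `"B"` is listed only from the third date on. It shows how `shares * prices` followed by `.sum(axis=1)` computes $V^{p \text{, buy-n-hold}}_{t}$, and how the late listing shows up as a jump in value.

```python
import numpy as np
import pandas as pd

# hypothetical prices: stock "B" is listed only from the third date on
prices = pd.DataFrame({"A": [10.0, 11.0, 12.0, 13.0],
                       "B": [np.nan, np.nan, 50.0, 55.0]},
                      index=pd.date_range("2020-01-01", periods=4, freq="D"))

shares = np.ones(prices.shape[1], dtype="int")  # one share per stock, held fixed

# dollar value of each position (the array is broadcast across columns)
positions = shares * prices

# portfolio value: NaN positions are skipped by .sum(axis=1)
ptfValue = positions.sum(axis=1)
print(ptfValue)
```

The jump from 11 to 62 is not a return: it is stock `"B"` entering the sum, so returns computed from such a value series should be interpreted with care around listing dates.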

In [None]:
N = stocksClosePrice.shape[1]
N

In [None]:
shares = np.ones(N, dtype="int")
shares

In [None]:
dollarInvestedPerSecurity = shares * stocksClosePrice
dollarInvestedPerSecurity.tail()

In [None]:
ptfValue = dollarInvestedPerSecurity.sum(axis=1)
ptfValue.tail()

In [None]:
ax = ptfValue.plot(figsize=(10,6))

ax.set_title("Buy-n-Hold Portfolio (1 share per stock)")
ax.set_ylabel("Value (USD)")

## 2.2. Summary Statistics <a name="summary-statistics"></a>

In [None]:
closePrice.shape

In [None]:
closePrice.info()

In [None]:
closePrice.describe().round(1)

In [None]:
closePrice.mean()

In [None]:
closePrice.std()

## 2.3. Returns <a name="returns"></a>

### 2.3.1. Simple Returns: `.pct_change()` method <a name="simple-returns:-.pct_change()-method"></a>

\begin{equation}
\begin{aligned}
R^{sim}_{t,1} =  \frac{P_t - P_{t-1}}{P_{t-1}} = \frac{P_t}{P_{t-1}} - 1
\end{aligned}
\end{equation}
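As a quick check on made-up prices, the sketch below shows that `.pct_change(periods=1)` computes exactly $P_t / P_{t-1} - 1$, and that the first value is `NaN` because there is no previous price.

```python
import numpy as np
import pandas as pd

# hypothetical closing prices
prices = pd.Series([100.0, 110.0, 99.0, 99.0],
                   index=pd.date_range("2020-01-01", periods=4, freq="D"))

# simple returns with pandas
simpleRets = prices.pct_change(periods=1)

# the same returns from the definition P_t / P_{t-1} - 1
manualRets = prices / prices.shift(periods=1) - 1

print(simpleRets)
print(np.allclose(simpleRets.dropna(), manualRets.dropna()))  # True
```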


In [None]:
simpleRets = closePrice.pct_change(periods=1)

In [None]:
simpleRets.tail().round(2)

In [None]:
simpleRets.plot(figsize=(10,20), subplots=True)

In [None]:
simpleRets.hist(figsize=(13,13), bins=50)

### 2.3.2. _Focus on:_ Simple Returns of Equally Weighted Portfolio <a name="focus-on:-simple-returns-of-equally-weighted-portfolio"></a>

Let's define a portfolio of $N$ securities in terms of its weights, that is, the fraction of the total portfolio value invested in each security. For security $i$ at time $t$, the weight $w_{i,t}$ is defined as

$$
w_{i,t} = \frac{\text{Amount invested in security } i \text{ at time }t}{\text{Total value of the portfolio}} = \frac{\text{shares}_{i,t} \times P_{i,t}}{\sum^{N}_{j=1} \text{shares}_{j,t} \times P_{j,t}}
$$

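Notice that the weights are normalized: $\sum^{N}_{i=1} w_{i,t} = 1$.


Let's denote with $R^{i \text{, }sim}_{t,1}$ the one-period simple return of security $i$, calculated as shown in the previous section. The one-period simple return $R^{p \text{, }sim}_{t,1}$ of a portfolio of $N$ securities, whose shares are held fixed between $t-1$ and $t$, can then be expressed as

$$
R^{p \text{, }sim}_{t,1} = \sum^{N}_{i=1} w_{i,t-1} \times R^{i \text{, }sim}_{t,1}
$$

where the weights are measured at the beginning of the period, $t-1$. Indeed, $\frac{V^{p}_{t}}{V^{p}_{t-1}} - 1 = \sum^{N}_{i=1} \frac{\text{shares}_{i,t-1} \times P_{i,t-1}}{V^{p}_{t-1}} \times \frac{P_{i,t}}{P_{i,t-1}} - 1 = \sum^{N}_{i=1} w_{i,t-1} \left(1 + R^{i \text{, }sim}_{t,1}\right) - 1$, and the weights sum to 1. That is, simple portfolio returns are the weighted average of the simple returns of the individual investments in the portfolio.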
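The sketch below checks numerically, on a hypothetical three-stock portfolio with made-up prices, that the portfolio return computed from values, $V^{p}_{t}/V^{p}_{t-1} - 1$, equals the weighted average of single-security simple returns when the weights are taken at the beginning of the period ($t-1$), and that using end-of-period weights ($t$) gives a different number.

```python
import numpy as np

# hypothetical portfolio: number of shares held fixed between t-1 and t
shares = np.array([2, 1, 4])
P_prev = np.array([10.0, 50.0, 5.0])  # prices at t-1
P_now = np.array([12.0, 45.0, 5.5])   # prices at t

V_prev = np.sum(shares * P_prev)      # portfolio value at t-1
V_now = np.sum(shares * P_now)        # portfolio value at t

R_i = P_now / P_prev - 1              # simple return of each security

w_prev = shares * P_prev / V_prev     # weights at t-1 (sum to 1)
w_now = shares * P_now / V_now        # weights at t (sum to 1)

R_p_direct = V_now / V_prev - 1       # portfolio return from values
R_p_weighted = np.sum(w_prev * R_i)   # weighted average, weights at t-1
R_p_wrong = np.sum(w_now * R_i)       # weighted average, weights at t

print(R_p_direct, R_p_weighted, R_p_wrong)
```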

In an equally weighted portfolio, the dollar amount invested in each security is the same across securities. That is, 
$$
\text{shares}_{i,t} \times P_{i,t} \equiv \text{const}
$$
for every security $i$, and thus the weights are
$$
w_{i,t} = \frac{\text{const}}{\sum^{N}_{j=1} \text{const}} = \frac{1}{N}
$$
for each security $i$ and time $t$. Since security prices change over time, frequent rebalancing is needed to keep the dollar amounts of all positions equal.
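A practical caveat when applying $w_{i,t} = 1/N$ to our data: stocks listed after 1985 have `NaN` returns before their listing date and `.sum(axis=1)` skips them, so on those dates the weights actually applied sum to less than 1. One possible alternative, sketched below on made-up returns, is `.mean(axis=1)`, which averages over the securities with available returns only (equal weights among the listed stocks). Which convention is appropriate depends on the portfolio you want to model.

```python
import numpy as np
import pandas as pd

# hypothetical daily simple returns: stock "C" is not yet listed on day 0
rets = pd.DataFrame({"A": [0.01, 0.02],
                     "B": [0.03, -0.01],
                     "C": [np.nan, 0.05]})

N = rets.shape[1]
w = np.ones(N) / N

# fixed 1/N weights: the NaN term is skipped, so day 0 uses weights summing to 2/3
fixedWeightRets = (w * rets).sum(axis=1)

# equal weights among available stocks only: 1/2 on day 0, 1/3 on day 1
availableWeightRets = rets.mean(axis=1)

print(fixedWeightRets)
print(availableWeightRets)
```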

In [None]:
stocksSimpleRets = stocksClosePrice.pct_change(periods=1)

In [None]:
w = np.ones(N)/N
w

In [None]:
weightedSimpleRets = w * stocksSimpleRets

In [None]:
ptfSimpleRets = weightedSimpleRets.sum(axis=1)

In [None]:
ax = ptfSimpleRets.plot(figsize=(10,6))

ax.set_title("Equally Weighted Portfolio Simple Returns")
ax.set_ylabel("Simple Returns")

### 2.3.3. Log-Returns: `.shift()` method <a name="log-returns:-.shift()-method"></a>

\begin{equation}
\begin{aligned}
R^{log}_{t,1} = \log \left( \frac{P_t}{P_{t-1}} \right)
\end{aligned}
\end{equation}
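On made-up prices, the sketch below computes log-returns as in the cell that follows, `np.log(P / P.shift(1))`, and checks that they equal $\log(1 + R^{sim}_{t,1})$, computed here with `np.log1p`.

```python
import numpy as np
import pandas as pd

# hypothetical closing prices
prices = pd.Series([100.0, 110.0, 99.0],
                   index=pd.date_range("2020-01-01", periods=3, freq="D"))

# one-period log-returns: log(P_t / P_{t-1}); the first value is NaN
logRets = np.log(prices / prices.shift(periods=1))

# one-period simple returns: P_t / P_{t-1} - 1
simpleRets = prices / prices.shift(periods=1) - 1

# log-returns are the log of one plus the simple returns
print(np.allclose(logRets.dropna(), np.log1p(simpleRets.dropna())))  # True
```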


In [None]:
logRets = np.log(closePrice/closePrice.shift(periods=1))

In [None]:
logRets.tail().round(2)

In [None]:
logRets.plot(figsize=(10,20), subplots=True)

In [None]:
logRets.hist(figsize=(13,13), bins=50)

## 2.4. Resampling: `.resample()` method <a name="resampling:-.resample()-method"></a>

```python
.resample(rule, label, closed).computation()
```

groups the rows of the original DataFrame into time buckets according to the resampling `rule` and then aggregates each bucket into a single row with `.computation()`. Each bucket is labeled with its left or right edge according to the `label` parameter, and includes either its left or its right edge according to the `closed` parameter. For details see the [.resample() documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html). Parameters are: 
    
- `rule` parameter is the Python string representing the target resampling frequency. Values are of the form `"NumberFrequencyString"` (e.g. `"1W"` for weekly resampling, or `"5B"` for buckets of five business days) and a non-exhaustive list of accepted `FrequencyString` strings is 
    
`FrequencyString` parameter | data frequency
:---: | :---
B   | business day frequency
D   | calendar day frequency
W   | weekly frequency
M   | month end frequency
BM  | business month end frequency
MS  | month start frequency
BMS | business month start frequency
Q   | quarter end frequency
BQ  | business quarter end frequency
QS  | quarter start frequency
BQS | business quarter start frequency
A   | year end frequency
BA  | business year end frequency
AS  | year start frequency
BAS | business year start frequency
H   | hourly frequency
T   | minutely frequency
S   | secondly frequency
L   | milliseconds
U   | microseconds

- `label` parameter, either `"left"` or `"right"`, sets which bin edge is used to label each bucket;


- `closed` parameter, either `"left"` or `"right"`, sets which side of the bin interval is closed, i.e. included in the bucket;


- `.computation()` is a placeholder for a method mapping the values of the rows in each bucket to a single value. The most common methods are in the following table (see the ["Computations / descriptive stats" documentation](https://pandas.pydata.org/pandas-docs/stable/reference/resampling.html#computations-descriptive-stats) for details)
    
`.computation()` method | returns
:---: | :---
`.last()`   | value of the last row in each bucket
`.first()`   | value of the first row in each bucket
`.min()`   | minimum value of the rows in each bucket
`.max()`   | maximum value of the rows in each bucket
`.sum()`   | sum of the values of the rows in each bucket
`.mean()`   | mean of the values of the rows in each bucket

### 2.4.1. Resampling Prices <a name="resampling-prices"></a>

In [None]:
closePrice.head(15)

In [None]:
weeklyClosePrice = closePrice.resample(rule='5B', label='right', closed='right').last()

In [None]:
weeklyClosePrice.head()

In [None]:
weeklyClosePrice.loc["1985-01-07", "AAPL"]

In [None]:
closePrice.loc[:"1985-01-07", "AAPL"]

In [None]:
weeklyClosePrice.loc["1985-01-14", "AAPL"]

In [None]:
closePrice.loc["1985-01-08":"1985-01-14", "AAPL"]

In [None]:
weeklyClosePrice.loc["1985-01-21", "AAPL"]

In [None]:
closePrice.loc["1985-01-15":"1985-01-21", "AAPL"]

### 2.4.2. Resampling log-Returns <a name="resampling-log-returns"></a>

Log-returns are time-additive. Given the one-period log-return

\begin{equation}
\begin{aligned}
R^{log}_{t,1} = \log \left( \frac{P_t}{P_{t-1}} \right)
\end{aligned}
\end{equation}

the $n$-period log-return at time $t$ is

\begin{equation}
\begin{aligned}
R^{log}_{t,n} = \log \left( \frac{P_t}{P_{t-n}} \right)
\end{aligned}
\end{equation}

which can be split into one-period terms as follows:

\begin{equation}
\begin{aligned}
R^{log}_{t,n} &= \log \left( \frac{P_t}{P_{t-n}} \right) \nonumber \\
              &= \log \left( \frac{P_t}{P_{t-1}} \times \frac{P_{t-1}}{P_{t-2}} \times \frac{P_{t-2}}{P_{t-3}} \times \cdots \times \frac{P_{t-n+2}}{P_{t-n+1}} \times \frac{P_{t-n+1}}{P_{t-n}} \right) \nonumber \\
              &= \log \left( \frac{P_t}{P_{t-1}} \right) + \log \left( \frac{P_{t-1}}{P_{t-2}} \right) + \log \left( \frac{P_{t-2}}{P_{t-3}} \right) + \cdots + \log \left( \frac{P_{t-n+2}}{P_{t-n+1}} \right) + \log \left( \frac{P_{t-n+1}}{P_{t-n}} \right) \nonumber \\
              &= R^{log}_{t,1} + R^{log}_{t-1,1} + R^{log}_{t-2,1} + \cdots + R^{log}_{t-n+2,1} + R^{log}_{t-n+1,1} \nonumber \\
              &= \sum_{i=1}^{n} R^{log}_{t-i+1,1} \nonumber
\end{aligned}
\end{equation}

That is, the $n$-period log-return at time $t$ is the sum of the $n$ most recent one-period log-returns up to time $t$. This is why, when resampling log-returns below, each bucket is aggregated with `.sum()`.
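The sketch below checks this time additivity in pandas on made-up prices: summing daily log-returns within 5-business-day buckets gives the same result as taking the log-return between consecutive end-of-bucket prices. The first bucket is skipped, because its direct log-return has no previous bucket to compare with.

```python
import numpy as np
import pandas as pd

# hypothetical prices over 15 business days
days = pd.bdate_range("2020-01-06", periods=15)
prices = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0,
                    106.0, 108.0, 110.0, 109.0, 111.0,
                    115.0, 114.0, 116.0, 118.0, 117.0], index=days)

# daily log-returns
logRets = np.log(prices / prices.shift(periods=1))

# multi-period log-returns by summing daily log-returns in each bucket
bucketLogRets = logRets.resample(rule="5B", label="right", closed="right").sum()

# multi-period log-returns directly from end-of-bucket prices
bucketPrices = prices.resample(rule="5B", label="right", closed="right").last()
directLogRets = np.log(bucketPrices / bucketPrices.shift(periods=1))

# compare all buckets except the first
print(np.allclose(bucketLogRets.iloc[1:], directLogRets.iloc[1:]))
```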

In [None]:
logRets.head(15)

In [None]:
weeklyLogRets = logRets.resample(rule='5B', label='right', closed='right').sum()

In [None]:
weeklyLogRets.head()

In [None]:
weeklyLogRets.loc["1985-01-07", "AAPL"]

In [None]:
logRets.loc[:"1985-01-07", "AAPL"]

In [None]:
logRets.loc[:"1985-01-07", "AAPL"].sum()

In [None]:
weeklyLogRets.loc["1985-01-14", "AAPL"]

In [None]:
logRets.loc["1985-01-08":"1985-01-14", "AAPL"]

In [None]:
logRets.loc["1985-01-08":"1985-01-14", "AAPL"].sum()

In [None]:
weeklyLogRets.loc["1985-01-21", "AAPL"]

In [None]:
logRets.loc["1985-01-15":"1985-01-21", "AAPL"]

In [None]:
logRets.loc["1985-01-15":"1985-01-21", "AAPL"].sum()

### 2.4.3. _Focus on:_ graphical tests of S&P500 Returns Normality <a name="focus-on:-graphical-tests-of-s&p500-normality-of-returns"></a>
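The graphical tests below compare the empirical distribution of S&P500 log-returns with a fitted normal distribution and report the sample skewness and excess kurtosis, both of which are 0 for a normal distribution. As a sanity check (not part of the original analysis; sample size and seed are arbitrary choices), the sketch below shows that pandas' `.skew()` and `.kurtosis()` are close to 0 on a large simulated normal sample, while a fat-tailed Student-t sample has clearly positive excess kurtosis.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)  # arbitrary seed, for reproducibility

# simulated normal and fat-tailed (Student-t, 5 degrees of freedom) samples
normalSample = pd.Series(rng.standard_normal(100_000))
fatTailSample = pd.Series(rng.standard_t(df=5, size=100_000))

# pandas .kurtosis() returns the excess kurtosis (0 for a normal distribution)
print("normal:    skew={:.3f}, excess kurt={:.3f}".format(normalSample.skew(), normalSample.kurtosis()))
print("Student-t: skew={:.3f}, excess kurt={:.3f}".format(fatTailSample.skew(), fatTailSample.kurtosis()))
```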

In [None]:
spxCloseLevel = closePrice.loc[:, '^GSPC'].dropna()

In [None]:
spxLogRets = np.log(spxCloseLevel/spxCloseLevel.shift(periods=1)) 

In [None]:
ax = spxLogRets.hist(bins=50, density=True, figsize=(10,6))

ax.set_title("S&P500 daily log-returns")
ax.set_xlabel("Returns")
ax.set_ylabel("Density")
ax.set_xlim(-0.2, 0.2)

In [None]:
import math
from scipy import stats

def resampleRets(dfRets, days):
    """
    resampleRets(dfRets, days) resamples the S&P500 daily log-returns 'dfRets'
    into buckets of 'days' business days, summing the daily log-returns in each
    bucket to get the log-return over the horizon (log-returns are time-additive).
    NaNs are dropped.
    
    Parameters:
        dfRets (pd.Series): daily log-returns,
        days (str): number of business days per bucket (e.g. '5').
        
    Returns:
        resampledRets (pd.Series): resampled log-returns.
    """    
    
    resampledRets = dfRets.resample(rule=days+'B', label='right', closed='right').sum().dropna()
    
    return resampledRets

def normalFit(dfRets):
    """
    normalFit(dfRets) takes the S&P500 log-returns 'dfRets' and:
        - fits a normal distribution to dfRets;
        - computes the fitted normal pdf over a uniform grid of returns;
        - computes the higher sample moments: skewness and excess kurtosis.
    
    Parameters:
        dfRets (pd.Series): log-returns.
    
    Returns:
        fitRes (dict): normal fit results.
        
    """    
        
    # normal fit
    mu_fit, sigma_fit = stats.norm.fit(dfRets.values)
    
    # create a uniform grid of points between minimum and maximum values of dfRets
    num_points = dfRets.shape[0]
    df_unif_grid = np.linspace(dfRets.min(), dfRets.max(), num_points)

    # fit normal pdf
    pdf_fit = stats.norm.pdf(df_unif_grid, loc=mu_fit, scale=sigma_fit)
    
    # higher sample moments
    sample_skewness = dfRets.skew()
    sample_kurtosis = dfRets.kurtosis()
    
    # wrapping output in a Dict
    fitRes = {'mu_fit':            mu_fit, 
              'sigma_fit':         sigma_fit,
              'sample_skew':       sample_skewness,
              'sample_kurt':       sample_kurtosis,
              'returns_grid':      df_unif_grid,
              'pdf(returns_grid)': pdf_fit}
    
    return fitRes

def makePlots(dfRets, fitRes, days):
    """
    makePlots(dfRets, fitRes, days) takes the S&P500 log-returns 'dfRets', the normal
    fit results dict 'fitRes' and the resampling frequency 'days'. It then:
        - makes a normalized histogram of dfRets, with number of bins = ceil(sqrt(number of data points));
        - overlays the best-fit normal pdf on the histogram;
        - makes a Q-Q plot of dfRets quantiles against the normal distribution.
    
    Parameters:
        dfRets (pd.Series): S&P500 log-returns,
        fitRes (dict): normal fit results, from normalFit() function,
        days (int): resampling frequency, in business days.
    
    Returns:
        None
        
    """    
    
    fig, axs = plt.subplots(figsize=(20,6), nrows=1, ncols=2)
    
    # Histogram
    bin_num = math.ceil(math.sqrt(dfRets.shape[0]))
    axs[0].hist(x=dfRets.values, bins=bin_num, density=True, histtype='bar', ec='black',
                label="Empirical (skew={:.2f}, excess kurt={:.2f})".format(fitRes['sample_skew'], fitRes['sample_kurt']))
    
    axs[0].plot(fitRes['returns_grid'], fitRes['pdf(returns_grid)'], '--', lw=2, 
                label=r"Normal fit $N(z;\mu={:.2f}, \sigma={:.2f})$ pdf".format(fitRes['mu_fit'], fitRes['sigma_fit']))
    
    axs[0].set_title("Histogram of S&P500 log-returns ({} days resample)".format(days), fontsize=20)
    axs[0].set_xlabel("Returns", fontsize=20)
    axs[0].set_ylabel("Density", fontsize=20)
    axs[0].set_xlim(-0.2, 0.4)
    axs[0].set_ylim(0.0, 26)
    axs[0].legend(loc='upper right', ncol=1, fontsize=12)
    
    
    # Q-Q plot
    stats.probplot(x=dfRets.values, dist='norm', plot=axs[1])
    axs[1].set_title("Q-Q plot of S&P500 log-returns ({} days resample) against Normal distribution".format(days))
    axs[1].set_xlabel("Theoretical Quantiles", fontsize=20)
    axs[1].set_ylabel("Sample Quantiles", fontsize=20)
    axs[1].set_xlim(-4, 4)
    axs[1].set_ylim(-0.4, 0.4)
    
    fig.tight_layout()
    plt.show()

def main(dfRets, resamplingFreqList):
    """
    Function main(dfRets, resamplingFreqList) takes as input the S&P500 log-returns 'dfRets' 
    and a list of resampling frequencies 'resamplingFreqList'. For each frequency, it:
        - resamples the S&P500 log-returns, using resampleRets() function;
        - makes a normal fit of the resampled log-returns, using normalFit() function;
        - makes two plots, a histogram and a Q-Q plot, using makePlots() function.
    
    Parameters:
        dfRets (pd.DataFrame):            S&P500 log-returns;
        resamplingFreqList (List of int): resampling frequencies, in days.
    
    Returns:
        None
        
    """    
    
    # loop over desired resampling frequencies
    for d in resamplingFreqList:
        
        # resampling
        resampledDfRets = resampleRets(dfRets, str(d))
        
        # normal fit
        fitResults = normalFit(resampledDfRets)
        
        # make histogram and Q-Q plot
        makePlots(resampledDfRets, fitResults, d)
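
Aggregating log-returns to a coarser frequency is straightforward because log-returns are additive over time: the log-return over a block of d days equals the sum of the d daily log-returns it contains, since $\log(P_t/P_{t-d}) = \sum_{s=t-d+1}^{t} \log(P_s/P_{s-1})$. `resampleRets()` is defined earlier in the notebook and is not reproduced here; the sketch below only checks the additivity property on synthetic prices, summing daily log-returns over consecutive, non-overlapping blocks of `d` observations with a hypothetical helper `block_log_returns`.

```python
import numpy as np
import pandas as pd

def block_log_returns(prices: pd.Series, d: int) -> pd.Series:
    """Hypothetical helper: sum daily log-returns over consecutive,
    non-overlapping blocks of d observations (incomplete last block dropped)."""
    daily = np.log(prices / prices.shift(1)).dropna()
    block_id = np.arange(len(daily)) // d          # block labels 0,0,...,0,1,1,...
    summed = daily.groupby(block_id).sum()
    counts = daily.groupby(block_id).size()
    return summed[counts == d]                      # keep complete blocks only

# synthetic price path: 22 prices -> 21 daily log-returns
rng = np.random.default_rng(0)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 22))))

weekly = block_log_returns(prices, 5)

# direct 5-day log-returns from prices at the block boundaries: P[5k+5] / P[5k]
direct = np.log(prices.values[5::5] / prices.values[:-5:5])

print(np.allclose(weekly.values, direct))
```

The two computations agree, which is why histograms of resampled log-returns can be built simply by summing daily log-returns.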

In [None]:
main(spxLogRets, [5, 21, 63, 126, 252])
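
A Q-Q plot can also be summarised by a single number: `scipy.stats.probplot` returns, besides the ordered quantile pairs, the least-squares line through them and its correlation coefficient r, which is close to 1 when the sample is consistent with normality. The sketch below uses synthetic data in place of the S&P500 returns: a normal sample and a fat-tailed Student-t sample (3 degrees of freedom, an arbitrary choice for illustration).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 5000

normal_rets = rng.normal(loc=0.0, scale=0.01, size=n)    # Gaussian "returns"
fat_tailed_rets = 0.01 * rng.standard_t(df=3, size=n)     # heavier tails than Gaussian

# with fit=True (default), probplot returns ((theoretical_q, ordered_sample), (slope, intercept, r))
(_, _), (slope_n, intercept_n, r_n) = stats.probplot(normal_rets, dist='norm')
(_, _), (slope_t, intercept_t, r_t) = stats.probplot(fat_tailed_rets, dist='norm')

print(f"normal sample:    slope={slope_n:.4f}, r={r_n:.4f}")
print(f"Student-t sample: slope={slope_t:.4f}, r={r_t:.4f}")
```

For the normal sample the slope estimates the standard deviation and r is very close to 1; the fat tails of the Student-t sample bend the Q-Q points away from the line and lower r.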

## 2.5. Rolling Statistics: `.rolling()` method <a name="rolling-statistics:-.rolling()-method"></a>

In [None]:
disney = closePrice.loc["2015-01-01":, "DIS"].dropna()

In [None]:
disney.head()

In [None]:
ax = disney.plot(figsize=(10,6))

ax.set_title("The Walt Disney Company")
ax.set_ylabel("Close Price (USD)")

In [None]:
sma = disney.rolling(window=42).mean()

In [None]:
lma = disney.rolling(window=252).mean()
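
`rolling(window=n).mean()` is the trailing average of the last n observations, current one included. With the default `min_periods` (equal to the window size) the first n-1 values are NaN, which is why the 252-day LMA only starts about one trading year into the sample. A small check on a toy series:

```python
import pandas as pd

prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])

sma3 = prices.rolling(window=3).mean()
print(sma3.tolist())           # [nan, nan, 11.0, 12.0, 13.0, 14.0]

# the last value is the plain mean of the last three prices
manual_last = prices.iloc[-3:].mean()

# min_periods=1 averages over whatever is available at the start instead of returning NaN
sma3_partial = prices.rolling(window=3, min_periods=1).mean()
print(sma3_partial.tolist())   # [10.0, 10.5, 11.0, 12.0, 13.0, 14.0]
```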

In [None]:
disneyRollingIndicators = pd.DataFrame(data={"Price": disney, 
                                             "SMA (42d)": sma, 
                                             "LMA (252d)": lma}, 
                                       index=disney.index)

In [None]:
disneyRollingIndicators.tail()

In [None]:
ax = disneyRollingIndicators.plot(figsize=(10,6))

ax.set_title("The Walt Disney Company")
ax.set_ylabel("Close Price (USD)")

### 2.5.1. Rolling Correlation Matrix <a name="rolling-correlation-matrix"></a>

In [None]:
FANGStocks = logRets.loc["2015-01-01":, ['FB', 'AMZN', 'NFLX', 'GOOG']].dropna()

In [None]:
FANGStocks.head()

In [None]:
ax = FANGStocks.plot(figsize=(10,6))

ax.set_title("FANG Stocks")
ax.set_ylabel("Log-Returns")

In [None]:
FANGCorr = FANGStocks.rolling(window=252).corr()

In [None]:
FANGCorr.tail(10)

In [None]:
FANGCorr.index
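
Calling `.rolling(window).corr()` on a DataFrame with k columns returns a "stacked" result: for every date, a k×k correlation matrix, so the output has a two-level index (date, column name) and k columns, as `FANGCorr.index` shows. The sketch below checks this shape on synthetic data with hypothetical column names `A`, `B`, `C`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
dates = pd.bdate_range("2020-01-01", periods=30)
rets = pd.DataFrame(rng.normal(size=(30, 3)), index=dates, columns=["A", "B", "C"])

rollCorr = rets.rolling(window=10).corr()

print(rollCorr.shape)          # (90, 3): 30 dates x 3 rows per date, 3 columns
print(rollCorr.index.nlevels)  # 2: (date, column name)

# one k x k matrix per date, e.g. the last one
lastMatrix = rollCorr.xs(dates[-1])
print(lastMatrix)
```

The first `window - 1` dates are still present in the index, but their matrices are all NaN.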

In [None]:
FANGCorr.loc[("2020-04-09", "FB"), "AMZN"]

In [None]:
FANGCorr.loc[(slice(None), "FB"), "AMZN"]
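
Selecting all dates for one row label and one column extracts the full rolling-correlation time series of a single pair. The same series can be obtained directly by rolling one column against the other, as done for SPX and VIX in section 2.6.2. The sketch below uses synthetic data with hypothetical columns `X` and `Y`, and `xs(..., level=1)` to select the second index level, which drops that level and leaves a plain date index.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
dates = pd.bdate_range("2020-01-01", periods=60)
rets = pd.DataFrame(rng.normal(size=(60, 2)), index=dates, columns=["X", "Y"])

window = 20
stacked = rets.rolling(window=window).corr()

# pair (X, Y) from the stacked result: rows labelled "X" on level 1, column "Y"
fromStacked = stacked.xs("X", level=1)["Y"]

# same pair computed directly
direct = rets["X"].rolling(window=window).corr(rets["Y"])

print(np.allclose(fromStacked.dropna(), direct.dropna()))
```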

In [None]:
FANGCorr.xs("2020-04-09")

In [None]:
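import matplotlib
import seaborn as sns

ax = sns.heatmap(FANGCorr.xs("2020-04-09"), annot=True, cmap="Blues")

# work-around for matplotlib 3.1.1, which cropped the first and last rows of heatmaps;
# on other versions it is not needed and would only add an empty half-row margin
if matplotlib.__version__ == "3.1.1":
    bottom, top = ax.get_ylim()
    ax.set_ylim(bottom + 0.5, top - 0.5)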

## 2.6. _Focus on:_ S&P500 - VIX inverse relationship <a name="focus-on:-s&p500-vix-inverse-relationship"></a>

In [None]:
spx_vix = closePrice.loc[:, ['^GSPC', '^VIX']].dropna()

In [None]:
spx_vix = spx_vix.rename(columns = {'^GSPC': 'SPX', '^VIX': 'VIX'})

In [None]:
spx_vix.head()

In [None]:
spx_vix.tail()

In [None]:
ax = spx_vix.plot(secondary_y="VIX", figsize=(10,6))

ax.set_ylabel("SPX Level")
ax.right_ax.set_ylabel("VIX Level")

### 2.6.1. Regression Analysis: $VIX = \alpha + \beta SPX$ <a name="regression-analysis"></a>

In [None]:
spx_vixLogRets = np.log(spx_vix/spx_vix.shift(periods=1)).dropna()

In [None]:
spx_vixLogRets.head()

In [None]:
spx_vixLogRets.tail()

In [None]:
beta, alpha = np.polyfit(spx_vixLogRets['SPX'], spx_vixLogRets['VIX'], deg=1)

In [None]:
beta, alpha
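
`np.polyfit(x, y, deg=1)` returns the coefficients from the highest to the lowest degree, i.e. `[slope, intercept]`, which is why it is unpacked as `beta, alpha`. A degree-1 fit is ordinary least squares, so the slope equals $\mathrm{cov}(x, y)/\mathrm{var}(x)$ and the intercept equals $\bar{y} - \beta \bar{x}$. A check on synthetic data (the true coefficients 0.001 and -4 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0.0, 0.01, size=1000)                      # stand-in for SPX log-returns
y = 0.001 - 4.0 * x + rng.normal(0.0, 0.02, size=1000)    # stand-in for VIX log-returns

beta, alpha = np.polyfit(x, y, deg=1)

# closed-form OLS estimates (same ddof in covariance and variance)
beta_ols = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
alpha_ols = y.mean() - beta_ols * x.mean()

print(beta, beta_ols)
print(alpha, alpha_ols)
```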

In [None]:
ax = spx_vixLogRets.plot(kind='scatter', x='SPX', y='VIX', figsize=(10,6))

ax.plot(spx_vixLogRets['SPX'], alpha + beta*spx_vixLogRets['SPX'], c='r', lw=2, 
        label="OLS reg: VIX = {:.4f} {:+.4f} * SPX".format(alpha, beta))

ax.set_xlabel("SPX")
ax.set_ylabel("VIX")
ax.legend()
ax.set_title("Scatter plot of SPX and VIX Log-Returns")

### 2.6.2. Correlation Analysis <a name="correlation-analysis"></a>

In [None]:
corrSpxVix = spx_vixLogRets.corr()

In [None]:
rollCorrSpxVix = spx_vixLogRets['SPX'].rolling(window=252).corr(spx_vixLogRets['VIX'])

ax = rollCorrSpxVix.plot(figsize=(10,6), label='Rolling Correlation (252d window)')
ax.axhline(corrSpxVix.loc['SPX','VIX'], c='r', lw=2, label='Full-Sample Correlation')

ax.set_ylabel("Correlation")
ax.legend()
ax.set_title("Correlation between SPX and VIX Log-Returns")
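
The regression and correlation analyses are two views of the same relationship: the OLS slope is the correlation rescaled by the ratio of standard deviations, $\beta = \rho \, \sigma_{VIX} / \sigma_{SPX}$, so the two always share the same sign. Also, each point of the rolling correlation is just the sample correlation over the preceding `window` rows. The sketch below checks both facts on synthetic data standing in for the SPX and VIX log-returns.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n, window = 600, 252
spx = rng.normal(0.0, 0.01, size=n)
vix = -4.0 * spx + rng.normal(0.0, 0.03, size=n)
rets = pd.DataFrame({"SPX": spx, "VIX": vix})

# OLS slope vs correlation rescaled by the ratio of standard deviations
beta = np.polyfit(rets["SPX"], rets["VIX"], deg=1)[0]
rho = rets.corr().loc["SPX", "VIX"]
beta_from_rho = rho * rets["VIX"].std() / rets["SPX"].std()

# last value of the rolling correlation = correlation over the last `window` rows
rolling = rets["SPX"].rolling(window=window).corr(rets["VIX"])
last_window_corr = rets.iloc[-window:].corr().loc["SPX", "VIX"]

print(beta, beta_from_rho)
print(rolling.iloc[-1], last_window_corr)
```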