# HW #1 Futures Spreads Dynamics
[FINM 33150] Regression Analysis and Quantitative Trading Strategies\
Winter 2022 | Professor Brian Boonstra

_**Due:** Sunday, January 16th, at 11:00pm\
**Name:** Ashley Tsoi (atsoi, Student ID: 12286230)_

### 1. Fetch and clean data

#### 1-1. Install these packages for the first time

In [1]:
# !pip install quandl
# !pip install plotnine

#### 1-2. Import packages

In [2]:
import quandl
import json
import pandas as pd
import numpy as np
import datetime as dt
import functools

# let plot display in the notebook instead of in a different window
%matplotlib inline 
from matplotlib import pyplot as plt
plt.xkcd()
import seaborn as sns
import plotnine as p9

#### 1-3. Define the functions to fetch data from Quandl

**1-3-1. Get my personal keys** from ../data/APIs.json

In [3]:
# Use a context manager so the file is closed even if parsing fails
with open('../data/APIs.json') as f:
    APIs = json.load(f)

**1-3-2. Define helper function** to validate date format

In [4]:
def assertCorrectDateFormat(date_text):
    try:
        dt.datetime.strptime(date_text, '%Y-%m-%d')
    except ValueError:
        raise ValueError("Incorrect date format, should be YYYY-MM-DD")
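
As a quick sanity check, the helper can be exercised on a few strings. This is a minimal sketch: the function is repeated so the snippet runs on its own, `is_valid_date` is a hypothetical wrapper that is not part of the notebook, and the sample dates are made up.

```python
import datetime as dt

def assertCorrectDateFormat(date_text):
    # Raise ValueError unless date_text parses as YYYY-MM-DD
    try:
        dt.datetime.strptime(date_text, '%Y-%m-%d')
    except ValueError:
        raise ValueError("Incorrect date format, should be YYYY-MM-DD")

def is_valid_date(date_text):
    # Hypothetical convenience wrapper: return True/False instead of raising
    try:
        assertCorrectDateFormat(date_text)
        return True
    except ValueError:
        return False

# One well-formed date, one wrong format, one impossible calendar date
results = {d: is_valid_date(d) for d in ['2019-12-03', '12/03/2019', '2021-02-30']}
print(results)
```

Because `strptime` also rejects impossible calendar dates, the check covers more than the string layout.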

**1-3-3. Define function** to retrieve raw data from Quandl

**Documentation:**
```
https://data.nasdaq.com/data/OWF-optionworks-futures-options/documentation
```

In [5]:
# Define function that retrieves data from Quandl
def getQuandlData(secs,start_date,end_date,column_index=False):
    # Get data from Quandl using quandl.get
    # NOTE: missing data for the inputted date will return NaN rows.

    # INPUT         | DATA TYPE                 | DESCRIPTION
    # secs          | string / tuple of string  | security name(s)
    # start_date    | string (YYYY-MM-DD)       | start date of data
    # end_date      | string (YYYY-MM-DD)       | end date of data (same as or after start_date)
    # column_index  | False / string (int)      | index of a single column; False means select all columns

    # Input validation
    assertCorrectDateFormat(start_date)
    assertCorrectDateFormat(end_date)
    assert end_date >= start_date
    
    if isinstance(secs, tuple): secs = list(secs)
    # Retrieve data using quandl.get
    if column_index: # if only select one column
        data = quandl.get(secs, returns="pandas",
                          api_key=APIs['Quandl'],
                          column_index=column_index,
                          start_date=start_date, end_date=end_date)
    else:
        data = quandl.get(secs, returns="pandas",
                          api_key=APIs['Quandl'],
                          start_date=start_date, end_date=end_date)

    return data

**1-3-4. Define functions** to get cleaned, second month quarterly futures prices

1. Select the rows needed (i.e. lowest "DtT" that is greater than 30)
2. Select the columns needed (i.e. column names that end with " - Future")
3. Rename the column to the security name
4. Forward-fill `NaN` values

In [6]:
@functools.lru_cache(maxsize=16) # Cache the function output
def _getOneSecondMonthQFuturesPrice(security,start_date,end_date):
    # For 1 future, return its second month quarterly prices from Quandl (Pandas Series)
    # Rows: security with minimum DtT (Days to Termination) that is greater than 30
    
    quarterly_codes = "HMUZ" # H: March, M: June, U: September, Z: December
    year_range = [str(y) for y in range(int(start_date[:4]), int(end_date[:4])+1)] # get all years between start_date and end_date
    
    # construct all security codes for the quarter / year combination
    quarterly_secs = []
    for qc in quarterly_codes:
        for year in year_range:
            quarterly_secs.append("OWF/" + security + "_" +qc + year + "_IVM")
    
    print('Retrieving Quandl data for securities: \n',quarterly_secs)
    
    # for all security codes, find their data
    data = []
    for s in quarterly_secs:
        prices = getQuandlData(s,start_date,end_date,'1')
        DtT = getQuandlData(s,start_date,end_date,'16') # need "DtT" column to do row selection

        secData = pd.concat([prices,DtT], axis=1)
        secData = secData.loc[secData["DtT"] > 30] # select the rows where DtT is greater than 30
        secData.reset_index(level=0, inplace=True) # make "Date" index into a column
        data.append(secData)
    
    data = pd.concat(data).reset_index(drop=True)

    # select 1 row per date with min DtT 
    data = data.loc[data.groupby("Date").DtT.idxmin()]

    # make "Date" the index again and rename column to the security name (as inputted)
    data.set_index('Date', inplace=True)
    data.rename(columns={'Future': security}, inplace=True)
    data.ffill(inplace=True) # forward-fill NaN data (fillna(method='ffill') is deprecated in recent pandas)

    return data[security]
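
The row selection in the function above can be illustrated without a Quandl call. The sketch below uses a small made-up table in place of the concatenated per-contract data. The `contract` column and all values are hypothetical, and `ICE_T_T` is used only as an example column name.

```python
import pandas as pd

# Synthetic stand-in for the concatenated per-contract data (values are made up)
data = pd.DataFrame({
    "Date":     pd.to_datetime(["2020-01-02"] * 3 + ["2020-01-03"] * 3),
    "Future":   [60.0, 61.0, 62.0, 60.5, None, 62.5],
    "DtT":      [25, 55, 145, 24, 54, 144],
    "contract": ["H2020", "M2020", "U2020"] * 2,  # hypothetical label column
})

# Keep only rows whose DtT is greater than 30
eligible = data.loc[data["DtT"] > 30]

# Per date, keep the single row with the smallest remaining DtT
second_month = eligible.loc[eligible.groupby("Date")["DtT"].idxmin()]

# Index by date, name the price series after the security, forward-fill gaps
prices = (second_month.set_index("Date")["Future"]
          .rename("ICE_T_T")
          .ffill())
print(prices)
```

On both dates the H2020 row is dropped by the `DtT > 30` filter, so the M2020 row is selected. The missing M2020 price on the second date is then filled from the first.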


**1-3-5. Define function** to compute the spread for a pair of securities

In [7]:
def getSpread(sec1,sec2,start_date,end_date,multiplier=1):
    # Return spreads: sec2*multiplier - sec1
    sec1Prices = _getOneSecondMonthQFuturesPrice(sec1,start_date,end_date)
    sec2Prices = _getOneSecondMonthQFuturesPrice(sec2,start_date,end_date)

    return pd.DataFrame((sec2Prices*multiplier)-sec1Prices).rename(columns={0:sec1+' | '+sec2})
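
To see how the subtraction behaves when the two price series do not share every date, here is a small sketch on made-up numbers. pandas aligns the two series on their index, so a date missing from either leg gives `NaN` in the spread. The prices are illustrative only.

```python
import pandas as pd

dates = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"])
sec1 = pd.Series([60.0, 61.0, 59.0], index=dates, name="ICE_T_T")
# The second leg has no observation on 2020-01-03
sec2 = pd.Series([745.0, 819.5], index=dates[[0, 2]], name="ICE_G_G")

multiplier = 1 / 7.45

# Same arithmetic as getSpread: sec2*multiplier - sec1, aligned on the date index
spread = (sec2 * multiplier - sec1).rename("ICE_T_T | ICE_G_G").to_frame()
print(spread)
```

Naming the series before calling `to_frame()` avoids relying on the default column label `0`. If gaps like the one above matter, the two legs could be aligned and forward-filled before subtracting.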

#### 1-4. Fetch Cleaned Spread Data Using Functions Above

Last 2 unique digits of my student ID: 3, 0

**Securities:**
```
3: ICE_T_T ICE_G_G 0.1342281879194631 (1/7.45)
0: ICE_B_B ICE_G_G 0.1342281879194631 (1/7.45)
```

**Securities Description:**
```
ICE_T_T - WTI Crude Oil
ICE_G_G - Gasoil
ICE_B_B - Brent Crude Oil
```
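
The multiplier `1/7.45` reads as a unit conversion. The sketch below assumes, which the notebook does not state, that `ICE_G_G` is quoted in USD per metric tonne while the crude contracts are quoted in USD per barrel, with roughly 7.45 barrels to the tonne. All prices are made up.

```python
# Assumption (not stated in the notebook): ICE_G_G is quoted in USD per metric tonne,
# and about 7.45 barrels make up one tonne.
BARRELS_PER_TONNE = 7.45

def per_tonne_to_per_barrel(price_per_tonne):
    # Convert a USD/tonne quote into USD/barrel
    return price_per_tonne / BARRELS_PER_TONNE

gasoil_per_tonne = 596.0   # made-up quote
crude_per_barrel = 60.0    # made-up quote

gasoil_per_barrel = per_tonne_to_per_barrel(gasoil_per_tonne)
spread = gasoil_per_barrel - crude_per_barrel
print(round(gasoil_per_barrel, 2), round(spread, 2))
```

Under that assumption both legs of each spread end up in the same unit before they are subtracted.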

**Dates:**
```
December 3, 2019 - August 31, 2021
```

In [8]:
spread_3 = getSpread('ICE_T_T','ICE_G_G','2019-12-03','2021-08-31',multiplier=1/7.45)
spread_0 = getSpread('ICE_B_B','ICE_G_G','2019-12-03','2021-08-31',multiplier=1/7.45)

Retrieving Quandl data for securities: 
 ['OWF/ICE_T_T_H2019_IVM', 'OWF/ICE_T_T_H2020_IVM', 'OWF/ICE_T_T_H2021_IVM', 'OWF/ICE_T_T_M2019_IVM', 'OWF/ICE_T_T_M2020_IVM', 'OWF/ICE_T_T_M2021_IVM', 'OWF/ICE_T_T_U2019_IVM', 'OWF/ICE_T_T_U2020_IVM', 'OWF/ICE_T_T_U2021_IVM', 'OWF/ICE_T_T_Z2019_IVM', 'OWF/ICE_T_T_Z2020_IVM', 'OWF/ICE_T_T_Z2021_IVM']
Retrieving Quandl data for securities: 
 ['OWF/ICE_G_G_H2019_IVM', 'OWF/ICE_G_G_H2020_IVM', 'OWF/ICE_G_G_H2021_IVM', 'OWF/ICE_G_G_M2019_IVM', 'OWF/ICE_G_G_M2020_IVM', 'OWF/ICE_G_G_M2021_IVM', 'OWF/ICE_G_G_U2019_IVM', 'OWF/ICE_G_G_U2020_IVM', 'OWF/ICE_G_G_U2021_IVM', 'OWF/ICE_G_G_Z2019_IVM', 'OWF/ICE_G_G_Z2020_IVM', 'OWF/ICE_G_G_Z2021_IVM']
Retrieving Quandl data for securities: 
 ['OWF/ICE_B_B_H2019_IVM', 'OWF/ICE_B_B_H2020_IVM', 'OWF/ICE_B_B_H2021_IVM', 'OWF/ICE_B_B_M2019_IVM', 'OWF/ICE_B_B_M2020_IVM', 'OWF/ICE_B_B_M2021_IVM', 'OWF/ICE_B_B_U2019_IVM', 'OWF/ICE_B_B_U2020_IVM', 'OWF/ICE_B_B_U2021_IVM', 'OWF/ICE_B_B_Z2019_IVM', 'OWF/ICE_B_B_Z2020_IVM', 'O

#### 1-5. Clean data

**1-5-1. Clean raw data**
1. Select the rows needed (i.e. lowest "DtT" that is greater than 30)
2. Select the columns needed (i.e. column names that end with " - Future")
3. Apply the (1/7.45) multiplier to `ICE_G_G`
4. Forward-fill `NaN` values

**1-5-2. Make pairs and calculate spreads**

In [None]:
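# Pair 3: ICE_T_T vs. ICE_G_G (spread computed in 1-4)
pair3 = spread_3
pair3.head()

# Pair 0: ICE_B_B vs. ICE_G_G (spread computed in 1-4)
pair0 = spread_0
pair0.head()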

### 2. Analysis
#### 2-1. Characterize the relative dynamics of $s^{(i)}_t$ in reasonable ways, using charts and statistics.
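
One possible starting point is sketched below on a synthetic random-walk series rather than the real spreads. The 20-day window and every number are assumptions for illustration, not values from the assignment.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.bdate_range("2019-12-03", "2021-08-31")

# Synthetic random-walk stand-in for one spread series
s = pd.Series(20 + rng.normal(0, 0.5, len(dates)).cumsum(),
              index=dates, name="spread")

# Level statistics: count, mean, std, min, quartiles, max
summary = s.describe()

# Day-over-day changes show how fast the spread moves
daily_change = s.diff().dropna()

# Rolling mean and standard deviation show drift and changing volatility
rolling = pd.DataFrame({
    "mean_20d": s.rolling(20).mean(),
    "std_20d":  s.rolling(20).std(),
})

print(summary)
print(daily_change.describe())
print(rolling.dropna().head())
```

The same calls should work on `spread_3` and `spread_0` after selecting their single column, and the rolling frame can be charted with `rolling.plot()`.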

#### 2-2. Examine more quantiles than just the median. Look at tails. 
Do the spreads correlate? How about their difference $(d)$ values? Do spreads exhibit patterns over time?
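
A rough sketch of these checks follows, again on synthetic data. Here $d$ is taken, purely as an assumption, to be the spread minus its trailing `N`-day mean, so substitute the assignment's definition. The column names `s3` and `s0` and the window `N = 20` are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
dates = pd.bdate_range("2019-12-03", "2021-08-31")

# Two synthetic spreads sharing a common random-walk component
common = rng.normal(0, 0.5, len(dates)).cumsum()
spreads = pd.DataFrame({
    "s3": 20 + common + rng.normal(0, 0.3, len(dates)),
    "s0": 15 + common + rng.normal(0, 0.3, len(dates)),
}, index=dates)

# Quantiles beyond the median, including both tails
quantiles = spreads.quantile([0.01, 0.05, 0.25, 0.5, 0.75, 0.95, 0.99])

# Assumed definition of d: spread minus its trailing N-day mean
N = 20  # hypothetical window length
d = (spreads - spreads.rolling(N).mean()).dropna()

# Correlation of the levels versus correlation of the d values
corr_s = spreads["s3"].corr(spreads["s0"])
corr_d = d["s3"].corr(d["s0"])

# Excess kurtosis as a rough tail-heaviness measure
tail_kurtosis = d.kurt()

print(quantiles)
print(corr_s, corr_d)
print(tail_kurtosis)
```

Comparing `corr_s` with `corr_d` separates co-movement in levels from co-movement in deviations around the recent average, which is often the more relevant number for a mean-reversion trade.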