# HW #3 Financial Ratio Quantile Strategies
[FINM 33150] Regression Analysis and Quantitative Trading Strategies\
Winter 2022 | Professor Brian Boonstra

_**Due:** Thursday, February 3rd, at 11:00pm\
**Name:** Ashley Tsoi (atsoi, Student ID: 12286230)_

### 1. Fetch and clean data

#### 1-1. Import packages

In [51]:
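import os
import functools
import warnings
import json
import math
import datetime as dt

import numpy as np
import quandl
import pandas as pd
# NOTE: this import path works in older pandas releases; pandas 1.5+ exposes the class as pandas.errors.SettingWithCopyWarning
from pandas.core.common import SettingWithCopyWarning

# the filter must come after the import above, otherwise SettingWithCopyWarning is undefined
warnings.simplefilter(action="ignore", category=SettingWithCopyWarning)
pd.set_option("display.precision", 4)
pd.set_option('display.float_format', lambda x: '%.4f' % x)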

# let plot display in the notebook instead of in a different window
%matplotlib inline 
from matplotlib import pyplot as plt
plt.rcParams['figure.figsize'] = [21, 8]

#### 1-2. Define the functions to fetch data from Quandl

**1-2-1. Get my personal keys** from ../data/APIs.json

In [2]:
# the context manager closes the file even if json.load raises
with open('../data/APIs.json') as f:
    APIs = json.load(f)

**1-2-2. Define date-format helper function**

In [3]:
def assertCorrectDateFormat(date_text):
    try:
        dt.datetime.strptime(date_text, '%Y-%m-%d')
    except ValueError:
        raise ValueError("Incorrect date format, should be YYYY-MM-DD")
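
A quick check of how this kind of validation behaves may be useful. The sketch below redefines the helper under a hypothetical snake_case name (`assert_correct_date_format`) so it runs on its own, and adds an illustrative wrapper, `is_valid_date`, that is not part of the notebook.

```python
import datetime as dt


def assert_correct_date_format(date_text):
    """Raise ValueError unless date_text parses as YYYY-MM-DD."""
    try:
        dt.datetime.strptime(date_text, "%Y-%m-%d")
    except ValueError:
        raise ValueError("Incorrect date format, should be YYYY-MM-DD")


def is_valid_date(date_text):
    """Hypothetical convenience wrapper: True if the check passes, else False."""
    try:
        assert_correct_date_format(date_text)
        return True
    except ValueError:
        return False


samples = ("2013-07-01", "07/01/2013", "2013-02-30", "2013-7-1")
results = {s: is_valid_date(s) for s in samples}
print(results)
```

Note that `strptime` does not insist on zero padding, so `"2013-7-1"` passes this check, while an impossible calendar date such as `"2013-02-30"` is rejected.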

**1-2-3. Define function** to retrieve raw data from Quandl

**Documentation:**
```
Zacks Fundamentals Collection B (ZFB)
https://data.nasdaq.com/databases/ZFB/documentation
https://data.nasdaq.com/databases/ZFB/usage/quickstart/python
```

In [4]:
# Define function that retrieves ZFB data from Quandl
@functools.lru_cache(maxsize=16) # Cache the function output
def getQuandlZFBData(from_table,secs,start_date,end_date,columns):
    # Get data from Quandl using quandl.get_table
    # NOTE: missing data for the inputted date will NOT return a row.

    # INPUT         | DATA TYPE                 | DESCRIPTION
    # from_table    | string                    | FC, FR, MT, MKTV, SHRS, or HDM
    # secs          | string / tuple of string  | security ticker(s)
    # start_date    | string (YYYY-MM-DD)       | start date of data
    # end_date      | string (YYYY-MM-DD)       | end date of data (same as or after start_date)
    # columns       | string / tuple of string  | names of the columns to return
    
    if secs=='all': secs = list(pd.read_csv('../data/zacks-tickers.csv').ticker.unique()) # import all tickers from zacks-tickers

    seclen = 1 if isinstance(secs, str) else len(secs)
    print("Quandl | START | Retrieving Quandl data for {:d} securities from the ZACKS/{} table: \n".format(seclen,from_table))
    
    # Retrieve data using quandl.get_table
    quandl.ApiConfig.api_key = APIs['Quandl']

    if from_table in ['FC','FR','MKTV','SHRS','HDM']:
        data = quandl.get_table('ZACKS/'+from_table,
                                ticker = secs, 
                                per_end_date = {'gte':start_date, 'lte':end_date},
                                qopts = {'columns':list(columns)},
                                paginate = True)
        
        data['per_end_date'] = pd.to_datetime(data['per_end_date'])
        if 'filing_date' in data.columns:
            data['filing_date'] = pd.to_datetime(data['filing_date'])

    elif from_table == 'MT':
        data = quandl.get_table('ZACKS/MT',
                                ticker = secs, 
                                qopts = {'columns':list(columns)},
                                paginate = True)

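    else:
        # fail fast: without this, the len(data) call below would raise on an undefined variable
        raise ValueError("from_table is limited to FC, FR, MT, MKTV, SHRS and HDM")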
        
    print("Quandl | DONE  | Returning {:d} rows of data from the ZACKS/{} table.\n".format(len(data),from_table))

    return data


@functools.lru_cache(maxsize=16) # Cache the function output
def _getZFBData(secs,start_date,end_date):
    # Merged Zacks data in five tables: FC, FR, MT, MKTV, and SHRS
    # NOTE: missing data for the inputted date will NOT return a row.

    # INPUT         | DATA TYPE                 | DESCRIPTION
    # secs          | string / tuple of string  | security ticker(s)
    # start_date    | string (YYYY-MM-DD)       | start date of data
    # end_date      | string (YYYY-MM-DD)       | end date of data (same as or after start_date)
    
    # Retrieve data using quandl.get_table
    fc = getQuandlZFBData('FC',secs,start_date,end_date,('ticker','exchange','per_end_date','per_type','zacks_sector_code','basic_net_eps','diluted_net_eps','tot_lterm_debt','net_lterm_debt','filing_date'))
    fr = getQuandlZFBData('FR',secs,start_date,end_date,('ticker','exchange','per_end_date','per_type','ret_invst','tot_debt_tot_equity'))
    mt = getQuandlZFBData('MT',secs,start_date,end_date,('ticker','ticker_type','asset_type'))
    mktv = getQuandlZFBData('MKTV',secs,start_date,end_date,('ticker','per_end_date','per_type','mkt_val'))
    shrs = getQuandlZFBData('SHRS',secs,start_date,end_date,('ticker','per_end_date','per_type','shares_out','avg_d_shares'))

    # Merge the tables
    print("MERGE  | START | \n")

    zacks_1 = fc.merge(fr, how='outer', on=['ticker','exchange','per_end_date','per_type'])
    zacks_2 = mktv.merge(shrs, how='outer', on=['ticker','per_end_date','per_type'])
    zacks_3 = zacks_1.merge(zacks_2, how='outer', on=['ticker','per_end_date','per_type'])
    zacks = zacks_3.merge(mt, how='outer', on='ticker')

    print("MERGE  | DONE  | Returning {:d} rows of ZACKS data.\n".format(len(zacks)))
    
    return zacks
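
To make the effect of the outer merges concrete, here is a minimal sketch on made-up rows (the ticker `AAA` and all values are invented for illustration). It shows how a period that exists in only one table survives the merge with `NaN` in the other table's columns; `indicator=True` is added here only to label where each row came from.

```python
import pandas as pd

keys = ["ticker", "per_end_date", "per_type"]

# Made-up rows standing in for the fundamentals side (FC/FR)
fundamentals = pd.DataFrame({
    "ticker": ["AAA", "AAA"],
    "per_end_date": pd.to_datetime(["2013-09-30", "2013-09-30"]),
    "per_type": ["A", "Q"],
    "ret_invst": [26.0, 5.0],
})

# Made-up rows standing in for the market side (MKTV/SHRS), quarterly only
market = pd.DataFrame({
    "ticker": ["AAA", "AAA"],
    "per_end_date": pd.to_datetime(["2013-09-30", "2013-12-31"]),
    "per_type": ["Q", "Q"],
    "mkt_val": [430000.0, 500000.0],
})

# An outer merge keeps every key combination found in either table
merged = fundamentals.merge(market, how="outer", on=keys, indicator=True)
print(merged)
```

Only the quarterly 2013-09-30 row matches on both sides. The annual row keeps its `ret_invst` but has no `mkt_val`, and the 2013-12-31 row has the opposite gap, so downstream code has to decide how to handle those `NaN` values.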


**Documentation**
```
End of Day US Stock Prices (EOD)
https://data.nasdaq.com/databases/EOD/documentation
https://data.nasdaq.com/databases/EOD/usage/quickstart/python
```

In [24]:
# Define function that retrieves EOD data from Quandl
@functools.lru_cache(maxsize=16) # Cache the function output
def getQuandlEODData(sec,start_date,end_date,columns):
    # Get one security (sec)'s data from Quandl using quandl.get_table
    # NOTE: missing data for the inputted date will NOT return a row.

    # INPUT         | DATA TYPE                 | DESCRIPTION
    # sec           | string / tuple of string  | security ticker (must be hashable for lru_cache)
    # start_date    | string (YYYY-MM-DD)       | start date of data
    # end_date      | string (YYYY-MM-DD)       | end date of data (same as or after start_date)
    # columns       | string / tuple of string  | columns to return (must be hashable for lru_cache)
    
    print("Quandl | START | Retrieving Quandl data for security: \n",sec)
    
    # Retrieve data using quandl.get_table
    quandl.ApiConfig.api_key = APIs['Quandl']
    data = quandl.get_table('QUOTEMEDIA/PRICES',
                            ticker = sec, 
                            date = {'gte':start_date, 'lte':end_date},
                            qopts = {'columns':list(columns)}
                            )

    print("Quandl | DONE  | Returning {:d} dates of data for {}.\n".format(len(data),sec))
    return data



def getAdjClose(secs,start_date,end_date):

    if type(secs)==str: secs = (secs,)

    data = []
    for sec in secs:
        file_name = "../data_large/"+sec
        if not os.path.isfile(file_name):
            # download as CSV in local directory
            # fetch only the current ticker (sec), not the whole secs tuple, so each file holds one security
            getQuandlEODData(sec,start_date,end_date,('ticker','date','adj_close')).set_index(['ticker','date']).to_csv(file_name)
        
        data.append(pd.read_csv(file_name))
    
    return pd.concat(data)
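
The download-once-then-read-from-disk pattern can be sketched in isolation as follows. This is an illustration under stated assumptions, not the notebook's implementation: `load_with_disk_cache` and `fake_fetch` are hypothetical names, the fetcher returns invented prices instead of calling Quandl, and a temporary directory stands in for `../data_large/`.

```python
import os
import tempfile

import pandas as pd


def load_with_disk_cache(ticker, fetch, cache_dir):
    """Return prices for ticker, calling fetch(ticker) only if no CSV exists yet."""
    path = os.path.join(cache_dir, ticker + ".csv")
    if not os.path.isfile(path):
        # first request: download and persist
        fetch(ticker).to_csv(path, index=False)
    # every request is served from the file on disk
    return pd.read_csv(path, parse_dates=["date"])


calls = []  # records how often the (fake) remote source is hit


def fake_fetch(ticker):
    """Hypothetical stand-in for the Quandl call; returns invented prices."""
    calls.append(ticker)
    return pd.DataFrame({
        "ticker": [ticker, ticker],
        "date": pd.to_datetime(["2014-01-02", "2014-01-03"]),
        "adj_close": [10.0, 10.5],
    })


with tempfile.TemporaryDirectory() as tmp:
    first = load_with_disk_cache("AAA", fake_fetch, tmp)
    second = load_with_disk_cache("AAA", fake_fetch, tmp)

print(calls)  # the fetcher ran once; the second load came from disk
```

Unlike `functools.lru_cache`, a file cache survives kernel restarts, which matters when thousands of tickers are downloaded.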



**1-2-4. Define function** to filter / clean raw data

**Requirements:**
```
- US Equities
- not in the automotive, financial or insurance sector over the entire period
- end-of-day adjusted closing prices are available over the entire period
- debt/market cap ratio is greater than 0.1
- has feasible calculation of the ratios over the entire period: 
  - debt to market cap, 
  - return on investment, and 
  - price to earnings. 
  Including for at least one PER END DATE no more than one year old. Debt ratio of zero is OK.
```
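
One way to read "over the entire period" is that a single failing row disqualifies the whole ticker. A minimal sketch of that pattern on invented data follows; the tickers and values are made up, and the sector codes 5 and 13 are the ones this notebook treats as finance and automotive.

```python
import pandas as pd

# Invented rows: two reporting periods for each of three tickers
df = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB", "BBB", "CCC", "CCC"],
    "zacks_sector_code": [10, 10, 5, 5, 10, 10],
    "ret_invst": [12.0, 11.0, 8.0, 9.0, 7.0, None],
})

excluded_sectors = (5, 13)

# Collect every ticker that fails a requirement on at least one row
in_bad_sector = df.loc[df["zacks_sector_code"].isin(excluded_sectors), "ticker"].unique()
has_null_ratio = df.loc[df["ret_invst"].isna(), "ticker"].unique()
to_drop = set(in_bad_sector) | set(has_null_ratio)

# Remove all rows of those tickers, not just the failing rows
clean = df[~df["ticker"].isin(to_drop)]
remaining = sorted(clean["ticker"].unique())
print(remaining)  # ['AAA']
```

Filtering at the ticker level rather than the row level keeps the surviving universe fixed over time, at the cost of discarding tickers with even one bad observation.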

In [56]:
def getCleanZFBData(secs,start_date,end_date):

    # === GET RAW DATA ============================================
    raw_zacks = _getZFBData(secs,start_date,end_date)
    
    # === FILTER ==================================================
    # US Equities only
    zacks = raw_zacks[raw_zacks['exchange'].isin(('NYSE','NASDAQ'))]  # select US stock exchanges
    zacks = zacks[zacks['ticker_type']=='S']                          # S = Securities
    zacks = zacks[zacks['asset_type']=='COM'][zacks.columns]          # COM = Common stocks
    zacks.drop(['exchange','ticker_type','asset_type'], axis=1, inplace=True) # drop these columns as they are no longer needed

    # remove tickers without filing dates (tickers without filing dates are impossible to join on)
    filingDate_filter = zacks[pd.isnull(zacks['filing_date'])]['ticker'].unique()
    zacks = zacks[(~zacks['ticker'].isin(filingDate_filter))]
    
    # not in the automotive, financial or insurance sector for any date (since there might be sector changes)
    sector_filter = zacks[zacks['zacks_sector_code'].isin((5,13))]['ticker'].unique() # 5 = finance (includes insurance), 13 = Automotive
    zacks = zacks[(~zacks['ticker'].isin(sector_filter))]
    zacks.drop(['zacks_sector_code'], axis=1, inplace=True) # drop these columns as they are no longer needed
    
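    # debt ratio greater than 0.1 AND not null; tot_debt_tot_equity (a debt-to-equity figure) stands in for debt-to-market-cap here
    # (drop any ticker that fails on any date, since we will still have enough tickers)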
    badDebtToMC_filter = zacks[(zacks['tot_debt_tot_equity']<=0.1) | (pd.isnull(zacks['tot_debt_tot_equity']))]['ticker'].unique()
    zacks = zacks[(~zacks['ticker'].isin(badDebtToMC_filter))]

    # other ratios are not null
    # nullRatio_filter = list(zacks[pd.isnull(zacks['mkt_val'])]['ticker'].unique())
    nullRatio_filter = list(zacks[pd.isnull(zacks['ret_invst'])]['ticker'].unique())
    nullRatio_filter += list(zacks[(pd.isnull(zacks['basic_net_eps']) & pd.isnull(zacks['diluted_net_eps']))]['ticker'].unique())
    nullRatio_filter += list(zacks[(pd.isnull(zacks['tot_lterm_debt'])) & (pd.isnull(zacks['net_lterm_debt']))]['ticker'].unique())
    zacks = zacks[(~zacks['ticker'].isin(set(nullRatio_filter)))]

    # floor EPS at 0.001 (replaces zero and negative values); assign the result instead of clipping a column in place
    zacks['basic_net_eps'] = zacks['basic_net_eps'].clip(lower=0.001)
    zacks['diluted_net_eps'] = zacks['diluted_net_eps'].clip(lower=0.001)
    
    # end-of-day adjusted closing prices are available
    noEOD_filter = []
    tickers = zacks['ticker'].unique()
    for sec in tickers:
        if len(getAdjClose(sec,start_date,end_date)) < 1910: # 1910 = number of trading days in the period 2013-07-01 -- 2021-01-31
            noEOD_filter.append(sec)
            os.remove('../data_large/'+sec)
    zacks = zacks[(~zacks['ticker'].isin(noEOD_filter))]

    # If have both quarterly & annual data for the same ticker & date, use quarterly
    

    # === FORWARD FILL ============================================
    # zacks.ffill() 
    
    

    print(f'remaining number of tickers: {len(zacks.ticker.unique())}')

    return zacks

In [57]:
data = getCleanZFBData('all','2013-07-01','2021-01-31')

Quandl | START | Retriving Quandl data for security: 
 ('AR',)
Quandl | DONE  | Returning 1839 dates of data for ('AR',).

Quandl | START | Retriving Quandl data for security: 
 ('BF.A',)
Quandl | DONE  | Returning 0 dates of data for ('BF.A',).

Quandl | START | Retriving Quandl data for security: 
 ('BF.B',)
Quandl | DONE  | Returning 0 dates of data for ('BF.B',).

Quandl | START | Retriving Quandl data for security: 
 ('COMM',)
Quandl | DONE  | Returning 1828 dates of data for ('COMM',).

Quandl | START | Retriving Quandl data for security: 
 ('CRD.A',)
Quandl | DONE  | Returning 0 dates of data for ('CRD.A',).

Quandl | START | Retriving Quandl data for security: 
 ('CRD.B',)
Quandl | DONE  | Returning 0 dates of data for ('CRD.B',).

Quandl | START | Retriving Quandl data for security: 
 ('DD',)
Quandl | DONE  | Returning 858 dates of data for ('DD',).

Quandl | START | Retriving Quandl data for security: 
 ('DGI',)
Quandl | DONE  | Returning 1075 dates of data for ('DGI',).

Qua

In [85]:
# data[pd.isnull(data['net_lterm_debt'])]['ticker'].unique()
a = data[(data['ticker'] == "AAPL") | (data['ticker'] == "AAP")].set_index(['ticker','per_end_date'])

In [88]:
a.loc['AAPL'].loc['20']

| per_end_date | per_type | basic_net_eps | diluted_net_eps | tot_lterm_debt | net_lterm_debt | filing_date | ret_invst | tot_debt_tot_equity | mkt_val | shares_out | avg_d_shares |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2013-09-30 | A | 1.4296 | 1.4196 | 16960.0 | 16896.0 | 2013-10-30 | 26.3592 | 0.1373 | NaN | NaN | NaN |
| 2014-09-30 | A | 1.6225 | 1.6125 | 28987.0 | 11960.0 | 2014-10-27 | 28.1142 | 0.3164 | NaN | NaN | NaN |
| 2015-09-30 | A | 2.32 | 2.305 | 53329.0 | 27114.0 | 2015-10-28 | 30.9201 | 0.539 | NaN | NaN | NaN |
| 2016-09-30 | A | 2.0875 | 2.0775 | 75427.0 | 22454.0 | 2016-10-26 | 22.4312 | 0.6786 | NaN | NaN | NaN |
| 2017-09-30 | A | 2.3175 | 2.3025 | 97207.0 | 25162.0 | 2017-11-03 | 20.9082 | 0.863 | NaN | NaN | NaN |
| 2018-09-30 | A | 3.0025 | 2.9775 | 93735.0 | 469.0 | 2018-11-05 | 29.6348 | 1.0685 | NaN | NaN | NaN |
| 2019-09-30 | A | 2.9925 | 2.9725 | 91807.0 | -1842.0 | 2019-10-31 | 30.3113 | 1.194 | NaN | NaN | NaN |
| 2020-09-30 | A | 3.31 | 3.28 | 98667.0 | 3462.0 | 2020-10-30 | 35.0054 | 1.7208 | NaN | NaN | NaN |
| 2013-09-30 | Q | 0.2968 | 0.295 | 16960.0 | 16896.0 | 2013-10-30 | 5.3463 | 0.1373 | 433126.0 | 25437.92 | 25455.67 |
| 2013-12-31 | Q | 0.5211 | 0.5179 | 16961.0 | NaN | 2014-01-28 | 8.914 | 0.1308 | 500739.22 | 24991.44 | 25240.66 |


In [90]:
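# group by ticker as well as per_type so that values are never forward-filled from one ticker into the next
filtered = data.set_index(['ticker','per_end_date']).groupby(['ticker','per_type']).transform(lambda v: v.ffill())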
# a.groupby('company')['value'].transform(lambda v: v.ffill())

In [91]:
filtered.loc['AAPL']

| per_end_date | basic_net_eps | diluted_net_eps | tot_lterm_debt | net_lterm_debt | filing_date | ret_invst | tot_debt_tot_equity | mkt_val | shares_out | avg_d_shares |
|---|---|---|---|---|---|---|---|---|---|---|
| 2013-09-30 | 1.4296 | 1.4196 | 16960.0 | 16896.0 | 2013-10-30 | 26.3592 | 0.1373 | NaN | NaN | NaN |
| 2014-09-30 | 1.6225 | 1.6125 | 28987.0 | 11960.0 | 2014-10-27 | 28.1142 | 0.3164 | NaN | NaN | NaN |
| 2015-09-30 | 2.32 | 2.305 | 53329.0 | 27114.0 | 2015-10-28 | 30.9201 | 0.539 | NaN | NaN | NaN |
| 2016-09-30 | 2.0875 | 2.0775 | 75427.0 | 22454.0 | 2016-10-26 | 22.4312 | 0.6786 | NaN | NaN | NaN |
| 2017-09-30 | 2.3175 | 2.3025 | 97207.0 | 25162.0 | 2017-11-03 | 20.9082 | 0.863 | NaN | NaN | NaN |
| 2018-09-30 | 3.0025 | 2.9775 | 93735.0 | 469.0 | 2018-11-05 | 29.6348 | 1.0685 | NaN | NaN | NaN |
| 2019-09-30 | 2.9925 | 2.9725 | 91807.0 | -1842.0 | 2019-10-31 | 30.3113 | 1.194 | NaN | NaN | NaN |
| 2020-09-30 | 3.31 | 3.28 | 98667.0 | 3462.0 | 2020-10-30 | 35.0054 | 1.7208 | NaN | NaN | NaN |
| 2013-09-30 | 0.2968 | 0.295 | 16960.0 | 16896.0 | 2013-10-30 | 5.3463 | 0.1373 | 433126.0 | 25437.92 | 25455.67 |
| 2013-12-31 | 0.5211 | 0.5179 | 16961.0 | 16896.0 | 2014-01-28 | 8.914 | 0.1308 | 500739.22 | 24991.44 | 25240.66 |
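
Forward-filling is only safe within a single ticker's history. The sketch below uses invented data (tickers `AAA` and `BBB`) to contrast grouping by `per_type` alone with grouping by both `ticker` and `per_type`.

```python
import numpy as np
import pandas as pd

# Invented quarterly rows for two tickers, stacked one after the other
df = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB", "BBB"],
    "per_type": ["Q", "Q", "Q", "Q"],
    "net_lterm_debt": [100.0, np.nan, np.nan, 250.0],
})

# Grouping by period type only: BBB's first gap is filled with AAA's value
by_type = df.groupby("per_type")["net_lterm_debt"].ffill()

# Grouping by ticker and period type: fills stay inside each ticker
by_ticker_type = df.groupby(["ticker", "per_type"])["net_lterm_debt"].ffill()

print(pd.DataFrame({"by_type": by_type, "by_ticker_type": by_ticker_type}))
```

In the first column the third row picks up 100.0 from a different company; in the second it stays `NaN` because `BBB` has no earlier observation of its own.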


#### 1-3. Fetch cleaned data using the functions above

**Dates:**
```
January 1, 2014 - January 31, 2021*
```
_*Note: data is fetched from July 1, 2013 so that everything reported by January 1, 2014 is included._
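
The reason for the earlier start can be illustrated with a point-in-time lookup: on any given day, only reports whose `filing_date` has already passed are known. The sketch below is an assumption about how such a lookup could be written, not code from this notebook; `known_as_of` is a hypothetical helper and the dates and values are invented.

```python
import pandas as pd

# Invented filings for one ticker; each period is filed some weeks after it ends
filings = pd.DataFrame({
    "ticker": ["AAA", "AAA", "AAA"],
    "per_end_date": pd.to_datetime(["2013-06-30", "2013-09-30", "2013-12-31"]),
    "filing_date": pd.to_datetime(["2013-08-05", "2013-11-04", "2014-02-03"]),
    "ret_invst": [4.0, 5.0, 6.0],
})


def known_as_of(df, as_of):
    """Latest filing per ticker whose filing_date is on or before as_of."""
    visible = df[df["filing_date"] <= pd.Timestamp(as_of)]
    return visible.sort_values("filing_date").groupby("ticker").tail(1)


snapshot = known_as_of(filings, "2014-01-01")
print(snapshot)
```

On January 1, 2014 the most recent visible report in this example covers the period ending 2013-09-30, which a fetch starting on January 1, 2014 would have missed.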

**1-3-1. Fetch data** 

In [None]:
# zacks_fc = getQuandlZFBData('FC','all','2013-07-01','2021-01-31',('ticker','exchange','per_end_date','per_type','zacks_sector_code','basic_net_eps','diluted_net_eps','tot_lterm_debt','net_lterm_debt','filing_date'))
# zacks_fr = getQuandlZFBData('FR','all','2013-07-01','2021-01-31',('ticker','exchange','per_end_date','per_type','ret_invst','tot_debt_tot_equity'))
# zacks_mt = getQuandlZFBData('MT','all','2013-07-01','2021-01-31',('ticker','ticker_type','asset_type'))
# zacks_mktv = getQuandlZFBData('MKTV','all','2013-07-01','2021-01-31',('ticker','per_end_date','per_type','mkt_val'))
# zacks_shrs = getQuandlZFBData('SHRS','all','2013-07-01','2021-01-31',('ticker','per_end_date','per_type','shares_out','avg_d_shares'))

In [11]:
l = [1,0,-1,-2]
[x if x>=0 else 0 for x in l ]

[1, 0, 0, 0]