# Metrics Research

## Hypothesis Testing

The general framework of hypothesis testing as applied to backtesting follows these steps:

1. Based on a backtest on some finite sample of data, we compute a certain statistical measure called the **test statistic**.


2. We suppose that the true average daily return, based on an infinite data set, is actually zero. This supposition is called the **null hypothesis**.


3. We suppose that the probability distribution of daily returns is known. This probability distribution has a zero mean, based on the null hypothesis. We describe later how we determine this probability distribution.


4. Based on this null hypothesis probability distribution, we compute the probability p that the test statistic will be at least as large as the value observed in the backtest (or at least as extreme, to allow for a negative test statistic). This probability p is called the **p-value**. If it is very small (say, smaller than 0.01), we can “reject the null hypothesis” and conclude that the backtested average daily return is statistically significant.
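
The four steps above can be sketched in a few lines for the Gaussian case. This is only an illustration: the returns are synthetic, and the helper name `gaussian_p_value` is made up here rather than taken from the backtest code.

```python
import numpy as np
import scipy.stats as st

def gaussian_p_value(returns):
    """One-sided p-value for H0: the true mean return is zero (Gaussian null)."""
    r = np.asarray(returns, dtype=float)
    r = r[~np.isnan(r)]                          # drop missing observations
    n = r.size
    # Step 1: test statistic = sqrt(n) * mean / sample standard deviation
    stat = np.sqrt(n) * r.mean() / r.std(ddof=1)
    # Steps 2-4: under H0 the statistic is approximately standard normal,
    # so the p-value is the tail probability beyond |stat|
    p = st.norm.sf(abs(stat))
    return stat, p

rng = np.random.default_rng(0)
fake_returns = rng.normal(loc=0.001, scale=0.01, size=250)   # synthetic data
stat, p = gaussian_p_value(fake_returns)
print("test statistic = %.3f, p-value = %.4f" % (stat, p))
```

Note that `ddof=1` gives the sample standard deviation, whereas `np.nanstd` defaults to `ddof=0`; for a few hundred observations the difference is small.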

## Determining the Probability Distribution Under Null Hypothesis

### Gaussian Distribution

Suppose that the daily returns follow a Gaussian distribution, with a mean of zero and a standard deviation given by the sample standard deviation of the daily returns.


If we do this, a backtest with a high Sharpe ratio makes it very easy to reject the null hypothesis. This is because the standard test statistic for a Gaussian distribution is the average divided by the standard deviation, multiplied by the square root of the number of data points.

(In other words, it is just the per-period Sharpe ratio scaled by the square root of the number of data points.)

| p-value         | 0.1   | 0.05  | 0.01  | 0.001 |
|-----------------|-------|-------|-------|-------|
| Critical Values | 1.282 | 1.645 | 2.326 | 3.090 |
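
The critical values in this table are one-sided quantiles of the standard normal distribution. A small sketch of how they can be recomputed with `scipy.stats.norm.isf` (the inverse survival function), assuming a one-sided test:

```python
import scipy.stats as st

# One-sided critical values of the standard normal distribution:
# the test statistic must exceed z for the p-value to fall below p.
p_values = [0.1, 0.05, 0.01, 0.001]
critical_values = [st.norm.isf(p) for p in p_values]

for p, z in zip(p_values, critical_values):
    print("p = %-5g  critical value = %.3f" % (p, z))
```

A test statistic above the critical value in a given column means the null hypothesis can be rejected at that p-value.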

### Monte Carlo Simulation

Another method is to use Monte Carlo methods to generate simulated historical price data and feed these simulated data into our strategy to determine the empirical probability distribution of profits.


Our belief is that the trading strategy is profitable because it captures some subtle patterns or correlations in the price series, not merely because of the first few moments of the price distribution.


So if we generate many simulated price series with the same first few moments and the same length as the actual price data, and run the trading strategy over all of them, we can find the fraction p of these price series in which the average return is greater than or equal to the backtest return.
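
A rough sketch of this idea, under several assumptions: the "market" data is synthetic, price returns are simulated rather than prices, only the first three moments are matched (via `scipy.stats.pearson3`), and `toy_strategy_returns` is a hypothetical stand-in for the real strategy.

```python
import numpy as np
import scipy.stats as st

def toy_strategy_returns(price_returns):
    """Hypothetical strategy: hold the sign of yesterday's return."""
    position = np.sign(price_returns[:-1])
    return position * price_returns[1:]

def monte_carlo_p_value(price_returns, n_sims=500, seed=0):
    """Fraction of simulated series on which the strategy does at least as well."""
    observed = toy_strategy_returns(price_returns).mean()
    # Match the mean, standard deviation and skewness of the real returns.
    # pearson3 is parameterised by skew, loc (mean) and scale (std).
    mean, std = price_returns.mean(), price_returns.std()
    skew = st.skew(price_returns)
    rng = np.random.default_rng(seed)
    better = 0
    for _ in range(n_sims):
        sim = st.pearson3.rvs(skew, loc=mean, scale=std,
                              size=price_returns.size, random_state=rng)
        if toy_strategy_returns(sim).mean() >= observed:
            better += 1
    return better / n_sims

rng = np.random.default_rng(1)
real_returns = rng.normal(0.0005, 0.01, size=500)   # stand-in for market data
print("Monte Carlo p-value = %.3f" % monte_carlo_p_value(real_returns))
```

Because the simulated series are independent draws, any serial correlation in the real data is destroyed, which is exactly what lets the test ask whether the strategy depends on more than the first few moments.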

### Simulated Trades Method

In this method, instead of generating simulated price data, we generate sets of simulated trades, with the constraint that the number of long and short entry trades is the same as in the backtest, and with the same average holding period for the trades. 

These trades are distributed randomly over the actual historical price series. We then measure what fraction of such sets of trades have an average return greater than or equal to the backtest average return.
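
A minimal sketch of the simulated trades method, assuming a single price series, a fixed holding period equal to the backtest average, and trades that are allowed to overlap. The function name and all of the numbers below are illustrative only.

```python
import numpy as np

def random_trades_p_value(prices, n_long, n_short, holding, observed_avg,
                          n_sims=1000, seed=0):
    """Fraction of random trade sets whose average return >= observed_avg."""
    rng = np.random.default_rng(seed)
    prices = np.asarray(prices, dtype=float)
    last_entry = prices.size - holding        # every entry must be able to exit
    direction = np.concatenate([np.ones(n_long), -np.ones(n_short)])
    better = 0
    for _ in range(n_sims):
        # random entry bars, same trade counts and holding period as the backtest
        entries = rng.choice(last_entry, size=n_long + n_short, replace=False)
        trade_ret = direction * (prices[entries + holding] / prices[entries] - 1)
        if trade_ret.mean() >= observed_avg:
            better += 1
    return better / n_sims

rng = np.random.default_rng(2)
prices = 100 * np.cumprod(1 + rng.normal(0, 0.01, size=1000))  # synthetic prices
p = random_trades_p_value(prices, n_long=20, n_short=20, holding=5,
                          observed_avg=0.004)
print("Simulated trades p-value = %.3f" % p)
```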

## Options for Test Statistic

- Returns
    - Cumulative Returns
    - Average Daily Returns
    
    
- Drawdown
    - Maximum Drawdown Length
    - Number of Drawdowns > 1 Week


- Losses
    - Number of 15% Losses
    - Biggest Loss
    
**OR Some Combination of the Above.**
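
A few of these candidates can be computed from a list of per-period returns as sketched below. The helper names are hypothetical, and the equity curve is assumed to be a simple cumulative sum of returns.

```python
import numpy as np

def cumulative_return(returns):
    """Sum of per-period returns (additive equity curve)."""
    return float(np.sum(returns))

def max_drawdown_length(returns):
    """Longest run of periods spent below the previous equity high."""
    equity = np.cumsum(np.asarray(returns, dtype=float))
    peak = np.maximum.accumulate(equity)
    longest = current = 0
    for below in equity < peak:
        current = current + 1 if below else 0
        longest = max(longest, current)
    return longest

def biggest_loss(returns):
    """Most negative single-period return (0 if there are no losses)."""
    return float(min(np.min(returns), 0.0))

rets = [0.02, -0.01, -0.01, 0.03, -0.02, 0.01]   # made-up returns
print(cumulative_return(rets), max_drawdown_length(rets), biggest_loss(rets))
```

Any of these values can play the role of the test statistic in the Monte Carlo or simulated trades methods, since those only need the statistic to be computable on each simulated run.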

# Hypothesis Tests

In [41]:
import numpy as np
import pandas as pd
import scipy.stats as st
import math
from tqdm import tqdm

## Gaussian

### Average Daily Returns

In [56]:
daily_returns = pd.read_csv('../data/returnss2.csv')

In [39]:
ret = daily_returns['0'].to_list()

In [40]:
# multiplying the per-period Sharpe ratio by sqrt(len(ret)) turns it into the Gaussian test statistic (this is not an annualization)
sharpe = np.sqrt(len(ret)) * np.nanmean(ret) / np.nanstd(ret)
print("Gaussian Test statistic = %f" % sharpe)

Gaussian Test statistic = 4.707045


| p-value         | 0.1   | 0.05  | 0.01  | 0.001 |
|-----------------|-------|-------|-------|-------|
| Critical Values | 1.282 | 1.645 | 2.326 | 3.090 |

In [43]:
daily_returns1 = pd.read_csv('../data/fet-celr-500.csv')

In [44]:
daily_returns1

Unnamed: 0,lookback,thres,sell_thres,cusum,returns,drawdowns
0,500,0.5,-0.25,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[-0.03508, -0.02851, 0.01315, 0.00257, 0.00144...",[42074700]
1,500,0.5,0.00,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[-0.01806, -0.00843, 0.01069, 0.00495, -0.0061...",[42072360]
2,500,0.5,0.25,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[-0.01785, 0.01069, 0.00495, -0.00618, 0.00675...","[41984460, 76620, 1020]"
3,500,0.5,0.50,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[-0.02108, 0.02401, -0.0058, 0.00508, 0.00149,...","[41977980, 41280, 8820, 4080, 1980]"
4,500,0.5,0.75,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[-0.01678, 0.00274, -0.0091, 0.01447, 0.0061, ...","[40695720, 654600, 319920, 59400, 41340, 32580..."
...,...,...,...,...,...,...
129,500,4.0,1.25,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0.0288, -0.00431, 0.06933, 0.01905, 0.01192, ...","[5012580, 3699000, 3206880, 2055000, 1475340, ..."
130,500,4.0,1.50,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0.01816, -0.00431, 0.06933, 0.01843, 0.01156,...","[5011980, 3413340, 3206160, 1941480, 1621500, ..."
131,500,4.0,2.00,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[-0.00012, 0.07744, 0.02643, -0.01771, -0.0351...","[5733480, 4752120, 3348240, 2321700, 1187760, ..."
132,500,4.0,2.50,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0.01738, 0.09451, 0.01469, -0.01485, -0.06607...","[5650980, 4062960, 3291300, 2797140, 2434020, ..."


In [66]:
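import ast

# The list columns are stored as strings; ast.literal_eval parses them without running arbitrary code.
daily_returns1["cusum"] = daily_returns1["cusum"].apply(ast.literal_eval)
daily_returns1["returns"] = daily_returns1["returns"].apply(ast.literal_eval)
daily_returns1["drawdowns"] = daily_returns1["drawdowns"].apply(ast.literal_eval)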

In [71]:
# test statistic for each parameter set: sqrt(n) * mean / std of its returns
daily_returns1["sharpe"] = daily_returns1["returns"].apply(
    lambda r: np.sqrt(len(r)) * np.nanmean(r) / np.nanstd(r)
)

In [72]:
daily_returns1

Unnamed: 0,lookback,thres,sell_thres,cusum,returns,drawdowns,sharpe
0,500,0.5,-0.25,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[-0.03508, -0.02851, 0.01315, 0.00257, 0.00144...",[42074700],-88.057489
1,500,0.5,0.00,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[-0.01806, -0.00843, 0.01069, 0.00495, -0.0061...",[42072360],-47.387212
2,500,0.5,0.25,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[-0.01785, 0.01069, 0.00495, -0.00618, 0.00675...","[41984460, 76620, 1020]",-27.940365
3,500,0.5,0.50,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[-0.02108, 0.02401, -0.0058, 0.00508, 0.00149,...","[41977980, 41280, 8820, 4080, 1980]",-15.999500
4,500,0.5,0.75,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[-0.01678, 0.00274, -0.0091, 0.01447, 0.0061, ...","[40695720, 654600, 319920, 59400, 41340, 32580...",-9.308326
...,...,...,...,...,...,...,...
129,500,4.0,1.25,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0.0288, -0.00431, 0.06933, 0.01905, 0.01192, ...","[5012580, 3699000, 3206880, 2055000, 1475340, ...",2.314567
130,500,4.0,1.50,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0.01816, -0.00431, 0.06933, 0.01843, 0.01156,...","[5011980, 3413340, 3206160, 1941480, 1621500, ...",2.396464
131,500,4.0,2.00,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[-0.00012, 0.07744, 0.02643, -0.01771, -0.0351...","[5733480, 4752120, 3348240, 2321700, 1187760, ...",2.755810
132,500,4.0,2.50,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0.01738, 0.09451, 0.01469, -0.01485, -0.06607...","[5650980, 4062960, 3291300, 2797140, 2434020, ...",1.783434


In [78]:
d3 = daily_returns1.nlargest(10,'sharpe')

In [79]:
d3

Unnamed: 0,lookback,thres,sell_thres,cusum,returns,drawdowns,sharpe
123,500,4.0,-0.25,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0.0254, 0.0205, 0.0416, -0.00819, -0.0051, 0....","[4414620, 4347480, 3815460, 3507000, 2350260, ...",3.626867
122,500,4.0,-0.5,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0.0254, 0.0205, 0.0416, -0.00667, 0.02131, 0....","[5004000, 3520920, 2825820, 2364540, 1944480, ...",3.492299
124,500,4.0,0.0,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0.02628, 0.01683, 0.03474, 0.01204, 0.00636, ...","[3006720, 2840400, 2679780, 2128200, 1884000, ...",3.128669
131,500,4.0,2.0,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[-0.00012, 0.07744, 0.02643, -0.01771, -0.0351...","[5733480, 4752120, 3348240, 2321700, 1187760, ...",2.75581
126,500,4.0,0.5,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0.02123, 0.02876, 0.06433, 0.01204, 0.0027, 0...","[5040720, 4068300, 3928680, 3258240, 2274300, ...",2.717855
128,500,4.0,1.0,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0.02597, -0.00431, 0.06501, 0.01905, 0.01325,...","[5013600, 3726360, 2054940, 1613880, 1565040, ...",2.676724
125,500,4.0,0.25,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0.02628, 0.02572, 0.06433, 0.01204, 0.01173, ...","[5040840, 3943860, 3649380, 2428320, 2402100, ...",2.508247
127,500,4.0,0.75,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0.02123, 0.03326, 0.07171, 0.01204, 0.00914, ...","[5016660, 3532200, 3258420, 2427360, 1627620, ...",2.476624
130,500,4.0,1.5,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0.01816, -0.00431, 0.06933, 0.01843, 0.01156,...","[5011980, 3413340, 3206160, 1941480, 1621500, ...",2.396464
129,500,4.0,1.25,"[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0.0288, -0.00431, 0.06933, 0.01905, 0.01192, ...","[5012580, 3699000, 3206880, 2055000, 1475340, ...",2.314567


### Anything Else

(The same idea as the Sharpe ratio test above, but applied to a statistic other than average returns.)

In [27]:
# Values used to estimate the standard deviation (e.g. the list of daily returns
# for average daily return, or the list of drawdown lengths for max drawdown length)
raw_value_list = ret

# the observed value of the statistic to test
raw_value = np.nanmean(ret)

# the value of the statistic under the null hypothesis
null_hypothesis = 0

print('Average Daily Returns = %f' % raw_value)

Average Daily Returns = 0.027455


In [29]:
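# NOTE: no sqrt(n) factor here, unlike the Gaussian test statistic above;
# multiply by np.sqrt(len(raw_value_list)) before comparing with the critical values.
t_stat = (raw_value - null_hypothesis) / np.nanstd(raw_value_list)
print(t_stat)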

0.4123370454656365


In [37]:
# one-sided p-value: probability that a standard normal value exceeds t_stat
1 - st.norm.cdf(t_stat)

0.3400461996725078

In [34]:
# same tail probability via the survival function, which is more accurate for large statistics
st.norm.sf(abs(t_stat))

0.3400461996725078

In [32]:
# Finding critical values
# dof (degrees of freedom) is n - 1, where n is the number of observations.
# For a Gaussian null it is effectively infinite, so a large value is used here
# to make the t distribution approximate the normal distribution.
dof = 1000

print(st.t.ppf(q=1-(0.1),df=dof))
print(st.t.ppf(q=1-(0.05),df=dof))
print(st.t.ppf(q=1-(0.01),df=dof))
print(st.t.ppf(q=1-(0.001),df=dof))


1.2823987214609247
1.6463788172854639
2.3300826747555097
3.0984021639128754


## Monte Carlo

### Monte Carlo Returns

In [None]:
# The metric used to characterise the input data.
# For "predict the market" strategies this can be the close price or market returns;
# for this strategy we use the z-score of the spread.
# `spread` is assumed to be a DataFrame loaded elsewhere.
zscore_metric = spread.zscore

# daily returns of the strategy: ret

In [None]:
skew_, loc_, scale_ = st.pearson3.fit(zscore_metric)
num_better_samples = 0

In [42]:
sample_size = 10000
ret_sims = []

for sample in tqdm(range(sample_size)):
    zscore_sim = st.pearson3.rvs(skew=skew_, loc=loc_, scale=scale_, size=zscore_metric.shape[0], random_state=sample)
    
    # copy so the simulated z-score does not overwrite the real data in `spread`
    spread_sim = spread.copy()
    spread_sim['zscore'] = zscore_sim
    
    _, _, _, _, ret_sim, _ = run_backtest(spread_sim, 2.0, 0.)[0]
    ret_sims.append(ret_sim)
    
    if (np.mean(ret_sim) >= np.mean(ret)):
        num_better_samples += 1
            
print("Randomized zscore: p-value = %f" % (num_better_samples / sample_size))

  0%|          | 0/10000 [00:00<?, ?it/s]


NameError: name 'skew_' is not defined

## Randomized Entry Test

### Randomized Entry Returns

In [None]:
num_better_samples = 0

In [None]:
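sample_size = 10000
ret_sims = []
rng = np.random.default_rng(0)
n_bars = spread.shape[0]

for sample in tqdm(range(sample_size)):
    # Draw random entry positions, keeping the same number of long-A and long-B
    # entries as the real backtest. Assumption: long_a and long_b are the entry
    # lists returned by that backtest.
    entries = rng.choice(n_bars, size=len(long_a) + len(long_b), replace=False)
    long_a_sim = set(entries[:len(long_a)])
    long_b_sim = set(entries[len(long_a):])
    
    # index 4 of the returned tuple is the list of per-trade returns
    ret_sim = run_fake_backtest(spread, long_a_sim, long_b_sim, 1.5, 5)[4]
    ret_sims.append(ret_sim)
    
    if (np.mean(ret_sim) >= np.mean(ret)):
        num_better_samples += 1

print("Randomized Entry: p-value = %f" % (num_better_samples / sample_size))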

In [None]:
def get_a_b(al, ac, ah, bl, bc, bh):
    return ac-abs(ac-al)/2, ac+abs(ac-ah)/2, bc-abs(bc-bl)/2, bc+abs(bc-bh)/2

def run_fake_backtest(spread, long_a_sim, long_b_sim, thres, sell_thres, fee=0.002, interest=0.002):
    total, p_total = 0, 0  # running total and its high-water mark
    cusum, returns = [], []
    price_a, price_b, long = None, None, None  # long is None, "A" or "B"
    long_a, long_b, liquidate, dd_indices = [], [], [], []  # dd_indices: drawdown start/end indices
    dd_i = True  # True while not in a drawdown
    
    for i in range(spread.shape[0]):
        # .iloc gives positional access regardless of the index type
        al, ah, bl, bh = get_a_b(spread.Al.iloc[i], spread.A.iloc[i], spread.Ah.iloc[i],
                                 spread.Bl.iloc[i], spread.B.iloc[i], spread.Bh.iloc[i])
        
        # Check for liquidation first: as an elif after the two entry branches
        # it could never be reached.
        if (long == "A" and (i in long_b_sim)) or (long == "B" and (i in long_a_sim)): # Liquidate positions
            gain = 0
            if long=="A":
                gain = liquidate_assets(price_b, bh, al, price_a, fee, long_a[-1], spread.index[i], interest)
            else:
                gain = liquidate_assets(price_a, ah, bl, price_b, fee, long_b[-1], spread.index[i], interest)
            returns.append(gain)
            total += gain
            price_a, price_b, long = None, None, None
            liquidate.append(spread.index[i])
            
        elif i in long_a_sim: # Looking to buy
            price_a = ah
            price_b = bl
            long = "A"
            long_a.append(spread.index[i])
            
        elif i in long_b_sim:
            price_a = al
            price_b = bh
            long = "B"
            long_b.append(spread.index[i])
        cusum.append(total)
        

        if total < p_total:
            if dd_i:
                dd_indices.append(spread.index[i])
                dd_i = False
        else:
            if not dd_i:
                dd_indices.append(spread.index[i])
                dd_i = True
            p_total = total
    if total < p_total:
        dd_indices.append(spread.index[i])
    drawdowns = get_drawdowns(dd_indices)
    return long_a, long_b, liquidate, cusum, returns, drawdowns
        
def liquidate_assets(x1, x2, y1, y2, fee, d1, d2, interest):
    interest = ((d2-d1).days + 1) * interest
    total = (x1 - x2)/x1 - 2*fee - interest
    total += (y1 - y2)/y1 - 2*fee - interest
    return total

def get_drawdowns(dd_indices):
    a = dd_indices[1::2]
    b = dd_indices[::2]
    a = np.array(a)
    b = np.array(b[:len(a)])
    c = a-b
    c.sort()
    return c[::-1]
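
To make the inputs of the two helpers concrete, here is a small worked example with made-up prices and dates. The helper bodies are repeated so the snippet runs on its own, and the reading of the `liquidate_assets` arguments (first pair as the short leg, second pair as the long leg) is inferred from how `run_fake_backtest` calls it.

```python
import numpy as np
import pandas as pd

def liquidate_assets(x1, x2, y1, y2, fee, d1, d2, interest):
    interest = ((d2 - d1).days + 1) * interest
    total = (x1 - x2) / x1 - 2 * fee - interest
    total += (y1 - y2) / y1 - 2 * fee - interest
    return total

def get_drawdowns(dd_indices):
    a = np.array(dd_indices[1::2])            # drawdown end points
    b = np.array(dd_indices[::2][:len(a)])    # matching drawdown start points
    c = a - b
    c.sort()
    return c[::-1]                            # longest drawdown first

# Short leg entered at 100 and closed at 95; long leg sold at 52 after buying at 50.
gain = liquidate_assets(100, 95, 52, 50, fee=0.002,
                        d1=pd.Timestamp("2021-01-01"),
                        d2=pd.Timestamp("2021-01-03"), interest=0.002)
print("gain =", gain)

# Alternating drawdown start and end timestamps, as collected in dd_indices.
dd = get_drawdowns([pd.Timestamp("2021-01-01"), pd.Timestamp("2021-01-04"),
                    pd.Timestamp("2021-01-10"), pd.Timestamp("2021-01-12")])
print("drawdown lengths:", list(dd))
```

Fees are charged twice per leg (entry and exit), and interest is charged on both legs for every calendar day the position is held, counting both the entry and exit dates.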