# Metrics Research

## Hypothesis Testing

The general framework of hypothesis testing as applied to backtesting follows these steps:

1. Based on a backtest on some finite sample of data, we compute a certain statistical measure called the **test statistic**.


2. We suppose that the true value of the test statistic, based on an infinite data set, is actually zero. This supposition is called the **null hypothesis**.


3. We suppose that the probability distribution of daily returns is known. This probability distribution has a zero mean, based on the null hypothesis. We describe later how we determine this probability distribution.


4. Based on this null hypothesis probability distribution, we compute the probability p that the test statistic will be at least as extreme as the value observed in the backtest (allowing for the possibility of a negative test statistic). This probability p is called the **p-value**, and if it is very small (say, smaller than 0.01), we can “reject the null hypothesis” and conclude that the backtested average daily return is statistically significant.
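The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the notebook's own implementation: it assumes the Gaussian null distribution described in the next section, uses a one-sided (upper-tail) p-value, and the function name `gaussian_p_value` and the synthetic returns are made up for the example.

```python
import numpy as np
import scipy.stats as st


def gaussian_p_value(daily_returns):
    """One-sided p-value for the null hypothesis of zero mean daily return."""
    r = np.asarray(daily_returns, dtype=float)
    r = r[~np.isnan(r)]  # drop missing observations
    n = r.size
    # Step 1: test statistic = sqrt(n) * mean / sample standard deviation
    test_stat = np.sqrt(n) * r.mean() / r.std(ddof=1)
    # Steps 2-4: under the null the statistic is treated as standard normal,
    # so the p-value is the upper-tail probability beyond the observed value
    p_value = st.norm.sf(test_stat)
    return test_stat, p_value


# Synthetic returns purely for illustration (not real backtest output)
rng = np.random.default_rng(0)
example_returns = rng.normal(loc=0.001, scale=0.01, size=1000)
test_stat, p_value = gaussian_p_value(example_returns)
print(f"test statistic = {test_stat:.3f}, p-value = {p_value:.4f}")
```

If the printed p-value is below the chosen threshold (for example 0.01), the null hypothesis would be rejected at that level.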

## Determining the Probability Distribution Under Null Hypothesis

### Gaussian Distribution

Suppose that the daily returns follow a Gaussian distribution, with a mean of zero and a standard deviation given by the sample standard deviation of the daily returns.


If we do this, it is clear that a backtest with a high Sharpe ratio makes it very easy for us to reject the null hypothesis. This is because the standard test statistic for a Gaussian distribution is none other than the average divided by the standard deviation, multiplied by the square root of the number of data points.

(Basically a fancy way of saying the Sharpe ratio, scaled by the square root of the sample size.)
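The test statistic described above can be written compactly as follows. The symbols $\bar{r}$, $s$, and $n$ are notation introduced here for illustration and do not appear in the original notes.

$$
t = \sqrt{n}\,\frac{\bar{r}}{s}
$$

Here $\bar{r}$ is the sample mean of the daily returns, $s$ is their sample standard deviation, and $n$ is the number of daily observations. The ratio $\bar{r}/s$ is the per-day Sharpe ratio (taking the risk-free rate as zero), so for a fixed Sharpe ratio the statistic grows with the length of the backtest.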

| p-value         | 0.1   | 0.05  | 0.01  | 0.001 |
|-----------------|-------|-------|-------|-------|
| Critical Values | 1.282 | 1.645 | 2.326 | 3.091 |

### Monte Carlo Simulation

Another method is to use Monte Carlo methods to generate simulated historical price data and feed these simulated data into our strategy to determine the empirical probability distribution of profits.


Our belief is that the trading strategy is profitable because it captures some subtle patterns or correlations in the price series, and not just because of the first few moments of the price distribution.


So if we generate many simulated price series with the same first few moments and the same length as the actual price data, and run the trading strategy over all of them, we can find the fraction p of these price series in which the average return is greater than or equal to the backtest return.
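A rough sketch of this procedure is shown below. It makes a strong simplifying assumption: the simulated log returns are drawn from a normal distribution matching only the mean and standard deviation of the actual series, whereas matching higher moments would need a more flexible distribution. The names `monte_carlo_p_value` and `toy_momentum` are hypothetical, and the price series is synthetic.

```python
import numpy as np


def monte_carlo_p_value(prices, strategy, n_sims=1000, seed=0):
    """Fraction of simulated price series on which `strategy` does at least
    as well as on the actual series.

    `strategy` is a hypothetical callable: it takes a 1-D price array and
    returns the strategy's average daily return on that series.
    """
    prices = np.asarray(prices, dtype=float)
    rng = np.random.default_rng(seed)
    log_ret = np.diff(np.log(prices))
    mu, sigma = log_ret.mean(), log_ret.std(ddof=1)
    observed = strategy(prices)

    count = 0
    for _ in range(n_sims):
        # Assumption: match only the first two moments with a normal draw
        sim_ret = rng.normal(mu, sigma, size=log_ret.size)
        # Rebuild a price path of the same length, starting at the same price
        sim_prices = prices[0] * np.exp(np.concatenate(([0.0], np.cumsum(sim_ret))))
        if strategy(sim_prices) >= observed:
            count += 1
    return count / n_sims


def toy_momentum(prices):
    # Hypothetical strategy: hold long today if yesterday's return was positive
    ret = np.diff(prices) / prices[:-1]
    position = (ret[:-1] > 0).astype(float)
    return float(np.mean(position * ret[1:]))


# Synthetic price series purely for illustration
rng = np.random.default_rng(1)
example_prices = 100 * np.exp(np.cumsum(rng.normal(0.0, 0.01, size=500)))
print(monte_carlo_p_value(example_prices, toy_momentum, n_sims=200))
```

The returned fraction plays the role of the p-value: a small value suggests the backtest result is hard to reproduce on price series that share only the matched moments.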

### Simulated Trades Method

In this method, instead of generating simulated price data, we generate sets of simulated trades, with the constraint that the number of long and short entry trades is the same as in the backtest, and with the same average holding period for the trades. 

These trades are distributed randomly over the actual historical price series. We then measure what fraction of such sets of trades has an average return greater than or equal to the backtest average return.
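One way this could look in code is sketched below, under several assumptions that are not in the original notes: every simulated trade uses the same fixed holding period (the backtest average), trades may overlap, a trade's return is approximated by the sum of daily returns over its holding window, and "average return" is taken per trade. The function name `simulated_trades_p_value` and the synthetic data are illustrative only.

```python
import numpy as np


def simulated_trades_p_value(daily_returns, n_long, n_short, holding_period,
                             observed_avg_return, n_sims=1000, seed=0):
    """Fraction of random trade sets whose average per-trade return is at
    least the observed backtest value."""
    r = np.asarray(daily_returns, dtype=float)
    rng = np.random.default_rng(seed)
    last_entry = r.size - holding_period  # latest valid entry index
    # +1 for each long trade, -1 for each short trade (same counts as backtest)
    direction = np.concatenate((np.ones(n_long), -np.ones(n_short)))

    count = 0
    for _ in range(n_sims):
        # Random entry points over the actual historical return series
        entries = rng.integers(0, last_entry + 1, size=direction.size)
        # Approximate each trade's return by summing daily returns in its window
        trade_ret = np.array([r[e:e + holding_period].sum() for e in entries])
        if np.mean(direction * trade_ret) >= observed_avg_return:
            count += 1
    return count / n_sims


# Synthetic daily returns purely for illustration
rng = np.random.default_rng(1)
example_returns = rng.normal(0.0, 0.01, size=500)
print(simulated_trades_p_value(example_returns, n_long=20, n_short=20,
                               holding_period=5, observed_avg_return=0.01))
```

Because the trades are placed on the real price history, this test keeps whatever structure the market data has and only randomizes the timing of entries.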

## Options for Test Statistic

- Returns
    - Cumulative Returns
    - Average Daily Returns
    
    
- Drawdown
    - Maximum Drawdown Length
    - Number of Drawdowns > 1 Week


- Losses
    - Number of 15% Losses
    - Biggest Loss
    
**OR Some Combination of the Above.**

# Hypothesis Tests

## Gaussian

In [5]:
import scipy.stats as st
import numpy as np
import math

### Average Daily Returns

In [None]:
ret = ...  # placeholder: replace with an array of the strategy's daily returns

In [None]:
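# Multiplying the per-day Sharpe ratio by sqrt(n) gives the Gaussian test statistic
# (this is not annualization, which would use sqrt(252) instead)
n = np.count_nonzero(~np.isnan(ret))  # number of non-missing daily returns
sharpe = np.sqrt(n) * np.nanmean(ret) / np.nanstd(ret, ddof=1)
print("Gaussian Test statistic = %f" % sharpe)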

| p-value         | 0.1   | 0.05  | 0.01  | 0.001 |
|-----------------|-------|-------|-------|-------|
| Critical Values | 1.282 | 1.645 | 2.326 | 3.091 |

### Anything Else

(Basically like the Sharpe ratio, but using a statistic other than returns.)

In [None]:
# List to base the std on (e.g. for avg daily return, this would be the list of daily returns;
# for max drawdown length, this would be the list of drawdown lengths)
raw_value_list = ...  # placeholder: fill in with the list described above
raw_value = ...  # placeholder: the raw value of the test statistic to test

In [None]:
t_stat = (raw_value - 0) / np.nanstd(raw_value_list)

In [11]:
# Finding Critical Values
# dof (degrees of freedom) is n - 1, where n = number of values in the sample.
# For a hypothetical Gaussian distribution it is theoretically infinite,
# so a large value is used as an approximation.
dof = 10000

print(st.t.ppf(q=1-(0.1),df=dof))
print(st.t.ppf(q=1-(0.05),df=dof))
print(st.t.ppf(q=1-(0.01),df=dof))
print(st.t.ppf(q=1-(0.001),df=dof))


1.2816362297304775
1.6450060180692423
2.3267208386694755
3.091047516030612
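As a cross-check, the same critical values can be read directly from the standard normal distribution, which is the limit of the t-distribution as the degrees of freedom go to infinity. This is an optional alternative to the large-`dof` approximation above, not a change to the notebook's method.

```python
import scipy.stats as st

# Standard normal quantiles: the limit of the t-distribution as dof -> infinity
for p in (0.1, 0.05, 0.01, 0.001):
    print(p, st.norm.ppf(1 - p))
```

The results should agree with the table of critical values to three decimal places.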
