<a href="https://colab.research.google.com/github/elaineleiyoung/statistical-arbitrage/blob/main/statistical_arbitrage.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Statistical Arbitrage Model:
Pairs trading is a special form of statistical arbitrage in which the portfolio holds only two assets (here, Bitcoin and Ethereum). By combining two cointegrated assets, we can construct a spread that is mean-reverting even when the individual price series are not.
1.   Collect historical price data for two assets
2.   Test for cointegration
3.   Calculate the spread
4.   Define the trading strategy
5.   Backtest the strategy
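Steps 2 and 3 can be previewed on synthetic data before touching real prices. The sketch below is a simplified Engle-Granger two-step test written with NumPy only. It first regresses one price on the other by ordinary least squares to obtain a hedge ratio, then runs a Dickey-Fuller regression (no augmentation lags) on the residual spread. The simulated series, the function name `engle_granger_stat`, and the true hedge ratio of 2 are hypothetical. In practice, `statsmodels.tsa.stattools.coint` implements the full augmented test with p-values. The resulting t-statistic must be compared with Engle-Granger critical values (roughly -3.34 at 5% for two series with a constant), not with the standard normal table.

```python
import numpy as np

def engle_granger_stat(y, x):
    """Simplified Engle-Granger statistic (hypothetical helper, no lag terms).

    Step 1: OLS of y on x with an intercept gives the hedge ratio beta.
    Step 2: Dickey-Fuller regression of the spread's first difference on its
    lagged level. Returns (beta, t-statistic of the lag coefficient).
    """
    X = np.column_stack([np.ones_like(x), x])
    (alpha, beta), *_ = np.linalg.lstsq(X, y, rcond=None)
    spread = y - alpha - beta * x

    # Dickey-Fuller regression: diff(spread)_t = gamma * spread_{t-1} + error
    lagged = spread[:-1]
    diff = np.diff(spread)
    gamma = lagged @ diff / (lagged @ lagged)
    resid = diff - gamma * lagged
    sigma2 = resid @ resid / (len(diff) - 1)
    se = np.sqrt(sigma2 / (lagged @ lagged))
    return beta, gamma / se

rng = np.random.default_rng(0)
n = 1000
x = np.cumsum(rng.normal(size=n)) + 100   # random walk: not stationary
y = 5 + 2 * x + rng.normal(size=n)        # cointegrated with x (true beta = 2)
z = np.cumsum(rng.normal(size=n)) + 100   # independent random walk

beta_xy, t_xy = engle_granger_stat(y, x)
beta_zx, t_zx = engle_granger_stat(z, x)
print(f"cointegrated pair: beta={beta_xy:.3f}, DF t-stat={t_xy:.1f}")
print(f"independent pair:  beta={beta_zx:.3f}, DF t-stat={t_zx:.1f}")
```

For the cointegrated pair the spread is just the added noise, so the statistic is far below the critical value. For two independent random walks, the regression is typically spurious and the residual spread still wanders.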



In [None]:
%pip install yfinance



### Downloading Market Data

In [None]:
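import yfinance as yf

# Set the ticker symbols for Bitcoin and Ethereum
tickers = ['BTC-USD', 'ETH-USD']

# Download daily market data from Yahoo Finance (the end date is exclusive).
# auto_adjust=False keeps the 'Adj Close' column; recent yfinance releases
# default to auto_adjust=True, which folds adjustments into 'Close' and omits it.
data = yf.download(tickers, start="2019-01-01", end="2022-02-22", auto_adjust=False)

# Extract the adjusted close prices for Bitcoin and Ethereum
btc_prices = data['Adj Close']['BTC-USD']
eth_prices = data['Adj Close']['ETH-USD']

print(btc_prices)
print(eth_prices)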

[*********************100%***********************]  2 of 2 completed
Date
2019-01-01     3843.520020
2019-01-02     3943.409424
2019-01-03     3836.741211
2019-01-04     3857.717529
2019-01-05     3845.194580
                  ...     
2022-02-17    40538.011719
2022-02-18    40030.976562
2022-02-19    40122.156250
2022-02-20    38431.378906
2022-02-21    37075.281250
Name: BTC-USD, Length: 1148, dtype: float64
Date
2019-01-01     140.819412
2019-01-02     155.047684
2019-01-03     149.135010
2019-01-04     154.581940
2019-01-05     155.638596
                 ...     
2022-02-17    2881.481934
2022-02-18    2785.727539
2022-02-19    2763.701172
2022-02-20    2628.648438
2022-02-21    2573.816162
Name: ETH-USD, Length: 1148, dtype: float64
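Before testing for cointegration, it can help to hold both series in one DataFrame so that every row is guaranteed to pair a BTC price with an ETH price from the same date. Taking logs also puts BTC (thousands of dollars) and ETH (hundreds) on a comparable scale of percentage moves. `yf.download` already returns both tickers on a shared date index, so this mostly guards against missing values. The helper `align_log_prices` is a hypothetical sketch, not part of yfinance, and assumes two pandas Series indexed by date, such as `btc_prices` and `eth_prices`.

```python
import numpy as np
import pandas as pd

def align_log_prices(a: pd.Series, b: pd.Series) -> pd.DataFrame:
    """Inner-join two price Series on their dates, drop incomplete rows, take logs."""
    df = pd.concat([a, b], axis=1, join="inner").dropna()
    return np.log(df)

# With the downloaded data (requires the cell above):
# log_prices = align_log_prices(btc_prices, eth_prices)

# Self-contained check with made-up prices, one of which is missing:
dates = pd.date_range("2019-01-01", periods=4, freq="D")
a = pd.Series([100.0, 110.0, np.nan, 120.0], index=dates, name="A")
b = pd.Series([10.0, 11.0, 12.0, 13.0], index=dates, name="B")
log_prices = align_log_prices(a, b)
print(log_prices)
```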


### Analyzing Data

In [None]:
# Price difference: subtract Ethereum's daily adjusted close from Bitcoin's
# to get the dollar price gap between the two assets on each date.
# Vectorized subtraction aligns on the shared date index; looping with btc_prices[i]
# relies on positional indexing of a date-indexed Series, which pandas has deprecated.
price_diff = (btc_prices - eth_prices).tolist()
print(price_diff)

[3702.7006072998047, 3788.3617401123047, 3687.606201171875, 3703.1355895996094, 3689.5559844970703, 3918.8863677978516, 3873.549072265625, 3880.488265991211, 3884.4932708740234, 3550.2993774414062, 3559.817153930664, 3535.3344955444336, 3436.055320739746, 3576.9835205078125, 3508.642578125, 3531.4597702026367, 3554.8220443725586, 3536.8290939331055, 3604.0493392944336, 3481.5388412475586, 3458.8746185302734, 3485.8295974731445, 3467.670440673828, 3483.5025939941406, 3483.3875274658203, 3485.971710205078, 3470.560531616211, 3363.860466003418, 3342.518730163574, 3377.274101257324, 3350.7317123413086, 3380.3355255126953, 3410.630531311035, 3356.5207595825195, 3351.332450866699, 3358.913902282715, 3308.848533630371, 3294.936378479004, 3547.5125274658203, 3551.7727279663086, 3565.3817443847656, 3527.1322708129883, 3530.9562759399414, 3509.5169525146484, 3495.490653991699, 3498.710403442383, 3506.5273818969727, 3540.2373046875, 3769.615982055664, 3801.748291015625, 3850.2662200927734, 3807.9
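The raw price difference above is dominated by BTC's much higher price level, so it largely tracks BTC's own trend instead of mean-reverting. A common alternative, and the quantity an Engle-Granger cointegration test examines, is the residual spread `y - beta * x`, where the hedge ratio `beta` is estimated by least squares. Separately, the z-score cell that follows standardizes with the full-sample mean and standard deviation, so each day's signal depends on prices that were not yet known. A trailing (rolling) window avoids that look-ahead. The sketch below combines both ideas on synthetic data. The function names, the 30-day window, and the ±1 entry threshold (which mirrors the ±1 limits in the notebook's cell) are illustrative assumptions. For simplicity, the hedge ratio is still fit once on the whole sample.

```python
import numpy as np
import pandas as pd

def hedge_ratio(y: pd.Series, x: pd.Series) -> float:
    """OLS slope of y on x (with intercept): units of x to short per unit of y held."""
    return float(np.polyfit(x.to_numpy(), y.to_numpy(), 1)[0])

def rolling_zscore(series: pd.Series, window: int = 30) -> pd.Series:
    """Z-score using only the trailing `window` observations (no future data)."""
    mean = series.rolling(window).mean()
    std = series.rolling(window).std()
    return (series - mean) / std

def spread_signals(y: pd.Series, x: pd.Series, window: int = 30, entry: float = 1.0) -> pd.DataFrame:
    beta = hedge_ratio(y, x)  # fit once on the whole sample (a simplification)
    spread = y - beta * x
    z = rolling_zscore(spread, window)
    # -1: short the spread (short y, long beta*x); +1: long the spread; 0: flat.
    # NaN z-scores during the warm-up window compare as False, so they map to 0.
    signal = np.select([z > entry, z < -entry], [-1, 1], default=0)
    return pd.DataFrame({"spread": spread, "z": z, "signal": signal}, index=y.index)

# Synthetic cointegrated pair standing in for real prices (hypothetical data)
rng = np.random.default_rng(1)
idx = pd.date_range("2019-01-01", periods=500, freq="D")
x = pd.Series(np.cumsum(rng.normal(size=500)) + 100, index=idx)
y = 3 * x + pd.Series(rng.normal(size=500), index=idx)

out = spread_signals(y, x)
print(out.tail())
print(out["signal"].value_counts())
```

With the downloaded data, one possible call is `spread_signals(np.log(btc_prices), np.log(eth_prices))`, though whether this pair is cointegrated over the sample should be checked first.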

In [None]:
# Compute a z-score of the BTC/ETH price ratio to decide when to short or buy each asset.
# Adapted from https://medium.com/analytics-vidhya/statistical-arbitrage-with-pairs-trading-and-backtesting-ec657b25a368
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def zscore(series):
    # Standardize: subtract the mean and divide by the (population) standard deviation
    return (series - series.mean()) / np.std(series)

# create a dataframe for trading signals
signals = pd.DataFrame()
signals['BTC-USD'] = btc_prices
signals['ETH-USD'] = eth_prices
ratios = signals['BTC-USD'] / signals['ETH-USD']
# calculate z-score and define upper and lower thresholds (mean ± one standard deviation)
signals['z'] = zscore(ratios)
signals['z upper limit'] = np.mean(signals['z']) + np.std(signals['z'])
signals['z lower limit'] = np.mean(signals['z']) - np.std(signals['z'])
# create signal: short BTC (-1) if the z-score is above the upper limit,
# long BTC (+1) if it is below the lower limit, otherwise stay flat (0)
signals['signals1'] = np.select([signals['z'] > signals['z upper limit'],
                                 signals['z'] < signals['z lower limit']], [-1, 1], default=0)
# the first-order difference marks entries and exits (changes in position) in each asset
signals['positions1'] = signals['signals1'].diff()
signals['signals2'] = -signals['signals1']
signals['positions2'] = signals['signals2'].diff()
# verify dataframe head and tail (DataFrame.append was removed in pandas 2.0)
print(pd.concat([signals.head(3), signals.tail(3)]))
# visualize trading signals and positions
fig = plt.figure(figsize=(14, 6))
bx = fig.add_subplot(111)
bx2 = bx.twinx()
# plot the two assets on separate y-axes
l1, = bx.plot(signals['BTC-USD'], c='#4abdac')
l2, = bx2.plot(signals['ETH-USD'], c='#907163')
u1, = bx.plot(signals['BTC-USD'][signals['positions1'] == 1], lw=0, marker='^', markersize=8, c='g', alpha=0.7)