The cells below test the methods of the Selection class defined in the *pair_selection.py* file. For more information about the purpose and usage of each method, please refer to the docstrings inside the file.

In [None]:
# ---------- INSTALLATION AND IMPORTATION ----------

!pip install googleapis-common-protos protobuf grpcio pandas systemathics.apis
!pip install statsmodels
!pip install matplotlib

In [2]:
import pair_selection

# Construction

We import our file and create an object with the default parameters, except for the start date, which we set to January 1st, 2015. Constructing this object takes some time, because it makes API calls to retrieve the information on the assets and then the prices of each one.

By default, the **stationarity** variable is set to *True* and the **correlation** variable to *False*, so correlation is not computed. The reason is that looking at correlation alone can give spurious results. For instance, if a pairs trading strategy is based on the spread between the prices of two stocks, both prices may keep increasing without the spread ever mean-reverting, so one should be careful about relying on correlation alone for pairs trading. This is why we prefer **cointegration**, which is closely related to stationarity: if A and B are cointegrated, the log of their price ratio is stationary, which suggests that its mean and variance remain constant over time.
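
The point about spurious correlation can be illustrated with a toy example. The sketch below uses made-up synthetic prices (not data from the API) and a hand-written `pearson` helper: both series trend upward, so they are highly correlated, yet the log of their ratio drifts steadily instead of reverting to a mean.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Synthetic (made-up) prices: both assets trend upward, at different rates.
days = range(500)
price_a = [100 * 1.001 ** t for t in days]
price_b = [100 * 1.002 ** t for t in days]

corr = pearson(price_a, price_b)

# Log of the price ratio: the quantity whose stationarity matters for the spread.
log_ratio = [math.log(a / b) for a, b in zip(price_a, price_b)]
drift = log_ratio[-1] - log_ratio[0]

print(f"correlation of prices: {corr:.3f}")
print(f"change in log ratio over the sample: {drift:.3f}")
```

A high correlation together with a large drift is exactly the case a stationarity test on the log ratio is meant to reject.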

In [3]:
s = pair_selection.Selection(start_date="2015-01-01")

Two variables are built during initialization and can be inspected:

*data* is a dataframe gathering general information about the assets such as their name, ticker and sector among others.

In [4]:
s.data

Unnamed: 0,Index,Name,Ticker,Exchange,Primary,Isin,Cusip,Sedol,Sector
0,Nasdaq 100|Nasdaq Composite,Mercadolibre Inc Common Stock,MELI,XNGS,XNGS,US58733R1023,58733R102,B23X1H3,Catalog/Specialty Distribution
2,Nasdaq 100|Nasdaq Composite,Check Point Software Technologies Ltd,CHKP,XNGS,XNGS,IL0010824113,M22465104,2181334,Computer Software: Prepackaged Software
4,Nasdaq 100|Nasdaq Composite|Russell 1000|Russe...,Okta Inc Cl A,OKTA,XNGS,XNGS,US6792951054,679295105,BDFZSP1,EDP Services
9,Nasdaq 100|Nasdaq Composite|Russell 1000|Russe...,Sirius Xm Holdings Inc,SIRI,XNGS,XNGS,US82968B1035,82968B103,BGLDK10,Broadcasting
12,Nasdaq 100|Nasdaq Composite|Russell 1000|Russe...,Marvell Technology Inc,MRVL,XNGS,XNGS,US5738741041,573874104,BNKJSM5,Semiconductors
...,...,...,...,...,...,...,...,...,...
186,Composite|Industrials|Nasdaq 100|Nasdaq Compos...,Amgen Inc,AMGN,XNGS,XNGS,US0311621009,031162100,2023607,Biotechnology: Pharmaceutical Preparations
188,Composite|Industrials|Nasdaq 100|Nasdaq Compos...,Cisco Systems Inc,CSCO,XNGS,XNGS,US17275R1023,17275R102,2198163,Computer peripheral equipment
189,Composite|Industrials|Nasdaq 100|Nasdaq Compos...,Walgreens Boots Alliance Inc,WBA,XNGS,XNGS,US9314271084,931427108,BTN1Y44,Medical/Nursing Services
190,Composite|Industrials|Nasdaq 100|Nasdaq Compos...,Apple Inc,AAPL,XNGS,XNGS,US0378331005,037833100,2046251,Computer Manufacturing


*df_all_prices* is also a dataframe; it gathers the daily closing prices of all the assets from the start date set in the constructor until the day the request was made.

In [5]:
s.df_all_prices

Unnamed: 0,Dates,MELI,CHKP,SIRI,MRVL,WDAY,SGEN,LULU,SPLK,CPRT,...,AMZN,TSLA,SBUX,EXC,MSFT,AMGN,CSCO,WBA,AAPL,INTC
64,2015-01-02,125.85,78.46,3.475,14.520,80.41,32.54,55.34,58.790,18.290,...,308.52,43.862,40.720,37.57,46.760,159.89,27.61,76.00,27.3325,36.36
65,2015-01-05,124.30,78.01,3.400,14.280,80.01,33.05,55.96,56.840,18.110,...,302.19,42.018,39.940,36.50,46.325,157.99,27.06,74.50,26.5625,35.95
66,2015-01-06,122.08,77.95,3.350,14.825,79.42,31.45,55.57,55.780,17.905,...,295.29,42.256,39.615,36.22,45.650,152.90,27.05,74.69,26.5650,35.28
67,2015-01-07,121.84,78.59,3.410,15.040,79.35,32.15,57.65,56.245,17.960,...,298.42,42.190,40.590,36.27,46.230,158.24,27.30,76.60,26.9375,36.02
68,2015-01-08,123.66,80.53,3.540,16.015,82.78,31.74,59.07,58.450,18.145,...,300.46,42.123,41.245,36.55,47.590,157.67,27.51,77.55,27.9725,36.69
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1812,2021-12-10,1144.50,110.61,6.280,89.020,279.68,142.57,409.58,110.840,149.210,...,3444.24,1017.030,116.730,53.91,342.540,210.89,59.25,49.50,179.4500,50.59
1813,2021-12-13,1180.00,112.05,6.290,86.590,280.92,144.17,403.75,111.500,145.810,...,3391.35,966.410,115.560,54.14,339.400,211.39,58.61,49.14,175.7400,50.00
1814,2021-12-14,1190.40,111.75,6.300,85.290,273.35,144.46,395.09,109.350,147.080,...,3381.83,958.510,114.710,53.85,328.340,213.74,57.77,49.32,174.3300,49.70
1815,2021-12-15,1210.12,113.41,6.380,88.370,279.22,150.08,402.48,112.820,149.280,...,3466.30,975.990,114.680,54.69,334.650,219.25,59.93,49.66,179.3000,50.67


# Class methods

Let's test some class methods:

In [6]:
ticker1 = 'MCHP'
ticker2 = 'AVGO'

new_df = s.logAndRatio(ticker1, ticker2)
new_df

Unnamed: 0,Date,Price_1,Price_2,Ratio
0,2015-01-02,22.435,100.09,-0.649465
1,2015-01-05,21.935,98.49,-0.652254
2,2015-01-06,21.510,96.25,-0.650760
3,2015-01-07,21.630,98.85,-0.659920
4,2015-01-08,22.320,103.79,-0.667461
...,...,...,...,...
1748,2021-12-10,87.130,631.68,-0.860329
1749,2021-12-13,85.160,621.66,-0.863317
1750,2021-12-14,84.890,614.91,-0.859955
1751,2021-12-15,87.940,639.86,-0.861899
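
As a quick sanity check on the *Ratio* column, the rows displayed above are consistent with a base-10 logarithm of `Price_1 / Price_2`. The sketch below recomputes it for three of the displayed rows; the `log_ratio` helper is hypothetical and the base-10 definition is an inference from the displayed values, not a statement about the implementation of `logAndRatio`.

```python
import math

# Three rows copied from the table above: (date, Price_1, Price_2, Ratio).
rows = [
    ("2015-01-02", 22.435, 100.09, -0.649465),
    ("2015-01-05", 21.935, 98.49, -0.652254),
    ("2021-12-15", 87.940, 639.86, -0.861899),
]

def log_ratio(price_1, price_2):
    """Base-10 log of the price ratio (assumed definition of the Ratio column)."""
    return math.log10(price_1 / price_2)

for date, p1, p2, shown in rows:
    # Print the recomputed value next to the value shown in the notebook output.
    print(date, round(log_ratio(p1, p2), 6), shown)
```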


In [14]:
corrStatio = s.pairCorr(ticker1, ticker2)
corrStatio

{'corr': None, 'statio': 0.002108385837462449}

Since the variable indicating that the correlation calculation should be performed is set to *False* by default, it is normal that no correlation value is returned (`corr` is `None`).

Let's repeat the same operation but this time between two specific dates to see if everything works correctly.

In [7]:
ticker1 = 'MCHP'
ticker2 = 'AVGO'

start_date = '2019-01-08'
end_date = '2021-11-17'

new_df = s.logAndRatio(ticker1, ticker2, start_date, end_date)
new_df

Unnamed: 0,Date,Price_1,Price_2,Ratio
0,2019-01-08,36.480,236.07,-0.810986
1,2019-01-09,37.925,246.28,-0.812504
2,2019-01-10,38.310,249.52,-0.813793
3,2019-01-11,38.565,250.57,-0.812736
4,2019-01-14,37.120,250.87,-0.829841
...,...,...,...,...
718,2021-11-11,83.500,555.40,-0.822919
719,2021-11-12,83.350,563.22,-0.829772
720,2021-11-15,83.210,565.77,-0.832464
721,2021-11-16,84.260,568.77,-0.829315


# Get best pairs

Again, we first call the method ``get_best_pairs()`` without modifying the parameters (the calculation is done from the first to the last date), then we add a date interval.

In [8]:
top = s.get_best_pairs()
top

Unnamed: 0,Ticker 1,Ticker 2,Correlation,Stationnarity
0,MCHP,AVGO,,0.002108
1,XLNX,AVGO,,0.004202
2,TXN,AVGO,,0.00959
3,SWKS,NXPI,,0.014984
4,GILD,BIIB,,0.026449
5,MCHP,TXN,,0.036703
6,CHKP,EA,,0.042883
7,MU,AVGO,,0.046359
8,PEP,MNST,,0.048071
9,ADSK,ADBE,,0.04854


In [9]:
start_date = '2019-01-08'
end_date = '2021-11-17'

top = s.get_best_pairs(start_date, end_date) # the function can be applied between two dates
top

Unnamed: 0,Ticker 1,Ticker 2,Correlation,Stationnarity
0,MCHP,AVGO,,8.2e-05
1,GILD,BIIB,,0.002157
2,MCHP,TXN,,0.005016
3,AMGN,BIIB,,0.01337
4,CTAS,CPRT,,0.027623
5,ANSS,EA,,0.030685
6,NXPI,AVGO,,0.045377


# Get alltime best pair at regular intervals

To get the best pairs per period for our backtest, we call the function *get_alltime_best_pairs*. This function works as follows:

Starting from the start date and moving forward to the last date, it repeatedly takes a window covering **a fixed interval in months**, selects the pairs with a **satisfying correlation value and/or stationarity** within that window according to the criteria set on the instance, and saves them in a **json file**.

*Suppose start_date is 2015-03-01, end_date is 2021-06-18, interval is 6 and repetition is 2. The computation first runs on price data between 2015-03-01 and 2015-09-01 (+6 months), then again 2 months later, between 2015-05-01 and 2015-11-01, and so on until end_date is 2021-06-01.*
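
The windowing scheme in this example can be sketched as follows. This is a minimal illustration using only the standard library; `add_months` and `rolling_windows` are hypothetical helpers, not the code of `get_alltime_best_pairs`, and how the real method treats the final, possibly partial window is not covered here.

```python
from datetime import date

def add_months(d, months):
    """Shift a first-of-month date by a whole number of months."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, d.day)

def rolling_windows(start, end, interval, repetition):
    """Yield (window_start, window_end) pairs that end on or before `end`."""
    window_start = start
    while add_months(window_start, interval) <= end:
        yield window_start, add_months(window_start, interval)
        # Move the window forward by `repetition` months for the next run.
        window_start = add_months(window_start, repetition)

windows = list(
    rolling_windows(date(2015, 3, 1), date(2021, 6, 18), interval=6, repetition=2)
)
for w_start, w_end in windows[:3]:
    print(w_start, "->", w_end)
# 2015-03-01 -> 2015-09-01
# 2015-05-01 -> 2015-11-01
# 2015-07-01 -> 2016-01-01
```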

We keep the default repetition of the computation, i.e. every month. To obtain multiple files for backtesting purposes, we vary whether the correlation and the stationarity are used, as well as the interval of months over which each computation is made. The goal is to have enough different outputs for the backtest and to keep only the best at the end, so we will also try correlation only and both correlation and stationarity.

In [10]:
s.get_alltime_best_pairs(filename="json_best_pairs/semestrial_statio")

In [12]:
s.get_alltime_best_pairs(interval=12, filename="json_best_pairs/yearly_statio")

In [13]:
s.get_alltime_best_pairs(interval=4, filename="json_best_pairs/trimestrial_statio")

The three cells above only took the stationarity result of a pair into account: if the *p*-value was under 5%, we considered the pair stationary and added it to the json file.

Now we combine it with the correlation: a pair is considered valid if it is stationary and also correlated at 70% or more (the threshold can be accessed and modified, e.g. ``self.mincorr_level=0.8`` sets it to 80%).
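
The selection rule described here can be sketched as a small predicate. The function `is_valid_pair` and the parameter `max_p_value` are hypothetical names introduced for illustration; `use_statio`, `use_corr` and `mincorr_level` mirror the names used in this notebook, but the logic below is an assumption about how they combine, not the code of the Selection class. The correlation value 0.65 is made up, since no correlation is computed in the outputs above.

```python
def is_valid_pair(p_value, corr, use_statio=True, use_corr=False,
                  max_p_value=0.05, mincorr_level=0.7):
    """Return True when the pair passes every enabled criterion."""
    checks = []
    if use_statio:
        # Stationarity criterion: p-value below the significance level.
        checks.append(p_value is not None and p_value < max_p_value)
    if use_corr:
        # Correlation criterion: correlation at or above the minimum level.
        checks.append(corr is not None and corr >= mincorr_level)
    # A pair is valid only if at least one criterion is enabled and all pass.
    return bool(checks) and all(checks)

# p-value of MCHP/AVGO from the output above, with a made-up correlation.
print(is_valid_pair(0.002108, 0.65))                 # stationarity only
print(is_valid_pair(0.002108, 0.65, use_corr=True))  # both criteria
print(is_valid_pair(0.002108, 0.65, use_statio=False, use_corr=True,
                    mincorr_level=0.6))              # correlation only, lower bar
```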

In [14]:
s.use_statio = True
s.use_corr = True
s.get_alltime_best_pairs(filename="json_best_pairs/semestrial_statio_and_corr")

In [15]:
s.get_alltime_best_pairs(interval=12, filename="json_best_pairs/yearly_statio_and_corr")

In [16]:
s.get_alltime_best_pairs(interval=4, filename="json_best_pairs/trimestrial_statio_and_corr")

Same idea as before, but this time considering only the correlation and no longer the stationarity.

In [17]:
s.use_statio = False
s.use_corr = True
s.get_alltime_best_pairs(filename="json_best_pairs/semestrial_corr")

In [18]:
s.get_alltime_best_pairs(interval=12, filename="json_best_pairs/yearly_corr")

In [19]:
s.get_alltime_best_pairs(interval=4, filename="json_best_pairs/trimestrial_corr")

Here is an overview of what the first call, ``s.get_alltime_best_pairs(filename="json_best_pairs/semestrial_statio")``, writes to the json file:

```json
{
    "2015-07-01": [
        [
            "SPLK",
            "INTU"
        ],
        [
            "MCHP",
            "NVDA"
        ],
        [
            "SPLK",
            "EA"
        ],
        [
            "ANSS",
            "CDNS"
        ],
        [
            "SPLK",
            "ATVI"
        ],
        [
            "CDNS",
            "ADBE"
        ],
        [
            "ADSK",
            "WDAY"
        ]
    ],
    "2015-08-01": [ " ... " ],
    "2015-09-01": [ " ... " ],
    " ... ": [ " ... " ],
    "2021-12-01": [
        [
            "ANSS",
            "ADBE"
        ],
        [
            "NVDA",
            "XLNX"
        ],
        [
            "MRVL",
            "AVGO"
        ],
        [
            "ADSK",
            "CHKP"
        ],
        [
            "MRVL",
            "MCHP"
        ],
        [
            "CHKP",
            "EA"
        ],
        [
            "PAYX",
            "CPRT"
        ],
        [
            "VRTX",
            "GILD"
        ]
    ]
}
```
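
For the backtest, a file with this structure can be read back with the standard `json` module. The sketch below parses a shortened excerpt of the output above from a string instead of opening the file, so it runs on its own; `pairs_for` is a hypothetical helper, and the reading of each key as a period date is an assumption based on the layout shown.

```python
import json

# Shortened excerpt of the structure shown above; a real file has one key per period.
excerpt = """
{
    "2015-07-01": [["SPLK", "INTU"], ["MCHP", "NVDA"], ["SPLK", "EA"]],
    "2021-12-01": [["ANSS", "ADBE"], ["NVDA", "XLNX"], ["MRVL", "AVGO"]]
}
"""
best_pairs = json.loads(excerpt)

def pairs_for(period, table):
    """Return the pairs stored for `period` as tuples, or [] if it is absent."""
    return [tuple(pair) for pair in table.get(period, [])]

print(pairs_for("2015-07-01", best_pairs))
# [('SPLK', 'INTU'), ('MCHP', 'NVDA'), ('SPLK', 'EA')]
```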