The cells below test the methods of the Selection class defined in the *pair_selection.py* file. For more information about the purpose and usage of each method, please refer to the docstrings inside the file.

In [None]:
# ---------- INSTALLATION AND IMPORTATION ----------

!pip install googleapis-common-protos protobuf grpcio pandas systemathics.apis
!pip install statsmodels
!pip install matplotlib

In [None]:
import pair_selection

# Construction

We import our file and create an object with the default parameters, except for the start date, which we set to January 1st, 2015. Constructing this object takes some time because it makes API calls to retrieve the information on the assets and then the prices of each one.

By default the **stationarity** variable is set to *True* and the **correlation** one to *False*, so correlation is not calculated. The reason is that looking at correlation alone can give spurious results. For instance, if your pairs trading strategy is based on the spread between the prices of the two stocks, both prices may keep increasing without the spread ever mean-reverting, so one should be careful about using only correlation for pairs trading. This is why we would rather use **cointegration**, which is closely related to stationarity: if A and B are cointegrated, the log of their price ratio is expected to be stationary, which means that its mean and variance remain constant over time.
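The point about correlation can be illustrated with a small sketch on synthetic prices (not market data, and not the code of *pair_selection.py*). The helper `pearson` and the two price series are made up for the illustration: both prices rise steadily, so they are strongly correlated, yet the log of their ratio drifts away from its starting value instead of reverting to a mean.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Synthetic prices: A grows 0.1% a day, B grows 0.2% a day.
days = range(500)
price_a = [100 * 1.001 ** t for t in days]
price_b = [100 * 1.002 ** t for t in days]

corr = pearson(price_a, price_b)
log_ratio = [math.log(a / b) for a, b in zip(price_a, price_b)]

print(f"correlation of prices: {corr:.3f}")
print(f"log ratio on first day: {log_ratio[0]:.3f}")
print(f"log ratio on last day: {log_ratio[-1]:.3f}")
```

A strategy that bets on this ratio returning to its initial level would keep losing, even though the correlation between the two prices is high.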

In [None]:
s = pair_selection.Selection(start_date="2015-01-01")

Two instance variables are built during initialization and can be inspected:

*data* is a dataframe gathering general information about the assets such as their name, ticker and sector among others.

In [None]:
s.data

*df_all_prices* is also a dataframe, but it gathers the daily closing prices of all the assets from the starting date set in the constructor until the day the request was made.

In [None]:
s.df_all_prices

# Class methods

Let's test some class methods:

In [None]:
ticker1 = 'MCHP'
ticker2 = 'AVGO'

new_df = s.logAndRatio(ticker1, ticker2)
new_df

In [None]:
corrStatio = s.pairCorr(ticker1, ticker2)
corrStatio

Since the variable indicating that the correlation calculation should be performed is set to *False* by default, it is normal that no correlation value or dedicated column is returned.

Let's repeat the same operation but this time between two specific dates to see if everything works correctly.

In [None]:
ticker1 = 'MCHP'
ticker2 = 'AVGO'

start_date = '2019-01-08'
end_date = '2021-11-17'

new_df = s.logAndRatio(ticker1, ticker2, start_date, end_date)
new_df

# Get best pairs

Again, we first call the method ``get_best_pairs()`` without modifying the parameters (the calculation runs from the first to the last date), then we add an interval.

In [None]:
top = s.get_best_pairs()
top

In [None]:
start_date = '2019-01-08'
end_date = '2021-11-17'

top = s.get_best_pairs(start_date, end_date) # apply the function between two dates
top

# Get alltime best pair at regular intervals

In order to get the best pairs per period for our backtest, we call the function *get_alltime_best_pairs*. This function works as follows:

Starting from the start date and moving forward by a fixed step in months until the last date, it computes, for each window spanning **a specific interval** in months, the pairs with a **satisfying correlation value and/or stationarity** according to the criteria set in the instance variables, and saves them in a **json file**.

*Suppose start_date is 2015-03-01, end_date is 2021-06-18, interval is 6 and repetition is 2. The first computation uses price data between 2015-03-01 and 2015-09-01 (+6 months); the next one starts 2 months later and therefore covers 2015-05-01 to 2015-11-01, and so on until the window reaches end_date.*
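The windowing in this example can be sketched as follows. This is only an illustration of the description above, not the code of *get_alltime_best_pairs*: the helpers `add_months` and `rolling_windows` are hypothetical names, and dropping any window that would end after `end_date` is an assumption of the sketch.

```python
from datetime import date

def add_months(d, months):
    """Shift a date by whole months (assumes the day of month is <= 28)."""
    month_index = d.year * 12 + (d.month - 1) + months
    return date(month_index // 12, month_index % 12 + 1, d.day)

def rolling_windows(start_date, end_date, interval, repetition):
    """Yield (window_start, window_end) pairs.

    Each window spans `interval` months and consecutive windows start
    `repetition` months apart. Windows ending after `end_date` are
    dropped (an assumption of this sketch).
    """
    window_start = start_date
    window_end = add_months(window_start, interval)
    while window_end <= end_date:
        yield window_start, window_end
        window_start = add_months(window_start, repetition)
        window_end = add_months(window_start, interval)

windows = list(
    rolling_windows(date(2015, 3, 1), date(2021, 6, 18), interval=6, repetition=2)
)
print(windows[0])  # first window: 2015-03-01 to 2015-09-01
print(windows[1])  # second window: 2015-05-01 to 2015-11-01
```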

We keep the default repetition of the computation, i.e. every month. To obtain multiple files for backtesting purposes, we vary whether correlation and stationarity are used, as well as the interval in months over which each computation is made. The goal is to have enough different outputs for the backtest and to keep only the best at the end, so we will also try correlation only and correlation combined with stationarity.

In [None]:
s.get_alltime_best_pairs(filename="json_best_pairs/semestrial_statio")

In [None]:
s.get_alltime_best_pairs(interval=12, filename="json_best_pairs/yearly_statio")

In [None]:
s.get_alltime_best_pairs(interval=4, filename="json_best_pairs/trimestrial_statio")

The three previous cells only took into account the stationarity result of a pair: if the *p*-value was under 5%, we considered the pair stationary and added it to the json file.

Now we combine it with the correlation: a pair is considered valid only if it is stationary and also correlated at a level of at least 70% (the threshold can be accessed and modified, for example with ```self.mincorr_level=0.8``` to set it to 80%).
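The selection rule described here can be summarised in a few lines. This is a minimal sketch, not the code of the Selection class: the function `is_valid_pair` and the parameter `max_pvalue` are hypothetical, while `use_statio`, `use_corr` and `mincorr_level` mirror the instance variables used in this notebook.

```python
def is_valid_pair(pvalue, corr, use_statio=True, use_corr=False,
                  max_pvalue=0.05, mincorr_level=0.7):
    """Return True when the pair passes every enabled criterion."""
    # Stationarity criterion: the p-value must be under the threshold.
    if use_statio and not pvalue < max_pvalue:
        return False
    # Correlation criterion: the correlation must reach the minimum level.
    if use_corr and not corr >= mincorr_level:
        return False
    return True

# Stationarity only (the default): the correlation value is ignored.
print(is_valid_pair(pvalue=0.01, corr=0.50))
# Stationarity and correlation: both criteria must hold.
print(is_valid_pair(pvalue=0.01, corr=0.50, use_corr=True))
# Correlation only: the p-value is ignored.
print(is_valid_pair(pvalue=0.20, corr=0.90, use_statio=False, use_corr=True))
```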

In [None]:
s.use_statio = True
s.use_corr = True
s.get_alltime_best_pairs(filename="json_best_pairs/semestrial_statio_and_corr")

In [None]:
s.get_alltime_best_pairs(interval=12, filename="json_best_pairs/yearly_statio_and_corr")

In [None]:
s.get_alltime_best_pairs(interval=4, filename="json_best_pairs/trimestrial_statio_and_corr")

Same idea as before, but this time considering only the correlation and no longer the stationarity.

In [None]:
s.use_statio = False
s.use_corr = True
s.get_alltime_best_pairs(filename="json_best_pairs/semestrial_corr")

In [None]:
s.get_alltime_best_pairs(interval=12, filename="json_best_pairs/yearly_corr")

In [None]:
s.get_alltime_best_pairs(interval=4, filename="json_best_pairs/trimestrial_corr")

Here is an overview of what the first call, ``s.get_alltime_best_pairs(filename="json_best_pairs/semestrial_statio")``, writes to the json file:

```json
{
    "2015-07-01": [
        [
            "SPLK",
            "INTU"
        ],
        [
            "MCHP",
            "NVDA"
        ],
        [
            "SPLK",
            "EA"
        ],
        [
            "ANSS",
            "CDNS"
        ],
        [
            "SPLK",
            "ATVI"
        ],
        [
            "CDNS",
            "ADBE"
        ],
        [
            "ADSK",
            "WDAY"
        ]
    ],
    "2015-08-01": [ " ... " ],
    "2015-09-01": [ " ... " ],
    " ... ": [ " ... " ],
    "2021-12-01": [
        [
            "ANSS",
            "ADBE"
        ],
        [
            "NVDA",
            "XLNX"
        ],
        [
            "MRVL",
            "AVGO"
        ],
        [
            "ADSK",
            "CHKP"
        ],
        [
            "MRVL",
            "MCHP"
        ],
        [
            "CHKP",
            "EA"
        ],
        [
            "PAYX",
            "CPRT"
        ],
        [
            "VRTX",
            "GILD"
        ]
    ]
}
```
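For the backtest, a file with this structure can be read back with the standard `json` module. The sketch below parses an inline excerpt (a few pairs copied from the overview above) rather than opening a file, because the exact file name written by the method is not shown here; the variable names are illustrative.

```python
import json

# Excerpt with the same shape as the file shown above.
sample = """
{
    "2021-12-01": [["ANSS", "ADBE"]],
    "2015-07-01": [["SPLK", "INTU"], ["MCHP", "NVDA"]]
}
"""

best_pairs = json.loads(sample)

# Flatten into (date, ticker1, ticker2) rows, oldest date first.
# ISO-formatted dates sort chronologically when sorted as strings.
rows = [
    (day, ticker1, ticker2)
    for day, pairs in sorted(best_pairs.items())
    for ticker1, ticker2 in pairs
]

for row in rows:
    print(row)
```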