# Partner selection

## Introduction

This notebook demonstrates implementations of the four partner selection methods described in Section 3.1.1 of Stübinger et al. (2016).

The procedure consists of the following steps:
- Create a PartnerSelection object.
- Add target stocks.
- Determine partner stocks based on the chosen approach.

## Methodology

### Traditional Approach
- For every possible cohort, calculate the sum of all pairwise correlations measured by Spearman’s ρ. A cohort consists of the fixed target stock and one of the $C_{p}^{n}$ combinations of potential partner stocks, where $n$ denotes the total number of potential partner stocks and $p$ denotes the number of partner stocks in a cohort.
- The cohort with the largest sum of pairwise correlations is selected as the output.
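
The two steps above can be sketched as follows. This is a minimal illustration on synthetic data, not the code behind `selection.PartnerSelection`; the function name `traditional_partners` and the toy tickers are hypothetical.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def traditional_partners(returns: pd.DataFrame, target: str, p: int) -> list:
    """Return the cohort (p partners + target) with the largest sum of
    pairwise Spearman correlations. Hypothetical helper for illustration."""
    corr = returns.corr(method="spearman")
    candidates = [c for c in returns.columns if c != target]
    best_cohort, best_score = None, -np.inf
    for partners in combinations(candidates, p):
        cohort = list(partners) + [target]
        sub = corr.loc[cohort, cohort].to_numpy()
        # Strict upper triangle: every pair of stocks is counted exactly once.
        score = sub[np.triu_indices(len(cohort), k=1)].sum()
        if score > best_score:
            best_cohort, best_score = cohort, score
    return best_cohort


# Synthetic returns: T, A and B share a common factor, C and D are pure noise.
rng = np.random.default_rng(0)
base = rng.normal(size=250)
toy = pd.DataFrame({
    "T": base + 0.1 * rng.normal(size=250),
    "A": base + 0.1 * rng.normal(size=250),
    "B": base + 0.1 * rng.normal(size=250),
    "C": rng.normal(size=250),
    "D": rng.normal(size=250),
})
print(traditional_partners(toy, "T", p=2))  # ['A', 'B', 'T']
```

The correlation matrix is computed once and reused for every cohort, so the loop only sums entries of a small sub-matrix.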

### Extended Approach
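- Replace the sum of all pairwise correlations in the Traditional Approach with one of the multivariate conditional versions of Spearman’s ρ shown below (Schmid and Schmidt, 2007). Here $d$ is the number of stocks in a cohort, $n$ is the number of observations (not the number of potential partner stocks, as in the Traditional Approach), $\hat{U}_{ij}$ is the pseudo-observation (normalized rank) of stock $i$ at time $j$, and $h(d) = \frac{d+1}{2^d - (d+1)}$.

    - $ \hat{\rho}_1 = h(d) \times \left\{ -1 + \frac{2^d}{n} \sum_{j=1}^n \prod_{i=1}^d (1 - \hat{U}_{ij}) \right\} $

    - $ \hat{\rho}_2 = h(d) \times \left\{ -1 + \frac{2^d}{n} \sum_{j=1}^n \prod_{i=1}^d \hat{U}_{ij} \right\} $

    - $ \hat{\rho}_3 = -3 + \frac{12}{n {d \choose 2}} \times \sum_{k<l} \sum_{j=1}^n (1-\hat{U}_{kj})(1-\hat{U}_{lj}) $

- Follow the same calculation process described in the Traditional Approach.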
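
As a rough sketch, the three estimators can be evaluated for a single cohort as shown below. It assumes the pseudo-observations are column-wise ranks divided by the number of observations and takes $h(d) = \frac{d+1}{2^d - (d+1)}$ from Schmid and Schmidt (2007). The function name `multivariate_spearman` is hypothetical and the data are synthetic.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def multivariate_spearman(returns: pd.DataFrame, estimator: int = 3) -> float:
    """Estimate multivariate Spearman's rho for the columns of `returns`."""
    n, d = returns.shape  # n observations, d stocks in the cohort
    # Pseudo-observations (assumption: ranks / n). Stored with shape (n, d),
    # i.e. u[j, i] corresponds to U_ij in the formulas.
    u = returns.rank(axis=0).to_numpy() / n
    h = (d + 1) / (2 ** d - (d + 1))
    if estimator == 1:
        return float(h * (-1 + 2 ** d / n * np.prod(1 - u, axis=1).sum()))
    if estimator == 2:
        return float(h * (-1 + 2 ** d / n * np.prod(u, axis=1).sum()))
    if estimator == 3:
        pair_sum = sum(
            ((1 - u[:, k]) * (1 - u[:, l])).sum()
            for k, l in combinations(range(d), 2)
        )
        n_pairs = d * (d - 1) / 2  # binomial coefficient "d choose 2"
        return float(-3 + 12 / (n * n_pairs) * pair_sum)
    raise ValueError("estimator must be 1, 2 or 3")


rng = np.random.default_rng(1)
x = rng.normal(size=1000)
# Strictly increasing transforms of x share the same ranks (perfect dependence).
comonotone = pd.DataFrame({"a": x, "b": 2 * x + 1, "c": np.exp(x)})
independent = pd.DataFrame(rng.normal(size=(1000, 3)), columns=["a", "b", "c"])

print(multivariate_spearman(comonotone, 3))   # close to 1
print(multivariate_spearman(independent, 3))  # close to 0
```

In a full search, this value would replace the sum of pairwise correlations as the score of each cohort.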


### Geometric Approach
- For every possible cohort, calculate the sum of the perpendicular distances from each row (one time instance) of the data to the diagonal vector.
- The cohort with the smallest sum is selected as the output.

    Example: p = 1 and t = 2, where t denotes the total number of time instances.

    data = $ \left[ \begin{matrix} 0 & 1 \\ 0 & 1 \end{matrix} \right]$, diagonal vector = [1, 1]

    Each row $(0, 1)$ lies at distance $\frac{|0 - 1|}{\sqrt{2}} = \frac{\sqrt{2}}{2}$ from the diagonal, so

    SUM = $\frac{\sqrt{2}}{2} + \frac{\sqrt{2}}{2} = \sqrt{2}$
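
The worked example can be reproduced with a few lines of NumPy. This is only a sketch of the distance calculation for one cohort; the helper name `sum_diagonal_distances` is hypothetical, and any preprocessing the implementation applies to the data before this step is not shown.

```python
import numpy as np


def sum_diagonal_distances(data) -> float:
    """Sum of perpendicular distances from each row to the diagonal (1, ..., 1)."""
    data = np.asarray(data, dtype=float)
    d = data.shape[1]
    e = np.ones(d) / np.sqrt(d)      # unit vector along the diagonal
    proj = np.outer(data @ e, e)     # projection of each row onto the diagonal
    return float(np.linalg.norm(data - proj, axis=1).sum())


total = sum_diagonal_distances([[0, 1], [0, 1]])
print(total)  # approximately 1.4142, i.e. sqrt(2)
```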

### Extremal Approach
- Calculate the chi-square-type test statistic described in Section 3 of Mangold (2015) for all possible cohorts.
- The cohort with the largest test statistic is selected as the output.
- In the implementation, I first use SymPy to generate the analytical formula of $\dot c_\theta (u_1, ..., u_{p + 1})$ and then do the rest of the calculations.
- This approach takes more time than the others, especially the first time you use it.

---

## Import Library

In [1]:
import numpy as np
import pandas as pd

import data
import selection

## Load Data
You can download the data online or simply load it from the local pkl file.

In [2]:
# Get data online
tickers = data.get_sp500_tickers()
adj_close = data.get_adj_close(tickers, 2015, 1, 1, 2020, 12, 31)
daily_return = adj_close.pct_change().dropna(how = "all", axis = 0)
daily_return.to_pickle("S&P500_Daily_Return.pkl")

[*********************100%***********************]  505 of 505 completed


In [3]:
# Get data from local pkl file
daily_return = pd.read_pickle("S&P500_Daily_Return.pkl")

---

# Example Usage of Implementations

## Step 1 - Create a PartnerSelection object

- num_of_partner : Number of partner stocks required for the target stock. If num_of_partner = n, a cohort contains n + 1 stocks in total.
- num_take_account : Number of potential partner stocks taken into account.

In [4]:
partner_selection = selection.PartnerSelection(daily_return = daily_return, num_of_partner = 3, num_take_account = 50)
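
To get a feel for what these two parameters imply, the short sketch below counts the cohorts that must be scored for one target stock. It assumes every combination of `num_of_partner` stocks out of the `num_take_account` candidates is evaluated, as described in the Methodology section.

```python
from math import comb

num_take_account = 50  # candidate partner stocks per target
num_of_partner = 3     # partners per cohort

# Number of distinct cohorts to evaluate for a single target stock.
n_cohorts = comb(num_take_account, num_of_partner)
print(n_cohorts)  # 19600
```

Because this count grows quickly with either parameter, larger values lead to noticeably longer run times.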

## Step 2 - Add target stocks

### Normal Usage

In [5]:
partner_selection.add_target_stock("V")
partner_selection.add_target_stock("MS")
partner_selection.add_target_stock("PYPL")

print(partner_selection.target_stock_added())

['V', 'MS', 'PYPL']


### Errors
If the ticker is not among the columns of the DataFrame used to create the PartnerSelection object, a KeyError is raised.

In [6]:
partner_selection.add_target_stock("ABCD")

KeyError: "Target stock dose not exist in the object's data."

## Step 3 - Determine partner stocks based on the chosen approach.

### Normal Usage

#### Traditional Approach

In [7]:
%%time
partner_selection.traditional_approach("PYPL")

Wall time: 28 s


['MSFT', 'GOOGL', 'GOOG', 'PYPL']

#### Extended Approach
- type_of_estimator : {1, 2, 3}, default 3. Selects which of the estimators $\hat{\rho}_1$, $\hat{\rho}_2$, $\hat{\rho}_3$ is used.

In [8]:
%%time
partner_selection.extended_approach("PYPL", type_of_estimator=3)

Wall time: 39.2 s


['SNPS', 'CDNS', 'ANSS', 'PYPL']

#### Geometric Approach

In [9]:
%%time
partner_selection.geometric_approach("PYPL")

Wall time: 38.7 s


['MSFT', 'GOOGL', 'GOOG', 'PYPL']

#### Extremal Approach
The first call takes more time than later ones (compare the two timings below).

In [10]:
%%time
partner_selection.extremal_approach("PYPL")

Wall time: 7min 30s


['MA', 'V', 'ACN', 'PYPL']

In [11]:
%%time
partner_selection.extremal_approach("MS")

Wall time: 2min 27s


['PNC', 'FITB', 'TFC', 'MS']

### Errors
If the stock was not added in Step 2, a KeyError is raised.

In [12]:
%%time
partner_selection.traditional_approach("ABCD")

KeyError: 'Please add the target stock to the object first.'

---

## Other Functions

### get_result()
Returns a dictionary containing all results calculated so far, keyed by target stock and then by approach.

In [13]:
partner_selection.get_result()

{'PYPL': {'traditional approach': ['MSFT', 'GOOGL', 'GOOG', 'PYPL'],
  'extended approach': ['SNPS', 'CDNS', 'ANSS', 'PYPL'],
  'geometric approach': ['MSFT', 'GOOGL', 'GOOG', 'PYPL'],
  'extremal approach': ['MA', 'V', 'ACN', 'PYPL']},
 'MS': {'extremal approach': ['PNC', 'FITB', 'TFC', 'MS']}}
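
For a side-by-side comparison, the nested dictionary can be turned into a table with pandas. The sketch below copies the output shown above into a literal so that it runs on its own; in the notebook, `result` would simply be `partner_selection.get_result()`.

```python
import pandas as pd

# Copied from the get_result() output above.
result = {
    "PYPL": {
        "traditional approach": ["MSFT", "GOOGL", "GOOG", "PYPL"],
        "extended approach": ["SNPS", "CDNS", "ANSS", "PYPL"],
        "geometric approach": ["MSFT", "GOOGL", "GOOG", "PYPL"],
        "extremal approach": ["MA", "V", "ACN", "PYPL"],
    },
    "MS": {"extremal approach": ["PNC", "FITB", "TFC", "MS"]},
}

# One column per target stock, one row per approach; cohorts joined into strings.
table = pd.DataFrame({
    target: {approach: ", ".join(cohort) for approach, cohort in approaches.items()}
    for target, approaches in result.items()
})
print(table)
```

Approaches that have not been run for a target stock appear as NaN in the table.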

### visualization(target_stock)
Simple visualization of the result using Bokeh.

In [15]:
from bokeh.io import output_notebook
output_notebook()

partner_selection.visualization("PYPL")

---

# Conclusion

This notebook provides a simple demo of four partner selection methods, all of which are easy to use. However, there is still room for improvement in computational efficiency: the extremal approach in particular took several minutes per target stock in the runs above.

# References

1. Stübinger, J., Mangold, B., Krauss, C., 2016. Statistical arbitrage with vine copulas.
2. Schmid, F., Schmidt, R., 2007. Multivariate extensions of Spearman’s rho and related statistics.
3. Mangold, B., 2015. A multivariate linear rank test of independence based on a multiparametric copula with cubic sections.