
# Pairs Trading, a Machine Learning approach

Pairs Trading is an arbitrage strategy involving two equities whose prices have co-moved in the past and are expected to do so in the future. The rationale is simple: at the entry point, short the stock that has outperformed and go long the stock that has underperformed. Afterward, liquidate both positions at the exit point.

Here we used two Machine Learning algorithms, namely Recurrent Neural Networks (RNN) and Reinforcement Learning (RL), to support a Pairs Trading strategy on High-Frequency Data (HFD) from a set of 2365 stocks listed on 3 different US exchanges.

## 7. Results

### 7.1 Time windows

Pair formation is the first stage of the strategy after data pre-processing. We used 2 days for pair formation and 1 day for trading on a rolling basis: the first time window used the first and second working days of the month for pair formation and the third working day for trading. The second time window used the second and third working days for pair formation and the fourth for trading, and so on.

![alternative text](https://drive.google.com/uc?id=1_oK_WkIo9yvBQ6pV8cfimmhiC1hoMIZh)

In the end, we got 19 time windows where we ran the algorithm to form pairs and then trade.
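The rolling scheme can be sketched as follows. This is a minimal illustration, not the notebook's own code: the function name `rolling_windows` and the day labels `D1..D21` are hypothetical, and the 21 working days are an assumption chosen because 2 formation days plus 1 trading day, advanced one day at a time, then give 19 windows.

```python
from typing import List, Tuple


def rolling_windows(
    days: List[str], formation: int = 2, trade: int = 1
) -> List[Tuple[List[str], List[str]]]:
    """Return (formation_days, trading_days) pairs, advancing one day at a time."""
    size = formation + trade
    return [
        (days[i:i + formation], days[i + formation:i + size])
        for i in range(len(days) - size + 1)
    ]


# Hypothetical month with 21 working days, labelled D1..D21
days = [f"D{k}" for k in range(1, 22)]
windows = rolling_windows(days)

print(len(windows))  # number of time windows
print(windows[0])    # first window: formation on D1-D2, trading on D3
print(windows[1])    # second window: formation on D2-D3, trading on D4
```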

### 7.2 Performance measures

We employed the following performance criteria to evaluate the results.

#### 7.2.1 Final position

Only trades within the core session of the stock exchanges, that is, from 09:30:00 to 16:00:00, were considered. The **final position** is the portfolio value at the end of the core session. Any long/short position still open at that time was closed automatically.

#### 7.2.2 Maximum portfolio value

At every bar of the sampling frequency, the portfolio value is computed from the previous portfolio value and the result of any trade order executed at that bar. The **maximum portfolio value** is therefore the highest value computed during the session.
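To make the final position and the maximum portfolio value concrete, here is a small sketch. It assumes, purely for illustration, that the portfolio starts at zero and that each bar contributes the profit or loss of the order executed at that bar. The series `pnl` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical per-bar profit and loss (0.0 means no order was executed at that bar)
pnl = pd.Series(
    [0.0, 12.5, -4.0, 0.0, 20.0, -8.5, 3.0],
    index=pd.date_range("2020-01-06 15:55", periods=7, freq="1min"),
)

# Keep only the core session; the 16:01 bar is dropped
pnl = pnl.between_time("09:30", "16:00")

# Portfolio value at each bar = previous value + result of the executed order
portfolio = pnl.cumsum()

final_position = portfolio.iloc[-1]  # value at the end of the core session
max_value = portfolio.max()          # highest value reached during the session

print(final_position, max_value)
```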

#### 7.2.3 Return

The **return** may be expressed as a price difference but is usually stated as a percentage change in value. The resulting dimensionless figure is independent of the price level of the financial instrument under evaluation, which makes results comparable across strategies.

The return can be written as:

$$R_{t} = \frac{P_{t}}{P_{t-1}} -1$$


Where *R<sub>t</sub>* means the return at time *t*, *P<sub>t</sub>* portfolio's value at time *t*, and *P<sub>t-1</sub>* portfolio's value at time *t-1*.
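The formula above maps directly onto a shifted division in pandas. The portfolio values below are hypothetical and only serve to show the calculation.

```python
import pandas as pd

# Hypothetical portfolio values P_t at consecutive bars
values = pd.Series([100.0, 102.0, 96.9, 101.745])

# R_t = P_t / P_{t-1} - 1; the first bar has no predecessor, so its return is NaN
returns = values / values.shift(1) - 1

print(returns)
```

The same series is returned by `values.pct_change()`.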

#### 7.2.4 Average gain

The **average gain** measures the average profitability of the strategy when a positive result is observed. That is, the average gain is the expected value of all positive results of the strategy.

#### 7.2.5 Average loss

Similarly to the average gain, the **average loss** is the expected value of all negative results of the strategy.
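Both averages are conditional means, which can be sketched as below. The series `results` is hypothetical; whether a "result" is a per-bar or a per-trade profit is not specified here, so the sketch simply treats it as one number per trading period.

```python
import pandas as pd

# Hypothetical results, one per trading period
results = pd.Series([10.0, -4.0, 6.0, 0.0, -2.0, 8.0])

average_gain = results[results > 0].mean()  # mean of the positive results only
average_loss = results[results < 0].mean()  # mean of the negative results only

print(average_gain, average_loss)
```

In pandas, the mean of an empty selection is `NaN`, so a series with no negative results would give an undefined average loss.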

#### 7.2.6 Win ratio

The **win ratio** describes how often trades ended profitably. It is the ratio between the number of trading periods that experienced a gain and the total number of trading periods. Win ratios help to evaluate the precision of the signals, since better predictions lead to higher win ratios.
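As a short illustration, the win ratio is the mean of a boolean mask. The data are hypothetical, and counting a zero result as a non-win is an assumption of this sketch.

```python
import pandas as pd

# Hypothetical results, one per trading period
results = pd.Series([10.0, -4.0, 6.0, 0.0, -2.0, 8.0])

# Share of periods with a strictly positive result
win_ratio = (results > 0).mean()

print(win_ratio)
```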

#### 7.2.7 Volatility

Return **volatility** measures how much the return rises and falls around its average value. Volatility is also considered a risk measure. Its most common measure is the standard deviation of returns.
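A minimal sketch of the standard-deviation measure on hypothetical returns follows. Note that pandas uses the sample standard deviation (`ddof=1`) by default; which convention was used for the tables in this notebook is not stated, so both are shown.

```python
import pandas as pd

# Hypothetical per-period returns
returns = pd.Series([0.02, -0.01, 0.03, 0.00, -0.02])

volatility_sample = returns.std()            # sample standard deviation (ddof=1)
volatility_population = returns.std(ddof=0)  # population standard deviation

print(volatility_sample, volatility_population)
```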

#### 7.2.8 Sharpe ratio

The **Sharpe ratio** is the average return in excess of the risk-free rate per unit of volatility (or total risk). Subtracting the risk-free rate from the expected return helps an investor isolate the income earned from risk-taking activities.

The Sharpe ratio can be obtained from the equation:

$$S = \frac{R_{p}-R_{f}}{\sigma_{p} }$$

Where *R<sub>p</sub>* means the expected return of the portfolio, *R<sub>f</sub>* the risk-free rate, and *$\sigma$<sub>p</sub>* the standard deviation of the portfolio's returns.
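The equation can be sketched in a few lines. The helper `sharpe_ratio` is a hypothetical name, the returns are illustrative, and the risk-free rate is assumed to be expressed per period (the same period as the returns). No annualization is applied.

```python
import pandas as pd


def sharpe_ratio(returns: pd.Series, risk_free: float = 0.0) -> float:
    """(mean return - risk-free rate) / standard deviation of returns."""
    return (returns.mean() - risk_free) / returns.std()


# Hypothetical per-period returns
returns = pd.Series([0.02, -0.01, 0.03, 0.00, -0.02])

print(sharpe_ratio(returns))                   # risk-free rate of zero
print(sharpe_ratio(returns, risk_free=0.001))  # hypothetical per-period risk-free rate
```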

#### 7.2.9 Maximum drawdown

The **maximum drawdown** or *MDD* is the largest observed loss from a peak to a trough of the portfolio value before a new peak is reached. It is a downside risk measure for a given time frame.

The *MDD* can be computed as follows:

$$MDD = \frac{V_{t}-V_{p}}{V_{p} }$$

Where *V<sub>t</sub>* means a trough value and *V<sub>p</sub>* a peak value.
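One common way to evaluate this formula over a whole series is to compare each value with the running peak and keep the worst ratio. The sketch below assumes strictly positive portfolio values (the ratio is not meaningful when the peak is zero or negative). The helper name `max_drawdown` and the data are hypothetical.

```python
import pandas as pd


def max_drawdown(values: pd.Series) -> float:
    """Most negative (V_t - V_p) / V_p, where V_p is the running peak."""
    running_peak = values.cummax()
    drawdown = (values - running_peak) / running_peak
    return drawdown.min()


# Hypothetical portfolio values: peak at 120, trough at 90, new peak at 130
values = pd.Series([100.0, 120.0, 90.0, 110.0, 130.0, 117.0])
mdd = max_drawdown(values)

print(mdd)  # (90 - 120) / 120
```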

#### 7.2.10 Longs (Shorts)

These measures count how many **long** (or **short**) positions in Stock<sub>1</sub> were entered during the time frame. By construction of the Pairs Trading strategy, the algorithm enters the opposite position in Stock<sub>2</sub>.
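Counting entries can be sketched as detecting transitions in a position series. The encoding (+1 long, -1 short, 0 flat) and the data are assumptions made for this illustration, not the notebook's internal representation.

```python
import pandas as pd

# Hypothetical position in Stock 1 at each bar: +1 long, -1 short, 0 flat
position = pd.Series([0, 1, 1, 0, -1, -1, 0, 1, 0, -1])

# Position held at the previous bar (flat before the first bar)
previous = position.shift(1, fill_value=0)

# An entry is a bar where the position becomes long (short) and was not already so
longs = int(((position == 1) & (previous != 1)).sum())
shorts = int(((position == -1) & (previous != -1)).sum())

print(longs, shorts)
```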

### 7.2 Top 10 performers (without costs)

Considering the final portfolio value as the main performance criterion, we built a table with the top 10 performers where *P<sub>n</sub>* means pair *n* within one of the 19 time windows. Please note that the first time window did not form any feasible pair to trade.

In [None]:
import pandas as pd

# Assemble the per-window results (windows 2 to 19) in a single data frame
frames = []
for i in range(2, 20):
    input_file = './data/formation-{}/overall_res.pkl'.format(i)
    res2 = pd.read_pickle(input_file)
    res2.insert(0, "Window", i)  # tag every row with its time window
    frames.append(res2)

result2 = pd.concat(frames)


In [None]:
# Sort assembled data frame in descending order 
result2.sort_values(by='Final', ascending=False).head(10)

| Pair | Window | Final | Max | Return | Gain | Loss | Win | Volatility | Sharpe | MDD | Longs | Shorts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| P1 | 17 | 1315.7 | 1786.1 | -0.0284468 | 855.849 | -99.126 | 0.851881 | 1.39511 | -0.0203904 | -2.20763 | 201 | 215 |
| P1 | 19 | 908.383 | 1129.56 | 0.365651 | 536.671 | -160.706 | 0.843521 | 23.3189 | 0.0156804 | -1.77965 | 180 | 199 |
| P4 | 12 | 856.926 | 1110.15 | 0.0323 | 592.419 | -24.4264 | 0.955329 | 1.42571 | 0.0226554 | -2.55198 | 225 | 208 |
| P1 | 11 | 788.957 | 1450.72 | -0.000178269 | 877.838 | -83.9156 | 0.931296 | 2.1605 | -8.25129e-05 | -2.37276 | 275 | 283 |
| P7 | 19 | 746.86 | 994.52 | 0.00470036 | 679.96 | -104.854 | 0.922936 | 0.278611 | 0.0168707 | -3.55065 | 183 | 194 |
| P3 | 19 | 719.735 | 787.89 | 0.00506807 | 300.463 | | 1.0 | 0.157451 | 0.0321883 | -0.925896 | 52 | 51 |
| P5 | 12 | 463.351 | 576.011 | 0.0181464 | 295.327 | | 0.998694 | 0.386011 | 0.0470099 | -0.964366 | 192 | 177 |
| P31 | 19 | 433.175 | 433.175 | 0.00214685 | 283.411 | | 1.0 | 0.0659996 | 0.0325283 | -0.831894 | 53 | 54 |
| P2 | 3 | 421.056 | 557.839 | 0.00412396 | 362.22 | -46.8929 | 0.888454 | 1.55884 | 0.00264553 | -3.11072 | 202 | 205 |
| P1 | 4 | 408.974 | 424.804 | -0.00178413 | 194.631 | -5.96114 | 0.988506 | 0.214929 | -0.00830105 | -1.57874 | 63 | 62 |


### 7.3 Top 10 performers (with costs)

Again taking the final portfolio value as the main performance criterion, we built a table with the top 10 performers after including trading costs of 50 bps (round-trip).
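To give a sense of scale, 50 bps corresponds to 0.5% of the traded notional per round trip. The sketch below is only an arithmetic illustration: the notional, the number of round trips, the gross result, and the helper `round_trip_cost` are all hypothetical, and the way costs were actually charged in the backtest is not shown here.

```python
COST_BPS = 50  # round-trip trading cost in basis points (1 bp = 0.01%)


def round_trip_cost(notional: float, cost_bps: float = COST_BPS) -> float:
    """Cost of opening and closing one position of the given notional."""
    return notional * cost_bps / 10_000


# Hypothetical figures: 10 round trips of 1,000 each and a gross result of 120
gross_result = 120.0
n_round_trips = 10
net_result = gross_result - n_round_trips * round_trip_cost(1_000.0)

print(round_trip_cost(1_000.0), net_result)
```

Under these assumed numbers, costs absorb a large share of the gross result, which shows how frequent trading magnifies the impact of a fixed per-trade cost.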

In [None]:
# Assemble the per-window results with costs (windows 2 to 19) in a single data frame
frames = []
for i in range(2, 20):
    input_file = './data/formation-{}/overall_cos.pkl'.format(i)
    res = pd.read_pickle(input_file)
    res.insert(0, "Window", i)  # tag every row with its time window
    frames.append(res)

result = pd.concat(frames)

In [None]:
# Sort assembled data frame in descending order 
result.sort_values(by='Final', ascending=False).head(10)

| Pair | Window | Final | Max | Return | Gain | Loss | Win | Volatility | Sharpe | MDD | Longs | Shorts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| P6 | 14 | -144.997 | 15.9334 | 0.00203462 | 15.9334 | -78.7583 | 0.000261233 | 0.0923 | 0.0220436 | -12.877 | 16 | 16 |
| P2 | 10 | -230.065 | 19.526 | 0.000943963 | 19.526 | -131.091 | 0.000261233 | 0.0491094 | 0.0192217 | -12.7825 | 22 | 22 |
| P2 | 9 | -625.822 | 11.7884 | 0.000688309 | 11.7884 | -403.673 | 0.000261233 | 0.0983535 | 0.00699831 | -54.0878 | 18 | 18 |
| P3 | 16 | -841.273 | 4.40345 | 0.000956959 | 4.40345 | -412.935 | 0.0023511 | 0.0826277 | 0.0115816 | -192.049 | 187 | 181 |
| P4 | 13 | -909.947 | -25.1497 | 0.00136927 | | -487.908 | 0.0 | 0.0323354 | 0.0423458 | -0.0 | 99 | 100 |
| P2 | 17 | -947.337 | 12.5525 | 0.000809454 | 12.3529 | -474.005 | 0.0148903 | 0.131885 | 0.00613755 | -76.4703 | 177 | 163 |
| P7 | 14 | -986.563 | 0.0 | 0.000915206 | | -574.876 | 0.0 | 0.0281267 | 0.0325387 | | 20 | 20 |
| P5 | 13 | -1049.2 | 0.0 | 0.0014749 | | -545.214 | 0.0 | 0.0329069 | 0.0448204 | | 141 | 142 |
| P3 | 18 | -1054.08 | 0.0 | 0.000917548 | | -574.112 | 0.0 | 0.0225258 | 0.0407332 | | 23 | 23 |
| P2 | 16 | -1101.9 | 2.58027 | -0.00207375 | 0.802005 | -546.564 | 0.00130617 | 0.230803 | -0.00898492 | -430.386 | 392 | 392 |
