Minimum Profit Optimization #19

Merged
merged 61 commits into from
Dec 9, 2020
Changes from 55 commits
15a788d
[Draft] Minimum Profit Optimization
Stochastic-Adventure Nov 17, 2020
826aaa1
Changes to unit test: Use AlmostEqual for float comparison
Stochastic-Adventure Nov 17, 2020
4f9e57b
Unit tests timed out. Removed the time-consuming tests to see if they…
Stochastic-Adventure Nov 17, 2020
9ccfc28
Check which tests were timed out in Circle CI.
Stochastic-Adventure Nov 17, 2020
beaa604
Change the test to a less time-consuming one.
Stochastic-Adventure Nov 17, 2020
df3b896
Reduce the test for Optimize() to 1.
Stochastic-Adventure Nov 17, 2020
47fd811
Remove unnecessary exception handling.
Stochastic-Adventure Nov 18, 2020
7dad7e3
Fix file structure Minimum Profit Strategy
PanPip Nov 18, 2020
6729be5
Pylint and import fixes Minimum Profit Strategy
PanPip Nov 18, 2020
1e16bfe
Added unit test for split_dataset in Utils
PanPip Nov 18, 2020
bbfb77d
Resolved the comments and added unit test for statistics.
Stochastic-Adventure Nov 19, 2020
65233c1
Import optimization for pylint.
Stochastic-Adventure Nov 19, 2020
8b6bdc6
Test file for a test where the pairs are not cointegrated.
Stochastic-Adventure Nov 19, 2020
94a68df
Removed diagnostic print message and a unit test case.
Stochastic-Adventure Nov 19, 2020
7bb0406
Bug fixes. Forgot 'iloc'
Stochastic-Adventure Nov 19, 2020
3ade4e8
Added docs, and corner case unit tests.
Stochastic-Adventure Nov 20, 2020
2a4e9ac
Merge branch 'minimum_profit' of https://github.com/hudson-and-thames…
Stochastic-Adventure Nov 20, 2020
aa8ee88
Added trading simulation component
Stochastic-Adventure Nov 23, 2020
a1704a8
Merge branch 'minimum_profit' of https://github.com/hudson-and-thames…
Stochastic-Adventure Nov 23, 2020
b211672
Update requirements.txt
Stochastic-Adventure Nov 23, 2020
e9720cc
Coverage fix.
Stochastic-Adventure Nov 23, 2020
cdb2ae5
Fixed a bug.
Stochastic-Adventure Nov 23, 2020
24f40ce
Fix leaking warnings and error prints in Minimum Profit
PanPip Nov 24, 2020
b7100ee
Make progress bar work without tqdm package Minimum Profit
PanPip Nov 24, 2020
4f571fa
New sphinx documentation.
Stochastic-Adventure Nov 24, 2020
f95a4c9
Merge branch 'minimum_profit' of https://github.com/hudson-and-thames…
Stochastic-Adventure Nov 24, 2020
dbb0f34
Adapted Illya's fix.
Stochastic-Adventure Nov 24, 2020
15cf5d6
Tqdm fix.
Stochastic-Adventure Nov 24, 2020
b3eb777
Update minimum_profit.py
Stochastic-Adventure Nov 24, 2020
cdde1a7
Hotfix: Doc typesetting typos.
Stochastic-Adventure Nov 24, 2020
5b8f127
Small code fixes for Minimum Profit - Cointegration Approach
PanPip Nov 25, 2020
c9effd9
Small docs fixes for Minimum Profit - Cointegration Approach
PanPip Nov 25, 2020
3b5df5e
Small test fixes for Minimum Profit - Cointegration Approach
PanPip Nov 25, 2020
68576a8
Delete minimum_profit_simulation.py
Stochastic-Adventure Nov 25, 2020
391a651
Try to suppress warning using the statsmodels.ARIMA.
Stochastic-Adventure Nov 26, 2020
4d46872
Keep suppressing the warnings.
Stochastic-Adventure Nov 26, 2020
f0d6c82
Remove the diagnostic console output.
Stochastic-Adventure Nov 26, 2020
8b26d7c
Remove console output.
Stochastic-Adventure Nov 26, 2020
f38c3e2
Fixed all mentions of ”we".
Stochastic-Adventure Nov 26, 2020
3243836
Coverage Fix.
Stochastic-Adventure Nov 26, 2020
2491752
Update trading_simulation.py
Stochastic-Adventure Nov 26, 2020
a1341d4
Update trading_simulation.py
Stochastic-Adventure Nov 26, 2020
c23ef22
Pickle causing RecursionError. Force the data into a list to retrieve…
Stochastic-Adventure Nov 26, 2020
66fd9b0
Weird bug when calling shape of DataFrame. Tentative fix.
Stochastic-Adventure Nov 26, 2020
d155de8
Temporary disable pylint error to see what exactly caused the error.
Stochastic-Adventure Nov 26, 2020
7097f45
Coverage fix. No pickle files for trading signal.
Stochastic-Adventure Nov 26, 2020
2b635b6
Final edits and removing trading simulation from the build.
Stochastic-Adventure Nov 26, 2020
28502a6
No more testing the trading simulation module.
Stochastic-Adventure Nov 26, 2020
9f30862
Merge branch 'develop' into minimum_profit
PanPip Nov 26, 2020
28d19ef
Small style fixes for Minimum Profit - Cointegration Approach
PanPip Nov 26, 2020
bb59550
Cut build time in half.
Stochastic-Adventure Nov 27, 2020
a140fe0
Passing pylint.
Stochastic-Adventure Nov 27, 2020
3b8d32a
Update test_minimum_profit.py
Stochastic-Adventure Nov 27, 2020
d50242a
Small code and test fixes for Minimum Profit - Cointegration Approach
PanPip Nov 27, 2020
c67e29d
Small docs fixes for Minimum Profit - Cointegration Approach
PanPip Nov 27, 2020
63b93e6
Small fixes for Minimum Profit - Cointegration Approach
PanPip Nov 27, 2020
c548ec0
Update bullets in docs
Jackal08 Dec 7, 2020
11e1c89
Changelog and license fix.
Stochastic-Adventure Dec 7, 2020
bc9d7d7
Update changelog.rst
Stochastic-Adventure Dec 7, 2020
506f43a
Small typo fixes for Minimum Profit - Cointegration Approach
PanPip Dec 9, 2020
5713c56
Update Notebook links Minimum Profit - Cointegration Approach
PanPip Dec 9, 2020
2 changes: 2 additions & 0 deletions arbitragelab/cointegration_approach/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,3 +5,5 @@
from arbitragelab.cointegration_approach.engle_granger import EngleGrangerPortfolio
from arbitragelab.cointegration_approach.signals import (get_half_life_of_mean_reversion,
bollinger_bands_trading_strategy, linear_trading_strategy)
from arbitragelab.cointegration_approach.minimum_profit import MinimumProfit
from arbitragelab.cointegration_approach.coint_sim import CointegrationSimulation
410 changes: 410 additions & 0 deletions arbitragelab/cointegration_approach/coint_sim.py

Large diffs are not rendered by default.

388 changes: 388 additions & 0 deletions arbitragelab/cointegration_approach/minimum_profit.py

Large diffs are not rendered by default.

290 changes: 290 additions & 0 deletions docs/source/cointegration_approach/minimum_profit.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,290 @@
.. _cointegration_approach-minimum_profit:

.. note::
The following documentation closely follows two papers:

- `Loss protection in pairs trading through minimum profit bounds: a cointegration approach <http://downloads.hindawi.com/archive/2006/073803.pdf>`__ by Lin, Y.-X., McCrae, M., and Gulati, C. (2006)
- `Finding the optimal pre-set boundaries for pairs trading strategy based on cointegration technique <https://ro.uow.edu.au/cgi/viewcontent.cgi?article=1040&context=cssmwp>`__ by Puspaningrum, H., Lin, Y.-X., and Gulati, C. M. (2010)

===========================
Minimum Profit Optimization
===========================

Introduction
############

A common pairs trading strategy is to "fade the spread", i.e. to open a trade when the spread is sufficiently far away
from its equilibrium in anticipation of the spread reverting to the mean. Within the context of cointegration, the
spread refers to cointegration error, and in the remainder of this documentation "spread" and "cointegration error" will
be used interchangeably.

To define a strategy, the notion of "sufficiently far away from the equilibrium of the spread", i.e. the pre-set
boundary at which a trade is opened, needs to be clearly defined. The boundary affects the minimum total profit (MTP)
over a specific trading horizon: the higher the pre-set boundary for opening trades, the higher the profit per trade
but the fewer the trades, and lowering the boundary has the opposite effect. The number of trades over a
specified trading horizon is determined jointly by the average trade duration and the average inter-trade interval.

This module is designed to find the optimal pre-set boundary that would maximize the MTP for cointegration error
following an AR(1) process by numerically estimating the average trade duration, average inter-trade interval, and the
average number of trades based on the mean first-passage time.

In this strategy, the following assumptions are made:

- The prices of two assets (:math:`S_1` and :math:`S_2`) are cointegrated over the relevant time period, which includes both the in-sample and the out-of-sample (trading) periods.
- The cointegration error follows a stationary AR(1) process.
- The cointegration error is symmetrically distributed so that the optimal boundary could be applied on both sides of the mean.
- Short sales are permitted or possible through a broker and there is no interest charged for the short sales and no cost for trading.
- The cointegration coefficient :math:`\beta > 0`, where a cointegration relationship is defined as:

.. math::

P_{S_1,t} - \beta P_{S_2,t} = \varepsilon_t

In the following sections, the derivations of the minimum profit per trade and of the mean first-passage time of a
stationary AR(1) process are presented, as originally shown in the papers.

Minimum Profit per Trade
########################

Denote a trade opened when the cointegration error :math:`\varepsilon_t` overshoots the pre-set upper boundary :math:`U`
as a **U-trade**, and similarly a trade opened when :math:`\varepsilon_t` falls through the pre-set lower
boundary :math:`L` as an **L-trade**. Without loss of generality, it can be assumed that the mean
of :math:`\varepsilon_t` equals 0. Then the minimum profit per U-trade can be derived from the following trade setup.

- When :math:`\varepsilon_t \geq U` at :math:`t_o`, open a trade by selling :math:`N` of asset :math:`S_1` and buying :math:`\beta N` of asset :math:`S_2`.
- When :math:`\varepsilon_t \leq 0` at :math:`t_c`, close the trade.

The profit per trade would thus be:

.. math::

P = N (P_{S_1, t_o} - P_{S_1, t_c}) + \beta N (P_{S_2, t_c} - P_{S_2, t_o})

Since the two assets are cointegrated during the trade period, the cointegration relationship can be substituted into
the above equation to derive the following:

.. math::
:nowrap:

\begin{align*}
P & = N (P_{S_1, t_o} - P_{S_1, t_c}) + \beta N (P_{S_2, t_c} - P_{S_2, t_o}) \\
& = N (\beta P_{S_2, t_c} - P_{S_1, t_c}) + N (P_{S_1, t_o} - \beta P_{S_2, t_o}) \\
& = -N \varepsilon_{t_c} + N \varepsilon_{t_o} \\
& \geq N U
\end{align*}

Thus, by trading the asset pair with the weight as a proportion of the cointegration coefficient, the profit per U-trade
is at least :math:`U` dollars when trading one unit of the pair. Should the required minimum profit be higher, then the
strategy can trade multiple units of the pair weighted by the cointegration coefficient.
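
The bound can be checked numerically. The sketch below is only an illustration: the prices, :math:`\beta`, :math:`U`,
and :math:`N` are made-up values chosen so that the spread is at the boundary when the trade opens and below zero when
it closes; none of them come from the papers or the module.

.. code-block:: python

    # Hypothetical inputs, chosen only to illustrate P >= N * U.
    beta = 1.5          # assumed cointegration coefficient
    upper_bound = 2.0   # assumed pre-set boundary U
    units = 10          # N, number of pair units traded

    # Hypothetical prices of S1 and S2 when the U-trade opens (t_o) and closes (t_c).
    p1_open, p2_open = 53.0, 34.0
    p1_close, p2_close = 52.0, 35.0


    def spread(p1, p2):
        """Cointegration error: price of S1 minus beta times price of S2."""
        return p1 - beta * p2


    eps_open = spread(p1_open, p2_open)     # 2.0, at the boundary U
    eps_close = spread(p1_close, p2_close)  # -0.5, below the mean of 0

    # Sell N of S1 and buy beta * N of S2 at the open, then reverse both legs at the close.
    profit = units * (p1_open - p1_close) + beta * units * (p2_close - p2_open)

    print(profit)                           # 25.0
    print(units * (eps_open - eps_close))   # 25.0, same value via the spread
    print(profit >= units * upper_bound)    # True

Both legs contribute to the profit here, yet the total depends only on how far the spread moved between open and close.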

According to the assumptions in the Introduction section, the lower boundary will be set at :math:`-U` due to the
symmetric distribution of the cointegration error. The profit of an L-trade can thus be derived from the following trade
setup.

- When :math:`\varepsilon_t \leq -U` at :math:`t_o`, open a trade by buying :math:`N` of asset :math:`S_1` and selling :math:`\beta N` of asset :math:`S_2`.
- When :math:`\varepsilon_t \geq 0` at :math:`t_c`, close the trade.

Using the same derivation as above, it can be shown that the profit per L-trade is also at least :math:`U` dollars per
unit. Therefore, the boundary is exactly the minimum profit per trade when the strategy trades only one unit of the
cointegrated pair weighted by the cointegration coefficient.

.. figure:: images/AME-DOV.png
:width: 100 %
:align: center

An example of pair trading Ametek Inc. (AME) and Dover Corp. (DOV) from January 2nd, 2019 to date. The green line defines the boundary for U-trades and the red line defines the boundary for L-trades. They equally deviate from the cointegration error mean (the black line).

Mean First-passage Time of an AR(1) Process
###########################################

Consider a stationary AR(1) process:

.. math::

Y_t = \phi Y_{t-1} + \xi_t

where :math:`-1 < \phi < 1`, and :math:`\xi_t \sim N(0, \sigma_{\xi}^2) \quad \mathrm{i.i.d}`. The mean first-passage
time over interval :math:`\lbrack a, b \rbrack` of :math:`Y_t`, starting at initial state
:math:`y_0 \in \lbrack a, b \rbrack`, which is denoted by :math:`E(\mathcal{T}_{a,b}(y_0))`, is given by

.. math::

E(\mathcal{T}_{a,b}(y_0)) = \frac{1}{\sqrt{2 \pi}\sigma_{\xi}}\int_a^b E(\mathcal{T}_{a,b}(u)) \> \mathrm{exp} \Big( - \frac{(u-\phi y_0)^2}{2 \sigma_{\xi}^2} \Big) du + 1

This integral equation can be solved numerically using the Nyström method, i.e. by solving the following linear
equations:

.. math::

\begin{pmatrix}
1 - K(u_0, u_0) & -K(u_0, u_1) & \ldots & -K(u_0, u_n) \\
-K(u_1, u_0) & 1 - K(u_1, u_1) & \ldots & -K(u_1, u_n) \\
\vdots & \vdots & \vdots & \vdots \\
-K(u_n, u_0) & -K(u_n, u_1) & \ldots & 1-K(u_n, u_n)
\end{pmatrix}
\begin{pmatrix}
E_n(\mathcal{T}_{a,b}(u_0)) \\
E_n(\mathcal{T}_{a,b}(u_1)) \\
\vdots \\
E_n(\mathcal{T}_{a,b}(u_n)) \\
\end{pmatrix}
=
\begin{pmatrix}
1 \\
1 \\
\vdots \\
1 \\
\end{pmatrix}

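where :math:`E_n(\mathcal{T}_{a,b}(u_i))` is a discretized estimate of :math:`E(\mathcal{T}_{a,b}(u_i))` on the evenly
spaced grid :math:`u_i = a + i h` with step :math:`h = (b - a) / n`, and the Gaussian kernel function
:math:`K(u_i, u_j)` is defined as: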

.. math::

K(u_i, u_j) = \frac{h}{2 \sqrt{2 \pi} \sigma_{\xi}} w_j \> \mathrm{exp} \Big( - \frac{(u_j - \phi u_i)^2}{2 \sigma_{\xi}^2} \Big)

and the weight :math:`w_j` is defined by the trapezoid integration rule:

.. math::

w_j = \begin{cases}
1 & j = 0 \quad \mathrm{or} \quad j = n \\
2 & 0 < j < n, j \in \mathbb{N}
\end{cases}

The time complexity of solving the above linear equation system is :math:`O(n^3)` (see `here <https://www.netlib.org/lapack/lug/node71.html>`__
for an introduction to the time complexity of :code:`numpy.linalg.solve`), which makes it the most time-consuming part
of this procedure.
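
The linear system can be assembled and solved in a few lines of NumPy. This is a minimal sketch under stated
assumptions, not the module's implementation: the function name ``mean_first_passage_time`` and the parameter
``n_grid`` are hypothetical, the grid is evenly spaced, and the value at :math:`y_0` is recovered by applying the same
trapezoid quadrature to the original integral equation.

.. code-block:: python

    import numpy as np


    def mean_first_passage_time(y0, lower, upper, phi, sigma, n_grid=200):
        """Trapezoid-rule Nystrom estimate of E(T_{a,b}(y0)) for an AR(1) process."""
        grid = np.linspace(lower, upper, n_grid + 1)   # u_0, ..., u_n
        step = (upper - lower) / n_grid                # h
        weights = np.full(n_grid + 1, 2.0)             # trapezoid weights w_j
        weights[0] = weights[-1] = 1.0
        scale = step / (2.0 * np.sqrt(2.0 * np.pi) * sigma)

        # kernel[i, j] = K(u_i, u_j)
        diff = grid[np.newaxis, :] - phi * grid[:, np.newaxis]
        kernel = scale * weights * np.exp(-diff ** 2 / (2.0 * sigma ** 2))

        # Solve (I - K) E = 1 for the passage times on the grid: the O(n^3) step.
        grid_times = np.linalg.solve(np.eye(n_grid + 1) - kernel, np.ones(n_grid + 1))

        # Evaluate the integral equation at y0 with the same quadrature.
        k_y0 = scale * weights * np.exp(-(grid - phi * y0) ** 2 / (2.0 * sigma ** 2))
        return 1.0 + k_y0 @ grid_times


    # Expected number of steps for an AR(1) process started at 0 to leave [-1, 1].
    print(mean_first_passage_time(y0=0.0, lower=-1.0, upper=1.0, phi=0.5, sigma=1.0))

A convenient sanity check is :math:`\phi = 0`: the process is then i.i.d. noise, the exit time is geometric, and the
exact answer is :math:`1 / (1 - p)` with :math:`p` the probability that a single draw lands inside
:math:`\lbrack a, b \rbrack`.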

Minimum Total Profit (MTP)
##########################

The MTP of U-trades within a specific trading horizon :math:`\lbrack 0, T \rbrack` is defined by:

.. math::

MTP(U) = \Big( \frac{T}{TD_U + I_U} - 1 \Big) U

where :math:`TD_U` is the trade duration and :math:`I_U` is the inter-trade interval.

From the definition, the MTP is simultaneously determined by :math:`TD_U` and :math:`I_U`, both of which can be derived
from the mean first-passage time. Also, it is already known that :math:`U` is the minimum profit per U-trade,
so :math:`\frac{T}{TD_U + I_U} - 1` can be used to estimate the number of U-trades. Under the assumption made in the
Introduction, the de-meaned cointegration error follows an AR(1) process:

.. math::

\varepsilon_t = \phi \varepsilon_{t-1} + a_t \qquad a_t \sim N(0, \sigma_a^2) \> \mathrm{i.i.d}
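
To illustrate how :math:`\phi` and :math:`\sigma_a` can be recovered from data, the sketch below simulates such a
process and estimates both by least squares. It is an assumption-laden illustration: the simulated series, the seed,
and the no-intercept regression are choices made here and are not necessarily how the module fits the process.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(seed=0)
    phi_true, sigma_a_true, n_obs = 0.7, 1.0, 20000

    # Simulate a de-meaned AR(1) series: eps_t = phi * eps_{t-1} + a_t.
    noise = rng.normal(0.0, sigma_a_true, size=n_obs)
    eps = np.zeros(n_obs)
    for t in range(1, n_obs):
        eps[t] = phi_true * eps[t - 1] + noise[t]

    # Least-squares slope of eps_t on eps_{t-1} (no intercept, since the mean is 0).
    lagged, current = eps[:-1], eps[1:]
    phi_hat = (lagged @ current) / (lagged @ lagged)

    # The fitted residuals estimate a_t; their standard deviation estimates sigma_a.
    residuals = current - phi_hat * lagged
    sigma_a_hat = residuals.std(ddof=1)

    print(phi_hat, sigma_a_hat)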

Since the core idea of the approach is to "fade the spread" at :math:`U`, the trade duration can be defined
as the average time of the cointegration error to pass 0 for the first time given that its initial value
is :math:`U`. Thus using the definition of the mean first-passage time of the cointegration error process:

.. math::

TD_U = E(\mathcal{T}_{0, \infty}(U)) = \lim_{b \to \infty} \frac{1}{\sqrt{2 \pi} \sigma_a} \int_0^b E(\mathcal{T}_{0, b}(s)) \> \mathrm{exp} \Big( - \frac{(s- \phi U)^2}{2 \sigma_a^2} \Big) ds + 1

The inter-trade interval is defined as the average time of the de-meaned cointegration error to pass :math:`U` the first
time given its initial value is 0.

.. math::

I_U = E(\mathcal{T}_{- \infty, U}(0)) = \lim_{-b \to - \infty} \frac{1}{\sqrt{2 \pi} \sigma_a} \int_{-b}^U E(\mathcal{T}_{-b, U}(s)) \> \mathrm{exp} \Big( - \frac{s^2}{2 \sigma_a^2} \Big) ds + 1

Under the assumption that the cointegration error follows a stationary AR(1) process, the standard deviation of the
fitted residual :math:`\sigma_a` and the standard deviation of the cointegration error :math:`\sigma_{\varepsilon}` have
the following relationship:

.. math::

\sigma_a = \sqrt{1 - \phi^2} \sigma_{\varepsilon}
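
This relationship follows in one step, assuming that :math:`a_t` is independent of :math:`\varepsilon_{t-1}` and that
stationarity gives :math:`\mathrm{Var}(\varepsilon_t) = \mathrm{Var}(\varepsilon_{t-1}) = \sigma_{\varepsilon}^2`.
Taking the variance of both sides of the AR(1) equation:

.. math::

    \sigma_{\varepsilon}^2 = \phi^2 \sigma_{\varepsilon}^2 + \sigma_a^2
    \quad \Longrightarrow \quad
    \sigma_a^2 = (1 - \phi^2) \sigma_{\varepsilon}^2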

The following stylized fact helps approximate the infinity limit for both integrals: for a stationary AR(1) process
:math:`\{ \varepsilon_t \}`, the probability that the absolute value of the process :math:`\vert \varepsilon_t \vert` is
greater than 5 times the standard deviation of the process :math:`5 \sigma_{\varepsilon}` is close to 0. Therefore,
:math:`5 \sigma_{\varepsilon}` will be used as an approximation of the infinity limit in the integrals.
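
Putting the pieces together, the sketch below estimates :math:`TD_U`, :math:`I_U`, and :math:`MTP(U)` for a single
boundary, with :math:`5 \sigma_{\varepsilon}` standing in for infinity. The function names, the ``n_grid`` parameter,
and the input values are hypothetical and chosen for illustration; this is not the module's own code.

.. code-block:: python

    import numpy as np


    def first_passage_time(y0, lower, upper, phi, sigma_a, n_grid=200):
        """Trapezoid-rule Nystrom estimate of E(T_{lower,upper}(y0))."""
        grid = np.linspace(lower, upper, n_grid + 1)
        weights = np.full(n_grid + 1, 2.0)
        weights[0] = weights[-1] = 1.0
        scale = (upper - lower) / n_grid / (2.0 * np.sqrt(2.0 * np.pi) * sigma_a)

        def kernel(start):
            # Quadrature-weighted Gaussian transition density from `start` to each grid point.
            return scale * weights * np.exp(-(grid - phi * start) ** 2 / (2.0 * sigma_a ** 2))

        matrix = np.eye(n_grid + 1) - np.array([kernel(u) for u in grid])
        grid_times = np.linalg.solve(matrix, np.ones(n_grid + 1))
        return 1.0 + kernel(y0) @ grid_times


    def minimum_total_profit(upper_bound, phi, sigma_a, horizon):
        """Return (TD_U, I_U, MTP(U)) using 5 * sigma_eps in place of infinity."""
        sigma_eps = sigma_a / np.sqrt(1.0 - phi ** 2)
        infinity = 5.0 * sigma_eps
        # Trade duration: start at U, wait until the process leaves [0, infinity).
        trade_duration = first_passage_time(upper_bound, 0.0, infinity, phi, sigma_a)
        # Inter-trade interval: start at 0, wait until the process leaves (-infinity, U].
        inter_trade = first_passage_time(0.0, -infinity, upper_bound, phi, sigma_a)
        mtp = (horizon / (trade_duration + inter_trade) - 1.0) * upper_bound
        return trade_duration, inter_trade, mtp


    print(minimum_total_profit(upper_bound=1.0, phi=0.5, sigma_a=1.0, horizon=250))

Evaluating ``minimum_total_profit`` over a grid of boundaries and keeping the largest MTP gives the optimization
described in the next section. Each evaluation solves two dense linear systems, which is why the grid search is slow.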

Optimizing the Pre-Set Boundaries that Maximize MTP
###################################################

Based on the above definitions, the numerical algorithm to optimize the pre-set boundaries that maximize MTP is as
follows.

1. Perform the Engle-Granger or Johansen test (see :ref:`here <cointegration_approach-cointegration_tests>`) to derive the cointegration coefficient :math:`\beta`.
2. Fit the cointegration error :math:`\varepsilon_t` to an AR(1) process and retrieve the AR(1) coefficient and the fitted residual.
3. Calculate the standard deviation of cointegration error (:math:`\sigma_{\varepsilon}`) and the fitted residual (:math:`\sigma_a`).
4. Generate a sequence of pre-set upper bounds :math:`U_i`, where :math:`U_i = i \times 0.01, \> i = 0, \ldots, b/0.01`, and :math:`b = 5 \sigma_{\varepsilon}`.
5. For each :math:`U_i`,

a. Calculate :math:`{TD}_{U_i}`.
b. Calculate :math:`I_{U_i}`. *Note: this is the main bottleneck of the optimization speed.*
c. Calculate :math:`MTP(U_i)`.

6. Find :math:`U^{*}` such that :math:`MTP(U^{*})` is the maximum.
7. Set a desired minimum profit :math:`K \geq U^{*}` and calculate the number of assets to trade according to the following equations:

.. math::

N_{S_2} = \Big \lceil \frac{K \beta}{U^{*}} \Big \rceil

N_{S_1} = \Big \lceil \frac{N_{S_2}}{\beta} \Big \rceil
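
The share counts in step 7 reduce to two ceiling operations. The helper below is a hypothetical illustration
(``shares_to_trade`` is not a function of the module), and the inputs are made-up values:

.. code-block:: python

    import math


    def shares_to_trade(min_profit, optimal_ub, beta):
        """Return (N_S1, N_S2) for a desired minimum profit K and optimal boundary U*."""
        n_s2 = math.ceil(min_profit * beta / optimal_ub)
        n_s1 = math.ceil(n_s2 / beta)
        return n_s1, n_s2


    # Hypothetical inputs: K = 100, U* = 2.5, beta = 1.5.
    n_s1, n_s2 = shares_to_trade(min_profit=100.0, optimal_ub=2.5, beta=1.5)
    print(n_s1, n_s2)   # 40 60

With these inputs, 40 units of :math:`S_1` at a minimum of 2.5 dollars per unit give the requested minimum profit of
100 dollars, and the 60 units of :math:`S_2` keep the ratio between the two legs at :math:`\beta = 1.5`.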

Implementation
**************

.. automodule:: arbitragelab.cointegration_approach.minimum_profit

.. autoclass:: MinimumProfit
:members:
:inherited-members:

.. automethod:: __init__

Example
*******

.. code-block::

# Importing packages
import pandas as pd
from arbitragelab.cointegration_approach.minimum_profit import MinimumProfit

# Read price series data, set date as index
data = pd.read_csv('X_FILE_PATH.csv', parse_dates=['Date'])
data.set_index('Date', inplace=True)

# Initialize the optimizer for this data
optimizer = MinimumProfit(data)

# Split the data into train and trade set
train_df, trade_df = optimizer.train_test_split(date_cutoff=pd.Timestamp(2019, 1, 1))

# Run an Engle-Granger test to retrieve cointegration coefficient
beta_eg, epsilon_t_eg, ar_coeff_eg, ar_resid_eg = optimizer.fit(use_johansen=False)

# Optimize the pre-set boundaries, retrieve optimal upper bound, optimal minimum total profit,
# and number of trades.
optimal_ub, _, _, optimal_mtp, optimal_num_of_trades = optimizer.optimize(ar_coeff_eg,
epsilon_t_eg,
ar_resid_eg,
len(train_df))

# Generate trading signals based on these optimized parameters
minimum_profit = 100.
trade_signals, num_of_shares, cond_values = optimizer.trade_signal(optimal_ub,
minimum_profit,
beta_eg,
epsilon_t_eg)

Research Notebooks
##################

* `Minimum Profit Optimization`_

.. _`Minimum Profit Optimization`: https://github.com/Hudson-and-Thames-Clients/research/blob/master/Statistical%20Arbitrage/mean_reversion.ipynb

References
##########

* `Lin, Y.-X., McCrae, M., and Gulati, C., 2006. Loss protection in pairs trading through minimum profit bounds: a cointegration approach <http://downloads.hindawi.com/archive/2006/073803.pdf>`_
* `Puspaningrum, H., Lin, Y.-X., and Gulati, C. M. 2010. Finding the optimal pre-set boundaries for pairs trading strategy based on cointegration technique <https://ro.uow.edu.au/cgi/viewcontent.cgi?article=1040&context=cssmwp>`_