support optimization based strategy (#754)

* support optimization based strategy
* fix riskdata not found & update doc
* refactor signal_strategy
* add portfolio example
* Update examples/portfolio/prepare_riskdata.py
  Co-authored-by: you-n-g <you-n-g@users.noreply.github.com>
* fix typo
  Co-authored-by: you-n-g <you-n-g@users.noreply.github.com>
* fix typo
  Co-authored-by: you-n-g <you-n-g@users.noreply.github.com>
* update doc
* fix riskmodel doc

Co-authored-by: you-n-g <you-n-g@users.noreply.github.com>
Co-authored-by: you-n-g <you-n-g@users.noreply.github.com>
microsoft · Dec 28, 2021 · 1b8f0b4
1 parent 4709909
commit 1b8f0b4
Showing 14 changed files with 667 additions and 261 deletions.
diff --git a/docs/component/strategy.rst b/docs/component/strategy.rst
@@ -8,7 +8,7 @@ Portfolio Strategy: Portfolio Management
 Introduction
 ===================
 
-``Portfolio Strategy`` is designed to adopt different portfolio strategies, which means that users can adopt different algorithms to generate investment portfolios based on the prediction scores of the ``Forecast Model``. Users can use the ``Portfolio Strategy`` in an automatic workflow by ``Workflow`` module, please refer to `Workflow: Workflow Management <workflow.html>`_.  
+``Portfolio Strategy`` is designed to adopt different portfolio strategies, which means that users can adopt different algorithms to generate investment portfolios based on the prediction scores of the ``Forecast Model``. Users can use the ``Portfolio Strategy`` in an automatic workflow by ``Workflow`` module, please refer to `Workflow: Workflow Management <workflow.html>`_.
 
 Because the components in ``Qlib`` are designed in a loosely-coupled way, ``Portfolio Strategy`` can be used as an independent module also.
 
@@ -28,14 +28,14 @@ Qlib provides a base class ``qlib.contrib.strategy.BaseStrategy``. All strategy
     Return the proportion of your total value that you will invest. A dynamically changing risk_degree results in market timing.
 
 - `generate_order_list`
-    Return the order list. 
+    Return the order list.
 
 Users can inherit `BaseStrategy` to customize their strategy class.
 
 WeightStrategyBase
 --------------------
 
-Qlib also provides a class ``qlib.contrib.strategy.WeightStrategyBase`` that is a subclass of `BaseStrategy`. 
+Qlib also provides a class ``qlib.contrib.strategy.WeightStrategyBase`` that is a subclass of `BaseStrategy`.
 
 `WeightStrategyBase` only focuses on the target positions, and automatically generates an order list based on positions. It provides the `generate_target_weight_position` interface.
 
@@ -71,17 +71,27 @@ TopkDropoutStrategy
 
         - `Topk`: The number of stocks held
         - `Drop`: The number of stocks sold on each trading day
-        
+
         Currently, the number of held stocks is `Topk`.
         On each trading day, the `Drop` number of held stocks with the worst `prediction score` will be sold, and the same number of unheld stocks with the best `prediction score` will be bought.
-        
+
         .. image:: ../_static/img/topk_drop.png
             :alt: Topk-Drop
 
         ``TopkDrop`` algorithm sells `Drop` stocks every trading day, which guarantees a fixed turnover rate.
-        
+
 - Generate the order list from the target amount
 
+EnhancedIndexingStrategy
+------------------------
+`EnhancedIndexingStrategy` implements enhanced indexing, which combines active management and passive management
+with the aim of outperforming a benchmark index (e.g., S&P 500) in terms of portfolio return while controlling
+the risk exposure (a.k.a. tracking error).
+
+For more information, please refer to `qlib.contrib.strategy.signal_strategy.EnhancedIndexingStrategy`
+and `qlib.contrib.strategy.optimizer.enhanced_indexing.EnhancedIndexingOptimizer`.
+
+
 Usage & Example
 ====================
 

diff --git a/examples/portfolio/README.md b/examples/portfolio/README.md
@@ -0,0 +1,46 @@
+# Portfolio Optimization Strategy
+
+## Introduction
+
+In `qlib/examples/benchmarks` we have various **alpha** models that predict
+the stock returns. We also use a simple rule-based `TopkDropoutStrategy` to
+evaluate the investing performance of these models. However, such a strategy
+is too simple to control portfolio risk characteristics such as correlation and volatility.
+
+To this end, an optimization-based strategy should be used to manage the
+trade-off between return and risk. In this doc, we show how to use
+`EnhancedIndexingStrategy` to maximize portfolio return while minimizing
+tracking error relative to a benchmark.
+
+
+## Preparation
+
+We use China stock market data for our example.
+
+1. Prepare CSI300 weight:
+
+   ```bash
+   wget http://fintech.msra.cn/stock_data/downloads/csi300_weight.zip
+   unzip -d ~/.qlib/qlib_data/cn_data csi300_weight.zip
+   rm -f csi300_weight.zip
+   ```
+
+2. Prepare risk model data:
+
+   ```bash
+   python prepare_riskdata.py
+   ```
+
+Here we use a **Statistical Risk Model** implemented in `qlib.model.riskmodel`.
+However, users are strongly encouraged to use other risk models for better quality:
+* **Fundamental Risk Model** like MSCI BARRA
+* [Deep Risk Model](https://arxiv.org/abs/2107.05201)
+
+
+## End-to-End Workflow
+
+You can complete the workflow with `EnhancedIndexingStrategy` by running
+`qrun config_enhanced_indexing.yaml`.
+
+In this config, we mainly changed the strategy section compared to
+`qlib/examples/benchmarks/workflow_config_lightgbm_Alpha158.yaml`.
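
Note on the README introduction: the trade-off between portfolio return and tracking error can be illustrated with a small, self-contained sketch. This is not the code of `EnhancedIndexingOptimizer`; the function names, the `risk_aversion` parameter, and the toy numbers are assumptions made for illustration. It only evaluates a generic mean-variance style objective (active return minus a penalty on squared tracking error) for given weights.

```python
import math


def active_return(weights, bench_weights, exp_returns):
    """Expected return of the portfolio in excess of the benchmark."""
    return sum((w - b) * r for w, b, r in zip(weights, bench_weights, exp_returns))


def tracking_error(weights, bench_weights, cov):
    """Volatility of the active (portfolio minus benchmark) position."""
    active = [w - b for w, b in zip(weights, bench_weights)]
    n = len(active)
    var = sum(active[i] * cov[i][j] * active[j] for i in range(n) for j in range(n))
    return math.sqrt(var)


def objective(weights, bench_weights, exp_returns, cov, risk_aversion=1.0):
    """Hypothetical objective: active return minus penalized active variance."""
    te = tracking_error(weights, bench_weights, cov)
    return active_return(weights, bench_weights, exp_returns) - risk_aversion * te ** 2


# Toy inputs (made up): three stocks with a diagonal covariance matrix.
bench = [0.5, 0.3, 0.2]
exp_returns = [0.02, 0.01, -0.01]
cov = [
    [0.04, 0.0, 0.0],
    [0.0, 0.09, 0.0],
    [0.0, 0.0, 0.16],
]
# Candidate portfolio: overweight the first stock, underweight the last.
candidate = [0.6, 0.3, 0.1]

print("active return :", active_return(candidate, bench, exp_returns))
print("tracking error:", tracking_error(candidate, bench, cov))
print("objective     :", objective(candidate, bench, exp_returns, cov))
```

Holding exactly the benchmark gives zero active return and zero tracking error, so any tilt has to earn enough expected active return to offset its risk penalty. An optimizer would search over weights rather than score a single candidate.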
diff --git a/examples/portfolio/config_enhanced_indexing.yaml b/examples/portfolio/config_enhanced_indexing.yaml
@@ -0,0 +1,71 @@
+qlib_init:
+    provider_uri: "~/.qlib/qlib_data/cn_data"
+    region: cn
+market: &market csi300
+benchmark: &benchmark SH000300
+data_handler_config: &data_handler_config
+    start_time: 2008-01-01
+    end_time: 2020-08-01
+    fit_start_time: 2008-01-01
+    fit_end_time: 2014-12-31
+    instruments: *market
+port_analysis_config: &port_analysis_config
+    strategy:
+        class: EnhancedIndexingStrategy
+        module_path: qlib.contrib.strategy
+        kwargs:
+            model: <MODEL>
+            dataset: <DATASET>
+            riskmodel_root: ./riskdata
+    backtest:
+        start_time: 2017-01-01
+        end_time: 2020-08-01
+        account: 100000000
+        benchmark: *benchmark
+        exchange_kwargs:
+            limit_threshold: 0.095
+            deal_price: close
+            open_cost: 0.0005
+            close_cost: 0.0015
+            min_cost: 5
+task:
+    model:
+        class: LGBModel
+        module_path: qlib.contrib.model.gbdt
+        kwargs:
+            loss: mse
+            colsample_bytree: 0.8879
+            learning_rate: 0.2
+            subsample: 0.8789
+            lambda_l1: 205.6999
+            lambda_l2: 580.9768
+            max_depth: 8
+            num_leaves: 210
+            num_threads: 20
+    dataset:
+        class: DatasetH
+        module_path: qlib.data.dataset
+        kwargs:
+            handler:
+                class: Alpha158
+                module_path: qlib.contrib.data.handler
+                kwargs: *data_handler_config
+            segments:
+                train: [2008-01-01, 2014-12-31]
+                valid: [2015-01-01, 2016-12-31]
+                test: [2017-01-01, 2020-08-01]
+    record:
+        - class: SignalRecord
+          module_path: qlib.workflow.record_temp
+          kwargs:
+            model: <MODEL>
+            dataset: <DATASET>
+        - class: SigAnaRecord
+          module_path: qlib.workflow.record_temp
+          kwargs:
+            ana_long_short: False
+            ann_scaler: 252
+        - class: PortAnaRecord
+          module_path: qlib.workflow.record_temp
+          kwargs:
+            config: *port_analysis_config
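
Note on `exchange_kwargs` in the config above: the sketch below shows one plausible reading of `open_cost`, `close_cost`, and `min_cost`, namely a proportional fee with a per-trade floor. This reading is an assumption and has not been checked against the exchange implementation in Qlib; `trade_cost` is a hypothetical helper, not a Qlib function.

```python
def trade_cost(trade_value, is_buy, open_cost=0.0005, close_cost=0.0015, min_cost=5.0):
    """Assumed cost model: proportional fee, never below min_cost.

    The default rates mirror the values in config_enhanced_indexing.yaml.
    """
    rate = open_cost if is_buy else close_cost
    return max(trade_value * rate, min_cost)


# Buying 100,000 worth of stock: the proportional fee exceeds the floor.
buy_fee = trade_cost(100_000, is_buy=True)
# Selling 1,000 worth of stock: the proportional fee is below the floor.
sell_fee = trade_cost(1_000, is_buy=False)

print(buy_fee, sell_fee)
```

Under this reading, small trades are dominated by `min_cost`, which matters when an optimizer produces many small rebalancing orders.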
diff --git a/examples/portfolio/prepare_riskdata.py b/examples/portfolio/prepare_riskdata.py
@@ -0,0 +1,55 @@
+# Copyright (c) Microsoft Corporation.
+# Licensed under the MIT License.
+import os
+import numpy as np
+import pandas as pd
+
+from qlib.data import D
+from qlib.model.riskmodel import StructuredCovEstimator
+
+
+def prepare_data(riskdata_root="./riskdata", T=240, start_time="2016-01-01"):
+
+    universe = D.features(D.instruments("csi300"), ["$close"], start_time=start_time).swaplevel().sort_index()
+
+    price_all = (
+        D.features(D.instruments("all"), ["$close"], start_time=start_time).squeeze().unstack(level="instrument")
+    )
+
+    # StructuredCovEstimator is a statistical risk model
+    riskmodel = StructuredCovEstimator()
+
+    for i in range(T - 1, len(price_all)):
+
+        date = price_all.index[i]
+        ref_date = price_all.index[i - T + 1]
+
+        print(date)
+
+        codes = universe.loc[date].index
+        price = price_all.loc[ref_date:date, codes]
+
+        # calculate return and remove extreme return
+        ret = price.pct_change()
+        ret.clip(ret.quantile(0.025), ret.quantile(0.975), axis=1, inplace=True)
+
+        # run risk model
+        F, cov_b, var_u = riskmodel.predict(ret, is_price=False, return_decomposed_components=True)
+
+        # save risk data
+        root = riskdata_root + "/" + date.strftime("%Y%m%d")
+        os.makedirs(root, exist_ok=True)
+
+        pd.DataFrame(F, index=codes).to_pickle(root + "/factor_exp.pkl")
+        pd.DataFrame(cov_b).to_pickle(root + "/factor_cov.pkl")
+        # for specific_risk we follow the convention to save volatility
+        pd.Series(np.sqrt(var_u), index=codes).to_pickle(root + "/specific_risk.pkl")
+
+
+if __name__ == "__main__":
+
+    import qlib
+
+    qlib.init(provider_uri="~/.qlib/qlib_data/cn_data")
+
+    prepare_data()
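
Note on the three files written by `prepare_riskdata.py`: in a structured (factor) risk model, the asset covariance is usually rebuilt as `F @ cov_b @ F.T + diag(var_u)`. Because the script saves specific *volatility* (`sqrt(var_u)`), it has to be squared again first. The sketch below uses plain Python lists in place of the pickled pandas objects so that it runs without extra dependencies; `assemble_covariance`, `portfolio_volatility`, and the toy numbers are illustrative assumptions, not part of Qlib.

```python
import math


def assemble_covariance(factor_exp, factor_cov, specific_risk):
    """Rebuild the N x N asset covariance from decomposed risk data.

    factor_exp:    N x K factor exposures (F)
    factor_cov:    K x K factor covariance (cov_b)
    specific_risk: length-N specific volatilities (sqrt of var_u)
    """
    n, k = len(factor_exp), len(factor_cov)
    # F @ cov_b, shape N x K
    fb = [
        [sum(factor_exp[i][a] * factor_cov[a][b] for a in range(k)) for b in range(k)]
        for i in range(n)
    ]
    # (F @ cov_b) @ F.T, shape N x N
    cov = [
        [sum(fb[i][b] * factor_exp[j][b] for b in range(k)) for j in range(n)]
        for i in range(n)
    ]
    # Add specific variance (volatility squared) on the diagonal.
    for i in range(n):
        cov[i][i] += specific_risk[i] ** 2
    return cov


def portfolio_volatility(weights, cov):
    """Volatility of a portfolio with the given weights."""
    n = len(weights)
    var = sum(weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n))
    return math.sqrt(var)


# Toy risk data (made up): 3 stocks, 2 factors.
F = [[1.0, 0.5], [0.8, -0.2], [1.2, 0.1]]
cov_b = [[0.04, 0.0], [0.0, 0.01]]
specific_risk = [0.1, 0.2, 0.15]

cov = assemble_covariance(F, cov_b, specific_risk)
vol = portfolio_volatility([0.4, 0.3, 0.3], cov)
print(cov[0][0], cov[0][1], vol)
```

With real data, the same computation would typically be done with NumPy on the arrays loaded from `factor_exp.pkl`, `factor_cov.pkl`, and `specific_risk.pkl`.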
diff --git a/qlib/contrib/strategy/__init__.py b/qlib/contrib/strategy/__init__.py
@@ -5,6 +5,7 @@
 from .signal_strategy import (
     TopkDropoutStrategy,
     WeightStrategyBase,
+    EnhancedIndexingStrategy,
 )
 
 from .rule_strategy import (

diff --git a/qlib/portfolio/optimizer/__init__.py b/qlib/contrib/strategy/optimizer/__init__.py
rename from qlib/portfolio/optimizer/__init__.py
rename to qlib/contrib/strategy/optimizer/__init__.py
diff --git a/qlib/portfolio/optimizer/base.py b/qlib/contrib/strategy/optimizer/base.py
rename from qlib/portfolio/optimizer/base.py
rename to qlib/contrib/strategy/optimizer/base.py