Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GCN/GAT_INDUS MODEL #1476

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
Recent released features
| Feature | Status |
| -- | ------ |
|GCN/GATs_indus model|[📈 Released](https://github.com/microsoft/qlib/pull/1476) on Mar 24, 2023|
| Release Qlib v0.9.0 | :octocat: [Released](https://github.com/microsoft/qlib/releases/tag/v0.9.0) on Dec 9, 2022 |
| RL Learning Framework | :hammer: :chart_with_upwards_trend: Released on Nov 10, 2022. [#1332](https://github.com/microsoft/qlib/pull/1332), [#1322](https://github.com/microsoft/qlib/pull/1322), [#1316](https://github.com/microsoft/qlib/pull/1316),[#1299](https://github.com/microsoft/qlib/pull/1299),[#1263](https://github.com/microsoft/qlib/pull/1263), [#1244](https://github.com/microsoft/qlib/pull/1244), [#1169](https://github.com/microsoft/qlib/pull/1169), [#1125](https://github.com/microsoft/qlib/pull/1125), [#1076](https://github.com/microsoft/qlib/pull/1076)|
| HIST and IGMTF models | :chart_with_upwards_trend: [Released](https://github.com/microsoft/qlib/pull/1040) on Apr 10, 2022 |
Expand Down Expand Up @@ -355,7 +356,8 @@ Here is a list of models built on `Qlib`.
- [ADD based on pytorch (Hongshun Tang, et al.2020)](examples/benchmarks/ADD/)
- [IGMTF based on pytorch (Wentao Xu, et al.2021)](examples/benchmarks/IGMTF/)
- [HIST based on pytorch (Wentao Xu, et al.2021)](examples/benchmarks/HIST/)

- [GCN based on pytorch (N. Kipf, et al.2016)](examples/benchmarks/GCN/)

Your PR of new Quant models is highly welcomed.

The performance of each model on the `Alpha158` and `Alpha360` dataset can be found [here](examples/benchmarks/README.md).
Expand Down
3 changes: 2 additions & 1 deletion examples/benchmarks/GATs/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,4 +2,5 @@
* Graph Attention Networks(GATs) leverage masked self-attentional layers on graph-structured data. The nodes in stacked layers have different weights and they are able to attend over their
neighborhoods’ features, without requiring any kind of costly matrix operation (such as inversion) or depending on knowing the graph structure upfront.
* This code used in Qlib is implemented with PyTorch by ourselves.
* Paper: Graph Attention Networks https://arxiv.org/pdf/1710.10903.pdf
* Paper: Graph Attention Networks https://arxiv.org/pdf/1710.10903.pdf
`stalbe_ind.csv` contains industry information that each instrument belongs to, collected in 2008. If you want to run `GATs` with stocks connection, please run `workflow_config_gats_indus_{Dataset}.yaml`
102 changes: 102 additions & 0 deletions examples/benchmarks/GATs/workflow_config_gats_indus_Alpha158.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
qlib_init:
provider_uri: "~/.qlib/qlib_data/cn_data"
region: cn
market: &market csi300
benchmark: &benchmark SH000300
experiment_name: gats_add_ind_ts
data_handler_config: &data_handler_config
start_time: 2008-01-01
end_time: 2020-08-01
fit_start_time: 2008-01-01
fit_end_time: 2014-12-31
instruments: *market
infer_processors:
- class: FilterCol
kwargs:
fields_group: feature
col_list: ["RESI5", "WVMA5", "RSQR5", "KLEN", "RSQR10", "CORR5", "CORD5", "CORR10",
"ROC60", "RESI10", "VSTD5", "RSQR60", "CORR60", "WVMA60", "STD5",
"RSQR20", "CORD60", "CORD10", "CORR20", "KLOW"
]
- class: RobustZScoreNorm
kwargs:
fields_group: feature
clip_outlier: true
- class: Fillna
kwargs:
fields_group: feature
learn_processors:
- class: DropnaLabel
- class: CSRankNorm
kwargs:
fields_group: label
label: ["Ref($close, -2) / Ref($close, -1) - 1"]
port_analysis_config: &port_analysis_config
strategy:
class: TopkDropoutStrategy
module_path: qlib.contrib.strategy
kwargs:
signal:
- <MODEL>
- <DATASET>
topk: 50
n_drop: 5
backtest:
start_time: 2017-01-01
end_time: 2020-08-01
account: 100000000
benchmark: *benchmark
exchange_kwargs:
limit_threshold: 0.095
deal_price: close
open_cost: 0.0005
close_cost: 0.0015
min_cost: 5
task:
model:
class: GATs_ADD_IND
module_path: qlib.contrib.model.pytorch_gats_add_ind_ts
kwargs:
d_feat: 20
hidden_size: 64
num_layers: 2
dropout: 0.7
n_epochs: 200
lr: 0.00001
early_stop: 10
metric: loss
loss: mse
base_model: LSTM
model_path: "benchmarks/LSTM/csi300_lstm_ts.pkl"
GPU: 0
industrial_data_path: '~/stable_ind.csv'
industry_col: 'industry_citic'
smooth_perplexity: 1
dataset:
class: TSDatasetH
module_path: qlib.data.dataset
kwargs:
handler:
class: Alpha158
module_path: qlib.contrib.data.handler
kwargs: *data_handler_config
segments:
train: [2008-01-01, 2014-12-31]
valid: [2015-01-01, 2016-12-31]
test: [2017-01-01, 2020-08-01]
step_len: 20
record:
- class: SignalRecord
module_path: qlib.workflow.record_temp
kwargs:
model: <MODEL>
dataset: <DATASET>
- class: SigAnaRecord
module_path: qlib.workflow.record_temp
kwargs:
ana_long_short: False
ann_scaler: 252
- class: PortAnaRecord
module_path: qlib.workflow.record_temp
kwargs:
config: *port_analysis_config
92 changes: 92 additions & 0 deletions examples/benchmarks/GATs/workflow_config_gats_indus_Alpha360.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
qlib_init:
provider_uri: "~/.qlib/qlib_data/cn_data"
region: cn
market: &market csi300
benchmark: &benchmark SH000300
data_handler_config: &data_handler_config
start_time: 2008-01-01
end_time: 2020-08-01
fit_start_time: 2008-01-01
fit_end_time: 2014-12-31
instruments: *market
infer_processors:
- class: RobustZScoreNorm
kwargs:
fields_group: feature
clip_outlier: true
- class: Fillna
kwargs:
fields_group: feature
learn_processors:
- class: DropnaLabel
- class: CSRankNorm
kwargs:
fields_group: label
label: ["Ref($close, -2) / Ref($close, -1) - 1"]
port_analysis_config: &port_analysis_config
strategy:
class: TopkDropoutStrategy
module_path: qlib.contrib.strategy
kwargs:
signal:
- <MODEL>
- <DATASET>
topk: 50
n_drop: 5
backtest:
start_time: 2017-01-01
end_time: 2020-08-01
account: 100000000
benchmark: *benchmark
exchange_kwargs:
limit_threshold: 0.095
deal_price: close
open_cost: 0.0005
close_cost: 0.0015
min_cost: 5
task:
model:
class: GATs_ADD_IND
module_path: qlib.contrib.model.pytorch_gats_add_ind
kwargs:
d_feat: 6
hidden_size: 64
num_layers: 2
dropout: 0.7
n_epochs: 200
lr: 0.00001
early_stop: 20
metric: loss
loss: mse
base_model: LSTM
model_path: "benchmarks/LSTM/model_lstm_csi300.pkl"
GPU: 0
industrial_data_path: '~/stable_ind.csv'
industry_col: 'industry_citic'
#smooth_perplexity: 1
dataset:
class: DatasetH
module_path: qlib.data.dataset
kwargs:
class: Alpha360
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indentation in these lines looks off, and the run_all_model.py file is not able to load the configuration file correctly.

module_path: qlib.contrib.data.handler
kwargs: *data_handler_config
segments:
train: [2008-01-01, 2014-12-31]
valid: [2015-01-01, 2016-12-31]
test: [2017-01-01, 2020-08-01]
record:
- class: SignalRecord
module_path: qlib.workflow.record_temp
kwargs:
model: <MODEL>
dataset: <DATASET>
- class: SigAnaRecord
module_path: qlib.workflow.record_temp
kwargs:
ana_long_short: False
ann_scaler: 252
- class: PortAnaRecord
module_path: qlib.workflow.record_temp
kwargs:
config: *port_analysis_config
4 changes: 4 additions & 0 deletions examples/benchmarks/GCN/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# GCN
* The costing matrix is based on `stable_ind.csv`, which contains industry information that each instrument belongs to, collected in 2008.
* This code used in Qlib is implemented with PyTorch by ourselves.
* Paper: Graph Convolutional Networks https://arxiv.org/pdf/1609.02907.pdf
4 changes: 4 additions & 0 deletions examples/benchmarks/GCN/requirements.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
pandas==1.1.2
numpy==1.21.0
scikit_learn==0.23.2
torch==1.7.0
102 changes: 102 additions & 0 deletions examples/benchmarks/GCN/workflow_config_gcn_Alpha158.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
qlib_init:
provider_uri: "~/.qlib/qlib_data/cn_data"
region: cn
market: &market csi300
benchmark: &benchmark SH000300
experiment_name: workflow
data_handler_config: &data_handler_config
start_time: 2008-01-01
end_time: 2020-08-01
fit_start_time: 2008-01-01
fit_end_time: 2014-12-31
instruments: *market
infer_processors:
- class: FilterCol
kwargs:
fields_group: feature
col_list: ["RESI5", "WVMA5", "RSQR5", "KLEN", "RSQR10", "CORR5", "CORD5", "CORR10",
"ROC60", "RESI10", "VSTD5", "RSQR60", "CORR60", "WVMA60", "STD5",
"RSQR20", "CORD60", "CORD10", "CORR20", "KLOW"
]
- class: RobustZScoreNorm
kwargs:
fields_group: feature
clip_outlier: true
- class: Fillna
kwargs:
fields_group: feature
learn_processors:
- class: DropnaLabel
- class: CSRankNorm
kwargs:
fields_group: label
label: ["Ref($close, -2) / Ref($close, -1) - 1"]
port_analysis_config: &port_analysis_config
strategy:
class: TopkDropoutStrategy
module_path: qlib.contrib.strategy
kwargs:
signal:
- <MODEL>
- <DATASET>
topk: 50
n_drop: 5
backtest:
start_time: 2017-01-01
end_time: 2020-08-01
account: 100000000
benchmark: *benchmark
exchange_kwargs:
limit_threshold: 0.095
deal_price: close
open_cost: 0.0005
close_cost: 0.0015
min_cost: 5
task:
model:
class: GCN
module_path: qlib.contrib.model.pytorch_gcn_ts
kwargs:
d_feat: 20
hidden_size: 64
num_layers: 2
dropout: 0.5
n_epochs: 200
lr: 5e-5
early_stop: 20
metric: loss
loss: mse
base_model: LSTM
model_path: "benchmarks/LSTM/csi300_lstm_ts.pkl"
GPU: 0
industrial_data_path: '~/stable_ind.csv'
industry_col: 'industry_citic'
adjacent_coef: 0.01
dataset:
class: TSDatasetH
module_path: qlib.data.dataset
kwargs:
handler:
class: Alpha158
module_path: qlib.contrib.data.handler
kwargs: *data_handler_config
segments:
train: [2008-01-01, 2014-12-31]
valid: [2015-01-01, 2016-12-31]
test: [2017-01-01, 2020-08-01]
step_len: 20
record:
- class: SignalRecord
module_path: qlib.workflow.record_temp
kwargs:
model: <MODEL>
dataset: <DATASET>
- class: SigAnaRecord
module_path: qlib.workflow.record_temp
kwargs:
ana_long_short: False
ann_scaler: 252
- class: PortAnaRecord
module_path: qlib.workflow.record_temp
kwargs:
config: *port_analysis_config
Loading