# Tutorials

* [latest docs](https://qlib.readthedocs.io/en/latest/start/getdata.html)

## Building Formulaic Alphas
* https://qlib.readthedocs.io/en/latest/advanced/alpha.html

## Supported operators 
* https://github.com/microsoft/qlib/blob/main/qlib/data/ops.py

## Data pipeline notes

### Loader
* A Data Loader in Qlib loads raw data from the original data source. The loaded data is then consumed by the Data Handler module.
  * QlibDataLoader: loads raw data from the Qlib data source.
  * StaticDataLoader: loads raw data from a file or from data passed in directly.
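
As a rough illustration of the loader described above, the sketch below builds a QlibDataLoader config with one feature expression and one label expression, following the pattern used in the Qlib docs on formulaic alphas. The expression strings, the column names, and the helper `load_raw` are illustrative choices, not something these notes prescribe. Calling `load_raw` assumes Qlib is installed, the data has been downloaded, and `qlib.init(...)` has already been run.

```python
# Illustrative expressions; the operators come from qlib/data/ops.py.
feature_fields = ["Mean($close, 5) / $close"]
feature_names = ["MA5_RATIO"]  # hypothetical column name
label_fields = ["Ref($close, -2) / Ref($close, -1) - 1"]
label_names = ["LABEL"]

# Each group maps to a (expressions, column names) pair.
data_loader_config = {
    "feature": (feature_fields, feature_names),
    "label": (label_fields, label_names),
}


def load_raw(instruments="csi300", start_time="2010-01-01", end_time="2017-12-31"):
    """Hypothetical helper: load raw data with QlibDataLoader.

    Assumes qlib.init(...) was called beforehand.
    """
    # Imported lazily so the config above can be inspected without Qlib.
    from qlib.data.dataset.loader import QlibDataLoader

    loader = QlibDataLoader(config=data_loader_config)
    return loader.load(instruments=instruments, start_time=start_time, end_time=end_time)
```

The returned DataFrame is expected to have one column group per key in the config ("feature" and "label").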

### Handler
* The Data Handler is designed to handle the common data processing methods that most models use.
* DataHandlerLP: a handler with ***learnable Processors***, which can learn the parameters of data processing (e.g., the parameters for z-score normalization).
  * DK_R / self._data: the raw data loaded from the loader
  * DK_I / self._infer: the data processed for inference
  * DK_L / self._learn: the data processed for training the model
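
To make the idea of a learnable processor concrete, here is a minimal plain-Python sketch. It is not Qlib code: `ZScoreSketch` and the toy rows are hypothetical. It fits z-score parameters on a fit window, applies them to all raw rows to get an inference view, and then drops rows without a label to get a learning view. Deriving the learning view from the inference view is an assumption made for simplicity.

```python
from statistics import fmean, pstdev


class ZScoreSketch:
    """Hypothetical stand-in for a learnable processor (not Qlib's class)."""

    def fit(self, values):
        # "Learn" the normalization parameters from the fit window only.
        self.mean = fmean(values)
        self.std = pstdev(values)
        return self

    def transform(self, value):
        return (value - self.mean) / self.std


# Toy (feature, label) rows; plays the role of DK_R (raw data).
raw = [(1.0, 0.1), (2.0, -0.2), (3.0, 0.3), (5.0, None)]
fit_rows = raw[:3]  # stands in for fit_start_time..fit_end_time

proc = ZScoreSketch().fit([feature for feature, _ in fit_rows])

# Plays the role of DK_I: every row, features normalized with the learned parameters.
infer = [(proc.transform(feature), label) for feature, label in raw]

# Plays the role of DK_L: additionally drop rows whose label is missing.
learn = [(feature, label) for feature, label in infer if label is not None]
```

The last raw row has no label yet, so it stays in the inference view but is left out of the learning view.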

#### Processor
* DropnaProcessor, RobustZScoreNorm: https://qlib.readthedocs.io/en/latest/reference/api.html#module-qlib.data.dataset.processor
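
Processors are usually attached to a handler through its config. The sketch below shows one plausible way to wire up the two processors named above. The date range is copied from the Alpha158 example in these notes. The `kwargs` values are assumptions based on the processor API reference and should be checked against the installed Qlib version.

```python
# A config fragment, not a complete workflow.
data_handler_config = {
    "start_time": "2008-01-01",
    "end_time": "2020-08-01",
    "fit_start_time": "2008-01-01",  # window used to fit learnable processors
    "fit_end_time": "2014-12-31",
    "instruments": "csi300",
    # Applied when building the inference data (DK_I).
    "infer_processors": [
        {
            "class": "RobustZScoreNorm",
            "kwargs": {"fields_group": "feature", "clip_outlier": True},  # assumed kwargs
        },
    ],
    # Applied when building the learning data (DK_L).
    "learn_processors": [
        {"class": "DropnaProcessor", "kwargs": {"fields_group": "label"}},  # assumed kwargs
    ],
}
```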

### Dataset

* The Dataset module in Qlib prepares data for model training and inference.

#### DatasetH

* The DatasetH class is a dataset backed by a Data Handler.
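
A minimal sketch of how a DatasetH might be configured and queried, assuming the Alpha158 handler and the date range used elsewhere in these notes. The segment boundaries and the helper `build_and_prepare` are illustrative assumptions. Calling the helper requires an initialized Qlib with data available.

```python
handler_config = {
    "class": "Alpha158",
    "module_path": "qlib.contrib.data.handler",
    "kwargs": {
        "start_time": "2008-01-01",
        "end_time": "2020-08-01",
        "fit_start_time": "2008-01-01",
        "fit_end_time": "2014-12-31",
        "instruments": "csi300",
    },
}

dataset_config = {
    "class": "DatasetH",
    "module_path": "qlib.data.dataset",
    "kwargs": {
        "handler": handler_config,
        # Illustrative split; adjust to the experiment at hand.
        "segments": {
            "train": ("2008-01-01", "2014-12-31"),
            "valid": ("2015-01-01", "2016-12-31"),
            "test": ("2017-01-01", "2020-08-01"),
        },
    },
}


def build_and_prepare():
    """Hypothetical helper: build the dataset and fetch two segments."""
    # Imported lazily so the configs above can be inspected without Qlib.
    from qlib.data.dataset.handler import DataHandlerLP
    from qlib.utils import init_instance_by_config

    dataset = init_instance_by_config(dataset_config)
    # Learning data (features and labels) for the training segment.
    train_df = dataset.prepare("train", col_set=["feature", "label"], data_key=DataHandlerLP.DK_L)
    # Inference data (features only) for the test segment.
    test_df = dataset.prepare("test", col_set="feature", data_key=DataHandlerLP.DK_I)
    return train_df, test_df
```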

##### TSDatasetH

* 

## Model

* [docs](https://qlib.readthedocs.io/en/latest/component/model.html)
* qlib.model.base.Model: the base class from which all models should inherit.
* ModelFT: a base class that adds a method for fine-tuning the model.
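
To show what inheriting from qlib.model.base.Model can look like, here is a deliberately trivial sketch: a model that predicts the mean training label for every row. `make_mean_model` and `MeanLabelModel` are hypothetical names. The segment names and column sets assume a DatasetH-style dataset with "train" and "test" segments. It is a toy for seeing the `fit` / `predict` shape, not a useful model.

```python
def make_mean_model():
    """Hypothetical factory; imports lazily so it only needs Qlib when called."""
    import pandas as pd
    from qlib.data.dataset.handler import DataHandlerLP
    from qlib.model.base import Model

    class MeanLabelModel(Model):
        def fit(self, dataset):
            # Learning data: features and labels for the training segment.
            df = dataset.prepare("train", col_set=["feature", "label"], data_key=DataHandlerLP.DK_L)
            self.mean_ = float(df["label"].to_numpy().mean())

        def predict(self, dataset, segment="test"):
            # Inference data: features only; one score per (datetime, instrument) row.
            x = dataset.prepare(segment, col_set="feature", data_key=DataHandlerLP.DK_I)
            return pd.Series(self.mean_, index=x.index)

    return MeanLabelModel()
```

With a prepared dataset, usage would be along the lines of `model = make_mean_model(); model.fit(dataset); pred = model.predict(dataset)`.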


# Troubleshooting

## DLL load failed while importing _openssl: The specified module could not be found. [install on Windows]

* pip install -I cryptography

# Alpha158

* Alpha158 is a ***data handler*** provided by Qlib; for details, refer to the Data Handler documentation.

In [None]:
import qlib
from qlib.contrib.data.handler import Alpha158

data_handler_config = {
    "start_time": "2008-01-01",
    "end_time": "2020-08-01",
    "fit_start_time": "2008-01-01",
    "fit_end_time": "2014-12-31",
    "instruments": "csi300",
}

if __name__ == "__main__":
    qlib.init()
    h = Alpha158(**data_handler_config)

    # get all the columns of the data
    print(h.get_cols())

    # fetch all the labels
    print(h.fetch(col_set="label"))

    # fetch all the features
    print(h.fetch(col_set="feature"))

In [1]:
import qlib
import pandas as pd
from qlib.constant import REG_CN
from qlib.utils import exists_qlib_data, flatten_dict, init_instance_by_config
from qlib.workflow import R
from qlib.workflow.record_temp import SignalRecord, PortAnaRecord

# use default data
# NOTE: need to download data from remote: python scripts/get_data.py qlib_data_cn --target_dir ~/.qlib/qlib_data/cn_data
provider_uri = "~/.qlib/qlib_data/cn_data"  # target_dir
# if not exists_qlib_data(provider_uri):
#     print(f"Qlib data is not found in {provider_uri}")
#     sys.path.append(str(scripts_dir))
#     from get_data import GetData

#     GetData().qlib_data(target_dir=provider_uri, region=REG_CN)
qlib.init(provider_uri=provider_uri, region=REG_CN)

[79568:MainThread](2023-07-25 20:39:24,580) INFO - qlib.Initialization - [config.py:416] - default_conf: client.
[79568:MainThread](2023-07-25 20:39:24,595) INFO - qlib.Initialization - [__init__.py:74] - qlib successfully initialized based on client settings.
[79568:MainThread](2023-07-25 20:39:24,595) INFO - qlib.Initialization - [__init__.py:76] - data_path={'__DEFAULT_FREQ': WindowsPath('D:/dataset/quant/.qlib/qlib_data/cn_data')}


In [2]:
from qlib.data import D
D.calendar(start_time='2010-01-01', end_time='2017-12-31', freq='day')[:2]

array([Timestamp('2010-01-04 00:00:00'), Timestamp('2010-01-05 00:00:00')],
      dtype=object)

In [4]:
type(D.calendar)

method