Qlib is an AI-oriented quantitative investment platform, which aims to realize the potential, empower the research, and create the value of AI technologies in quantitative investment.
With Qlib, you can easily try your ideas to create better Quant investment strategies.
For more details, please refer to our paper "Qlib: An AI-oriented Quantitative Investment Platform".
- Framework of Qlib
- Quick Start
- Quant Model Zoo
- Quant Dataset Zoo
- More About Qlib
- Offline Mode and Online Mode
Framework of Qlib
At the module level, Qlib is a platform that consists of the above components. The components are designed as loosely coupled modules, and each component can be used stand-alone.
||Users could get a detailed analysis report of forecasting signals and portfolios in this part.|
- The modules with hand-drawn style are under development and will be released in the future.
- The modules with dashed borders are highly user-customizable and extendible.
This quick start guide tries to demonstrate:
- It's very easy to build a complete Quant research workflow and try your ideas with Qlib.
- Even with public data and simple models, machine learning technologies work very well in practical Quant investment.
Users can easily install Qlib by pip with the following command:
pip install pyqlib
Users can also install Qlib from source with the following steps:
Before installing Qlib from source, users need to install some dependencies:
pip install numpy
pip install --upgrade cython
Clone the repository and install Qlib:
git clone https://github.com/microsoft/qlib.git && cd qlib
python setup.py install
Load and prepare data by running the following code:
python scripts/get_data.py qlib_data_cn --target_dir ~/.qlib/qlib_data/cn_data
This dataset is created from public data collected by crawler scripts, which have been released in the same repository. Users can create the same dataset with these scripts.
Please pay ATTENTION: the data is collected from Yahoo Finance and might not be perfect. We recommend that users prepare their own data if they have a high-quality dataset. For more information, users can refer to the related document.
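Before moving on, it can help to confirm that the target directory from the command above actually exists and is not empty. The following is a minimal sketch that uses only the Python standard library; the helper names `resolve_data_dir` and `data_dir_is_populated` are hypothetical and are not part of Qlib, and the sketch makes no assumption about the files inside the directory.

```python
from pathlib import Path

# Target directory taken from the get_data.py command above.
DEFAULT_DATA_DIR = "~/.qlib/qlib_data/cn_data"


def resolve_data_dir(raw_path: str = DEFAULT_DATA_DIR) -> Path:
    """Expand '~' and return an absolute path (hypothetical helper, not a Qlib API)."""
    return Path(raw_path).expanduser().resolve()


def data_dir_is_populated(path: Path) -> bool:
    """Return True if the directory exists and contains at least one entry."""
    return path.is_dir() and any(path.iterdir())


data_dir = resolve_data_dir()
status = "found" if data_dir_is_populated(data_dir) else "missing or empty"
print(f"{data_dir}: {status}")
```

If the directory is reported as missing or empty, rerunning the download command is the natural next step.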
Auto Quant Research Workflow
Qlib provides a tool named Estimator to run the whole workflow automatically (including building the dataset, training models, backtesting, and evaluation). You can start an auto quant research workflow and get a graphical report analysis with the following steps:
Quant Research Workflow: Run Estimator with estimator_config.yaml as follows. (Please note that this may not work under macOS with Python 3.8 due to an incompatibility between the sacred package we use and Python 3.8. We will fix this bug in the future.)
cd examples  # Avoid running the program under a directory that contains `qlib`
estimator -c estimator/estimator_config.yaml
The result of Estimator is as follows. Please refer to Intraday Trading for more details about the result.
                                                  risk
excess_return_without_cost mean               0.000675
                           std                0.005456
                           annualized_return  0.170077
                           information_ratio  1.963824
                           max_drawdown      -0.063646
excess_return_with_cost    mean               0.000479
                           std                0.005453
                           annualized_return  0.120776
                           information_ratio  1.395116
                           max_drawdown      -0.071216
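The annualized figures in this result can be related to the daily mean and std with simple arithmetic. The sketch below assumes 252 trading days per year and a non-compounded annualization; neither assumption is stated in this document, but both are consistent with the reported numbers. The function name `annualize` is hypothetical and is not a Qlib API.

```python
import math

# Assumption: 252 trading days per year (not stated in the text above).
TRADING_DAYS_PER_YEAR = 252


def annualize(daily_mean, daily_std, periods=TRADING_DAYS_PER_YEAR):
    """Return (annualized_return, information_ratio) from daily statistics."""
    annualized_return = daily_mean * periods  # simple, non-compounded scaling
    information_ratio = daily_mean / daily_std * math.sqrt(periods)
    return annualized_return, information_ratio


# Daily mean and std copied from the Estimator result above.
reported = {
    "excess_return_without_cost": (0.000675, 0.005456),
    "excess_return_with_cost": (0.000479, 0.005453),
}

for name, (mean, std) in reported.items():
    ann, ir = annualize(mean, std)
    print(f"{name}: annualized_return={ann:.4f}, information_ratio={ir:.2f}")
```

The recomputed values land within about 0.001 of the reported annualized_return and information_ratio; the small gap is expected because the daily mean and std shown above are already rounded.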
Here are detailed documents for Estimator.
Graphical Reports Analysis: Run jupyter notebook to get graphical reports.
Building Customized Quant Research Workflow by Code
The automatic workflow may not suit the research workflow of all Quant researchers. To support a flexible Quant research workflow, Qlib also provides a modularized interface that allows researchers to build their own workflow by code. Here is a demo of a customized Quant research workflow by code.
Quant Model Zoo
Here is a list of models built on Qlib.
Your PR of new Quant models is highly welcomed.
Quant Dataset Zoo
Datasets play a very important role in Quant. Here is a list of the datasets built on Qlib.
Here is a tutorial to build a dataset with Qlib.
Your PR to build new Quant dataset is highly welcomed.
More About Qlib
cd docs/
conda install sphinx sphinx_rtd_theme -y
# Otherwise, you can install them with pip
# pip install sphinx sphinx_rtd_theme
make html
You can also view the latest document online directly.
Qlib is under active and continuing development. Our plan is in the roadmap, which is managed as a GitHub project.
Offline Mode and Online Mode
The data server of Qlib can be deployed in either Offline mode or Online mode. The default mode is offline mode.
Under Offline mode, the data will be deployed locally.
Under Online mode, the data will be deployed as a shared data service. The data and their cache will be shared by all the clients. The data retrieval performance is expected to improve due to a higher rate of cache hits. It will consume less disk space, too. The documents of the online mode can be found in Qlib-Server. The online mode can be deployed automatically with Azure CLI based scripts. The source code of the online data server can be found in the Qlib-Server repository.
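The claim about cache hits and disk space can be illustrated with a toy model. The sketch below is not Qlib-Server code: it assumes a made-up workload in which every client requests the same set of keys once, and compares one private cache per client against a single shared cache. The function `simulate` and the key names are hypothetical.

```python
def simulate(num_clients, keys, shared):
    """Count cache hits, misses, and stored entries for a toy workload."""
    shared_cache = set()
    hits = misses = stored = 0
    for _ in range(num_clients):
        # Shared mode: all clients use one cache. Private mode: a fresh cache each.
        cache = shared_cache if shared else set()
        for key in keys:
            if key in cache:
                hits += 1
            else:
                misses += 1
                cache.add(key)
                stored += 1  # every miss stores one more cached entry
    return {"hits": hits, "misses": misses, "stored_entries": stored}


# Made-up workload: 4 clients, each requesting the same 100 keys once.
keys = [f"feature_{i}" for i in range(100)]
private_result = simulate(4, keys, shared=False)
shared_result = simulate(4, keys, shared=True)
print("private caches:", private_result)
print("shared cache:  ", shared_result)
```

In this toy workload the shared cache serves 300 of 400 requests from cache and stores 100 entries, while private caches serve none from cache and store 400 entries in total. Real workloads overlap less cleanly, so the actual benefit depends on how much the clients' requests have in common.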
Performance of Qlib Data Server
The performance of data processing is important to data-driven methods like AI technologies. As an AI-oriented platform, Qlib provides a solution for data storage and data processing. To demonstrate the performance of Qlib data server, we compare it with several other data storage solutions.
We evaluate the performance of several storage solutions by finishing the same task, which creates a dataset (14 features/factors) from the basic OHLCV daily data of a stock market (800 stocks each day from 2007 to 2020). The task involves data queries and processing.
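To make the kind of task concrete, the sketch below derives two features from daily OHLCV rows in plain Python. The 14 features used in the benchmark are not listed in this document, so the two features here (a one-day return and a three-day moving average of the close) are illustrative assumptions, and the price rows are made up.

```python
def daily_return(closes):
    """Close-to-close return; the first day has no previous close."""
    return [None] + [closes[i] / closes[i - 1] - 1 for i in range(1, len(closes))]


def moving_average(values, window):
    """Trailing mean over `window` days; None until enough history exists."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


# Made-up OHLCV rows for one stock: (open, high, low, close, volume).
bars = [
    (10.0, 10.5, 9.8, 10.0, 1000),
    (10.0, 11.2, 10.0, 11.0, 1200),
    (11.0, 12.4, 10.9, 12.0, 900),
    (12.0, 12.1, 10.7, 11.0, 1500),
]

closes = [bar[3] for bar in bars]
features = {
    "return_1d": daily_return(closes),
    "close_ma_3": moving_average(closes, 3),
}
for name, values in features.items():
    print(name, values)
```

The benchmark repeats this kind of query-and-compute step for 14 features across roughly 800 stocks per day over many years, which is why data loading cost dominates.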
| | HDF5 | MySQL | MongoDB | InfluxDB | Qlib -E -D | Qlib +E -D | Qlib +E +D |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Total (1CPU) (seconds) | 184.4±3.7 | 365.3±7.5 | 253.6±6.7 | 368.2±3.6 | 147.0±8.8 | 47.6±1.0 | 7.4±0.3 |
| Total (64CPU) (seconds) | | | | | | 8.8±0.6 | 4.2±0.2 |
+(-)E indicates with (out)
+(-)D indicates with (out)
Most general-purpose databases take too much time to load data. After looking into the underlying implementation, we find that data goes through too many layers of interfaces and unnecessary format transformations in general-purpose database solutions. Such overhead greatly slows down the data loading process. Qlib data are stored in a compact format that can be efficiently combined into arrays for scientific computation.
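The single-CPU timings in the table above can be turned into relative factors with a few lines of arithmetic. The sketch below uses only the mean values from the 1CPU row, ignores the ± spreads, and takes the fastest configuration (Qlib +E +D) as the baseline; that choice of baseline is ours, not the document's.

```python
# Mean single-CPU times in seconds, copied from the 1CPU row of the table.
single_cpu_seconds = {
    "HDF5": 184.4,
    "MySQL": 365.3,
    "MongoDB": 253.6,
    "InfluxDB": 368.2,
    "Qlib -E -D": 147.0,
    "Qlib +E -D": 47.6,
    "Qlib +E +D": 7.4,
}

# Baseline choice is an assumption made for this illustration.
baseline = single_cpu_seconds["Qlib +E +D"]
speedups = {name: secs / baseline for name, secs in single_cpu_seconds.items()}

for name, factor in speedups.items():
    print(f"{name:>10}: {factor:5.1f}x the time of Qlib +E +D")
```

On these means, HDF5 takes roughly 25 times as long as Qlib +E +D, and MySQL and InfluxDB roughly 50 times as long.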
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the right to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.