# Qlib Explorer

## Installation

On a typical x86 machine (Intel or AMD), create a conda virtual environment with Python 3.8 and then install Qlib with `pip install`:

```shell
conda create -n qlib python=3.8
conda activate qlib
pip install pyqlib
```

On an Apple Silicon (M1, ARM) Mac the process is slightly different. Start by creating and activating a conda environment:

```shell
conda create -n qlib python=3.8
conda activate qlib
```

Then clone the Qlib repository into a suitable directory, such as `Desktop`:

```shell
git clone https://github.com/microsoft/qlib
```

Next, `cd` into the cloned directory and install the dependencies with conda first:

```shell
conda install lightgbm ecos pytables cvxpy mlflow fire ruamel
```

These dependencies need to be installed with conda because they are not pure-Python packages; without conda, the installation may fail. Once they are installed, the command below installs the Qlib library locally on the M1 Mac:

```shell
export HDF5_DIR=/path/to/somewhere/you/want/to/store/hdf5file
pip install .
```

After a short wait the installation should complete, and the following cell can be run:

```python
import qlib

qlib.__version__
```

```text
'0.8.6.99'
```

## Data Transformation

Qlib provides a storage layout that suits most quant workflows. Its structure is shown below:

![qlib-storage-structure](../images/qlib-storage-structure.png)

The `features` directory holds the features of the dataset: one subdirectory per stock, and inside it one bin file per feature, as shown below:

![qlib-storage-features](../images/qlib-storage-features.png)

The three core kinds of file in these directories are `day.txt`, `all.txt`, and the bin files. `day.txt` stores the trading-day calendar across all instruments, that is, the union of every instrument's trading days. `all.txt` records the entry date and exit date of each instrument, one instrument per line. The bin files are slightly special: each one holds a single feature column for one specified instrument, but the **first value is the index of the feature's start date in `day.txt`**, and the file is stored as a raw NumPy binary dump without any extra information.
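
As a rough illustration of the bin layout described above, the sketch below writes a toy feature file and reads it back using NumPy alone. The little-endian `float32` dtype (`"<f"`) is an assumption about what Qlib's own dump script uses, and the file name `close.day.bin` and the sample values are made up for the example.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample: closing prices for one instrument whose first
# trading day sits at position 3 in day.txt.
start_index = 3
close = np.array([10.0, 10.5, 10.25], dtype="<f")

# Layout: [start_index, v0, v1, ...], everything stored as float32 (assumed dtype).
payload = np.hstack([[start_index], close]).astype("<f")

bin_path = os.path.join(tempfile.mkdtemp(), "close.day.bin")
payload.tofile(bin_path)

# Reading back: the first element is the calendar offset, the rest is the series.
raw = np.fromfile(bin_path, dtype="<f")
read_start = int(raw[0])
read_values = raw[1:]
```

Because the file carries no header beyond that first value, the reader has to know the dtype in advance; a mismatch would silently produce garbage rather than an error.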

These files are easy to generate with NumPy or pandas. For `day.txt`, construct the union of all dates in your data and store it in a list; `np.savetxt` is enough to write it out, or you can build a Series (or DataFrame) and call `data.to_csv('day.txt', header=False, index=False)`. For `all.txt`, take the minimum and maximum date of each instrument, collect them in a list or DataFrame, and write them with `np.savetxt` or `data.to_csv` as above; to make this faster, we recommend `pandas.DataFrame.groupby`. For the bin files, iterate over the instruments, look up the index of each instrument's first date in the calendar, apply `np.hstack([date_index, data[feature].values])` to prepend that index, and then save the result with `.tofile`. When appending to an existing bin file during an update, however, the `hstack` step is not necessary, since the file already begins with the start index.
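
The steps above can be sketched roughly as follows. This is only a minimal illustration on a made-up DataFrame, not Qlib's own dumper: the column names, the tab separator in `all.txt`, the `features/<instrument>/close.day.bin` path, and the `float32` dtype are all assumptions.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical long-format daily data: one row per (instrument, date).
data = pd.DataFrame({
    "instrument": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime(
        ["2021-01-04", "2021-01-05", "2021-01-06", "2021-01-05", "2021-01-06"]
    ),
    "close": [1.0, 2.0, 3.0, 4.0, 5.0],
})
out_dir = tempfile.mkdtemp()

# day.txt: sorted union of all trading days, one date per line.
calendar = pd.DatetimeIndex(data["date"].unique()).sort_values()
pd.Series(calendar.strftime("%Y-%m-%d")).to_csv(
    os.path.join(out_dir, "day.txt"), header=False, index=False
)

# all.txt: entry and exit date per instrument (tab separator is assumed).
spans = data.groupby("instrument")["date"].agg(["min", "max"])
spans.to_csv(
    os.path.join(out_dir, "all.txt"),
    sep="\t", header=False, date_format="%Y-%m-%d",
)

# Bin files: prepend the calendar position of the instrument's first date.
for inst, df in data.groupby("instrument"):
    df = df.sort_values("date")
    date_index = calendar.get_loc(df["date"].iloc[0])
    inst_dir = os.path.join(out_dir, "features", inst)
    os.makedirs(inst_dir, exist_ok=True)
    np.hstack([date_index, df["close"].values]).astype("<f").tofile(
        os.path.join(inst_dir, "close.day.bin")
    )
```

The sketch assumes every instrument has a row for each calendar day between its first and last date; data with gaps (for example, trading suspensions) would likely need to be reindexed against the calendar before dumping so that positions stay aligned.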

Qlib provides a `scripts/dump_bin.py` script to simplify dumping data into bin files. It only handles `csv` files, however, so it does not help if your data is in `feather` or `parquet` format. For that case we provide a `SingleFileDumper` in [library](../library/dumper.py), which you can try:

```python
from library.dumper import SingleFileDumper

dumper = SingleFileDumper(
    file_path='../data/kline-daily/market-daily.parquet',
    file_type='parquet',
    date_field='index',
    inst_field='index',
    dump_path='../data/qlib-day/',
)
dumper.dump()
```

## Data Plot

Now that the daily trading data has been converted into Qlib format, we can use the Qlib API to load it into memory.

```python
import plotly.graph_objects as go
from qlib.data import D
```

```python
qlib.init(provider_uri='../data/qlib-day')
```

```text
[94941:MainThread](2022-08-21 14:36:34,683) INFO - qlib.Initialization - [config.py:413] - default_conf: client.
[94941:MainThread](2022-08-21 14:36:34,893) INFO - qlib.Initialization - [__init__.py:74] - qlib successfully initialized based on client settings.
[94941:MainThread](2022-08-21 14:36:34,893) INFO - qlib.Initialization - [__init__.py:76] - data_path={'__DEFAULT_FREQ': PosixPath('/Users/oak/Repo/Data/qlib-day')}
```


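```python
# Fetch one year of daily OHLC data for a single instrument.
ohlc = D.features(
    instruments=['600519.xshg'],
    start_time='2021-01-01',
    end_time='2022-01-01',
    fields=['$open', '$high', '$low', '$close'],
)

# The result is indexed by (instrument, datetime); take the datetime level
# row by row so the x values always line up with the price columns.
go.Figure(data=go.Candlestick(
    x=ohlc.index.get_level_values(1),
    open=ohlc['$open'],
    high=ohlc['$high'],
    low=ohlc['$low'],
    close=ohlc['$close'],
))
```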