Add download qlib_data docs #69

Merged: 3 commits, Nov 19, 2020
2 changes: 1 addition & 1 deletion README.md
@@ -91,7 +91,7 @@ Also, users can install ``Qlib`` by the source code according to the following s
## Data Preparation
Load and prepare data by running the following code:
```bash
- python scripts/get_data.py qlib_data_cn --target_dir ~/.qlib/qlib_data/cn_data
+ python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn
```

This dataset is created by public data collected by [crawler scripts](scripts/data_collector/), which have been released in
4 changes: 2 additions & 2 deletions docs/component/data.rst
@@ -34,7 +34,7 @@ Qlib Format Dataset

.. code-block:: bash

- python scripts/get_data.py qlib_data_cn --target_dir ~/.qlib/qlib_data/cn_data
+ python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn

After running the above command, users can find china-stock data in Qlib format in the ``~/.qlib/csv_data/cn_data`` directory.

@@ -59,7 +59,7 @@ Supposed that users prepare their CSV format data in the directory ``~/.qlib/csv

.. code-block:: bash

- python scripts/dump_bin.py dump --csv_path ~/.qlib/csv_data/my_data --qlib_dir ~/.qlib/qlib_data/my_data --include_fields open,close,high,low,volume,factor
+ python scripts/dump_bin.py dump_all --csv_path ~/.qlib/csv_data/my_data --qlib_dir ~/.qlib/qlib_data/my_data --include_fields open,close,high,low,volume,factor

After conversion, users can find their Qlib format data in the directory `~/.qlib/qlib_data/my_data`.

2 changes: 1 addition & 1 deletion docs/introduction/quick.rst
@@ -40,7 +40,7 @@ Load and prepare data by running the following code:

.. code-block::

- python scripts/get_data.py qlib_data_cn --target_dir ~/.qlib/qlib_data/cn_data
+ python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn

This dataset is created by public data collected by crawler scripts in ``scripts/data_collector/``, which have been released in the same repository. Users could create the same dataset with it.

2 changes: 1 addition & 1 deletion docs/start/initialization.rst
@@ -14,7 +14,7 @@ Please follow the steps below to initialize ``Qlib``.
- Download and prepare the Data: execute the following command to download stock data. Please pay `attention` that the data is collected from `Yahoo Finance <https://finance.yahoo.com/lookup>`_ and the data might not be perfect. We recommend users to prepare their own data if they have high-quality datasets. Please refer to `Data <../component/data.html#converting-csv-format-into-qlib-format>` for more information about customized dataset.
.. code-block:: bash

- python scripts/get_data.py qlib_data_cn --target_dir ~/.qlib/qlib_data/cn_data
+ python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn
Please refer to `Data Preparation <../component/data.html#data-preparation>`_ for more information about `get_data.py`,


2 changes: 1 addition & 1 deletion examples/train_backtest_analyze.ipynb
@@ -31,7 +31,7 @@
"outputs": [],
"source": [
"# use default data\n",
- "# NOTE: need to download data from remote: python scripts/get_data.py qlib_data_cn --target_dir ~/.qlib/qlib_data/cn_data\n",
+ "# NOTE: need to download data from remote: python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn\n",
"provider_uri = \"~/.qlib/qlib_data/cn_data\" # target_dir\n",
"if not exists_qlib_data(provider_uri):\n",
" print(f\"Qlib data is not found in {provider_uri}\")\n",
61 changes: 61 additions & 0 deletions scripts/README.md
@@ -0,0 +1,61 @@

- [Download Qlib Data](#Download-Qlib-Data)
  - [Download CN Data](#Download-CN-Data)
  - [Download US Data](#Download-US-Data)
  - [Download CN Simple Data](#Download-CN-Simple-Data)
  - [Help](#Help)
- [Using in Qlib](#Using-in-Qlib)
  - [US data](#US-data)
  - [CN data](#CN-data)


## Download Qlib Data


### Download CN Data

```bash
python get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn
```

### Download US Data

> The US stock codes include 'PRN', which is a reserved device name on Windows, so the corresponding directory cannot be created on a Windows system: https://superuser.com/questions/613313/why-cant-we-make-con-prn-null-folder-in-windows

```bash
python get_data.py qlib_data --target_dir ~/.qlib/qlib_data/us_data --region us
```
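
The Windows limitation noted above comes from legacy device names that cannot be used as file or directory names. As a rough illustration, the sketch below flags such names in a list of symbols. The helper `is_windows_reserved` and the sample symbols are hypothetical and are not part of Qlib or `get_data.py`.

```python
# Hypothetical helper, not part of Qlib: flags names that Windows reserves
# for legacy devices and therefore cannot be used as directory names.
WINDOWS_RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_windows_reserved(name: str) -> bool:
    """Return True if `name` (ignoring case and any extension) is reserved."""
    stem = name.split(".", 1)[0].strip()
    return stem.upper() in WINDOWS_RESERVED_NAMES


symbols = ["AAPL", "PRN", "MSFT"]  # illustrative symbols only
blocked = [s for s in symbols if is_windows_reserved(s)]
print(blocked)
```

A check like this could be run before extracting the US dataset on Windows to see which symbols would collide.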
### Download CN Simple Data

```bash
python get_data.py qlib_data --name qlib_data_simple --target_dir ~/.qlib/qlib_data/cn_data --region cn
```

### Help

```bash
python get_data.py qlib_data --help
```
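
The download commands above differ only in the region, the target directory, and the optional `--name` value. A minimal sketch of composing them programmatically might look like the following. The function `build_get_data_args` is a hypothetical helper, not part of the repository, and it uses only the flags shown in this README.

```python
from typing import List, Optional


def build_get_data_args(region: str, target_dir: str, name: Optional[str] = None) -> List[str]:
    """Compose the argument list for `python get_data.py qlib_data ...`."""
    args = ["python", "get_data.py", "qlib_data"]
    if name is not None:
        # Only the simple dataset example in this README passes --name.
        args += ["--name", name]
    args += ["--target_dir", target_dir, "--region", region]
    return args


print(" ".join(build_get_data_args("us", "~/.qlib/qlib_data/us_data")))
print(" ".join(build_get_data_args("cn", "~/.qlib/qlib_data/cn_data", name="qlib_data_simple")))
```

If the list is passed to `subprocess.run` without a shell, `~` is not expanded by the shell, so calling `os.path.expanduser` on the target directory first may be safer.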

## Using in Qlib
> For more information: https://qlib.readthedocs.io/en/latest/start/initialization.html


### US data

```python
import qlib
from qlib.config import REG_US
provider_uri = "~/.qlib/qlib_data/us_data" # target_dir
qlib.init(provider_uri=provider_uri, region=REG_US)
```

### CN data

```python
import qlib
from qlib.config import REG_CN
provider_uri = "~/.qlib/qlib_data/cn_data" # target_dir
qlib.init(provider_uri=provider_uri, region=REG_CN)
```
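
Before calling `qlib.init`, it can help to confirm that the target directory was actually populated by the download step. The sketch below is a standard-library stand-in for such a check. `qlib_data_dir_ready` is a hypothetical name, and it only tests that the directory exists and is non-empty, not that its contents are valid Qlib data.

```python
from pathlib import Path


def qlib_data_dir_ready(provider_uri: str) -> bool:
    """Return True if the expanded directory exists and is not empty."""
    path = Path(provider_uri).expanduser()
    return path.is_dir() and any(path.iterdir())


provider_uri = "~/.qlib/qlib_data/cn_data"  # target_dir used in the examples above
if not qlib_data_dir_ready(provider_uri):
    print(f"No data found in {provider_uri}; run get_data.py first")
```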
3 changes: 3 additions & 0 deletions scripts/dump_bin.py
@@ -242,6 +242,9 @@ def _dump_bin(self, file_or_data: [Path, pd.DataFrame], calendar_list: List[pd.T
     def dump(self):
         raise NotImplementedError("dump not implemented!")

+    def __call__(self, *args, **kwargs):
+        self.dump()
+

 class DumpDataAll(DumpDataBase):
     def _get_all_date(self):
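
The `__call__` addition in this hunk makes an instance callable and delegates to `dump()`, which subclasses are expected to override. A minimal sketch of that pattern follows. `BaseDumper` and `ListDumper` are hypothetical stand-ins for illustration, not the classes in `dump_bin.py`.

```python
class BaseDumper:
    """Hypothetical stand-in for a base class whose `dump` must be overridden."""

    def dump(self):
        raise NotImplementedError("dump not implemented!")

    def __call__(self, *args, **kwargs):
        # Calling the instance delegates to dump(); the arguments are
        # accepted but ignored, mirroring the diff above.
        self.dump()


class ListDumper(BaseDumper):
    """Hypothetical subclass that records each dump in a list."""

    def __init__(self):
        self.dumped = []

    def dump(self):
        self.dumped.append("done")


dumper = ListDumper()
dumper()  # same effect as dumper.dump()
```

Calling an instance of the base class directly still raises `NotImplementedError`, since `__call__` only forwards to `dump()`.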