Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the error when the stock code is a number #78

Merged
merged 5 commits into from Jan 26, 2021
Merged

Conversation

zhupr
Copy link
Collaborator

@zhupr zhupr commented Dec 8, 2020

Description

Motivation and Context

How Has This Been Tested?

  • Pass the test by running: pytest qlib/tests/test_all_pipeline.py under upper directory of qlib.
  • If you are adding a new feature, test on your own test scripts.

Screenshots of Test Results (if appropriate):

  1. Pipeline test:
  2. Your own tests:

Types of changes

  • Fix bugs
  • Add new feature
  • Update documentation

@zhupr zhupr force-pushed the main branch 3 times, most recently from 231fabd to 4f6fe04 Compare December 8, 2020 16:56
@you-n-g
Copy link
Collaborator

you-n-g commented Dec 9, 2020

@zhupr Please check the CI results.
Thanks

@@ -591,6 +594,8 @@ def _load_instruments(self, market):
df = pd.read_csv(fname, sep="\t", names=["inst", "start_datetime", "end_datetime", "save_inst"])
df["start_datetime"] = pd.to_datetime(df["start_datetime"])
df["end_datetime"] = pd.to_datetime(df["end_datetime"])
df["inst"] = df["inst"].astype(str)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

give more docs about the save_inst

@@ -223,8 +223,11 @@ def convert_instruments(self, instrument):
for _path in Path(C.get_data_path()).joinpath("instruments").glob("*.txt"):
_df = pd.read_csv(_path, sep="\t", names=["inst", "start_datetime", "end_datetime", "save_inst"])
_df_list.append(_df.iloc[:, [0, -1]])
df = pd.concat(_df_list, sort=False).sort_values("save_inst")
df = df.drop_duplicates(subset=["save_inst"], keep="first").fillna(axis=1, method="ffill")
df = pd.concat(_df_list, sort=False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dtype:  Type name or dict of column -> type, optional
      Data type for data or columns. E.g. {‘a’: np.float64, ‘b’: np.int32, ‘c’: ‘Int64’} Use str or object together with suitable na_values settings to preserve and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.

Will dtype be a cleaner solution?

@@ -223,8 +223,11 @@ def convert_instruments(self, instrument):
for _path in Path(C.get_data_path()).joinpath("instruments").glob("*.txt"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is very strange that InstrumentProvider has such such a function named convert_instruments

@zhupr zhupr mentioned this pull request Dec 20, 2020
# `day`
begin = inst_time[1]
end = inst_time[2]
elif len(inst_time) == 5:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very strange to use different logic for processing different frequency data.
Why not using code like pd.read_csv(fname, sep="\t", names=["inst", "start_datetime", "end_datetime"])

qlib/__init__.py Outdated
@@ -2,7 +2,7 @@
# Licensed under the MIT License.


__version__ = "0.6.1.dev"
__version__ = "0.6.2.dev"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussion

@staticmethod
def _delete_qlib_data(file_dir: Path):
logger.info(f"delete {file_dir}")
for _name in ["features", "calendars", "instruments"]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about cache data ?


Examples
---------
python get_data.py qlib_data --name qlib_data --target_dir ~/.qlib/qlib_data/cn_data --version latest --interval 1d --region cn
-------

"""
file_name = f"{name}_{region.lower()}_{interval.lower()}_{version}.zip"
self._download_data(file_name.lower(), target_dir)
_version = re.search(r"(\d+)(.)(\d+)(.)(\d+)", qlib.__version__)[0]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add more documents about the background for this code.

dataset_name=name,
region=region.lower(),
interval=interval.lower(),
qlib_version=qlib_version,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider the situation that the qlib version changes but the data version remains the same?

replace_names += [f"COM{i}" for i in range(10)]
replace_names += [f"LPT{i}" for i in range(10)]

prefix = "_QLIB_" if code.isupper() else "_qlib_"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this is necessary?

qlib/__init__.py Outdated
@@ -2,7 +2,7 @@
# Licensed under the MIT License.


__version__ = "0.6.1.dev"
__version__ = "0.6.1.99.dev"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we add ".99", do we still need ".dev" ?

@you-n-g you-n-g merged commit 36e5c60 into microsoft:main Jan 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants