Add fund data as an example #292

Merged
merged 13 commits into from
Mar 19, 2021

Conversation

wangershi
Contributor

Description

Add fund data as an example

Motivation and Context

The existing examples only cover stock data, but qlib can also be used for fund data.

How Has This Been Tested?

This is a new feature; I tested it offline.

Screenshots of Test Results (if appropriate):

  1. Pipeline test:
  2. Your own tests:
>>> import qlib
>>> from qlib.data import D
>>> qlib.init(provider_uri="~/.qlib/qlib_data/cn_fund_data")
[49788:MainThread](2021-02-28 19:04:43,535) INFO - qlib.Initialization - [config.py:276] - default_conf: client.
[49788:MainThread](2021-02-28 19:04:45,584) WARNING - qlib.Initialization - [config.py:292] - redis connection failed(host=127.0.0.1 port=6379), cache will not be used!
[49788:MainThread](2021-02-28 19:04:49,953) INFO - qlib.Initialization - [__init__.py:46] - qlib successfully initialized based on client settings.
[49788:MainThread](2021-02-28 19:04:49,956) INFO - qlib.Initialization - [__init__.py:47] - data_path=C:\Users\daoz\.qlib\qlib_data\cn_fund_data
>>> df = D.features(D.instruments(market="all"), ["$DWJZ", "$LJJZ"], freq="day")
>>> df
000001     2001-12-18   1.000000   1.000000
           2001-12-21   1.000000   1.000000
           2001-12-28   1.000000   1.000000
           2002-01-04   1.000000   1.000000
           2002-01-11   1.001000   1.001000
...                          ...        ...
000011     2021-02-22  21.035000  27.839001
           2021-02-23  21.070000  27.874001
           2021-02-24  20.670000  27.474001
           2021-02-26  20.219000  27.023001

[16929 rows x 2 columns]

Types of changes

  • Fix bugs
  • Add new feature
  • Update documentation

@ghost

ghost commented Feb 28, 2021

CLA assistant check
All CLA requirements met.

@you-n-g you-n-g requested a review from zhupr February 28, 2021 11:16
@you-n-g
Collaborator

you-n-g commented Feb 28, 2021

@wangershi Thanks for your PR.
Please format your code with python -m black . -l 120

@wangershi
Contributor Author

> @wangershi Thanks for your PR.
> Please format your code with python -m black . -l 120

Done, thanks @you-n-g .

@Derek-Wds Derek-Wds added the enhancement New feature or request label Mar 4, 2021
raise ValueError(f"cannot support {interval}")
return _result

def collector_data(self):
Collaborator

If the subclass does not change this method, it can be omitted.

return df


class FundNormalize1d(FundNormalize, ABC):
Collaborator

It’s more appropriate to write this class like this:

class FundNormalize1d(FundNormalize):
    pass
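
A minimal sketch of why the empty-body form is enough, assuming the parent class already derives from ABC somewhere up its hierarchy (the PR excerpt does not show this). The base class and the body of `normalize` here are hypothetical stand-ins, not qlib's actual code:

```python
from abc import ABC, abstractmethod


class BaseNormalize(ABC):  # hypothetical abstract base, for illustration only
    @abstractmethod
    def normalize(self, values):
        """Return a normalized copy of values."""


class FundNormalize(BaseNormalize):
    def normalize(self, values):
        # Illustrative logic only: scale every value by the first one.
        return [v / values[0] for v in values]


class FundNormalize1d(FundNormalize):
    # No overrides: every method is inherited, and the ABC machinery
    # is inherited too, so listing ABC again adds nothing.
    pass


normalizer = FundNormalize1d()
result = normalizer.normalize([2.0, 4.0])
```

The same reasoning applies to a method such as `collector_data`: if the subclass would only repeat the parent's behavior, leaving it out keeps the inherited version.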

@@ -93,6 +98,78 @@ def _get_calendar(month):
return calendar


def return_date_list(source_dir, date_field_name: str, file_path: Path):
Collaborator

def return_date_list(date_field_name: str, file_path: Path):
    date_list = pd.read_csv(file_path, sep=",", index_col=0)[date_field_name].to_list()
    return sorted(map(lambda x: pd.Timestamp(x), date_list))


logger.info(f"count how many funds trade in this day......")
_dict_count_trade = dict() # dict{date:count}
_fun = partial(return_date_list, source_dir, date_field_name)
Collaborator

_fun = partial(return_date_list, date_field_name)

_dict_count_trade = dict() # dict{date:count}
_fun = partial(return_date_list, source_dir, date_field_name)
with tqdm(total=_number_all_funds) as p_bar:
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
Collaborator

The following code reads the files fewer times:

all_oldest_list = []
with tqdm(total=_number_all_funds) as p_bar:
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        for date_list in executor.map(_fun, file_list):
            if date_list:
                all_oldest_list.append(date_list[0])
            for date in date_list:
                if date not in _dict_count_trade.keys():
                    _dict_count_trade[date] = 0

                _dict_count_trade[date] += 1

            p_bar.update()

logger.info(f"count how many funds have founded in this day......")
_dict_count_founding = {date: _number_all_funds for date in _dict_count_trade.keys()}  # dict{date:count}
with tqdm(total=_number_all_funds) as p_bar:
    for oldest_date in all_oldest_list:
        for date in _dict_count_founding.keys():
            if date < oldest_date:
                _dict_count_founding[date] -= 1
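
The counting logic in the suggestion above can be tried on in-memory data. This is only a sketch under stated assumptions: the per-fund date lists are invented, the process pool and progress bar are left out, and `collections.Counter` replaces the manual dictionary initialization.

```python
from collections import Counter
from datetime import date

# Hypothetical per-fund date lists, each already sorted in ascending order.
date_lists = [
    [date(2021, 2, 22), date(2021, 2, 23), date(2021, 2, 24)],
    [date(2021, 2, 23), date(2021, 2, 24)],
]
number_all_funds = len(date_lists)

# Single pass over the data: count trading funds per day and remember
# each fund's oldest date, so no list has to be loaded a second time.
count_trade = Counter()
all_oldest = []
for date_list in date_lists:
    if date_list:
        all_oldest.append(date_list[0])
    count_trade.update(date_list)

# Start from "every fund exists on every day", then subtract each fund
# from the days that precede its oldest date.
count_founding = {day: number_all_funds for day in count_trade}
for oldest in all_oldest:
    for day in count_founding:
        if day < oldest:
            count_founding[day] -= 1
```

Here the second fund starts on 2021-02-23, so only one fund counts as founded on 2021-02-22, while both count on the later two days.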

@wangershi
Contributor Author

Done, thanks @zhupr .

@you-n-g you-n-g merged commit ba56e40 into microsoft:main Mar 19, 2021
@wangershi wangershi deleted the addFund branch March 19, 2021 03:21
Labels
enhancement New feature or request
4 participants