Ibovespa index support #990

igor17400 · 2022-03-17T22:25:08Z

Description

For `feat: download ibovespa index historic composition`, the following files were added:
- scripts/data_collector/br_index/README.md
- scripts/data_collector/br_index/collector.py
- scripts/data_collector/br_index/requirements.txt

For `fix: typo error instead of end_date, it was written end_ate`, the following file was modified:
- scripts/data_collector/index.py

For `feat: adds support for downloading stocks historic prices from Brazil's stocks exchange (B3)`, the following files were modified:
- scripts/data_collector/utils.py
- scripts/data_collector/yahoo/README.md
- scripts/data_collector/yahoo/collector.py

Motivation and Context

Brazil's stock exchange is one of the biggest in the world, so adding its dataset to qlib lets people do research on different datasets using the qlib infrastructure.

Application scenario: support investment strategy research on the Brazilian stock exchange.
Related works (Papers, Github repos etc.): Not applicable
Any other relevant and important information: More details on Brazil's stock exchange can be found on B3

How Has This Been Tested?

Pass the test by running: pytest qlib/tests/test_all_pipeline.py under upper directory of qlib.
If you are adding a new feature, test on your own test scripts.

There isn't a specific test script; however, the added code can be tested by running the following commands:

python scripts/data_collector/br_index/collector.py --index_name IBOV --qlib_dir ~/.qlib/qlib_data/br_data --method parse_instruments
python scripts/data_collector/br_index/collector.py --index_name IBOV --qlib_dir ~/.qlib/qlib_data/br_data --method save_new_companies
python scripts/data_collector/br_index/collector.py download_data --source_dir ~/.qlib/stock_data/source/br_data --start 2003-01-03 --end 2022-03-01 --delay 1 --interval 1d --region BR
python scripts/data_collector/br_index/collector.py download_data --source_dir ~/.qlib/stock_data/source/br_data_1min --delay 1 --interval 1min --region BR

Screenshots of Test Results (if appropriate):

Pipeline test: Not applicable
Your own tests: Not applicable

Types of changes

Add new feature
Update documentation (README.md)

closes #956

Ibovespa (IBOV) is the largest index on Brazil's stock exchange. The br_index folder supports downloading the new companies in the current index composition, as well as the companies in the historic composition of the IBOV index. Partially resolves issue #956.

ghost · 2022-03-17T22:25:20Z

All CLA requirements met.

ChiahungTai · 2022-03-22T00:37:34Z

@you-n-g Should it rerun the mac os test?

SunsetWolf · 2022-03-31T06:43:46Z

While reviewing this PR we found two issues.

1. The get_instruments() method is used in multiple places, and we feel that scripts.index.py is the best place for it.
2. data_collector.utils has some very generic stuff put inside it.

Are you willing to fix them? If not, I will fix these issues in the future.

igor17400 · 2022-03-31T21:19:24Z

@SunsetWolf I can work on this!
I'll give it a look

The reason for using retry=2 is that Yahoo Finance unfortunately does not track the majority of Brazilian stocks. With the retry argument set to 5, the deco_retry decorator tries to fetch each stock's data 5 times, which makes downloading Brazilian stocks very slow. This may change in the future, but for now I suggest setting the retry argument to 1 or 2 to improve download speed. To achieve this, an argument called retry_config was added to YahooCollectorBR1d and YahooCollectorBR1min.
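To make the cost concrete, here is a minimal sketch of a retry decorator. `retry_sketch` and `count_attempts` are hypothetical names written for this illustration, not qlib's `deco_retry`; only the `retry` and `retry_sleep` parameter names come from this PR. It shows that a symbol the data source does not carry is requested once per configured retry before the call gives up.

```python
import time
from functools import wraps


def retry_sketch(retry=5, retry_sleep=0.0):
    """Hypothetical stand-in for a retry decorator (not qlib's deco_retry)."""
    if retry < 1:
        raise ValueError("retry must be at least 1")

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            last_error = None
            for _ in range(retry):
                try:
                    return func(*args, **kwargs)
                except Exception as err:  # keep the error and try again
                    last_error = err
                    time.sleep(retry_sleep)
            raise last_error  # every attempt failed

        return wrapper

    return decorator


def count_attempts(retry):
    """Count how often a permanently failing download is attempted."""
    attempts = []

    @retry_sketch(retry=retry, retry_sleep=0.0)
    def fetch_missing_symbol():
        attempts.append(1)
        raise ValueError("no data for this symbol")

    try:
        fetch_missing_symbol()
    except ValueError:
        pass
    return len(attempts)


attempts_with_5 = count_attempts(5)
attempts_with_2 = count_attempts(2)
```

Under these assumptions, every untracked symbol costs `retry` failed requests plus `retry` sleeps, so lowering `retry` from 5 to 2 cuts the wasted requests per missing symbol from five to two.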

you-n-g · 2022-04-02T07:24:16Z

scripts/data_collector/us_index/collector.py

-        qlib_dir=qlib_dir, index_name=index_name, freq=freq, request_retry=request_retry, retry_sleep=retry_sleep
-    )
-    getattr(obj, method)()
-

 if __name__ == "__main__":
    fire.Fire(get_instruments)


Suggested change

-    fire.Fire(get_instruments)
+    fire.Fire(partial(get_instruments, market_index="br_index"))

This applies to the other index data sources as well.

Do you think this will make the interface easier?
Then the collector.py CLI in each folder can drop a redundant argument.
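A rough sketch of the idea, using only `functools.partial`. The `get_instruments` signature, its `cn_index` default, the `INDEX_REGISTRY` lookup, and `IBOVIndexSketch` are all assumptions made for illustration and are not the code in this PR. Binding `market_index` once per folder means the caller no longer has to pass it.

```python
from functools import partial


class IBOVIndexSketch:
    """Hypothetical index class; the real one is defined in br_index/collector.py."""

    def __init__(self, qlib_dir):
        self.qlib_dir = qlib_dir

    def parse_instruments(self):
        return f"parse IBOV into {self.qlib_dir}"


# Hypothetical lookup table standing in for however the shared helper
# locates the index class that belongs to a market.
INDEX_REGISTRY = {("br_index", "IBOV"): IBOVIndexSketch}


def get_instruments(qlib_dir, index_name, method="parse_instruments", market_index="cn_index"):
    """Shared entry point: find the index class, then dispatch to `method`."""
    index_cls = INDEX_REGISTRY[(market_index, index_name.upper())]
    obj = index_cls(qlib_dir)
    return getattr(obj, method)()


# Each folder's collector.py binds its own market once ...
br_get_instruments = partial(get_instruments, market_index="br_index")

# ... so the caller only supplies the remaining arguments.
result = br_get_instruments(qlib_dir="~/.qlib/qlib_data/br_data", index_name="IBOV")
```

In the script itself, the bound callable would be handed to `fire.Fire`, as in the suggested change, so `--market_index` disappears from the command line.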

you-n-g · 2022-04-02T07:24:52Z

scripts/data_collector/br_index/README.md

@@ -53,9 +53,9 @@ With that reference, the index's composition can be compared quarter by quarter

 ```bash
 # parse instruments, using in qlib/instruments.
-python collector.py --index_name IBOV --qlib_dir ~/.qlib/qlib_data/br_data --method parse_instruments
+python collector.py --index_name IBOV --qlib_dir ~/.qlib/qlib_data/br_data --method parse_instruments --market_index br_index


you-n-g · 2022-04-02T07:26:02Z

scripts/data_collector/utils.py

@@ -564,3 +562,47 @@ def generate_minutes_calendar_from_daily(

 if __name__ == "__main__":


It would be better if the `if __name__ == "__main__":` block were at the bottom of the script.

Yes, much better!
It was an oversight on my part.

you-n-g · 2022-04-02T07:31:09Z

scripts/data_collector/yahoo/collector.py

@@ -147,7 +148,29 @@ def _show_logging_func():
    def get_data(
        self, symbol: str, interval: str, start_datetime: pd.Timestamp, end_datetime: pd.Timestamp
    ) -> pd.DataFrame:
-        @deco_retry(retry_sleep=self.delay)
+        if hasattr(self, 'retry_config'):


Would it be simpler to make retry part of the class interface (for example, an abstract attribute)?

@you-n-g Should I define this class interface only for the YahooCollectorBR1d and YahooCollectorBR1min classes, or in the YahooCollector base class?

Could you take a look at commit 6cc96cc to check whether I've implemented it the way you suggested?

Maybe we can define

    class YahooCollector(...):
        retry = 5  # Configuration attribute. How many times will it try to re-request the data if the network fails.

    class YahooCollectorBR(...):
        """
        The reason to use retry=2 is due to the fact that Yahoo Finance unfortunately does not keep track of the majority of Brazilian stocks.
        Therefore, the decorator deco_retry with retry argument set to 5 will keep trying to get the stock data 5 times, which makes the code to download Brazilians stocks very slow.
        In future, this may change, but for now I suggest to leave retry argument to 1 or 2 in order to improve download speed.
        In order to achieve this code logic an argument called retry_config was added into YahooCollectorBR base class
        """
        retry = 2

Now the retry is part of the interface, so we don't have to use hasattr to access the retry.

@you-n-g I've made these changes in commit c313804
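For readers following the thread, the pattern can be sketched as below. The class names, the `fetch` method, and the `download` callback are hypothetical and exist only for this illustration; only the `retry` attribute and its values 5 and 2 come from the discussion. Because the base class always defines `retry`, the shared code reads `self.retry` directly and never needs `hasattr`.

```python
class BaseCollectorSketch:
    """Hypothetical base class; `retry` is part of its interface."""

    retry = 5  # how many times a failed request is repeated

    def fetch(self, symbol, download):
        """Call `download` up to `self.retry` times; return (data, attempts)."""
        for attempt in range(1, self.retry + 1):
            data = download(symbol)
            if data is not None:
                return data, attempt
        return None, self.retry


class BRCollectorSketch(BaseCollectorSketch):
    """Hypothetical Brazilian collector: many symbols are missing, so retry less."""

    retry = 2


def always_missing(symbol):
    # Simulates a data source that has no data for the symbol.
    return None


base_result = BaseCollectorSketch().fetch("MISSING", always_missing)
br_result = BRCollectorSketch().fetch("MISSING", always_missing)
```

The subclass changes behavior by overriding one attribute, and `fetch` stays untouched in the base class.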

you-n-g · 2022-04-06T01:01:24Z

It looks great now!

* feat: download ibovespa index historic composition
  ibovespa(ibov) is the largest index in Brazil's stocks exchange. The br_index folder has support for downloading new companies for the current index composition. And has support, as well, for downloading companies from historic composition of ibov index. Partially resolves issue microsoft#956
* fix: typo error instead of end_date, it was written end_ate
* feat: adds support for downloading stocks historic prices from Brazil's stocks exchange (B3)
  Together with commit c2f933 it resolves issue microsoft#956
* fix: code formatted with black.
* wip: Creating code logic for brazils stock market data normalization
* docs: brazils stock market data normalization code documentation
* fix: code formatted the with black
* docs: fixed typo
* docs: more info about python version used to generate requirements.txt file
* docs: added BeautifulSoup requirements
* feat: removed debug prints
* feat: added ibov_index_composition variable as a class attribute of IBOVIndex
* feat: added increment to generate the four month period used by the ibov index
* refactor: Added get_instruments() method inside utils.py for better code usability.
  Message in the PR request to understand the context of the change: In the course of reviewing this PR we found two issues. 1. there are multiple places where the get_instruments() method is used, and we feel that scripts.index.py is the best place for the get_instruments() method to go. 2. data_collector.utils has some very generic stuff put inside it.
* refactor: improve brazils stocks download speed
  The reason to use retry=2 is due to the fact that Yahoo Finance unfortunately does not keep track of the majority of Brazilian stocks. Therefore, the decorator deco_retry with retry argument set to 5 will keep trying to get the stock data 5 times, which makes the code to download Brazilians stocks very slow. In future, this may change, but for now I suggest to leave retry argument to 1 or 2 in order to improve download speed. In order to achieve this code logic an argument called retry_config was added into YahooCollectorBR1d and YahooCollectorBR1min
* fix: added __main__ at the bottom of the script
* refactor: changed interface inside each index
  Using partial as `fire.Fire(partial(get_instruments, market_index="br_index" ))` will make the interface easier for the user to execute the script. Then all the collector.py CLI in each folder can remove a redundant arguments.
* refactor: implemented class interface retry into YahooCollectorBR
* docs: added BR as a possible region into the documentation
* refactor: make retry attribute part of the interface
  This way we don't have to use hasattr to access the retry attribute as previously done

SunsetWolf · 2023-09-18T03:38:21Z

Hi, @igor17400
This script is not working right now because the data source is missing. Hopefully the data source can be updated.

igor17400 added 3 commits March 17, 2022 17:04

fix: typo error instead of end_date, it was written end_ate

c2f933b

feat: adds support for downloading stocks historic prices from Brazil's stocks exchange (B3)

09b8ad9

Together with commit c2f933 it resolves issue #956

igor17400 added 2 commits March 19, 2022 11:03

fix: code formatted with black.

77107f3

wip: Creating code logic for brazils stock market data normalization

3aaf1df

igor17400 added 4 commits March 23, 2022 15:07

docs: brazils stock market data normalization code documentation

9ceb592

fix: code formatted the with black

d1b73b3

docs: fixed typo

cc0e126

docs: more info about python version used to generate requirements.txt file

95938ea

you-n-g reviewed Mar 31, 2022

scripts/data_collector/br_index/collector.py (outdated, resolved)

you-n-g reviewed Mar 31, 2022

scripts/data_collector/br_index/collector.py (outdated, resolved)

scripts/data_collector/utils.py (resolved)

igor17400 added 6 commits March 31, 2022 21:07

docs: added BeautifulSoup requirements

b0aafa2

feat: removed debug prints

592559a

feat: added ibov_index_composition variable as a class attribute of IBOVIndex

92aa003

feat: added increment to generate the four month period used by the ibov index

4903845

you-n-g reviewed Apr 2, 2022


igor17400 added 5 commits April 2, 2022 08:39

fix: added __main__ at the bottom of the script

1d80c4c

refactor: changed interface inside each index

dc72c6b

Using partial as `fire.Fire(partial(get_instruments, market_index="br_index" ))` will make the interface easier for the user to execute the script. Then all the collector.py CLI in each folder can remove a redundant arguments.

refactor: implemented class interface retry into YahooCollectorBR

6cc96cc

docs: added BR as a possible region into the documentation

1cbfb5c

refactor: make retry attribute part of the interface

c313804

This way we don't have to use hasattr to access the retry attribute as previously done

you-n-g merged commit 56cfa48 into microsoft:main Apr 6, 2022

igor17400 deleted the ibovespa-index-support branch April 6, 2022 10:38

you-n-g added the enhancement label Apr 24, 2022
