Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ibovespa index support #990

Merged
merged 20 commits into from
Apr 6, 2022
Merged

Ibovespa index support #990

merged 20 commits into from
Apr 6, 2022

Conversation

igor17400
Copy link
Contributor

@igor17400 igor17400 commented Mar 17, 2022

Description

  • For the feat: download ibovespa index historic composition the following files were included:
    • scripts/data_collector/br_index/README.md
    • scripts/data_collector/br_index/collector.py
    • scripts/data_collector/br_index/requirements.txt
  • For fix: typo error instead of end_date, it was written end_ate the following files were modified:
    • scripts/data_collector/index.py
  • For feat: adds support for downloading stocks historic prices from Brazil's stocks exchange (B3) the following files were modified:
    • scripts/data_collector/utils.py
    • scripts/data_collector/yahoo/README.md
    • scripts/data_collector/yahoo/collector.py

Motivation and Context

Brazil's stock exchange is one of the biggest in the world, therefore adding its dataset into qlib can help people do research via qlib infrastructure using different datasets.

  1. Application scenario: support brazilian stocks exchange investment strategy research.
  2. Related works (Papers, Github repos etc.): Not applicable
  3. Any other relevant and important information: More details on Brazil's stock exchange can be found on B3

How Has This Been Tested?

  • Pass the test by running: pytest qlib/tests/test_all_pipeline.py under upper directory of qlib.

  • If you are adding a new feature, test on your own test scripts.

There isn't a specific script for test, however the added code can be tested running the following commands:

  • python scripts/data_collector/br_index/collector.py --index_name IBOV --qlib_dir ~/.qlib/qlib_data/br_data --method parse_instruments
  • python scripts/data_collector/br_index/collector.py --index_name IBOV --qlib_dir ~/.qlib/qlib_data/br_data --method save_new_companies
  • python scripts/data_collector/br_index/collector.py download_data --source_dir ~/.qlib/stock_data/source/br_data --start 2003-01-03 --end 2022-03-01 --delay 1 --interval 1d --region BR
  • python scripts/data_collector/br_index/collector.py download_data --source_dir ~/.qlib/stock_data/source/br_data_1min --delay 1 --interval 1min --region BR

Screenshots of Test Results (if appropriate):

  1. Pipeline test: Not applicable
  2. Your own tests: Not applicable

Types of changes

  • Add new feature
  • Update documentation (README.md)

closes #956

ibovespa(ibov) is the largest index in Brazil's stocks exchange.
The br_index folder has support for downloading new companies for the current index composition.
And has support, as well, for downloading companies from historic composition of ibov index.

Partially resolves issue #956
…'s stocks exchange (B3)

Together with commit c2f933 it resolves issue #956
@ghost
Copy link

ghost commented Mar 17, 2022

CLA assistant check
All CLA requirements met.

@ChiahungTai
Copy link
Collaborator

@you-n-g Should it rerun the mac os test?

@SunsetWolf
Copy link
Collaborator

SunsetWolf commented Mar 31, 2022

In the course of reviewing this PR we found two issues.

  1. there are multiple places where the get_instruments() method is used, and we feel that scripts.index.py is the best place for the get_instruments() method to go.
  2. data_collector.utils has some very generic stuff put inside it.

Are you willing to fix them? If not, I will fix these issues in the future.

@igor17400
Copy link
Contributor Author

@SunsetWolf I can work on this!
I'll give it a look

…ode usability.

Message in the PR request to understand the context of the change

In the course of reviewing this PR we found two issues.

    1. there are multiple places where the get_instruments() method is used,
	and we feel that scripts.index.py is the best place for the
	get_instruments() method to go.
    2. data_collector.utils has some very generic stuff put inside it.
The reason to use retry=2 is due to the fact that
Yahoo Finance unfortunately does not keep track of the majority
of Brazilian stocks.

Therefore, the decorator deco_retry with retry argument
set to 5 will keep trying to get the stock data 5 times,
which makes the code to download Brazilians stocks very slow.

In future, this may change, but for now
I suggest to leave retry argument to 1 or 2 in
order to improve download speed.

In order to achieve this code logic an argument called retry_config
was added into YahooCollectorBR1d and YahooCollectorBR1min
qlib_dir=qlib_dir, index_name=index_name, freq=freq, request_retry=request_retry, retry_sleep=retry_sleep
)
getattr(obj, method)()


if __name__ == "__main__":
fire.Fire(get_instruments)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fire.Fire(get_instruments)
fire.Fire(partial(get_instruments, market_index="br_index" ))
  • This applies to other index data source as well.

Do you think this will make the interface easier?
Then all the collector.py CLI in each folder can remove a redundant arguments.

@@ -53,9 +53,9 @@ With that reference, the index's composition can be compared quarter by quarter

```bash
# parse instruments, using in qlib/instruments.
python collector.py --index_name IBOV --qlib_dir ~/.qlib/qlib_data/br_data --method parse_instruments
python collector.py --index_name IBOV --qlib_dir ~/.qlib/qlib_data/br_data --method parse_instruments --market_index br_index
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

@@ -564,3 +562,47 @@ def generate_minutes_calendar_from_daily(

if __name__ == "__main__":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be better if ifmain is at the bottom of the script.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, much better!
It was a lack of attention on my part

@@ -147,7 +148,29 @@ def _show_logging_func():
def get_data(
self, symbol: str, interval: str, start_datetime: pd.Timestamp, end_datetime: pd.Timestamp
) -> pd.DataFrame:
@deco_retry(retry_sleep=self.delay)
if hasattr(self, 'retry_config'):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it look simpler to make retry as a part of the class interface ( for example, abstract attribute )

Copy link
Contributor Author

@igor17400 igor17400 Apr 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@you-n-g but should I define this class interface only for YahooCollectorBR1d and YahooCollectorBR1min classes? Or into YahooCollector base class?

Could you take a look into commit 6cc96cc? To check if I've implemented the way you suggested

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can define

class YahooCollector(...):
      retry = 5  # Configuration attribute.  How many times will it try to re-request the data if the network fails.

class YahooCollectorBR(...):
        """"
            The reason to use retry=2 is due to the fact that
            Yahoo Finance unfortunately does not keep track of the majority
            of Brazilian stocks. 
            
            Therefore, the decorator deco_retry with retry argument
            set to 5 will keep trying to get the stock data 5 times, 
            which makes the code to download Brazilians stocks very slow. 
            
            In future, this may change, but for now 
            I suggest to leave retry argument to 1 or 2 in 
            order to improve download speed.
            In order to achieve this code logic an argument called retry_config
            was added into YahooCollectorBR base class
        """
      retry = 2

Now the retry is part of the interface, so we don't have to use hasattr to access the retry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@you-n-g I've added such modifications into commit c313804

Using partial as `fire.Fire(partial(get_instruments, market_index="br_index" ))`
will make the interface easier for the user to execute the script.
Then all the collector.py CLI in each folder can remove a redundant arguments.
This way we don't have to use hasattr to access the retry attribute as previously done
@you-n-g
Copy link
Collaborator

you-n-g commented Apr 6, 2022

It looks great now!

@you-n-g you-n-g merged commit 56cfa48 into microsoft:main Apr 6, 2022
@igor17400 igor17400 deleted the ibovespa-index-support branch April 6, 2022 10:38
@you-n-g you-n-g added the enhancement New feature or request label Apr 24, 2022
qianyun210603 pushed a commit to qianyun210603/qlib that referenced this pull request Mar 23, 2023
* feat: download ibovespa index historic composition

ibovespa(ibov) is the largest index in Brazil's stocks exchange.
The br_index folder has support for downloading new companies for the current index composition.
And has support, as well, for downloading companies from historic composition of ibov index.

Partially resolves issue microsoft#956

* fix: typo error instead of end_date, it was written end_ate

* feat: adds support for downloading stocks historic prices from Brazil's stocks exchange (B3)

Together with commit c2f933 it resolves issue microsoft#956

* fix: code formatted with black.

* wip: Creating code logic for brazils stock market data normalization

* docs: brazils stock market data normalization code documentation

* fix: code formatted the with black

* docs: fixed typo

* docs: more info about python version used to generate requirements.txt file

* docs: added BeautifulSoup requirements

* feat: removed debug prints

* feat: added ibov_index_composition variable as a class attribute of IBOVIndex

* feat: added increment to generate the four month period used by the ibov index

* refactor: Added get_instruments() method inside utils.py for better code usability.

Message in the PR request to understand the context of the change

In the course of reviewing this PR we found two issues.

    1. there are multiple places where the get_instruments() method is used,
	and we feel that scripts.index.py is the best place for the
	get_instruments() method to go.
    2. data_collector.utils has some very generic stuff put inside it.

* refactor: improve brazils stocks download speed

The reason to use retry=2 is due to the fact that
Yahoo Finance unfortunately does not keep track of the majority
of Brazilian stocks.

Therefore, the decorator deco_retry with retry argument
set to 5 will keep trying to get the stock data 5 times,
which makes the code to download Brazilians stocks very slow.

In future, this may change, but for now
I suggest to leave retry argument to 1 or 2 in
order to improve download speed.

In order to achieve this code logic an argument called retry_config
was added into YahooCollectorBR1d and YahooCollectorBR1min

* fix: added __main__ at the bottom of the script

* refactor: changed interface inside each index

Using partial as `fire.Fire(partial(get_instruments, market_index="br_index" ))`
will make the interface easier for the user to execute the script.
Then all the collector.py CLI in each folder can remove a redundant arguments.

* refactor: implemented  class interface retry into YahooCollectorBR

* docs: added BR as a possible region into the documentation

* refactor: make retry attribute part of the interface

This way we don't have to use hasattr to access the retry attribute as previously done
@SunsetWolf
Copy link
Collaborator

Hi, @igor17400
This script is not working right now because of the missing datasource, hopefully the datasource can be updated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add new dataset and index: ibovespa and B3 datasets from Brazil Stock Exchange
4 participants