MAINT: get_stock_dividends shouldn't read from tuple and replacing pd.TimeGrouper by pd.Grouper #2621

samatix · 2020-01-18T12:53:33Z

The function get_stock_dividends shouldn't read from the tuple by position, as the order of the columns is not known. This leads to errors when upgrading the dependencies to higher releases.
Adding a dictionary (fast) to look up the index of each datum by field name is safer.
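
As a rough illustration of the idea (not the code in this PR), the name-to-index dictionary can be built from the cursor metadata with the standard `sqlite3` module. The table layout below is a simplified assumption and the variable names are hypothetical:

```python
import sqlite3

# Assumption: a simplified, in-memory stand-in for the dividends table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_dividend_payouts "
    "(sid INTEGER, declared_date INTEGER, record_date INTEGER)"
)
conn.execute("INSERT INTO stock_dividend_payouts VALUES (7, 100, 200)")

cursor = conn.execute("SELECT * FROM stock_dividend_payouts")

# cursor.description holds one entry per result column; the first item
# of each entry is the column name.
field_index = {col[0]: i for i, col in enumerate(cursor.description)}

dividend_tuple = cursor.fetchone()

# Look the value up by field name instead of a hard-coded position.
record_date = dividend_tuple[field_index["record_date"]]
```

With this mapping, a change in column order only changes the contents of `field_index`, not the lookups that use it.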

coveralls · 2020-01-18T15:23:20Z

Coverage increased (+0.07%) to 88.035% when pulling b4c4349 on samatix:mainstream into c825927 on quantopian:master.

ssanderson

@samatix thanks for the PR! The pandas grouper change looks good to me. The adjustments change looks like it fixes a potential problem, but it also looks to me like the functionality it's fixing is nearly unused and was implemented somewhat questionably. My suggestion there would be to remove most of that functionality entirely in favor of a narrower method on SQLiteAdjustmentReader.

ssanderson · 2020-01-21T13:42:47Z

tests/pipeline/test_downsampling.py

@@ -649,19 +649,19 @@ def check_downsampled_term(self, term):

        expected_results = {
            'year': (raw_term_results
-                     .groupby(pd.TimeGrouper('AS'))
+                     .groupby(pd.Grouper(freq='AS'))


👍 for this change.

ssanderson · 2020-01-21T13:51:05Z

zipline/data/data_portal.py

-                "record_date": pd.Timestamp(dividend_tuple[6], unit="s"),
-                "sid": dividend_tuple[7]
+                "declared_date":
+                    dividend_tuple[self._dividends_fields['declared_date']],


A possibly simpler option here would be to specify the column list in the SELECT rather than doing a SELECT *.
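
A minimal sketch of that suggestion, assuming a simplified table layout (the column set here is illustrative, not the real schema): naming the columns in the SELECT fixes the order of each result tuple regardless of how the table was created.

```python
import sqlite3

# Assumption: a simplified, in-memory stand-in for the dividends table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_dividend_payouts "
    "(sid INTEGER, declared_date INTEGER, record_date INTEGER, ratio REAL)"
)
conn.execute(
    "INSERT INTO stock_dividend_payouts VALUES (1, 1579305600, 1579392000, 0.5)"
)

# The explicit column list determines the order of every returned tuple.
COLUMNS = ("declared_date", "record_date", "ratio", "sid")
query = "SELECT {} FROM stock_dividend_payouts WHERE sid = ?".format(
    ", ".join(COLUMNS)
)

# Pair each value with its column name so callers never index by position.
rows = [dict(zip(COLUMNS, row)) for row in conn.execute(query, (1,))]
```

The column names are fixed constants here, so formatting them into the query string is safe; the sid is still passed as a bound parameter.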

ssanderson · 2020-01-21T14:02:23Z

zipline/data/data_portal.py

@@ -297,6 +297,17 @@ def __init__(self,
            if self._first_trading_day is not None else None
        )

+        # Store the location of the dividends table fields
+        if self._adjustment_reader is not None:
+            stock_dividend_payouts_fields = self._adjustment_reader.conn.\


Hmm. I'm not excited about adding more places in DataPortal where we're depending on using adjustment_reader.conn. The intent of the SQLiteAdjustmentReader class is to provide an interface to split and dividend data that hides the implementation details of how those values are actually stored. The fact that we're making SQL queries directly here is a bit awkward in that regard.

It looks to me like the only call site for get_stock_dividends is in the BenchmarkSource class, where we call it to check if a stock has any stock dividends (and if so, we raise an error). My guess is we're doing that because the benchmark source doesn't properly account for stock dividends.

So, given the above concerns, I think my ideal solution here would be to remove DataPortal.get_stock_dividends entirely in favor of a narrower method on SQLiteAdjustmentReader that can be used to query for whether or not a given sid has stock dividends. Thoughts?
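
To make the proposal concrete, here is a hedged sketch of what such a narrower method could look like. The class `AdjustmentReaderSketch` and the method name `has_stock_dividends` are hypothetical and do not exist in zipline; the table layout is a simplified assumption.

```python
import sqlite3


class AdjustmentReaderSketch:
    """Hypothetical stand-in for SQLiteAdjustmentReader (illustration only)."""

    def __init__(self, conn):
        self._conn = conn

    def has_stock_dividends(self, sid):
        """Return True if at least one stock dividend is stored for `sid`."""
        row = self._conn.execute(
            "SELECT 1 FROM stock_dividend_payouts WHERE sid = ? LIMIT 1",
            (sid,),
        ).fetchone()
        return row is not None


# Assumption: a simplified, in-memory stand-in for the dividends table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_dividend_payouts (sid INTEGER, ratio REAL)")
conn.execute("INSERT INTO stock_dividend_payouts VALUES (1, 0.5)")

reader = AdjustmentReaderSketch(conn)
sid_1_has_dividends = reader.has_stock_dividends(1)
sid_2_has_dividends = reader.has_stock_dividends(2)
```

A caller such as the benchmark check would then only need the boolean result and would not touch the connection or the column layout at all.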

samatix added 2 commits January 18, 2020 13:50

ENH: The function to get dividend stock shouldn't read from tuple as …

efeeb7f

…the order of the columns is not known. Adding a dictionary (fast) to get the id for the each datum

ENH: Replacing the deprecated function pd.TimeGrouper by pd.Grouper

1149d16

samatix changed the title ~~MAINT: The function get_stock_dividends shouldn't read from tuple~~ MAINT: get_stock_dividends shouldn't read from tuple and replacing pd.TimeGrouper by pd.Grouper Jan 18, 2020

STY: Style amendment on data_partal.py to pass flake8 tests

b4c4349

ssanderson reviewed Jan 21, 2020

samatix closed this Mar 20, 2023
