-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAINT: get_stock_dividends shouldn't read from tuple and replacing pd.TimeGrouper by pd.Grouper #2621
Conversation
…the order of the columns is not known. Adding a dictionary (fast) to get the id for the each datum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@samatix thanks for the PR! The pandas grouper change looks good to me. The adjustments change looks like it fixes a potential problem, but it also looks to me like the functionality it's fixing is nearly unused and was implemented somewhat questionably. My suggestion there would be to remove most of that functionatlity entirely in favor of a narrower method on SQLiteAdjustmentReader
.
@@ -649,19 +649,19 @@ def check_downsampled_term(self, term): | |||
|
|||
expected_results = { | |||
'year': (raw_term_results | |||
.groupby(pd.TimeGrouper('AS')) | |||
.groupby(pd.Grouper(freq='AS')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for this change.
"record_date": pd.Timestamp(dividend_tuple[6], unit="s"), | ||
"sid": dividend_tuple[7] | ||
"declared_date": | ||
dividend_tuple[self._dividends_fields['declared_date']], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A possibly simpler option here would be to specify the column list in the SELECT
rather than doing a SELECT *
.
@@ -297,6 +297,17 @@ def __init__(self, | |||
if self._first_trading_day is not None else None | |||
) | |||
|
|||
# Store the location of the dividends table fields | |||
if self._adjustment_reader is not None: | |||
stock_dividend_payouts_fields = self._adjustment_reader.conn.\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I'm not excited about adding more places in DataPortal
where we're depending on using adjustment_reader.conn
. The intent of the SQLiteAdjustmentReader
class is to provide an interface to split and dividend data that hides the implementation details of how those values are actually stored. The fact that we're making SQL queries directly here is a bit awkward in that regard.
It looks to me like the only call site for get_stock_dividends
is in the BenchmarkSource class, where we call it to check if a stock has any stock dividends (and if so, we raise an error). My guess is we're doing that because the benchmark source doesn't properly account for stock dividends.
So, given the above concerns, I think my ideal solution here would be to remove DataPortal.get_stock_dividends
entirely in favor of a narrower method on SQLiteAdjustmentReader
that can be used to query for whether or not a given sid has stock dividends. Thoughts?
The function get_stock_dividends shouldn't read from tuple as the order of the columns is not known. This lead to errors when upgrading the dependencies to higher releases.
Adding a dictionary (fast) to get the id for the each datum is safer.