-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up TimeSeriesImputerTransform
#217
Conversation
🚀 Deployed on https://deploy-preview-217--etna-docs.netlify.app |
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #217 +/- ##
=======================================
Coverage 89.03% 89.04%
=======================================
Files 199 199
Lines 13183 13200 +17
=======================================
+ Hits 11738 11754 +16
- Misses 1445 1446 +1 ☔ View full report in Codecov by Sentry. |
- If "seasonal" then replace missing dates using seasonal moving average | ||
- If "seasonal" then replace missing dates using seasonal moving average in autoregressive manner | ||
|
||
- If "seasonal_statistics" then replace missing dates using seasonal moving average on existing values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain why do you suggest this name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't come up with better naming, do you have any ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it isn't autorregressive how does it work with missing values? It isn't clear from the docs. We should clarify the differences.
Before submitting (must do checklist)
Flame graphs(before)
Proposed Changes
Rewrite NaNs restoring logic to masks(now we store binary mask for nans instead of
nan_timestamps
)Optimize
_fill
implementation for different strategies:SimpleImputer
from sklearn, rewrite it to work correctly on subset of segmentsMeanTransform
to obtain values to use for imputationFix bug in
TimeSeriesImputerTransform
created with window=-1 and seasonality!=1Add new strategy "seasonal_statistics" -- fill missing values using only existing values
Flame graphs(after)
Additional thoughts
update_columns_from_pandas
withset_columns_wide
to_pandas
takes most of the time, looks like we can stop optimizing on this point -- however we can think about optimizing the base operations as the might take large time at scaleCode to reproduce
Closing issues