[ENH] end-to-end dask integration #4013
Another option (same spirit as option 1): pass to
hm, indeed. Just from naive inspection, these seem to be two ways to achieve the same end. Do you have a good grip on which of
@topher-lo, I would be keen to hear how you would design the dispatch inside
Definitely
@fkiraly FYI this is how nixtla/mlforecast implemented a distributed time series data structure: https://github.com/Nixtla/mlforecast/blob/main/nbs/distributed.core.ipynb
This PR moves the functionality for broadcast/vectorized application of estimator methods into a new method of `VectorizedDF` called `vectorize_est`. It also serves to explore possible end states of the refactor.

Reasons to move broadcast/vectorized application to `VectorizedDF`:
* duplication between transformers and forecasters, which implement very similar logic
* the current state can be seen as a violation of the Law of Demeter; the target state improves cohesion and reduces coupling between the base classes and `VectorizedDF`
* localized logic in `VectorizedDF` seems a more natural starting point for adding `dask` and scaling/distributed features, see #4013

Also contains a direct unit test for `vectorize_est`. Note that `vectorize_est` is also covered indirectly by the tests for vectorization functionality in transformers and forecasters, which also cover the external points of the refactor in `BaseForecaster` and `BaseTransformer`.
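A minimal sketch of the pattern being centralized, assuming a panel is represented as a plain dict from instance key to a list of values (a stand-in for `VectorizedDF`, which iterates over pandas-based instances). `MeanForecaster`, `broadcast_fit` and `broadcast_method` are hypothetical names for illustration only, not sktime's API and not the actual signature of `vectorize_est`; the point is the shared loop: clone the estimator once per instance, call the same method on each clone, and collect results by instance key.

```python
from copy import deepcopy
from typing import Any, Dict, Hashable, List


class MeanForecaster:
    """Hypothetical toy estimator: forecasts the mean of the training series."""

    def fit(self, y: List[float]) -> "MeanForecaster":
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, horizon: int) -> List[float]:
        return [self.mean_] * horizon


def broadcast_fit(template: Any, instances: Dict[Hashable, List[float]]) -> Dict[Hashable, Any]:
    """Hypothetical helper: fit an independent clone of `template` on each instance."""
    # deepcopy leaves the template unfitted; fit() returns the fitted clone.
    return {key: deepcopy(template).fit(y) for key, y in instances.items()}


def broadcast_method(fitted: Dict[Hashable, Any], method: str, **kwargs: Any) -> Dict[Hashable, Any]:
    """Hypothetical helper: call the same method, by name, on every fitted clone."""
    return {key: getattr(est, method)(**kwargs) for key, est in fitted.items()}


# Usage: two instances of different lengths, as in an unequal-length panel.
template = MeanForecaster()
panel = {"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0]}
fitted = broadcast_fit(template, panel)
preds = broadcast_method(fitted, "predict", horizon=2)
print(preds)  # {'a': [2.0, 2.0], 'b': [15.0, 15.0]}
```

Keeping this loop next to the data container, rather than repeating it in each base class, is the design choice argued for above; it would also be the single place where a parallel or `dask`-backed loop could later be swapped in.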
This issue collects a roadmap for achieving end-to-end integration with `dask` data containers at the framework (not estimator) level. For discussion and design.

* `dask`-based mtypes, especially for panel and hierarchical data
  * `dask` mtypes - part 1, `Series` #3554
  * `dask` mtypes - part 2, `Panel` and `Hierarchical` #4011
* `dask` data frames and avoid calls to `compute`. Possibly: `VectorizedDF`, which recognizes `dask` and handles vectorization via `dask` `groupby`
* `dask` natively, it is wrapped in a `delayed` call
* `dask` `groupby` for vectorization
* `dask` support in key transformers: lag, windowing, aggregation/summarization, temporal features
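The roadmap item on wrapping estimators that lack native `dask` support in a `delayed` call rests on one idea: record each per-instance call as a task and run nothing until results are explicitly computed. The sketch below illustrates that idea with the standard library only, applied to a lag transform (one of the key transformers listed). `Delayed`, `lag`, `vectorize_delayed` and `compute_all` are hypothetical names; this is a stand-in for the concept, not a reimplementation of `dask.delayed`. With `dask` itself, the analogous calls would be `dask.delayed(func)(...)` to build tasks and `dask.compute(...)` to execute them.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Hashable, List, Optional


class Delayed:
    """Hypothetical lazy task: stores a function call instead of running it."""

    def __init__(self, func: Callable[..., Any], *args: Any) -> None:
        self.func = func
        self.args = args

    def compute(self) -> Any:
        # Resolve any Delayed arguments first, then run this task.
        resolved = [a.compute() if isinstance(a, Delayed) else a for a in self.args]
        return self.func(*resolved)


def lag(y: List[float], k: int) -> List[Optional[float]]:
    """Lag transform for one instance: shift right by k, pad with None, keep length."""
    return ([None] * k + list(y))[: len(y)]


def vectorize_delayed(
    func: Callable[..., Any], instances: Dict[Hashable, List[float]], *args: Any
) -> Dict[Hashable, Delayed]:
    """Build one lazy task per instance; nothing is executed here."""
    return {key: Delayed(func, y, *args) for key, y in instances.items()}


def compute_all(tasks: Dict[Hashable, Delayed], max_workers: int = 4) -> Dict[Hashable, Any]:
    """Run all per-instance tasks on a thread pool and collect results by key."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {key: pool.submit(task.compute) for key, task in tasks.items()}
        return {key: fut.result() for key, fut in futures.items()}


panel = {"a": [1.0, 2.0, 3.0], "b": [5.0, 6.0]}
tasks = vectorize_delayed(lag, panel, 1)  # tasks recorded, no lag computed yet
result = compute_all(tasks)
print(result)  # {'a': [None, 1.0, 2.0], 'b': [None, 5.0]}
```

Threads are used here only because the toy work is trivial; the actual execution strategy (threads, processes, or a `dask` cluster) is what a `dask`-aware vectorization layer would presumably choose, and keeping the per-instance calls lazy is what allows it to avoid premature calls to `compute`.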