
Top-level API for time grouping #49

Closed
jhamman opened this issue Oct 20, 2020 · 5 comments · Fixed by #51

Comments

@jhamman
Member

jhamman commented Oct 20, 2020

This package has a growing number of methods that rely on a DatetimeIndex inside the estimators. I'm beginning to think this design decision is complicating the implementation of individual methods (see @dgergel's great and persistent attempts in #28 as an example) and that we should consider stepping back from the current approach. The use of DatetimeIndexes inside the estimators is also the main way these estimators diverge from full scikit-learn compatibility. In this issue, I'll propose a new top-level API that supports using time indexes for grouped training/prediction outside individual regressors/transformers.

I'll start by outlining a few common time-grouping approaches:

My proposal is that we develop (or in some cases, continue to develop) a series of grouper classes that perform these (sometimes) esoteric grouping operations and that we utilize these grouper classes with an API object that supports training and prediction.

The grouper concept may look something like this:

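```python
# `ds` is an xarray.Dataset and `df` a pandas.DataFrame, both indexed by time
index = ds.indexes['time']
# or
index = df.index

group_iter = PaddedDOYGrouper(index, window=15)  # proposed grouper class
```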

Under the hood, these groupers would support iteration like this:

```python
# pandas: positional row selection
for inds in group_iter:
    df_group = df.iloc[inds]

# xarray: positional selection along the time dimension
for inds in group_iter:
    ds_group = ds.isel(time=inds)
```
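To make the iteration protocol concrete, here is a minimal sketch of what a padded day-of-year grouper could look like. `PaddedDOYGrouper` does not exist yet; its `groups()` method and the circular-distance rule are assumptions for illustration. It yields plain integer position arrays, which is what `df.iloc[inds]` and `ds.isel(time=inds)` both accept.

```python
import numpy as np
import pandas as pd


class PaddedDOYGrouper:
    """Hypothetical padded day-of-year grouper (a sketch, not an existing class).

    For each day of year present in `index`, yields the integer positions of
    every timestamp whose day of year is within +/- `window` days of it.
    """

    def __init__(self, index, window=15):
        self.window = window
        # Day of year (1..366) for each timestamp, as a plain NumPy array
        self._doy = np.asarray(pd.DatetimeIndex(index).dayofyear)

    def groups(self):
        """Yield (day_of_year, positions) pairs."""
        for doy in np.unique(self._doy):
            diff = np.abs(self._doy - doy)
            # Circular distance so windows wrap across Dec 31 / Jan 1.
            # Assumes a 366-day year, so wrapped windows in non-leap
            # years come out one day narrower.
            dist = np.minimum(diff, 366 - diff)
            yield int(doy), np.flatnonzero(dist <= self.window)

    def __iter__(self):
        # Iteration yields only the positions, as in df.iloc[inds] / ds.isel(time=inds)
        for _, inds in self.groups():
            yield inds


# Usage with a daily pandas DataFrame
index = pd.date_range("2000-01-01", "2002-12-31", freq="D")
df = pd.DataFrame({"tas": np.arange(len(index), dtype=float)}, index=index)

group_iter = PaddedDOYGrouper(df.index, window=15)
for inds in group_iter:
    df_group = df.iloc[inds]  # all rows within +/- 15 days of one day of year
```

Keeping the grouper ignorant of the data (it only sees the index) is what lets the same object drive both the pandas and xarray loops.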

We could then write a simple model API that combines the grouper object with proper sklearn-compatible regressors/pipelines:

```python
arrm_model = GroupedRegressor(estimator=PiecewiseLinearRegression, grouper=PaddedDOYGrouper)

arrm_model.fit(X_df, y_df)  # -> fits one PiecewiseLinearRegression per group produced by the grouper
...
```
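A rough sketch of how `GroupedRegressor` might route training and prediction, assuming groupers expose a hypothetical `groups()` method (training positions per key) and `labels()` method (the key each row is predicted with). A tiny least-squares class stands in for `PiecewiseLinearRegression` so the example runs without scikit-learn; a minimal padded day-of-year grouper is repeated so the block is standalone. None of these names are an existing API.

```python
import numpy as np
import pandas as pd


class PaddedDOYGrouper:
    """Hypothetical grouper: padded day-of-year training windows."""

    def __init__(self, index, window=15):
        self.window = window
        self._doy = np.asarray(pd.DatetimeIndex(index).dayofyear)

    def labels(self):
        # Group key of every timestamp; used to route rows at predict time
        return self._doy

    def groups(self):
        # (key, positions of the +/- window training rows), wrapping at year end
        for doy in np.unique(self._doy):
            diff = np.abs(self._doy - doy)
            dist = np.minimum(diff, 366 - diff)
            yield int(doy), np.flatnonzero(dist <= self.window)


class LeastSquares:
    """Stand-in for an sklearn-style regressor (fit/predict on 2-D X)."""

    @staticmethod
    def _design(X):
        # Prepend an intercept column
        X = np.asarray(X, dtype=float)
        return np.column_stack([np.ones(len(X)), X])

    def fit(self, X, y):
        A = self._design(X)
        self.coef_, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
        return self

    def predict(self, X):
        return self._design(X) @ self.coef_


class GroupedRegressor:
    """Hypothetical: fits one estimator per group produced by `grouper`."""

    def __init__(self, estimator, grouper, **grouper_kwargs):
        self.estimator = estimator  # estimator class, instantiated once per group
        self.grouper = grouper      # grouper class, built from the time index
        self.grouper_kwargs = grouper_kwargs

    def fit(self, X, y):
        g = self.grouper(X.index, **self.grouper_kwargs)
        self.models_ = {
            key: self.estimator().fit(X.iloc[inds], y.iloc[inds])
            for key, inds in g.groups()
        }
        return self

    def predict(self, X):
        labels = self.grouper(X.index, **self.grouper_kwargs).labels()
        out = np.empty(len(X))
        for key in np.unique(labels):
            mask = labels == key
            # Raises KeyError if a key was never seen during fit
            out[mask] = self.models_[int(key)].predict(X.iloc[mask])
        return pd.Series(out, index=X.index)


# Usage: y depends linearly on x, so every per-day model should recover it
index = pd.date_range("2000-01-01", "2001-12-31", freq="D")
rng = np.random.default_rng(0)
X_df = pd.DataFrame({"x": rng.normal(size=len(index))}, index=index)
y_df = 2.0 * X_df["x"] + 1.0

arrm_model = GroupedRegressor(estimator=LeastSquares, grouper=PaddedDOYGrouper, window=15)
arrm_model.fit(X_df, y_df)
y_pred = arrm_model.predict(X_df)
```

Training groups overlap (each day appears in roughly 31 windows), but every row is predicted by exactly one model, the one for its own day of year. A fully sklearn-compatible version would more likely take an estimator instance and copy it per group with `sklearn.base.clone`, and declare its parameters explicitly so `get_params`/`set_params` work; that is left out of this sketch.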

If done correctly, I think we can share the groupers between pandas and xarray applications, allowing us to use them either at the PointWiseDownscaler level or for individual points.
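Sharing works because a grouper only reads the time index and hands back integer positions. The sketch below uses a hypothetical `MonthGrouper` (not an existing class) and applies the same positions to a pandas DataFrame and to a raw NumPy array laid out along time; the array stands in for an xarray Dataset, whose `isel(time=inds)` accepts the same integer arrays, so the example has no xarray dependency.

```python
import numpy as np
import pandas as pd


class MonthGrouper:
    """Hypothetical grouper: one group of integer positions per calendar month.

    It only reads the time index, never the data, so the positions it yields
    can index any container laid out along time.
    """

    def __init__(self, index):
        self._month = np.asarray(pd.DatetimeIndex(index).month)

    def __iter__(self):
        for month in np.unique(self._month):
            yield np.flatnonzero(self._month == month)


index = pd.date_range("2001-01-01", "2001-12-31", freq="D")
values = np.arange(len(index), dtype=float)
df = pd.DataFrame({"tas": values}, index=index)

# Same grouper, two containers: pandas positional indexing and a raw array
# (for an xarray Dataset, ds.isel(time=inds) takes the same integer array).
pandas_means = [df.iloc[inds]["tas"].mean() for inds in MonthGrouper(df.index)]
array_means = [values[inds].mean() for inds in MonthGrouper(index)]
```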


@jukent and @dgergel - I'm curious to hear from you on the potential feasibility of this approach for the methods you have developed here. Am I missing anything that would keep us from executing on this sort of API?

@jukent
Contributor

jukent commented Oct 21, 2020

I think this is a great idea and doable. Maybe we should set up a meeting at the end of next week?

@dgergel
Contributor

dgergel commented Oct 21, 2020

I agree with @jukent, I think this is a great idea and a good way forward for dealing with the DatetimeIndex issues that have been getting increasingly complicated with the newer grouper classes. Maybe we can aim to schedule a call next Thursday or Friday?

@dgergel
Contributor

dgergel commented Oct 26, 2020

@jhamman @jukent dropping in this when2meet to find a time to discuss this either later this week or sometime next week.

@jhamman
Member Author

jhamman commented Oct 27, 2020

I'm out of commission this week but would be happy to chat about this next week. I filled out your poll. Thanks for setting it up.

@jukent
Contributor

jukent commented Oct 27, 2020

I also filled out the poll. Thanks for putting that together.

jhamman mentioned this issue Nov 2, 2020

3 participants