Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add get_equidistant_timeseries() method #338

Merged
merged 13 commits into from
Nov 29, 2021
Merged

Add get_equidistant_timeseries() method #338

merged 13 commits into from
Nov 29, 2021

Conversation

dbrakenhoff
Copy link
Member

@dbrakenhoff dbrakenhoff commented Nov 11, 2021

Short Description

Add method for getting equidistant timeseries. Could be useful sometimes, but especially necessary for the stoffer_toloi test (see #336). This test was using pd.Series.asfreq to get an equidistant timeseries of the noise but in certain situations this dropped a lot of data, resulting in a calculation being performed on a subset of the noise. This new method addresses that, but it does perform some nearest interpolation. The old method did not interpolate, and only sampled existing data points that occur at the user-specified frequency.

I'm still looking for some input, as I'm not entirely sure whether we would prefer the rather complicated method I added, or something simpler using pandas methods with a few limitations as shown in the notebook. Also not sure if we want a notebook showcasing this method necessarily. So suggestions/feedback welcome on this PR!

Checklist before PR can be merged:

dbrakenhoff and others added 6 commits November 8, 2021 16:13
- especially fix issue for cases when first index timing is different to others
method will attempt use any unsampled points from original timeseries to fill some remaining NaNs in the new equidistant timeseries. This only happens in rare cases.
- improve filling with unused observations from original series
@dbrakenhoff dbrakenhoff added the enhancement Indicates improvement of existing features label Nov 11, 2021
@raoulcollenteur
Copy link
Member

Stoffer-toloi is for missing data, whereas many time series are irregular. One way to deal with the irregular time steps is to interpolate and get missing data time series. Another approach would be to conclude that the stoffer-toloi is not applicable. Yet another method would be is to select a smaller time step through the freq argument (perhaps this is the best).

I would prefer to make the interpolation in the stoffer-toloi test optional through a keyword-argument in that method, and give users control over that. The current PR makes a fundamental change without users being aware which should IMO be prevented.

@dbrakenhoff
Copy link
Member Author

dbrakenhoff commented Nov 24, 2021

I'm okay with making the behavior optional, but if we use the asfreq() method as an alternative, I think we need to add a warning when the resulting equidistant noise series is missing more than some percentage of the number of original noise values?

And to be clear, the current implementation in pastas will silently drop a bunch of noise values if the original timeseries timesteps are not equidistant, especially if the timestamp of the first observation is different from others in the timeseries. The method get_equidistant_series() attempts to prevent this loss of data by creating an equidistant series that takes as much data as possible from the original noise series. It does shift observations in time (slightly) to create this equidistant timeseries. I think the latter is preferable in situations when asfreq() would drop a lot of values. opti

Update: Also I propose changing the asfreq() method to check which time offset is the most common:

t_offset = ps.utils._get_time_offset(series.index, freq).value_counts().idxmax()
new_idx = pd.date_range(
    series.index[0].floor(freq) + t_offset,
    series.index[-1].floor(freq) + t_offset, 
    freq=freq
    )
series.reindex(new_idx)

This avoids losing a lot of data if the first observation happens to have a different time offset than the other observations.

EDIT: improve snippet using floor() instead of normalize()

- pick modified asfreq method OR
- get_equidistant_timeseries()
update example notebook with asfreq sampling method
@dbrakenhoff dbrakenhoff merged commit 8883396 into dev Nov 29, 2021
@raoulcollenteur raoulcollenteur deleted the asfreq branch December 1, 2021 09:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Indicates improvement of existing features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ENHANCEMENT] improve reindexing to equidistant timeseries in stoffer_toloi
3 participants