Add get_equidistant_timeseries() method #338
- especially fix the issue for cases where the timing of the first index differs from the others; the method will attempt to use any unsampled points from the original timeseries to fill some remaining NaNs in the new equidistant timeseries. This only happens in rare cases.
- improve filling with unused observations from the original series
- credit to @martinvonk for the examples
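To make the "fill remaining NaNs with unsampled points" idea concrete, here is a rough two-pass sketch in plain pandas. It is not the `get_equidistant_timeseries()` code from this PR; the helper name `fill_grid_two_pass` and the toy data are made up for illustration, and the target grid is passed in explicitly.

```python
import numpy as np
import pandas as pd


def fill_grid_two_pass(series, grid):
    """Hypothetical sketch, not the get_equidistant_timeseries() code in this PR.

    Places observations of an irregular ``series`` on an equidistant ``grid``
    without using any observation twice.
    """
    out = pd.Series(np.nan, index=grid)
    # Pass 1: copy observations whose timestamps lie exactly on the grid
    on_grid = series.index.isin(grid)
    out.loc[series.index[on_grid]] = series.to_numpy()[on_grid]
    # Pass 2: each unsampled observation (in time order) may fill the grid
    # point nearest to it, but only if that grid point is still NaN
    nearest_pos = grid.get_indexer(series.index, method="nearest")
    for i in np.flatnonzero(~on_grid):
        j = nearest_pos[i]
        if np.isnan(out.iloc[j]):
            out.iloc[j] = series.iloc[i]
    return out


obs = pd.Series(
    [0.0, 1.0, 9.0, 2.0, 3.0],
    index=pd.DatetimeIndex(
        [
            "2020-01-01 00:00",
            "2020-01-02 00:00",
            "2020-01-02 05:00",  # nearest grid point is already taken
            "2020-01-03 10:00",  # fills the empty 2020-01-03 grid point
            "2020-01-04 00:00",
        ]
    ),
)
grid = pd.date_range("2020-01-01", "2020-01-04", freq="D")
filled = fill_grid_two_pass(obs, grid)  # values: 0.0, 1.0, 2.0, 3.0
```

Exact matches always win, so the unsampled points only ever fill gaps; the observation at 05:00 stays unused because its nearest grid point already holds a value.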
The Stoffer-Toloi test is for missing data, whereas many time series are irregular. One way to deal with the irregular time steps is to interpolate and obtain a time series with missing data. Another approach would be to conclude that the Stoffer-Toloi test is not applicable. Yet another option would be to select a smaller time step through the … I would prefer to make the interpolation in the Stoffer-Toloi test optional through a keyword argument in that method, and give users control over it. The current PR makes a fundamental change without users being aware of it, which IMO should be prevented.
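A minimal sketch of what an opt-in keyword could look like, assuming the default keeps the current `asfreq` sampling. The function `equidistant_noise` and its `interpolate` keyword are hypothetical and not part of the pastas API; nearest-neighbour matching within half a time step is just one possible choice for the opt-in path.

```python
import pandas as pd


def equidistant_noise(noise, freq, interpolate=False):
    """Hypothetical helper; the names and the keyword are illustrative only.

    interpolate=False: sample with ``asfreq``, i.e. keep only observations
    that lie exactly on the grid (no interpolation).
    interpolate=True: opt in to nearest-neighbour matching, accepting an
    observation only if it lies within half a time step of a grid point.
    """
    if not interpolate:
        return noise.asfreq(freq)
    grid = pd.date_range(noise.index[0], noise.index[-1], freq=freq)
    half_step = (grid[1] - grid[0]) / 2
    return noise.reindex(grid, method="nearest", tolerance=half_step)


idx = pd.DatetimeIndex(
    ["2020-01-01 00:00", "2020-01-02 00:00", "2020-01-03 03:00", "2020-01-04 00:00"]
)
noise = pd.Series([0.1, -0.2, 0.3, -0.1], index=idx)

default = equidistant_noise(noise, "D")  # 2020-01-03 is NaN
opted_in = equidistant_noise(noise, "D", interpolate=True)  # 2020-01-03 gets 0.3
```

With this design, existing users see no change unless they pass the keyword explicitly. The sketch still anchors the grid on the first timestamp, so it does not address the first-offset issue on its own.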
I'm okay with making the behavior optional, but if we use the …

And to be clear, the current implementation in pastas will silently drop a lot of noise values if the timesteps of the original timeseries are not equidistant, especially if the timestamp of the first observation differs from the others in the timeseries. The method …

Update: I also propose changing the …

```python
import pandas as pd
import pastas as ps

# `series` (with a DatetimeIndex) and `freq` are assumed to be defined already
# Most common time offset of the observations relative to the `freq` grid
t_offset = ps.utils._get_time_offset(series.index, freq).value_counts().idxmax()
new_idx = pd.date_range(
    series.index[0].floor(freq) + t_offset,
    series.index[-1].floor(freq) + t_offset,
    freq=freq,
)
series.reindex(new_idx)
```

This avoids losing a lot of data if the first observation happens to have a different time offset than the other observations. EDIT: improve snippet using …
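To make the data loss concrete, here is a small pandas-only sketch on a toy series. It assumes that `series.index - series.index.floor(freq)` is equivalent to what `ps.utils._get_time_offset` returns; the helper name `reindex_modal_offset` is hypothetical.

```python
import pandas as pd


def reindex_modal_offset(series, freq):
    """Hypothetical helper wrapping the proposed snippet, using plain pandas.

    The offset computation stands in for ps.utils._get_time_offset (assumed
    equivalent: timestamp minus timestamp floored to ``freq``).
    """
    offsets = series.index - series.index.floor(freq)
    t_offset = offsets.value_counts().idxmax()  # most common offset
    new_idx = pd.date_range(
        series.index[0].floor(freq) + t_offset,
        series.index[-1].floor(freq) + t_offset,
        freq=freq,
    )
    return series.reindex(new_idx)


# Daily observations at 12:00, except the first one, which was taken at 09:00
idx = pd.DatetimeIndex(["2020-01-01 09:00"]).append(
    pd.date_range("2020-01-02 12:00", periods=5, freq="D")
)
series = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], index=idx)

# asfreq builds its grid from the first timestamp (09:00), so the five
# observations at 12:00 are all dropped: only 1 of 6 values survives
via_asfreq = series.asfreq("D")
# Shifting the grid by the most common offset (12:00) keeps 5 of 6 values;
# only the odd first observation at 09:00 is lost
via_offset = reindex_modal_offset(series, "D")
```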
- pick modified asfreq method OR get_equidistant_timeseries()
- update example notebook with asfreq sampling method
Short Description
Add a method for getting an equidistant timeseries. This could be useful in general, but it is especially necessary for the `stoffer_toloi` test (see #336). That test was using `pd.Series.asfreq` to get an equidistant timeseries of the noise, but in certain situations this dropped a lot of data, so the calculation was performed on only a subset of the noise. This new method addresses that, but it does perform some nearest interpolation. The old method did not interpolate; it only sampled existing data points that occur at the user-specified frequency.

I'm still looking for some input, as I'm not entirely sure whether we would prefer the rather complicated method I added, or something simpler using pandas methods with a few limitations, as shown in the notebook. I'm also not sure whether we necessarily want a notebook showcasing this method. So suggestions and feedback on this PR are welcome!
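To illustrate the trade-off between the old sampling and nearest interpolation, here is a small pandas-only sketch. The toy data are made up, and `reindex(method="nearest")` is only one of the simpler pandas options, not necessarily the one used in the notebook.

```python
import pandas as pd

idx = pd.DatetimeIndex(
    ["2020-01-01 00:00", "2020-01-02 00:00", "2020-01-03 10:00", "2020-01-05 00:00"]
)
series = pd.Series([0.0, 1.0, 2.0, 3.0], index=idx)

# Old behaviour: asfreq only keeps observations that sit exactly on the
# daily grid, so the 2020-01-03 10:00 value is dropped entirely
sampled = series.asfreq("D")  # 0.0, 1.0, NaN, NaN, 3.0

# Nearest-neighbour reindex: every grid point gets a value, but the
# 2020-01-03 10:00 observation is reused for both 01-03 and 01-04
grid = pd.date_range(series.index[0], series.index[-1], freq="D")
nearest = series.reindex(grid, method="nearest")  # 0.0, 1.0, 2.0, 2.0, 3.0
```

The duplicated value is one of the limitations of the simple pandas route: repeating a noise value could inflate the autocorrelation that the `stoffer_toloi` test is based on.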
Checklist before PR can be merged: