def runavg(ts, w): #4

robwschlegel · 2018-05-29T07:19:54Z

'''

Performs a running average of an input time series using a uniform window of width w. This function assumes that the input time series is periodic.

Inputs:

  ts            Time series [1D numpy array]
  w             Integer length (must be odd) of running average window

Outputs:

  ts_smooth     Smoothed time series
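
For reference, a minimal sketch of what a periodic running average along these lines could look like in NumPy. The name `runavg_periodic` and the padding approach are assumptions made for illustration; this is not taken from either package's source.

```python
import numpy as np

def runavg_periodic(ts, w):
    """Uniform running mean of odd width w, treating ts as periodic (illustrative sketch)."""
    ts = np.asarray(ts, dtype=float)
    if w < 1 or w % 2 == 0:
        raise ValueError("w must be a positive odd integer")
    half = w // 2
    # Wrap the series so that windows at either end borrow values from the other end.
    padded = np.pad(ts, half, mode="wrap")
    # 'valid' keeps only positions where the window fully overlaps the padded series,
    # which gives back exactly len(ts) values.
    return np.convolve(padded, np.ones(w) / w, mode="valid")

smoothed = runavg_periodic([1, 2, 3, 4, 5, 6], 3)
```

Here the first smoothed value is (6 + 1 + 2) / 3 = 3, because the window wraps around to the end of the series. Note that with a convolution like this, a single NaN in `ts` turns every window that contains it into NaN.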

robwschlegel · 2018-05-29T07:20:49Z

Isn't this what smooth_percentile() does? I think this has been accounted for and may be closed.

robwschlegel · 2019-01-23T19:40:36Z

This may actually be a problematic difference when missing values are present. I'm going to investigate this on Friday and report back.

ajsmit · 2019-01-24T04:13:28Z

What exactly are you referring to here? How it is implemented in Python, or in R?

robwschlegel · 2019-01-24T13:21:00Z

I have discovered that as more data are missing in a time series, the two languages begin calculating the 90th percentile thresholds differently. At 10% missing data these differences become large enough to begin affecting the number of events detected. I went over the R code in minute detail to track down where this happens, and next need to go through the Python code. I think there are a couple of places where the languages may be handling missing values differently when calculating percentiles. But it looks like this is the most.
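
As a small illustration of how missing values can change a percentile depending on how they are treated, here is a NumPy sketch. The temperature values are made up for the example, and nothing here is claimed to be what either package actually does.

```python
import numpy as np

# Made-up temperatures with two missing values (illustrative only).
temps = np.array([14.2, 15.1, np.nan, 16.8, 15.9, np.nan, 17.3, 16.1])

# Plain percentile: the NaNs propagate and the result is NaN.
strict = np.percentile(temps, 90)

# NaN-aware percentile: missing values are dropped before ranking.
lenient = np.nanpercentile(temps, 90)

# Dropping the NaNs by hand gives the same answer as the NaN-aware call.
dropped = np.percentile(temps[~np.isnan(temps)], 90)
```

If one code path drops missing values before ranking and another does not, the thresholds can only agree when the series is complete.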

ajsmit · 2019-01-24T15:16:17Z

The percentile function is written in C++... I hope it is not there. That'll be a pain to fix. Maybe in one of the tests substitute the C++ bit that I wrote with an R equivalent and see what happens.

robwschlegel · 2019-01-24T15:29:42Z

We have the old quantile function clim_calc() still in the package. It returns the exact same results as the C++ version, so that is good.
Apparently all of the high-level programming languages calculate quantiles differently... sigh. The R function quantile() provides 9 different options. The default is 7, which is what your C++ code does, too. Option 8 appears to be a bit closer to Python, but does not come close to covering the gap created by the missing data. That's why I think the problem lies somewhere else. The quantile calculations differ only by a very slim margin. My current thinking is that one language or the other may still count missing data in the overall n of data before finding means, thereby reducing the returned value. That would however be very silly. But I can't presently think of any other reason, except that the running means in Python and R behave differently in the presence of missing data.
So this Friday Eric and I will go through the source code of both languages step-by-step to see where the divergence occurs.
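
To make the difference between quantile options concrete, here is a sketch of the interpolation rule behind options 7 and 8, written out by hand in Python. The helper name `hf_quantile` and the sample are assumptions for illustration; `alpha` and `beta` are the plotting-position parameters that distinguish the two options (1 and 1 for option 7, 1/3 and 1/3 for option 8).

```python
import numpy as np

def hf_quantile(x, q, alpha, beta):
    """Quantile by linear interpolation between order statistics (illustrative sketch)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    # Zero-based fractional position in the sorted sample.
    h = n * q + alpha + q * (1.0 - alpha - beta) - 1.0
    # Clamp so the position never falls outside the sample.
    h = min(max(h, 0.0), n - 1.0)
    lo = int(np.floor(h))
    hi = min(lo + 1, n - 1)
    return x[lo] + (h - lo) * (x[hi] - x[lo])

sample = np.arange(1.0, 11.0)                      # 1, 2, ..., 10
type7 = hf_quantile(sample, 0.9, 1.0, 1.0)         # option 7: position (n - 1) * q
type8 = hf_quantile(sample, 0.9, 1 / 3, 1 / 3)     # option 8: median-unbiased positions
```

For this sample the two options give roughly 9.1 and 9.63, and `np.quantile(sample, 0.9)` with default settings agrees with the option 7 value. The gap between options shrinks as the sample grows, which is consistent with the quantile rule alone producing only a small difference.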

ajsmit · 2019-01-24T18:30:04Z

You said "language or the other may still count missing data in the overall n of data before finding means"... I don't think this is possible in R or C++ as the mean functions there do not work when NAs are present. It shouldn't work anywhere for that matter. Maybe it is that moving window function, which was quite tricky to implement. It is not the traditional moving window function, because it runs along a block of data (n doys x m years), and if missing values are in there it might be handled differently in python and R.
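
A rough sketch of a window running along a block of data like the one described above may help pin down where missing values could enter. It assumes a 2D array with one row per year and one column per day of year, pools all values within a few days of each doy across all years, and drops NaNs before taking the percentile. The function name, the wrap-around at the year boundary, and the NaN policy are all assumptions for illustration, not the R, C++, or Python implementation.

```python
import numpy as np

def windowed_percentile(block, half_width, pct):
    """Percentile per day of year, pooled over +/- half_width days and all years (sketch)."""
    block = np.asarray(block, dtype=float)
    n_years, n_doys = block.shape
    out = np.full(n_doys, np.nan)
    for d in range(n_doys):
        # Column indices of the window, wrapped around the end of the year.
        cols = np.arange(d - half_width, d + half_width + 1) % n_doys
        pooled = block[:, cols].ravel()
        # Assumed policy: drop missing values before ranking.
        pooled = pooled[~np.isnan(pooled)]
        if pooled.size:
            out[d] = np.percentile(pooled, pct)
    return out

block = np.arange(12, dtype=float).reshape(3, 4)   # 3 years x 4 doys, toy data
medians = windowed_percentile(block, 1, 50)
```

With this layout, a NaN changes both which values are pooled and how many there are, so two implementations that differ only in when they drop missing values could diverge as the share of missing data grows.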

robwschlegel created this issue from a note in Implement same functionality as python code (In progress) May 29, 2018

robwschlegel moved this from In progress to Done in Implement same functionality as python code Oct 30, 2023

robwschlegel closed this as completed Oct 30, 2023
