def runavg(ts, w): #4

Closed
robwschlegel opened this issue May 29, 2018 · 8 comments
Comments

@robwschlegel
Owner

'''

Performs a running average of an input time series using a uniform window of width w. This function assumes that the input time series is periodic.

Inputs:

  ts            Time series [1D numpy array]
  w             Integer length (must be odd) of running average window

Outputs:

  ts_smooth     Smoothed time series
'''
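The note only quotes the docstring. Below is a minimal sketch of a function that behaves as described. It assumes periodicity is handled by tiling the series three times and smoothing with a uniform convolution kernel. That approach is an assumption made for illustration, not a copy of the original Python source.

```python
import numpy as np


def runavg(ts, w):
    """Periodic running mean of a 1D array using a uniform window of odd width w.

    Sketch only: assumes periodicity is handled by tiling ts three times.
    """
    ts = np.asarray(ts, dtype=float)
    if w % 2 != 1:
        raise ValueError("w must be odd")
    n = len(ts)
    # Tile the series so windows near either end wrap around periodically
    tiled = np.concatenate([ts, ts, ts])
    # Convolve with equal weights; mode="same" keeps the output centred on the input
    smoothed = np.convolve(tiled, np.ones(w) / w, mode="same")
    # Keep only the central copy, which has the original length
    return smoothed[n:2 * n]


print(runavg([1, 2, 3, 4, 5], 3))
```

Because the series wraps, the first value is averaged with the last one (and vice versa), so the mean of the series is preserved.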
robwschlegel created this issue from a note in Implement same functionality as python code (In progress) May 29, 2018
@robwschlegel
Owner Author

Isn't this what smooth_percentile() does? I think this has been accounted for and may be closed.

@robwschlegel
Owner Author

This may actually be a problematic difference when missing values are present. I'm going to investigate this on Friday and report back.
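One way missing values can matter here: a plain convolution-based running mean propagates a single NaN into every window that touches it. The sketch below contrasts that with a NaN-aware running mean. The helper `runavg_nan` is hypothetical and shows only one possible way to treat gaps; whether either the Python or the R code does this is exactly what is being investigated.

```python
import numpy as np


def runavg_nan(ts, w):
    """Hypothetical NaN-aware periodic running mean (illustration only).

    Each output is the mean of the non-missing values in its window;
    windows with no valid values return NaN.
    """
    ts = np.asarray(ts, dtype=float)
    n = len(ts)
    tiled = np.concatenate([ts, ts, ts])
    valid = ~np.isnan(tiled)
    kernel = np.ones(w)
    # Sum of the valid values and number of valid values in each window
    sums = np.convolve(np.where(valid, tiled, 0.0), kernel, mode="same")
    counts = np.convolve(valid.astype(float), kernel, mode="same")
    with np.errstate(invalid="ignore", divide="ignore"):
        out = sums / counts
    return out[n:2 * n]


ts = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
n = len(ts)
# Plain convolution: every window containing the NaN comes out as NaN
plain = np.convolve(np.concatenate([ts, ts, ts]), np.ones(3) / 3, mode="same")[n:2 * n]
print(plain)
print(runavg_nan(ts, 3))
```

With one missing value and w = 3, the plain version loses three of the five smoothed values, while the NaN-aware version averages whatever is present in each window.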

@ajsmit
Collaborator

ajsmit commented Jan 24, 2019

What exactly are you referring to here: how it is implemented in Python, or in R?

@robwschlegel
Owner Author

I have discovered that as more data are missing in a time series, the two languages begin calculating the 90th percentile thresholds differently. At 10% missing data, these differences become large enough to begin affecting the number of events detected. I went over the R code in minute detail to track down where this happens, and next I need to go through the Python code. I think there are a couple of places where the languages may be handling missing values differently when calculating percentiles, but this looks like the most significant one.
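For reference, this is how missing values interact with percentile calculations in NumPy. The sketch uses synthetic data (not the data from this issue) and only illustrates the mechanics; it does not show which function either package actually calls.

```python
import numpy as np

rng = np.random.default_rng(42)
temps = rng.normal(20.0, 2.0, size=365)  # synthetic daily series, illustration only

# Knock out roughly 10% of the values to mimic missing data
missing = rng.random(365) < 0.10
temps_gappy = temps.copy()
temps_gappy[missing] = np.nan

# np.percentile does not skip NaN: any missing value makes the result NaN
p_naive = np.percentile(temps_gappy, 90)

# np.nanpercentile ignores NaN, i.e. it uses only the values that are present
p_nan = np.nanpercentile(temps_gappy, 90)
p_subset = np.percentile(temps_gappy[~np.isnan(temps_gappy)], 90)

print(p_naive, p_nan, p_subset)
```

So in NumPy, a NaN-aware percentile is equivalent to dropping the missing values first; any divergence between languages would then have to come from an earlier step, such as how the data in each window are collected or smoothed.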

@ajsmit
Collaborator

ajsmit commented Jan 24, 2019 via email

@robwschlegel
Owner Author

We have the old quantile function clim_calc() still in the package. It returns the exact same results as the C++ version, so that is good.
Apparently all of the high-level programming languages calculate quantiles differently... sigh. The R function quantile() provides 9 different options (type = 1 to 9). The default is type 7, which is what your C++ code does, too. Type 8 appears to be a bit closer to Python, but it does not come close to covering the gap created by the missing data. That's why I think the problem lies somewhere else: the quantile calculations themselves differ only by a very slim margin. My current thinking is that one language or the other may still count missing values in the overall n before calculating means, thereby reducing the returned value. That would, however, be very silly. But I can't currently think of any other explanation than that the running means in Python and R behave differently in the presence of missing data.
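To compare quantile definitions directly, here is a minimal pure-Python sketch of R's quantile() types 7 and 8, written from the formulas given in R's `?quantile` help page. The function name `quantile_r` is hypothetical, and missing values are assumed to have been removed before calling it.

```python
import math


def quantile_r(x, p, type_=7):
    """Sample quantile following R's quantile() types 7 and 8 (sketch).

    Uses the form h = n*p + m, with m = 1 - p (type 7) or
    m = (p + 1) / 3 (type 8). Missing values must be removed beforehand.
    """
    xs = sorted(x)
    n = len(xs)
    if type_ == 7:
        m = 1 - p
    elif type_ == 8:
        m = (p + 1) / 3
    else:
        raise ValueError("only types 7 and 8 are sketched here")
    h = n * p + m      # 1-based, fractional position in the sorted data
    j = math.floor(h)
    g = h - j          # interpolation weight between neighbouring order statistics
    # Clamp to the first/last order statistic at the ends
    lo = xs[min(max(j, 1), n) - 1]
    hi = xs[min(max(j + 1, 1), n) - 1]
    return (1 - g) * lo + g * hi


x = list(range(1, 11))
print(quantile_r(x, 0.9, 7), quantile_r(x, 0.9, 8))
```

For the values 1 to 10, the 90th percentile is 9.1 under type 7 and about 9.63 under type 8. NumPy documents its default `np.percentile` method (`linear`) as equivalent to type 7.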
So this Friday Eric and I will go through the source code of both languages step-by-step to see where the divergence occurs.
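The hypothesis in the comment above, that one implementation might still count missing values in n before taking a mean, can be illustrated with a small NumPy sketch. It only shows the arithmetic effect on a made-up window of values; it does not claim that either package does this.

```python
import numpy as np

window = np.array([21.0, 22.0, np.nan, 23.0, np.nan])  # made-up window with 2 gaps

# Mean over the values that are actually present (n = 3)
mean_valid = np.nanmean(window)

# Suspected failure mode: missing values are dropped from the sum
# but still counted in n (n = 5), which pulls the mean towards zero
mean_biased = np.nansum(window) / window.size

print(mean_valid, mean_biased)
```

Even two gaps in a five-value window lower the result substantially, so a bug of this kind would grow with the fraction of missing data.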
