
Error in approxfun(vals, cumsum(tabulate(match(x, vals)))/(n + 1), method = "linear", : need at least two non-NA values to interpolate #195

Closed
b-jorges opened this issue Jul 20, 2020 · 10 comments

Comments

b-jorges commented Jul 20, 2020

Heyho,

I am getting the following error for one specification of a GLMM:

GLMM_RandomIntercepts_PSE = glmer(cbind(Yes, Total - Yes) ~ ConditionOfInterest*Difference + (1| ID), 
                                  family = binomial(link = "probit"),
                                  data = Psychometric)

I tried playing around a bit:

  • adding random slopes for "Difference" per ID (didn't work),
  • swapping the probit link for a logit link, keeping the random effects as is (worked),
  • adding random slopes for "Difference" per ID AND swapping the probit link for a logit link (didn't work).

For context: I am working on guidelines for how to best analyze a particular type of data, so I am evaluating the "best" model specification from several perspectives, one of which is the extent to which each specification meets the model assumptions. It would therefore be extremely useful to be able to compare all relevant specifications.

Here is a (trimmed) version of the analyses (includes simulation of data), including the four cases I am mentioning:
https://github.com/b-jorges/Power-Analyses-Psychophysics/blob/master/Sample%20Code.R

Do you see any simple fix for that?

Thanks,
Björn

PS: Awesome package!

florianhartig (Owner) commented:

OK, for future reference, I'll paste the reproducible example here:

require(dplyr)
require(tidyverse)
require(lme4)
require(DHARMa)


set.seed(1)


ID = paste0("S0",1:10)
ConditionOfInterest = c(0,1)
StandardValues = c(5,6,7,8)
reps = 1:100

PSE_Difference = 0.5
JND_Difference = 0.5

Multiplicator_PSE_Standard = 0
Multiplicator_SD_Standard = 0.15
Type_ResponseFunction = "Normal"
SD_ResponseFunction = 0.1
Mean_Variability_Between = 0.1
SD_Variability_Between = 0.1


Psychometric = expand.grid(ID=ID, ConditionOfInterest=ConditionOfInterest, StandardValues=StandardValues, reps = reps)

Psychometric = Psychometric %>%
  group_by(ID) %>%
  mutate(PSE_Factor_ID = rnorm(1,1,Mean_Variability_Between), # how much variability is in the means of the psychometric functions between subjects?
         SD_Factor_ID = rnorm(1,1,SD_Variability_Between)) # how much variability is in the standard deviations of the psychometric functions between subjects?

Psychometric = Psychometric %>%
  mutate(
    Mean_Standard = StandardValues+StandardValues*Multiplicator_PSE_Standard,
    SD_Standard = StandardValues*Multiplicator_SD_Standard,
    Mean = (Mean_Standard + (ConditionOfInterest==1)*Mean_Standard*PSE_Difference),
    SD = abs(SD_Standard + (ConditionOfInterest==1)*SD_Standard*JND_Difference))

Psychometric = Psychometric %>%
  mutate(
    Mean = Mean*PSE_Factor_ID,
    SD = SD*SD_Factor_ID)

if (Type_ResponseFunction == "Normal"){ # "Normal" (capitalized) matches the value set above
  
  Psychometric = Psychometric %>%
    mutate(
      staircase_factor = rnorm(length(reps),1,SD_ResponseFunction*(1+ConditionOfInterest*JND_Difference))) # rnorm (random draws), not pnorm
  
} else if (Type_ResponseFunction == "Cauchy"){
  
  Psychometric = Psychometric %>%
    mutate(
      staircase_factor = rcauchy(length(reps),1,SD_ResponseFunction*(1+ConditionOfInterest*JND_Difference)))
  
  #} else if (Type_ResponseFunction == "uniform"){
  #  
  #  Psychometric = Psychometric %>%
  #  mutate(
  #      staircase_factor = seq(SD_ResponseFunction[1],SD_ResponseFunction[2],(SD_ResponseFunction[2]-SD_ResponseFunction[1]/6)))
  
} else{
  
  print("distribution not valid")
  
}

Psychometric = Psychometric %>%
  mutate(
    staircase_factor = rcauchy(length(reps),1,SD_ResponseFunction), # note: this overwrites the staircase_factor set in the if/else above
    Presented_TestStimulusStrength = Mean*staircase_factor,
    Difference = Presented_TestStimulusStrength - StandardValues)

Psychometric = Psychometric %>%
  mutate(
    AnswerProbability = pnorm(Presented_TestStimulusStrength,Mean,SD),
    
    ##get binary answers ("Test was stronger" yes/no) from probabilities for each trial
    Answer = as.numeric(rbernoulli(length(AnswerProbability),AnswerProbability))
  )

###prepare for glmer()
Psychometric = Psychometric %>%
  filter(abs(staircase_factor-1) < 0.75) %>%
  group_by(ID,ConditionOfInterest,StandardValues,Difference) %>%
  mutate(Yes = sum(Answer==1),
         Total = length(ConditionOfInterest))




GLMM_RandomIntercepts_PSE = glmer(cbind(Yes, Total - Yes) ~ ConditionOfInterest + Difference + (1| ID), 
                                  family = binomial(link = "probit"),
                                  data = Psychometric)

Sim = simulateResiduals(GLMM_RandomIntercepts_PSE)
plot(Sim)

@florianhartig florianhartig changed the title Error for simulateResiduals() with glmerMOD object Error in approxfun(vals, cumsum(tabulate(match(x, vals)))/(n + 1), method = "linear", : need at least two non-NA values to interpolate Jul 20, 2020
florianhartig (Owner) commented:

The problem is that several data points in the simulation have zero variance. You can see this via

apply(getSimulations(GLMM_RandomIntercepts_PSE, 100), 1, var)

I think other users have had this problem before. I guess the best way to deal with it would be to set the residual to NA instead of throwing an error. I should have implemented that before and will do so now!
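To see why zero variance is fatal here, consider a hedged, self-contained sketch (the names below are illustrative, not DHARMa internals): an empirical CDF built via approxfun() needs at least two distinct support points, and a zero-variance simulation row provides only one.

```r
# Illustration (not DHARMa code): a zero-variance simulation row has only
# one unique value, which is too few support points for approxfun().
sims_row <- rep(3, 100)     # all 100 simulations for one observation are identical
vals <- unique(sims_row)
length(vals)                # 1 -> approxfun(vals, ...) would stop with
                            # "need at least two non-NA values to interpolate"
```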

florianhartig (Owner) commented:

After a bit of thinking, I guess it should evaluate to U[0,1] instead of NA. I have implemented this now and it seems to work OK; I will push it to master. If you install the development version, this should work now.

florianhartig (Owner) commented:

Note to self: this needs a small correction to check for < and >.


Istalan commented Aug 21, 2020

Hi,

This appears to be an issue stemming from the function DHARMa.ecdf. While I'm in favor of not calling it in the PIT algorithm, it would also be good to rewrite it so it can handle vectors with only one unique value. My proposal is to prepend a value just smaller than the data, min(x) - e, which gets a function value of 0:

DHARMa.ecdf <- function (x)
{
  x <- sort(x)
  n <- length(x)
  if (n < 1)
    stop(paste("DHARMa.ecdf - length vector < 1", x))
  e <- 10^-15 # numeric close to zero to identify a point just before the data with F(min(x) -e) = 0
  vals <- c(x[1] - e, unique(x))
 
  rval <- approxfun(vals, cumsum(tabulate(match(x, vals)))/ (n +1) ,
                    method = "linear", yleft = 0, yright = 1, ties = "ordered")
  class(rval) <- c("ecdf", "stepfun", class(rval))
  assign("nobs", n, envir = environment(rval))
  attr(rval, "call") <- sys.call()
  rval
}

This approach has some weirdness with regard to numerical precision, but if that is relevant to the data set, then R is probably not the right platform anyway. Also, weirdly, both this and the original function divide by n+1, which leads to F(max(x)) = 1 - 1/(n+1). This seems to be intentional, but I'm not all that convinced by the overall setup.

Here are my tests:

my_ecdf <- DHARMa.ecdf(rep(1, 250))

my_ecdf(-1)
my_ecdf(0)
e <- 10^-15 
my_ecdf(1-e)
# strange behaviour at the limit of numerical precision
my_ecdf(1-e/1.05)
my_ecdf(1-(e/1.06))
my_ecdf(1) # this also happens with the as-is version, since it divides by n+1
1-(1/251)
my_ecdf(1 + e) # making it slightly larger obtains the 1 

The default ecdf function behaves precisely as expected, but it is not symmetric or anything, so perhaps one could use something like (ecdf(x) + ecdf(x+e))/2. Sadly this doesn't work on its own, so a bit of additional effort would have to be invested.

my_ecdf <- ecdf(rep(1, 250))

my_ecdf(-1)
my_ecdf(0)
e <- 10^-15 
my_ecdf(1-e)
# numerical issues happen at smaller values
my_ecdf(1-e/10)
my_ecdf(1-(e/20))
my_ecdf(1) # takes the standard value of 1 at max(x)
1-(1/251)
my_ecdf(1 + e)
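The averaging idea mentioned above can be sketched as follows (purely illustrative, not a proposed patch): combine the standard right-continuous ecdf with a copy built on slightly shifted data, so the single unique value maps to 0.5 instead of 1.

```r
# Sketch only: average the standard ecdf with a copy shifted by e.
e <- 10^-15
x <- rep(1, 250)
F_right <- ecdf(x)       # right-continuous: F_right(1) = 1
F_shift <- ecdf(x + e)   # shifted data, all > 1: F_shift(1) = 0
F_mid <- function(q) (F_right(q) + F_shift(q)) / 2
F_mid(0)   # 0
F_mid(1)   # 0.5 at the single unique value, instead of 1
F_mid(2)   # 1
```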

florianhartig (Owner) commented:

Hi Lukas, I agree with not calling DHARMa.ecdf in the PIT calculation; this is indeed unnecessary. Your modifications may very well be a good idea, but at the moment my priority is to correct the current errors in DHARMa asap. I will thus open a new ticket for the ecdf and move this to the next commit.

I just modified the code, the issue above should be solved for a standard normal user!


James-Thorson-NOAA commented May 31, 2021

Hi all,

I just closed issue #286 but have some questions about the fix that was implemented here (which appears to be the same issue). Hopefully my questions are helpful and not annoying, and thanks for your attention.

My interpretation of PIT residuals is that:

  1. If all of simulatedResponse[i,] (i.e., the simulated approximation to the distribution for sample i) are less than observedResponse[i] (the observed value for sample i), then the quantile residual should be 1.
  2. Similarly, if all simulatedResponse[i,] are greater than observedResponse[i], then the quantile residual should be 0.

This is true even in instances where var(simulatedResponse[i,]) = 0, which were throwing an error. In such cases, the third possible scenario is:

  3. If all simulatedResponse[i,] are equal to observedResponse[i], then the quantile residual should be U(0,1).

I believe these three are a complete set of possible outcomes when var=0, and my interpretation of the proper fix appears to differ from some of the proposals discussed in the thread, i.e., using U(0,1) or fixing a value of 0 for all three cases when any simulated variance is zero (where by my logic these solutions are appropriate for some but not all such cases).

Do y'all want to discuss further here? If yes, are you willing to summarize what the ultimate fix was that was implemented?


Istalan commented May 31, 2021

Hi,

There was no need to apologize in the original thread; I just felt awkward answering without a preface.

Anyway, this issue was resolved by changing the PIT algorithm to no longer use the custom ecdf. You can find the current implementation here:

DHARMa/DHARMa/R/helper.R

Lines 87 to 93 in bcf9e1e

scaledResiduals = rep(NA, n)
for (i in 1:n){
  minSim <- mean(simulations[i,] < observed[i])
  maxSim <- mean(simulations[i,] <= observed[i])
  if (minSim == maxSim) scaledResiduals[i] = minSim
  else scaledResiduals[i] = runif(1, minSim, maxSim)
}

I agree with your interpretation, and I'm quite sure that we implemented that behavior. Plus, if all simulations and the observed value are identical, the residual is U[0, 1] distributed. The discussion on this issue is hard to follow, as it simultaneously took place in #197, which had a different focus.

Please feel free to double check the behavior of the current implementation. It can't hurt.
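For a quick standalone double check, the loop body from helper.R can be exercised on the three cases (sketch; `pit` here is just a hypothetical local wrapper, not a DHARMa export):

```r
# Standalone check of the PIT loop body (pit is a hypothetical wrapper).
pit <- function(sims, obs) {
  minSim <- mean(sims < obs)    # fraction of simulations strictly below the observation
  maxSim <- mean(sims <= obs)   # fraction of simulations at or below the observation
  if (minSim == maxSim) minSim else runif(1, minSim, maxSim)
}
pit(rep(1, 100), 2)       # all simulations below the observation -> 1
pit(rep(1, 100), 0)       # all simulations above the observation -> 0
r <- pit(rep(1, 100), 1)  # all equal to the observation -> a draw from U(0, 1)
```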

florianhartig (Owner) commented:

Hi @James-Thorson-NOAA - as @Istalan said, my reading is that the code is implementing the behavior that you describe.

James-Thorson-NOAA commented:

I agree, thanks for pointing me to the code; the snippet appears to do what I describe (and previous and ongoing use testing doesn't indicate any problems either).
