Calabrese bias correction (Maximum likelihood) #28

damariszurell · 2018-05-04T08:55:22Z

Dear Sylvain,

we were looking at your package recently as we were interested whether anybody has implemented the bias correction in MEMs proposed by Calabrese et al. 2014. However, when looking at your code, this does not seem to implement the procedure as described by Calabrese, unless I am completely misinterpreting your code. Your code:

if (method == "MaximumLikelihood") {
# Maximum likelihood (Calabrese et al, 2014)
if (verbose)
cat("\n Local species richness computed by maximum likelihood adjustment. \n")
diversity.map <- mapDiversity(obj, method = 'bSSDM',
verbose = FALSE)$diversity.map
Richness <- .richness(obj)
SSDM_Richness <- values(mask(diversity.map, Richness))
SSDM_Richness <- SSDM_Richness[-which(is.na(SSDM_Richness))]
Richness <- values(Richness)
Richness <- Richness[-which(is.na(Richness))]
fit <- lm(Richness ~ SSDM_Richness)
a <- fit$coefficients[1]
b <- fit$coefficients[2]
diversity.map <- a + b * diversity.map
}

Your code seems to simply regress observed against predicted richness and then uses the LM coefficients to correct occurrence probabilities. This seems different from what Calabrese et al. suggested: 1) logit-transform the occurrence probabilities, 2) apply additive adjustment that depends on richness, 3) then inverse-logit-transform back to probability scale. The adjustment parameters are estimated via maximum likelihood (using the package poibin), identifying the set of adjustment parameters that maximises the probability of the site-specific species richness given the adjusted occurrence probabilities (which depend on the original occurrence probabilities and on the adjustment parameters). Unless I am completely mistaken, this is not what your code is doing?

Best, Damaris

sylvainschmitt · 2018-05-14T14:17:01Z

Dear Damaris,

First thank you for your code review and all my apologies for the mistake. In fact, the Calabrese method was just implemented at the end after a reviewer of the article ask for it and has not been reviewed nor tested in detail. So I would not be surprised of such a mistake and again I apologize. I kind of lack of time right now but effectively what the code does is using the linear model coefficients to correct occurrence probabilities. So if you have the time, I would be glad to review a pull request from you to do the correction. If not I will try to correct it but latter.

Best,

Sylvain

damariszurell · 2018-05-15T18:15:15Z

Dear Sylvain,

thanks for your response. I am not actually using your package as I have my own class of models that is not transformed easily into your objects. Thus, pulling and changing your code is impractical for me. Anyway, I find it important that the method is revised and people do not mistake it for the actual Calabrese correction.
I can send you my code for the correction that you could adapt. I checked with Justin Calabrese that it does the right thing. Justin's code is actually slightly faster, but I don't feel legitimate to post it here.

Best, Damaris

#--------------------------------------------------
# helper functions and maximum likelihood function:
logit = function(x) {x=ifelse(x<0.0001,0.0001,ifelse(x>0.9999,.9999,x));   ;log(x/(1 - x))}

invlogit = function(x) {exp(x)/(1+exp(x))}

nLL.Calabrese <- function(par,sr,probs) {
	require(poibin)
	
	bysite <- function(j) {
		logit.probs <- logit(as.numeric(probs[j,]))
		corr.probs <- invlogit( logit.probs + par[1]*sr[j] + par[2] )
		log(dpoibin(sr[j],as.numeric(corr.probs)))
	}
	- sum(sapply(seq_len(length(sr)),bysite)) 	# optim will perform minimization but we aim for maximum likelihood and thus invert
	}

#-----------------------------------------
# find the maximum likelihood estimates for adjustment parameters:
adj.par <- optim(par=c(0,0), fn=nLL.Calabrese, sr=obs.sr, probs=pred.prob) 
# pred.prob is a  site-by-species data.frame of occurrence probabilities
# obs.sr is a vector of species richness

# adjust the predicted probabilities and stack:
corr.sr <- rowSums(apply(pred.prob,2,FUN=function(x){invlogit(logit(x)+adj.par$par[1]*obs.sr+adj.par$par[2])}))

lukasbaumbach · 2019-07-04T07:53:26Z

Working on it.
Also, trying to use method= "MaximumLikelihood" currently fails, since there is a typo in the checkargs function (expects "MaximumLikelyhood"). The typo doesn't exist in the mapDiversity function, but checkargs blocks it before it can run.

sylvainschmitt · 2019-07-04T08:01:11Z

Wow, it's kind of the issue of not having everything under formal testing. To be fixed thus. But to be honest part of the mapDiversity methods have been requested by reviewers and implemented a bit in a hurry during my master 2 thesis, when I was working completely on a different subject. So I'm really sorry for that kind of mistakes but they unfortunately don't surprise me (I'm just happy they are not numerous).

lukasbaumbach · 2019-07-04T11:30:06Z

Don't worry, for a master's thesis the scope of this project is already quite excessive.

Ok, so I just tried implementing Damaris'/Justin's code in the following way (for now just adding it as a new method):

if(method=="Calabrese") {
    if (verbose)
      cat("\n Local species richness computed by maximum likelihood adjustment with Calabrese bias correction. \n")
    diversity.map <- mapDiversity(obj, method = 'bSSDM',
                                  verbose = FALSE)$diversity.map
    Richness <- .richness(obj)
    # test data preparation
    ind_notNA <- Which(!is.na(Richness),cells=TRUE)
    pred.prob <- as.data.frame(sapply(obj@esdms, function(x) x@projection[ind_notNA]))
    obs.sr <- Richness[ind_notNA]
    
    ### Start of code from Damaris Zurell/Justin Calabrese (minor edits)
    library(poibin) # should be called earlier
    # helper functions and maximum likelihood function
    logit = function(x) {x=ifelse(x<0.0001,0.0001,ifelse(x>0.9999,.9999,x));   ;log(x/(1 - x))}
    invlogit = function(x) {exp(x)/(1+exp(x))}
    
    nLL.Calabrese <- function(par,sr,probs) {
      bysite <- function(j) {
        logit.probs <- logit(as.numeric(probs[j,]))
        corr.probs <- invlogit( logit.probs + par[1]*sr[j] + par[2] )
        log(dpoibin(sr[j],as.numeric(corr.probs)))
      }
      - sum(sapply(seq_len(length(sr)),bysite)) 	# optim performs minimization, so for maximum likelihood we need to invert
    }
    
  # find the maximum likelihood estimates for adjustment parameters
    adj.par <- optim(par=c(0,0), fn=nLL.Calabrese, sr=obs.sr, probs=pred.prob) 
    # pred.prob is a site-by-species data.frame of occurrence probabilities
    # obs.sr is a vector of species richness
    
    # adjust the predicted probabilities and stack
    corr.sr <- rowSums(apply(pred.prob,2,FUN=function(x){invlogit(logit(x)+adj.par$par[1]*obs.sr+adj.par$par[2])}))
    ###
    diversity.map[ind_notNA] <- corr.sr
  }

I ran an example with your original method="MaximumLikelihood" against the "Calabrese" method and found this:

Assuming I did everything right, there are a couple of things to note here: 1) the patterns are similar, but the Calabrese method mainly resulted in lower probabilities, 2) the value range is very wide for Calabrese (0 to 5), 3) the high values of Calabrese match the locations of high observed species richness (they show up as white dots, since they are outside the plotting range of 0-0.04) and 4) the previous maximum likelihood approach even produced negative values in some spots (that's where the white color on the left comes from).

Any thoughts?

sylvainschmitt · 2019-07-04T11:38:02Z

I think you did a good job ! It's not at all the same approach. The previous implementation was a simple linear regression used to adjust the richness prediction, and not at all the method described by Calabrese. Damaris used correctly their method (I think they even published with it) and was kind enough to furnish their code. So let's integrate your implementation to SSDM. The only "bad side" is that it makes us add a new dependency poibin, but that's not really an issue. Do you see how correctly import the needed functions in the Roxygen skeleton of the corresponding R file ? Please import only the needed functions and not the whole package. And yes their method is much more restrictive compared to the previous raw calculation of richness. Moreover knowing a bit the north of New Caledonia I think their prediction is more realistic (knowing the example use degraded data).

lukasbaumbach · 2019-07-04T11:59:59Z

Thanks. I'd like to see what the result looks like for a larger dataset with more co-occurences, I guess this can change the picture quite a bit. Do I get it right, that the above maps show the occurrence probability of all species occurring together in one site?

And yes, regarding your comment on package dependencies:
#' @importFrom stats lm optim
#' @importFrom poibin dpoibin

lukasbaumbach · 2019-07-17T12:26:37Z

Should the old 'MaximumLikelihood' stacking method either be removed and replaced by the real one or be renamed (e.g. something like 'LinearRegression') and the real Calabrese method gets its name ('MaximumLikelihood' or 'Calabrese')?

sylvainschmitt · 2019-07-18T06:08:27Z

The old 'MaximumLikelihood' stacking method should be replaced by the real one.

lukasbaumbach · 2020-04-06T13:56:18Z

implemented since v0.2.6.9001

sylvainschmitt · 2020-04-06T13:59:45Z

Thanks for the cleaning of Issues ;)

sylvainschmitt added the enhancement label May 14, 2018

sylvainschmitt added this to Low priority in General Feb 22, 2019

lukasbaumbach closed this as completed Apr 6, 2020

General automation moved this from Low priority to Closed Apr 6, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Calabrese bias correction (Maximum likelihood) #28

Calabrese bias correction (Maximum likelihood) #28

damariszurell commented May 4, 2018

sylvainschmitt commented May 14, 2018

damariszurell commented May 15, 2018 •

edited

lukasbaumbach commented Jul 4, 2019

sylvainschmitt commented Jul 4, 2019

lukasbaumbach commented Jul 4, 2019

sylvainschmitt commented Jul 4, 2019 •

edited

lukasbaumbach commented Jul 4, 2019

lukasbaumbach commented Jul 17, 2019

sylvainschmitt commented Jul 18, 2019

lukasbaumbach commented Apr 6, 2020

sylvainschmitt commented Apr 6, 2020

Calabrese bias correction (Maximum likelihood) #28

Calabrese bias correction (Maximum likelihood) #28

Comments

damariszurell commented May 4, 2018

sylvainschmitt commented May 14, 2018

damariszurell commented May 15, 2018 • edited

lukasbaumbach commented Jul 4, 2019

sylvainschmitt commented Jul 4, 2019

lukasbaumbach commented Jul 4, 2019

sylvainschmitt commented Jul 4, 2019 • edited

lukasbaumbach commented Jul 4, 2019

lukasbaumbach commented Jul 17, 2019

sylvainschmitt commented Jul 18, 2019

lukasbaumbach commented Apr 6, 2020

sylvainschmitt commented Apr 6, 2020

damariszurell commented May 15, 2018 •

edited

sylvainschmitt commented Jul 4, 2019 •

edited