Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calabrese bias correction (Maximum likelihood) #28

Closed
damariszurell opened this issue May 4, 2018 · 11 comments
Closed

Calabrese bias correction (Maximum likelihood) #28

damariszurell opened this issue May 4, 2018 · 11 comments
Projects

Comments

@damariszurell
Copy link

Dear Sylvain,

we were looking at your package recently as we were interested whether anybody has implemented the bias correction in MEMs proposed by Calabrese et al. 2014. However, when looking at your code, this does not seem to implement the procedure as described by Calabrese, unless I am completely misinterpreting your code. Your code:

if (method == "MaximumLikelihood") {
# Maximum likelihood (Calabrese et al, 2014)
if (verbose)
cat("\n Local species richness computed by maximum likelihood adjustment. \n")
diversity.map <- mapDiversity(obj, method = 'bSSDM',
verbose = FALSE)$diversity.map
Richness <- .richness(obj)
SSDM_Richness <- values(mask(diversity.map, Richness))
SSDM_Richness <- SSDM_Richness[-which(is.na(SSDM_Richness))]
Richness <- values(Richness)
Richness <- Richness[-which(is.na(Richness))]
fit <- lm(Richness ~ SSDM_Richness)
a <- fit$coefficients[1]
b <- fit$coefficients[2]
diversity.map <- a + b * diversity.map
}

Your code seems to simply regress observed against predicted richness and then uses the LM coefficients to correct occurrence probabilities. This seems different from what Calabrese et al. suggested: 1) logit-transform the occurrence probabilities, 2) apply additive adjustment that depends on richness, 3) then inverse-logit-transform back to probability scale. The adjustment parameters are estimated via maximum likelihood (using the package poibin), identifying the set of adjustment parameters that maximises the probability of the site-specific species richness given the adjusted occurrence probabilities (which depend on the original occurrence probabilities and on the adjustment parameters). Unless I am completely mistaken, this is not what your code is doing?

Best, Damaris

@sylvainschmitt
Copy link
Owner

Dear Damaris,

First thank you for your code review and all my apologies for the mistake. In fact, the Calabrese method was just implemented at the end after a reviewer of the article ask for it and has not been reviewed nor tested in detail. So I would not be surprised of such a mistake and again I apologize. I kind of lack of time right now but effectively what the code does is using the linear model coefficients to correct occurrence probabilities. So if you have the time, I would be glad to review a pull request from you to do the correction. If not I will try to correct it but latter.

Best,

Sylvain

@damariszurell
Copy link
Author

damariszurell commented May 15, 2018

Dear Sylvain,

thanks for your response. I am not actually using your package as I have my own class of models that is not transformed easily into your objects. Thus, pulling and changing your code is impractical for me. Anyway, I find it important that the method is revised and people do not mistake it for the actual Calabrese correction.
I can send you my code for the correction that you could adapt. I checked with Justin Calabrese that it does the right thing. Justin's code is actually slightly faster, but I don't feel legitimate to post it here.

Best, Damaris

#--------------------------------------------------
# helper functions and maximum likelihood function:
logit = function(x) {x=ifelse(x<0.0001,0.0001,ifelse(x>0.9999,.9999,x));   ;log(x/(1 - x))}

invlogit = function(x) {exp(x)/(1+exp(x))}

nLL.Calabrese <- function(par,sr,probs) {
	require(poibin)
	
	bysite <- function(j) {
		logit.probs <- logit(as.numeric(probs[j,]))
		corr.probs <- invlogit( logit.probs + par[1]*sr[j] + par[2] )
		log(dpoibin(sr[j],as.numeric(corr.probs)))
	}
	- sum(sapply(seq_len(length(sr)),bysite)) 	# optim will perform minimization but we aim for maximum likelihood and thus invert
	}

#-----------------------------------------
# find the maximum likelihood estimates for adjustment parameters:
adj.par <- optim(par=c(0,0), fn=nLL.Calabrese, sr=obs.sr, probs=pred.prob) 
# pred.prob is a  site-by-species data.frame of occurrence probabilities
# obs.sr is a vector of species richness

# adjust the predicted probabilities and stack:
corr.sr <- rowSums(apply(pred.prob,2,FUN=function(x){invlogit(logit(x)+adj.par$par[1]*obs.sr+adj.par$par[2])}))

@sylvainschmitt sylvainschmitt added this to Low priority in General Feb 22, 2019
@lukasbaumbach
Copy link
Collaborator

Working on it.
Also, trying to use method= "MaximumLikelihood" currently fails, since there is a typo in the checkargs function (expects "MaximumLikelyhood"). The typo doesn't exist in the mapDiversity function, but checkargs blocks it before it can run.

@sylvainschmitt
Copy link
Owner

Wow, it's kind of the issue of not having everything under formal testing. To be fixed thus. But to be honest part of the mapDiversity methods have been requested by reviewers and implemented a bit in a hurry during my master 2 thesis, when I was working completely on a different subject. So I'm really sorry for that kind of mistakes but they unfortunately don't surprise me (I'm just happy they are not numerous).

@lukasbaumbach
Copy link
Collaborator

Don't worry, for a master's thesis the scope of this project is already quite excessive.

Ok, so I just tried implementing Damaris'/Justin's code in the following way (for now just adding it as a new method):

if(method=="Calabrese") {
    if (verbose)
      cat("\n Local species richness computed by maximum likelihood adjustment with Calabrese bias correction. \n")
    diversity.map <- mapDiversity(obj, method = 'bSSDM',
                                  verbose = FALSE)$diversity.map
    Richness <- .richness(obj)
    # test data preparation
    ind_notNA <- Which(!is.na(Richness),cells=TRUE)
    pred.prob <- as.data.frame(sapply(obj@esdms, function(x) x@projection[ind_notNA]))
    obs.sr <- Richness[ind_notNA]
    
    ### Start of code from Damaris Zurell/Justin Calabrese (minor edits)
    library(poibin) # should be called earlier
    # helper functions and maximum likelihood function
    logit = function(x) {x=ifelse(x<0.0001,0.0001,ifelse(x>0.9999,.9999,x));   ;log(x/(1 - x))}
    invlogit = function(x) {exp(x)/(1+exp(x))}
    
    nLL.Calabrese <- function(par,sr,probs) {
      bysite <- function(j) {
        logit.probs <- logit(as.numeric(probs[j,]))
        corr.probs <- invlogit( logit.probs + par[1]*sr[j] + par[2] )
        log(dpoibin(sr[j],as.numeric(corr.probs)))
      }
      - sum(sapply(seq_len(length(sr)),bysite)) 	# optim performs minimization, so for maximum likelihood we need to invert
    }
    
  # find the maximum likelihood estimates for adjustment parameters
    adj.par <- optim(par=c(0,0), fn=nLL.Calabrese, sr=obs.sr, probs=pred.prob) 
    # pred.prob is a site-by-species data.frame of occurrence probabilities
    # obs.sr is a vector of species richness
    
    # adjust the predicted probabilities and stack
    corr.sr <- rowSums(apply(pred.prob,2,FUN=function(x){invlogit(logit(x)+adj.par$par[1]*obs.sr+adj.par$par[2])}))
    ###
    diversity.map[ind_notNA] <- corr.sr
  }

I ran an example with your original method="MaximumLikelihood" against the "Calabrese" method and found this:
grafik

Assuming I did everything right, there are a couple of things to note here: 1) the patterns are similar, but the Calabrese method mainly resulted in lower probabilities, 2) the value range is very wide for Calabrese (0 to 5), 3) the high values of Calabrese match the locations of high observed species richness (they show up as white dots, since they are outside the plotting range of 0-0.04) and 4) the previous maximum likelihood approach even produced negative values in some spots (that's where the white color on the left comes from).

Any thoughts?

@sylvainschmitt
Copy link
Owner

sylvainschmitt commented Jul 4, 2019

I think you did a good job ! It's not at all the same approach. The previous implementation was a simple linear regression used to adjust the richness prediction, and not at all the method described by Calabrese. Damaris used correctly their method (I think they even published with it) and was kind enough to furnish their code. So let's integrate your implementation to SSDM. The only "bad side" is that it makes us add a new dependency poibin, but that's not really an issue. Do you see how correctly import the needed functions in the Roxygen skeleton of the corresponding R file ? Please import only the needed functions and not the whole package. And yes their method is much more restrictive compared to the previous raw calculation of richness. Moreover knowing a bit the north of New Caledonia I think their prediction is more realistic (knowing the example use degraded data).

@lukasbaumbach
Copy link
Collaborator

Thanks. I'd like to see what the result looks like for a larger dataset with more co-occurences, I guess this can change the picture quite a bit. Do I get it right, that the above maps show the occurrence probability of all species occurring together in one site?

And yes, regarding your comment on package dependencies:
#' @importFrom stats lm optim
#' @importFrom poibin dpoibin

@lukasbaumbach
Copy link
Collaborator

Should the old 'MaximumLikelihood' stacking method either be removed and replaced by the real one or be renamed (e.g. something like 'LinearRegression') and the real Calabrese method gets its name ('MaximumLikelihood' or 'Calabrese')?

@sylvainschmitt
Copy link
Owner

The old 'MaximumLikelihood' stacking method should be replaced by the real one.

@lukasbaumbach
Copy link
Collaborator

implemented since v0.2.6.9001

General automation moved this from Low priority to Closed Apr 6, 2020
@sylvainschmitt
Copy link
Owner

Thanks for the cleaning of Issues ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
General
  
Closed
Development

No branches or pull requests

3 participants