Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hill # NaN in genetic and trait results #121

Open
isaacovercast opened this issue May 22, 2023 · 20 comments
Open

Hill # NaN in genetic and trait results #121

isaacovercast opened this issue May 22, 2023 · 20 comments
Assignees
Milestone

Comments

@isaacovercast
Copy link
Contributor

Run this:

## init_type may be bridge_island or oceanic_island
p <- roleParams(individuals_local = 100, individuals_meta = 10000,
                species_meta = 10, speciation_local = 0, 
                speciation_meta = 0.05, extinction_meta = 0.05, env_sigma = 0.5,
                trait_sigma=1, comp_sigma = 0.5, dispersal_prob = 0.01, mutation_rate = 1e-6,
                equilib_escape = 1, num_basepairs = 500, alpha = 10000,
                init_type = 'bridge_island', niter = 100000, niterTimestep = 10000)
model <- runRole(roleModel(p))

The this getSumStats(model) and some genetic diversity and trait Hill's for some timesteps will be 'NaN'. It is non-deterministic, and we assume it's because perhaps there is no variation in these values at a given timestep?

@isaacovercast
Copy link
Contributor Author

isaacovercast commented May 22, 2023

Indeed, for the genetic diversity hill #s if there are any '0' in the vector of raw pi values within a given timestep then this fails because of log(0):

hill[q == 1] <- exp(sum(-n * log(n)))

'q' values other than 1 are perfectly happy with zeros in the pi vector.

In MESS I just ignore pi==0 values and I think this is justifiable. This is also how scipy.stats.entropy treats zeros.

Update: I pushed a fix for the genetic diversity Hill's, simply removing 0s from the vector.

@isaacovercast
Copy link
Contributor Author

For trait hills, you get 'NaN' in the case where there is only 1 species present in the local community. If there's only 1 species present then in .hillDivTrait() the distance matrix between traits (dij <- as.matrix(dist(traits))) is a single value == 0, which eventually leads to a divide-by-zero 2 lines later.

@ajrominger ajrominger self-assigned this May 23, 2023
@ajrominger ajrominger added this to the Working beta milestone May 23, 2023
@diazrenata
Copy link
Contributor

After wrestling (philosophically) with whether it makes sense to try to hatch a trait diversity for when there is a single species, or whether to simply build in a catch to send single-species communities to evaluate to 0, I'm leaning towards just building in the catch sending single-species to evaluate to zero.

Considerations:

  • Functional diversity hill numbers are based on species-distance matrices, which just can't be evaluated for a single species.
  • In roleR, we do end up with trait variation in the absence of species variation (intraspecific variation) - but then that's summarized to a species mean anyway, and it seems a little out-of-scope to come up with a new version of trait diversity that looks both intra and inter specifically.

@diazrenata
Copy link
Contributor

diazrenata commented Jun 1, 2023

Amending: I think it should evaluate to 1, yes? Weighting by trait diversity is moot in this instance, so it should converge back to abundance (which evaluates to 1 anyway).

@diazrenata
Copy link
Contributor

See b0ddd3f. I feel like there's room for spirited discussion re: the interpretation of trait hill numbers in a single-species community, though! :P

@isaacovercast
Copy link
Contributor Author

agreed. In the case of no trait variation it should be 1.

@diazrenata
Copy link
Contributor

Getting sneakier, b1ba083. This one catches both edge cases: if there is only one species, or if there are multiple species all with the same trait. (Seems extreme but possible in special circumstances, like a super narrow optimum). In this instance, it effectively reverts to abundance hill numbers.

@diazrenata
Copy link
Contributor

I could be convinced in this instance (no trait, but species, variation) it should be something else, though. 1 for that, too?

@diazrenata
Copy link
Contributor

(hillR::hill_func just fails in this instance, btw)

@isaacovercast
Copy link
Contributor Author

IDK, I think conceptuall if there is literally zero trait variation then I would say it should be 1. If you think about it from the perspective of # of 'effective species traits' then it would be 1.

@ajrominger
Copy link
Member

ajrominger commented Jun 1, 2023

This is a super hard one to figure out!

After perhaps too much effort, I agree that for "effective number of species" (i.e. weighted Hill) the answer should be 1, both from the perspective of 1 species, or multiple species but with no trait variation.

I went deep down a rabbit hole on this (nothing against rabbits!)

The short answer is that unweighted hill goes to 0 both for 1 species and multiple species with no trait variance. (Script below showing that).

But weighted hill is $(D_{hill} / Q)^{1/2}$. And Q in the equation goes to 0 just as fast as $D_{hill}$ (where $D_{hill}$ is unweighted hill). So again weighted Hill in the limit of 0 trait variation (either from 1 species or multiple spp with same trait) goes to 1. I'm not showing the math for Q going to 0, but it's fairly clear from Chao's equations

Here's the script

# a function to explore limits of trait hill number
# sets q = 2 w/o loss of generality
# n = vector of spp abundances
# trtDiag =  intra-specific trait difference (in reality always 0, but we allow
#            it to be != 0 to explore limit)
# trtOff  =  inter-specific trait differences (i.e. "off diagnal" entries in a
#            distance matrix)

hillLim <- function(n, trtDiag, trtOff) {
    p <- n / sum(n)
    dij <- matrix(trtOff, nrow = length(p), ncol = length(p))
    diag(dij) <- trtDiag
    
    Q <- as.vector(p %*% dij %*% p)
    a <- outer(p, p, '*') / Q
    
    return(sum(dij * a^2)^(1 / (1 - 2)))
}


# explore limit as inter-specific differences go to 0
hillLim(rep(1, 5), trtDiag = 0, trtOff = 1)
## 20

hillLim(rep(1, 5), trtDiag = 0, trtOff = 0.1)
## 2

hillLim(rep(1, 5), trtDiag = 0, trtOff = 0.01)
## 0.2

hillLim(rep(1, 5), trtDiag = 0, trtOff = 0.001)
## 0.02

So this says that as trait variation between species goes to 0, so too does Hill diversity go to 0. But again, this is for unweighted hill.

Now for a single species:

# explore limit as diagonal element approaches 0 when there is only 1 species
# (i.e. figure out what hill div should be for only 1 species)
hillLim(1, trtDiag = 0.1, trtOff = 1)
## 0.1

hillLim(1, trtDiag = 0.01, trtOff = 1)
## 0.01

hillLim(1, trtDiag = 0.001, trtOff = 1)
## 0.001

Here again, for a single species Hill diversity goes to 0.

But again these calculations are for unweighted Hill. For weighted Hill, I convinced myself the answer is indeed equal to 1 because D_hill and Q go to 0 at the same rate.

@ajrominger
Copy link
Member

OH! Something important though that's coming out of this is: right now we're returning unweighted Hill numbers right? So is that what we want to do, or do we want to switch to returning weighted Hills?

@ajrominger
Copy link
Member

Damn! I been working on this more and I was wrong about weighted trait Hill for the case no variation: weighted trait Hill for the case with no variation might equal weighted abundance-based Hill with the same SAD...still working on it, stay tuned, or if you all have it figured out, even better!

@ajrominger
Copy link
Member

One thing for sure though we have to pick a column (attribute diversity or generalized hill):

image

@isaacovercast
Copy link
Contributor Author

isaacovercast commented Jun 1, 2023 via email

@diazrenata
Copy link
Contributor

I agree with Isaac about generalized hills.

@ajrominger
Copy link
Member

also agree with both of you about using generalized hills!

so this means we need to revise the trait and phylo hill numbers to return the generalized value (currently they return the "attribute diversity" from Chao's table). the abundance hill function (and thus the gen div hill) returns the generalized value because the generalized and attribute div for abundance are equivalent

@ajrominger
Copy link
Member

and my final thought on hill numbers when trait variation is absolutely 0 is that the generalized number should indeed be 1. if you all are good to go with that, let's go for it

@diazrenata
Copy link
Contributor

Unless I'm mistaken, the trait hill number function is already returning generalized hills! e4b7a47 implements a check if the trait diversity is 0 (which would occur if species richness is 0, or if trait diversity is 0) and returns 1 for all q.

I'll work on phylo next!

@ajrominger
Copy link
Member

right you are about generalized hills! don't know how i convinced myself otherwise!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

No branches or pull requests

3 participants