Implementing a user defined dissimilarity metric #12

Open
FriedaBella opened this issue Jun 18, 2015 · 2 comments

Comments

@FriedaBella

Hi Gavin, I have a question about how to implement a user-defined dissimilarity metric. I'm an ecologist and a mid-level R user, more familiar with the protocols of Stack Overflow than GitHub; I actually signed up just to ask this question.

To get to the point: I would like to use the decomposition of beta diversity that Andres Baselga defined for the Sorensen and Bray-Curtis metrics (I want to analyse both abundance and binary differences between sites). I want to make pairwise comparisons across two groups of sites, separated in geography and character rather than time (not that this materially changes how the analysis would be done).

Baselga created the package 'betapart', which is great, except that it only produces full matrices with every site compared to every other site. I have a lot of sites to compare (the number varies by species, with a maximum of ~8000 sites in total) and I am comparing changes in 494 species across these communities, so betapart is not really feasible here: even if I split up the resulting matrix, it would still be very large.

In the write-up for analogue, you mention that users could write and use their own method for the distance function. Maybe GitHub is not the right venue to ask this, but how would I do this? Or (even better for me, lol) would you be interested in adding these to the package? They are the new rage in diversity studies.
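The core of the request, comparing sites in one group only against sites in the other instead of building the full all-against-all matrix, can be sketched independently of any R package. The Python below is a minimal sketch under stated assumptions, not analogue's or betapart's API: `cross_dissimilarity` and `sorensen` are hypothetical names, a "user-defined coefficient" is taken to be any function of two site vectors, and results are yielded one pair at a time so they could be streamed to a file rather than held in memory. With n1 and n2 sites in the two groups it evaluates n1 × n2 pairs and skips the within-group pairs that a full matrix would also contain.

```python
from typing import Callable, Iterator, Sequence, Tuple

Site = Sequence[float]  # abundances of each species at one site


def cross_dissimilarity(
    group1: Sequence[Site],
    group2: Sequence[Site],
    fn: Callable[[Site, Site], float],
) -> Iterator[Tuple[int, int, float]]:
    """Yield (i, j, d) for every site i in group1 paired with every site j in group2.

    Hypothetical helper: only between-group pairs are evaluated, and results
    are produced lazily so the caller can write them out as they arrive.
    """
    for i, x in enumerate(group1):
        for j, y in enumerate(group2):
            yield i, j, fn(x, y)


def sorensen(x: Site, y: Site) -> float:
    """Example user-defined coefficient: Sorensen dissimilarity on presence/absence.

    Assumption: any abundance above zero counts as "present".
    """
    a = sum(1 for u, v in zip(x, y) if u > 0 and v > 0)   # shared species
    b = sum(1 for u, v in zip(x, y) if u > 0 and v == 0)  # only at site x
    c = sum(1 for u, v in zip(x, y) if u == 0 and v > 0)  # only at site y
    denom = 2 * a + b + c
    # Assumption: two empty sites give 0/0, reported here as NaN.
    return float("nan") if denom == 0 else (b + c) / denom


if __name__ == "__main__":
    # Toy, made-up data: two sites in group 1, one site in group 2.
    group1 = [[7, 3, 5, 0, 6], [1, 0, 0, 2, 0]]
    group2 = [[2, 4, 0, 3, 0]]
    for i, j, d in cross_dissimilarity(group1, group2, sorensen):
        print(i, j, round(d, 4))
```

Any of the coefficients discussed in this thread could be passed as `fn` in place of `sorensen`.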

@gavinsimpson
Owner

Hi @FriedaBella. It's perfectly fine to ask questions or make feature requests here. I'm not familiar with Andres' betapart work; are you asking to have Bray-Curtis and Sorensen distances implemented in analogue or does the approach need more than that? Another distance that is some function of one or both of these?

Any pointers you can give would be appreciated; otherwise I'll have to read the papers in detail to understand what is needed.

The write-up you mention is probably quite out of date now; I have since moved all the distance computation to C code, and with it went the potential for having user-defined distance functions. I'd need to revisit the approach in order to allow user-defined coefficients.

If this is time critical, you could look at the proxy package.

@FriedaBella
Author

Thanks @gavinsimpson. I will check out the proxy package too, as I would like to get my analysis done by sometime in July (in an ideal situation).

I have pulled the notation from Appendix S1 of the Legendre (2014) paper.
All of the metrics are comparisons between two sites.
Definitions:
For presence-absence or other binary comparisons:
a = number of species present at both sites
b = number of species present at site 1 but not site 2
c = number of species present at site 2 but not site 1
For abundance comparisons:
A = sum of the minimum abundances of the various species; each minimum being the abundance at the site where the species is rarer
B = sum of the abundances at site 1 minus A
C = sum of the abundances at site 2 minus A
For clarity, an example calculation of A, B and C from site data:

|        | Sp. 1 | Sp. 2 | Sp. 3 | Sp. 4 | Sp. 5 |
|--------|------:|------:|------:|------:|------:|
| Site 1 | 7     | 3     | 5     | 0     | 6     |
| Site 2 | 2     | 4     | 0     | 3     | 0     |

A = 5, B = 16, C = 4
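The example can be reproduced with a short sketch that also derives the binary counts a, b and c for the same two sites. The function names are hypothetical, and treating any abundance above zero as "present" is an assumption for the binary counts.

```python
from typing import Sequence, Tuple


def binary_components(x: Sequence[float], y: Sequence[float]) -> Tuple[int, int, int]:
    """Return (a, b, c): species at both sites, only at site 1, only at site 2.

    Assumption: an abundance above zero means the species is present.
    """
    a = sum(1 for u, v in zip(x, y) if u > 0 and v > 0)
    b = sum(1 for u, v in zip(x, y) if u > 0 and v == 0)
    c = sum(1 for u, v in zip(x, y) if u == 0 and v > 0)
    return a, b, c


def abundance_components(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (A, B, C) as defined above."""
    A = sum(min(u, v) for u, v in zip(x, y))  # sum of per-species minima
    B = sum(x) - A                            # site 1 total minus A
    C = sum(y) - A                            # site 2 total minus A
    return A, B, C


site1 = [7, 3, 5, 0, 6]
site2 = [2, 4, 0, 3, 0]
print(binary_components(site1, site2))     # (2, 2, 1)
print(abundance_components(site1, site2))  # (5, 16, 4)
```

Here the site totals are 21 and 9, so B = 21 - 5 and C = 9 - 5.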

And now the formulas:
For presence-absence or other binary comparisons:

Overall metric of dissimilarity: Beta Sorensen = (b + c)/(2a + b + c)

Turnover component: Beta sim = min(b,c)/(a + min(b,c))

Nestedness component: Beta nest = (|b - c| / (2a + b + c)) × (a / (a + min(b,c)))
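A minimal sketch of the three binary formulas, taking the counts a, b and c as input. The function names are hypothetical, and returning NaN when a denominator is zero (for example, when both sites are empty) is an assumption, not something specified in this thread. For the two example sites, counting from the table gives a = 2, b = 2 and c = 1.

```python
import math
from typing import Tuple


def _ratio(num: float, den: float) -> float:
    # Assumption: an undefined 0/0 ratio is reported as NaN.
    return num / den if den else math.nan


def sorensen_decomposition(a: int, b: int, c: int) -> Tuple[float, float, float]:
    """Return (beta_sor, beta_sim, beta_nest) from the counts a, b, c."""
    m = min(b, c)
    beta_sor = _ratio(b + c, 2 * a + b + c)   # overall Sorensen dissimilarity
    beta_sim = _ratio(m, a + m)               # turnover component
    beta_nest = _ratio(abs(b - c), 2 * a + b + c) * _ratio(a, a + m)  # nestedness
    return beta_sor, beta_sim, beta_nest


sor, sim, nest = sorensen_decomposition(2, 2, 1)
print(round(sor, 4), round(sim, 4), round(nest, 4))
```

For these counts the values are 3/7, 1/3 and 2/21, and 3/7 - 1/3 = 2/21.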

The nestedness portion could also be calculated by subtracting Beta sim from Beta Sorensen: turnover and nestedness form an additive decomposition of the overall Sorensen dissimilarity, so the two components sum to it.
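The additivity can be checked directly from the formulas. Writing m = min(b,c) and M = max(b,c), so that b + c = m + M and |b - c| = M - m:

$$
\begin{aligned}
\beta_{\text{sor}} - \beta_{\text{sim}}
&= \frac{m + M}{2a + m + M} - \frac{m}{a + m} \\
&= \frac{(m + M)(a + m) - m(2a + m + M)}{(2a + m + M)(a + m)} \\
&= \frac{a(M - m)}{(2a + m + M)(a + m)} \\
&= \frac{|b - c|}{2a + b + c} \cdot \frac{a}{a + m} = \beta_{\text{nest}}
\end{aligned}
$$

The same algebra, with A, B, C in place of a, b, c, applies to the abundance-based decomposition when the balanced-variation denominator is A + min(B,C).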

For abundance comparisons:

Overall metric, percentage difference (Odum 1950; also called Bray-Curtis) = (B + C)/(2A + B + C)

Balanced variation component: min(B,C)/(A + min(B,C))

Abundance gradient component: (|B - C| / (2A + B + C)) × (A / (A + min(B,C)))
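The abundance-based formulas can be sketched the same way, taking A, B and C as input. The function names are hypothetical and NaN for 0/0 is again an assumption. The sketch uses A + min(B, C) as the denominator of the balanced-variation component, mirroring a + min(b, c) in the binary case; that is the form for which balanced variation and abundance gradient add up to the percentage difference.

```python
import math
from typing import Tuple


def _ratio(num: float, den: float) -> float:
    # Assumption: an undefined 0/0 ratio is reported as NaN.
    return num / den if den else math.nan


def percentage_difference_decomposition(A: float, B: float, C: float) -> Tuple[float, float, float]:
    """Return (percentage difference, balanced variation, abundance gradient)."""
    m = min(B, C)
    d_total = _ratio(B + C, 2 * A + B + C)   # percentage difference (Bray-Curtis)
    d_bal = _ratio(m, A + m)                 # balanced variation component
    d_gra = _ratio(abs(B - C), 2 * A + B + C) * _ratio(A, A + m)  # abundance gradient
    return d_total, d_bal, d_gra


# A = 5, B = 16, C = 4 from the worked example
d_total, d_bal, d_gra = percentage_difference_decomposition(5, 16, 4)
print(round(d_total, 4), round(d_bal, 4), round(d_gra, 4))
```

For the worked example this gives 2/3, 4/9 and 2/9, and 2/3 - 4/9 = 2/9.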

As in the binary case, the abundance gradient component can also be calculated by subtracting the balanced variation component from the percentage difference.

And that's it. If anything is not clear, please let me know.
