Hexp calculation is erroneous #47
To ensure that things are working correctly, I will write a test based on the example presented in Kosman (2003).
For haploids and diploids, the calculation will return the size-corrected index. For polyploids, locus_table will return a corrected Simpson's index while poppr will return Simpson's index. It's all very confusing...
This was why I was getting weird values. The clouds are beginning to clear on issue #47
Currently, the strategy is: if the data are polyploid, change to the unbiased Simpson's index over alleles: (n/(n - 1)) * (1 - sum(pi^2)). This way a measure can actually be reached instead of having complaints of missing data in the result. Currently, locus_table will report a different column name, and poppr probably should as well. Now, I just need to fix the documentation:
In another thrilling installment of addressing #47, we changed the output of locus_table to be Mu and not uSimpson because it's easier to type and I have a direct reference!
update documentation and tests for #47
The new column name for poppr and locus_table is Mu.
Scratch that, reverse it. The calculation will be (n/(n - 1)) * (1 - sum(p^2)), where n is the number of observed alleles. This will impact polyploids and mixed-ploidy populations by increasing diversity, but it's better than using kN, which would increase it even more.
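To make the corrected index concrete, here is a minimal Python sketch of (n/(n - 1)) * (1 - sum(p^2)). It is not poppr's R code: the function names are hypothetical, and it assumes that n counts every allele observation at the locus, which is only one reading of "number of observed alleles".

```python
from collections import Counter


def simpson(alleles):
    """Plain Simpson's index over alleles: 1 - sum(p^2)."""
    n = len(alleles)
    counts = Counter(alleles)
    return 1 - sum((c / n) ** 2 for c in counts.values())


def unbiased_simpson(alleles):
    """Corrected index: (n / (n - 1)) * (1 - sum(p^2)).

    Assumption: n is the total number of allele observations at the locus.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two allele observations")
    # The parentheses around (1 - sum(p^2)) matter: the correction factor
    # scales the whole index, not just the constant 1.
    return (n / (n - 1)) * simpson(alleles)


# Hypothetical locus with four allele observations: A, A, B, C.
locus = ["A", "A", "B", "C"]
print(simpson(locus))           # 0.625
print(unbiased_simpson(locus))  # about 0.8333
```

With only four observations, the correction factor 4/3 raises the index noticeably, which is the "increasing diversity" effect mentioned above.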
The problem:
After re-reading Nei (1978), I realized that my implementation of Hexp is:
(N/(N - 1)) * (1 - sum(p^2))
where N is the number of allelic states and p is the vector of allele frequencies. Nei's definition is:
(kn/(kn - 1)) * (1 - sum(p^2))
where n is the number of observed samples at a locus and k is the ploidy (to account for dosage).
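A small Python sketch can show how much the choice of correction factor matters. It assumes the implementation described above scaled 1 - sum(p^2) by N/(N - 1), with N the number of allelic states, and that Nei's estimator scales it by kn/(kn - 1). The function name and the example frequencies are hypothetical and are not taken from poppr.

```python
def corrected_hexp(p, size):
    """(size / (size - 1)) * (1 - sum(p^2)) for allele frequencies p."""
    return (size / (size - 1)) * (1 - sum(x * x for x in p))


# Hypothetical diploid locus: 10 samples, so 20 allele copies (10, 6, 4).
p = [0.5, 0.3, 0.2]
n_states = len(p)          # N: number of allelic states
n_samples, ploidy = 10, 2  # n and k in Nei's definition

by_states = corrected_hexp(p, n_states)        # factor 3/2
nei = corrected_hexp(p, ploidy * n_samples)    # factor 20/19

print(round(by_states, 4))  # 0.93
print(round(nei, 4))        # 0.6526
```

Because the number of allelic states is usually much smaller than kn, the N-based factor inflates the estimate far more than Nei's sample-size correction does.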
User-facing impacts:
- poppr()
- locus_table()
What needs to be fixed:
- locus_table_pegas()
Impacts after fix:
Unfortunately, I might have to wait until just before August to submit this patch, lest I anger our CRAN overlords.