NAs in agePrecision #9

Closed
droglenc opened this issue Feb 2, 2016 · 1 comment

droglenc commented Feb 2, 2016

Here is the super brief summary of my confusion:

  1. When what = “difference” or “absolute difference”, the n used to calculate the percentages appears to exclude the NA values. This makes sense to me.
  2. When what = “precision”, the calculation appears to count NA values as “agreement”.

The long-winded version of my confusion follows:

First, a simple example that uses the formula notation in two different ways to achieve the same result (i.e., simply comparing agerA1 and agerA2):

Example 1)
ap.A <- agePrecision(~agerA1+agerA2,data=shad)
summary(ap.A,what="precision")

Example 2)
ap.Ax <- agePrecision(agerA1~agerA2,data=shad)
summary(ap.Ax,what="precision")

How does agePrecision handle NA values? Did I miss something in the documentation? Anyway, in the shad data, agerA1 records two of the scales as NA, and agerA2 records the same two scales as NA on the second read, leaving 51 scales with an estimated age. There is no difference between agerA1 and agerA2 for 26 of these scales, so 26/51 = 50.98% (NA excluded). However, to reproduce the result calculated by agePrecision it is necessary to include the two NA values and treat them as agreements: (26+2)/53 = 52.83%.

I suppose that, as long as the reader is consistent in assigning scales as NA, it makes sense to include the NA values in the percent agreement calculation, because NA is functionally serving as a value. In this example, reader A's two reads agreed on which scales were NA.
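The two candidate calculations above can be checked with a few lines of base R. This is only a sketch of the arithmetic using the counts quoted for agerA1 vs. agerA2 (26 agreements, 2 scales left as NA on both reads, 53 scales in total); the variable names are hypothetical and this is not how agePrecision() computes anything internally.

```r
# Counts quoted above for agerA1 vs. agerA2 (hypothetical variable names)
n_agree <- 26   # scales given the same age on both reads
n_na    <- 2    # scales recorded as NA on both reads
n_all   <- 53   # all scales in the shad data

# Convention 1: drop the NA scales from numerator and denominator
pa_excl <- 100 * n_agree / (n_all - n_na)

# Convention 2: count the matching NAs as agreements
pa_incl <- 100 * (n_agree + n_na) / n_all

round(c(excluded = pa_excl, as_agreement = pa_incl), 2)
# gives 50.98 (NA excluded) and 52.83 (NA counted as agreement)
```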

What if I try comparing agerA1 and agerB1?

ap.AB <- agePrecision(~agerA1+agerB1,data=shad)
summary(ap.AB,what="difference")
    -2     -1      0      1      2
18.182 15.152 45.455 12.121  9.091

The 45.455 is 15 agreements out of 33 fish, so it looks like the 20 NAs were excluded. That sounds reasonable.

summary(ap.AB,what="precision")
   n R   ACV   APE PercAgree
  53 2 12.17 8.608     66.04

Now I am confused. To get 66.04% it is necessary to count the 20 NAs as agreements and divide by 53: (20 NAs + 15 agreed)/53 * 100 = 66.04%.

How do NA values work with more than two reads? (This is the example at the bottom of page 84.)

ap.ABC <- agePrecision(~agerA1+agerB1+agerC1,data=shad)
summary(ap.ABC,what="difference")
summary(ap.ABC,what="precision")

Footnote 7 says “The sample size is much smaller ...because Ager C did not estimate an age for several fish.” Should it be Ager B who did not estimate ages? Either way, there are 33 samples for which all three agers gave an age. The summary results for what = “difference” are easy to calculate, and the row totals clearly show that NA values are excluded. For example, ap.ABC$absdiff shows row totals of 33 (A vs. B), 51 (A vs. C), and 33 (B vs. C). With these row totals, I can duplicate the results in summary(ap.ABC,what="difference"). If what = “difference” or “absolute difference”, NA values appear to be excluded from the n.

However, I am again confused by the inclusion of the NA values when calculating PercAgree with what = “precision”. Reader B has 20 scales with an age of NA, and there are only 2 scales on which all three readers agree. PercAgree therefore appears to be calculated as (20 NA values + 2 where everyone agrees)/53 * 100 = 41.51%. Why is an NA counted as an agreement when reader B essentially said “I can’t read the scale”? Should percent agreement be calculated on all 53 samples or only on the 33 samples where every reader provided an age? If we use only the 33 samples with an age, then 2/33 = 6.06%.

Basically I am confused because it seems like NA values are not being treated the same when what is changed from “difference” to “precision”.
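Side by side, the two conventions give the following for the comparisons that involve reader B. This minimal base-R sketch only redoes the arithmetic with the counts quoted above; the data frame and its column names are hypothetical, and it is not how agePrecision() is implemented. Dropping the NA fish reproduces the 45.455 seen with what="difference", while counting them as agreements reproduces the 66.04 and 41.51 reported with what="precision".

```r
# Counts quoted above for the two comparisons that involve agerB1
# (hypothetical data frame built from the text, not FSA output)
counts <- data.frame(
  comparison = c("A1 v. B1", "A1 v. B1 v. C1"),
  agree      = c(15, 2),    # fish on which every read gave the same age
  n_na       = c(20, 20),   # fish with an NA from at least one reader
  n_all      = c(53, 53)    # all fish in the shad data
)

# Convention 1: drop fish with any NA from numerator and denominator
counts$pa_excl <- round(100 * counts$agree / (counts$n_all - counts$n_na), 2)
# Convention 2: count those NA fish as agreements
counts$pa_incl <- round(100 * (counts$agree + counts$n_na) / counts$n_all, 2)

counts
# pa_excl: 45.45 and 6.06; pa_incl: 66.04 and 41.51
```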

droglenc self-assigned this Feb 2, 2016
droglenc commented Feb 6, 2016

I have corrected the bugs associated with this problem. Use the latest development version of FSA to get the corrections illustrated below.

The relevant examples from above are shown below.

> library(FSA)
> # User must set working directory appropriately.
> shad <- read.csv("ShadCR.csv")

Now the results that compare agers A and B match when using what="difference" and what="precision".

> ap.AB <- agePrecision(~agerA1+agerB1,data=shad)
> summary(ap.AB,what="difference")
    -2     -1      0      1      2 
18.182 15.152 45.455 12.121  9.091 
> summary(ap.AB,what="absolute")
    0     1     2 
45.45 27.27 27.27 
> summary(ap.AB,what="precision")
  n validn R   ACV   APE PercAgree
 53     33 2 12.17 8.608     45.45

The same holds for the three-way comparisons ...

> ap.ABC <- agePrecision(~agerA1+agerB1+agerC1,data=shad)
> summary(ap.ABC,what="difference")
                    -2     -1      0      1      2      3      4
agerA1 - agerB1 18.182 15.152 45.455 12.121  9.091  0.000  0.000
agerA1 - agerC1  0.000  5.882 19.608 41.176 19.608 11.765  1.961
agerB1 - agerC1  0.000  6.061  6.061 36.364 36.364 15.152  0.000
> summary(ap.ABC,what="absolute")
                      0      1      2      3      4
agerA1 v. agerB1 45.455 27.273 27.273  0.000  0.000
agerA1 v. agerC1 19.608 47.059 19.608 11.765  1.961
agerB1 v. agerC1  6.061 42.424 36.364 15.152  0.000
> summary(ap.ABC,what="precision")
  n validn R   ACV  APE PercAgree
 53     33 3 22.98 16.7     6.061
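The corrected output reports validn, the number of fish aged by every reader, and PercAgree is now computed only over those fish (15/33 = 45.45% for agers A and B; 2/33 = 6.061% for all three). The same idea can be sketched in base R. The helper perc_agree_complete() and the toy ages below are hypothetical illustrations, not FSA's implementation.

```r
# Hypothetical helper, NOT part of FSA: percent of fish on which every reader
# assigned the same age, computed only over fish with no NA from any reader.
perc_agree_complete <- function(ages) {
  ages  <- as.matrix(ages)
  ok    <- stats::complete.cases(ages)     # TRUE where every reader gave an age
  valid <- ages[ok, , drop = FALSE]
  # a fish "agrees" when every reader's age equals the first reader's age
  agree <- apply(valid, 1, function(r) all(r == r[1]))
  c(n = nrow(ages), validn = sum(ok), PercAgree = 100 * mean(agree))
}

# Hypothetical ages for five fish read by three agers (NA = no age assigned)
toy <- data.frame(
  A = c(3, 4, 5, 4, 6),
  B = c(3, NA, 5, 5, NA),
  C = c(3, 4, 5, 4, 6)
)

res <- perc_agree_complete(toy)
res
# n = 5, validn = 3, PercAgree = 200/3 (about 66.67): 2 of the 3 complete fish agree
```

Because fish with any NA are dropped before counting, an NA can never be scored as an agreement, which matches the behavior described in the fix.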

And, yes, it was Ager B that did not age many fish. I will make an errata entry for the book.
