Bad handling of NAs in psdAdd() #64

Closed
droglenc opened this issue Jan 15, 2021 · 2 comments

@droglenc
Contributor

In short, I have two issues that arise when the data frame contains a record with species=NA. First, the species=NA record generates an NA for the PSD name (as would be expected), but that NA is placed at the end of the returned items rather than in its original position, which makes it hard to pair the results with the original rows using cbind(). Second, psdAdd() returns more items than I expect (i.e., more items than there are rows in the data frame). The extra items appear only when there is an NA for species and more than one species among the non-NA values.

This makes it difficult for me to add PSD size classes to a data frame, either with mutate() or by creating a list that I then cbind() to the original, because the number of elements does not match. I originally just deleted rows with species=NA, but I cannot do that in this new case (CPUE by PSD class rather than calculating PSD values): I need to track samples where no species were caught (so no legitimate species name is available) so that complete() can add zero-catch data for any PSD size class that was not caught.

Here are some trivial examples illustrating the issue:

library(FSA)

# first 4 examples work as I’m expecting (either no spp=NA or all the same
# species for the non-NA species)

## has 5 items just like original data, so NA in length not a problem
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie","White Crappie","White Crappie",
                           "White Crappie","White Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## has 5 items just like original data, so single NA in length not a problem
## even with mix of spp names
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie","Black Crappie","Black Crappie",
                           "White Crappie","White Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## has 5 items just like original data, so multiple NA in length not a problem
testdf <- data.frame(TL=c(400,NA,250,NA,50),
                     Spp=c("White Crappie","White Crappie","White Crappie",
                           "White Crappie","White Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## has 5 items just like original data, but order of NA's not as expected for
## missing spp (has moved to end and will cause erroneous results if I try to
## cbind or mutate)
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie",NA,"White Crappie",
                           "White Crappie","White Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

# below examples have extra elements…all have one record with spp=NA and the
# number of extra elements returned seems related to the number of different
# spp in the dataframe or the number of times 2 different species occur.

## has 1 extra NA
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie",NA,"White Crappie",
                           "White Crappie","Black Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## has 1 extra NA, so does not appear related to NA's in length, just species
testdf <- data.frame(TL=c(400,90,250,130,50),
                     Spp=c("White Crappie",NA,"White Crappie",
                           "White Crappie","Black Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## has 2 extra NA's
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie",NA,"Largemouth Bass",
                           "White Crappie","Black Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## still 2 extra NA's, so species names with no PSD categories behave same as
## species names with PSD categories
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie",NA,"badSpp",
                           "White Crappie","Black Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## has 3 extra NA's
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie",NA,"Largemouth Bass",
                           "Bluegill","Black Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## has 5 rows as expected...so extra NA's only happen if there is at
## least one spp=NA
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie","White Crappie","Largemouth Bass",
                           "Bluegill","Black Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## has 2 extra NA even though only 1 species other than White Crappie...
## so not purely a function of # of spp used
testdf <- data.frame(TL=c(400,90,250,NA,50,100),
                     Spp=c("White Crappie",NA,"White Crappie","White Crappie",
                           "Black Crappie","Black Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)
@droglenc
Contributor Author

I think the issue is related to this line ...

tmpdf <- data[data[,2]==specs[i],]

because the records with species=NA get carried along by this subsetting, so tmpdf does not contain just specs[i] (it also contains the species==NA records). I think I can fix this by pulling out the NA-species records first and then adding them back in after the other species have been worked through.
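
For anyone following along, here is a minimal base-R sketch of why that kind of subsetting picks up NA records, plus one possible way to keep results aligned with the original rows. It does not use FSA and is not the actual psdAdd() code or the fix that went into the package: `classify()` is a hypothetical stand-in for the per-species step, and its length breaks are made up for illustration.

```r
# Toy data patterned on the examples above; one record has Spp = NA.
testdf <- data.frame(TL  = c(400, 90, 250, 130, 50),
                     Spp = c("White Crappie", NA, "White Crappie",
                             "White Crappie", "Black Crappie"))

# Comparing with == returns NA (not FALSE) where Spp is NA ...
idx <- testdf$Spp == "Black Crappie"

# ... and an NA in a logical row index produces an all-NA row, so the
# subset holds the NA row plus the one Black Crappie record.
tmpdf <- testdf[idx, ]
nrow(tmpdf)     # 2

# Wrapping the comparison in which() drops the NA positions.
tmpdf_ok <- testdf[which(idx), ]
nrow(tmpdf_ok)  # 1

# Hypothetical stand-in for the per-species classification step; these
# breaks are invented and are NOT real PSD (Gabelhouse) lengths.
classify <- function(len) {
  as.character(cut(len, breaks = c(0, 100, 200, Inf),
                   labels = c("small", "medium", "large"), right = FALSE))
}

# Preallocate one result per row, then fill by position for each non-NA
# species. Rows with Spp = NA are never visited, so they stay NA in place.
res <- rep(NA_character_, nrow(testdf))
for (sp in unique(testdf$Spp[!is.na(testdf$Spp)])) {
  rows <- which(testdf$Spp == sp)
  res[rows] <- classify(testdf$TL[rows])
}
res
```

Because `res` always has exactly `nrow(testdf)` elements in the original row order, it can be attached with cbind() or mutate() without any reordering.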

@droglenc
Contributor Author

Will be fixed in v0.8.32

droglenc added a commit that referenced this issue Jan 15, 2021