Bad handling of `NA`s in `psdAdd()` #64

droglenc · 2021-01-15T03:26:26Z

In short, I have two issues with psdAdd() when there is a species=NA in the data frame. First, the species=NA generates an NA for the PSD name (as would be expected), but that NA appears at the end of the list of items returned rather than in its original position, which makes it hard to pair things up with cbind(). Second, psdAdd() returns more items than I expect (i.e., more items than there are rows in the data frame). The extra items appear only if there is an NA for species and more than one species among the values that are not NA.

This makes it difficult to add PSD size classes to a data frame, either with mutate() or by creating a list that I then cbind() to the original, because the number of elements does not match. I originally just deleted rows with species=NA, but I cannot do that in this new case (CPUE by PSD class rather than calculating PSD values): I need to track samples where no species were caught (so no legit species name is available) in order to use complete() to add zero catch data for any PSD size class that was not caught.

Here are some trivial examples illustrating the issue:

```r
library(FSA)

# first 4 examples work as I'm expecting (either no spp=NA or all the same
# species for the non-NA species)

## has 5 items just like original data, so NA in length not a problem
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie","White Crappie","White Crappie",
                           "White Crappie","White Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## has 5 items just like original data, so single NA in length not a problem
## even with mix of spp names
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie","Black Crappie","Black Crappie",
                           "White Crappie","White Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## has 5 items just like original data, so multiple NA in length not a problem
testdf <- data.frame(TL=c(400,NA,250,NA,50),
                     Spp=c("White Crappie","White Crappie","White Crappie",
                           "White Crappie","White Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## has 5 items just like original data, but order of NA's not as expected for
## missing spp (has moved to end and will cause erroneous results if I try to
## cbind or mutate)
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie",NA,"White Crappie",
                           "White Crappie","White Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

# below examples have extra elements ... all have one record with spp=NA and the
# number of extra elements returned seems related to the number of different
# spp in the dataframe or the number of times 2 different species occur.

## has 1 extra NA
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie",NA,"White Crappie",
                           "White Crappie","Black Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## has 1 extra NA, so does not appear related to NA's in length, just species
testdf <- data.frame(TL=c(400,90,250,130,50),
                     Spp=c("White Crappie",NA,"White Crappie",
                           "White Crappie","Black Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## has 2 extra NA's
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie",NA,"Largemouth Bass",
                           "White Crappie","Black Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## still 2 extra NA's, so species names with no PSD categories behave same as
## species names with PSD categories
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie",NA,"badSpp",
                           "White Crappie","Black Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## has 3 extra NA's
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie",NA,"Largemouth Bass",
                           "Bluegill","Black Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## has 5 rows as expected...so extra NA's only happen if there is at
## least one spp=NA
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie","White Crappie","Largemouth Bass",
                           "Bluegill","Black Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)

## has 2 extra NA even though only 1 species other than White Crappie...
## so not purely a function of # of spp used
testdf <- data.frame(TL=c(400,90,250,NA,50,100),
                     Spp=c("White Crappie",NA,"White Crappie","White Crappie",
                           "Black Crappie","Black Crappie"))
psdAdd(TL~Spp,data=testdf,drop.levels=TRUE)
```
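
Until the underlying problem is fixed, a length check before attaching the result can at least turn the silent mismatch into an explicit error. The sketch below uses only base R; `add_column_checked()` is a hypothetical helper and is not part of FSA. Note that it only catches the extra-element cases. It cannot detect the reordering case, where the result has the right length but the `NA` sits in the wrong position.

```r
# Hypothetical helper (not an FSA function): attach a vector of values to a
# data frame as a new column, but refuse if the lengths do not match.
add_column_checked <- function(df,values,name) {
  if (length(values)!=nrow(df)) {
    stop("Got ",length(values)," values for ",nrow(df)," rows.",call.=FALSE)
  }
  df[[name]] <- values
  df
}

# Small self-contained demonstration with stand-in values
demo <- data.frame(TL=c(400,90,250))
demo <- add_column_checked(demo,c("a","b","c"),"cat")
```

With the examples above, this could be called as `add_column_checked(testdf,psdAdd(TL~Spp,data=testdf,drop.levels=TRUE),"PSDcat")`, where the column name `PSDcat` is just an illustration.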

droglenc · 2021-01-15T03:58:12Z

I think the issue is related to this line ...

```r
tmpdf <- data[data[,2]==specs[i],]
```

because rows where species is NA get carried along by this subset, so tmpdf does not contain just specs[i] (it also contains the species==NA rows). I think that I can fix this by pulling off the NA species first and then adding them back in after the other species have been worked through.
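
The subsetting behavior described here can be reproduced in base R without FSA. The sketch below is not the code from the fixing commit. It only illustrates why a logical index containing `NA` pulls in extra rows, and one way the "pull off the NA species first" idea might look. `label_by_species()`, the `rownum` and `label` columns, and the "(processed)" labels are all hypothetical stand-ins for the per-species work.

```r
# Toy data mirroring one of the reported examples (one NA species)
testdf <- data.frame(TL=c(400,90,250,NA,50),
                     Spp=c("White Crappie",NA,"White Crappie",
                           "White Crappie","Black Crappie"))

# A logical row index that contains NA returns an all-NA row for each NA
sub_bad <- testdf[testdf[,2]=="Black Crappie",]
nrow(sub_bad)   # 2: the Black Crappie row plus one all-NA row

# which() drops the NA comparisons, so only the matching row remains
sub_ok <- testdf[which(testdf[,2]=="Black Crappie"),]
nrow(sub_ok)    # 1

# Hypothetical helper: set NA-species rows aside, process the other species
# one at a time, then restore the original row order
label_by_species <- function(df) {
  df$rownum <- seq_len(nrow(df))
  df$label <- NA_character_
  na_rows <- df[is.na(df$Spp),]
  rest <- df[!is.na(df$Spp),]
  out <- na_rows
  for (s in unique(as.character(rest$Spp))) {
    tmp <- rest[rest$Spp==s,]                 # no NA left in rest$Spp
    tmp$label <- paste0(s," (processed)")     # stand-in for the real work
    out <- rbind(out,tmp)
  }
  out <- out[order(out$rownum),]
  out$label
}

res <- label_by_species(testdf)
```

Here `res` has one element per row of `testdf`, and the `NA` stays in position 2, so the result can be attached with cbind() or mutate().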

droglenc · 2021-01-15T12:23:36Z

Will be fixed in v0.8.32

droglenc added bug question labels Jan 15, 2021

droglenc self-assigned this Jan 15, 2021

droglenc closed this as completed Jan 15, 2021

droglenc added a commit that referenced this issue Jan 15, 2021

Fixed #64

6deab7f
