I've been using qc_read_collection() on many "*_fastqc.zip" files and noticed that this function suffers from dplyr issue #5358 when binding data.frames.
Here is a reprex leading to the error in lapply(res, dplyr::bind_rows, .id = "sample") inside qc_read_collection():
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union

# create example data.frames to be bound using dplyr::bind_rows()
dn <- data.frame(Length = 150, Count = 2)
ds <- data.frame(Length = c("150-155"), Count = 4)
de <- data.frame(array(NA, dim = c(0, 0)))
res <- list(module = list(Sample1 = dn, Sample2 = ds, Sample3 = de))
str(res)
#> List of 1
#>  $ module:List of 3
#>   ..$ Sample1:'data.frame': 1 obs. of 2 variables:
#>   .. ..$ Length: num 150
#>   .. ..$ Count : num 2
#>   ..$ Sample2:'data.frame': 1 obs. of 2 variables:
#>   .. ..$ Length: chr "150-155"
#>   .. ..$ Count : num 4
#>   ..$ Sample3:'data.frame': 0 obs. of 0 variables

# reproduce the error
res <- lapply(res, dplyr::bind_rows, .id = "sample")
#> Error: Can't combine `Sample1$Length` <double> and `Sample2$Length` <character>.
The error above occurs when calling qc_read_collection(files, sample_names, modules = "all") on a collection of "*_fastqc.zip" files if any sample in files stores a variable with a different class than another sample does in the data.frames to be bound.
In my case, this happened mostly with the modules $sequence_length_distribution (variable "Length") or $kmer_content (variable "Max Obs/Exp Position").
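To check which variables are affected before binding, something along these lines might help. It is only a sketch in base R: find_class_conflicts() is a hypothetical helper, not a function of the package or of dplyr, and it assumes each module is a named list of per-sample data.frames as in the reprex above.

```r
# Hypothetical helper (not part of any package): report the columns whose
# class differs between the non-empty data.frames of one module.
find_class_conflicts <- function(dfs) {
  # ignore empty data.frames; they carry no class information
  dfs <- Filter(function(d) is.data.frame(d) && nrow(d) > 0, dfs)
  cols <- unique(unlist(lapply(dfs, names)))
  conflicts <- list()
  for (col in cols) {
    # first class of this column in every sample that has it
    cls <- unlist(lapply(dfs, function(d) {
      if (col %in% names(d)) class(d[[col]])[1]
    }))
    if (length(unique(cls)) > 1) conflicts[[col]] <- cls
  }
  conflicts
}

# same example data as in the reprex
dn <- data.frame(Length = 150, Count = 2)
ds <- data.frame(Length = "150-155", Count = 4)
de <- data.frame()
conflicts <- find_class_conflicts(list(Sample1 = dn, Sample2 = ds, Sample3 = de))
print(conflicts)
```

For the example data only "Length" is reported, since "Count" is numeric in both non-empty samples.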
Here is a possible fix I came up with:
# convert <double> to <character> if a column should be <character>
res <- lapply(res, function(x) {
  # tibble with classes for each non-empty data.frame column
  dcl <- dplyr::bind_rows(lapply(x, function(y) {
    if (nrow(y) > 0) sapply(y, class)
  }))
  # define classes to assign: "character" wins if any sample has it
  cl <- apply(dcl, 2, function(z) ifelse(any(z == "character"), "character", z[1]))
  # assign classes (empty data.frames become NULL and are skipped by bind_rows)
  lapply(x, function(w) {
    if (nrow(w) > 0) {
      for (i in names(w)) class(w[, i]) <- cl[i]
      w
    }
  })
})
# reproduce the fix
res <- lapply(res, dplyr::bind_rows, .id = "sample")
str(res)
#> List of 1
#>  $ module:'data.frame': 2 obs. of 3 variables:
#>   ..$ sample: chr [1:2] "Sample1" "Sample2"
#>   ..$ Length: chr [1:2] "150" "150-155"
#>   ..$ Count : num [1:2] 2 4
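A possibly simpler variant of the same idea, written with base R only, is sketched below. harmonise_classes() is a hypothetical name, and the behaviour is an assumption on my side rather than anything the package does: it drops empty data.frames and converts a column to character in every sample as soon as one sample stores it as character (or factor).

```r
# Hypothetical helper: make column classes compatible before binding.
harmonise_classes <- function(dfs) {
  # drop empty data.frames, which contribute no rows anyway
  dfs <- Filter(function(d) is.data.frame(d) && nrow(d) > 0, dfs)
  cols <- unique(unlist(lapply(dfs, names)))
  # TRUE for columns that are character (or factor) in at least one sample
  is_chr <- vapply(cols, function(col) {
    any(vapply(dfs, function(d) {
      is.character(d[[col]]) || is.factor(d[[col]])
    }, logical(1)))
  }, logical(1))
  lapply(dfs, function(d) {
    for (col in intersect(names(d), cols[is_chr])) {
      d[[col]] <- as.character(d[[col]])
    }
    d
  })
}

# same example data as in the reprex
dn <- data.frame(Length = 150, Count = 2)
ds <- data.frame(Length = "150-155", Count = 4)
de <- data.frame()
fixed <- harmonise_classes(list(Sample1 = dn, Sample2 = ds, Sample3 = de))
str(fixed)
```

The result should then be safe to pass to dplyr::bind_rows(fixed, .id = "sample"). Unlike the fix above, columns that are not character anywhere are left untouched, so other class mismatches (e.g. integer vs. double) are left to bind_rows().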
Perhaps a patch for qc_read_collection() similar to the one below (enclosed by ##<##<##) could be useful generally, given that dplyr is not going to fix this because it is a "deliberate design decision" (see #5358)?
Dear Alboukadel,
Many thanks for this and other handy R packages!
Created on 2021-12-21 by the reprex package (v2.0.1)
Perhaps you'd like to look into this yourself, and maybe come up with an easier and prettier solution? :)
Cheers,
Simon