Error with merging paired ends using mergeAmpliconReads() #18

APepey · 2022-01-25T21:21:49Z

Hello! Thank you for all your work!
I have a question about one the HaplotypR functions.

I am working on MiSeq data with the HaplotypR package. I have 35 samples with five P. falciparum markers: ama1-d3, cpmp, csp, cpp and msp7.
I could proceed with the following steps using an R code I received from the WEHI in December 2019:

demultiplexing by sample
demultiplexing by marker
read quality per marker

I am then trying to merge paired ends but I am encountering different issues depending on the syntax I am trying:

Syntax 1:

>   merged_reads <- 
+     marker_deplex %>%
+     select(SampleID, SampleName, MarkerID) %>%
+     bind_cols(mergeAmpliconReads(marker_deplex$FileR1, 
+                                                         marker_deplex$FileR2,
+                                                         outputDir = 'merge_reads'))
Processing file 3D7-A_BC_GTAAGGAG-CGTACTAG_cpmp_F.fastq.gz and 3D7-A_BC_GTAAGGAG-CGTACTAG_cpmp_R.fastq.gz ...1
Warning in if (method == "vsearch") { :
  the condition has length > 1 and only the first element will be used
Warning in system(call, intern = TRUE) :
  running command '"C:/Users/apepey/Documents/R/R-4.1.2/library/Rvsearch/vsearch" --fastq_mergepairs "marker_deplex/3D7-A_BC_GTAAGGAG-CGTACTAG_cpmp_F.fastq.gz" --reverse "marker_deplex/3D7-A_BC_GTAAGGAG-CGTACTAG_cpmp_R.fastq.gz" --fastqout "merge_reads/3D7-A_BC_GTAAGGAG-CGTACTAG_cpmp_merge.fastq.gz" --fastq_truncqual 1 --fastq_maxns 0 --fastq_allowmergestagger' had status 1
Error: Input/Output
  no input files found
  dirPath: merge_reads/3D7-A_BC_GTAAGGAG-CGTACTAG_cpmp_merge.fastq.gz
  pattern: character(0)

Syntax 2:

>   merged_reads <- 
+     marker_deplex %>%
+     mergeAmpliconReads(fastqFileR1 = marker_deplex$FileR1, 
+                                         fastqFileR2 = marker_deplex$FileR2,
+                                         outputDir = 'merge_reads')
Processing file 3D7-A_BC_GTAAGGAG-CGTACTAG_cpmp_F.fastq.gz and 3D7-A_BC_GTAAGGAG-CGTACTAG_cpmp_R.fastq.gz ...1
Warning in if (method == "vsearch") { :
  the condition has length > 1 and only the first element will be used
Warning in if (method == "NGmerge") { :
  the condition has length > 1 and only the first element will be used
Error in FUN(X[[i]], ...) : 
  Merge c("3D7-A", "3D7-A", "3D7-A", "3D7-A", "3D7-A", "3D7-B", "3D7-B", "3D7-B", "3D7-B", "3D7-B", "M0-176", "M0-176", "M0-176", "M0-176", "M0-176", "M0-316", "M0-316", "M0-316", "M0-316", "M0-316", "M0-441", "M0-441", "M0-441", "M0-441", "M0-441", "M0-445", "M0-445", "M0-445", "M0-445", "M0-445", "M0-5", "M0-5", "M0-5", "M0-5", "M0-5", "M0-561", "M0-561", "M0-561", "M0-561", "M0-561", "M0-562", "M0-562", "M0-562", "M0-562", "M0-562", "M0-687", "M0-687", "M0-687", "M0-687", "M0-687", "M0-801", "M0-801", 
"M0-801", "M0-801", "M0-801", "M0-96", "M0-96", "M0-96", "M0-96", "M0-96", "M1-316", "M1-316", "M1-316", "M1-316", "M1-316", "M1-561", "M1-561", "M1-561", "M1-561", "M1-561", "M1-687", "M1-687", "M1-687", "M1-687", "M1-687", "M1-801", "M1-801", "M1-801", "M1-801", "M1-801", "M1-831", "M1-831", "M1-831", "M1-831", "M1-831", "M11-441", "M11-441", "M11-441", "M11-441", "M11-441", "M11-792", "M11-792", "M11-792", "M11-792", "M11-792", "M2-445", "M2-445", "M2-445", "M2-445", "M2-445

Syntax 3:

>   merged_reads <-
+     marker_deplex %>%
+     select(SampleID, SampleName, MarkerID) %>% 
+     bind_cols(mergeAmpliconReads(fastqFileR1 = as.character(marker_deplex$FileR1), 
+                                                         fastqFileR2 = as.character(marker_deplex$FileR2),
+                                                         outputDir = './merge_reads'))
Processing file 3D7-A_BC_GTAAGGAG-CGTACTAG_cpmp_F.fastq.gz and 3D7-A_BC_GTAAGGAG-CGTACTAG_cpmp_R.fastq.gz ...1
Warning in if (method == "vsearch") { :
  the condition has length > 1 and only the first element will be used
Warning in system(call, intern = TRUE) :
  running command '"C:/Users/apepey/Documents/R/R-4.1.2/library/Rvsearch/vsearch" --fastq_mergepairs "marker_deplex/3D7-A_BC_GTAAGGAG-CGTACTAG_cpmp_F.fastq.gz" --reverse "marker_deplex/3D7-A_BC_GTAAGGAG-CGTACTAG_cpmp_R.fastq.gz" --fastqout "./merge_reads/3D7-A_BC_GTAAGGAG-CGTACTAG_cpmp_merge.fastq.gz" --fastq_truncqual 1 --fastq_maxns 0 --fastq_allowmergestagger' had status 1
Error: Input/Output
  no input files found
  dirPath: ./merge_reads/3D7-A_BC_GTAAGGAG-CGTACTAG_cpmp_merge.fastq.gz
  pattern: character(0)

Syntax 4:

>   merged_reads <-
+     marker_deplex %>%
+     mergeAmpliconReads(as.character(marker_deplex$FileR1), 
+                                         as.character(marker_deplex$FileR2),
+                                         outputDir = './merge_reads')
Error in mergeAmpliconReads(., as.character(marker_deplex$FileR1), as.character(marker_deplex$FileR2),  : 
  Vector length of fastqFileR1 and fastqFileR2 not identical

What could I try to merge paired ends?
Any help is appreciated, thank you.
My marker_table is available here: https://gist.github.com/APepey/054e40b9a10f4b25987dc73b3421e2f5

Here is my session info:

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 14393)

Matrix products: default

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 
 
locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] Rvsearch_0.99.1             cowplot_1.1.1               dada2_1.21.0                Rcpp_1.0.7                  HaplotypR_0.3.3            
 [6] devtools_2.4.3              usethis_2.1.5               ShortRead_1.52.0            GenomicAlignments_1.30.0    SummarizedExperiment_1.24.0
[11] Biobase_2.54.0              MatrixGenerics_1.6.0        matrixStats_0.61.0          Rsamtools_2.10.0            GenomicRanges_1.46.1       
[16] Biostrings_2.62.0           GenomeInfoDb_1.30.0         XVector_0.34.0              IRanges_2.28.0              S4Vectors_0.32.3           
[21] BiocParallel_1.28.3         BiocGenerics_0.40.0         pheatmap_1.0.12             forcats_0.5.1               stringr_1.4.0              
[26] dplyr_1.0.7                 purrr_0.3.4                 tidyr_1.1.4                 tibble_3.1.6                ggplot2_3.3.5              
[31] tidyverse_1.3.1             readr_2.1.1                

loaded via a namespace (and not attached):
 [1] colorspace_2.0-2       hwriter_1.3.2          ellipsis_0.3.2         rprojroot_2.0.2        fs_1.5.2               rstudioapi_0.13       
 [7] farver_2.1.0           remotes_2.4.2          bit64_4.0.5            fansi_0.5.0            lubridate_1.8.0        xml2_1.3.3            
[13] cachem_1.0.6           knitr_1.37             pkgload_1.2.4          jsonlite_1.7.3         broom_0.7.11           dbplyr_2.1.1          
[19] png_0.1-7              compiler_4.1.2         httr_1.4.2             backports_1.4.1        assertthat_0.2.1       Matrix_1.4-0          
[25] fastmap_1.1.0          cli_3.1.0              htmltools_0.5.2        prettyunits_1.1.1      tools_4.1.2            gtable_0.3.0          
[31] glue_1.6.0             GenomeInfoDbData_1.2.7 reshape2_1.4.4         cellranger_1.1.0       vctrs_0.3.8            xfun_0.29             
[37] ps_1.6.0               testthat_3.1.1         rvest_1.0.2            lifecycle_1.0.1        zlibbioc_1.40.0        scales_1.1.1          
[43] vroom_1.5.7            hms_1.1.1              parallel_4.1.2         RColorBrewer_1.1-2     yaml_2.2.1             memoise_2.0.1         
[49] latticeExtra_0.6-29    stringi_1.7.6          desc_1.4.0             pkgbuild_1.3.1         rlang_0.4.12           pkgconfig_2.0.3       
[55] bitops_1.0-7           evaluate_0.14          lattice_0.20-45        labeling_0.4.2         bit_4.0.4              tidyselect_1.1.1      
[61] processx_3.5.2         plyr_1.8.6             magrittr_2.0.1         R6_2.5.1               snow_0.4-4             generics_0.1.1        
[67] DelayedArray_0.20.0    DBI_1.1.2              pillar_1.6.4           haven_2.4.3            withr_2.4.3            RCurl_1.98-1.5        
[73] modelr_0.1.8           crayon_1.4.2           utf8_1.2.2             tzdb_0.2.0             rmarkdown_2.11         jpeg_0.1-9            
[79] grid_4.1.2             readxl_1.3.1           callr_3.7.0            reprex_2.0.1           digest_0.6.29          RcppParallel_5.1.5    
[85] munsell_0.5.0          sessioninfo_1.2.2

Thank you again!

The text was updated successfully, but these errors were encountered:

lerch-a · 2022-01-27T14:53:46Z

Thanks for the information.
First, I can not give you support for the code you got from WEHI (code using %>%), but I can help you with the merging reads.
The code from WEHI combines multiple step into one step and makes it very difficult understand. It is far better to split up the code if you have issues. So just try to run the following lines individually and also read my instructions to Run HaplotypR on https://github.com/lerch-a/HaplotypR.

procReadsMerge <- mergeAmpliconReads(fastqFileR1 = as.character(marker_deplex$FileR1), fastqFileR2 = as.character(marker_deplex$FileR2), outputDir = 'merge_reads')

merged_reads <- cbind(marker_deplex[,c("SampleID", "SampleName","BarcodePair", "MarkerID")], procReadsMerge)

Make sure that the folder 'merge_reads' exists in getwd() !

If you still have issues, then please let me know.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error with merging paired ends using mergeAmpliconReads() #18

Error with merging paired ends using mergeAmpliconReads() #18

APepey commented Jan 25, 2022

lerch-a commented Jan 27, 2022 •

edited

Loading

Error with merging paired ends using mergeAmpliconReads() #18

Error with merging paired ends using mergeAmpliconReads() #18

Comments

APepey commented Jan 25, 2022

lerch-a commented Jan 27, 2022 • edited Loading

lerch-a commented Jan 27, 2022 •

edited

Loading