Illumina single-end data #4

hoelzer · 2020-01-24T12:37:26Z

No description provided.

MarieLataretu · 2020-05-20T17:55:13Z

I'd add an extra parameter --illumina-single-end (like --nano and --illumina), so that one can clean single- and paired-end reads in one clean run
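A minimal Python sketch of how such flags could be collected so that every input type given in one invocation becomes its own cleaning job. The flag names come from the proposal above. The argument shapes are assumptions: one FASTQ for `--nano` and `--illumina-single-end`, and two for `--illumina`. The job labels are also assumptions. The pipeline's own parameter handling is not shown in this thread.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical interface: flag names follow the proposal above,
    # but how many files each flag takes is an assumption.
    parser = argparse.ArgumentParser(description="clean reads (sketch)")
    parser.add_argument("--nano", metavar="FASTQ",
                        help="Oxford Nanopore reads")
    parser.add_argument("--illumina", nargs=2, metavar=("R1", "R2"),
                        help="Illumina paired-end reads")
    parser.add_argument("--illumina-single-end", metavar="FASTQ",
                        help="Illumina single-end reads")
    return parser


def collect_inputs(argv: list[str]) -> list[tuple[str, list[str]]]:
    """Return one (read type, files) job per input flag, so single-end
    and paired-end reads can be cleaned in the same run."""
    parser = build_parser()
    args = parser.parse_args(argv)
    jobs = []
    if args.nano:
        jobs.append(("nano", [args.nano]))
    if args.illumina:
        jobs.append(("illumina_paired", list(args.illumina)))
    if args.illumina_single_end:  # argparse turns '-' into '_' in the dest
        jobs.append(("illumina_single", [args.illumina_single_end]))
    if not jobs:
        # exits with status 2 and a usage message, like other argparse errors
        parser.error("provide at least one of --nano, --illumina, "
                     "--illumina-single-end")
    return jobs
```

For example, `collect_inputs(["--illumina", "a_R1.fq.gz", "a_R2.fq.gz", "--illumina-single-end", "b.fq.gz"])` yields one paired-end job and one single-end job from a single command line.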

hoelzer · 2020-05-20T19:03:35Z

Ah yes, that's a good solution!


MarieLataretu · 2020-05-22T09:11:57Z

I just scrolled by: is the renaming of the reads also applicable to the single-end reads?

hoelzer · 2020-05-22T09:38:33Z

Renaming of the reads? Do you have an example?

MarieLataretu · 2020-05-22T09:55:54Z

We do this before mapping:

```
  # works for ENA reads whose read IDs end in '/1' or '/2'
  EXAMPLE_ID=\$(zcat ${reads[0]} | head -1)
  if [[ \$EXAMPLE_ID == */1 ]]; then
    if [[ ${reads[0]} =~ \\.gz\$ ]]; then
      zcat ${reads[0]} | sed 's/ /DECONTAMINATE/g' > ${name}.R1.id.fastq
      TOTALREADS_1=\$(zcat ${reads[0]} | echo \$((`wc -l`/4)))
    else
      sed 's/ /DECONTAMINATE/g' ${reads[0]} > ${name}.R1.id.fastq
      TOTALREADS_1=\$(cat ${reads[0]} | echo \$((`wc -l`/4)))
    fi
    if [[ ${reads[1]} =~ \\.gz\$ ]]; then
      zcat ${reads[1]} | sed 's/ /DECONTAMINATE/g' > ${name}.R2.id.fastq
      TOTALREADS_2=\$(zcat ${reads[1]} | echo \$((`wc -l`/4)))
    else
      sed 's/ /DECONTAMINATE/g' ${reads[1]} > ${name}.R2.id.fastq
      TOTALREADS_2=\$(cat ${reads[1]} | echo \$((`wc -l`/4)))
    fi
  else
[....]
```

But I just saw that we also do this for the ONT data, so I'll implement it for the Illumina single-end data as well!
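Here is the quoted step as a Python sketch that accepts one or two files. The same code then covers ONT, Illumina single-end and Illumina paired-end input. It mirrors the `sed 's/ /DECONTAMINATE/g'` substitution and the line-count/4 read total. The function names and the single-end output name `{name}.id.fastq` are hypothetical. The ENA `/1` check is left out, since its `else` branch is elided above.

```python
import gzip
from pathlib import Path

# Placeholder that replaces spaces, as in the `sed 's/ /DECONTAMINATE/g'` call.
SPACE_TAG = "DECONTAMINATE"


def open_text(path: Path):
    """Open a FASTQ file as text, decompressing it if the name ends in .gz."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def tag_spaces(in_path: Path, out_path: Path) -> int:
    """Copy a FASTQ file with every space replaced by SPACE_TAG and
    return the number of reads (line count / 4)."""
    n_lines = 0
    with open_text(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(line.replace(" ", SPACE_TAG))
            n_lines += 1
    return n_lines // 4


def prepare_reads(reads: list[Path], name: str, out_dir: Path) -> dict[str, int]:
    """Tag one (ONT or single-end) or two (paired-end) FASTQ files and
    return the read count per output file.

    Paired-end names follow the `${name}.R1.id.fastq` pattern from the
    snippet; the single-end name `{name}.id.fastq` is hypothetical.
    """
    if len(reads) == 1:
        out_names = [f"{name}.id.fastq"]
    elif len(reads) == 2:
        out_names = [f"{name}.R1.id.fastq", f"{name}.R2.id.fastq"]
    else:
        raise ValueError("expected one or two FASTQ files")
    return {out: tag_spaces(path, out_dir / out)
            for path, out in zip(reads, out_names)}
```

Because the number of input files picks the output names, no separate single-end code path is needed.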

hoelzer · 2020-05-22T11:17:36Z

Ah sorry, I got confused with the rnaseq pipeline ;)

Yeah, I introduced this renaming step because I ran into problems with some FASTQ headers. I think it would also be useful to have a more convenient renaming script (in Python or similar) that:

- renames the reads,
- saves the mapping between the original and new IDs in a TSV file, and
- restores the IDs based on that TSV file.

See:
https://github.com/EBI-Metagenomics/emg-viral-pipeline/blob/master/bin/rename_fasta.py
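This is a minimal sketch of such a rename/restore pair for plain (uncompressed) FASTQ. It is not the linked `rename_fasta.py`. The function names, the `read` prefix and the two-column TSV layout (new ID, original header) are assumptions. It also assumes 4-line FASTQ records and that downstream tools keep the new read IDs unchanged.

```python
import csv
from pathlib import Path


def rename_fastq(in_path: Path, out_path: Path, map_path: Path,
                 prefix: str = "read") -> int:
    """Give every read a sequential ID (prefix + number), write the
    (new ID, original header) pairs to a TSV and return the read count."""
    count = 0
    with open(in_path) as src, open(out_path, "w") as dst, \
            open(map_path, "w", newline="") as tsv:
        writer = csv.writer(tsv, delimiter="\t")
        for i, line in enumerate(src):
            if i % 4 == 0:  # header line: '@' + ID + optional description
                count += 1
                new_id = f"{prefix}{count}"
                writer.writerow([new_id, line.rstrip("\n")[1:]])
                dst.write(f"@{new_id}\n")
            elif i % 4 == 2:  # '+' separator: drop any repeated header
                dst.write("+\n")
            else:  # sequence and quality lines are copied unchanged
                dst.write(line)
    return count


def restore_fastq(in_path: Path, out_path: Path, map_path: Path) -> int:
    """Put the original headers back. Reads are looked up by ID, so this
    also works after a filtering step has removed some of them."""
    with open(map_path, newline="") as tsv:
        original = {new: old for new, old in csv.reader(tsv, delimiter="\t")}
    restored = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for i, line in enumerate(src):
            if i % 4 == 0:
                read_id = line.rstrip("\n")[1:].split()[0]
                dst.write(f"@{original[read_id]}\n")
                restored += 1
            else:
                dst.write(line)
    return restored
```

Keeping the mapping in a TSV file rather than in the headers means any tool between the two steps only ever sees short, space-free IDs.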

So we could have a separate renaming step for any FASTQ file, followed by the filtering, and then a restore module...

Maybe that's cleaner?

But I am also happy with any other simple solution.
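The rename, filter and restore ordering can be sketched on in-memory records. `keep` is a hypothetical stand-in for the actual filtering step, and the `r1`, `r2`, ... IDs are an arbitrary choice. Restoring only needs the mapping entries for reads that survive the filter.

```python
from typing import Callable, Iterable, Iterator

# (header without '@', sequence, quality)
Record = tuple[str, str, str]


def rename(records: Iterable[Record]) -> tuple[list[Record], dict[str, str]]:
    """Step 1: give every read a short ID and remember the original header."""
    renamed, mapping = [], {}
    for i, (header, seq, qual) in enumerate(records, start=1):
        new_id = f"r{i}"
        mapping[new_id] = header
        renamed.append((new_id, seq, qual))
    return renamed, mapping


def restore(records: Iterable[Record], mapping: dict[str, str]) -> Iterator[Record]:
    """Step 3: put the original headers back on the reads that survived."""
    for new_id, seq, qual in records:
        yield mapping[new_id], seq, qual


def clean(records: Iterable[Record], keep: Callable[[Record], bool]) -> list[Record]:
    """Rename, then filter with `keep` (a placeholder for the real
    filtering step), then restore the original headers."""
    renamed, mapping = rename(records)
    filtered = [r for r in renamed if keep(r)]
    return list(restore(filtered, mapping))
```

Because the filter only ever sees the short IDs, it can be swapped for any tool without touching the rename or restore code.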

MarieLataretu · 2020-05-25T09:52:24Z

Yeah, an extra process for renaming would definitely reduce code redundancy!

I'll go with the copy-paste solution for now and open a new issue.

hoelzer added the enhancement label (New feature or request) Jan 24, 2020

hoelzer self-assigned this Jan 24, 2020

hoelzer closed this as completed in 83f64f7 May 21, 2020

hoelzer added a commit that referenced this issue May 21, 2020: 2762099 "Merge pull request #18 from hoelzer/single-end" (fixes #4)

MarieLataretu mentioned this issue May 25, 2020 in "Encapsulate read renaming" #19 (closed)
