Feature request: Add reads de-duplication step before assembly #104

Closed
Tracked by #88
Michaelijesse opened this issue Sep 13, 2023 · 8 comments
Labels
enhancement New feature or request

@Michaelijesse

Michaelijesse commented Sep 13, 2023

ERROR ~ Error executing process > 'BACANNOT:FLYE (SRR5637694_subreads)'

Caused by:
  Process `BACANNOT:FLYE (SRR5637694_subreads)` terminated with an error exit status (1)

Command executed:

  # Save flye version
  flye -v > flye_version.txt ;

  # Run flye
  flye \
    --pacbio-raw \
    SRR5637694_subreads.fastq.gz \
    --out-dir flye_SRR5637694_subreads \
    --threads 8 &> flye.log ;

  # Save a copy for annotation
  cp flye_SRR5637694_subreads/assembly.fasta flye_SRR5637694_subreads.fasta

Command exit status:
  1

Command output:
  (empty)

Work dir:
  /home/centos/Michael/Bacannot/work/df/3a7e2a7d655c71e4e4abcbac5d1704

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details
@fmalmeida
Owner

fmalmeida commented Sep 14, 2023

Hello hello,

Can you check and share the contents of the file 'flye.log' inside this work dir? The '.nextflow.log' file, which contains the command line used, would also be helpful.

With only the output above, we cannot tell what caused the problem.

@Michaelijesse
Author

Michaelijesse commented Sep 14, 2023

Hello, thank you for your response.

The contents of flye.log show duplicated read IDs in the subreads.

[2023-09-14 11:29:48] INFO: Starting Flye 2.9-b1768
[2023-09-14 11:29:48] INFO: >>>STAGE: configure
[2023-09-14 11:29:48] INFO: Configuring run
[2023-09-14 11:30:01] INFO: Total read length: 407342569
[2023-09-14 11:30:01] INFO: Reads N50/N90: 5930 / 880
[2023-09-14 11:30:01] INFO: Minimum overlap set to 1000
[2023-09-14 11:30:01] INFO: >>>STAGE: assembly
[2023-09-14 11:30:01] INFO: Assembling disjointigs
[2023-09-14 11:30:01] INFO: Reading sequences
[2023-09-14 11:30:12] ERROR: The input contain reads with duplicated IDs. Make sure all reads have unique IDs and restart. The first problematic ID was: SRR3667790.17
[2023-09-14 11:30:12] ERROR: Command '['flye-modules', 'assemble', '--reads', '/home/centos/Michael/Bacannot/work/06/36060129688447e2db9077eb8d6650/SRR3667790_subreads.fastq.gz', '--out-asm', '/home/centos/Michael/Bacannot/work/06/36060129688447e2db9077eb8d6650/flye_SRR3667790/00-assembly/draft_assembly.fasta', '--config', '/usr/local/lib/python3.9/site-packages/flye/config/bin_cfg/asm_raw_reads.cfg', '--log', '/home/centos/Michael/Bacannot/work/06/36060129688447e2db9077eb8d6650/flye_SRR3667790/flye.log', '--threads', '8', '--min-ovlp', '1000']' returned non-zero exit status 1.
[2023-09-14 11:30:12] ERROR: Pipeline aborted

I think it would be better to add dedupe.sh from the BBMap suite, or some other option, to remove duplicates from the subreads.
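Before choosing a fix, it can help to confirm which read IDs are repeated. The following is a minimal sketch, not part of the pipeline: `dup_ids` is a hypothetical helper name, and it assumes standard 4-line FASTQ records (no wrapped sequence lines) in a gzipped file.

```shell
# dup_ids: print every read ID that occurs more than once in a gzipped FASTQ.
# Hypothetical helper for illustration; assumes 4 lines per record.
dup_ids() {
  gzip -cd "$1" |
    # Header lines are lines 1, 5, 9, ...; keep the ID (first field) without '@'.
    awk 'NR % 4 == 1 { sub(/^@/, "", $1); print $1 }' |
    # uniq -d prints one copy of each value that is repeated in sorted input.
    sort | uniq -d
}
```

For example, `dup_ids SRR3667790_subreads.fastq.gz | head` would list the first few repeated IDs; empty output means all IDs are unique.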

@fmalmeida
Owner

Hi @Michaelijesse,
Thanks for sharing. Good to know that it is not a bug in the pipeline. I therefore recommend the following:

  • To use the pipeline for now, manually run a de-duplication step on your reads before analysing them.

In the meantime, I will rename this ticket and turn it into a feature request for adding such a step to the pipeline.

Once again, thanks for sharing.

@fmalmeida fmalmeida changed the title from "FLYE SHOWS ERROR" to "Feature request: Add reads de-duplication step before assembly" Sep 14, 2023
@fmalmeida fmalmeida self-assigned this Sep 14, 2023
@fmalmeida fmalmeida added the enhancement New feature or request label Sep 14, 2023
@Michaelijesse
Author

Michaelijesse commented Sep 14, 2023

Hi @fmalmeida, thank you for helping out. I will do that: de-duplicate the reads and run the pipeline again.

@Michaelijesse
Author

seqkit rename <fastq.gz> works fine.
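Note that this approach keeps every read and only makes the IDs unique; it does not drop any sequences. Where seqkit is not available, roughly the same effect can be sketched with awk. This is an illustration only: `rename_dup_ids` is a hypothetical helper, it assumes 4-line FASTQ records in a gzipped file, and it does not guard against a generated ID such as `x_2` colliding with an ID already present in the input.

```shell
# rename_dup_ids: write a FASTQ stream in which repeated read IDs get a
# numeric suffix (_2, _3, ...), leaving the first occurrence unchanged.
# Hypothetical helper; assumes 4 lines per record.
rename_dup_ids() {
  gzip -cd "$1" |
    awk 'NR % 4 == 1 {
           id = $1                       # ID token, including the leading "@"
           n = ++seen[id]                # how many times this ID has appeared
           if (n > 1) $1 = id "_" n      # second copy becomes id_2, and so on
         }
         { print }'
}
```

Usage would look like `rename_dup_ids in.fastq.gz | gzip > unique.fastq.gz`, with the file names being placeholders. The seqkit equivalent reported above should be `seqkit rename in.fastq.gz -o unique.fastq.gz`.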

@fmalmeida
Owner

Perfect, will add this. Thanks!

@Michaelijesse
Author

I am closing this issue.

@Michaelijesse
Author

All solved!!

Projects
Release v3.3
Awaiting triage