-
The Nextflow trace metrics can help you here. I would compare both cases using the trace report. As for solutions, migrating to Google Batch would be a good first step.
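A minimal sketch of what enabling the trace report and switching to the Google Batch executor might look like in `nextflow.config` (the project ID and region here are hypothetical placeholders, not the poster's actual values):

```groovy
// nextflow.config -- hedged sketch, not the poster's actual config
trace {
    enabled = true
    file    = 'pipeline_trace.txt'
    // Comparing 'realtime' (task compute time) against 'duration'
    // (wall-clock including staging) shows how long input staging takes.
    fields  = 'task_id,name,status,realtime,duration,rss,read_bytes,write_bytes'
}

// Migrating from Google Life Sciences to Google Batch:
process.executor = 'google-batch'
google.project   = 'my-gcp-project'   // hypothetical project ID
google.location  = 'us-central1'      // hypothetical region
```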
-
Hi Team,
I am trying Nextflow with GLS (not yet updated to GCB) using BLAST database files of around 335 GB (the nt BLAST database from NCBI), and an input query of around 100 sequences.
The task is to run BLASTN for the input sequences against the 335 GB nt BLAST database. In the .config file, the GS bucket path that contains the database files is added.
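For illustration, a hedged sketch of how such a bucket path might be declared (the bucket name and prefix are hypothetical):

```groovy
// nextflow.config -- hedged sketch; bucket name and path are hypothetical
params.blast_db_dir = 'gs://my-bucket/blast-db/nt'

// Note: with the GLS/Batch executors, any `path` input resolved from a
// gs:// URI is staged (copied) into the task's working directory before
// the task starts, which matters for a 335 GB database.
```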
The workflow was submitted using two different process definitions, as shown below.
In the first case, the complete run takes 1 h 22 m, with the BLAST analysis taking 8 m 4 s.
In the second case, the same run takes 4 h 18 m, with the BLAST analysis taking 3 h 6 m.
Could the team please let me know why it takes 1 h 22 m in the first case and around 4 h in the second, when the same process on a high-end local Linux machine takes only 10 minutes to complete? The BLAST database/index files are usually loaded into memory during the alignment, as seen on the local Linux machines.
Is the 335 GB BLAST database copied to the instance that is spun up, or into the BLAST container that is used for execution?
Could the team also suggest recommended/best practices for handling data this large with Nextflow?
Any inputs/suggestions are highly appreciated. Thank you!