sortmerna 4.0 processing stuck #212
Dear SortMeRNA users, |
@QianqianJena Please, also provide the size |
Dear SortMeRNA group, Thanks for your reply. Please have a look at the attached execution-trace file. Here is the detailed report: Input forward file: IS13_f.fastq:34936244849:400155328. Input reverse file: IS13_f.fastq:34558730639:400155328. Output mRNA file: IS13_other.fastq:4212633783:51077492. I am sorry that I deleted the aligned fastq file. If you need further information, please let me know. Thank you very much. Best regards, Qianqian |
One problem is immediately seen:
The Otherwise not only the run time is ridiculous, but the alignment results are non-valid too. |
Thanks for your reply and it makes sense now. |
Hello Biocodz, |
Some info is chopped off the trace, so I cannot see how many CPUs your machine has. Is there a particular reason you are using 8 threads? Can you increase the number, or maybe even run with the default (all cores)?
Mostly the processing time is consistent with the reference file size, although positions 5 and 6 differ dramatically in time even though their sizes are about the same. This can happen for different reasons; one is that your machine is busy with other tasks, which I cannot verify of course. |
Hello Biocodz, |
Hi, kvdb Output file: RNA-1117-00_5-120/ |
@Young331 If you don't see a line Sortmerna stores all the calculations in a database, so the output files will be empty until the processing has finished. Can you send me the complete execution trace like the one here? One problem in your command line: I tested your command (modified to use my local directory structure):
|
slurm-10690575.txt Same problem. It stops at "testing file......" |
You don't need to delete You can try the command I used in my previous message to confirm it works. Then compare it to yours. It appears there is a problem creating the directories/files in your case, although it's not clear why an error is not thrown |
I have exactly the same problem. After indexing, one cpu is active with 100% (even though I have 28 available) and 0% memory usage but it does not show any output. Here is my command:
sortmerna \
--ref ../../rRNA_Method/rRNA_databases/silva-arc-16s-id95.fasta \
--ref ../../rRNA_Method/rRNA_databases/silva-bac-16s-id90.fasta \
--reads ../../rRNA_Method/R1.fastq.gz \
--reads ../../rRNA_Method/R2.fastq.gz \
--workdir workdir \
--fastx \
--aligned \
--other \
--best 1 \
--paired_in \
--threads 28 \
-v
Can you see any problems? Thank you! EDIT: the problem does not occur with v. 2.1b (I used that version as this is the next lower version available in bioconda). |
The program is not running. Single 100% loaded CPU most likely means the process main thread is stuck in a loop. |
@biocodz Thank you very much for your help.
input files compressed with gzip: 4.5G R26_S25_L005_R1_001.fastq.gz (268058796 lines)
terminal output:
sortmerna --ref ../../rRNA_Method/rRNA_databases/silva-arc-16s-id95.fasta --ref ../../rRNA_Method/rRNA_databases/silva-bac-16s-id90.fasta --reads ../../rRNA_Method/R26_S25_L005_R1_001.fastq.gz --reads ../../rRNA_Method/R26_S25_L005_R2_001.fastq.gz --workdir workdir --fastx --aligned --other --best 1 --paired_in --threads 28 -v
[process:1369] === Options processing starts ... ===
Found value: sortmerna
Found flag: --ref
Found value: ../../rRNA_Method/rRNA_databases/silva-arc-16s-id95.fasta of previous flag: --ref
Found flag: --ref
Found value: ../../rRNA_Method/rRNA_databases/silva-bac-16s-id90.fasta of previous flag: --ref
Found flag: --reads
Found value: ../../rRNA_Method/R26_S25_L005_R1_001.fastq.gz of previous flag: --reads
Found flag: --reads
Found value: ../../rRNA_Method/R26_S25_L005_R2_001.fastq.gz of previous flag: --reads
Found flag: --workdir
Found value: workdir of previous flag: --workdir
Found flag: --fastx
Previous flag: --fastx is Boolean. Setting to True
Found flag: --aligned
Previous flag: --aligned is Boolean. Setting to True
Found flag: --other
Previous flag: --other is Boolean. Setting to True
Found flag: --best
Found value: 1 of previous flag: --best
Found flag: --paired_in
Previous flag: --paired_in is Boolean. Setting to True
Found flag: --threads
Found value: 28 of previous flag: --threads
Found flag: -v
[opt_workdir:1066] Using WORKDIR: ["/data/folder/repo/sortmerna_test/workdir"] as specified
[process:1453] Processing option: aligned with value:
[opt_aligned:256] Directory and Prefix for the aligned output was not provided. Using default dir/pfx: 'WORKDIR/out/aligned'
[process:1453] Processing option: best with value: 1
[process:1453] Processing option: fastx with value:
[process:1453] Processing option: other with value:
[opt_other:285] other was specified without argument. Will use default Directory and Prefix for the non-aligned output.
[process:1453] Processing option: paired_in with value:
[process:1453] Processing option: reads with value: ../../rRNA_Method/R26_S25_L005_R1_001.fastq.gz
[opt_reads:73] Processing reads file [1] out of total [2] files
[process:1453] Processing option: reads with value: ../../rRNA_Method/R26_S25_L005_R2_001.fastq.gz
[opt_reads:73] Processing reads file [2] out of total [2] files
[process:1453] Processing option: ref with value: ../../rRNA_Method/rRNA_databases/silva-arc-16s-id95.fasta
[opt_ref:166] Processing reference [1] out of total [2] references
[opt_ref:220] File ["/data/folder/repo/sortmerna_test/../../rRNA_Method/rRNA_databases/silva-arc-16s-id95.fasta"] exists and is readable
[process:1453] Processing option: ref with value: ../../rRNA_Method/rRNA_databases/silva-bac-16s-id90.fasta
[opt_ref:166] Processing reference [2] out of total [2] references
[opt_ref:220] File ["/data/folder/repo/sortmerna_test/../../rRNA_Method/rRNA_databases/silva-bac-16s-id90.fasta"] exists and is readable
[process:1453] Processing option: threads with value: 28
[process:1453] Processing option: v with value:
[process:1473] === Options processing done ===
[validate_kvdbdir:1252] Key-value DB location "/data/folder/repo/sortmerna_test/workdir/kvdb"
[validate_kvdbdir:1288] Creating KVDB directory: "/data/folder/repo/sortmerna_test/workdir/kvdb"
[validate_aligned_pfx:1307] Checking output directory: "/data/folder/repo/sortmerna_test/workdir/out"
WARNING: [validate:1557] 'best' [INT] has been set but no output format has been chosen (--blast | --sam | --otu_map). Using default 'blast'
Program: SortMeRNA version 4.2.0
Copyright: 2016-2020 Clarity Genomics BVBA:
Turnhoutseweg 30, 2340 Beerse, Belgium
2014-2016 Knight Lab:
Department of Pediatrics, UCSD, La Jolla
2012-2014 Bonsai Bioinformatics Research Group:
LIFL, University Lille 1, CNRS UMR 8022, INRIA Nord-Europe
Disclaimer: SortMeRNA comes with ABSOLUTELY NO WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the GNU Lesser General Public License for more details.
Contributors: Jenya Kopylova jenya.kopylov@gmail.com
Laurent Noé laurent.noe@lifl.fr
Pierre Pericard pierre.pericard@lifl.fr
Daniel McDonald wasade@gmail.com
Mikaël Salson mikael.salson@lifl.fr
Hélène Touzet helene.touzet@lifl.fr
Rob Knight robknight@ucsd.edu
[main:63] Running command:
sortmerna --ref ../../rRNA_Method/rRNA_databases/silva-arc-16s-id95.fasta --ref ../../rRNA_Method/rRNA_databases/silva-bac-16s-id90.fasta --reads ../../rRNA_Method/R26_S25_L005_R1_001.fastq.gz --reads ../../rRNA_Method/R26_S25_L005_R2_001.fastq.gz --workdir workdir --fastx --aligned --other --best 1 --paired_in --threads 28 -v
Parameters summary:
K-mer size: 19
K-mer interval: 1
Maximum positions to store per unique K-mer: 10000
Total number of databases to index: 2
[build_index:1189] Begin indexing file ../../rRNA_Method/rRNA_databases/silva-arc-16s-id95.fasta of size: 3893959 under index name workdir/idx/3436099190853847617
Collecting nucleotide distribution statistics .. done [0.029558 sec]
start index part # 0:
(1/3) building burst tries .. done [1.788349 sec]
(2/3) building CMPH hash .. done [3.443561 sec]
(3/3) building position lookup tables .. done [4.097246 sec]
total number of sequences in this part = 3193
temporary file was here: workdirsortmerna_keys_26593.txt
writing kmer data to workdir/idx/3436099190853847617.kmer_0.dat
writing burst tries to workdir/idx/3436099190853847617.bursttrie_0.dat
writing position lookup table to workdir/idx/3436099190853847617.pos_0.dat
writing nucleotide distribution statistics to workdir/idx/3436099190853847617.stats
done.
[build_index:1189] Begin indexing file ../../rRNA_Method/rRNA_databases/silva-bac-16s-id90.fasta of size: 19437013 under index name workdir/idx/15734375058464002811
Collecting nucleotide distribution statistics .. done [0.161797 sec]
start index part # 0:
(1/3) building burst tries .. done [15.391355 sec]
(2/3) building CMPH hash .. done [15.824412 sec]
(3/3) building position lookup tables .. done [69.375532 sec]
total number of sequences in this part = 12798
temporary file was here: workdirsortmerna_keys_26593.txt
writing kmer data to workdir/idx/15734375058464002811.kmer_0.dat
writing burst tries to workdir/idx/15734375058464002811.bursttrie_0.dat
writing position lookup table to workdir/idx/15734375058464002811.pos_0.dat
writing nucleotide distribution statistics to workdir/idx/15734375058464002811.stats
done.
Since then not much has happened. If there is any additional information I can provide for you, please let me know. |
The program has a problem reading your
A similar problem was recently solved in issue 221 |
Hm, something is indeed very fishy:
$ gzip --version
gzip 1.6
Copyright (C) 2007, 2010, 2011 Free Software Foundation, Inc.
Copyright (C) 1993 Jean-loup Gailly.
This is free software. You may redistribute copies of it under the terms of
the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.
There is NO WARRANTY, to the extent permitted by law.
Written by Jean-loup Gailly.
$ gzip -l ../../rRNA_Method/R26_S25_L005_R1_001.fastq.gz
compressed uncompressed ratio uncompressed_name
4767463884 21442 -22234131.2% ../../rRNA_Method/R26_S25_L005_R1_001.fastq
$ gzip -l ../../rRNA_Method/R26_S25_L005_R2_001.fastq.gz
compressed uncompressed ratio uncompressed_name
4866268810 22002 -22117292.9% ../../rRNA_Method/R26_S25_L005_R2_001.fastq
EDIT: I did not generate the sequence data by myself, hence I don't know how they were generated. |
what's the output of
|
$ gzip -l --verbose ../../rRNA_Method/R26_S25_L005_R1_001.fastq.gz
gzip: ../../rRNA_Method/R26_S25_L005_R1_001.fastq.gz: extra field of 6 bytes ignored
method crc date time compressed uncompressed ratio uncompressed_name
defla 2da034f2 Jun 24 16:00 4767463884 21442 -22234131.2% ../../rRNA_Method/R26_S25_L005_R1_001.fastq
$ gzip -l --verbose ../../rRNA_Method/R26_S25_L005_R2_001.fastq.gz
gzip: ../../rRNA_Method/R26_S25_L005_R2_001.fastq.gz: extra field of 6 bytes ignored
method crc date time compressed uncompressed ratio uncompressed_name
defla 8eb0f23b Jun 24 16:02 4866268810 22002 -22117292.9% ../../rRNA_Method/R26_S25_L005_R2_001.fastq
thank you for your help btw |
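A note on the listing above: an "uncompressed" size of about 21 KB for a 4.7 GB archive is what older gzip releases, such as the 1.6 shown here, print when they take the size from the 4-byte ISIZE field at the very end of the file. Per the gzip format (RFC 1952), that field holds only the size of the last member, modulo 2^32. A tiny value together with the "extra field of 6 bytes" message is therefore consistent with a file made of many small concatenated gzip members. The Python sketch below is one way to check this. The function names `isize_from_trailer` and `scan_members` are made up for illustration, and the sketch assumes the file contains nothing but valid gzip members (trailing garbage would raise `zlib.error`).

```python
import struct
import zlib


def isize_from_trailer(path):
    """Return the ISIZE field: the last 4 bytes of the file, little-endian.

    This covers only the LAST gzip member, modulo 2**32.
    """
    with open(path, "rb") as fh:
        fh.seek(-4, 2)  # 2 = seek relative to the end of the file
        return struct.unpack("<I", fh.read(4))[0]


def scan_members(path, chunk_size=1 << 20):
    """Decompress the whole file; return (member_count, total_uncompressed_bytes)."""
    members = 0
    total = 0
    with open(path, "rb") as fh:
        decomp = zlib.decompressobj(wbits=31)  # 31 = expect a gzip header and trailer
        buf = fh.read(chunk_size)
        while buf:
            total += len(decomp.decompress(buf))
            if decomp.eof:
                # One member ended; the bytes after it belong to the next member.
                members += 1
                buf = decomp.unused_data
                decomp = zlib.decompressobj(wbits=31)
            else:
                buf = b""
            if not buf:
                buf = fh.read(chunk_size)
    return members, total
```

If `scan_members` reports more than one member, or a total far larger than the trailer value, the file is multi-member (or larger than 4 GiB), and the size printed by `gzip -l` cannot be trusted.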
Could you try to recompress the files as per
|
seems to run smoothly now, thank you very much. I admit, I would not have anticipated that the compression was the problem. Thank you very much and happy easter. :-) |
Glad to hear. Happy Easter! We need to add better handling of such cases. I'll look into it. |
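The exact recompression command suggested above is not preserved in this thread. As a hedged illustration only, the following Python sketch does the equivalent of "decompress, then compress again": it streams a possibly multi-member `.gz` file through Python's `gzip` module, which reads concatenated members transparently, and writes a single-member file. The function name `recompress_gzip` and the default compression level are assumptions made for this example, not part of SortMeRNA.

```python
import gzip
import shutil


def recompress_gzip(src, dst, level=6):
    """Rewrite src (possibly multi-member gzip) as a single-member gzip file dst."""
    with gzip.open(src, "rb") as fin, gzip.open(dst, "wb", compresslevel=level) as fout:
        # Copy in 1 MiB pieces so multi-gigabyte FASTQ files never sit in memory.
        shutil.copyfileobj(fin, fout, length=1 << 20)
```

For example, `recompress_gzip("R1.fastq.gz", "R1.recompressed.fastq.gz")` (hypothetical file names). Note that for inputs whose uncompressed size exceeds 4 GiB, the ISIZE trailer still wraps modulo 2^32, so gzip versions that read the size from the trailer may keep reporting an odd size even after recompression.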
Dear @biocodz I am writing with regard to the running time of SortMeRNA. I have 300 human RNA-seq samples (~ 39.7 million read pairs). Each file is roughly 1 GB (so 2 GB for reverse and forward reads). It's taking nearly a day to run one sample. Is that normal? Here is my code Also attaching the log file here |
Please, refer to Issue 231 |
SortMeRNA is still at version 4.2.0. When is a fast version (at least without the data corruption of 2.1b and the same speed) expected to come out? I love the tool but it has these persisting problems that damage its use... |
The version |
All tests are done now. Preparing the release documentation and the Conda recipe. A few more days... |