Error in repeat binary #18

zkstewart · 2017-10-03T22:47:53Z

Hi,

I've attempted to assemble a genome using PacBio Sequel reads, and encountered an error on my first run and when I attempt to --resume the run. I have been running these as jobs on a PBS system on SUSE. I don't think it is a memory error since the job would be "killed" if I tried to use more than the amount I allotted.

I am using github commit 9c3f166 (v 2.1b) to assemble this.

For further information, I'm assembling a genome from a eukaryotic organism that does not have closely related species (< 100mya) genomes previously sequenced, so I don't have a strong idea of the exact genome size. I have used kmer-based genome estimates on corrected reads and assembled this genome with about 6 assemblers, so the consensus seems to be a genome of roughly 295-360MB in size (kmer estimates provide the lower range, many assemblers including abruijn's polished assembly provide the upper range). Using the lower range of that estimate, I have roughly 115x coverage including all the reads in my subreads. The stats of my raw reads are below (just in case you need this information to track down why this is occurring).

Number of contigs: 1247879
Shortest contig: 1001
Longest contig: 76985

N50: 13316
Median: 12214
Mean: 12937.225609213714

Below are the stderr from the first run and the --resume run

[2017-09-30 09:32:52] INFO: Running ABruijn
[2017-09-30 09:32:52] INFO: Assembling reads
[2017-09-30 09:32:52] INFO: Reading FASTA
[2017-09-30 09:42:22] INFO: Generating solid k-mer index
[2017-09-30 09:42:26] INFO: Counting kmers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2017-09-30 10:51:57] INFO: Counting kmers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2017-09-30 11:08:20] INFO: Filling index table
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2017-09-30 11:55:37] INFO: Extending reads
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2017-09-30 19:49:12] INFO: Assembled 3244 draft contigs
[2017-09-30 19:49:12] INFO: Generating contig sequences
[2017-09-30 20:25:11] INFO: Running BLASR
[2017-10-01 12:09:43] INFO: Computing rough consensus
[2017-10-01 12:45:26] INFO: Performing repeat analysis
[2017-10-01 12:45:27] INFO: Reading FASTA
[2017-10-01 12:55:09] INFO: Building repeat graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2017-10-01 21:06:40] INFO: Simplifying the graph
[2017-10-01 21:06:40] INFO: Aligning reads to the graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
terminate called without an active exception
[2017-10-02 04:36:49] ERROR: Error: Error in repeat binary: Command '['abruijn-repeat', '-k', '17', '-l', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/abruijn.log', '-t', '12', '-v', '5000', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/polished_0.fasta', '/home/user/genome_assembly/assembly_ready/species_subreads.fasta', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir']' returned non-zero exit status -6

[2017-10-02 22:36:10] INFO: Running ABruijn
[2017-10-02 22:36:10] INFO: Resuming previous run
[2017-10-02 22:36:10] INFO: Performing repeat analysis
[2017-10-02 22:36:10] INFO: Reading FASTA
[2017-10-02 22:45:31] INFO: Building repeat graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2017-10-03 04:02:13] INFO: Simplifying the graph
[2017-10-03 04:02:13] INFO: Aligning reads to the graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 
[2017-10-03 09:33:50] ERROR: Resource temporarily unavailable
[2017-10-03 09:33:50] ERROR: Error: Error in repeat binary: Command '['abruijn-repeat', '-k', '17', '-l', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/abruijn.log', '-t', '12', '-v', '5000', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/polished_0.fasta', '/home/user/genome_assembly/assembly_ready/species_subreads.fasta', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir']' returned non-zero exit status 1

This is an excerpt from the log file (it's quite large), I've tried to just get the relevant portions and have used ellipses to abbreviate repetitive sections. If you want to see the whole log file I can do that.

[2017-09-30 09:32:52] root: INFO: Running ABruijn						
[2017-09-30 09:32:52] root: DEBUG: Estimated genome size: 303080041						
[2017-09-30 09:32:52] root: DEBUG: Chosen k-mer size: 17						
[2017-09-30 09:32:52] root: INFO: Assembling reads						
[2017-09-30 09:32:52] root: DEBUG: -----Begin assembly log------						
[2017-09-30 09:32:52] DEBUG: Build date: Sep 28 2017 21:15:18						
[2017-09-30 09:32:52] INFO: Reading FASTA						
[2017-09-30 09:42:22] DEBUG: Mean read length: 7481						
[2017-09-30 09:42:22] INFO: Generating solid k-mer index						
[2017-09-30 09:42:22] DEBUG: Hard threshold set to 11						
[2017-09-30 09:42:22] DEBUG: Started kmer counting						
[2017-09-30 09:42:26] INFO: Counting kmers (1/2):						
[2017-09-30 10:51:57] INFO: Counting kmers (2/2):						
[2017-09-30 11:08:20] DEBUG: Genome size estimate: 291937348						
[2017-09-30 11:08:20] DEBUG: Filtered 472253 repetitive kmers						
[2017-09-30 11:08:20] DEBUG: Estimated minimum kmer coverage: 18, 279329292 unique kmers selected						
[2017-09-30 11:08:20] INFO: Filling index table						
[2017-09-30 11:08:39] DEBUG: Kmer index size: 11425325554						
[2017-09-30 11:55:37] INFO: Extending reads						
[2017-09-30 11:56:05] DEBUG: Mean read coverage: 75						
[2017-09-30 11:56:17] DEBUG: Assembled contig						
	With 31 reads					
	Start read: -m54105_170625_161744/56558013/0_15235					
	At position: 15					
	leftTip: 0 rightTip: 0					
	Suspicios: 0					
	Mean overlaps: 118					
	Inner reads: 0					
[2017-09-30 11:56:17] DEBUG: Inner: 2120 covered: 3752 total: 9115006						
[2017-09-30 11:57:03] DEBUG: Assembled contig						
	With 97 reads					
	Start read: -m54105_170623_233908/28902028/43_14990					
	At position: 34					
	leftTip: 0 rightTip: 0					
	Suspicios: 1					
	Mean overlaps: 86					
	Inner reads: 0					
[2017-09-30 11:57:03] DEBUG: Inner: 7162 covered: 12394 total: 9115006						
[2017-09-30 11:57:25] DEBUG: Assembled contig						
…						
…						
[2017-09-30 19:48:42] DEBUG: Inner: 2628858 covered: 3768626 total: 9115006						
[2017-09-30 19:49:12] INFO: Assembled 3244 draft contigs						
[2017-09-30 19:49:12] INFO: Generating contig sequences						
-----------End assembly log------------						
[2017-09-30 20:25:11] root: INFO: Running BLASR						
[2017-09-30 20:25:11] root: DEBUG: Reading contigs file						
[2017-10-01 12:06:06] root: DEBUG: Sorting alignment file						
[2017-10-01 12:09:43] root: INFO: Computing rough consensus						
[2017-10-01 12:09:43] root: DEBUG: Reading contigs file						
[2017-10-01 12:45:26] root: INFO: Performing repeat analysis						
[2017-10-01 12:45:26] root: DEBUG: -----Begin repeat analyser log------						
[2017-10-01 12:45:27] DEBUG: Build date: Sep 28 2017 21:14:44						
[2017-10-01 12:45:27] INFO: Reading FASTA						
[2017-10-01 12:55:09] INFO: Building repeat graph						
[2017-10-01 12:55:09] DEBUG: Hard threshold set to 1						
[2017-10-01 12:55:09] DEBUG: Started kmer counting						
[2017-10-01 12:57:10] DEBUG: Kmer index size: 344619478						
[2017-10-01 20:48:04] DEBUG: Computing gluepoints						
[2017-10-01 20:48:25] DEBUG: Initializing edges						
[2017-10-01 21:06:24] DEBUG: *	5152	=+contig_430	0	455	455	
…						
…						
[2017-10-01 21:06:40] INFO: Simplifying the graph						
[2017-10-01 21:06:40] DEBUG: 12800 tips removed						
[2017-10-01 21:06:40] DEBUG: Removed 1140 fake loops						
[2017-10-01 21:06:40] DEBUG: Unrolled 447, removed 576						
[2017-10-01 21:06:40] DEBUG: Removed 6345 edges						
[2017-10-01 21:06:40] DEBUG: Added 2149 edges						
[2017-10-01 21:06:40] DEBUG: Unrolled 20, removed 38						
[2017-10-01 21:06:40] DEBUG: Removed 531 edges						
[2017-10-01 21:06:40] DEBUG: Added 518 edges						
[2017-10-01 21:06:40] DEBUG: Removed 1 chimeric junctions						
[2017-10-01 21:06:40] INFO: Aligning reads to the graph						
[2017-10-01 21:06:41] DEBUG: Hard threshold set to 1						
[2017-10-01 21:06:41] DEBUG: Started kmer counting						
[2017-10-01 21:08:21] DEBUG: Kmer index size: 323587749						
[2017-10-02 04:22:24] DEBUG: Aligned 6749792 / 9115006						
[2017-10-02 04:23:42] DEBUG: Mean edge coverage: 99						
[2017-10-02 04:23:42] DEBUG: *	21618	20897	0	1	105	1.06061
…						
…						
[2017-10-02 04:23:43] DEBUG: Unique coverage threshold 105						
[2017-10-02 04:23:45] DEBUG: Outputs: -14731 0						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: Outputs: -15258 1						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: Outputs: -18150 0						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: Outputs: 20552 3						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: Outputs: -13609 0						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: Outputs: 18386 0						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: Outputs: -16283 0						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: Outputs: -19265 1						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: Outputs: 15722 0						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: Outputs: -7893 1						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: 						
…						
…						
[2017-10-02 04:23:45] DEBUG: 						
[2017-10-02 04:23:45] DEBUG: R   21617 1 -> 2 (2,1) 142010	99					
[2017-10-02 04:23:45] DEBUG: R   21615 1 -> 2 (2,0) 36996	100					
[2017-10-02 04:23:45] DEBUG: R   21613 1 -> 2 (2,1) 18727	95					
[2017-10-02 04:23:45] DEBUG: R   21605 1 -> 2 (1,2) 26980	81					
[2017-10-02 04:23:45] DEBUG: R   21601 1 -> 2 (1,2) 40593	106					
[2017-10-02 04:23:45] DEBUG: R   21600 1 -> 2 (2,1) 23606	113					
[2017-10-02 04:23:45] DEBUG: R   21599 1 -> 2 (0,2) 132657	111					
[2017-10-02 04:23:45] DEBUG: R   21595 1 -> 2 (2,1) 103909	87					
[2017-10-02 04:23:45] DEBUG: R   21586 1 -> 2 (0,2) 10054	109					
[2017-10-02 04:23:45] DEBUG: R   21584 1 -> 2 (2,0) 3451	89					
…						
…						
[2017-10-02 04:36:49] root: ERROR: Error: Error in repeat binary: Command '['abruijn-repeat', '-k', '17', '-l', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/abruijn.log', '-t', '12', '-v', '5000', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/polished_0.fasta', '/home/user/genome_assembly/assembly_ready/species_subreads.fasta', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir']' returned non-zero exit status -6						
[2017-10-02 22:36:10] root: INFO: Running ABruijn						
[2017-10-02 22:36:10] root: DEBUG: Estimated genome size: 303080041						
[2017-10-02 22:36:10] root: DEBUG: Chosen k-mer size: 17						
[2017-10-02 22:36:10] root: INFO: Resuming previous run						
[2017-10-02 22:36:10] root: INFO: Performing repeat analysis						
[2017-10-02 22:36:10] root: DEBUG: -----Begin repeat analyser log------						
[2017-10-02 22:36:10] DEBUG: Build date: Sep 28 2017 21:14:44						
[2017-10-02 22:36:10] INFO: Reading FASTA						
[2017-10-02 22:45:31] INFO: Building repeat graph						
[2017-10-02 22:45:31] DEBUG: Hard threshold set to 1						
[2017-10-02 22:45:31] DEBUG: Started kmer counting						
[2017-10-02 22:46:38] DEBUG: Kmer index size: 344619478						
[2017-10-03 03:43:14] DEBUG: Computing gluepoints						
[2017-10-03 03:43:35] DEBUG: Initializing edges						
[2017-10-03 04:01:59] DEBUG: *	14224	=+contig_472	0	9	9	
…						
…						
[2017-10-03 04:02:13] INFO: Simplifying the graph						
[2017-10-03 04:02:13] DEBUG: 12798 tips removed						
[2017-10-03 04:02:13] DEBUG: Removed 1160 fake loops						
[2017-10-03 04:02:13] DEBUG: Unrolled 441, removed 572						
[2017-10-03 04:02:13] DEBUG: Removed 6351 edges						
[2017-10-03 04:02:13] DEBUG: Added 2147 edges						
[2017-10-03 04:02:13] DEBUG: Unrolled 21, removed 42						
[2017-10-03 04:02:13] DEBUG: Removed 532 edges						
[2017-10-03 04:02:13] DEBUG: Added 518 edges						
[2017-10-03 04:02:13] DEBUG: Removed 1 chimeric junctions						
[2017-10-03 04:02:13] INFO: Aligning reads to the graph						
[2017-10-03 04:02:14] DEBUG: Hard threshold set to 1						
[2017-10-03 04:02:14] DEBUG: Started kmer counting						
[2017-10-03 04:03:39] DEBUG: Kmer index size: 324344284						
[2017-10-03 09:32:13] DEBUG: Aligned 6748592 / 9115006						
[2017-10-03 09:33:15] DEBUG: Mean edge coverage: 99						
[2017-10-03 09:33:15] DEBUG: *	-21645	20897	0	1	101	1.0202
…						
…						
[2017-10-03 09:33:15] DEBUG: Unique coverage threshold 105						
[2017-10-03 09:33:17] DEBUG: Outputs: 4330 0						
[2017-10-03 09:33:17] DEBUG: 	13662 2 0					
[2017-10-03 09:33:17] DEBUG: 						
[2017-10-03 09:33:17] DEBUG: Outputs: 21367 2						
[2017-10-03 09:33:17] DEBUG: 	21368 26 0					
[2017-10-03 09:33:17] DEBUG: 						
[2017-10-03 09:33:17] DEBUG: Outputs: 1047 0						
[2017-10-03 09:33:17] DEBUG: 	20778 1 0					
[2017-10-03 09:33:17] DEBUG: 						
[2017-10-03 09:33:17] DEBUG: Outputs: 7396 1						
[2017-10-03 09:33:17] DEBUG: 	7397 2 1					
[2017-10-03 09:33:17] DEBUG: 	7405 32 1					
[2017-10-03 09:33:17] DEBUG: 						
[2017-10-03 09:33:17] DEBUG: Outputs: 8526 0						
…						
…						
[2017-10-03 09:33:18] DEBUG: 						
[2017-10-03 09:33:18] DEBUG: R   21644 1 -> 2 (2,1) 142010	99					
[2017-10-03 09:33:18] DEBUG: R   21641 1 -> 2 (1,2) 68125	106					
[2017-10-03 09:33:18] DEBUG: R   21640 1 -> 2 (2,1) 22106	112					
[2017-10-03 09:33:18] DEBUG: R   21632 1 -> 2 (1,2) 143630	90					
[2017-10-03 09:33:18] DEBUG: R   21630 1 -> 2 (2,0) 3169	108					
[2017-10-03 09:33:18] DEBUG: R   21629 1 -> 2 (1,2) 12733	79					
[2017-10-03 09:33:18] DEBUG: R   21625 1 -> 2 (0,2) 7580	105					
[2017-10-03 09:33:18] DEBUG: R   21624 1 -> 2 (2,0) 116506	92					
…						
…						
[2017-10-03 09:33:18] DEBUG: R   16067 1 -> 2 (1,2) 10058	104					
[2017-10-03 09:33:18] DEBUG: R   16134 1 -> 2 (1,2) 5990	107					
[2017-10-03 09:33:50] ERROR: Resource temporarily unavailable						
-----------End assembly log------------						
[2017-10-03 09:33:50] root: ERROR: Error: Error in repeat binary: Command '['abruijn-repeat', '-k', '17', '-l', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/abruijn.log', '-t', '12', '-v', '5000', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/polished_0.fasta', '/home/user/genome_assembly/assembly_ready/species_subreads.fasta', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir']' returned non-zero exit status 1

If you could give me a hand to find out what is causing this issue that would be really appreciated.

Thanks,
Zac.

The text was updated successfully, but these errors were encountered:

mikolmogorov · 2017-10-04T01:57:51Z

Hi Zac,

Thanks for the detailed report! This seems to be a strange error, provided that all memory/performance bottlenecks have passed... In the second run, the error says "Resource temporarily unavailable", which might mean many things. My first suspicion would be that abruijn can't create an output file for contigs - this might happen because of lack of space or other system policies. It seems that this error also might arise from exceeding memory/thread limitations (but it sounds less likely to me).

Could you actually send me the full log file - it might help to figure where exactly error is happening? In the mean time, I suggest you to try the latest version from the devel branch (it contain a few updates that might be important). You still should be able to --resume though (however if it turns to be working I would suggest to restart from scratch later). Also check that there are no problems with creating files in your filesystem (maybe its full?).

Best,
Mikhail

zkstewart · 2017-10-04T03:32:32Z

Thanks for the response Mikhail,

I'll re-run it with the latest commit - it took about 11 hrs to reach the point where it crashed when I resumed the run so it will take me a bit of time to see if the latest update changes anything. With regards to the filesystem, I should have no problems with file creation/filesystem limits, other assemblers that are currently running aren't having any issues. The HPC environment I use doesn't place limits on file use, but they would email me if I was abusing that.

Full log should be attached here.

abruijn.log

mikolmogorov · 2017-10-04T07:51:24Z

Thanks! I am pretty sure the problem is related to generation of output files (might not be the filesystem though). If you will have a chance, could you also do 'ls -l' of the working directory?

zkstewart · 2017-10-04T21:31:45Z

Okay, so I resumed the run using the latest version of ABruijn, and this time the program ends after 1 hr (instead of 11 hrs). The stderr says this:

[2017-10-04 22:15:50] INFO: Running ABruijn
[2017-10-04 22:15:50] INFO: Resuming previous run
[2017-10-04 22:15:50] INFO: Computing rough consensus
[2017-10-04 22:47:03] INFO: Performing repeat analysis
[2017-10-04 22:47:04] INFO: Reading FASTA
[2017-10-04 22:57:03] INFO: Building repeat graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-10-04 23:13:14] ERROR: Graph is not symmetric
[2017-10-04 23:13:14] ERROR: Command '['abruijn-repeat', '-k', '17', '-l', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/abruijn.log', '-t', '12', '-v', '5000', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/polished_0.fasta', '/home/user/genome_assembly/assembly_ready/species_subreads.fasta', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir']' returned non-zero exit status 1

Also, ls -l of the working directory is below.

total 88236068
-rw------- 1 user default    12121523 Oct  4 23:13 abruijn.log
-rw------- 1 user default          39 Oct  4 22:47 abruijn.save
-rw------- 1 user default 89614032792 Oct  1 12:09 blasr_0.m5
-rw------- 1 user default   368491296 Sep 30 20:20 draft_assembly.fasta
-rw------- 1 user default   359056296 Oct  4 22:47 polished_0.fasta

I'm assuming the "Graph is not symmetric" line is relatively important, but I'm not sure how that relates to my input data. Perhaps I need to re-run the program from start to finish and see what changes?

edit: Should have included the log. This is the new lines appended to the log I attached above.

[2017-10-04 22:15:50] root: INFO: Running ABruijn
[2017-10-04 22:15:50] root: DEBUG: Estimated genome size: 303080041
[2017-10-04 22:15:50] root: DEBUG: Chosen k-mer size: 17
[2017-10-04 22:15:50] root: INFO: Resuming previous run
[2017-10-04 22:15:50] root: INFO: Computing rough consensus
[2017-10-04 22:47:03] root: INFO: Performing repeat analysis
[2017-10-04 22:47:03] root: DEBUG: -----Begin repeat analyser log------
[2017-10-04 22:47:04] DEBUG: Build date: Oct  4 2017 13:37:13
[2017-10-04 22:47:04] INFO: Reading FASTA
[2017-10-04 22:57:03] INFO: Building repeat graph
[2017-10-04 22:57:03] DEBUG: Hard threshold set to 1
[2017-10-04 22:57:03] DEBUG: Started kmer counting
[2017-10-04 22:59:19] DEBUG: Kmer index size: 344619478
[2017-10-04 23:09:04] DEBUG: Found 6461012 overlaps
[2017-10-04 23:10:37] DEBUG: Left 1316792 overlaps after filtering
[2017-10-04 23:10:37] DEBUG: Computing gluepoints
[2017-10-04 23:12:42] DEBUG: Created 2720138 gluepoints
[2017-10-04 23:12:44] DEBUG: Tandems removed: 1172 left, 1175 right, 3732 both
[2017-10-04 23:12:44] DEBUG: Initializing edges
[2017-10-04 23:13:14] ERROR: Graph is not symmetric
-----------End assembly log------------
[2017-10-04 23:13:14] root: ERROR: Command '['abruijn-repeat', '-k', '17', '-l', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/abruijn.log', '-t', '12', '-v', '5000', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/polished_0.fasta', '/home/user/genome_assembly/assembly_ready/species_subreads.fasta', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir']' returned non-zero exit status 1

mikolmogorov · 2017-10-04T21:44:48Z

Interesting. This error indicates an issue in the algorithm, rather than input data, so there is no need to re-run for now - I will try to fix this problem shortly.

mikolmogorov · 2017-10-04T23:01:40Z

By the way, are you sure that you got the latest revision from devel branch (not the master)? This log looks a bit outdated..

zkstewart · 2017-10-04T23:59:43Z

You're right, I updated it with the master branch. I'll run the devel branch and give an update regarding what happens.

zkstewart · 2017-10-05T01:50:20Z

Error appears to be the same, but the output is different. Stderr:

[2017-10-05 11:15:56] INFO: Running ABruijn
[2017-10-05 11:15:56] INFO: Resuming previous run
[2017-10-05 11:15:56] INFO: Performing repeat analysis
[2017-10-05 11:15:56] INFO: Reading sequences
[2017-10-05 11:25:23] INFO: Building repeat graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-10-05 11:37:38] ERROR: Caught unhandled exception: Graph is not symmetric
[2017-10-05 11:37:38] ERROR:    abruijn-repeat(_Z16exceptionHandlerv+0xd0) [0x44c5b0]
[2017-10-05 11:37:38] ERROR:    /pkg/suse12/software/gcccore/4.9.3/lib64/libstdc++.so.6(+0x64296) [0x2aaaaab38296]
[2017-10-05 11:37:38] ERROR:    /pkg/suse12/software/gcccore/4.9.3/lib64/libstdc++.so.6(+0x642e1) [0x2aaaaab382e1]
[2017-10-05 11:37:38] ERROR:    /pkg/suse12/software/gcccore/4.9.3/lib64/libstdc++.so.6(+0x644f8) [0x2aaaaab384f8]
[2017-10-05 11:37:38] ERROR:    abruijn-repeat(_ZN11RepeatGraph15initializeEdgesERK16OverlapContainer+0x11a9) [0x443979]
[2017-10-05 11:37:38] ERROR:    abruijn-repeat(_ZN11RepeatGraph5buildEv+0x55f) [0x4483af]
[2017-10-05 11:37:38] ERROR:    abruijn-repeat(main+0x557) [0x41e407]
[2017-10-05 11:37:38] ERROR:    /lib64/libc.so.6(__libc_start_main+0xf5) [0x2aaaab2086e5]
[2017-10-05 11:37:38] ERROR:    abruijn-repeat(_start+0x29) [0x41f1b9]
[2017-10-05 11:37:39] ERROR: Command '['abruijn-repeat', '-k', '17', '-l', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/abruijn.log', '-t', '12', '-v', '5000', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/polished_0.fasta', '/home/user/genome_assembly/assembly_ready/species_subreads.fasta', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir']' returned non-zero exit status 1

Appended to abruijn.log

[2017-10-05 11:15:56] root: INFO: Running ABruijn
[2017-10-05 11:15:56] root: DEBUG: Estimated genome size: 303080041
[2017-10-05 11:15:56] root: DEBUG: Chosen k-mer size: 17
[2017-10-05 11:15:56] root: INFO: Resuming previous run
[2017-10-05 11:15:56] root: INFO: Performing repeat analysis
[2017-10-05 11:15:56] root: DEBUG: -----Begin repeat analyser log------
[2017-10-05 11:15:56] DEBUG: Build date: Oct  5 2017 10:02:26
[2017-10-05 11:15:56] INFO: Reading sequences
[2017-10-05 11:25:23] INFO: Building repeat graph
[2017-10-05 11:25:23] DEBUG: Hard threshold set to 1
[2017-10-05 11:25:23] DEBUG: Started kmer counting
[2017-10-05 11:26:27] DEBUG: Kmer index size: 344619478
[2017-10-05 11:34:42] DEBUG: Found 6461012 overlaps
[2017-10-05 11:36:18] DEBUG: Left 1316792 overlaps after filtering
[2017-10-05 11:36:18] DEBUG: Computing gluepoints
[2017-10-05 11:37:36] DEBUG: Created 2720012 gluepoints
[2017-10-05 11:37:38] DEBUG: Tandems removed: 1172 left, 1175 right, 3732 both
[2017-10-05 11:37:38] DEBUG: Initializing edges
[2017-10-05 11:37:38] ERROR: Caught unhandled exception: Graph is not symmetric
[2017-10-05 11:37:38] ERROR:    abruijn-repeat(_Z16exceptionHandlerv+0xd0) [0x44c5b0]
[2017-10-05 11:37:38] ERROR:    /pkg/suse12/software/gcccore/4.9.3/lib64/libstdc++.so.6(+0x64296) [0x2aaaaab38296]
[2017-10-05 11:37:38] ERROR:    /pkg/suse12/software/gcccore/4.9.3/lib64/libstdc++.so.6(+0x642e1) [0x2aaaaab382e1]
[2017-10-05 11:37:38] ERROR:    /pkg/suse12/software/gcccore/4.9.3/lib64/libstdc++.so.6(+0x644f8) [0x2aaaaab384f8]
[2017-10-05 11:37:38] ERROR:    abruijn-repeat(_ZN11RepeatGraph15initializeEdgesERK16OverlapContainer+0x11a9) [0x443979]
[2017-10-05 11:37:38] ERROR:    abruijn-repeat(_ZN11RepeatGraph5buildEv+0x55f) [0x4483af]
[2017-10-05 11:37:38] ERROR:    abruijn-repeat(main+0x557) [0x41e407]
[2017-10-05 11:37:38] ERROR:    /lib64/libc.so.6(__libc_start_main+0xf5) [0x2aaaab2086e5]
[2017-10-05 11:37:38] ERROR:    abruijn-repeat(_start+0x29) [0x41f1b9]

mikolmogorov · 2017-10-05T05:19:19Z

Thank you! I got this error in one of my datasets as well - will try to fix shortly.

mikolmogorov · 2017-10-07T05:28:37Z

Please try the latest version from the master branch, it should be resolved now.

zkstewart · 2017-10-07T22:40:21Z

The original problem has been fixed now, but I unfortunately seem to have encountered another error. I'll rerun the program to see if it happens again, but because it took 12 hours to get to this point it will be a while before I can verify that it is repeatable/occurs at roughly the same time.

This is the stderr:

[2017-10-07 15:54:56] INFO: Running ABruijn
[2017-10-07 15:54:57] INFO: Resuming previous run
[2017-10-07 15:54:57] INFO: Performing repeat analysis
[2017-10-07 15:54:57] INFO: Reading sequences
[2017-10-07 16:04:14] INFO: Building repeat graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-10-07 16:20:31] INFO: Simplifying the graph
[2017-10-07 16:20:31] INFO: Aligning reads to the graph
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2017-10-08 04:11:24] ERROR: Caught unhandled exception: Resource temporarily unavailable
[2017-10-08 04:11:24] ERROR:    abruijn-repeat(_Z16exceptionHandlerv+0xd0) [0x450f60]
[2017-10-08 04:11:24] ERROR:    /pkg/suse12/software/gcccore/4.9.3/lib64/libstdc++.so.6(+0x64296) [0x2aaaaab38296]
[2017-10-08 04:11:24] ERROR:    /pkg/suse12/software/gcccore/4.9.3/lib64/libstdc++.so.6(+0x642e1) [0x2aaaaab382e1]
[2017-10-08 04:11:24] ERROR:    /pkg/suse12/software/gcccore/4.9.3/lib64/libstdc++.so.6(+0x644f8) [0x2aaaaab384f8]
[2017-10-08 04:11:24] ERROR:    /pkg/suse12/software/gcccore/4.9.3/lib64/libstdc++.so.6(_ZSt20__throw_system_errori+0x7f) [0x2aaaaab8e87f]
[2017-10-08 04:11:24] ERROR:    /pkg/suse12/software/gcccore/4.9.3/lib64/libstdc++.so.6(_ZNSt6thread15_M_start_threadESt10shared_ptrINS_10_Impl_baseEE+0x11d) [0x2aaaaab902cd]
[2017-10-08 04:11:24] ERROR:    abruijn-repeat(_Z17processInParallelImEvRKSt6vectorIT_SaIS1_EESt8functionIFvRKS1_EEmb+0x171) [0x45f561]
[2017-10-08 04:11:24] ERROR:    abruijn-repeat(_ZN15ContigGenerator18generateAlignmentsERK10ContigPath+0x236) [0x45d646]
[2017-10-08 04:11:24] ERROR:    abruijn-repeat(_ZN15ContigGenerator14generateLinearERK10ContigPath+0x28) [0x45da98]
[2017-10-08 04:11:24] ERROR:    abruijn-repeat(_ZNK15OutputGenerator23generateContigSequencesERSt6vectorI15UnbranchingPathSaIS1_EE+0x98e) [0x45660e]
[2017-10-08 04:11:24] ERROR: Command '['abruijn-repeat', '-k', '17', '-l', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/abruijn.log', '-t', '12', '-v', '5000', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir/polished_0.fasta', '/home/user/genome_assembly/assembly_ready/species_subreads.fasta', '/lustre/home-lustre/user/genome_assembly/abruijn/abruijn_dir']' returned non-zero exit status 1

ls -l:

total 88244196
-rw------- 1 user default    17391744 Oct  8 04:11 abruijn.log
-rw------- 1 user default          39 Oct  7 15:54 abruijn.save
-rw------- 1 user default 89614032792 Oct  1 12:09 blasr_0.m5
-rw------- 1 user default   368491296 Sep 30 20:20 draft_assembly.fasta
-rw------- 1 user default     1236467 Oct  8 04:11 graph_before_rr.dot
-rw------- 1 user default     1813629 Oct  7 16:20 graph_raw.dot
-rw------- 1 user default   359056296 Oct  4 22:47 polished_0.fasta

And the details from the latest run in the log file are attached.
abruijn.appended.log

The main thing I notice is that the program crashed at the same time the graph_before_rr.dot file was created, so it's possibly related.

mikolmogorov · 2017-10-07T23:44:09Z

Thank you,
This is the same error as in the very beginning, but now we know where it happened. What happens is the program wants to create a new thread but the system does not allow it - seems that you are hitting the maximum thread limit (either of kernel or PBS system). How many threads are you using, and is it consistent with how many threads you are requesting through PBS submission? I suggest to try the following - try to run again, but request more threads with PBS (maybe twice more), than you will be actually setting for abruijn (-t argument).

Best,
Mikhail

zkstewart · 2017-10-08T01:00:58Z

I have to admit I'm not 100% certain how the HPC system is set up, but to my knowledge it works by allocating the number of CPU cores and memory within the submission script, so I am not sure how that relates to the underlying kernel/threads. Previously, I was running the program with 12 abruijn threads and asking PBS for 12 cores. I'll try it with 12 threads but 16 cores (asking for more cores means a significantly longer queue time, and running it with less threads means longer run time), and if the same error occurs I'll lower the threads to 8 and ask for 16 cores again. Will update with results, and thanks for your help thus far.

zkstewart · 2017-10-10T00:56:44Z

I've tried running the program with 16 cores/12 threads as well as 16 cores/8 threads with the same result as above. As soon as the program either starts writing the graph_before_rr.dot file or begins the step after writing this file, it crashes. At this point I'm not sure how to proceed other than by reducing the number of abruijn threads to 1 (which will take approximately 100-140 hours to run to the point where it is currently crashing), but I will try this to exhaust the possibilities that it's related to threading requirements.

Below is the result of ulimit -a if that might reveal some obvious problem with the HPC setup, although I've never encountered any problems with this previously.

core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 514210
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 16384
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 300000
cpu time               (seconds, -t) unlimited
max user processes              (-u) 514210
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

mikolmogorov · 2017-10-10T02:36:29Z

Ok, my current best guess is that system might not like threads to be created/destroyed too often. I tried to address this in the latest devel commit - could you check? Otherwise, is there any chance I can take a look on the data - it would be much faster I I could reproduce it on my machine..

Best,
Mikhail

zkstewart · 2017-10-10T04:17:40Z

I'm currently running the devel branch now, and will let you know how it goes. If it doesn't work, I'll retry the single threaded master branch and discuss with my supervisors if they have any problems with sharing the data (data is still private, but they would probably not have any major problems with this).

mikolmogorov · 2017-10-10T16:33:27Z

Let me know if the devel branch does not work - I will prepare a special single-threaded version.

zkstewart · 2017-10-12T01:19:42Z

Success! The changes to the devel branch have allowed the program to make it through the point where it was normally crashing. It's currently running BLASR after the repeat processing step has completed. I'll let you know how the final output looks, and thanks again for all your help.

mikolmogorov · 2017-10-12T06:03:09Z

Yay! Now when it's working I would actually suggest to re-run assembly module as well (if you haven't done so yet after using results from v2.1) :)

Also, do you expect your genome to be highly diploid? I see some patterns of this in the log.. Currently ABruijn is a bit conservative and will not span alternative alleles if the structural difference is high, which might lead to fragmentation.

zkstewart · 2017-10-12T08:13:06Z

I'll make sure to rerun it with the latest updates once this current run completes. With regards to its diploid nature... I'm not entirely certain since the animal is very distantly related to any relatives that have had a genome previously sequenced, and the organism was taken directly from the wild. I'll keep that in mind regarding ABruijn's assembly - perhaps something like Redundans might help? At the moment I'm trying everything I can get my hands on.

zkstewart · 2017-10-15T00:08:59Z

Final update before I close this issue. I was able to run the assembly to completion, and will now begin to reassemble it completely using the latest updates. The final stats appear to reflect what you have suggested with respect to the genome having some heterozygosity, as the final genome size is a fair bit smaller after the final repeat and bubble correction compared to the polished_0.fasta file. I have been experimenting with programs like quickmerge which may make the ABruijn output useful as a 'backbone' assembly which can be filled in using other assemblies assuming that its repeat regions are better resolved than other assemblers. It will take me some time to figure out the optimal configuration of things. Thanks again for fixing these issues, best of luck with the future development of this program.

Zac.

zkstewart closed this as completed Oct 15, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error in repeat binary #18

Error in repeat binary #18

zkstewart commented Oct 3, 2017 •

edited

Loading

mikolmogorov commented Oct 4, 2017

zkstewart commented Oct 4, 2017

mikolmogorov commented Oct 4, 2017 •

edited

Loading

zkstewart commented Oct 4, 2017 •

edited

Loading

mikolmogorov commented Oct 4, 2017

mikolmogorov commented Oct 4, 2017

zkstewart commented Oct 4, 2017

zkstewart commented Oct 5, 2017

mikolmogorov commented Oct 5, 2017

mikolmogorov commented Oct 7, 2017

zkstewart commented Oct 7, 2017

mikolmogorov commented Oct 7, 2017

zkstewart commented Oct 8, 2017

zkstewart commented Oct 10, 2017

mikolmogorov commented Oct 10, 2017 •

edited

Loading

zkstewart commented Oct 10, 2017

mikolmogorov commented Oct 10, 2017

zkstewart commented Oct 12, 2017

mikolmogorov commented Oct 12, 2017

zkstewart commented Oct 12, 2017

zkstewart commented Oct 15, 2017

Error in repeat binary #18

Error in repeat binary #18

Comments

zkstewart commented Oct 3, 2017 • edited Loading

mikolmogorov commented Oct 4, 2017

zkstewart commented Oct 4, 2017

mikolmogorov commented Oct 4, 2017 • edited Loading

zkstewart commented Oct 4, 2017 • edited Loading

mikolmogorov commented Oct 4, 2017

mikolmogorov commented Oct 4, 2017

zkstewart commented Oct 4, 2017

zkstewart commented Oct 5, 2017

mikolmogorov commented Oct 5, 2017

mikolmogorov commented Oct 7, 2017

zkstewart commented Oct 7, 2017

mikolmogorov commented Oct 7, 2017

zkstewart commented Oct 8, 2017

zkstewart commented Oct 10, 2017

mikolmogorov commented Oct 10, 2017 • edited Loading

zkstewart commented Oct 10, 2017

mikolmogorov commented Oct 10, 2017

zkstewart commented Oct 12, 2017

mikolmogorov commented Oct 12, 2017

zkstewart commented Oct 12, 2017

zkstewart commented Oct 15, 2017

zkstewart commented Oct 3, 2017 •

edited

Loading

mikolmogorov commented Oct 4, 2017 •

edited

Loading

zkstewart commented Oct 4, 2017 •

edited

Loading

mikolmogorov commented Oct 10, 2017 •

edited

Loading