test fails after new install #43

Closed
mathog opened this issue Jan 9, 2018 · 7 comments
Comments

mathog commented Jan 9, 2018

Greetings. On CentOS (64 bit), I installed automake 1.15 and autoconf 2.65 in /usr/local, then successfully installed redundans (according to the log file), but the test failed. Suggestions? devtoolset-4 is needed to get a compiler recent enough to understand all the g++ command-line switches. Programs built this way typically need no special treatment when they are run.

cd ~/src
scl enable devtoolset-4 'source <(curl -Ls http://bit.ly/redundans_installer)' 2>&1 | tee redundans_installer.log
#worked!  Says to test it like so:
cd redundans
./redundans.py -v -i test/*.fq.gz -f test/contigs.fa -o test/run1
Options: Namespace(fasta='test/contigs.fa', fastq=['test/5000_1.fq.gz', 'test/5000_2.fq.gz', 'test/600_1.fq.gz', 'test/600_2.fq.gz', 'test/pacbio.fq.gz'], identity=0.51, iters=2, joins=5, limit=0.2, linkratio=0.7, log=<open file '<stderr>', mode 'w' at 0x7fad26fc2270>, longreads=[], mapq=10, minLength=200, nocleaning=True, nogapclosing=True, norearrangements=False, noreduction=True, noscaffolding=True, outdir='test/run1', overlap=0.8, reference='', resume=False, threads=4, verbose=True)

##################################################
[Tue Jan  9 15:17:39 2018] Reduction...
#file name      genome size     contigs heterozygous size       [%]     heterozygous contigs    [%]     identity [%]    possible joins  homozygous size    [%]     homozygous contigs      [%]
test/run1/contigs.fa    163897  245     66377   40.50   221     90.20   94.854  0       97520   59.50   24      9.80

##################################################
[Tue Jan  9 15:17:40 2018] Estimating parameters of libraries...
 Aligning 19504 mates per library...
Insert size statistics                          Mates orientation stats
FastQ files     read length     median  mean    stdev   FF      FR      RF      RR
test/5000_1.fq.gz test/5000_2.fq.gz     50      4986    4981.70 692.22  0       4067    14      0
test/600_1.fq.gz test/600_2.fq.gz       100     599     598.74  47.22   0       10000   0       0

##################################################
[Tue Jan  9 15:17:42 2018] Scaffolding...
 iteration 1.1: test/run1/contigs.reduced.fa    24      97520   39.355  17      94157   7321    2195    0       29603
   19505 pairs. 17302 passed filtering [88.71%]. 1627 in different contigs [8.34%].
    1526 pairs. 558 in different contigs [36.57%].
 iteration 1.2: test/run1/_sspace.1.1.fa        3       97626   39.344  3       97626   87536   6063    821     87536
   19505 pairs. 17607 passed filtering [90.27%]. 182 in different contigs [0.93%].
    1077 pairs. 124 in different contigs [11.51%].
 iteration 2.1: test/run1/_sspace.1.2.fa        3       97626   39.344  3       97626   87536   6063    821     87536
   19505 pairs. 15112 passed filtering [77.48%]. 1295 in different contigs [6.64%].
    3417 pairs. 396 in different contigs [11.59%].
 iteration 2.2: test/run1/_sspace.2.1.fa        1       99115   39.344  1       99115   99115   99115   2310    99115
   19505 pairs. 15151 passed filtering [77.68%]. 0 in different contigs [0.00%].
    3398 pairs. 0 in different contigs [0.00%].

##################################################
[Tue Jan  9 15:17:48 2018] Gap closing...
 iteration 1.1: test/run1/scaffolds.fa  1       99115   39.344  1       99115   99115   99115   2310    99115

##################################################
[Tue Jan  9 15:17:49 2018] Final reduction...
#file name      genome size     contigs heterozygous size       [%]     heterozygous contigs    [%]     identity [%]    possible joins  homozygous size    [%]     homozygous contigs      [%]
Traceback (most recent call last):
  File "./redundans.py", line 521, in <module>
    main()
  File "./redundans.py", line 516, in main
    o.norearrangements, o.verbose, o.log)
  File "./redundans.py", line 391, in redundans
    info = fasta2homozygous(out, open(lastOutFn), identity, overlap,  minLength, threads, verbose=0, log=log)
  File "/home/mathog/src/redundans/bin/fasta2homozygous.py", line 207, in fasta2homozygous
    contig2skip = fasta2skip(out, fasta, faidx, threads, identity, overlap, minLength, verbose)
  File "/home/mathog/src/redundans/bin/fasta2homozygous.py", line 136, in fasta2skip
    plot_histograms(out.name, contig2skip, identities, sizes)
  File "/home/mathog/src/redundans/bin/fasta2homozygous.py", line 160, in plot_histograms
    for i, isize in zip(np.digitize(best, bins, right=1), bestalgsizes):
ValueError: Both x and bins must have non-zero length
rm -rf test/run1
scl enable devtoolset-4 './redundans.py -v -i test/*.fq.gz -f test/contigs.fa -o test/run1'
#fails exactly the same way

Suggestions?

Thanks.

mathog commented Jan 9, 2018

Hmm, the numbers in the run above do not match those in README.md. Is that significant, or is the documentation just slightly out of sync with the software?

mathog commented Jan 10, 2018

Inserted this before the original line 136 in fasta2homozygous.py

print "DEBUG 136 identities", identities, " sizes ", sizes, " i ", i #DEBUG

and reran the test. The log file now has:

 Reduction...
#file name      genome size     contigs heterozygous size       [%]     heterozygous contigs    [%]     identity [%]    possible joins  homozygous size [%]    homozygous contigs       [%]
DEBUG 136 identities [0.9531478770131772, 0.949937106918239, 0.9471886495007882, 0.9524475524475524, 0.9423763386027537, 0.9504310344827587, 0.9414634146341463, 0.9590163934426229, 0.9799777530589544, 0.9449152542372882, 0.9450222882615156, 0.9526813880126183, 0.9495268138801262, 0.9509493670886076, 0.9456869009584664, 0.92914653784219, 0.9511400651465798, 0.947107438016529, 0.9576271186440678, 0.9449378330373002, 0.9396709323583181, 0.9481481481481482, 0.9398496240601504, 0.9580152671755725, 0.9496124031007752, 0.95703125, 0.9636363636363636, 0.9559748427672956, 0.9308176100628931, 0.9621848739495799, 0.952914798206278, 0.9565217391304348, 0.9564220183486238, 0.9597156398104265, 0.9501187648456056, 0.9569377990430622, 0.9553349875930521, 0.9428571428571428, 0.8882978723404256, 0.8457446808510638, 0.9547872340425532, 0.9491978609625669, 0.9438502673796791, 0.9592391304347826, 0.9536784741144414, 0.967032967032967, 0.9497206703910615, 0.9606741573033708, 0.9602272727272727, 0.9498525073746312, 0.878698224852071, 0.8372781065088757, 0.9497041420118343, 0.8169491525423729, 0.960960960960961, 0.963963963963964, 0.9637462235649547, 0.9636363636363636, 0.9465408805031447, 0.9577922077922078, 0.9802631578947368, 0.9503311258278145, 0.9662162162162162, 0.7941176470588235, 0.8577405857740585, 0.9646643109540636, 0.9672727272727273, 0.9550561797752809, 0.96484375, 0.8731707317073171, 0.9484978540772532, 0.9655172413793104, 0.9567099567099567, 0.9696969696969697, 0.9641255605381166, 0.7352941176470589, 0.6683417085427136, 0.732620320855615, 0.8108108108108109, 0.7812971342383107, 0.7035175879396985, 0.8206521739130435, 0.9585253456221198, 0.9626168224299065, 0.9626168224299065]  sizes  [6830, 3975, 3806, 3575, 1961, 1856, 1230, 976, 899, 708, 673, 634, 634, 632, 626, 621, 614, 605, 590, 563, 547, 540, 532, 524, 516, 512, 495, 477, 477, 476, 446, 437, 436, 422, 421, 418, 403, 385, 376, 376, 376, 374, 374, 368, 367, 364, 358, 356, 352, 339, 338, 338, 338, 295, 333, 333, 331, 
330, 318, 308, 304, 302, 296, 272, 239, 283, 275, 267, 256, 205, 233, 232, 231, 231, 223, 221, 199, 187, 185, 221, 199, 184, 217, 214, 214]  i  85
test/run1/contigs.fa    163897  245     66377   40.50   221     90.20   94.854  0       97520   59.50   24      9.80
<snip>
DEBUG 136 identities []  sizes  []  i 
Traceback (most recent call last):
<snip>
  File "/home/mathog/src/redundans/bin/fasta2homozygous.py", line 138, in fasta2skip
    print "DEBUG 136 identities", identities, " sizes ", sizes, " i ", i  #DEBUG   
UnboundLocalError: local variable 'i' referenced before assignment

In other words, "hits" in fasta2skip is empty the second time it is called. The first time it was called there were some. Is an empty "hits" on the second call a reasonable result for this version of the code? Even if that is not itself a problem, the code should handle that state, and it does not. So that much is certainly a bug.

Other than these debug related changes the output is the same.
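
The empty-input case described above could be guarded before np.digitize is called. This is only a minimal sketch: the function name `sum_sizes_per_bin` and the per-bin accumulation are assumptions made for illustration, not the actual body of plot_histograms in fasta2homozygous.py.

```python
import numpy as np

def sum_sizes_per_bin(best, sizes, bins):
    """Sum alignment sizes per identity bin, tolerating empty input.

    Hypothetical helper: `best` holds identities, `sizes` the matching
    alignment sizes, `bins` the increasing bin edges.
    """
    totals = np.zeros(len(bins) + 1, dtype=int)
    # Guard: the traceback above shows np.digitize raising ValueError
    # when x or bins is empty, so return early.
    if len(best) == 0 or len(bins) == 0:
        return totals
    for i, size in zip(np.digitize(best, bins, right=True), sizes):
        totals[i] += size
    return totals

print(sum_sizes_per_bin([], [], [0.5, 0.9]))
print(sum_sizes_per_bin([0.95, 0.6, 0.4], [10, 20, 30], [0.5, 0.9]))
```

The empty input presumably also explains the UnboundLocalError from the debug print: if `i` is a loop variable, it is never bound when the loop body does not run.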

mathog commented Jan 11, 2018

Did a clean install on a CentOS 7.4 system. Compared to previous system:

Python 2.7 in /usr/bin only (previous had it in /usr/local/bin, with 2.6 in /usr/bin)
lastal not in path (previous had a version of lastal in the path)
bwa not in path (previous had a version of bwa in the path)
parallel not in path (previous had a version of parallel in the path)
perl 5.16 in /bin/perl (previous had 5.20 in /home/mathog/perl5/perlbrew/perls/perl-5.20.0t/bin/perl)

Test run:

./redundans.py -v -i test/*.fq.gz -f test/contigs.fa -o test/run1
Options: Namespace(fasta='test/contigs.fa', fastq=['test/5000_1.fq.gz', 'test/5000_2.fq.gz', 'test/600_1.fq.gz', 'test/600_2.fq.gz', 'test/pacbio.fq.gz'], identity=0.51, iters=2, joins=5, limit=0.2, linkratio=0.7, log=<open file '<stderr>', mode 'w' at 0x7f2e8b7281e0>, longreads=[], mapq=10, minLength=200, nocleaning=True, nogapclosing=True, norearrangements=False, noreduction=True, noscaffolding=True, outdir='test/run1', overlap=0.8, reference='', resume=False, threads=4, verbose=True)

##################################################
[Thu Jan 11 10:17:28 2018] Reduction...
#file name      genome size     contigs heterozygous size       [%]     heterozygous contigs    [%]     identity [%]    possible joins  homozygous size [%]     homozygous contigs       [%]
[WARNING] numpy or matplotlib missing! Cannot plot histogram
test/run1/contigs.fa    163897  245     66377   40.50   221     90.20   94.854  0       97520   59.50   24      9.80

##################################################
[Thu Jan 11 10:17:28 2018] Estimating parameters of libraries...
 Aligning 19504 mates per library...
Insert size statistics                          Mates orientation stats
FastQ files     read length     median  mean    stdev   FF      FR      RF      RR
test/5000_1.fq.gz test/5000_2.fq.gz     50      4986    4981.70 692.22  0       4067    14      0
test/600_1.fq.gz test/600_2.fq.gz       100     599     598.56  47.48   0       10000   0       0

##################################################
[Thu Jan 11 10:17:29 2018] Scaffolding...
 iteration 1.1: test/run1/contigs.reduced.fa    24      97520   39.355  17      94157   7321    2195    0       29603
   19505 pairs. 17325 passed filtering [88.82%]. 1641 in different contigs [8.41%].
    1526 pairs. 556 in different contigs [36.44%].
 iteration 1.2: test/run1/_sspace.1.1.fa        3       97829   39.344  3       97829   87528   6274    1024    87528
   19505 pairs. 17607 passed filtering [90.27%]. 185 in different contigs [0.95%].
    1188 pairs. 113 in different contigs [9.51%].
 iteration 2.1: test/run1/_sspace.1.2.fa        2       98197   39.344  2       98197   94170   94170   1392    94170
   19505 pairs. 15104 passed filtering [77.44%]. 720 in different contigs [3.69%].
    3420 pairs. 264 in different contigs [7.72%].
 iteration 2.2: test/run1/_sspace.2.1.fa        1       99484   39.344  1       99484   99484   99484   2679    99484
   19505 pairs. 15145 passed filtering [77.65%]. 0 in different contigs [0.00%].
    3396 pairs. 0 in different contigs [0.00%].

##################################################
[Thu Jan 11 10:17:37 2018] Gap closing...
 iteration 1.1: test/run1/scaffolds.fa  1       99484   39.344  1       99484   99484   99484   2679    99484
 iteration 1.2: test/run1/_gapcloser.1.1.fa     1       99503   39.483  1       99503   99503   99503   985     99503

[Thu Jan 11 10:17:39 2018] Final reduction...
#file name      genome size     contigs heterozygous size       [%]     heterozygous contigs    [%]     identity [%]    possible joins  homozygous size [%]     homozygous contigs       [%]
[WARNING] numpy or matplotlib missing! Cannot plot histogram
test/run1/scaffolds.filled.fa   99504   1       0       0.00    0       0.00    0.000   0       99504   100.00  1       100.00

##################################################
[Thu Jan 11 10:17:39 2018] Reporting statistics...
#fname  contigs bases   GC [%]  contigs >1kb    bases in contigs >1kb   N50     N90     Ns      longest
test/contigs.fa 245     163897  40.298  24      117391  3975    233     0       29603
test/run1/contigs.fa    245     163897  40.298  24      117391  3975    233     0       29603
test/run1/contigs.reduced.fa    24      97520   39.355  17      94157   7321    2195    0       29603
test/run1/_sspace.1.1.fa        3       97829   39.344  3       97829   87528   6274    1024    87528
test/run1/_sspace.1.2.fa        2       98197   39.344  2       98197   94170   94170   1392    94170
test/run1/_sspace.2.1.fa        1       99484   39.344  1       99484   99484   99484   2679    99484
test/run1/_sspace.2.2.fa        1       99484   39.344  1       99484   99484   99484   2679    99484
test/run1/scaffolds.fa  1       99484   39.344  1       99484   99484   99484   2679    99484
test/run1/_gapcloser.1.1.fa     1       99503   39.483  1       99503   99503   99503   985     99503
test/run1/_gapcloser.1.2.fa     1       99504   39.483  1       99504   99504   99504   985     99504
test/run1/scaffolds.filled.fa   1       99504   39.483  1       99504   99504   99504   985     99504
test/run1/scaffolds.reduced.fa  1       99504   39.483  1       99504   99504   99504   985     99504

##################################################
[Thu Jan 11 10:17:39 2018] Cleaning-up...
#Time elapsed: 0:00:11.161135

That looks like it might be correct. It diverges from the CentOS 6.9 run at iteration 1.2 in scaffolding. Tried to make the 6.9 environment more like that on CentOS 7.4 with:

cd ~/src/redundans
export PATH=.:/bin:/usr/bin:/usr/sbin:/sbin
ln -s /usr/local/bin/python2.7 python
# lastal, bwa, parallel no longer in path, python 2.7 is, perl 5.10 is
rm -rf test/run1
./redundans.py -v -i test/*.fq.gz -f test/contigs.fa -o test/run1

Same results on this system as before.

Examined the contents of ~/src/redundans/test/run1 and found that the directory structure was different. On both there are directories named "_sspace.1.1" but the contents were not the same. The one which worked had:


ls -alR _sspace.1.1
_sspace.1.1:
total 124
drwxr-xr-x. 6 mathog biostaff  4096 Jan 11 10:17 .
drwxr-xr-x. 6 mathog biostaff  4096 Jan 11 10:17 ..
drwxr-xr-x. 2 mathog biostaff  4096 Jan 11 10:17 alignoutput
drwxr-xr-x. 2 mathog biostaff  4096 Jan 11 10:17 intermediate_results
drwxr-xr-x. 2 mathog biostaff  4096 Jan 11 10:17 pairinfo
drwxr-xr-x. 2 mathog biostaff  4096 Jan 11 10:17 reads
-rw-r--r--. 1 mathog biostaff 99522 Jan 11 10:17 _sspace.1.1.final.scaffolds.fasta

_sspace.1.1/alignoutput:
total 8
drwxr-xr-x. 2 mathog biostaff 4096 Jan 11 10:17 .
drwxr-xr-x. 6 mathog biostaff 4096 Jan 11 10:17 ..

_sspace.1.1/intermediate_results:
total 204
drwxr-xr-x. 2 mathog biostaff  4096 Jan 11 10:17 .
drwxr-xr-x. 6 mathog biostaff  4096 Jan 11 10:17 ..
-rw-r--r--. 1 mathog biostaff 99000 Jan 11 10:17 _sspace.1.1.formattedcontigs_min0.fasta
-rw-r--r--. 1 mathog biostaff 97893 Jan 11 10:17 _sspace.1.1.lib1.scaffolds.fasta

_sspace.1.1/pairinfo:
total 8
drwxr-xr-x. 2 mathog biostaff 4096 Jan 11 10:17 .
drwxr-xr-x. 6 mathog biostaff 4096 Jan 11 10:17 ..

_sspace.1.1/reads:
total 8
drwxr-xr-x. 2 mathog biostaff 4096 Jan 11 10:17 .
drwxr-xr-x. 6 mathog biostaff 4096 Jan 11 10:17 ..

the one which failed had:

 ls -alR _sspace.1.1
_sspace.1.1:
total 136
drwxrwxr-x  6 mathog mathog  4096 Jan 11 10:56 .
drwxrwxr-x 10 mathog mathog  4096 Jan 11 10:56 ..
drwxrwxr-x  2 mathog mathog  4096 Jan 11 10:56 alignoutput
drwxrwxr-x  2 mathog mathog  4096 Jan 11 10:56 intermediate_results
drwxrwxr-x  2 mathog mathog  4096 Jan 11 10:56 pairinfo
drwxrwxr-x  2 mathog mathog  4096 Jan 11 10:56 reads
-rw-rw-r--  1 mathog mathog   904 Jan 11 10:56 _sspace.1.1.final.evidence
-rw-rw-r--  1 mathog mathog 99507 Jan 11 10:56 _sspace.1.1.final.scaffolds.fasta
-rw-rw-r--  1 mathog mathog  1124 Jan 11 10:56 _sspace.1.1.logfile.txt
-rw-rw-r--  1 mathog mathog  1802 Jan 11 10:56 _sspace.1.1.summaryfile.txt

_sspace.1.1/alignoutput:
total 8
drwxrwxr-x 2 mathog mathog 4096 Jan 11 10:56 .
drwxrwxr-x 6 mathog mathog 4096 Jan 11 10:56 ..

_sspace.1.1/intermediate_results:
total 220
drwxrwxr-x 2 mathog mathog  4096 Jan 11 10:56 .
drwxrwxr-x 6 mathog mathog  4096 Jan 11 10:56 ..
-rw-rw-r-- 1 mathog mathog 99000 Jan 11 10:56 _sspace.1.1.formattedcontigs_min0.fasta
-rw-rw-r-- 1 mathog mathog  2328 Jan 11 10:56 _sspace.1.1_lib1.foundlinks.txt
-rw-rw-r-- 1 mathog mathog     0 Jan 11 10:56 _sspace.1.1_lib1.repeats.txt
-rw-rw-r-- 1 mathog mathog   420 Jan 11 10:56 _sspace.1.1.lib1.scaffolds
-rw-rw-r-- 1 mathog mathog   904 Jan 11 10:56 _sspace.1.1.lib1.scaffolds.evidence
-rw-rw-r-- 1 mathog mathog 97878 Jan 11 10:56 _sspace.1.1.lib1.scaffolds.fasta
-rw-rw-r-- 1 mathog mathog    34 Jan 11 10:56 _sspace.1.1.libraries.txt

_sspace.1.1/pairinfo:
total 208
drwxrwxr-x 2 mathog mathog   4096 Jan 11 10:56 .
drwxrwxr-x 6 mathog mathog   4096 Jan 11 10:56 ..
-rw-rw-r-- 1 mathog mathog      0 Jan 11 10:56 _sspace.1.1.lib1.pairing_distribution.csv
-rw-rw-r-- 1 mathog mathog 202638 Jan 11 10:56 _sspace.1.1.lib1.pairing_issues

_sspace.1.1/reads:
total 8
drwxrwxr-x 2 mathog mathog 4096 Jan 11 10:56 .
drwxrwxr-x 6 mathog mathog 4096 Jan 11 10:56 ..

There are also many more _sspace.1.1* in the run1 directory on the one that failed than there are on the one which completed. Bizarre.
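
A comparison like the one above can be scripted once both run1 trees are reachable from one machine (for example after copying one over). This is a sketch only; `relative_files` and `compare_trees` are hypothetical helper names, not part of redundans.

```python
import os

def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            found.add(os.path.relpath(full, root))
    return found

def compare_trees(root_a, root_b):
    """Return (only_in_a, only_in_b) as sorted lists of relative paths."""
    files_a = relative_files(root_a)
    files_b = relative_files(root_b)
    return sorted(files_a - files_b), sorted(files_b - files_a)
```

Calling `compare_trees("run1_ok/_sspace.1.1", "run1_failed/_sspace.1.1")` (both paths are placeholders) would list the files present on only one side.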

The 7.4 system has bash 4.2.46 and the 6.9 system has bash 4.1.2. Hard to believe that matters.

I don't see any error messages in the log files in run1.

Suggestions???

Thanks.

mathog commented Jan 11, 2018

On a second CentOS 6.9 system, which mounts my ~/src from the first 6.9 machine and has pretty much identical software, the test was run again. This uses the exact same redundans install that was unable to complete the test on the first machine. On this machine it completed, but the results differ from the CentOS 7.4 machine! Here is this third set of "test" results:

./redundans.py -v -i test/*.fq.gz -f test/contigs.fa -o test/run1
Options: Namespace(fasta='test/contigs.fa', fastq=['test/5000_1.fq.gz', 'test/5000_2.fq.gz', 'test/600_1.fq.gz', 'test/600_2.fq.gz', 'test/pacbio.fq.gz'], identity=0.51, iters=2, joins=5, limit=0.2, linkratio=0.7, log=<open file '<stderr>', mode 'w' at 0x7f97e63341e0>, longreads=[], mapq=10, minLength=200, nocleaning=True, nogapclosing=True, norearrangements=False, noreduction=True, noscaffolding=True, outdir='test/run1', overlap=0.8, reference='', resume=False, threads=4, verbose=True)

##################################################
[Thu Jan 11 12:47:47 2018] Reduction...
#file name      genome size     contigs heterozygous size       [%]     heterozygous contigs    [%]     identity [%]    possible joins  homozygous size [%]     homozygous contigs     [%]
test/run1/contigs.fa    163897  245     66377   40.50   221     90.20   94.854  0       97520   59.50   24      9.80

##################################################
[Thu Jan 11 12:48:18 2018] Estimating parameters of libraries...
 Aligning 19504 mates per library...
Insert size statistics                          Mates orientation stats
FastQ files     read length     median  mean    stdev   FF      FR      RF      RR
test/5000_1.fq.gz test/5000_2.fq.gz     50      4986    4981.70 692.22  0       4067    14      0
test/600_1.fq.gz test/600_2.fq.gz       100     599     598.80  47.26   0       10000   0       0

##################################################
[Thu Jan 11 12:48:18 2018] Scaffolding...
 iteration 1.1: test/run1/contigs.reduced.fa    24      97520   39.355  17      94157   7321    2195    0       29603
   19505 pairs. 17330 passed filtering [88.85%]. 1658 in different contigs [8.50%].
    1506 pairs. 559 in different contigs [37.12%].
 iteration 1.2: test/run1/_sspace.1.1.fa        4       97554   39.344  3       97311   87541   5743    749     87541
   19505 pairs. 17640 passed filtering [90.44%]. 212 in different contigs [1.09%].
    1053 pairs. 109 in different contigs [10.35%].
 iteration 2.1: test/run1/_sspace.1.2.fa        3       97869   39.344  3       97869   87541   6301    1064    87541
   19505 pairs. 15114 passed filtering [77.49%]. 1294 in different contigs [6.63%].
    3412 pairs. 392 in different contigs [11.49%].
 iteration 2.2: test/run1/_sspace.2.1.fa        1       100051  39.344  1       100051  100051  100051  3246    100051
   19505 pairs. 15152 passed filtering [77.68%]. 0 in different contigs [0.00%].
    3392 pairs. 0 in different contigs [0.00%].

##################################################
[Thu Jan 11 12:48:33 2018] Gap closing...
 iteration 1.1: test/run1/scaffolds.fa  1       100051  39.344  1       100051  100051  100051  3246    100051
 iteration 1.2: test/run1/_gapcloser.1.1.fa     1       100546  39.563  1       100546  100546  100546  1412    100546

##################################################
[Thu Jan 11 12:48:34 2018] Final reduction...
#file name      genome size     contigs heterozygous size       [%]     heterozygous contigs    [%]     identity [%]    possible joins  homozygous size [%]     homozygous contigs     [%]
test/run1/scaffolds.filled.fa   100547  1       0       0.00    0       0.00    0.000   0       100547  100.00  1       100.00

##################################################
[Thu Jan 11 12:48:35 2018] Reporting statistics...
#fname  contigs bases   GC [%]  contigs >1kb    bases in contigs >1kb   N50     N90     Ns      longest
test/contigs.fa 245     163897  40.298  24      117391  3975    233     0       29603
test/run1/contigs.fa    245     163897  40.298  24      117391  3975    233     0       29603
test/run1/contigs.reduced.fa    24      97520   39.355  17      94157   7321    2195    0       29603
test/run1/_sspace.1.1.fa        4       97554   39.344  3       97311   87541   5743    749     87541
test/run1/_sspace.1.2.fa        3       97869   39.344  3       97869   87541   6301    1064    87541
test/run1/_sspace.2.1.fa        1       100051  39.344  1       100051  100051  100051  3246    100051
test/run1/_sspace.2.2.fa        1       100051  39.344  1       100051  100051  100051  3246    100051
test/run1/scaffolds.fa  1       100051  39.344  1       100051  100051  100051  3246    100051
test/run1/_gapcloser.1.1.fa     1       100546  39.563  1       100546  100546  100546  1412    100546
test/run1/_gapcloser.1.2.fa     1       100547  39.562  1       100547  100547  100547  1407    100547
test/run1/scaffolds.filled.fa   1       100547  39.562  1       100547  100547  100547  1407    100547
test/run1/scaffolds.reduced.fa  1       100547  39.562  1       100547  100547  100547  1407    100547

##################################################
[Thu Jan 11 12:48:35 2018] Cleaning-up...
#Time elapsed: 0:00:48.213627


Comparing the two CentOS 6.9 machines...

PATH: identical
alias: identical
bash: same version 4.1.2
python --version: 2.7 (failed), 2.7.14 (passed) [CentOS 7.4: Python 2.7.5]
site-packages: probably different
perl: same version (same binary)

The directory structure in test/run1 when it ran well on the second CentOS 6.9 machine matched the CentOS 7.4 machine, not the first CentOS 6.9 machine (where the test failed).

mathog commented Jan 11, 2018

Figured it out! The failing machine had NumPy 1.9.2 and matplotlib 1.5.0. The similar 6.9 machine which worked had 1.13.3 and 2.1.0. After upgrading the problem machine to NumPy 1.14.0 and matplotlib 2.1.1, the redundans test runs to completion.
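
For anyone comparing machines the same way, the installed versions can be printed with a few lines of Python. A minimal sketch: the helper names `module_version` and `report_versions` are made up for this example, and it assumes each package exposes `__version__`, as NumPy and matplotlib do.

```python
import importlib

def module_version(name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Modules without a __version__ attribute are reported as "unknown".
    return getattr(module, "__version__", "unknown")

def report_versions(names=("numpy", "matplotlib")):
    """Map each module name to its version (None when it is missing)."""
    return {name: module_version(name) for name in names}

if __name__ == "__main__":
    for name, version in sorted(report_versions().items()):
        print("%s: %s" % (name, version or "not installed"))
```

Running this with the same interpreter that runs redundans.py matters, since each Python install has its own site-packages.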

The test results are not stable from run to run though, i.e.:

rm -rf test/run1
./redundans.py -v -i test/*.fq.gz -f test/contigs.fa -o test/run1 >/tmp/run1A 2>&1
rm -rf test/run1
./redundans.py -v -i test/*.fq.gz -f test/contigs.fa -o test/run1 >/tmp/run1B 2>&1
diff /tmp/run1A /tmp/run1B

and they differ in many lines.

@lpryszcz
Collaborator

Hi, thanks a lot for solving that!
Yes, I've noticed that individual runs may produce slightly different results - this is due to snap-aligner. It's super fast and runs in multiple threads, and I guess the differences come from the fact that snap-aligner outputs reads in an ambiguous order (depending on which thread finishes first), while redundans processes only a certain number of reads to speed things up, so some reads may be included in one run and not in another. Using BWA MEM gave stable results, yet it's much slower, especially for larger genomes.
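
The effect can be illustrated with a toy example. This is only a sketch of the idea, not redundans or snap-aligner code: the read names are made up, and two seeded shuffles stand in for threads finishing in different orders.

```python
import random

def first_reads(stream, limit):
    """Keep only the first `limit` reads, as a processing cap would."""
    return set(stream[:limit])

# Hypothetical read names; both "runs" contain the same reads,
# but in a different arrival order.
reads = ["read%04d" % i for i in range(1000)]
run_a, run_b = list(reads), list(reads)
random.Random(1).shuffle(run_a)
random.Random(2).shuffle(run_b)

limit = 100
unstable_a = first_reads(run_a, limit)
unstable_b = first_reads(run_b, limit)
# Sorting first makes the chosen subset independent of arrival order.
stable_a = first_reads(sorted(run_a), limit)
stable_b = first_reads(sorted(run_b), limit)

print("shared reads, arrival order:", len(unstable_a & unstable_b), "of", limit)
print("shared reads, sorted order: ", len(stable_a & stable_b), "of", limit)
```

With a fixed order both runs select the same subset; with arrival order the subsets generally overlap only partly, which is enough to change downstream statistics slightly.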

Collaborator

lpryszcz commented Jan 12, 2018

numpy/numpy#4219 - added safecheck in Redundans 37f832f
