It does not iterate beyond 3 ! #44

Closed
jnarayan81 opened this issue Oct 4, 2018 · 5 comments

@jnarayan81

I tried -i 10, but it stops at the 3rd iteration!

➜  SALSA git:(master) ✗ python2 run_pipeline.py -a /media/urbe/MyPassport/ONTPoreChopped/concatPacBioONT50K/mince/A_plus_collapsed.fa -l /media/urbe/MyPassport/ONTPoreChopped/concatPacBioONT50K/mince/A_plus_collapsed.fa.fai -b /media/urbe/MyCDrive/JitDATA/adineta_reads/alignment.bed -e GATC -o scaffolds -i 10 -c 100

bedfile loaded
python2 /home/urbe/Tools/SALSA/RE_sites.py -a scaffolds/assembly.cleaned.fasta -e GATC > scaffolds/re_counts_iteration_1
Starting Iteration 1
python2 /home/urbe/Tools/SALSA/make_links.py -b scaffolds/alignment_iteration_1.bed -d scaffolds -i 1 -x abc
bedfile started
bedfile loaded
python2 /home/urbe/Tools/SALSA/fast_scaled_scores.py -d scaffolds -i 1
sort -k 5 -gr scaffolds/contig_links_scaled_iteration_1 > scaffolds/contig_links_scaled_sorted_iteration_1
python2 /home/urbe/Tools/SALSA/layout_unitigs.py -x abc -l scaffolds/contig_links_scaled_sorted_iteration_1 -c 100 -i 1 -d scaffolds
Loading Hi-C links 
Hybrid scaffold graph loaded, nodes = 0 edges = 0
Hi-C implied edges = 0
/home/urbe/Tools/SALSA/break_contigs -a scaffolds/alignment_iteration_2.bed -b scaffolds/breakpoints_iteration_2.txt -l scaffolds/scaffold_length_iteration_2 -i 2 -s 100   > scaffolds/misasm_iteration_2.report
python2 /home/urbe/Tools/SALSA/refactor_breaks.py -d scaffolds -i 2
Starting Iteration 2
python2 /home/urbe/Tools/SALSA/make_links.py -b scaffolds/alignment_iteration_2.bed -d scaffolds -i 2
bedfile started
bedfile loaded
Starting Iteration 2
python2 /home/urbe/Tools/SALSA/layout_unitigs.py -x abc -l scaffolds/contig_links_scaled_sorted_iteration_2 -c 100 -i 2 -d scaffolds
Loading Hi-C links 
Hybrid scaffold graph loaded, nodes = 0 edges = 0
Hi-C implied edges = 0
/home/urbe/Tools/SALSA/break_contigs -a scaffolds/alignment_iteration_3.bed -b scaffolds/breakpoints_iteration_3.txt -l scaffolds/scaffold_length_iteration_3 -i 3 -s 100  > scaffolds/misasm_iteration_3.report
python2 /home/urbe/Tools/SALSA/refactor_breaks.py -d scaffolds -i 3 > scaffolds/misasm_3.log

Interestingly, some of the intermediate files are empty. Is that expected?

➜  scaffolds git:(master) ✗ ls -lh
total 4,6G
-rw-rw-r-- 1 urbe urbe 4,4G Okt  2 10:10 alignment_iteration_1.bed
-rw-rw-r-- 1 urbe urbe    0 Okt  2 10:13 alignment_iteration_2.bed
-rw-rw-r-- 1 urbe urbe    0 Okt  2 10:16 alignment_iteration_3.bed
-rw-rw-r-- 1 urbe urbe 114M Okt  2 10:10 assembly.cleaned.fasta
-rw-rw-r-- 1 urbe urbe    0 Okt  2 10:11 breakpoints_iteration_2.txt
-rw-rw-r-- 1 urbe urbe    0 Okt  2 10:13 breakpoints_iteration_3.txt
-rw-rw-r-- 1 urbe urbe 2,1K Okt  2 10:16 commands.log
-rw-rw-r-- 1 urbe urbe    0 Okt  2 10:11 contig_links_iteration_1
-rw-rw-r-- 1 urbe urbe    0 Okt  2 10:13 contig_links_iteration_2
-rw-rw-r-- 1 urbe urbe    0 Okt  2 10:11 contig_links_scaled_iteration_1
-rw-rw-r-- 1 urbe urbe    0 Okt  2 10:13 contig_links_scaled_iteration_2
-rw-rw-r-- 1 urbe urbe    0 Okt  2 10:11 contig_links_scaled_sorted_iteration_1
-rw-rw-r-- 1 urbe urbe    0 Okt  2 10:13 contig_links_scaled_sorted_iteration_2
-rw-rw-r-- 1 urbe urbe    0 Okt  2 10:08 input_breaks
-rw-rw-r-- 1 urbe urbe    0 Okt  2 10:13 links_avoid_iteration_2
-rw-rw-r-- 1 urbe urbe    0 Okt  2 10:16 links_avoid_iteration_3
-rw-rw-r-- 1 urbe urbe    0 Okt  2 10:13 misasm_2.DONE
-rw-rw-r-- 1 urbe urbe    0 Okt  2 10:16 misasm_3.DONE
-rw-rw-r-- 1 urbe urbe    0 Okt  2 10:16 misasm_3.log
-rw-rw-r-- 1 urbe urbe   69 Okt  2 10:13 misasm_iteration_2.report
-rw-rw-r-- 1 urbe urbe   69 Okt  2 10:16 misasm_iteration_3.report
-rw-rw-r-- 1 urbe urbe  34K Okt  2 10:10 re_counts_iteration_1
-rw-rw-r-- 1 urbe urbe  26K Okt  2 10:13 re_counts_iteration_2
-rw-rw-r-- 1 urbe urbe  26K Okt  2 10:16 re_counts_iteration_3
-rwxrwxr-x 1 urbe urbe  34K Okt  2 10:10 scaffold_length_iteration_1
-rw-rw-r-- 1 urbe urbe  25K Okt  2 10:13 scaffold_length_iteration_2
-rw-rw-r-- 1 urbe urbe  25K Okt  2 10:16 scaffold_length_iteration_3
-rw-rw-r-- 1 urbe urbe  72K Okt  2 10:16 scaffolds_FINAL.agp
-rw-rw-r-- 1 urbe urbe 114M Okt  2 10:16 scaffolds_FINAL.fasta
-rw-rw-r-- 1 urbe urbe  45K Okt  2 11:13 scaffolds_FINAL.fasta.fai
-rw-rw-r-- 1 urbe urbe 120K Okt  2 10:13 scaffolds_iteration_1.p
-rw-rw-r-- 1 urbe urbe 120K Okt  2 10:16 scaffolds_iteration_2.p

Did I do anything wrong?
Note: my Hi-C reads are just 33*2.
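
The zero-byte files in the listing above can also be found programmatically, which helps when comparing several output directories. This is a minimal sketch, assuming the directory is the one passed to -o (scaffolds in this run); the helper name empty_files is hypothetical and not part of SALSA.

```python
import os


def empty_files(directory):
    """Return the sorted names of zero-byte regular files in `directory`."""
    names = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Only regular files are considered; subdirectories are skipped.
        if os.path.isfile(path) and os.path.getsize(path) == 0:
            names.append(name)
    return names
```

Calling empty_files("scaffolds") would list every empty file in that directory. An empty contig_links_iteration_1, as in the listing above, suggests that no links were produced in the first iteration.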

@ViriatoII

ViriatoII commented Nov 5, 2018

Exact same thing for me! Btw, do you also have 0 misassemblies reported?

I think this issue is a duplicate. Look here: #24
(Awesome pipeline, Machinegun!)

@ghuryejay
Collaborator

Actually, the -i parameter is overridden inside the code. Scaffolding stops once the linking information in the data is exhausted. I will update the command-line options and the README to make this clear. Sorry for the inconvenience.

One possible reason for no scaffolds being generated is that the alignment.bed file is not sorted by read ID. Please let me know if you have sorted the bed file appropriately and are still getting the same error.
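
One way to test the sorting condition is to check that all records of a read sit next to each other in the bed file. This is a minimal sketch, not SALSA code: it assumes the read ID is in column 4 with a /1 or /2 mate suffix, and that "sorted by read ID" in practice means mates must be adjacent. The names read_base and find_split_reads are hypothetical.

```python
def read_base(name):
    """Strip a trailing /1 or /2 mate suffix from a read name."""
    if name.endswith("/1") or name.endswith("/2"):
        return name[:-2]
    return name


def find_split_reads(lines):
    """Return read IDs whose records are not contiguous in the BED stream."""
    seen = set()      # every read ID encountered so far
    split = []        # read IDs that reappear after another read intervened
    previous = None
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        base = read_base(fields[3])
        if base != previous:
            if base in seen:
                split.append(base)
            seen.add(base)
            previous = base
    return split
```

Passing an open file handle, for example find_split_reads(open("alignment.bed")), returns an empty list when every read's records are grouped together. The set of seen IDs is held in memory, so for a multi-gigabyte bed file it may be more practical to run this on a sample.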

@ViriatoII

ViriatoII commented Nov 6, 2018

Oops, I meant misassemblies, not assemblies (I have edited the comment now). I have 0 reported misassemblies.

My bed file seems to be ordered correctly:

tig00000091 78514 78615 ERR1725392.100001/1 60 +
tig00000091 78725 78804 ERR1725392.100001/2 43 -
tig00004162 1076584 1076685 ERR1725392.100002/1 60 -
tig00004162 1076245 1076346 ERR1725392.100002/2 60 +
...
tig00004276 315427 315469 ERR1725392.100025/2 26 +
tig00004452 525650 525751 ERR1725392.100030/1 60 -
tig00004207 1453068 1453169 ERR1725392.100030/2 60 +
tig00004284 408440 408482 ERR1725392.100031/1 60 -
tig00004284 414594 414695 ERR1725392.100031/2 60 +
tig00004594 477141 477242 ERR1725392.10003/1 60 - ## although this situation is confusing
tig00000037 914482 914583 ERR1725392.10003/2 46 +
tig00004316 257909 258010 ERR1725392.100036/1 60 -
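
The ordering flagged as confusing in the excerpt above can be examined with a small check. In plain byte order, "/" sorts before the digits, so ERR1725392.10003/1 would come before ERR1725392.100031/1, whereas the excerpt has it after. One possible explanation, which is an assumption and is not confirmed anywhere in this thread, is that the file was sorted under a locale-aware collation and not in byte order. Either way, the /1 and /2 mates of each read remain adjacent in the excerpt.

```python
# Read IDs copied from the excerpt above, in the order they appear there.
ids = [
    "ERR1725392.100031/1",
    "ERR1725392.100031/2",
    "ERR1725392.10003/1",
    "ERR1725392.10003/2",
    "ERR1725392.100036/1",
]

# Python compares strings by code point, which matches byte order for ASCII.
byte_order = sorted(ids)

# "/" (0x2F) sorts before "1" (0x31), so 10003/1 precedes 100031/1 in byte order.
in_byte_order = ids == byte_order
```

Here in_byte_order is False, which confirms that the excerpt does not follow plain byte order.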

I read that I might not have enough Hi-C read coverage? I'm reproducing a paper that used HiRise and found 50 misassemblies.

Here's the read coverage of the contig with most coverage (red -> reverse reads):
image

A contig with less coverage:
image

Cheers,

@jnarayan81
Author

Hi @MachineGun
I did sort it correctly, and it works well for 1 round of iteration but fails afterward, with no scaffolds_FINAL file.

➜  SALSA git:(master) ✗ python2 run_pipeline.py -a /media/urbe/MyCDrive/JitDATA/adineta_reads/A_plus_collapsed.fa -l /media/urbe/MyCDrive/JitDATA/adineta_reads/A_plus_collapsed.fa.fai -b /media/urbe/MyCDrive/JitDATA/adineta_reads/alignment.bed -e GATC -o scaffolds_avaga2 -i 1 -c 100
bedfile loaded
python2 /home/urbe/Tools/SALSA/RE_sites.py -a scaffolds_avaga2/assembly.cleaned.fasta -e GATC > scaffolds_avaga2/re_counts_iteration_1
Starting Iteration 1
python2 /home/urbe/Tools/SALSA/make_links.py -b scaffolds_avaga2/alignment_iteration_1.bed -d scaffolds_avaga2 -i 1 -x abc
bedfile started
bedfile loaded
python2 /home/urbe/Tools/SALSA/fast_scaled_scores.py -d scaffolds_avaga2 -i 1
sort -k 5 -gr scaffolds_avaga2/contig_links_scaled_iteration_1 > scaffolds_avaga2/contig_links_scaled_sorted_iteration_1
python2 /home/urbe/Tools/SALSA/layout_unitigs.py -x abc -l scaffolds_avaga2/contig_links_scaled_sorted_iteration_1 -c 100 -i 1 -d scaffolds_avaga2
Loading Hi-C links 
Hybrid scaffold graph loaded, nodes = 2334 edges = 1810
Hi-C implied edges = 0
/home/urbe/Tools/SALSA/break_contigs -a scaffolds_avaga2/alignment_iteration_2.bed -b scaffolds_avaga2/breakpoints_iteration_2.txt -l scaffolds_avaga2/scaffold_length_iteration_2 -i 2 -s 100   > scaffolds_avaga2/misasm_iteration_2.report
python2 /home/urbe/Tools/SALSA/refactor_breaks.py -d scaffolds_avaga2 -i 2

One round worked well.

➜  SALSA git:(master) ✗ python2 run_pipeline.py -a /media/urbe/MyCDrive/JitDATA/adineta_reads/A_plus_collapsed.fa -l /media/urbe/MyCDrive/JitDATA/adineta_reads/A_plus_collapsed.fa.fai -b /media/urbe/MyCDrive/JitDATA/adineta_reads/alignment.bed -e GATC -o scaffolds_avaga3            
bedfile loaded
python2 /home/urbe/Tools/SALSA/RE_sites.py -a scaffolds_avaga3/assembly.cleaned.fasta -e GATC > scaffolds_avaga3/re_counts_iteration_1
Starting Iteration 1
python2 /home/urbe/Tools/SALSA/make_links.py -b scaffolds_avaga3/alignment_iteration_1.bed -d scaffolds_avaga3 -i 1 -x abc
bedfile started
bedfile loaded
python2 /home/urbe/Tools/SALSA/fast_scaled_scores.py -d scaffolds_avaga3 -i 1
sort -k 5 -gr scaffolds_avaga3/contig_links_scaled_iteration_1 > scaffolds_avaga3/contig_links_scaled_sorted_iteration_1
python2 /home/urbe/Tools/SALSA/layout_unitigs.py -x abc -l scaffolds_avaga3/contig_links_scaled_sorted_iteration_1 -c 1000 -i 1 -d scaffolds_avaga3
Loading Hi-C links 
Hybrid scaffold graph loaded, nodes = 2324 edges = 1803
Hi-C implied edges = 0
/home/urbe/Tools/SALSA/break_contigs -a scaffolds_avaga3/alignment_iteration_2.bed -b scaffolds_avaga3/breakpoints_iteration_2.txt -l scaffolds_avaga3/scaffold_length_iteration_2 -i 2 -s 100   > scaffolds_avaga3/misasm_iteration_2.report
python2 /home/urbe/Tools/SALSA/refactor_breaks.py -d scaffolds_avaga3 -i 2
Starting Iteration 2
python2 /home/urbe/Tools/SALSA/make_links.py -b scaffolds_avaga3/alignment_iteration_2.bed -d scaffolds_avaga3 -i 2
bedfile started
bedfile loaded
Starting Iteration 2

It terminated abruptly.

@ViriatoII

I'm noticing that something is different in your messages now:

Hybrid scaffold graph loaded, nodes = 2334 edges = 1810

These nodes and edges were 0 in your first post. What changed? A different assembly?
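
To compare runs like this, the node and edge counts can be pulled out of the saved console output. This is a minimal sketch, assuming the log line has exactly the format quoted above; the helper name graph_sizes is hypothetical and not part of SALSA.

```python
import re

# Matches the line printed by layout_unitigs.py as it appears in this thread.
GRAPH_LINE = re.compile(
    r"Hybrid scaffold graph loaded, nodes = (\d+) edges = (\d+)"
)


def graph_sizes(log_text):
    """Return a list of (nodes, edges) tuples, one per matching log line."""
    return [(int(n), int(e)) for n, e in GRAPH_LINE.findall(log_text)]
```

A run whose every tuple is (0, 0), as in the first post, loaded no links at all, while the later runs report a non-empty graph.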
