Haplomerger2 after FALCON run #6

Open
a-velt opened this issue Aug 7, 2017 · 5 comments

Comments

a-velt commented Aug 7, 2017

Dear Haplomerger2 developer,

First of all, thank you for this nice tool, which gives me very good results!

But I have a question about how it works. I assembled my genome (grape, 500 Mb, highly heterozygous) from PacBio reads (120X depth) with FALCON. So I end up with two files: one containing the primary contigs (which represent one allele version) and one containing the alternative contigs (called haplotigs). Thanks to the length of the PacBio reads, my assembly should normally be phased.

But after analyzing my assembly, I realized that some "haplotigs" were still present among my primary contigs, due to excessive heterozygosity in those regions. That's why I decided to use HaploMerger2 to fix this.

I put my primary contigs and my haplotigs in the same file and launched HaploMerger2 after creating my own score matrix.
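For readers who want to reproduce the "one combined file" step, here is a minimal sketch of how the two FASTA files could be concatenated. The function name `merge_fasta`, the file paths, and the `p_`/`h_` prefixes are illustrative assumptions, not something FALCON or HaploMerger2 requires; the prefixes only record which file each sequence came from and guard against duplicate IDs.

```python
def merge_fasta(primary_path, haplotig_path, out_path):
    """Concatenate two FASTA files, tagging every header with its origin.

    Headers from the primary file get a 'p_' prefix, headers from the
    haplotig file get 'h_' (hypothetical convention). Returns the number
    of sequences written.
    """
    seen = set()
    with open(out_path, "w") as out:
        for tag, path in (("p", primary_path), ("h", haplotig_path)):
            with open(path) as handle:
                for line in handle:
                    line = line.rstrip("\r\n")
                    if not line:
                        continue  # skip blank lines
                    if line.startswith(">"):
                        fields = line[1:].split()
                        if not fields:
                            raise ValueError(f"empty FASTA header in {path}")
                        # Keep only the first word of the header as the ID.
                        new_id = f"{tag}_{fields[0]}"
                        if new_id in seen:
                            raise ValueError(f"duplicate sequence ID: {new_id}")
                        seen.add(new_id)
                        out.write(f">{new_id}\n")
                    else:
                        out.write(line + "\n")
    return len(seen)
```

Calling `merge_fasta("primary.fa", "haplotigs.fa", "combined.fa")` (hypothetical file names) would write a single file that can then be masked and passed to the pipeline.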

I get very good results, but I have a question: is my assembly still phased? Given how HaploMerger2 works, is it possible that it changed something in my variants? To be clearer, did it make any changes to my sequences, or did it "just" separate my two alleles?

Thank you again for this great tool, which allowed me to clean my primary contigs very effectively!

Best,
Amandine

Owner

mapleforest commented Aug 7, 2017 via email

Author

a-velt commented Aug 7, 2017

Thank you for your advice!

In addition to creating my own score matrix, I also set both the maskFilter and redundantFilter parameters to 85, because I was losing too many sequences during the first HM2 run.

Now I have removed step 4 (remove tandem errors from haploid assemblies) and set the minOverlap parameter to 99999999.

During the _A3.pathFinder_preparation step, I got the following error:
'Species included: corrected_assembly_windowmasker corrected_assembly_windowmaskerx
Set the scoring scheme to score, and set the filter score/ali_len to 100000 .
Set to OVER-WRITING mode!
Set to DELETING mode!
Produce tsc/qsc_ids, total id are 4733 !
Read the zeroMinSpace.rbest.net.gz file and do some basic node filtering ...
Modification of non-creatable array value attempted, subscript -1 at HaploMerger2_20161205/bin/HM_pathFinder_preparation.pl line 252.'

With the command:
"HaploMerger2_20161205/bin/HM_pathFinder_preparation.pl --Species corrected_assembly_windowmasker corrected_assembly_windowmaskerx --scoreScheme=score --filter=100000 --Force --Delete"

And in the A3.pathFinder step, I got "Can not open corrected_assembly_windowmasker.corrected_assembly_windowmaskerx.result/hm.scaffolds!".

Do you have any idea what could cause this?

Best,
Amandine

EDIT: I forgot to copy the ".ctl" files to my working folder; everything works fine now. Thanks again.
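Since the failure above came from missing ".ctl" files, a small pre-flight check before launching the batch scripts may save a run. This is only a sketch: the helper names `control_files` and `preflight` are hypothetical, and it merely tests that at least one `*.ctl` file exists in the working folder, without knowing which specific control files a given HM2 version expects.

```python
from pathlib import Path


def control_files(workdir):
    """Return the sorted names of the *.ctl files found directly in workdir."""
    return sorted(p.name for p in Path(workdir).glob("*.ctl"))


def preflight(workdir):
    """Raise early if the working folder contains no .ctl file at all."""
    found = control_files(workdir)
    if not found:
        raise FileNotFoundError(
            f"no .ctl files in {workdir}; copy them into the working folder first"
        )
    return found
```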

Owner

mapleforest commented Aug 8, 2017 via email

Author

a-velt commented Aug 9, 2017

  1. Yes, I saw few differences between my _A_ref.fa and my ref_D.fa files, so there were few tandem errors.

  2. After the HM2 run, with minOverlap=99999999 or the default value, I get a genome size (from ref.fa) of 515 Mb. How can I determine the polymorphism rate?

  3. OK, I will re-run HM2 without changing the minOverlap parameter for hm.batchA.

  4. The error was because I forgot to copy the ".ctl" files to my working folder. Everything is working fine now.

  5. The FALCON manual says that FALCON phases heterozygous SNPs, so I want to keep this phasing after the HM2 run. I have primary contigs and alternative contigs from FALCON, and I want to keep the same sequences. With HM2, I just want to move the alternative contigs that remain in the primary contigs file to the alternative contigs file. I don't know whether HM2 changes anything in my input sequences. Before launching HM2, I put my primary contigs and alternative contigs in one file.

Before changing the minOverlap option, I had good statistics on my reference assembly after HM2:
Number of contigs: 1537 (before HM2 I had 1825 contigs)
Genome size: 515 Mb (before HM2 I had 588 Mb)
L50: 85 contigs (before HM2 I had 110 contigs)
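The three statistics quoted above can be recomputed from any FASTA file. The sketch below assumes the usual definition of L50 (the smallest number of contigs, taken from longest to shortest, whose lengths sum to at least half of the assembly size); the function names are illustrative.

```python
def read_lengths(fasta_path):
    """Return the length of every sequence in a FASTA file."""
    lengths = []
    current = None
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths):
    """Compute contig count, total size, L50 and N50 from sequence lengths."""
    total = sum(lengths)
    running = 0
    for rank, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running * 2 >= total:
            # rank is the L50 (a count); length is the matching N50 (in bp).
            return {"contigs": len(lengths), "size": total, "L50": rank, "N50": length}
    return {"contigs": 0, "size": 0, "L50": 0, "N50": 0}
```

Running `assembly_stats(read_lengths("ref.fa"))` on the assemblies before and after HM2 gives numbers that are directly comparable, as long as the same file (reference only, or reference plus haplotigs) is used on both sides.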

I ran BUSCO on the assembly after HM2 and found more complete genes than before HM2, so I'm very happy with these results, but I don't know whether my assembly has kept the same phase.

After changing the minOverlap option, I get worse statistics on my reference assembly:
Number of contigs: 2335 (more than before HM2 because I added the haplotigs to my primary contigs; this did not cause a problem before changing the minOverlap option)
Genome size: 515 Mb (before HM2 I had 588 Mb)
L50: 154 contigs (before HM2 I had 110 contigs)

So I think the results are better when the minOverlap option is left unchanged, but I don't know how to check whether the phasing has changed compared to the FALCON assembly.

Sorry for all these questions and for the time they take you.
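One rough way to approach the "did my sequences change?" part of this question is to test whether each FALCON contig still occurs verbatim, on either strand, inside some sequence of the HM2 output. The sketch below is an assumption-laden illustration, not a HaploMerger2 feature: the function names are hypothetical, sequences are uppercased so that soft-masking is ignored, and plain substring search is only practical on small inputs (a whole-genome comparison would more realistically rely on an aligner). A contig that is not found may have been trimmed, merged, or dropped, and a contig that is found intact can still sit in a scaffold that joins contigs of different phases.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(path):
    """Load a FASTA file into a dict of {first header word: uppercase sequence}."""
    records = {}
    name = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks).upper()
                fields = line[1:].split()
                name = fields[0] if fields else ""
                chunks = []
            elif name is not None:
                chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks).upper()
    return records


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_intact_contigs(inputs, outputs):
    """Map each input contig to the output sequence that contains it verbatim.

    The value is None when the contig is found on neither strand.
    """
    result = {}
    for name, seq in inputs.items():
        rc = reverse_complement(seq)
        result[name] = next(
            (out_name for out_name, out_seq in outputs.items()
             if seq in out_seq or rc in out_seq),
            None,
        )
    return result
```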

Best,
Amandine

@mapleforest
Owner

Dear Amandine,
If you have two contigs of different phases, then HaploMerger might join these two contigs together.

I still recommend computing a better-phased genome assembly based on the initial reference genome produced by the FALCON-HaploMerger2 pipeline.

Anyway, if you want to know which scaffolds in the reference assembly come from which contigs,
you can refer to the output file named "hm.new_scaffolds" from the batchB*.
This file shows how HM2 chooses the raw contigs to form the final scaffolds.

Best regards,
Shengfeng.
