-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Haplomerger2 after FALCON run #6
Comments
Dear Amandine,
Only the "gap filling" stage will add new nucleatides to the assembly.
And the "tandem removal" stage may remove some sequences from the assembly.
And the "HaploMerger" stage may connect two super contigs together if
they have suffciently long overlap.
In your case, since you have a very good diploid and well-phased diploid
assemby,
if you do not want HM2 to connect any supper contigs (to avoid
HM2-induced phase switches),
you may set minOverlap to a very large value, such as 99,999,999 (see
details in page 15 of the manual),
in order to forbid contig joining.
Besides, you may try to understand the output file hm.new_scaffolds,
which contains the
layout used to create the reference haploid assembly from the original
diploid assembly.
You can actually create you own choice of haploid assembly based on the
data of this file.
Hope these will help you.
Best regards,
在 2017/8/7 17:08, Amandine Velt 写道:
…
Dear Haplomerger2 developer,
First of all, thank you for this nice tool, which give me very good
results !
But I have one question about on how it works. I have assembled my
genome from pacbio reads (120X of depth) with the FALCON tool. So I
have two file at the end, one file containing the primary contigs
(which can be represented by one allele version) and one file
containing the alternative contig (called haplotig). Thanks to the
length of pacbio reads, my assembly is phased.
But, after analyzing my assembly, I realized that some "haplotigs"
were still present in my primary contigs. That's why I decided to use
haplomerger2 to fix this.
I put my primary contigs and my haplotigs in the same file and I
launched haplomerger2, after creating my own score matrix.
I have very good results, but I have a question. Is my assembly always
phased? In haplomerger2's way of working, is it possible that he could
have changed something in my SVs? More clearly, did he make any
changes in my sequences or did he "just" separate my two alleles.
I thank you again for this super tool which allowed to clean very
effectively my primary contigs!
Best,
Amandine
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#6>, or mute the
thread
<https://github.com/notifications/unsubscribe-auth/AOtnAB9iY0_UmY4lElbyywfu3Y3jvWxTks5sVtQEgaJpZM4OvLUc>.
--
best regards,
黄盛丰
Shengfeng Huang
中山大学生命科学学院
School of life sciences, Sun Yat-sen university
hshengf2@mail.sysu.edu.cn
http://sklbc.sysu.edu.cn/Team/User/info.aspx?typeid=283&pid=46
--------------------------------------------------------------------------------
本邮件及其附件含有发送给特定个人和用于特定目的的保密信息。如果您不是预期的收件人,请立即删除本邮件并通知发件人。严禁任何非预期的收件人使用、传播、分发或复制本邮件或其附件。
This email and its attachments may contain confidential information intended for a specific individual and purpose. If you are not the intended recipient, you should delete this email and notify the sender immediately. Any use, dissemination, distribution, or copying of this email or its attachments by persons other than the intended recipient(s), is strictly prohibited.
|
Thank you for your advices ! In addition to creating my own score matrix, I also set the maskFilter parameter to 85 and the redundantFilter parameter to 85 as I was losing too many sequences during the first HM2 run. Now, I remove the step 4 (remove tandem errors from haploid assemblies) and I set the minOverlap parameter to 99999999. During the _A3.pathFinder_preparation step, I got the following error : With the command : And in A3.pathFinder step I got "Can not open corrected_assembly_windowmasker.corrected_assembly_windowmaskerx.result/hm.scaffolds!". Have you an idea about it ? Best, EDIT : I forgot to copy the ".ctl" files to my working folder, everything works fine. Thanks again. |
Dear Amandine,
1. I think there are fewer tandem errors in a PacBio-based assembly.
2. If many sequences are sent to the unpaired sequence file, it means
the polymorphism rate is low and many contigs do not have a
corresponding alleles in the raw assembly. What you need to do is to
merge the unpaired sequences and the main assembly file by using
hm.batchB4-5. So what is the size and the polymorphism rate in the
diploid assembly?
3. hm.batchA does not remove any sequences, it just break up the
potential misjoin, you do not need to set the minOverlap in this stage.
4. About the error it appears the chainNet result is not complete. Is it
possible the open file handles are not sufficient? How many contigs do
you have? (If the raw diploid assembly has >2000 scaffold
sequences,bmake sure to lift your system’s openable file handle limit by
invocating “ulimit –n 655350” before running HM2).
5. A very relevant point I forgot to mentioned: The algorithms of de
novo assemblers (and Haplomerger, too) never gurantee phased assemblies.
So I recommend first to create a good reference haploid assembly which
is focusing on correctness, continuity and representativity; then to use
this haploid assembly and a professional phasing tool to create good
phased assemblies for this sequenced individuals or other resequencing
indidivuals. In this case you do not need to re-set minOvelap, after all
HM2 many connect different contigs,but will not tamper the phased sequences.
Best regards,
在 2017/8/7 21:46, Amandine Velt 写道:
…
Thank you for your advices !
In addition to creating my own score matrix, I also set the maskFilter
parameter to 85 and the redundantFilter parameter to 85 as I was
losing too many sequences during the first HM2 run.
Now, I remove the step 4 (remove tandem errors from haploid
assemblies) and I set the minOverlap parameter to 99999999.
During the _A3.pathFinder_preparation step, I got the following error :
'Species included: corrected_assembly_windowmasker
corrected_assembly_windowmaskerx
Set the scoring scheme to score, and set the filter score/ali_len to
100000 .
Set to OVER-WRITING mode!
Set to DELETING mode!
Produce tsc/qsc_ids, total id are 4733 !
Read the zeroMinSpace.rbest.net.gz file and do some basic node
filtering ...
*Modification of non-creatable array value attempted, subscript -1 at
HaploMerger2_20161205/bin/HM_pathFinder_preparation.pl line 252.*'
With the command :
"HaploMerger2_20161205/bin/HM_pathFinder_preparation.pl --Species
corrected_assembly_windowmasker corrected_assembly_windowmaskerx
--scoreScheme=score --filter=100000 --Force --Delete"
Have you an idea about it ?
Best,
Amandine
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#6 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AOtnANBDzfEnCWnoIh6AdvB2xehqCEnRks5sVxU7gaJpZM4OvLUc>.
--
best regards,
黄盛丰
Shengfeng Huang
中山大学生命科学学院
School of life sciences, Sun Yat-sen university
hshengf2@mail.sysu.edu.cn
http://sklbc.sysu.edu.cn/Team/User/info.aspx?typeid=283&pid=46
--------------------------------------------------------------------------------
本邮件及其附件含有发送给特定个人和用于特定目的的保密信息。如果您不是预期的收件人,请立即删除本邮件并通知发件人。严禁任何非预期的收件人使用、传播、分发或复制本邮件或其附件。
This email and its attachments may contain confidential information intended for a specific individual and purpose. If you are not the intended recipient, you should delete this email and notify the sender immediately. Any use, dissemination, distribution, or copying of this email or its attachments by persons other than the intended recipient(s), is strictly prohibited.
|
Before changing the minOverlap option, I had good statistics on my reference assembly after HM2 : I launched BUSCO on the assembly after HM2 and I found more complete genes than before HM2, so I'm very happy of these results, but I don't know if my assembly has kept the same phase. After changing the minOverlap option, I have worse statistics on my reference assembly : So I think the results are better by not changing the minOverlap option, but I don't know how to check if the phase has not changed compared to the assembly of Falcon. Sorry for all these questions and the time it takes you. Best, |
Dear Amandine, I still recommend to compute a better phased genome assembly based on the initial reference genome coming off from the falcon-HaploMerger2 pipeline. Anyway, if you want to know in the reference assembly, which scaffolds from which contigs, Best regards, |
Dear Haplomerger2 developer,
First of all, thank you for this nice tool, which give me very good results !
But I have one question about on how it works. I have assembled my genome (Grape genome, 500Mb, highly heterozygous) from pacbio reads (120X of depth) with the FALCON tool. So I have two file at the end, one file containing the primary contigs (which can be represented by one allele version) and one file containing the alternative contigs (called haplotigs). Thanks to the length of pacbio reads, my assembly is (normally) phased.
But, after analyzing my assembly, I realized that some "haplotigs" were still present in my primary contigs, due to excessive heterozygosity in this "region". That's why I decided to use haplomerger2 to fix this.
I put my primary contigs and my haplotigs in the same file and I launched haplomerger2, after creating my own score matrix.
I have very good results, but I have a question. Is my assembly always phased? In haplomerger2's way of working, is it possible that he could have changed something in my variations? More clearly, did he make any changes in my sequences or did he "just" separate my two alleles.
I thank you again for this super tool which allowed to clean very effectively my primary contigs!
Best,
Amandine
The text was updated successfully, but these errors were encountered: