HaploMerger2 Multi thread implementation #7
Comments
Dear Jose,
HM2 creates files for each contig. With 3 million contigs,
the Linux system takes forever to finish creating these files.
HM2 will never work on an assembly with an N50 of 800 bp; you have 3
million contigs/scaffolds because your assembly is too fragmented.
I am very sorry to say that HM2 is not suitable for this situation.
You may want to find a way to increase the contiguity of the raw assembly before
trying HM2.
Best regards,
Shengfeng.
On 2017/9/8 16:39, Jose Francisco Sanchez-Herrero wrote:
…
--
best regards,
黄盛丰
Shengfeng Huang
School of Life Sciences, Sun Yat-sen University
hshengf2@mail.sysu.edu.cn
http://sklbc.sysu.edu.cn/Team/User/info.aspx?typeid=283&pid=46
Dear @mapleforest,
I am working on a genome assembly project with a fairly large genome (1.8 Gbp) that is highly fragmented, possibly because of its level of heterozygosity. There are nearly 3 million contigs/scaffolds with an N50 of around 800 bp.
After reading your manuscript and the manual, I decided to use HaploMerger2 on this genome project. I knew the N50 was not suitable, but I tried anyway.
I installed HaploMerger2 and ran the test examples, and I was delighted by the implications it might have for my data. I then followed the manual to set the parameters, variables and threads for my genome project, and I launched the run using the run_all_batch file.
I was astonished when I realised how long the initiation.pl step alone might take! Due to the implementation you are using, that step is not multi-threaded, so it reads and processes the sequences in the provided FASTA file one at a time. After running for a day and a half it had converted only around 400k sequences into nib files, so I calculated it would take around 7-8 days just to finish the initiation step.
I realised that the initiation step is used repeatedly throughout the HaploMerger2 workflow, so I decided to look further into the code and try implementing threads.
And so I did! I tested and debugged it on a small test set and then launched the process for my big genome project. The initiation step, which was going to take 7-8 days, ran in 9 hours using 60 CPUs.
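For readers who want to see the general idea, here is a minimal Python sketch of converting sequences in parallel instead of one at a time. It is not the code from the repository linked below (HaploMerger2 itself is written in Perl), and the names `read_fasta`, `convert_one` and `convert_all` are made up for illustration. The conversion itself is only a placeholder that writes each sequence to its own file; a real worker would call the pipeline's FASTA-to-nib converter at that point.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_fasta(text):
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            header = line[1:].split()
            name, chunks = (header[0] if header else "unnamed"), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def convert_one(record, out_dir):
    """Hypothetical stand-in for the per-sequence conversion step.

    It only writes the sequence to its own file; a real worker would
    call the pipeline's FASTA-to-nib converter here instead.
    """
    name, seq = record
    out_path = Path(out_dir) / f"{name}.seq"
    out_path.write_text(seq + "\n")
    return name, len(seq)


def convert_all(fasta_text, out_dir, workers=4):
    """Convert every record using a pool of worker threads."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    records = list(read_fasta(fasta_text))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with records.
        return list(pool.map(lambda rec: convert_one(rec, out_dir), records))


if __name__ == "__main__":
    demo = ">contig1 test\nACGT\nAC\n>contig2\nGGGTTT\n"
    with tempfile.TemporaryDirectory() as tmp:
        print(convert_all(demo, tmp, workers=2))
```

Threads are enough in this sketch because the per-sequence work is file I/O (or, in a real pipeline, waiting on an external program). Parallelising the conversion does not reduce the number of files created, so the file-system cost of millions of small files still applies.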
The pipeline is still running [hm.batchB2 right now], so I am keeping my fingers crossed that it finishes successfully and sheds some light on my genome assembly project!
You can find the information and details of the implementation on my GitHub profile: https://github.com/JFsanchezherrero/Haplomerger2_Multi-threads
I will give you further details on whether the code works and also on whether it worked for my genome.
Regards,
Jose F.