
HaploMerger2 Multi thread implementation #7

JFsanchezherrero opened this issue Sep 8, 2017 · 1 comment

@JFsanchezherrero

Dear @mapleforest,
I am working on a genome assembly project for a large genome (1.8 Gbp) that is highly fragmented, possibly because of its level of heterozygosity. There are nearly 3 million contigs/scaffolds, with an N50 of around 800 bp.

After reading your manuscript and the manual, I decided to use HaploMerger2 on this genome. I knew the N50 was not suitable, but I tried anyway.

I installed HaploMerger2 and ran the test examples, and I was delighted by what it might do for my data. I then followed the manual to set the parameters, variables, and threads for my genome project and launched the pipeline using the run_all.batch file.

I was astonished when I realised how long the initiation.pl step alone would take! Because of the way it is implemented, this step is single-threaded: it reads and processes the sequences in the provided FASTA file one at a time. After running for a day and a half, it had converted only about 400k sequences into nib files, so I estimated it would take around 7-8 days just to finish the initiation step.
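Because each sequence is converted independently, a per-sequence loop like this parallelizes naturally. The following is a minimal Python sketch of that idea, not the code in the linked repository or in HaploMerger2 itself: it assumes a plain multi-FASTA input, writes each record to its own file from a thread pool, and only calls UCSC's `faToNib` when the hypothetical `use_fatonib` flag is set. The names `read_fasta`, `convert_one`, and `convert_all` are illustrative.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_fasta(path):
    """Yield (name, sequence) pairs from a multi-FASTA file, one record at a time."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                # Use the first whitespace-delimited token of the header as the ID.
                name, chunks = line[1:].split()[0], []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def convert_one(record, out_dir, use_fatonib=False):
    """Write one record to <name>.fa and optionally convert it to <name>.nib.

    Assumes the UCSC `faToNib` binary is on PATH when use_fatonib is True.
    """
    name, seq = record
    fa_path = out_dir / f"{name}.fa"
    fa_path.write_text(f">{name}\n{seq}\n")
    if not use_fatonib:
        return fa_path
    nib_path = out_dir / f"{name}.nib"
    subprocess.run(["faToNib", str(fa_path), str(nib_path)], check=True)
    return nib_path


def convert_all(fasta_path, out_dir, workers=4, use_fatonib=False):
    """Process every record in a thread pool; results come back in input order."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(
            pool.map(
                lambda rec: convert_one(rec, out_dir, use_fatonib),
                read_fasta(fasta_path),
            )
        )
```

For example, `convert_all("assembly.fa", "nib_dir", workers=60, use_fatonib=True)` (hypothetical paths) would run up to 60 conversions at once. Threads are enough here because the heavy work happens in external `faToNib` processes, which do not hold Python's GIL. Note that `Executor.map` submits every task up front, so for millions of contigs you would probably want to submit in batches.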

I realised that the initiation step recurs throughout the HaploMerger2 workflow, so I decided to look more closely at the code and try to add multi-threading.

And so I did! I tested and debugged it on a small dataset and then launched it on my large genome project. The initiation step, which was expected to take 7-8 days, finished in 9 hours using 60 CPUs.
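For a rough sense of the scaling, treating the 7-8 day figure as what it is (an extrapolated serial estimate, not a measured runtime), the speedup $S$ and parallel efficiency $E$ on $p = 60$ CPUs work out to:

```latex
S = \frac{T_{\text{serial}}}{T_{\text{parallel}}}
  \approx \frac{168 \text{ to } 192\,\text{h}}{9\,\text{h}}
  \approx 18.7 \text{ to } 21.3,
\qquad
E = \frac{S}{p} \approx \frac{18.7 \text{ to } 21.3}{60} \approx 0.31 \text{ to } 0.36
```

That is, roughly a third of ideal linear scaling, which is still a large practical gain over the single-threaded run.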

The pipeline is still running (it is currently at hm.batchB2), so I am keeping my fingers crossed that it finishes successfully and sheds some light on my genome assembly project!

You can find the details of the implementation on my GitHub profile: https://github.com/JFsanchezherrero/Haplomerger2_Multi-threads

I will follow up with further details on whether the code works correctly and whether it worked for my genome.

Regards,

Jose F.

@mapleforest
Owner

mapleforest commented Sep 8, 2017 via email
