How to improve run time? #14

Closed
MartinPippel opened this issue Dec 3, 2021 · 1 comment

Comments

@MartinPippel

I tried to solve my issue #10 by creating a Singularity container, and it seems to work on our compute cluster. However, it is very slow. Do you have any advice on how to speed up the deepconsensus step?

I created a toy data set, with the following specs:

2.0G m54345U_211128_022942.chunk0.subreads.bam
60M  m54345U_211128_022942.chunk0.ccs.fasta
1.8G subreads_to_ccs.bam

I started deepconsensus (on a 24-core machine with 250 GB RAM) with the default args:

$SING_CMD python3 -m deepconsensus.scripts.run_deepconsensus --input_subreads_aligned=subreads_to_ccs.bam --input_subreads_unaligned=split/m54345U_211128_022942.chunk0.subreads.bam --input_ccs_fasta=ccs/m54345U_211128_022942.chunk0.ccs.fasta --output_directory=deepconsensus --checkpoint=${CHECKPOINT_PATH}

After almost 7 hours of run time, it is still in step 2 (2_generate_input).
It is also using only a single thread. This is a snapshot of htop:

74354 pippel     20   0 70.6G 64.8G  100M R 100. 25.8  6h26:45 python3 -m deepconsensus.preprocess.generate_input --merged_datasets_path=deepconsensus/1_merge_datasets --output_path=deepconsensus/2_generate_input --input_ccs_fasta=ccs/m54345U_211128_022942.chunk0.ccs.fas
74160 pippel     20   0 70.6G 64.8G  100M S 100. 25.8  6h37:12 python3 -m deepconsensus.preprocess.generate_input --merged_datasets_path=deepconsensus/1_merge_datasets --output_path=deepconsensus/2_generate_input --input_ccs_fasta=ccs/m54345U_211128_022942.chunk0.ccs.fas

Additionally, I get the following TensorFlow warning, which might be related to my problem:

 $SING_CMD python3 -m deepconsensus.preprocess.generate_input --helpfull
2021-12-03 09:26:35.456162: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /.singularity.d/libs
2021-12-03 09:26:35.456208: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

Is deepconsensus using a GPU? Any advice is highly appreciated.
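
The warning above only says that TensorFlow could not load the CUDA runtime library inside the container. A quick way to see whether TensorFlow detects any GPU at all is sketched below. It assumes it is run with the same Python interpreter inside the container; `visible_gpus` is a hypothetical helper name, and the check says nothing about whether the generate_input step would actually use a GPU.

```python
import importlib.util


def visible_gpus():
    """Return the names of GPUs TensorFlow can see.

    Returns None when TensorFlow is not installed in this environment
    (hypothetical helper, not part of DeepConsensus).
    """
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    # list_physical_devices returns an empty list when no GPU is usable,
    # for example when libcudart cannot be loaded.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif gpus:
        print("TensorFlow sees these GPUs:", gpus)
    else:
        print("TensorFlow sees no GPU; it will run on the CPU")
```

An empty list is consistent with the log message, which itself says the warning can be ignored on a machine without a GPU. If the host does have an NVIDIA GPU, Singularity's `--nv` option is the usual way to expose it to a container.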

Thanks,
Martin

@danielecook
Collaborator

danielecook commented Dec 3, 2021

The current release of DeepConsensus (v0.1.0) is a proof-of-principle version. We are planning a new release (v0.2.0) in the near future, which should greatly speed up DeepConsensus.

Please let us know if you have any further suggestions for future releases.
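
Since the htop snapshot shows a single busy thread per run, one possible workaround on a multi-core machine is to run several chunks side by side. The following is only a sketch under stated assumptions, not a tested recipe for v0.1.0: it reuses the flags from the command in the original post, but the per-chunk file layout (in particular `aligned/…subreads_to_ccs.bam` and the `deepconsensus/chunkN` output directories) is hypothetical, and the `$SING_CMD` prefix is left out. The number of parallel jobs is capped by memory, using the roughly 65 GB resident size seen in htop.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(chunk, checkpoint, prefix="m54345U_211128_022942"):
    """Build the run_deepconsensus command line for one chunk.

    The flags are the ones used in the original post; the per-chunk
    file names are an assumption for illustration.
    """
    stem = f"{prefix}.chunk{chunk}"
    return [
        "python3", "-m", "deepconsensus.scripts.run_deepconsensus",
        f"--input_subreads_aligned=aligned/{stem}.subreads_to_ccs.bam",  # hypothetical path
        f"--input_subreads_unaligned=split/{stem}.subreads.bam",
        f"--input_ccs_fasta=ccs/{stem}.ccs.fasta",
        f"--output_directory=deepconsensus/chunk{chunk}",  # hypothetical path
        f"--checkpoint={checkpoint}",
    ]


def max_parallel_jobs(total_ram_gb, ram_per_job_gb, cores):
    """Cap concurrency by both available memory and core count."""
    return max(1, min(cores, int(total_ram_gb // ram_per_job_gb)))


def run_chunks(chunks, checkpoint, workers):
    """Run one process per chunk, at most `workers` at a time.

    Returns the exit code of each process, in chunk order.
    """
    def run_one(chunk):
        return subprocess.run(build_command(chunk, checkpoint)).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, chunks))


# Example (not executed here):
#   workers = max_parallel_jobs(total_ram_gb=250, ram_per_job_gb=65, cores=24)
#   exit_codes = run_chunks(range(4), "/path/to/checkpoint", workers)
```

With the figures from the post (250 GB RAM, about 65 GB per process), memory rather than core count is the limit, so only three jobs would fit at once.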
