compare with needletail and kseq.h #2

Closed
jianshu93 opened this issue Oct 6, 2022 · 4 comments

@jianshu93

Hello Rayan,

It would be interesting to compare with biofast: https://github.com/lh3/biofast

For the test dataset M_abscessus_HiSeq.fq, needletail still seems to be faster than even the parallel version of seq_io. This repo takes about 3 seconds with 12 threads, while needletail takes 0.8 seconds on the plain FASTQ file (M_abscessus_HiSeq.fq).

Thanks,

Jianshu

@rchikhi
Owner

rchikhi commented Oct 16, 2022

Dear Jianshu, very interesting, thanks very much for the pointer!

I was under the impression that the seq_io benchmark was exhaustive in terms of Rust implementations, but it's not. That said, M_abscessus_HiSeq.fq is too small a workload. seq_io claims to parse at 2-4 GB/sec, so it should also take close to half a second on that file. A larger file would be needed to amortize the fixed overheads.
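
To make the throughput argument concrete, here is a minimal sketch of how one might measure bytes per second for a single pass over a FASTQ file. It uses only the Rust standard library and is not the benchmark code of seq_io, needletail, or this repo; the helper names `count_fastq_records` and `throughput_gb_per_s` are hypothetical, and the record count assumes strict 4-line FASTQ records (no wrapped sequences).

```rust
use std::io::{BufRead, BufReader, Read};
use std::time::Instant;

/// Count FASTQ records and bytes read.
/// Assumption: every record is exactly 4 lines (no line-wrapped sequences).
fn count_fastq_records<R: Read>(reader: R) -> std::io::Result<(u64, u64)> {
    let mut reader = BufReader::with_capacity(1 << 20, reader);
    let mut line = Vec::new();
    let mut lines = 0u64;
    let mut bytes = 0u64;
    loop {
        line.clear();
        // read_until returns 0 at end of input
        let n = reader.read_until(b'\n', &mut line)?;
        if n == 0 {
            break;
        }
        bytes += n as u64;
        lines += 1;
    }
    Ok((lines / 4, bytes))
}

/// Throughput in GB/s, with 1 GB taken as 1e9 bytes.
fn throughput_gb_per_s(bytes: u64, seconds: f64) -> f64 {
    bytes as f64 / 1e9 / seconds
}

fn main() -> std::io::Result<()> {
    let path = match std::env::args().nth(1) {
        Some(p) => p,
        None => {
            eprintln!("usage: bench <file.fq>");
            return Ok(());
        }
    };
    let start = Instant::now();
    let (records, bytes) = count_fastq_records(std::fs::File::open(&path)?)?;
    let secs = start.elapsed().as_secs_f64();
    println!(
        "{} records, {} bytes, {:.3} s, {:.2} GB/s",
        records,
        bytes,
        secs,
        throughput_gb_per_s(bytes, secs)
    );
    Ok(())
}
```

On a small file the measured time also includes process startup, thread spawning, and cold-cache reads, so the reported GB/s will understate steady-state parsing speed; repeating the run on a much larger input gives a fairer comparison.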

@rchikhi rchikhi closed this as completed Oct 16, 2022
@rchikhi
Owner

rchikhi commented Oct 23, 2022

Got around to doing some benchmarks: natir/in_place_fastx#1 (comment)

@jianshu93
Author

It seems very promising, and so does the in_place one.

I will also run an experiment on that one. I suggest using metagenomes, such as an environmental sample, which can be 100G+ and very diverse.

Thanks,

Jianshu

@jianshu93
Author

I tried your in_place_fastx commit (main.rs) and got the following error (run inside the in_place_fastx clone from your fork):

cargo run --release -- ~/Github/test_data/T4AerOil_sbsmpl5.fa
Finished release [optimized + debuginfo] target(s) in 0.06s
Running target/release/in_place_fastx /Users/jianshuzhao/Github/test_data/T4AerOil_sbsmpl5.fa
Error: NotAFastaFile

I can confirm that it is a FASTA file:

[Screenshot: Screen Shot 2022-10-24 at 4 05 52 PM]

Thanks,

Jianshu
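
A quick way to double-check what a parser will see at the start of a file is to inspect its first non-whitespace byte. The sketch below is only an illustration under that assumption; it is not the check that in_place_fastx performs (the source of `NotAFastaFile` is not shown in this thread), and `SeqFormat` and `sniff_format` are hypothetical names.

```rust
use std::io::{BufRead, BufReader, Read};

#[derive(Debug, PartialEq)]
enum SeqFormat {
    Fasta,   // first non-whitespace byte is '>'
    Fastq,   // first non-whitespace byte is '@'
    Unknown, // empty input or any other leading byte
}

/// Guess the format from the first non-whitespace byte of the input.
fn sniff_format<R: Read>(reader: R) -> std::io::Result<SeqFormat> {
    let mut reader = BufReader::new(reader);
    loop {
        // Look at the currently buffered bytes without consuming them.
        let (first, len) = {
            let buf = reader.fill_buf()?;
            (
                buf.iter().copied().find(|b| !b.is_ascii_whitespace()),
                buf.len(),
            )
        };
        if len == 0 {
            // End of input reached without finding a non-whitespace byte.
            return Ok(SeqFormat::Unknown);
        }
        match first {
            Some(b'>') => return Ok(SeqFormat::Fasta),
            Some(b'@') => return Ok(SeqFormat::Fastq),
            Some(_) => return Ok(SeqFormat::Unknown),
            // Buffer held only whitespace: discard it and read more.
            None => reader.consume(len),
        }
    }
}

fn main() -> std::io::Result<()> {
    match std::env::args().nth(1) {
        Some(path) => {
            let format = sniff_format(std::fs::File::open(&path)?)?;
            println!("{}: {:?}", path, format);
        }
        None => eprintln!("usage: sniff <file>"),
    }
    Ok(())
}
```

If this reports `Unknown` for a file that looks like FASTA in a viewer, the file may begin with something other than `>` (for example a byte-order mark or compressed data); if it reports `Fasta`, the error more likely comes from how the tool itself validates its input.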
