performance claims #60

Closed
sjneph opened this issue Sep 29, 2023 · 6 comments
Labels
question Further information is requested

Comments

sjneph commented Sep 29, 2023

I meant to do some testing on my own, but I may never get to it. I'm one of the authors of BEDOPS. It is hard to imagine a 6x or so improvement in runtimes, since the bedops/closest-features utilities use linear-time (or n log n, for sorting) algorithms.

There are a couple of things that stand out to me in the bioRxiv paper. Mainly, the timed tests take at most 1 second for the slowest tool, which indicates very, very small inputs (Figures 1 and 2). If the trend held for large inputs, that would be far more interesting and impressive. As it stands, the differences might be attributable to effects that do not generalize beyond the 1-second scale.

The memory overhead shown for bedops (Figure 5) makes me think that you used the "megarow" build of BEDOPS. That build is meant for very large sequencing results (Nanopore and PacBio). It scales to those much larger data at the cost of a small memory overhead but also a considerable time overhead. It would be worth measuring time/memory against both that larger build and the more popular (and default) build of the BEDOPS utilities.

You can use the switch-BEDOPS-binary-type utility to switch between the typical (default) and megarow builds of the BEDOPS utilities.

noamteyssier (Owner) commented Sep 29, 2023

Hey Shane, thanks for opening this issue. I welcome inquiries on performance, because I agree the results are surprising.

I'd be happy for you to explore my benchmarks and compare them for yourself:
https://github.com/noamteyssier/gia_benchmark/

If you'd like to explore the algorithms as well, and potentially benchmark the individual components, you can find them at
https://github.com/noamteyssier/bedrs

bedops version

I did not change any settings from the default bedops build, which I installed via brew.

Output of bedops --version:

bedops
  citation: http://bioinformatics.oxfordjournals.org/content/28/14/1919.abstract
            https://doi.org/10.1093/bioinformatics/bts277
  version:  2.4.41 (typical)
  authors:  Shane Neph & Scott Kuehn

File sizes in Figures 1 and 2

You're correct that the inputs are not enormous for the benchmarks in Figures 1 and 2 (measured at 200,000 intervals), but I wanted to select an interval set size that is similar to the sizes I work with regularly and is also appropriate for measuring algorithmic, IO, and memory performance.

gia isn't just a stream-focused tool, so at those interval sizes I am using an in-place version (where all intervals are loaded into memory first). Currently only intersect has a streamable implementation, so I could not do a full comparison of all the same functionality as BEDOPS at very high interval counts: it would not be an apples-to-apples comparison, and the load-in time would cripple the in-place run time. As I mention in the paper (Figure 6), gia (in-place) and BEDOPS show roughly equal run-time costs between 400,000 and 500,000 intervals (which I agree is not very high).
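For readers curious what the streamed approach looks like, here is a minimal sketch of a single-pass sweep over two sorted interval streams. It assumes each input is sorted by (chromosome, start) and internally non-overlapping, and the types and names are hypothetical; gia's actual implementation lives in bedrs and differs in the details.

// Minimal sketch of a streamed intersect: one pass, O(1) extra memory.
// Intervals are (chrom, start, end) with half-open coordinates.
fn intersect_sorted<A, B>(a: A, b: B) -> Vec<(u32, u64, u64)>
where
    A: IntoIterator<Item = (u32, u64, u64)>,
    B: IntoIterator<Item = (u32, u64, u64)>,
{
    let (mut a, mut b) = (a.into_iter(), b.into_iter());
    let (mut x, mut y) = (a.next(), b.next());
    let mut out = Vec::new();
    while let (Some((xc, xs, xe)), Some((yc, ys, ye))) = (x, y) {
        // same chromosome and the half-open ranges overlap
        if xc == yc && xs < ye && ys < xe {
            out.push((xc, xs.max(ys), xe.min(ye)));
        }
        // advance the stream whose interval ends first in (chrom, end)
        // order; it cannot overlap anything later in the other stream
        if (xc, xe) <= (yc, ye) {
            x = a.next();
        } else {
            y = b.next();
        }
    }
    out
}

fn main() {
    let a = vec![(1, 10, 20), (1, 30, 40)];
    let b = vec![(1, 15, 35)];
    assert_eq!(intersect_sorted(a, b), vec![(1, 15, 20), (1, 30, 35)]);
}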

Large inputs

However, for the streamed function I do have (intersect), we can test your concern that the results do not generalize. In the paper I show input sizes up to 10,000,000, a limit chosen purely for the practicality of running each command 30 or so times to gather statistics.

I'll do a more thorough analysis soon with multiple runs instead of a single measurement, but even a single run gives an estimate of run time and memory overhead.

I'll use GNU time so we can see memory usage as well as run time.

100,000,000 random intervals

# generate the random bed files
gia random -n 100000000 | gia sort -o random_large_a.bed
gia random -n 100000000 | gia sort -o random_large_b.bed

# get an estimate for bedops
time bedops -i random_large_a.bed random_large_b.bed > /dev/null

# get an estimate for streamed gia
time gia intersect -S -a random_large_a.bed -b random_large_b.bed -o /dev/null

bedops

27.38user 0.71system 0:28.10elapsed 99%CPU (0avgtext+0avgdata 9920maxresident)k
0inputs+0outputs (0major+735minor)pagefaults 0swaps

gia

12.09user 0.64system 0:12.74elapsed 99%CPU (0avgtext+0avgdata 2384maxresident)k
0inputs+0outputs (6major+256minor)pagefaults 0swaps

1,000,000,000 random intervals

# generate the random bed files
gia random -n 1000000000 | gia sort -o random_very_large_a.bed
gia random -n 1000000000 | gia sort -o random_very_large_b.bed

# get an estimate for bedops
time bedops -i random_very_large_a.bed random_very_large_b.bed > /dev/null

# get an estimate for streamed gia
time gia intersect -S -a random_very_large_a.bed -b random_very_large_b.bed -o /dev/null

bedops

258.92user 10.06system 4:29.79elapsed 99%CPU (0avgtext+0avgdata 9952maxresident)k
0inputs+0outputs (20major+720minor)pagefaults 0swaps

gia

113.71user 9.04system 2:03.26elapsed 99%CPU (0avgtext+0avgdata 2560maxresident)k
0inputs+0outputs (6major+270minor)pagefaults 0swaps

10,000,000 random interval sorting

Another comparable function is sort, since both tools must do an in-place sort.

# generate 10,000,000 random intervals
gia random -n 10000000 -o random.bed

# get an estimate for bedops
time sort-bed random.bed > /dev/null

# get an estimate for gia
time gia sort -i random.bed -o /dev/null

bedops

4.85user 0.14system 0:05.01elapsed 99%CPU (0avgtext+0avgdata 356592maxresident)k
0inputs+0outputs (0major+37768minor)pagefaults 0swaps

gia

1.32user 0.07system 0:01.40elapsed 99%CPU (0avgtext+0avgdata 261312maxresident)k
0inputs+0outputs (3major+17979minor)pagefaults 0swaps

noamteyssier added the question label Sep 29, 2023
sjneph (Author) commented Sep 29, 2023

Thanks for the quick response.
One thing I'd suggest is to run those same tests in the opposite order, with gia first and bedops second; caching can be problematic in timed tests. However, if what you are showing holds, it would be pretty exciting for your implementation.
It may not be the focus of your current work, but if the time/memory numbers continue to show such improvements, I'd also suggest allowing an arbitrary number of inputs for the various operations (like intersect), if you don't already. That suggestion comes from my experience working with lots of inputs over the years.
I like that you are doing this and giving ol' bedops a good run. I've been looking at Rust for a while and haven't yet been brave enough to start switching over. It's a pretty language.

noamteyssier (Owner) commented Sep 29, 2023

Great point; here are those same timings with gia run first and bedops second. It seems like the trend continues to hold.

I'd definitely like to include support for an arbitrary number of inputs; I think that's a very useful feature, but I haven't yet had a chance to take a crack at it. I would love for gia to be more of a community project, and if you're at all interested in contributing I'd be happy to accept PRs.

I agree Rust is a pretty language. The learning curve is steeper than most, but for anybody who has worked with systems programming languages before, the switch will be much easier. I think the language will continue to grow in bioinformatics. There are excellent resources for getting started as well, and in my opinion the best way to learn is to just start arguing with the compiler!

Benchmarks with swapped order

100,000,000 random intervals

time gia intersect -S -a random_large_a.bed -b random_large_b.bed -o /dev/null
time bedops -i random_large_a.bed random_large_b.bed > /dev/null

gia

12.25user 0.92system 0:13.20elapsed 99%CPU (0avgtext+0avgdata 2544maxresident)k
0inputs+0outputs (5major+267minor)pagefaults 0swaps

bedops

27.38user 0.84system 0:28.23elapsed 99%CPU (0avgtext+0avgdata 10032maxresident)k
0inputs+0outputs (0major+745minor)pagefaults 0swaps

1,000,000,000 random intervals

time gia intersect -S -a random_very_large_a.bed -b random_very_large_b.bed -o /dev/null
time bedops -i random_very_large_a.bed random_very_large_b.bed > /dev/null

gia

110.84user 8.88system 2:00.45elapsed 99%CPU (0avgtext+0avgdata 2560maxresident)k
0inputs+0outputs (6major+267minor)pagefaults 0swaps

bedops

262.48user 10.19system 4:33.34elapsed 99%CPU (0avgtext+0avgdata 10064maxresident)k
0inputs+0outputs (20major+727minor)pagefaults 0swaps

10,000,000 random interval sorting

time gia sort -i random.bed -o /dev/null
time sort-bed random.bed > /dev/null

gia

1.33user 0.10system 0:01.43elapsed 99%CPU (0avgtext+0avgdata 433440maxresident)k
0inputs+0outputs (6major+39485minor)pagefaults 0swaps

bedops

4.85user 0.12system 0:04.98elapsed 99%CPU (0avgtext+0avgdata 323808maxresident)k
0inputs+0outputs (0major+37623minor)pagefaults 0swaps

sjneph (Author) commented Sep 29, 2023

Nice!
I'll try to spend some time over the weekend testing things out too. sort-bed is quite fast, so if you're smoking that, you are doing very well. It was written in a somewhat non-standard way: a fixed number of slots is added during each reallocation rather than doubling the capacity. That was tested extensively (circa 2006), and on the inputs we used then, the non-standard approach outperformed the standard algorithm. Those inputs may have been small by today's standards; I am unsure now.
I'm excited to see your results!
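As an aside for readers, here is a minimal Rust sketch contrasting the two reallocation policies Shane describes: geometric growth (what Vec::push does by default) versus adding a fixed number of slots per reallocation. The increment constant and helper name are illustrative, not sort-bed's actual values.

// Contrast Vec's default geometric growth with a fixed-increment policy.
// The 1_000_000-slot increment is illustrative only.
fn push_fixed_increment<T>(v: &mut Vec<T>, item: T, increment: usize) {
    if v.len() == v.capacity() {
        // grow by a fixed number of slots rather than (roughly) doubling
        v.reserve_exact(increment);
    }
    v.push(item);
}

fn main() {
    let (mut geometric, mut fixed) = (Vec::new(), Vec::new());
    for i in 0..10_000_000u64 {
        geometric.push(i); // amortized O(1) appends, looser memory ceiling
        push_fixed_increment(&mut fixed, i, 1_000_000); // tighter ceiling,
                                                        // more copies overall
    }
    println!(
        "final capacities: geometric={}, fixed={}",
        geometric.capacity(),
        fixed.capacity()
    );
}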

noamteyssier (Owner) replied

Thanks!

I'm actually not doing anything clever with my sorting algorithm. The interval container is really just a built-in vector of generic intervals. The only trick is that my generic intervals all implement ordering (Rust's Ord trait), so I can perform an in-place sort that has already been optimized in the standard library.
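As a minimal sketch of that idea (the field names here are hypothetical, not bedrs's actual types): deriving an ordering on the interval struct is all the standard library needs for an in-place sort.

// Deriving Ord gives a lexicographic (chr, start, end) ordering, which is
// exactly what a coordinate sort needs; sort_unstable then works in place.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Interval {
    chr: u32,
    start: u64,
    end: u64,
}

fn main() {
    let mut set = vec![
        Interval { chr: 2, start: 5, end: 10 },
        Interval { chr: 1, start: 50, end: 60 },
        Interval { chr: 1, start: 5, end: 20 },
    ];
    set.sort_unstable(); // the standard library's optimized in-place sort
    assert_eq!(set[0], Interval { chr: 1, start: 5, end: 20 });
}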

You can see the actual code in bedrs and how minimal the implementation is:
https://github.com/noamteyssier/bedrs/blob/6d9140d15c68ac4eee4d26767779e91182c54a95/src/traits/container/container.rs#L111-L124

The benefit of using the built-in sort is that it can also be run in parallel without rewriting the underlying structure. The benchmarks I showed used a single thread, but it can be even more performant with multiple.
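For example, assuming the rayon crate as the parallelism layer (a sketch; not necessarily how gia wires this up), the parallel version is a one-line swap:

// par_sort_unstable from rayon is a drop-in parallel replacement for the
// standard sort; the container itself is unchanged. Requires rayon = "1".
use rayon::prelude::*;

fn main() {
    // (chrom, start, end) tuples stand in for generic intervals here
    let mut intervals: Vec<(u32, u64, u64)> =
        (0..1_000_000u64).rev().map(|i| (1, i, i + 10)).collect();
    intervals.par_sort_unstable(); // work is split across a thread pool
    assert!(intervals.windows(2).all(|w| w[0] <= w[1]));
}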

sjneph (Author) commented Oct 3, 2023

Yes, it would be great to see what improvements multiple threads bring. I will close this out. Great job so far; I am looking forward to seeing this development continue.

sjneph closed this as completed Oct 3, 2023