Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BTreeSet and BTreeMap append optimization #34666

Open
xosmig opened this issue Jul 5, 2016 · 9 comments
Open

BTreeSet and BTreeMap append optimization #34666

xosmig opened this issue Jul 5, 2016 · 9 comments
Labels
A-collections Area: std::collections. C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@xosmig
Copy link
Contributor

xosmig commented Jul 5, 2016

When the largest key of one tree is less than the smallest key of another tree, there is a way to merge trees using only O(log n) operations.
Maybe logarithmic merge should be a separate method.
I'm ready to work on it.

UPD:
Also sometimes it may be better to push elements of one tree to another one with O(min_length log(max_length)) operations instead of rebuild with O(n + m) operations (like in #32526).

But the problem is that it's hard to determine which approach is better in a particular case.

@apasel422 apasel422 added the A-collections Area: std::collections. label Jul 6, 2016
@xosmig
Copy link
Contributor Author

xosmig commented Jul 7, 2016

CC @gereeter

@alfiedotwtf
Copy link
Contributor

Am I missing something? I don't see append() on BTreeSet

@xosmig
Copy link
Contributor Author

xosmig commented Nov 25, 2016

@alfiedotwtf
Copy link
Contributor

Gah! I don't know how I managed to land on old documentation:

https://doc.rust-lang.org/1.6.0/std/collections/struct.BTreeSet.html

Thanks @xosmig :)

@kirillkh
Copy link

kirillkh commented Dec 5, 2016

I would love to see this fixed! I have recently developed a data structure [1], which is supposed to be faster than BTreeSet for a specific usage pattern, but I can't write a fair benchmark until BTreeSet supports a O(k + log n) delete-range operation.

[1] https://users.rust-lang.org/t/announcing-teardowntree/8259

@Mark-Simulacrum Mark-Simulacrum added the I-slow Issue: Problems and improvements with respect to performance of generated code. label May 16, 2017
@Mark-Simulacrum Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 25, 2017
@Andlon
Copy link

Andlon commented Aug 18, 2020

I think perhaps the current append implementation is a little problematic. Arguably, one would expect a "batch" operation like set.append(other_set) to be generally the better choice when you have to insert one set into the other, because presumably it is able to take into consideration certain optimizations that can not be made when inserting one item at a time. However, the complexity O(n + m) means that if, say, n is large and m is small, the cost of calling append is enormous.

I ran into this problem today, where for a relatively large computational problem, in which I called append roughly 500k times on with n ~ 100k and m ~ 10, my program would not finish for half an hour. After changing to insert in a loop, the program finished in 2 seconds.

There are two problems here, as I see it:

  • append sounds like a straightforward batch operation, so at least I personally wouldn't expect it to have such a severe drawback compared to repeated insertion for unbalanced set sizes.
  • This drawback is not communicated by the docs for append itself.

Of course, ideally append would work well in all cases, but deciding when to pick each strategy is surely not a simple problem...

@memoryruins
Copy link
Contributor

cc @ssomers (due to their many improvements to BTreeMap/BTreeSet in the last couple of years)

@ssomers
Copy link
Contributor

ssomers commented Aug 18, 2020

Rest assured, I was subscribed... It's not hard to imrpove this in many cases, as Andlon proved, but hard to prove it doesn't become worse in some other situations. A lot of benchmarking to do.

@workingjubilee workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Oct 8, 2023
@mdacach
Copy link

mdacach commented Nov 17, 2023

I was bitten by this in the context of purposefully trying to merge a smaller BTreeSet into a larger one1.
Neither the function name nor the documentation helped to diagnose the issue. I think that improving the documentation for append would already help significantly.

Footnotes

  1. "Small to Large" (😅) optimization: https://soi.ch/wiki/smaller-to-larger/

evanlinjin added a commit to vladimirfomene/bdk that referenced this issue Jan 7, 2024
The implementation of `BTreeMap::append` is non-performant making
merging changesets very slow. We use `Extend::extend` instead.

Refer to:
rust-lang/rust#34666 (comment)
evanlinjin added a commit to vladimirfomene/bdk that referenced this issue Jan 7, 2024
The implementation of `BTreeMap::append` is non-performant making
merging changesets very slow. We use `Extend::extend` instead.

Refer to:
rust-lang/rust#34666 (comment)
evanlinjin added a commit to vladimirfomene/bdk that referenced this issue Jan 10, 2024
The implementation of `BTreeMap::append` is non-performant making
merging changesets very slow. We use `Extend::extend` instead.

Refer to:
rust-lang/rust#34666 (comment)
evanlinjin added a commit to vladimirfomene/bdk that referenced this issue Jan 10, 2024
The implementation of `BTreeMap::append` is non-performant making
merging changesets very slow. We use `Extend::extend` instead.

Refer to:
rust-lang/rust#34666 (comment)
evanlinjin added a commit to vladimirfomene/bdk that referenced this issue Jan 10, 2024
The implementation of `BTreeMap::append` is non-performant making
merging changesets very slow. We use `Extend::extend` instead.

Refer to:
rust-lang/rust#34666 (comment)
evanlinjin added a commit to vladimirfomene/bdk that referenced this issue Jan 10, 2024
The implementation of `BTreeMap::append` is non-performant making
merging changesets very slow. We use `Extend::extend` instead.

Refer to:
rust-lang/rust#34666 (comment)
evanlinjin added a commit to vladimirfomene/bdk that referenced this issue Jan 10, 2024
The implementation of `BTreeMap::append` is non-performant making
merging changesets very slow. We use `Extend::extend` instead.

Refer to:
rust-lang/rust#34666 (comment)
evanlinjin added a commit to vladimirfomene/bdk that referenced this issue Jan 11, 2024
The implementation of `BTreeMap::append` is non-performant making
merging changesets very slow. We use `Extend::extend` instead.

Refer to:
rust-lang/rust#34666 (comment)
evanlinjin added a commit to vladimirfomene/bdk that referenced this issue Jan 11, 2024
The implementation of `BTreeMap::append` is non-performant making
merging changesets very slow. We use `Extend::extend` instead.

Refer to:
rust-lang/rust#34666 (comment)
evanlinjin added a commit to vladimirfomene/bdk that referenced this issue Jan 11, 2024
The implementation of `BTreeMap::append` is non-performant making
merging changesets very slow. We use `Extend::extend` instead.

Refer to:
rust-lang/rust#34666 (comment)
evanlinjin added a commit to vladimirfomene/bdk that referenced this issue Jan 11, 2024
The implementation of `BTreeMap::append` is non-performant making
merging changesets very slow. We use `Extend::extend` instead.

Refer to:
rust-lang/rust#34666 (comment)
evanlinjin added a commit to vladimirfomene/bdk that referenced this issue Jan 11, 2024
The implementation of `BTreeMap::append` is non-performant making
merging changesets very slow. We use `Extend::extend` instead.

Refer to:
rust-lang/rust#34666 (comment)
evanlinjin added a commit to evanlinjin/bdk that referenced this issue Jan 15, 2024
The implementation of `BTreeMap::append` is non-performant making
merging changesets very slow. We use `Extend::extend` instead.

Refer to:
rust-lang/rust#34666 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections Area: std::collections. C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

No branches or pull requests

10 participants