
Explore replacing BTreeSet in IndexedMap with a sorted Vec #1992

Closed
wpaulino opened this issue Jan 26, 2023 · 9 comments · Fixed by #2040
Labels
good first issue Good for newcomers

Comments

@wpaulino (Contributor)

In #1799, we replaced the use of BTreeMaps to store the network graph with IndexedMaps, each of which is just a HashMap with an additional BTreeSet that keeps the keys in sorted order. This yielded a ~30% speed improvement in pathfinding across the board (measured on @TheBlueMatt's Xeon W-10885M with an SK hynix Gold P31).

We can likely do better again by replacing the BTreeSet with a sorted Vec.
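For reference, the idea can be sketched roughly like this (illustrative only, not LDK's actual `IndexedMap` code): a `HashMap` for O(1) lookups, plus a `Vec` of keys kept sorted in place of the `BTreeSet` so the map can still be iterated in key order from an arbitrary starting point:

```rust
// Hedged sketch of an IndexedMap-like structure; names are illustrative.
use std::collections::HashMap;
use std::hash::Hash;

struct IndexedMap<K: Hash + Ord + Clone, V> {
    map: HashMap<K, V>,
    keys: Vec<K>, // kept sorted at all times in this (strict) variant
}

impl<K: Hash + Ord + Clone, V> IndexedMap<K, V> {
    fn new() -> Self {
        Self { map: HashMap::new(), keys: Vec::new() }
    }

    fn insert(&mut self, key: K, value: V) -> Option<V> {
        let old = self.map.insert(key.clone(), value);
        if old.is_none() {
            // The key is new, so binary_search returns Err with the
            // sorted insertion point.
            let idx = self.keys.binary_search(&key).unwrap_err();
            self.keys.insert(idx, key);
        }
        old
    }

    fn remove(&mut self, key: &K) -> Option<V> {
        let old = self.map.remove(key);
        if old.is_some() {
            if let Ok(idx) = self.keys.binary_search(key) {
                self.keys.remove(idx);
            }
        }
        old
    }

    fn get(&self, key: &K) -> Option<&V> {
        self.map.get(key)
    }

    // Keys in sorted order starting at `from`, like BTreeSet::range(from..).
    fn keys_from(&self, from: &K) -> &[K] {
        let start = self.keys.partition_point(|k| k < from);
        &self.keys[start..]
    }
}
```

A contiguous `Vec` avoids the per-node allocations and pointer chasing of a B-tree, which is where the fragmentation and memory savings discussed below would come from.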

@TheBlueMatt (Collaborator)

This won't impact the speed of the route-finding, but should reduce the time taken to sync the graph, load the graph from disk, and reduce memory fragmentation/usage.

@alecchendev (Contributor)

I can try and look into this! From what I understand, this would involve changing the keys type in IndexedMap to Vec and reimplementing associated functions to match the current functionality (mainly sorted keys iterable from a starting point)?

I noticed a comment in IndexedMap that suggests swapping in a sorted Vec "that is only sorted on the first range() call", and I was wondering about the rationale behind this. Shouldn't it also maintain sorted order on inserts/removes?

@TheBlueMatt (Collaborator)

Well, keeping it strictly sorted is a lot of work. We only actually need a sort order while peers are fetching the graph from us, so keeping it sorted during deserialization is work we don't need, and deferring the sort may avoid a lot of sorting in general. Open to benchmarks that prove me wrong though :)
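The deferred-sort idea described above could look roughly like this (a minimal sketch with illustrative names, not LDK's API): inserts just push onto the Vec and mark it dirty, and the single O(n log n) sort happens on the first ordered read rather than paying an O(n) `Vec::insert` shift on every key during deserialization:

```rust
// Hedged sketch of a key list that is "only sorted on the first range() call".
struct LazySortedKeys<K: Ord> {
    keys: Vec<K>,
    sorted: bool,
}

impl<K: Ord> LazySortedKeys<K> {
    fn new() -> Self {
        Self { keys: Vec::new(), sorted: true }
    }

    // O(1) amortized: no per-insert sorting cost while e.g. reading the
    // graph from disk.
    fn insert(&mut self, key: K) {
        self.keys.push(key);
        self.sorted = false;
    }

    // Sort once, on demand; later calls are cheap until the next insert.
    fn range_from(&mut self, from: &K) -> &[K] {
        if !self.sorted {
            self.keys.sort_unstable();
            self.sorted = true;
        }
        let start = self.keys.partition_point(|k| k < from);
        &self.keys[start..]
    }
}
```

This would match the benchmark result reported below, where the strictly-sorted variant regressed bench_reading_full_graph_from_file while the lazily-sorted one did not.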

@alecchendev (Contributor)

Gotcha. Will play around with it 👍

@alecchendev (Contributor)

Hmm, from running the benchmarks locally, it seems there is not much of a difference in performance. You were right about strict sorting though - there was a decent regression in bench_reading_full_graph_from_file.

Main branch to compare with:

test chain::keysinterface::benches::bench_get_secure_random_bytes            ... bench:      34,403 ns/iter (+/- 7,553)
test ln::channelmanager::bench::bench_sends                                  ... bench:   5,109,735 ns/iter (+/- 263,913)
test routing::gossip::benches::read_network_graph                            ... bench: 232,054,708 ns/iter (+/- 11,851,338)
test routing::gossip::benches::write_network_graph                           ... bench:  44,764,954 ns/iter (+/- 2,124,544)
test routing::router::benches::generate_mpp_routes_with_probabilistic_scorer ... bench: 109,644,666 ns/iter (+/- 138,897,006)
test routing::router::benches::generate_mpp_routes_with_zero_penalty_scorer  ... bench:  85,266,474 ns/iter (+/- 59,511,759)
test routing::router::benches::generate_routes_with_probabilistic_scorer     ... bench:  35,258,812 ns/iter (+/- 7,460,698)
test routing::router::benches::generate_routes_with_zero_penalty_scorer      ... bench:  28,210,749 ns/iter (+/- 10,795,503)
test bench::bench_sends ... bench: 116,314,629 ns/iter (+/- 26,645,360)
test bench::bench_reading_full_graph_from_file ... bench: 990,017,525 ns/iter (+/- 34,608,096)

Replaced BTreeSet with Vec (sorted in IndexedMap::range) - see branch here:

test chain::keysinterface::benches::bench_get_secure_random_bytes            ... bench:      35,178 ns/iter (+/- 9,084)
test ln::channelmanager::bench::bench_sends                                  ... bench:   5,109,420 ns/iter (+/- 229,777)
test routing::gossip::benches::read_network_graph                            ... bench: 229,458,308 ns/iter (+/- 5,219,331)
test routing::gossip::benches::write_network_graph                           ... bench:  44,411,670 ns/iter (+/- 1,917,057)
test routing::router::benches::generate_mpp_routes_with_probabilistic_scorer ... bench: 107,560,166 ns/iter (+/- 24,541,342)
test routing::router::benches::generate_mpp_routes_with_zero_penalty_scorer  ... bench:  78,814,529 ns/iter (+/- 48,760,292)
test routing::router::benches::generate_routes_with_probabilistic_scorer     ... bench:  35,475,999 ns/iter (+/- 7,580,942)
test routing::router::benches::generate_routes_with_zero_penalty_scorer      ... bench:  24,938,262 ns/iter (+/- 12,819,234)
test bench::bench_sends ... bench: 162,190,758 ns/iter (+/- 49,382,600)
test bench::bench_reading_full_graph_from_file ... bench: 990,874,520 ns/iter (+/- 13,640,631)

Replaced BTreeSet with Vec (strictly sorted on inserts and removes) - see branch here:

test chain::keysinterface::benches::bench_get_secure_random_bytes            ... bench:      36,363 ns/iter (+/- 5,489)
test ln::channelmanager::bench::bench_sends                                  ... bench:   5,086,437 ns/iter (+/- 342,800)
test routing::gossip::benches::read_network_graph                            ... bench: 228,372,816 ns/iter (+/- 3,072,140)
test routing::gossip::benches::write_network_graph                           ... bench:  44,184,379 ns/iter (+/- 1,073,799)
test routing::router::benches::generate_mpp_routes_with_probabilistic_scorer ... bench: 106,936,799 ns/iter (+/- 13,438,643)
test routing::router::benches::generate_mpp_routes_with_zero_penalty_scorer  ... bench:  96,810,333 ns/iter (+/- 79,449,155)
test routing::router::benches::generate_routes_with_probabilistic_scorer     ... bench:  36,240,374 ns/iter (+/- 13,989,811)
test routing::router::benches::generate_routes_with_zero_penalty_scorer      ... bench:  27,308,558 ns/iter (+/- 16,067,959)
test bench::bench_sends ... bench: 118,100,891 ns/iter (+/- 14,743,729)
test bench::bench_reading_full_graph_from_file ... bench: 1,132,730,791 ns/iter (+/- 599,440,746)

Would it still be worth opening a PR for this for reduced memory fragmentation/usage? Also is there a recommended way to see/test that?

@TheBlueMatt (Collaborator)

I wouldn't expect the change to make any of our existing benchmarks (materially) faster, indeed (though it looks like both were very slightly faster on read_network_graph for you). Mostly just want to make sure we don't see a performance regression.

I'm not sure exactly the best way to test the memory overhead here, but you may just be able to run the read_network_graph bench once (i.e. not as a bench, but as a test that runs once), then add a 60-second sleep and see what the process memory usage is (make sure to compile in release mode and run only that test).
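One way to sketch that one-shot test (hypothetical code; a large allocation stands in for the actual `NetworkGraph::read` call, which is elided here):

```rust
// Hedged sketch of the one-shot memory check suggested above: do the
// expensive read exactly once, then sleep so `top` or Activity Monitor can
// sample the process's resident memory. Compile with --release.
use std::{thread, time::Duration};

// Stand-in for loading the serialized network graph from disk.
fn load_graph_stand_in() -> Vec<u64> {
    (0..1_000_000u64).collect()
}

fn hold_and_sleep(sleep: Duration) -> usize {
    let graph = load_graph_stand_in();
    println!(
        "graph loaded ({} entries), pid {}: sample memory usage now",
        graph.len(),
        std::process::id()
    );
    // In the real test this would be ~60 seconds, not a short pause.
    thread::sleep(sleep);
    graph.len()
}
```

Keeping `graph` alive across the sleep matters: if it were dropped before sleeping, the allocation being measured would already be freed.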

@alecchendev (Contributor)

Cool, I'll give that a try

@alecchendev (Contributor)

So with a brief test that runs the read_network_graph bench once with a sleep at the end, comparing the main branch against my two branches, swapping in the Vec lowered memory usage from ~94.5MB to ~93.8MB (similarly for both implementations, which I think makes sense). I measured memory usage by running the executables several times and checking top and Activity Monitor on macOS.

I was struggling to interpret this change, so I also ran the test without the line that actually runs NetworkGraph::read; its memory usage was about 92MB. Subtracting that baseline overhead, the memory usage of solely reading the network graph decreases from ~2.5MB to ~1.8MB, which seems pretty solid. Just for fun, I also looked at the memory usage of ldk-sample: with one channel and a couple of payments, its usage was about 5MB.

This seems to me to be a reasonable improvement for a relatively small change - I can open a PR if others agree :)

@TheBlueMatt (Collaborator)

Sounds worth it to me! Thanks.
