Cache function attributes #152

Merged — philipc merged 3 commits into gimli-rs:master from philipc:function-cache on Dec 12, 2019
Conversation

@philipc (Contributor) commented Dec 6, 2019

I'm not sure about this change, and I'm inclined to not merge it for now. While it does show benefits for the backtrace-rs benchmarks, it doesn't for the addr2line benchmarks, and it significantly slows down the parsing and increases the memory usage.

addr2line benchmarks:
 name                                        before ns/iter  after ns/iter  diff ns/iter   diff %  speedup 
 context_new_and_query_location_rc           2,432,182       2,512,901            80,719    3.32%   x 0.97 
 context_new_and_query_location_slice        758,006         759,694               1,688    0.22%   x 1.00 
 context_new_and_query_with_functions_rc     2,959,278       3,156,481           197,203    6.66%   x 0.94 
 context_new_and_query_with_functions_slice  1,152,324       1,254,633           102,309    8.88%   x 0.92 
 context_new_parse_functions_rc              33,387,535      47,752,251       14,364,716   43.02%   x 0.70 
 context_new_parse_functions_slice           26,858,657      37,181,259       10,322,602   38.43%   x 0.72 
 context_new_parse_lines_rc                  11,711,061      11,454,843         -256,218   -2.19%   x 1.02 
 context_new_parse_lines_slice               7,283,802       7,022,589          -261,213   -3.59%   x 1.04 
 context_new_rc                              2,162,930       2,315,698           152,768    7.06%   x 0.93 
 context_new_slice                           603,892         611,642               7,750    1.28%   x 0.99 
 context_query_location_rc                   733,083         742,450               9,367    1.28%   x 0.99 
 context_query_location_slice                729,294         731,203               1,909    0.26%   x 1.00 
 context_query_with_functions_rc             1,952,886       1,882,215           -70,671   -3.62%   x 1.04 
 context_query_with_functions_slice          1,744,765       1,749,215             4,450    0.26%   x 1.00 

backtrace-rs benchmarks:
 name                                        before ns/iter  after ns/iter  diff ns/iter   diff %  speedup 
 new                                         27,013          22,207               -4,806  -17.79%   x 1.22 
 new_unresolved_and_resolve_separate         26,986          22,746               -4,240  -15.71%   x 1.19 
 trace_and_resolve_callback                  11,471          8,521                -2,950  -25.72%   x 1.35 
 trace_and_resolve_separate                  9,801           7,245                -2,556  -26.08%   x 1.35 

Memory usage for the test parse_functions_slice increased from 19.7MB to 24.0MB.

In order to lower the increase in memory usage, this PR also changes lines and columns to u32, which I think is fine because that's a reasonable limit and it's all backtrace-rs uses anyway. This change is why the lines benchmarks improved above, and it's probably worth merging this part at least.
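For a sense of what the u32 change buys, here is a minimal standalone sketch with a hypothetical type (not the actual addr2line structs): on a 64-bit target, each optional line/column field drops from 16 bytes to 8.

```rust
// Hypothetical type, not the actual addr2line struct: line and column stored
// as u32 rather than u64.
#[allow(dead_code)]
struct SourcePosition {
    line: Option<u32>,
    column: Option<u32>,
}

fn main() {
    assert_eq!(std::mem::size_of::<Option<u32>>(), 8);
    assert_eq!(std::mem::size_of::<Option<u64>>(), 16);
    println!(
        "SourcePosition is {} bytes",
        std::mem::size_of::<SourcePosition>()
    );
}
```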

cc @alexcrichton

@philipc requested a review from @fitzgen on December 6, 2019 07:27
@coveralls commented Dec 6, 2019
Coverage increased (+4.7%) to 81.688% when pulling b6d719b on philipc:function-cache into 2674396 on gimli-rs:master.

@alexcrichton (Contributor) commented:

Thanks for this!

I suspect that the difference between the benchmarks here and in the backtrace crate may be due to usage? With caching it definitely makes sense that parsing is a bit slower (since it has to build a cache), but the backtrace crate's use case is that it's repeatedly using the same pre-parsed information which would easily benefit from faster caches.

In that sense perhaps landing this depends on the envisioned use cases for addr2line? If it's expected to just be "read once and move on" then this wouldn't be a great change to land, but if it's intended to be more backtrace-crate-like where you parse once and can read many times, then this may make more sense?

If there are various config options, I don't mind tweaking the integration in the backtrace crate as well!

@fitzgen (Member) commented Dec 6, 2019

(Have not looked at the code yet)

While it does show benefits for the backtrace-rs benchmarks, it doesn't for the addr2line benchmarks,

I've been thinking in the back of my mind that our addr2line benchmarks are measuring the wrong thing, or at least not the common use case that backtrace-rs has.

The addr2line benchmarks, last I looked at them in detail, were symbolicating every address in a bunch of functions once, in order, and never re-symbolicating any of those addresses again.

That is something a binary analysis tool might do, but the other end of the spectrum is a sampling profiler, where the young frames change at a fairly frequent rate but the older frames are most often the same. So in that scenario we're repeatedly symbolicating the same functions, and caching should be a huge win.

I think the way that backtrace-rs is commonly used falls somewhere in between those two ends of the spectrum, but probably most often is used to capture a single backtrace for an error, and then that's it. This use case probably is best served by prioritizing a lower memory footprint over symbolication-of-addresses-we've-seen-before speed.

Although I also haven't looked at the benchmarks in backtrace-rs, and have no idea if they are representative of its common uses.

One more thought: if we need a fast, LRU associative cache, I happen to have one right here: https://github.com/fitzgen/associative-cache

@fitzgen (Member) left a review comment:

I wonder if, rather than having every file/line/column be cache-able in an Option, we had a fixed-size LRU cache in front of the symbolication (either in this crate, as a new struct/method, or in consumer crates). This would let us balance the memory overhead to whatever is appropriate for the given use case, and I think it should be faster as well, since the do-we-have-it-cached-or-not check only happens once at the outer boundary, rather than in the middle of the symbolication loop. But also maybe I'm not reading closely enough to what is happening here (and I wouldn't say I'm super up to date with this code anymore), so take what I am saying with a pinch of salt...
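One way to picture that suggestion (a minimal sketch under assumed types, not the actual addr2line API, and direct-mapped rather than a true LRU) is a small fixed-size cache keyed by address that sits in front of the slow symbolication path:

```rust
// `Symbolized` and `resolve_slow` are stand-ins, not addr2line types.
#[derive(Clone)]
struct Symbolized {
    function: String,
    file: String,
    line: u32,
}

struct AddrCache {
    slots: Vec<Option<(u64, Symbolized)>>,
}

impl AddrCache {
    fn new(capacity: usize) -> Self {
        AddrCache { slots: vec![None; capacity] }
    }

    fn lookup<F>(&mut self, addr: u64, resolve_slow: F) -> Symbolized
    where
        F: FnOnce(u64) -> Symbolized,
    {
        // The cache check happens once, at the outer boundary, before any
        // DWARF work is done for this address.
        let idx = (addr as usize) % self.slots.len();
        if let Some((cached_addr, result)) = &self.slots[idx] {
            if *cached_addr == addr {
                return result.clone();
            }
        }
        let result = resolve_slow(addr);
        self.slots[idx] = Some((addr, result.clone()));
        result
    }
}

fn main() {
    let mut cache = AddrCache::new(256);
    // The closure stands in for a full symbolication query.
    let loc = cache.lookup(0x1234, |addr| Symbolized {
        function: format!("fn_at_{:x}", addr),
        file: "example.rs".to_string(),
        line: 42,
    });
    println!("{} at {}:{}", loc.function, loc.file, loc.line);
}
```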

@philipc (Contributor, Author) commented Dec 6, 2019

The addr2line benchmarks, last I looked at them in detail, were symbolicating every address in a bunch of functions once, in order, and never re-symbolicating any of those addresses again.

I've tried to isolate various parts of the code in different benchmarks. So for the benchmarks most relevant to this PR:

  • context_new_parse_lines_slice is parsing and caching only (of everything in .debug_line)
  • context_new_parse_functions_slice is parsing and caching only (of everything in .debug_info)
  • context_query_with_functions_slice is querying a number of addresses in order, but the parsing and caching for those addresses is already done outside the benchmark loop
  • context_new_and_query_with_functions_slice is parsing and querying 100 different addresses. This is meant to be a 'typical' use case that balances the costs of parsing and querying, but I'm not sure of the value of this benchmark.

The benchmarks in backtrace-rs are making a single trace (= a small number of addresses) while reusing the global cache, so this is effectively benchmarking querying only, with parsing being mostly ignored since it will only occur for the first iteration of the benchmark loop. I'm actually surprised that backtrace-rs saw an improvement but context_query_with_functions_slice did not, so I probably should look into that more.

I wonder if, rather than having every file/line/column be cache-able in an Option, we had a fixed-size LRU cache in front of the symbolication (either in this crate, as a new struct/method, or in consumer crates).

Maybe I misunderstand, but we don't really cache file/line/column in Options. Prior to this PR we never cache them, and after this PR the caching is at the compilation unit level. The Option being used for file/line/column only signifies the existence of these in the DWARF, but I changed that in this PR to reduce the memory size, since DWARF never uses 0. Maybe Option<NonZeroU32> would be better.
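For what it's worth, the layout fact behind that last idea is easy to check in isolation (a standalone sketch, not code from this PR): because DWARF never uses 0, `None` can occupy the all-zero bit pattern and the `Option` costs no extra space.

```rust
// Niche optimization: Option<NonZeroU32> is the same size as a plain u32,
// whereas Option<u32> needs a separate discriminant.
use std::mem::size_of;
use std::num::NonZeroU32;

fn main() {
    assert_eq!(size_of::<Option<NonZeroU32>>(), 4); // None is stored as 0
    assert_eq!(size_of::<Option<u32>>(), 8);        // discriminant + value
    println!("Option<NonZeroU32> fits in a plain u32");
}
```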

backtrace-rs already does have a simple LRU at the file level (MAPPINGS_CACHE). Do you mean something different from that?

As for intended usage of addr2line... I'm not using it anywhere, so I don't know :) Looking at crates.io, there is backtrace-rs and py-spy (a sampling profiler). I also know of not-perf (a sampling profiler).

@alexcrichton (Contributor) commented:

FWIW I would definitely take the backtrace benchmarks with a grain of salt; they're basically the simplest ones you could imagine, "just call Backtrace::new" in a loop, and don't really reflect the reality of symbolicating actually different stack traces in a program. It's a stress test which basically tries to "prove" you need a cache of some kind, but I just want to be careful to remember it's a microbenchmark and isn't the sole source of truth!
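For reference, the shape of microbenchmark being described looks roughly like the following (a sketch, not the actual backtrace-rs benchmark): resolving essentially the same backtrace over and over, so every iteration after the first hits whatever cache sits underneath.

```rust
// Repeatedly capture and resolve a backtrace from the same call site.
// Uses the `backtrace` crate; the iteration count is arbitrary.
use backtrace::Backtrace;

fn main() {
    let mut total_frames = 0;
    for _ in 0..1_000 {
        let bt = Backtrace::new();
        total_frames += bt.frames().len();
    }
    println!("resolved {} frames in total", total_frames);
}
```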

@philipc (Contributor, Author) commented Dec 11, 2019

I've figured out the problem with the addr2line benchmarks and updated this PR with the fix. The new benchmarks look better, still slower for initial parsing as expected, but much better for repeated queries:

 name                                        before ns/iter  after ns/iter  diff ns/iter   diff %  speedup 
 context_new_and_query_location_rc           2,483,019       2,446,546           -36,473   -1.47%   x 1.01 
 context_new_and_query_location_slice        764,594         747,887             -16,707   -2.19%   x 1.02 
 context_new_and_query_with_functions_rc     3,059,209       3,173,993           114,784    3.75%   x 0.96 
 context_new_and_query_with_functions_slice  1,159,546       1,246,534            86,988    7.50%   x 0.93 
 context_new_parse_functions_rc              33,674,783      48,453,053       14,778,270   43.89%   x 0.69 
 context_new_parse_functions_slice           26,536,580      37,605,990       11,069,410   41.71%   x 0.71 
 context_new_parse_lines_rc                  11,684,091      11,436,787         -247,304   -2.12%   x 1.02 
 context_new_parse_lines_slice               7,370,499       7,032,757          -337,742   -4.58%   x 1.05 
 context_new_rc                              2,213,470       2,323,758           110,288    4.98%   x 0.95 
 context_new_slice                           626,014         613,217             -12,797   -2.04%   x 1.02 
 context_query_location_rc                   732,357         722,450              -9,907   -1.35%   x 1.01 
 context_query_location_slice                720,975         734,235              13,260    1.84%   x 0.98 
 context_query_with_functions_rc             10,177,916      3,076,697        -7,101,219  -69.77%   x 3.31 
 context_query_with_functions_slice          7,793,662       2,590,330        -5,203,332  -66.76%   x 3.01 

@fitzgen I'm happier to merge this now, if you want to review?

@fitzgen (Member) left a review comment:
LGTM!

@philipc merged commit dca4b90 into gimli-rs:master on Dec 12, 2019
@philipc deleted the function-cache branch on December 12, 2019 at 01:57