Cache function attributes #152

Merged — philipc merged 3 commits into gimli-rs:master from philipc:function-cache on Dec 12, 2019
Conversation

@philipc (Contributor) commented Dec 6, 2019

I'm not sure about this change, and I'm inclined to not merge it for now. While it does show benefits for the backtrace-rs benchmarks, it doesn't for the addr2line benchmarks, and it significantly slows down the parsing and increases the memory usage.

addr2line benchmarks:
 name                                        before ns/iter  after ns/iter  diff ns/iter   diff %  speedup 
 context_new_and_query_location_rc           2,432,182       2,512,901            80,719    3.32%   x 0.97 
 context_new_and_query_location_slice        758,006         759,694               1,688    0.22%   x 1.00 
 context_new_and_query_with_functions_rc     2,959,278       3,156,481           197,203    6.66%   x 0.94 
 context_new_and_query_with_functions_slice  1,152,324       1,254,633           102,309    8.88%   x 0.92 
 context_new_parse_functions_rc              33,387,535      47,752,251       14,364,716   43.02%   x 0.70 
 context_new_parse_functions_slice           26,858,657      37,181,259       10,322,602   38.43%   x 0.72 
 context_new_parse_lines_rc                  11,711,061      11,454,843         -256,218   -2.19%   x 1.02 
 context_new_parse_lines_slice               7,283,802       7,022,589          -261,213   -3.59%   x 1.04 
 context_new_rc                              2,162,930       2,315,698           152,768    7.06%   x 0.93 
 context_new_slice                           603,892         611,642               7,750    1.28%   x 0.99 
 context_query_location_rc                   733,083         742,450               9,367    1.28%   x 0.99 
 context_query_location_slice                729,294         731,203               1,909    0.26%   x 1.00 
 context_query_with_functions_rc             1,952,886       1,882,215           -70,671   -3.62%   x 1.04 
 context_query_with_functions_slice          1,744,765       1,749,215             4,450    0.26%   x 1.00 

backtrace-rs benchmarks:
 name                                        before ns/iter  after ns/iter  diff ns/iter   diff %  speedup 
 new                                         27,013          22,207               -4,806  -17.79%   x 1.22 
 new_unresolved_and_resolve_separate         26,986          22,746               -4,240  -15.71%   x 1.19 
 trace_and_resolve_callback                  11,471          8,521                -2,950  -25.72%   x 1.35 
 trace_and_resolve_separate                  9,801           7,245                -2,556  -26.08%   x 1.35 

Memory usage for the test parse_functions_slice increased from 19.7MB to 24.0MB.

In order to lower the increase in memory usage, this PR also changes lines and columns to u32, which I think is fine because that's a reasonable limit and it's all backtrace-rs uses anyway. This change is why the lines benchmarks improved above, and it's probably worth merging this part at least.
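For a sense of what the u32 change buys, here is a minimal standalone sketch with a hypothetical type (not the actual addr2line structs): on a 64-bit target, each optional line/column field drops from 16 bytes to 8.

```rust
// Hypothetical type, not the actual addr2line struct: line and column stored
// as u32 rather than u64.
#[allow(dead_code)]
struct SourcePosition {
    line: Option<u32>,
    column: Option<u32>,
}

fn main() {
    assert_eq!(std::mem::size_of::<Option<u32>>(), 8);
    assert_eq!(std::mem::size_of::<Option<u64>>(), 16);
    println!(
        "SourcePosition is {} bytes",
        std::mem::size_of::<SourcePosition>()
    );
}
```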

cc @alexcrichton

@philipc requested a review from @fitzgen on December 6, 2019 07:27
@coveralls commented Dec 6, 2019
Coverage increased (+4.7%) to 81.688% when pulling b6d719b on philipc:function-cache into 2674396 on gimli-rs:master.

@alexcrichton (Contributor) commented:

Thanks for this!

I suspect that the difference between the benchmarks here and in the backtrace crate may be due to usage? With caching it definitely makes sense that parsing is a bit slower (since it has to build a cache), but the backtrace crate's use case is that it's repeatedly using the same pre-parsed information which would easily benefit from faster caches.

In that sense perhaps landing this depends on the envisioned use cases for addr2line? If it's expected to just be "read once and move on" then this wouldn't be a great change to land, but if it's intended to be more backtrace-crate-like where you parse once and can read many times, then this may make more sense?

If there are various config options, I don't mind tweaking the integration in the backtrace crate as well!

@fitzgen (Member) commented Dec 6, 2019

(Have not looked at the code yet)

While it does show benefits for the backtrace-rs benchmarks, it doesn't for the addr2line benchmarks,

I've been thinking in the back of my mind that our addr2line benchmarks are measuring the wrong thing, or at least not the common use case that backtrace-rs has.

The addr2line benchmarks, last I looked at them in detail, were symbolicating every address in a bunch of functions once, in order, and never re-symbolicating any of those addresses again.

That is something a binary analysis tool might do, but the other end of the spectrum is a sampling profiler, where the young frames change at a fairly frequent rate but the older frames are most often the same. So in that scenario we're repeatedly symbolicating the same functions, and caching should be a huge win.

I think the way that backtrace-rs is commonly used falls somewhere in between those two ends of the spectrum, but probably most often is used to capture a single backtrace for an error, and then that's it. This use case probably is best served by prioritizing a lower memory footprint over symbolication-of-addresses-we've-seen-before speed.

Although I also haven't looked at the benchmarks in backtrace-rs, and have no idea if they are representative of its common uses.

One more thought: if we need a fast, LRU associative cache, I happen to have one right here: https://github.com/fitzgen/associative-cache

@fitzgen (Member) left a review comment:

I wonder if, rather than having every file/line/column be cache-able in an Option, we had a fixed-size LRU cache in front of the symbolication (either in this crate, as a new struct/method, or in consumer crates). This would let us balance the memory overhead to whatever is appropriate for the given use case, and I think it should be faster as well, since the do-we-have-it-cached-or-not check only happens once at the outer boundary, rather than in the middle of the symbolication loop. But also maybe I'm not reading closely enough to what is happening here (and I wouldn't say I'm super up to date with this code anymore), so take what I am saying with a pinch of salt...
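One way to picture that suggestion (a minimal sketch under assumed types, not the actual addr2line API, and direct-mapped rather than a true LRU) is a small fixed-size cache keyed by address that sits in front of the slow symbolication path:

```rust
// `Symbolized` and `resolve_slow` are stand-ins, not addr2line types.
#[derive(Clone)]
struct Symbolized {
    function: String,
    file: String,
    line: u32,
}

struct AddrCache {
    slots: Vec<Option<(u64, Symbolized)>>,
}

impl AddrCache {
    fn new(capacity: usize) -> Self {
        AddrCache { slots: vec![None; capacity] }
    }

    fn lookup<F>(&mut self, addr: u64, resolve_slow: F) -> Symbolized
    where
        F: FnOnce(u64) -> Symbolized,
    {
        // The cache check happens once, at the outer boundary, before any
        // DWARF work is done for this address.
        let idx = (addr as usize) % self.slots.len();
        if let Some((cached_addr, result)) = &self.slots[idx] {
            if *cached_addr == addr {
                return result.clone();
            }
        }
        let result = resolve_slow(addr);
        self.slots[idx] = Some((addr, result.clone()));
        result
    }
}

fn main() {
    let mut cache = AddrCache::new(256);
    // The closure stands in for a full symbolication query.
    let loc = cache.lookup(0x1234, |addr| Symbolized {
        function: format!("fn_at_{:x}", addr),
        file: "example.rs".to_string(),
        line: 42,
    });
    println!("{} at {}:{}", loc.function, loc.file, loc.line);
}
```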

@philipc (Contributor, Author) commented Dec 6, 2019

The addr2line benchmarks, last I looked at them in detail, were symbolicating every address in a bunch of functions once, in order, and never re-symbolicating any of those addresses again.

I've tried to isolate various parts of the code in different benchmarks. So for the benchmarks most relevant to this PR:

  • context_new_parse_lines_slice is parsing and caching only (of everything in .debug_line)
  • context_new_parse_functions_slice is parsing and caching only (of everything in .debug_info)
  • context_query_with_functions_slice is querying a number of addresses in order, but the parsing and caching for those addresses is already done outside the benchmark loop
  • context_new_and_query_with_functions_slice is parsing and querying 100 different addresses. This is meant to be a 'typical' use case that balances the costs of parsing and querying, but I'm not sure of the value of this benchmark.

The benchmarks in backtrace-rs are making a single trace (= a small number of addresses) while reusing the global cache, so this is effectively benchmarking querying only, with parsing being mostly ignored since it will only occur for the first iteration of the benchmark loop. I'm actually surprised that backtrace-rs saw an improvement but context_query_with_functions_slice did not, so I probably should look into that more.

I wonder if, rather than having every file/line/column be cache-able in an Option, we had a fixed-size LRU cache in front of the symbolication (either in this crate, as a new struct/method, or in consumer crates).

Maybe I misunderstand, but we don't really cache file/line/column in Options. Prior to this PR we never cache them, and after this PR the caching is at the compilation unit level. The Option being used for file/line/column only signifies the existence of these in the DWARF, but I changed that in this PR to reduce the memory size, since DWARF never uses 0. Maybe Option<NonZeroU32> would be better.
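For what it's worth, the layout fact behind that last idea is easy to check in isolation (a standalone sketch, not code from this PR): because DWARF never uses 0, `None` can occupy the all-zero bit pattern and the `Option` costs no extra space.

```rust
// Niche optimization: Option<NonZeroU32> is the same size as a plain u32,
// whereas Option<u32> needs a separate discriminant.
use std::mem::size_of;
use std::num::NonZeroU32;

fn main() {
    assert_eq!(size_of::<Option<NonZeroU32>>(), 4); // None is stored as 0
    assert_eq!(size_of::<Option<u32>>(), 8);        // discriminant + value
    println!("Option<NonZeroU32> fits in a plain u32");
}
```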

backtrace-rs already does have a simple LRU at the file level (MAPPINGS_CACHE). Do you mean something different from that?

As for intended usage of addr2line... I'm not using it anywhere, so I don't know :) Looking at crates.io, there is backtrace-rs and py-spy (a sampling profiler). I also know of not-perf (a sampling profiler).

@alexcrichton (Contributor) commented:

FWIW I would definitely take the backtrace benchmarks with a grain of salt; they're basically the simplest ones you could imagine, "just call Backtrace::new" in a loop, and don't really reflect the reality of symbolicating actually different stack traces in a program. It's a stress test which basically tries to "prove" you need a cache of some kind, but I just want to be careful to remember it's a microbenchmark and isn't the sole source of truth!
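For reference, the shape of microbenchmark being described looks roughly like the following (a sketch, not the actual backtrace-rs benchmark): resolving essentially the same backtrace over and over, so every iteration after the first hits whatever cache sits underneath.

```rust
// Repeatedly capture and resolve a backtrace from the same call site.
// Uses the `backtrace` crate; the iteration count is arbitrary.
use backtrace::Backtrace;

fn main() {
    let mut total_frames = 0;
    for _ in 0..1_000 {
        let bt = Backtrace::new();
        total_frames += bt.frames().len();
    }
    println!("resolved {} frames in total", total_frames);
}
```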

@philipc (Contributor, Author) commented Dec 11, 2019

I've figured out the problem with the addr2line benchmarks and updated this PR with the fix. The new benchmarks look better, still slower for initial parsing as expected, but much better for repeated queries:

 name                                        before ns/iter  after ns/iter  diff ns/iter   diff %  speedup 
 context_new_and_query_location_rc           2,483,019       2,446,546           -36,473   -1.47%   x 1.01 
 context_new_and_query_location_slice        764,594         747,887             -16,707   -2.19%   x 1.02 
 context_new_and_query_with_functions_rc     3,059,209       3,173,993           114,784    3.75%   x 0.96 
 context_new_and_query_with_functions_slice  1,159,546       1,246,534            86,988    7.50%   x 0.93 
 context_new_parse_functions_rc              33,674,783      48,453,053       14,778,270   43.89%   x 0.69 
 context_new_parse_functions_slice           26,536,580      37,605,990       11,069,410   41.71%   x 0.71 
 context_new_parse_lines_rc                  11,684,091      11,436,787         -247,304   -2.12%   x 1.02 
 context_new_parse_lines_slice               7,370,499       7,032,757          -337,742   -4.58%   x 1.05 
 context_new_rc                              2,213,470       2,323,758           110,288    4.98%   x 0.95 
 context_new_slice                           626,014         613,217             -12,797   -2.04%   x 1.02 
 context_query_location_rc                   732,357         722,450              -9,907   -1.35%   x 1.01 
 context_query_location_slice                720,975         734,235              13,260    1.84%   x 0.98 
 context_query_with_functions_rc             10,177,916      3,076,697        -7,101,219  -69.77%   x 3.31 
 context_query_with_functions_slice          7,793,662       2,590,330        -5,203,332  -66.76%   x 3.01 

@fitzgen I'm happier to merge this now, if you want to review?

@fitzgen (Member) left a review comment:
LGTM!

@philipc merged commit dca4b90 into gimli-rs:master on Dec 12, 2019
@philipc deleted the function-cache branch on December 12, 2019 at 01:57