Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add find_location_range #196

Merged
merged 1 commit into from
Nov 17, 2020
Merged

Add find_location_range #196

merged 1 commit into from
Nov 17, 2020

Conversation

jsgf
Copy link
Contributor

@jsgf jsgf commented Nov 13, 2020

I have a use-case where I want to get the source locations for a range of addresses. The simple approach is to just call Context::find_location on each address in the range - this works, but it is extremely slow since:

  1. it repeats all the setup cost for each byte
  2. the caller has no idea how much space each Location spans, so ends up redundantly probing the same locations over and over

This PR implements find_location_range which returns an iterator of all the locations in the range. It is much more efficient because it amortizes all the setup cost and directly returns the locations in the range. In addition to the Location, it also returns the address and length of the byte range the location describes.

It also implements find_location in terms of returning a single Location from a 1 byte range.

@jsgf jsgf force-pushed the master branch 2 times, most recently from 7549f55 to d2678d3 Compare November 13, 2020 10:04
@jsgf
Copy link
Contributor Author

jsgf commented Nov 14, 2020

Coverage check failure looks unrelated to this change.

Copy link
Contributor

@philipc philipc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Just letting you know that I'll probably bit a slow to properly review this, since it's a complex change. It'd be good if we could avoid this iterator complexity somehow, but I'm not sure if that will be possible. Also, it's a significant loss in the context_query_location_slice benchmark:

 context_query_location_rc                   892,787         1,459,657           566,870  63.49%   x 0.61 
 context_query_location_slice                898,126         1,452,600           554,474  61.74%   x 0.62 
 context_query_with_functions_rc             3,093,417       3,340,108           246,691   7.97%   x 0.93 
 context_query_with_functions_slice          2,763,489       2,912,667           149,178   5.40%   x 0.95 

src/lib.rs Outdated
})
.filter_map(move |i| {
// If this CU doesn't actually contain this address, move to the
// next CU.
if probe > i.range.end {
if probe_low > i.range.end || probe_high <= i.range.begin {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be probe_low >= i.range.end?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was replicating the original logic - does that mean the original should be probe >= i.range.end? (I suspect so.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I think that's correct, but I got probe_low <= i.max_end wrong a couple of lines above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A reason I commented is because this check and the one a couple of lines above need to be consistent, unless my understanding is wrong. They are consistent on master, but this PR made them inconsistent, and 6001cec changes them both again so they are still inconsistent. I think probe_low >= i.range.end is correct here, because i.range.end is exclusive (and i.max_end is exclusive too), meaning this was a bug on master. So you need to change the one a couple of lines above again to be probe_low < i.max_end.

@jsgf
Copy link
Contributor Author

jsgf commented Nov 15, 2020

@philipc:

It'd be good if we could avoid this iterator complexity somehow, but I'm not sure if that will be possible

In my initial implementation I just made it return impl Iterator<...> and use combinators - but I couldn't work out how to handle errors properly without eating them (albeit unlikely). So I made it return a concrete iterator type so that it could also implement FallibleIterator so errors were there for the use-cases where it matters (it doesn't particularly for mine).

But I wasn't sure what the conventions were for this library, since none of the other iterator-like return values seem to do this.

Iteration is important for me, since quite often I'll not consume the whole range, so there's a benefit to generating the values lazily.

Also, it's a significant loss in the context_query_location_slice benchmark:

I was wondering about that. The new code is definitely does more for the single lookup case. I was mostly focused on removing any code duplication, but I guess it could be factored in a way that allows the single lookup to have a fastpath.

@jsgf
Copy link
Contributor Author

jsgf commented Nov 16, 2020

I fixed the performance regression by quite a bit:

 context_query_location_rc           669,495          739,975                      70,480  10.53%   x 0.90 
 context_query_location_slice        665,773          723,207                      57,434   8.63%   x 0.92 
 context_query_with_functions_rc     2,533,924        2,747,176                   213,252   8.42%   x 0.92 
 context_query_with_functions_slice  2,265,792        2,366,429                   100,637   4.44%   x 0.96 

src/lib.rs Outdated

match unit_iter.next() {
Some(unit) => unit.find_location(probe, sections),
None => Ok(None),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change this function back to exactly how it was on master? It's important that we loop through the units here if required. Unfortunately we don't seem to have any tests that check this case...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right.

@philipc
Copy link
Contributor

philipc commented Nov 16, 2020

I fixed the performance regression by quite a bit:

Great! I guess that is mainly due to avoiding the allocation.

Can you see if it makes sense to apply the changes in philipc@de7aee3? I haven't tested these, but I think they simplify things a little.

@jsgf
Copy link
Contributor Author

jsgf commented Nov 17, 2020

OK I incorporated your simplifications with a tweak or two and fixed up the boundary check in find_units_range (I'm pretty sure it's correct now), and folded it all down to one change.

Bench comparison is now

 name                                bench.0 ns/iter  bench.1 ns/iter  diff ns/iter  diff %  speedup 
 context_query_location_rc           680,660          770,646                89,986  13.22%   x 0.88 
 context_query_location_slice        672,500          770,194                97,694  14.53%   x 0.87 
 context_query_with_functions_rc     2,618,079        2,678,960              60,881   2.33%   x 0.98 
 context_query_with_functions_slice  2,237,228        2,421,786             184,558   8.25%   x 0.92 

I think it very slightly regressed with the simplifications, but not enough to be confident.

This adds `find_location_range` which is the equivalent of
`find_location` except that it takes an lower and upper address bounds
and returns an iterator of all source locations found in that range.
@philipc
Copy link
Contributor

philipc commented Nov 17, 2020

I think it very slightly regressed with the simplifications, but not enough to be confident.

I think this is due to the correctness fix in Context::find_location.

Copy link
Contributor

@philipc philipc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@philipc philipc merged commit caa2932 into gimli-rs:master Nov 17, 2020
@philipc
Copy link
Contributor

philipc commented Nov 18, 2020

By the way, what application are you using this for such that you don't want to know the inlined function locations too?

@jsgf
Copy link
Contributor Author

jsgf commented Nov 19, 2020

Yeah I've been thinking about inlining. I'm using find_frames which is not too bad since I know the location boundaries but it would be nice to get better perf. For my use case I'm only interested in the top noninlined frame so I'm thinking it should return a double ended iterator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants