Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize EndianReader to use ptr/len #302

Merged
merged 1 commit into from
May 23, 2018

Conversation

philipc
Copy link
Collaborator

@philipc philipc commented May 22, 2018

This requires stable_deref_trait::CloneStableDeref for the bytes. If needed in future, we could relax that to stable_deref_trait::StableDeref by implementing a Clone for SubRange that recalculates the ptr.

This doesn't show any significant change in the gimli benchmarks, but I have modified addr2line to use Rc<Cow<[u8]>>, and it shows about 15% improvement for all operations in that (both context creation and lookup).

@philipc philipc requested a review from fitzgen May 22, 2018 06:58
@coveralls
Copy link

coveralls commented May 22, 2018

Coverage Status

Coverage increased (+0.002%) to 92.703% when pulling f4a43c2 on philipc:endian-reader-stablederef into a1c5bba on gimli-rs:master.

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! A couple nitpicks inline below. Thanks @philipc !

}
}
}

impl<Endian, T> EndianReader<Endian, T>
where
Endian: Endianity,
T: Deref<Target = [u8]> + Clone + Debug,
T: CloneStableDeref<Target = [u8]> + Clone + Debug,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need the Clone trait bound anymore for any of these, because CloneStableDeref has Clone as a super trait.

@@ -143,59 +149,77 @@ where
// `self.endian`. Splitting the sub-range out from the endian lets us work
// around this, making it so that only the `self.range` borrow is held active,
// not all of `self`.
//
// This also serves to encapsulate the unsafe code concerning `CloneStableDeref`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also comment about how we must keep a handle to bytes around since the ptr is only valid while we still hold the bytes (eg because holding the bytes holds a refcount for us).

r.range.end = r.range.start + idx.end;
r.range.start += idx.start;
assert!(r.range.start <= r.range.end);
assert!(r.range.end <= self.range.end);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is nice to have all this encapsulated.

@fitzgen
Copy link
Member

fitzgen commented May 22, 2018

This doesn't show any significant change in the gimli benchmarks, but I have modified addr2line to use Rc<Cow<[u8]>>, and it shows about 15% improvement for all operations in that (both context creation and lookup).

Interesting that our benches didn't show any speed up but the larger program did. I wonder if our benches are small enough that the caches are hiding some of the indirection costs for us here.

This requires `stable_deref_trait::CloneStableDeref` for the bytes. If
needed in future, we could relax that to `stable_deref_trait::StableDeref`
by implementing a `Clone` for `SubRange` that recalculates the `ptr`.

This doesn't show any significant change in the gimli benchmarks, but I
have modified `addr2line` to use `Rc<Cow<[u8]>>`, and it shows about 15%
improvement for all operations in that (both context creation and
lookup).
@philipc philipc force-pushed the endian-reader-stablederef branch from 8fc936e to f4a43c2 Compare May 23, 2018 05:39
@philipc
Copy link
Collaborator Author

philipc commented May 23, 2018

perf reports that a large chunk of the time in the benches is spent in a move and drop_in_place, and this didn't change due to this PR. This PR did make the code smaller and remove some slice bounds checks, but they weren't being reported as hot. That move and drop aren't present in the EndianSlice benchmark. I didn't try to see where the improvement comes from in addr2line, because its benchmarks aren't suitable for running under perf.

@philipc philipc merged commit 7da600f into gimli-rs:master May 23, 2018
@philipc philipc deleted the endian-reader-stablederef branch May 23, 2018 06:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants