Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API for finding sequences within a line number program and resuming at sequence boundaries #179

Merged
merged 2 commits into from
Feb 4, 2017

Conversation

khuey
Copy link
Contributor

@khuey khuey commented Jan 26, 2017

Fixes #178.

Something like this perhaps? I haven't actually tried modifying addr2line to use it yet.

@khuey khuey changed the title Add API for finding segments within a line number program and resuming at segment boundaries Add API for finding sequences within a line number program and resuming at sequence boundaries Jan 26, 2017
fitzgen
fitzgen previously approved these changes Jan 26, 2017
Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Kyle!

LGTM, would like to get @philipc's opinion as well.

src/line.rs Outdated
LineNumberProgramHeaderHolder::Complete(_) => {
// All the filenames are already in the header, we don't
// need to do anything (except have this branch to get
// rustc not to warn about an unused 'Complete').
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the commented out version because

  • If we add more variants (however unlikely) then the compiler will yell at us to match them
  • The comment is nice

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment about rustc warning is actually wrong. But sure, I can be explicit here.

src/line.rs Outdated
/// ```
pub fn sequences(self)
-> parser::Result<(LineNumberProgramHeader<'input, Endian>,
Vec<LineNumberSequence<'input, Endian>>)> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to let the caller control allocation (maybe they know when they can reuse a vec?) by accepting an &mut Vec<LineNumberSequence<'input, Endian>> parameter, asserting it is initially empty, and then filling it?

I'm not actually 100% sure if this would actually work or not due to the 'input lifetime.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add something like a next_sequence method to StateMachine instead, so that the caller completely controls allocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only comment I haven't addressed yet. Having the caller iterate through the sequences runs the risk of producing LineNumberSequences that can't actually be used yet if the StateMachine has not run to completion.

An outparam Vec doesn't have that problem, but we'd have to be careful to clear the contents of the Vec if we return early due to a parser error condition.

@@ -881,7 +881,7 @@ pub enum AttributeValue<'input, Endian>
/// different compilation unit from the current one.
DebugInfoRef(DebugInfoOffset),

/// An offset into the `.debug_lines` section.
/// An offset into the `.debug_line` section.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

src/line.rs Outdated
/// ```
pub fn sequences(self)
-> parser::Result<(LineNumberProgramHeader<'input, Endian>,
Vec<LineNumberSequence<'input, Endian>>)> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add something like a next_sequence method to StateMachine instead, so that the caller completely controls allocation.

.expect("should parse and execute the entire line number program");
assert!(sequences.len() > 0); // Should be at least one sequence.
for sequence in sequences {
let mut resumed = header.clone().resume_from(&sequence);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to avoid needing to clone the header to iterate the rows in a sequence. So the caller would need to somehow obtain a StateMachine that already has a Complete header, and resume on that instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option would be to change the Complete variant to only use a reference.

src/line.rs Outdated
// in a row.
start: sequence_start_addr.unwrap_or(0),
end: sequence_end_addr,
rest: state_machine.opcodes.clone(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the opcodes from after the sequence? Also, it would be nice to limit it to only the opcodes for the sequence (so reduce the size of the input buffer to cover only the sequence).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Er, yes, good catch.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 93.885% when pulling 4c86110 on khuey:sequences into 2a8d328 on gimli-rs:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 93.885% when pulling 4c86110 on khuey:sequences into 2a8d328 on gimli-rs:master.

@khuey
Copy link
Contributor Author

khuey commented Jan 27, 2017

Take a look at this one. I'm pretty happy with it, except that StateMachine::resume doesn't validate that the iterator has been run to completion. We could return an error there (a parser:Error seems a little goofy because its an API usage mistake and not a problem with the DWARF though). There's nothing that stops one from using a LineNumberSequence with the wrong StateMachine entirely though which is why I'm not too worried about it.

Copy link
Collaborator

@philipc philipc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one nit.

Regarding resume on an incomplete iterator, I think it's fine, but if you really wanted to do something then a debug_assert would be better than returning an error.

src/line.rs Outdated
self.input.len(),
other.input.as_ptr(),
other.input.len(),
unsafe { self.input.as_ptr().offset(self.input.len() as isize)});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete this debug print.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, thanks.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.2%) to 93.906% when pulling 7609846 on khuey:sequences into 2a8d328 on gimli-rs:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.2%) to 93.902% when pulling ab29652 on khuey:sequences into 2a8d328 on gimli-rs:master.

@khuey
Copy link
Contributor Author

khuey commented Jan 27, 2017

I think I might want to switch the state machine to using references to the header though. As is you have to clone it to use it in a multithreaded environment, which means cloning the header. So don't merge this yet while I sleep on it :)

@khuey
Copy link
Contributor Author

khuey commented Jan 27, 2017

Ok, I ended up changing the LineNumberProgramHeaderHolder into a trait that is implemented by IncompleteLineNumberProgramHeaderHolder and CompleteLineNumberProgramHeaderHolder. StateMachine largely became StateMachineInner, which is parameterized over a LineNumberProgramHeaderHolder, and a new StateMachine and ResumableStateMachine provide the public facing APIs (the original and the resumable, respectively).

This allows me to make ResumableStateMachine implement Clone (because the underlying reference is immutable) while before we couldn't really do that. Now I can do addr2line lookups on multiple threads simultaneously from the sequence data because the sequence array is immutable and I can make a cheap clone of the ResumableStateMachine.

@fitzgen fitzgen dismissed their stale review January 27, 2017 22:49

want to take a new look

@philipc
Copy link
Collaborator

philipc commented Jan 28, 2017

I like the concept of a trait for add_file. We may want to take it further and add a file method to that trait too, although the current implementations don't need it. I think the trait and its implementations need a better names though.

I'm not sure about making StateMachineInner private. It's good to avoid exposing implementation details, but it's also good to keep the implementation simple. StateMachine and ResumableStateMachine could be type aliases instead of new-types. That would cut down on a bit of boilerplate. Open to opinions on that.

Changing StateMachine to use a mutable reference to the header is a step backwards; it makes #40 a problem again. I can't see a reason to need this change. If we really want this to be a reference then I prefer to make it a borrow, and store the mutable list of file names in IncompleteLineNumberProgramHeaderHolder instead.

src/line.rs Outdated
@@ -96,32 +97,102 @@ impl<'input, Endian> From<&'input [u8]> for DebugLine<'input, Endian>
}
}

/// Shutup
pub trait LineNumberProgramHeaderHolder<'input, Endian> : Deref<Target = LineNumberProgramHeader<'input, Endian>>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its generally frowned upon to Deref but also add methods. Generally a thing should be a magic pointer and only deref, or if it must also have its own methods, then it should use As[Mut]Ref instead of Deref. The reasoning is that code becomes harder to read with the "fake inheritence" that Deref provides, and method resolution is obscured as well.

So I would prefer:

pub trait LineNumberProgramHeaderHolder<'input, Endian> : AsRef<LineNumberProgramHeader<'input, Endian>>

src/line.rs Outdated
where Endian: Endianity
{
fn add_file(&mut self, _: FileEntry<'input>) {
// Nop
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps with debug assertions here?

src/line.rs Outdated
where Endian: Endianity
#[derive(Debug)]
struct StateMachineInner<'input, Holder, Endian>
where Holder: LineNumberProgramHeaderHolder<'input, Endian>, Endian: Endianity
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect rustfmt will put Endian: Endianity on a new line.

src/line.rs Outdated
self.row.registers.basic_block = false;
self.row.registers.prologue_end = false;
self.row.registers.epilogue_begin = false;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did rustfmt add this indentation??

src/line.rs Outdated
addr: &u64)
-> parser::Result<Option<(&LineNumberProgramHeader<'input, Endian>, &LineNumberRow)>> {
self.inner.run_to_address(addr)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since both StateMachine and ResumableStateMachine have {header, next_row, run_to_address} methods, it could be nice to define them in a trait and consolidate the documentation in one place.

@fitzgen
Copy link
Member

fitzgen commented Jan 29, 2017

I'm not sure about making StateMachineInner private. It's good to avoid exposing implementation details, but it's also good to keep the implementation simple. StateMachine and ResumableStateMachine could be type aliases instead of new-types. That would cut down on a bit of boilerplate. Open to opinions on that.

Either way works for me.

Changing StateMachine to use a mutable reference to the header is a step backwards; it makes #40 a problem again. I can't see a reason to need this change. If we really want this to be a reference then I prefer to make it a borrow, and store the mutable list of file names in IncompleteLineNumberProgramHeaderHolder instead.

+1

@khuey
Copy link
Contributor Author

khuey commented Feb 3, 2017

Using an owned header in one variant and immutable references in the other isn't feasible without using Rc/Arc. If we go back to an owned header then sequences has to take self instead of &mut self, and once we build the ResumableStateMachine we can't hand back out the completed header.

@philipc
Copy link
Collaborator

philipc commented Feb 3, 2017

Can you change sequences to take self and return CompleteLineNumberProgramHeaderHolder, and add a method to CompleteLineNumberProgramHeaderHolder to return a ResumableStateMachine?

@khuey
Copy link
Contributor Author

khuey commented Feb 3, 2017

Yeah, that's a good idea. I did essentially that, with some of the types renamed. Lmk what you think of this version.

Copy link
Collaborator

@philipc philipc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is looking great! Just a few small changes please.

src/line.rs Outdated
/// A `LineNumberProgramHolder` provides access to a `LineNumberProgramHeader` and
/// a way to add files to the files table if necessary. Gimli consumers should
/// never need to use or see this trait.
pub trait LineNumberProgramHolder<'input, Endian>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename this trait to LineNumberProgram.

src/line.rs Outdated
/// files defined by the line number program header to be added to it. Gimli consumers
/// should never need to use or see this trait.
#[derive(Debug)]
pub struct IncompleteLineNumberProgramHolder<'input, Endian>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete this struct, it adds no benefit. Replace its usage with IncompleteLineNumberProgram.

src/line.rs Outdated
/// files defined by the line number program header to be added to it. Gimli consumers
/// should never need to use or see this trait.
#[derive(Debug, Clone)]
pub struct CompleteLineNumberProgramHolder<'program, 'input, Endian>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete this struct, it adds no benefit. Replace its usage with &'program CompleteLineNumberProgram.

src/line.rs Outdated
pub struct StateMachine<'input, Endian>
where Endian: Endianity
#[derive(Debug, Clone)]
pub struct StateMachine<'input, Holder, Endian>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename Holder to Program.

src/line.rs Outdated
/// Construct a new `StateMachine` for executing line programs and
/// generating the line information matrix.
pub fn rows(self) -> OneShotStateMachine<'input, Endian> {
StateMachine::<IncompleteLineNumberProgramHolder<_>, _>::new(IncompleteLineNumberProgramHolder {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to OneShotStateMachine::new(self)

src/line.rs Outdated
-> ResumedStateMachine<'program, 'input, Endian> {
StateMachine::<CompleteLineNumberProgramHolder<_>, _>::resume(CompleteLineNumberProgramHolder {
program: self,
}, sequence)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to ResumedStateMachine::resume(self, sequence)

@khuey
Copy link
Contributor Author

khuey commented Feb 3, 2017

Review comments largely addressed. I'm not sure whether the LineNumberProgram trait is actually worth it at this point (instead of implementing Deref on LineNumberProgramHolder we could just implement a header() method).

@philipc
Copy link
Collaborator

philipc commented Feb 3, 2017

Why did you need to add the enum back in? That wasn't what I intended. The changes I requested should have only been a handful of lines changed. eg when you delete IncompleteLineNumberProgramHolder, just replace all its uses with IncompleteLineNumberProgram, including the impl of the LineNumberProgram trait. The LineNumberProgram trait should keep its add_file method.

@khuey
Copy link
Contributor Author

khuey commented Feb 3, 2017

I need to hold an IncompleteLineNumberProgram and a &'program CompleteLineNumberProgram. StateMachine needs to know which one to hold. We could do that with a type on the LineNumberProgram trait, but we also need to handle add_file. We can't pass &mut self to it in the CompleteLineNumberProgram case because we only have an immutable reference. Having an intermediate type of some sort solves both problems.

@philipc
Copy link
Collaborator

philipc commented Feb 3, 2017

But the point of the trait was to enable StateMachine to not care which it is. See philipc@cb924fc for what I intended.

@khuey
Copy link
Contributor Author

khuey commented Feb 3, 2017

Ah, I didn't realize you meant to implement LineNumberProgram on &'program CompleteLineNumberProgram. That's a neat trick.

Changes are folded in, and this should make benches build too.

LineNumberProgramHeader is now purely concerned with the header data (and file table).  Incomplete/CompleteLineNumberProgram are introduced to create StateMachines.  What was LineNumberProgramHeader::rows() is now IncompleteLineNumberProgram::rows().  IncompleteLineNumberProgram::sequences() produces a (CompleteLineNumberProgram, Vec<LineNumberSequence>).  And CompleteLineNumberProgram::resume_from(sequence) produces a state machine that iterates through one of those sequences.

StateMachine is parameterized over the LineNumberProgram trait, implemented by both IncompleteLineNumberProgram and &CompleteLineNumberProgram, depending on its origins.  Both implement Clone (which means StateMachine is still always cloneable) and implement add_file as needed.

Finally, DebugLines::header is now DebugLines::program and returns an IncompleteLineNumberProgram.
@philipc
Copy link
Collaborator

philipc commented Feb 3, 2017

@fitzgen Did you want to take another look before merging?

@fitzgen
Copy link
Member

fitzgen commented Feb 3, 2017

Nope, I've been following along without commenting (too many cooks in the kitchen), and it looks good to me. Most recent changes A+++

Ship it!

@khuey
Copy link
Contributor Author

khuey commented Feb 3, 2017

Wait for Travis before merging, I don't have nightly rustc installed so I don't know for sure that benches builds yet.

@khuey
Copy link
Contributor Author

khuey commented Feb 3, 2017

Well, Travis seems to be having a bad time today. benches did pass though on the PR.

@philipc philipc merged commit 10251f0 into gimli-rs:master Feb 4, 2017
@philipc
Copy link
Collaborator

philipc commented Feb 4, 2017

The Travis error is an unrelated error due to trying to parse a .d file as an ELF. Will fix that separately.

Thanks for the PR!

@khuey
Copy link
Contributor Author

khuey commented Feb 4, 2017

Thanks for the reviews!

@khuey khuey deleted the sequences branch March 5, 2017 00:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants