Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix lines for MASM symbols #68

Merged
merged 1 commit into from
Jun 2, 2022
Merged

Fix lines for MASM symbols #68

merged 1 commit into from
Jun 2, 2022

Conversation

jan-auer
Copy link
Member

@jan-auer jan-auer commented Oct 1, 2019

ProcedureSymbol records for MASM functions have code ranges associated that do not match the actual code or line records.

lines_at_offset assumed that if there is a lines subsection for a symbol, it shares the exact same start offset as the symbol. However, this is not the case for MASM functions, where the symbol's offset can lie after the actual start of the instructions / line records. Since the name of lines_at_offset was chosen badly, it is now renamed to lines_for_symbol, and explicitly states that it may return line records outside of the symbol's stated code range.

A consumer of this should probably collect the line records for each symbol and infer a proper code range.

Also, the internal implementation of lines_at_offset was slow in two regards: It had linear runtime complexity, and repeatedly had to iterate through large chunks of memory to find the appropriate section. LineProgram now collects and sorts all line sections ahead of time to perform faster binary search.

Fixes #61
Fixes #63

@jan-auer jan-auer self-assigned this Oct 1, 2019
@calixteman
Copy link
Contributor

Have a look on this file: https://github.com/mozilla/dump_syms/blob/master/test_data/basic-opt32.pdb, in particular for proc symbols located in d:\agent_work\3\s\src\vctools\crt\vcruntime\src\eh\i386\exsup4.asm.
There are: _local_unwind4, _unwind_handler4, _seh_longjmp_unwind4, _EH4_CallFilterFunc, _EH4_TransferToHandler, _EH4_GlobalUnwind2, _EH4_LocalUnwind:

[calixte@rouxpanda] ~/dev/mozilla/pdb$ cargo run --release --example pdb_lines ../dump_syms.calixteman/test_data/basic-opt32.pdb | rg '_local_unwind4|_unwind_handler4|_seh_longjmp_unwind4|_EH4_CallFilterFunc|_EH4_TransferToHandler| _EH4_GlobalUnwind2|_EH4_LocalUnwind' -A1
    Finished release [optimized] target(s) in 0.05s
     Running `target/release/examples/pdb_lines ../dump_syms.calixteman/test_data/basic-opt32.pdb`
+ @_EH4_CallFilterFunc@8
  0xaeb0 d:\agent\_work\3\s\src\vctools\crt\vcruntime\src\eh\i386\exsup4.asm:159::Some(0)
--
+ _local_unwind4
  0xaeb0 d:\agent\_work\3\s\src\vctools\crt\vcruntime\src\eh\i386\exsup4.asm:159::Some(0)
--
+ _seh_longjmp_unwind4
  0xaeb0 d:\agent\_work\3\s\src\vctools\crt\vcruntime\src\eh\i386\exsup4.asm:159::Some(0)
--
- _unwind_handler4
  0xaeb0 d:\agent\_work\3\s\src\vctools\crt\vcruntime\src\eh\i386\exsup4.asm:159::Some(0)
--
+ @_EH4_LocalUnwind@16
  0xaeb0 d:\agent\_work\3\s\src\vctools\crt\vcruntime\src\eh\i386\exsup4.asm:159::Some(0)
--
+ @_EH4_TransferToHandler@8
  0xaeb0 d:\agent\_work\3\s\src\vctools\crt\vcruntime\src\eh\i386\exsup4.asm:159::Some(0)

The lines for each symbol start at 0xaeb0 which is wrong.
So I think that the correct way to get the lines is to get the corresponding section and then get the line with the the correct offset to match the offset given in parameter.
But when getting lines iterator starting at correct line for _local_unwind4, we'll get some extra lines corresponding to _seh_longjmp_unwind4.
So I think lines_symbol should have a length parameter (which would be symbol length) to return an iterator on the lines where line offset is between given offset and given offset+length to be sure to get lines for the symbol.
But that means that line.length computation for the last line should be done with symbol length (and not next line address).

@calixteman
Copy link
Contributor

FYI, all the function findLines** from DIA sdk have a length argument:
https://docs.microsoft.com/en-us/visualstudio/debugger/debug-interface-access/idiasession-findlinesbyaddr?view=vs-2019
So I think it makes really sense to have the same here.

@jan-auer
Copy link
Member Author

jan-auer commented Oct 2, 2019

That's a good point. I was sort of assuming that the user of lines_at_offset might want to check manually, since it's quite easy to stop iterating. But with lines_for_symbol it makes much more sense to pass in the length as well.

Alright, let's update the tests to use the above functions and then see if we can get correct behavior. For reference, do you have the expected lines for the functions you pasted above?

Also, what if the given offset points into the middle of a line record? Do you expect the line record to be included or not? (i.e. is the first line covering the given offset or at or after the given offset)?

src/modi/mod.rs Outdated Show resolved Hide resolved
@calixteman
Copy link
Contributor

Yep the output for security_check_cookie is correct.
For the above functions: I get lines but as explained I get more lines than expected.
About offset in the middle of line record: I'd say it shouldn't happen but who knows...
So maybe return an Err in this case and let the user decides what to do in using lines for example or maybe something is wrong in the pdb...

impl fmt::Debug for C13LineIterator<'_> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.debug_struct("LineIterator")
.field("sections", &self.sections.as_slice())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jan-auer
Copy link
Member Author

This is still work in progress, unfortunately. I will have to rebase this on latest master, and then revisit the behavior based on:

For the above functions: I get lines but as explained I get more lines than expected.

Copy link
Member

@Swatinem Swatinem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @calixteman did point to a potential problem that should still be investigated. Is it possible to fuzz this somehow?

src/modi/c13.rs Outdated Show resolved Hide resolved
src/modi/mod.rs Outdated Show resolved Hide resolved
jan-auer added a commit that referenced this pull request May 20, 2022
@jan-auer
Copy link
Member Author

jan-auer commented Jun 2, 2022

All issues mentioned in this PR have been fixed, and we have tests for these cases now. Pending a final verification with @loewenheim, I'm going to merge this.

@jan-auer jan-auer merged commit 1419c8b into getsentry:master Jun 2, 2022
@jan-auer jan-auer deleted the fix/symbol-lines branch June 2, 2022 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot retrieve existing lines with lines_at_offset LineProgram::lines_at_offset is pretty slow
4 participants