Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial cut at using scroll+scroll_derive #18

Merged
merged 2 commits into from
Feb 8, 2018

Conversation

luser
Copy link
Contributor

@luser luser commented Feb 7, 2018

Hey! I got motivated to try out scroll so I started taking a crack at fixing #10.

This patch only changes over the code in src/msf/mod.rs, so it's mostly a wash since that code does a lot of reading individual u32s, but it does make reading the PDB header nicer. If you like how this looks I can go through and convert the rest of the crate's I/O to use scroll.

One thought: since we're going to wind up with a bunch of structs that define the on-disk PDB format, should those go in their own module?

cc @m4b

Copy link

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t know anything at all about semantic contents but otherwise looks good to me!

Also, I assume (like PE) the disk values are always little endian, right ?

src/msf/mod.rs Outdated
use std::fmt;

type PageNumber = u32;

/// The PDB header as stored on disk.
#[derive(Debug, Pread)]
struct RawHeader {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want this to be repr(C) or packed (I’m not sure how the disk format is, but I’ve noticed ms structs tend to be packed sometimes)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that seems correct:
https://github.com/luser/dump_syms/blob/772f6d35664e2cec57e0fd4d4fde87514a642d0e/PDBHeaders.h#L33

Does that actually matter for scroll_derive purposes, though? It looked like it just reads each field individually, right?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, unfortunately it can matter but I think it’s fine? It’s more important if you read sequences of structs so yea here shouldn’t matter

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching to scroll_derive would be a good opportunity to massage pdb structs towards their on-disk counterparts, especially in light of #16. #[repr(C)] mirroring the upstream structs seems like the right direction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just like to state for the record that I lost like 15 minutes trying to figure out why the szMagic field in the Microsoft code is only defined as 30 bytes long, but the definition in all the other implementations I've seen is 32 bytes. (Spoiler: the Microsoft code does not pack its structs, so those extra 2 bytes are padding!)

@luser
Copy link
Contributor Author

luser commented Feb 7, 2018

I don’t know anything at all about semantic contents but otherwise looks good to me!

Thanks for taking a look!

Also, I assume (like PE) the disk values are always little endian, right ?

I'm not sure this is actually documented anywhere, but the code Microsoft has released for reading PDB files just reads structs directly from the file, so it must be little-endian by default.

@willglynn
Copy link
Collaborator

willglynn commented Feb 7, 2018

@luser Awesome! I will look through the code later tonight.

Also, I assume (like PE) the disk values are always little endian, right ?

I'm not sure this is actually documented anywhere, but the code Microsoft has released for reading PDB files just reads structs directly from the file, so it must be little-endian by default.

Right. As far as I know, the upstream code is the only documentation, and that code doesn't have any endian-switching. pdb::ParseBuffer is 100% LE, so I'm happy to continue baking that assumption into this code.

src/msf/mod.rs Outdated
use std::fmt;

type PageNumber = u32;

/// The PDB header as stored on disk.
#[derive(Debug, Pread)]
struct RawHeader {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to suggest renaming RawHeader to match upstream, but then I remembered why that's a bad idea. Still, this particular header format is specific to BigMSF, and I'd like to group those two together, either by naming or by tucking both together into a submodule.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have read entirely too much of that Microsoft PDB source but I still almost lost it at PN32 mpspnpnSt. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this in a follow-up commit to make the initial diff easier to read since moving things into a submodule changes the indentation as well.

src/msf/mod.rs Outdated
@@ -108,7 +120,7 @@ impl<'s, S: Source<'s>> BigMSF<'s, S> {
// yes, this is a stupid level of indirection
let mut stream_table_page_list_page_list = PageList::new(header_object.page_size);
for _ in 0..size_of_stream_table_page_list_in_pages {
let n = header.parse_u32()?;
let n = bytes.gread_with(offset, LE)?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm… bytes.gread_with(offset, LE) gets duplicated here a lot. More generally, it seems like it'll be a common pattern, particularly for replacing ParseBuffer code. Can we encapsulate this somehow without losing clarity?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about just rewriting ParseBuffer to use gread_with internally, which I guess would result in less code churn overall. We could give it a generic method like fn parse<T: TryFromCtx<...>>(&mut self) -> T

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ponders

Yeah, let's do that. ParseBuffer really is just a (&[u8], usize) tuple – bytes and offset, exactly what's expanded here. There are a couple cases where pread might be more direct, but the vast majority of PDB work fits the gread pattern exposed by ParseBuffer.

Adding generic fn parse() to ParseBuffer would let us scroll-ify data structure reads without losing that encapsulation. I like it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conveniently I started doing that shortly after writing that comment, and I have the patches ready to go. I'll push them in a minute.

* Change ParseBuffer to use scroll's `Pread::gread_with` and `Pread::pread_with`
  methods internally.
* Use macros to define the now-redundant parse_T / peek_T functions.
* Add a `ParseBuffer::parse::<T>` method that parses any type that
  has `#[derive(Pread)`
* Add a `struct RawHeader` that defines the on-disk format of the Big MSF
  header and also has `#[derive(Pread)]`.
* Change `BigMSF::new` to use `ParseBuffer::parse` for `RawHeader` instead
  of reading the individual fields.
* Remove the no-longer-used dependency on byteorder.
…`small`

submodule currently containing only the small MSF `MAGIC` constant.
@willglynn willglynn merged commit a7cb171 into getsentry:master Feb 8, 2018
@willglynn
Copy link
Collaborator

Merged. Thanks!

@luser luser deleted the scroll branch February 8, 2018 19:14
@luser
Copy link
Contributor Author

luser commented Feb 8, 2018

You're welcome! It should be straightforward to convert other parts of the library now: just define structs the the on-disk format that derive Pread, and use ParseBuffer::read on them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants