-
-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial cut at using scroll+scroll_derive #18
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t know anything at all about semantic contents but otherwise looks good to me!
Also, I assume (like PE) the disk values are always little endian, right ?
src/msf/mod.rs
Outdated
use std::fmt; | ||
|
||
type PageNumber = u32; | ||
|
||
/// The PDB header as stored on disk. | ||
#[derive(Debug, Pread)] | ||
struct RawHeader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want this to be repr(C) or packed (I’m not sure how the disk format is, but I’ve noticed ms structs tend to be packed sometimes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that seems correct:
https://github.com/luser/dump_syms/blob/772f6d35664e2cec57e0fd4d4fde87514a642d0e/PDBHeaders.h#L33
Does that actually matter for scroll_derive
purposes, though? It looked like it just reads each field individually, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, unfortunately it can matter but I think it’s fine? It’s more important if you read sequences of structs so yea here shouldn’t matter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switching to scroll_derive
would be a good opportunity to massage pdb
structs towards their on-disk counterparts, especially in light of #16. #[repr(C)]
mirroring the upstream struct
s seems like the right direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd just like to state for the record that I lost like 15 minutes trying to figure out why the szMagic
field in the Microsoft code is only defined as 30 bytes long, but the definition in all the other implementations I've seen is 32 bytes. (Spoiler: the Microsoft code does not pack its structs, so those extra 2 bytes are padding!)
Thanks for taking a look!
I'm not sure this is actually documented anywhere, but the code Microsoft has released for reading PDB files just reads structs directly from the file, so it must be little-endian by default. |
@luser Awesome! I will look through the code later tonight.
Right. As far as I know, the upstream code is the only documentation, and that code doesn't have any endian-switching. |
src/msf/mod.rs
Outdated
use std::fmt; | ||
|
||
type PageNumber = u32; | ||
|
||
/// The PDB header as stored on disk. | ||
#[derive(Debug, Pread)] | ||
struct RawHeader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to suggest renaming RawHeader
to match upstream, but then I remembered why that's a bad idea. Still, this particular header format is specific to BigMSF
, and I'd like to group those two together, either by naming or by tucking both together into a submodule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have read entirely too much of that Microsoft PDB source but I still almost lost it at PN32 mpspnpnSt
. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this in a follow-up commit to make the initial diff easier to read since moving things into a submodule changes the indentation as well.
src/msf/mod.rs
Outdated
@@ -108,7 +120,7 @@ impl<'s, S: Source<'s>> BigMSF<'s, S> { | |||
// yes, this is a stupid level of indirection | |||
let mut stream_table_page_list_page_list = PageList::new(header_object.page_size); | |||
for _ in 0..size_of_stream_table_page_list_in_pages { | |||
let n = header.parse_u32()?; | |||
let n = bytes.gread_with(offset, LE)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm… bytes.gread_with(offset, LE)
gets duplicated here a lot. More generally, it seems like it'll be a common pattern, particularly for replacing ParseBuffer
code. Can we encapsulate this somehow without losing clarity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about just rewriting ParseBuffer
to use gread_with
internally, which I guess would result in less code churn overall. We could give it a generic method like fn parse<T: TryFromCtx<...>>(&mut self) -> T
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ponders
Yeah, let's do that. ParseBuffer
really is just a (&[u8], usize)
tuple – bytes and offset, exactly what's expanded here. There are a couple cases where pread
might be more direct, but the vast majority of PDB work fits the gread
pattern exposed by ParseBuffer
.
Adding generic fn parse()
to ParseBuffer
would let us scroll
-ify data structure reads without losing that encapsulation. I like it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conveniently I started doing that shortly after writing that comment, and I have the patches ready to go. I'll push them in a minute.
* Change ParseBuffer to use scroll's `Pread::gread_with` and `Pread::pread_with` methods internally. * Use macros to define the now-redundant parse_T / peek_T functions. * Add a `ParseBuffer::parse::<T>` method that parses any type that has `#[derive(Pread)` * Add a `struct RawHeader` that defines the on-disk format of the Big MSF header and also has `#[derive(Pread)]`. * Change `BigMSF::new` to use `ParseBuffer::parse` for `RawHeader` instead of reading the individual fields. * Remove the no-longer-used dependency on byteorder.
…`small` submodule currently containing only the small MSF `MAGIC` constant.
Merged. Thanks! |
You're welcome! It should be straightforward to convert other parts of the library now: just define structs the the on-disk format that derive |
Hey! I got motivated to try out
scroll
so I started taking a crack at fixing #10.This patch only changes over the code in
src/msf/mod.rs
, so it's mostly a wash since that code does a lot of reading individualu32
s, but it does make reading the PDB header nicer. If you like how this looks I can go through and convert the rest of the crate's I/O to use scroll.One thought: since we're going to wind up with a bunch of structs that define the on-disk PDB format, should those go in their own module?
cc @m4b