Initial cut at using scroll+scroll_derive #18

luser · 2018-02-07T22:00:28Z

Hey! I got motivated to try out scroll so I started taking a crack at fixing #10.

This patch only changes over the code in src/msf/mod.rs, so it's mostly a wash since that code does a lot of reading individual u32s, but it does make reading the PDB header nicer. If you like how this looks I can go through and convert the rest of the crate's I/O to use scroll.

One thought: since we're going to wind up with a bunch of structs that define the on-disk PDB format, should those go in their own module?

cc @m4b

m4b

I don’t know anything at all about semantic contents but otherwise looks good to me!

Also, I assume (like PE) the disk values are always little endian, right?

m4b · 2018-02-07T22:07:15Z

src/msf/mod.rs

 use std::fmt;

 type PageNumber = u32;

+/// The PDB header as stored on disk.
+#[derive(Debug, Pread)]
+struct RawHeader {


You may want this to be repr(C) or packed. I’m not sure what the disk format is, but I’ve noticed MS structs tend to be packed sometimes.

Yeah, that seems correct:
https://github.com/luser/dump_syms/blob/772f6d35664e2cec57e0fd4d4fde87514a642d0e/PDBHeaders.h#L33

Does that actually matter for scroll_derive purposes, though? It looked like it just reads each field individually, right?

Yes, unfortunately it can matter, but I think it’s fine here. It matters more if you read sequences of structs, so it shouldn’t matter in this case.

Switching to scroll_derive would be a good opportunity to massage pdb structs towards their on-disk counterparts, especially in light of #16. #[repr(C)] mirroring the upstream structs seems like the right direction.

I'd just like to state for the record that I lost like 15 minutes trying to figure out why the szMagic field in the Microsoft code is only defined as 30 bytes long, but the definition in all the other implementations I've seen is 32 bytes. (Spoiler: the Microsoft code does not pack its structs, so those extra 2 bytes are padding!)
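To make the padding effect concrete, here is a minimal sketch. The `Padded` and `Packed` structs and the `page_size` field are hypothetical stand-ins, not the real PDB header; the sketch only assumes a 30-byte magic followed by a `u32`, on a target where `u32` has 4-byte alignment.

```rust
use std::mem::size_of;
use std::ptr::addr_of;

/// Hypothetical header prefix laid out the way an unpacked C struct would be.
#[allow(dead_code)]
#[repr(C)]
struct Padded {
    magic: [u8; 30],
    page_size: u32,
}

/// The same fields with padding disabled.
#[allow(dead_code)]
#[repr(C, packed)]
struct Packed {
    magic: [u8; 30],
    page_size: u32,
}

/// Byte offset of `page_size` inside `Padded`.
fn padded_field_offset() -> usize {
    let v = Padded { magic: [0; 30], page_size: 0 };
    addr_of!(v.page_size) as usize - addr_of!(v) as usize
}

/// Byte offset of `page_size` inside `Packed`.
fn packed_field_offset() -> usize {
    let v = Packed { magic: [0; 30], page_size: 0 };
    // addr_of! takes the address without creating a reference to an unaligned field.
    addr_of!(v.page_size) as usize - addr_of!(v) as usize
}

/// [size of Padded, offset in Padded, size of Packed, offset in Packed]
fn layout_summary() -> [usize; 4] {
    [
        size_of::<Padded>(),
        padded_field_offset(),
        size_of::<Packed>(),
        packed_field_offset(),
    ]
}
```

With the unpacked layout the `u32` starts at offset 32, so a reader that pulls fields one after another from consecutive offsets would need to treat the magic as 32 bytes to land in the same place.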

luser · 2018-02-07T23:44:38Z

> I don’t know anything at all about semantic contents but otherwise looks good to me!

Thanks for taking a look!

> Also, I assume (like PE) the disk values are always little endian, right?

I'm not sure this is actually documented anywhere, but the code Microsoft has released for reading PDB files just reads structs directly from the file, so it must be little-endian by default.

willglynn · 2018-02-07T23:53:25Z

@luser Awesome! I will look through the code later tonight.

> Also, I assume (like PE) the disk values are always little endian, right?

> I'm not sure this is actually documented anywhere, but the code Microsoft has released for reading PDB files just reads structs directly from the file, so it must be little-endian by default.

Right. As far as I know, the upstream code is the only documentation, and that code doesn't have any endian-switching. pdb::ParseBuffer is 100% LE, so I'm happy to continue baking that assumption into this code.
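As an illustration of what baking in the little-endian assumption looks like, here is a minimal standard-library sketch. The `read_u32_le` helper is hypothetical and is not part of `pdb::ParseBuffer` or scroll.

```rust
use std::convert::TryInto;

/// Read a little-endian u32 starting at `offset`.
/// Returns None if fewer than four bytes are available there.
fn read_u32_le(bytes: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let chunk: [u8; 4] = bytes.get(offset..end)?.try_into().ok()?;
    // The byte order is fixed here; there is no runtime endianness switch.
    Some(u32::from_le_bytes(chunk))
}
```

For example, the bytes `00 10 00 00` decode to 4096 under this helper.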

willglynn · 2018-02-08T03:48:07Z

src/msf/mod.rs

 use std::fmt;

 type PageNumber = u32;

+/// The PDB header as stored on disk.
+#[derive(Debug, Pread)]
+struct RawHeader {


I was going to suggest renaming RawHeader to match upstream, but then I remembered why that's a bad idea. Still, this particular header format is specific to BigMSF, and I'd like to group those two together, either by naming or by tucking both together into a submodule.

I have read entirely too much of that Microsoft PDB source but I still almost lost it at PN32 mpspnpnSt. :)

I did this in a follow-up commit to make the initial diff easier to read since moving things into a submodule changes the indentation as well.

willglynn · 2018-02-08T03:54:11Z

src/msf/mod.rs

@@ -108,7 +120,7 @@ impl<'s, S: Source<'s>> BigMSF<'s, S> {
        // yes, this is a stupid level of indirection
        let mut stream_table_page_list_page_list = PageList::new(header_object.page_size);
        for _ in 0..size_of_stream_table_page_list_in_pages {
-            let n = header.parse_u32()?;
+            let n = bytes.gread_with(offset, LE)?;


Hmm… bytes.gread_with(offset, LE) gets duplicated here a lot. More generally, it seems like it'll be a common pattern, particularly for replacing ParseBuffer code. Can we encapsulate this somehow without losing clarity?

I thought about just rewriting ParseBuffer to use gread_with internally, which I guess would result in less code churn overall. We could give it a generic method like fn parse<T: TryFromCtx<...>>(&mut self) -> T

ponders

Yeah, let's do that. ParseBuffer really is just a (&[u8], usize) tuple – bytes and offset, exactly what's expanded here. There are a couple cases where pread might be more direct, but the vast majority of PDB work fits the gread pattern exposed by ParseBuffer.

Adding generic fn parse() to ParseBuffer would let us scroll-ify data structure reads without losing that encapsulation. I like it.
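A rough sketch of that shape, using only the standard library so it stays self-contained. The `FromLe` trait is a hypothetical stand-in for scroll's `TryFromCtx` bound, and `peek`, `pos` and `read_page_numbers` are illustrative names; none of this is the actual patch.

```rust
use std::convert::TryInto;

/// Hypothetical stand-in for a scroll-style "parse me from bytes" trait.
trait FromLe: Sized {
    const SIZE: usize;
    /// `bytes` must be exactly `SIZE` bytes long.
    fn from_le(bytes: &[u8]) -> Self;
}

impl FromLe for u16 {
    const SIZE: usize = 2;
    fn from_le(bytes: &[u8]) -> Self {
        u16::from_le_bytes(bytes.try_into().expect("caller passes SIZE bytes"))
    }
}

impl FromLe for u32 {
    const SIZE: usize = 4;
    fn from_le(bytes: &[u8]) -> Self {
        u32::from_le_bytes(bytes.try_into().expect("caller passes SIZE bytes"))
    }
}

/// Bytes plus a cursor: the (&[u8], usize) tuple described above.
struct ParseBuffer<'b>(&'b [u8], usize);

impl<'b> ParseBuffer<'b> {
    /// Read a value at the cursor without advancing (the pread style).
    fn peek<T: FromLe>(&self) -> Option<T> {
        let end = self.1.checked_add(T::SIZE)?;
        self.0.get(self.1..end).map(T::from_le)
    }

    /// Read a value at the cursor and advance past it (the gread style).
    fn parse<T: FromLe>(&mut self) -> Option<T> {
        let value = self.peek::<T>()?;
        self.1 += T::SIZE;
        Some(value)
    }

    /// Current cursor position in bytes.
    fn pos(&self) -> usize {
        self.1
    }
}

/// Read `count` consecutive u32 page numbers, stopping at the first short read.
fn read_page_numbers(buf: &mut ParseBuffer<'_>, count: usize) -> Option<Vec<u32>> {
    (0..count).map(|_| buf.parse::<u32>()).collect()
}
```

The call site shrinks to `buf.parse::<u32>()`, with the offset bookkeeping and byte order hidden inside the buffer.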

Conveniently I started doing that shortly after writing that comment, and I have the patches ready to go. I'll push them in a minute.

* Change ParseBuffer to use scroll's `Pread::gread_with` and `Pread::pread_with` methods internally.
* Use macros to define the now-redundant parse_T / peek_T functions.
* Add a `ParseBuffer::parse::<T>` method that parses any type that has `#[derive(Pread)]`.
* Add a `struct RawHeader` that defines the on-disk format of the Big MSF header and also has `#[derive(Pread)]`.
* Change `BigMSF::new` to use `ParseBuffer::parse` for `RawHeader` instead of reading the individual fields.
* Remove the no-longer-used dependency on byteorder.
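For readers curious how a macro can stamp out the parse_T / peek_T pairs, here is a minimal sketch. The `def_parse!` macro name and this simplified `ParseBuffer` are assumptions for illustration; the real commit builds on scroll, whereas this version uses `from_le_bytes` from the standard library.

```rust
use std::convert::TryInto;
use std::mem::size_of;

/// Simplified buffer: the bytes and the current offset.
struct ParseBuffer<'b>(&'b [u8], usize);

/// Generate a `peek_*` / `parse_*` pair for each listed integer type.
macro_rules! def_parse {
    ($(($peek:ident, $parse:ident, $t:ty)),* $(,)?) => {
        #[allow(dead_code)]
        impl<'b> ParseBuffer<'b> {
            $(
                /// Read a little-endian value without moving the offset.
                fn $peek(&self) -> Option<$t> {
                    let end = self.1.checked_add(size_of::<$t>())?;
                    let raw: [u8; size_of::<$t>()] =
                        self.0.get(self.1..end)?.try_into().ok()?;
                    Some(<$t>::from_le_bytes(raw))
                }

                /// Read a little-endian value and advance the offset past it.
                fn $parse(&mut self) -> Option<$t> {
                    let value = self.$peek()?;
                    self.1 += size_of::<$t>();
                    Some(value)
                }
            )*
        }
    };
}

def_parse!(
    (peek_u8, parse_u8, u8),
    (peek_u16, parse_u16, u16),
    (peek_u32, parse_u32, u32),
);
```

Each tuple in the invocation expands to one peek/parse pair, so supporting another integer width is a one-line change.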

…`small` submodule currently containing only the small MSF `MAGIC` constant.

willglynn · 2018-02-08T17:48:31Z

Merged. Thanks!

luser · 2018-02-08T19:15:57Z

You're welcome! It should be straightforward to convert other parts of the library now: just define structs for the on-disk format that derive Pread, and use ParseBuffer::parse on them.
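As a hand-written approximation of that workflow, the sketch below reads a struct field by field at consecutive little-endian offsets, which is roughly what the thread says the derive does. `RawExample`, its fields, and the `take` helper are hypothetical and do not correspond to a real PDB structure or to scroll's API.

```rust
use std::convert::TryInto;

/// Hypothetical on-disk record; the fields are illustrative only.
#[derive(Debug, PartialEq)]
struct RawExample {
    tag: u16,
    flags: u16,
    size: u32,
}

/// Copy `N` bytes starting at `*offset`, advancing the offset on success.
fn take<const N: usize>(bytes: &[u8], offset: &mut usize) -> Option<[u8; N]> {
    let end = offset.checked_add(N)?;
    let raw: [u8; N] = bytes.get(*offset..end)?.try_into().ok()?;
    *offset = end;
    Some(raw)
}

impl RawExample {
    /// Parse the fields in declaration order, with no padding between them.
    /// If a later field is missing, the offset stays after the last field read.
    fn parse(bytes: &[u8], offset: &mut usize) -> Option<Self> {
        Some(RawExample {
            tag: u16::from_le_bytes(take(bytes, offset)?),
            flags: u16::from_le_bytes(take(bytes, offset)?),
            size: u32::from_le_bytes(take(bytes, offset)?),
        })
    }
}
```

With a derive in place, the `parse` body is what you no longer have to write by hand for each structure.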

m4b reviewed Feb 7, 2018


willglynn requested changes Feb 8, 2018


luser added 2 commits February 8, 2018 11:40

Move BigMSF and its related bits into a big submodule, and add a …

f87642d

…`small` submodule currently containing only the small MSF `MAGIC` constant.

luser force-pushed the scroll branch from 0664176 to f87642d Compare February 8, 2018 16:47

willglynn merged commit a7cb171 into getsentry:master Feb 8, 2018

luser deleted the scroll branch February 8, 2018 19:14
