serializing FilePtr #4

kitlith · 2020-11-01T22:58:19Z

Here's what roblabla had to say (on discord):

actually with seek+write, fileptr seems doable with some global context. Here's the idea: when you get a FilePtr, record the current location, write some zeroes, and schedule a sort of "late serialization" pass. When the structure is fully serialized, all scheduled late serialization passes will run sequentially. Each pass will just serialize the underlying structure, seek to the previously saved location, and write the proper offset

that does mean you'd need some sort of heap allocation support to keep track of the serialization passes

but I can't think of a better way.
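A minimal sketch of that late-serialization idea, using only the standard library. The names LateWriter, Pending, write_ptr32 and late_pass_demo are made up for illustration and are not part of binrw. It assumes 32-bit little-endian offsets measured from the start of the stream, and payloads that contain no further pointers.

```rust
use std::io::{self, Cursor, Seek, SeekFrom, Write};

/// One scheduled late write: where the zeroed placeholder lives,
/// and the already-serialized bytes it should end up pointing at.
struct Pending {
    placeholder_pos: u64,
    payload: Vec<u8>,
}

/// Hypothetical writer context that queues pointed-to data for a late pass.
/// The `Vec` is the heap allocation the idea above says is needed.
struct LateWriter<W: Write + Seek> {
    inner: W,
    pending: Vec<Pending>,
}

impl<W: Write + Seek> LateWriter<W> {
    fn new(inner: W) -> Self {
        LateWriter { inner, pending: Vec::new() }
    }

    /// Record the current location, write four zero bytes, schedule the payload.
    fn write_ptr32(&mut self, payload: Vec<u8>) -> io::Result<()> {
        let placeholder_pos = self.inner.stream_position()?;
        self.inner.write_all(&[0u8; 4])?;
        self.pending.push(Pending { placeholder_pos, payload });
        Ok(())
    }

    /// Late pass: append each payload at the end, then seek back and patch
    /// its placeholder with the offset the payload landed at.
    fn finish(self) -> io::Result<W> {
        let LateWriter { mut inner, pending } = self;
        for p in pending {
            let target = inner.seek(SeekFrom::End(0))?;
            if target > u64::from(u32::MAX) {
                return Err(io::Error::new(io::ErrorKind::InvalidData, "offset exceeds u32"));
            }
            inner.write_all(&p.payload)?;
            inner.seek(SeekFrom::Start(p.placeholder_pos))?;
            inner.write_all(&(target as u32).to_le_bytes())?;
        }
        inner.seek(SeekFrom::End(0))?;
        Ok(inner)
    }
}

/// Writes a 4-byte tag, two pointers, then runs the late pass.
fn late_pass_demo() -> io::Result<Vec<u8>> {
    let mut w = LateWriter::new(Cursor::new(Vec::new()));
    w.inner.write_all(b"HDR!")?;
    w.write_ptr32(b"abc".to_vec())?;
    w.write_ptr32(b"de".to_vec())?;
    Ok(w.finish()?.into_inner())
}
```

Here the two placeholders are patched to 12 and 15, and the payloads follow the header in scheduling order. Payloads that contain pointers of their own would need the late pass to keep draining a queue that can grow while it runs.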

I had a similar idea a while back (though I wasn't looking to implement it at the library level), but I don't think this'll quite work for some types of files, specifically those that are split into different sections, with different things going into different sections. i.e.:

struct File {
    info: InfoSection,
    data: DataSection
}

struct DataSection {
    // header...
    // data blocks go here
}

struct InfoSection {
    // header...
    list: Vec<Info>,
    // string blocks go here
}

struct Info {
    data_name: FilePtr32<String>, // points into InfoSection
    data_ptr: FilePtr32<Vec<u8>> // points into DataSection
}

Under the current idea I don't think we have a good way to represent this. We'd write out: info section, info, info, info, data section, string, data, string, data, string, data. What the format needs is: info section, info, info, info, string, string, string, data section, data, data, data. I think this is fixable by adding some sort of marker type Pool with some way of specifying an identifier, such that you can do the same thing as the original idea, but for each pool in turn as you come across them while writing the file.
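One way the Pool idea could look, as a rough standard-library sketch. PoolWriter, write_ptr32, flush_pool and the string pool identifiers are all hypothetical, not binrw API. It assumes 32-bit little-endian offsets from the start of the stream, and that every pointer into a pool is written before that pool's marker is reached.

```rust
use std::collections::HashMap;
use std::io::{self, Cursor, Seek, SeekFrom, Write};

/// Hypothetical writer that queues pointed-to data per pool identifier.
struct PoolWriter<W: Write + Seek> {
    inner: W,
    // pool id -> list of (placeholder position, payload bytes)
    pending: HashMap<&'static str, Vec<(u64, Vec<u8>)>>,
}

impl<W: Write + Seek> PoolWriter<W> {
    fn new(inner: W) -> Self {
        PoolWriter { inner, pending: HashMap::new() }
    }

    /// Write a zeroed placeholder and queue the payload under `pool`.
    fn write_ptr32(&mut self, pool: &'static str, payload: &[u8]) -> io::Result<()> {
        let pos = self.inner.stream_position()?;
        self.inner.write_all(&[0u8; 4])?;
        self.pending.entry(pool).or_insert_with(Vec::new).push((pos, payload.to_vec()));
        Ok(())
    }

    /// The "pool goes here" marker: emit everything queued for `pool` at the
    /// current position, patching each placeholder as its payload is placed.
    fn flush_pool(&mut self, pool: &'static str) -> io::Result<()> {
        let queued = self.pending.remove(pool).unwrap_or_default();
        for (placeholder_pos, payload) in queued {
            let target = self.inner.stream_position()?;
            if target > u64::from(u32::MAX) {
                return Err(io::Error::new(io::ErrorKind::InvalidData, "offset exceeds u32"));
            }
            self.inner.write_all(&payload)?;
            let resume = self.inner.stream_position()?;
            self.inner.seek(SeekFrom::Start(placeholder_pos))?;
            self.inner.write_all(&(target as u32).to_le_bytes())?;
            self.inner.seek(SeekFrom::Start(resume))?;
        }
        Ok(())
    }
}

/// Mirrors the File layout above: two Info entries, a string pool at the
/// end of the info section, then the data section with its own pool.
fn pool_demo() -> io::Result<Vec<u8>> {
    let mut w = PoolWriter::new(Cursor::new(Vec::new()));
    w.inner.write_all(b"INFO")?; // info section header
    w.write_ptr32("strings", b"a")?; // Info 0: data_name
    w.write_ptr32("data", &[1u8])?; // Info 0: data_ptr
    w.write_ptr32("strings", b"b")?; // Info 1: data_name
    w.write_ptr32("data", &[2u8])?; // Info 1: data_ptr
    w.flush_pool("strings")?; // string blocks go here
    w.inner.write_all(b"DATA")?; // data section header
    w.flush_pool("data")?; // data blocks go here
    Ok(w.inner.into_inner())
}
```

With this ordering the strings land right after the Info entries and the data blocks land after the data section header, which is the second layout described above.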

Are there any other methods we might want to consider?


jam1garner · 2020-11-01T23:55:40Z

I'd say we might want to go the route of file offset calculations, leading to a BinWrite trait that looks something like...

trait BinWrite {
    type Args;

    fn write_options<W: binrw::io::Write>(&self, writer: &mut W, options: &WriterOption, args: Self::Args, file_heap_pos: &mut u64) -> binrw::Result<()>;
    fn get_write_size(&self, options: &WriterOption, args: Self::Args) -> binrw::Result<u64>;
    fn write_file_heap_contents<W: binrw::io::Write>(&self, writer: &mut W, options: &WriterOption, args: Self::Args, file_heap_pos: &mut u64) -> binrw::Result<()>;
}

The general concept being:

The file_heap_pos starts at a value of relative_to + first_value.get_write_size(), which is typically the position immediately after the top-level BinWrite struct. So if I have a header of pointers of size 0x10 (and relative_to is 0), the file_heap_pos is initialized to 0x10.
The file_heap_pos is updated any time a pointer is written. So in a FilePtr<T>'s write_options implementation, you would write the current value of file_heap_pos and then increment it by inner.get_write_size().
After the write_options pass on the top-level struct, write_file_heap_contents is called on it (and it then recursively calls it on everything else), thus writing the heap contents in the same order in which the pointers were written.
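The three steps above can be sketched with a cut-down trait. This is not the proposed BinWrite signature: HeapWrite, BlobPtr32, Header and write_file are stand-ins invented for the example, options and args are dropped, relative_to is assumed to be 0, and pointed-to data is a plain byte blob with no nested pointers (so the heap pass does not need file_heap_pos here).

```rust
use std::io::{self, Write};

/// Simplified stand-in for the trait proposed above (hypothetical names).
trait HeapWrite {
    /// Size of the inline representation, not counting pointed-to data.
    fn write_size(&self) -> u64;
    /// First pass: write inline data; pointers claim space by bumping `heap_pos`.
    fn write_inline<W: Write>(&self, w: &mut W, heap_pos: &mut u64) -> io::Result<()>;
    /// Second pass: write pointed-to data in the order the pointers were emitted.
    fn write_heap<W: Write>(&self, w: &mut W) -> io::Result<()>;
}

/// A 32-bit little-endian pointer to a byte blob stored in the heap area.
struct BlobPtr32(Vec<u8>);

impl HeapWrite for BlobPtr32 {
    fn write_size(&self) -> u64 {
        4
    }
    fn write_inline<W: Write>(&self, w: &mut W, heap_pos: &mut u64) -> io::Result<()> {
        if *heap_pos > u64::from(u32::MAX) {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "offset exceeds u32"));
        }
        // Write where the blob will land, then reserve its space.
        w.write_all(&(*heap_pos as u32).to_le_bytes())?;
        *heap_pos += self.0.len() as u64;
        Ok(())
    }
    fn write_heap<W: Write>(&self, w: &mut W) -> io::Result<()> {
        w.write_all(&self.0)
    }
}

/// A header made of two pointers; both passes visit fields in the same order.
struct Header {
    name: BlobPtr32,
    data: BlobPtr32,
}

impl HeapWrite for Header {
    fn write_size(&self) -> u64 {
        self.name.write_size() + self.data.write_size()
    }
    fn write_inline<W: Write>(&self, w: &mut W, heap_pos: &mut u64) -> io::Result<()> {
        self.name.write_inline(w, heap_pos)?;
        self.data.write_inline(w, heap_pos)
    }
    fn write_heap<W: Write>(&self, w: &mut W) -> io::Result<()> {
        self.name.write_heap(w)?;
        self.data.write_heap(w)
    }
}

/// Top-level driver: the heap starts right after the top-level value.
fn write_file<T: HeapWrite, W: Write>(value: &T, w: &mut W) -> io::Result<()> {
    let mut heap_pos = value.write_size();
    value.write_inline(w, &mut heap_pos)?;
    value.write_heap(w)
}

fn heap_pos_demo() -> io::Result<Vec<u8>> {
    let header = Header {
        name: BlobPtr32(b"abc".to_vec()),
        data: BlobPtr32(b"de".to_vec()),
    };
    let mut out = Vec::new();
    write_file(&header, &mut out)?;
    Ok(out)
}
```

In this sketch the writer only needs Write, not Seek, because every offset is known before the pointer is emitted. The price is that each type has to report its size up front.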

Thoughts?

Some known drawbacks:

This doesn't allow much room for deciding the layout of how things are allocated without a manual BinWrite implementation. Not sure if there's really a great way to handle that, though?
This should allow for alignment of file pointers, but we need to figure out the interface for that.


kitlith · 2021-04-03T07:14:07Z

fwiw my current take on this is that we should probably not provide an implementation of BinWrite for FilePtr for now and experiment out-of-tree for a bit. File formats are going to have different requirements for how things are positioned, and I don't think we have enough information at the moment to properly cover every file format.

what I think we should do instead is focus on having tools available so that people can implement a serialization scheme for their file pointers on top, using a newtype (or, heck, I guess serialize_with could be a thing), in a somewhat ergonomic fashion. To me, this means exposing some sort of user-expandable ReadOptions-type thing that the user can stick some sort of mutable state in (through Rc?), and perhaps use to implement the scheme that jam suggested in the previous message, or some other scheme. Or maybe we do something else, idk.
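To make the "user-expandable options with mutable state" idea concrete, here is a rough sketch of one possible shape. Nothing here is binrw API: Options, HeapPos and HeapPtr32 are invented names, and the type-keyed map is just one assumption about how such an options bag could be extended by users.

```rust
use std::any::{Any, TypeId};
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical options bag that users can extend with their own types.
/// Cloning is cheap and clones share the stored values through `Rc`.
#[derive(Clone, Default)]
struct Options {
    extras: HashMap<TypeId, Rc<dyn Any>>,
}

impl Options {
    fn insert<T: Any>(&mut self, value: T) {
        self.extras.insert(TypeId::of::<T>(), Rc::new(value));
    }
    fn get<T: Any>(&self) -> Option<&T> {
        self.extras
            .get(&TypeId::of::<T>())
            .and_then(|v| v.downcast_ref::<T>())
    }
}

/// User-defined mutable state: the next free offset in the heap area.
struct HeapPos(RefCell<u64>);

/// User-defined newtype standing in for a file pointer to a byte blob.
struct HeapPtr32(Vec<u8>);

impl HeapPtr32 {
    /// Write the 32-bit little-endian offset and reserve space for the blob,
    /// using the state stashed in the options. Range checks are omitted.
    fn write_inline(&self, out: &mut Vec<u8>, opts: &Options) -> Result<(), &'static str> {
        let heap = opts.get::<HeapPos>().ok_or("HeapPos missing from options")?;
        let mut pos = heap.0.borrow_mut();
        out.extend_from_slice(&(*pos as u32).to_le_bytes());
        *pos += self.0.len() as u64;
        Ok(())
    }
}

/// Two pointers written through two clones of the same options.
fn options_demo() -> Result<(Vec<u8>, u64), &'static str> {
    let mut opts = Options::default();
    opts.insert(HeapPos(RefCell::new(8)));
    let nested = opts.clone(); // shares the same HeapPos
    let mut out = Vec::new();
    HeapPtr32(b"abc".to_vec()).write_inline(&mut out, &nested)?;
    HeapPtr32(b"de".to_vec()).write_inline(&mut out, &opts)?;
    let end = *opts.get::<HeapPos>().ok_or("HeapPos missing from options")?.0.borrow();
    Ok((out, end))
}
```

Because the state sits behind Rc and RefCell, it survives the options being cloned and passed down by shared reference, which is the property the newtype approach would rely on.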

once we experiment enough out of tree we can revisit actually including something in binrw proper. does that make sense?

jam1garner added a commit that referenced this issue Mar 12, 2021

Merge pull request #4 from timotree3/patch-1

3c34e9b

docs: Fix typo

kitlith mentioned this issue Apr 3, 2021

Extendible ReadOptions (and eventually equiv for BinWrite, maybe) #24

Open

Swiftb0y mentioned this issue Feb 25, 2022

anlz: Use binrw instead of nom to parse ANLZ*.DAT files Holzhaus/rekordcrate#47

Merged

Swiftb0y mentioned this issue Mar 9, 2022

Port pdb.rs from nom to binrw Holzhaus/rekordcrate#45

Closed

Holzhaus mentioned this issue Apr 8, 2022

Serialization Support for PDB files Holzhaus/rekordcrate#68

Open

csnover added the enhancement New feature or request label Sep 21, 2023
