Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Section offsets are 1-indexed #124

Open
jan-auer opened this issue Jun 2, 2022 · 4 comments
Open

Section offsets are 1-indexed #124

jan-auer opened this issue Jun 2, 2022 · 4 comments
Milestone

Comments

@jan-auer
Copy link
Member

jan-auer commented Jun 2, 2022

#93 improved module references by using optional 0-based usize indexes instead of requiring to subtract 1 to resolve modules. In #93 (comment), @mstange points out that same is true for PdbInternalSectionOffset::section:

it was a section index, not a module index. sections.get((section_index - 1) as usize)

The most common usage for section offsets so far was to convert them to RVAs via an OMAP. Internally, this ran the checked sub (going back to #87):

https://github.com/willglynn/pdb/blob/e1b86a2c5c8e2c20dc0be9909baecaf1fdf0aa8a/src/omap.rs#L440-L441

However, since there are other direct uses of the section offset, it would be safer to make section: Option<usize> as well.

@jan-auer
Copy link
Member Author

jan-auer commented Jun 2, 2022

The most straight-forward refactor here would be to make the section field in SectionOffset and PdbInternalSectionOffset an Option<usize>. However, a slightly nicer API could be wrapping the entire struct into an option at every usage site. This is a far more invasive refactor and slightly harder to use internally, though it could be more semantic since it clearly shows that the reference -- to whatever -- is empty.

Would someone have preferences on the API here? To sum it up:

// Option 1:
pub struct PdbInternalSectionOffset {
    pub offset: u32,
    pub section: Option<usize>,
}

pub struct LabelSymbol<'t> {
    pub offset: PdbInternalSectionOffset,
    pub flags: ProcedureFlags,
    pub name: RawString<'t>,
}

// Option 2:
pub struct PdbInternalSectionOffset {
    pub offset: u32,
    pub section: usize,
}

pub struct LabelSymbol<'t> {
    pub offset: Option<PdbInternalSectionOffset>,
    pub flags: ProcedureFlags,
    pub name: RawString<'t>,
}

@Swatinem
Copy link
Member

Swatinem commented Jun 2, 2022

A third option would be…
Something that I explained today walking over our SymCache/SourceMapCache binary formats:

How about there wouldn’t be any kind of validation/transformation when reading these (internal)offsets from the file, but rather at the time you try to dereference them?

What does it mean if a PdbInternalSectionOffset has section: None? Does this mean "current section" or is it just impossible to dereference that offset?

@jan-auer
Copy link
Member Author

jan-auer commented Jun 2, 2022

How about there wouldn’t be any kind of validation/transformation when reading these (internal)offsets from the file, but rather at the time you try to dereference them?

That's the status quo. It can be error-prone at times, because from a u16 it's not clear whether it's zero or one indexed. This was also the primary motivation for #93.

Point granted, our structs would be diverging from the physical representation with such a change -- this makes me lean towards Option 1, still. It's a good compromise between a safe interface and staying truthful to the PDB format.

What does it mean if a PdbInternalSectionOffset has section: None? Does this mean "current section" or is it just impossible to dereference that offset?

The latter, it means the reference points into the void and should not be dereferenced, from what I can see. The PDB code always subtracts 1 to go (in their language) from an Ximod to an imod. Then, they have #define imodNil ((USHORT)(-1)) and bail if they see this value. This strongly suggests that an Ximod of zero means "nil reference".

@mstange
Copy link
Collaborator

mstange commented Jun 2, 2022

Neither option is particularly compelling, but I don't have a better suggestion either. I like/dislike both options about equally. I checked the pdb-addr2line code to see if either option would lead to simpler code, but I think it wouldn't make much of a difference. For example, first I thought that Option<PdbInternalSectionOffset> would make it easier to filter out symbols I don't care about without reaching into their offset.section property, but I need to access that property anyway because I want to check whether the section is executable (and filter out symbols for non-executable sections). And since I store PdbInternalSectionOffset in a bunch of places, I thought it might be nice to have the PdbInternalSectionOffset type guarantee that the section is non-nil, but this guarantee wouldn't give me much; for example, to_rva conversions can still fail for other reasons.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants