Padding between debug_str_offsets is not supported by DWARF verifier #81558

felipepiovezan · 2024-02-13T00:40:06Z

If we run the verifier with --all on llvm/test/DebugInfo/X86/dwarfdump-str-offsets.s,
it will claim the input is invalid because it adds padding between debug_str_offsets:

        .long .debug_str_offsets_segment0_end-.debug_str_offsets_base0+4
        .short 5    # DWARF version
        .short 0    # Padding
.debug_str_offsets_base0:
        .long str_producer
        .long str_CU1
        .long str_CU1_dir
        .long str_Subprogram
        .long str_Variable1
        .long str_Variable2
        .long str_Variable3
.debug_str_offsets_segment0_end:
# A 4-byte gap.
        .long 0

Now, if we interpret the DWARF spec literally, it doesn't talk about padding.
It really makes it sound like offset_of_dwarf_version + length must point to
the next contribution. Indeed, this is what the verifier expects (and why this
test fails to verify).

Dwarfdump, on the other hand, takes a different approach. It calls
collectContributionData(Units), which scans all compile units, extracts their
DW_AT_str_offsets_base and uses those to build a list of "this is where the
different debug_str_offset contributions start". It even "subtracts" the header
size from those to find the headers.

So one utility acknowledges (and prints!) those gaps because it uses an indirect way
to find where they are, but the other rejects them. How to reconcile these?

The text was updated successfully, but these errors were encountered:

llvmbot · 2024-02-13T00:40:22Z

@llvm/issue-subscribers-debuginfo

Author: Felipe de Azevedo Piovezan (felipepiovezan)

If we run the verifier with --all on `llvm/test/DebugInfo/X86/dwarfdump-str-offsets.s`, it will claim the input is invalid because it adds padding between debug_str_offsets:

        .long .debug_str_offsets_segment0_end-.debug_str_offsets_base0+4
        .short 5    # DWARF version
        .short 0    # Padding
.debug_str_offsets_base0:
        .long str_producer
        .long str_CU1
        .long str_CU1_dir
        .long str_Subprogram
        .long str_Variable1
        .long str_Variable2
        .long str_Variable3
.debug_str_offsets_segment0_end:
# A 4-byte gap.
        .long 0

Now, if we interpret the DWARF spec literally, it doesn't talk about padding.
It really makes it sound like offset_of_dwarf_version + length must point to
the next contribution. Indeed, this is what the verifier expects (and why this
test fails to verify).

Dwarfdump, on the other hand, takes a different approach. It calls
collectContributionData(Units), which scans all compile units, extracts their
DW_AT_str_offsets_base and uses those to build a list of "this is where the
different debug_str_offset contributions start". It even "subtracts" the header
size from those to find the headers.

So one utility acknowledges (and prints!) those gaps because it uses an indirect way
to find where they are, but the other rejects them. How to reconcile these?

The current behavior of --verify is that it only verifies debug_info, debug_abbrev and debug_names. This seems fairly arbitrary and might have been unintentional, as originally the absence of any section flags implied "all". This patch changes the behavior so that the verifier now verifies everything by default. It revealed two tests that had potentially invalid DWARF: 1. dwarfdump-str-offsets.s is adding padding between two debug_str_offset contributions. The standard does not explicitly allow this behavior. See issue llvm#81558 2. dwarf5-macro.test uses a checked-in binary that has invalid debug_str_offsets. One of its entries points to the _middle_ of the string section: error: .debug_str_offsets: contribution 0x0: index 0x4: invalid string offset *0x18 == 0x455D, is neither zero nor immediately following a null character If we look at the closest offset to 0x455D in debug_str: ``` 0x0000454e: "__SLONG32_TYPE int" ``` 0x455D points to "int".

llvmbot · 2024-02-13T01:42:34Z

@llvm/issue-subscribers-tools-llvm-dwarfdump

Author: Felipe de Azevedo Piovezan (felipepiovezan)

If we run the verifier with --all on `llvm/test/DebugInfo/X86/dwarfdump-str-offsets.s`, it will claim the input is invalid because it adds padding between debug_str_offsets:

        .long .debug_str_offsets_segment0_end-.debug_str_offsets_base0+4
        .short 5    # DWARF version
        .short 0    # Padding
.debug_str_offsets_base0:
        .long str_producer
        .long str_CU1
        .long str_CU1_dir
        .long str_Subprogram
        .long str_Variable1
        .long str_Variable2
        .long str_Variable3
.debug_str_offsets_segment0_end:
# A 4-byte gap.
        .long 0

Now, if we interpret the DWARF spec literally, it doesn't talk about padding.
It really makes it sound like offset_of_dwarf_version + length must point to
the next contribution. Indeed, this is what the verifier expects (and why this
test fails to verify).

Dwarfdump, on the other hand, takes a different approach. It calls
collectContributionData(Units), which scans all compile units, extracts their
DW_AT_str_offsets_base and uses those to build a list of "this is where the
different debug_str_offset contributions start". It even "subtracts" the header
size from those to find the headers.

So one utility acknowledges (and prints!) those gaps because it uses an indirect way
to find where they are, but the other rejects them. How to reconcile these?

dwblaikie · 2024-02-13T02:28:17Z

Eh - there's been some chatter in the DWARF committee (@pogo59 ) about documenting whether sections can contain gaps, or padding, and how to implement that padding - and Paul's mostly been advocating, I tihnk, for saying that you can't have arbitrary gaps in sections and you should use encoding-valid representations of padding (exactly what is is different in different sections).

Historically, padding has been allowed, somewhat incidentally (because you only need to read the .debug_info (and debug_aranges, and debug_pubnames maybe) sections directly, and everything else is read indirectly) - sections read indirectly naturally support unspecified gaps/padding/garbage because there's no reason for a reader to read those bytes unless it knows some secret contract that suggests it should read it.

Binutils dwarfdump has been pretty good about not dumping these gaps - llvm-dwarfdump less so, I think it does read some sections end-to-end even when they are only needed to be accessed indirectly. I'm surprised we do the right thing with debug_str_offsets - maybe due to the DWARFv4 DWARFv5 encoding changes in debug_str_offsets.dwo, not sure... shrug

I don't think it's super important either way, and I'm not sure my opinion here is the popular one, probably not, I'm fine with not allowing gaps - wouldn't especially mind if llvm-dwarfdump dumped the debug_str_offsets section without consulting an units, just parsing table after table.

felipepiovezan · 2024-02-13T16:20:17Z

Thank you for the background, that helps a lot!

because there's no reason for a reader to read those bytes unless it knows some secret contract that suggests it should read it.

Yup, this makes sense, I guess it is only the verifier / dump tools that are special in that way.

I don't think it's super important either way, and I'm not sure my opinion here is the popular one, probably not, I'm fine with not allowing gaps

No strong preferences either way from me too, but it would be nice to hear if there are any good use cases for padding.

jmorse · 2024-02-13T16:32:25Z

Drive-by comment -- something similar came up a long time ago to do with padding in .debug_loc, recorded now at #42635. The tl;dr is that GNU ld will dead-strip code and install holes in .debug_loc, that readelf/dwarfdump can cope with, but the LLVM tools can't. I think the conclusion at the time was that it's just not a section you can read without also reading .debug_info. Less relevant to this issue as DWARF5 uses .debug_loclists instead, but this is certainly an existing category of limitation of the LLVM tools.

The current behavior of --verify is that it only verifies debug_info, debug_abbrev and debug_names. This seems fairly arbitrary and might have been unintentional, as originally the absence of any section flags implied "all". This patch changes the behavior so that the verifier now verifies everything by default. It revealed two tests that had potentially invalid DWARF: 1. dwarfdump-str-offsets.s is adding padding between two debug_str_offset contributions. The standard does not explicitly allow this behavior. See issue #81558 2. dwarf5-macro.test uses a checked-in binary that has invalid debug_str_offsets. One of its entries points to the _middle_ of the string section: error: .debug_str_offsets: contribution 0x0: index 0x4: invalid string offset *0x18 == 0x455D, is neither zero nor immediately following a null character If we look at the closest offset to 0x455D in debug_str: ``` 0x0000454e: "__SLONG32_TYPE int" ``` 0x455D points to "int".

felipepiovezan added the debuginfo label Feb 13, 2024

felipepiovezan mentioned this issue Feb 13, 2024

[DWARFDump] Make --verify handle all sections by default #81559

Merged

EugeneZelenko added tools:llvm-dwarfdump and removed debuginfo labels Feb 13, 2024

felipepiovezan changed the title ~~Padding between debug_str_offsets is not support by DWARF verifier~~ Padding between debug_str_offsets is not supported by DWARF verifier Feb 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Padding between debug_str_offsets is not supported by DWARF verifier #81558

Padding between debug_str_offsets is not supported by DWARF verifier #81558

felipepiovezan commented Feb 13, 2024

llvmbot commented Feb 13, 2024

llvmbot commented Feb 13, 2024

dwblaikie commented Feb 13, 2024

felipepiovezan commented Feb 13, 2024

jmorse commented Feb 13, 2024

Padding between debug_str_offsets is not supported by DWARF verifier #81558

Padding between debug_str_offsets is not supported by DWARF verifier #81558

Comments

felipepiovezan commented Feb 13, 2024

llvmbot commented Feb 13, 2024

llvmbot commented Feb 13, 2024

dwblaikie commented Feb 13, 2024

felipepiovezan commented Feb 13, 2024

jmorse commented Feb 13, 2024