Various improvements for Ada #24

pmderodat · 2018-03-20T09:45:13Z

Hello,

At @AdaCore, we have started to use cv2pdb to translate GCC’s DWARF output for Ada into PDB/CodeView. We found several bugs, which we tried to fix (for instance array size computation with non-zero low bound index), and also several kinds of types for which we could add translation (for instance enumeration types).

Do you think these changes can be merged? For the record, we tested them by hand on several Ada examples, and are currently trying to develop an automated test suite, probably based on Microsoft’s CDB command-line debugger. However, we haven’t checked the effect on D programs.

Thank you in advance for having a look! :-)

rainers

Wow, looks great.

Actually, the DWARF part is unlikely to be used much for D these days, as GDC lags behind and both DMD and LDC have native CodeView support.

It seems the DWARF conversion is used by a couple of other languages, too, so their users might be more affected than D users, but likely for the better :-)

rainers · 2018-03-20T20:28:24Z

src/cv2pdb.cpp


-	unsigned char* p = (unsigned char*) dfieldtype;
+	// Emit the enumerator value


Better to just call write_numeric_leaf() here.

pmderodat · 2018-03-21

Nice, I did not know write_numeric_leaf existed. :-) I’ll update this commit.
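For readers unfamiliar with CodeView numeric leaves, here is a minimal sketch of the encoding that an enumerator value needs. It is not cv2pdb's write_numeric_leaf: the names encodeNumericLeaf and appendLE are hypothetical, and only unsigned values are handled. Values below 0x8000 are stored directly as a 16-bit word; larger values are prefixed with a 16-bit tag (LF_USHORT, LF_ULONG or LF_UQUADWORD) naming the width of the payload that follows.

```cpp
#include <cstdint>
#include <vector>

// CodeView numeric leaf tags for unsigned payloads.
const uint16_t LF_USHORT    = 0x8002;
const uint16_t LF_ULONG     = 0x8004;
const uint16_t LF_UQUADWORD = 0x800a;

// Append the `count` low-order bytes of `value` in little-endian order.
static void appendLE(std::vector<uint8_t>& out, uint64_t value, int count)
{
    for (int i = 0; i < count; i++)
        out.push_back(static_cast<uint8_t>(value >> (8 * i)));
}

// Hypothetical helper (not cv2pdb's API): encode an unsigned value
// as a CodeView numeric leaf.
std::vector<uint8_t> encodeNumericLeaf(uint64_t value)
{
    std::vector<uint8_t> out;
    if (value < 0x8000) {
        // Small values are stored inline, without any tag.
        appendLE(out, value, 2);
    } else if (value <= 0xFFFF) {
        appendLE(out, LF_USHORT, 2);
        appendLE(out, value, 2);
    } else if (value <= 0xFFFFFFFFu) {
        appendLE(out, LF_ULONG, 2);
        appendLE(out, value, 4);
    } else {
        appendLE(out, LF_UQUADWORD, 2);
        appendLE(out, value, 8);
    }
    return out;
}
```

Writing a value of 0x8000 or more as a bare 16-bit word would make it indistinguishable from a tag, which is why the tagged form is needed from that threshold on.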

rainers · 2018-03-20T20:33:19Z

src/dwarf2pdb.cpp

@@ -716,7 +751,7 @@ bool CV2PDB::addDWARFProc(DWARF_InfoData& procid, DWARF_CompilationUnit* cu, DIE
 		int off = 8;

 		DIECursor prev = cursor;
-		while (cursor.readNext(id, true) && id.tag == DW_TAG_formal_parameter)
+		while (cursor.readNext(id, true))


This changes the loop and the result of prev. Was this a bug in the implementation?

pmderodat · 2018-03-21

This change (and the one right after) comes from the fact that DWARF allows a DW_TAG_subprogram DIE to have the following sequence of children:

DW_TAG_subprogram
    DW_TAG_array_type
    DW_TAG_formal_parameter
    DW_TAG_variable
    DW_TAG_formal_parameter

My understanding is that the previous code assumed that the first children had to be DW_TAG_formal_parameter DIEs, followed only by the other children (none of which could be DW_TAG_formal_parameter again). So this change and the one below are a kind of generalization: we go through all children once to look for parameters, and then a second time for lexical blocks and their variables, so that we can handle “interleaved” children.

rainers · 2018-03-23

Sounds reasonable. Thanks for the explanation.
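The two-pass traversal described in this thread can be illustrated with a small sketch. The Die, Tag and ProcSymbols types and the collectSymbols function are hypothetical stand-ins, not cv2pdb's DIECursor or DWARF_InfoData; the point is only that parameters are collected wherever they appear, and the children are then walked again from the start for everything else.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a DWARF DIE (not cv2pdb's types).
enum class Tag { FormalParameter, Variable, ArrayType, LexicalBlock };

struct Die {
    Tag tag;
    std::string name;
};

struct ProcSymbols {
    std::vector<std::string> params;
    std::vector<std::string> locals;
};

ProcSymbols collectSymbols(const std::vector<Die>& children)
{
    ProcSymbols result;

    // Pass 1: pick up every formal parameter, wherever it appears
    // among the children.
    for (const Die& die : children)
        if (die.tag == Tag::FormalParameter)
            result.params.push_back(die.name);

    // Pass 2: restart from the first child and handle local variables.
    // Lexical blocks would be descended into here as well; this sketch
    // leaves that out.
    for (const Die& die : children)
        if (die.tag == Tag::Variable)
            result.locals.push_back(die.name);

    return result;
}
```

With the child sequence quoted above (array type, parameter, variable, parameter), a loop that stops at the first non-parameter child would find no parameters at all, whereas this version finds both.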

rainers · 2018-03-20T20:35:33Z

src/dwarf2pdb.cpp

+	// In case of error, return plausible defaults
+	basetype = T_INT4;
+	lowerBound = currentDefaultLowerBound;
+	upperBound = 0;


Should upperBound default to the same value as lowerBound or lowerBound-1?

pmderodat · 2018-03-21

Good question: I’m not sure what makes most sense. I’ll go for lowerBound-1 so that by default we get an empty array, but maybe an array with one element would be useful for users facing bogus debug info… What do you think?

rainers · 2018-03-23

If it is a case of bogus debug information anyway, showing the first element of the array could be more helpful. So I'd propose upperBound = lowerBound.

pmderodat · 2018-03-23

Thank you! I’ll update this part accordingly.
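To make the outcome of this thread concrete, here is a hedged sketch of the fallback and of an element count that respects a non-zero lower bound. The names Subrange, defaultLowerBound, fallbackSubrange and elementCount are hypothetical and do not come from cv2pdb. The DW_LANG_* codes are taken from the DWARF specification; treating every language not listed as 0-based is a simplification made for this example.

```cpp
#include <cstdint>

// DW_LANG_* codes from the DWARF specification (subset).
const int DW_LANG_Ada83     = 0x0003;
const int DW_LANG_Fortran77 = 0x0007;
const int DW_LANG_Fortran90 = 0x0008;
const int DW_LANG_Pascal83  = 0x0009;
const int DW_LANG_Ada95     = 0x000d;
const int DW_LANG_Fortran95 = 0x000e;

// Default lower bound used when a subrange has no DW_AT_lower_bound.
// Assumption: languages not listed here are treated as 0-based.
int64_t defaultLowerBound(int language)
{
    switch (language) {
    case DW_LANG_Ada83:
    case DW_LANG_Ada95:
    case DW_LANG_Fortran77:
    case DW_LANG_Fortran90:
    case DW_LANG_Fortran95:
    case DW_LANG_Pascal83:
        return 1;
    default:
        return 0;
    }
}

struct Subrange {
    int64_t lowerBound;
    int64_t upperBound;  // inclusive
};

// Fallback for undecodable debug info: a one-element range, as agreed
// in the review (upperBound = lowerBound).
Subrange fallbackSubrange(int language)
{
    int64_t lb = defaultLowerBound(language);
    return Subrange{lb, lb};
}

// Number of elements, valid for any lower bound; empty when the
// upper bound is below the lower bound.
int64_t elementCount(const Subrange& r)
{
    return r.upperBound < r.lowerBound ? 0 : r.upperBound - r.lowerBound + 1;
}
```

An Ada array indexed 5 .. 9 therefore has 5 elements, where a computation that assumes a zero lower bound would report 10.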


pmderodat · 2018-03-21T11:05:25Z

I hope it’s for the better, indeed. ;-) Thank you for your review! I’ve just pushed modified commits following it. As I’ve done some work since yesterday, I also added a couple of commits on top of it.

I’m curious: do you know which languages/compilers have users that rely on cv2pdb? As for D, if I understand correctly, DMD/LDC already generate CodeView info in various PE sections, and so in this context, cv2pdb’s only job is to extract this info to store it as a PDB file, right?

pmderodat · 2018-03-21T15:48:19Z

(I just edited the last commit as more testing revealed a memory corruption… sorry about this! Hopefully automated testing will prevent this kind of mistake…)

rainers · 2018-03-23T07:40:46Z

> do you know which languages/compilers have users that rely on cv2pdb?

It seems to be used by a number of people with mingw, C++ and Haskell. IIRC it has also been considered for Rust (before LLVM got CodeView support).

> As for D, if I understand correctly, DMD/LDC already generate CodeView info in various PE sections, and so in this context, cv2pdb’s only job is to extract this info to store it as a PDB file, right?

No, DMD and LDC can generate COFF object files with embedded CodeView information that the MS linker can extract and combine to generate the PDB file.
cv2pdb is still used for DMD's ancient additional tool chain based on OMF object files and the Digital Mars linker. dmd/optlink emit CodeView 4 debug info which is not well supported by current debuggers and needs to be converted to something more recent.

pmderodat · 2018-03-23T09:10:28Z

That’s interesting. Thank you for the explanation!


It seems that UDTs (User Defined Types) are required to have names, otherwise the resulting PDB type stream is considered to be corrupted. So just like what we do for structure types, provide a default type name for enumeration types.

pmderodat · 2018-03-23T18:08:06Z

Alright, so I’ve added a PDB corruption fix on top of the stack of fixes, plus the small fix (and then a typo fix…) mentioned in review. I think I’ll stop here, sorry if the back-and-forth complicated your review. :-)

rainers · 2018-03-30T07:51:59Z

I see a nice improvement for enums in the little test case that I have, but can't really test it a lot as my small test suite already fails with current master, so I'm going to merge this.
After that, I'll fix up a few signed/unsigned warnings regarding -1u by changing it to ~0.

Thanks a lot for your contribution.

pmderodat · 2018-03-30T08:37:19Z

Great, thank you very much for your review and merge!

rainers reviewed Mar 20, 2018

pmderodat added 5 commits March 21, 2018 11:55

CV2PDB::createTypes: handle out of order formal DIEs in subprograms

8861da3

CV2PDB::createTypes: after formals processing, reset cursor to beginning

9d9c686

This will make it possible to process DIE's that are interleaved with DW_TAG_formal_parameter ones.

DIECursor::readNext: use -1u for missing DW_AT_ranges attributes

6bf2602

0 is actually a valid .debug_ranges offset, so use something really unlikely for the "no value" special constant instead.

CV2PDB: store the current unit's base address

6c71972

CV2PDB::addDWARFProc: turn uncontiguous ranges into smallest cvring one

11b6db8

This is a hack to work around something that seems to be missing in CodeView: lexical blocks with non-contiguous address ranges.
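The last commits in this group can be sketched as follows. This is an illustration under assumptions, not the code from the commits: AddrRange, NO_RANGES, hasRanges and coveringRange are hypothetical names. It shows a "no value" sentinel that cannot collide with offset 0, and the collapse of several disjoint ranges into the smallest single range that covers them all.

```cpp
#include <cstdint>
#include <vector>

// Sentinel meaning "no DW_AT_ranges attribute". 0 cannot be used because
// it is a valid .debug_ranges offset.
const uint32_t NO_RANGES = ~0u;

bool hasRanges(uint32_t rangesOffset)
{
    return rangesOffset != NO_RANGES;
}

// Half-open address range [low, high).
struct AddrRange {
    uint64_t low;
    uint64_t high;
};

// Compute the smallest single range covering all given ranges.
// Returns false (leaving `out` untouched) when there is nothing to cover.
bool coveringRange(const std::vector<AddrRange>& ranges, AddrRange& out)
{
    if (ranges.empty())
        return false;
    AddrRange result = ranges.front();
    for (const AddrRange& r : ranges) {
        if (r.low < result.low)
            result.low = r.low;
        if (r.high > result.high)
            result.high = r.high;
    }
    out = result;
    return true;
}
```

The covering range also includes the gaps between the original ranges, so a debugger may consider the block active at addresses that do not belong to it; that imprecision is the price of the workaround.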

pmderodat force-pushed the master branch from 6896f24 to 2e93aa9 Compare March 21, 2018 11:00

pmderodat force-pushed the master branch from 2e93aa9 to ccbf8e3 Compare March 21, 2018 15:47

pmderodat force-pushed the master branch from ccbf8e3 to ba34538 Compare March 23, 2018 09:14

pmderodat added 17 commits March 23, 2018 19:02

CV2PDB::addDWARFArray: refactor to get lower bound info from DWARF

7e57a6e

This also makes room to get the index type information, but this is not implemented yet.

CV2PDB::getDWARFTypeSize: fix computation from lower/upper bounds

4bdbb50

DWARF_InfoData: track the DW_AT_language attribute

4a20baa

CV2PDB: keep track of the default lower bound for the curret unit

f9c5437

CV2PDB::getDWARFSubrangeInfo: use language-specific default lower bound

dca8f5d

CV2PDB::getDWARFSubrangeInfo: use an appropriate base type

dcf8249

CV2PDB::createTypes: materialize subranges as modifiers for base types

c31c964

CV2PDB::addDWARFEnum: new, first attempt at enum types translation

322fcc1

CV2PDB::addDWARFBasicType: split primitive type handling out

321b8b7

This isolates the part of the method that gets a type ID for a primitive type, so that it can be re-used elsewhere, in particular in enum translation.

CV2PDB::addDWARFEnumType: use getDWARFBasicType for base type translation

bd4e49e

Ignore, but still allow block and string forms for DW_AT_const_value

5d21c12

Fix pasto: restore DW_TAG_subroutine_type handling as opaque type

290170c

Do not crash when handling anonymous entities

4b62d8f

C allows some types like enums or structs to be anonymous. Process them as if they had empty names.

Reduce complexity of best CFA lookups

460c6dd

This turns a linear lookup into a logarithmic binary search, which greatly speeds up DWARF to PDB conversion for big programs.

Fix handling of discontinuous address ranges on X64

96e74de

dwarflines: fix last insn. address computation for DW_LNE_end_sequence

922fdb8

Fix LF_ENUMERATE emission for values > 0x8000

e1129b2
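As an illustration of the "Reduce complexity of best CFA lookups" commit in the list above, here is a minimal sketch of a logarithmic address lookup. It assumes the entries are sorted by start address and do not overlap; CfaEntry and findEntry are hypothetical names, and the actual data structure in cv2pdb may differ.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical record describing the CFA rule valid on [start, end).
struct CfaEntry {
    uint64_t start;
    uint64_t end;
    int cfaOffset;
};

// Find the entry covering `pc`, or nullptr if there is none.
// Precondition: `entries` is sorted by `start` and non-overlapping.
const CfaEntry* findEntry(const std::vector<CfaEntry>& entries, uint64_t pc)
{
    // First entry whose start is strictly greater than pc.
    auto it = std::upper_bound(
        entries.begin(), entries.end(), pc,
        [](uint64_t addr, const CfaEntry& e) { return addr < e.start; });
    if (it == entries.begin())
        return nullptr;  // pc lies before the first entry
    --it;                // last entry with start <= pc
    return pc < it->end ? &*it : nullptr;
}
```

Each lookup costs O(log n) comparisons instead of a scan over all entries, which matters when the lookup runs once per instruction range in a large program.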

pmderodat added 4 commits March 23, 2018 19:02

CV2PDB::appendModifierType: uncomment code to add padding

b54d04b

This prevents the generation of corrupt TPI streams, as padding is required at the end of leaves.

CV2PDB::addDWARFEnum: fix handling of big enumerated types

d3e00de

cv2pdb.h: add missing #include <stdint.h> to build with VS 12.0

f505bf6
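The padding requirement mentioned for CV2PDB::appendModifierType in this group can be sketched as below. This assumes the record buffer starts on a 4-byte boundary, and padLeaf is a hypothetical name rather than a cv2pdb function. CodeView pad bytes are 0xF0 plus the number of bytes remaining up to the boundary, so a record that is three bytes short ends in 0xF3 0xF2 0xF1.

```cpp
#include <cstdint>
#include <vector>

// Pad a type record to a 4-byte boundary with CodeView LF_PAD bytes.
// Assumption: `record` begins at a 4-byte aligned offset, so its size
// alone determines how much padding is needed.
void padLeaf(std::vector<uint8_t>& record)
{
    while (record.size() % 4 != 0) {
        // Bytes still missing to reach the next 4-byte boundary (1..3).
        uint8_t remaining = static_cast<uint8_t>(4 - record.size() % 4);
        record.push_back(static_cast<uint8_t>(0xF0 + remaining));
    }
}
```

A reader of the type stream that expects every leaf to end on a 4-byte boundary would otherwise misparse the following record, which is consistent with the corruption described in the commit message.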

pmderodat force-pushed the master branch from 3867395 to f505bf6 Compare March 23, 2018 18:05

rainers closed this Mar 30, 2018

rainers reopened this Mar 30, 2018

rainers merged commit 74615ce into rainers:master Mar 30, 2018

rainers mentioned this pull request Apr 19, 2018

debug gcc #30
