Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various improvements for Ada #24

Merged
merged 26 commits into from
Mar 30, 2018
Merged

Various improvements for Ada #24

merged 26 commits into from
Mar 30, 2018

Conversation

pmderodat
Copy link
Contributor

Hello,

At @AdaCore, we have started to use cv2pdb to translate GCC’s DWARF output for Ada into PDB/CodeView. We found several bugs, which we tried to fix (for instance array size computation with non-zero low bound index), and also several kinds of types for which we could add translation (for instance enumeration types).

Do you think these changes can be merged? For the record, we tested them by hand on several Ada examples, and are currently trying to develop an automated testsuite, probably based on Microsoft’s CDB command-line debugger. However we haven’t checked the effect on D programs.

Thank you in advance for having a look! :-)

Copy link
Owner

@rainers rainers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, looks great.

Actually, the DWARF part is unlikely to be used much for D these days, as GDC lags behind and both DMD and LDC have native CodeView support.

It seems the DWARF conversion is used by a couple of other languages, too, so their users might be more affected than D users, but likely for the better :-)

src/cv2pdb.cpp Outdated

unsigned char* p = (unsigned char*) dfieldtype;
// Emit the enumerator value
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better just call write_numeric_leaf() here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I did not know write_numeric_leaf existed. :-) I’ll update this commit.

@@ -716,7 +751,7 @@ bool CV2PDB::addDWARFProc(DWARF_InfoData& procid, DWARF_CompilationUnit* cu, DIE
int off = 8;

DIECursor prev = cursor;
while (cursor.readNext(id, true) && id.tag == DW_TAG_formal_parameter)
while (cursor.readNext(id, true))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the loop and the result of prev. Was this a bug in the implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change (and the one right after) come from the fact that DWARF allows a DW_TAG_subprogram DIE to have the following sequence of children:

DW_TAG_subprogram
    DW_TAG_array_type
    DW_TAG_formal_parameter
    DW_TAG_variable
    DW_TAG_formal_parameter

My understanding is that the previous code assumed that the first children had to be DW_TAG_formal_parameter and only then the other children (none of which could be DW_TAG_formal_parameter again). So this change and the one below are a kind of generalization: we go through all children once to look for parameters, and then a second time for lexical blocks and their variables, so that we can handle “inerleaves” children.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable. Thanks for the explanation,

// In case of error, return plausible defaults
basetype = T_INT4;
lowerBound = currentDefaultLowerBound;
upperBound = 0;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should upperBound default to the same value as lowerBound or lowerBound-1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question: I’m not sure what makes most sense. I’ll go for lowerBound-1 so that by default we get an empty array, but maybe an array with one element would be useful for users facing bogous debug info… What do you think?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is a case of bogus debug information anyway, showing the first element of the array could be more helpful. So I'd propose upperBound = lowerBound.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I’ll update this part accordingly.

This will make it possible to process DIE's that are interleaved with
DW_TAG_formal_parameter ones.
0 is actually a valid .debug_ranges offset, so use something really
unlikely for the "no value" special constant instead.
This is a hack to workaround something that seems to be missing in
CodeView: lexical blocks with non-contiguous address ranges.
@pmderodat
Copy link
Contributor Author

I hope it’s for the better, indeed. ;-) Thank you for your review! I’ve just pushed modified commits following it. As I’ve done some work since yesterday, I also added a couple of commits on top of it.

I’m curious: do you know which languages/compilers have users that rely on cv2pdb? As for D, so if I understand correctly, DMD/LDC already generate CodeView info in various PE sections, and so in this context, cv2pdb’s only job is to extract this info to store it as a PDB file, right?

@pmderodat
Copy link
Contributor Author

(I just edited the last commit as more testing revealed a memory corruption… sorry about this! Hopefully automated testing will prevent this kind of mistake…)

@rainers
Copy link
Owner

rainers commented Mar 23, 2018

do you know which languages/compilers have users that rely on cv2pdb?

It seems used by a number of people with mingw and C++ and Haskell. IIRC it has also has been considered with Rust (before LLVM got CodeView support).

As for D, so if I understand correctly, DMD/LDC already generate CodeView info in various PE sections, and so in this context, cv2pdb’s only job is to extract this info to store it as a PDB file, right?

No, DMD and LDC can generate COFF object files with embedded CodeView information that the MS linker can extract and combine to generate the PDB file.
cv2pdb is still used for DMD's ancient additional tool chain based on OMF object files and the Digital Mars linker. dmd/optlink emit CodeView 4 debug info which is not well supported by current debuggers and needs to be converted to something more recent.

@pmderodat
Copy link
Contributor Author

That’s interesting. Thank you for the explanation!

This also makes room to get the index type information, but this is not
implemented yet.
This isolates the part of the method that gets a type ID for a primitive
type, so that it can be re-used elsewhere, in particular in enum
translation.
C allows some types like enums or structs to be anonymous. Process them
as if they had empty names.
This turns a linear lookup into a logarithmic binary search, which
improves a lot DWARF to PDB conversion for big programs.
This prevents the generation of corrupt TPI streams, as padding is
required at the end of leaves.
It seems that UDTs (User Defined Types) are required to have names,
otherwise the resulting PDB type stream is considered to be corrupted.
So just like what we do for structure types, provide a default type name
for enumeration types.
@pmderodat
Copy link
Contributor Author

Alright, so I’ve added a PDB corruption fix on top of the stack of fixes, plus the small fix (and then a typo fix…) mentionned in review. I think I’ll stop here, sorry if the back-and-forth complexified your review. :-)

@rainers
Copy link
Owner

rainers commented Mar 30, 2018

I see a nice improvement for enums in the little test case that I have, but can't really test it a lot as my small test suite already fails with current master, so I'm going to merge this.
I'll fix up a few signed/unsigned warnings regarding -1u after that changing it to ~0.

Thanks a lot for your contribution.

@rainers rainers closed this Mar 30, 2018
@rainers rainers reopened this Mar 30, 2018
@rainers rainers merged commit 74615ce into rainers:master Mar 30, 2018
@pmderodat
Copy link
Contributor Author

Great, thank you very much for your review and merge!

@rainers rainers mentioned this pull request Apr 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants