-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Various improvements for Ada #24
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, looks great.
Actually, the DWARF part is unlikely to be used much for D these days, as GDC lags behind and both DMD and LDC have native CodeView support.
It seems the DWARF conversion is used by a couple of other languages, too, so their users might be more affected than D users, but likely for the better :-)
src/cv2pdb.cpp
Outdated
|
||
unsigned char* p = (unsigned char*) dfieldtype; | ||
// Emit the enumerator value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better just call write_numeric_leaf() here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I did not know write_numeric_leaf
existed. :-) I’ll update this commit.
@@ -716,7 +751,7 @@ bool CV2PDB::addDWARFProc(DWARF_InfoData& procid, DWARF_CompilationUnit* cu, DIE | |||
int off = 8; | |||
|
|||
DIECursor prev = cursor; | |||
while (cursor.readNext(id, true) && id.tag == DW_TAG_formal_parameter) | |||
while (cursor.readNext(id, true)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes the loop and the result of prev
. Was this a bug in the implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change (and the one right after) come from the fact that DWARF allows a DW_TAG_subprogram
DIE to have the following sequence of children:
DW_TAG_subprogram
DW_TAG_array_type
DW_TAG_formal_parameter
DW_TAG_variable
DW_TAG_formal_parameter
My understanding is that the previous code assumed that the first children had to be DW_TAG_formal_parameter
and only then the other children (none of which could be DW_TAG_formal_parameter again
). So this change and the one below are a kind of generalization: we go through all children once to look for parameters, and then a second time for lexical blocks and their variables, so that we can handle “inerleaves” children.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable. Thanks for the explanation,
src/dwarf2pdb.cpp
Outdated
// In case of error, return plausible defaults | ||
basetype = T_INT4; | ||
lowerBound = currentDefaultLowerBound; | ||
upperBound = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should upperBound
default to the same value as lowerBound
or lowerBound-1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question: I’m not sure what makes most sense. I’ll go for lowerBound-1
so that by default we get an empty array, but maybe an array with one element would be useful for users facing bogous debug info… What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is a case of bogus debug information anyway, showing the first element of the array could be more helpful. So I'd propose upperBound = lowerBound
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I’ll update this part accordingly.
This will make it possible to process DIE's that are interleaved with DW_TAG_formal_parameter ones.
0 is actually a valid .debug_ranges offset, so use something really unlikely for the "no value" special constant instead.
This is a hack to workaround something that seems to be missing in CodeView: lexical blocks with non-contiguous address ranges.
I hope it’s for the better, indeed. ;-) Thank you for your review! I’ve just pushed modified commits following it. As I’ve done some work since yesterday, I also added a couple of commits on top of it. I’m curious: do you know which languages/compilers have users that rely on cv2pdb? As for D, so if I understand correctly, DMD/LDC already generate CodeView info in various PE sections, and so in this context, cv2pdb’s only job is to extract this info to store it as a PDB file, right? |
(I just edited the last commit as more testing revealed a memory corruption… sorry about this! Hopefully automated testing will prevent this kind of mistake…) |
It seems used by a number of people with mingw and C++ and Haskell. IIRC it has also has been considered with Rust (before LLVM got CodeView support).
No, DMD and LDC can generate COFF object files with embedded CodeView information that the MS linker can extract and combine to generate the PDB file. |
That’s interesting. Thank you for the explanation! |
This also makes room to get the index type information, but this is not implemented yet.
This isolates the part of the method that gets a type ID for a primitive type, so that it can be re-used elsewhere, in particular in enum translation.
C allows some types like enums or structs to be anonymous. Process them as if they had empty names.
This turns a linear lookup into a logarithmic binary search, which improves a lot DWARF to PDB conversion for big programs.
This prevents the generation of corrupt TPI streams, as padding is required at the end of leaves.
It seems that UDTs (User Defined Types) are required to have names, otherwise the resulting PDB type stream is considered to be corrupted. So just like what we do for structure types, provide a default type name for enumeration types.
Alright, so I’ve added a PDB corruption fix on top of the stack of fixes, plus the small fix (and then a typo fix…) mentionned in review. I think I’ll stop here, sorry if the back-and-forth complexified your review. :-) |
I see a nice improvement for enums in the little test case that I have, but can't really test it a lot as my small test suite already fails with current master, so I'm going to merge this. Thanks a lot for your contribution. |
Great, thank you very much for your review and merge! |
Hello,
At @AdaCore, we have started to use cv2pdb to translate GCC’s DWARF output for Ada into PDB/CodeView. We found several bugs, which we tried to fix (for instance array size computation with non-zero low bound index), and also several kinds of types for which we could add translation (for instance enumeration types).
Do you think these changes can be merged? For the record, we tested them by hand on several Ada examples, and are currently trying to develop an automated testsuite, probably based on Microsoft’s CDB command-line debugger. However we haven’t checked the effect on D programs.
Thank you in advance for having a look! :-)