-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement DWARF support #51
Conversation
if args.dwarf { | ||
// DWARF relocations | ||
obj.link_with( | ||
Link { from: ".debug_info", to: "main", at: 0}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is sort of what I meant by “understanding the semantics”; ideally you shouldn’t have to specify the explicit relocation type, the backend should figure it out depending on the kind of declaration and who/what it’s relocating against.
Unfortunately, while I’m for adding this in order to move debug along (which I’m super excited for!!!) it’s not going to work generally for Mach-o, etc.
If the current system doesn’t work or you don’t have the expression you need, then I think we need to add it :) so historically RelocOverride was added by Pat so they could do some complicated custom relocations that faerie didn’t support at the time. I worried it might get used again, so I’ll just blame him 🤣
Anyway: I’m fine with this used for now for prototyping and just getting something working, but I’d also like to see a robust cross platform solution moving forward as well. I’d like to hear what your exact needs are here, it seems like you need explicit addends (which I thought we had) plus a more fine grained control over which which relocations appear against which symbols?
I’m sure there’s a consistency we can exploit which we can use to encode as our “backend” semantics, or faerie easy mode, as it were :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The relocations needed for DWARF are fairly simple: either 32-bit or 64-bit absolute relocations, very often with an addend. Deciding whether it is 32-bit or 64-bit is mostly straightforward, and can be done based on the from/to of the link. There's one exception that I'm aware of: DW_FORM_ref_addr
in DWARF version 2. I don't see any way to support that without extra info from the user.
However, all this is a trivial decision in the DWARF generation, and we already need to do it there. So I don't see the value in trying to handle that decision in both gimli and faerie. It's much easier if gimli decides itself. What I would like to see though is for gimli to be able to specify this in a way that is independent of ELF/Mach-O, without it needing to specify raw relocation values. That is, we have some generic relocation abstraction which faerie translates into ELF/Mach-O relocation types.
I need user defined addends, and link
doesn't support that. It does the opposite, and tries to figure out for itself what the addend needs to be. One way this could work is that I store the addend as an implicit addend in the DWARF section data, and if faerie needs an explicit addend then it could read it from the data.
My understanding of RelocOverride
is that cranelift now uses that exclusively, for reasons similar to why I want to handle relocations in DWARF: cranelift already has to decide what the relocation is when generating the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I would like to see though is for gimli to be able to specify this in a way that is independent of ELF/Mach-O
What I'll probably do for this in gimli is simply call a method on a DebugSink
trait or simliar, so there won't be a need for an additional abstraction there (eg cranelift can use its own internal representation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've got some working code using gimli with this. gimli will have a Writer
trait with methods to write addresses and offsets. These methods have parameters for: the symbol/section that is being referenced, the addend, and the size. The from section/offset is implicit. All this is enough to create a relocation.
For an example implementation, here's a Writer implementation, which stores the relocation info for later, and the faerie links are created here.
I'm open to suggestions on how to improve the faerie API to support this, instead of using link_with
. The problem with the existing link
API is it doesn't support addends, and it can't always know what size relocation to use.
Otherwise, should I proceed with implementing the Mach-O link_with
support for debug sections?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just gave a cursory examination, haven't had a chance to review fully yet, but super excited for this!
So it seems the primary relocations used are 4/8 _64
and _32
? If this is the case, and you need addends, perhaps we could add a link_debug(Link {}, Debug { addend, size/kind })
or something to this effect, which allows you to specify enough semantic information (without explicit raw relocations), which allows the backend to emit the x86/arm (when we get it) _64 or _32 for ELF, and the similar required relocations for mach-o? I'll have to check what the typical relocations are for mach-o but it's probably pretty uniform, and i don't imagine there will be a lot of issues abstracting it, compared to other things that were already abstracted.
One question is if we want to abstract into the current system, or for time being add a link_debug
as I mentioned above. Since debug relocations are dfifferent and kind of special, I'm ok with that (especially if it makes things easier, and gets us farther away from using the RelocOverride
api. Dunno, need to investigate more :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mach-O uses X86_64_RELOC_UNSIGNED
/GENERIC_RELOC_VANILLA
etc. Adding a link_debug
sounds like a good path forward that keeps the machine specific details in faerie. I'll try that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed a commit to implement this, but instead of adding link_debug
, I've kept using link_with
and replaced RelocOverride
with an enum Reloc
that has Auto
/Raw
/Debug
variants. So it's a breaking change but I think this is a cleaner API and it's simpler internally too.
Still no Mach-O support, but at this stage I think that belongs in another PR, since it's going to be a larger change in general (it needs to support implicit addends, this was already broken for RelocOverride
), and it's not something I can properly test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me!
So we don’t have to do this now, and this is really tedious and annoying, but I would really want to see actual debug bytes in the reference/prototype object file emitter in faerie. Similar to the executable and data bits, they are actually executable so we can emit, link, and verify everything works.
For debug info it could be as simple as a basic type or whatever, maybe so far as loading gdb, and doing ptype on say one of the c strings, etc.
What I normally do here is create a c binary, then dump the bytes (and relocations) and then put them in the prototype.
It is tedious but it’s really great way to verify everything is working and I think we’ll have to have something like that soon for debug.
Otherwise this is great work and I really like the new reloc/reloc override api :)
// only debug sections should link to debug sections | ||
(_, &Decl::DebugSection {..}) => panic!("invalid DebugSection link"), | ||
// various static function pointers in the .data section | ||
(&Decl::Data {..}, &Decl::Function {..}) => (true, X86_64_RELOC_UNSIGNED), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One day I hope to see this same logic but replaced with a to_arch
function for the actual relocations, and what a glorious day that will be.
Also this isn’t super related but I’d like to see the testing infrastructure updated at some point. This PR makes a lot of internal changes and we’ve already seen subtle breakage occur; I’m not sure what the best solution is but I’d almost like a emitting and linking and executing suite. I dunno |
This doesn't seem to break anything for me on linux. |
Can you share the C source you used? I'd like to keep it consistent with what's there. |
I've added a couple of commits to reuse the section creation code. I think this is ready now. |
I should check this in, but I'm pretty sure it's currently: #include<stdio.h>
int STATIC = 0xcafebabe;
int* STATIC_REF = &STATIC;
extern int deadbeef();
int main(){
printf("deadbeef: 0x%x - %d\n", deadbeef(), *STATIC_REF);
return 0;
} I compiled it originally with
|
Note: this also changes the indices used for self.sections and self.code, but these don't appear to be used anywere.
Note: this no longer sets "section.sh_entsize = 1" for data
Rebased to fix conflict. |
@sunfishcode did you want to review ? I don’t have any more feedback other than wanting to protect “auto mode” capabilities in faerie, and moving any new code away from the “raw api”, which seems to be proceeding well in new direction (hopefully the Mach o version of this PR maintains this). Also to be clear part of my motivations for having an auto mode/easy mode is I want faerie to be a backend not just for compiler writers, but anyone who wants to generate object files, for whatever reason, eg binary analysis, evolutionary programs, something like objcopy, etc., but also just because I think it’s technically possible and an interesting approach. This is still the case in faerie, but in case my motivations for why weren’t clear, that’s why :) |
@sunfishcode last chance I’m going to merge this tonight probably unless I hear from you. @philipc this is ready to go, yes? |
Yes it's ready. I am still working through the cranelift support for this, but I don't expect more changes to be needed yet. |
Catching up here, this looks headed in the right direction. For DWARF it seems to make more sense for Faerie to "understand the semantics" like this than for machine code, because DWARF has the same encoding across many architectures. |
On the other hand, if faerie's link abstraction allowed something like cranelift's |
This is required when creating links that reference sections.
I've pushed a bug fix for section indices. Didn't catch this earlier because the values happened to be the same in my previous tests. I've now tested this with a basic .debug_info creation in rustc_codegen_cranelift. |
This is what i had in mind, but @sunfishcode put it much more succinctly than I could :) So I can see both sides to this, whether to put this logic in faerie or not, and I think it's probably out of scope for this PR, but I wouldn't mind at least opening a discussion about what something like this would look like, since I think it would be worth pursuing/pulling on it a bit. Anyway, @sunfishcode unless you have any other concerns/want to review, otherwise I'm going to merge this. Great work @philipc! |
Cool! Could you share it? |
https://github.com/philipc/rustc_codegen_cranelift/tree/debuginfo Both still WIP. The rustc support will be better once it's using more of rustc_codegen_ssa (eg it has |
Thanks. Will take a look soon! |
Mach support will come in a later PR.
Fixes #49
While I start working on Mach support, it'd be good if someone could have a quick look at the last commit to see if this approach looks ok.