Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ks_asm() Address to symbol offset. #45

Merged
merged 1 commit into from
May 13, 2016

Conversation

iksteen
Copy link
Contributor

@iksteen iksteen commented May 12, 2016

This passes the Address provided to ks_asm down to MCAssembler::layout where it's added to the fixed up value and applied to the encoded output.

Not sure what the impact is on other architectures and whether it's the right approach, but it seems to work for x86/x86_64.

@@ -658,17 +658,19 @@ void MCAssembler::layout(MCAsmLayout &Layout) {
uint64_t FixedValue;
bool IsPCRel;
std::tie(FixedValue, IsPCRel) = handleFixup(Layout, *F, Fixup);
if (!IsPCRel)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be sure, you can check for X86 with:

  if (getBackend().getArch() == KS_ARCH_X86)

Copy link
Contributor Author

@iksteen iksteen May 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I thought about that (or to move it further down to the X86 specific
version of handleFixup), but I'm not sure whether the problem happens on
other architectures as well.

There's another problem on ARM which prevents me from testing it there
(I'll open an issue about that later, ldr r0, =label doesn't do what it
should do) and I'm not experienced enough with other archs for a quick
test.

In fact, I'm not even sure this is the right approach, it might be possible to add the address to the symbol itself which means the fixup will automatically have the correct absolute address.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes pass it further down to X86 handler is another solution.

this issue looks arch-specific to me, so i am not sure how you add "address to symbol" in generic way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tested this with PPC code generation, it has the same issue without this address fixup so I'm going to assume this applies to all platforms. Not yet convinced this is the proper solution though, setting the fragment's offset to Address is probably a better idea.

@iksteen iksteen changed the title Add ks_asm() Address to symbol offset. [don't merge yet] Add ks_asm() Address to symbol offset. May 12, 2016
By adding BaseAddress to MCContext, passing it from ks_asm and using it
to initialise the address of the first fragment we make sure that
symbols get the right absolute offset.
@iksteen
Copy link
Contributor Author

iksteen commented May 12, 2016

Much cleaner version. By setting the address of the first MCFragment to the address passed to ks_asm via the context, the label addresses get offset automatically.

@iksteen iksteen changed the title [don't merge yet] Add ks_asm() Address to symbol offset. Add ks_asm() Address to symbol offset. May 12, 2016
@aquynh
Copy link
Member

aquynh commented May 13, 2016

how can we be sure that this works for all other non-x86 architectures?

@iksteen
Copy link
Contributor Author

iksteen commented May 13, 2016

From a theoretical point of view: keystone uses the ELF object (.o) streamer, ELF object files are relocatable as the final address isn't known until link time. The relocations are recorded in the .o file and the linker stitches it all together: All addresses in the binary code are relative to the section start and the linker offsets these so they become correct. This patch effectively patches the section start so all addresses become relative to the base address.

From a technical point of view: The assembler and emitter have no idea where the code will live (well, you added it to the instruction class, but it's not generally used), addresses can only be relative to the current fragment because there is no other point of reference.

From an (incomplete) empirical point of view: I've tested this on X86 and PPC and it happens on both those architectures. On ARM this is currently not testable because labels are interpreted as strings.

@aquynh aquynh merged commit 1768adc into keystone-engine:master May 13, 2016
@aquynh
Copy link
Member

aquynh commented May 13, 2016

yes basically we have to implement a simple "linker" that original assembler of LLVM does not do.

merged, thanks!

@iksteen iksteen deleted the fix-43 branch May 13, 2016 08:11
aquynh added a commit that referenced this pull request May 14, 2016
@aquynh
Copy link
Member

aquynh commented May 14, 2016

actually this causes a segfault on Ubuntu 14.04 64bit, on the testcase arm_blx_rel_thumb.py. i fixed that with this commit ab450a6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants