Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long semantically direct jumps #125

Closed
sorear opened this issue Jun 26, 2023 · 12 comments
Closed

Long semantically direct jumps #125

sorear opened this issue Jun 26, 2023 · 12 comments

Comments

@sorear
Copy link

sorear commented Jun 26, 2023

There is a presupposition in #112 that functions/PLT entries whose address is not taken don't need landing pads, and that direct calls to a function/PLT entry can bypass the landing pad. This is currently true only for objects less than 1 MiB and only if relaxation is enabled in the code model; if relaxation is not possible, jumps become an AUIPC/JALR(rd=x0,rs1=x6) sequence and calls AUIPC/JALR(rd=x1,rs1=x1) sequence, both of which are considered indirect calls by the currently proposed ISA and therefore require landing pads despite being semantically direct.

TAIL can be changed to use (rd=x7,rs1=x7) but there is no feasible option for CALL (all NO_LP_EXPECTED JALRs have rd=x0 or rd=x7) or PLT entries that can report failure.

I think that the simplest expression that allows for both ABI and alternate(t0) returns, software-guarded jumps, and semantically direct calls with an ABI or non-ABI return address is

    is_lp_expected = ( (JALR || C.JR || C.JALR) &&
                              (rs1 != x1 && rs1 != x5 && rs1 != x6) ) ? 1 : 0;

Requiring all CALLs outside the relaxation range to set up x7 would cost 1085KiB in a labeled version of libQt5WebKitWidgets.so.5.212.0. This increases to 1649KiB if x7 setup is done by the compiler and not deleteable after CALL->JAL relaxation.

@ved-rivos
Copy link
Collaborator

You are right that this issue got introduced by mistake in PR #122. We forgot about the semantically direct jumps. We should undo that PR. The x7 as the source for SW guarded jumps could be changed to x6 and should provide more compatibility as well to assembly sequences that may be using TAIL.

@ved-rivos
Copy link
Collaborator

Update in #126

@ved-rivos
Copy link
Collaborator

How strong is the preference to retaining use of x6 for TAIL. Using x7 has a property that x7 also being used to hold landing pad label is by definition not usable for indirect call/jump. It would be preferable to avoid adding another special register to the architecture. If this is acceptable to the compilers, I suggest that we update to below:
is_lp_expected = ( (JALR || C.JR || C.JALR) && (rs1 != x1 && rs1 != x5 && rs1 != x7) ) ? 1 : 0;

@sorear
Copy link
Author

sorear commented Jul 25, 2023

When I originally posted this, it was a strong preference for any register other than x7 since I wanted to be able to do guarded jumps to functions that might have a landing pad, and didn't want the needed value of x7 to conflict between the jump and the landing pad.

Now that lpad doesn't check x7 subject to NO_LP_EXPECTED x7 should theoretically be usable, although I want to get further into toolchain prototyping before committing to this.

@ved-rivos
Copy link
Collaborator

Yes, will stick with x7. Got some inputs from GCC and LLVM and this does not seem to be problematic. We can revisit if an issue is discovered downstream.

@jrtc27
Copy link

jrtc27 commented Aug 17, 2023

I would much prefer that we not break the long-standing convention of using x6 for tail. The choice is entirely arbitrary, so it should be for the CFI spec to bend to the existing standards and conventions, not for them to bend to your wishes, unless necessary. Please reopen this issue.

@yetingk
Copy link

yetingk commented Aug 21, 2023

Is it possible to use x6 as label register? Is there any benefit for using x7 instead of x6 as label register?

@ved-rivos
Copy link
Collaborator

The choice of t2/x7 was due to the fact that it is just a temporary and does not play an ABI role. By contrast, t0/x5, and t1/x6 do play an ABI role in passing information between the PLT header and the dynamic linker. Since the choice of x6 would have been problematic for this reason, x7 was chosen as the landing pad label register. Further, t2/x7 is also designated as the register for software-guarded jumps since their use is mutually exclusive.

@yetingk
Copy link

yetingk commented Aug 25, 2023

Sorry, I don't really understand your idea.
Don't we also use t2/x7 to get address of .got.plt for PLT Header? And we always use t3/x8 as rs1 for the indirect jump for PLT. I don't get the point what is the problem about using t2/x6 as rs1 for software guarded branch. Could you elaborate more on this?

@ved-rivos
Copy link
Collaborator

ved-rivos commented Aug 26, 2023

You asked "Is there any benefit for using x7 instead of x6 as label register"
The t0/x5 and t1/x6 do play an ABI role - see _dl_runtime_resolve . If we used t1/x6 as label register then we would break this abi compatibility since _dl_runtime_resolve expects to receive the relocation offset in t1/x6 and we would have had to change that if x6 was used as label register since label register cannot be clobbered. The PLT stubs do use t2/x7 presently but that use does not have a ABI role and can be changed. Please see the updated PLT stubs are discussed in issue #112 - they preserve the use of t1/x6 to pass the .got.plt offset to the dynamic loader and the use of t0/x5 to pass the link map. Hence the choice of t2/x7 as the label register instead of t1/x6

@yetingk
Copy link

yetingk commented Aug 28, 2023

Thank you for your elaboration. I have no opinion about to use x6 as label register now.

@jrtc27
Copy link

jrtc27 commented Aug 28, 2023

I don't care that you're using x7 for the label register. I care that you're not using x6 as the tail call register, which is a long-standing convention codified in riscv-asm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants