Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ssamoswap traps / asynchronous runtime disabling #206

Open
sorear opened this issue Feb 27, 2024 · 14 comments
Open

ssamoswap traps / asynchronous runtime disabling #206

sorear opened this issue Feb 27, 2024 · 14 comments

Comments

@sorear
Copy link

sorear commented Feb 27, 2024

We likely need to be able to disable shadow stacks in all threads when a shared library without shadow stack compatibility is loaded. This is under discussion on linux-riscv. This is mostly a code generation issue (userspace code needs to allow for SSE to switch from 1 to 0 at any instruction boundary), but we need to decide what ssamoswap does in the uABI when shadow stacks are disabled, since "raise SIGILL" isn't an option. Presumably ssamoswap should perform no memory accesses and write 0 to the destination register if the shadow stack is disabled, and CSR writes to ssp should be ignored?

On the hardware side, do we want to change the definition of ssamoswap and ssp to do this automatically, or do we expect the ABI to be implemented by handling the illegal instruction trap?

@ved-rivos
Copy link
Collaborator

Since the libary loading happens from the dynamic loader, clearing SSE for that thread as part of the library load is okay. The ssamoswap use in functions such as swapcontext and setcontext is guarded by the use of ssprr as follows - and if SSE is 0 i.e. ssrdp reads 0 then ssamoswap is skipped.

stack_switch:
    ssrdp ra
    beqz ra, 2f # skip if Zicfiss not active
    ssamoswap ra, x0, (a0) # ra=*[a0] and *[a0]=0
    beq ra, a0, 1f # [a0] must be == [ra]
    unimp # else crash
1: addi ra, ra, XLEN/8 # pop the checkpoint
    csrrw ra, ssp, ra # swap ssp: ra=ssp, ssp=ra
    addi ra, ra, -(XLEN/8) # checkpoint = "old ssp - XLEN/8"
    ssamoswap x0, ra, (ra) # Save checkpoint at "old ssp - XLEN/8"
2:

@sorear
Copy link
Author

sorear commented Feb 27, 2024

stack_switch:
    ssrdp ra               # nonzero value since SSE=1
    beqz ra, 2f            # branch not taken
--- IPI OR BROADCAST SIGNAL TAKEN HERE, SSE TRANSITIONS FROM 1 to 0 ---
    ssamoswap ra, x0, (a0) # illegal instruction since SSE=0
    beq ra, a0, 1f
    unimp
1: addi ra, ra, XLEN/8
    csrrw ra, ssp, ra
    addi ra, ra, -(XLEN/8)
    ssamoswap x0, ra, (ra)
2:

Presuming that the broadcast disable of SSE is done via a libc-internal signal, libc could protect its own stack switch code by masking the internal signal for the duration of the stack switch, but that doesn't help if users want to write their own switch code.

@ved-rivos
Copy link
Collaborator

That is why I said - disable it for the thread that did dlopen - not all threads.

@ved-rivos
Copy link
Collaborator

Is there however a reason to disable SSE on dlopen of a library which is not compiled with Zicfiss. That library will not do any push/pop/unwind of shadow stack and will not get any shadow stack protections but that is okay.

@sorear
Copy link
Author

sorear commented Feb 27, 2024

That is why I said - disable it for the thread that did dlopen - not all threads.

I don't understand how you think this could possibly work. If a library that contains code that mishandles shadow stacks, and it's loaded in one thread, the functions it defines can be called on all threads as soon as inter-thread communication happens, which could be as simple as writing a pointer to shared memory.

Is there however a reason to disable SSE on dlopen of a library which is not compiled with Zicfiss. That library will not do any push/pop/unwind of shadow stack and will not get any shadow stack protections but that is okay.

We don't need to disable SSE on dlopen of a library which is not compiled with shadow stack prologues/epilogues. We do need to disable SSE on dlopen of a library which contains its own shadow stack unaware implementation of unwinding or stack switching. The current ELF ABI conflates these into a single bit.

@ved-rivos
Copy link
Collaborator

ved-rivos commented Feb 27, 2024

If the library does stack switching but returns to the caller on the stack it was called on then that would not be an issue. If the library only unwinds its own call frames but not the callers call frames then it would be okay. So the we worry about a library that unwinds past its entry function or does not return to the caller on the stack it was called on?

NOPing the ssamoswap either by hardware or by the kernel/runtime is not enough. For instance see the code snipped I pasted where returning a value of 0 may lead to a crash by invoking the unimp instruction. This is not limited to ssamoswap - it can happen with longjmp as well where if shadow stack is disable midway through long jump then it would lead to a crash - see the csrw:

longjmp() {
    :
    // Read current shadow stack pointer and
    // compute number of call frames to unwind
    asm("ssrdp %0" : "=r"(cur_ssp):);
    // Skip the unwind if backward-edge CFI not active
    asm("beqz %0, back_cfi_not_active" : "=r"(cur_ssp):);
    // Unwind the frames in a loop
    while ( jmp_buf->saved_ssp > cur_ssp ) {
         // advance by a maximum of 4K at a time to avoid
         // unwinding past bounds of the shadow stack
         cur_ssp = ( (jmp_buf->saved_ssp - cur_ssp) >= 4096 ) ?
                         (cur_ssp + 4096) : jmp_buf->saved_ssp;
         asm("csrw ssp, %0" : : "r" (cur_ssp));
         // Test if unwound past the shadow stack bounds
         asm("sspush x5");
         asm("sspopchk x5");
    }
back_cfi_not_active:
:
}

@ved-rivos
Copy link
Collaborator

ved-rivos commented Feb 27, 2024

If we really want to disable Zicfiss and not fail dlopen in this case then there may be couple of additional options:
Option 1:

  1. Shadow stack enabled applications and libraries that do unwind will need to have a flag in their thread pointer to indicate that they are in this critical section.
  2. If a signal arrives in this critical section then the signal handler sets another flag - make a disable system call when the thread exits the critical section. Else it makes the disable system call to disable it for that thread.
  3. When the critical section is over, the unwinder will check this flag asking to make a disable system call and will invoke the disable for that thread.
  4. Any application/library writing its own shadow stack aware unwinder/switcher will need to adhere to this protocol.

Option 2:

  1. When loading a library with dlopen if the library is one that would require disabling SSE then dynamic loader as part of linking the dynamic library - inserts a shim in between the application and the library.
  2. This shim does following to disable SSE on the first invocation of such a library.
     rdssp caller_saved_temp
     beqz skip
     syscall to disable SSE for this thread
skip:
     jump to target

@ved-rivos
Copy link
Collaborator

But cleanest may be to just not allow dlopen of a incompatible library - strict always - and this is also the safer option.

@sorear
Copy link
Author

sorear commented Feb 28, 2024

If the library does stack switching but returns to the caller on the stack it was called on then that would not be an issue. If the library only unwinds its own call frames but not the callers call frames then it would be okay. So the we worry about a library that unwinds past its entry function or does not return to the caller on the stack it was called on?

Yes, these are the sorts of things that are problematic.

NOPing the ssamoswap either by hardware or by the kernel/runtime is not enough. For instance see the code snipped I pasted where returning a value of 0 may lead to a crash by invoking the unimp instruction.

If the ABI part of this proposal is adopted, the stack switch code would recheck ssrdp before the unimp.

see the csrw

Good catch. Will fix.

Option 1

This is equivalent to signal masking as described in my previous message, somewhat faster but doesn't solve the actual problem where it tightly couples user code to libc implementation details.

Option 2

I don't see how you can "insert a shim" on every single function in a shared library, including static functions that don't have dynamic symbols and are returned by pointer.

Even if you could do this I don't see how it would be anything other than a maintenance, performance, and compatibility disaster.

But cleanest may be to just not allow dlopen of a incompatible library - strict always - and this is also the safer option.

I think this will be a very hard sell for musl, which has a high bar for breaking standards-compliant C code (much lower for things that already use asm or GNU extensions).

@ved-rivos
Copy link
Collaborator

Yes, these are the sorts of things that are problematic.

Is that compliant with ABI. At least not returning on the stack that the caller used seems functionally wrong.

If the ABI part of this proposal is adopted, the stack switch code would recheck ssrdp before the unimp.

That seems like a good way to address this case.

I don't see how you can "insert a shim" on every single function in a shared library, including static functions that don't have dynamic symbols and are returned by pointer.

I was thinking that the shim is created by the loader when returning a function pointer in response to dlsym that the caller of dlopen would use to obtain the address of some symbol in the loaded library for use. It is possible to have static pointers returned by the library in response to an initial library call to a function whose address was obtained using dlsym but then the disabling was already done as part of that function call. I am not sure however if the loader knows if the symbols are functions or other variables. It is more involved however as the loader now needs to generate these shim functions.

I think this will be a very hard sell for musl, which has a high bar for breaking standards-compliant C code (much lower for things that already use asm or GNU extensions).

The thought here was that the user code is already opting into the Zicfiss extension. The restriction on not opening incompatible libraries could be part of this opt-in ABI. Code unaware of Zicfiss would see no change.

@sorear
Copy link
Author

sorear commented Mar 14, 2024

Is that compliant with ABI.

Yes. These functions all cause a call site to return where the new stack and pc are correct.

I was thinking that the shim is created by the loader when returning a function pointer in response to dlsym that the caller of dlopen would use to obtain the address of some symbol in the loaded library for use. It is possible to have static pointers returned by the library in response to an initial library call to a function whose address was obtained using dlsym but then the disabling was already done as part of that function call. I am not sure however if the loader knows if the symbols are functions or other variables. It is more involved however as the loader now needs to generate these shim functions.

I don't see how any of this solves the problem I was pointing out.

/* dso.c */
static void foo() { ... }
typedef void (*pfv)();
void bar(_Atomic pfv *x) { *x = foo; }

/* caller.c */
typedef void (*pfv)();
typedef void (*pbar)(_Atomic pfv *);
_Atomic pfv f;
void *thr(void *arg) {
    while (!f) ;
    f();
    return 0;
}
int main() {
    pthread_t t;
    pthread_create(&t, 0, thr, 0);
    void *h = dlopen("dso.so",RTLD_NOW);
    void *ret;
    ((pbar)dlsym(h,"bar"))(&f);
    return pthread_join(t,&ret);
}

How does the shim get executed before the execution of foo in the second thread? foo doesn't have a dynamic symbol and at the time the thread is created, dlopen hasn't been called.

The thought here was that the user code is already opting into the Zicfiss extension. The restriction on not opening incompatible libraries could be part of this opt-in ABI. Code unaware of Zicfiss would see no change.

CFI enablement appears to be being done globally at a distro level, not as an opt-in.

@ved-rivos
Copy link
Collaborator

How does the shim get executed before the execution of foo in the second thread?

The shim would be on the bar() so would not help this case.

CFI enablement appears to be being done globally at a distro level, not as an opt-in.

Each application needs to be compiled with the CFI options and the corresponding attributes set in the image for CFI to be activated on that program. The distro should not be shipping an application an application compiled with CFI if it is tested to be compatible with CFI - i.e., all its dependent libraries are also compatible.

The two options suggested in the Linux thread seem feasible:

  1. Fail the dlopen - a CFI compiled application should not be opening a non CFI compiled library. This would make the incompatibility to clear such as in case where someone failed to compile all dependencies of the application. This would also prevent silently disabling protections on an application for cases like where the installed library was replaced maliciously with an intent to disable protections.

  2. Suppress the illegal instruction on ssamoswap and csr* ssp if shadow stacks were disabled.

@sorear
Copy link
Author

sorear commented Mar 14, 2024

for cases like where the installed library was replaced maliciously with an intent to disable protections.

This threat model is incoherent. If an attacker can replace libraries, they don't need to bypass CFI because they already have full control of the process.

Suppress the illegal instruction on ssamoswap and csr* ssp if shadow stacks were disabled.

The ISA specification is the contract between software and the execution environment. It is not a hardware functionality specification, and existing cases where privileged software does interesting things in response to exceptions like misaligned accesses are reflected in the ISA specification. So if you want to go down this route it ought to be in the ISA specification.

@ved-rivos
Copy link
Collaborator

I am suggesting to fail the dlopen. However, software can choose to do the second option similar to other architectures like ARM and x86 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants