Potential for call stack corruption #10

ghost · 2017-05-04T16:43:18Z

It looks like the RTL doesn't describe a machine scratch register among the CSRs which have been implemented. This is intended to be used to point to a trap-handler context storage container, such that on entry to the trap-handler its contents is swapped with that of the register holding the current stack-pointer (single instruction to do the swap).

Without this, the trap-handler may have to use the stack of the thread which was interrupted, in order to save its context. In the absence of any atomic stack operations, this is fine provided that the stack pointer is moved to reflect the increase in size of the stack before the new stack space is used, i.e. on a push the stack pointer is decremented before the store is made. If this is not done, and an interrupt occurs during a push operation after the store and before the SP decrement, the handler could end up pushing the context of the interrupted thread on top of the value pushed by the interrupted thread.

The problem is that the RISC-V standard C calling convention doesn't appear to constrain the order in which the store and decrement happen (as expected), and I looked at some assembly produced by the C compiler earlier and it does not appear to exhibit the behaviour required to avoid this.

FIX: Implement the machine scratch register.

sorear · 2017-05-05T03:53:30Z

The problem is that the RISC-V standard C calling convention doesn't appear to constrain the order in which the store and decrement happen (as expected), and I looked at some assembly produced by the C compiler earlier and it does not appear to exhibit the behaviour required to avoid this.

If true, that is a serious compiler bug.

The C compiler doesn't know it's generating code for a bare-metal environment; as far as the C compiler knows, the code is going to run on a POSIXish OS and possibly with signal handlers. When a signal handler is invoked it saves registers immediately below sp (there is no red zone on RISC-V); this means that everything on the stack at a negative offset from sp can be clobbered at any instruction boundary. So the sp subtraction has to happen before any registers can be saved.

Just did a brief test with a 7.0.x compiler I have lying around and I can't reproduce the behavior you're seeing:

int bar(int x);
int foo(int x) { return bar(bar(x)); }

foo:
        add     sp,sp,-16
        sd      ra,8(sp)
        call    bar
        call    bar
        ld      ra,8(sp)
        add     sp,sp,16
        jr      ra

sp decremented before anything is saved on the stack…

atraber · 2017-05-05T09:03:16Z

I agree with @sorear's analysis. I remember when we were discussing if we actually need the machine scratch register that we checked this exact situation.
As you were reporting, gcc always decrements the stack pointer before using it, and increases it only after popping all values from the stack, so from that perspective this is safe.

We have been using this productively for more than a year now and never had any issues, so I would conclude that it is indeed safe.

I actually thought this was part of the calling convention, but I can't seem to find it anymore? The only thing I did find the v2.1 user spec is this:

In the standard RISC-V calling convention, the stack grows downward and the stack pointer is
always kept 16-byte aligned

sorear · 2017-05-05T09:06:31Z

Calling convention docs are being moved from the user spec to a dedicated document (which is currently a markdown file in https://github.com/riscv/riscv-elf-psabi-doc but will probably be TeX'd at some point).

Standard privileged architecture needs mscratch because it is ABI-agnostic…

atraber · 2017-05-05T09:11:20Z

Thanks for the link, I wasn't aware of the fact that a separate ABI spec exists/is being developed.

aswaterman · 2017-05-05T09:48:54Z

Hi Andreas :)

It is indeed the ABI's intent that all data below the stack pointer is to be considered volatile, so the compiler must adjust the stack pointer before spilling and after filling. I think GCC is getting this right. Regardless, we need to specify this behavior in the psABI document.

ghost · 2017-05-05T17:12:50Z

Having looked more closely at the code generated by GCC, I too agree that it appears to do the correct thing. Although, I could not find this to be part of the calling convention either, hence my initial concern.

I am, however, of the belief that a thread should be agnostic to interrupts, and as far as I can tell so far, the current implementation (i.e. in the absence of the scratch register, for example) renders this not the case.

Why did you choose not to add the scratch register btw? As far as I can tell having done it locally, it is a very simple change to make.

atraber · 2017-05-05T17:45:29Z

For the use case that we designed this core, the scratch register is not necessary. The original design goal was to create a core for a many-core cluster based accelerator platform which would never use an OS and run everything bare metal. Having 32 registers that we did not intend to use seemed like a waste and thus we did not include it.

I think if you go through some very old commits you will actually see that the mscratch register was included at some point, but later on removed.

I admit though that 32 flops more do not make real difference in the end ;-)

ghost · 2017-05-17T16:46:12Z

Can you guys point me to some documentation which states that a C compiler is required to behave in this way and/or the RISC-V calling convention mandates this behaviour, so we can close out this issue?

I cannot find anywhere which states the required stack behaviour as a requirement.

sorear · 2017-05-17T19:05:02Z

As Andrew said, there is no documentation at this time.

Anyway, it's a self-enforcing requirement (misuse of the stack will lead to memory corruption on POSIX signal handling)

ghost · 2017-05-17T22:58:23Z

Thank you for your reply, @sorear. I don't think POSIX specifies a particular stack behaviour, that's more the job of an ABI as @aswaterman says.

davideschiavone · 2020-05-17T11:30:30Z

CV32E40P now has a mscratch CSR. Shall we close the issue now @origintfj ?

Silabs-ArjanB · 2020-07-14T05:32:52Z

We are closing this issue as it either has been fixed with the stated (merged) pull request, because it is no longer valid, or because a follow up question for further clarification or feedback remained unanswered for at least 7 days. Please do not hesitate to re-open this issue if you do not agree that the issue has been handled satisfactorily or if you have further questions.

sorear mentioned this issue May 5, 2017

Document implications of signal handling riscv-non-isa/riscv-elf-psabi-doc#32

Closed

davideschiavone added the Status:Resolved Issue has been resolved, but closure is pending on git merge and/or issuer confirmation label May 17, 2020

Silabs-ArjanB closed this as completed Jul 14, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential for call stack corruption #10

Potential for call stack corruption #10

ghost commented May 4, 2017

sorear commented May 5, 2017

atraber commented May 5, 2017

sorear commented May 5, 2017

atraber commented May 5, 2017

aswaterman commented May 5, 2017

ghost commented May 5, 2017

atraber commented May 5, 2017

ghost commented May 17, 2017

sorear commented May 17, 2017

ghost commented May 17, 2017

davideschiavone commented May 17, 2020

Silabs-ArjanB commented Jul 14, 2020

Potential for call stack corruption #10

Potential for call stack corruption #10

Comments

ghost commented May 4, 2017

sorear commented May 5, 2017

atraber commented May 5, 2017

sorear commented May 5, 2017

atraber commented May 5, 2017

aswaterman commented May 5, 2017

ghost commented May 5, 2017

atraber commented May 5, 2017

ghost commented May 17, 2017

sorear commented May 17, 2017

ghost commented May 17, 2017

davideschiavone commented May 17, 2020

Silabs-ArjanB commented Jul 14, 2020