Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential for call stack corruption #10

Closed
ghost opened this issue May 4, 2017 · 12 comments
Closed

Potential for call stack corruption #10

ghost opened this issue May 4, 2017 · 12 comments
Labels
Status:Resolved Issue has been resolved, but closure is pending on git merge and/or issuer confirmation

Comments

@ghost
Copy link

ghost commented May 4, 2017

It looks like the RTL doesn't describe a machine scratch register among the CSRs which have been implemented. This is intended to be used to point to a trap-handler context storage container, such that on entry to the trap-handler its contents is swapped with that of the register holding the current stack-pointer (single instruction to do the swap).

Without this, the trap-handler may have to use the stack of the thread which was interrupted, in order to save its context. In the absence of any atomic stack operations, this is fine provided that the stack pointer is moved to reflect the increase in size of the stack before the new stack space is used, i.e. on a push the stack pointer is decremented before the store is made. If this is not done, and an interrupt occurs during a push operation after the store and before the SP decrement, the handler could end up pushing the context of the interrupted thread on top of the value pushed by the interrupted thread.

The problem is that the RISC-V standard C calling convention doesn't appear to constrain the order in which the store and decrement happen (as expected), and I looked at some assembly produced by the C compiler earlier and it does not appear to exhibit the behaviour required to avoid this.

FIX: Implement the machine scratch register.

@sorear
Copy link

sorear commented May 5, 2017

The problem is that the RISC-V standard C calling convention doesn't appear to constrain the order in which the store and decrement happen (as expected), and I looked at some assembly produced by the C compiler earlier and it does not appear to exhibit the behaviour required to avoid this.

If true, that is a serious compiler bug.

The C compiler doesn't know it's generating code for a bare-metal environment; as far as the C compiler knows, the code is going to run on a POSIXish OS and possibly with signal handlers. When a signal handler is invoked it saves registers immediately below sp (there is no red zone on RISC-V); this means that everything on the stack at a negative offset from sp can be clobbered at any instruction boundary. So the sp subtraction has to happen before any registers can be saved.


Just did a brief test with a 7.0.x compiler I have lying around and I can't reproduce the behavior you're seeing:

int bar(int x);
int foo(int x) { return bar(bar(x)); }
foo:
        add     sp,sp,-16
        sd      ra,8(sp)
        call    bar
        call    bar
        ld      ra,8(sp)
        add     sp,sp,16
        jr      ra

sp decremented before anything is saved on the stack…

@atraber
Copy link
Contributor

atraber commented May 5, 2017

I agree with @sorear's analysis. I remember when we were discussing if we actually need the machine scratch register that we checked this exact situation.
As you were reporting, gcc always decrements the stack pointer before using it, and increases it only after popping all values from the stack, so from that perspective this is safe.

We have been using this productively for more than a year now and never had any issues, so I would conclude that it is indeed safe.

I actually thought this was part of the calling convention, but I can't seem to find it anymore? The only thing I did find the v2.1 user spec is this:

In the standard RISC-V calling convention, the stack grows downward and the stack pointer is
always kept 16-byte aligned

@sorear
Copy link

sorear commented May 5, 2017

Calling convention docs are being moved from the user spec to a dedicated document (which is currently a markdown file in https://github.com/riscv/riscv-elf-psabi-doc but will probably be TeX'd at some point).

Standard privileged architecture needs mscratch because it is ABI-agnostic…

@atraber
Copy link
Contributor

atraber commented May 5, 2017

Thanks for the link, I wasn't aware of the fact that a separate ABI spec exists/is being developed.

@aswaterman
Copy link

Hi Andreas :)

It is indeed the ABI's intent that all data below the stack pointer is to be considered volatile, so the compiler must adjust the stack pointer before spilling and after filling. I think GCC is getting this right. Regardless, we need to specify this behavior in the psABI document.

@ghost
Copy link
Author

ghost commented May 5, 2017

Having looked more closely at the code generated by GCC, I too agree that it appears to do the correct thing. Although, I could not find this to be part of the calling convention either, hence my initial concern.

I am, however, of the belief that a thread should be agnostic to interrupts, and as far as I can tell so far, the current implementation (i.e. in the absence of the scratch register, for example) renders this not the case.

Why did you choose not to add the scratch register btw? As far as I can tell having done it locally, it is a very simple change to make.

@atraber
Copy link
Contributor

atraber commented May 5, 2017

For the use case that we designed this core, the scratch register is not necessary. The original design goal was to create a core for a many-core cluster based accelerator platform which would never use an OS and run everything bare metal. Having 32 registers that we did not intend to use seemed like a waste and thus we did not include it.

I think if you go through some very old commits you will actually see that the mscratch register was included at some point, but later on removed.

I admit though that 32 flops more do not make real difference in the end ;-)

@ghost
Copy link
Author

ghost commented May 17, 2017

Can you guys point me to some documentation which states that a C compiler is required to behave in this way and/or the RISC-V calling convention mandates this behaviour, so we can close out this issue?

I cannot find anywhere which states the required stack behaviour as a requirement.

@sorear
Copy link

sorear commented May 17, 2017

As Andrew said, there is no documentation at this time.

Anyway, it's a self-enforcing requirement (misuse of the stack will lead to memory corruption on POSIX signal handling)

@ghost
Copy link
Author

ghost commented May 17, 2017

Thank you for your reply, @sorear. I don't think POSIX specifies a particular stack behaviour, that's more the job of an ABI as @aswaterman says.

@davideschiavone
Copy link
Contributor

CV32E40P now has a mscratch CSR. Shall we close the issue now @origintfj ?

@davideschiavone davideschiavone added the Status:Resolved Issue has been resolved, but closure is pending on git merge and/or issuer confirmation label May 17, 2020
@Silabs-ArjanB
Copy link
Contributor

We are closing this issue as it either has been fixed with the stated (merged) pull request, because it is no longer valid, or because a follow up question for further clarification or feedback remained unanswered for at least 7 days. Please do not hesitate to re-open this issue if you do not agree that the issue has been handled satisfactorily or if you have further questions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status:Resolved Issue has been resolved, but closure is pending on git merge and/or issuer confirmation
Projects
None yet
Development

No branches or pull requests

5 participants