Documentation: kvx: Add basic documentation

Add some documentation for kvx arch and its Linux port.

CC: Jonathan Corbet <corbet@lwn.net>
CC: linux-doc@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Co-developed-by: Clement Leger <clement.leger@bootlin.com>
Signed-off-by: Clement Leger <clement.leger@bootlin.com>
Co-developed-by: Guillaume Thouvenin <gthouvenin@kalray.eu>
Signed-off-by: Guillaume Thouvenin <gthouvenin@kalray.eu>
Signed-off-by: Yann Sionneau <ysionneau@kalray.eu>
intel-lab-lkp · Jan 3, 2023 · 4ac33ed
1 parent 6132a49
commit 4ac33ed
Showing 5 changed files with 1,005 additions and 0 deletions.
diff --git a/Documentation/kvx/kvx-exceptions.txt b/Documentation/kvx/kvx-exceptions.txt
@@ -0,0 +1,246 @@
+Exceptions
+==========
+On kvx, handlers are set using the $ev (exception vector) register, which
+specifies a base address.
+When an exception occurs, an offset is added to $ev and the result is used
+as the "Next $pc".
+The offset depends on which exception vector the CPU jumps to:
+* $ev + 0x00 for debug
+* $ev + 0x40 for trap
+* $ev + 0x80 for interrupt
+* $ev + 0xc0 for syscall
+
+The handlers are therefore laid out in the following order:
+
+         _____________
+        |             |
+        |   Syscall   |
+        |_____________|
+        |             |
+        |  Interrupts |
+        |_____________|
+        |             |
+        |    Traps    |
+        |_____________|
+        |             | ^
+        |    Debug    | | Stride
+BASE -> |_____________| v
+
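The offset rule above can be sketched in C as a small helper. This is only an illustration: the enum and function names are hypothetical, and the 0x40 stride is simply the spacing between the offsets listed above, not a value taken from the kvx headers.

```c
#include <stdint.h>

/* Vector indices in the order listed above (names are hypothetical). */
enum kvx_vector_model {
    KVX_VEC_DEBUG     = 0,
    KVX_VEC_TRAP      = 1,
    KVX_VEC_INTERRUPT = 2,
    KVX_VEC_SYSCALL   = 3,
};

/* Spacing between two consecutive handlers, as implied by the offsets. */
#define KVX_VEC_STRIDE_MODEL 0x40u

/* "Next $pc" = $ev base address + vector index * stride. */
static inline uint64_t kvx_vector_address(uint64_t ev_base,
                                          enum kvx_vector_model vec)
{
    return ev_base + (uint64_t)vec * KVX_VEC_STRIDE_MODEL;
}
```

With a base of 0x1000, the trap handler would sit at 0x1040 and the syscall handler at 0x10c0 under these assumptions.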
+
+Interrupts and traps are serviced similarly, i.e.:
+- Jump to handler
+- Save all registers
+- Prepare the call (do_IRQ or trap_handler)
+- Restore all registers
+- Return from exception
+
+The entry.S file is (as on other architectures) the entry point into the kernel.
+It contains all assembly routines related to interrupts, traps and syscalls.
+
+Syscall handling
+================
+
+A syscall must be issued using "scall $r6", where $r6 contains the syscall
+number. This convention allows the kernel to modify and restart a syscall.
+
+Syscalls are handled differently from interrupts/exceptions. From an ABI
+point of view, scalls are like function calls: any caller-saved register
+can be clobbered by the syscall. However, syscall parameters are passed
+using registers r0 through r7. These registers must be preserved to avoid
+clobbering them before the actual syscall function is called.
+
+On a syscall from userspace (scall instruction), the processor puts
+the syscall number in $es.sn and switches from user to kernel privilege
+mode. kvx_syscall_handler is then called in kernel mode.
+
+The following steps are then taken:
+
+- Switch to kernel stack
+- Extract syscall number
+- Check that the syscall number is not bogus
+ - If it is bogus, set syscall func to a not implemented one
+- Check if tracing is enabled
+ - If so, jump to trace_syscall_enter
+ - Save syscall arguments (r0 -> r7) on stack in pt_regs
+ - Call do_trace_syscall_enter function
+- Restore syscall arguments since the C call may have modified them
+- Call the syscall function
+- Save $r0 in pt_regs since it can be clobbered afterward
+- If tracing was enabled, call trace_syscall_exit
+- Call work_pending
+- Return to user
+
+The trace call is handled out of the fast path. All slow-path handling
+is done in a separate part of the code to avoid polluting the cache.
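The syscall steps above can be modelled in plain C. This is a user-space sketch, not the entry.S code: the structure, the table, the sys_add entry and every *_model name are hypothetical, and the stack switch and work_pending steps are left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the r0..r7 slots of pt_regs. */
struct syscall_regs_model {
    long r[8];
};

typedef long (*syscall_fn_model)(const long *args);

/* Fallback used when the syscall number is out of range. */
static long sys_ni_model(const long *args) { (void)args; return -ENOSYS; }
/* Toy syscall: adds its first two arguments. */
static long sys_add_model(const long *args) { return args[0] + args[1]; }

static const syscall_fn_model syscall_table_model[] = { sys_add_model };
#define NR_SYSCALLS_MODEL \
    (sizeof(syscall_table_model) / sizeof(syscall_table_model[0]))

static int trace_events_model;   /* counts enter and exit trace hooks */

/* Stand-in for the C trace hook: it clobbers the live argument registers. */
static void trace_enter_model(long *live)
{
    memset(live, 0, 8 * sizeof(long));
    trace_events_model++;
}

static long handle_syscall_model(long *live, struct syscall_regs_model *regs,
                                 unsigned long sn, bool tracing)
{
    /* Bogus number: fall back to the not-implemented function. */
    syscall_fn_model fn = sn < NR_SYSCALLS_MODEL ? syscall_table_model[sn]
                                                 : sys_ni_model;

    if (tracing) {
        memcpy(regs->r, live, sizeof(regs->r));  /* save r0..r7 */
        trace_enter_model(live);                 /* slow path, clobbers */
        memcpy(live, regs->r, sizeof(regs->r));  /* restore arguments */
    }
    regs->r[0] = fn(live);                       /* keep the result safe */
    if (tracing)
        trace_events_model++;                    /* exit hook */
    return regs->r[0];
}
```

The tracing branch is the only place where arguments are copied, which mirrors the idea of keeping the untraced path short.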
+
+Signals
+=======
+
+Signals are handled when exiting the kernel, before returning to user.
+When handling a signal, the path is the following:
+
+1 - User application is executing normally
+    Then an exception happens (syscall, interrupt, trap)
+2 - The exception handling path is taken
+    and before returning to user, pending signals are checked
+3 - Signals are handled by do_signal
+    Registers are saved and a special part of the stack is modified
+    to create a trampoline to call rt_sigreturn
+    $spc is modified to jump to the user signal handler
+    $ra is modified to jump to the sigreturn trampoline directly after
+    returning from the user signal handler.
+4 - User signal handler is called after rfe from exception
+    When it returns, $ra is copied to $pc, resulting in a jump
+    to the sigreturn trampoline.
+5 - The sigreturn trampoline is executed, leading to the rt_sigreturn syscall
+6 - The rt_sigreturn syscall is executed
+    Previous registers are restored to allow returning to user correctly
+7 - User application is resumed at the exact point where it was
+    interrupted.
+
+
+        +----------+
+        |    1     |
+        | User app | @func
+        |  (user)  |
+        +---+------+
+            |
+            | it/trap/scall
+            |
+        +---v-------+
+        |    2      |
+        | exception |
+        | handling  |
+        | (kernel)  |
+        +---+-------+
+            |
+            | Check if signals are pending, if so, handle signals
+            |
+        +---v--------+
+        |    3       |
+        | do_signal  |
+        |  handling  |
+        |  (kernel)  |
+        +----+-------+
+             |
+             | Return to user signal handler
+             |
+        +----v------+
+        |    4      |
+        |  signal   |
+        |  handler  |
+        |  (user)   |
+        +----+------+
+             |
+             | Return to sigreturn trampoline
+             |
+        +----v-------+
+        |    5       |
+        |  syscall   |
+        |rt_sigreturn|
+        |  (user)    |
+        +----+-------+
+             |
+             | Syscall to rt_sigreturn
+             |
+        +----v-------+
+        |    6       |
+        |  sigreturn |
+        |  handler   |
+        |  (kernel)  |
+        +----+-------+
+             |
+             | Modify context to return to original func
+             |
+        +----v-----+
+        |    7     |
+        | User app | @func
+        |  (user)  |
+        +----------+
+
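Steps 3 and 6 of the signal path can be sketched as a pair of C functions operating on a simplified register set. Everything here is a model: the structures and function names are hypothetical, the real frame lives on the user stack, and passing the signal number in r0 is an assumption made for the illustration.

```c
#include <stdint.h>

/* Simplified register set; real pt_regs holds many more registers. */
struct sig_regs_model {
    uint64_t spc;   /* where rfe resumes execution */
    uint64_t ra;    /* return address used by the handler's "ret" */
    uint64_t r0;    /* first argument register (assumed) */
};

/* Hypothetical signal frame: a copy of the interrupted context. */
struct sigframe_model {
    struct sig_regs_model saved;
};

/* Step 3: save the context, then redirect $spc and $ra. */
static void setup_signal_frame_model(struct sig_regs_model *regs,
                                     struct sigframe_model *frame,
                                     uint64_t handler, uint64_t trampoline,
                                     int signo)
{
    frame->saved = *regs;        /* interrupted context, restored later */
    regs->spc = handler;         /* rfe lands in the user signal handler */
    regs->ra = trampoline;       /* handler returns into the trampoline */
    regs->r0 = (uint64_t)signo;  /* assumption: signal number as argument */
}

/* Step 6: rt_sigreturn puts the interrupted context back. */
static void sigreturn_model(struct sig_regs_model *regs,
                            const struct sigframe_model *frame)
{
    *regs = frame->saved;
}
```

After sigreturn_model() runs, the register set is identical to the one captured before the handler, which is what lets step 7 resume the application at the interrupted instruction.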
+Registers handling
+==================
+
+The MMU is disabled in all exception paths during register save and restore.
+This prevents MMU faults (such as a TLB miss) from being triggered, since they
+could clobber the current register state. Such an event can occur when RWX
+mode is enabled and the memory accessed to save registers triggers a TLB miss.
+Apart from this, which is common to all exception paths, registers are saved
+differently depending on the type of exception.
+
+Interrupts and traps
+--------------------
+
+When interrupts and traps are triggered, we only save the caller-saved registers.
+Indeed, we rely on the fact that C code will save and restore the callee-saved
+registers, hence there is no need to save them. This path is the following:
+
+     +------------+          +-----------+        +---------------+
+IT   | Save caller| C Call   | Execute C |  Ret   | Restore caller| Ret from IT
++--->+   saved    +--------->+  handler  +------->+     saved     +----->
+     | registers  |          +-----------+        |   registers   |
+     +------------+                               +---------------+
+
+However, when returning to user, we check if there is work_pending. If a signal
+is pending and there is a signal handler to be called, then we need all
+registers to be saved on the stack in the pt_regs before executing the signal
+handler and restored after that. Since we only saved caller-saved registers, we
+need to also save callee-saved registers to restore them correctly when
+returning to user. This path, which is a bit more complicated, is the following:
+
+      +------------+
+      | Save caller|          +-----------+  Ret   +------------+
+ IT   |   saved    | C Call   | Execute C | to asm | Check work |
+ +--->+ registers  +--------->+  handler  +------->+   pending  |
+      | to pt_regs |          +-----------+        +--+---+-----+
+      +------------+                                  |   |
+                        Work pending                  |   | No work pending
+         +--------------------------------------------+   |
+         |                                                |
+         |                                   +------------+
+         v                                   |
+  +------+------+                            v
+  | Save callee |                    +-------+-------+
+  |   saved     |                    | Restore caller|  RFE from IT
+  | registers   |                    |     saved     +------->
+  | to pt_regs  |                    |   registers   |
+  +--+-------+--+                    | from pt_regs  |
+     |       |                       +-------+-------+
+     |       |         +---------+           ^
+     |       |         | Execute |           |
+     |       +-------->+ needed  +-----------+
+     |                 |  work   |
+     |                 +---------+
+     |Signal handler ?
+     v
++----+----------+ RFE to user +-------------+       +--------------+
+|   Copy all    | handler     |  Execute    |  ret  | rt_sigreturn |
+|   registers   +------------>+ user signal +------>+ trampoline   |
+| from pt_regs  |             |  handler    |       |  to kernel   |
+| to user stack |             +-------------+       +------+-------+
++---------------+                                          |
+                         syscall rt_sigreturn              |
+         +-------------------------------------------------+
+         |
+         v
++--------+-------+                      +-------------+
+|   Recopy all   |                      | Restore all |  RFE
+| registers from +--------------------->+    saved    +------->
+|   user stack   |       Return         |  registers  |
+|   to pt_regs   |    from sigreturn    |from pt_regs |
++----------------+  (via ret_from_fork) +-------------+
+
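The difference between the two interrupt paths above can be modelled as a lazy save. The sketch below is a user-space illustration only: the register counts, structures and function names are hypothetical and do not reflect the real kvx register file or pt_regs layout.

```c
#include <stdbool.h>
#include <string.h>

#define NR_CALLER_SAVED_MODEL 4   /* hypothetical counts */
#define NR_CALLEE_SAVED_MODEL 4

struct cpu_model {
    long caller[NR_CALLER_SAVED_MODEL];
    long callee[NR_CALLEE_SAVED_MODEL];
};

struct irq_frame_model {
    long caller[NR_CALLER_SAVED_MODEL];
    long callee[NR_CALLEE_SAVED_MODEL];
    bool callee_valid;   /* true once callee-saved registers are stored */
};

/* Entry: only the caller-saved registers are written to the frame. */
static void irq_entry_model(const struct cpu_model *cpu,
                            struct irq_frame_model *frame)
{
    memcpy(frame->caller, cpu->caller, sizeof(frame->caller));
    frame->callee_valid = false;
}

/* Exit: complete the frame only if work is pending, then restore. */
static void irq_exit_model(struct cpu_model *cpu,
                           struct irq_frame_model *frame, bool work_pending)
{
    if (work_pending) {
        memcpy(frame->callee, cpu->callee, sizeof(frame->callee));
        frame->callee_valid = true;   /* frame now holds every register */
    }
    memcpy(cpu->caller, frame->caller, sizeof(cpu->caller));
}
```

The common case pays only for the caller-saved copy; the full frame is built only when something like a signal handler needs it.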
+
+Syscalls
+--------
+As explained before, for syscalls, we can use whatever callee-saved registers
+we want since syscalls are seen as a "classic" call from the ABI point of view.
+The only different path is the one for clone. For this path, since the child
+expects to find the same callee-saved register contents as its parent, we must
+save them before executing the clone syscall and restore them after that for
+the child. This is done via a redefinition of __sys_clone in assembly which is
+called in place of the standard sys_clone. This new call saves the callee-saved
+registers in pt_regs. The parent returns using the standard syscall path. The
+freshly spawned child, however, is woken up via ret_from_fork, which restores
+all registers (even if the caller-saved ones are not needed).
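The clone path can be illustrated with a small C model. It is a sketch under assumptions: the structure, the register counts and the *_model functions are hypothetical stand-ins for __sys_clone and ret_from_fork, and caller[0] plays the role of $r0, which the child sees as 0 as is usual for clone on Linux.

```c
#include <string.h>

#define NR_CALLER_SAVED_MODEL 2   /* hypothetical counts */
#define NR_CALLEE_SAVED_MODEL 2

struct clone_regs_model {
    long caller[NR_CALLER_SAVED_MODEL];   /* caller[0] stands in for $r0 */
    long callee[NR_CALLEE_SAVED_MODEL];
};

/* Models __sys_clone: store callee-saved registers, then copy for the child. */
static void sys_clone_model(const long *live_callee,
                            struct clone_regs_model *parent,
                            struct clone_regs_model *child)
{
    memcpy(parent->callee, live_callee, sizeof(parent->callee));
    *child = *parent;        /* child starts from the parent's saved state */
    child->caller[0] = 0;    /* assumption: child return value is 0 */
}

/* Models ret_from_fork: the child reloads every register from its frame. */
static void ret_from_fork_model(const struct clone_regs_model *child,
                                long *live_caller, long *live_callee)
{
    memcpy(live_caller, child->caller, sizeof(child->caller));
    memcpy(live_callee, child->callee, sizeof(child->callee));
}
```

Without the first memcpy in sys_clone_model(), the child's callee-saved slots would hold whatever happened to be in the frame, which is the situation the assembly wrapper is described as preventing.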