Exceptions
==========
On kvx, handlers are set using the $ev (exception vector) register, which
specifies a base address.
When an exception occurs, an offset is added to $ev and the result is used as
the "Next $pc".
The offset depends on which exception vector the CPU jumps to:
* $ev + 0x00 for debug
* $ev + 0x40 for trap
* $ev + 0x80 for interrupt
* $ev + 0xc0 for syscall
Handlers are then laid out in the following order:
         _____________
        |             |
        |   Syscall   |
        |_____________|
        |             |
        |  Interrupts |
        |_____________|
        |             |
        |    Traps    |
        |_____________|
        |             | ^
        |    Debug    | | Stride
BASE -> |_____________| v
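
As a rough illustration of the offset arithmetic above, the following
user-space C sketch computes the address used as "Next $pc" for each exception
class. The enum, the macro and the function name are hypothetical and only
mirror the offsets listed above; this is not code from the kernel tree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the values follow the offsets listed above. */
enum kvx_vector {
    KVX_VEC_DEBUG = 0,     /* $ev + 0x00 */
    KVX_VEC_TRAP = 1,      /* $ev + 0x40 */
    KVX_VEC_INTERRUPT = 2, /* $ev + 0x80 */
    KVX_VEC_SYSCALL = 3,   /* $ev + 0xc0 */
    KVX_VEC_COUNT
};

/* Distance between two consecutive handlers (the "Stride" in the figure). */
#define KVX_VEC_STRIDE 0x40u

/* Address used as "Next $pc" for a given exception class. */
static uint64_t kvx_vector_addr(uint64_t ev_base, enum kvx_vector vec)
{
    assert(vec < KVX_VEC_COUNT);
    return ev_base + (uint64_t)vec * KVX_VEC_STRIDE;
}
```

For example, with a base of 0x1000 the trap handler would sit at 0x1040 and
the syscall handler at 0x10c0.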
Interrupts and traps are serviced similarly, i.e.:
- Jump to handler
- Save all registers
- Prepare the call (do_IRQ or trap_handler)
- Restore all registers
- Return from exception

The entry.S file is (as on other architectures) the entry point into the kernel.
It contains all assembly routines related to interrupts, traps and syscalls.
Syscall handling
================
A syscall must be issued using "scall $r6", where $r6 contains the syscall
number. This convention allows the kernel to modify and restart a syscall.

Syscalls are handled differently from interrupts/exceptions. From an ABI
point of view, scalls are like function calls: any caller-saved register
can be clobbered by the syscall. However, syscall parameters are passed
in registers r0 through r7, so these registers must be preserved to avoid
clobbering them before the actual syscall function is called.
On syscall from userspace (scall instruction), the processor will put
the syscall number in $es.sn and switch from user to kernel privilege
mode. kvx_syscall_handler will be called in kernel mode.
The following steps are then taken:
- Switch to kernel stack
- Extract syscall number
- Check whether the syscall number is bogus
  - If so, set syscall func to the not-implemented one
- Check if tracing is enabled
  - If so, jump to trace_syscall_enter
    - Save syscall arguments (r0 -> r7) on stack in pt_regs
    - Call do_trace_syscall_enter function
    - Restore syscall arguments since the C call may have modified them
- Call the syscall function
- Save $r0 in pt_regs since it can be clobbered afterward
- If tracing was enabled, call trace_syscall_exit
- Call work_pending
- Return to user
The trace call is handled out of the fast path. All slow path handling
is done in a separate part of the code to avoid polluting the cache.
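
The steps above can be modelled in plain C. The sketch below is a user-space
approximation, not the entry.S implementation: every sketch_* name, the
one-entry syscall table and the trace counters are hypothetical and exist only
for illustration. It shows the range check with a not-implemented fallback and
the save/restore of the argument registers around the trace hook.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the argument registers r0..r7. */
struct sketch_regs {
    long r[8];
};

typedef long (*sketch_syscall_fn)(const struct sketch_regs *regs);

/* Fallback used when the syscall number is out of range. */
static long sketch_ni_syscall(const struct sketch_regs *regs)
{
    (void)regs;
    return -ENOSYS;
}

/* Toy syscall: returns r0 + r1. */
static long sketch_sys_add(const struct sketch_regs *regs)
{
    return regs->r[0] + regs->r[1];
}

static const sketch_syscall_fn sketch_table[] = { sketch_sys_add };
#define SKETCH_NR_SYSCALLS (sizeof(sketch_table) / sizeof(sketch_table[0]))

static int sketch_trace_enter_calls;
static int sketch_trace_exit_calls;

/* Stand-in for the C trace hook; it deliberately clobbers the arguments. */
static void sketch_trace_enter(struct sketch_regs *regs)
{
    for (size_t i = 0; i < 8; i++)
        regs->r[i] = -1;
    sketch_trace_enter_calls++;
}

static long sketch_handle_syscall(unsigned long nr, struct sketch_regs *regs,
                                  bool tracing)
{
    sketch_syscall_fn fn = sketch_ni_syscall;
    long ret;

    assert(regs != NULL);

    /* A bogus number keeps the not-implemented fallback. */
    if (nr < SKETCH_NR_SYSCALLS)
        fn = sketch_table[nr];

    if (tracing) {
        /* Slow path: save the arguments, trace, then restore them. */
        struct sketch_regs saved = *regs;

        sketch_trace_enter(regs);
        *regs = saved;
    }

    ret = fn(regs);
    regs->r[0] = ret; /* keep the return value in the r0 slot */

    if (tracing)
        sketch_trace_exit_calls++;

    return ret;
}
```

Calling sketch_handle_syscall() with an out-of-range number yields -ENOSYS,
and the toy syscall sees the same arguments whether or not tracing is on.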
Signals
=======
Signals are handled when exiting the kernel, before returning to user.
When handling a signal, the path is the following:

1 - User application is executing normally
    Then an exception occurs (syscall, interrupt, trap)
2 - The exception handling path is taken
    and, before returning to user, pending signals are checked
3 - Signals are handled by do_signal
    Registers are saved and a special part of the stack is modified
    to create a trampoline to call rt_sigreturn
    $spc is modified to jump to the user signal handler
    $ra is modified to jump to the sigreturn trampoline directly after
    returning from the user signal handler.
4 - User signal handler is called after rfe from exception
    When it returns, $ra is copied into $pc, resulting in a jump
    to the syscall trampoline.
5 - Syscall trampoline is executed, leading to the rt_sigreturn syscall
6 - rt_sigreturn syscall is executed
    Previous registers are restored to allow returning to user correctly
7 - User application is resumed at the exact point where it was
    interrupted.

        +----------+
        |    1     |
        | User app | @func
        |  (user)  |
        +---+------+
            |
            | it/trap/scall
            |
        +---v-------+
        |    2      |
        | exception |
        | handling  |
        | (kernel)  |
        +---+-------+
            |
            | Check if signals are pending; if so, handle them
            |
        +---v--------+
        |    3       |
        | do_signal  |
        |  handling  |
        |  (kernel)  |
        +----+-------+
             |
             | Return to user signal handler
             |
        +----v------+
        |    4      |
        |  signal   |
        |  handler  |
        |  (user)   |
        +----+------+
             |
             | Return to sigreturn trampoline
             |
        +----v-------+
        |    5       |
        |  syscall   |
        |rt_sigreturn|
        |  (user)    |
        +----+-------+
             |
             | Syscall to rt_sigreturn
             |
        +----v-------+
        |    6       |
        |  sigreturn |
        |  handler   |
        |  (kernel)  |
        +----+-------+
             |
             | Modify context to return to original func
             |
        +----v-----+
        |    7     |
        | User app | @func
        |  (user)  |
        +----------+
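
To make steps 3 and 6 more concrete, here is a minimal user-space model of the
context rewrite. The structures and function names are hypothetical, and the
register set is reduced to three fields; the real signal frame layout is not
described in this document and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced view of the interrupted context. */
struct sketch_ctx {
    uint64_t spc; /* where rfe resumes execution */
    uint64_t ra;  /* return address register */
    uint64_t r0;  /* stand-in for all other registers */
};

/* Hypothetical frame holding the context saved for rt_sigreturn. */
struct sketch_sigframe {
    struct sketch_ctx saved;
};

/* Step 3: save the context, then redirect $spc and $ra. */
static void sketch_setup_frame(struct sketch_ctx *ctx,
                               struct sketch_sigframe *frame,
                               uint64_t handler, uint64_t trampoline)
{
    assert(ctx != NULL && frame != NULL);
    frame->saved = *ctx;
    ctx->spc = handler;   /* rfe lands in the user signal handler */
    ctx->ra = trampoline; /* handler return lands in the trampoline */
}

/* Step 6: rt_sigreturn restores the context saved in step 3. */
static void sketch_sigreturn(struct sketch_ctx *ctx,
                             const struct sketch_sigframe *frame)
{
    assert(ctx != NULL && frame != NULL);
    *ctx = frame->saved;
}
```

After sketch_sigreturn(), the context is identical to the one captured before
the handler ran, which is what lets the application resume in step 7.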
Registers handling
==================
The MMU is disabled in all exception paths during register save and restore.
This prevents an MMU fault (such as a TLB miss) from being triggered, which
could clobber the current register state. Such a fault can occur when RWX mode
is enabled and the memory accessed to save the registers triggers a TLB miss.
Apart from this, which is common to all exception paths, registers are saved
differently depending on the type of exception.
Interrupts and traps
--------------------
When interrupts and traps are triggered, we only save the caller-saved
registers. Indeed, we rely on the fact that C code will save and restore the
callee-saved registers, hence there is no need to save them. This path is the
following:

       +------------+          +-----------+        +---------------+
  IT   | Save caller| C Call   | Execute C |  Ret   | Restore caller| Ret from IT
  +--->+   saved    +--------->+  handler  +------->+     saved     +----->
       | registers  |          +-----------+        |   registers   |
       +------------+                               +---------------+

However, when returning to user, we check whether there is work pending. If a
signal is pending and there is a signal handler to be called, then all
registers must be saved on the stack in pt_regs before executing the signal
handler, and restored afterwards. Since we only saved the caller-saved
registers, we also need to save the callee-saved registers so that they can be
restored correctly when returning to user. This path is the following (a bit
more complicated):

        +------------+
        | Save caller|          +-----------+  Ret   +------------+
   IT   |   saved    | C Call   | Execute C | to asm | Check work |
   +--->+ registers  +--------->+  handler  +------->+   pending  |
        | to pt_regs |          +-----------+        +--+---+-----+
        +------------+                                  |   |
                          Work pending                  |   | No work pending
           +--------------------------------------------+   |
           |                                                |
           |                                   +------------+
           v                                   |
    +------+------+                            v
    | Save callee |                    +-------+-------+
    |   saved     |                    | Restore caller|  RFE from IT
    | registers   |                    |     saved     +------->
    | to pt_regs  |                    |   registers   |
    +--+-------+--+                    | from pt_regs  |
       |       |                       +-------+-------+
       |       |         +---------+           ^
       |       |         | Execute |           |
       |       +-------->+ needed  +-----------+
       |                 |  work   |
       |                 +---------+
       |Signal handler ?
       v
  +----+----------+ RFE to user +-------------+       +--------------+
  |   Copy all    | handler     |  Execute    |  ret  | rt_sigreturn |
  |   registers   +------------>+ user signal +------>+ trampoline   |
  | from pt_regs  |             |  handler    |       |  to kernel   |
  | to user stack |             +-------------+       +------+-------+
  +---------------+                                          |
                         syscall rt_sigreturn                |
         +---------------------------------------------------+
         |
         v
+--------+-------+                      +-------------+
|   Recopy all   |                      | Restore all |  RFE
| registers from +--------------------->+    saved    +------->
|   user stack   |       Return         |  registers  |
|   to pt_regs   |  from sigreturn      |from pt_regs |
+----------------+  (via ret_from_fork) +-------------+
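
The decision taken on the return path can be summarised by the small C model
below. It is only a sketch under stated assumptions: the flag values, the
structure and the function name are hypothetical, and the real logic lives in
assembly. It tracks which register sets end up in pt_regs and whether they
have to be copied to the user stack for a signal handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags naming the register sets stored in pt_regs. */
#define SKETCH_CALLER_SAVED 0x1u
#define SKETCH_CALLEE_SAVED 0x2u

struct sketch_exit {
    unsigned int sets_in_pt_regs; /* register sets saved in pt_regs */
    bool copy_to_user_stack;      /* a user signal handler will run */
};

static struct sketch_exit sketch_return_from_it(bool work_pending,
                                                bool signal_handler)
{
    /* Caller-saved registers are always saved on entry. */
    struct sketch_exit e = { SKETCH_CALLER_SAVED, false };

    /* A handler to run implies that work is pending. */
    assert(work_pending || !signal_handler);

    if (!work_pending)
        return e; /* fast path: restore caller-saved registers and RFE */

    /* Pending work may need the full context: add callee-saved. */
    e.sets_in_pt_regs |= SKETCH_CALLEE_SAVED;
    e.copy_to_user_stack = signal_handler;
    return e;
}
```

Only the "work pending" branches pay for the extra callee-saved save, which
keeps the common interrupt return path short.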
Syscalls
--------
As explained before, for syscalls, we can use whatever callee-saved registers
we want, since syscalls are seen as "classic" calls from the ABI point of view.
The only different path is the one for clone. For this path, since the child
expects to find the same callee-saved register contents as its parent, we must
save them before executing the clone syscall and restore them afterwards for
the child. This is done via a redefinition of __sys_clone in assembly, which is
called in place of the standard sys_clone. This new call saves the callee-saved
registers in pt_regs. The parent returns through the standard syscall path. The
freshly spawned child, however, is woken up via ret_from_fork, which restores
all registers (even though the caller-saved ones are not needed).
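
The reason for saving the callee-saved registers before clone can be seen in
the following reduced model. The structure, the register count and the
function name are hypothetical and do not reflect the real kvx pt_regs; the
sketch only shows that the child starts from a copy of the parent's saved
registers, with the return value slot set to 0 as clone does for a child.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical count, not the kvx ABI value. */
#define SKETCH_NR_CALLEE_SAVED 4

/* Hypothetical, reduced register frame. */
struct sketch_pt_regs {
    long r0; /* syscall return value slot */
    long callee_saved[SKETCH_NR_CALLEE_SAVED];
};

/* Build the child's frame from the parent's frame saved at clone time. */
static void sketch_clone_child_regs(const struct sketch_pt_regs *parent,
                                    struct sketch_pt_regs *child)
{
    assert(parent != NULL && child != NULL);
    *child = *parent; /* callee-saved contents are inherited as-is */
    child->r0 = 0;    /* clone returns 0 in the child */
}
```

If the callee-saved registers had not been stored in the parent's frame first,
the copy would hand the child stale values for them.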