[[supervisor]]
== Supervisor-Level ISA, Version 1.12
This chapter describes the RISC-V supervisor-level architecture, which
contains a common core that is used with various supervisor-level
address translation and protection schemes.
[NOTE]
====
Supervisor mode is deliberately restricted in terms of interactions with
underlying physical hardware, such as physical memory and device
interrupts, to support clean virtualization. In this spirit, certain
supervisor-level facilities, including requests for timer and
interprocessor interrupts, are provided by implementation-specific
mechanisms. In some systems, a supervisor execution environment (SEE)
provides these facilities in a manner specified by a supervisor binary
interface (SBI). Other systems supply these facilities directly, through
some other implementation-defined mechanism.
====
=== Supervisor CSRs
A number of CSRs are provided for the supervisor.
[NOTE]
====
The supervisor should only view CSR state that should be visible to a
supervisor-level operating system. In particular, there is no
information about the existence (or non-existence) of higher privilege
levels (machine level or other) visible in the CSRs accessible by the
supervisor.
Many supervisor CSRs are a subset of the equivalent machine-mode CSR,
and the machine-mode chapter should be read first to help understand the
supervisor-level CSR descriptions.
====
[[sstatus]]
==== Supervisor Status Register (`sstatus`)
The `sstatus` register is an SXLEN-bit read/write register formatted as
shown in <<sstatusreg-rv32>> when SXLEN=32
and <<sstatusreg>> when SXLEN=64. The `sstatus`
register keeps track of the processor's current operating state.
include::images/bytefield/sstatus32-1.edn[]
[[sstatusreg-rv32]]
.Supervisor-mode status register (`sstatus`) when SXLEN=32.
include::images/bytefield/sstatus32-2.edn[]
include::images/bytefield/sstatus64.edn[]
[[sstatusreg]]
.Supervisor-mode status register (`sstatus`) when SXLEN=64.
include::images/bytefield/sstatus32-2.edn[]
The SPP bit indicates the privilege level at which a hart was executing
before entering supervisor mode. When a trap is taken, SPP is set to 0
if the trap originated from user mode, or 1 otherwise. When an SRET
instruction (see <<otherpriv>>) is executed to
return from the trap handler, the privilege level is set to user mode if
the SPP bit is 0, or supervisor mode if the SPP bit is 1; SPP is then
set to 0.
The SIE bit enables or disables all interrupts in supervisor mode. When
SIE is clear, interrupts are not taken while in supervisor mode. When
the hart is running in user-mode, the value in SIE is ignored, and
supervisor-level interrupts are enabled. The supervisor can disable
individual interrupt sources using the `sie` CSR.
The SPIE bit indicates whether supervisor interrupts were enabled prior
to trapping into supervisor mode. When a trap is taken into supervisor
mode, SPIE is set to SIE, and SIE is set to 0. When an SRET instruction
is executed, SIE is set to SPIE, then SPIE is set to 1.
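The SPP, SIE, and SPIE updates on trap entry and on SRET can be summarized as a small behavioral model. This is a sketch, not normative text. It assumes the standard `sstatus` bit positions (SIE=bit 1, SPIE=bit 5, SPP=bit 8), the function names are hypothetical, and it ignores everything else SRET does (such as setting `pc` from `sepc`).

```python
# Behavioral sketch of the sstatus fields touched by S-mode traps and SRET.
# Assumed bit positions: SIE=1, SPIE=5, SPP=8 (standard sstatus layout).
SIE = 1 << 1
SPIE = 1 << 5
SPP = 1 << 8


def take_trap_into_s(sstatus: int, from_priv: str) -> int:
    """Return sstatus after a trap into S-mode from privilege 'U' or 'S'."""
    spie = SPIE if sstatus & SIE else 0      # SPIE <- SIE
    spp = SPP if from_priv != "U" else 0     # SPP <- 0 from U-mode, 1 otherwise
    sstatus &= ~(SIE | SPIE | SPP)           # SIE <- 0; clear old SPIE/SPP
    return sstatus | spie | spp


def sret(sstatus: int) -> tuple[int, str]:
    """Return (new sstatus, new privilege) after executing SRET."""
    new_priv = "S" if sstatus & SPP else "U"  # privilege <- SPP
    sie = SIE if sstatus & SPIE else 0        # SIE <- SPIE
    sstatus &= ~(SIE | SPP)                   # SPP <- 0
    return sstatus | sie | SPIE, new_priv     # SPIE <- 1
```

A trap from U-mode with SIE=1 followed by SRET restores SIE=1 and returns to U-mode, leaving SPIE=1.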
The `sstatus` register is a subset of the `mstatus` register.
[NOTE]
====
In a straightforward implementation, reading or writing any field in
`sstatus` is equivalent to reading or writing the homonymous field in
`mstatus`.
====
===== Base ISA Control in `sstatus` Register
The UXL field controls the value of XLEN for U-mode, termed _UXLEN_,
which may differ from the value of XLEN for S-mode, termed _SXLEN_. The
encoding of UXL is the same as that of the MXL field of `misa`, shown in
<<misabase>>.
When SXLEN=32, the UXL field does not exist, and UXLEN=32. When
SXLEN=64, it is a *WARL* field that encodes the current value of UXLEN. In
particular, an implementation may make UXL be a read-only field whose
value always ensures that UXLEN=SXLEN.
If UXLEN≠SXLEN, instructions executed in the narrower
mode must ignore source register operand bits above the configured XLEN,
and must sign-extend results to fill the widest supported XLEN in the
destination register.
If UXLEN latexmath:[$<$] SXLEN, user-mode instruction-fetch addresses
and load and store effective addresses are taken modulo
latexmath:[$2^{\text{UXLEN}}$]. For example, when UXLEN=32 and SXLEN=64,
user-mode memory accesses reference the lowest 4 GiB of the address space.
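The narrowing rules above can be modeled in a few lines. This is a sketch, assuming UXLEN=32 on a hart whose widest supported XLEN is 64, and using ADD as a representative instruction. The helper names are hypothetical. UXL is decoded with the `misa`.MXL encoding (1, 2, 3 for 32, 64, 128).

```python
# Behavioral sketch of U-mode execution when UXLEN < SXLEN.
UXL_TO_XLEN = {1: 32, 2: 64, 3: 128}  # same encoding as misa.MXL


def mask(bits: int) -> int:
    return (1 << bits) - 1


def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend the low from_bits of value into a to_bits-wide pattern."""
    value &= mask(from_bits)
    if (value >> (from_bits - 1)) & 1:
        value |= mask(to_bits) & ~mask(from_bits)
    return value


def narrow_add(rs1: int, rs2: int, uxlen: int, widest_xlen: int) -> int:
    """ADD in the narrower mode: ignore high source bits, sign-extend the result."""
    a = rs1 & mask(uxlen)
    b = rs2 & mask(uxlen)
    return sign_extend(a + b, uxlen, widest_xlen)


def user_effective_address(addr: int, uxlen: int) -> int:
    """U-mode fetch, load, and store addresses are taken modulo 2**UXLEN."""
    return addr % (1 << uxlen)
```

For instance, `narrow_add(0x7FFF_FFFF, 1, 32, 64)` yields `0xFFFF_FFFF_8000_0000`: the 32-bit sum overflows into bit 31, which is then replicated into the upper half of the 64-bit register.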
[[sum]]
===== Memory Privilege in `sstatus` Register
The MXR (Make eXecutable Readable) bit modifies the privilege with which
loads access virtual memory. When MXR=0, only loads from pages marked
readable (R=1 in <<sv32pte>>) will succeed. When
MXR=1, loads from pages marked either readable or executable (R=1 or
X=1) will succeed. MXR has no effect when page-based virtual memory is
not in effect.
The SUM (permit Supervisor User Memory access) bit modifies the
privilege with which S-mode loads and stores access virtual memory. When
SUM=0, S-mode memory accesses to pages that are accessible by U-mode
(U=1 in <<sv32pte>>) will fault. When SUM=1, these
accesses are permitted. SUM has no effect when page-based virtual memory
is not in effect, nor when executing in U-mode. Note that S-mode can
never execute instructions from user pages, regardless of the state of
SUM.
SUM is read-only 0 if `satp`.MODE is read-only 0.
[NOTE]
====
The SUM mechanism prevents supervisor software from inadvertently
accessing user memory. Operating systems can execute the majority of
code with SUM clear; the few code segments that should access user
memory can temporarily set SUM.
The SUM mechanism does not avail S-mode software of permission to
execute instructions in user code pages. Legitimate use cases for
execution from user memory in supervisor context are rare in general and
nonexistent in POSIX environments. However, bugs in supervisors that
lead to arbitrary code execution are much easier to exploit if the
supervisor exploit code can be stored in a user buffer at a virtual
address chosen by an attacker.
Some non-POSIX single address space operating systems do allow certain
privileged software to partially execute in supervisor mode, while most
programs run in user mode, all in a shared address space. This use case
can be realized by mapping the physical code pages at multiple virtual
addresses with different permissions, possibly with the assistance of
the instruction page-fault handler to direct supervisor software to use
the alternate mapping.
====
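The combined effect of SUM and MXR on a single leaf PTE can be sketched as a permission check. This is a simplified model, not the full translation algorithm. Only the R, W, X, and U bits are considered (validity, A/D, and superpage checks are omitted), and the function name is hypothetical.

```python
# Simplified sketch: SUM and MXR applied to one leaf PTE's R/W/X/U bits.
# Validity, A/D bits, and superpage alignment checks are deliberately omitted.
def access_permitted(access: str, priv: str, r: bool, w: bool, x: bool,
                     u: bool, sum_bit: bool, mxr: bool) -> bool:
    """access is 'fetch', 'load', or 'store'; priv is 'U' or 'S'."""
    if priv == "U":
        if not u:
            return False          # U-mode may only access U=1 pages
    elif u:
        if access == "fetch":
            return False          # S-mode never executes from user pages
        if not sum_bit:
            return False          # SUM=0: S-mode loads/stores to user pages fault
    if access == "fetch":
        return x
    if access == "load":
        return r or (mxr and x)   # MXR=1 also makes X=1 pages readable
    if access == "store":
        return w
    raise ValueError(f"unknown access type: {access}")
```

Note that setting SUM never grants S-mode execute permission on a user page, matching the rationale above.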
===== Endianness Control in `sstatus` Register
The UBE bit is a *WARL* field that controls the endianness of explicit memory
accesses made from U-mode, which may differ from the endianness of
memory accesses in S-mode. An implementation may make UBE be a read-only
field that always specifies the same endianness as for S-mode.
UBE controls whether explicit load and store memory accesses made from
U-mode are little-endian (UBE=0) or big-endian (UBE=1).
UBE has no effect on instruction fetches, which are _implicit_ memory
accesses that are always little-endian.
For _implicit_ accesses to supervisor-level memory management data
structures, such as page tables, S-mode endianness always applies and
UBE is ignored.
[NOTE]
====
Standard RISC-V ABIs are expected to be either purely little-endian or
purely big-endian, with no accommodation for mixing endianness.
Nevertheless, endianness control has been defined so as to permit an OS
of one endianness to execute user-mode programs of the opposite
endianness.
====
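The effect of UBE on an explicit U-mode load can be illustrated with a byte-array model of memory. This is a minimal sketch with a hypothetical function name. Implicit accesses (instruction fetches and page-table walks) are unaffected by UBE and are not modeled.

```python
# Sketch: UBE selects the byte order of explicit U-mode loads.
def u_mode_load(mem: bytes, addr: int, size: int, ube: int) -> int:
    byteorder = "big" if ube else "little"
    return int.from_bytes(mem[addr:addr + size], byteorder)
```

With memory bytes `11 22 33 44`, a 4-byte load returns `0x44332211` when UBE=0 and `0x11223344` when UBE=1.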
==== Supervisor Trap Vector Base Address Register (`stvec`)
The `stvec` register is an SXLEN-bit read/write register that holds trap
vector configuration, consisting of a vector base address (BASE) and a
vector mode (MODE).
.Supervisor trap vector base address register (`stvec`).
include::images/bytefield/stvec.edn[]
The BASE field in `stvec` is a *WARL* field that can hold any valid virtual or
physical address, subject to the following alignment constraints: the
address must be 4-byte aligned, and MODE settings other than Direct
might impose additional alignment constraints on the value in the BASE
field.
[[stvec-mode]]
.Encoding of `stvec` MODE field.
[%autowidth,float="center",align="center",cols=">,^,<",options="header",]
|===
|Value |Name |Description
|0 +
1 +
≥2
|Direct +
Vectored
|All exceptions set `pc` to BASE. +
Asynchronous interrupts set `pc` to BASE+4×cause. +
_Reserved_
|===
The encoding of the MODE field is shown in
<<stvec-mode>>. When MODE=Direct, all traps into
supervisor mode cause the `pc` to be set to the address in the BASE
field. When MODE=Vectored, all synchronous exceptions into supervisor
mode cause the `pc` to be set to the address in the BASE field, whereas
interrupts cause the `pc` to be set to the address in the BASE field
plus four times the interrupt cause number. For example, a
supervisor-mode timer interrupt (see <<scauses>>)
causes the `pc` to be set to BASE+`0x14`. Setting MODE=Vectored may
impose a stricter alignment constraint on BASE.
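The address computation can be sketched as follows, assuming MODE occupies `stvec[1:0]` and BASE supplies the remaining upper bits (so BASE is 4-byte aligned). The constant and function names are hypothetical.

```python
# Sketch: trap target pc computed from an stvec value.
STVEC_DIRECT = 0
STVEC_VECTORED = 1


def trap_target(stvec: int, is_interrupt: bool, cause: int) -> int:
    base = stvec & ~0b11   # BASE, with the MODE bits cleared
    mode = stvec & 0b11
    if mode == STVEC_DIRECT:
        return base
    if mode == STVEC_VECTORED:
        # Only asynchronous interrupts are vectored; exceptions go to BASE.
        return base + 4 * cause if is_interrupt else base
    raise ValueError(f"reserved stvec MODE {mode}")
```

With BASE=`0x8000_0000` in Vectored mode, the supervisor timer interrupt (cause 5) lands at `0x8000_0014`, while an environment call from U-mode (exception code 8) still lands at BASE.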
==== Supervisor Interrupt Registers (`sip` and `sie`)
The `sip` register is an SXLEN-bit read/write register containing
information on pending interrupts, while `sie` is the corresponding
SXLEN-bit read/write register containing interrupt enable bits.
Interrupt cause number _i_ (as reported in CSR `scause`,
<<scause>>) corresponds with bit _i_ in both `sip` and
`sie`. Bits 15:0 are allocated to standard interrupt causes only, while
bits 16 and above are designated for platform or custom use.
.Supervisor interrupt-pending register (`sip`).
include::images/bytefield/sip.edn[]
.Supervisor interrupt-enable register (`sie`).
include::images/bytefield/sie.edn[]
An interrupt _i_ will trap to S-mode if both of the following are true:
(a) either the current privilege mode is S and the SIE bit in the
`sstatus` register is set, or the current privilege mode has less
privilege than S-mode; and (b) bit _i_ is set in both `sip` and `sie`.
These conditions for an interrupt trap to occur must be evaluated in a
bounded amount of time from when an interrupt becomes, or ceases to be,
pending in `sip`, and must also be evaluated immediately following the
execution of an SRET instruction or an explicit write to a CSR on which
these interrupt trap conditions expressly depend (including `sip`, `sie`
and `sstatus`).
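Conditions (a) and (b) can be written as a predicate. This is a sketch covering only the two conditions stated here. Delegation from M-mode is configured by machine-level registers and is not modeled, and the function name is hypothetical.

```python
# Sketch of the two conditions for interrupt i to trap into S-mode.
def interrupt_traps_to_s(priv: str, sstatus_sie: int, sip: int, sie: int,
                         i: int) -> bool:
    """priv is 'U', 'S', or 'M'; sstatus_sie is the SIE bit (0 or 1)."""
    # (a) In S-mode with SIE set, or in a less-privileged mode (U-mode).
    cond_a = (priv == "S" and sstatus_sie == 1) or priv == "U"
    # (b) Bit i is set in both sip and sie.
    cond_b = (((sip & sie) >> i) & 1) == 1
    return cond_a and cond_b
```

In U-mode the value of SIE is irrelevant, so a pending and enabled supervisor timer interrupt traps even when SIE=0.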
Interrupts to S-mode take priority over any interrupts to lower
privilege modes.
Each individual bit in register `sip` may be writable or may be
read-only. When bit _i_ in `sip` is writable, a pending interrupt _i_
can be cleared by writing 0 to this bit. If interrupt _i_ can become
pending but bit _i_ in `sip` is read-only, the implementation must
provide some other mechanism for clearing the pending interrupt (which
may involve a call to the execution environment).
A bit in `sie` must be writable if the corresponding interrupt can ever
become pending. Bits of `sie` that are not writable are read-only zero.
The standard portions (bits 15:0) of registers `sip` and `sie` are
formatted as shown in Figures <<sipreg-standard>>
and <<siereg-standard>> respectively.
[[sipreg-standard]]
.Standard portion (bits 15:0) of `sip`.
include::images/bytefield/sipreg-standard.edn[]
[[siereg-standard]]
.Standard portion (bits 15:0) of `sie`.
include::images/bytefield/siereg-standard.edn[]
Bits `sip`.SEIP and `sie`.SEIE are the interrupt-pending and
interrupt-enable bits for supervisor-level external interrupts. If
implemented, SEIP is read-only in `sip`, and is set and cleared by the
execution environment, typically through a platform-specific interrupt
controller.
Bits `sip`.STIP and `sie`.STIE are the interrupt-pending and
interrupt-enable bits for supervisor-level timer interrupts. If
implemented, STIP is read-only in `sip`, and is set and cleared by the
execution environment.
Bits `sip`.SSIP and `sie`.SSIE are the interrupt-pending and
interrupt-enable bits for supervisor-level software interrupts. If
implemented, SSIP is writable in `sip` and may also be set to 1 by a
platform-specific interrupt controller.
[NOTE]
====
Interprocessor interrupts are sent to other harts by
implementation-specific means, which will ultimately cause the SSIP bit
to be set in the recipient hart’s `sip` register.
====
Each standard interrupt type (SEI, STI, or SSI) need not be implemented,
in which case the corresponding interrupt-pending and interrupt-enable
bits are read-only zeros. All bits in `sip` and `sie` are *WARL* fields. The
implemented interrupts may be found by writing one to every bit location
in `sie`, then reading back to see which bit positions hold a one.
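The probing procedure can be sketched against a toy WARL register. The `WarlModel` class and `probe_implemented_bits` function are hypothetical; on real hardware the read and write callbacks would be CSR instructions. The sketch restores the original value afterward, and probing is best done with `sstatus`.SIE clear so that enabling every bit cannot cause an unexpected trap.

```python
from typing import Callable


def probe_implemented_bits(read_csr: Callable[[], int],
                           write_csr: Callable[[int], None],
                           xlen: int = 64) -> int:
    """Write all ones, read back, restore; return the mask of bits that stuck."""
    saved = read_csr()
    write_csr((1 << xlen) - 1)
    implemented = read_csr()
    write_csr(saved)
    return implemented


class WarlModel:
    """Toy WARL register: bits outside writable_mask always read as zero."""

    def __init__(self, writable_mask: int) -> None:
        self.writable_mask = writable_mask
        self.value = 0

    def read(self) -> int:
        return self.value

    def write(self, v: int) -> None:
        self.value = v & self.writable_mask
```

A model `sie` implementing SSIE, STIE, and SEIE (bits 1, 5, and 9) reports `0x222`.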
[NOTE]
====
The `sip` and `sie` registers are subsets of the `mip` and `mie`
registers. Reading any implemented field, or writing any writable field,
of `sip`/`sie` effects a read or write of the homonymous field of
`mip`/`mie`.
Bits 3, 7, and 11 of `sip` and `sie` correspond to the machine-mode
software, timer, and external interrupts, respectively. Since most
platforms will choose not to make these interrupts delegatable from
M-mode to S-mode, they are shown as 0 in
<<sipreg-standard>> and <<siereg-standard>>.
====
Multiple simultaneous interrupts destined for supervisor mode are
handled in the following decreasing priority order: SEI, SSI, STI.
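Selecting which of several pending-and-enabled standard interrupts to take can be sketched as a scan in this priority order. This is a minimal sketch that covers only the three standard causes (1, 5, and 9). The relative priority of platform interrupts (bits 16 and above) is not addressed here, and the names are hypothetical.

```python
from typing import Optional

# Standard S-level interrupt cause numbers (see the scause table).
SSI, STI, SEI = 1, 5, 9
S_PRIORITY = (SEI, SSI, STI)  # decreasing priority


def highest_priority_s_interrupt(sip: int, sie: int) -> Optional[int]:
    pending = sip & sie
    for cause in S_PRIORITY:
        if (pending >> cause) & 1:
            return cause
    return None
```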
==== Supervisor Timers and Performance Counters
Supervisor software uses the same hardware performance monitoring
facility as user-mode software, including the `time`, `cycle`, and
`instret` CSRs. The implementation should provide a mechanism to modify
the counter values.
The implementation must provide a facility for scheduling timer
interrupts in terms of the real-time counter, `time`.
==== Counter-Enable Register (`scounteren`)
.Counter-enable register (`scounteren`)
include::images/bytefield/scounteren.edn[]
The counter-enable register `scounteren` is a 32-bit register that
controls the availability of the hardware performance monitoring
counters to U-mode.
When the CY, TM, IR, or HPM__n__ bit in the `scounteren` register is
clear, attempts to read the `cycle`, `time`, `instret`, or `hpmcountern`
register while executing in U-mode will cause an illegal-instruction
exception. When one of these bits is set, access to the corresponding
register is permitted.
`scounteren` must be implemented. However, any of the bits may be
read-only zero, indicating reads to the corresponding counter will cause
an exception when executing in U-mode. Hence, they are effectively
*WARL* fields.
[NOTE]
====
The setting of a bit in `mcounteren` does not affect whether the
corresponding bit in `scounteren` is writable. However, U-mode may only
access a counter if the corresponding bits in `scounteren` and
`mcounteren` are both set.
====
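The combined check can be sketched as follows. This assumes a hart with M-, S-, and U-modes and relies on the standard layout in which bit _n_ of both enable registers governs the counter CSR at address `0xC00`+_n_ (CY=0 for `cycle`, TM=1 for `time`, IR=2 for `instret`, HPM__n__=_n_ for `hpmcounter`__n__). The function name is hypothetical. A `False` result corresponds to an illegal-instruction exception on the read.

```python
# Sketch: may U-mode read the counter CSR at csr_addr?
CY, TM, IR = 0, 1, 2  # bit positions in mcounteren/scounteren


def u_mode_counter_readable(csr_addr: int, mcounteren: int,
                            scounteren: int) -> bool:
    n = csr_addr - 0xC00  # cycle=0xC00, time=0xC01, instret=0xC02, ...
    if not 0 <= n <= 31:
        raise ValueError("not a user-level counter CSR")
    return bool((mcounteren >> n) & 1) and bool((scounteren >> n) & 1)
```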
==== Supervisor Scratch Register (`sscratch`)
The `sscratch` register is an SXLEN-bit read/write register, dedicated
for use by the supervisor. Typically, `sscratch` is used to hold a
pointer to the hart-local supervisor context while the hart is executing
user code. At the beginning of a trap handler, `sscratch` is swapped
with a user register to provide an initial working register.
.Supervisor Scratch Register
include::images/bytefield/sscratch.edn[]
==== Supervisor Exception Program Counter (`sepc`)
`sepc` is an SXLEN-bit read/write register formatted as shown in
<<epcreg>>. The low bit of `sepc` (`sepc[0]`) is always zero. On implementations that support only IALIGN=32, the two low bits (`sepc[1:0]`) are always zero.
If an implementation allows IALIGN to be either 16 or 32 (by changing
CSR `misa`, for example), then, whenever IALIGN=32, bit `sepc[1]` is
masked on reads so that it appears to be 0. This masking occurs also for
the implicit read by the SRET instruction. Though masked, `sepc[1]`
remains writable when IALIGN=32.
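The masking rules can be modeled for an implementation that supports switching between IALIGN=16 and IALIGN=32. This is a sketch with hypothetical function names. The *WARL* conversion of invalid addresses is not modeled.

```python
# Sketch of sepc low-bit behavior when IALIGN can be either 16 or 32.
def sepc_write(value: int) -> int:
    """sepc[0] is always zero, so it is dropped on write."""
    return value & ~0b1


def sepc_read(stored: int, ialign: int) -> int:
    """With IALIGN=32, sepc[1] is masked on reads (including SRET's read)."""
    return stored & ~0b11 if ialign == 32 else stored
```

Because `sepc[1]` stays writable, a value written while IALIGN=32 reappears intact if IALIGN later returns to 16.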
`sepc` is a *WARL* register that must be able to hold all valid virtual
addresses. It need not be capable of holding all possible invalid
addresses. Prior to writing `sepc`, implementations may convert an
invalid address into some other invalid address that `sepc` is capable
of holding.
When a trap is taken into S-mode, `sepc` is written with the virtual
address of the instruction that was interrupted or that encountered the
exception. Otherwise, `sepc` is never written by the implementation,
though it may be explicitly written by software.
[[epcreg]]
.Supervisor exception program counter register.
include::images/bytefield/epcreg.edn[]
[[scause]]
==== Supervisor Cause Register (`scause`)
The `scause` register is an SXLEN-bit read-write register formatted as
shown in <<scausereg>>. When a trap is taken into
S-mode, `scause` is written with a code indicating the event that
caused the trap. Otherwise, `scause` is never written by the
implementation, though it may be explicitly written by software.
The Interrupt bit in the `scause` register is set if the trap was caused
by an interrupt. The Exception Code field contains a code identifying
the last exception or interrupt. <<scauses>> lists
the possible exception codes for the current supervisor ISAs. The
Exception Code is a *WLRL* field. It is required to hold the values 0–31
(i.e., bits 4–0 must be implemented), but otherwise it is only
guaranteed to hold supported exception codes.
[[scausereg]]
.Supervisor Cause register `scause`.
include::images/bytefield/scausereg.edn[]
[[scauses]]
.Supervisor cause register (`scause`) values after trap. Synchronous exception priorities are given by <<exception-priority>>.
[%autowidth,float="center",align="center",cols=">,>,3",options="header"]
|===
|Interrupt |Exception Code |Description
|1 +
1 +
1 +
1 +
1 +
1 +
1 +
1
|0 +
1 +
2-4 +
5 +
6-8 +
9 +
10-15 +
≥16
|_Reserved_ +
Supervisor software interrupt +
_Reserved_ +
Supervisor timer interrupt +
_Reserved_ +
Supervisor external interrupt +
_Reserved_ +
_Designated for platform use_
|0 +
0 +
0 +
0 +
0 +
0 +
0 +
0 +
0 +
0 +
0 +
0 +
0 +
0 +
0 +
0 +
0 +
0 +
0 +
0
|0 +
1 +
2 +
3 +
4 +
5 +
6 +
7 +
8 +
9 +
10-11 +
12 +
13 +
14 +
15 +
16-23 +
24-31 +
32-47 +
48-63 +
≥64
|Instruction address misaligned +
Instruction access fault +
Illegal instruction +
Breakpoint +
Load address misaligned +
Load access fault +
Store/AMO address misaligned +
Store/AMO access fault +
Environment call from U-mode +
Environment call from S-mode +
_Reserved_ +
Instruction page fault +
Load page fault +
_Reserved_ +
Store/AMO page fault +
_Reserved_ +
_Designated for custom use_ +
_Reserved_ +
_Designated for custom use_ +
_Reserved_
|===
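A trap handler typically splits `scause` into the Interrupt bit (bit SXLEN-1) and the Exception Code, then dispatches on the pair. The sketch below transcribes the table into lookup dictionaries. The names are hypothetical, and any code absent from the table is reported generically.

```python
# Sketch: decoding scause using the standard codes from the table above.
S_INTERRUPTS = {
    1: "Supervisor software interrupt",
    5: "Supervisor timer interrupt",
    9: "Supervisor external interrupt",
}
S_EXCEPTIONS = {
    0: "Instruction address misaligned",
    1: "Instruction access fault",
    2: "Illegal instruction",
    3: "Breakpoint",
    4: "Load address misaligned",
    5: "Load access fault",
    6: "Store/AMO address misaligned",
    7: "Store/AMO access fault",
    8: "Environment call from U-mode",
    9: "Environment call from S-mode",
    12: "Instruction page fault",
    13: "Load page fault",
    15: "Store/AMO page fault",
}
OTHER = "other (reserved, platform, or custom)"


def decode_scause(scause: int, sxlen: int) -> tuple[bool, int, str]:
    interrupt = bool((scause >> (sxlen - 1)) & 1)  # Interrupt bit is the MSB
    code = scause & ((1 << (sxlen - 1)) - 1)       # Exception Code
    table = S_INTERRUPTS if interrupt else S_EXCEPTIONS
    return interrupt, code, table.get(code, OTHER)
```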
==== Supervisor Trap Value (`stval`) Register
The `stval` register is an SXLEN-bit read-write register formatted as
shown in <<stvalreg>>. When a trap is taken into
S-mode, `stval` is written with exception-specific information to assist
software in handling the trap. Otherwise, `stval` is never written by
the implementation, though it may be explicitly written by software. The
hardware platform will specify which exceptions must set `stval`
informatively and which may unconditionally set it to zero.
If `stval` is written with a nonzero value when a breakpoint,
address-misaligned, access-fault, or page-fault exception occurs on an
instruction fetch, load, or store, then `stval` will contain the
faulting virtual address.
[[stvalreg]]
.Supervisor Trap Value register.
include::images/bytefield/stvalreg.edn[]
If `stval` is written with a nonzero value when a misaligned load or
store causes an access-fault or page-fault exception, then `stval` will
contain the virtual address of the portion of the access that caused the
fault.
If `stval` is written with a nonzero value when an instruction
access-fault or page-fault exception occurs on a system with
variable-length instructions, then `stval` will contain the virtual
address of the portion of the instruction that caused the fault, while
`sepc` will point to the beginning of the instruction.
The `stval` register can optionally also be used to return the faulting
instruction bits on an illegal-instruction exception (`sepc` points to
the faulting instruction in memory). If `stval` is written with a
nonzero value when an illegal-instruction exception occurs, then `stval`
will contain the shortest of:
* the actual faulting instruction
* the first ILEN bits of the faulting instruction
* the first SXLEN bits of the faulting instruction
The value loaded into `stval` on an illegal-instruction exception is
right-justified and all unused upper bits are cleared to zero.
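The truncation rule can be sketched as a mask. This assumes the instruction is held as an integer whose least-significant bits come from the lowest-addressed 16-bit parcel, as in the standard RISC-V instruction encoding, so the "first _N_ bits" are the _N_ least-significant bits. The function name is hypothetical.

```python
# Sketch: stval contents on an illegal-instruction exception.
def stval_for_illegal_instruction(insn: int, insn_len: int, ilen: int,
                                  sxlen: int) -> int:
    """Keep the shortest of: whole instruction, first ILEN bits, first SXLEN bits."""
    n = min(insn_len, ilen, sxlen)
    return insn & ((1 << n) - 1)  # right-justified, upper bits zero
```

On an RV32 hart (SXLEN=32) with ILEN=48, a faulting 48-bit instruction is reported as its low 32 bits.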
For other traps, `stval` is set to zero, but a future standard may
redefine `stval`’s setting for other traps.
`stval` is a *WARL* register that must be able to hold all valid virtual
addresses and the value 0. It need not be capable of holding all
possible invalid addresses. Prior to writing `stval`, implementations
may convert an invalid address into some other invalid address that
`stval` is capable of holding. If the feature to return the faulting
instruction bits is implemented, `stval` must also be able to hold all
values less than latexmath:[$2^N$], where latexmath:[$N$] is the smaller
of SXLEN and ILEN.
==== Supervisor Environment Configuration Register (`senvcfg`)
The `senvcfg` CSR is an SXLEN-bit read/write register, formatted as
shown in <<senvcfg>>, that controls certain
characteristics of the U-mode execution environment.
[[senvcfg]]
.Supervisor environment configuration register (`senvcfg`).
include::images/bytefield/senvcfg.edn[]
If bit FIOM (Fence of I/O implies Memory) is set to one in `senvcfg`,
FENCE instructions executed in U-mode are modified so the requirement to
order accesses to device I/O implies also the requirement to order main
memory accesses. <<senvcfg-FIOM>> details the modified
interpretation of FENCE instruction bits PI, PO, SI, and SO in U-mode
when FIOM=1.
Similarly, for U-mode when FIOM=1, if an atomic instruction that
accesses a region ordered as device I/O has its _aq_ and/or _rl_ bit
set, then that instruction is ordered as though it accesses both device
I/O and memory.
If `satp`.MODE is read-only zero (always Bare), the implementation may
make FIOM read-only zero.
[[senvcfg-FIOM]]
.Modified interpretation of FENCE predecessor and successor sets in U-mode when FIOM=1.
[%autowidth,float="center",align="center",cols="^,<",options="header"]
|===
|Instruction bit |Meaning when set
|PI +
PO
|Predecessor device input and memory reads (PR implied) +
Predecessor device output and memory writes (PW implied)
|SI +
SO
|Successor device input and memory reads (SR implied) +
Successor device output and memory writes (SW implied)
|===
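The table amounts to widening each predecessor and successor set. This sketch assumes the standard FENCE encoding from the unprivileged ISA: pred in bits 27:24, succ in bits 23:20, each ordered I, O, R, W from most to least significant. The names are hypothetical.

```python
# Bits within a FENCE predecessor or successor set: I, O, R, W.
FENCE_I, FENCE_O, FENCE_R, FENCE_W = 0b1000, 0b0100, 0b0010, 0b0001


def fiom_widen(fence_set: int) -> int:
    """Apply the FIOM=1 rule to one predecessor or successor set."""
    if fence_set & FENCE_I:
        fence_set |= FENCE_R   # device input also orders memory reads
    if fence_set & FENCE_O:
        fence_set |= FENCE_W   # device output also orders memory writes
    return fence_set


def effective_fence_sets(insn: int, fiom: int) -> tuple[int, int]:
    """Return (pred, succ) of a U-mode FENCE as ordered under the given FIOM."""
    pred = (insn >> 24) & 0xF  # pred is insn[27:24]
    succ = (insn >> 20) & 0xF  # succ is insn[23:20]
    if fiom:
        pred, succ = fiom_widen(pred), fiom_widen(succ)
    return pred, succ
```

For `fence io,io` (encoded as `0x0CC0000F`), FIOM=1 makes both sets behave as IORW.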
[NOTE]
====
Bit FIOM exists for a specific circumstance when an I/O device is being
emulated for U-mode and both of the following are true: (a) the emulated
device has a memory buffer that should be I/O space but is actually
mapped to main memory via address translation, and (b) multiple physical
harts are involved in accessing this emulated device from U-mode.
A hypervisor running in S-mode without the benefit of the hypervisor
extension of <<hypervisor>> may need to emulate
a device for U-mode if paravirtualization cannot be employed. If the
same hypervisor provides a virtual machine (VM) with multiple virtual
harts, mapped one-to-one to real harts, then multiple harts may
concurrently access the emulated device, perhaps because: (a) the guest
OS within the VM assigns device interrupt handling to one hart while the
device is also accessed by a different hart outside of an interrupt
handler, or (b) control of the device (or partial control) is being
migrated from one hart to another, such as for interrupt load balancing
within the VM. For such cases, guest software within the VM is expected
to properly coordinate access to the (emulated) device across multiple
harts using mutex locks and/or interprocessor interrupts as usual, which
in part entails executing I/O fences. But those I/O fences may not be
sufficient if some of the device ``I/O'' is actually main memory,
unknown to the guest. Setting FIOM=1 modifies those fences (and all
other I/O fences executed in U-mode) to include main memory, too.
Software can always avoid the need to set FIOM by never using main
memory to emulate a device memory buffer that should be I/O space.
However, this choice usually requires trapping all U-mode accesses to
the emulated buffer, which might have a noticeable impact on
performance. The alternative offered by FIOM is sufficiently inexpensive
to implement that we consider it worth supporting even if only rarely
enabled.
====
The definition of the CBZE field will be furnished by the forthcoming
Zicboz extension. Its allocation within `senvcfg` may change prior to
the ratification of that extension.
The definitions of the CBCFE and CBIE fields will be furnished by the
forthcoming Zicbom extension. Their allocations within `senvcfg` may
change prior to the ratification of that extension.
[[satp]]
==== Supervisor Address Translation and Protection (`satp`) Register
The `satp` register is an SXLEN-bit read/write register, formatted as
shown in <<rv32satp>> for SXLEN=32 and
<<rv64satp>> for SXLEN=64, which controls
supervisor-mode address translation and protection. This register holds
the physical page number (PPN) of the root page table, i.e., its
supervisor physical address divided by 4 KiB; an address space identifier
(ASID), which facilitates address-translation fences on a
per-address-space basis; and the MODE field, which selects the current
address-translation scheme. Further details on the access to this
register are described in <<virt-control>>.
[[rv32satp]]
.Supervisor address translation and protection register `satp` when SXLEN=32.
include::images/bytefield/rv32satp.edn[]
[NOTE]
====
Storing a PPN in `satp`, rather than a physical address, supports a
physical address space larger than 4 GiB for RV32.
The `satp`.PPN field might not be capable of holding all physical page
numbers. Some platform standards might place constraints on the values
`satp`.PPN may assume, e.g., by requiring that all physical page numbers
corresponding to main memory be representable.
====
[[rv64satp]]
.Supervisor address translation and protection register `satp` when SXLEN=64, for MODE values Bare, Sv39, Sv48, and Sv57.
include::images/bytefield/rv64satp.edn[]
[NOTE]
====
We store the ASID and the page table base address in the same CSR to
allow the pair to be changed atomically on a context switch. Swapping
them non-atomically could pollute the old virtual address space with new
translations, or vice-versa. This approach also slightly reduces the
cost of a context switch.
====
<<satp-mode>> shows the encodings of the MODE field when
SXLEN=32 and SXLEN=64. When MODE=Bare, supervisor virtual addresses are
equal to supervisor physical addresses, and there is no additional
memory protection beyond the physical memory protection scheme described
in <<pmp>>. To select MODE=Bare, software must write
zero to the remaining fields of `satp` (bits 30–0 when SXLEN=32, or bits
59–0 when SXLEN=64). Attempting to select MODE=Bare with a nonzero
pattern in the remaining fields has an UNSPECIFIED effect on the value that the
remaining fields assume and an UNSPECIFIED effect on address translation and
protection behavior.
When SXLEN=32, the `satp` encodings corresponding to MODE=Bare and
ASID[8:7]=3 are designated for custom use, whereas the encodings
corresponding to MODE=Bare and ASID[8:7]≠3 are reserved
for future standard use. When SXLEN=64, all `satp` encodings
corresponding to MODE=Bare are reserved for future standard use.
[NOTE]
====
Version 1.11 of this standard stated that the remaining fields in `satp`
had no effect when MODE=Bare. Making these fields reserved facilitates
future definition of additional translation and protection modes,
particularly in RV32, for which all patterns of the existing MODE field
have already been allocated.
====
When SXLEN=32, the only other valid setting for MODE is Sv32, a paged
virtual-memory scheme described in <<sv32>>.
When SXLEN=64, three paged virtual-memory schemes are defined: Sv39,
Sv48, and Sv57, described in <<sv39>>, <<sv48>>,
and <<sv57>>, respectively. One additional scheme, Sv64, will be
defined in a later version of this specification. The remaining MODE
settings are reserved for future use and may define different
interpretations of the other fields in `satp`.
Implementations are not required to support all MODE settings, and if
`satp` is written with an unsupported MODE, the entire write has no
effect; no fields in `satp` are modified.
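Because a write with an unsupported MODE leaves every field unchanged, an emulator or hardware model can treat the `satp` write as an all-or-nothing update. This is a minimal sketch assuming SXLEN=64 and a hypothetical hart that supports only Bare, Sv39, and Sv48; it deliberately does not model ASID width or the Bare-encoding rules.

```c
#include <stdbool.h>
#include <stdint.h>

#define SATP64_MODE_SHIFT 60

/* Hypothetical hart: supports Bare (0), Sv39 (8), and Sv48 (9) only. */
static bool mode_supported(unsigned mode)
{
    return mode == 0 || mode == 8 || mode == 9;
}

/* Returns the value satp holds after a CSR write of `value`. */
uint64_t satp_write(uint64_t current, uint64_t value)
{
    unsigned mode = (unsigned)(value >> SATP64_MODE_SHIFT);
    if (!mode_supported(mode))
        return current;   /* whole write ignored: MODE, ASID, PPN unchanged */
    return value;
}
```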
The number of ASID bits is UNSPECIFIED and may be zero. The number of implemented
ASID bits, termed _ASIDLEN_, may be determined by writing one to every
bit position in the ASID field, then reading back the value in `satp` to
see which bit positions in the ASID field hold a one. The
least-significant bits of ASID are implemented first: that is, if
ASIDLEN latexmath:[$>$] 0, ASID[ASIDLEN-1:0] is writable. The maximal
value of ASIDLEN, termed ASIDMAX, is 9 for Sv32 or 16 for Sv39, Sv48,
and Sv57.
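The ASIDLEN probe can be sketched as follows for SXLEN=64. Real S-mode code would access `satp` with `csrr`/`csrw`; here the CSR is replaced by a hypothetical `model_satp` variable whose write path keeps only the low `MODEL_ASIDLEN` ASID bits, so the example runs on any host. The probe keeps the MODE and PPN of the saved value (setting ASID bits while MODE=Bare would produce one of the reserved encodings described earlier, so it assumes translation is already enabled) and restores the original value afterwards.

```c
#include <stdint.h>

#define SATP64_ASID_SHIFT 44
#define SATP64_ASID_MASK  0xFFFFull   /* ASIDMAX = 16 for Sv39/Sv48/Sv57 */

/* Hypothetical stand-in for satp on a hart that implements ASIDLEN = 7. */
#define MODEL_ASIDLEN 7
static uint64_t model_satp;

static void write_satp(uint64_t v)
{
    uint64_t impl = ((1ull << MODEL_ASIDLEN) - 1) << SATP64_ASID_SHIFT;
    /* Unimplemented ASID bits read back as zero; other fields pass through. */
    model_satp = (v & ~(SATP64_ASID_MASK << SATP64_ASID_SHIFT)) | (v & impl);
}

static uint64_t read_satp(void) { return model_satp; }

/* Write ones to every ASID bit, read back, and count the implemented bits. */
unsigned probe_asidlen(void)
{
    uint64_t saved = read_satp();
    write_satp(saved | (SATP64_ASID_MASK << SATP64_ASID_SHIFT));
    uint64_t asid = (read_satp() >> SATP64_ASID_SHIFT) & SATP64_ASID_MASK;
    write_satp(saved);   /* restore the original ASID */

    /* Implemented bits are the least-significant ones, so count up from bit 0. */
    unsigned len = 0;
    while (len < 16 && ((asid >> len) & 1u))
        len++;
    return len;
}
```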
<<<
[[satp-mode]]
.Encoding of `satp` MODE field.
[%autowidth,float="center",align="center",cols="^,^,<",options="header"]
|===
3+|*SXLEN=32*
|Value |Name |Description
|0 +
1
|Bare +
Sv32
|No translation or protection. +
Page-based 32-bit virtual addressing (see <<sv32>>).
3+|*SXLEN=64*
|Value |Name |Description
|0 +
1-7 +
8 +
9 +
10 +
11 +
12-13 +
14-15
|Bare +
- +
Sv39 +
Sv48 +
Sv57 +
Sv64 +
- +
-
|No translation or protection. +
_Reserved for standard use_ +
Page-based 39-bit virtual addressing (see <<sv39>>). +
Page-based 48-bit virtual addressing (see <<sv48>>). +
Page-based 57-bit virtual addressing (see <<sv57>>). +
_Reserved for page-based 64-bit virtual addressing._ +
_Reserved for standard use_ +
_Designated for custom use_
|===
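Tools that inspect `satp`, such as a debugger or an emulator, can decode the MODE field directly from the table above. The sketch below covers SXLEN=64 only, where MODE occupies bits 63–60; the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical decoding of the SXLEN=64 satp MODE field. */
enum satp64_mode_kind {
    MODE_BARE, MODE_SV39, MODE_SV48, MODE_SV57,
    MODE_SV64_RESERVED,   /* 11: reserved for page-based 64-bit addressing */
    MODE_STD_RESERVED,    /* 1-7, 12-13: reserved for standard use */
    MODE_CUSTOM           /* 14-15: designated for custom use */
};

enum satp64_mode_kind decode_satp64_mode(uint64_t satp)
{
    unsigned mode = (unsigned)(satp >> 60);   /* bits 63:60 */
    switch (mode) {
    case 0:  return MODE_BARE;
    case 8:  return MODE_SV39;
    case 9:  return MODE_SV48;
    case 10: return MODE_SV57;
    case 11: return MODE_SV64_RESERVED;
    case 14:
    case 15: return MODE_CUSTOM;
    default: return MODE_STD_RESERVED;        /* 1-7, 12, 13 */
    }
}
```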
[NOTE]
====
For many applications, the choice of page size has a substantial
performance impact. A large page size increases TLB reach and loosens
the associativity constraints on virtually indexed, physically tagged
caches. At the same time, large pages exacerbate internal fragmentation,
wasting physical memory and possibly cache capacity.
After much deliberation, we have settled on a conventional page size of
4 KiB for both RV32 and RV64. We expect this decision to ease the
porting of low-level runtime software and device drivers. The TLB reach
problem is ameliorated by transparent superpage support in modern
operating systems cite:[transparent-superpages]. Additionally, multi-level TLB hierarchies are quite
inexpensive relative to the multi-level cache hierarchies whose address
space they map.
====
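TLB reach is the amount of memory a TLB can map without missing: the number of entries times the page size. The sketch below compares a hypothetical 64-entry TLB holding 4 KiB base pages with the same TLB holding Sv39 2 MiB megapages; the entry count is an assumption chosen only for illustration.

```c
#include <stdint.h>

#define KiB (1ull << 10)
#define MiB (1ull << 20)

/* TLB reach: bytes mapped when every entry holds a distinct page. */
uint64_t tlb_reach(unsigned entries, uint64_t page_size)
{
    return (uint64_t)entries * page_size;
}
```

With 64 entries, 4 KiB pages give a reach of 256 KiB, whereas 2 MiB megapages give 128 MiB, which is why transparent superpage support recovers much of the reach lost by choosing a small base page.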
The `satp` register is considered _active_ when the effective privilege
mode is S-mode or U-mode. Executions of the address-translation
algorithm may only begin using a given value of `satp` when `satp` is
active.
[NOTE]
====
Translations that began while `satp` was active are not required to
complete or terminate when `satp` is no longer active, unless an
SFENCE.VMA instruction matching the address and ASID is executed. The
SFENCE.VMA instruction must be used to ensure that updates to the
address-translation data structures are observed by subsequent implicit
reads to those structures by a hart.
====
Note that writing `satp` does not imply any ordering constraints between
page-table updates and subsequent address translations, nor does it
imply any invalidation of address-translation caches. If the new address
space’s page tables have been modified, or if an ASID is reused, it may
be necessary to execute an SFENCE.VMA instruction (see
<<sfence.vma>>) after, or in some cases before, writing
`satp`.
[NOTE]
====
Not requiring implementations to flush address-translation caches
upon `satp` writes reduces the cost of context switches, provided the
ASID space is sufficiently large.
====
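A minimal context-switch sketch, assuming an RV64 S-mode kernel using Sv39 and a hypothetical `asid_recycled` flag supplied by the kernel's ASID allocator. Following the recommendation given later in this section, it writes `satp` first and then executes SFENCE.VMA with __rs1__=`x0` and _rs2_ holding the recycled ASID. On a non-RISC-V host the two privileged operations are replaced by stubs that only record their call order, so the logic can be exercised anywhere.

```c
#include <stdint.h>

#define SATP_MODE_SV39  (8ull << 60)
#define SATP_ASID_SHIFT 44

#if defined(__riscv)
static inline void csr_write_satp(uint64_t v)
{
    __asm__ volatile("csrw satp, %0" : : "r"(v) : "memory");
}
static inline void sfence_vma_asid(uint64_t asid)
{
    /* rs1 = x0 (all addresses), rs2 = ASID (non-global entries only) */
    __asm__ volatile("sfence.vma x0, %0" : : "r"(asid) : "memory");
}
#else
/* Host stubs: record the call sequence instead of touching hardware. */
static int op_log[4];
static int op_count;
static void csr_write_satp(uint64_t v) { (void)v; op_log[op_count++] = 1; }
static void sfence_vma_asid(uint64_t asid) { (void)asid; op_log[op_count++] = 2; }
#endif

/* Switch to the address space whose root page table is at `root_ppn`. */
void switch_address_space(uint64_t root_ppn, uint16_t asid, int asid_recycled)
{
    uint64_t satp = SATP_MODE_SV39 | ((uint64_t)asid << SATP_ASID_SHIFT) | root_ppn;
    csr_write_satp(satp);
    if (asid_recycled)
        sfence_vma_asid(asid);   /* drop stale translations of the ASID's old owner */
}
```

This sketch does not cover the other case mentioned above, in which the page tables of an address space whose ASID is still live have been modified; that case also needs an SFENCE.VMA.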
=== Supervisor Instructions
In addition to the SRET instruction defined in <<otherpriv>>, one new supervisor-level instruction is provided.
[[sfence.vma]]
==== Supervisor Memory-Management Fence Instruction
include::images/wavedrom/sfencevma.edn[]
The supervisor memory-management fence instruction SFENCE.VMA is used to
synchronize updates to in-memory memory-management data structures with
current execution. Instruction execution causes implicit reads and
writes to these data structures; however, these implicit references are
ordinarily not ordered with respect to explicit loads and stores.
Executing an SFENCE.VMA instruction guarantees that any previous stores
already visible to the current RISC-V hart are ordered before certain
implicit references by subsequent instructions in that hart to the
memory-management data structures. The specific set of operations
ordered by SFENCE.VMA is determined by _rs1_ and _rs2_, as described
below. SFENCE.VMA is also used to invalidate entries in the
address-translation cache associated with a hart (see <<sv32algorithm>>). Further details on the behavior of this instruction are described in <<virt-control>> and <<pmp-vmem>>.
[NOTE]
====
SFENCE.VMA is used to flush any local hardware caches related to
address translation. It is specified as a fence rather than a TLB flush
to provide cleaner semantics with respect to which instructions are
affected by the flush operation and to support a wider variety of
dynamic caching structures and memory-management schemes. SFENCE.VMA is
also used by higher privilege levels to synchronize page table writes
and the address translation hardware.
====
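The encoding shown at the start of this section can be assembled by hand, which is occasionally useful in JIT compilers or self-tests. SFENCE.VMA is a SYSTEM-opcode (1110011) instruction with funct7=0001001, funct3=000, and rd=`x0`; the function name below is hypothetical.

```c
#include <stdint.h>

/* Encode SFENCE.VMA rs1, rs2, given integer register numbers 0-31. */
uint32_t encode_sfence_vma(unsigned rs1, unsigned rs2)
{
    const uint32_t funct7 = 0x09;   /* 0001001 */
    const uint32_t opcode = 0x73;   /* 1110011, SYSTEM */
    return (funct7 << 25)
         | ((rs2 & 0x1Fu) << 20)
         | ((rs1 & 0x1Fu) << 15)
         | (0u << 12)               /* funct3 = 000 */
         | (0u << 7)                /* rd = x0 */
         | opcode;
}
```

For example, `sfence.vma x0, x0` encodes as `0x12000073`, and `sfence.vma a0, a1` (registers 10 and 11) as `0x12B50073`.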
SFENCE.VMA orders only the local hart’s implicit references to the
memory-management data structures.
[NOTE]
====
Consequently, other harts must be notified separately when the
memory-management data structures have been modified. One approach is to
use 1) a local data fence to ensure local writes are visible globally,
then 2) an interprocessor interrupt to the other harts, then 3) a local
SFENCE.VMA in the interrupt handler of each remote hart, and finally 4)
a signal back to the originating hart that the operation is complete. This is,
of course, the RISC-V analog of a TLB shootdown.
====
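The four steps above can be sketched as follows. Everything here is hypothetical: the per-hart `pending` flags, `send_ipi`, and `local_sfence_vma` stand in for platform- and kernel-specific mechanisms (for example, an SBI call or an interrupt controller), and interrupt delivery is simulated by calling the handler directly so the example runs single-threaded on a host. The C11 `atomic_thread_fence` plays the role of the local data fence in step 1. Because SFENCE.VMA orders only the local hart's implicit references, the initiator also fences itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NHARTS 4

static atomic_bool pending[NHARTS];   /* set by the initiator, cleared by the target */
static atomic_int  acks;              /* step 4: remote harts report completion here */
static int fences_done[NHARTS];       /* record of simulated local SFENCE.VMAs */

/* Stub standing in for executing sfence.vma on `hart`. */
static void local_sfence_vma(int hart) { fences_done[hart]++; }

/* Step 3: the remote hart's interrupt handler. */
static void ipi_handler(int hart)
{
    if (atomic_exchange(&pending[hart], false)) {
        local_sfence_vma(hart);
        atomic_fetch_add(&acks, 1);   /* step 4: signal the originating hart */
    }
}

/* Step 2 (simulated): deliver the interrupt by running the handler inline. */
static void send_ipi(int hart) { ipi_handler(hart); }

void tlb_shootdown(int self)
{
    atomic_store(&acks, 0);
    atomic_thread_fence(memory_order_seq_cst);   /* step 1: publish PTE stores */
    for (int h = 0; h < NHARTS; h++) {
        if (h == self)
            continue;
        atomic_store(&pending[h], true);
        send_ipi(h);
    }
    local_sfence_vma(self);                      /* fence the initiating hart too */
    while (atomic_load(&acks) < NHARTS - 1)
        ;                                        /* wait for every remote hart */
}
```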
For the common case that the translation data structures have only been
modified for a single address mapping (i.e., one page or superpage),
_rs1_ can specify a virtual address within that mapping to effect a
translation fence for that mapping only. Furthermore, for the common
case that the translation data structures have only been modified for a
single address-space identifier, _rs2_ can specify the address space.
The behavior of SFENCE.VMA depends on _rs1_ and _rs2_ as follows:
* If __rs1__=`x0` and __rs2__=`x0`, the fence orders all reads and writes
made to any level of the page tables, for all address spaces. The fence
also invalidates all address-translation cache entries, for all address
spaces.
* If __rs1__=`x0` and __rs2__≠``x0``, the fence orders all
reads and writes made to any level of the page tables, but only for the
address space identified by integer register _rs2_. Accesses to _global_
mappings (see <<translation>>) are not ordered. The
fence also invalidates all address-translation cache entries matching
the address space identified by integer register _rs2_, except for
entries containing global mappings.
* If __rs1__≠``x0`` and __rs2__=`x0`, the fence orders only
reads and writes made to leaf page table entries corresponding to the
virtual address in __rs1__, for all address spaces. The fence also
invalidates all address-translation cache entries that contain leaf page
table entries corresponding to the virtual address in _rs1_, for all
address spaces.
* If __rs1__≠``x0`` and __rs2__≠``x0``, the
fence orders only reads and writes made to leaf page table entries
corresponding to the virtual address in _rs1_, for the address space
identified by integer register _rs2_. Accesses to global mappings are
not ordered. The fence also invalidates all address-translation cache
entries that contain leaf page table entries corresponding to the
virtual address in _rs1_ and that match the address space identified by
integer register _rs2_, except for entries containing global mappings.
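The four cases can be captured by a predicate that decides whether a given address-translation cache entry must be invalidated. The sketch models each cached entry as a hypothetical record holding the virtual range it maps, its ASID, its G bit, and whether it caches a leaf PTE; `rs1_is_x0` and `rs2_is_x0` describe the register specifiers, not the register values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address-translation cache entry. */
struct tlb_entry {
    uint64_t va_base;   /* first virtual address covered */
    uint64_t size;      /* 4 KiB, 2 MiB, 1 GiB, ... */
    uint16_t asid;
    bool     global;    /* PTE G bit */
    bool     leaf;      /* caches a leaf PTE (false for a cached non-leaf PTE) */
};

/* Must SFENCE.VMA with these operands invalidate entry e? */
bool sfence_invalidates(const struct tlb_entry *e,
                        bool rs1_is_x0, uint64_t va,
                        bool rs2_is_x0, uint16_t asid)
{
    /* rs2 != x0: only the named address space, never global mappings. */
    if (!rs2_is_x0 && (e->global || e->asid != asid))
        return false;
    /* rs1 == x0: entries from every level of the page tables. */
    if (rs1_is_x0)
        return true;
    /* rs1 != x0: only leaf entries whose mapping contains va
       (unsigned wraparound makes va < va_base fail the test). */
    return e->leaf && va - e->va_base < e->size;
}
```

The predicate gives the minimum set of entries to invalidate; as noted below, an implementation may always invalidate more.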
If the value held in _rs1_ is not a valid virtual address, then the
SFENCE.VMA instruction has no effect. No exception is raised in this
case.
When __rs2__≠``x0``, bits SXLEN-1:ASIDMAX of the value held
in _rs2_ are reserved for future standard use. Until their use is
defined by a standard extension, they should be zeroed by software and
ignored by current implementations. Furthermore, if
ASIDLEN<ASIDMAX, the implementation shall ignore bits
ASIDMAX-1:ASIDLEN of the value held in _rs2_.
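The two rules above for operand values can be sketched for RV64 under Sv39, where a virtual address is valid only if bits 63–39 all equal bit 38. The `asidlen` parameter stands for the implementation's ASIDLEN; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define ASIDMAX 16   /* ASIDMAX for Sv39, Sv48, and Sv57 */

/* Sv39: bits 63:38 must be all zeros or all ones. If rs1 holds an
 * address for which this is false, SFENCE.VMA has no effect. */
bool sv39_va_is_valid(uint64_t va)
{
    uint64_t top = va >> 38;   /* bits 63:38, 26 bits wide */
    return top == 0 || top == (UINT64_MAX >> 38);
}

/* ASID actually used when rs2 != x0: reserved bits SXLEN-1:ASIDMAX are
 * ignored, and so are bits ASIDMAX-1:ASIDLEN when ASIDLEN < ASIDMAX. */
uint16_t sfence_effective_asid(uint64_t rs2_value, unsigned asidlen)
{
    uint64_t mask = (asidlen >= ASIDMAX) ? 0xFFFFu : ((1ull << asidlen) - 1);
    return (uint16_t)(rs2_value & mask);
}
```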
[NOTE]
====
It is always legal to over-fence, e.g., by fencing only based on a
subset of the bits in _rs1_ and/or _rs2_, and/or by simply treating all
SFENCE.VMA instructions as having _rs1_=`x0` and/or _rs2_=`x0`. For
example, simpler implementations can ignore the virtual address in _rs1_
and the ASID value in _rs2_ and always perform a global fence. The
choice not to raise an exception when an invalid virtual address is held
in _rs1_ facilitates this type of simplification.
====
An implicit read of the memory-management data structures may return any
translation for an address that was valid at any time since the most
recent SFENCE.VMA that subsumes that address. The ordering implied by
SFENCE.VMA does not place implicit reads and writes to the
memory-management data structures into the global memory order in a way
that interacts cleanly with the standard RVWMO ordering rules. In
particular, even though an SFENCE.VMA orders prior explicit accesses
before subsequent implicit accesses, and those implicit accesses are
ordered before their associated explicit accesses, SFENCE.VMA does not
necessarily place prior explicit accesses before subsequent explicit
accesses in the global memory order. These implicit loads also need not
otherwise obey normal program order semantics with respect to prior
loads or stores to the same address.
[NOTE]
====
A consequence of this specification is that an implementation may use
any translation for an address that was valid at any time since the most
recent SFENCE.VMA that subsumes that address. In particular, if a leaf
PTE is modified but a subsuming SFENCE.VMA is not executed, either the
old translation or the new translation will be used, but the choice is
unpredictable. The behavior is otherwise well-defined.
In a conventional TLB design, it is possible for multiple entries to
match a single address if, for example, a page is upgraded to a
superpage without first clearing the original non-leaf PTE’s valid bit
and executing an SFENCE.VMA with __rs1__=`x0`. In this case, a similar
remark applies: it is unpredictable whether the old non-leaf PTE or the
new leaf PTE is used, but the behavior is otherwise well defined.
Another consequence of this specification is that it is generally unsafe
to update a PTE using a set of stores of a width less than the width of
the PTE, as it is legal for the implementation to read the PTE at any
time, including when only some of the partial stores have taken effect.
***
This specification permits the caching of PTEs whose V (Valid) bit is
clear. Operating systems must be written to cope with this possibility,
but implementers are reminded that eagerly caching invalid PTEs will
reduce performance by causing additional page faults.
====
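Because the implementation may read a PTE at any time, software should replace a PTE with a single store of the PTE's full width. The sketch below assumes RV64 with Sv39 PTEs (PPN in bits 53–10; V, R, W, X, U, G, A, D in bits 0–7) and uses a C11 atomic store so the compiler cannot split the write into narrower stores; the helper names are hypothetical. An SFENCE.VMA is still required afterwards, as described above, before the hart is guaranteed to use the new translation.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Sv39 PTE permission and status bits. */
enum {
    PTE_V = 1 << 0, PTE_R = 1 << 1, PTE_W = 1 << 2, PTE_X = 1 << 3,
    PTE_U = 1 << 4, PTE_G = 1 << 5, PTE_A = 1 << 6, PTE_D = 1 << 7
};

static inline uint64_t make_leaf_pte(uint64_t ppn, uint64_t flags)
{
    return (ppn << 10) | flags;   /* PPN occupies bits 53:10 */
}

/* Replace a PTE with one 64-bit store, never two 32-bit halves, since the
 * page-table walker could otherwise observe a mix of old and new bits.
 * Relaxed order gives single-copy atomicity; ordering with respect to
 * implicit reads comes from the subsequent SFENCE.VMA. */
static inline void pte_store(_Atomic uint64_t *slot, uint64_t pte)
{
    atomic_store_explicit(slot, pte, memory_order_relaxed);
}
```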
Implementations must only perform implicit reads of the translation data
structures pointed to by the current contents of the `satp` register or
a subsequent valid (V=1) translation data structure entry, and must only
raise exceptions for implicit accesses that are generated as a result of
instruction execution, not those that are performed speculatively.
Changes to the `sstatus` fields SUM and MXR take effect immediately,
without the need to execute an SFENCE.VMA instruction. Changing
`satp`.MODE from Bare to other modes and vice versa also takes effect
immediately, without the need to execute an SFENCE.VMA instruction.
Likewise, changes to `satp`.ASID take effect immediately.
[TIP]
====
The following common situations typically require executing an
SFENCE.VMA instruction:
* When software recycles an ASID (i.e., reassociates it with a different
page table), it should _first_ change `satp` to point to the new page
table using the recycled ASID, _then_ execute SFENCE.VMA with __rs1__=`x0`
and _rs2_ set to the recycled ASID. Alternatively, software can execute
the same SFENCE.VMA instruction while a different ASID is loaded into
`satp`, provided the next time `satp` is loaded with the recycled ASID,
it is simultaneously loaded with the new page table.
* If the implementation does not provide ASIDs, or software chooses to
always use ASID 0, then after every `satp` write, software should
execute SFENCE.VMA with __rs1__=`x0`. In the common case that no global
translations have been modified, _rs2_ should be set to a register other
than `x0` but which contains the value zero, so that global translations