[RFC] Use PKU for isolation between LibOS and userspace program #243

Closed
Bonjourz opened this issue Apr 6, 2022 · 3 comments

@Bonjourz
Contributor

Bonjourz commented Apr 6, 2022

Summary

  • Feature Name: Use PKU for isolation between LibOS and userspace program.
  • Start Date: 2022-03-22

Motivation

PKU (Protection Keys for Userspace) is a lightweight intra-process isolation mechanism for userspace (Ring 3) software. Compared to Software Fault Isolation (SFI), it incurs no runtime overhead on ordinary memory accesses, and the overhead of switching memory access permissions is low.

Currently, NGO lacks the ability to isolate LibOS from userspace applications. Although userspace applications are considered benign in NGO, they may be bug-prone. Illegal memory accesses may affect the correctness of computation, or even crash the whole enclave.

It is necessary to enforce this isolation in NGO, and leveraging PKU is a good choice.

High-Level Design

Pkey Allocation

We categorize the secure memory inside the enclave into two parts: LibOS + SGX SDK (trts part) and the userspace application. We use two pkeys (pkey 0 and pkey 1) to tag the memory, as shown in FIG: pkey configuration

pkey_layout

Userspace applications use the reserved memory[1] provided by the SGX SDK. In our design we tag the reserved memory with pkey 1, for two reasons:

  • The default pkey in Linux is 0. All memory pages allocated by the OS are tagged with pkey 0. The SGX SDK and LibOS may request new memory pages from the OS, and we do not want an extra pkey_mprotect() call every time we get new pages;
  • The reserved memory is allocated once at enclave boot time, and its size is fixed. So we only need to invoke pkey_mprotect() once, after LibOS requests the reserved memory from the SGX SDK.
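The reasoning behind this choice can be illustrated with a toy model. The sketch below is not NGO code: `AddressSpace`, `map_pages` and `tag_pages` are hypothetical names, and `tag_pages` merely stands in for a pkey_mprotect() call. It only shows that, when fresh pages default to pkey 0, tagging the fixed-size reserved memory with pkey 1 needs a single tagging call no matter how many pages LibOS maps later.

```python
from dataclasses import dataclass, field

PKEY_LIBOS = 0  # default key: LibOS + SGX SDK pages
PKEY_USER = 1   # key for the reserved memory used by applications


@dataclass
class AddressSpace:
    """Toy page table mapping page number -> pkey (hypothetical model)."""
    page_keys: dict = field(default_factory=dict)
    tag_calls: int = 0  # number of pkey_mprotect()-like calls made so far

    def map_pages(self, first_page, count):
        # Freshly mapped pages carry the default pkey 0, as on Linux.
        for page in range(first_page, first_page + count):
            self.page_keys[page] = PKEY_LIBOS

    def tag_pages(self, first_page, count, pkey):
        # Stands in for one pkey_mprotect() call over a contiguous range.
        self.tag_calls += 1
        for page in range(first_page, first_page + count):
            self.page_keys[page] = pkey


space = AddressSpace()
space.map_pages(0, 16)               # LibOS / SDK memory
space.map_pages(100, 64)             # reserved memory, fixed size
space.tag_pages(100, 64, PKEY_USER)  # tagged once at enclave boot
space.map_pages(16, 8)               # later LibOS allocation, no retagging

print("tagging calls:", space.tag_calls)
```

Had the roles of the two keys been swapped, every later LibOS or SDK allocation would need its own tagging call.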

PKRU Configuration

PKRU value

We use 0x0 (PKRU_LibOS) for LibOS and 0x55555551 (PKRU_User) for userspace applications in NGO, as shown in the figure. With this configuration, LibOS can access application memory, but not vice versa.

pkru_config

Why we use 0x0 for PKRU_LibOS:
In SGX SDK 2.16, the PKRU of LibOS and SDK is updated to 0x0 when performing an ecall (details). To keep consistency with it, we set PKRU_LibOS to 0x0 in our design.

There are three different PKRU values in our design:

  • PKRU_Default = 0x55555554: The default PKRU value of a Linux process
  • PKRU_LibOS = 0x0: The PKRU value for LibOS and SGX SDK inside enclave
  • PKRU_User = 0x55555551: The PKRU value for userspace apps in NGO
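To see what these three values mean per key, the following sketch decodes them. It relies on the architectural PKRU layout (bit 2i is the access-disable bit and bit 2i+1 the write-disable bit of pkey i); the helper name `pkey_rights` is made up for this illustration.

```python
PKRU_DEFAULT = 0x55555554
PKRU_LIBOS = 0x0
PKRU_USER = 0x55555551


def pkey_rights(pkru, pkey):
    """Return 'rw', 'r' or 'none' for one pkey under a given PKRU value."""
    access_disable = (pkru >> (2 * pkey)) & 1
    write_disable = (pkru >> (2 * pkey + 1)) & 1
    if access_disable:
        return "none"
    return "r" if write_disable else "rw"


for name, value in [
    ("PKRU_Default", PKRU_DEFAULT),
    ("PKRU_LibOS", PKRU_LIBOS),
    ("PKRU_User", PKRU_USER),
]:
    print(f"{name:13s} {value:#010x}",
          "pkey0:", pkey_rights(value, 0),
          "pkey1:", pkey_rights(value, 1))
```

Under PKRU_User only pkey 1 is accessible, under PKRU_Default only pkey 0, and under PKRU_LibOS both are.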

PKRU switch

PKU uses the WRPKRU instruction to update the value of PKRU. The PKRU switch happens at the boundary between LibOS and userspace apps. There are two types of interactions between them:

  • Synchronous: A userspace application performs a LibOS syscall;
  • Asynchronous: An exception or interrupt occurs while a userspace application is executing.

We show the design details in the next section.

Design Details

Synchronous Interactions

For synchronous interactions, user apps invoke __syscall_entry_linux_abi to enter LibOS, and LibOS invokes __switch_to_user to return to the user app. We update PKRU at the beginning of __syscall_entry_linux_abi, as shown in FIG: Update PKRU at the entry of syscall.

syscall_enter

We also restore the PKRU value at the end of __syscall_entry_linux_abi as shown in FIG: Update PKRU at the exit of syscall.
syscall_exit
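The entry/exit pairing can be modelled in a few lines. This is only a simulation under stated assumptions: the real switch is done with WRPKRU in the assembly syscall path, while `Cpu`, `wrpkru` and `syscall_boundary` here are hypothetical stand-ins.

```python
from contextlib import contextmanager

PKRU_LIBOS = 0x0
PKRU_USER = 0x55555551


class Cpu:
    """Toy per-thread CPU state; `pkru` stands in for the PKRU register."""

    def __init__(self):
        self.pkru = PKRU_USER  # the application is running
        self.trace = []        # every value written via wrpkru()

    def wrpkru(self, value):
        # Models the effect of the WRPKRU instruction.
        self.pkru = value
        self.trace.append(value)


@contextmanager
def syscall_boundary(cpu):
    cpu.wrpkru(PKRU_LIBOS)     # on syscall entry: grant LibOS rights
    try:
        yield
    finally:
        cpu.wrpkru(PKRU_USER)  # on syscall exit: drop back to user rights


cpu = Cpu()
with syscall_boundary(cpu):
    pkru_inside = cpu.pkru     # value seen while LibOS handles the syscall

print(f"inside: {pkru_inside:#010x}, after: {cpu.pkru:#010x}")
```

The `finally` clause mirrors the requirement that every path back to the application restores PKRU_User.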

Asynchronous Interactions

NGO uses sgx_register_exception_handler() and sgx_interrupt_init(), provided by the SGX SDK, to handle exceptions generated by userspace apps and to process timer interrupts. NGO supports both HW and SW modes. The implementations of asynchronous interactions differ between the two modes, so we discuss them separately:

HW Mode

Interrupts and exceptions result in Asynchronous Enclave Exits (AEX) when the CPU core is in SGX mode.

If the AEX results from an interrupt, the OS resumes enclave execution after handling it. The PKRU value must be saved and restored across the AEX, which is done by setting the XFRM.PKRU bit to 1.

exception_hw

If the AEX results from an exception, the SGX SDK uses the signal abstraction provided by the OS to handle it. The detailed workflow is shown in FIG: Exceptions workflow in HW Mode. NGO uses trts_handle_exception() and trts_handle_interrupt() to handle the different types of signals generated by the OS.
In addition to setting the GPRs and other state in the SSA so that LibOS can handle them, we also need to set the PKRU value in the SSA to PKRU_LibOS.

The following shows the PKRU value at each step when handling interrupts and exceptions:

  • ① --> ② : 0x55555554. As mentioned in this man page, "Each time a signal handler is invoked (including nested signals), the thread is temporarily given a new, default set of protection key rights that override the rights from the interrupted context". So the OS sets PKRU to PKRU_Default when sig_handler() is invoked;
  • ② --> ③ : 0x55555554. EENTER does not change the PKRU value;
  • ③ --> ④ : 0x55555554. EEXIT does not change the PKRU value;
  • ④ --> .. : 0x0. ERESUME restores the PKRU value recorded in the SSA XSAVE region.
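The sequence above can be replayed as a small state trace. This is a simplified model, not SDK code: it assumes the SSA initially holds the interrupted application's PKRU_User and that the in-enclave handler rewrites it to PKRU_LibOS between EENTER and EEXIT; the function and step names are illustrative.

```python
PKRU_DEFAULT = 0x55555554
PKRU_LIBOS = 0x0
PKRU_USER = 0x55555551


def hw_exception_trace():
    """Replay PKRU across the HW-mode exception path (simplified model)."""
    trace = []
    ssa_pkru = PKRU_USER   # assumed: saved into the SSA by the AEX

    pkru = PKRU_DEFAULT    # OS invokes sig_handler(): default rights
    trace.append(("1->2 signal handler", pkru))

    # EENTER leaves PKRU untouched.
    trace.append(("2->3 EENTER", pkru))

    # Inside the enclave, the handler prepares the SSA so that LibOS
    # runs with full rights once the enclave is resumed.
    ssa_pkru = PKRU_LIBOS

    # EEXIT leaves PKRU untouched as well.
    trace.append(("3->4 EEXIT", pkru))

    pkru = ssa_pkru        # ERESUME restores PKRU from the SSA
    trace.append(("4->.. ERESUME", pkru))
    return trace


trace = hw_exception_trace()
for step, value in trace:
    print(f"{step:20s} {value:#010x}")
```

Only the final step changes the register, and the value it loads is whatever was written into the SSA, which is why the SSA update is required.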

SW Mode
(We decided not to support this feature in SW Mode, since it would require substantial modifications to the SGX SDK and we do not use SW Mode in production environments. We still describe its design below.)

Unlike HW Mode, no AEX occurs in SW Mode. All exceptions are handled by sig_handler_sim(). The workflow is shown in FIG: Exceptions workflow in SW Mode. Handling an exception takes a few steps:

  • When an exception arrives, the OS invokes sig_handler_sim_wrapper() in urts. As mentioned in [2], each time a signal handler is invoked, PKRU is set to its default value (PKRU_Default). But the stack prepared by the OS may lie in the reserved memory, and the signal handler may access memory owned by the SGX SDK, so we need to update PKRU to PKRU_LibOS before we call signal_handler_sim().
  • signal_handler_sim() sets %rip = AEP and %rax = ERESUME in ucontext_t, so that the program executes ERESUME after the signal is handled. Then it enters the simulated enclave to set %rip in the SSA frame.
  • After returning from the signal handler, PKRU is restored to PKRU_User. However, the AEP uses the stack provided by urts and also needs to access data tagged with pkey 0, so we need to update PKRU to PKRU_LibOS at the beginning of the AEP.
  • After ERESUME, the handler in LibOS (exception_entrypoint or interrupt_entrypoint) is invoked. LibOS resumes the execution of the userspace app after handling it.

execption_sw
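The two explicit updates in this flow follow directly from the PKRU values. The sketch below is a simplified model with made-up step labels, not SDK code: it replays the steps and checks which pkeys each value can reach.

```python
PKRU_DEFAULT = 0x55555554
PKRU_LIBOS = 0x0
PKRU_USER = 0x55555551


def can_access(pkru, pkey):
    """True if the access-disable bit of `pkey` is clear in `pkru`."""
    return (pkru >> (2 * pkey)) & 1 == 0


def sw_exception_trace():
    """Replay PKRU across the SW-mode exception path (simplified model)."""
    steps = []
    interrupted = PKRU_USER  # rights of the interrupted application

    pkru = PKRU_DEFAULT      # OS enters the signal handler wrapper
    steps.append(("wrapper entry (set by OS)", pkru))
    pkru = PKRU_LIBOS        # explicit update before the handler runs
    steps.append(("wrapper: explicit update", pkru))

    pkru = interrupted       # signal return restores interrupted rights
    steps.append(("after signal return", pkru))
    pkru = PKRU_LIBOS        # explicit update at the beginning of the AEP
    steps.append(("AEP: explicit update", pkru))

    steps.append(("LibOS handler after ERESUME", pkru))
    return steps


steps = sw_exception_trace()
for label, value in steps:
    print(f"{label:30s} {value:#010x}",
          "pkey0:", can_access(value, 0),
          "pkey1:", can_access(value, 1))
```

PKRU_Default cannot reach pkey 1, where the signal stack may lie, and PKRU_User cannot reach pkey 0, which the AEP needs. Those are exactly the two points where the design inserts an update to PKRU_LibOS.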

Other Issues

  • pkey_alloc(), pkey_mprotect() and pkey_free() have not been added to Docker's default seccomp profile (can be found here). We have raised an issue. The current solution is to pass our customized seccomp profile to the docker run command, as follows:
$ docker run --security-opt seccomp=<customized profile> .....
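A customized profile can be derived from an existing one by appending an allow rule for the three syscalls. The sketch below assumes the usual Docker seccomp JSON layout (a top-level "syscalls" list of rules with "names" and "action"). The `base` profile is a tiny stand-in rather than Docker's real default profile, and `allow_pku_syscalls` is a hypothetical helper.

```python
import copy
import json

PKU_SYSCALLS = ["pkey_alloc", "pkey_mprotect", "pkey_free"]


def allow_pku_syscalls(profile):
    """Return a copy of a seccomp profile that also allows PKU syscalls."""
    patched = copy.deepcopy(profile)  # leave the input profile untouched
    patched.setdefault("syscalls", []).append(
        {"names": list(PKU_SYSCALLS), "action": "SCMP_ACT_ALLOW"}
    )
    return patched


# Stand-in profile for illustration only.
base = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {"names": ["read", "write"], "action": "SCMP_ACT_ALLOW"},
    ],
}

patched = allow_pku_syscalls(base)
print(json.dumps(patched, indent=2))
```

The printed JSON can be saved to a file and passed as the customized profile in the docker run command above.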

References

[1] https://community.intel.com/t5/Intel-Software-Guard-Extensions/SGX-Reserved-Memory/td-p/1279337

[2] http://manpages.ubuntu.com/manpages/bionic/man7/pkeys.7.html

@IceCY

IceCY commented Jul 5, 2022

As far as I know, PKU relies on the pkey bits within the page table entry to enforce the data access policy. However, the page table is considered untrusted in SGX's threat model and can be manipulated by the attacker.

@Bonjourz
Contributor Author

Bonjourz commented Jul 6, 2022

Hi @IceCY , PKU here is an option for users to enhance security. As mentioned before:

Although userspace applications are considered benign in NGO, they may be bug-prone. Illegal memory accesses may affect the correctness of computation, or even crash the whole enclave.

(LibOS's) userspace applications are still in our TCB, but they are inevitably bug-prone. We only use PKU for fault isolation, to improve robustness inside the enclave, which can help developers uncover bugs early.

The OS has full control of the enclave's page table, and it can misconfigure the pkey in a PTE without the enclave's authorization, but such misconfigurations can only help the OS perform DoS attacks, and DoS is not considered in SGX's threat model. If users worry that the PKU feature in Occlum opens a new attack vector, they can turn it off in production environments. The PKU feature can easily be switched on or off by configuring occlum.json.

@Bonjourz
Contributor Author

Bonjourz commented Jul 18, 2022

Occlum now supports the PKU feature. NGO will support PKU after its SDK is upgraded to 2.16.

More good news: moby has accepted our PR, moby/moby#43490 (adding PKU-related syscalls to Docker's default policy). Dockerd will support PKU-related syscalls by default in versions 1.5.x and 1.6.x: containerd/containerd#7163.

Thanks to all the reviewers for the helpful advice!
