

# Learn the architecture - Providing protection for complex software

Version 2.0

Non-Confidential

Copyright  $\ \ \,$  2020, 2022–2023 Arm Limited (or its affiliates). All rights reserved.

**Issue 01** 102433\_0200\_01\_en



#### Learn the architecture - Providing protection for complex software

Copyright © 2020, 2022–2023 Arm Limited (or its affiliates). All rights reserved.

#### Release information

#### **Document history**

| Issue   | Date             | Confidentiality  | Change                                        |
|---------|------------------|------------------|-----------------------------------------------|
| 0100-01 | 8 January 2020   | Non-Confidential | Initial release                               |
| 0100-02 | 30 June 2022     | Non-Confidential | Minor bug fix in Return-oriented programming. |
| 0200-01 | 11 December 2023 | Non-Confidential | Update for 2023 architecture extensions       |

#### **Proprietary Notice**

This document is protected by copyright and other related rights and the practice or implementation of the information contained in this document may be protected by one or more patents or pending patent applications. No part of this document may be reproduced in any form by any means without the express prior written permission of Arm. No license, express or implied, by estoppel or otherwise to any intellectual property rights is granted by this document unless specifically stated.

Your access to the information in this document is conditional upon your acceptance that you will not use or permit others to use the information for the purposes of determining whether implementations infringe any third party patents.

THIS DOCUMENT IS PROVIDED "AS IS". ARM PROVIDES NO REPRESENTATIONS AND NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE WITH RESPECT TO THE DOCUMENT. For the avoidance of doubt, Arm makes no representation with respect to, and has undertaken no analysis to identify or understand the scope and content of, patents, copyrights, trade secrets, or other rights.

This document may include technical inaccuracies or typographical errors.

TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL ARM BE LIABLE FOR ANY DAMAGES, INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING OUT OF ANY USE OF THIS DOCUMENT, EVEN IF ARM HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

This document consists solely of commercial items. You shall be responsible for ensuring that any use, duplication or disclosure of this document complies fully with any relevant export laws and regulations to assure that this document or any portion thereof is not exported, directly or indirectly, in violation of such export laws. Use of the word "partner" in reference to Arm's customers is not intended to create or refer to any partnership relationship with any other company. Arm may make changes to this document at any time and without notice.

This document may be translated into other languages for convenience, and you agree that if there is any conflict between the English version of this document and any translation, the terms of the English version of the Agreement shall prevail.

The Arm corporate logo and words marked with ® or ™ are registered trademarks or trademarks of Arm Limited (or its affiliates) in the US and/or elsewhere. All rights reserved. Other brands and names mentioned in this document may be the trademarks of their respective owners. Please follow Arm's trademark usage guidelines at https://www.arm.com/company/policies/trademarks.

Copyright © 2020, 2022–2023 Arm Limited (or its affiliates). All rights reserved.

Arm Limited. Company 02557590 registered in England.

110 Fulbourn Road, Cambridge, England CB1 9NJ.

(LES-PRE-20349|version 21.0)

#### **Confidentiality Status**

This document is Non-Confidential. The right to use, copy and disclose this document may be subject to license restrictions in accordance with the terms of the agreement entered into by Arm and the party that Arm delivered this document to.

Unrestricted Access is an Arm internal classification.

#### **Product Status**

The information in this document is Final, that is for a developed product.

#### Feedback

Arm welcomes feedback on this product and its documentation. To provide feedback on the product, create a ticket on https://support.developer.arm.com

To provide feedback on the document, fill the following survey: https://developer.arm.com/documentation-feedback-survey.

### Inclusive language commitment

Arm values inclusive communities. Arm recognizes that we and our industry have used language that can be offensive. Arm strives to lead the industry and create change.

We believe that this document contains no offensive language. To report offensive language in this document, email terms@arm.com.

# **Contents**

| 1. Overview                                 | 6  |
|---------------------------------------------|----|
| 2. Stack smashing and execution permissions | 7  |
| 3. Return-oriented programming              | 10 |
| 4. Jump-oriented programming                | 19 |
| 5. Applying these techniques to real code   | 23 |
| 6. Detecting memory safety violations       | 26 |
| 7. Check your knowledge                     | 31 |
| 8. Related information                      | 32 |
| 9. Next steps                               | 33 |

### 1. Overview

This guide introduces some common forms of attacks that are used against complex software stacks. The guide also examines the features, including pointer authentication, branch target identification and memory tagging, that are provided in Armv8-A to help mitigate against such attacks. The guide is an overview of these features, and not a technical deep dive. You can use the Related information section to explore some topics in this guide in more detail.

At the end of this guide, you will be able to:

- Define the terms Return-Oriented Programming (ROP) and Jump-Oriented Programming (JOP).
- List the features in Armv8-A that help protect against ROP and JOP attacks.
- Describe how memory tagging can be used to detect memory safety violations, like buffer overruns or use-after-free.

#### Before you begin

We assume that you are familiar with the Arm memory model. If you are not, you might want to first read our Memory model and Memory management guides.

If you are not familiar with security, we also recommend that you read our Introduction to security guide before reading this guide.

# 2. Stack smashing and execution permissions

One of the oldest forms of attack is stack smashing. There are many types of stack smashing. The basic form of stack smashing involves malicious software writing new opcodes into memory and then attempting to execute the written memory. This process is illustrated here:

Figure 2-1: Stack smashing process



Typically, the memory that is used to launch the attack is stack memory. This is where the name stack smashing comes from. To protect against stack smashing, modern processor architectures, like the Arm architecture, have execution permissions. In AArch64, the main controls are execution permission bits in the translation tables. If we focus only on ELO and EL1:

#### **UXN**

User (ELO) Execute-never

#### **PXN**

Privileged Execute-never

Setting one of these bits marks the page as not executable. This means that any attempt to branch to an address within that page triggers an exception, in the form of a Permission fault. There are separate Privileged and Unprivileged bits. This is because application code needs to be executable in user space (ELO) but should never be executed with kernel permissions (EL1/EL2). Another form of attack involves abusing system calls to try to get privileged code to call code from user memory.

The following diagram shows a simplified, but typical, virtual address space for an application that is running under an Operating System (OS), with the expected execution permissions:

Figure 2-2: Virtual address space





By convention, kernel space is at the top of memory and user space is at the bottom of memory. Although this is not required by the architecture, it is the most common layout and the examples in this guide follow this convention.

The architecture also provides control bits in the system control register, SCTLR\_ELx, to make all writable addresses non-executable. Enabling this control makes locations like the stack non-executable.

A location that is writable at ELO is never executable at EL1, regardless of how the PXN and SCTLR\_ELx controls are configured.

Together, these controls can provide robust protection against the kinds of attack that we have described. The translation table attributes and write controls can block execution from any location that the malicious code could write to, as you can see in the following diagram:

Figure 2-3: Protection against malicious code stack



### 3. Return-oriented programming

Features like the execution permission that we described have made it increasingly difficult to execute arbitrary code. This means that attackers use other approaches like Return Oriented Programming (ROP). ROP takes advantage of the scale of the software stack in many modern systems. An attacker analyzes the software in a system, looking for gadgets. A gadget is a useful fragment of code, usually ending with a function return, for example:

```
ADD x0, x1, x2
RET
```

This code provides a gadget for adding two registers together. By scanning all the available libraries, an attacker can build a library of gadgets. These gadgets are existing legal code, within executable regions. This means that they are not caught by protections like execution permissions. The attacker strings together a chain of gadgets, forming what is effectively a new program, made up of existing code fragments. You can see an example in the following diagram:

Figure 3-1: Gadget attack code



Any library that is available in the address space for the process is a potential source of gadgets. For example, the C library contains many functions, each offering potential gadgets. With so many gadgets available, statistically enough gadgets are available to form any arbitrary new program.

Some compilers are even designed to compile to gadgets, rather than assembler. An ROP attack is effective, because it is made up of existing legal code, so it is not trapped by execution permissions or checks on executing from writable memory.

It is time-consuming for an attacker to find gadgets and create the sequence that is necessary to produce a new program. However, this process can be automated and can be reused to attack multiple systems. Address Space Randomization (ASLR) can help prevent the practice of automated and multiple attacks.

#### Pointer authentication

Armv8.3-A introduces the option of pointer authentication, FEAT\_PAC. Pointer authentication can mitigate against ROP attacks.

Pointer authentication takes advantage of the fact that pointers are stored in a 64-bit format, but not all those bits are needed to represent the address. The following diagram shows the virtual address space layout:

Figure 3-2: Virtual address space



You can see that there are potentially two  $2^{52}$  byte address ranges, one at the top of the address space, and one at the bottom of the address space:

Bottom Range: 0x0000\_0000\_0000 - 0x000F\_FFFF FFFF FFFF

Any address that falls outside of both ranges is always invalid and results in a fault if accessed.



Armv8.0-A supported up to 48 bits per range. This increased to 52 bits with the introduction of Armv8.1-A (FEAT\_LVA) and 56 bits with the introduction of Arm9.3-A (FEAT\_LVA3). This guide assumes a maximum of 52 bits per region.

You can see that any valid virtual address will have its top 12 bits as 0x000 or 0xFFF. When pointer authentication is enabled, the upper bits are used to store a signature and are not treated as part of the address. This signature is referred to as a Pointer Authentication Code (PAC).

The PAC uses the top bits of the pointer. Bit[55] is reserved to indicate whether the top or bottom region is being accessed. This is illustrated here:

Figure 3-3: Pointer Authentication Code



The exact number of bits that are available for the PAC depends on the configured size of the virtual address space, and on whether tagged pointers are enabled. The smaller the virtual address space, the more bits that are available for the PAC.

To protect against ROP attacks, at the start of a function the return address in the LR is signed. This means that a PAC is added in the upper order bits of the register. Before returning, the return address is authenticated using the PAC. If the check fails, an exception is generated when the address is used for a branch. The following diagram shows an example:

Figure 3-4: Protection against ROP attacks



This change makes ROP attacks much harder to launch. This is because, to form the chain of gadgets, the attacker needs to know the location of those gadgets, and possess correctly signed pointers to those locations. To get a signed pointer it would need access to a signing gadget.

#### How is the PAC formed?

The architecture provides five 128-bit keys. Each key is stored in a pair of 64-bit System registers:

- Two keys, A and B, for instruction pointers
- Two keys, A and B, for data pointers
- One key for general use

The registers that store these keys are only accessible at EL1 and above.

For data and instruction addresses, the instruction used to create and check the PAC specifies whether the A key or the B key is used. For a particular pointer, the instruction that generates the PAC and the instruction that authenticates the PAC must agree on which key to use.

The signature is formed from the address itself, the key, and a modifier, as shown in the following diagram:

Figure 3-5: Key and modifier authentication



The architecture allows different implementations, for example from different vendors, to use different encryption algorithms. The recommended algorithm is QARMA, which is required by SBSA level 5. ID AA64ISAR1 EL1 reports which algorithm is supported on a specific processor.

The limited size of the PAC means that the strength of the signature is potentially low, depending on the size of the configured virtual address size. However, the keys are typically of limited life span. Each running application can use different keys, and a given application can be given different keys each time that it is launched. When forming a chain of gadgets, the attacker must get every pointer correct, otherwise an exception will be raised.

#### What is the modifier?

The instructions that generate and authenticate the PAC specify how to construct the modifier.

The options for constructing the modifier are as follows:

- The modifier value is zero.
- The modifier value is the value of a single general purpose register.
- The modifier value is the value of the stack pointer, sp.
- From Armv9.4-A, if FEAT\_PAuth\_LR is implemented, the modifier value is constructed by concatenating bits [36:5] from the program counter, pc, and bits [35:4] from the stack pointer, sp.

The modifier value must be the same on entry and exit if the function is called correctly. For example, the Stack Pointer (sp) can have a different value every time that a function is called but will have the same value at the start and at the end of a given call.

Using the sp as a modifier gives you a PAC that is only valid for that call of the function. This is because the sp will probably be in a different location on future calls.

Combining the sp and pc offers even greater protection, tying together both the caller and the callee in the return address to uniquely associate the PAC with a particular instance of function usage.

#### How is the PAC checked?

Before use, the pointer must be authenticated. The authentication process is shown in this diagram:

Figure 3-6: PAC check



The authentication operation regenerates the PAC and compares it with the value that is stored in the pointer. If authentication succeeds, a pointer without the PAC is returned. If authentication fails, an invalid pointer is returned. This means that an exception is raised if the pointer is used.

#### **New instructions**

To support pointer authentication, new instructions are added to A64. Let's look at some examples of the operations that are related to the instruction pointers:

PACIXSP - Sign LR using SP as the modifier.

PACIXZ - Sign LR using O as the modifier.

PACIx - Sign Xn using a general-purpose register as modifier.

AUTIXSP - Authenticate LR using SP as the modifier.

AUTIXZ - Authenticate LR using O as the modifier.

AUTIX - Authenticate xn using a general-purpose register as modifier.

BRAx - Indirect branch with pointer authentication.

BLRAx - Indirect branch with link, with pointer authentication.

RETAX - Function return with pointer authentication.

ERETAX - Exception return with pointer authentication.

The 2023 extensions add new instructions that let you use a modifier formed by combining sp and pc:

PACIXSPPC - Sign LR using SP+PC as the modifier.

AUTIXSPPC - Authenticate LR using SP and an address as the modifier. The address can be specified by either an immediate program label or a general-purpose register.

The following code example demonstrates the PC-relative immediate form of AUTIXSPPC:

```
myFunc:
    PACIASPPC
    ...
    AUTIASPPC <label>
    RET
```

The following code example demonstrates the register form of AUTIXSPPC:

```
myFunc:
    PACIASPPC
    ...
    ...
ADDR <Xn>, <label>
AUTIASPPC <Xn>
RET
```

In each case, replace x with A or B to select the wanted key.

The preceding list is not complete, but it shows the type of operations that are available. You can refer to the Arm ARM for a complete list and detailed descriptions.

#### Use of the NOP space

Some of the new authentication instructions are in the **NOP** space. Applications or libraries that protect themselves with these NOP-space instructions can run on older processors without pointer authentication support. Although the older processors will not benefit from the protections, this can be very useful in heterogeneous systems, as you can see in the following diagram:

Figure 3-7: Use of the





To provide backwards compatibility, this program uses separate instructions to authenticate the LR and return. Ideally the combined authenticate and return instructions, RETAX, would be used. However, the RETAX instruction does not use the NOP instruction space. This means that it is not compatible with a processor that does not support authentication.

#### **Enabling pointer authentication**

Pointer authentication is controlled by Exception level using SCTLR\_ELX. SCTLR\_ELX uses separate controls for instruction checking and for data checking:

- Enix Enables instruction pointer authentication using key x.
- Endx Enables data pointer authentication using key x.

#### **Guarded Control Stack (GCS)**

Pointer authentication stores a PAC signature in the unused upper bits of an address. However, the number of bits that are available for the PAC depends on the size of the virtual address space. This means that the strength of the PAC decreases as the size of the configured virtual address size increases. This is called the memory cost of PAC.

A smaller virtual address space might only use 40 bits for the address, leaving 23 bits for the PAC, giving good protection (64 - 40 address bits - 1 VA range select bit). A large scale server system with a larger virtual address space might use 52 bits for the address. This only leaves 11 bits for the PAC, resulting in less effective protection.

A Guarded Control Stack (GCS) provides additional protection against some forms of ROP attacks, helping to mitigate the reduced protection from PAC for larger virtual address spaces.

A GCS is a protected region of virtual address space allocated by software. When the processor executes a Branch with Link instruction, such as BL, the return address is pushed onto the GCS as well as being written into the Link Register (LR). On a procedure return, the latest stored return address is popped from the GCS. The processor either compares the popped value with the LR, or uses the popped value directly. The following diagram illustrates this process:

Figure 3-8: Guarded Control Stack

#### Execution stream Guarded PC Instruction Control Stack 1000 BL 2000 STR LR, [SP] 2004 ADD 1004 2008 200C 3000 STR LR, [SP] 3004 SUB LDR LR, [SP] 3008 300C RET 200C LDR LR, [SP] 2010 **RET SUB** 1004

There are times when the software needs to make manual adjustments to the control stack, for example to handle some long jumps. To enable this, the architecture provides specialist instructions for maintaining the GCS: gcspushx and gcspopx.

To prevent accidental or malicious changes to the GCS, a specific Stage 1 permission exists to allows reads by software, but restricts writes to either GCSPUSH instructions or as a side-effect of executing a BL.

GCS also provides an efficient mechanism for profiling tools to get a copy of the current call stack, without needing to unwind the main stack.

### 4. Jump-oriented programming

Jump-Oriented Programming (JOP), is similar to Return-Oriented Programming (ROP). In an ROP attack, the software stack is scanned for gadgets that can be strung together to form a new program. ROP attacks look for sequences that end in a function return (RET). In contrast, JOP attacks target sequences that end in other forms of indirect (absolute) branches, like function pointers or case statements. You can see an example here:

Figure 4-1: Jump oriented programming



The attacker exploits the fact that BLR or BR instructions can target any executable address, and not just the addresses that are entry points defined by the compiler or developer. This means that the instructions can be hijacked to string gadgets together.

#### **Branch target instructions**

To help protect against JOP attacks, Armv8.5-A introduced Branch Target Instructions (BTIs). BTIs are also called landing pads. The processor can be configured so that indirect branches (BR and

BLR) can only allow target landing pad instructions. If the target of an indirect branch is not a landing pad, a Branch Target Exception is generated as you can see here:

Figure 4-2: Branch target exception



The use of landing pads significantly reduces the number of possible targets for an indirect branch and makes it harder to string chains of gadgets together to form a new program.

#### **Enabling branch target checking**

Support for landing pads is enabled for each page, using a new bit (gp bit) in the translation tables. Per-page controls allows a filesystem to contain a mixture of landing pad-protected code and legacy code, which is illustrated here:

Figure 4-3: Landing pad



The encoding for BTI instructions, like the pointer-authentication instructions, is allocated within the **NOP** space. BTI-protected code can still function when run on older processors that do not support BTI, or when gp=0, although without the additional protection.

#### How BTI is implemented

PSTATE includes a field, BTYPE, that records the branch type. On executing an indirect branch, the type of indirect branch is recorded in PSTATE.BTYPE. The following list shows the value BTYPE takes for different branch instructions:

- BTYPE=11: BR, BRAA, BRAB, BRAAZ, BRABZ with any register other than X16 or X17
- BTYPE=10: blr, blraa, blrab, blraaz, blrabz
- BTYPE=01: BR, BRAA, BRAB, BRAAZ, BRABZ With X16 Or X17

Executing any other type of instruction, including direct branches, causes BTYPE to be set to boo.

Why store two bits? A simple implementation could record whether an indirect branch was in process or not. However, recording the type of indirect branches further limits the possibilities of finding gadgets. The syntax of the BTI instruction includes an argument, specifying which types of indirect branch it can be targeted by:

| Argument | Accepted PSTATE.BTYPE | Use case                                         |
|----------|-----------------------|--------------------------------------------------|
| BTI C    | 0b10 and 0b01         | Function calls                                   |
| вті ј    | 0b11 and 0b01         | Non-function call branches, like case-statements |
| BTI jc   | All                   | All                                              |

When BTYPE!=00, the processor checks whether the instruction being targeted is a landing. If it is not a landing, or if it is the wrong type of indirect branch, an exception is generated.

#### X16 and X17

Why does the architecture distinguish between indirect branches that use x16 or x17 and those that do not?

x16 and x17 have special significance in the Procedure Call Standard used by Arm. They are referred to as the intra-procedure call corruptible registers, or IPO or IP1. They can be used by static linkers for inserting branch-range extending veneers, or by dynamic linkers for handling jump tables.

This is relevant to us because it means that a function might be entered directly from the caller using  $\mathtt{BL}$  or  $\mathtt{BLR}$  or indirectly via linker generated code using  $\mathtt{x16}$  or  $\mathtt{x17}$ . Therefore, the landing pad for a function entry needs to be able to accept both.

#### Function entry and return

The function return instructions, RET, RETAA and RETAB, are also a form of indirect branch. If these instructions were required to target a BTI, every function call would need to be followed by a BTI. This would cause undesirable code bloat. Also, the pointer authentication feature already provides a way to protect function returns.

For function entry, the pointer signing instructions PACIXSP and PACIXZ act like landing pads. These instructions are like BTI instructions. This means that when the landing pad feature is used pointer authentication, there is no need to start every function with a BTI. This also avoids code bloat.

### 5. Applying these techniques to real code

In Return-oriented programming (ROP) and Jump-oriented programming (JOP), we explored features that Arm introduced to the Arm architecture to mitigate against JOP-style and ROP-style attacks. Now we will look at the compiler support for these features, and how enabling these protections affects the number of that are gadgets available to attackers.

In this section, we refer to these versions of Arm Compiler 6 and Gnu C Compiler (GCC):

- Arm Compiler 6.11
- GCC 9.1

Compiler support for these features continues to evolve. Precise figures will vary based on the versions that you use.

#### Build an image with pointer authentication and branch target identification

For Arm Compiler 6, GCC and LLVM generation of pointer authentication and BTI-enabled code is controlled by:

mbranch-protection=<protection>

Where can be any combination of:

- pac-ret{+leaf+b-key}
  - pac-ret enables return address signing for non-leaf functions using the A-key.
  - +leaf increases the scope of return address signing to include leaf functions.
  - +b-key uses B-key instructions to sign addresses instead of A-key instructions.
- bti protects code using Branch Target Identification.
- standard turns on all types of branch protection.
  - Currently standard implies pac-ret+bti.
- none turns off all types of branch protection.
  - This is the default if the -mbranch-protection flag is not provided.

Whether the combined or NOP-compatible instructions are generated depends on the architecture version that the code is built for. When building for Armv8.3-A, or later, the compiler will use the combined operations. When building for Armv8.2-A, or earlier, it will use the **NOP** compatible instructions. For example:

| -march=armv8.2-a -mbranch-protection=standard | -march=armv8.3-a -mbranch-protection=standard |
|-----------------------------------------------|-----------------------------------------------|
| enableInt                                     | enableInt                                     |
| 0x00000000: d503233f PACIASP                  | 0x00000000: d503233f PACIASP                  |
|                                               |                                               |
|                                               |                                               |
| 0x00000350: d50323bf AUTIASP                  | 0x00000350: d65f0bff RETAA                    |

| -march=armv8.2-a -mbranch-protection=standard | -march=armv8.3-a -mbranch-protection=standard |
|-----------------------------------------------|-----------------------------------------------|
| 0x00000354: 65f03c0 RET                       |                                               |



The function used in this example was taken from the example that accompanies our guide Arm CoreLink Generic Interrupt Controller v3 and v4 Overview and built with Arm Compiler 6.

The compiler generates the instructions that are required to perform signing and authentication. Generating and configuring keys is the responsibility of supervising software, typically an operating system.

#### Reduction in available gadgets

GLIBC is a large library that is used in C or C++ applications. This means that it is a good target for attackers, and a good place for us to see the effect of applying the measures to mitigate attacks. Arm used this tool to measure the number of available gadgets and modified the tool to fit our requirements.

The following graph shows the number of gadgets before and after the compiler options were enabled:



Figure 5-1: ROP and JOP gadgets gets in GLIBC graph

By enabling both pointer authentication and branch target identification, the number of gadgets that are available reduces by 97.65%.

#### Effect on code size

The protection described in the preceding section is helpful but comes at a cost. One obvious cost is the increase in code size. Here is an analysis of this cost:

Figure 5-2: Change in image size from enabling ROP and JOP mitigations



The graph shows that the code size effect on GLIBC is minimal. Even though turning on both the mitigations leads to a 2.9% code size increase, this increase is smaller when compiling with - march=armv8.3-a. Compiling for Armv8.3-A allows the compiler to use fused authenticate and return instructions. This means that, for Armv8.3-A, the code size increase is only 1.6%.

## 6. Detecting memory safety violations

Some classes of vulnerability that are related to memory usage can be difficult to detect and test for. Two examples of this are:

- Use after free Applications continue to use allocated memory after releasing it, or after it is out of scope. This is a violation of temporal memory safety.
- Buffer overrun, or overflow Going beyond the bounds of an allocated structure or buffer, usually because of insufficient bounds checking. This is a violation of spatial memory safety.

Armv8.5-A introduces the Memory Tagging Extension (MTE), also called memory coloring. Memory tagging makes detecting memory safety violations easier and more efficient.



One of the first Internet-spread computer worms was the Internet Worm in 1988, which exploited a buffer overrun. More than thirty years later, we are still seeing attacks that exploit this type of programming bug.

#### Memory tagging

Regions of address space are allocated a tag, or lock. The upper bits of a virtual address are also used to store a tag, or key. On a memory access, the processor compares the key in the issued address with the lock that is assigned to that physical location. Here is an example:

Figure 6-1: Memory tagging



In the preceding diagram, two regions have been allocated, using tags 9 and 2.

For the first two pointers, the tag matches that of the accessed location. You can think of this as the key fitting the lock. Accesses using these pointers would succeed as normal.

However, for the final pointer the tag does not match that of the accessed location. This will be captured as a tag check failure. We will look at what happens in the case later.

Let's apply this mechanism to the problems that we identified earlier, starting with buffer overruns, as you can see in this diagram:

Figure 6-2: Buffer overruns memory tagging



On the call to malloc() the C library will allocate the memory and assign a tag for the buffer. The returned pointer will include the allocated tag. If software using the pointer goes beyond the limits of the buffer, the tag comparison check will fail. This failure will allow us to detect the overrun.

Similarly, for use-after-free, on the call to malloc() the buffer gets allocated in memory and assigned a tag value. The pointer that is returned by malloc() includes this tag. Later the buffer is released. The C library might change the tag when the memory is released or might wait until the memory is reused for some other purpose. If software continues to use the old pointer, it will have the old tag value and the tag check will catch it.

Figure 6-3: Old tag value check





The total number of possible tags is small. Therefore, the same tag value might be used for several different regions over time, or at the same time. However, with careful tag allocation, sequential overruns or underruns can be detected. Wild accesses are statistically likely to be caught.

#### **Tags**

To work with tags, the architecture gains several new instructions, including:

- IRG Generates a random tag value and inserts it to a pointer
- stg Sets the tag value for a block of memory
- stzg Sets the tag value for a block of memory, and zeros corresponding memory location
  - If the allocator is going to zero the allocated memory, stzg offers better performance than separate zeroing and tagging.
- LDG Reads the tag value for a block of memory

Tags are four bits and are stored in two places:

- Key Stored in bits [59:56] of a pointer
  - This requires pointer tagging to be enabled. We will discuss this later in the guide.
- Lock A new address space, the tag address space, is added. The tag address space records the tag to a memory region.

On allocating a block of memory, software allocates a tag either randomly, using IRG, or using a custom algorithm. Each tag covers 16 bytes. This means that software needs to execute STZG or STG multiple times to cover all the 16-byte blocks within the allocated memory.

#### Tagged and untagged addresses

Not all memory accesses require tag checking. We describe an access as Checked or Unchecked, depending on whether tag checking is carried out.

The following accesses are always Unchecked:

Instruction fetches

- Translation table walks, including hardware updates of the Access Flag or Dirty state
- Data cache maintenance operations
- Accesses to the Allocation tags

For data accesses, a new memory attribute is added to indicate that accesses to this region should be Checked:

• MemAttr[] == 0xF0: Inner+Outer Write-Back Cacheable, Read or Write-Allocate, Tagged

Data accesses to a region that is marked as Tagged are classed as Checked, unless one of the following applies:

- TCR ELx.TBI==0
- The Logical tag (bits [59:56] of the virtual address) are b0000 or b1111.
- The load or store uses the SP as a base register with an immediate offset, or no offset
- It is a PC relative load.
- PSTATE.TCO==1

Data accesses to any region without the Tagged attribute are Unchecked.



Loads or stores using the stack pointer with an immediate offset can be statically checked at build time. This means that there is less benefit to checking with MTE. The same principle applies to PC-relative loads.

#### What happens when a comparison fails?

Let's discuss what happens when the tag comparison fails. The architecture makes the behavior of tag comparison failure configurable, controlled by SCTLR\_ELX.TCF, or SCTLR\_ELX.TCF0 for ELO:

- TCF==00 Tag comparison failures are ignored.
- TCF==01 Tag comparison failures are reported as a synchronous Data Abort. The address that caused the failure is reported in FAR\_ELx.
- TCF==10 Tag comparison failures are reported asynchronously by updating bits in TFSR\_ELX, or TFSRO\_EL1 for ELO. Optionally, checks can be synchronized on exception entry, to allow check failures to be attributed to a specific process.

The architecture provides both synchronous and asynchronous mechanisms to report tag comparison failures. Synchronous checking makes debugging simpler, because it allows you to identify the precise instruction and address that caused the failure. However, synchronous checking typically has a significant performance impact. This performance impact might be acceptable in a development environment but is too high for deployment.

Asynchronous checking is less costly. This means that asynchronous checking is potentially acceptable even on production systems. Although asynchronous checking provides less precise information on where the tag comparison failure occurred, it can provide some mitigation and be

used for profiling. Profiling allows problem areas to be identified, narrowing down the search area for bugs.

MTE feature names

Different aspects of memory tagging functionality are configured by several different feature names:

#### FEAT\_MTE

Instruction-only Memory Tagging Extension

#### FEAT\_MTE2

Memory Tagging Extension version 2

#### FEAT\_MTE3

MTE Asymmetric Fault Handling

#### FEAT\_MTE4

**Enhanced Memory Tagging Extension** 

#### FEAT\_MTE\_ASYM\_FAULT

Memory tagging asymmetric faults

#### FEAT\_MTE\_ASYNC

Memory Tagging asynchronous faulting

#### FEAT\_MTE\_CANONICAL\_TAGS

Canonical Tag checking for Untagged memory

#### FEAT\_MTE\_NO\_ADDRESS\_TAGS

Memory tagging with Address tagging disabled

#### FEAT MTE PERM

Allocation tag access permission

#### FEAT MTE STORE ONLY

Store-only Tag Checking

#### FEAT\_MTE\_TAGGED\_FAR

FAR ELx on a Tag Check Fault

#### Combining memory tagging and pointer authentication

Memory tagging and pointer authentication both use the upper bits of an address to store additional information about the pointer: a tag for memory tagging, and a PAC for pointer authentication.

Both technologies can be enabled at the same time. The size of the PAC is variable, depending on the size of the virtual address space. When memory tagging is enabled at the same time, there are fewer bits available for the PAC.

### 7. Check your knowledge

The following questions will help you test your knowledge.

### What is a gadget in Return oriented-programming (ROP) and Jump-oriented programming (JOP) attacks?

A gadget is a piece of existing code which ends in either a function return or an indirect (absolute) branch. In ROP and JOP attacks, these gadgets are chained together to form new programs.

#### Describe how Branch Target Identification (BTI) limits the scope of JOP attacks.

BTI restricts indirect branches to only targeting BTI instructions, or PACIXSP and PACIXZ instructions. This greatly reduces the number of possible targets and makes it difficult to form chains of gadgets.

#### When using pointer authentication, where is the signature of an address stored?

In the upper bits of the virtual address.

## In the Arm Memory Tagging Extension (MTE), what happens when the tag issued alongside a memory access does not match the allocation tag?

This situation is known as a tag checking failure. The behavior is configurable, via SCTLR\_ELX.TCF. The failure can be ignored, reported synchronously, or reported asynchronously.

#### How many bits are used to store the logical tag in the Arm memory tagging extension?

4 bits, but the values 0b0000 and 0b1111 are reserved.

### 8. Related information

Here are some resources related to information in this guide:

- Arm architecture and reference manuals: Find technical manuals and documentation relating to this guide and other similar topics
- Arm Community: Ask development questions, and find articles and blogs on specific topics from Arm experts
- Armv8-A Instruction Set Architecture: More information on the Procedure Call Standard
- The QARMA Block Cipher Family: Information on the QARMA cipher from the International Association for Cryptologic Research
- Control-Flow-Integrity: NSA paper on control flow protection. While not specific to the Arm architecture, the paper provides good background reading on the topic

#### **Detecting memory safety violations**

- Adopting the Arm Memory Tagging Extension in Android: Google blog about their use of memory tagging techniques to locate memory safety bugs
- Armv8.5-A Memory Tagging Extension: Arm white paper with a detailed description of the memory tagging technology

#### Pointer authentication

- Armv8.3-A pointer authentication support: The patches that added support for pointer authentication to the Linux kernel give information on how the technology is used in practice
- Code reuse attacks: the compiler story: Arm blog discussing the use of Pointer authentication and Branch Target

### 9. Next steps

This guide introduced features available in the Arm architecture which can provide robust defenses for complex software stacks. We have looked at the pointer authentication and branch target identification extensions, which can be used to defend against ROP and JOP attacks. We also looked at how memory tagging can be used to detect and locate potential vulnerabilities before they are exploited.

Next you might want to learn about Arm's TrustZone technology, another feature available in the Arm architecture.

The knowledge in this guide, and in the TrustZone guide, will be useful to you as you design your own complex systems. Enabling you to decide which combination of technologies you should deploy to protect different assets in the system.