

## Agenda

- SNIA NVM Programming Model
  - Block based I/O
  - Memory Mapped I/O
- Understanding power-failure atomicity
- Persistence domain
- Visibility versus Power Fail Atomicity



## The SNIA NVM Programming Model



## Don't Forget: The NVM Programming Model Starts With Standard Storage APIs













## A Programmer's View (mapped files)

```
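fd = open("/my/file", O_RDWR);
base = mmap(NULL, filesize,
                PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
close(fd);                      /* the mapping stays valid after close() */
base[100] = 'X';                /* plain stores into the mapped file */
strcpy(base, "hello there");
*structp = *base_structp;       /* structure copy, also plain loads/stores */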
```

"Load/Store"
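The fragment above leaves out error handling and flushing. A minimal sketch of the same load/store pattern with both added is shown here; the helper name `write_mapped` is hypothetical, and it assumes a regular POSIX file that is already large enough to hold the string.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map `path`, store `msg` at offset 0 with ordinary
 * stores, then flush with msync(). Returns 0 on success, -1 on error. */
static int write_mapped(const char *path, const char *msg)
{
    size_t need = strlen(msg) + 1;
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0 || (size_t)st.st_size < need) {
        close(fd);
        return -1;
    }
    size_t filesize = (size_t)st.st_size;

    char *base = mmap(NULL, filesize, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping outlives the descriptor */
    if (base == MAP_FAILED)
        return -1;

    strcpy(base, msg);          /* "load/store" access: no write() call */

    /* Stores are not guaranteed to be durable until they are flushed. */
    int rc = msync(base, filesize, MS_SYNC);
    munmap(base, filesize);
    return rc;
}
```

Calling `write_mapped("/my/file", "hello there")` performs the same store as the slide, but reports failure instead of crashing when the file is missing or too small.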



## How the Hardware Works

Figure: hardware data path for a `MOV` store (diagram not reproduced). Not shown: MCA, ADR failure detection.



## Application Responsibilities: Flushing

Program initialization:

- DAX mapped file? (OS provides info)
  - no: Use standard API for flushing (msync/fsync or FlushFileBuffers)
  - yes: CPU caches considered persistent? (ACPI provides info)
    - yes: Stores considered persistent when globally visible
    - no: CLWB? (CPU\_ID provides info)
      - yes: Use CLWB+SFENCE for flushing
      - no: CLFLUSHOPT? (CPU\_ID provides info)
        - yes: Use CLFLUSHOPT+SFENCE for flushing
        - no: Use CLFLUSH for flushing
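The flushing decision made at program initialization can be sketched as a pure function. The type and function names below (`platform_info`, `flush_method`, `choose_flush_method`) are hypothetical. A real program would fill in the flags from the OS, ACPI, and CPUID rather than by hand.

```c
#include <stdbool.h>

/* Hypothetical summary of what the OS, ACPI and CPUID report. */
struct platform_info {
    bool dax_mapped;            /* OS: is the file DAX mapped? */
    bool cpu_caches_persistent; /* ACPI: are CPU caches in the persistence domain? */
    bool has_clwb;              /* CPUID: CLWB available? */
    bool has_clflushopt;        /* CPUID: CLFLUSHOPT available? */
};

enum flush_method {
    FLUSH_STANDARD_API,      /* msync/fsync or FlushFileBuffers */
    FLUSH_NOT_NEEDED,        /* stores persistent once globally visible */
    FLUSH_CLWB_SFENCE,
    FLUSH_CLFLUSHOPT_SFENCE,
    FLUSH_CLFLUSH
};

/* Walk the decision tree in the same order as the slide. */
static enum flush_method choose_flush_method(const struct platform_info *p)
{
    if (!p->dax_mapped)
        return FLUSH_STANDARD_API;
    if (p->cpu_caches_persistent)
        return FLUSH_NOT_NEEDED;
    if (p->has_clwb)
        return FLUSH_CLWB_SFENCE;
    if (p->has_clflushopt)
        return FLUSH_CLFLUSHOPT_SFENCE;
    return FLUSH_CLFLUSH;
}
```

Making the choice once at startup and storing the result keeps the per-store path free of feature checks.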



## Application Responsibilities: Recovery





## Application Responsibilities: Consistency

```
open(...);
mmap(...);
strcpy(pmem, "Hello, World!");
msync(...);
/* Crash (power failure) somewhere in this sequence */
```

#### Result

```
    "\0\0\0\0\0\0\0\0\0\0\0..."
    "Hello, W\0\0\0\0\0\0..."
    "\0\0\0\0\0\0\0\0\0\0\0"
    "Hello, \0\0\0\0\0\0\0\0\0"
    "Hello, World!\0"
```



## Application Responsibilities: Consistency

```
open(...);
mmap(...);
strcpy(pmem, "Hello, World!");
pmem_persist(pmem, 14);
/* Crash (power failure) somewhere in this sequence */
```

`pmem_persist()` may be faster, but is still not transactional.

#### Result

```
    "\0\0\0\0\0\0\0\0\0\0\0..."
    "Hello, W\0\0\0\0\0\0..."
    "\0\0\0\0\0\0\0\0\0\0\0"
    "Hello, \0\0\0\0\0\0\0\0\0"
    "Hello, World!\0"
```



## Possible ways to access persistent memory

Legacy Storage API (block access)

- No code changes required
- Operates in blocks like SSD/HDD
  - Traditional read/write
  - Works with existing file systems
  - Atomicity at block level
  - Block size configurable
    - 4K, 512B\*
- NVDIMM driver required
  - Support starting with kernel 4.2
- Configured as boot device
- Higher endurance than enterprise SSDs
- High-performance block storage
  - Low latency, higher bandwidth, high IOPS

\* Requires Linux

Storage API with DAX

- Code changes may be required†
- Bypasses file system page cache
- Requires DAX-enabled file system
  - XFS, EXT4, NTFS
- No kernel code or interrupts
- Fastest I/O path possible

† Code changes required for load/store direct access if the application does not already support this.

Figure: software stack diagram (not reproduced). Labels: Application, Legacy Storage API, Standard File API, Standard Raw Device Access, mmap, Load/Store, PMDK, File System, pmem-Aware File System, MMU Mappings, DevDAX, BTT (Block Atomicity), Generic NVDIMM Driver, persistent memory.


## Visibility versus Power Fail Atomicity

| Feature      | Atomicity                                                                           |
|--------------|-------------------------------------------------------------------------------------|
| Atomic Store | 8 byte powerfail atomicity<br>Much larger visibility atomicity                      |
| TSX          | Programmer must comprehend XABORT, cache flush can abort                            |
| LOCK CMPXCHG | Non-blocking algorithms depend on CAS, but CAS doesn't include flush to persistence |

Software must implement all atomicity beyond 8 bytes for pmem. Transactions are fully up to software.
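One common way to build a larger update out of the 8-byte power-fail atomic store is to persist the payload first and then publish it with a single aligned 8-byte flag. The sketch below is illustrative only. `persist()` is a hypothetical stand-in for a real flush primitive (for example `pmem_persist()` or CLWB+SFENCE) and only counts calls here, and `struct record` and its helper functions are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a real flush primitive; it only counts
 * calls so the sketch compiles and runs on any machine. */
static int persist_calls;
static void persist(const void *addr, size_t len)
{
    (void)addr;
    (void)len;
    persist_calls++;
}

struct record {
    char     payload[56];
    uint64_t valid;      /* 8-byte aligned: one power-fail atomic store */
};

/* Publish `msg` so that after a crash the record is either invalid
 * or carries the complete payload, never a torn one. */
static int record_publish(struct record *r, const char *msg)
{
    size_t n = strlen(msg) + 1;
    if (n > sizeof r->payload)
        return -1;

    __atomic_store_n(&r->valid, 0, __ATOMIC_RELEASE); /* 1. invalidate */
    persist(&r->valid, sizeof r->valid);

    memcpy(r->payload, msg, n);                       /* 2. write payload */
    persist(r->payload, n);

    __atomic_store_n(&r->valid, 1, __ATOMIC_RELEASE); /* 3. publish */
    persist(&r->valid, sizeof r->valid);
    return 0;
}

/* Recovery-time read: trust the payload only if the flag was persisted. */
static const char *record_read(const struct record *r)
{
    return r->valid ? r->payload : NULL;
}
```

The ordering is the point: the flag is persisted as valid only after the payload flush has completed, so recovery never sees a valid flag over a partial string such as `"Hello, W"`.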



## If caches are not flushed on failure...

- Can't easily use compare\_and\_swap / fetch\_and\_add on Persistent Memory resident variables
- Can't use Hardware Transactional Memory (TSX) on Persistent Memory
- Must manually flush all data after writing
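Manual flushing works at cache-line granularity. The sketch below flushes every line that overlaps a byte range. It assumes 64-byte cache lines and uses plain CLFLUSH through the `_mm_clflush` intrinsic when the compiler targets SSE2; on other targets the loop only counts lines. The function name `flush_range` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_clflush, _mm_sfence */
#endif

#define CACHELINE 64u    /* assumption: 64-byte cache lines */

/* Flush every cache line overlapping [addr, addr + len).
 * Returns the number of lines visited. */
static size_t flush_range(const void *addr, size_t len)
{
    if (len == 0)
        return 0;

    /* Round the start down to a cache-line boundary. */
    uintptr_t p   = (uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    size_t lines  = 0;

    for (; p < end; p += CACHELINE) {
#if defined(__SSE2__)
        _mm_clflush((const void *)p);
#endif
        lines++;
    }
#if defined(__SSE2__)
    /* Needed after CLFLUSHOPT or CLWB; harmless after CLFLUSH. */
    _mm_sfence();
#endif
    return lines;
}
```

An 8-byte object that straddles a line boundary needs two flushes, which is one reason to keep persistent variables naturally aligned.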

## If caches are flushed on failure...

- No need to flush data
- But applications still need to do their own transactions
  - Can use HTM/TSX for that, but must include a software fallback in case the hardware transaction fails



## PMEM reference counter – BAD example

```
struct my_object {
        uint64_t refcount;
        type some_resource;
};

/* No decision is based on the refcount value in this thread... */
static void object_ref(struct my_object *object) {             /* visible = 0, persistent = 0 */
        __sync_fetch_and_add(&object->refcount, 1);             /* visible = 1, persistent = ? */
        persist(&object->refcount, sizeof(object->refcount));   /* visible = 1, persistent = 1 */
}

/* Decision is made based on a visible but not persistent value */
static void object_deref(struct my_object *object) {           /* visible = 1, persistent = 1 */
        if (__sync_sub_and_fetch(&object->refcount, 1) == 0) {  /* visible = 0, persistent = ? */
                delete_some_resource(object->some_resource);    /* visible = 0, persistent = ? */
        }
        persist(&object->refcount, sizeof(object->refcount));   /* visible = 0, persistent = 0 */
}
```

## PMEM reference counter – GOOD example

```
struct my_object {
        uint64_t refcount;
        type some_resource;
};

/* No decision is based on the refcount value in this thread... */
static void object_ref(struct my_object *object) {             /* visible = 0, persistent = 0 */
        __sync_fetch_and_add(&object->refcount, 1);             /* visible = 1, persistent = ? */
        persist(&object->refcount, sizeof(object->refcount));   /* visible = 1, persistent = 1 */
}

/* Decision is based on a known persistent value */
static void object_deref(struct my_object *object) {           /* visible = 1, persistent = 1 */
        if (__sync_sub_and_fetch(&object->refcount, 1) == 0) {  /* visible = 0, persistent = ? */
                persist(&object->refcount, sizeof(object->refcount)); /* visible = 0, persistent = 0 */
                delete_some_resource(object->some_resource);    /* visible = 0, persistent = 0 */
        }
}
```

Atomic variables need to be read and flushed before making any decisions/calculations with them to ensure that the action is taken on a value that is known to have been persistent at some point.
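The rule can be exercised with a small model in which a separate variable plays the role of the media contents. Everything here is a hypothetical stand-in: `persistent_refcount` simulates what would survive power loss, `persist_refcount()` replaces a real flush, and `resource_alive` replaces `some_resource`. Unlike the slide, this variant flushes on every decrement, not only when the count reaches zero.

```c
#include <stdint.h>

struct my_object {
    uint64_t refcount;
    int      resource_alive;   /* stand-in for some_resource */
};

/* Simulated media state of the refcount (what survives power loss). */
static uint64_t persistent_refcount;
/* Records the persistent value observed when the resource was deleted. */
static uint64_t refcount_seen_at_delete = UINT64_MAX;

/* Hypothetical flush: copy the visible value to the simulated media. */
static void persist_refcount(const struct my_object *o)
{
    persistent_refcount = o->refcount;
}

static void object_ref(struct my_object *o)
{
    __sync_fetch_and_add(&o->refcount, 1);
    persist_refcount(o);
}

static void object_deref(struct my_object *o)
{
    uint64_t now = __sync_sub_and_fetch(&o->refcount, 1);
    persist_refcount(o);           /* flush BEFORE acting on the value */
    if (now == 0) {
        refcount_seen_at_delete = persistent_refcount;
        o->resource_alive = 0;     /* the "delete" decision */
    }
}
```

In this model the resource is only ever released after the zero refcount has reached the simulated media, so a crash cannot leave a persistent nonzero count pointing at a deleted resource.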





