# Lecture 12: Low-Latency Memory 2, Memory Controllers (performance, energy, quality of service)

*Notes*

<hr>

## 1 - Mechanisms to reduce DRAM latency

### FLY-DRAM

**Observation:** DRAM timing errors (slow DRAM cells) are concentrated in certain DRAM regions

Flexible-LatencY (FLY) DRAM (a software-transparent design that reduces latency)

**Key idea:** 1) divide memory into regions of different latencies, 2) memory controller: use lower latency for regions without slow cells; higher latency for other regions
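
As a rough illustration of the key idea, the controller can keep a small table that maps each region to the latency it should use. This is a minimal sketch, not the actual FLY-DRAM hardware: the `Region` and `FlyDramLatencyTable` names, the choice of tRCD as the timing parameter, the row-range granularity, and the nanosecond values are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical timing values in nanoseconds; real values would come from profiling.
FAST_TRCD_NS = 7.5
STANDARD_TRCD_NS = 13.125


@dataclass(frozen=True)
class Region:
    first_row: int        # inclusive
    last_row: int         # inclusive
    has_slow_cells: bool  # result of (assumed) offline profiling


class FlyDramLatencyTable:
    """Maps a row address to the activation latency the controller should use."""

    def __init__(self, regions):
        self.regions = sorted(regions, key=lambda r: r.first_row)

    def trcd_for_row(self, row: int) -> float:
        for region in self.regions:
            if region.first_row <= row <= region.last_row:
                # Regions without slow cells get the lower latency.
                return STANDARD_TRCD_NS if region.has_slow_cells else FAST_TRCD_NS
        # Rows that were never profiled fall back to the safe standard latency.
        return STANDARD_TRCD_NS


table = FlyDramLatencyTable([
    Region(0, 511, has_slow_cells=False),
    Region(512, 1023, has_slow_cells=True),
])
print(table.trcd_for_row(100), table.trcd_for_row(600))
```

Falling back to the standard latency for unknown rows keeps the sketch conservative: a missing profile entry can cost performance but never correctness.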

<u>Advantages</u>

- reduces latency significantly
- exploits significant within-chip latency variation

<u>Disadvantages</u>

- Need to determine reliable operating latencies for different parts of a chip -> higher testing cost
- More complicated controllers

<hr>

### Design-Induced Variation

Systematic variation in cell access time due to design architecture.

![design](images/design.png)

The solution is to profile access latency in order to separate the inherently slow cells (design-induced, localized errors) from cells that are slow for other reasons (process variation, random errors). Error-correcting codes can handle the latter.

#### DIVA Online Profiling

Combine error-correcting codes and online profiling to reliably reduce DRAM latency.
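
The profiling loop can be pictured as stepping the latency down until the errors observed in the inherently slow region exceed what ECC can correct. The sketch below is only an assumption-laden illustration: `errors_at` is a hypothetical callback that tests the slow region at a given latency and returns the worst-case number of bit errors per ECC codeword, and the picosecond values in the simulated device are invented.

```python
def find_lowest_reliable_latency(errors_at, standard_ps, step_ps, floor_ps,
                                 ecc_correctable=1):
    """Lower the latency step by step while errors stay ECC-correctable.

    errors_at(latency_ps) is a hypothetical test routine returning the
    worst-case bit errors per ECC codeword at that latency.
    """
    best = standard_ps                 # the standard latency is assumed reliable
    latency = standard_ps - step_ps
    while latency >= floor_ps:
        if errors_at(latency) > ecc_correctable:
            break                      # ECC can no longer hide the errors
        best = latency
        latency -= step_ps
    return best


def simulated_device(latency_ps):
    # Invented behavior: error-free down to 10 ns, one error per codeword
    # down to 8.75 ns, multi-bit errors below that.
    if latency_ps >= 10000:
        return 0
    if latency_ps >= 8750:
        return 1
    return 3


lowest = find_lowest_reliable_latency(simulated_device, standard_ps=13750,
                                      step_ps=1250, floor_ps=5000)
print(lowest)
```

Latencies are kept as integer picoseconds so that repeated subtraction does not accumulate floating-point error.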

<u>Advantages:</u>

- Automatically finds the lowest reliable operating latency at system runtime
- Reduces latency more than prior methods
- Reduces latency at high temperatures as well

<u>Disadvantages:</u>

- Requires knowledge of inherently-slow regions
- Requires ECC
- Imposes overhead during runtime profiling
- More complicated memory controller (capable of profiling)

<hr>

## 2 - Data-Aware DRAM Latency for Deep Neural Network Inference

Deep neural network (DNN) evaluation is very DRAM-intensive, especially for large networks.

1. Some data and layers in DNNs are very tolerant to errors
2. Reduce DRAM latency and voltage on such data and layers
3. While still achieving a user-specified DNN accuracy target by making training DRAM-error-aware
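
One way to picture steps 1 and 2 is as a mapping problem: each layer goes to the most aggressive DRAM setting whose error rate the layer still tolerates. The sketch below is a simplification under stated assumptions, not the EDEN algorithm itself: the partition names, the bit error rates (BER), and the per-layer tolerances are invented, and a higher-BER partition is assumed to run at lower latency or voltage.

```python
def map_layers_to_partitions(layer_tolerance, partition_ber):
    """For each layer, pick the highest-BER partition the layer still tolerates."""
    mapping = {}
    for layer, tolerable_ber in layer_tolerance.items():
        candidates = [(ber, name) for name, ber in partition_ber.items()
                      if ber <= tolerable_ber]
        if not candidates:
            raise ValueError(f"no partition is reliable enough for {layer}")
        # Highest acceptable BER corresponds to the most aggressive setting.
        mapping[layer] = max(candidates)[1]
    return mapping


# Hypothetical partitions: BER observed at each (latency, voltage) setting.
partition_ber = {"nominal": 0.0, "reduced": 1e-4, "aggressive": 1e-2}

# Hypothetical tolerances, e.g. obtained from error-aware training.
layer_tolerance = {"conv1": 1e-5, "conv2": 1e-3, "fc": 5e-2}

mapping = map_layers_to_partitions(layer_tolerance, partition_ber)
print(mapping)
```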

### EDEN

![eden](images/eden.png)

<hr>

## 3 - Understanding and Exploiting the Voltage-Latency-Reliability Relationship

Reliable low-voltage operation requires higher latency: at lower voltage, DRAM needs more time to access data without errors. Memory banks do not respond equally to voltage reduction; errors show spatial locality.

A linear regression model (used in Voltron) can predict the performance loss of running at reduced voltage from an application's characteristics. The prediction is then used to select the minimum voltage that still meets the target performance.
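
The voltage selection step can be sketched as follows. Everything numeric here is an assumption for illustration: the voltage-to-extra-latency table, the single feature (extra latency multiplied by memory intensity in misses per kilo-instruction, MPKI), and the coefficients are hypothetical and are not the model used by Voltron.

```python
# Hypothetical profiling result: supply voltage (V) -> extra access latency (ns)
# needed for error-free operation at that voltage.
EXTRA_LATENCY_NS = {1.35: 0.0, 1.25: 1.0, 1.15: 2.5, 1.05: 5.0}

# Hypothetical fitted coefficients of the linear model.
INTERCEPT = 0.0
SLOPE = 0.05


def predicted_loss_pct(extra_latency_ns, mpki):
    """Linear model: performance loss grows with extra latency and memory intensity."""
    return INTERCEPT + SLOPE * extra_latency_ns * mpki


def min_voltage_for_target(mpki, max_loss_pct):
    """Return the lowest voltage whose predicted loss stays within the target."""
    acceptable = [v for v, extra in EXTRA_LATENCY_NS.items()
                  if predicted_loss_pct(extra, mpki) <= max_loss_pct]
    # The nominal voltage has zero extra latency, so the list is never empty
    # as long as max_loss_pct >= INTERCEPT.
    return min(acceptable)


print(min_voltage_for_target(mpki=10, max_loss_pct=2.0))
print(min_voltage_for_target(mpki=50, max_loss_pct=3.0))
```

A memory-intensive application (high MPKI) is predicted to lose more performance per nanosecond of added latency, so it ends up at a higher voltage than a compute-bound one for the same target.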

### Voltron

<u>Advantages</u>

- Can trade off between voltage and latency to improve energy or performance
- Can exploit the high voltage margin present in DRAM

<u>Disadvantages</u>

- Requires finding the reliable operating voltage for each chip -> higher testing cost
- More complicated memory controller

<hr>

## 4 - Memory Controller

Long-latency memories have similar characteristics that need to be controlled.

Many scheduling and control issues are similar in the design of controllers for other types of memory (besides DRAM).

### Flash Memory (SSD controllers)

Similar to DRAM memory controllers except: 1) they are flash memory specific, 2) they do much more: complex error correction, wear leveling, voltage optimization, garbage collection, page remapping.

Flash memory has much higher access latency than DRAM.

### DRAM types

DRAM comes in different types with different interfaces, each optimized for a different purpose (DDR, DDR2, DDR3, ..., low power for mobile, low latency, high bandwidth, 3D-stacked).

The underlying microarchitectures are fundamentally the same, so a flexible memory controller can support various DRAM types. This complicates the memory controller: it is difficult to support all types (and upgrades), and the analog interface differs between DRAM types.

### DRAM Power Management

DRAM chips have power modes. Idea: when a chip is not being accessed, power it down.

Power states:

- Active (highest power)
- All banks idle
- Power-down
- Self-refresh (lowest power)

Trade-offs:

- State transitions incur latency during which the chip cannot be accessed
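
The trade-off can be made concrete with a simple policy that picks the deepest power state whose exit latency is small relative to the expected idle period. This is a sketch under assumptions: the relative power numbers, the exit latencies in cycles, and the 10% overhead budget are invented for illustration and do not come from any DRAM datasheet.

```python
# (state name, relative power, exit latency in cycles); all values hypothetical.
POWER_STATES = [
    ("active",         1.0,   0),
    ("all_banks_idle", 0.6,  10),
    ("power_down",     0.3,  20),
    ("self_refresh",   0.1, 500),
]


def choose_state(expected_idle_cycles, max_overhead_fraction=0.1):
    """Pick the lowest-power state whose exit latency fits the overhead budget."""
    budget = expected_idle_cycles * max_overhead_fraction
    affordable = [s for s in POWER_STATES if s[2] <= budget]
    # "active" has zero exit latency, so there is always at least one candidate.
    return min(affordable, key=lambda s: s[1])[0]


for idle in (50, 150, 1000, 10000):
    print(idle, choose_state(idle))
```

Short idle periods keep the chip in a shallow state because the wake-up latency would otherwise dominate; only long idle periods justify self-refresh.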

### Self-Optimizing DRAM Controllers

Problem: DRAM controllers are difficult to design.

Dynamically adapt the memory scheduling policy via interaction with the system at runtime:

- Associate system states and actions (commands) with long-term reward values: each action at a given state leads to a learned reward.
- Schedule the command with the highest estimated long-term reward value in each state.
- Continuously update reward values for <state, action> pairs based on feedback from the system.
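
The three steps above amount to a reinforcement learning loop. The sketch below uses a small table of values and a SARSA-style temporal-difference update; the update rule, the learning parameters, the state encoding, and the command names are assumptions chosen for the example, not details given in these notes.

```python
import random
from collections import defaultdict


class RLScheduler:
    """Tabular sketch of a self-optimizing scheduler."""

    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.05, seed=0):
        self.alpha = alpha        # learning rate
        self.gamma = gamma        # discount factor for future rewards
        self.epsilon = epsilon    # probability of trying a random command
        self.q = defaultdict(float)   # (state, action) -> long-term reward estimate
        self.rng = random.Random(seed)

    def choose(self, state, legal_commands):
        # Occasionally explore so the policy can adapt to a changing workload.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(legal_commands)
        # Otherwise schedule the command with the highest estimated reward.
        return max(legal_commands, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, next_action):
        # Move the estimate toward reward + discounted value of the next pair.
        target = reward + self.gamma * self.q[(next_state, next_action)]
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# Hypothetical state: (pending reads, pending writes).
sched = RLScheduler(epsilon=0.0)
state = (3, 1)
sched.update(state, "read", reward=1.0, next_state=state, next_action="read")
print(sched.choose(state, ["precharge", "read"]))
```

With `epsilon=0.0` the scheduler is purely greedy, which makes the example deterministic; a nonzero value keeps some exploration alive at runtime.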

Need for:

- Continuous learning in the presence of a changing environment

- Reduced designer burden in finding a good scheduling policy. The designer specifies:
    - What system variables might be used
    - What target to optimize, but not how to optimize it
    
