# C3D: Mitigating the NUMA Bottleneck via Coherent DRAM Caches

## **Detailed Protocol Specification**

Cheng-Chieh Huang<sup>1</sup>, Rakesh Kumar<sup>1</sup>, Marco Elver<sup>1</sup>, Boris Grot<sup>1</sup> and Vijay Nagarajan<sup>1</sup>

<sup>1</sup>University of Edinburgh

August 12, 2016

#### 1 Introduction

This document provides a detailed description of the C3D cache coherence protocol [HKE<sup>+</sup>16].

#### 2 Controllers

The following summarizes the notation and assumptions used in the transition tables:

- A receive-message action is written source?Message.
- A send-message action is written destination!Message.
- Only the message types Data and PutX carry data.
- The protocol assumes no ordering guarantees from the interconnect.
- The per-line sharer list or owner entry, b.so, refers to sockets. The functions llc(socket) and dram(socket) denote the LLC and the DRAM cache controller within the given socket, respectively; self denotes the controller's own socket.
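As an illustration of the notation above, the following is a minimal Python sketch of how endpoints and messages might be modelled. The names `Endpoint`, `Message`, `send`, and `DIR` are hypothetical and not part of the protocol specification; the sketch also assumes a single directory endpoint that is not indexed by socket, which the tables do not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional

# Per the notation list: only Data and PutX carry a data payload.
DATA_CARRYING = {"Data", "PutX"}


@dataclass(frozen=True)
class Endpoint:
    kind: str               # "llc", "dram", or "dir"
    socket: Optional[int]   # None for the directory (an assumption of this sketch)


def llc(socket: int) -> Endpoint:
    """LLC controller within the given socket."""
    return Endpoint("llc", socket)


def dram(socket: int) -> Endpoint:
    """DRAM cache controller within the given socket."""
    return Endpoint("dram", socket)


DIR = Endpoint("dir", None)


@dataclass(frozen=True)
class Message:
    src: Endpoint
    dst: Endpoint
    mtype: str
    data: Optional[bytes] = None

    def __post_init__(self) -> None:
        # A payload must be present exactly when the type is Data or PutX.
        has_payload = self.data is not None
        if has_payload != (self.mtype in DATA_CARRYING):
            raise ValueError(f"payload mismatch for message type {self.mtype}")


def send(src: Endpoint, dst: Endpoint, mtype: str,
         data: Optional[bytes] = None) -> Message:
    """Models the action dst!mtype issued by src; the receiver sees src?mtype."""
    return Message(src, dst, mtype, data)
```

For example, `send(llc(0), dram(0), "GetS")` corresponds to the action dram(self)!GetS issued by the LLC of socket 0.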

#### 2.1 LLC Controller

Table 1 shows the transitions of the last-level on-chip cache (LLC) controller.

#### 2.2 DRAM Cache Controller

Table 2 shows the DRAM cache controller.

#### 2.3 Directory Controller

Table 3 shows the main memory directory controller.

Table 1: LLC Controller

| State | Read | Write | Replacement | src?Data | src?Downgrade | src?Inv | src?PutAck | src?UpgradeAck |
|-------|------|-------|-------------|----------|---------------|---------|------------|----------------|
| I | dram(self)!GetS;<br>→ IS | dram(self)!GetX;<br>→ IM | | | | src!InvAck; | src!InvAck; | |
| IS | stall; | stall; | stall; | copy data; hit;<br>→ S | | src!InvAck;<br>→ IS_I | src!InvAck;<br>→ IS_I | |
| IS_I | stall; | stall; | stall; | hit;<br>→ I | | src!InvAck; | | |
| IM | stall; | stall; | stall; | copy data; hit;<br>dir!DataAck;<br>→ M | src!DowngradeAck;<br>→ IM_S | src!InvAck; | src!InvAck; | |
| IM_S | stall; | stall; | stall; | copy data; hit;<br>dram(self)!PutX;<br>→ MS | | src!InvAck; | | |
| S | hit; | dram(self)!Upgrade;<br>→ SM | → I | | | src!InvAck; | | |
| SM | stall; | stall; | stall; | copy data; hit;<br>dir!DataAck;<br>→ M | | src!InvAck; | | hit;<br>dir!DataAck;<br>→ M |
| M | hit; | hit; | dram(self)!PutX;<br>→ MI | | dram(self)!PutX;<br>src!DowngradeAck;<br>→ MS | dir!PutX;<br>→ I | | |
| MI | stall; | stall; | stall; | | src!DowngradeAck; | → I | → I | |
| MS | stall; | stall; | stall; | | | src!InvAck;<br>→ MI | → S | |
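A transition table of this kind maps naturally onto a lookup keyed by (state, event). The sketch below encodes a subset of the Table 1 entries in Python; the names `LLC_TABLE`, `TRANSIENT`, and `step` are hypothetical. It assumes that an entry without an explicit arrow leaves the state unchanged, and it deliberately omits the Downgrade, Inv, and several other columns.

```python
# Subset of Table 1: (state, event) -> (actions, next_state).
# Assumption: entries without "-> X" in the table keep the current state.
LLC_TABLE = {
    ("I", "Read"): (["dram(self)!GetS"], "IS"),
    ("I", "Write"): (["dram(self)!GetX"], "IM"),
    ("IS", "src?Data"): (["copy data", "hit"], "S"),
    ("IM", "src?Data"): (["copy data", "hit", "dir!DataAck"], "M"),
    ("S", "Read"): (["hit"], "S"),
    ("S", "Write"): (["dram(self)!Upgrade"], "SM"),
    ("S", "Replacement"): ([], "I"),
    ("SM", "src?UpgradeAck"): (["hit", "dir!DataAck"], "M"),
    ("M", "Read"): (["hit"], "M"),
    ("M", "Write"): (["hit"], "M"),
    ("M", "Replacement"): (["dram(self)!PutX"], "MI"),
    ("MI", "src?PutAck"): ([], "I"),
}

# In Table 1, every transient state stalls Read, Write, and Replacement.
TRANSIENT = {"IS", "IS_I", "IM", "IM_S", "SM", "MI", "MS"}
PROCESSOR_EVENTS = {"Read", "Write", "Replacement"}


def step(state, event):
    """Return (actions, next_state) for one event on one cache line."""
    if state in TRANSIENT and event in PROCESSOR_EVENTS:
        return ["stall"], state
    try:
        return LLC_TABLE[(state, event)]
    except KeyError:
        raise ValueError(f"no entry encoded for ({state}, {event})") from None


# Walk a read miss followed by a write upgrade.
state = "I"
for event in ["Read", "src?Data", "Write", "src?UpgradeAck"]:
    actions, state = step(state, event)
    print(event, actions, state)
```

Keeping the table as data rather than as branching code makes it straightforward to compare the encoding against the specification cell by cell.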

Table 2: DRAM Cache Controller

| State | Replacement | src?GetS | src?GetX | src?Upgrade | src?Inv | src?Data | src?PutX | src?UpgradeAck |
|-------|-------------|----------|----------|-------------|---------|----------|----------|----------------|
| I | | dir!GetS;<br>→ IS | dir!GetX;<br>→ IM | dir!Upgrade;<br>→ SM_U | llc(self)!Inv; | | // forward<br>dir!PutX; | |
| IS | stall; | | | | llc(self)!Inv;<br>→ IS_I | copy data;<br>llc(self)!Data;<br>→ S | // forward<br>dir!PutX; | |
| IS_I | stall; | | | | src!InvAck; | llc(self)!Data;<br>→ I | | |
| IM | stall; | | | | src!InvAck; | copy data;<br>llc(self)!Data;<br>→ M | // forward<br>dir!PutX; | |
| S | → I | src!Data; | dir!GetX;<br>→ SM | dir!Upgrade;<br>→ SM_U | llc(self)!Inv;<br>→ I | | | |
| SM | stall; | | | | llc(self)!Inv;<br>→ IM | copy data;<br>llc(self)!Data;<br>→ M | | |
| SM_U | stall; | | | | llc(self)!Inv;<br>→ IM | copy data;<br>llc(self)!Data;<br>→ M | | llc(self)!UpgradeAck;<br>→ M |
| M | → I | | | | llc(self)!Inv;<br>→ I | | copy data;<br>dir!PutX;<br>→ S | |
| M | → I | | | | | | dir!PutX; | |
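The GetS column of Table 2 shows how the DRAM cache can filter requests: in state S it answers the LLC directly with src!Data, whereas in state I it forwards the request to the directory. The following Python sketch covers only that read path (I → IS → S). The class name `DramCacheLine` and the tuple encoding of outgoing messages are hypothetical, and states or events not listed are rejected rather than guessed.

```python
class DramCacheLine:
    """One DRAM cache line, restricted to the read path of Table 2."""

    def __init__(self):
        self.state = "I"
        self.data = None

    def on_gets(self, src):
        """src?GetS: miss goes to the directory, hit is served locally."""
        if self.state == "I":
            self.state = "IS"
            return [("dir", "GetS", None)]          # dir!GetS; -> IS
        if self.state == "S":
            return [(src, "Data", self.data)]       # src!Data;
        raise RuntimeError(f"GetS not covered in state {self.state}")

    def on_data(self, payload):
        """src?Data in IS: copy data; llc(self)!Data; -> S."""
        if self.state != "IS":
            raise RuntimeError(f"Data not covered in state {self.state}")
        self.data = payload
        self.state = "S"
        return [("llc(self)", "Data", payload)]


line = DramCacheLine()
print(line.on_gets("llc(self)"))   # first access misses and goes to dir
print(line.on_data(b"v"))          # fill from the directory
print(line.on_gets("llc(self)"))   # second access hits in the DRAM cache
```

The second GetS never reaches the directory, which is the case the DRAM cache is meant to make common.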

Table 3: Directory Controller

| State | Replacement | src?GetS | src?GetX | src?Upgrade | src?PutX | src?DataAck | src?DowngradeAck | src?InvAck |
|-------|-------------|----------|----------|-------------|----------|-------------|------------------|------------|
| I | | src!Data; | dst ← all − {src};<br>dram(dst)!Inv;<br>b.so ← {src};<br>tbe.need_acks ← $\lvert dst \rvert$;<br>→ IM_IA | dst ← all − {src};<br>dram(dst)!Inv;<br>b.so ← {src};<br>tbe.need_acks ← $\lvert dst \rvert$;<br>→ IM_IA | | | | |
| IM_IA | stall; | stall; | stall; | stall; | | | | tbe.need_acks ← tbe.need_acks − 1;<br>if tbe.need_acks = 0 then<br>dram(b.so)!Data;<br>→ IM_DA<br>endif |
| IM_DA | stall; | stall; | stall; | stall; | copy data;<br>llc(src)!PutAck;<br>→ MI | → M | | |
| S | → I | b.so ← b.so ∪ {src};<br>src!Data; | dst ← b.so − {src};<br>dram(dst)!Inv;<br>b.so ← {src};<br>tbe.need_acks ← $\lvert dst \rvert$;<br>→ SM_IA | dst ← b.so − {src};<br>dram(dst)!Inv;<br>b.so ← {src};<br>tbe.need_acks ← $\lvert dst \rvert$;<br>if src ∈ dst then<br>→ SM_U_IA else<br>→ SM_IA endif | | | | |
| SM_IA | stall; | stall; | stall; | stall; | | | | tbe.need_acks ← tbe.need_acks − 1;<br>if tbe.need_acks = 0 then<br>dram(b.so)!Data;<br>→ SM_DA<br>endif |
| SM_U_IA | stall; | stall; | stall; | stall; | | | | tbe.need_acks ← tbe.need_acks − 1;<br>if tbe.need_acks = 0 then<br>dram(b.so)!UpgradeAck;<br>→ SM_DA<br>endif |
| SM_DA | stall; | stall; | stall; | stall; | copy data;<br>llc(src)!PutAck;<br>→ MI | → M | | |
| M | dram(b.so)!Inv;<br>→ MI | llc(b.so)!Downgrade;<br>b.so ← b.so ∪ {src};<br>tbe.req ← src;<br>→ MS2 | dram(b.so)!Inv;<br>b.so ← {src};<br>→ MM_P | dram(b.so)!Inv;<br>b.so ← {src};<br>→ MM_P | copy data;<br>llc(src)!PutAck;<br>→ I | | | |
| MM_P | stall; | stall; | stall; | stall; | // forward<br>dram(b.so)!Data;<br>→ MM_DA | | | |
| MM_DA | stall; | stall; | stall; | stall; | copy data;<br>llc(src)!PutAck;<br>→ MI | → M | | |
| MS2 | stall; | stall; | stall; | stall; | copy data;<br>→ MS1 | | → MS1 | |
| MS1 | stall; | stall; | stall; | stall; | copy data;<br>dram(tbe.req)!Data;<br>llc(src)!PutAck;<br>→ S | | dram(tbe.req)!Data;<br>llc(src)!PutAck;<br>→ S | |
| MI | stall; | stall; | stall; | stall; | copy data;<br>→ I | → I | | → I |
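The acknowledgement counting performed by the directory on a GetX to a shared line (S → SM_IA → SM_DA → M in Table 3) can be sketched in Python as follows. The class name `DirectoryLine` and the message tuples are hypothetical. The sketch assumes that each InvAck decrements tbe.need_acks before the zero test, and it does not model the case where dst is empty, since the table lists no action for it.

```python
class DirectoryLine:
    """Directory entry for one line, restricted to GetX on a shared line."""

    def __init__(self, sharers):
        self.state = "S"
        self.so = set(sharers)   # b.so: sharer list or owner, as socket ids
        self.need_acks = 0       # tbe.need_acks

    def on_getx(self, src):
        if self.state != "S":
            raise RuntimeError(f"GetX not covered in state {self.state}")
        dst = self.so - {src}                      # dst <- b.so - {src}
        out = [(f"dram({d})", "Inv") for d in sorted(dst)]
        self.so = {src}                            # b.so <- {src}
        self.need_acks = len(dst)                  # tbe.need_acks <- |dst|
        self.state = "SM_IA"
        return out

    def on_inv_ack(self):
        if self.state != "SM_IA":
            raise RuntimeError(f"InvAck not covered in state {self.state}")
        self.need_acks -= 1                        # assumed decrement
        if self.need_acks == 0:
            (owner,) = self.so                     # exactly one new owner
            self.state = "SM_DA"
            return [(f"dram({owner})", "Data")]    # dram(b.so)!Data
        return []

    def on_data_ack(self):
        if self.state != "SM_DA":
            raise RuntimeError(f"DataAck not covered in state {self.state}")
        self.state = "M"
        return []


d = DirectoryLine({0, 1, 2})
print(d.on_getx(0))      # invalidations to the other two sockets
print(d.on_inv_ack())    # first ack: still waiting
print(d.on_inv_ack())    # second ack: data goes to the requester
d.on_data_ack()
print(d.state, d.so)
```

Data is released to the requester only after the last invalidation has been acknowledged, so no stale shared copy can coexist with the new owner.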

### References

[HKE+16] Cheng-Chieh Huang, Rakesh Kumar, Marco Elver, Boris Grot, and Vijay Nagarajan. C3D: Mitigating the NUMA Bottleneck via Coherent DRAM Caches. In *IEEE/ACM International Symposium on Microarchitecture (MICRO)*, October 2016.