# Multi Core Architecture

Lab Assignment 3: MOESI Protocol

Group members:

- Shriraam Mohan <shriraam.mohan@mailbox.tu-berlin.de>
- Tamilselvan Shanmugam <tamilselvan@mailbox.tu-berlin.de>

Due date: 05 Jan 2015, Monday

# Implementation



**Fig. 1** Processor requests

**Fig. 2** Snooping requests (MOESI state transition diagram for bus requests)
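As a textual companion to the two state diagrams, here is a minimal C++ sketch of the textbook MOESI transitions. It is not the code from this assignment: the names `State`, `BusReq`, `Outcome`, `on_cpu_read`, `on_cpu_write` and `on_snoop` are hypothetical, and details such as the Exclusive to Shared transition on a snooped read are assumptions that may differ from the diagrams.

```cpp
// Textbook MOESI line states (illustrative names, not the report's code).
enum class State { Invalid, Shared, Exclusive, Owned, Modified };

// Bus transactions named after the counters in the results (read, readX, upgrade).
enum class BusReq { None, Read, ReadX, Upgrade };

struct Outcome {
    State next;   // state of the line after the request
    BusReq bus;   // bus transaction the cache has to issue, if any
};

// Processor read: only an Invalid line needs the bus.
// others_have_copy is assumed to come from the snoop responses.
Outcome on_cpu_read(State s, bool others_have_copy) {
    if (s == State::Invalid) {
        return {others_have_copy ? State::Shared : State::Exclusive, BusReq::Read};
    }
    return {s, BusReq::None};
}

// Processor write: E and M complete without the bus,
// S and O need an upgrade, I needs a read-exclusive.
Outcome on_cpu_write(State s) {
    switch (s) {
        case State::Modified:
        case State::Exclusive: return {State::Modified, BusReq::None};
        case State::Shared:
        case State::Owned:     return {State::Modified, BusReq::Upgrade};
        case State::Invalid:   return {State::Modified, BusReq::ReadX};
    }
    return {s, BusReq::None};  // not reached, keeps compilers quiet
}

// Request observed on the bus from another cache.
State on_snoop(State s, BusReq req) {
    if (req == BusReq::Read) {
        // A dirty line stays dirty but becomes shared: M and O end up in O.
        if (s == State::Modified || s == State::Owned) return State::Owned;
        if (s == State::Exclusive) return State::Shared;
        return s;
    }
    if (req == BusReq::ReadX || req == BusReq::Upgrade) return State::Invalid;
    return s;
}
```

In this sketch a write to a line in the Exclusive or Modified state returns `BusReq::None`, which is the case where MOESI avoids a bus request.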

We have defined four new bus ports as follows,

| Port type      | Port name       |
|----------------|-----------------|
| `sc_in_rv<32>` | `Port_CtoCData` |
| `sc_in_rv<1>`  | `Port_doIHave`  |
| `sc_in_rv<3>`  | `Port_Provider` |

`Port_CtoCData`: carries the data in a cache-to-cache transfer.

`Port_doIHave`: tells the other caches that this cache holds a copy of the line.

`Port_Provider`: tells the other caches which cache provides the latest data in a cache-to-cache transfer.
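To illustrate how these signals could work together on a read miss, the following plain C++ sketch models them without SystemC. Everything in it is an assumption for illustration: `SnoopResult`, `snoop_read` and `requester_state_after_read` are hypothetical names, `-1` stands in for an undriven `Port_Provider`, and the rule that only a cache in the Modified or Owned state supplies data may differ from the actual implementation.

```cpp
#include <vector>

enum class State { Invalid, Shared, Exclusive, Owned, Modified };

// Plain-C++ stand-in for what a requester observes on the bus.
struct SnoopResult {
    bool do_i_have = false;  // models Port_doIHave: some other cache holds a copy
    int provider = -1;       // models Port_Provider: id of the supplying cache, -1 if none
};

// states[i] is the state of the requested line in cache i.
SnoopResult snoop_read(const std::vector<State>& states, int requester) {
    SnoopResult result;
    for (int id = 0; id < static_cast<int>(states.size()); ++id) {
        if (id == requester || states[id] == State::Invalid) continue;
        result.do_i_have = true;
        // Assumption: a cache with a dirty copy (M or O) supplies the data,
        // which would then travel over Port_CtoCData instead of main memory.
        if (states[id] == State::Modified || states[id] == State::Owned) {
            result.provider = id;
        }
    }
    return result;
}

// The requester installs the line as Shared if anyone else has it, else Exclusive.
State requester_state_after_read(const SnoopResult& result) {
    return result.do_i_have ? State::Shared : State::Exclusive;
}
```

A 3-bit provider id is enough to name one of up to 8 caches, which matches the largest configuration in the results.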

# Results

## Debug tracefile, 8 Processors

| CPU    | Reads | RHit | RMiss | Writes | WHit | WMiss | Hitrate  |
|--------|-------|------|-------|--------|------|-------|----------|
| 0      | 6     | 0    | 6     | 4      | 0    | 4     | 0        |
| 1      | 34    | 0    | 34    | 22     | 0    | 22    | 0        |
| 2      | 35    | 0    | 35    | 43     | 0    | 43    | 0        |
| 3      | 39    | 2    | 37    | 46     | 2    | 44    | 4.705882 |
| 4      | 36    | 0    | 36    | 55     | 0    | 55    | 0        |
| 5      | 52    | 0    | 52    | 47     | 0    | 47    | 0        |
| 6      | 48    | 3    | 45    | 51     | 2    | 49    | 5.050505 |
| 7      | 42    | 1    | 41    | 55     | 5    | 50    | 6.185567 |
| Total: | 292   | 6    | 286   | 323    | 9    | 314   | 15       |
| Avg:   | 36    | 0    | 35    | 40     | 1    | 39    | 1        |

#### 2. Main memory access rates

Bus had 286 reads and 1 upgrades and 314 readX.

A total of 601 accesses.

#### 3. Average time for bus acquisition

There were 50 waits for the bus.

Average waiting time per access: 0.083195 cycles.

- 4. There were 1 Cache to Cache transfers
- 5. Total execution time is 10204 ns, Avg per-mem-access time is 16.978369 ns
- 6. Probe Read: 5, Probe ReadX: 7
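The derived figures reported for each run (total accesses, average waiting time, average per-memory-access time and the per-CPU hit rate) can be reproduced from the raw counters. The helper names below (`BusStats`, `total_accesses`, `avg_wait_per_access`, `avg_time_per_access_ns`, `hit_rate_percent`) are hypothetical and only restate the arithmetic; they are not taken from the simulator.

```cpp
#include <cmath>

// Raw counters as printed for one simulation run.
struct BusStats {
    long reads;         // bus reads
    long upgrades;      // bus upgrades
    long read_x;        // bus read-exclusive requests
    long waits;         // number of waits for the bus
    long exec_time_ns;  // total execution time in ns
};

// "A total of N accesses": the sum of the three bus request types.
long total_accesses(const BusStats& s) {
    return s.reads + s.upgrades + s.read_x;
}

// "Average waiting time per access": waits divided by total accesses.
double avg_wait_per_access(const BusStats& s) {
    const long total = total_accesses(s);
    return total == 0 ? 0.0 : static_cast<double>(s.waits) / total;
}

// "Avg per-mem-access time": execution time divided by total accesses.
double avg_time_per_access_ns(const BusStats& s) {
    const long total = total_accesses(s);
    return total == 0 ? 0.0 : static_cast<double>(s.exec_time_ns) / total;
}

// Per-CPU hit rate in percent: all hits over all reads and writes.
double hit_rate_percent(long rhit, long whit, long reads, long writes) {
    const long accesses = reads + writes;
    return accesses == 0 ? 0.0 : 100.0 * static_cast<double>(rhit + whit) / accesses;
}
```

For the 8-processor debug run, `BusStats{286, 1, 314, 50, 10204}` gives 601 accesses, and CPU 3 (2 read hits, 2 write hits, 39 reads, 46 writes) gives a hit rate of about 4.7 %, in line with the table.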

## Debug tracefile, 4 Processors

| CPU    | Reads | RHit | RMiss | Writes | WHit | WMiss | Hitrate  |
|--------|-------|------|-------|--------|------|-------|----------|
| 0      | 8     | 0    | 8     | 8      | 1    | 7     | 6.25     |
| 1      | 27    | 0    | 27    | 32     | 0    | 32    | 0        |
| 2      | 43    | 1    | 42    | 38     | 2    | 36    | 3.703704 |
| 3      | 45    | 0    | 45    | 42     | 0    | 42    | 0        |
| Total: | 123   | 1    | 122   | 120    | 3    | 117   | 9        |
| Avg:   | 30    | 0    | 30    | 30     | 0    | 29    | 2        |

#### 2. Main memory access rates

Bus had 122 reads and 0 upgrades and 117 readX.

A total of 239 accesses.

#### 3. Average time for bus acquisition

There were 2 waits for the bus.

Average waiting time per access: 0.008368 cycles.

- 4. There were 0 Cache to Cache transfers
- 5. Total execution time is 8976 ns, Avg per-mem-access time is 37.556485 ns
- 6. Probe Read: 0, Probe ReadX: 2

## Debug tracefile, 2 Processors

| CPU    | Reads | RHit | RMiss | Writes | WHit | WMiss | Hitrate  |
|--------|-------|------|-------|--------|------|-------|----------|
| 0      | 30    | 0    | 30    | 25     | 1    | 24    | 1.818182 |
| 1      | 32    | 0    | 32    | 35     | 1    | 34    | 1.492537 |
| Total: | 62    | 0    | 62    | 60     | 2    | 58    | 2        |
| Avg:   | 31    | 0    | 31    | 30     | 1    | 29    | 1        |

#### 2. Main memory access rates

Bus had 62 reads and 0 upgrades and 58 readX.

A total of 120 accesses.

#### 3. Average time for bus acquisition

There were 0 waits for the bus.

Average waiting time per access: 0.000000 cycles.

- 4. There were 0 Cache to Cache transfers
- 5. Total execution time is 6833 ns, Avg per-mem-access time is 56.941667 ns
- 6. Probe Read: 0, Probe ReadX: 0

## Debug tracefile, 1 Processor

| CPU | Reads | RHit | RMiss | Writes | WHit | WMiss | Hitrate |
|-----|-------|------|-------|--------|------|-------|---------|
| 0   | 40    | 19   | 21    | 60     | 26   | 34    | 45      |

#### 2. Main memory access rates

Bus had 21 reads and 0 upgrades and 34 readX.

A total of 55 accesses.

#### 3. Average time for bus acquisition

There were 0 waits for the bus.

Average waiting time per access: 0.000000 cycles.

- 4. There were 0 Cache to Cache transfers
- 5. Total execution time is 5755 ns, Avg per-mem-access time is 104.636364 ns
- 6. Probe Read: 0, Probe ReadX: 0

## Random tracefile, 8 Processors

| CPU    | Reads  | RHit | RMiss  | Writes | WHit | WMiss  | Hitrate  |
|--------|--------|------|--------|--------|------|--------|----------|
| 0      | 4113   | 22   | 4091   | 4030   | 35   | 3995   | 0.699988 |
| 1      | 18318  | 0    | 18318  | 18440  | 0    | 18440  | 0        |
| 2      | 25834  | 21   | 25813  | 25470  | 31   | 25439  | 0.101357 |
| 3      | 29549  | 1275 | 28274  | 28801  | 1180 | 27621  | 4.207369 |
| 4      | 31366  | 2    | 31364  | 30548  | 1    | 30547  | 0.004845 |
| 5      | 32015  | 12   | 32003  | 31782  | 26   | 31756  | 0.059564 |
| 6      | 32327  | 1620 | 30707  | 32285  | 1646 | 30639  | 5.054789 |
| 7      | 32536  | 1681 | 30855  | 32584  | 1765 | 30819  | 5.291769 |
| Total: | 206058 | 4633 | 201425 | 203940 | 4684 | 199256 | 14       |
| Avg:   | 25757  | 579  | 25178  | 25492  | 585  | 24907  | 1        |

#### 2. Main memory access rates

Bus had 201425 reads and 54 upgrades and 199256 readX.

A total of 400735 accesses.

#### 3. Average time for bus acquisition

There were 3180 waits for the bus.

Average waiting time per access: 0.007935 cycles.

- 4. There were 3692 Cache to Cache transfers
- 5. Total execution time is 12805573 ns, Avg per-mem-access time is 31.955215 ns
- 6. Probe Read: 2145, Probe ReadX: 2254

## Random tracefile, 4 Processors

| CPU    | Reads | RHit | RMiss | Writes | WHit | WMiss | Hitrate  |
|--------|-------|------|-------|--------|------|-------|----------|
| 0      | 8039  | 85   | 7954  | 8251   | 89   | 8162  | 1.06814  |
| 1      | 20450 | 0    | 20450 | 20221  | 0    | 20221 | 0        |
| 2      | 26553 | 982  | 25571 | 26498  | 1056 | 25442 | 3.841586 |
| 3      | 29600 | 0    | 29600 | 29695  | 3    | 29692 | 0.005059 |
| Total: | 84642 | 1067 | 83575 | 84665  | 1148 | 83517 | 4        |
| Avg:   | 21160 | 266  | 20893 | 21166  | 287  | 20879 | 1        |


#### 2. Main memory access rates

Bus had 83575 reads and 1 upgrades and 83517 readX.

A total of 167093 accesses.

#### 3. Average time for bus acquisition

There were 757 waits for the bus.

Average waiting time per access: 0.004530 cycles.

- 4. There were 91 Cache to Cache transfers
- 5. Total execution time is 11932782 ns, Avg per-mem-access time is 71.414015 ns
- 6. Probe Read: 93, Probe ReadX: 85

## Random tracefile, 2 Processors

| CPU    | Reads | RHit | RMiss | Writes | WHit | WMiss | Hitrate  |
|--------|-------|------|-------|--------|------|-------|----------|
| 0      | 16496 | 385  | 16111 | 16402  | 393  | 16009 | 2.364885 |
| 1      | 24556 | 865  | 23691 | 24804  | 877  | 23927 | 3.529173 |
| Total: | 41052 | 1250 | 39802 | 41206  | 1270 | 39936 | 5        |
| Avg:   | 20526 | 625  | 19901 | 20603  | 635  | 19968 | 2        |

#### 2. Main memory access rates

Bus had 39802 reads and 0 upgrades and 39936 readX.

A total of 79738 accesses.

#### 3. Average time for bus acquisition

There were 137 waits for the bus.

Average waiting time per access: 0.001718 cycles.

- 4. There were 0 Cache to Cache transfers
- 5. Total execution time is 9583851 ns, Avg per-mem-access time is 120.191766 ns
- 6. Probe Read: 0, Probe ReadX: 0

## Random tracefile, 1 Processor

| CPU | Reads | RHit  | RMiss | Writes | WHit  | WMiss | Hitrate   |
|-----|-------|-------|-------|--------|-------|-------|-----------|
| 0   | 33031 | 18659 | 14372 | 32505  | 18306 | 14199 | 56.404114 |

#### 2. Main memory access rates

Bus had 14372 reads and 0 upgrades and 14199 readX.

A total of 28571 accesses.

#### 3. Average time for bus acquisition

There were 0 waits for the bus.

Average waiting time per access: 0.000000 cycles.

- 4. There were 0 Cache to Cache transfers
- 5. Total execution time is 5771443 ns, Avg per-mem-access time is 202.003535 ns
- 6. Probe Read: 0, Probe ReadX: 0

## FFT tracefile, 8 Processors

| CPU    | Reads | RHit | RMiss | Writes | WHit | WMiss | Hitrate   |
|--------|-------|------|-------|--------|------|-------|-----------|
| 0      | 4956  | 1117 | 3839  | 3154   | 740  | 2414  | 22.897657 |
| 1      | 3983  | 819  | 3164  | 2890   | 545  | 2345  | 19.845773 |
| 2      | 3997  | 882  | 3115  | 2703   | 498  | 2205  | 20.597015 |
| 3      | 4043  | 862  | 3181  | 2695   | 488  | 2207  | 20.035619 |
| 4      | 4055  | 854  | 3201  | 2662   | 480  | 2182  | 19.860057 |
| 5      | 4055  | 856  | 3199  | 2733   | 508  | 2225  | 20.094284 |
| 6      | 4074  | 829  | 3245  | 2710   | 485  | 2225  | 19.369104 |
| 7      | 4124  | 834  | 3290  | 2705   | 465  | 2240  | 19.021819 |
| Total: | 33287 | 7053 | 26234 | 22252  | 4209 | 18043 | 158       |
| Avg:   | 4160  | 881  | 3279  | 2781   | 526  | 2255  | 19        |

#### 2. Main memory access rates

Bus had 26234 reads and 57 upgrades and 18043 readX.

A total of 44334 accesses.

#### 3. Average time for bus acquisition

There were 860 waits for the bus.

Average waiting time per access: 0.019398 cycles.

- 4. There were 2640 Cache to Cache transfers
- 5. Total execution time is 1111136 ns, Avg per-mem-access time is 25.062841 ns
- 6. Probe Read: 4475, Probe ReadX: 630

## FFT tracefile, 4 Processors

| CPU    | Reads | RHit | RMiss | Writes | WHit | WMiss | Hitrate   |
|--------|-------|------|-------|--------|------|-------|-----------|
| 0      | 11274 | 1743 | 9531  | 6553   | 942  | 5611  | 15.061424 |
| 1      | 9417  | 1457 | 7960  | 6083   | 762  | 5321  | 14.316129 |
| 2      | 9834  | 1533 | 8301  | 5523   | 608  | 4915  | 13.941525 |
| 3      | 9881  | 1498 | 8383  | 5972   | 712  | 5260  | 13.940579 |
| Total: | 40406 | 6231 | 34175 | 24131  | 3024 | 21107 | 55        |
| Avg:   | 10101 | 1557 | 8543  | 6032   | 756  | 5276  | 13        |

#### 2. Main memory access rates

Bus had 34175 reads and 10 upgrades and 21107 readX.

A total of 55292 accesses.

#### 3. Average time for bus acquisition

There were 297 waits for the bus.

Average waiting time per access: 0.005371 cycles.

- 4. There were 1599 Cache to Cache transfers
- 5. Total execution time is 2918558 ns, Avg per-mem-access time is 52.784453 ns
- 6. Probe Read: 2139, Probe ReadX: 122

## FFT tracefile, 2 Processors

| CPU    | Reads | RHit | RMiss | Writes | WHit | WMiss | Hitrate   |
|--------|-------|------|-------|--------|------|-------|-----------|
| 0      | 29313 | 3403 | 25910 | 15417  | 2013 | 13404 | 12.108205 |
| 1      | 29262 | 3151 | 26111 | 14801  | 1898 | 12903 | 11.458593 |
| Total: | 58575 | 6554 | 52021 | 30218  | 3911 | 26307 | 23        |
| Avg:   | 29287 | 3277 | 26010 | 15109  | 1955 | 13153 | 11        |

#### 2. Main memory access rates

Bus had 52021 reads and 0 upgrades and 26307 readX.

A total of 78328 accesses.

#### 3. Average time for bus acquisition

There were 93 waits for the bus.

Average waiting time per access: 0.001187 cycles.

- 4. There were 798 Cache to Cache transfers
- 5. Total execution time is 7845423 ns, Avg per-mem-access time is 100.161156 ns
- 6. Probe Read: 822, Probe ReadX: 1

## FFT tracefile, 1 Processor

| CPU | Reads | RHit | RMiss | Writes | WHit  | WMiss | Hitrate   |
|-----|-------|------|-------|--------|-------|-------|-----------|
| 0   | 86298 | 7652 | 78646 | 43195  | 10882 | 32313 | 14.312743 |

#### 2. Main memory access rates

Bus had 78646 reads and 0 upgrades and 32313 readX.

A total of 110959 accesses.

#### 3. Average time for bus acquisition

There were 0 waits for the bus.

Average waiting time per access: 0.000000 cycles.

- 4. There were 0 Cache to Cache transfers
- 5. Total execution time is 22459345 ns, Avg per-mem-access time is 202.411206 ns
- 6. Probe Read: 0, Probe ReadX: 0

# Performance comparison

## Hit rate

Hit rates are the same as with the Valid-Invalid (VI) protocol. MOESI does not introduce any prefetching mechanism, so the hit rate has no opportunity to change.

## Number of bus requests

Compared to the VI protocol, the MOESI protocol does not need to place a bus request for accesses to lines in the EXCLUSIVE or MODIFIED state, which saves a lot of bus requests. As a result, the number of bus requests is lower than with the VI protocol in all cases, for example:

| # Bus req  | VI    | MOESI |
|------------|-------|-------|
| FFT 8 proc | 48486 | 44334 |
| DBG 4 proc | 242   | 239   |
| RND 1 proc | 46877 | 28571 |
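To put the savings in the table above into relative terms, the reduction can be computed as a percentage of the VI request count. The function name `reduction_percent` is hypothetical; the sketch only restates the arithmetic.

```cpp
// Percentage of bus requests saved by MOESI relative to VI.
// Returns 0 when there is no VI baseline to compare against.
double reduction_percent(long vi_requests, long moesi_requests) {
    if (vi_requests <= 0) return 0.0;
    return 100.0 * static_cast<double>(vi_requests - moesi_requests)
                 / static_cast<double>(vi_requests);
}
```

With the values from the table this gives roughly 8.6 % for FFT on 8 processors, 1.2 % for the debug trace on 4 processors and 39 % for the random trace on 1 processor, so the benefit depends strongly on the workload.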

## Average waiting time for bus acquisition

Because MOESI issues fewer bus requests, the waiting time to acquire the bus is also lower. The measurements reflect this:

| Avg bus waiting time (cycles) | VI       | MOESI    |
|-------------------------------|----------|----------|
| FFT 8 proc                    | 0.022728 | 0.019398 |
| DBG 4 proc                    | 0.016529 | 0.008368 |
| RND 1 proc                    | 0        | 0        |

## Total execution time

|            | VI exe time (ns) | VI avg per-mem-access time (ns) | MOESI exe time (ns) | MOESI avg per-mem-access time (ns) |
|------------|------------------|---------------------------------|---------------------|------------------------------------|
| FFT 8 proc | 975866           | 20.126758                       | 1111136             | 25.06                              |
| RND 8 proc | 9748117          | 24.04                           | 12805573            | 31.95                              |
| DBG 8 proc | 14914            | 24.48                           | 10204               | 16.978                             |

Ideally, MOESI should perform at least as well as the VI protocol in terms of execution time. MOESI performs much better on the debug trace file. On the other hand, we also see results where MOESI has a longer execution time and a longer average per-memory-access time (the FFT and random traces with 8 processors). This is because of the additional wait() statements in our cache implementation, which are intended to let other processes know that a port change event has happened. The cache-to-cache transfer technique avoids memory latency and thereby improves overall performance.

# Sample output

----- Num CPU: 1 -----

@0 s: P0: Read from 40

@0 s: cache lookup. Tag: 0 Set no : 1

@0 s: Cache 0 snoops own read req

@1 ns:C0 Read miss for 40

@101 ns: C0 selected\_way: 0

@101 ns: C0 have read Addr: 40 in STATE:3

update\_lru\_state : 0 updated state : 11

@102 ns: Cache 0 writing Port\_Done

@103 ns: P0: Read from 108

@103 ns: cache lookup. Tag: 0 Set no : 3

@103 ns: Cache 0 snoops own read req

@104 ns:C0 Read miss for 108

@204 ns: C0 selected\_way: 0

@204 ns: C0 have read Addr: 108 in STATE:3

update\_lru\_state : 0 updated state : 11

@205 ns: Cache 0 writing Port\_Done