|                       | Benchmark | MATRIXMUL (naive) |        |        |        |        |        |
|-----------------------|-----------|-------------------|--------|--------|--------|--------|--------|
|                       | Data type | 32                | 32     | 32     | 32     | 32     | 32     |
|                       | m         | 32                | 32     | 128    | 128    | 128    | 512    |
|                       | n         | 32                | 64     | 32     | 128    | 512    | 32     |
|                       | p         | 32                | 128    | 32     | 128    | 128    | 512    |
| Exec. time (s)        | Native    | < 0.1             | < 0.1  | < 0.1  | < 0.1  | < 0.1  | < 0.1  |
|                       | Ours      | 0.9               | 3.4    | 2      | 20.8   | 87.2   | 85.5   |
|                       | ACCELSIM  | 2.1               | 4.5    | 2.8    | 16.3   | 67.2   | 57.6   |
| Cycles                | Native    | 15.2K             | 22.8K  | 14.6K  | 40K    | 138.4K | 54.3K  |
|                       | Ours      | 11.4K             | 17.1K  | 11.7K  | 27.9K  | 102.2K | 50.2K  |
|                       | ACCELSIM  | 11.1K             | 16.1K  | 11.2K  | 26.1K  | 92.4K  | 42.2K  |
| Instruction<br>count  | Native    | 315.4K            | 2.1M   | 1.3M   | 14.9M  | 54.2M  | 80.7M  |
|                       | Ours      | 328.7K            | 2.3M   | 1.3M   | 17.4M  | 66.2M  | 84.1M  |
|                       | ACCELSIM  | 328.7K            | 2.3M   | 1.3M   | 17.4M  | 66.2M  | 84.1M  |
| L1 accesses           | Native    | 8.2K              | 65.5K  | 32.8K  | 524.3K | 2.1M   | 2.1M   |
|                       | Ours      | 6.1K              | 49.2K  | 24.6K  | 393.2K | 1.6M   | 1.6M   |
|                       | ACCELSIM  | 5.1K              | 41K    | 20.5K  | 327.7K | 1.3M   | 1.3M   |
| L1 hit rate (%)       | Native    | 95%               | 95%    | 95%    | 95%    | 86.58% | 95.21% |
|                       | Ours      | 95.83%            | 95.83% | 95.83% | 95.82% | 87.78% | 96.12% |
|                       | ACCELSIM  | 70.66%            | 71.92% | 71.44% | 70.37% | 52.07% | 78.07% |
| L2 read hits          | Native    | 256               | 2K     | 1K     | 16.4K  | 164.2K | 59.8K  |
|                       | Ours      | 256               | 2K     | 1K     | 16.4K  | 192.2K | 61.1K  |
|                       | ACCELSIM  | 0                 | 768    | 384    | 14.9K  | 144.4K | 54.8K  |
| L2 read hit rate (%)  | Native    | 100%              | 100%   | 100%   | 100%   | 95.82% | 96.1%  |
|                       | Ours      | 100%              | 100%   | 100%   | 100%   | 100%   | 100%   |
|                       | ACCELSIM  | 0%                | 37.5%  | 37.5%  | 78.41% | 89.81% | 93.04% |
| L2 write hits         | Native    | 128               | 512    | 512    | 2K     | 2K     | 32.8K  |
|                       | Ours      | 128               | 512    | 512    | 2K     | 2K     | 32.8K  |
|                       | ACCELSIM  | 0                 | 0      | 0      | 0      | 0      | 0      |
| L2 write hit rate (%) | Native    | 100%              | 100%   | 100%   | 100%   | 100%   | 100%   |
|                       | Ours      | 100%              | 100%   | 100%   | 100%   | 100%   | 100%   |
|                       | ACCELSIM  | 0%                | 0%     | 0%     | 0%     | 0%     | 0%     |
| DRAM<br>reads         | Native    | 0                 | 0      | 0      | 0      | 0      | 0      |
|                       | Ours      | 0                 | 0      | 0      | 0      | 0      | 0      |
|                       | ACCELSIM  | 384               | 1.8K   | 1.2K   | 6.1K   | 18.4K  | 36.9K  |
| DRAM<br>writes        | Native    | 0                 | 0      | 0      | 12     | 17     | 17     |
|                       | Ours      | 0                 | 0      | 0      | 0      | 0      | 0      |
|                       | ACCELSIM  | 128               | 512    | 512    | 2K     | 2K     | 32.8K  |
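
One way to read the "Cycles" rows is as signed relative error against the native run. The sketch below is illustrative and not part of the original evaluation: the three lists are copied from the table, and the helper names `relative_errors` and `mean_abs` are hypothetical. Because the table rounds cycle counts to one decimal in thousands, the resulting percentages are approximate.

```python
# Cycle counts copied from the "Cycles" rows of the table (K = thousand).
# Column order follows the table: six (m, n, p) configurations, left to right.
native   = [15.2e3, 22.8e3, 14.6e3, 40.0e3, 138.4e3, 54.3e3]
ours     = [11.4e3, 17.1e3, 11.7e3, 27.9e3, 102.2e3, 50.2e3]
accelsim = [11.1e3, 16.1e3, 11.2e3, 26.1e3, 92.4e3, 42.2e3]


def relative_errors(simulated, reference):
    """Signed relative error (%) of each simulated value against the reference."""
    return [100.0 * (s - r) / r for s, r in zip(simulated, reference)]


def mean_abs(values):
    """Mean of the absolute values, used here as a single summary number."""
    return sum(abs(v) for v in values) / len(values)


ours_err = relative_errors(ours, native)
accelsim_err = relative_errors(accelsim, native)

for label, errs in (("Ours", ours_err), ("ACCELSIM", accelsim_err)):
    formatted = ", ".join(f"{e:+.1f}%" for e in errs)
    print(f"{label:9s} {formatted}  (mean |error| = {mean_abs(errs):.1f}%)")
```

A negative value means the simulator reports fewer cycles than the native run, which is the case for both simulators in every column of the table. The same two helpers can be applied to any other metric row, such as L1 accesses or instruction count, by swapping in the corresponding values.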