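# Academic Year 113

# National Sun Yat-sen University

Course: Arithmetic Processors and Implementation

Topic:

Computation Unit for Dot Product of 4D Vectors

Assignment / Final Report / Project

Instructor: 蕭勝夫

Student ID / Class / Name:

B103040021 / 114 資工 (Computer Science and Engineering, class of 114) / 謝鎧駿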

### I. Data Comparison Table

| Design  | Optimization | CL area (um<sup>2</sup>) | SL area (um<sup>2</sup>) | Total area (um<sup>2</sup>) | Cycle time (ns) | Latency (ns) | Dynamic power | Leakage power | Total power |
|---------|--------------|-------------|------------|-------------|------|------|-----------|-----------|-----------|
| DP      | area         | 5496.232457 | 0          | 5496.232457 | 4.67 | 4.67 | 1.6915 mW | 3.1513 uW | 1.6947 mW |
| DP      | mid          | 5647.553425 | 0          | 5647.553425 | 2.99 | 2.99 | 2.4382 mW | 3.4185 uW | 2.4416 mW |
| DP      | delay        | 7415.919524 | 0          | 7415.919524 | 1.31 | 1.31 | 7.3633 mW | 5.9745 uW | 7.3693 mW |
| DP_pipe | area         | 4827.911158 | 736.024346 | 5563.935504 | 1.89 | 7.56 | 2.0804 mW | 3.1140 uW | 2.0835 mW |
| DP_pipe | mid          | 4941.596282 | 736.490906 | 5678.087188 | 1.27 | 5.08 | 3.0458 mW | 3.8389 uW | 3.0496 mW |
| DP_pipe | delay        | 6091.563015 | 736.024346 | 6827.587361 | 0.65 | 2.6  | 6.6515 mW | 4.7993 uW | 6.6563 mW |
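For DP_pipe, the second delay column equals four times the first, which is consistent with a four-stage pipeline. The sketch below reproduces that relation and estimates the throughput gain of DP_pipe over DP, using only values copied from the table. The names `cycle_ns`, `STAGES`, `latency_ns` and `throughput_gain` are illustrative and not part of the design.

```python
# First delay column (ns) per design and optimization target, copied from the table.
cycle_ns = {
    ("DP", "area"): 4.67, ("DP", "mid"): 2.99, ("DP", "delay"): 1.31,
    ("DP_pipe", "area"): 1.89, ("DP_pipe", "mid"): 1.27, ("DP_pipe", "delay"): 0.65,
}
STAGES = {"DP": 1, "DP_pipe": 4}  # DP_pipe has four pipeline stages


def latency_ns(design, target):
    """Latency = number of stages x delay of one cycle."""
    return round(STAGES[design] * cycle_ns[(design, target)], 2)


def throughput_gain(target):
    """How many times more results per ns DP_pipe can deliver than DP."""
    return cycle_ns[("DP", target)] / cycle_ns[("DP_pipe", target)]


for target in ("area", "mid", "delay"):
    print(target, latency_ns("DP_pipe", target), round(throughput_gain(target), 2))
```

The gain stays well below 4 because the stages are not perfectly balanced and the pipeline registers add overhead.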

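## II. Architecture Diagram

The architecture is the same as the one specified in the assignment:

- Stage 1: unpack, multiplication, alignment
- Stage 2: multi-operand addition, conversion to sign-magnitude
- Stage 3: normalization
- Stage 4: rounding, pack

Clock gating is applied in stage 1: during 16-bit computation, the HH, LH, and HL registers are not updated.

The critical path is in stage 1.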
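The report does not say what the HH, LH and HL registers hold. One common reading, assumed here, is that each significand multiplication is split into four partial products (HH, HL, LH, LL) of the high and low operand halves, and that a 16-bit operand fits entirely in the low half. The toy model below shows why the three upper registers could then keep their old values in 16-bit mode. The 12-bit split, the class name and the method names are all hypothetical.

```python
HALF = 12  # assumed split: a 24-bit significand as two 12-bit halves


def partial_products(a, b, half=HALF):
    """Split both operands into high/low halves and form the four products."""
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask
    return {"HH": a_hi * b_hi, "HL": a_hi * b_lo,
            "LH": a_lo * b_hi, "LL": a_lo * b_lo}


def combine(pp, half=HALF):
    """Recombine the four partial products into the full product."""
    return (pp["HH"] << (2 * half)) + ((pp["HL"] + pp["LH"]) << half) + pp["LL"]


class Stage1Registers:
    """Toy model of stage-1 registers whose clock can be gated (hypothetical)."""

    GATED_IN_FP16 = ("HH", "HL", "LH")

    def __init__(self):
        self.value = {"HH": 0, "HL": 0, "LH": 0, "LL": 0}

    def clock(self, pp, fp16_mode):
        """One clock edge: gated registers hold, the others load new values."""
        for name, new_value in pp.items():
            if fp16_mode and name in self.GATED_IN_FP16:
                continue  # clock gated: register keeps its previous value
            self.value[name] = new_value
```

With an 11-bit half-precision significand, the high halves are zero and `LL` alone already equals the full product, so holding the other three registers does not change the result.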

#### III. Pipeline delay

Critical path: stage 1 (for all three optimization targets).

Per-stage delay (ns):

| Stage   | 1. delay opt. | 2. mid | 3. area opt. |
|---------|---------------|--------|--------------|
| stage 1 | 0.64          | 1.25   | 1.85         |
| stage 2 | 0.63          | 1.23   | 1.38         |
| stage 3 | 0.63          | 1.08   | 1.08         |
| stage 4 | 0.59          | 0.59   | 0.59         |
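The slowest stage bounds the clock period. The worst stage delays here (0.64, 1.25, 1.85) sit just under the DP_pipe delays in the comparison table (0.65, 1.27, 1.89). The sketch below finds the critical stage and a simple balance figure for each optimization target. It assumes the stage delays are in ns, and the helper names are illustrative.

```python
# Per-stage delays copied from this section (assumed to be in ns).
stage_delay_ns = {
    "delay opt.": [0.64, 0.63, 0.63, 0.59],
    "mid":        [1.25, 1.23, 1.08, 0.59],
    "area opt.":  [1.85, 1.38, 1.08, 0.59],
}


def critical_stage(delays):
    """Return (1-based stage index, delay) of the slowest stage."""
    worst = max(delays)
    return delays.index(worst) + 1, worst


def balance(delays):
    """Mean stage delay divided by the worst one; 1.0 means perfectly balanced."""
    return sum(delays) / (len(delays) * max(delays))


for target, delays in stage_delay_ns.items():
    stage, worst = critical_stage(delays)
    print(f"{target}: critical stage {stage} ({worst} ns), balance {balance(delays):.2f}")
```

A balance close to 1.0, as in the delay-optimized run, means little time is wasted in the faster stages; the area-optimized run leaves stage 4 idle for most of each cycle.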

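#### IV. Hardware Verification Method

I generate the reference answers with Python. The flow is as follows:

- 1. **Convert hexadecimal to floating point**: use Python's `struct` module to convert each 32-bit hexadecimal number in IEEE 754 format into a floating-point value.
- 2. **Read the input file**: open `FP32.txt` and read it.
- 3. **Compute**: for each line, multiply the first 4 values (x1 to x4) element-wise with the last 4 values (y1 to y4), then sum the products.
- 4. **Convert the result to binary format**: convert the floating-point result to the 32-bit IEEE 754 binary format and keep it as a string.
- 5. **Write the output file**: write each result in binary format to `result.txt`.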
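The flow above can be sketched as follows. This is a minimal reconstruction, not the original script: it assumes each line of `FP32.txt` holds eight whitespace-separated 8-digit hex words (x1 to x4, then y1 to y4), and the function names are illustrative.

```python
import struct


def hex_to_float(hex_str):
    """Interpret 8 hex digits as a big-endian IEEE 754 binary32 value."""
    return struct.unpack(">f", bytes.fromhex(hex_str))[0]


def float_to_bits(value):
    """Round a Python float to binary32 and return its 32-bit pattern as a string."""
    (as_int,) = struct.unpack(">I", struct.pack(">f", value))
    return format(as_int, "032b")


def dot4_bits(line):
    """Dot product of one input line; layout x1..x4 y1..y4 is an assumption."""
    values = [hex_to_float(tok.replace("_", "")) for tok in line.split()]
    xs, ys = values[:4], values[4:8]
    total = sum(x * y for x, y in zip(xs, ys))  # accumulated in double precision
    return float_to_bits(total)


def convert_file(src="FP32.txt", dst="result.txt"):
    """Write one 32-bit binary string per non-empty input line."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.strip():
                fout.write(dot4_bits(line) + "\n")
```

Because Python floats are double precision, this reference accumulates without intermediate rounding and rounds once when packing to binary32. A hardware datapath that rounds or truncates at other points can differ from it in the last bits.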

#### V. Error

In 32-bit mode, all 1000 test cases match with no error:

All test data correct !

In 16-bit mode, however, 100 of the 1000 test cases mismatch, and every mismatch differs by exactly 2, as shown below.

| Output incorrect at | The answer is    | Your module output | Difference |
|---------------------|------------------|--------------------|------------|
| # 995               | 1100110100101111 | 1100110100101101   | 2          |
| # 998               | 1100000001011100 | 1100000001011010   | 2          |

avg error: 100

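`avg error` is the total number of mismatching test cases.

Only two of the mismatching cases are shown here. I believe the cause is that the Python package and my design round differently, which is why the two results end up differing by 2.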
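The difference of 2 can be checked by treating the two 16-bit patterns as unsigned integers. The helper below is a sketch of such a comparison, not the testbench's own code; `bits_diff` and `summarize` are illustrative names.

```python
def bits_diff(expected, actual):
    """Absolute difference between two equal-width bit strings, as integers."""
    if len(expected) != len(actual):
        raise ValueError("bit strings must have the same width")
    return abs(int(expected, 2) - int(actual, 2))


def summarize(pairs):
    """Return (number of mismatches, histogram of differences)."""
    histogram = {}
    for expected, actual in pairs:
        d = bits_diff(expected, actual)
        if d:
            histogram[d] = histogram.get(d, 0) + 1
    return sum(histogram.values()), histogram


# Test case #995 from the report: answer vs. module output.
print(bits_diff("1100110100101111", "1100110100101101"))
```

The integer difference equals the distance in units in the last place only when both patterns have the same sign bit, which holds for the cases shown.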

## Clock Gating Summary

| Item                            | Count (%)    |
|---------------------------------|--------------|
| Number of Clock gating elements | 4            |
| Number of Gated registers       | 192 (29.58%) |
| Number of Ungated registers     | 457 (70.42%) |
| Total number of registers       | 649          |

## Clock Gating Report by Origin

| Item                                          | Actual Count (%) |
|-----------------------------------------------|------------------|
| Number of tool-inserted clock gating elements | 4 (100.00%)      |
| Number of pre-existing clock gating elements  | 0 (0.00%)        |
| Number of gated registers                     | 192 (29.58%)     |
| Number of tool-inserted gated registers       | 192 (29.58%)     |
| Number of pre-existing gated registers        | 0 (0.00%)        |
| Number of ungated registers                   | 457 (70.42%)     |
| Number of registers                           | 649              |

## 1. Single precision

| Hierarchy                                                   | Int Power | Switch Power | Leak Power | Total Power | %    |
|-------------------------------------------------------------|-----------|--------------|------------|-------------|------|
| FLP_DP                                                      |           | 1.75e-03     |            |             |      |
| s1 (stage1)                                                 |           | 1.46e-03     |            | 3.10e-03    | 78.7 |
| add_0_root_sub_0_root_sub_141_I2                            |           | 5.29e-07     |            | 2.32e-06    | 0.1  |
| add_0_root_sub_0_root_sub_141_I3                            |           | 5.04e-07     |            | 2.21e-06    | 0.1  |
| add_0_root_sub_0_root_sub_141_I4                            |           | 5.31e-07     |            |             | 0.1  |
| sub_195 (stage1_DW01_sub_3)                                 |           | 2.35e-06     | 4.68e-09   | 5.40e-06    | 0.1  |
| add_0_root_sub_0_root_sub_141 (st                           |           | 5.36e-07     | 5.34e-09   | 2.32e-06    | 0.1  |
| add_0_root_sub_0_root_sub_161 (st                           |           | 6.72e-07     |            | 2.50e-06    | 0.1  |
| add_0_root_sub_0_root_sub_161_I2                            |           | 6.64e-07     |            | 2.46e-06    | 0.1  |
| add_0_root_sub_0_root_sub_161_I3                            | 1.80e-06  | 6.72e-07     | 4.02e-09   | 2.48e-06    | 0.1  |
| add_0_root_sub_0_root_sub_161_I4 (stage1_DW01_add_7)        | 1.78e-06  | 6.61e-07     | 4.02e-09   | 2.44e-06    | 0.1  |
| cg1 (ClockGating_0)                                         | 3.15e-04  | 2.69e-04     | 8.44e-07   | 5.85e-04    | 14.9 |
| clk_gate_C218 (SNPS_CLOCK_GATE_HIGH_ClockGating_0)          | 3.53e-07  | 3.15e-06     | 8.03e-12   | 3.50e-06    | 0.1  |
| mult_add_249_3_aco (ClockGating_0_DW_mult_uns_11)           | 7.28e-06  | 9.46e-06     | 1.85e-08   | 1.68e-05    | 0.4  |
| add_0_root_add_0_root_add_249_2 (ClockGating_0_DW01_add_12) | 1.93e-05  | 1.93e-05     | 3.71e-08   | 3.87e-05    | 1.0  |
| add_249_3_aco (ClockGating_0_DW01_add_14)                   |           | 2.50e-05     | 4.56e-08   | 4.84e-05    | 1.2  |

## 2. Half precision

| Hierarchy                         | Int Power | Switch Power | Leak Power | Total Power | %     |
|-----------------------------------|-----------|--------------|------------|-------------|-------|
| FLP_DP                            | 9.39e-04  | 6.80e-04     | 4.80e-06   | 1.62e-03    | 100.0 |
| s1 (stage1)                       | 5.04e-04  | 4.53e-04     | 3.69e-06   | 9.60e-04    | 59.1  |
| add_0_root_sub_0_root_sub_141_I2  | 0.000     | 0.000        | 4.95e-09   | 4.95e-09    | 0.0   |
| add_0_root_sub_0_root_sub_141_I3  | 0.000     | 0.000        | 4.95e-09   | 4.95e-09    | 0.0   |
| add_0_root_sub_0_root_sub_141_I4  |           | 0.000        |            |             |       |
| sub_195 (stage1_DW01_sub_3)       |           | 1.99e-06     | 4.52e-09   | 4.45e-06    | 0.3   |
| add_0_root_sub_0_root_sub_141 (st |           | 0.000        | 4.95e-09   | 4.95e-09    | 0.0   |
| add_0_root_sub_0_root_sub_161 (st |           | 4.67e-07     |            | 1.75e-06    | 0.1   |
| add_0_root_sub_0_root_sub_161_I2  |           | 4.64e-07     |            | 1.75e-06    | 0.1   |
| add_0_root_sub_0_root_sub_161_I3  |           | 4.46e-07     |            | 1.68e-06    | 0.1   |
| add_0_root_sub_0_root_sub_161_I4  |           | 4.66e-07     |            |             |       |
| cg1 (ClockGating_0)               |           | 4.66e-05     |            | 1.13e-04    | 7.0   |
| clk_gate_C218 (SNPS_CLOCK_GATE_   |           | 0.000        | 6.74e-12   | 1.20e-07    | 0.0   |
| mult_add_249_3_aco (ClockGating   |           | 0.000        |            |             | 0.0   |
| add_0_root_add_0_root_add_249_2   |           | 0.000        | 3.33e-08   | 3.33e-08    | 0.0   |
| add_249_3_aco (ClockGating_0_DW   | 4.15e-06  | 4.40e-06     | 4.23e-08   | 8.59e-06    | 0.5   |
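To put the two power reports side by side, the sketch below computes the drop in total power for the two rows that have a total in both reports, `s1 (stage1)` and `cg1 (ClockGating_0)`. The values are copied from the tables in whatever unit the tool reported; the dictionary and function names are illustrative.

```python
# (single-precision total, half-precision total) copied from the two reports.
total_power = {
    "s1 (stage1)":         (3.10e-03, 9.60e-04),
    "cg1 (ClockGating_0)": (5.85e-04, 1.13e-04),
}


def reduction(single, half):
    """Fractional drop in power when going from single to half precision."""
    return 1.0 - half / single


for name, (single, half) in total_power.items():
    print(f"{name}: {reduction(single, half):.1%} lower in half precision")
```

This only compares the two reported totals. It does not separate the effect of clock gating from the lower switching activity of the narrower operands.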

## VIII. Waveforms

1. nonpipe

RTL:

[Waveform screenshot: eight 32-bit input signals, followed by the 32-bit outcome and answer signals]

Gate-level:

[Waveform screenshot: the same signals in the gate-level simulation]

2. pipeline

RTL:

Gate-level:

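For both the nonpipe and pipeline designs, correctness was confirmed by checking that answer and outcome carry the same values in the waveform. Inspecting the inputs confirms that the test file is read correctly.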

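### IX. Reflections

I found this assignment hardest at the beginning: when the answers produced by the RTL and by the testbench (TB) did not match, I could not tell which side was wrong. Only after comparing answers with a friend did I learn that the bug was in my RTL, in the step that unpacks the mantissa.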