(a)

factor = 2



factor = 5



(b)

factor = 2



factor = 5





$\max\{4/3,\ 4/5\} = 4/3$. The denominator gives the unfolding factor to use, so 3-unfolding is applied.
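The choice of unfolding factor above can be sketched in a few lines of Python. This is only an illustration: the source gives the two ratios 4/3 and 4/5 but not the underlying loops, so reading each ratio as (loop computation time, number of loop delays) is an assumption, and the names `loops` and `iteration_bound` are hypothetical.

```python
from fractions import Fraction

# Assumed loop data: (loop computation time, number of delays in the loop).
# Only the ratios 4/3 and 4/5 come from the report.
loops = [(4, 3), (4, 5)]

def iteration_bound(loops):
    """Return the largest time/delay ratio as an exact fraction."""
    return max(Fraction(t, w) for t, w in loops)

bound = iteration_bound(loops)

# Fraction keeps the ratio in lowest terms, so its denominator is the
# unfolding factor that turns the fractional bound into an integer period.
unfolding_factor = bound.denominator

print(bound, unfolding_factor)
```

Using `Fraction` rather than floats keeps the bound exact, so the denominator can be read off directly.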

## 3.



Fig 5.27 The PDSP architecture

For a 7-tap FIR filter computation with 7-unfolding:



For the bit-serial adder shown in the figure below, digit-serial adders are implemented using unfolding factors 4 and 5; the word-length is 12 bits.
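As a behavioral illustration of what unfolding does to the adder, the sketch below adds two unsigned words one digit per clock cycle, least significant digit first, with the carry held between cycles. It is a software model under stated assumptions, not the synthesized design: the function name `digit_serial_add` is hypothetical, `digit_size=1` stands for the original bit-serial adder, and each word is treated in isolation, so it does not model how a real 5-unfolded design would handle a 12-bit word that is not a multiple of the digit size.

```python
def digit_serial_add(a, b, word_length=12, digit_size=4):
    """Add two unsigned words digit by digit (LSB digit first).

    Returns (sum modulo 2**word_length, number of clock cycles used).
    """
    mask = (1 << digit_size) - 1
    carry = 0          # models the carry flip-flop, cleared at the word start
    result = 0
    cycles = 0
    for shift in range(0, word_length, digit_size):
        da = (a >> shift) & mask      # digit of a processed in this cycle
        db = (b >> shift) & mask      # digit of b processed in this cycle
        s = da + db + carry           # digit_size chained full adders
        result |= (s & mask) << shift
        carry = s >> digit_size       # carry saved for the next cycle
        cycles += 1
    return result & ((1 << word_length) - 1), cycles

print(digit_serial_add(0xABC, 0x123, digit_size=1))  # (3039, 12)
print(digit_serial_add(0xABC, 0x123, digit_size=4))  # (3039, 3)
```

With a digit size of 4 the same 12-bit sum is produced in 3 cycles instead of 12, at the cost of a longer carry chain per cycle.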



waveform (original vs 4-unfolding vs 5-unfolding):



### timing (original vs 4-unfolding vs 5-unfolding):

| Point                       | Incr  | Path   |
|-----------------------------|-------|--------|
| clock clk (rise edge)       | 0.00  | 0.00   |
| clock network delay (ideal) | 0.50  | 0.50   |
| in_c_reg/CK (DFFRX1)        | 0.00  | 0.50   |
| in_c_reg/Q (DFFRX1)         | 0.52  | 1.02 f |
| F0/cin (fa)                 | 0.00  | 1.02   |
| F0/U2/Y (XOR2X1)            | 0.15  | 1.17   |
| F0/U1/Y (AO22X1)            | 0.59  | 1.76   |
| F0/cout (fa)                | 0.00  | 1.76   |
| c (out)                     | 0.00  | 1.76   |
| data arrival time           |       | 1.76   |
| clock clk (rise edge)       | 8.00  | 8.00   |
| clock network delay (ideal) | 0.50  | 8.50   |
| clock uncertainty           | -0.10 | 8.40   |
| output external delay       | -0.50 | 7.90   |
| data required time          |       | 7.90   |
| data required time          |       | 7.90   |
| data arrival time           |       | -1.76  |
| slack (MET)                 |       | 6.14   |

```
Point                                Incr       Path
-----------------------------------------------------
clock clk (rise edge)                0.00       0.00
clock network delay (ideal)          0.50       0.50
in_c_reg/CK (DFFRX1)                 0.00       0.50
in_c_reg/Q (DFFRX1)                  0.52       1.02
f0/cin (fa_3)                        0.00       1.02
f0/U2/Y (XOR2X1)                     0.15       1.17
f0/U1/Y (AO22X1)                     0.34       1.51
f0/c (fa_3)                          0.00       1.51
f1/cin (fa_2)                        0.00       1.51
f1/U2/Y (XOR2X1)                     0.16       1.67
f1/U1/Y (AO22X1)                     0.34       2.01
f1/c (fa_2)                          0.00       2.01
f2/cin (fa_1)                        0.00       2.01
f2/U2/Y (XOR2X1)                     0.16       2.17
f2/U1/Y (AO22X1)                     0.34       2.51
f2/c (fa_1)                          0.00       2.51
f3/cin (fa_0)                        0.00       2.51
f3/U2/Y (XOR2X1)
f3/U1/Y (AO22X1)
f3/c (fa_0)                          0.00       3.26
c (out)                              0.00       3.26
data arrival time                               3.26

clock clk (rise edge)                8.00       8.00
clock network delay (ideal)          0.50       8.50
clock uncertainty                   -0.10       8.40
output external delay               -0.50       7.90
data required time                              7.90
-----------------------------------------------------
data required time                              7.90
data arrival time                              -3.26
-----------------------------------------------------
slack (MET)                                     4.64

clock clk (rise edge)                8.00       8.00
clock network delay (ideal)          0.50       8.50
clock uncertainty                   -0.10       8.40
output external delay               -0.50       7.90
data required time                              7.90
-----------------------------------------------------
data required time                              7.90
data arrival time                              -3.76
-----------------------------------------------------
slack (MET)
```

### area (original vs 4-unfolding vs 5-unfolding):

```
Number of ports:
Number of nets:
Number of cells:                           27
Number of combinational cells:             17
Number of sequential cells:                 9
Number of macros/black boxes:               0
Number of buf/inv:
Number of references:

Combinational area:                186.713998
Buf/Inv area:                        0.000000
Noncombinational area:             297.044994
Macro/Black Box area:                0.000000
Net Interconnect area:      undefined  (No wire load specified)

Total cell area:                   483.758993
Total area:                 undefined
```

```
Report : area
Design : top
Version: R-2020.09
Date   : Thu Nov 17 18:48:01 2022
****************************************
Library(s) Used:
    slow (File: /cad/CBDK/CBDK_IC_Contest_v2.1/SynopsysDC/db/slow.db)

Number of ports:                           40
Number of nets:                            69
Number of cells:                           40
Number of combinational cells:             23
Number of sequential cells:                13
Number of macros/black boxes:               0
Number of buf/inv:
Number of references:                      16

Combinational area:                220.661997
Buf/Inv area:                        3.394800
Noncombinational area:             446.416203
Macro/Black Box area:                0.000000
Net Interconnect area:      undefined  (No wire load specified)

Total cell area:                   667.078199
Total area:                 undefined
```

```
Library(s) Used:
    slow (File: /cad/CBDK/CBDK_IC_Contest_v2.1/SynopsysDC/db/slow.db)

Number of ports:                           49
Number of nets:                            81
Number of cells:                           46
Number of combinational cells:             26
Number of sequential cells:                15
Number of macros/black boxes:               0
Number of buf/inv:
Number of references:                      17

Combinational area:                254.609996
Buf/Inv area:                        3.394800
Noncombinational area:             517.707005
Macro/Black Box area:                0.000000
Net Interconnect area:      undefined  (No wire load specified)

Total cell area:                   772.317000
```

### power (original vs 4-unfolding vs 5-unfolding):

| Power Group   | Internal Power | Switching Power | Leakage Power | Total Power   | (%)    | Attrs |
|---------------|----------------|-----------------|---------------|---------------|--------|-------|
| io_pad        | 0.0000         | 0.0000          | 0.0000        | 0.0000        | 0.00%  |       |
| memory        | 0.0000         | 0.0000          | 0.0000        | 0.0000        | 0.00%  |       |
| black_box     | 0.0000         | 0.0000          | 0.0000        | 0.0000        | 0.00%  |       |
| clock_network | 0.0000         | 0.0000          | 0.0000        | 0.0000        | 0.00%  |       |
| register      | 2.4435e-02     | 3.2644e-04      | 2.5684e+05    | 2.5018e-02    | 92.29% |       |
| sequential    | 0.0000         | 0.0000          | 0.0000        | 0.0000        | 0.00%  |       |
| combinational | 1.0323e-03     | 8.5173e-04      | 2.0509e+05    | 2.0891e-03    | 7.71%  |       |
| Total         | 2.5467e-02 mW  | 1.1782e-03 mW   | 4.6193e+05 pW | 2.7107e-02 mW |        |       |

| Power Group   | Internal Power | Switching Power | Leakage Power | Total Power   | (%)    | Attrs |
|---------------|----------------|-----------------|---------------|---------------|--------|-------|
| io_pad        | 0.0000         | 0.0000          | 0.0000        | 0.0000        | 0.00%  |       |
| memory        | 0.0000         | 0.0000          | 0.0000        | 0.0000        | 0.00%  |       |
| black_box     | 0.0000         | 0.0000          | 0.0000        | 0.0000        | 0.00%  |       |
| clock_network | 0.0000         | 0.0000          | 0.0000        | 0.0000        | 0.00%  |       |
| register      | 3.6306e-02     | 5.9708e-04      | 3.4464e+05    | 3.7247e-02    | 88.27% |       |
| sequential    | 0.0000         | 0.0000          | 0.0000        | 0.0000        | 0.00%  |       |
| combinational | 1.7354e-03     | 2.9790e-03      | 2.3500e+05    | 4.9495e-03    | 11.73% |       |
| Total         | 3.8041e-02 mW  | 3.5761e-03 mW   | 5.7963e+05 pW | 4.2197e-02 mW |        |       |

| Power Group   | Internal Power | Switching Power | Leakage Power | Total Power   | (%)    | Attrs |
|---------------|----------------|-----------------|---------------|---------------|--------|-------|
| io_pad        | 0.0000         | 0.0000          | 0.0000        | 0.0000        | 0.00%  |       |
| memory        | 0.0000         | 0.0000          | 0.0000        | 0.0000        | 0.00%  |       |
| black_box     | 0.0000         | 0.0000          | 0.0000        | 0.0000        | 0.00%  |       |
| clock_network | 0.0000         | 0.0000          | 0.0000        | 0.0000        | 0.00%  |       |
| register      | 4.1925e-02     | 6.1741e-04      | 3.9369e+05    | 4.2936e-02    | 88.10% |       |
| sequential    | 0.0000         | 0.0000          | 0.0000        | 0.0000        | 0.00%  |       |
| combinational | 1.9923e-03     | 3.5284e-03      | 2.7903e+05    | 5.7997e-03    | 11.90% |       |
| Total         | 4.3918e-02 mW  | 4.1458e-03 mW   | 6.7272e+05 pW | 4.8736e-02 mW |        |       |

## 5. Conclusion

|             | Area (µm²) | Time (ns) | Power (mW) |
|-------------|------------|-----------|------------|
| Original    | 483.76     | 1.76      | 0.027107   |
| 4-unfolding | 667.08     | 3.26      | 0.042197   |
| 5-unfolding | 772.32     | 3.76      | 0.048736   |

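After unfolding, the area grows as the unfolding factor increases.

Although the critical-path delay appears to get longer, each architecture needs a different number of clock cycles to process one word (word-length = 12), so each delay must be multiplied by a different factor:

- original: 1.76 × 12 = 21.12 ns
- 4-unfolding: 3.26 × 3 = 9.78 ns
- 5-unfolding: 3.76 × 2.4 = 9.024 ns

The time per word therefore drops, so throughput increases after unfolding.

Like area, power also grows with the unfolding factor.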
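The per-word timing comparison can be reproduced with a short script. This is a minimal sketch that assumes each design is clocked at its reported data arrival time (the synthesis runs themselves used an 8 ns clock constraint) and that a 12-bit word takes 12 / J cycles at J bits per cycle, as in the report's own arithmetic. The names `designs` and `time_per_word` are illustrative only.

```python
WORD_LENGTH = 12  # bits per word, from the problem statement

# (bits processed per cycle, data arrival time in ns) from the summary table.
designs = {
    "original": (1, 1.76),
    "4-unfolding": (4, 3.26),
    "5-unfolding": (5, 3.76),
}

def time_per_word(bits_per_cycle, critical_path_ns, word_length=WORD_LENGTH):
    """Time to process one word, assuming the clock period equals the critical path."""
    cycles = word_length / bits_per_cycle
    return cycles * critical_path_ns

results = {name: time_per_word(j, t) for name, (j, t) in designs.items()}

for name, ns in results.items():
    speedup = results["original"] / ns
    print(f"{name}: {ns:.3f} ns per word, speedup {speedup:.2f}x")
```

The speedup stays well below the unfolding factor because the carry chain, and with it the critical path, lengthens with every added full adder.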