# Final project - Recurrent Unit Circuit Design

# b05902086 周逸

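## 1. Design Trade-offs

- At first I spent some time deciding which weights to store on the circuit. Because the final score is the AT value (area × number of cycles × cycle time), storing all of the weight data on the circuit would make the total cycle count very low, but the factor saved in cycles would reappear as extra area. The circuit would also become so large that the cycle time would be hard to reduce. In the end I chose to re-read all data from memory whenever it is needed.
- I originally expected that keeping $B_{ih}$ and $B_{hh}$ on the circuit would trade a small amount of area for some savings in cycle count. After implementing it, however, storing $B_{ih}$ and $B_{hh}$ cost roughly 50% more area while reducing the cycle count by only about $\frac{1}{64}$, so it was not worth it.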
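The arithmetic behind the second point can be checked with a minimal sketch. The function name `at_ratio` is hypothetical, the factors 1.5 and $1 - \frac{1}{64}$ are the rounded figures quoted above, and the cycle time is assumed unchanged (an assumption, since the report does not give a cycle time for that variant).

```python
def at_ratio(area_factor: float, cycle_factor: float, cycle_time_factor: float = 1.0) -> float:
    """Relative change in AT = area x cycles x cycle time (1.0 means no change)."""
    return area_factor * cycle_factor * cycle_time_factor


# Storing B_ih and B_hh on chip: about 50% more area, about 1/64 fewer cycles.
# Assumption: the cycle time stays the same.
ratio = at_ratio(area_factor=1.5, cycle_factor=1 - 1 / 64)
print(f"AT with on-chip biases / AT without: {ratio:.4f}")
```

Under these assumptions the AT value grows by roughly 48%, which is why the biases are re-read from memory instead.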

## 2. Stage

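- After thinking it through, it turns out that the states can simply be divided according to which of the six different memory locations is being read, and the circuit then loops repeatedly through five of those states.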
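A minimal behavioural sketch of such a controller is shown below. The state names `READ_0` to `READ_5` are hypothetical, and the sketch assumes that the one state outside the loop is the first one and that the states are visited in order; the report does not specify either detail.

```python
from itertools import islice

# Hypothetical state names: one state per memory location being read.
STATES = ["READ_0", "READ_1", "READ_2", "READ_3", "READ_4", "READ_5"]
LOOP_START = 1  # Assumption: READ_0 runs once, READ_1..READ_5 repeat.


def next_state(state: str) -> str:
    """Advance to the next state, wrapping back to the start of the loop."""
    i = STATES.index(state)
    return STATES[i + 1] if i + 1 < len(STATES) else STATES[LOOP_START]


def trace(start: str = "READ_0"):
    """Yield the endless sequence of states visited from `start`."""
    state = start
    while True:
        yield state
        state = next_state(state)


print(list(islice(trace(), 8)))
```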

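## 3. Multiplier

Before writing any code I expected the multiplication to become the critical path, and the Design Compiler results confirmed this. I therefore split each large multiplication into several smaller multiplications and moved the summation of their results to the next cycle, which lowers the cycle time.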
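The idea can be illustrated with a small software model. This is only a sketch: the function name `split_multiply`, the 16-bit operand width, the 4-bit chunk size, and the restriction to unsigned operands are all assumptions chosen for illustration, not the widths or signedness used in the actual design.

```python
def split_multiply(a: int, b: int, width: int = 16, chunk: int = 4) -> int:
    """Multiply two unsigned `width`-bit values via small partial products."""
    if width % chunk != 0:
        raise ValueError("width must be a multiple of chunk")
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must be unsigned and fit in `width` bits")

    mask = (1 << chunk) - 1
    n = width // chunk
    a_parts = [(a >> (i * chunk)) & mask for i in range(n)]
    b_parts = [(b >> (j * chunk)) & mask for j in range(n)]

    # "Cycle 1": many small, independent multiplications. In hardware each
    # result would be captured in its own register at the clock edge.
    partial = [[a_parts[i] * b_parts[j] for j in range(n)] for i in range(n)]

    # "Cycle 2": shift every partial product into position and add them up.
    return sum(partial[i][j] << ((i + j) * chunk)
               for i in range(n) for j in range(n))


print(split_multiply(0xABCD, 0x1234) == 0xABCD * 0x1234)
```

Each small multiplier has a much shorter combinational path than one wide multiplier, at the cost of one extra cycle of latency for the summation.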

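## 4. Activation function

With the multiplier path shortened, the critical path became the segment from computing the activation function to driving the result onto mdata\_w. This segment has only half a cycle available: every other computation just needs to finish before the next posedge, but outgoing data must be ready before the negedge. I therefore split the activation function and the output step into two separate stages. This increases the total cycle count by about 1%, but it lets the cycle time keep dropping, so it is also a worthwhile optimization.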
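As a rough break-even check, let $N$ be the original cycle count and $t$ the original cycle time, with $N'$ and $t'$ the values after the split. These symbols are introduced here for illustration, and the area is assumed to stay roughly the same. The split pays off in total time whenever

$$
N' \, t' < N \, t, \qquad N' \approx 1.01\,N \;\Longrightarrow\; t' < \frac{t}{1.01} \approx 0.990\,t ,
$$

so any cycle-time reduction of more than about 1% outweighs the extra cycles.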

## 5. Synthesis cycle time

- Gate-Level: 4.5 ns
- Transistor-Level: 4.5 ns
  - Ratio: 1.5
  - Density: 0.9

## 6. Results

- Gate-level results
  - Can you pass gate-level simulation?
    - yes
  - Cycle time that can pass your gate-level simulation:
    - 4.5 ns
  - Total simulation time:
    - 5741605.030 ns
  - Total cell area:
    - 192980.800955 $\mu m^2$
  - Cell area × Simulation time:
    - 1108019537456.657 $\mu m^2 \cdot ns$
- Transistor-level results
  - Can you pass transistor-level simulation?
    - yes
  - Cycle time that can pass your transistor-level simulation:
    - 3.9 ns
  - Total simulation time:
    - 4976057.763 ns
  - Total cell area:
    - 214514.017 $\mu m^2$
  - Cell area × Simulation time:
    - 1067434139565.164 $\mu m^2 \cdot ns$
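The two area × time products can be reproduced from the reported areas and simulation times. The snippet below is a minimal sketch; the function and variable names are hypothetical.

```python
import math


def at_product(cell_area_um2: float, sim_time_ns: float) -> float:
    """Cell area x total simulation time, in um^2 * ns."""
    return cell_area_um2 * sim_time_ns


# Values copied from the results listed above.
gate_at = at_product(192980.800955, 5741605.030)
transistor_at = at_product(214514.017, 4976057.763)
ratio = transistor_at / gate_at

print(f"gate-level AT:       {gate_at:.3f} um^2*ns")
print(f"transistor-level AT: {transistor_at:.3f} um^2*ns")
print(f"transistor / gate:   {ratio:.4f}")
```

Although the transistor-level cell area is larger, the shorter passing cycle time (3.9 ns versus 4.5 ns) gives a slightly smaller product.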

## 7. Screenshots

#### RTL Pass

#### Gate-level Area Report

#### Gate-level Timing Report

```
Ibs902086erad29-y-project/RIG
Endpoint: mul_21_reg[6]
(rising edge-triggered flip-flop clocked by clk)
Path Group: max
Point Incr

clock clk (fall edge) 2, 25
clock network delay (ideal) 9,50
input external delay 0,00
matas_f[5] (in) 0,10
ulla805/Y (INOXIG) 0,07
ull26807/ (NOREXI) 0,32
ull605/5 (ADDFHX/4) 0,44
ull9197/ (NAND28K4) 0,17
ull918/Y (NAND28K4) 0,17
ull918/Y (NAND28K4) 0,17
ull918/Y (NAND2X4) 0,09
ull923/Y (NAND2X4) 0,09
ull923/Y (NAND2X4) 0,19
ull921/Y (NAND2X4) 0,19
ull921/Y (NAND2X4) 0,19
ull921/Y (NAND2X4) 0,19
ull921/Y (NAND2K4) 0,19
ul
         clock clk (rise edge)
clock network delay (ideal)
clock uncertainty
mul_21_reg[6]/CK (DFFHQX4)
library setup time
data required time
```

#### Gate-level Pass

```
Warning!  Timing violation
$setuphold<setup>( posedge CK &&& (flag == 1):6750 PS, negedge D:6509 PS, 0.294 : 294 PS, -0.145 : -145 PS );
File: ./syn/tsmc13_neg.v, line = 18114
Scope: testfixture.u_RNN.\mul_14_reg[4]
Time: 6750 PS
Warning!  Timing violation
$setuphold<setup>( posedge CK &&& (flag == 1):6750 PS, negedge D:6610 PS, 0.270 : 270 PS, -0.094 : -94 PS );
File: ./syn/tsmc13_neg.v, line = 18064
Scope: testfixture.u_RNN.\mul_24_reg[5]
Time: 6750 PS
Warning!  Timing violation
$setuphold<setup>( posedge CK &&& (flag == 1):6750 PS, negedge D:6622 PS, 0.195 : 195 PS, -0.117 : -117 PS );
File: ./syn/tsmc13_neg.v, line = 23418
Scope: testfixture.u_RNN.\mul_33_reg[7]
Time: 6750 PS
Warning!  Timing violation
$setuphold<setup>( posedge CK &&& (flag == 1):6750 PS, negedge D:6525 PS, 0.303 : 303 PS, -0.156 : -156 PS );
File: ./syn/tsmc13_neg.v, line = 18114
Scope: testfixture.u_RNN.\t_offset_reg[0]
Time: 6750 PS
Simulation complete via $finish(1) at time 5741605030 PS + 0
./testfixture.v:171 #(`CYCLE/2); $finish;
ncsim> exit
(8 min 49 s) Tue Jun 09 16:51:45
(18)#-/project/RTL6
(CAD)b5902086@cad29:[0]$ ncverilog testfixture.v RNN_syn.v -v syn/tsmc13_neg.v +define+SDF_
```

#### Transistor-level Floorplan

#### Transistor-level Full placement

#### Transistor-level Power Ring

#### Transistor-level Power Stripe





#### Transistor-level Special Route



```
End Time: Tue Jun 9 22:42:34 2020
Time Elapsed: 0:00:01.0
******** End: VERIFY CONNECTIVITY ********
Verification Complete : 0 Viols. 0 Wrngs.
(CPU Time: 0:00:01.3 MEM: 20.008M)
innovus 5> **WARN: (IMPTCM-125): Option "-checkPinLayerForAccess" for command getPlaceMode is obsolete and will be ignored. It no longer has any effect and should be removed from your script.
*** Starting refinePlace (0:23:270 mem=3336.5M) ***
Density distribution unevenness ratio = 1.980%
Move report: Detail placement moves 0 insts, mean move: 0.00 um, max move: 0.00 um
Runtime: CPU: 0:00:00.4 REAL: 0:00:01.0 MEM: 3336.5MB
Instances move: 0 (out of 14919 movable)
Instances flipped: 0
Mean displacement: 0.00 um
Max displacement: 0.00 um
Runtime: CPU: 0:00:00:06 REAL: 0:00:01.0 MEM: 3336.5MB
*** Finished refinePlace (0:23:30 mem=3336.5M) ***
Density distribution unevenness ratio = 1.980%
innovus 5> VERIFY_CONNECTIVITY use new engine.
******** Start: VERIFY CONNECTIVITY ********
Start Time: Tue Jun 9 22:42:54 2020
Design Name: RNN
Database Units: 2000
Design Boundary: (0.0000, 0.0000) (460.0000, 644.9300)
Error Limit = 1000; Warning Limit = 50
Check specified nets
Use 4 pthreads
Begin Summary
Found no problems or warnings.
End Summary
End Time: Tue Jun 9 22:42:55 2020
Time Elapsed: 0:00:01.0
******** End: VERIFY CONNECTIVITY ********
Verification Complete : 0 Viols. 0 Wrngs.
(CPU Time: 0:00:01.1 MEM: -2.000M)
detailRoute Statistics:
Cpu time = 00:02:16
Elapsed time = 00:00:38
Increased memory = -7.03 (MB)
Total memory = 1712.79 (MB)
Total memory = 1954.30 (MB)
globalDetailRoute statistics:
Cpu time = 00:00:253
Elapsed time = 00:00:08
Increased memory = :15.16 (MB)
Total memory = 1554.30 (MB)
Peak memory = 1554.30 (MB)
Number of warnings = 22
Total number of warnings = 114
Number of fails = 0
Complete globalDetailRoute on Tue Jun 9 22:45:23 2020
*** Summary of all messages that are not suppressed in this session:
Severity ID Count Summary
1 The design extraction status has been re...
WARNING IMPEXT-3493
The process node is not set. Use the com...
WARNING IMPEXT-9530
1 The command %s is obsolete and will be r...
WARNING IMPEST-3014
4 The RC network is incomplete for net %s....
*** Message Summary: 68 warning(s), 0 error(s)
```

#### Transistor-level Nano Route



```
Inspossescate3P./project7RIGAMyout
Calculate late delays in OCV mode...
Start delay calculation (fullDC) (4 T). (MEM=3729.42)
Glitch Analysis: View av_func_mode_max -- Total Number of Nets Skipped = 0.
Glitch Analysis: View av_func_mode_max -- Total Number of Nets Analyzed = 15239.
Total number of fetched objects 15239
AAE_INFO: Total number of nets for which stage creation was skipped for all views 0
AAE_INFO-618: Total number of nets in the design is 15276, 0.4 percent of the nets selected for SI analysis
End delay calculation. (MEM=3729.42 CPU=0:00:00.3 REAL=0:00:00.0)
Setup views included:
        WNS (ns): 0.001 | 0.009 | 0.001
        TNS (ns): 0.000 | 0.000 | 0.000
 Violating Paths: 0 | 0 | 0
       All Paths: 2834 | 2790 | 2832
Density: 90.176%
Total number of glitch violations: 0
Reported timing to dir timingReports
Total CPU time: 18.25 sec
Total Real time: 11.0 sec
Total Memory Usage: 3463.394531 Mbytes
Reset AAE Options
       Hold mode: all | reg2reg | default
        WNS (ns): 0.541 | 0.541 | 0.704
        TNS (ns): 0.000 | 0.000 | 0.000
 Violating Paths: 0 | 0 | 0
       All Paths: 2834 | 2790 | 2832
Reported timing to dir timingReports
Total CPU time: 17.77 sec
Total Real time: 10.0 sec
Total Memory Usage: 3414.867188 Mbytes
Reset AAE Options
```

#### Transistor-level Pass

```
Warning!  Timing violation
$setuphold<hold>( posedge CK &&& (flag == 1):5850 PS, negedge D:5777 PS, 0.255 : 255 PS, -0.066 : -66 PS );
File: ./syn/tsmc13_neg.v, line = 18064
Scope: testfixture.u_RNN.\mul_00_reg[4]
Time: 5862 PS
```