#### **Computer-Aided VLSI System Design**

# Final Project: 5G MIMO Demodulation

**Lecturer: Yu-Hsuan Tsai**

Graduate Institute of Electronics Engineering, National Taiwan University

MediaTek





#### **Overview**



MIMO (multiple input, multiple output) is an antenna technology for wireless communications. The encoding flow is as follows:



- At the receiver, the data is decoded by reversing the encoding flow
- In this project, we implement part of a simple MIMO receiver to demodulate the RX data
  - AWGN (additive white Gaussian noise) channel

### **System Model**



The received signal $\underline{y}$ per data RE (resource element) can be expressed as [2]:

$$\underline{y} = H\underline{\tilde{s}} + \underline{n}$$

- $H$: channel, $\underline{\tilde{s}}$: transmitted symbol, $\underline{n}$: noise
- For 4TX × 4RX transmission, the formula can be rewritten as

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} H_{11} & H_{12} & H_{13} & H_{14} \\ H_{21} & H_{22} & H_{23} & H_{24} \\ H_{31} & H_{32} & H_{33} & H_{34} \\ H_{41} & H_{42} & H_{43} & H_{44} \end{bmatrix} \begin{bmatrix} \tilde{s}_1 \\ \tilde{s}_2 \\ \tilde{s}_3 \\ \tilde{s}_4 \end{bmatrix} + \begin{bmatrix} n_1 \\ n_2 \\ n_3 \\ n_4 \end{bmatrix}$$

- The MIMO receiver demodulates $\underline{\tilde{s}}$ from $\underline{y}$ and $H$ in the presence of the noise $\underline{n}$.
- $\tilde{s}_1, \tilde{s}_2, \tilde{s}_3$, and $\tilde{s}_4$ are the modulated symbols (QPSK, 2 bits of data each)
- The output of the MIMO receiver is one LLR per bit
  - LLR: log-likelihood ratio. A positive value means the bit is more likely to be 0 than 1, and vice versa
  - 8 LLRs in total per data RE (4 symbols × 2 bits (QPSK))

### **System Model**



According to [2], the MIMO receiver is composed of QR decomposition (QRD) and Maximum Likelihood (ML) demodulation.



- In this project, we provide the detailed formulas of ML demodulation with full search for implementation
- Many proposals for reducing the complexity of the MIMO receiver can be found in the literature. If you are interested, you can try different proposals and observe their performance and area/power/latency

#### **Tx2Rx Model**



$$\underline{y} = H\underline{\tilde{s}} + \underline{n}$$

- Modulator: generates the transmitted signal $\underline{\tilde{s}}$
  - QPSK (TS 38.211 Section 5.1 [1]): pairs of bits are mapped to complex-valued modulation symbols
- MIMO
  - Channel: multiply by the channel matrix $H$
    - 4×4 complex matrix
    - Randomly generated from a normal distribution
    - $H \sim N(0, 1/4)$
  - AWGN: add noise $\underline{n}$
    - White Gaussian noise is added with the MATLAB function awgn()
    - $\underline{n} \sim N(0, 1)$
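The Tx2Rx model above can be sketched in plain Python for one data RE. This is only an illustration under stated assumptions, not the course's reference model: the helper names (`qpsk_map`, `complex_normal`, `tx2rx`) are hypothetical, the bit-to-symbol mapping follows the QPSK formula of TS 38.211, the stated variance is assumed to be split equally between the real and imaginary parts, and direct Gaussian sampling stands in for MATLAB's awgn().

```python
import math
import random

def qpsk_map(b1, b2):
    # Assumed mapping (TS 38.211 QPSK): bit 0 -> +1/sqrt(2), bit 1 -> -1/sqrt(2);
    # the first bit sets the real part, the second bit the imaginary part.
    return complex(1 - 2 * b1, 1 - 2 * b2) / math.sqrt(2)

def complex_normal(variance, rng):
    # Assumption: the variance is split equally between real and imaginary parts.
    sigma = math.sqrt(variance / 2)
    return complex(rng.gauss(0, sigma), rng.gauss(0, sigma))

def tx2rx(bits, noise_var, rng):
    """bits: 8 bits ordered (x11, x12, x21, ..., x42). Returns (y, H, s)."""
    s = [qpsk_map(bits[2 * k], bits[2 * k + 1]) for k in range(4)]
    # 4x4 complex channel with per-entry variance 1/4
    H = [[complex_normal(0.25, rng) for _ in range(4)] for _ in range(4)]
    # y = H s + n
    y = [sum(H[i][j] * s[j] for j in range(4)) + complex_normal(noise_var, rng)
         for i in range(4)]
    return y, H, s

rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(8)]
y, H, s = tx2rx(bits, noise_var=1.0, rng=rng)
```

Setting `noise_var=0.0` gives a noiseless RE, which is convenient for checking a demodulator before adding noise.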



#### **QR Decomposition (QRD)**



- Motivation
  - Reduce the complexity of Maximum Likelihood (ML) demodulation

$$\underline{\hat{s}} = \arg\min_{\underline{s} \in A} \left( \|\underline{y} - H\underline{s}\|^2 \right)$$

- $A$: the set of all combinations of the 4 transmitted symbols ($s_1 \sim s_4$)

With QR decomposition, the signal model can be rewritten as

$$\begin{aligned}
\underline{y} &= H\underline{s} + \underline{n} \\
\underline{y} &= (QR)\underline{s} + \underline{n} \\
Q^H\underline{y} &= Q^HQR\underline{s} + Q^H\underline{n} \\
\underline{\hat{y}} &= R\underline{s} + \underline{v}
\end{aligned}$$

where $Q$ is an orthogonal matrix with $Q^HQ = I$, $\underline{\hat{y}} = Q^H\underline{y}$, and $\underline{v} = Q^H\underline{n}$.

- The ML demodulation problem becomes $\underline{\hat{s}} = \arg\min_{\underline{s} \in A} \left( \|\underline{\hat{y}} - R\underline{s}\|^2 \right)$
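The equivalence of the two distance metrics can be checked numerically. The sketch below uses the classical Gram-Schmidt process (listed as [3] in the references) to factor a random 4×4 complex matrix; whether the course's reference model computes the QRD this way is not stated here, and all function names are hypothetical. Because Gram-Schmidt normalizes each column by its norm, the diagonal of $R$ comes out real, which matches the real-only $r_{ii}$ inputs of this project.

```python
import random

def qr_gram_schmidt(H):
    """Classical Gram-Schmidt QR of a square complex matrix (list of rows).
    Returns (Q, R) with H = Q R, R upper triangular with a real diagonal."""
    n = len(H)
    cols = [[H[i][j] for i in range(n)] for j in range(n)]  # columns of H
    q_cols = []
    R = [[0j] * n for _ in range(n)]
    for j in range(n):
        v = list(cols[j])
        for i in range(j):
            # R[i][j] = q_i^H h_j, then remove that component from v
            R[i][j] = sum(q_cols[i][k].conjugate() * cols[j][k] for k in range(n))
            v = [v[k] - R[i][j] * q_cols[i][k] for k in range(n)]
        norm = sum(abs(x) ** 2 for x in v) ** 0.5
        R[j][j] = complex(norm, 0.0)
        q_cols.append([x / norm for x in v])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def hermitian(M):
    n = len(M)
    return [[M[j][i].conjugate() for j in range(n)] for i in range(n)]

def dist2(a, b):
    return sum(abs(p - q) ** 2 for p, q in zip(a, b))

rng = random.Random(1)
H = [[complex(rng.gauss(0, 0.35), rng.gauss(0, 0.35)) for _ in range(4)]
     for _ in range(4)]
s = [complex(1, 1) / 2 ** 0.5, complex(-1, 1) / 2 ** 0.5,
     complex(1, -1) / 2 ** 0.5, complex(-1, -1) / 2 ** 0.5]
y = [v + complex(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for v in matvec(H, s)]

Q, R = qr_gram_schmidt(H)
y_hat = matvec(hermitian(Q), y)          # y_hat = Q^H y
metric_h = dist2(y, matvec(H, s))        # ||y - H s||^2
metric_r = dist2(y_hat, matvec(R, s))    # ||y_hat - R s||^2
```

For a square, full-rank $H$, `metric_h` and `metric_r` agree up to floating-point error, while the triangular $R$ lets each row of the metric depend on fewer symbols.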

- Soft-bit calculation [2]

$$L\left(x_{k,b}|\underline{y}\right) \approx \min_{\underline{x} \in X_{k,b,1}} \left\|\underline{y} - H\underline{s}\right\|^2 - \min_{\underline{x} \in X_{k,b,0}} \left\|\underline{y} - H\underline{s}\right\|^2$$

$$\underline{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} (x_{1,1}, x_{1,2}) \\ (x_{2,1}, x_{2,2}) \\ (x_{3,1}, x_{3,2}) \\ (x_{4,1}, x_{4,2}) \end{bmatrix} \Leftrightarrow \underline{s}$$

- $x_{k,b}$: the $b^{th}$ bit in $x_k$, the $k^{th}$ entry of $\underline{x}$
- $X_{k,b,1}$: the subset of $\{\underline{x}\}$ with the $b^{th}$ bit in the $k^{th}$ entry = 1
- $X_{k,b,0}$: the subset of $\{\underline{x}\}$ with the $b^{th}$ bit in the $k^{th}$ entry = 0

$$\left\| \underline{y} - H\underline{s} \right\|^{2} \Rightarrow \left\| \underline{\hat{y}} - R\underline{s} \right\|^{2} = \left\| \begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \hat{y}_3 \\ \hat{y}_4 \end{bmatrix} - \begin{bmatrix} R_{11} & R_{12} & R_{13} & R_{14} \\ 0 & R_{22} & R_{23} & R_{24} \\ 0 & 0 & R_{33} & R_{34} \\ 0 & 0 & 0 & R_{44} \end{bmatrix} \begin{bmatrix} s_{1} \\ s_{2} \\ s_{3} \\ s_{4} \end{bmatrix} \right\|^{2} = \sum_{i=1}^{4} \left| \hat{y}_i - \sum_{j=i}^{4} R_{ij} s_j \right|^2$$

- Hard-bit calculation
  - Sign bit of $L(x_{k,b}|\underline{y})$ = 0: hard-bit output = 0
  - Sign bit of $L(x_{k,b}|\underline{y})$ = 1: hard-bit output = 1

#### Formula

- LLR for $x_{1,1}$:

$$L\left(x_{1,1}|\underline{y}\right) = \min_{\underline{x} \in X_{1,1,1}} \sum_{i=1}^{4} \left| \hat{y}_i - \sum_{j=i}^{4} R_{ij} s_j \right|^2 - \min_{\underline{x} \in X_{1,1,0}} \sum_{i=1}^{4} \left| \hat{y}_i - \sum_{j=i}^{4} R_{ij} s_j \right|^2$$

- The LLRs for the remaining bits $x_{1,2}, x_{2,1}, \dots, x_{4,2}$ follow the same pattern, for $k = 1 \sim 4$ and $b = 1, 2$:

$$L\left(x_{k,b}|\underline{y}\right) = \min_{\underline{x} \in X_{k,b,1}} \sum_{i=1}^{4} \left| \hat{y}_i - \sum_{j=i}^{4} R_{ij} s_j \right|^2 - \min_{\underline{x} \in X_{k,b,0}} \sum_{i=1}^{4} \left| \hat{y}_i - \sum_{j=i}^{4} R_{ij} s_j \right|^2$$

#### Formula

- For the $\sum_{i=1}^{4} \left| \hat{y}_i - \sum_{j=i}^{4} R_{ij} s_j \right|^2$ part:

- 4<sup>th</sup> entry: $\hat{y}_4 - R_{44}s_4 = a + bj \rightarrow a^2 + b^2$
- 3<sup>rd</sup> entry: $\hat{y}_3 - R_{33}s_3 - R_{34}s_4 = c + dj \rightarrow c^2 + d^2$
- 2<sup>nd</sup> entry: $\hat{y}_2 - R_{22}s_2 - R_{23}s_3 - R_{24}s_4 = e + fj \rightarrow e^2 + f^2$
- 1<sup>st</sup> entry: $\hat{y}_1 - R_{11}s_1 - R_{12}s_2 - R_{13}s_3 - R_{14}s_4 = g + hj \rightarrow g^2 + h^2$

$$\sum_{i=1}^{4} \left| \hat{y}_i - \sum_{j=i}^{4} R_{ij} s_j \right|^2 = a^2 + b^2 + c^2 + d^2 + e^2 + f^2 + g^2 + h^2$$

- QPSK constellation [1]
  - $x_1 \sim x_4$: each is one of (0,0), (1,0), (0,1), and (1,1)
  - $s_1 \sim s_4$: each is one of $\left( \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}} j \right)$, $\left( -\frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}} j \right)$, $\left( \frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}} j \right)$, and $\left( -\frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}} j \right)$
- A total of 256 ($= 4^4$) possibilities for 4-layer QPSK
- Full search over all 256 possibilities



- Compute $\sum_{i=1}^4 \left| \hat{y}_i - \sum_{j=i}^4 R_{ij} s_j \right|^2$ for each path $A(M)$, $M = 1 \sim 256$
  - Use all 256 results to compute the LLR of each bit: exactly 128 results fall in $X_{k,b,0}$ and the other 128 in $X_{k,b,1}$, with no overlap

- $A$: the set of all combinations of $s_1 \sim s_4$, with $4^4 = 256$ combinations in total
- $A(M)$: one of the combinations
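The full search can be sketched as a floating-point reference model. This is a minimal illustration, not the required hardware architecture: the function names are hypothetical, the bit-to-symbol mapping is assumed to follow the order in which the constellation points are listed (bit 0 maps to $+\frac{1}{\sqrt{2}}$, with the first bit on the real part), and no fixed-point effects are modeled.

```python
import itertools
import math

SQ = 1 / math.sqrt(2)

def qpsk_map(b1, b2):
    # Assumed mapping: (0,0)->(+,+), (1,0)->(-,+), (0,1)->(+,-), (1,1)->(-,-)
    return complex((1 - 2 * b1) * SQ, (1 - 2 * b2) * SQ)

def path_metric(y_hat, R, s):
    # sum_i | y_hat_i - sum_{j>=i} R_ij s_j |^2, using only the upper triangle
    total = 0.0
    for i in range(4):
        e = y_hat[i] - sum(R[i][j] * s[j] for j in range(i, 4))
        total += e.real ** 2 + e.imag ** 2
    return total

def ml_demod_full_search(y_hat, R):
    """Return (llr, hard), 8 entries each, ordered x11, x12, x21, ..., x42."""
    best = [[math.inf, math.inf] for _ in range(8)]  # best[bit index][bit value]
    for x in itertools.product((0, 1), repeat=8):    # all 256 candidates
        s = [qpsk_map(x[2 * k], x[2 * k + 1]) for k in range(4)]
        m = path_metric(y_hat, R, s)
        for n, bit in enumerate(x):
            if m < best[n][bit]:
                best[n][bit] = m
    # LLR = min over X_{k,b,1} minus min over X_{k,b,0}
    llr = [best[n][1] - best[n][0] for n in range(8)]
    hard = [1 if v < 0 else 0 for v in llr]          # sign bit -> hard bit
    return llr, hard

# Noiseless example with an arbitrary upper-triangular R (real diagonal)
R = [[1.0, 0.2 + 0.1j, -0.1j, 0.3],
     [0, 0.9, 0.1 - 0.2j, 0.2j],
     [0, 0, 0.8, -0.1 + 0.1j],
     [0, 0, 0, 0.7]]
tx_bits = [0, 1, 1, 0, 1, 1, 0, 0]
s_tx = [qpsk_map(tx_bits[2 * k], tx_bits[2 * k + 1]) for k in range(4)]
y_hat = [sum(R[i][j] * s_tx[j] for j in range(i, 4)) for i in range(4)]
llr, hard = ml_demod_full_search(y_hat, R)
```

In this noiseless case the hard bits reproduce `tx_bits`. Each of the 256 metrics updates one of the two running minima for every bit, which is the 128/128 split described above.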



### **Block Diagram**





# Input/Output

| Signal Name | I/O | Width | Description |
|-------------|-----|-------|-------------|
| i_clk       | I   | 1     | Clock. The system is a synchronous design triggered on the rising clock edge.<br>(Note: the host sends data on the rising edge of clk.) |
| i_reset     | I   | 1     | Active-high asynchronous system reset. |
| i_trig      | I   | 1     | Input-data-valid control signal. When high, i_y_hat and i_r are valid. |
| i_y_hat     | I   | 160   | $\hat{y}$ data: 4 entries of 40 bits each. i_y_hat[159:120] is $\hat{y}_4$, i_y_hat[119:80] is $\hat{y}_3$, and so on. Each entry contains an imaginary and a real part ({imaginary, real}), 20 bits each, in {S3.16} fixed point. |
| i_r         | I   | 320   | R data, ordered as {r<sub>44</sub>, r<sub>34</sub>, r<sub>24</sub>, r<sub>14</sub>, r<sub>33</sub>, r<sub>23</sub>, r<sub>13</sub>, r<sub>22</sub>, r<sub>12</sub>, r<sub>11</sub>}. r<sub>ii</sub> contains only a real part, 20 bits in {S3.16} fixed point. r<sub>ij</sub> contains an imaginary and a real part ({imaginary, real}), 20 bits each, also in {S3.16} fixed point. |
| i_rd_rdy    | I   | 1     | Read-ready control signal. When high, the host is ready to receive data. |
| o_rd_vld    | O   | 1     | Output-data-valid control signal. When high, the current o_llr and o_hard_bit are valid. |
| o_llr       | O   | 8     | LLR output. The LLR of one bit is output at a time, in {S3.4} fixed point. See ML demodulation for details. |
| o_hard_bit  | O   | 1     | Hard-bit output. The hard bit of one bit is output at a time. See ML demodulation for details. |

**{SA.B}:** fixed point with sign bit, A-bit integer, and B-bit fraction
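As an illustration of the {SA.B} notation, the sketch below converts between floating-point values and two's-complement words of width 1 + A + B. The function names are hypothetical, and the rounding (Python's `round`, nearest with ties to even) and saturation behavior are assumptions; the specification here does not say how out-of-range or in-between values should be handled.

```python
def to_fixed(value, int_bits, frac_bits):
    """Quantize a float to {S int_bits . frac_bits} and return the raw
    two's-complement word as a non-negative int.
    Assumptions: round to nearest, saturate on overflow."""
    width = 1 + int_bits + frac_bits
    scaled = round(value * (1 << frac_bits))
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    scaled = max(lo, min(hi, scaled))
    return scaled & ((1 << width) - 1)

def from_fixed(word, int_bits, frac_bits):
    """Interpret a raw two's-complement word as a {S int_bits . frac_bits} value."""
    width = 1 + int_bits + frac_bits
    if word & (1 << (width - 1)):   # sign bit set -> negative value
        word -= 1 << width
    return word / (1 << frac_bits)

one_s3_16 = to_fixed(1.0, 3, 16)        # 20-bit word 0x10000
minus_one_s3_16 = to_fixed(-1.0, 3, 16) # 20-bit word 0xF0000
llr_s3_4 = to_fixed(1.5, 3, 4)          # 8-bit word 24 (1.5 * 16)
llr_saturated = to_fixed(100.0, 3, 4)   # clipped to the largest S3.4 word, 127
```

With these widths, {S3.16} covers [-8, 8) in steps of $2^{-16}$ and {S3.4} covers [-8, 8) in steps of $2^{-4}$.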

#### **Data format**



- i\_y\_hat and i\_r
  - Real and imaginary are both S3.16



|      | r<sub>44</sub> (MSB) | r<sub>34</sub> | r<sub>24</sub> | r<sub>14</sub> | r<sub>33</sub> | r<sub>23</sub> | r<sub>13</sub> | r<sub>22</sub> | r<sub>12</sub> | r<sub>11</sub> (LSB) |
|------|----|--------|--------|--------|----|--------|--------|----|--------|----|
| i\_r | re | im, re | im, re | im, re | re | im, re | im, re | re | im, re | re |

re: real

im: imaginary
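The bus layouts can be made concrete with a small unpacking sketch for a testbench or reference model. The function names are hypothetical, and the bit positions are derived from the field order and widths given in the Input/Output table ($r_{11}$ in the least significant bits, diagonal entries 20 bits, off-diagonal entries 40 bits as {imaginary, real}).

```python
def sext20(word):
    """Interpret the low 20 bits of word as a signed S3.16 value."""
    word &= 0xFFFFF
    if word & 0x80000:
        word -= 1 << 20
    return word / 65536.0

def unpack_y_hat(bus):
    """160-bit int -> [y1, y2, y3, y4]. y_k occupies bits [40k-1 : 40(k-1)],
    imaginary part in the upper 20 bits, real part in the lower 20 bits."""
    out = []
    for k in range(4):
        field = (bus >> (40 * k)) & ((1 << 40) - 1)
        out.append(complex(sext20(field), sext20(field >> 20)))
    return out

# Field order from LSB to MSB for {r44, r34, r24, r14, r33, r23, r13, r22, r12, r11}
R_FIELDS = [(1, 1), (1, 2), (2, 2), (1, 3), (2, 3), (3, 3),
            (1, 4), (2, 4), (3, 4), (4, 4)]

def unpack_r(bus):
    """320-bit int -> 4x4 upper-triangular matrix as a list of rows."""
    R = [[0j] * 4 for _ in range(4)]
    pos = 0
    for i, j in R_FIELDS:
        if i == j:   # diagonal: real part only, 20 bits
            R[i - 1][j - 1] = complex(sext20(bus >> pos), 0.0)
            pos += 20
        else:        # off-diagonal: {imaginary, real}, 40 bits
            R[i - 1][j - 1] = complex(sext20(bus >> pos), sext20(bus >> (pos + 20)))
            pos += 40
    return R

# Example: y1 = 1.0 - 0.5j, y4 = 0.25; r11 = 1.0, r12 = 0.5 + 0.25j, r44 = 2.0
y_bus = (0xF8000 << 20) | 0x10000 | (0x04000 << 120)
r_bus = 0x10000 | (0x08000 << 20) | (0x04000 << 40) | (0x20000 << 300)
y_vals = unpack_y_hat(y_bus)
r_vals = unpack_r(r_bus)
```

The field widths add up to 4 × 20 + 6 × 40 = 320 bits for i\_r and 4 × 40 = 160 bits for i\_y\_hat, matching the port widths.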

#### Waveform










- Output buffer depth evaluation
  - i\_rd\_rdy goes high at random times, for 128T out of every 640T
  - The worst case:



  - The buffer should store the outputs of at least 1024T / 64T = 16 REs

### **Specification**



- Only the worst-case library is used for synthesis and APR.
- The slack for setup time should be non-negative.
- No timing violations or glitches are allowed in the gate-level simulation and the post-layout simulation.

### **Specifications for APR (1)**



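- Only a macro layout is required (no IO pads or bonding pads)
- Set the VDD and VSS power ring widths to 2 um each; only one pair is needed
- Dummy metal is not required
- At least one set of power stripes must be added, with VDD and VSS widths of 2 um each
  - At least one set in the vertical direction; horizontal stripes are optional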



### **Specifications for APR (2)**



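- Power rails (follow pins) must be added
- Core fillers must be added
- The GDSII file must be generated after APR
- APR must be completed with no DRC/LVS errors
- Remember to generate ml\_demodulator.ioc first, then reload that file to set the pin positions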

### **Grading Policy**



Baseline 50% + Performance 40% + Report 10%

| Item           | %  | Description                                                    |
|----------------|----|----------------------------------------------------------------|
| RTL Simulation | 20 | Pass full pattern simulation with specs                        |
| Synthesis      | 10 | Pass gate-level simulation                                     |
| APR            | 20 | Finish APR with no DRC/LVS errors; pass post-layout simulation |
| Performance    | 40 | Area × Time × Power                                            |
| Report         | 10 | 1. Algorithm 2. Performance 3. Hardware implementation         |

| Violation                                | Penalty                |
|------------------------------------------|------------------------|
| Gate-level sim pass but post-sim fail    | Performance × 0.5      |
| Only RTL pass                            | Performance not graded |
| Violating the submission format or rules | Total score - 3        |

#### **Grading Policy - Test Pattern**



- AWGN channel
- 6 packets in total
  - 3 packets: SNR = 10 dB, Data Error Rate < 0.12
  - 3 packets: SNR = 15 dB, Data Error Rate < 0.01
  - Any data with LLR == 0 is counted as failed data
- 1000 data REs in each packet
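A self-check of the error rate could look like the sketch below. The exact definition of Data Error Rate used by the grading testbench is not given here, so this assumes a per-bit rate in which a bit counts as an error when its LLR is exactly 0 or when its hard decision (1 for a negative LLR) differs from the reference bit. The function name and the sample values are hypothetical.

```python
def data_error_rate(llrs, ref_bits):
    """Per-bit error rate (assumed definition). A bit is an error if its LLR
    is exactly 0 or if its hard decision differs from the reference bit."""
    if len(llrs) != len(ref_bits):
        raise ValueError("llrs and ref_bits must have the same length")
    errors = 0
    for llr, ref in zip(llrs, ref_bits):
        hard = 1 if llr < 0 else 0
        if llr == 0 or hard != ref:
            errors += 1
    return errors / len(llrs)

# One RE worth of made-up LLRs and reference bits
llrs = [3.5, -2.0, 0.0, 1.25, -0.5, 4.0, -1.0, 2.0]
ref = [0, 1, 0, 1, 1, 0, 1, 0]
rate = data_error_rate(llrs, ref)
```

Here the third bit fails because its LLR is 0 and the fourth because its hard decision is wrong, so `rate` is 2/8.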

## **Grading Policy - Report**



- Algorithm
  - ML demod. algorithm introduction
  - FXP setting
- Performance
  - The plot with Data Error Rate vs. SNR
- HW implementation
  - HW scheduling
  - HW block diagram
  - Area / Power / Latency report
    - Technique sharing for HW improvement



### **Submission (1)**



- Due Tuesday, June 6, 23:59
  - No late submission
- Required data (with the required directory hierarchy):

| Directory | Required files                               |
|-----------|----------------------------------------------|
| 01_RTL    | 1. All design Verilog files 2. rtl.f         |
| 02_SYN    | 1. Area/timing reports                       |
| 03_GATE   | 1. ml_demodulator_syn.v/sdf 2. rtl.f         |
| 04_APR    | 1. All design database 2. ml_demodulator.gds |
| 05_POST   | 1. ml_demodulator_pr.v/sdf 2. rtl.f          |
| reports   | 1. design.spec. 2. teamXX_report.pdf         |

- Final project presentation (MTK experience sharing)
  - Date: June 13, 2023

### Submission (2)

Create a folder named teamID\_final\_project and follow the hierarchy below

```
team03_final_project
├── 01_RTL
│   ├── ml_demodulator.v (and other verilog files)
│   └── rtl.f
├── 02_SYN
│   ├── ml_demodulator.area
│   ├── ml_demodulator.max.timing
│   └── ml_demodulator.min.timing
├── 03_GATE
│   ├── ml_demodulator_syn.sdf
│   ├── ml_demodulator_syn.v
│   └── rtl.f
├── 04_APR
│   ├── route
│   ├── route.dat
│   └── ml_demodulator.gds
├── 05_POST
│   ├── ml_demodulator_pr.sdf
│   ├── ml_demodulator_pr.v
│   └── rtl.f
└── reports
    ├── design.spec
    └── team03_report.pdf
```

- Compress the folder teamID\_final\_project into a tar file named teamID\_final\_project\_vk.tar (k is the version number, k = 1, 2, ...), e.g. team03\_final\_project\_v1.tar
- Submit to NTU Cool

#### Reference



- [1] 5G 3GPP spec 38.211: Link
- [2] Parallel High Throughput Soft-output Sphere Decoder: Link
- [3] Gram-Schmidt process: Link