#### **Computer-Aided VLSI System Design**

# Final Project: 5G MIMO Demodulation: QR Decomposition

**Lecturer: Yi-Lin Lo** 

Graduate Institute of Electronics Engineering, National Taiwan University

MediaTek




#### **Overview**



MIMO (multiple input, multiple output) is an antenna technology for wireless communications. The encode flow is as follows:



- At the receiver, the data is decoded by reversing the encode flow
- In this project, we implement part of a simple MIMO receiver to demodulate the RX data
  - AWGN (additive white Gaussian noise) channel

### **System Model**



The received signal $\underline{y}$ per data RE (resource element) can be expressed as [1]

$$\underline{y} = H\underline{\tilde{s}} + \underline{n}$$

- H: channel, $\underline{\tilde{s}}$: transmitted symbol, $\underline{n}$: noise
- For a 4TX \* 4RX transmission, the formula can be rewritten as

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} H_{11} & H_{12} & H_{13} & H_{14} \\ H_{21} & H_{22} & H_{23} & H_{24} \\ H_{31} & H_{32} & H_{33} & H_{34} \\ H_{41} & H_{42} & H_{43} & H_{44} \end{bmatrix} \begin{bmatrix} \tilde{s}_1 \\ \tilde{s}_2 \\ \tilde{s}_3 \\ \tilde{s}_4 \end{bmatrix} + \begin{bmatrix} n_1 \\ n_2 \\ n_3 \\ n_4 \end{bmatrix}$$

- The MIMO receiver demodulates $\underline{\tilde{s}}$ from $\underline{y}$ and H in the presence of the noise $\underline{n}$.
- $\tilde{s}_1, \tilde{s}_2, \tilde{s}_3$ and $\tilde{s}_4$ are the modulated symbols (QPSK, 2-bit data each)
- The output of the MIMO receiver is the LLR per bit
  - LLR: log likelihood ratio. A positive value means the bit is more likely to be 0 than 1, and vice versa
  - Total 8 LLRs per data RE (4 symbols \* 2 bits (QPSK))

#### **System Model**



Following [1], the MIMO receiver is composed of QR decomposition (QRD) and Maximum Likelihood (ML) demodulation



#### **Tx2Rx Model**



$$\underline{y} = H\underline{\tilde{s}} + \underline{n}$$

- Modulator: transmitted signal $\underline{\tilde{s}}$
  - QPSK: pairs of bits are mapped to complex-valued modulation symbols
- MIMO
  - Channel: multiply by the channel matrix H
    - 4x4 matrix of complex numbers
    - Normally distributed random matrix
    - $H \sim N(0, 1/4)$
  - AWGN: add noise $\underline{n}$
    - Adds white Gaussian noise
    - $\underline{n} \sim N(0, 1)$
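The Tx2Rx model can be sketched in a few lines of Python. This is only an illustrative sketch, not the course's Matlab model. It assumes that the stated variances (1/4 for H, 1 for the noise) are the total variance of each complex element, split evenly between real and imaginary parts, and that bit pairs map to QPSK symbols in the order listed in the appendix ((0,0) to the symbol with both parts positive). The helper names `cgauss`, `qpsk`, and `tx2rx` are hypothetical.

```python
import math
import random

def cgauss(rng, var):
    """Complex Gaussian sample with total variance `var` (assumed split var/2 per part)."""
    sigma = math.sqrt(var / 2.0)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))

def qpsk(b0, b1):
    """Assumed mapping: bit 0 selects the sign of the real part, bit 1 the imaginary part."""
    return complex(1 - 2 * b0, 1 - 2 * b1) / math.sqrt(2)

def tx2rx(bits, rng, noise_var=1.0):
    """Map 8 bits to 4 QPSK symbols, apply a random 4x4 channel, add noise."""
    s = [qpsk(bits[2 * k], bits[2 * k + 1]) for k in range(4)]
    H = [[cgauss(rng, 0.25) for _ in range(4)] for _ in range(4)]
    n = [cgauss(rng, noise_var) for _ in range(4)]
    y = [sum(H[i][j] * s[j] for j in range(4)) + n[i] for i in range(4)]
    return H, s, n, y

rng = random.Random(0)
H, s, n, y = tx2rx([0, 0, 1, 0, 0, 1, 1, 1], rng)
```

The `noise_var` parameter is exposed because the test packets are generated at specific SNRs (10 dB and 15 dB); how the noise is scaled for those packets is not stated here.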



### **QR Decomposition (QRD)**



- Motivation
  - Reduce the complexity of Maximum Likelihood (ML) demodulation
  - $\hat{\underline{s}} = \arg\min_{\underline{s} \in A} \|\underline{y} - H\underline{s}\|^2$
    - A: the set of all combinations of the 4 transmitted symbols $(s_1 \sim s_4)$
- With QR decomposition, the signal model can be rewritten as

$$\underline{y} = H\underline{s} + \underline{n}$$

$$\underline{y} = (QR)\underline{s} + \underline{n}$$

$$Q^{H}\underline{y} = Q^{H}QR\underline{s} + Q^{H}\underline{n}$$

$$\hat{\underline{y}} = R\underline{s} + \underline{v}$$

  - Q: an orthogonal (unitary) matrix, where $Q^{H}Q = I$
  - R: an upper triangular matrix
  - $\hat{\underline{y}} = Q^{H}\underline{y}$ and $\underline{v} = Q^{H}\underline{n}$
- The ML demodulation problem becomes $\hat{\underline{s}} = \arg\min_{\underline{s} \in A} \|\hat{\underline{y}} - R\underline{s}\|^2$
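The two ML problems have the same minimizer because multiplying by $Q^H$ does not change the Euclidean norm. A short derivation, assuming H is a full-rank 4x4 matrix so that Q is a square unitary matrix ($Q^HQ = QQ^H = I$):

$$\begin{aligned}
\|\underline{y} - H\underline{s}\|^2 &= \|\underline{y} - QR\underline{s}\|^2 \\
&= (\underline{y} - QR\underline{s})^H QQ^H (\underline{y} - QR\underline{s}) \\
&= \|Q^H\underline{y} - Q^HQR\underline{s}\|^2 \\
&= \|\hat{\underline{y}} - R\underline{s}\|^2
\end{aligned}$$

Because R is upper triangular, the $i^{th}$ entry of $\hat{\underline{y}} - R\underline{s}$ depends only on $s_i, \dots, s_4$, which is what reduces the search complexity.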

#### **Modified Gram-Schmidt**



- You can find the concept of the Gram-Schmidt procedure in Prof. Hung-Yi Lee's Linear Algebra course [2]
  - Starting from 21:30
- In this project, we use the Modified Gram-Schmidt procedure [3]
  - Please refer to **Numerical stability** in [3]

$$H = [h_1|h_2|h_3|h_4] = [e_1|e_2|e_3|e_4] \begin{bmatrix} h_1 \cdot e_1 & h_2 \cdot e_1 & h_3 \cdot e_1 & h_4 \cdot e_1 \\ 0 & h_2 \cdot e_2 & h_3 \cdot e_2 & h_4 \cdot e_2 \\ 0 & 0 & h_3 \cdot e_3 & h_4 \cdot e_3 \\ 0 & 0 & 0 & h_4 \cdot e_4 \end{bmatrix} = QR$$

- 1. Calculate Euclidean distance $\|h_1^{(0)}\|$ // $R_{11}$
- 2. Calculate normalized orthogonal vector $e_1 = h_1^{(0)} / \|h_1^{(0)}\|$ // $Q_{11}, Q_{21}, Q_{31}, Q_{41}$
- 3. Calculate inner products
  - $h_2^{(0)} \cdot e_1 = e_1^H h_2^{(0)}$ // $R_{12}$
  - $h_3^{(0)} \cdot e_1 = e_1^H h_3^{(0)}$ // $R_{13}$
  - $h_4^{(0)} \cdot e_1 = e_1^H h_4^{(0)}$ // $R_{14}$

Note: Euclidean distance is $\sqrt{a^2 + b^2}$ for $(a + bj)$

$$\operatorname{proj}_{\mathbf{u}}(\mathbf{v}) = \frac{\langle \mathbf{v}, \mathbf{u} \rangle}{\langle \mathbf{u}, \mathbf{u} \rangle} \mathbf{u}$$

- 4. Calculate orthogonal vectors $h_2^{(1)}$, $h_3^{(1)}$, $h_4^{(1)}$ by removing their projections along $h_1$
  - $h_2^{(1)} = h_2^{(0)} - \operatorname{proj}_{h_1}(h_2^{(0)}) = h_2^{(0)} - (h_2^{(0)} \cdot e_1)e_1 = h_2^{(0)} - R_{12}e_1$
  - $h_3^{(1)} = h_3^{(0)} - \operatorname{proj}_{h_1}(h_3^{(0)}) = h_3^{(0)} - (h_3^{(0)} \cdot e_1)e_1 = h_3^{(0)} - R_{13}e_1$
  - $h_4^{(1)} = h_4^{(0)} - \operatorname{proj}_{h_1}(h_4^{(0)}) = h_4^{(0)} - (h_4^{(0)} \cdot e_1)e_1 = h_4^{(0)} - R_{14}e_1$

- 1. Calculate Euclidean distance $\|h_2^{(1)}\|$ // $R_{22}$
- 2. Calculate normalized orthogonal vector $e_2 = h_2^{(1)} / \|h_2^{(1)}\|$
- 3. Calculate inner products
  - $h_3^{(1)} \cdot e_2 = e_2^H h_3^{(1)}$ // $R_{23}$
  - $h_4^{(1)} \cdot e_2 = e_2^H h_4^{(1)}$ // $R_{24}$



- 4. Calculate orthogonal vectors $h_3^{(2)}$, $h_4^{(2)}$ by removing their projections along $h_2$
  - $h_3^{(2)} = h_3^{(1)} - \operatorname{proj}_{h_2}(h_3^{(1)}) = h_3^{(1)} - (h_3^{(1)} \cdot e_2)e_2 = h_3^{(1)} - R_{23}e_2$
  - $h_4^{(2)} = h_4^{(1)} - \operatorname{proj}_{h_2}(h_4^{(1)}) = h_4^{(1)} - (h_4^{(1)} \cdot e_2)e_2 = h_4^{(1)} - R_{24}e_2$

- 1. Calculate Euclidean distance $\|h_3^{(2)}\|$ // $R_{33}$
- 2. Calculate normalized orthogonal vector $e_3 = h_3^{(2)} / \|h_3^{(2)}\|$
- 3. Calculate inner products
  - $h_4^{(2)} \cdot e_3 = e_3^H h_4^{(2)}$ // $R_{34}$



- 4. Calculate orthogonal vector $h_4^{(3)}$ by removing its projection along $h_3$
  - $h_4^{(3)} = h_4^{(2)} - \operatorname{proj}_{h_3}(h_4^{(2)}) = h_4^{(2)} - (h_4^{(2)} \cdot e_3)e_3 = h_4^{(2)} - R_{34}e_3$

- 1. Calculate Euclidean distance $\|h_4^{(3)}\|$ // $R_{44}$
- 2. Calculate normalized orthogonal vector $e_4 = h_4^{(3)} / \|h_4^{(3)}\|$ // $Q_{14}, Q_{24}, Q_{34}, Q_{44}$
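The four steps above repeat once per column, which can be sketched as a floating-point reference model. This is a sketch for checking results, not the fixed-point hardware design. The function name `mgs_qr` and the example matrix are hypothetical; the matrix is chosen to be diagonally dominant so that it is guaranteed to be full rank.

```python
import math

def mgs_qr(H):
    """Modified Gram-Schmidt QR of a square complex matrix given as a list of rows."""
    n = len(H)
    h = [[H[i][j] for i in range(n)] for j in range(n)]  # h[j] is column j of H
    q_cols = []
    R = [[0j] * n for _ in range(n)]
    for k in range(n):
        # Step 1: Euclidean norm of the current column -> R_kk (real, positive)
        norm = math.sqrt(sum(abs(v) ** 2 for v in h[k]))
        R[k][k] = complex(norm)
        # Step 2: normalized orthogonal vector e_k (column k of Q)
        e = [v / norm for v in h[k]]
        q_cols.append(e)
        for j in range(k + 1, n):
            # Step 3: inner product R_kj = e_k^H h_j, using the already-updated h_j
            r = sum(ev.conjugate() * hv for ev, hv in zip(e, h[j]))
            R[k][j] = r
            # Step 4: remove the projection along e_k right away (the "modified" part)
            h[j] = [hv - r * ev for hv, ev in zip(h[j], e)]
    Q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R

H = [[4, 1j, 0.5, 0],
     [1, 4 + 1j, 0, -0.5j],
     [0, 0.5, 4, 1],
     [0.5j, 0, 1, 4 - 1j]]
Q, R = mgs_qr(H)
```

Multiplying `Q` by `R` should reproduce `H` up to rounding, and the diagonal of `R` is real, which is consistent with the `o_r` output carrying only the real part of r<sub>ii</sub>.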

- Received signal y\_hat $(\hat{\underline{y}})$ becomes

$$\hat{\underline{y}} = Q^H \underline{y} = \begin{bmatrix} Q_{11}^* & Q_{21}^* & Q_{31}^* & Q_{41}^* \\ Q_{12}^* & Q_{22}^* & Q_{32}^* & Q_{42}^* \\ Q_{13}^* & Q_{23}^* & Q_{33}^* & Q_{43}^* \\ Q_{14}^* & Q_{24}^* & Q_{34}^* & Q_{44}^* \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} Q_{11}^* y_1 + Q_{21}^* y_2 + Q_{31}^* y_3 + Q_{41}^* y_4 \\ Q_{12}^* y_1 + Q_{22}^* y_2 + Q_{32}^* y_3 + Q_{42}^* y_4 \\ Q_{13}^* y_1 + Q_{23}^* y_2 + Q_{33}^* y_3 + Q_{43}^* y_4 \\ Q_{14}^* y_1 + Q_{24}^* y_2 + Q_{34}^* y_3 + Q_{44}^* y_4 \end{bmatrix}$$
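The index pattern in this product is easy to get wrong: entry k of y\_hat uses the conjugate of column k of Q, not row k. A minimal sketch follows; the function name `qh_times_y` is hypothetical, and the example Q is a scaled 4x4 Hadamard matrix chosen only because it is an easy-to-check unitary matrix, not because it comes from a real channel.

```python
def qh_times_y(Q, y):
    """Return y_hat = Q^H y for Q given as a list of rows."""
    n = len(y)
    # Row k of Q^H is the conjugate of column k of Q.
    return [sum(Q[i][k].conjugate() * y[i] for i in range(n)) for k in range(n)]

# Scaled Hadamard matrix: real, symmetric, and unitary (Q^H Q = I).
Q = [[0.5, 0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5, -0.5],
     [0.5, 0.5, -0.5, -0.5],
     [0.5, -0.5, -0.5, 0.5]]
y = [1 + 0j, 1j, -1 + 0j, -1j]
y_hat = qh_times_y(Q, y)
```

Since Q is unitary, the squared norms of `y` and `y_hat` are equal, which gives a cheap sanity check for an implementation.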

Soft-bit calculation [1]

$$LLR: L(x_{k,b}|\underline{y}) \approx \min_{\underline{x} \in X_{k,b,1}} \|\underline{y} - H\underline{s}\|^2 - \min_{\underline{x} \in X_{k,b,0}} \|\underline{y} - H\underline{s}\|^2, \quad \underline{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} (x_{1,1}, x_{1,2}) \\ (x_{2,1}, x_{2,2}) \\ (x_{3,1}, x_{3,2}) \\ (x_{4,1}, x_{4,2}) \end{bmatrix} \Leftrightarrow \underline{s}$$

- $x_{k,b}$: the $b^{th}$ bit in $x_k$, the $k^{th}$ entry of $\underline{x}$
- $X_{k,b,1}$: subset of $\{\underline{x}\}$ with the $b^{th}$ bit in the $k^{th}$ entry = 1
- $X_{k,b,0}$: subset of $\{\underline{x}\}$ with the $b^{th}$ bit in the $k^{th}$ entry = 0

$$\|\underline{y} - H\underline{s}\|^{2} \Rightarrow \|\hat{\underline{y}} - R\underline{s}\|^{2} = \left\| \begin{bmatrix} \hat{y}_{1} \\ \hat{y}_{2} \\ \hat{y}_{3} \\ \hat{y}_{4} \end{bmatrix} - \begin{bmatrix} R_{11} & R_{12} & R_{13} & R_{14} \\ 0 & R_{22} & R_{23} & R_{24} \\ 0 & 0 & R_{33} & R_{34} \\ 0 & 0 & 0 & R_{44} \end{bmatrix} \begin{bmatrix} s_{1} \\ s_{2} \\ s_{3} \\ s_{4} \end{bmatrix} \right\|^{2} = \sum_{i=1}^{4} \left| \hat{y}_{i} - \sum_{j=i}^{4} R_{ij} s_{j} \right|^{2}$$

- Hard-bit calculation
  - $L(x_{k,b}|y)$  's sign-bit = 0, hard-bit out = 0
  - $L(x_{k,b}|y)$  's sign-bit = 1, hard-bit out = 1

### **Block Diagram**





# Input/Output

| Signal Name | I/O | Width | Simple Description |
|-------------|-----|-------|--------------------|
| i_clk       | I   | 1     | The system is a synchronous design triggered on the rising clock edge.<br>(Note: the host sends data on the rising edge of clk.) |
| i_rst       | I   | 1     | Active-high asynchronous system reset. |
| i_trig      | I   | 1     | Input data valid control signal. i_data is valid when this signal is high. |
| i_data      | I   | 48    | Each datum contains an imaginary part and a real part ({imaginary, real}), 24 bits each, in {S1.22} fixed point. |
| o_rd_vld    | O   | 1     | Output data valid control signal. When high, the current o_y_hat and o_r outputs are valid. |
| o_last_data | O   | 1     | When high, i_trig goes high at the next rising edge of i_clk and data transfer starts (200 cycles). |
| o_y_hat     | O   | 160   | y_hat data output, containing 4 entries of 40 bits each: o_y_hat[159:120] is y<sub>4</sub>, o_y_hat[119:80] is y<sub>3</sub>, and so on. Each entry contains an imaginary part and a real part ({imaginary, real}), 20 bits each, in {S3.16} fixed point. |
| o_r         | O   | 320   | R data output, ordered as {r<sub>44</sub>, r<sub>34</sub>, r<sub>24</sub>, r<sub>14</sub>, r<sub>33</sub>, r<sub>23</sub>, r<sub>13</sub>, r<sub>22</sub>, r<sub>12</sub>, r<sub>11</sub>}. r<sub>ii</sub> contains only the real part, 20 bits, in {S3.16} fixed point. r<sub>ij</sub> contains an imaginary part and a real part ({imaginary, real}), 20 bits each, also in {S3.16} fixed point. |

{SA.B}: fixed point with sign bit, A-bit integer, and B-bit fraction
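The {SA.B} notation implies a total width of 1 + A + B bits, so {S1.22} is 24 bits and {S3.16} is 20 bits. The sketch below converts between real values and raw two's-complement fields. The function names are hypothetical, and the saturation on overflow is an assumption made for illustration; the spec does not say whether values saturate or wrap.

```python
def to_fixed(x, int_bits, frac_bits):
    """Quantize x to {S int_bits . frac_bits}; return the raw two's-complement field."""
    width = 1 + int_bits + frac_bits
    raw = int(round(x * (1 << frac_bits)))  # Python rounds exact ties to even
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    raw = max(lo, min(hi, raw))             # saturation is an assumption
    return raw & ((1 << width) - 1)

def from_fixed(bits, int_bits, frac_bits):
    """Interpret a raw two's-complement field as a real value."""
    width = 1 + int_bits + frac_bits
    if bits & (1 << (width - 1)):           # sign bit set -> negative value
        bits -= 1 << width
    return bits / (1 << frac_bits)

half_s1_22 = to_fixed(0.5, 1, 22)        # 24-bit input format
minus_one_s3_16 = to_fixed(-1.0, 3, 16)  # 20-bit output format
```

With these widths, {S1.22} covers the range [-2, 2) and {S3.16} covers [-8, 8).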

#### **Data format**



- o\_y\_hat and o\_r
  - Real and imaginary are both S3.16



re: real, im: imaginary

|     | MSB             |                 |                 |                 |                 |                 |                 |                 |                 | LSB             |
|-----|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|
| o_r | r<sub>44</sub> | r<sub>34</sub> | r<sub>24</sub> | r<sub>14</sub> | r<sub>33</sub> | r<sub>23</sub> | r<sub>13</sub> | r<sub>22</sub> | r<sub>12</sub> | r<sub>11</sub> |
|     | re              | im, re          | im, re          | im, re          | re              | im, re          | im, re          | re              | im, re          | re              |
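The `o_r` layout can be checked with a small packing model. The bit positions below are derived from the stated ordering and field widths (four 20-bit diagonal entries plus six 40-bit off-diagonal entries, 320 bits in total); they are an inference from the table, so treat them as an assumption to verify against the testbench. The names `ORDER`, `pack_o_r`, and `unpack_o_r` are hypothetical.

```python
# (row, column) of each R entry, from MSB to LSB of o_r
ORDER = [(4, 4), (3, 4), (2, 4), (1, 4), (3, 3),
         (2, 3), (1, 3), (2, 2), (1, 2), (1, 1)]
MASK20 = 0xFFFFF

def pack_o_r(r_raw):
    """r_raw maps (i, j) -> (re, im) raw 20-bit fields; returns the 320-bit word."""
    word = 0
    for (i, j) in ORDER:
        re, im = r_raw[(i, j)]
        if i == j:
            word = (word << 20) | (re & MASK20)            # diagonal: real part only
        else:
            word = (word << 40) | ((im & MASK20) << 20) | (re & MASK20)  # {im, re}
    return word

def unpack_o_r(word):
    """Inverse of pack_o_r; diagonal entries get an imaginary field of 0."""
    out = {}
    for (i, j) in reversed(ORDER):                         # walk from the LSB upward
        re = word & MASK20
        word >>= 20
        im = 0
        if i != j:
            im = word & MASK20
            word >>= 20
        out[(i, j)] = (re, im)
    return out
```

Under this layout r<sub>11</sub> occupies bits [19:0], r<sub>12</sub> occupies [59:20] with its real part in the lower 20 bits, and r<sub>44</sub> occupies [319:300].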

### I/O Spec



#### Input:

- i\_trig: stays high for 200 consecutive cycles; each cycle carries one element of H or y
- i\_data: complex number, S1.22 for each element
- Order:

$$\begin{aligned} &H_{11} \rightarrow H_{12} \rightarrow H_{13} \rightarrow H_{14} \rightarrow Y_1 \rightarrow H_{21} \rightarrow H_{22} \rightarrow H_{23} \rightarrow H_{24} \rightarrow Y_2 \\ &\rightarrow H_{31} \rightarrow H_{32} \rightarrow H_{33} \rightarrow H_{34} \rightarrow Y_3 \rightarrow H_{41} \rightarrow H_{42} \rightarrow H_{43} \rightarrow H_{44} \rightarrow Y_4 \end{aligned}$$
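The input order can be modelled as follows for one RE, that is, 20 consecutive `i_data` words. This is a sketch for building a reference model, with hypothetical function names. It assumes the 200-cycle burst is 10 REs of 20 words each, which is inferred from the order above and from "last RE (#10)" in the output description.

```python
def s1_22(raw24):
    """Interpret a 24-bit two's-complement field as an {S1.22} value."""
    if raw24 & 0x800000:
        raw24 -= 1 << 24
    return raw24 / (1 << 22)

def decode_word(word48):
    """i_data is {imaginary, real}: imaginary in bits [47:24], real in bits [23:0]."""
    re = s1_22(word48 & 0xFFFFFF)
    im = s1_22((word48 >> 24) & 0xFFFFFF)
    return complex(re, im)

def split_re(words):
    """Split the 20 words of one RE into the 4x4 matrix H (list of rows) and the vector y."""
    if len(words) != 20:
        raise ValueError("one RE is expected to be 20 words")
    H, y = [], []
    for row in range(4):
        chunk = [decode_word(w) for w in words[5 * row:5 * row + 5]]
        H.append(chunk[:4])   # H_row1 .. H_row4
        y.append(chunk[4])    # Y_row follows its H row
    return H, y
```

Each row of H arrives together with the matching element of y, so a design may not need to wait for all 20 words before starting to process the first columns.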

#### Output:

- o\_rd\_vld: set high when one RE of R and y\_hat is ready
- o\_last\_data: set high when the last RE (#10) of R and y\_hat is ready
- R: S3.16 for each element
- y\_hat: S3.16 for each element

# **Spec - Interface**





### **Specification**



- The clock frequency should be at least 200 MHz
- Only worst-case library is used for synthesis and APR.
- The slack for setup-time should be non-negative.
- No timing violations or glitches are allowed in the gate-level simulation and post-layout simulation.

#### **Design Files**



 Use the functions in the "Matlab" file to evaluate the soft LLR error rate of the files generated by RTL simulation



Change the "packet\_no" to evaluate files from packets

### **Specifications for APR (1)**



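- Only a macro layout is required (i.e., no IO pads or bonding pads)
- Set the VDD and VSS power ring widths to 2 um each; only one set is needed
- No dummy metal is needed
- At least one set of power stripes must be added, with VDD and VSS widths of 2 um each
  - At least one set of vertical power stripes; horizontal stripes are optional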



### **Specifications for APR (2)**



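- Power rails (follow pins) must be added
- Core fillers must be added
- The GDSII file must be generated after APR
- APR must finish with DRC/LVS completely clean
- Remember to generate QR\_Engine.ioc first, then reload that file to set the pin positions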

### **Grading Policy**



Baseline 50% + Performance 40% + Report 10%

| Item           | %  | Description                                                    |
|----------------|----|----------------------------------------------------------------|
| RTL Simulation | 20 | Pass full pattern simulation with specs                        |
| Synthesis      | 10 | Pass gate-level sim                                            |
| APR            | 20 | Finish APR with no DRC/LVS errors; pass post-layout simulation |
| Performance    | 40 | Area x Time x Power / (Performance Gain)                       |
| Report         | 10 | <ol> <li>Algorithm</li> <li>Hardware implementation</li> </ol> |

| Violation                                   | Penalty         |
|---------------------------------------------|-----------------|
| Clock Frequency of Gate-level sim < 200 MHz | Performance*0.5 |
| Gate-level sim pass but post-sim fail       | Performance*0.5 |
| Only RTL pass                               | Performance not graded |
| Violation of submission format or rules     | Total score -5  |

#### **Grading Policy - Test Pattern**



- Total 6 packets: P#1-3 (SNR 10dB) and P#4-6 (SNR 15dB)
- Another 6 packets of hidden data: P#7-9 and P#10-12

| Soft LLR Error Rate (SNR)   |       |       |        |       |        |
|-----------------------------|-------|-------|--------|-------|--------|
| Soft LLR Error Rate (10 dB) | < 2 % | < 6%  | < 10 % | < 13% | >= 13% |
| Soft LLR Error Rate (15 dB) | < 1%  | < 2%  | < 4%   | < 6%  | >= 6%  |
| Performance Gain            | 1.76x | 1.32x | 1.14x  | 1.00x | fail   |

- For example, if a student gets
  - 1.56% in P#1, 1.12% in P#2, 1.00% in P#3, 1.20% in P#7, 1.30% in P#8, 2.01% in P#9 -> Performance Gain 1.32x
  - 3.57% in P#4, 4.13% in P#5, 1.56% in P#6, 3.50% in P#10, 2.57% in P#11, 0.56% in P#12 -> Performance Gain 1.00x
  - Take average, Performance Gain (1.32+1.00)/2 = 1.16x as final score
- Total 1000 data RE at each packet

See soft LLR error rate in appendix
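The worked example above suggests that each SNR group is graded by its worst packet (2.01% lands in the "< 6%" column at 10 dB, and 4.13% lands in the "< 6%" column at 15 dB). The sketch below encodes that reading of the table. The worst-packet rule is inferred from the example rather than stated explicitly, and the handling of a failed group (returning `None`) is an assumption; the names are hypothetical.

```python
# (upper limit in %, performance gain) per SNR, taken from the table above
TIERS = {
    10: [(2.0, 1.76), (6.0, 1.32), (10.0, 1.14), (13.0, 1.00)],
    15: [(1.0, 1.76), (2.0, 1.32), (4.0, 1.14), (6.0, 1.00)],
}

def gain_for_group(snr_db, error_rates_pct):
    """Gain of one SNR group, decided by its worst packet (inferred rule)."""
    worst = max(error_rates_pct)
    for limit, gain in TIERS[snr_db]:
        if worst < limit:
            return gain
    return None  # "fail" column

def final_gain(rates_10db, rates_15db):
    """Average of the two group gains, or None if either group fails (assumption)."""
    g10 = gain_for_group(10, rates_10db)
    g15 = gain_for_group(15, rates_15db)
    if g10 is None or g15 is None:
        return None
    return (g10 + g15) / 2

example = final_gain([1.56, 1.12, 1.00, 1.20, 1.30, 2.01],
                     [3.57, 4.13, 1.56, 3.50, 2.57, 0.56])
```

With the numbers from the example, the two groups evaluate to 1.32x and 1.00x, and `example` is their average.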

#### **Power**



- Only need to calculate the total power of 10 RE
- Change the parameter "NO\_10RE" to 1 in testbench

```
// testfixture.v

`timescale 1ns/1ps
`define CYCLE 5.0

// NO_10RE 100 (function check)
// NO_10RE 1 (power analysis)
`define NO_10RE 1 // 1 packet = 1000RE
module testfixture;
```

Use the output FSDB file for power analysis in PrimeTime

# Area x Time x Power / (Performance Gain)

- Area (um²): Core area
  - Round to two decimal places
- Time (ns): Simulation time of postsim
  - Integer
- Power (mW): Total power of postsim
  - Round to one decimal place
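A small helper can apply the rounding rules before forming the product. Python's built-in `round` rounds exact ties to even, so this sketch uses `decimal` with round-half-up instead. The function names are hypothetical, and rounding the time to the nearest integer is an assumption: the rule only says the time is an integer.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value, places):
    """Round half up to `places` decimal places, returning a Decimal."""
    quantum = Decimal(10) ** -places          # places=2 -> Decimal('0.01')
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)

def performance_score(area_um2, time_ns, power_mw, gain):
    """Area x Time x Power / (Performance Gain) with the stated rounding applied."""
    area = round_half_up(area_um2, 2)
    time = round_half_up(time_ns, 0)          # assumption: nearest integer
    power = round_half_up(power_mw, 1)
    return area * time * power / Decimal(str(gain))

# Made-up numbers, only to show the call:
score = performance_score(100.004, 10, 2.04, 2)
```

The inputs in the usage line are placeholders, not measurements from any design.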

### **Grading Policy - Report**



- Algorithm
  - QR decomposition algorithm introduction
  - FXP setting
- HW implementation
  - HW scheduling
  - HW block diagram
  - Area / Power / Latency report
    - Technique sharing for HW improvement

#### **Submission Files**

- Due Tuesday, Dec. 19, 23:59 (Submit to NTUCOOL)
  - No late submission
- Require data (with the required directory hierarchy):

| Folder    | Required files                       |
|-----------|--------------------------------------|
| 01_RTL    | 1. All design Verilog files 2. rtl.f |
| 02_SYN    | 1. Area/timing reports               |
| 03_GATE   | 1. QR_Engine_syn.v/sdf 2. rtl.f      |
| 04_APR    | 1. Route database 2. QR_Engine.gds   |
| 05_POST   | 1. QR_Engine_pr.v/sdf 2. rtl.f       |
| reports   | 1. design.spec. 2. teamXX_report.pdf |

- Final project presentation (MTK experience sharing)
  - Date: Dec. 26, 2023

### design.spec



```
Maximum operating frequency: (MHz)

Performance Gain:

POST-SIM cycle: (ns)

POST-SIM latency: (ns)

Post layout area: (um^2)

Post layout total power: (mW)

# of DRC violations:

Status of LVS check: (pass/fail)
```

#### **Submission Hierarchy**



Folder name: teamID\_final\_project. Follow the hierarchy below

```
team03_final_project
    01_RTL
        QR_Engine.v (and other Verilog files)
    02_SYN
        QR_Engine.area
        QR_Engine.max.timing
        QR_Engine.min.timing
    03_GATE
        QR_Engine_syn.sdf
        QR_Engine_syn.v
    04_APR
        route
        route.dat
        QR_Engine.gds
    05_POST
        QR_Engine_pr.sdf
        QR_Engine_pr.v
    reports
        design.spec
        team03_report.pdf
```

Compress the folder teamID\_final\_project in a tar file named teamID\_final\_project\_vk.tar (k is the number of version, k =1,2,...), e.g. team03\_final\_project\_v1.tar

#### **Final Project Presentation**



- The top 5-6 groups have the opportunity to give a presentation on stage and will be eligible for additional bonus points
  - PowerPoint
  - Approximately 15 minutes
- Presentation Content
  - Algorithm (if you use other methods)
  - Bit-length decision
  - ...
  - Hardware Design
    - The performance improvements for steps

#### Reference



- [1] Parallel High Throughput Soft-output Sphere Decoder: Link
- [2] [Video][Hung-yi Lee] Orthogonal Basis: Gram-Schmidt process: Link
- [3] Gram-Schmidt process: Link



# **Appendix**

#### **Performance Evaluation Before APR**

- Area (um²): Total cell area (Synthesis)
- Time (ns): Simulation time of gatesim
- Power (mW): Total power of gatesim

$$\frac{A \times T \times P}{\text{Performance Gain}}$$

| Level 6 | Unbelievable !!! | 10 <sup>11</sup>         |
|---------|------------------|--------------------------|
| Level 5 | Excellent !!     | 10 <sup>12</sup>         |
| Level 4 | Great!           | IC Design Engineer       |
| Level 3 | Good Design      | Have skills in IC Design |
| Level 2 |                  |                          |
| Level 1 |                  | 10 <sup>15</sup>         |
| Level 0 |                  | 10 <sup>16</sup>         |

# Multiply



- 8 x 8 bits: 2.54 ns
- 12 x 12 bits: 3.11 ns
- 16 x 16 bits: 3.46 ns
- 20 x 20 bits: 3.83 ns
- 24 x 24 bits: 4.04 ns
- 28 x 28 bits: 4.23 ns
- 32 x 32 bits: 4.39 ns
- 36 x 36 bits: 4.63 ns

#### DesignWare: Sqrt\_pipe



- Input 33 bits / Output 17 bits
- # of pipeline stage (1): 13.86 ns
- # of pipeline stage (2): 6.93 ns
- # of pipeline stage (3): 4.62 ns
- # of pipeline stage (4): 3.47 ns
- # of pipeline stage (5): 2.77 ns
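These figures can be compared against the clock period. With the 200 MHz minimum, the period is 5 ns. The sketch below assumes the listed numbers are the delay of one multiplier or of one pipeline stage, and it treats any setup, clock-to-Q, or routing overhead as a single hypothetical `margin_ns` parameter, so the results are only a first-pass feasibility check.

```python
# Delays copied from the lists above (ns)
MULT_DELAY_NS = {8: 2.54, 12: 3.11, 16: 3.46, 20: 3.83,
                 24: 4.04, 28: 4.23, 32: 4.39, 36: 4.63}
SQRT_PIPE_DELAY_NS = {1: 13.86, 2: 6.93, 3: 4.62, 4: 3.47, 5: 2.77}

def widest_multiplier(period_ns, margin_ns=0.0):
    """Largest listed multiplier width whose delay fits in one cycle, or None."""
    fits = [w for w, d in MULT_DELAY_NS.items() if d <= period_ns - margin_ns]
    return max(fits) if fits else None

def min_sqrt_stages(period_ns, margin_ns=0.0):
    """Fewest listed Sqrt_pipe stages whose per-stage delay fits in one cycle, or None."""
    fits = [s for s, d in SQRT_PIPE_DELAY_NS.items() if d <= period_ns - margin_ns]
    return min(fits) if fits else None

period = 1e3 / 200.0  # 200 MHz -> 5.0 ns
```

With no margin, every listed multiplier fits in 5 ns and the square root needs at least 3 stages; reserving 0.5 ns changes both answers, which shows how tight the budget is.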

#### **Soft LLR Error Rate**



- sign(DUT Soft LLR) == sign(GLD Soft LLR<sup>note1</sup>)
  - $|llr_{dut}| \ge |llr_{gld}|$ : error rate = 0%
  - $|llr_{dut}| < |llr_{gld}|$: error rate = $(abs(llr_{gld}) - abs(llr_{dut}))/abs(llr_{gld})$
- sign(DUT Soft LLR) ~= sign(GLD Soft LLR)
  - error rate = $abs(llr_{dut} - llr_{gld})/abs(llr_{gld})$
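The per-LLR rule can be written as a short function. This is a sketch of the rule as stated, not the provided Matlab evaluation code. It assumes the golden LLR is nonzero and treats a value of exactly 0 as non-negative, since neither case is specified here; the function name is hypothetical. The result is a fraction (multiply by 100 for a percentage).

```python
def soft_llr_error_rate(llr_dut, llr_gld):
    """Error rate of one DUT LLR against the golden LLR, as a fraction."""
    same_sign = (llr_dut >= 0) == (llr_gld >= 0)  # 0 counted as non-negative (assumption)
    if same_sign:
        if abs(llr_dut) >= abs(llr_gld):
            return 0.0                            # at least as confident as golden
        return (abs(llr_gld) - abs(llr_dut)) / abs(llr_gld)
    return abs(llr_dut - llr_gld) / abs(llr_gld)  # wrong sign: rate exceeds 100%

rates = [soft_llr_error_rate(d, g)
         for d, g in [(2.0, 1.0), (0.5, 1.0), (-1.0, 1.0), (-0.5, -1.0)]]
```

Note the asymmetry: a DUT LLR with the correct sign and a larger magnitude than the golden value is not penalized, while a sign error always costs more than 100%.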

#### Formula

$$\begin{aligned}
\text{LLR for } x_{1,1}: L(x_{1,1}|\underline{y}) &= \min_{\underline{x} \in X_{1,1,1}} \sum_{i=1}^4 \left| \hat{y}_i - \sum_{j=i}^4 R_{ij} s_j \right|^2 - \min_{\underline{x} \in X_{1,1,0}} \sum_{i=1}^4 \left| \hat{y}_i - \sum_{j=i}^4 R_{ij} s_j \right|^2 \\
\text{LLR for } x_{1,2}: L(x_{1,2}|\underline{y}) &= \min_{\underline{x} \in X_{1,2,1}} \sum_{i=1}^4 \left| \hat{y}_i - \sum_{j=i}^4 R_{ij} s_j \right|^2 - \min_{\underline{x} \in X_{1,2,0}} \sum_{i=1}^4 \left| \hat{y}_i - \sum_{j=i}^4 R_{ij} s_j \right|^2 \\
\text{LLR for } x_{2,1}: L(x_{2,1}|\underline{y}) &= \min_{\underline{x} \in X_{2,1,1}} \sum_{i=1}^4 \left| \hat{y}_i - \sum_{j=i}^4 R_{ij} s_j \right|^2 - \min_{\underline{x} \in X_{2,1,0}} \sum_{i=1}^4 \left| \hat{y}_i - \sum_{j=i}^4 R_{ij} s_j \right|^2 \\
\text{LLR for } x_{2,2}: L(x_{2,2}|\underline{y}) &= \min_{\underline{x} \in X_{2,2,1}} \sum_{i=1}^4 \left| \hat{y}_i - \sum_{j=i}^4 R_{ij} s_j \right|^2 - \min_{\underline{x} \in X_{2,2,0}} \sum_{i=1}^4 \left| \hat{y}_i - \sum_{j=i}^4 R_{ij} s_j \right|^2 \\
\text{LLR for } x_{3,1}: L(x_{3,1}|\underline{y}) &= \min_{\underline{x} \in X_{3,1,1}} \sum_{i=1}^4 \left| \hat{y}_i - \sum_{j=i}^4 R_{ij} s_j \right|^2 - \min_{\underline{x} \in X_{3,1,0}} \sum_{i=1}^4 \left| \hat{y}_i - \sum_{j=i}^4 R_{ij} s_j \right|^2 \\
\text{LLR for } x_{3,2}: L(x_{3,2}|\underline{y}) &= \min_{\underline{x} \in X_{3,2,1}} \sum_{i=1}^4 \left| \hat{y}_i - \sum_{j=i}^4 R_{ij} s_j \right|^2 - \min_{\underline{x} \in X_{3,2,0}} \sum_{i=1}^4 \left| \hat{y}_i - \sum_{j=i}^4 R_{ij} s_j \right|^2 \\
\text{LLR for } x_{4,1}: L(x_{4,1}|\underline{y}) &= \min_{\underline{x} \in X_{4,1,1}} \sum_{i=1}^4 \left| \hat{y}_i - \sum_{j=i}^4 R_{ij} s_j \right|^2 - \min_{\underline{x} \in X_{4,1,0}} \sum_{i=1}^4 \left| \hat{y}_i - \sum_{j=i}^4 R_{ij} s_j \right|^2 \\
\text{LLR for } x_{4,2}: L(x_{4,2}|\underline{y}) &= \min_{\underline{x} \in X_{4,2,1}} \sum_{i=1}^4 \left| \hat{y}_i - \sum_{j=i}^4 R_{ij} s_j \right|^2 - \min_{\underline{x} \in X_{4,2,0}} \sum_{i=1}^4 \left| \hat{y}_i - \sum_{j=i}^4 R_{ij} s_j \right|^2
\end{aligned}$$

#### Formula

- $s_1 \sim s_4$: one of $\left(\frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}}j\right)$, $\left(-\frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}}j\right)$, $\left(\frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}}j\right)$, and $\left(-\frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}}j\right)$
- For the $\sum_{i=1}^{4} \left| \hat{y}_i - \sum_{j=i}^{4} R_{ij} s_j \right|^2$ part:
  - 4<sup>th</sup> entry: $\hat{y}_4 - R_{44}s_4 = a + bj \rightarrow a^2 + b^2$
  - 3<sup>rd</sup> entry: $\hat{y}_3 - R_{33}s_3 - R_{34}s_4 = c + dj \rightarrow c^2 + d^2$
  - 2<sup>nd</sup> entry: $\hat{y}_2 - R_{22}s_2 - R_{23}s_3 - R_{24}s_4 = e + fj \rightarrow e^2 + f^2$
  - 1<sup>st</sup> entry: $\hat{y}_1 - R_{11}s_1 - R_{12}s_2 - R_{13}s_3 - R_{14}s_4 = g + hj \rightarrow g^2 + h^2$

$$\sum_{i=1}^{4} \left| \hat{y}_i - \sum_{j=i}^{4} R_{ij} s_j \right|^2 = a^2 + b^2 + c^2 + d^2 + e^2 + f^2 + g^2 + h^2$$

- QPSK constellation
  - $x_1 \sim x_4$: one of (0,0), (1,0), (0,1), and (1,1)
  - $s_1 \sim s_4$: one of $\left(\frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}}j\right)$, $\left(-\frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}}j\right)$, $\left(\frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}}j\right)$, and $\left(-\frac{1}{\sqrt{2}} - \frac{1}{\sqrt{2}}j\right)$

- Total 256 (=4<sup>4</sup>) possibilities for 4-layer QPSK
- Full search with 256 possibilities



- Compute $\sum_{i=1}^4 \left| \hat{y}_i - \sum_{j=i}^4 R_{ij} s_j \right|^2$ for each A(path M), M = 1~256
  - Use all 256 results to compute each bit LLR: exactly 128 results belong to $X_{k,b,0}$ and 128 to $X_{k,b,1}$, without overlap

- A: the set of all combinations of s<sub>1</sub>-s<sub>4</sub>, with 4<sup>4</sup> = 256 combinations in total
- A(M): one of the combinations
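The full search can be sketched as a reference model that enumerates all 256 paths and keeps, for every bit, the smallest metric with that bit equal to 0 and equal to 1. The bit-to-symbol mapping below pairs the listed bit patterns and symbols in the order they appear above, which is an assumption; the function names are hypothetical, and this is a floating-point sketch rather than the provided Matlab code.

```python
import itertools
import math

A = 1 / math.sqrt(2)
# Assumed pairing of bit patterns and symbols, in the order they are listed above
QPSK = {(0, 0): complex(A, A), (1, 0): complex(-A, A),
        (0, 1): complex(A, -A), (1, 1): complex(-A, -A)}

def path_metric(y_hat, R, s):
    """sum_i |y_hat_i - sum_{j>=i} R_ij s_j|^2 for one candidate symbol vector."""
    total = 0.0
    for i in range(4):
        resid = y_hat[i] - sum(R[i][j] * s[j] for j in range(i, 4))
        total += resid.real ** 2 + resid.imag ** 2
    return total

def full_search_llr(y_hat, R):
    """Return 8 LLRs ordered x_{1,1}, x_{1,2}, ..., x_{4,2}."""
    best = {}  # (k, b, bit value) -> smallest metric seen
    for x in itertools.product(QPSK.keys(), repeat=4):   # 4^4 = 256 paths
        m = path_metric(y_hat, R, [QPSK[xk] for xk in x])
        for k in range(4):
            for b in range(2):
                key = (k, b, x[k][b])
                if m < best.get(key, math.inf):
                    best[key] = m
    return [best[(k, b, 1)] - best[(k, b, 0)] for k in range(4) for b in range(2)]

def hard_bits(llrs):
    """Sign bit of each LLR: negative -> 1, otherwise 0."""
    return [1 if v < 0 else 0 for v in llrs]
```

A quick check is the noiseless case with R equal to the identity: the hard bits should reproduce the transmitted bits, and every LLR should have magnitude 2, the squared distance between two adjacent QPSK points.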

