# Implementation of an LDPC Decoder on GPU of Mobile Devices

Roohollah Amiri

Department of Electrical and Computer Engineering, Boise State University

> Parallel Computing May 2nd 2016

### Outline

- LDPC Channel Coding
- Decoding Algorithm
- Implementation
- Analysis of the Work
- Conclusion

# LDPC Channel Coding



Figure: Basic Building Blocks of a Communication System


### **Channel Coding**

- Channel coding adds redundant (parity) bits to each frame so that the receiver can detect and correct errors
- Types include block codes and non-block codes such as convolutional codes

### Low Density Parity Check Codes (LDPC)

- Can approach the Shannon limit to within 0.0045 dB (reported for very long irregular codes on the AWGN channel)
- Adopted in many communication standards, such as IEEE 802.11n, 10 Gigabit Ethernet (IEEE 802.3an), and DVB-S2


### LDPC Representation

Figure: LDPC code as a bipartite (Tanner) graph of variable nodes and check nodes
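An LDPC code is defined by a sparse parity-check matrix H: each row is a check node, each column is a variable node, and every 1 at position (m, n) is an edge of the Tanner graph. A word c is a valid codeword when H c = 0 (mod 2). The minimal sketch below builds the graph's neighbor lists from H and computes the syndrome; the 3 × 6 matrix is a hypothetical toy example, not one of the codes evaluated in this work.

```python
# Hypothetical toy parity-check matrix: 3 check nodes (rows) x 6 variable nodes (columns).
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]

def tanner_graph(H):
    """Return neighbor lists: check node -> variable nodes, variable node -> check nodes."""
    check_nbrs = [[n for n, h in enumerate(row) if h] for row in H]
    var_nbrs = [[m for m, row in enumerate(H) if row[n]] for n in range(len(H[0]))]
    return check_nbrs, var_nbrs

def syndrome(H, c):
    """Each syndrome bit is the mod-2 sum of the code bits seen by one check node."""
    return [sum(h & b for h, b in zip(row, c)) % 2 for row in H]

check_nbrs, var_nbrs = tanner_graph(H)
print(check_nbrs)                       # [[0, 1, 3], [1, 2, 4], [0, 2, 5]]
print(var_nbrs)                         # [[0, 2], [0, 1], [1, 2], [0], [1], [2]]
print(syndrome(H, [1, 1, 0, 0, 1, 1]))  # [0, 0, 0] -> valid codeword
print(syndrome(H, [1, 0, 0, 0, 0, 0]))  # [1, 0, 1] -> checks 0 and 2 fail
```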

# Decoding Algorithm

### **Belief Propagation**



Figure: Decoding of LDPC Codes
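The throughput results in this talk are reported for layered decoding, a belief-propagation schedule in which check nodes (or groups of them) are processed one after another and the a-posteriori LLRs are updated immediately. The slides do not state which check-node rule was used, so this sketch assumes the common min-sum approximation of the sum-product rule 2·atanh(∏ tanh(L/2)) and treats every row of a small hypothetical H as its own layer. It is a readable CPU reference, not the authors' GPU kernel.

```python
def layered_min_sum(H, channel_llr, max_iters=10):
    """Layered min-sum decoding; here every row (check node) is one layer.
    Assumes every check node has degree >= 2. Positive LLR means bit 0."""
    rows = [[n for n, h in enumerate(r) if h] for r in H]
    post = list(channel_llr)                    # a-posteriori LLR per variable node
    msg = [[0.0] * len(cols) for cols in rows]  # stored check-to-variable messages
    for it in range(1, max_iters + 1):
        for m, cols in enumerate(rows):
            # Variable-to-check inputs: remove this check's previous contribution.
            q = [post[n] - msg[m][k] for k, n in enumerate(cols)]
            for k, n in enumerate(cols):
                others = q[:k] + q[k + 1:]
                # Min-sum: sign = product of the other signs, magnitude = min of the others.
                sign = -1.0 if sum(x < 0 for x in others) % 2 else 1.0
                msg[m][k] = sign * min(abs(x) for x in others)
                post[n] = q[k] + msg[m][k]      # immediate (layered) update
        bits = [0 if x >= 0 else 1 for x in post]
        if all(sum(bits[n] for n in cols) % 2 == 0 for cols in rows):
            return bits, it                     # every parity check satisfied
    return [0 if x >= 0 else 1 for x in post], max_iters

H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
# Hypothetical channel LLRs for codeword [1, 1, 0, 0, 1, 1]; bit 2 is weakly wrong.
llr = [-4.0, -4.0, -1.0, 4.0, -4.0, -4.0]
print(layered_min_sum(H, llr))  # ([1, 1, 0, 0, 1, 1], 1)
```

The early exit on a zero syndrome is a common design choice; a fixed iteration count (such as the 10 iterations used in the throughput measurements) keeps the GPU work per frame predictable instead.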


### Implementation Challenges

- The ratio of computations to memory accesses is low.
- There is little data reuse between consecutive computations.
- The sparse structure of the H-matrix (parity-check matrix) leads to a large number of irregular memory accesses.
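The sparse H-matrix is normally stored in compressed form rather than as a dense array. One common choice, assumed here because the slides do not describe the authors' data layout, is a CSR-style layout: for each check node, a run of column indices into the array of variable-node LLRs. The indirect load `llr[col_idx[k]]` is the irregular gather behind the challenges listed above; the toy matrix and LLR values are hypothetical.

```python
def to_csr(H):
    """Compress a dense 0/1 parity-check matrix into CSR arrays.
    Entries row_ptr[m] .. row_ptr[m+1]-1 of col_idx belong to check node m."""
    row_ptr, col_idx = [0], []
    for row in H:
        col_idx.extend(n for n, h in enumerate(row) if h)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx

def gather_check_inputs(row_ptr, col_idx, llr, m):
    """Indirect (gather) load of the LLRs connected to check node m.
    The addresses come from col_idx, so neighboring checks touch scattered memory."""
    return [llr[col_idx[k]] for k in range(row_ptr[m], row_ptr[m + 1])]

H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
row_ptr, col_idx = to_csr(H)
llr = [0.5, -1.0, 2.0, 1.5, -0.5, 3.0]
print(row_ptr)                                        # [0, 3, 6, 9]
print(col_idx)                                        # [0, 1, 3, 1, 2, 4, 0, 2, 5]
print(gather_check_inputs(row_ptr, col_idx, llr, 2))  # [0.5, 2.0, 3.0]
```

Each gathered value feeds only a few comparisons and sign operations in a min-sum update, which is why the compute-to-memory ratio stays low.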


### Parallelism Levels in the Proposed Algorithm

- The first level of parallelism is at the check-node level: two check-node computations can run in parallel if there is no data dependency between them (i.e., they share no variable node).
- The second level of parallelism is at the frame level: complete runs of the algorithm on different frames are independent of each other.
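The check-node condition can be made concrete: two check nodes are independent when no column of H has a 1 in both rows. The greedy grouping below is an illustrative assumption, not necessarily how the authors partitioned the work. It collects mutually independent check nodes so that each group could be updated in parallel (for example, one thread per check node), while groups are processed in sequence. The matrix is a hypothetical toy example.

```python
def independent_groups(H):
    """Greedily partition check nodes (rows of H) into groups whose members
    share no variable node, so every group can be updated in parallel."""
    groups = []  # list of (rows in the group, set of variable nodes already used)
    for m, row in enumerate(H):
        cols = {n for n, h in enumerate(row) if h}
        for rows, used in groups:
            if not (cols & used):  # no shared variable node -> no data dependency
                rows.append(m)
                used |= cols       # in-place update of the group's set
                break
        else:
            groups.append(([m], set(cols)))
    return [rows for rows, _ in groups]

H = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 1],
]
print(independent_groups(H))  # [[0, 1], [2, 3]]
```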



# **Implementation**

#### Multi-Stream Parallelism
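Frame-level parallelism on a GPU is commonly exploited with CUDA streams: batches of frames are issued to different streams so that data transfers for one batch can overlap with decoding of another. The slides do not give the stream configuration, so the Python model below is only a hypothetical illustration. It assumes three stages per batch (host-to-device copy, decoding kernel, device-to-host copy), each served by a single engine, with batches processed in order; the stage times are invented numbers.

```python
def sequential_time(n_batches, stage_times):
    """Single stream: each batch runs copy-in, kernel and copy-out back to back."""
    return n_batches * sum(stage_times)

def pipelined_time(n_batches, stage_times):
    """Overlapped model: stage s of batch b starts once stage s-1 of batch b
    and stage s of batch b-1 have both finished (flow-shop recurrence)."""
    finish = [0.0] * len(stage_times)  # finish time of the latest batch, per stage
    for _ in range(n_batches):
        prev_stage_done = 0.0
        for s, t in enumerate(stage_times):
            start = max(prev_stage_done, finish[s])
            finish[s] = start + t
            prev_stage_done = finish[s]
    return finish[-1]

# Hypothetical per-batch times (ms): H2D copy, decoding kernel, D2H copy.
times = (1.0, 2.0, 1.0)
print(sequential_time(4, times))  # 16.0
print(pipelined_time(4, times))   # 10.0
```

For identical batches the model reduces to sum(stage_times) + (n_batches - 1) * max(stage_times), so overlap pays off most when transfer and kernel times are comparable.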





### **Target Architecture**



Figure: NVIDIA mobile processor GK20A (Kepler architecture), CUDA 6.5

### **Validating Results**



Figure: Bit Error Rate for AWGN Channel
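BER curves for the AWGN channel are usually produced by BPSK-modulating codewords, adding Gaussian noise for a given Eb/N0, and passing channel LLRs to the decoder. The slides do not specify the simulation setup, so this sketch assumes the mapping 0 → +1, 1 → −1, unit symbol energy and code rate R. That gives noise variance σ² = 1/(2·R·Eb/N0) and channel LLR 2y/σ². Only an uncoded hard decision is shown; in a full run the LLRs would go to the LDPC decoder first.

```python
import math
import random

def noise_variance(ebn0_db, rate):
    """sigma^2 for unit-energy BPSK: sigma^2 = 1 / (2 * R * Eb/N0)."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return 1.0 / (2.0 * rate * ebn0)

def transmit_bpsk_awgn(bits, sigma2, rng):
    """Map bit 0 -> +1 and bit 1 -> -1, then add Gaussian noise of variance sigma2."""
    sigma = math.sqrt(sigma2)
    return [(1.0 - 2.0 * b) + rng.gauss(0.0, sigma) for b in bits]

def channel_llrs(received, sigma2):
    """LLR = log(P(bit=0 | y) / P(bit=1 | y)) = 2y / sigma^2 for this mapping."""
    return [2.0 * y / sigma2 for y in received]

def bit_error_rate(sent, decided):
    return sum(s != d for s, d in zip(sent, decided)) / len(sent)

rng = random.Random(1)
bits = [rng.randint(0, 1) for _ in range(10000)]
sigma2 = noise_variance(ebn0_db=2.0, rate=0.5)
llrs = channel_llrs(transmit_bpsk_awgn(bits, sigma2, rng), sigma2)
hard = [0 if x >= 0 else 1 for x in llrs]  # uncoded hard decision (no decoder)
print(bit_error_rate(bits, hard))
```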

### Throughput on Multiple Codes



Figure: Measured throughput for 10 layered decoding iterations on LDPC codes 1-7 ($576 \times 288$, $1024 \times 512$, $1200 \times 600$, $1944 \times 722$, $4000 \times 2000$, $8000 \times 4000$, $9972 \times 4086$)
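Decoder throughput is commonly reported as decoded bits per second: frames decoded times bits per frame, divided by wall-clock decoding time (here including all 10 iterations). The slides do not say whether all code bits or only information bits are counted, so the helper takes bits per frame as a parameter; the frame count and timing are hypothetical.

```python
def throughput_mbps(n_frames, bits_per_frame, seconds):
    """Decoded bits per second, expressed in Mbit/s."""
    return n_frames * bits_per_frame / seconds / 1e6

# Hypothetical: 1000 frames of the (576, 288) code decoded in 50 ms.
print(throughput_mbps(1000, 576, 0.050))  # ~11.52 (counting all code bits)
print(throughput_mbps(1000, 288, 0.050))  # ~5.76 (counting information bits, assuming rate 1/2)
```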

### **Throughput on Multiple Devices**



Figure: code=(576,288)




Figure: code=(4000,2000)

### Conclusion

- A stream-based approach to GPU-based LDPC decoding on embedded devices was introduced
- Validation (bit error rate) and scalability (throughput) results were presented

## Thank You

