# Guide to Reed-Solomon List Decoding

This notebook provides a hands-on exploration of Reed-Solomon error-correcting codes, with a focus on the powerful **list decoding** technique. We start from the first principles of encoding a message and then build the complete list decoding algorithm, which can recover messages from a large number of errors.

### The Core Idea

Think of the message as a secret curve on a graph. To send it, you pick several points on that curve.
* **Encoding:** Sending the coordinates of those points.
* **The Problem:** Some points get moved during transmission due to noise (errors).
* **Decoding:** Figuring out the original secret curve from the noisy, corrupted points.

When there are too many errors, there might be more than one plausible original curve. Instead of giving up, a list decoder returns a short list of all possible candidates.
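
To make the idea of "more than one plausible curve" concrete, here is a minimal brute-force sketch. It is not the algorithm built later in this notebook: it simply tries every polynomial of degree below $k$ over $GF(257)$, which is only feasible for tiny $k$. The function name `brute_force_list_decode` and the example points are illustrative assumptions.

```python
from itertools import product

P = 257  # field size used throughout this notebook

def brute_force_list_decode(xs, ys, k, max_errors, p=P):
    """Return every coefficient tuple (c0, ..., c_{k-1}) whose polynomial
    disagrees with the received values in at most max_errors positions."""
    found = []
    for coeffs in product(range(p), repeat=k):
        errors = 0
        for x, y in zip(xs, ys):
            value = sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p
            if value != y:
                errors += 1
                if errors > max_errors:
                    break  # too far away, stop checking this candidate
        if errors <= max_errors:
            found.append(coeffs)
    return found

# Five received points: three lie on 1 + 2x, three lie on 4 + x
# (the two lines share the point x = 3).
xs = [0, 1, 2, 3, 4]
received = [1, 3, 6, 7, 8]

candidates = brute_force_list_decode(xs, received, k=2, max_errors=2)
print(candidates)  # [(1, 2), (4, 1)]
```

With $n=5$ and $k=2$, unique decoding only covers a single error. Allowing two errors leaves two equally plausible lines, and a list decoder reports both.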

All our calculations will be performed in the finite field $GF(257)$.

In [140]:
import numpy as np
import math
from sage.all import *

def add(x, y):
    return (x + y) % 257

def mul(x, y):
    return (x * y) % 257

def sub(x, y):
    return (x - y) % 257

def power(x, y):
    return (x ** y) % 257

print("Setup Complete! Helper functions are defined for arithmetic in GF(257).")

Setup Complete! Helper functions are defined for arithmetic in GF(257).
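
The helpers above cover addition, subtraction, multiplication and powers, but not division. In a prime field, division can be sketched with Fermat's little theorem, since $x^{p-2}$ is the inverse of any non-zero $x$. The names `inv` and `div` below are illustrative and are not used elsewhere in this notebook.

```python
P = 257  # prime modulus; every byte value 0..255 is an element of GF(257)

def inv(x, p=P):
    """Multiplicative inverse of x in GF(p) via Fermat's little theorem."""
    if x % p == 0:
        raise ZeroDivisionError("0 has no inverse in GF(p)")
    return pow(x, p - 2, p)

def div(x, y, p=P):
    """Compute x / y in GF(p) as x times the inverse of y."""
    return (x * inv(y, p)) % p

# Every non-zero element has an inverse, which is what makes this a field.
assert all((x * inv(x)) % P == 1 for x in range(1, P))

print(inv(2))       # 129, because 2 * 129 = 258 = 1 (mod 257)
print(div(100, 4))  # 25
```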


## Part 1: Encoding - Turning Messages into Codewords

Before we decode, let's quickly review the encoding process. A message is first converted into a polynomial, which is then evaluated at multiple points to create the final codeword.

1.  **Message to Polynomial:** A message like "cd" (length $k=2$) is converted to a list of its ASCII values `[99, 100]`. The list is reversed so that the last character becomes the constant term, giving the polynomial $P(X) = 100 + 99X$.
2.  **Polynomial to Codeword:** This polynomial is evaluated at $n$ distinct points (e.g., $x = 0, 1, \dots, n-1$) to produce an $n$-symbol codeword.

The following code implements this process.

In [141]:
def create_matrix(k, n):
    A = []
    for i in range(k):
        temp_arr = []
        for j in range(n):
            temp_arr.append(power(j, i))
        A.append(temp_arr)
    return A

def create_coef_arr(msg):
    return [ord(m) for m in msg]

def matrix_mul(m1, m2, k, n):
    # Row vector m1 (length k) times the k x n matrix m2, with all arithmetic in GF(257).
    res = np.zeros(n, dtype=int)
    for j in range(n):
        s = 0
        for t in range(k):
            s = add(s, mul(m1[t], m2[t][j]))
        res[j] = s
    return res

def encode(msg, n):
    k = len(msg)
    M = create_coef_arr(msg)
    M.reverse()
    A = create_matrix(k, n)
    RS_Code = matrix_mul(M, A, k, n)
    return RS_Code
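
As a quick sanity check of the encoding described above, the following pure-Python sketch evaluates the message polynomial directly instead of going through the matrix product. The name `encode_message` is an illustrative assumption; it is meant to agree with `encode` but is not part of the notebook.

```python
P = 257

def encode_message(msg, n, p=P):
    """Evaluate the message polynomial at x = 0, 1, ..., n-1 over GF(p).
    The last character is the constant term, matching encode() above."""
    coeffs = [ord(ch) for ch in reversed(msg)]  # coeffs[i] multiplies x**i
    return [sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p
            for x in range(n)]

# "cd" gives P(X) = 100 + 99X, evaluated at x = 0..4.
print(encode_message("cd", 5))  # [100, 199, 41, 140, 239]
```

For example, at $x=2$ the value is $100 + 99 \cdot 2 = 298 \equiv 41 \pmod{257}$.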

## Part 2: Simulating a Noisy Channel

The decoder's job is to fix errors. To test it, we need a function to simulate a noisy channel by corrupting the encoded message. This mimics the random noise described in Shannon's model.

In [142]:
def add_errors(msg, err, n):
    err_msg = [m for m in msg]
    # Use random positions for a more realistic simulation
    error_positions = np.random.choice(n, err, replace=False)
    for i in error_positions:
        e = randrange(257)
        if msg[i] != e:
            err_msg[i] = e
        else:
            err_msg[i] = add(e, 1) # Ensure the value changes
    return err_msg
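
For experiments outside Sage, the same kind of channel can be sketched with the standard library alone. This is a variant rather than a copy of `add_errors`: it adds a random non-zero offset to each chosen symbol, so a corrupted symbol is always different from the original. The name `corrupt` and the `seed` parameter are illustrative assumptions.

```python
import random

P = 257

def corrupt(codeword, num_errors, p=P, seed=None):
    """Return a copy of codeword with exactly num_errors symbols changed."""
    rng = random.Random(seed)  # a fixed seed makes the experiment repeatable
    noisy = list(codeword)
    for i in rng.sample(range(len(noisy)), num_errors):
        # Adding a non-zero offset mod p guarantees the symbol changes.
        noisy[i] = (noisy[i] + rng.randrange(1, p)) % p
    return noisy

codeword = [100, 199, 41, 140, 239]
noisy = corrupt(codeword, num_errors=2, seed=0)
hamming = sum(a != b for a, b in zip(codeword, noisy))
print(noisy, hamming)  # the Hamming distance is always exactly 2
```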

## Part 3: The Basic List Decoding Algorithm

Here we implement the **Basic List-Decoder (Algorithm 12.2.1)** from the textbook.

### The Two-Step Strategy
The algorithm transforms the algebraic problem of error-correction into a geometric one:

1.  **Interpolation Step:** Find a non-zero, two-variable polynomial $Q(X,Y)$ that vanishes at *all* of our noisy received points, i.e. $Q(\alpha_i, y_i) = 0$ for every $i$. To ensure a solution exists, the algorithm puts separate bounds on the maximum degree of the $X$ and $Y$ variables, chosen so that $Q$ has more unknown coefficients than there are points.

2.  **Root-Finding Step:** When enough received points agree with the original message polynomial $P(X)$, it is "hidden" inside $Q(X,Y)$ as a factor of the form `Y - P(X)`. This step factors $Q(X,Y)$ to find these special factors and identify the candidate message polynomials.
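
The degree bounds in the two steps can be checked numerically. The sketch below mirrors the formulas used by `list_decoding` in this notebook and applies them to the four $(n, k, \text{errors})$ settings of the demonstrations in Part 4. A non-zero $Q$ exists whenever it has more coefficients than there are points, and $Y - P(X)$ is guaranteed to divide $Q$ when the number of agreements $t$ exceeds the largest possible degree of $Q(X, P(X))$. The function name `basic_decoder_parameters` is an illustrative assumption.

```python
import math

def basic_decoder_parameters(n, k):
    """Degree bounds of Q and the quantities that drive the guarantee."""
    deg_x = int(math.sqrt(n * (k - 1)))       # max degree of X in Q
    deg_y = int(math.sqrt(n / (k - 1)))       # max degree of Y in Q
    unknowns = (deg_x + 1) * (deg_y + 1)      # number of coefficients of Q
    q_of_p_degree = deg_x + (k - 1) * deg_y   # degree bound of Q(X, P(X))
    return deg_x, deg_y, unknowns, q_of_p_degree

for n, k, errors in [(40, 5, 10), (20, 5, 8), (67, 4, 33), (150, 3, 90)]:
    deg_x, deg_y, unknowns, bound = basic_decoder_parameters(n, k)
    t = n - errors
    print(f"n={n}, k={k}: {n}x{unknowns} system, "
          f"t={t}, need t > {bound}: {t > bound}")
```

The system sizes printed here (40x52, 20x27, 67x75, 150x162) are the same shapes reported by the interpolation step in the test runs, and only the $n=20$ case fails the agreement condition.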

In [148]:
### Step 1: Interpolation ###
def xy_matrix(a_arr, y_arr, n, deg_a, deg_y):
    A = []
    for i in range(n):
        temp_arr = []
        for j in range(deg_a):
            for t in range(deg_y): temp_arr.append(mul(power(a_arr[i], j), power(y_arr[i], t)))
        A.append(temp_arr)
    return A

def get_q(a_arr, y_arr, n, deg_a, deg_y):
    deg_a_real, deg_y_real = deg_a + 1, deg_y + 1
    M = MatrixSpace(GF(257), n, deg_a_real * deg_y_real)
    A = M(xy_matrix(a_arr, y_arr, n, deg_a_real, deg_y_real))
    print(f"Interpolation: Solving for Q's coefficients by finding the null space of a {A.nrows()}x{A.ncols()} matrix.")
    kernel_basis = A.right_kernel_matrix(basis="computed")
    return kernel_basis

def gen_poly(q_arr, deg_a, deg_y):
    x, y = PolynomialRing(GF(257), 2, ['x','y']).gens(); f = 0
    deg_a_real, deg_y_real = deg_a + 1, deg_y + 1
    for i in range(deg_a_real):
        for j in range(deg_y_real):
            k = deg_y_real * i + j
            f += q_arr[k] * x**i * y**j
    return f

### Step 2: Root-Finding ###
def create_list(Q):
    if Q == 0: return []
    L = []
    x, y = Q.parent().gens() # Get the polynomial variables x and y
    
    try:
        factors = [g[0] for g in list(Q.factor())]
        print(f"\nFactoring Q(X,Y) found {len(factors)} factor(s).")
        for i, fact in enumerate(factors):
            fact_str = str(fact)
            if len(fact_str) > 70: fact_str = fact_str[:70] + "..."
            print(f"  - Factor {i+1}: {fact_str}")

            # Robustly check if the factor is of the form c*(Y - P(X))
            if fact.degree(y) == 1:
                # Factor is of the form A(x)*y + B(x)
                A = fact.coefficient({y: 1})
                B = fact.coefficient({y: 0})

                # A(x) must be a non-zero constant for it to be a Y-P(X) factor
                if A.is_constant() and A != 0:
                    # P(X) = -B(X) / A
                    P_X = -B / A
                    print(f"    --> Found valid candidate P(X) = {P_X}")
                    L.append(P_X)
    except Exception as e:
        print(f"Factoring failed: {e}")
    return L

### Step 3: Filtering ###
def filter_candidates(candidates, noisy_msg, n, k):
    good_candidates = []
    print("\n--- Filtering Candidates by Agreement ---")
    
    for p in candidates:
        if p.degree() >= k:
            print(f"  - Candidate {p} rejected (degree {p.degree()} is too high; expected < {k})")
            continue
            
        p_coeffs = p.coefficients()
        if len(p_coeffs) > k: continue

        p_coeffs.reverse()
        
        p_codeword = matrix_mul(p_coeffs, create_matrix(len(p_coeffs), n), len(p_coeffs), n)
        distance = np.sum(p_codeword != noisy_msg)
        
        print(f"  - Candidate {p} has {distance} disagreements.")
        good_candidates.append({'poly': p, 'dist': distance})

    # Find the candidate(s) with the minimum distance to the noisy message
    if not good_candidates:
        return []
    
    min_dist = min(c['dist'] for c in good_candidates)
    print(f"\nMinimum distance found among candidates is {min_dist}.")
    
    # Return all candidates that achieve this minimum distance
    final_list = [c['poly'] for c in good_candidates if c['dist'] == min_dist]
    print(f"Accepting all candidates with this distance: {final_list}")
    return final_list

### Top-Level Function ###
def list_decoding(encoded_msg, n, k, err):
    a_array = [i for i in range(n)]
    deg_a = int(math.sqrt(n * (k - 1)))
    deg_y = int(math.sqrt(n / (k - 1)))

    q_kernel_basis = get_q(a_array, encoded_msg, n, deg_a, deg_y)
    if q_kernel_basis.nrows() == 0: return []
    
    q_coeffs = list(q_kernel_basis[0])
    q_poly = gen_poly(q_coeffs, deg_a, deg_y)
    
    candidate_polynomials = create_list(q_poly)
    filtered_polynomials = filter_candidates(candidate_polynomials, encoded_msg, n, k)
    
    return filtered_polynomials

### Final Conversion ###
def poly_to_str(poly_arr, k):
    decoded_msg_arr = []
    for f_i in poly_arr:
        coefs_array = f_i.coefficients()
        
        while len(coefs_array) < k: coefs_array.append(0)
        coefs_array = coefs_array[:k]
        
        msg_coeffs = [int(c) for c in coefs_array]
        msg = [chr(c) for c in msg_coeffs if 0 <= c < 256]
        decoded_msg_arr.append(''.join(msg))
    return decoded_msg_arr
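
To see the interpolation step without Sage, here is a minimal pure-Python sketch on a tiny hypothetical example: $P(X) = 100 + 99X$, $n = 6$, one corrupted symbol. It swaps Sage's kernel computation for a plain Gauss-Jordan elimination over $GF(257)$. The helper names `kernel_vector`, `interpolate_q`, `eval_q` and `message_poly` are illustrative and not part of the notebook.

```python
P = 257

def kernel_vector(rows, p=P):
    """Return one non-zero v with rows * v = 0 over GF(p), or None if the
    kernel is trivial. Uses Gauss-Jordan elimination."""
    m = [[v % p for v in row] for row in rows]
    ncols = len(m[0])
    pivot_row_of = {}  # pivot column -> row holding its leading 1
    r = 0
    for c in range(ncols):
        if r == len(m):
            break
        pivot = next((i for i in range(r, len(m)) if m[i][c]), None)
        if pivot is None:
            continue  # no pivot here, so this column is free
        m[r], m[pivot] = m[pivot], m[r]
        inv = pow(m[r][c], p - 2, p)
        m[r] = [(v * inv) % p for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c]:
                f = m[i][c]
                m[i] = [(a - f * b) % p for a, b in zip(m[i], m[r])]
        pivot_row_of[c] = r
        r += 1
    free = next((c for c in range(ncols) if c not in pivot_row_of), None)
    if free is None:
        return None
    v = [0] * ncols
    v[free] = 1  # set one free variable to 1, the others to 0
    for c, row in pivot_row_of.items():
        v[c] = (-m[row][free]) % p
    return v

def interpolate_q(xs, ys, deg_x, deg_y, p=P):
    """Coefficients of Q; index (deg_y + 1) * i + j holds x**i * y**j."""
    rows = [[pow(x, i, p) * pow(y, j, p) % p
             for i in range(deg_x + 1) for j in range(deg_y + 1)]
            for x, y in zip(xs, ys)]
    return kernel_vector(rows, p)

def eval_q(q, x, y, deg_y, p=P):
    return sum(c * pow(x, idx // (deg_y + 1), p) * pow(y, idx % (deg_y + 1), p)
               for idx, c in enumerate(q)) % p

def message_poly(x):
    return (100 + 99 * x) % P

xs = list(range(6))
ys = [message_poly(x) for x in xs]
ys[2] = 7  # one corrupted symbol (the true value is 41)
deg_x, deg_y = 2, 2  # int(sqrt(6 * 1)) and int(sqrt(6 / 1)) for n = 6, k = 2
q = interpolate_q(xs, ys, deg_x, deg_y)

# Q vanishes on all received points, including the corrupted one.
print(all(eval_q(q, x, y, deg_y) == 0 for x, y in zip(xs, ys)))  # True
# Q(X, P(X)) is identically zero, which is why Y - P(X) divides Q.
print(all(eval_q(q, x, message_poly(x), deg_y) == 0 for x in range(P)))  # True
```

The second check holds because $Q(X, P(X))$ has degree at most $2 + 1 \cdot 2 = 4$ but vanishes at the 5 uncorrupted positions, so it must be the zero polynomial.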

In [149]:
def RS_demonstration(msg, err, n):
    k = len(msg)
    print("="*60)
    print(f"--- Running Test: msg='{msg}', n={n}, k={k}, errors={err} ---")
    
    unique_decoding_limit = math.floor((n - k + 1) / 2)
    print(f"Unique decoding can handle less than {unique_decoding_limit} errors.")
    if err >= unique_decoding_limit:
        print("NOTE: Number of errors meets or exceeds the unique decoding limit. List decoding is required.")
    print("-"*60 + "\n")
    
    encoded_msg = encode(msg, n)
    print("Original Codeword (first 30 symbols):\n", encoded_msg[:30])

    encoded_msg_with_errors = add_errors(encoded_msg, err, n)
    print("\nNoisy Codeword (first 30 symbols):\n", np.array(encoded_msg_with_errors[:30]))
    
    print("\n--- Starting List Decoding ---\n")
    poly_list = list_decoding(encoded_msg_with_errors, n, k, err)

    final_messages = poly_to_str(poly_list, k)
    print("\n--- Final Results ---")
    print("Output Polynomials List:\n", poly_list)
    print("\nDecoded Possible Messages:\n", final_messages)

    if msg in final_messages:
        print("\nSUCCESS: Original message was recovered in the list!")
    else:
        print("\nFAILURE: Could not recover original message.")
    print("="*60 + "\n\n")

## Part 4: Demonstrations

We will now run a series of tests to showcase the algorithm's performance, including cases with both high and low code rates.

### 4.1 High-Rate Codes (e.g., R = 0.125)

For codes with a relatively high rate, the unique decoding bound is often better than the guarantee for the Basic List-Decoder. The following tests confirm that the algorithm works when the number of errors is low, but fails when the number of errors exceeds its mathematical limit.

**Test 1: A working high-rate case**
- **Message:** "abcde" ($k=5$)
- **Codeword Length:** $n=40$
- **Errors:** $err=10$
- **Rate:** $R = 5/40 = 0.125$
- **Analysis:** The number of actual agreements is $t = 40 - 10 = 30$. The required number of agreements for this algorithm is $t > 2\sqrt{40 \times (5-1)} \approx 25.3$. Since $30 > 25.3$, this test should succeed.

In [150]:
RS_demonstration(msg="abcde", err=10, n=40)

--- Running Test: msg='abcde', n=40, k=5, errors=10 ---
Unique decoding can handle less than 18 errors.
------------------------------------------------------------

Original Codeword (first 30 symbols):
 [101 238 206 230  36 136  29   0  92 106 115 207 228  39  30  92 131  68
  96 166 244  54 106 154 224 100  95  23 227  37]

Noisy Codeword (first 30 symbols):
 [164 238 206  15  36 136  29   0  92 106 115 207 228  39 232  92 131 229
  51 166 244  54 106 215 224 100  95  23 227  37]

--- Starting List Decoding ---

Interpolation: Solving for Q's coefficients by finding the null space of a 40x52 matrix.

Factoring Q(X,Y) found 2 factor(s).
  - Factor 1: -97*x^4 - 98*x^3 - 99*x^2 - 100*x + y - 101
    --> Found valid candidate P(X) = 97*x^4 + 98*x^3 + 99*x^2 + 100*x + 101
  - Factor 2: 33*x^3*y + x^2*y^2 + 94*x^3 - 71*x^2*y + 85*x*y^2 + 76*x^2 + 127*x*y -...

--- Filtering Candidates by Agreement ---
  - Candidate 97*x^4 + 98*x^3 + 99*x^2 + 100*x + 101 has 10 disagreements.

Minimum dist

**Test 2: A failing high-rate case**
- **Message:** "hello" ($k=5$)
- **Codeword Length:** $n=20$
- **Errors:** $err=8$
- **Rate:** $R = 5/20 = 0.25$
- **Analysis:** The number of actual agreements is $t = 20 - 8 = 12$. The required number of agreements is $t > 2\sqrt{20 \times (5-1)} \approx 17.9$. Since $12$ is not greater than $17.9$, the algorithm is not guaranteed to work and is expected to fail.

In [151]:
RS_demonstration(msg="hello", err=8, n=20)

--- Running Test: msg='hello', n=20, k=5, errors=8 ---
Unique decoding can handle less than 8 errors.
NOTE: Number of errors meets or exceeds the unique decoding limit. List decoding is required.
------------------------------------------------------------

Original Codeword (first 30 symbols):
 [111  18 147 222 150  21 108  96 110 201  89 191  79  22 215   8 219  50
 171 150]

Noisy Codeword (first 30 symbols):
 [111  18  21 227 150  21 108  72 142 201  89 191  77 152  87   8 219  50
 185 150]

--- Starting List Decoding ---

Interpolation: Solving for Q's coefficients by finding the null space of a 20x27 matrix.

Factoring Q(X,Y) found 1 factor(s).
  - Factor 1: x^6*y^2 + 87*x^6*y - 121*x^5*y^2 + 114*x^6 - 62*x^5*y - 48*x^4*y^2 - 7...

--- Filtering Candidates by Agreement ---

--- Final Results ---
Output Polynomials List:
 []

Decoded Possible Messages:
 []

FAILURE: Could not recover original message.




### 4.2 Low-Rate Codes (R < 0.07)

This is the region where the Basic List-Decoder's guarantee shines, surpassing the unique decoding bound. We will choose a number of errors that is too high for unique decoding but should be correctable by our list decoder.
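
The comparison between the two bounds can be made explicit with a few lines of arithmetic. The sketch below computes the smallest number of agreements each guarantee needs for the four settings used in this notebook, and the approximate rate at which they cross over. The function names are illustrative assumptions. The crossover estimate treats $k-1 \approx k$, so that the two requirements become $t/n \ge (1+R)/2$ and $t/n > 2\sqrt{R}$.

```python
import math

def min_agreements_unique(n, k):
    """Unique decoding corrects up to floor((n - k) / 2) errors."""
    return n - (n - k) // 2

def min_agreements_basic_list(n, k):
    """Smallest integer t with t > 2 * sqrt(n * (k - 1))."""
    return math.isqrt(4 * n * (k - 1)) + 1

for n, k in [(40, 5), (20, 5), (67, 4), (150, 3)]:
    u = min_agreements_unique(n, k)
    b = min_agreements_basic_list(n, k)
    better = "list decoding" if b < u else "unique decoding"
    print(f"n={n}, k={k}, R={k/n:.3f}: unique needs t>={u}, "
          f"basic list needs t>={b} -> {better} tolerates more errors")

# Setting 2*sqrt(R) = (1 + R)/2 gives sqrt(R) = 2 - sqrt(3).
crossover = (2 - math.sqrt(3)) ** 2
print(round(crossover, 4))  # 0.0718
```

This is consistent with the section headings: at $R = 0.125$ and $R = 0.25$ unique decoding has the stronger guarantee, while below roughly $R \approx 0.07$ the Basic List-Decoder takes over.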

**Test 3: A working low-rate case**
- **Message:** "data" ($k=4$)
- **Codeword Length:** $n=67$
- **Errors:** $err=33$
- **Rate:** $R = 4/67 \approx 0.0597$
- **Analysis:** Unique decoding can handle fewer than $\lfloor(67-4+1)/2\rfloor = 32$ errors. We have **33 errors**, so unique decoding would fail. The Basic List-Decoder requires $t > 2\sqrt{67 \times (4-1)} \approx 28.4$ agreements. We have $t = 67 - 33 = 34$ agreements. Since $34 > 28.4$, this should succeed.

In [152]:
RS_demonstration(msg="data", err=33, n=67)

--- Running Test: msg='data', n=67, k=4, errors=33 ---
Unique decoding can handle less than 32 errors.
NOTE: Number of errors meets or exceeds the unique decoding limit. List decoding is required.
------------------------------------------------------------

Original Codeword (first 30 symbols):
 [ 97 153 232 163  32 182 185 127  94 172 190 234 133 230  97  77 256 206
  13  20  56 207  45 170 154  83  43 120 143 198]

Noisy Codeword (first 30 symbols):
 [152 232 232 163  33 149 240  36  70 172 190 234 196 120  97  77 256  61
  13 207  85 207  45 170 154  83 245 120 154 198]

--- Starting List Decoding ---

Interpolation: Solving for Q's coefficients by finding the null space of a 67x75 matrix.

Factoring Q(X,Y) found 2 factor(s).
  - Factor 1: -100*x^3 - 97*x^2 - 116*x + y - 97
    --> Found valid candidate P(X) = 100*x^3 + 97*x^2 + 116*x + 97
  - Factor 2: x^7*y^3 - 65*x^8*y - 16*x^7*y^2 + 15*x^6*y^3 - 53*x^8 + 77*x^7*y - 90*...

--- Filtering Candidates by Agreement ---
  - Candidate

**Test 4: A working very low-rate case**
- **Message:** "log" ($k=3$)
- **Codeword Length:** $n=150$
- **Errors:** $err=90$
- **Rate:** $R = 3/150 = 0.02$
- **Analysis:** Unique decoding can handle fewer than $\lfloor(150-3+1)/2\rfloor = 74$ errors. We have **90 errors**, so unique decoding would fail. The Basic List-Decoder requires $t > 2\sqrt{150 \times (3-1)} \approx 34.6$ agreements. We have $t = 150 - 90 = 60$ agreements. Since $60 > 34.6$, this should succeed.

In [154]:
RS_demonstration(msg="log", err=90, n=150)

--- Running Test: msg='log', n=150, k=3, errors=90 ---
Unique decoding can handle less than 74 errors.
NOTE: Number of errors meets or exceeds the unique decoding limit. List decoding is required.
------------------------------------------------------------

Original Codeword (first 30 symbols):
 [103  65 243 123 219  17  31   4 193  84 191   0  25   9 209 111 229  49
  85  80  34 204  76 164 211 217 182 106 246  88]

Noisy Codeword (first 30 symbols):
 [ 76  65  88 123 140 118  31 206 193  84 191  86 188  78 218 140 228 110
  31   9  34  18  76 164 211 217 176  26 246  88]

--- Starting List Decoding ---

Interpolation: Solving for Q's coefficients by finding the null space of a 150x162 matrix.

Factoring Q(X,Y) found 2 factor(s).
  - Factor 1: -108*x^2 - 111*x + y - 103
    --> Found valid candidate P(X) = 108*x^2 + 111*x + 103
  - Factor 2: x^11*y^7 - 123*x^11*y^6 + 21*x^10*y^7 + 59*x^11*y^5 + 91*x^10*y^6 - 32...

--- Filtering Candidates by Agreement ---
  - Candidate 108*x^2 + 111

## Part 5: Conclusion

This notebook has successfully implemented and demonstrated the **Basic List-Decoder (Algorithm 12.2.1)**.

As the demonstrations show:
1.  The algorithm correctly decodes messages when the number of agreements $t$ satisfies its performance guarantee ($t > 2\sqrt{n(k-1)}$).
2.  It is no longer guaranteed to succeed when this condition is not met, and Test 2 shows it failing for a higher-rate code.
3.  Crucially, it **succeeds** in correcting a large fraction of errors for **low-rate codes**, outperforming the guarantee of unique decoding in the exact region predicted by the theory.

This confirms the properties of this foundational list decoding algorithm and highlights the motivation for the more advanced versions (Weighted-Degree and Multiplicity) which provide superior performance across all code rates.