In [80]:
import numpy as np

# Channel coding
Also known as **forward error correction** (FEC). It adds redundant information to the input so that the receiver can detect and correct errors caused by noise in the channel.

Also used for data storage, e.g. hard drives and CD-ROMs (Reed-Solomon!).

### Concepts

##### Information

(*Telecommunications Breakdown* has an excellent explanation of this.)

An event occurring (for wireless channels, receiving a given symbol) gives us *information*. The less likely the event, the more information we get. Why? Consider a game of guessing words from a dictionary by their first letters. Knowing that the word starts with "k" gives you less information than knowing that it starts with "x", because far fewer words start with "x".

We define the **amount of information** conveyed by receiving a symbol $x_i$ (observing an outcome of an experiment, etc.) as $$I(x_i) = - \log_2(p(x_i))$$ measured in bits. Up to the choice of logarithm base, it's the only (continuous) function that's additive for independent events and meets the other criteria (a certain event conveys no information, a less likely event conveys more information).

For a given source alphabet $x = \{x_i\}, \ i = 1, ..., N$ we define the **source entropy** as $$H(x) = \sum_{i=1}^N p(x_i) I(x_i)$$ which is the average (expected) number of bits of information per symbol. We'd like it to be as high as possible; we know that $H(x) \leq \log_2(N)$, and the maximum is attained for equally likely symbols.
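
The two definitions above can be checked numerically. This is a minimal sketch; the function names and the two example distributions are made up for illustration.

```python
import math

def self_information(p):
    """Information (in bits) conveyed by an event of probability p."""
    return -math.log2(p)

def entropy(probabilities):
    """Average information per symbol in bits; zero-probability symbols contribute nothing."""
    return sum(p * self_information(p) for p in probabilities if p > 0)

uniform = [0.25] * 4            # four equally likely symbols
skewed = [0.7, 0.1, 0.1, 0.1]   # same alphabet size, unequal probabilities

print(entropy(uniform))  # reaches the log2(4) = 2 bits upper bound
print(entropy(skewed))   # strictly below 2 bits
```

The uniform source attains the bound $\log_2(N)$, while any skew in the probabilities lowers the entropy.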

##### Channel capacity
How many bits per second can we transmit over a given channel? Shannon's theorem gives an exact upper bound! For a given bandwidth $B$ \[Hz\] and average signal and noise powers $S$ and $N$ \[W\]
$$C = B \log_2 \left(1 + \frac{S}{N}\right)$$
It illustrates the trade-off between SNR and bandwidth. With a higher SNR we can carry more bits per symbol, so the same data rate needs a lower symbol rate, which means less bandwidth. And inversely. Shannon demonstrated that for any data rate below $C$ it's always possible to encode the data so that it's sent at this rate with an arbitrarily small chance of error!

You can "derive" it intuitively (in this list $N$ is the number of symbols and $P$ is the noise power):
 - max entropy \[bits/symbol\] for $N$ symbols is $\log_2(N)$, so when we transmit symbols with period $T$ $$C \leq \frac{\log_2(N)}{T}$$ 
 - perfect reception requires gaps between adjacent symbol levels larger than the noise variation; for real symbols spread over the received amplitude range $2 \sqrt{S + P}$ this gives $$\frac{2 \sqrt{S + P}}{N - 1} > 2 \sqrt{P}$$
 - *Nyquist rate* - the max symbol rate that can be transmitted over a channel (not exactly the sampling theorem!), achieved if you transform the symbol sequence into a continuous signal using sinc pulses (the narrower they are, i.e. the more we increase the symbol rate, the more bandwidth is occupied) $$\frac{1}{T} \leq 2 B$$
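
The three bounds in this list can be chained into the capacity formula. This is only a heuristic sketch, assuming a high SNR so that the difference between $N$ and $N - 1$ is negligible, with $P$ denoting the noise power as in the second bullet:

$$N - 1 < \sqrt{\frac{S + P}{P}} = \sqrt{1 + \frac{S}{P}} \quad \Rightarrow \quad \log_2(N) \approx \frac{1}{2} \log_2 \left(1 + \frac{S}{P}\right)$$

$$C \leq \frac{\log_2(N)}{T} \leq 2 B \log_2(N) \approx B \log_2 \left(1 + \frac{S}{P}\right)$$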

E.g. for 802.11n WiFi at 2.4 GHz: $B = 20 \ MHz$, and for $SNR = 25 \ dB$ we can approximate 
$$\log_2 \left(1 + \frac{S}{N}\right) \approx \frac{10 \log_{10} (\frac{S}{N})}{\log_{10}(1024)} \approx \frac{SNR_{dB}}{3}$$
so we get an upper bound of roughly $20 \cdot 25 / 3 \approx 167$ Mbps per channel. The real rate is probably around half of it due to frame overhead (redundancy in the information source), and channel coding and modulation not being optimal.
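
To see how close the $SNR_{dB} / 3$ rule of thumb is to the exact formula, the example can be evaluated directly. This is a small sketch; `shannon_capacity` is a helper name introduced here, and the 20 MHz / 25 dB figures are the ones from the text.

```python
import math

def shannon_capacity(bandwidth_hz, snr_db):
    """Shannon capacity C = B * log2(1 + S/N) in bits per second."""
    snr_linear = 10 ** (snr_db / 10)   # convert dB to a plain power ratio
    return bandwidth_hz * math.log2(1 + snr_linear)

bandwidth_hz = 20e6   # 20 MHz channel
snr_db = 25.0

exact = shannon_capacity(bandwidth_hz, snr_db)
approx = bandwidth_hz * snr_db / 3   # the SNR_dB / 3 approximation

print(f"exact:  {exact / 1e6:.1f} Mbps")
print(f"approx: {approx / 1e6:.1f} Mbps")
```

At this SNR the two values differ by well under one percent; the approximation gets worse as the SNR approaches 0 dB, where the $1$ inside the logarithm is no longer negligible.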

##### Code rate
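The ratio of input information to output information, i.e. $k / n$ when $k$ input symbols are encoded into $n$ output symbols. The lower the rate, the more redundancy is added and the less efficient the code is.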

# Questions
1. If we use a lower-order modulation scheme for low SNRs (e.g. QPSK instead of 16QAM), why don't we just increase the magnitude of the symbols to space them further apart?
    - because a larger amplitude of the transmitted signal means more required power; the transmission power is already fixed and we only get to choose the modulation scheme

# Linear block codes
We assume that symbols are elements of a finite field $F_q$, i.e. codewords are length-$n$ vectors from the vector space $F_q^n$. An $(n, \ k)$ linear code is a $k$-dimensional subspace of this vector space. A *generator matrix* is a $k \times n$ matrix whose rows form a basis (linearly independent, spanning the subspace) for the code subspace. So there are $q^k$ codewords (linear combinations of the basis vectors) and they can be produced by multiplying a $k$-dimensional input by the generator matrix. Up to a permutation of the codeword positions, every linear code has a generator matrix in the systematic form $[I_k, \ A]$.

There always exists an $(n-k)$-dimensional subspace that's orthogonal to $C$, i.e. all its elements are orthogonal to all elements of $C$. It's called the *dual code* of $C$. Its generator matrix $H$ (of size $(n - k) \times n$) is called the *parity check matrix* for $C$, and $\forall_{x \in C}: \ H \ x^T = 0$.

##### Decoding
For a transmitted codeword $x$ the received word is $y = x + z$, where $z$ is the error vector. $s = H y^T = H z^T$ is called the *syndrome*. It doesn't uniquely identify $x$ (nor $z$): there are $q^k$ error vectors with the same syndrome (of the form $z_0 + x$ for $x \in C$). So the receiver has the errors narrowed down from $q^n$ to $q^k$ possibilities. If errors on each symbol occur IID and the probability of each particular erroneous symbol is $\epsilon \leq \frac{1}{q}$, then the solution with the smallest Hamming weight (number of non-zero positions) is the most likely error vector.

The only problem is to find the minimum Hamming weight solution for a given syndrome efficiently. If we precompute these for all syndromes, decoding reduces to a table lookup, which is about as fast as a decoder can get.
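
The precomputation can be sketched as a dictionary from syndromes to minimum-weight error vectors. This is an illustrative brute-force version for binary codes only; `build_syndrome_table` and `decode` are names made up here, and the early exit assumes the parity check matrix has full row rank. The example matrix is the $(7, \ 4)$ Hamming parity check matrix used later in this notebook.

```python
import itertools
import numpy as np

def build_syndrome_table(parity_check):
    """Map each syndrome to a minimum-Hamming-weight error vector (binary codes only)."""
    n_checks, n = parity_check.shape
    table = {}
    # Enumerate error patterns by increasing weight, so the first pattern
    # stored for a syndrome is one of minimum weight.
    for weight in range(n + 1):
        for positions in itertools.combinations(range(n), weight):
            error = np.zeros(n, dtype=int)
            error[np.array(positions, dtype=int)] = 1
            syndrome = tuple((parity_check @ error % 2).tolist())
            table.setdefault(syndrome, error)
        if len(table) == 2 ** n_checks:   # every syndrome already has an entry
            break
    return table

def decode(received, parity_check, table):
    """Subtract the most likely error vector for the received word's syndrome."""
    syndrome = tuple((parity_check @ received % 2).tolist())
    return (received - table[syndrome]) % 2

H = np.array([
    [0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
])
table = build_syndrome_table(H)

codeword = np.array([0, 1, 1, 0, 0, 1, 1])
received = codeword.copy()
received[5] ^= 1                      # single bit error at position 5
print(decode(received, H, table))     # recovers the transmitted codeword
```

Building the table costs time exponential in $n$ in this naive form, but it is done once; each decode is then a single matrix product and a lookup.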

Also, for a linear code $C$ if we set $d = \min \{ w_H(c) : \ c \in C, \ c \neq 0 \}$ (it's the minimum distance between codewords, since the code is a linear subspace), then the above algorithm always finds the correct codeword for error vectors s.t. $2 w_H(z) + 1 \leq d$. By contradiction, if there were another codeword $y \in C$, $y \neq x$, at least as close to $x + z$ as $x$ is, then $$2 w_H(z) + 1 \leq d \leq d_H(x, \ y) \leq d_H(x, \ x + z) + d_H(y, \ x + z) \leq 2 w_H(z)$$ It can be interpreted as Hamming spheres of radius $w_H(z)$ around the codewords not intersecting.

##### Examples - Hamming code
The Hamming code used here is a $(7, \ 4)$ linear block code. The smallest set of linearly dependent columns of its parity-check matrix has 3 elements, so its minimum distance is 3 and it's capable of correcting 1-bit errors.
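
The minimum distance can also be checked by brute force, since for a linear code it equals the smallest weight of a non-zero codeword. This is a small sketch; `minimum_distance` is a helper introduced here, and `G` repeats the $[I_4, \ A]$ generator matrix used in this notebook.

```python
import itertools
import numpy as np

# Generator matrix [I_4 | A] of the (7, 4) Hamming code from this notebook.
G = np.array([
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
])

def minimum_distance(generator):
    """Smallest Hamming weight among the non-zero codewords of a binary linear code."""
    k = generator.shape[0]
    return min(
        int((np.array(message) @ generator % 2).sum())
        for message in itertools.product([0, 1], repeat=k)
        if any(message)
    )

d = minimum_distance(G)
t = (d - 1) // 2   # largest error weight satisfying 2 * t + 1 <= d
print(d, t)
```

For this matrix the sketch reports $d = 3$ and $t = 1$, matching the single-error-correcting claim. The enumeration visits all $2^k - 1$ non-zero messages, so it is only practical for small codes.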

In [3]:
hamming_gen_matrix = np.hstack([
    np.identity(4), 
    np.array([
        [0, 1, 1],
        [1, 0, 1],
        [1, 1, 0],
        [1, 1, 1],
    ]),
])
hamming_gen_matrix

array([[1., 0., 0., 0., 0., 1., 1.],
       [0., 1., 0., 0., 1., 0., 1.],
       [0., 0., 1., 0., 1., 1., 0.],
       [0., 0., 0., 1., 1., 1., 1.]])

In [4]:
hamming_parity_check_matrix = np.array([
    [0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
])

Hamming code has a nice property w.r.t. decoding, i.e. the syndrome gives us the position where error occurred.

In [19]:
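def recv_with_error(codeword, err_position, pc_matrix=hamming_parity_check_matrix):
    # flip a single bit of the codeword and print the syndrome of the result
    error = np.zeros(len(codeword))
    error[err_position] = 1
    received = (codeword + error) % 2  # use the argument, not the global `transmitted`
    syndrome = np.matmul(pc_matrix, received) % 2
    print(syndrome)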


msg = np.array([0, 1, 1, 0])
transmitted = np.matmul(msg, hamming_gen_matrix) % 2

recv_with_error(transmitted, 3)
recv_with_error(transmitted, 1)

[1. 1. 1.]
[1. 0. 1.]


So the syndromes are columns 3 and 1 (counting from 0) of the parity check matrix, respectively, which tells us which position is in error.

**Actually**, for any binary linear code the syndrome of a single-bit error is, by definition, the column of $H$ at the error position. It pinpoints the position only when all columns are distinct and non-zero, which is exactly what makes a code single-error-correcting.

But we can make it even cooler. We can verify that the matrix below is another parity-check matrix for the Hamming code. And this time the syndrome gives us the binary representation of the (1-based) number of the bit where the error occurred.

In [20]:
hamming_parity_check_matrix2 = np.array([
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
])
np.matmul(hamming_parity_check_matrix2, np.transpose(hamming_gen_matrix)) % 2

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [21]:
recv_with_error(transmitted, 3, pc_matrix=hamming_parity_check_matrix2)

[1. 0. 0.]


Which is binary 100 = 4, i.e. column number 4 (counting from 1), position 3 (counting from 0).
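
With this form of the parity check matrix, single-error correction needs no lookup table at all: the syndrome read as a binary number is the 1-based index of the bit to flip. This is a minimal sketch; `H2` repeats the matrix above and `correct_single_error` is a name introduced here.

```python
import numpy as np

# Same matrix as hamming_parity_check_matrix2: column i (1-based) is i written in binary.
H2 = np.array([
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
])

def correct_single_error(received, parity_check=H2):
    """Flip the bit whose 1-based index is spelled out by the syndrome."""
    syndrome = parity_check @ received % 2
    # Read the syndrome as a binary number, most significant bit first.
    position = int("".join(str(int(bit)) for bit in syndrome), 2)
    corrected = received.copy()
    if position:   # a zero syndrome means no detectable error
        corrected[position - 1] = 1 - corrected[position - 1]
    return corrected

codeword = np.array([0, 1, 1, 0, 0, 1, 1])
received = codeword.copy()
received[3] = 1 - received[3]          # error at 0-based position 3
print(correct_single_error(received))  # recovers the transmitted codeword
```

This only works for at most one flipped bit; with two or more errors the syndrome points at a wrong position and the "correction" introduces a further error.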

##### Examples - RDS
For RDS a $(26, \ 16)$ linear code is used, so we have a $16 \times 26$ generator matrix, a $10 \times 26$ parity check matrix, and our syndromes are 10 bits long. If this code had no special properties (it's cyclic!), then for each of the $2^{10} = 1024$ syndromes we'd have to find the minimum Hamming weight error vector among $2^{16} = 65536$ possibilities. That's not a big deal on a computer, but perhaps for an integrated circuit it would increase the cost substantially?

In [5]:
from error_coding import spec_generator_matrix, spec_parity_check_matrix

### Summary so far
 - linear code definition
 - generator and parity check matrices
 - syndromes and decoding using minimum Hamming weight

### Linear cyclic block codes
A code $C$ is cyclic, if rotation of any codeword produces another codeword.

Useful interpretation: for a codeword $(C_0, ..., C_{n-1})$ define its *generating function* as $C(x) = C_0 + C_1 x + ... + C_{n-1} x^{n-1}$. Then a code is cyclic if for every codeword the word defined by the generating function $xC(x) \ (mod \ x^n - 1)$ is also a codeword (this is the right cyclic shift of $C$!).
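
The claim that multiplying by $x$ modulo $x^n - 1$ is a right cyclic shift can be checked on coefficient lists. This is a sketch for binary coefficients only; `multiply_by_x_mod` is a name made up here, and the example word is just a length-7 binary vector chosen so that the top coefficient wraps around.

```python
def multiply_by_x_mod(codeword):
    """Coefficients of x * C(x) mod (x^n - 1), lowest degree first, over GF(2)."""
    n = len(codeword)
    product = [0] + list(codeword)                # multiplying by x raises every degree by one
    product[0] = (product[0] + product[n]) % 2    # x^n = 1 (mod x^n - 1), so fold it back
    return product[:n]

word = [0, 1, 1, 1, 0, 0, 1]          # x + x^2 + x^3 + x^6
shifted = multiply_by_x_mod(word)
rotated = word[-1:] + word[:-1]       # plain right rotation of the list

print(shifted)
print(rotated)
```

Both lines print the same list, since the only coefficient that leaves the range, the one at $x^n$, re-enters at degree 0.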

For a linear cyclic code we define the *generator polynomial* (denoted $g(x)$) as the lowest-degree non-zero (monic) polynomial in $C$. It divides $x^n - 1$ and it also divides all the codewords. Conversely, if it divides a polynomial of degree $\lt n$, then that polynomial is a codeword. Also, any divisor of $x^n - 1$ generates a linear cyclic code.

We also define *parity check polynomial* $h(x) = \frac{x^n - 1}{g(x)}$.
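
The parity check polynomial can be obtained by polynomial long division over $GF(2)$. This is a minimal sketch that assumes binary coefficients stored lowest degree first and a divisor with leading coefficient 1; `divmod_gf2` is a helper introduced here, and the example uses $g(x) = x^4 + x^3 + x^2 + 1$ with $n = 7$, the generator polynomial used in the cells that follow.

```python
def divmod_gf2(dividend, divisor):
    """Quotient and remainder of binary polynomials (coefficient lists, lowest degree first)."""
    remainder = list(dividend)
    deg_divisor = len(divisor) - 1
    quotient = [0] * max(len(dividend) - deg_divisor, 1)
    # Cancel the leading terms from the highest degree downwards.
    for i in range(len(remainder) - 1, deg_divisor - 1, -1):
        if remainder[i]:
            shift = i - deg_divisor
            quotient[shift] = 1
            for j, coef in enumerate(divisor):
                remainder[shift + j] ^= coef   # subtraction is XOR over GF(2)
    return quotient, remainder[:deg_divisor]

g = [1, 0, 1, 1, 1]                     # 1 + x^2 + x^3 + x^4
x7_minus_1 = [1, 0, 0, 0, 0, 0, 0, 1]   # x^7 - 1 equals x^7 + 1 over GF(2)

h, rem = divmod_gf2(x7_minus_1, g)
print(h)     # coefficients of h(x), lowest degree first
print(rem)   # all zeros, because g(x) divides x^7 - 1
```

For this $g(x)$ the quotient is $h(x) = x^3 + x^2 + 1$.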

In [119]:
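# we identify C_n x^n + ... + C_1 x + C_0 with [C_0, ..., C_n] (lowest degree first),
# with C_n != 0 unless the polynomial is the constant 0, which is [0]
def divide_mod2(dividend, divisor):
    """Return the remainder of dividend / divisor for polynomials over GF(2)."""
    assert divisor[-1] != 0  # the leading coefficient of the divisor must be non-zero
    n = len(dividend)
    k = len(divisor)
    remainder = dividend.copy()

    # cancel the leading terms one by one, from degree n - 1 down to degree k - 1
    for i in range(n - 1, k - 2, -1):
        if remainder[i] != 0:
            for j in range(i, i - k, -1):
                # i - j + 1 goes from 1 to k, so k - (i - j + 1) goes from k - 1 to 0
                remainder[j] = (remainder[j] - divisor[k - (i - j + 1)]) % 2

    # strip potential zero coefficients
    non_zero_coef_idx = [i for i in range(0, min(k, n)) if remainder[i] != 0]
    return remainder[:non_zero_coef_idx[-1] + 1] if non_zero_coef_idx else [0]


def pp(polynomial):
    """Pretty-print a polynomial given as a coefficient list, highest degree first."""
    # named `terms` rather than `repr`, so the builtin is not shadowed
    terms = [f'x^{i}' if i > 1 else ('x' if i == 1 else '1') for i in range(len(polynomial)) if polynomial[i] != 0]
    print(' + '.join(terms[::-1]) if terms else '0')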

In [120]:
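# x^4 + x^3 + x^2 + 1 generates a (7, 3) binary cyclic code, so it divides x^7 - 1
# (strictly speaking, the binary Hamming code of length 7 is the (7, 4) code; the "hamming" names are kept as-is)
hamming_gen_polynomial = [1, 0, 1, 1, 1]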
pp(divide_mod2([1, 0, 0, 0, 0, 0, 0, 1], hamming_gen_polynomial))

0


It can be shown (Theorem 8.3, Corollary 2) that the matrices defined below are generator and parity check matrices for a linear cyclic code.

In [138]:
def generator_matrix(n, generator_polynomial):
    k = n - len(generator_polynomial) + 1
    gen_m = np.zeros((k, n))
    
    for i in range(0, k):
        # G[i, :] = x^(n - k + i) - x^(n - k + i) (mod g(x))
        row = divide_mod2([0] * (n - k + i) + [1], generator_polynomial)
        gen_m[i, :len(row)] = row
        gen_m[i, n - k + i] = 1

    return gen_m


def parity_check_matrix(n, generator_polynomial):
    k = n - len(generator_polynomial) + 1
    pc_m = np.zeros((n - k, n))

    for i in range(0, n):
        # H[:, i] = x^i (mod g(x))
        column = divide_mod2([0] * i + [1], generator_polynomial)
        pc_m[:len(column), i] = column

    return pc_m

In [142]:
hamming_gen_matrix = generator_matrix(7, hamming_gen_polynomial)
assert np.all(hamming_gen_matrix == np.array([
    [1., 0., 1., 1., 1., 0., 0.],
    [1., 1., 1., 0., 0., 1., 0.],
    [0., 1., 1., 1., 0., 0., 1.]
]))  # from the book

hamming_pc_matrix = parity_check_matrix(7, hamming_gen_polynomial)
assert np.all(hamming_pc_matrix == np.array([
    [1., 0., 0., 0., 1., 1., 0.],
    [0., 1., 0., 0., 0., 1., 1.],
    [0., 0., 1., 0., 1., 1., 1.],
    [0., 0., 0., 1., 1., 0., 1.]
]))  # from the book

A nice thing about a parity check matrix defined this way is that the syndrome $s = H \ y^{T}$ of a received word $y$ has the generating function $r(x) = y(x) \ (mod \ g(x))$, where $y(x)$ is the generating function of $y$.

In [151]:
x = np.array([0, 1, 0])
encoded = np.matmul(x, hamming_gen_matrix) % 2  # reduce mod 2 so sums of several rows stay binary
error = np.array([0, 0, 1, 0, 1, 0, 0])
received = (encoded + error) % 2
syndrome = np.matmul(hamming_pc_matrix, received) % 2

In [153]:
pp(syndrome)

x^3 + 1


In [154]:
pp(divide_mod2(received, hamming_gen_polynomial))

x^3 + 1


### Summary so far
 - linear cyclic codes as polynomials
 - generator and parity check polynomials
 - generator polynomial completely defines the code
 - how to form generator and parity check matrices from the generator
 - how to calculate syndromes with polynomial division

In [155]:
# to go:
#  - shifting register encoders for cyclic codes (because they are simple)
#  - cyclic Hamming codes - because they have both simple encoders and DECODERS (important)
#  - burst-error-trapping decoders for cyclic codes 

# Galois fields
We have a set and two operations ("addition" and "multiplication"). The set is an abelian group w.r.t. addition, its non-zero elements form an abelian group w.r.t. multiplication, and multiplication distributes over addition.

The basic finite field is $\mathbb{Z} / p \mathbb{Z}$, i.e. integers modulo a prime number $p$. The only non-obvious thing here is the existence of an inverse element for multiplication. It follows from [Bezout's identity](https://en.wikipedia.org/wiki/B%C3%A9zout%27s_identity), because for non-zero $n \in \mathbb{Z} / p \mathbb{Z}$ we have $gcd(n, p) = 1$, so there exist integers $m$ and $k$ such that $k n + m p = gcd(n, p) = 1$, i.e. $k n = 1\ (mod \ p)$, so $k$ is the inverse.
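
The Bezout argument is constructive: the extended Euclidean algorithm produces the coefficient $k$ directly. This is a short sketch with helper names chosen here for illustration.

```python
def extended_gcd(a, b):
    """Return (g, k, m) such that k * a + m * b == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, k, m = extended_gcd(b, a % b)
    # g == k * b + m * (a % b) and a % b == a - (a // b) * b, so regroup the terms
    return g, m, k - (a // b) * m

def inverse_mod(n, p):
    """Multiplicative inverse of n modulo p, taken from the Bezout coefficients."""
    g, k, _ = extended_gcd(n % p, p)
    if g != 1:
        raise ValueError(f"{n} has no inverse modulo {p}")
    return k % p

print(inverse_mod(3, 7))   # 5, because 3 * 5 = 15 = 1 (mod 7)
```

For a composite modulus the same function raises for any $n$ sharing a factor with it, which is why $\mathbb{Z} / m \mathbb{Z}$ is a field only for prime $m$.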

In general, finite fields can **only** be of order $p^m$, where $p \in \mathbb{P}$ and $m$ is a positive integer. They are called Galois fields, denoted $GF(p^m)$. Given the finite field $F_p$ of order $p$ we can **always** select a polynomial $f(x)$ of degree $m$ irreducible over $F_p$ (it can't be factorized), define the field elements as polynomials over $F_p$ of degree up to $m - 1$ and define all operations $(mod \ f(x))$, and $GF(p^m)$ arises.
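
As a concrete instance of this construction, here is a sketch of multiplication in $GF(2^3)$. It assumes $p = 2$, stores each polynomial as a bit mask (bit $i$ is the coefficient of $x^i$) and uses $f(x) = x^3 + x + 1$, one possible irreducible polynomial of degree 3; `gf_multiply` is a name introduced here.

```python
def gf_multiply(a, b, modulus=0b1011, degree=3):
    """Multiply two GF(2^degree) elements stored as bit masks, reducing mod the given polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add (XOR) the current multiple of a
        b >>= 1
        a <<= 1                  # multiply a by x
        if a & (1 << degree):
            a ^= modulus         # reduce as soon as the degree reaches `degree`
    return result

# x * x^2 = x^3 = x + 1 (mod x^3 + x + 1)
print(bin(gf_multiply(0b010, 0b100)))

# every non-zero element has a multiplicative inverse
inverses = {a: next(b for b in range(1, 8) if gf_multiply(a, b) == 1) for a in range(1, 8)}
print(inverses)
```

Irreducibility matters: with the reducible modulus $x^3 + 1 = (x + 1)(x^2 + x + 1)$ the product of the two non-zero elements $x + 1$ and $x^2 + x + 1$ is zero, so the result is not a field.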

# Reed-Solomon codes

In [None]:
# note: really not necessary right now

# Linear convolutional codes

In [None]:
# note: even less necessary right now

 - started at age 17
 - at 20, after 3 years, was climbing 5.13+ (5.13d/5.14a Galaxy Emperor, 8b/+), mostly on sport routes
 - 2020: 5.12+ Moonlight Buttress OS (after 3 years?)