# Information Theory Exam

*February 7th, 2022* -- Quentin Le Roux

<hr>

Please find my examination work in this document (.pdf rendering of a Jupyter Notebook). 

The document is set up in two parts: 

1. the declarations of the functions used to compute important values such as entropy, etc.

2. the detailed answers to each exercise.

<hr>

## 1 - Function declarations

### 1.1 Library imports

In [1]:
import math

### 1.2 Functions

In [2]:
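def entropy(probabilities: dict) -> float:
    """
    Computes the entropy (in bits) given an input dict
    mapping each state to its probability.
    """
    h = 0
    for p in probabilities.values():
        # Zero-probability states contribute nothing (0*log 0 := 0)
        if p > 0:
            h += p * math.log(p, 2)
    return -h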

In [3]:
def lower_bound_expected_length(probabilities: dict) -> float:
    """
    Computes the minimum expected length n_bar of
    a given binary coding.
    """
    n_bar = entropy(probabilities)
    print("Min. expected length per codeword:",
          f"{n_bar} bits/state.")
    return n_bar

In [4]:
def lengths_given_shannon_coding(probabilities: dict) -> dict:
    """
    Computes the length of codewords given a Shannon
    encoding.
    """
    n = lambda x: math.log(1/x, 2)
    n_ceiling = lambda x: math.ceil(n(x))
    n_i = {k:round(n(v),2) for k,v in probabilities.items()}
    lengths = {k:n_ceiling(v) for k,v in probabilities.items()}
    print(f"n_i: {n_i}")
    print("Codeword lengths given Shannon coding:",
          f"{lengths}")
    return lengths

In [5]:
def lengths_given_shannonFano_coding(probabilities: dict) -> dict:
    """
    Computes the length of codewords given a Shannon-Fano
    encoding.
    """
    # See manual computation
    return None

def lengths_given_huffman_coding(probabilities: dict) -> dict:
    """
    Computes the length of codewords given a Huffman
    encoding.
    """
    # See manual computation
    return None

In [6]:
def average_length(lengths, probabilities, coding):
    """
    Computes the bits per state given a set of expected
    lengths per codeword and their respective probabilities.
    """
    n_bar = [lengths[k]*probabilities[k]
             for k in probabilities.keys()]
    n_bar = sum(n_bar)
    print(f"Avg. length per codeword given {coding} coding:",
          f"{n_bar} bits/state.")
    return n_bar

## Exercises

### Exercise 1

**1**. True. The entropy increases as the number of possible outcomes increases.

**2**. B. $\log_2(2)=1$ Sh/state

**3**. A. The binary code is non-singular as all codewords are distinct but it is not uniquely decodable as 0 is the prefix of 01, the codewords don't have a fixed length, and there is no separator.

**4**. A. The binary code is non-singular as all codewords are distinct but it is not uniquely decodable as 11 is the prefix of 11, the codewords don't have a fixed length, and there is no separator.

**5**. A, B, and C. The binary code is non-singular as all codewords are distinct, and it is uniquely decodable and instantaneous as no codeword is the prefix of another.

### Exercise 2

#### Question 1

The entropy $H(S)$ is equal to c. $2.1598$.

In [7]:
probs = {"p0":14/35,"p1":6/35, "p2":6/35, "p3":5/35, "p4":4/35}
HS = entropy(probs)
lower_bound = lower_bound_expected_length(probs)
print(HS)

Min. expected length per codeword: 2.1597927486050486 bits/state.
2.1597927486050486


The entropy flow rate of the source (its emission rate $T$), expressed in shannons per second, is the product of the entropy $H(S)$ and the rate $D_S$ in symbols per second: $$T=H(S)*D_S$$

We find: $$H(S)*D_S \approx 3239.689$$

In [8]:
DS = 1500
HS * DS

3239.6891229075727

#### Question 2

The channel capacity $C$ in shannons per second corresponds to the noiseless channel rate $D_C$ times $\log_2q$ given the channel's $q$-ary alphabet (here binary, i.e. $q=2$).

As such: 
\begin{align}
C&=D_C*\log_2 2\\
&=3500*1\\
&=3500
\end{align}

We find that $T\le C$ since $3239.689 \le 3500$: the channel is adapted to the source.

#### Question 3

We propose the following fixed-length code for the source symbols.

| Case | code |
| --- | --- |
| s_0 | 000 |
| s_1 | 001 |
| s_2 | 010 |
| s_3 | 011 |
| s_4 | 111 |

It is non-singular and of fixed length, therefore the code is uniquely decodable.

The average length of the codewords is as such:
\begin{align}
\bar{n}&=\underset{i=1}{\overset{N}{\sum}}p_i.n_i\\
&=(p_0+p_1+p_2+p_3+p_4)*3\\
&=3
\end{align}

Since the rate $D_S$ is 1500 symbols per second and the average length for the fixed-length coding is 3, this encoding emits 4500 bits per second ($1500\times 3$), which is higher than the channel capacity of 3500 bits per second.

The fixed-length coding does not allow the transmission of source symbols.
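As a quick check of the argument above, the following sketch computes the shortest fixed codeword length able to distinguish five symbols and the resulting bit rate. The variable names are illustrative and are not part of the original notebook.

```python
import math

n_symbols = 5        # size of the source alphabet
symbol_rate = 1500   # D_S, in symbols per second
channel_rate = 3500  # D_C, in binary symbols per second

# Shortest fixed length that distinguishes all symbols with a binary alphabet
fixed_length = math.ceil(math.log2(n_symbols))
bit_rate = symbol_rate * fixed_length

print(fixed_length, bit_rate, bit_rate <= channel_rate)
```

No fixed-length binary code can do better than 3 bits per symbol here, so the conclusion does not depend on the particular codewords chosen in the table.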

#### Question 4

We propose the following separator-based code for the source symbols in order of decreasing probabilities.

| Case | code |
| --- | --- |
| s_0 | 0 |
| s_1 | 01 |
| s_2 | 011 |
| s_3 | 0111 |
| s_4 | 01111 |

It is non-singular and each codeword starts with the separator 0, therefore the code is uniquely decodable.

The average length of the codewords is as such:
\begin{align}
\bar{n}&=\underset{i=1}{\overset{N}{\sum}}p_i.n_i\\
&=\frac{14}{35}*1+\frac{6}{35}*2+\frac{6}{35}*3+\frac{5}{35}*4+\frac{4}{35}*5\\
&=\frac{84}{35}=2.4
\end{align}

Since the rate $D_S$ is 1500 symbols per second and the average length for the separator coding is 2.4, this encoding emits 3600 bits per second ($1500\times 2.4$), which is higher than the channel capacity of 3500 bits per second.

The separator coding does not allow the transmission of source symbols.

In [9]:
(14*1+6*2+6*3+5*4+4*5)/35

2.4
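The average length can also be recomputed directly from the codeword table, which avoids transcription slips in a hand-written sum. This is a minimal sketch using exact fractions; the dictionary names are illustrative.

```python
from fractions import Fraction

# Codewords from the table above; counts are the probabilities times 35
codewords = {"s0": "0", "s1": "01", "s2": "011", "s3": "0111", "s4": "01111"}
counts = {"s0": 14, "s1": 6, "s2": 6, "s3": 5, "s4": 4}

total = sum(counts.values())
# Weighted sum of codeword lengths, kept exact with Fraction
avg_len = sum(Fraction(counts[s], total) * len(codewords[s]) for s in codewords)
bit_rate = 1500 * avg_len

print(float(avg_len), float(bit_rate))
```

With the codewords of the table, this gives 84/35 = 2.4 bits per symbol, i.e. 3600 bits per second at 1500 symbols per second.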

#### Question 5

<u>Shannon encoding</u>

Shannon's coding technique consists in associating $n_i$ $q$-ary symbols to each source state $s_i$ such that: $$n_i=\left\lceil\frac{\log_2(1/p_i)}{\log_2 q}\right\rceil$$
And:
\begin{align}
\bar{n}&=\underset{i=1}{\overset{N}{\sum}}p_i.n_i\\
\bar{n}&\ge\bar{n}_{min}\text{ with }\bar{n}_{min}=\frac{H(S)}{\log_2q}
\end{align}

We consider the $5$-symbol source $\{s_0,\ldots,s_{4}\}$ and derive the Shannon codeword lengths for the source symbols in order of decreasing probabilities.

| Case | $p_i$ | $-\log_2 p_i$ | $n_i$ |
| --- | ---: | ---: | ---: |
| s_0 | 14/35 | 1.32 | 2 |
| s_1 | 6/35 | 2.54 | 3 |
| s_2 | 6/35 | 2.54 | 3 |
| s_3 | 5/35 | 2.81 | 3 |
| s_4 | 4/35 | 3.13 | 4 |

These lengths satisfy the Kraft inequality ($2^{-2}+3\cdot 2^{-3}+2^{-4}=0.6875\le 1$), so an instantaneous, hence uniquely decodable, code with these lengths exists.

The average length of the codewords is as such:
\begin{align}
\bar{n}&=\underset{i=1}{\overset{N}{\sum}}p_i.n_i\\
&=\frac{14}{35}*2+\frac{6}{35}*3+\frac{6}{35}*3+\frac{5}{35}*3+\frac{4}{35}*4\\
&=2.714
\end{align}

Since the rate $D_S$ is 1500 symbols per second and the average length for the Shannon coding is c. 2.714, this encoding emits c. 4071 bits per second (c. $1500\times 2.714$), which is higher than the channel capacity of 3500 bits per second.

The Shannon coding does not allow the transmission of source symbols.

In [10]:
lengths_shannon = lengths_given_shannon_coding(probs)

n_i: {'p0': 1.32, 'p1': 2.54, 'p2': 2.54, 'p3': 2.81, 'p4': 3.13}
Codeword lengths given Shannon coding: {'p0': 2, 'p1': 3, 'p2': 3, 'p3': 3, 'p4': 4}


In [11]:
14/35*2+6/35*6+5/35*3+4/35*4

2.7142857142857144

In [12]:
avg_l = average_length(lengths_shannon, probs, "Shannon")

Avg. length per codeword given Shannon coding: 2.7142857142857144 bits/state.
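Decodability of a Shannon code rests on its lengths satisfying the Kraft inequality. The short sketch below checks this for the lengths found above; the dictionary is simply a copy of the table and the variable names are illustrative.

```python
# Shannon codeword lengths from the table above
lengths = {"s0": 2, "s1": 3, "s2": 3, "s3": 3, "s4": 4}

# Kraft inequality for a binary alphabet: the sum of 2^(-n_i) must not exceed 1
kraft_sum = sum(2 ** -n for n in lengths.values())

print(kraft_sum, kraft_sum <= 1)
```

The sum is well below 1, which also shows that these lengths are wasteful: shorter codewords could be assigned without losing the prefix property.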


#### Question 6

<u>Shannon-Fano encoding</u>

Shannon-Fano’s encoding proceeds as such:
1. Arrange the states of the system by decreasing probabilities
2. Split the system states into 2 groups $G_0$ and $G_1$ with probabilities as close as possible without modifying their arrangement in 1.
3. Each group $G_i$ is split into 2 sub-groups $G_{i0}$ and $G_{i1}$ with probabilities as close as possible to each other, again without modifying the state arrangement.
4. The procedure stops when each subgroup consists of a single element. The index of the group gives the codeword.

We consider the $5$-symbol source $\{s_0,\ldots,s_{4}\}$ and build the Shannon-Fano code for the source symbols in order of decreasing probabilities.

<img src="images/q2_shannonfano.png" width="700">

Each codeword is a leaf of the splitting tree, so no codeword is the prefix of another: the code is non-singular, uniquely decodable, and instantaneous.

The average length of the codewords is as such:
\begin{align}
\bar{n}&=\underset{i=1}{\overset{N}{\sum}}p_i.n_i\\
&=\frac{14}{35}*2+\frac{6}{35}*2+\frac{6}{35}*2+\frac{5}{35}*3+\frac{4}{35}*3\\
&=2.257
\end{align}

Since the rate $D_S$ is 1500 symbols per second and the average length for the Shannon-Fano coding is c. 2.257, this encoding emits c. 3386 bits per second (c. $1500\times 2.257$), which is lower than the channel capacity of 3500 bits per second.

The Shannon-Fano coding allows the transmission of source symbols.

In [13]:
#manually computed
lengths_shannonFano =  {"p0":2,"p1":2, "p2":2, "p3":3, "p4":3}
avg_l = average_length(lengths_shannonFano, probs, "Shannon-Fano")

Avg. length per codeword given Shannon-Fano coding: 2.257142857142857 bits/state.


In [14]:
14/35*2+6/35*(2+2)+5/35*3+4/35*3

2.257142857142857
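The manual splitting procedure can be sketched in code. The function below is a hypothetical helper, not part of the original notebook; it assumes the input is already sorted by decreasing weight and, when several cuts are equally balanced, keeps the first one.

```python
def shannon_fano_lengths(weights):
    """Return codeword lengths obtained by recursive balanced splits.

    `weights` is a list of (symbol, weight) pairs sorted by decreasing weight.
    """
    if len(weights) == 1:
        return {weights[0][0]: 0}
    total = sum(w for _, w in weights)
    # Choose the cut that makes the two group weights as close as possible
    best_cut, best_gap, running = 1, None, 0
    for i in range(1, len(weights)):
        running += weights[i - 1][1]
        gap = abs(total - 2 * running)
        if best_gap is None or gap < best_gap:
            best_cut, best_gap = i, gap
    lengths = {}
    for group in (weights[:best_cut], weights[best_cut:]):
        for symbol, n in shannon_fano_lengths(group).items():
            lengths[symbol] = n + 1  # one more bit per split level
    return lengths

# Weights are the probabilities times 35
weights = [("p0", 14), ("p1", 6), ("p2", 6), ("p3", 5), ("p4", 4)]
sf_lengths = shannon_fano_lengths(weights)
print(sf_lengths)
```

On this source the first cut separates $\{s_0, s_1\}$ (weight 20) from $\{s_2, s_3, s_4\}$ (weight 15), which reproduces the lengths 2, 2, 2, 3, 3 used above.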

#### Question 7

<u>Huffman encoding</u>

Huffman’s method provides a compact instantaneous code of minimum average length. A tree is built from the leaf nodes, which represent the states of the source.
1. At each step, the two least likely leaves are merged into one.
2. The procedure stops when the result is a single leaf consisting of all the symbols.
3. The reverse path of the tree provides the code words.

We consider the $5$-symbol source $\{s_0,\ldots,s_{4}\}$ and build the Huffman code for the source symbols in order of decreasing probabilities.

<img src="images/q2_huffman.png" width="700">

Each codeword is a leaf of the Huffman tree, so no codeword is the prefix of another: the code is non-singular, uniquely decodable, and instantaneous.

The average length of the codewords is as such:
\begin{align}
\bar{n}&=\underset{i=1}{\overset{N}{\sum}}p_i.n_i\\
&=\frac{14}{35}*1+\frac{6}{35}*3+\frac{6}{35}*3+\frac{5}{35}*3+\frac{4}{35}*3\\
&=2.1999...
\end{align}

Since the rate $D_S$ is 1500 symbols per second and the average length for the Huffman coding is c. 2.2, this encoding emits c. 3300 bits per second (c. $1500\times 2.2$), which is lower than the channel capacity of 3500 bits per second.

The Huffman coding allows the transmission of source symbols.

In [15]:
#manually computed
lengths_huffman =  {"p0":1,"p1":3, "p2":3, "p3":3, "p4":3}
avg_l = average_length(lengths_huffman, probs, "Huffman")

Avg. length per codeword given Huffman coding: 2.1999999999999997 bits/state.


In [16]:
14/35*1+6/35*(3+3)+5/35*3+4/35*3

2.1999999999999997
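The merging procedure can also be sketched with a priority queue. This is an illustrative helper built on the standard `heapq` module, not the method used in the original notebook, where the tree was drawn by hand.

```python
import heapq
from itertools import count

def huffman_lengths(weights):
    """Return binary Huffman codeword lengths for a dict of symbol weights."""
    tie = count()  # tie-breaker so the heap never has to compare lists
    heap = [(w, next(tie), [s]) for s, w in weights.items()]
    heapq.heapify(heap)
    lengths = {s: 0 for s in weights}
    while len(heap) > 1:
        # Merge the two least likely nodes
        w1, _, group1 = heapq.heappop(heap)
        w2, _, group2 = heapq.heappop(heap)
        for s in group1 + group2:
            lengths[s] += 1  # each merge adds one bit to every member
        heapq.heappush(heap, (w1 + w2, next(tie), group1 + group2))
    return lengths

# Weights are the probabilities times 35
weights = {"p0": 14, "p1": 6, "p2": 6, "p3": 5, "p4": 4}
hf_lengths = huffman_lengths(weights)
avg = sum(weights[s] * hf_lengths[s] for s in weights) / sum(weights.values())
print(hf_lengths, avg)
```

The merges happen in the order 4+5, 6+6, 9+12, 14+21, which gives one bit to $s_0$ and three bits to every other state.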

#### Question 8

We obtain the following average codeword lengths $\bar{n}$:

- Minimum possible ($H(S)$): c. 2.16
- Fixed-length: 3 (does not allow transmission)
- Separator: 2.4 (does not allow transmission)
- Shannon: c. 2.71 (does not allow transmission)
- Shannon-Fano: c. 2.26 (allows transmission)
- Huffman: c. 2.2 (allows transmission)

We find that the lowest average codeword length obtained throughout the exercise is achieved by Huffman's method, which is close to the minimum possible. Shannon-Fano comes second, followed by the separator method, which still beats Shannon's method because it assigns a single bit to the most probable state $s_0$, whereas Shannon's method gives it a codeword of length 2. Only the Shannon-Fano and Huffman codes keep the bit rate below the channel capacity of 3500 bits per second.
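The comparison can be tabulated in a few lines. The sketch below recomputes each average length and bit rate from the codeword lengths derived in Questions 3 to 7; the dictionary layout and names are illustrative.

```python
from fractions import Fraction

counts = [14, 6, 6, 5, 4]  # probabilities times 35, for s_0..s_4
code_lengths = {
    "fixed-length": [3, 3, 3, 3, 3],
    "separator": [1, 2, 3, 4, 5],
    "Shannon": [2, 3, 3, 3, 4],
    "Shannon-Fano": [2, 2, 2, 3, 3],
    "Huffman": [1, 3, 3, 3, 3],
}
symbol_rate, capacity = 1500, 3500

results = {}
for name, lengths in code_lengths.items():
    # Exact average length, then bit rate at the given symbol rate
    avg = Fraction(sum(c * n for c, n in zip(counts, lengths)), sum(counts))
    rate = symbol_rate * avg
    results[name] = (float(avg), float(rate), rate <= capacity)
    print(f"{name:>12}: {float(avg):.3f} bits/symbol, "
          f"{float(rate):.0f} bits/s, fits: {rate <= capacity}")
```

Using exact fractions keeps the comparison with the capacity free of rounding effects.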

### Exercise 3

#### Question 1

\begin{align}
H(X)&=H_2(\beta) = H(\beta, 1-\beta)\\
&=-\beta\log_2\beta-(1-\beta)\log_2(1-\beta)\\
&=h(\beta)
\end{align}

Given $h(x) = -x\log_2(x)-(1-x)\log_2(1-x)$.

#### Question 2

We know that $P(Y|X) = \frac{P(X, Y)}{P(X)}$. As such, knowing:

\begin{align}
P(Y=1|X=1)&=1\\
P(Y=0|X=1)&=0\\
P(Y=1|X=0)&=1-q\\
P(Y=0|X=0)&=q\\
\end{align}

We find:

\begin{align}
P(Y=1,X=1)&=P(Y=1|X=1)*P(X=1)\\
&= 1-\beta\\
P(Y=0,X=1)&=P(Y=0|X=1)*P(X=1)\\
&= 0\\
P(Y=1,X=0)&=P(Y=1|X=0)*P(X=0)\\
&= (1-q)*\beta\\
P(Y=0,X=0)&=P(Y=0|X=0)*P(X=0)\\
&=q*\beta
\end{align}

| P(X=i, Y=j) | j=0 | j=1 |
| --- | --- | --- |
| i = 0 | $q\beta$ | $(1-q)\beta$ |
| i = 1 | 0 | $1-\beta$ |

The four entries sum to 1, as expected: $q\beta+(1-q)\beta+0+(1-\beta)=1$.
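A joint distribution is easy to sanity-check numerically. The sketch below builds $P(X=i, Y=j)$ from the conditional probabilities of this question, assuming $P(X=0)=\beta$ and $P(X=1)=1-\beta$; the function name and the numerical values are arbitrary illustrations.

```python
def joint_distribution(beta, q):
    """Joint P(X=i, Y=j), assuming P(X=0)=beta and the channel of Question 2."""
    p_x = {0: beta, 1: 1 - beta}
    p_y_given_x = {0: {0: q, 1: 1 - q}, 1: {0: 0.0, 1: 1.0}}
    return {(i, j): p_x[i] * p_y_given_x[i][j] for i in (0, 1) for j in (0, 1)}

joint = joint_distribution(beta=0.3, q=0.6)  # arbitrary illustrative values
print(joint, sum(joint.values()))
```

Each row of the joint table must sum to the corresponding $P(X=i)$, and the whole table to 1.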

#### Question 3

We find:

\begin{align}
P(Y=0) &= P(X=0, Y=0) + P(X=1, Y=0)\\
&= q*\beta\\
P(Y=1) &= 1-q*\beta\\
\end{align}

As such:

\begin{align}
H(Y)&=E[-\log_2 P(Y)]=-\underset{j=0}{\overset{1}{\sum}}P(Y=j)\log_2 P(Y=j)\\
&=-(q\beta*\log_2(q\beta)+(1-q\beta)*\log_2(1-q\beta))\\
&=-q\beta*\log_2(q\beta)-(1-q\beta)*\log_2(1-q\beta)\\
&=h(q\beta)
\end{align}

#### Question 4

We find:

\begin{align}
H(Y|X=0)&=-\underset{i=0}{\overset{1}{\sum}}P(Y=y_i|X=0)\log_2 P(Y=y_i|X=0)\\
&=-P(Y=0|X=0)\log_2 P(Y=0|X=0)-P(Y=1|X=0)\log_2 P(Y=1|X=0)\\
&=-q\log_2q-(1-q)\log_2(1-q)\\
&=h(q)
\end{align}

And given $P(Y=1|X=1)=1$:

\begin{align}
H(Y|X=1)&=h(1)\\
&=0
\end{align}

As such:

\begin{align}
H(Y|X) &= \underset{i=0}{\overset{1}{\sum}}P(X=x_i)H(Y|X=x_i)\\
&= \beta*h(q) + (1-\beta)*h(1)\\
&=\beta*h(q)
\end{align}

#### Question 5

We recall that mutual information is symmetric: $I(X, Y)=H(X) - H(X|Y)=H(Y)-H(Y|X)$. As such:

\begin{align}
I(X, Y)&=H(Y) - H(Y|X)\\
&= h(q\beta) - \beta*h(q)
\end{align}
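The closed form can be checked numerically against the definition of mutual information. The sketch below assumes $P(X=0)=\beta$, $P(Y=0|X=0)=q$ and $P(Y=1|X=1)=1$; the helper names and the test values are illustrative.

```python
import math

def h(x):
    """Binary entropy in bits, with the convention 0*log(0) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def mutual_information(beta, q):
    """I(X;Y) computed directly from the joint distribution."""
    joint = {(0, 0): q * beta, (0, 1): (1 - q) * beta,
             (1, 0): 0.0, (1, 1): 1 - beta}
    p_x = {0: beta, 1: 1 - beta}
    p_y = {0: q * beta, 1: 1 - q * beta}
    # Sum p(x,y) * log2(p(x,y) / (p(x) p(y))) over non-zero entries
    return sum(p * math.log2(p / (p_x[i] * p_y[j]))
               for (i, j), p in joint.items() if p > 0)

beta, q = 0.3, 0.6  # arbitrary illustrative values
direct = mutual_information(beta, q)
closed_form = h(q * beta) - beta * h(q)
print(direct, closed_form)
```

Both values agree up to floating-point error, which supports the expression $h(q\beta)-\beta h(q)$.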



#### Question 6

\begin{align}
g(x)&=xh(q)\Rightarrow \frac{\delta}{\delta x}g(x)=h(q)\\
g(u)&=h(u)\Rightarrow \frac{\delta}{\delta x}g(u)=u'h'(u)\\
\text{Given $u=qx$ }&\rightarrow \frac{\delta}{\delta x}h(qx)=qh'(qx)
\end{align}

As such we find:

\begin{align}
f(x)&=h(qx)-xh(q)\\
\frac{\delta}{\delta x}f(x)&=qh'(qx)-h(q)
\end{align}

#### Question 7

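\begin{align}
\frac{\delta}{\delta x}h(x) &= -\log_2(x)-x\frac{1}{x\ln 2}+\log_2(1-x)+(1-x)\frac{1}{(1-x)\ln 2}\\
&= -\log_2(x)-\frac{1}{\ln 2}+\log_2(1-x)+\frac{1}{\ln 2}\\
&= -\log_2(x)+\log_2(1-x)\\
&= \log_2\frac{1-x}{x}
\end{align}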
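The derivative can be verified with a central finite difference. This is a small numerical sketch; the evaluation point and step size are arbitrary choices.

```python
import math

def h(x):
    """Binary entropy in bits for 0 < x < 1."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def h_prime(x):
    """Closed-form derivative: log2((1 - x) / x)."""
    return math.log2((1 - x) / x)

x, eps = 0.3, 1e-6
numeric = (h(x + eps) - h(x - eps)) / (2 * eps)  # central finite difference
print(numeric, h_prime(x))
```

The derivative vanishes at $x=0.5$, where $h$ reaches its maximum of 1.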

#### Question 8

#### Question 9



#### Question 10



#### Question 11



#### Question 12

