# Data compression fundamentals 

### Why compress?

* Data compression can reduce the memory requirements of (almost) any kind of
  information source.

### Which data?
  
* Mainly ... audio, image, and video signals.

### Why these sources?

* After the digitization of any signal we get a sequence $s[]$
  of samples that represent the signal $s$ with more or less fidelity.
  
* Usually, $s[]$ is encoded using PCM (Pulse Code Modulation), in which
  every sample $s[i]$ is represented with the same number of bits.
  
* Most digital PCM signals are memory demanding. For
  example, a stereo audio CD (2 channels of 16 bits, sampled at 44.1 kHz) has a data rate of
  
\begin{equation}
    (16+16)\frac{\text{bits}}{\text{sample}}\times
    44{,}100\frac{\text{samples}}{\text{second}}=
    1{,}411{,}200\frac{\text{bits}}{\text{second}}.
\end{equation}
  
* Image and video signals require much more memory.
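
The arithmetic above can be checked with a small helper. `pcm_bit_rate()` is a hypothetical name, and the video figures (1920×1080 pixels, 3 color components of 8 bits, 30 frames/second) are an illustrative assumption, not data from the text.

```python
def pcm_bit_rate(bits_per_sample, samples_per_second, channels=1):
    """Data rate, in bits/second, of a PCM signal (every sample uses the same bits)."""
    return bits_per_sample * samples_per_second * channels

# Stereo audio CD: 2 channels, 16 bits/sample, 44,100 samples/second.
cd_rate = pcm_bit_rate(16, 44_100, channels=2)

# Hypothetical raw video: 1920x1080 pixels/frame, 30 frames/second,
# 3 color components (channels) of 8 bits each.
video_rate = pcm_bit_rate(8, 1920 * 1080 * 30, channels=3)

print(cd_rate)     # 1411200
print(video_rate)  # 1492992000
```

Under these assumptions, the raw video needs about 1,058 times the data rate of the CD.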

### Redundancy in signals

In general, signals have three types of redundancy:
    
   1. **Spatial/temporal redundancy**: Produced by similarities between
    adjacent samples (in 1D, 2D, and 3D signals). It can be removed using
    spatial/temporal models of the signal, generating [*lossless
    codecs*](https://en.wikipedia.org/wiki/Lossless_compression). These codecs are known as *audio*, *image*, and *video* codecs.
    
   2. **Statistical redundancy**: Spatial/temporal redundancy generates *probabilistic relationships* among samples. Statistical redundancy can be removed by using
    probabilistic models, also producing *lossless codecs*. These
    codecs are known as *text codecs* because they can be used to compress text sources.
    
   3. **Psychological redundancy**: Part of the information that
    signals carry cannot be perceived by humans. [*Lossy codecs*](https://en.wikipedia.org/wiki/Lossy_compression) remove
    this kind of pseudo-redundancy, basically, by means of [quantization](https://en.wikipedia.org/wiki/Quantization_(signal_processing)).
    


### Symbols, runs, strings, code-words, and code-streams!

* In the context of statistical coding, each sample $s[i]$ is
  called a [*symbol*](https://en.wikipedia.org/wiki/Symbol).
  
* Depending on the type of statistical relationship among
  symbols, we will also speak about [*strings*](https://en.wikipedia.org/wiki/String_(computer_science)) when we process
  more than one symbol, and about [*runs*](https://en.wikipedia.org/wiki/Sequence#Sequences_and_automata) when all the symbols
  of a string are the same.
  
* In any case, the output of the encoder is a sequence of
  [*code-words*](https://en.wikipedia.org/wiki/Universal_code_(data_compression)) that together form the *code-stream*.

### Some interesting compression insights

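* Lossless compressors are injective (one-to-one) functions: each possible input produces a different output, so every output can be mapped back to its input (the compressor is a [bijection](https://en.wikipedia.org/wiki/Bijection) between the inputs and the outputs it actually produces). Text compressors are lossless by definition.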

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/a/a5/Bijection.svg/200px-Bijection.svg.png" width="200">

* Lossy compressors are many-to-one (non-injective) functions, as in the [surjective function](https://en.wikipedia.org/wiki/Surjective_function) depicted below: two or more inputs can produce the same output, so the original input cannot always be recovered. There are lossless audio, image, and video compressors, but most of them are lossy (although some of them can work losslessly if quantization is not used).

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/6c/Surjection.svg/220px-Surjection.svg.png" width="200">


## A. [Run-length encoding](https://en.wikipedia.org/wiki/Run-length_encoding)

* RLE (Run Length Encoding) is a technique that removes the data
  redundancy produced by the repetition of symbols. Example:
  ```
  aaaaa = 5a
  ```
* Depending on the size of the source alphabet and the maximum length of the runs,
  different versions of RLE codecs have been proposed.

### A.1 $N$-ary run length encoding

RLE for $N$-ary alphabets (alphabets of size $N$), where typically, $N=256$.

### Encoder

1. While there are symbols to encode:
    1. Let $s$ be the next symbol.
    2. Count the number $n$ of consecutive symbols equal to $s$ (including $s$ itself).
    3. Write the pair $ns$.

### Example

Runs:
```
aaaabbbbbaaaaaabbbbbbbcccccc
```
are encoded as:
```
4a5b6a7b6c
```

### Decoder

1. While there are $ns$ pairs to decode:
    1. Write $n$-times the symbol $s$.

### Lab

```python
# https://rosettacode.org/wiki/Run-length_encoding#Python
# https://docs.python.org/3/library/itertools.html#itertools.groupby

from itertools import groupby

def N_RLE_encode(input_string):
    # groupby() yields (symbol, iterator over the run) for each run of equal symbols
    return [(len(list(g)), k) for k, g in groupby(input_string)]

def N_RLE_decode(lst):
    # Write each symbol c, n times
    return ''.join(c * n for n, c in lst)

x = N_RLE_encode('aaaabbbbbaaaaaabbbbbbbcccccc')
print(x)
y = N_RLE_decode(x)
print(y)
```

```
[(4, 'a'), (5, 'b'), (6, 'a'), (7, 'b'), (6, 'c')]
aaaabbbbbaaaaaabbbbbbbcccccc
```


### A.2 Binary run length encoding

* In binary RLE, it is not necessary to indicate the symbol of each run
  (only its length), because when a run ends, the next run must be made of
  the other symbol.

### Encoder

1. Let $s\leftarrow$ `0`.
2. While there are bits to encode:
    1. Count the number $n$ of consecutive bits equal to $s$ ($n=0$ is possible for the first run, if the input starts with `1`).
    2. Write $n$.
    3. $s\leftarrow (s+1)~\text{modulus}~2$.

### Example

Runs:
```
0000111110000001111111000000
```
are encoded as:
```
4 5 6 7 6
```

### Decoder

1. Let $s\leftarrow$ `0`.
2. While there are items $n$ to decode:
    1. Write $n$ bits equal to $s$.
    2. $s\leftarrow (s+1)~\text{modulus}~2$.
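
A minimal Python sketch of this binary RLE codec, assuming that the bits are given as a string of `'0'`/`'1'` characters and that the run lengths are a list of integers (the function names are illustrative). Like the $N$-ary lab, it relies on `itertools.groupby`.

```python
from itertools import groupby

def binary_RLE_encode(bits):
    """Return the list of run lengths; the first run is made of 0s (possibly empty)."""
    runs = [len(list(g)) for _, g in groupby(bits)]
    if bits.startswith('1'):
        runs.insert(0, 0)  # empty initial run of 0s, so the decoder can start with s = 0
    return runs

def binary_RLE_decode(runs):
    s = '0'
    out = []
    for n in runs:
        out.append(s * n)             # write n bits equal to s
        s = '1' if s == '0' else '0'  # s <- (s + 1) mod 2
    return ''.join(out)

runs = binary_RLE_encode('0000111110000001111111000000')
print(runs)                     # [4, 5, 6, 7, 6]
print(binary_RLE_decode(runs))  # 0000111110000001111111000000
```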

### A.3 [MNP-5 run length encoding](https://en.wikipedia.org/wiki/Microcom_Networking_Protocol#MNP_5)

* Created by [Microcom Inc.](https://en.wikipedia.org/wiki/Microcom_Networking_Protocol)
for the [MNP (Microcom Networking Protocol) 5](https://scholar.google.es/scholar?hl=es&as_sdt=0%2C5&q=held+data+compression+techniques+applications&btnG=).

### Codec
  
```
Input     Output
--------- ---------
ab        ab
aab       aab
aaab      aaa0b /* a run of 3 equal symbols acts as an <ESC>: a count follows */
aaaab     aaa1b
aaaaab    aaa2b
:         :
a^nb      aaa(n-3)b
```

### Example

Runs:
```
aaaabbbbbaaaaaabbbbbbbccccccb
```
are encoded as:
```
aaa1bbb2aaa3bbb4ccc3b
```

### Lab

```python
from itertools import groupby

# Assumption: each count is written as one decimal digit, so longer runs are
# split (the real MNP-5 writes the count as a byte, up to 250).
MAX_COUNT = 9

def MNP5_encoding(input_string):
    code_stream = ''
    for k, g in groupby(input_string):  # split the input into runs
        n = len(list(g))                 # length of the current run
        # A run of 3 or more symbols is written as kkk followed by the number
        # of extra repetitions, aaa(n-3); long runs are split into pieces.
        while n >= 3:
            extra = min(n - 3, MAX_COUNT)
            code_stream += k * 3 + str(extra)
            n -= 3 + extra
        code_stream += k * n             # runs shorter than 3 are copied as-is
    return code_stream

def MNP5_decoding(code_stream):
    original = ''
    previous, run = None, 0
    i = 0
    while i < len(code_stream):
        symbol = code_stream[i]
        original += symbol
        run = run + 1 if symbol == previous else 1
        previous = symbol
        i += 1
        if run == 3:
            # After 3 equal symbols, the next character is always a count
            original += symbol * int(code_stream[i])
            previous, run = None, 0
            i += 1
    return original

message_encoded = MNP5_encoding('aaaabbbbbaaaaaabbbbbbbccccccaaab')
print(message_encoded)
message_decoded = MNP5_decoding(message_encoded)
print(message_decoded)
print('aaaabbbbbaaaaaabbbbbbbccccccaaab' == message_decoded)
```

```
aaa1bbb2aaa3bbb4ccc3aaa0b
aaaabbbbbaaaaaabbbbbbbccccccaaab
True
```

### A.4 [Burrows-Wheeler Transform (BWT)](https://scholar.google.es/scholar?hl=es&as_sdt=0%2C5&q=Burrows+M%2C+Wheeler+DJ%3A+A+Block+Sorting+Lossless+Data+Compression+Algorithm.&btnG=)

* BWT is an algorithm that inputs a string and outputs:
  
  1. A different string with the same symbols but with longer runs, and therefore potentially more compressible.
     The runs tend to be longer when the correlation among symbols is higher
     and when the input is longer.
  2. An index.
  
* The inverse transform, using the output of the
  forward transform, recovers the original string.
  
* Used in [`bzip2`](https://en.wikipedia.org/wiki/Bzip2).

### Encoder

Let $B$ be the block size in symbols:

1. While the input is not exhausted:
    1. Read $B$ symbols in `S`.
    2. (`L`, `I`) = BWT(`S`).
    3. Output `L` (the output block with longer runs) and `I` (an index of a symbol in `L`).

### Decoder

1. While there are pairs (`L`, `I`) to input:
    1. `S` = iBWT(`L`, `I`).
    2. Output `S`.

### Forward BWT

1. Input `S`, the sequence of `B` symbols.
    ```
    S = "abraca."
    ```
    
2. Compute a matrix `M'` with all possible cyclic rotations of `S`.

    ```
    M' = ["abraca.",
          "braca.a",
          "raca.ab",
          "aca.abr",
          "ca.abra",
          "a.abrac",
          ".abraca"]
    ```

3. Sort `M'` lexicographically to generate `M` (here, `.` is sorted before the letters, as in ASCII).
    ```
    M = [".abraca",
         "a.abrac",
         "abraca.",
         "aca.abr",
         "braca.a",
         "ca.abra",
         "raca.ab"]
    ```

4. Let `I` be the index of `S` in `M`.
    ```
    I = 2
    ```

5. Let `L` be the last column (column `B`-1) of `M`.
    ```
    L = "ac.raab"
    ```

6. Output `I` and `L`.

### Inverse BWT

The inverse transform regenerates the `I`-th row of `M`. Here is an example:

1. Input `I` and `L`, the output of a BWT applied to a string `S` of length `B`.

2. The first (`F`) and the last (`L`) columns of `M` are available, because `F=sorted(L)`.
    ```
    F23456L
    .     a
    a     c
    a     .
    a     r
    b     a
    c     a
    r     b
    ```
    
3. Notice that, in each row, the symbol in `F` follows the symbol in `L` in `S`
   (rows are cyclic rotations; for example, `r` follows `b` in `abraca.`). Therefore, we have found all the pairs of consecutive symbols of `S` by
   taking the pairs `LF`.
    ```
    a.
    ca
    .a
    ra
    ab
    ac
    br
    ```
   Which sorted:
    ```
    .a
    a.
    ab
    ac
    br
    ca
    ra
    ```
   become the first two columns of `M`.

4. Repeat the process until getting the rest of the columns of `M` (here only a few are shown):

    ```
    F23456L
    .a    a
    a.    c
    ab    .
    ac    r
    br    a
    ca    a
    ra    b
    ```
   Now, in each row, the pair in columns `F`
   and `2` follows the symbol in `L` in `S` (for example, pair `br` follows symbol `a` in `abraca.`).
   So, we can find all the triples of `S` by taking the triples `LF2`:
    ```
    a.a         .ab
    ca.         a.a
    .ab  sort   abr
    rac ------> aca
    abr         bra
    aca         ca.
    bra         rac
    ```
   to have the partial reconstruction of `M`:
    ```
    F23456L
    .ab   a
    a.a   c
    abr   . <- I
    aca   r
    bra   a
    ca.   a
    rac   b
    ```
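
The column-by-column reconstruction of the previous steps can be sketched in a few lines: each iteration prepends `L` to the partial rows and sorts them, which adds one more column of `M`. This naive sketch (`iBWT_naive` is an illustrative name) builds the whole matrix, so it needs $O(B^2)$ memory.

```python
def iBWT_naive(I, L):
    """Invert the BWT by rebuilding the sorted matrix M, one column per iteration."""
    n = len(L)
    rows = [''] * n
    for _ in range(n):
        # L[i] precedes row i cyclically in S, so prefixing it gives strings of S
        # that are one symbol longer; sorting them yields one more column of M.
        rows = sorted(L[i] + rows[i] for i in range(n))
    return rows[I]  # the I-th row of M is the original string

print(iBWT_naive(2, 'ac.raab'))  # abraca.
```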

In an optimized implementation of the inverse BWT, the whole matrix `M` is not generated: only the row `I` is regenerated, symbol by symbol.

### Lab

```python
# https://gist.github.com/dmckean/9723bc06254809e9068f

def BWT(S):
    
    if __debug__:
        print('Original string:')
        print("S =", S)
        
    n = len(S)

    # 1. Matrix of all possible rotations (cyclic shifts) of 'S'.
    N = [S[i:n]+S[0:i] for i in range(n)]

    if __debug__:
        print('')
        print('Permutations matrix:')
        counter = 0
        for i in N:
            print(i, counter)
            counter += 1

    # 2. Sort the rotations lexicographically.
    M = sorted(N)
    
    if __debug__:
        print('')
        print('Sorted matrix:')
        counter = 0
        for i in M:
            print(i, counter)
            counter += 1

    # 3. I = the index of 'S' in 'M'.
    I = M.index(S) 
    
    # 4. L = the last column of 'M'.
    L = ''.join([q[-1] for q in M])

    if __debug__:
        print('')
        print('M\' = M rotated one character to the right')
        Mp = []
        for i in range(n):
            Mp.append(M[i][n-1:n]+M[i][0:n-1])
        for i in range(n):
            print(Mp[i])
    
    return (I, L)

from operator import itemgetter

def iBWT(I, L):
    n = len(L)
    
    # 1. Compute correspondence between the rows of M and M'.
    X = sorted([(i, x) for i, x in enumerate(L)], key=itemgetter(1))
    T = [None for i in range(n)]
    for i, y in enumerate(X):
        j, _ = y
        T[j] = i
        
    if __debug__:
        print("T = Positions of rows of M\' in M:", T)

    # 2. for i in range(n): S[n-1-i] = L[T^i[I]]
    # where T^0[x]=x and T^{i+1}[x] = T[T^i[x]].
    Tx = [I]
    for i in range(1, n):
        Tx.append(T[Tx[i-1]])
    if __debug__:
        print("Tx = Positions in L of output symbols (reversed) =", Tx)
    S = [L[i] for i in Tx]
    S.reverse()
    return ''.join(S)

print("ENCODING")
I, L = BWT('abraca.')
print("")
print ("I = {}, L = {}\n".format(I, L))
print("DECODING")
Sp = iBWT(I, L)
print ("S = {}".format(Sp))

print("")

print("ENCODING")
I, L = BWT('abababababababababababababababababababa')
print("")
print ("I = {}, L = {}\n".format(I, L))
print("DECODING")
Sp = iBWT(I, L)
print ("S = {}".format(Sp))

print("")

print("ENCODING (runs are formed by the symbol that defines a context)")
I, L = BWT('abacadaeaf')
print("")
print ("I = {}, L = {}\n".format(I, L))
print("DECODING")
Sp = iBWT(I, L)
print ("S = {}".format(Sp))

print("")

print("ENCODING (or by the symbol that goes after one or more contexts)")
I, L = BWT('bacadaeafa')
print("")
print ("I = {}, L = {}\n".format(I, L))
print("DECODING")
Sp = iBWT(I, L)
print ("S = {}".format(Sp))

print("")

print("ENCODING ( see https://link.springer.com/book/10.1007/978-0-387-78909-5 )")
I, L = BWT('aardvark$')
print("")
print ("I = {}, L = {}\n".format(I, L))
print("DECODING")
Sp = iBWT(I, L)
print ("S = {}".format(Sp))
```

```
ENCODING
Original string:
S = abraca.

Permutations matrix:
abraca. 0
braca.a 1
raca.ab 2
aca.abr 3
ca.abra 4
a.abrac 5
.abraca 6

Sorted matrix:
.abraca 0
a.abrac 1
abraca. 2
aca.abr 3
braca.a 4
ca.abra 5
raca.ab 6

M' = M rotated one character to the right
a.abrac
ca.abra
.abraca
raca.ab
abraca.
aca.abr
braca.a

I = 2, L = ac.raab

DECODING
T = Positions of rows of M' in M: [1, 5, 0, 6, 2, 3, 4]
Tx = Positions in L of output symbols (reversed) = [2, 0, 1, 5, 3, 6, 4]
S = abraca.

ENCODING
Original string:
S = abababababababababababababababababababa

Permutations matrix:
abababababababababababababababababababa 0
bababababababababababababababababababaa 1
ababababababababababababababababababaab 2
babababababababababababababababababaaba 3
abababababababababababababababababaabab 4
bababababababababababababababababaababa 5
ababababababababababababababababaababab 6
babababababababababababababababaabababa 7
abababababababababababababababaabababab 8
bababababababababababababababaababababa 9
a
(output truncated)
```

## B. String encoding

### How does it work?

* We replace strings by shorter code-words.
* Strings are searched in a dictionary, and the sequence of positions of the strings in the dictionary form the code-stream.

### B.1 LZ77 [[Ziv and Lempel, 1977]](https://scholar.google.es/scholar?hl=es&as_sdt=0%2C5&q=Ziv+Lempel+universal+sequential+data+compression+1977&btnG=)

* Jacob Ziv and Abraham Lempel proposed the LZ77 algorithm in 1977. 
* In the eighties, Haruyasu Yoshizaki used [LZSS](https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Storer%E2%80%93Szymanski) (a variant of LZ77)
  in the [LHA compressor](https://en.wikipedia.org/wiki/LHA_(file_format)), showing
  the practical possibilities of the LZ77 encoder.
* After that, a large number of text compressors have been based
  on LZ77 (or a variation of it). Some of the most famous
  are: [ARJ](https://en.wikipedia.org/wiki/ARJ), [RAR](https://en.wikipedia.org/wiki/RAR_(file_format)), [gzip](https://en.wikipedia.org/wiki/Gzip) and [7z](https://en.wikipedia.org/wiki/7z).
* LZ77 processes a sequence of symbols using the structure:

<img src="data/LZ77.png" width="600">

* The dictionary and the look-ahead buffer have a fixed size and
  can be considered as a sliding window moving over the symbols while they are coded.
  In each iteration, new symbols enter the buffer and the symbols just encoded
  leave it, becoming the newest symbols of the dictionary (while the oldest
  symbols of the dictionary are discarded).

### Encoder

1. Let $I$ be the length of the dictionary and $J$ the length of the
   buffer.
2. Input the first $J$ symbols into the buffer.
3. While the input is not exhausted:
    1. Find the longest prefix of the buffer that is also in the dictionary.
    Let $j$ be its length, $i$ its position in the dictionary, and $k$ the
    symbol of the buffer that follows it (the one that prevents $j$ from being larger).
    2. Output $ijk$.
    3. Slide the window $j+1$ symbols, inputting the next $j+1$ symbols into the buffer.

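The encoder can be sketched with $(i, j, k)$ triples instead of packed bits. Two assumptions are made only for illustration: $i$ is stored as the distance from the current position back to the start of the match (0 when there is no match), and a match may run into the buffer (overlap). `lz77_encode`, `dict_size` ($I$) and `buf_size` ($J$) are hypothetical names.

```python
def lz77_encode(data, dict_size=4096, buf_size=16):
    """Encode a sequence as a list of (i, j, k) triples.

    i: distance back from the current position to the start of the match
       (0 when there is no match), j: match length, k: next symbol.
    """
    triples = []
    pos = 0
    while pos < len(data):
        best_i, best_j = 0, 0
        start = max(0, pos - dict_size)
        # Longest allowed match: the buffer holds buf_size symbols, and one
        # symbol (k) must remain after the match.
        max_j = min(buf_size - 1, len(data) - pos - 1)
        for candidate in range(start, pos):
            j = 0
            while j < max_j and data[candidate + j] == data[pos + j]:
                j += 1  # the match may run into the buffer (overlap)
            if j > best_j:
                best_i, best_j = pos - candidate, j
        k = data[pos + best_j]
        triples.append((best_i, best_j, k))
        pos += best_j + 1  # slide the window j+1 symbols
    return triples

print(lz77_encode('abab'))  # [(0, 0, 'a'), (0, 0, 'b'), (2, 1, 'b')]
```

Real LZ77-based formats pack $i$, $j$ and $k$ into a fixed number of bits, as the lab below does.
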
### Example
<img src="data/LZ77_encoding_example.png" width="800">

### Decoder

1. While code-words $ijk$ are input:
    1. Output the $j$ symbols extracted from the position $i$ in the
    dictionary.
    2. Output $k$.
    3. Put all the decoded symbols at the beginning of the buffer.

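A matching decoder sketch, under the same assumption that $i$ is a backward distance into the symbols already decoded (`lz77_decode` is a hypothetical name). Copying one symbol at a time allows a match to overlap the symbols it is producing.

```python
def lz77_decode(triples):
    """Rebuild the sequence from (i, j, k) triples; i = distance back, j = length."""
    out = []
    for i, j, k in triples:
        start = len(out) - i
        # Copy one symbol at a time, so that a match longer than its distance
        # (an overlapping match) reuses symbols it has just produced.
        for n in range(j):
            out.append(out[start + n])
        out.append(k)
    return ''.join(out)

print(lz77_decode([(0, 0, 'a'), (1, 2, 'a')]))  # aaaa
```
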
### Example
<img src="data/LZ77_decoding_example.png" width="500">

* Parameters $I$ and $J$ control the performance
  of the algorithm. They should be large enough to guarantee the
  matching of long strings, but small enough to reduce
  the number of bits of the code-words $ijk$. Typical sizes are
  $\log_2(I)=12$ and $\log_2(J)=4$ (that is, $I=4096$ and $J=16$ symbols).

### Lab

```python
# Simplified LZ77 codec: each token is either <0, literal byte> or
# <1, 12-bit distance, 4-bit length>. Requires the third-party `bitarray` package.
import math
from bitarray import bitarray


class LZ77Compressor:
    """
    A simplified implementation of the LZ77 Compression Algorithm
    """
    MAX_WINDOW_SIZE = 400

    def __init__(self, window_size=20):
        self.window_size = min(window_size, self.MAX_WINDOW_SIZE)
        self.lookahead_buffer_size = 15  # length of match is at most 4 bits

    def compress(self, input_file_path, output_file_path, verbose=True):
        data = None
        i = 0
        output_buffer = bitarray(endian='big')

        # read the input file
        try:
            with open(input_file_path, 'r') as input_file:
                data = input_file.read()
        except IOError:
            print('Could not open input file ...')
            raise

        # Note: the loop stops at len(data)-1, so a final symbol that is not
        # covered by a match is not encoded (in the test below, the trailing
        # newline of the input file, which is stripped before comparing).
        while i < (len(data)-1):

            match = self.findLongestMatch(data, i)

            if match:
                # Add 1 bit flag, followed by 12 bits for the distance and 4 bits
                # for the length of the match
                (bestMatchDistance, bestMatchLength) = match

                output_buffer.append(True)
                output_buffer.frombytes(bytes(chr(bestMatchDistance >> 4), "latin-1"))
                output_buffer.frombytes(bytes(chr(((bestMatchDistance & 0xf) << 4) | bestMatchLength), "latin-1"))

                if verbose:
                    print(("<1, %i, %i>") % (bestMatchDistance, bestMatchLength), end=' ')

                i += bestMatchLength

            else:
                # No useful match was found. Add 0 bit flag, followed by 8 bits for the character
                output_buffer.append(False)
                output_buffer.frombytes(bytes(data[i], "latin-1"))

                if verbose:
                    print(("<0, %s>") % data[i], end=' ')

                i += 1

        # fill the buffer with zeros if the number of bits is not a multiple of 8
        output_buffer.fill()

        # write the compressed data into a binary file if a path is provided
        if output_file_path:
            try:
                with open(output_file_path, 'wb') as output_file:
                    output_file.write(output_buffer.tobytes())
                    print("\n" + "File was compressed successfully and saved to output path ...")
                    return None
            except IOError:
                print('Could not write to output file path. Please check if the path is correct ...')
                raise

        # an output file path was not provided, return the compressed data
        return output_buffer

    def decompress(self, input_file_path, output_file_path):
        data = bitarray(endian='big')
        output_buffer = []

        # read the input file
        try:
            with open(input_file_path, 'rb') as input_file:
                data.fromfile(input_file)
        except IOError:
            print('Could not open input file ...')
            raise

        # A token needs at least 9 bits (flag + literal byte); fewer remaining
        # bits can only be the zero padding added by fill().
        while len(data) >= 9:

            flag = data.pop(0)

            if not flag:
                byte = data[0:8].tobytes()

                output_buffer.append(byte)
                del data[0:8]
            else:
                byte1 = ord(data[0:8].tobytes())
                byte2 = ord(data[8:16].tobytes())

                del data[0:16]
                distance = (byte1 << 4) | (byte2 >> 4)
                length = (byte2 & 0xf)

                # Copy byte by byte, so that a match can overlap its own output
                for i in range(length):
                    output_buffer.append(output_buffer[-distance])
        out_data = b''.join(output_buffer)

        if output_file_path:
            try:
                with open(output_file_path, 'wb') as output_file:
                    output_file.write(out_data)
                    print('File was decompressed successfully and saved to output path ...')
                    return None
            except IOError:
                print('Could not write to output file path. Please check if the path is correct ...')
                raise
        return out_data

    def findLongestMatch(self, data, current_position):
        end_of_buffer = min(current_position + self.lookahead_buffer_size, len(data) + 1)

        best_match_distance = -1
        best_match_length = -1

        # Optimization: Only consider substrings of length 2 and greater, and just
        # output any substring of length 1 (8 bits uncompressed is better than 13 bits
        # for the flag, distance, and length)
        for j in range(current_position + 2, end_of_buffer):

            start_index = max(0, current_position - self.window_size)
            substring = data[current_position:j]

            for i in range(start_index, current_position):

                # The match may overlap the current position: repeat data[i:current_position]
                repetitions = int(len(substring) / (current_position - i))

                last = int(len(substring) % (current_position - i))

                matched_string = data[i:current_position] * repetitions + data[i:i+last]

                if matched_string == substring and len(substring) > best_match_length:
                    best_match_distance = current_position - i
                    best_match_length = len(substring)

        if best_match_distance > 0 and best_match_length > 0:
            return (best_match_distance, best_match_length)

        return None

if __name__ == "__main__":
    compressor = LZ77Compressor(window_size=20)  # window_size is optional

    # Read from a file and write to a file
    input_file_path = 'input.txt'
    output_file_path = 'output.txt'
    result_file_path = 'result.txt'

    compressed_data = compressor.compress(input_file_path, output_file_path)  # Compress the "input_file"
    decompressed_data = str(compressor.decompress(output_file_path, result_file_path))  # Decompress the "output_file"

    print("\n")
    with open(input_file_path, 'r') as input_file:  # Read the original file into "original_data"
        original_data = input_file.read()
        original_data = original_data[:-1]  # Delete the last character (\n), which is not encoded

    with open(result_file_path, 'r') as input_file:  # Read the decompressed file into "result_data"
        result_data = input_file.read()

    print("The content of the first file is:", original_data)
    print("The content of the decompress file is:", result_data)

    result = original_data == result_data

    print("Are both files similar?", result)
```
```
<0, h> <0, o> <0, l> <0, a> <0,  > <1, 5, 4> 
File was compressed successfully and saved to output path ...
File was decompressed successfully and saved to output path ...


The content of the first file is: hola hola
The content of the decompress file is: hola hola
Are both files similar? True
```


### B.2 LZ78 [[Ziv and Lempel, 1978]](https://scholar.google.es/scholar?hl=es&as_sdt=0%2C5&q=Ziv+Lempel+1978&btnG=)

* In 1978, Ziv and Lempel published the [LZ78 algorithm](https://en.wikipedia.org/wiki/LZ77_and_LZ78).

* LZ78 represents the dictionary in a recursive way with the idea
  of reducing the memory used for representing the strings in the dictionary. Now, each
  entry in the dictionary is a pair $wk$, where $w$ is an index to
  an entry of the dictionary and $k$ is a symbol. In fact, each pair $wk$
  represents the string that results from the concatenation of the string
  represented by $w$ and the symbol $k$, where the string of $w$ is, in turn,
  obtained recursively in the same way.
  
* We will denote the string that $w$ represents by *string*$(w)$.
  
* The empty string is obtained by *string*$(0)$.

### Encoder

1. $w\leftarrow 0$.
2. While the input is not exhausted:
    1. $k\leftarrow$ next input symbol.
    2. If $wk$ is found in the dictionary, then:
        1. $w\leftarrow$ address of $wk$ in the dictionary.
    3. Else:
        1. Output $wk$.
        2. Insert $wk$ in the dictionary.
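        3. $w\leftarrow 0$.
3. If $w\neq 0$ (the input ended in the middle of a string already in the dictionary), output $w$ without a symbol $k$.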

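The encoder can be sketched with a Python `dict` that maps each pair $wk$ to its address (starting at 1, so that 0 denotes the empty string). Emitting the pending string as a pair with an empty `k` is an assumption of this sketch; `lz78_encode` is a hypothetical name.

```python
def lz78_encode(data):
    """Encode a sequence as a list of (w, k) pairs; w = 0 is the empty string."""
    dictionary = {}  # (w, k) -> address of the entry (1, 2, ...)
    output = []
    w = 0
    for k in data:
        if (w, k) in dictionary:
            w = dictionary[(w, k)]  # keep extending the current string
        else:
            output.append((w, k))
            dictionary[(w, k)] = len(dictionary) + 1
            w = 0
    if w != 0:
        output.append((w, ''))  # flush a pending string (no symbol k)
    return output

print(lz78_encode('abababa'))  # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a')]
```
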
### Example
<img src="data/LZ78_encoding_example.png" width="700">

### Decoder

1. While the input is not exhausted:
    1. Input $wk$.
    2. Output $\text{string}(w)$.
    3. Output $k$.
    4. Insert $wk$ in the dictionary.
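
A decoder sketch for the same $(w, k)$ pairs (`lz78_decode` is a hypothetical name). For simplicity it stores each decoded string explicitly instead of the pairs $wk$, trading the memory saving of LZ78 for a shorter program.

```python
def lz78_decode(pairs):
    """Rebuild the sequence from (w, k) pairs produced by an LZ78 encoder."""
    strings = ['']  # strings[w] == string(w); string(0) is the empty string
    output = []
    for w, k in pairs:
        s = strings[w] + k  # output string(w) followed by k
        output.append(s)
        strings.append(s)   # insert wk in the dictionary
    return ''.join(output)

print(lz78_decode([(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a')]))  # abababa
```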

### Example
<img src="data/LZ78_decoding_example.png" width="600">

### Lab

TO-DO.

### B.3 LZW [[Welch, 1984]](https://scholar.google.es/scholar?hl=es&as_sdt=0%2C5&q=Terry+Welch+1984&btnG=)

* In 1984, Terry A. Welch proposed the [LZW algorithm](https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch),
  an improved version of the LZ78 algorithm that does not
  write raw symbols ($k$ fields) to the code-stream.

* LZW was selected as the encoding engine for the [GIF (Graphics
  Interchange Format)](https://en.wikipedia.org/wiki/GIF), and for the compressor [`compress`](https://en.wikipedia.org/wiki/Compress).
  
* Initially, the dictionary is filled with the $2^b$ possible
  symbols (*roots*) of $b$ bits, which are stored in the first entries (for 1-byte symbols: $0\cdots255$).

### Encoder

1. $w\leftarrow$ first input symbol.
2. While the input is not exhausted:
    1. $k\leftarrow$ next input symbol.
    2. If $wk$ is found in the dictionary, then:
        1. $w\leftarrow$ address of $wk$ in the dictionary.
    3. Else:
        1. Output $w$.
        2. Insert $wk$ in the dictionary.
        3. $w\leftarrow k$.
3. Output $w$.

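A sketch of the encoder for 1-byte symbols, with the dictionary initialized with the 256 roots (`lzw_encode` is a hypothetical name). Unlike GIF or `compress`, it neither limits the size of the dictionary nor packs the code-words into a fixed number of bits.

```python
def lzw_encode(data):
    """Encode bytes into a list of integer code-words (addresses in the dictionary)."""
    dictionary = {bytes([i]): i for i in range(256)}  # the 256 roots
    if not data:
        return []
    output = []
    w = data[:1]                # w <- first input symbol
    for k in data[1:]:          # iterating over bytes yields ints
        wk = w + bytes([k])
        if wk in dictionary:
            w = wk
        else:
            output.append(dictionary[w])
            dictionary[wk] = len(dictionary)  # insert wk in the next free entry
            w = bytes([k])
    output.append(dictionary[w])  # output the last string
    return output

print(lzw_encode(b'abab'))  # [97, 98, 256]
```
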
### Example
<img src="data/LZW_encoding_example.png" width="800">

### Decoder

1. $c\leftarrow$ first input code-word.
2. Output string$(c)$ (a root symbol).
3. $k\leftarrow$ string$(c)$ and $c'\leftarrow c$.
4. While the input is not exhausted:
    1. $c\leftarrow$ next input code-word.
    2. $w\leftarrow c'$.
    3. If $c$ is found in the dictionary, then:
        1. Output string$(c)$.
    4. Else (the encoder has just created the entry $c$):
        1. Output string$(w)$.
        2. Output $k$.
    5. $k\leftarrow$ first symbol of the last output.
    6. Insert $wk$ in the dictionary.
    7. $c'\leftarrow c$.

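A decoder sketch following the steps above, for the same 1-byte roots (`lzw_decode` is a hypothetical name). The `else` branch handles a code-word that the encoder created in the same step, so it is not yet in the decoder's dictionary.

```python
def lzw_decode(codes):
    """Decode a list of LZW code-words into bytes (1-byte symbols, 256 roots)."""
    dictionary = {i: bytes([i]) for i in range(256)}
    if not codes:
        return b''
    previous = codes[0]                # c' <- c
    output = [dictionary[previous]]    # the first code-word is always a root
    k = dictionary[previous][:1]
    for c in codes[1:]:
        w = dictionary[previous]
        if c in dictionary:
            entry = dictionary[c]
        else:
            # c is the entry the encoder has just created: string(w) + k,
            # where k is the first symbol of string(w)
            entry = w + k
        output.append(entry)
        k = entry[:1]                          # first symbol of the last output
        dictionary[len(dictionary)] = w + k    # insert wk
        previous = c
    return b''.join(output)

print(lzw_decode([97, 98, 256]))  # b'abab'
```
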
### Example
<img src="data/LZW_decoding_example.png" width="600">

### Lab

TO-DO (see https://rosettacode.org/wiki/LZW_compression#Python).

## C. Symbol encoding

### How does it work?

* We can compress a sequence of symbols if each one is translated into a code-word and,
  on average, the lengths of the code-words are smaller than the
  length of the symbols.
  
* The encoder and the decoder have a probabilistic model $M$ which
  provides the variable-length encoder ($C$) and decoder ($C^{-1}$) with the
  probability $p(s)$ of each symbol $s$.
  
* The most probable symbols are represented by the shortest
  code-words and vice versa.

<img src="data/compresion_entropica.png" width="600">


### Bits, data and information

* data != information (data is the representation of the information).

* Lossless data compression uses a shorter representation for
  information.
  
* By definition, a bit of data stores a bit of information if, and
  only if, it represents the occurrence of an equiprobable event (an
  event that can be true or false with the same probability).
  In this ideal situation, the representation is fully efficient
  (no further compression would be possible).
  
* By definition, a symbol $s$ with probability $p(s)$ stores
  \begin{equation}
    I(s)=-\log_2 p(s) \tag{Eq:symbol_information}
  \end{equation}
  bits of information.

  <img src="data/prob_vs_long.png" width="600">

* So, ideally, the length of a code-word in bits (of data) should match the number of bits of information of the symbol it represents.

### Entropy

* The entropy $H(S)$ measures the amount of information per
  symbol that a source of information $S$ produces, on average. By definition
  \begin{equation}
    H(S) = \sum_{s=1}^{N} p(s)\times I(s) = -\sum_{s=1}^{N} p(s)\log_2 p(s)
  \end{equation}
  bits-of-information/symbol, where $N$ is the size of the source
  alphabet (number of different symbols).

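A short sketch of $I(s)$ and $H(S)$, with the probabilities estimated from the relative frequencies of the symbols in a sequence (this estimation method and the function names are illustrative assumptions).

```python
import math
from collections import Counter

def information(p):
    """I(s) = -log2 p(s), in bits of information."""
    return -math.log2(p)

def entropy(probabilities):
    """H(S) = sum of p(s) * I(s) over the symbols of the alphabet."""
    return sum(p * information(p) for p in probabilities if p > 0)

def empirical_probabilities(sequence):
    """Estimate p(s) as the relative frequency of each symbol in the sequence."""
    counts = Counter(sequence)
    return [c / len(sequence) for c in counts.values()]

print(entropy([0.5, 0.5]))                                         # 1.0
print(entropy([0.25] * 4))                                         # 2.0
print(round(entropy(empirical_probabilities('abracadabra')), 2))  # 2.04
```

For `abracadabra`, about 2.04 bits of information per symbol are produced on average, much less than the 8 bits per symbol of an ASCII representation.
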
### C.1 Basic compression algorithm

#### Encoding of a symbol

1. While the decoder does not know the symbol:
    1. Assert something about the symbol that allows the decoder
    to minimize the uncertainty of finding that symbol. This guess
    should be true or false with the same probability.
    2. Output a bit of code that says if the last guess is true or
    false.
    
#### Decoding of a symbol

1. While the symbol is not known without uncertainty:
    1. Make the same guess as the encoder.
    2. Input a bit of code that represents the result of the last
    guess.
    
#### Tip

* This codec is 100% efficient if the guesses are equiprobable.
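
One possible sketch of this codec, assuming a fixed alphabet known by both sides in which all symbols are equally probable: each guess asks whether the symbol is in the first half of the remaining candidates, so every answer (bit) halves the uncertainty. The function names are hypothetical.

```python
def encode_symbol(symbol, alphabet):
    """Emit one bit per guess: 1 if the symbol is in the first half of the candidates."""
    candidates = list(alphabet)
    bits = []
    while len(candidates) > 1:
        half = len(candidates) // 2
        if symbol in candidates[:half]:
            bits.append(1)
            candidates = candidates[:half]
        else:
            bits.append(0)
            candidates = candidates[half:]
    return bits

def decode_symbol(bit_stream, alphabet):
    """Repeat the encoder's guesses, reading one bit per guess from an iterator."""
    candidates = list(alphabet)
    while len(candidates) > 1:
        half = len(candidates) // 2
        candidates = candidates[:half] if next(bit_stream) else candidates[half:]
    return candidates[0]

alphabet = 'abcdefgh'
code = [bit for s in 'bad' for bit in encode_symbol(s, alphabet)]
print(code)  # [1, 1, 0, 1, 1, 1, 1, 0, 0]
bits = iter(code)
print(''.join(decode_symbol(bits, alphabet) for _ in range(3)))  # bad
```

With 8 equiprobable symbols every guess is equiprobable and each symbol costs exactly $\log_2 8 = 3$ bits. When the alphabet size is not a power of 2, the two halves cannot always be equally probable, so the efficiency drops slightly below 100%.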

### Example

* Let's suppose that we use the Spanish alphabet. Humans know that
  symbols do not form words in any order, so we can
  formulate the following VLC (Variable Length Codec):
  
  In Spanish there are 28 letters. Therefore, to encode, for example,
  the word `preciosa`, the first symbol `p` can be represented by
  its index inside the Spanish alphabet with a code-word of 5 bits.
  For this first letter the encoding is not very efficient, but for the
  second one, `r`, we can see (using a
  Spanish dictionary) that after a `p`, only the following symbols are
  possible: (1) `a`, (2) `e`, (3) `i`, (4) `l`, (5) `n`, (6)
  `o`, (7) `r`, (8) `s` and (9) `u`. Therefore, we don't need
  5 bits now: 4 are enough.
  
<img src="data/universal_coding_example.png" width="600">

* Notice that the compression ratio has been 40/25:1, because `preciosa` has 8
  letters (8 × 5 = 40 bits without compression) and it has been encoded with 25 bits.

### C.2 [Shannon-Fano coding](https://en.wikipedia.org/wiki/Shannon%E2%80%93Fano_coding) [[Shannon, 1948]](https://scholar.google.es/scholar?hl=es&as_sdt=0%2C5&q=Shannon+2001+A+Mathematical+Theory+of+Communication&btnG=),  [[Fano, 1949]](https://scholar.google.es/scholar?hl=es&as_sdt=0%2C5&q=Fano+1949+%22The+transmission+of+information%22&btnG=)

* At the end of the 1940s, Claude E. Shannon (Bell Labs) and
  R.M. Fano (MIT) developed the Shannon-Fano codec.

### Encoder

1. Sort the symbols by their probabilities.
2. Split the (sorted) set of symbols into two subsets so that the total
   probabilities of both subsets are as similar as possible.
3. Assign a different bit to each subset.
4. Repeat the previous procedure for each subset until each subset
   has only one symbol.

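A recursive sketch of these steps (`shannon_fano` is a hypothetical name). The probabilistic model used in the example is an illustrative assumption, not the one of the figure below.

```python
def shannon_fano(probabilities):
    """Return {symbol: code-word} for a model given as {symbol: probability}."""
    # 1. Sort the symbols by decreasing probability.
    symbols = sorted(probabilities, key=probabilities.get, reverse=True)
    codes = {s: '' for s in symbols}

    def split(group):
        if len(group) < 2:
            return  # 4. stop when a subset has only one symbol
        total = sum(probabilities[s] for s in group)
        # 2. Find the cut that makes the probabilities of both subsets as similar as possible.
        best_cut, best_diff, running = 1, float('inf'), 0.0
        for cut in range(1, len(group)):
            running += probabilities[group[cut - 1]]
            diff = abs(total - 2 * running)  # |P(first subset) - P(second subset)|
            if diff < best_diff:
                best_cut, best_diff = cut, diff
        # 3. Assign a different bit to each subset.
        for s in group[:best_cut]:
            codes[s] += '0'
        for s in group[best_cut:]:
            codes[s] += '1'
        split(group[:best_cut])
        split(group[best_cut:])

    split(symbols)
    return codes

print(shannon_fano({'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}))
# {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```
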
### Example

* Let's use the following probabilistic model:

<img src="data/shannon-fano_example.png" width="180">

Using it, this is the Shannon-Fano coding:

<img src="data/shannon-fano_example-coding.png" width="1000">

### Decoder

TO-DO.

### C.3 [Huffman coding](https://en.wikipedia.org/wiki/Huffman_coding) [[Huffman, 1952]](https://scholar.google.es/scholar?hl=es&as_sdt=0%2C5&q=huffman+method+codes+1952&btnG=)
 
* Optimal (on average, never worse than Shannon-Fano) among the codecs that
  assign an integer number of bits to each symbol.
* Huffman-based VLC codecs build a binary tree where the symbols are stored
  in the leaves and the distance of each symbol to the root of the tree
  is approximately $-\log_2(p(s))$.
* After labeling the two branches of each node with a different bit, the Huffman
  code-word for the symbol $s$ is the sequence of bits (labels) that
  we must use to travel from the root to the $s$-leaf.

### Building Huffman trees

1. Create a list of nodes. Each node stores a symbol and its
   probability.
2. While the number of nodes in the list is > 1:
    1. Extract from the list the 2 nodes with the lowest probability.
    2. Insert in the list a new node (the root of a binary tree whose
       children are the 2 extracted nodes) whose probability is the sum of
       the probabilities of its children.
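A compact Python sketch of this construction uses a priority queue (`heapq`) as the list of nodes, so that the two least probable nodes are always at the top. It also labels the branches (0 to the first child, 1 to the second, an arbitrary choice) and decodes by walking the prefix-free code-words. The function names are hypothetical and symbols are assumed to be strings.

```python
import heapq
from itertools import count

def huffman_codes(probs):
    """Return {symbol: code-word} for a {symbol: probability} model."""
    tie = count()                                   # tie-breaker: heapq never compares nodes
    heap = [(p, next(tie), s) for s, p in probs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                              # one-symbol alphabet: 1-bit code
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        p0, _, n0 = heapq.heappop(heap)             # the two least probable nodes...
        p1, _, n1 = heapq.heappop(heap)
        heapq.heappush(heap, (p0 + p1, next(tie), (n0, n1)))  # ...become children of a new node
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):                 # internal node: label branches 0 and 1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                       # leaf: root-to-leaf labels form the code-word
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes

def encode(message, codes):
    return "".join(codes[s] for s in message)

def decode(bits, codes):
    inverse = {c: s for s, c in codes.items()}
    out, word = [], ""
    for b in bits:
        word += b
        if word in inverse:                         # prefix-free: first match is a whole code-word
            out.append(inverse[word])
            word = ""
    return out

codes = huffman_codes({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
print(codes, encode("abacd", codes))
```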

### Example

<img src="data/huffman_ejemplo.png" width="600">

### Encoder
TO-DO

### Example
TO-DO

### Decoder
TO-DO

### Example
TO-DO

### Limits

* Any Huffman code assigns an integer number of bits to each symbol,
  roughly
  \begin{equation}
    l\big(c(s)\big) \approx \lceil I(s)\rceil, \tag{Eq:Huffman}
  \end{equation}
  where $l\big(c(s)\big)$ is the length of the code-word assigned to
  the symbol $s$. This implies that, with each encoded symbol, up to 1 bit of
  redundant data can be introduced (think of a very frequent, high-probability
  symbol: its information is close to 0 bits, but its code-word has at least 1 bit).
  
* This problem gets worse as the alphabet gets smaller. In the extreme case
  of a binary source alphabet, Huffman coding assigns 1 bit to each symbol and
  therefore does not change the length of the original representation.
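To put a number on this limit, the following Python sketch compares the entropy of a binary source that has a very frequent symbol ($p=0.9$, an example value of ours) with the 1 bit/symbol that Huffman coding must spend.

```python
import math

def entropy(probs):
    """Entropy (bits/symbol) of a memoryless source."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p = [0.9, 0.1]            # binary source with a very frequent symbol
H = entropy(p)
huffman_avg = 1.0         # a binary alphabet always gets the code-words "0" and "1"
print(f"H = {H:.3f} bits/symbol, Huffman = {huffman_avg:.3f}, "
      f"redundancy = {huffman_avg - H:.3f}")
# H = 0.469 bits/symbol, Huffman = 1.000, redundancy = 0.531
```

More than half of every Huffman bit is redundant here, which is the gap that arithmetic coding closes.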

### C.4 Arithmetic coding

* Arithmetic coding relaxes Eq. (Eq:Huffman): code-words do not need an
  integer number of bits and, apart from a small overhead for the whole
  code-stream, for every encoded symbol
  \begin{equation}
    l\big(c(s)\big) = I(s), \tag{Eq:arithmetic}
  \end{equation}
  i.e. the number of bits of data (code) assigned by the encoder
  is equal to the number of bits of information that the symbol
  represents.

<img src="data/comparacion.png" width="800">

* It can be said that arithmetic coding is optimal because
  the average length of an arithmetic code approaches the entropy of
  the information source, measured in bits/symbol.

### An ideal encoder

1. Let $[L,H)\leftarrow [0.0,1.0)$ be an interval of real numbers.
2. While the input is not exhausted:
    1. Split $[L,H)$ into as many sub-intervals as there are symbols
       in the alphabet. The size of each sub-interval is proportional
       to the probability of the corresponding symbol.
    2. Select the sub-interval $[L',H')$ associated with the encoded
       symbol.
    3. $[L,H)\leftarrow [L',H')$.
3. Output a real number $x\in[L,H)$ (the arithmetic
   code-stream). $x$ must have enough digits (bits) to distinguish
   the final sub-interval $[L,H)$ from the rest of possibilities.
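The ideal encoder can be sketched in Python with exact rational arithmetic (`fractions.Fraction`), so no precision is lost while the interval shrinks. Two assumptions of this sketch: the order of the symbols in the `probs` dictionary fixes the order of the sub-intervals, and the output $x$ is the shortest binary fraction $0.b_1b_2\ldots$ inside $[L,H)$. The function names are hypothetical.

```python
import math
from fractions import Fraction

def arithmetic_interval(message, probs):
    """Ideal arithmetic encoder: return the final interval [L, H) as exact fractions."""
    L, H = Fraction(0), Fraction(1)
    for s in message:
        width, cum = H - L, Fraction(0)
        for t, p in probs.items():          # the dict order fixes the sub-interval order
            if t == s:
                L, H = L + cum * width, L + (cum + p) * width   # select [L', H')
                break
            cum += p
    return L, H

def shortest_code(L, H):
    """Fewest bits b such that the binary fraction 0.b lies inside [L, H)."""
    n = 1
    while True:
        x = math.ceil(L * 2**n)             # smallest n-bit numerator with x / 2^n >= L
        if Fraction(x, 2**n) < H:
            return format(x, f"0{n}b")
        n += 1

probs = {"A": Fraction(3, 4), "B": Fraction(1, 4)}
for msg in ["A", "B", "AA", "AB", "BA", "BB"]:
    L, H = arithmetic_interval(msg, probs)
    print(msg, L, H, shortest_code(L, H))
```

For instance, `AB` ends in $[9/16,3/4)$ and gets the code `101` ($0.101_2=5/8$). Notice that `A` and `AA` both get `0`, so in this sketch the decoder must also know how many symbols were encoded (or an EOF symbol must be used).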

### Example

* Consider a binary source where $p(\text{A})=3/4$ and
  $p(\text{B})=1/4$. Compute the arithmetic code of the sequences A, B,
  AA, AB, BA and BB.
  
<img src="data/aritmetica_ejemplo.png" width="500">

### An ideal decoder

1. Let $[L,H)\leftarrow [0.0,1.0)$ be the initial interval.
2. While the input is not exhausted:
    1. Split $[L,H)$ into as many sub-intervals as there are symbols
       in the alphabet. The size of each sub-interval is proportional
       to the probability of the corresponding symbol.
    2. Input as many bits of $x$ as needed to:
        1. Select the sub-interval $[L',H')$ that contains $x$.
        2. Output the symbol that $[L',H')$ represents.
        3. $[L,H)\leftarrow[L',H')$.
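A Python sketch of the ideal decoder, again with exact fractions. Instead of reading bits incrementally, it receives the whole code value $x$ and, as an assumption of this sketch, the number of symbols to decode; the sub-intervals follow the same dictionary order as in the encoder. The function names are hypothetical.

```python
from fractions import Fraction

def bits_to_fraction(bits):
    """Value of the binary fraction 0.bits."""
    return Fraction(int(bits, 2), 2 ** len(bits))

def arithmetic_decode(x, n_symbols, probs):
    """Ideal arithmetic decoder: recover n_symbols symbols from the code value x."""
    L, H = Fraction(0), Fraction(1)
    out = []
    for _ in range(n_symbols):
        width, cum = H - L, Fraction(0)
        for t, p in probs.items():           # same sub-interval order as the encoder
            lo, hi = L + cum * width, L + (cum + p) * width
            if lo <= x < hi:                  # the sub-interval that contains x
                out.append(t)
                L, H = lo, hi
                break
            cum += p
    return "".join(out)

probs = {"A": Fraction(3, 4), "B": Fraction(1, 4)}
print(arithmetic_decode(bits_to_fraction("101"), 2, probs))  # AB
```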

### Example
TO-DO

### Incremental transmission

* It is not necessary to wait for the end of the encoding to
  generate the arithmetic code. When we work with binary
  representations of the real numbers $L$ and $H$, their most
  significant bits become identical as the interval shrinks. These
  bits belong to the output arithmetic code and, therefore, they
  can be output as soon as they match.
  
  For example, when the symbol B is encoded, a code-bit 1 can be
  output because any sequence of symbols that starts with B has a
  code-word that begins with 1.
    
* When the most significant bits of $L$ and $H$ are output, the
  bits of each register are shifted to the left, and new bits need to
  be inserted. The result is an automatic zoom of the selected
  sub-interval.

  Continuing with the previous example, the register shifting expands
  the $[0.50,1.00)$ interval to $[0.00,1.00)$.
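Incremental transmission can be sketched in Python by emitting a bit whenever $[L,H)$ lies entirely in one half of $[0,1)$ and then doubling the interval (the zoom). This is a simplified sketch: it keeps exact fractions instead of finite registers, so it does not need the extra handling that practical integer coders require when the interval shrinks while straddling $1/2$. `incremental_encode` is a hypothetical name.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def incremental_encode(message, probs):
    """Output the leading bits of L and H as soon as they match; return (bits, [L, H))."""
    L, H = Fraction(0), Fraction(1)
    bits = []
    for s in message:
        width, cum = H - L, Fraction(0)
        for t, p in probs.items():
            if t == s:
                L, H = L + cum * width, L + (cum + p) * width
                break
            cum += p
        while True:                     # zoom while [L, H) sits inside one half of [0, 1)
            if H <= HALF:               # L and H both start with 0
                bits.append("0")
                L, H = 2 * L, 2 * H
            elif L >= HALF:             # L and H both start with 1
                bits.append("1")
                L, H = 2 * L - 1, 2 * H - 1
            else:
                break
    return "".join(bits), (L, H)

probs = {"A": Fraction(3, 4), "B": Fraction(1, 4)}
print(incremental_encode("B", probs))
```

After encoding B, the sketch has already output `11` (because $[0.75,1.00)=[0.11_2,1.00_2)$) and the zoomed interval is back to $[0,1)$. For `AB` it outputs `10` and keeps the zoomed interval $[1/4,1)$, from which the final bits are still to be chosen.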

### Lab
TO-DO.

## C.5 Probabilistic models

* In order to use any of the previous VLCs, a probabilistic model is always needed.

### C.5.1 Static models

* Static models are the simplest ones because the probabilities of the symbols
  remain constant.
* Therefore, the variable-length code can be precomputed.
* When the code is precomputed, the entropy codec is efficient and
  fast. For this reason, static models are very common in codecs such
  as JPEG, MPEG (audio and video), etc.

### C.5.2 Adaptive models

* The probabilities of the symbols are computed at run time.
* In general, adaptive models achieve better compression ratios than
  static models, because the probabilities of the symbols are computed
  locally (think of the sequence `aaaaaaaaaaaaaabbbbbbbbbbbbbbb`).

### Encoding

1. Assign the same probability to all the symbols.
2. While the input is not exhausted:
    1. Encode the next symbol.
    2. Update (increase) its probability.
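A minimal Python sketch of this adaptive model. Two assumptions of ours: the probability of a symbol is its count divided by the total count (every count starts at 1, and step 2.B adds 1), and instead of calling a real entropy coder it accumulates the ideal code length $-\log_2 p(s)$ that an arithmetic coder would spend. `adaptive_code_length` is a hypothetical name.

```python
import math

def adaptive_code_length(message, alphabet):
    """Ideal number of bits spent on `message` by an adaptive count-based model."""
    counts = {s: 1 for s in alphabet}          # step 1: same probability for all symbols
    total_bits = 0.0
    for s in message:
        p = counts[s] / sum(counts.values())   # current estimate of p(s)
        total_bits += -math.log2(p)            # bits an ideal (arithmetic) coder would spend
        counts[s] += 1                         # step 2.B: increase p(s)
    return total_bits

print(adaptive_code_length("aa", "ab"))
```

The decoder can mirror these updates because it sees the same symbols in the same order, so both models stay identical without transmitting any probability.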

### Example
TO-DO

### Decoding

1. Identical to step 1 of the encoder.
2. While the input is not exhausted:
    1. Decode the next symbol.
    2. Identical to step 2.B of the encoder.

### Example
TO-DO

### C.5.3 Initially empty models

* The smaller the number of symbols in the model, the higher
  their probabilities, and therefore, the better the compression ratios.
* An initially empty model only stores the ESC (escape) symbol, a
  symbol that the encoder uses only when it finds a new symbol.

### Encoder

1. Set the probability of the $\text{ESC}$ to $1.0$ (and the probability of
   the rest of the symbols to $0.0$).
2. While the input is not exhausted:
    1. $s\leftarrow$ next symbol.
    2. If $s$ has been found before, then:
        1. Encode $s$ and output $c(s)$.
    3. Else:
        1. Output $c(\text{ESC})$.
        2. Output a raw symbol $s$.
    4. Update $p(s)$.
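The encoder above can be sketched in Python, again counting the ideal number of bits instead of producing the code-stream. Assumptions of this sketch: probabilities are counts divided by the total count, a raw symbol costs `raw_bits` = 8 bits, and (as in the steps above) only $p(s)$ is updated, not $p(\text{ESC})$. `escape_model_bits` is a hypothetical name.

```python
import math
from collections import Counter

ESC = "ESC"  # escape symbol (a multi-character string, so it cannot clash with a text symbol)

def escape_model_bits(message, raw_bits=8):
    """Ideal number of bits spent by an initially empty model on `message`."""
    counts = Counter({ESC: 1})               # step 1: p(ESC) = 1.0
    bits = 0.0
    for s in message:
        total = sum(counts.values())
        if s in counts:                      # s has been found before
            bits += -math.log2(counts[s] / total)
        else:                                # new symbol: c(ESC) followed by the raw symbol
            bits += -math.log2(counts[ESC] / total) + raw_bits
        counts[s] += 1                       # update p(s)
    return bits

print(escape_model_bits("aa"))  # 9.0
```

The first `a` costs 0 bits for the ESC (its probability is 1) plus 8 raw bits, and the second `a` costs 1 bit because $p(\mathtt{a})=1/2$.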

### Example
TO-DO

### Decoder

1. Identical to step 1 of the encoder.
2. While the input is not exhausted:
    1. $c(s)\leftarrow $ next code-word.
    2. Decode $s$.
    3. If $s=\text{ESC}$, then:
        1. Input a raw symbol $s$.
    4. Update $p(s)$.
    5. Output $s$.

### Example
TO-DO

   
### C.5.4 Models with memory

* In most cases, the probability of a symbol depends on its
  neighborhood (context).
* The larger the memory of the model (the context), the higher the
  accuracy of the predictions (probabilities), and therefore, the
  shorter the code-words \cite{Cleary.PPM}.
* Let ${\cal C}[i]$ be the last $i$ encoded symbols and
  $p(s|{\cal C}[i])$ the probability that the symbol $s$ follows
  the context ${\cal C}[i]$.
* Let $k$ be the maximal order of the prediction (i.e. the largest
  number of symbols of ${\cal C}[]$ that are going to be used as the
  actual context). Notice that ${\cal C}[0]=\varnothing$, i.e. an order-0
  model has no memory.
* We suppose that arithmetic coding is used and, therefore, when we
  input or output $c(s)$, we are transmitting $I(s)$ bits of code.
* Let $r$ be the size of the source alphabet.

### Encoder

1. Create an empty model for every context ${\cal C}[i]$, $0\le i \le k$.
2. Create a non-empty model for the special context ${\cal C}[-1]$ (order $-1$).
3. While the input is not exhausted:
    1. $s\leftarrow$ Input$_{\log_2(r)}$.
    2. $i\leftarrow k$ (except for the first symbol, where
       $i\leftarrow 0$).
    3. While $p(s|{\cal C}[i])=0$ (it is the first time that $s$ follows
       ${\cal C}[i]$):
        1. Output $\leftarrow c(\text{ESC}|{\cal C}[i])$.
        2. Update $p(\text{ESC}|{\cal C}[i])$.
        3. Update $p(s|{\cal C}[i])$ (insert $s$ into the ${\cal C}[i]$ context).
        4. $i\leftarrow i-1$.
    4. Output $\leftarrow c(s|{\cal C}[i])$. The symbols that were found in
       contexts of order $>i$ must be excluded from the current (${\cal C}[i]$) context because $s$ is none of them.
    5. If $i\ge 0$, update $p(s|{\cal C}[i])$.
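A minimal Python sketch of this encoder that, instead of driving an arithmetic coder, returns the events it would code as `(order, symbol, probability)` tuples. Assumptions of this sketch: symbols are single characters, the order $-1$ model contains `chr(0)`...`chr(255)` plus EOF (all with count 1), ESC and EOF are represented by strings, and at the beginning of the message the order is limited to the number of symbols already seen (`min(k, pos)`, which for $k=1$ matches step 3.B). With $k=1$ it reproduces the probabilities of the worked example that follows.

```python
from collections import defaultdict
from fractions import Fraction

ESC, EOF = "ESC", "EOF"  # special symbols (string names are an assumption of this sketch)

def prob(model, sym, excluded):
    """p(sym) in a count-based model, ignoring the excluded symbols."""
    total = sum(c for t, c in model.items() if t not in excluded)
    return Fraction(model[sym], total)

def ppm_trace(message, k=1, r=256):
    """Return the (order, symbol, probability) events coded by the PPM encoder."""
    models = defaultdict(lambda: {ESC: 1})                 # empty models, keyed by context
    base = {**{chr(c): 1 for c in range(r)}, EOF: 1}       # order -1: non-adaptive, non-empty
    trace = []
    for pos, s in enumerate(message):
        i = min(k, pos)                      # shorter contexts at the start of the message
        excluded = set()
        while i >= 0:
            ctx = message[pos - i:pos]       # C[i]: the last i symbols
            model = models[ctx]
            if s in model:
                break
            trace.append((i, ESC, prob(model, ESC, excluded)))
            excluded |= set(model) - {ESC}   # s is none of these symbols
            model[ESC] += 1                  # update p(ESC | C[i])
            model[s] = 1                     # insert s into C[i]
            i -= 1
        if i >= 0:
            trace.append((i, s, prob(model, s, excluded)))
            model[s] += 1                    # update p(s | C[i])
        else:
            trace.append((-1, s, prob(base, s, excluded)))
    return trace

for event in ppm_trace("ab"):
    print(event)
```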
    
### Example

* Let $r=256$ be the size of the source alphabet.

* The probabilistic model $M[{\cal C}[-1]]$ (for the special context
  ${\cal C}[-1]$) is non-adaptive, non-empty and has a special symbol EOF
  (End Of File) that is going to be used when the compression has
  finished:
  $$M[{\cal C}[-1]]=\{0,1~1,1~\cdots~\mathtt{a},1~\mathtt{b},1~\cdots~255,1~\text{EOF},1\}.$$
  In a pair $a,b$, $a$ is the symbol and $b$ is its count (unnormalized probability).

* $M[{\cal C}[0]]$ is adaptive and empty:
  $$M[{\cal C}[0]]=\{\text{ESC},1\}.$$

* In this example (for the sake of simplicity), the maximal
  order of the prediction is $k=1$ (we only remember the previous
  symbol). Therefore, there are $r=256$ order-1 probabilistic models:
  $$M[{\cal C}[1]]=\{\text{ESC},1\}, 0\le {\cal C}[1]< r.$$
  
* Encoding of the first symbol (`a`) (see the figure at the end of this example):

1. [3.A] $s\leftarrow$ `a`.
2. [3.B] $i\leftarrow 0$ (we don't know the previous symbol).
3. [3.C] $p(\mathtt{a}|{\cal C}[0])=0$ (the context only has the ESC).
4. [3.C.a] Output $\leftarrow c(\text{ESC}|{\cal C}[0])$ (although
    $l(c(\text{ESC}|{\cal C}[0]))=0$).
5. [3.C.b] Update $p(\text{ESC}|{\cal C}[0])$ (now, the count of ESC is
    2).
6. [3.C.c] Insert `a` into
    $M[{\cal C}[0]]=\{\mathsf{ESC},2~\mathtt{a},1\}$.
7. [3.C.d] $i\leftarrow -1$.
8. [3.C] $p(\mathtt{a}|{\cal C}[-1])\neq 0$.
9. [3.D] Output $\leftarrow c(\texttt{a}|{\cal C}[-1])$ where
    $p(\texttt{a}|{\cal C}[-1]) = 1/(256+1)$.
    
* Encoding of the second symbol (`b`):

1. [3.A] $s\leftarrow$ `b`.
2. [3.B] $i\leftarrow 1$.
3. [3.C] $p(\mathtt{b}|{\cal C}[1])=0$ because ${\cal C}[1]=\texttt{a}$ and
   $M[\texttt{a}]=\{\text{ESC},1\}$.
4. [3.C.a] Output $\leftarrow c(\text{ESC}|\texttt{a})$ (although
   $l(c(\text{ESC}|\texttt{a}))=0$).
5. [3.C.b] Update $p(\text{ESC}|\texttt{a})$ (now, the count of ESC is 2).
6. [3.C.c] Insert `b` into $M[\texttt{a}]=\{\text{ESC},2~ \texttt{b},1\}$.
7. [3.C.d] $i\leftarrow 0$.
8. [3.C] $p(\mathtt{b}|{\cal C}[0])=0$ because
   $M[{\cal C}[0]]=\{\mathsf{ESC},2~\texttt{a},1\}$.
9. [3.C.a] Output $\leftarrow c(\text{ESC}|{\cal C}[0])$ where
   $p(\text{ESC}|{\cal C}[0]) = 2/3$.
10. [3.C.b] Update $p(\text{ESC}|{\cal C}[0])$ (now, the count of ESC is
    3).
11. [3.C.c] Insert `b` into $M[{\cal C}[0]] = \{\text{ESC},3~
    \texttt{a},1~ \texttt{b},1\}$.
12. [3.C.d] $i\leftarrow -1$.
13. [3.C] $p(\mathtt{b}|{\cal C}[-1])\neq 0$.
14. [3.D] Output $\leftarrow c(\texttt{b}|{\cal C}[-1])$ where
    $p(\mathtt{b}|{\cal C}[-1]) = 1/r$. The symbol `a` has been
    excluded from the calculation of the probability of `b` because
    $\texttt{a}\in M[{\cal C}[0]] = \{\text{ESC},3~ \texttt{a},1~
    \texttt{b},1\}$.

<img src="00-fundamentals/PPM_example.png" style="width: 800px;" align="middle"/>

### Decoder

1. Steps 1 and 2 of the encoder.
2. While the input is not exhausted:
    1. $i\leftarrow k$ (except for the first symbol, where $i\leftarrow 0$).
    2. $s\leftarrow$ next decoded symbol.
    3. While $s=\text{ESC}$:
        1. Update $p(\text{ESC}|{\cal C}[i])$.
        2. $i\leftarrow i-1$.
        3. $s\leftarrow$ next decoded symbol.
    4. Update $p(s|{\cal C}[i])$.
    5. While $i<k$:
        1. $i\leftarrow i+1$.
        2. Update $p(s|{\cal C}[i])$ (insert $s$ into the ${\cal C}[i]$ context).
        
### Lab
TO-DO

## C.6 MTF (Move To Front) transform

* Inputs a sequence of symbols and outputs a sequence of symbols.

* Both sequences have the same size (in bits of data).

* The entropy of the output is usually lower than that of the input.

* It changes the representation of the symbols so that the symbols
  that occur frequently are "moved" in the source alphabet towards the
  lowest positions.

* The probability density function of the output (approximately) follows
  an exponential distribution with slope $\lambda$, where
\begin{equation}
  f(x;\lambda) = \left\{ \begin{array}{ll}
      \lambda e^{-\lambda x} & \mbox{if $x \geq 0$};\\
      0 & \mbox{if $x < 0$}.\end{array} \right.
\end{equation}

<img src="00-fundamentals/exponential.svg" style="width: 600px;" align="middle"/>

### Forward transform

1. Create a list $L$ with the symbols of the source alphabet,
   where $$L[s]\leftarrow s,\quad 0\le s< r.$$
2. While the input is not exhausted:
    1. $s\leftarrow$ next input symbol.
    2. $c\leftarrow$ position of $s$ in $L$ ($L[c]=s$).
    3. Output $\leftarrow c$.
    4. Move $s$ to the front of $L$.
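A direct Python transcription of the forward transform, assuming byte symbols ($r=256$). `mtf_forward` is a hypothetical name; `list.index` makes each step $O(r)$, which is fine for a sketch.

```python
def mtf_forward(data, r=256):
    """Move-To-Front transform of a sequence of integer symbols in [0, r)."""
    L = list(range(r))          # step 1: L[s] = s
    out = []
    for s in data:
        c = L.index(s)          # position of s in L
        out.append(c)
        L.insert(0, L.pop(c))   # move s to the front of L
    return out

print(mtf_forward(list(b"aaabbb")))  # [97, 0, 0, 98, 0, 0]
```

Runs of the same symbol become runs of zeros, which is what lowers the entropy of the output.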
    
### Example
TO-DO
    
### Inverse transform

1. Step 1 of the forward transform.
2. While the input is not exhausted:
    1. $c\leftarrow$ next input code.
    2. $s\leftarrow L[c]$.
    3. Output $s$.
    4. Step 2.D of the forward transform.
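The inverse transform in Python, under the same byte-symbol assumption ($r=256$); `mtf_inverse` is a hypothetical name. The list is updated exactly as in the forward transform, so both sides stay synchronized.

```python
def mtf_inverse(codes, r=256):
    """Inverse Move-To-Front transform: codes back to integer symbols in [0, r)."""
    L = list(range(r))          # step 1 of the forward transform
    out = []
    for c in codes:
        s = L[c]                # the symbol currently at position c
        out.append(s)
        L.insert(0, L.pop(c))   # same move-to-front as the encoder
    return out

print(bytes(mtf_inverse([97, 0, 0, 98, 0, 0])))  # b'aaabbb'
```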
    
### Example

<img src="00-fundamentals/MTF_example.png" style="width: 250px;" align="middle"/>

### Lab
TO-DO

## C.7 Context-based Text Predictive transform

* In MTF, a symbol that has occurred only once can get an
  index-code that is lower than the index-code of a
  symbol that has been found thousands of times :-(

* We can solve this problem if the positions of the symbols are
  determined by their probability. In other words, the list $L$ will
  be sorted by the number of occurrences (counts) of the symbols.
  
### 0-order encoder

1. Step 1 of the MTF transform, but now every node of the
   list also stores the count of its symbol.
2. While the input is not exhausted:
    1. $s\leftarrow$ next input symbol.
    2. $c\leftarrow$ position of $s$ in $L$ (the prediction error).
    3. Output $\leftarrow c$.
    4. Update the count of $L[c]$ (the count of $s$) and keep $L$ sorted.
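A Python sketch of the 0-order encoder. Assumptions of ours: counts start at 0, and to keep $L$ sorted the updated node only moves ahead of nodes with a strictly smaller count (so, on ties, the node already in front stays there). `tpt0_encode` is a hypothetical name.

```python
def tpt0_encode(data, r=256):
    """0-order context-based predictive transform of integer symbols in [0, r)."""
    L = [[s, 0] for s in range(r)]      # node = [symbol, count]; initially L[s] = s
    out = []
    for s in data:
        c = next(i for i, node in enumerate(L) if node[0] == s)  # prediction error
        out.append(c)
        L[c][1] += 1                    # update the count of s...
        while c > 0 and L[c - 1][1] < L[c][1]:
            L[c - 1], L[c] = L[c], L[c - 1]   # ...and keep L sorted by decreasing count
            c -= 1
    return out

print(tpt0_encode([1, 1, 2, 2, 2, 1], r=4))  # [1, 0, 2, 1, 1, 1]
```

Unlike MTF, the last `1` is coded as 1 and not as 0 because the symbol `2` has become more frequent.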

### Example

<img src="00-fundamentals/TPT_example.svg" style="width: 450px;" align="middle"/>

### 0-order decoder

1. Step 1 of the encoder.
2. While the input is not exhausted:
    1. $c\leftarrow$ next input code.
    2. $s\leftarrow L[c]$.
    3. Output $s$.
    4. Step 2.D of the encoder.
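The matching decoder in Python, with the same assumptions as the encoder sketch (initial counts 0, a node only overtakes nodes with strictly smaller counts). `tpt0_decode` is a hypothetical name; the decoder works because it repeats exactly the same list updates.

```python
def tpt0_decode(codes, r=256):
    """Inverse of the 0-order context-based predictive transform."""
    L = [[s, 0] for s in range(r)]      # node = [symbol, count]; initially L[s] = s
    out = []
    for c in codes:
        s = L[c][0]                     # the symbol predicted at position c
        out.append(s)
        L[c][1] += 1                    # step 2.D of the encoder
        while c > 0 and L[c - 1][1] < L[c][1]:
            L[c - 1], L[c] = L[c], L[c - 1]
            c -= 1
    return out

print(tpt0_decode([1, 0, 2, 1, 1, 1], r=4))  # [1, 1, 2, 2, 2, 1]
```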
    
### Example
TO-DO
    
### $N$-order encoder

1. Let ${\cal C}[i]$ be the context of $s$ and $L_{{\cal C}[i]}$ the
   list for that context. If $i>0$, the lists are initially empty; otherwise, the
   list is full and the count of every node is $0$.
2. Let $k=N$ be the order of the prediction.
3. Let $H=\varnothing$ be a list of tested symbols. All symbols in $H$
   must be different.
4. While the input is not exhausted:
    1. $s\leftarrow$ the next input symbol.
    2. $i\leftarrow k$ (except for the first symbol, where $i\leftarrow 0$).
    3. While $s\notin L_{{\cal C}[i]}$:
        1. $H\leftarrow \text{reduce}(H\cup L_{{\cal C}[i]})$ (reduce$()$ deletes the repeated nodes).
        2. Update the count of $s$ in $L_{{\cal C}[i]}$ and keep it sorted.
        3. $i\leftarrow i-1$.
    4. Let $c$ be the position of $s$ in $L_{{\cal C}[i]}$.
    5. $c\leftarrow c\,+$ the number of symbols in $H-L_{{\cal C}[i]}$. In this
       way, the decoder will know the order of the context where $s$
       happens, and the same symbol is not counted twice.
    6. Output $\leftarrow c$.
    7. Update the count of $s$ in $L_{{\cal C}[i]}$ and keep it sorted.
    8. $H\leftarrow\varnothing$.
    
### Example ($k=1$)

<img src="00-fundamentals/TPT_example.png" style="width: 450px;" align="middle"/>

### $N$-order decoder

1. Steps 1, 2 and 3 of the encoder.
2. While the input is not exhausted:
    1. $c\leftarrow$ the next input code.
    2. $i\leftarrow k$ (except for the first symbol, where $i\leftarrow 0$).
    3. While $L_{{\cal C}[i]}[c]=\varnothing$:
        1. $H\leftarrow \text{reduce}(H\cup L_{{\cal C}[i]})$.
        2. $i\leftarrow i-1$.
    4. $s\leftarrow \text{reduce}(H\cup L_{{\cal C}[i]})[c]$.
    5. Update the count of $L_{{\cal C}[i]}[c]$.
    6. While $i<k$:
        1. $i\leftarrow i+1$.
        2. Insert the symbol $s$ in $L_{{\cal C}[i]}$.
        
### Example
TO-DO

## C.8 Unary coding

* It is a particular case of the Huffman code where the number of
  bits of each code-word (minus one) is equal to the index of the
  symbol in the source alphabet. Example:
  
<img src="00-fundamentals/Unary_example.svg" style="width: 150px;" align="middle"/>

* Unary coding is optimal only when (see Equation
  \ref{eq:symbol_information})
  \begin{equation}
    p(s) = 2^{-(s+1)} \tag{Eq:Unary}
  \end{equation}
  where $s=0,1,\cdots$.
  
<img src="00-fundamentals/unary.png" style="width: 800px;" align="middle"/>
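A Python sketch of unary coding, using the convention that the Golomb and Rice decoders of this section rely on (a run of ones closed by a 0). The loop prints, for the first symbols, the code-word, its length, and the information $-\log_2 p(s)$ under Eq. (Eq:Unary); both numbers coincide, which is why unary coding is optimal for that distribution. The function names are hypothetical.

```python
import math

def unary_encode(s):
    """s ones followed by a terminating zero (s + 1 bits)."""
    return "1" * s + "0"

def unary_decode(bits, pos=0):
    """Decode one unary code-word starting at `pos`; return (s, next position)."""
    s = 0
    while bits[pos + s] == "1":
        s += 1
    return s, pos + s + 1

for s in range(4):
    p = 2.0 ** -(s + 1)                       # Eq. (Eq:Unary)
    print(s, unary_encode(s), len(unary_encode(s)), -math.log2(p))
```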

## C.9 Golomb coding [[Golomb, 1966]](https://scholar.google.es/scholar?hl=es&as_sdt=0%2C5&q=golomb+1966+run&btnG=)

* When the probabilities of the symbols follow an exponential
  distribution, the Golomb encoder has the same efficiency as
  Huffman coding, but it is faster. In this case, the probabilities of
  the symbols should be approximately
  
  \begin{equation}
    p(s) \approx
    \frac{1}{m}\,2^{\displaystyle-\big\lfloor\displaystyle\frac{s+m}{m}\big\rfloor}
    \tag{Eq:Golomb}
  \end{equation}
  where $s=0,1,\cdots$ is the symbol and $m=1,2,\cdots$ is the
  "Golomb slope" of the distribution.
  
* For $m=2^k$, it is possible to find a very efficient
  implementation and the encoder is also named Rice
  encoder \cite{Rice79}. In this case
  
  \begin{equation}
    p(s) =
    2^{\displaystyle-\Big(k+\displaystyle\big\lfloor\displaystyle\frac{s+2^k}{2^k}\big\rfloor\Big)}
    \tag{Eq:Rice}
    \label{eq:Rice}
  \end{equation}

<img src="00-fundamentals/Golomb_coding.png" style="width: 600px;" align="middle"/>

* Notice that for $m=1$, we obtain the unary code.

<img src="00-fundamentals/Golomb.png" style="width: 600px;" align="middle"/>

### Encoder

1. $k\leftarrow \lceil\log_2(m)\rceil$.
2. $r\leftarrow s~\mathrm{mod}~m$.
3. $t\leftarrow 2^k-m$.
4. Output $(s~\mathrm{div}~m)$ using a unary code.
5. If $r<t$:
    1. Output $r$ using a binary code of $k-1$ bits.
6. Else:
    1. $r\leftarrow r+t$.
    2. Output $r$ using a binary code of $k$ bits.
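The encoder steps can be sketched in Python as follows. `(m - 1).bit_length()` computes $\lceil\log_2(m)\rceil$ exactly for integers $m\ge 1$ (avoiding floating point), and the hypothetical helper `to_bits` returns an empty string for 0-bit fields, which happens for $m=1$ (pure unary code).

```python
def to_bits(n, width):
    """n as a binary string of exactly `width` bits ("" when width is 0)."""
    return format(n, f"0{width}b") if width > 0 else ""

def golomb_encode(s, m):
    """Golomb code-word of s >= 0 for the parameter m >= 1."""
    k = (m - 1).bit_length()             # ceil(log2(m)) for m >= 1
    r = s % m
    t = (1 << k) - m
    code = "1" * (s // m) + "0"          # quotient in unary
    if r < t:
        code += to_bits(r, k - 1)        # small remainders: k-1 bits
    else:
        code += to_bits(r + t, k)        # the rest: k bits
    return code

print(golomb_encode(8, 7))  # 10010
```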

### Example ($m=7$ and $s=8$)

1. [1] $k\leftarrow \lceil\log_2(7)\rceil=3$.
2. [2] $r\leftarrow 8~\mathrm{mod}~7 = 1$.
3. [3] $t\leftarrow 2^3-7 = 8-7 = 1$.
4. [4] Output $\leftarrow 8~\mathrm{div}~7 = \lfloor 8/7\rfloor=1$ as a unary code (2 bits). Therefore, output $\leftarrow 10$.
5. [5] $r=1\nless t=1$.
6. [6.A] $r\leftarrow 1+1=2$.
7. [6.B] Output $r=2$ using a binary code of $k=3$ bits ($010$). Therefore, $c(8)=10010$.

### Decoder

1. $k\leftarrow\lceil\log_2(m)\rceil$.
2. $t\leftarrow 2^k-m$.
3. Let $s\leftarrow$ the number of consecutive ones in the input (we stop when we read a $0$).
4. Let $x\leftarrow$ the next $k-1$ bits in the input.
5. If $x<t$:
    1. $s\leftarrow s\times m+x$.
6. Else:
    1. $x\leftarrow x\times 2~+$ next input bit.
    2. $s\leftarrow s\times m+x-t$.
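The decoder steps in Python; `golomb_decode` is a hypothetical name. It decodes one code-word from the beginning of a bit string and also returns how many bits it consumed. The case $m=1$ ($k=0$) is handled separately because then the code is purely unary and there are no remainder bits.

```python
def golomb_decode(bits, m):
    """Decode one Golomb code-word from the start of `bits`; return (s, bits used)."""
    k = (m - 1).bit_length()             # ceil(log2(m))
    t = (1 << k) - m
    q, pos = 0, 0
    while bits[pos] == "1":              # count the ones until the 0
        q += 1
        pos += 1
    pos += 1                             # skip the terminating 0
    if k == 0:                           # m = 1: unary code, no remainder
        return q, pos
    x = int(bits[pos:pos + k - 1], 2) if k > 1 else 0   # next k-1 bits
    pos += k - 1
    if x < t:
        return q * m + x, pos
    x = x * 2 + int(bits[pos])           # one more bit
    return q * m + x - t, pos + 1

print(golomb_decode("10010", 7))  # (8, 5)
```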
    
### Example (decode $10010$ where $m=7$)

1. [1] $k\leftarrow 3$.
2. [2] $t\leftarrow 2^k-m = 2^3-7=1$.
3. [3] $s\leftarrow 1$ because we found only one $1$ in the input (before the first $0$).
4. [4] $x\leftarrow \text{input}_{k-1} = \text{input}_2 = 01$.
5. [5] $x=1\nless t=1$.
6. [6.A] $x\leftarrow x\times 2+\text{next input bit} = 1\times 2+0 = 2$.
7. [6.B] $s\leftarrow s\times m+x-t = 1\times 7+2-1=8$.

### Lab
TO-DO

## C.10 Rice coding

### Encoder

1. $m\leftarrow 2^k$.
2. Output $\leftarrow\lfloor s/m\rfloor$ using a unary code ($\lfloor s/m\rfloor+1$ bits).
3. Output $\leftarrow$ the $k$ least significant bits of $s$ using a binary code.
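With $m=2^k$, the division and the remainder become a shift and a mask, which is the efficient implementation mentioned above. A Python sketch (`rice_encode` is a hypothetical name):

```python
def rice_encode(s, k):
    """Rice code-word of s >= 0: unary(s >> k) followed by the k LSBs of s."""
    code = "1" * (s >> k) + "0"                         # floor(s / 2^k) in unary
    if k > 0:
        code += format(s & ((1 << k) - 1), f"0{k}b")    # k least significant bits
    return code

print(rice_encode(7, 1))  # 11101
```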
    
### Example ($k=1$ and $s=7$)
1. [1] $m\leftarrow 2^k=2^1=2$.
2. [2] Output $\leftarrow \lfloor s/m\rfloor=\lfloor 7/2\rfloor=3$ using a unary code of 4 bits. Therefore, output $\leftarrow 1110$.
3. [3] Output $\leftarrow$ the $k=1$ least significant bits of $s=7$
   using a binary code ($k$ bits). So, output $\leftarrow 1$. Total
   output $c(7)=11101$.

### Decoder

1. Let $s$ be the number of consecutive ones in the input (we stop when we read a 0).
2. Let $x$ be the next $k$ input bits.
3. $s\leftarrow s\times 2^k+x$.
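The Rice decoder in Python, decoding one code-word from the start of a bit string and returning the number of bits consumed (`rice_decode` is a hypothetical name):

```python
def rice_decode(bits, k):
    """Decode one Rice code-word from the start of `bits`; return (s, bits used)."""
    q = 0
    while bits[q] == "1":                # consecutive ones, stop at the 0
        q += 1
    pos = q + 1                          # skip the terminating 0
    x = int(bits[pos:pos + k], 2) if k > 0 else 0   # next k bits
    return (q << k) + x, pos + k         # s = q * 2^k + x

print(rice_decode("11101", 1))  # (7, 5)
```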

### Example (decode $11101$ where $k=1$)
1. [1] $s\leftarrow 3$ because we found $3$ consecutive ones in the input.
2. [2] $x\leftarrow$ the next $k=1$ input bits. Therefore $x\leftarrow 1$.
3. [3] $s\leftarrow s\times 2^k+x = 3\times 2^1+1 = 6+1 = 7$.

### Lab
TO-DO