# Text-Bit Encoding Validation

In this notebook, we check that the text-bit encoding is working properly. 

First, we define some helper functions to test the encodings.

In [1]:
import sys; sys.path.append('../')
from text_bit.dataset import read_datasets

datasets = read_datasets("../datasets/")

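# Given a dataset, a `text_to_bit` function, and a `bit_to_text` function,
# checks that every string in the dataset survives an encode/decode round trip.
def expect_all_working(dataset, text_to_bit, bit_to_text):
    for m in dataset:
        # Only the calls are inside `try`, so a failed check below is not
        # misreported as a crash (a bare `except` also catches AssertionError).
        try:
            bits = text_to_bit(m)
            decoded = None if bits is None else bit_to_text(bits)
        except Exception as e:
            raise AssertionError(f"Crashes: {m}") from e
        assert bits is not None, f"Is None: {m}"
        assert decoded == m, f"Does not match: {m}"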

# Given a dataset and a `text_to_bit` function, checks that no string
# is supported, i.e., the function must return `None` for each of them.
def expect_all_not_supported(dataset, text_to_bit):
    for m in dataset:
        try:
            bits = text_to_bit(m)
        except Exception as e:
            raise AssertionError(f"Crashes: {m}") from e
        assert bits is None, f"Is not None: {m}"


#### `Nat` Encoding

`Nat` should only work on natural numbers without leading zeroes.

In [2]:
from text_bit.encodings.nat_number import NatNumberEncoding as NNE

expect_all_working(datasets["nat"], NNE.text_to_bit, NNE.bit_to_text)
expect_all_not_supported(datasets["not_nat"], NNE.text_to_bit)
expect_all_not_supported(datasets["url"], NNE.text_to_bit)
expect_all_not_supported(datasets["text"], NNE.text_to_bit)

In [3]:
from text_bit.encodings.umbrella import UmbrellaEncoding as UE

print("Testing all datasets to validate umbrella encoding...")
for ds in datasets.values(): expect_all_working(ds, UE.text_to_bit, UE.bit_to_text)

Testing all datasets to validate umbrella encoding...


# CRC


In [4]:
from text_bit.bit_string import BitString
from text_bit.crc import Crc

bs = BitString("11111111")
crc = Crc(Crc.POLY32_IEEE, 32)
crc.compute_crc(bs)

# TODO: fix. Expected 0b11111111000000000000000000000000, but the current output (below) differs.

01111001011010001100110011011010
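
As a cross-check for the expected value in the TODO above, the standard library's `zlib.crc32` can serve as a reference. It computes the common CRC-32 (IEEE polynomial, with the usual initial value, bit reflection, and final XOR). Whether `Crc(Crc.POLY32_IEEE, 32)` is meant to follow exactly those conventions is an assumption here, as is reading the bit string `"11111111"` as the single byte `0xFF`.

```python
import zlib

# Assumption: the bit string "11111111" corresponds to the single byte 0xFF.
data = bytes([0b11111111])

# zlib.crc32 returns an unsigned 32-bit integer in Python 3.
reference = zlib.crc32(data)
reference_bits = format(reference, "032b")

print(reference_bits)  # 11111111000000000000000000000000
```

The reference agrees with the expected value, so a difference in the initial value, reflection, or final XOR is one possible place to look for the mismatch.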

# Multimodal Encoding

#### Prefix Code

The _Prefix Code_ is the basic building block of a multimodal encoding. It uses a full binary tree in which every leaf holds one element.

A multimodal encoding uses a tree of encodings. 

The beginning of the message tells which encoding must be applied. E.g., consider a multimodal encoding that supports 3 encodings, A, B, and C. We can map the encoding to the following tree:

```
  .        If the message starts with...        
 / \                     0... => Use encoding A 
A   .                    10...=> Use encoding B 
   / \                   11...=> Use encoding C 
  B   C                                         
```
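
The mapping in the diagram can be derived mechanically. The following is a minimal sketch, not the `text_bit.prefix_code` implementation: it assumes the tree is written as nested two-element lists, and `prefix_codes` is a hypothetical helper introduced only for illustration.

```python
def prefix_codes(tree, prefix=""):
    """Map each leaf of a nested-list binary tree to its bit prefix.

    Assumption: an inner node is a two-element list [left, right];
    anything else is a leaf. Going left appends "0", going right "1".
    """
    if isinstance(tree, list):
        left, right = tree
        codes = prefix_codes(left, prefix + "0")
        codes.update(prefix_codes(right, prefix + "1"))
        return codes
    return {tree: prefix}


codes = prefix_codes(["A", ["B", "C"]])
print(codes)  # {'A': '0', 'B': '10', 'C': '11'}
```

Because every element sits at a leaf, no prefix is the beginning of another one, so the start of a message identifies its encoding unambiguously.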

In [5]:
from text_bit.prefix_code import prefix_code_unit_test

prefix_code_unit_test()

#### Multimodal Encoding

We now test that some multimodal encodings work.

In [7]:
from text_bit.encodings.umbrella import UmbrellaEncoding as UE
from text_bit.encodings.nat_number import NatNumberEncoding as NNE
from text_bit.multimodal_encoding import MultimodalEncoding

# Basic multimodal encoding: just use umbrella
me_umbrella = MultimodalEncoding("umbrella", UE)
for ds in datasets.values(): expect_all_working(ds, me_umbrella.text_to_bit, me_umbrella.bit_to_text)

# Multimodal encoding with two encodings: nats and umbrella
me_two = MultimodalEncoding("two", [NNE, UE])
for ds in datasets.values(): expect_all_working(ds, me_two.text_to_bit, me_two.bit_to_text)

# More complex multimodal encoding, with some reserved (`None`) leaves
me_none = MultimodalEncoding("complex", [None, [None, [NNE, [None, UE]]]])
for ds in datasets.values(): expect_all_working(ds, me_none.text_to_bit, me_none.bit_to_text)
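
To make the role of the tree and of the reserved `None` leaves concrete, here is a self-contained sketch of how such a dispatch could work. It is not the `MultimodalEncoding` implementation. `ToyDigits`, `ToyAscii`, `leaves`, `encode`, and `decode` are hypothetical names, bits are plain `str` values rather than `BitString` objects, and picking the first encoding that supports the text is an assumption.

```python
class ToyDigits:
    """Hypothetical encoding: 4 bits per decimal digit."""

    @staticmethod
    def text_to_bit(text):
        if not text or any(c not in "0123456789" for c in text):
            return None  # unsupported input
        return "".join(format(int(c), "04b") for c in text)

    @staticmethod
    def bit_to_text(bits):
        return "".join(str(int(bits[i:i + 4], 2)) for i in range(0, len(bits), 4))


class ToyAscii:
    """Hypothetical fallback encoding: 8 bits per ASCII character."""

    @staticmethod
    def text_to_bit(text):
        if not text.isascii():
            return None  # unsupported input
        return "".join(format(ord(c), "08b") for c in text)

    @staticmethod
    def bit_to_text(bits):
        return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))


def leaves(tree, prefix=""):
    """Yield (prefix, encoding) for every non-reserved leaf, left to right."""
    if isinstance(tree, list):
        yield from leaves(tree[0], prefix + "0")
        yield from leaves(tree[1], prefix + "1")
    elif tree is not None:
        yield prefix, tree


def encode(tree, text):
    # Assumption: use the first encoding that supports the text.
    for prefix, enc in leaves(tree):
        bits = enc.text_to_bit(text)
        if bits is not None:
            return prefix + bits
    return None


def decode(tree, bits):
    # Walk the tree bit by bit until a leaf is reached.
    node, used = tree, 0
    while isinstance(node, list):
        node = node[int(bits[used])]
        used += 1
    if node is None:
        raise ValueError("prefix is reserved")
    return node.bit_to_text(bits[used:])


# Same shape as the "complex" tree above, with toy encodings at the leaves.
tree = [None, [None, [ToyDigits, [None, ToyAscii]]]]
for message in ["2024", "hello"]:
    assert decode(tree, encode(tree, message)) == message
```

In this sketch the reserved leaves consume the prefixes `0`, `10`, and `1110`, which stay free for encodings added later without changing the prefixes already in use.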
