In [1]:
# Distributed under the "STEINWURF RESEARCH LICENSE 1.0".
# See accompanying file LICENSE.rst or
# http://www.steinwurf.com/license.html

# Kodo-python Getting Started

Welcome to the getting started ipython notebook for kodo-python.

This guide is intended for newcomers to the Kodo library. It walks you, in small steps, through creating and using both encoders and decoders.
Although this guide focuses on the Python bindings for Kodo, a similar API exists for other languages, including C, C++, and Go.

## Importing Kodo

Before working with Kodo-python, you obviously need to have it installed and available. To ensure that's the case, try importing it:

In [2]:
import kodo

If the import worked, you are ready to move on to the next step. Otherwise please (re)visit the README.rst of the kodo-python repository for installation instructions.

## Creating an Encoder

In kodo, both encoders and decoders are created using factories. Doing so allows efficient memory management and reuse of various components and computations. 

Therefore, before creating an encoder, let's look at the encoder factories provided by the ``kodo`` module:

In [3]:
# print all members containing "Factory" and "Encoder"
print("\n".join([item for item in dir(kodo) if all([keyword in item for keyword in ["Factory", "Encoder"]])]))

FulcrumEncoderFactory
NoCodeEncoderFactory
PerpetualEncoderFactory
RLNCEncoderFactory


As seen from the output, many different encoder factories exist. Most of these have decoder factory counterparts.
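The one-line filter above can be wrapped in a small reusable helper. The following is only a minimal sketch: ``members_matching`` is a hypothetical name introduced here, and a ``types.SimpleNamespace`` stands in for the ``kodo`` module so that the example runs without Kodo installed.

```python
import types


def members_matching(obj, keywords):
    """Return the sorted public member names of obj that contain every keyword."""
    return sorted(
        name for name in dir(obj)
        if not name.startswith("_") and all(keyword in name for keyword in keywords)
    )


# Stand-in for the kodo module (assumption: only the member names matter here).
fake_kodo = types.SimpleNamespace(
    RLNCEncoderFactory=None,
    RLNCDecoderFactory=None,
    NoCodeEncoderFactory=None,
    field=None,
)

print("\n".join(members_matching(fake_kodo, ["Factory", "Encoder"])))
```

With Kodo installed, passing the real module and the keywords ``["Factory", "Decoder"]`` should list the decoder factories in the same way.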

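Each factory name combines the encoding algorithm (for example ``RLNC`` or ``Fulcrum``) with the role of the objects it builds. The underlying finite field is not part of the name; it is passed to the factory constructor.

For this walkthrough we pick the RLNC (random linear network coding) factory, i.e. the **``RLNC``**``EncoderFactory``, and we will later configure it to use the binary field.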

Note: *For this guide, any of the factories should work. For this reason I'll define the factory class as ``EncoderFactory``.*

In [4]:
# Store the RLNC encoder factory as EncoderFactory
EncoderFactory = kodo.RLNCEncoderFactory

By using Python's ``help`` function, we can inspect the ``EncoderFactory``'s constructor:

In [5]:
# Get information about the encoder factory's __init__ function
help(EncoderFactory.__init__)

Help on instancemethod in module kodo:

__init__(...)
    Factory constructor.
    
    :param field: The finite field to use.
    :param symbols: The number of symbols in a block.
    :param symbol_size: The size of a symbol in bytes.



From the documentation, we can see that we need to provide the ``field``, ``symbols``, and ``symbol_size`` to create a factory.

These parameters correspond to the size of the Galois field, the generation size, and the packet size in our book chapter; the developers of Kodo use a different notation for them.

The proper values depend on the use case. We'll pick 4 and 32 for ``symbols`` and ``symbol_size``, respectively.
These numbers would be low for a real use case, but they serve us well for this example.

Let's create an encoder_factory:

In [6]:
symbols = 4
symbol_size = 32

encoder_factory = EncoderFactory(
    field=kodo.field.binary,
    symbols=symbols,
    symbol_size=symbol_size)

To see which methods are available for the encoder_factory, we can use python's ``dir`` function.

In [7]:
# Print all public members
print("\n".join([item for item in dir(encoder_factory) if not item.startswith("__")]))

build
set_coding_vector_format
set_symbol_size
set_symbols
symbol_size
symbols


The ``build`` method is used for creating encoders. Let's create an encoder!

In [8]:
encoder = encoder_factory.build()

Fantastic, we've built our first encoder! Let's see what we can use it for:

In [9]:
# Print all public members
print("\n".join([item for item in dir(encoder) if not item.startswith("__")]))

block_size
density
generate
in_systematic_phase
is_systematic_on
payload_size
rank
set_const_symbol
set_const_symbols
set_density
set_seed
set_systematic_off
set_systematic_on
set_trace_callback
set_trace_off
set_trace_stdout
set_zone_prefix
symbol_size
symbols
write_payload
write_symbol
write_uncoded_symbol


Let us call a method on the encoder. Let's see what the block size is.

In [10]:
block_size = encoder.block_size()
print("Block size: {}".format(block_size))

Block size: 128


Note that the block size is derived from the previously set ``symbols`` and ``symbol_size``. The block size is the total amount of data that the encoder encodes together: the generation size times the packet size.

In [12]:
calculated_max_block_size = symbols * symbol_size
print("Calculated max block size: {}".format(calculated_max_block_size))

Calculated max block size: 128
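The same arithmetic tells us how many generations a larger piece of data would occupy. This is only a sketch of the calculation: ``generations_needed`` is a hypothetical helper, and splitting data across several generations is not covered by this guide.

```python
def generations_needed(data_length, symbols, symbol_size):
    """Number of blocks (generations) required to carry data_length bytes."""
    block_size = symbols * symbol_size
    # Ceiling division without floating point.
    return (data_length + block_size - 1) // block_size


symbols = 4
symbol_size = 32

for data_length in (100, 128, 129, 1000):
    print("{} bytes -> {} generation(s)".format(
        data_length, generations_needed(data_length, symbols, symbol_size)))
```

Anything up to 128 bytes fits in one generation with our parameters; a single extra byte already requires a second one.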


Let's define a function called ``print_encoder_state`` to inspect the state of our newly created encoder. This function will take an encoder as an argument, will call some methods related to the state on the encoder, and will print the values.

In [13]:
def print_encoder_state(encoder):
    print(
        "block_size: {}\n"
        "is_systematic_on: {}\n"
        "in_systematic_phase: {}\n"
        "payload_size: {}\n"
        "rank: {}\n"
        "symbol_size: {}\n"
        "symbols: {}".format(
            encoder.block_size(),
            encoder.is_systematic_on(),
            encoder.in_systematic_phase(),
            encoder.payload_size(),
            encoder.rank(),
            encoder.symbol_size(),
            encoder.symbols())
    )
print_encoder_state(encoder)

block_size: 128
is_systematic_on: True
in_systematic_phase: False
payload_size: 39
rank: 0
symbol_size: 32
symbols: 4


## Using the Encoder

We use the ``write_payload`` method to produce encoded packets, but since we have yet to tell the encoder what data to encode, we can't use it yet.
This can be seen from the encoder's rank, which is 0.

Kodo uses Python ``bytearray`` objects as data objects.

Let's create some data to encode:

In [14]:
data_in = bytearray(
    "The size of this data is exactly 128 bytes "
    "which means it will fit perfectly in a single generation. "
    "That is very lucky, indeed!",
    'utf-8'
)
print("Length of data string: {}".format(len(data_in)))

Length of data string: 128
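Our data happens to fill the block exactly. If it were shorter, one simple option would be to pad it up to the block size first. The sketch below assumes that the encoder expects a full block; ``pad_to_block`` is a hypothetical helper, and zero padding is a choice made for this illustration, not something Kodo prescribes.

```python
def pad_to_block(data, block_size, fill=0):
    """Return a copy of data extended with fill bytes up to block_size."""
    if len(data) > block_size:
        raise ValueError("data does not fit in one block")
    padded = bytearray(data)
    padded.extend(bytes([fill]) * (block_size - len(data)))
    return padded


short_data = bytearray("Too short for one block.", "utf-8")
padded = pad_to_block(short_data, block_size=128)
print(len(short_data), len(padded))
```

The receiver would also need to know the original length in order to strip the padding again after decoding.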


Let's tell the encoder the data to encode. Note that the user must take care of the lifetime of data_in. This bytearray must not go out of scope while the encoder exists.


In [15]:
encoder.set_const_symbols(data_in)

We should now be able to see how the state of the encoder has changed.

In [16]:
print_encoder_state(encoder)

block_size: 128
is_systematic_on: True
in_systematic_phase: True
payload_size: 39
rank: 4
symbol_size: 32
symbols: 4


Notice how the rank is now equal to the number of symbols:

In [17]:
encoder.rank() == symbols

True

We can only encode packets if the rank is ``> 0``.

Let's encode some packets using the ``write_payload`` method:

In [18]:
packet1 = encoder.write_payload()
packet2 = encoder.write_payload()
packet3 = encoder.write_payload()
packet4 = encoder.write_payload()

print(
    "packet1: {}\n"
    "packet2: {}\n"
    "packet3: {}\n"
    "packet4: {}\n".format(
        packet1,
        packet2,
        packet3,
        packet4,
    )
)

packet1: bytearray(b'\x02\x00\x00The size of this data is exactly')
packet2: bytearray(b'\x02\x00\x01 128 bytes which means it will f')
packet3: bytearray(b'\x02\x00\x02it perfectly in a single generat')
packet4: bytearray(b'\x02\x00\x03ion. That is very lucky, indeed!')



Notice how all the packets are prefixed with ``\x02\x00\x0i``, where ``i`` goes from 0 to 3. This is Python displaying the packet header, which contains the symbol id. The symbol ids go from 0 to 3 because our generation consists of four symbols.

The reason why the contents of the packets are readable is that the encoder is in the systematic phase. Systematic means that the encoder first sends each symbol once, uncoded. During the systematic phase, instead of attaching the encoding coefficients, the Kodo encoder attaches only the symbol id as two bytes, plus a byte ``\x02`` that tells another Kodo object what type of encoder produced the packet. In this example, ``\x02`` means an RLNC encoder in the systematic phase.
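To make the header concrete, we can take two of the packets printed above apart by hand. This is a sketch under an assumption: the layout (one type byte followed by a two-byte big-endian symbol id) is inferred from the printed output, not taken from a documented wire format, and ``split_systematic_packet`` is a hypothetical helper.

```python
import struct

symbol_size = 32
data_in = bytearray(
    "The size of this data is exactly 128 bytes "
    "which means it will fit perfectly in a single generation. "
    "That is very lucky, indeed!",
    "utf-8",
)

# Two of the systematic packets, copied from the printed output.
packets = [
    bytearray(b"\x02\x00\x00The size of this data is exactly"),
    bytearray(b"\x02\x00\x03ion. That is very lucky, indeed!"),
]


def split_systematic_packet(packet):
    """Split a printed systematic packet into (type byte, symbol id, symbol)."""
    # Assumed layout: 1 type byte, then a 2-byte big-endian symbol id.
    packet_type, symbol_id = struct.unpack(">BH", bytes(packet[:3]))
    return packet_type, symbol_id, packet[3:]


for packet in packets:
    packet_type, symbol_id, symbol = split_systematic_packet(packet)
    start = symbol_id * symbol_size
    # A systematic symbol is just the matching slice of the original data.
    matches = symbol == data_in[start:start + symbol_size]
    print(packet_type, symbol_id, matches)
```

Each systematic symbol is simply the slice of ``data_in`` selected by its symbol id, which is why the packets are readable.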

Because we've set the generation size to four symbols and have created four packets, the encoder is no longer in the systematic phase:

In [19]:
encoder.in_systematic_phase()

False

This means that any subsequent packets will be encoded.

In [20]:
packet5 = encoder.write_payload()
print("packet5: {}".format(packet5))

packet5: bytearray(b'\x08\x1c\x81]TI^\\\x16\x006\x11\x15\x11SI\x04H\x1f\x06\x1aYM\t\x14\r\x18YETI\x19\r\t\tDG')


Since the encoding is random, the packet could still carry an uncoded symbol; most likely, however, it will be unreadable. If the payload of the coded packet is readable, you can run the previous cell again and see how the coded payload varies each time.
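Why can a coded packet still turn out to be readable? In the binary field, a coded symbol is the XOR of a randomly chosen subset of the source symbols. The sketch below illustrates that general idea in plain Python; it is not Kodo's implementation, and ``xor_bytes`` and ``encode_symbol`` are hypothetical helpers.

```python
import random


def xor_bytes(a, b):
    """Bytewise XOR of two equally long byte strings."""
    return bytearray(x ^ y for x, y in zip(a, b))


def encode_symbol(source_symbols, coefficients):
    """Combine the source symbols selected by the binary coefficients."""
    coded = bytearray(len(source_symbols[0]))
    for coefficient, symbol in zip(coefficients, source_symbols):
        if coefficient:
            coded = xor_bytes(coded, symbol)
    return coded


symbol_size = 32
block = bytearray(range(128))  # placeholder data, one full block
source_symbols = [block[i:i + symbol_size] for i in range(0, len(block), symbol_size)]

rng = random.Random(0)  # fixed seed so the run is repeatable
coefficients = [rng.randint(0, 1) for _ in source_symbols]
coded = encode_symbol(source_symbols, coefficients)
print(coefficients, len(coded))
```

If the random coefficient vector happens to contain a single 1, the "coded" symbol is identical to one source symbol and therefore readable.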

## Creating a Decoder

Let's create a decoder factory and a decoder so that we can decode our newly generated packets:

In [21]:
decoder_factory = kodo.RLNCDecoderFactory(kodo.field.binary, symbols, symbol_size)
decoder = decoder_factory.build()

Let's investigate which methods are available for the decoder:

In [22]:
# Print all public members
print("\n".join([item for item in dir(decoder) if not item.startswith("__")]))

block_size
is_complete
is_partially_complete
is_status_updater_enabled
is_symbol_missing
is_symbol_partially_decoded
is_symbol_pivot
is_symbol_uncoded
payload_size
rank
read_payload
read_symbol
read_uncoded_symbol
set_mutable_symbol
set_mutable_symbols
set_seed
set_status_updater_off
set_status_updater_on
set_trace_callback
set_trace_off
set_trace_stdout
set_zone_prefix
symbol_size
symbols
symbols_missing
symbols_partially_decoded
symbols_uncoded
update_symbol_status
write_payload


The encoder and decoder share a few methods. Most of these shared methods have the same meaning.

Let's create a function to inspect the state of our newly created decoder.

In [23]:
def print_decoder_state(decoder):
    print(
        "block_size: {}\n"
        "is_complete: {}\n"
        "payload_size: {}\n"
        "rank: {}\n"
        "symbol_size: {}\n"
        "symbols: {}\n"
        "symbols_uncoded: {}\n".format(
            decoder.block_size(),
            decoder.is_complete(),
            decoder.payload_size(),
            decoder.rank(),
            decoder.symbol_size(),
            decoder.symbols(),
            decoder.symbols_uncoded())
    )
print_decoder_state(decoder)

block_size: 128
is_complete: False
payload_size: 39
rank: 0
symbol_size: 32
symbols: 4
symbols_uncoded: 0



What's probably the most interesting thing here is the rank. The rank corresponds to the number of innovative packets received.

As we did with the encoder, we should provide the decoder with a ``bytearray`` in which to store the decoded data. Let's call this object ``data_out``. Its size must equal the ``block_size`` of the decoder.

In [25]:
data_out = bytearray(decoder.block_size())
decoder.set_mutable_symbols(data_out)

## Using the Decoder

If we read one of our previously generated packets, we should see the rank increase:

In [26]:
decoder.read_payload(packet1)
decoder.rank()

1

And it does.

We can now try to read the 5th packet and see what it does to the state. What is unique about the 5th packet is that it is the only one that has been coded, since our systematic encoder sent the first four uncoded.

In [27]:
decoder.read_payload(packet5)
print_decoder_state(decoder)

block_size: 128
is_complete: False
payload_size: 39
rank: 2
symbol_size: 32
symbols: 4
symbols_uncoded: 1



The rank has increased to 2! This means that we've read two (innovative) packets. If we print the current data in the decoder we get the following output:

In [32]:
data_out.decode('utf-8').replace('\x00', '_')

'The size of this data is exactlyI^\\\x16_6\x11\x15\x11SI\x04H\x1f\x06\x1aYM\t\x14\r\x18YETI\x19\r\t\tDG________________________________________________________________'

Notice that the first part of the string is readable. Depending on the encoding of the 5th packet other parts of the string may or may not be readable. The empty bytes of the data are printed as ``_``.

If we feed the same packet(s) to the decoder multiple times, we will not increase its rank, no matter how many times we do so. We are simply feeding the decoder linearly dependent packets.

In [33]:
# Once
decoder.read_payload(packet1)
print("Rank after rereading packet1: {}".format(decoder.rank()))
decoder.read_payload(packet5)
print("Rank after rereading packet5: {}".format(decoder.rank()))

# 100 times
for _ in range(100):
    decoder.read_payload(packet1)
    decoder.read_payload(packet5)

print("Rank after rereading 100 times: {}".format(decoder.rank()))

Rank after rereading packet1: 2
Rank after rereading packet5: 2
Rank after rereading 100 times: 2


This is because the data we feed the decoder is not innovative.

Note that reading a packet can increase the rank by at most one.
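The notion of rank can be reproduced without Kodo. In the sketch below, each packet is represented only by its coding vector, stored as a bit mask with one bit per symbol, and the rank is computed by elimination over the binary field. ``gf2_rank`` is a hypothetical helper, and the masks are made-up examples rather than the vectors Kodo actually generated.

```python
def gf2_rank(vectors):
    """Rank over the binary field of coding vectors given as bit masks."""
    pivots = {}  # highest set bit -> reduced vector with that leading bit
    for vector in vectors:
        while vector:
            lead = vector.bit_length() - 1
            if lead not in pivots:
                # New leading bit: the vector is innovative.
                pivots[lead] = vector
                break
            # Cancel the leading bit and keep reducing.
            vector ^= pivots[lead]
    return len(pivots)


received = [0b1000, 0b0110]  # one uncoded and one coded packet

print(gf2_rank(received))                   # 2
print(gf2_rank(received + received * 100))  # 2: repeats add nothing
print(gf2_rank(received + [0b1110]))        # 2: XOR of the two, dependent
print(gf2_rank(received + [0b0001]))        # 3: innovative
```

A vector either reduces to zero (not innovative) or contributes exactly one new pivot, which is why a single packet never raises the rank by more than one.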

If we start feeding the decoder new coded data, we will at some point have a completely decoded generation:

In [34]:
while not decoder.is_complete():
    decoder.read_payload(encoder.write_payload())
    print(decoder.rank())

3
4


And when the decoding is complete we should be able to extract the whole string:

In [35]:
print(data_out)
data_out == data_in

bytearray(b'The size of this data is exactly 128 bytes which means it will fit perfectly in a single generation. That is very lucky, indeed!')


True

Hurray, it worked!
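For the curious, the whole encode and decode round trip over the binary field can be imitated in a few lines of plain Python. This is a toy sketch only: ``ToyBinaryDecoder`` and its methods are hypothetical, it carries the coding vector as a separate bit mask, and it says nothing about how Kodo works internally.

```python
import random


def xor_bytes(a, b):
    """Bytewise XOR of two equally long byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


class ToyBinaryDecoder:
    """Toy decoder over the binary field (illustration, not Kodo)."""

    def __init__(self, symbols):
        self.symbols = symbols
        self.rows = {}  # leading bit -> (coefficient mask, payload)

    def rank(self):
        return len(self.rows)

    def is_complete(self):
        return self.rank() == self.symbols

    def read(self, mask, payload):
        """Add a packet; return True if it was innovative."""
        while mask:
            lead = mask.bit_length() - 1
            if lead not in self.rows:
                self.rows[lead] = (mask, payload)
                return True
            pivot_mask, pivot_payload = self.rows[lead]
            mask ^= pivot_mask
            payload = xor_bytes(payload, pivot_payload)
        return False  # reduced to zero: linearly dependent

    def decode(self):
        """Back-substitute, lowest leading bit first; return the symbols."""
        if not self.is_complete():
            raise ValueError("not enough innovative packets yet")
        solved = {}
        for lead in sorted(self.rows):
            mask, payload = self.rows[lead]
            for bit in range(lead):
                if (mask >> bit) & 1:
                    payload = xor_bytes(payload, solved[bit])
            solved[lead] = payload
        return [solved[i] for i in range(self.symbols)]


symbol_size = 32
data_in = bytes(range(128))  # placeholder data, one full block
source = [data_in[i:i + symbol_size] for i in range(0, len(data_in), symbol_size)]

rng = random.Random(1)  # fixed seed so the run is repeatable
decoder = ToyBinaryDecoder(len(source))
while not decoder.is_complete():
    # Bit i of the mask says whether source symbol i is part of the XOR.
    mask = rng.randrange(1, 1 << len(source))
    payload = bytes(symbol_size)
    for i, symbol in enumerate(source):
        if (mask >> i) & 1:
            payload = xor_bytes(payload, symbol)
    decoder.read(mask, payload)
    print(decoder.rank())

data_out = b"".join(decoder.decode())
print(data_out == data_in)
```

As with the real decoder, the loop keeps reading random combinations until the rank equals the number of symbols, after which the original data can be recovered.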

For more information and inspiration please look through some of the many examples of the kodo-python library.