Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A new structure for experimentation on decoding: communication channels #18269

Closed
sagetrac-dlucas mannequin opened this issue Apr 21, 2015 · 40 comments
Closed

A new structure for experimentation on decoding: communication channels #18269

sagetrac-dlucas mannequin opened this issue Apr 21, 2015 · 40 comments

Comments

@sagetrac-dlucas
Copy link
Mannequin

sagetrac-dlucas mannequin commented Apr 21, 2015

For now, there is no structure to easily add errors in codewords. If one wants to experiment with decoding algorithms on codes, the only possible way is to add errors "by hand", which is rather tedious.

We propose here a new structure, based on communication channels, on purpose to facilitate the experimentation process with decoding algorithms.

For now, our structure consists of:

  • an abstract class for channels,
  • a channel which adds a specific number of errors at random positions to each provided vector
  • a channel which adds a specific number of errors at random positions to each provided vector, and erases a specific number of random positions

With this new structure, creating n errors in a vector can be done in one line of code into Sage.
Adding a new Channel class should also be easy: all one needs to do is to inherit from the abstract class, and override and implement a method.

For better consistency, channels can only be accessed using channels. (see #15445) from the global namespace.

CC: @johanrosenkilde

Component: coding theory

Author: David Lucas

Branch/Commit: 383e67c

Reviewer: Vincent Delecroix

Issue created by migration from https://trac.sagemath.org/ticket/18269

@sagetrac-dlucas sagetrac-dlucas mannequin added this to the sage-6.7 milestone Apr 21, 2015
@sagetrac-dlucas
Copy link
Mannequin Author

sagetrac-dlucas mannequin commented Apr 21, 2015

Branch: u/dlucas/channels

@sagetrac-dlucas
Copy link
Mannequin Author

sagetrac-dlucas mannequin commented Apr 21, 2015

New commits:

3f66f35Abstract class for channels
df9fd2cNew channel: static error rate
4da3b35Changed naming conventions and added new channel: ErrorErasure
62a2447Channels are now part of the reference manual for coding. Added imports.

@sagetrac-dlucas
Copy link
Mannequin Author

sagetrac-dlucas mannequin commented Apr 21, 2015

Commit: 62a2447

@videlec
Copy link
Contributor

videlec commented Apr 22, 2015

comment:3

Hello,

  1. You might consider inheriting from SageObject (sage.structure.sage_object).

  2. I would rather hide them under codes.channels.<name>. That way you can avoid the file channels_catalog.

  3. Why sage.coding.channel_constructions and not sage.coding.channels?

  4. Why not setting __call__ as an alias of transmit. That would allow

sage: channel(msg)

which is convenient.

  1. Would be cool to have cross references with actual codes in the documentation. Right now the documentation looks like this is an isolated block.

  2. You could add various tests to check that if you reset the seed you get the same answer. Might be useful to have this doctested and also for users to know that they can use it.

sage: set_random_seed(10)
sage: randint(0,5)
5
sage: set_random_seed(10)
sage: randint(0,5)
5
sage: set_random_seed(10)
sage: randint(0,5)
5
  1. In _random_error_vector why not starting with
vect = [F.random_element() for _ in range(n)]
  1. The functions _random_error_position, _random_error_vector and _tuple_to_integer will not appear in the documentation. You should either remove the _ at the begining or add some automethod sphinx directives (you have to look in the code to find how to do it)

  2. The function _random_error_position should be moved to prandom. It is useful in its own and quite independent. Note that it is quite similar to

sage: sample(range(10), 4)
[1, 7, 6, 4]
  1. For _random_error_vector I would rather use F._nonzero_random_element() that is useful in your case
sage: GF(2)._random_nonzero_element()
1

of course the default implementation is to call .random_element until something nonzero is found. Hence you can use

  1. Why do you need this _tuple_to_integer? Why not using directly randint(*value)?
sage: value = [0,5]
sage: randint(*value)
3
sage: value = (0,5)
sage: randint(*value)
4

Eleven is a nice prime number. So I will stop there.

Vincent

@sagetrac-dlucas
Copy link
Mannequin Author

sagetrac-dlucas mannequin commented Apr 22, 2015

comment:4

Hello,

Thank you for the input!

I would rather hide them under codes.channels.<name>. That way you can avoid the file channels_catalog.
Why sage.coding.channel_constructions and not sage.coding.channels?

My idea here was to mimic the way codes were stored and hide: a code_constructions file for the codes, and a codes_catalog file for the list of codes. I wanted to do the same with channels for consistency. That's why I chose the name channel_constructions.
Hiding the channels under codes.channels. might be confusing (imho): one can think that channels depends on codes, which is not the case as they are completely independant objects. So they can be used on any vector, not only on codewords, or for experimentation on decoding. Which also explains why the documentation looks like an isolated block, I wrote it as a new object, independant from codes.

The functions _random_error_position, _random_error_vector and _tuple_to_integer will not appear in the documentation. You should either remove the _ at the begining or add some automethod sphinx directives (you have to look in the code to find how to do it)

Well, that was done on purpose as one should not use these methods except when implementing a new Channel object. Now, with a second thought about this, it's not that smart as one will have to browse through the code to find these methods. I think I'll remove the underscore to make them appear in the doc, while putting the statement "This is a helper function, for internal use only." in a ..NOTE:: block to highlight this.

For the rest of your comments, thanks a lot for the advice on these random issues. I'll change the code accordingly.

David

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 22, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

934f6c5Moved random_error_position to prandom
5ee95d7Changed code of random_error_vector
874deacAbstractChannel now inherits from SageObject. Added an alias to transmit method.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 22, 2015

Changed commit from 62a2447 to 874deac

@videlec
Copy link
Contributor

videlec commented Apr 22, 2015

comment:6

random_error_position is not an appropriate name. Perhaps range_sample or sample_range. The description is not appropriate as well. Should be something like

def sample_range(n, k):
    """
    Return a subset of [0,...,n-1] of size k with uniform probability.

    INPUT:

    - ``n``, ``k`` - integers

    OUTPUT:

    An ordered list.   

    AUTHORS:

    This function is taken from codinglib (https://bitbucket.org/jsrn/codinglib/)
    written by Johan Nielsen.
"""

In the same function you should also:

  • check the input (just do n = int(n); k = int(k))
  • rename the variable error_position into something more meaningful
    And for the line sage:set_random_seed(10) there must be a space sage: set_random_seed(10).

@sagetrac-dlucas
Copy link
Mannequin Author

sagetrac-dlucas mannequin commented Apr 22, 2015

comment:7

I did some changes, as stated by commit messages above.

Some remarks:

  • I kept tuple_to_integer because the user can pass either an integer or an interval as number_errors parameter for the channels. If the user passes an integer, then randint won't work. Of course it is possible to do the check anytime you call number_errors (if integer, return it, else do a randint and return the result) but it's rather tedious. Especially because it's something that will be used by a some channels. I think it's easier to call a single method rather than copying the if statement each time. Maybe tuple_to_integer is not really a good name though.

  • random_error_vector(n, F, error_positions) returns a vector of length n over F with 0s in every position except those listed in error_positions. Because of this, calling vect = [F.random_element() for _ in range(n)] in random_error_vector does not really fit, because it will create a vector with non-controlled zero positions, so we'll need to set all positions except those into error_positions afterwards. I think create a zero vector of size n and add random values at some positions is better.

  • I did not change the documentation, nor the catalog stuff (see comment 4). If you think that Channels are actually more linked to codes than to anything else, I can make some changes.

David

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 22, 2015

Changed commit from 874deac to 5f31c5c

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 22, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

5f31c5cChanged name of random_error_position to sample_range, updated documentation, propaged naiming change. Removed some whitespaces

@sagetrac-dlucas
Copy link
Mannequin Author

sagetrac-dlucas mannequin commented Apr 22, 2015

New commits:

5f31c5cChanged name of random_error_position to sample_range, updated documentation, propaged naiming change. Removed some whitespaces

@videlec
Copy link
Contributor

videlec commented Apr 22, 2015

comment:10

Replying to @sagetrac-dlucas:

I did some changes, as stated by commit messages above.

Some remarks:

  • I kept tuple_to_integer because the user can pass either an integer or an
    interval as number_errors parameter for the channels. If the user passes an
    integer, then randint won't work. Of course it is possible to do the check
    anytime you call number_errors (if integer, return it, else do a randint
    and return the result) but it's rather tedious. Especially because it's
    something that will be used by a some channels. I think it's easier to call a
    single method rather than copying the if statement each time. Maybe
    tuple_to_integer is not really a good name though.

The name is very bad. Moreover you can convert an integer a into the tuple (a,a) in the constructor. With your way of doing it you avoid the function call to randint but with your version you always add an extra function call to tuple_to_integer.

You should replace all occurrence of hasattr(X, '__iter__') by isinstance(X, (tuple, list)). Having an __iter__ method does not charaterize tuples.

  • random_error_vector(n, F, error_positions) returns a vector of length n
    over F with 0s in every position except those listed in
    error_positions. Because of this, calling vect = [F.random_element() for _ in range(n)] in random_error_vector does not really fit, because it will
    create a vector with non-controlled zero positions, so we'll need to set all
    positions except those into error_positions afterwards. I think create a
    zero vector of size n and add random values at some positions is better.

Yes. You are right!

  • I did not change the documentation, nor the catalog stuff (see comment 4).
    If you think that Channels are actually more linked to codes than to anything
    else, I can make some changes

You should tell me ;-) Cryptogtaphers might like it but they do use strings
not vectors in a vector space over a finite field.

  1. When you do __call__ = transmit the documentation is copied as well. Did you try
sage: Chan = channels.StaticErrorRateChannel(GF(11), 2)
sage: Chan.__call__?
  1. I would rather implement StaticErrorRateChannel.transmit_unsafe as
def transmit_unsafe(self, msg):
    w = msg.copy()
    V = self.input_space()
    for i in sample_range(V.dim(), randint(*self.nb_errors())):
        w[i] = V.random_element()
    return w 

it would be more direct. Moreover it shows that sample_range would better be an iterator instead of returning a list. By the way, this function sample_range has currently linear complexity in n, isn't it possible to make it linear in k?

  1. In the main documentation class, it is useful to document the main methods. In other words, when reading sage: channels.StaticErrorChannel? I should be able to discvoer the method transmit (and its alias usage with __call__).

  2. In the very top documentation, you can have links by writing class:`AbstractChannel` instead of *AbstractChannel*.

Vincent

@videlec
Copy link
Contributor

videlec commented Apr 22, 2015

comment:12
  1. I would rather implement StaticErrorRateChannel.transmit_unsafe as
def transmit_unsafe(self, msg):
    w = msg.copy()
    V = self.input_space()
    for i in sample_range(V.dim(), randint(*self.nb_errors())):
        w[i] = V.random_element()
    return w 

it would be more direct. Moreover it shows that sample_range would better be an iterator instead of returning a list. By the way, this function sample_range has currently linear complexity in n, isn't it possible to make it linear in k?

Yes it is (the solution is not exactly linear though)

sage: S = Subsets(100,4)
sage: timeit("S.random_element()")
625 loops, best of 3: 54.2 µs per loop
sage: timeit("sample_range(100,4)")
625 loops, best of 3: 57.7 µs per loop
sage: S = Subsets(1000,4)
sage: timeit("S.random_element()")
625 loops, best of 3: 61.2 µs per loop
sage: timeit("sample_range(1000,4)")
625 loops, best of 3: 510 µs per loop

It is roughly the same speed for n=100 but for n=1000 yours is twice slower.

@videlec
Copy link
Contributor

videlec commented Apr 22, 2015

comment:13

Replying to @videlec:

Moreover it shows that sample_range would better be an iterator instead of returning a list. By the way, this function sample_range has currently linear complexity in n, isn't it possible to make it linear in k?

Yes it is (the solution is not exactly linear though)
...
It is roughly the same speed for n=100 but for n=1000 yours is twice slower.

This code for subsets is just using

sage: sample(range(n), k)

So I guess this is currently the best way to do it. And moreover, looking at Python doc

    To choose a sample in a range of integers, use xrange as an argument.
    This is especially fast and space efficient for sampling from a
    large population:   sample(xrange(10000000), 60)

Sorry for my previous remarks. You should just get rid of sample_range(n,k) and use sample(xrange(n),k).

Vincent

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 23, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

f515e4dRemoved `__call__` = transmit
87667ebRemoved sample_range method, propaged changes and changed code for transmit_unsafe in StaticErrorRateChannel
f9b07beRemoved tuple_to_integer method
f93852eImproved documentation

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 23, 2015

Changed commit from 5f31c5c to f93852e

@sagetrac-dlucas
Copy link
Mannequin Author

sagetrac-dlucas mannequin commented Apr 23, 2015

comment:15

Thank you again for the advice!

The only thing I did not do is to add cross-references between channels and codes. I think it's not the best time to do it, as there is no proper Encoder/Decoder structure. But when we will propose our structure for decoding, I think it will be interesting to illustrate the error correction using channels to create errors. Before that, I let it as is.

@sagetrac-dlucas sagetrac-dlucas mannequin removed the s: needs work label Apr 23, 2015
@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 25, 2015

Changed commit from 2bc0590 to 723a55c

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 25, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

723a55cReplaced `__repr__` by _repr_

@sagetrac-dlucas
Copy link
Mannequin Author

sagetrac-dlucas mannequin commented Apr 25, 2015

comment:22

Hello,

For Sage objects, you should use _repr_ and not __repr__.

Ok, good to know! Modification done.

@videlec
Copy link
Contributor

videlec commented Apr 25, 2015

comment:23
  1. In each class/function/method the documentation should be:
  • one line description
  • then more details in one or several paragraphs if needed. It is not the case in StaticErrorRateChannel and ErrorErasureChannel.

Moreover, in a class you should not write as a description Construct a X. It is rather simply X. The reason is because it is the documentation of the class itself that you will see with:

sage: Chan = channels.ErrorErasureChannel(GF(59)^40, 2, 2)
sage: Chan?<enter>

(BTW there is a nice short cut to build vector spaces: K^n just works!)

  1. You can simplify a bit the ErrorErasureChannel._repr_. By doing things in two parts
def format_interval(t):
    return str(t[0]) if t[0] == t[1] else 'between %s and %s' %(t[0], t[1])

def _repr_():
    return "Error-and-erasure channel creating %s errors and %s erasures' %(
              format_interval(self.number_errors()),
              format_interval(self.number_erasures()))

And if you really care about pluralization

def format_interval(t, singular, plural):
    name = singular if t[1] < 2 else plural
    if t[0] == t[1]:
        return '%s %s' %(t[0], name)
    else:
        return 'between %s and %s %s' %(t[0], t[1], name)

Such format_interval can be used in all your _repr_ methods.

  1. You can simplify ErrorErasureChannel.transmit_unsafe. Notice that the output of sample is already randomized (you can have a look at the code source which is pure python) and it is specified in its documentation
The resulting list is in selection order so that
all sub-slices will also be valid random samples

Hence, you can simply cut your errors into two parts as

errors = sample(xrange(n), number_errors + number_erasures)
error_positions   = errors[:number_errors]
erasure_positions = errors[number_errors:]

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 27, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

23dd71cSimplified `_repr_` and `_latex_` methods using a new helper function
cb7b137Some changes to the documentation
768ed9cSimplified code in ErrorErasureChannel.transmit_unsafe

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 27, 2015

Changed commit from 723a55c to 768ed9c

@videlec
Copy link
Contributor

videlec commented Apr 27, 2015

comment:26
  1. In the string representation of channels the input space/output space does not appear
sage: n_err, n_era = 21, 21
sage: Chan = channels.ErrorErasureChannel(GF(59)^50, n_err, n_era)
sage: Chan
Error-and-erasure channel creating 21 errors and 21 erasures

i.e. no mention of GF(59)^50 in the above example. Is that what you want?

  1. For the ErrorErasureChannel the output space is CartesianProduct. For mysterious reason, it is adviced to use cartesian_product from sage.categories.cartesian_product (see Meta-ticket: Cleanup cartesian products #15425).

  2. In ErrorErasureChannel.transmit_unsafe you would better replace

diff --git a/src/sage/coding/channel_constructions.py b/src/sage/coding/channel_constructions.py
index 0d32ffc..8935db7 100644
--- a/src/sage/coding/channel_constructions.py
+++ b/src/sage/coding/channel_constructions.py
@@ -592,4 +592,5 @@ class ErrorErasureChannel(Channel):
 
+        zero = V.base_ring().zero()
         for i in erasure_positions:
-            message[i] = 0
+            message[i] = zero
         return message, erasure_vector

it is cleaner and potentially faster (not much).

  1. You should add examples that check that the input vector is not modified, ie
    sage: v = my_funny_vector()
    sage: Chan(v)
    ... my_funny_output ....

Check that the input ``v`` is not modified::

    sage: v
    ... should be the initial vector ...

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 27, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

c3b5555Replaced CartesianProduct by cartesian_product
6dd51f3New test in transmit method. Replaced 0 by the zero of the base ring of the input space in ErrorErasureChannel.transmit_unsafe
383e67cChanged `_repr_` and `_latex_` methods

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Apr 27, 2015

Changed commit from 768ed9c to 383e67c

@sagetrac-dlucas
Copy link
Mannequin Author

sagetrac-dlucas mannequin commented Apr 27, 2015

comment:28

For the ErrorErasureChannel the output space is CartesianProduct. For mysterious reason, it is adviced to use cartesian_product from sage.categories.cartesian_product (see #15425).

Thanks for the information.

I did the changes (and added input and output space in the representation methods for channels).

@videlec
Copy link
Contributor

videlec commented Apr 28, 2015

comment:29

Salut David,

All right! This ticket is good to go.

That would be nice to start reviewing other's ticket as well. You can have a look at the section "Reviewer Checklist" in the Sage's Developer Guide. If you want to start slowly, there are some tickets that are marked "beginner". Basically: one of your ticket gets reviewed -> you review one ticket.

Best,
Vincent

@sagetrac-dlucas
Copy link
Mannequin Author

sagetrac-dlucas mannequin commented Apr 28, 2015

comment:30

Hello,

Basically: one of your ticket gets reviewed -> you review one ticket.

Ok. I'll start reviewing then :)

@videlec
Copy link
Contributor

videlec commented Apr 28, 2015

comment:31

Replying to @sagetrac-dlucas:

Hello,

Basically: one of your ticket gets reviewed -> you review one ticket.

Ok. I'll start reviewing then :)

Cool :-) If you have doubts, hesitations, or anything do not hesitate to post a message to sage-devel@googlegroups.com

@vbraun
Copy link
Member

vbraun commented Apr 29, 2015

Changed branch from u/dlucas/channels to 383e67c

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants