New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TryteString.as_bytes() does not work as expected #62

Open
th0br0 opened this Issue Aug 25, 2017 · 12 comments

Comments

4 participants
@th0br0

th0br0 commented Aug 25, 2017

Expected result (or similar):

>>> iota.TryteString(b'ABABZZDD99').as_bytes()
(30, [55,178,248,107,12,0])

Actual result:

>>> iota.TryteString(b'ABABZZDD99').as_bytes()
Traceback (most recent call last):
  File "iota.lib.py/iota/codecs.py", line 147, in decode
    + (self.index[second] * len(self.index))
ValueError: byte must be in range(0, 256)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "iota.lib.py/iota/types.py", line 421, in as_bytes
    return decode(self._trytes, 'trytes', errors)
  File "iota.lib.py/iota/codecs.py", line 165, in decode
    'input': input,
iota.codecs.TrytesDecodeError: 'trytes' codec can't decode trytes ZZ at position 4-5: ordinal not in range(255)
@todofixthis

This comment has been minimized.

Show comment
Hide comment
@todofixthis

todofixthis Aug 26, 2017

Collaborator

Hey @th0br0. This is actually expected behaviour — the tryte sequence ZZ represents value -28, which cannot be converted into a single byte.

>>> from iota.types import int_from_trits, TryteString

>>> int_from_trits(TryteString('ZZ').as_trits())
-28

>>> bytearray([-28])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: byte must be in range(0, 256)
byte must be in range(0, 256)

In order to convert this sequence into a byte string, you must instruct PyOTA to handle invalid sequences differently (this is very similar to how string.decode('utf-8') handles invalid bytes):

>>> from iota import TryteString

>>> TryteString(b'ABABZZDD99').as_bytes('strict')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/phx/Documents/pyota/iota/types.py", line 498, in as_bytes
    return decode(self._trytes, 'trytes', errors)
  File "/Users/phx/Documents/pyota/iota/codecs.py", line 165, in decode
    'input': input,
iota.codecs.TrytesDecodeError: 'trytes' codec can't decode trytes ZZ at position 4-5: ordinal not in range(255)
'trytes' codec can't decode trytes ZZ at position 4-5: ordinal not in range(255)

>>> TryteString(b'ABABZZDD99').as_bytes('replace')
b'77?p\x00'

>>> TryteString(b'ABABZZDD99').as_bytes('ignore')
b'77p\x00'
Collaborator

todofixthis commented Aug 26, 2017

Hey @th0br0. This is actually expected behaviour — the tryte sequence ZZ represents value -28, which cannot be converted into a single byte.

>>> from iota.types import int_from_trits, TryteString

>>> int_from_trits(TryteString('ZZ').as_trits())
-28

>>> bytearray([-28])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: byte must be in range(0, 256)
byte must be in range(0, 256)

In order to convert this sequence into a byte string, you must instruct PyOTA to handle invalid sequences differently (this is very similar to how string.decode('utf-8') handles invalid bytes):

>>> from iota import TryteString

>>> TryteString(b'ABABZZDD99').as_bytes('strict')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/phx/Documents/pyota/iota/types.py", line 498, in as_bytes
    return decode(self._trytes, 'trytes', errors)
  File "/Users/phx/Documents/pyota/iota/codecs.py", line 165, in decode
    'input': input,
iota.codecs.TrytesDecodeError: 'trytes' codec can't decode trytes ZZ at position 4-5: ordinal not in range(255)
'trytes' codec can't decode trytes ZZ at position 4-5: ordinal not in range(255)

>>> TryteString(b'ABABZZDD99').as_bytes('replace')
b'77?p\x00'

>>> TryteString(b'ABABZZDD99').as_bytes('ignore')
b'77p\x00'
@th0br0

This comment has been minimized.

Show comment
Hide comment
@th0br0

th0br0 Aug 26, 2017

I beg to differ. I'm not well-versed enough in Python3 codecs to pinpoint exactly where things need to change, but:
as_bytes() should split the trits into chunks of up to 5 trits each and then essentially do:

value = 0
RADIX = 3
chunk = [-1,1,1,1,-1]
for c in chunk:
  value = value * RADIX + c

the resulting value is signed (-69) can then be converted into an unsigned byte via value & 0xFF == 187. If the final chunk has less than 5 elements, expect them to be 0 (but multiply by the RADIX anyway). See https://github.com/iotaledger/iota.lib.java/blob/master/src/main/java/jota/utils/Converter.java#L56 for the Java impl.

Decomposition can happen either via a lookup table computed at library load, e.g. https://github.com/iotaledger/iota.lib.java/blob/master/src/main/java/jota/utils/Converter.java#L37,
or have a look at the code at the bottom. It's equal to what the Java code does.

Note that you need to store the number of encoded trits for decomposition of multiple bytes to work.

This is what both the Java & Rust libraries do, I don't think that the JS library currently implements byte conversion. The current Python library can't even do iota.TryteString(b'Z').as_bytes() ?!

RADIX = 3
MAX_TRIT_VALUE = 1
MIN_TRIT_VALUE = -1

def increment(trits):
    for i in range(5):
        trits[i] += 1
        if trits[i] > MAX_TRIT_VALUE:
            trits[i] = MIN_TRIT_VALUE
        else:
            break
    return trits


if __name__ == '__main__':
    trits = [0] * 5
    for i in range(243):
        value = 0
        # to_bytes
        for t in trits[::-1]:
            value = value * RADIX + t

        print("%s=%d"% (trits, value & 0xFF))

        # to_trits
        # there's 243 different value combinations
        # this is just easier than having to worry about signedness when converting back...
        value += 121

        chunk = []
        for _ in range(5):
            rem = value % RADIX
            chunk.append(int(rem))
            value = (value - rem) / RADIX
        chunk = [x - 1 for x in chunk]

        assert(trits == chunk)

        trits = increment(trits)

th0br0 commented Aug 26, 2017

I beg to differ. I'm not well-versed enough in Python3 codecs to pinpoint exactly where things need to change, but:
as_bytes() should split the trits into chunks of up to 5 trits each and then essentially do:

value = 0
RADIX = 3
chunk = [-1,1,1,1,-1]
for c in chunk:
  value = value * RADIX + c

the resulting value is signed (-69) can then be converted into an unsigned byte via value & 0xFF == 187. If the final chunk has less than 5 elements, expect them to be 0 (but multiply by the RADIX anyway). See https://github.com/iotaledger/iota.lib.java/blob/master/src/main/java/jota/utils/Converter.java#L56 for the Java impl.

Decomposition can happen either via a lookup table computed at library load, e.g. https://github.com/iotaledger/iota.lib.java/blob/master/src/main/java/jota/utils/Converter.java#L37,
or have a look at the code at the bottom. It's equal to what the Java code does.

Note that you need to store the number of encoded trits for decomposition of multiple bytes to work.

This is what both the Java & Rust libraries do, I don't think that the JS library currently implements byte conversion. The current Python library can't even do iota.TryteString(b'Z').as_bytes() ?!

RADIX = 3
MAX_TRIT_VALUE = 1
MIN_TRIT_VALUE = -1

def increment(trits):
    for i in range(5):
        trits[i] += 1
        if trits[i] > MAX_TRIT_VALUE:
            trits[i] = MIN_TRIT_VALUE
        else:
            break
    return trits


if __name__ == '__main__':
    trits = [0] * 5
    for i in range(243):
        value = 0
        # to_bytes
        for t in trits[::-1]:
            value = value * RADIX + t

        print("%s=%d"% (trits, value & 0xFF))

        # to_trits
        # there's 243 different value combinations
        # this is just easier than having to worry about signedness when converting back...
        value += 121

        chunk = []
        for _ in range(5):
            rem = value % RADIX
            chunk.append(int(rem))
            value = (value - rem) / RADIX
        chunk = [x - 1 for x in chunk]

        assert(trits == chunk)

        trits = increment(trits)

@todofixthis todofixthis reopened this Aug 27, 2017

@todofixthis

This comment has been minimized.

Show comment
Hide comment
@todofixthis

todofixthis Aug 27, 2017

Collaborator

Thanks for the thorough explanation! I see now what I was missing before.

TryteString.as_bytes() was modelled after the JS lib's fromTrytes function. But, it's not a proper "bytes from trytes" implementation, as you've noted.

I like the idea of implementing a proper trytes -> bytes conversion process (and I'm excited to finally see some maths showing how it works!); maybe we can rename TryteString.as_bytes() to as_ascii_encoded or similar, and then as_bytes can do the proper trytes -> bytes conversion.

Given everything that's going on with PyOTA right now, I think this will have to wait until v2.1 (v2.0 has the new hash algorithm and multisig — there's already enough backwards-incompatible changes for one release!).

In the meantime, I think you can get the result you're looking for by first calling TryteString.as_trits() and then using the code you posted to convert to byte values.

Collaborator

todofixthis commented Aug 27, 2017

Thanks for the thorough explanation! I see now what I was missing before.

TryteString.as_bytes() was modelled after the JS lib's fromTrytes function. But, it's not a proper "bytes from trytes" implementation, as you've noted.

I like the idea of implementing a proper trytes -> bytes conversion process (and I'm excited to finally see some maths showing how it works!); maybe we can rename TryteString.as_bytes() to as_ascii_encoded or similar, and then as_bytes can do the proper trytes -> bytes conversion.

Given everything that's going on with PyOTA right now, I think this will have to wait until v2.1 (v2.0 has the new hash algorithm and multisig — there's already enough backwards-incompatible changes for one release!).

In the meantime, I think you can get the result you're looking for by first calling TryteString.as_trits() and then using the code you posted to convert to byte values.

@todofixthis todofixthis self-assigned this Aug 27, 2017

@todofixthis todofixthis added this to Backlog in PyOTA Aug 27, 2017

@todofixthis todofixthis changed the title from TryteString.as_bytes() does not work as intended to TryteString.as_bytes() does not work as expected Sep 24, 2017

@todofixthis todofixthis moved this from Backlog to Scheduled v2.1.x in PyOTA Sep 24, 2017

@todofixthis todofixthis moved this from Scheduled v2.1.x to In Progress in PyOTA Oct 14, 2017

@todofixthis

This comment has been minimized.

Show comment
Hide comment
@todofixthis

todofixthis Oct 14, 2017

Collaborator

Hey @th0br0 I'm starting to work on this now. Would you be willing to help me come up with some test cases for this functionality?

There's a few scenarios that I'm interested in:

  • At least two "happy path" scenarios (one long sequence, one short sequence), both directions (trits <-> bytes).
  • Negative values.
  • Zero/empty values:
    • Should trits [0] be converted to bytes b'' or b'\0'?
    • What about [0, 0, 0, ...]?
    • What about []?
  • Edge cases for 5-trits-per-byte (from what I understand, it's not exactly 5 trits to 1 byte — are there any sequences of trits or bytes that we can use to demonstrate this?).
  • Others?

@alon-e @paulhandy Would love to get your thoughts on this as well. What are some unit tests we can use for trits <-> bytes conversion?

Collaborator

todofixthis commented Oct 14, 2017

Hey @th0br0 I'm starting to work on this now. Would you be willing to help me come up with some test cases for this functionality?

There's a few scenarios that I'm interested in:

  • At least two "happy path" scenarios (one long sequence, one short sequence), both directions (trits <-> bytes).
  • Negative values.
  • Zero/empty values:
    • Should trits [0] be converted to bytes b'' or b'\0'?
    • What about [0, 0, 0, ...]?
    • What about []?
  • Edge cases for 5-trits-per-byte (from what I understand, it's not exactly 5 trits to 1 byte — are there any sequences of trits or bytes that we can use to demonstrate this?).
  • Others?

@alon-e @paulhandy Would love to get your thoughts on this as well. What are some unit tests we can use for trits <-> bytes conversion?

todofixthis added a commit that referenced this issue Oct 15, 2017

[#62] Renamed trytes codec to trytes_ascii.
The previous name will still work, but it is deprecated and will be removed in PyOTA v2.1.

todofixthis added a commit that referenced this issue Oct 15, 2017

@mlouielu

This comment has been minimized.

Show comment
Hide comment
@mlouielu

mlouielu Oct 30, 2017

Contributor

Hi all, I'm currently working on Python reading rocksdb data, and it is relative to this topic.

Problem

I think TryteString naming and lack of documentation only confusing Python users.

  1. TryteString should act like str, not bytes:

the name TryteString should just act like the str object in Python, not bytes, otherwise, it should be named as TryteBytes.

  1. TryteString's method from_string, as_string, from_bytes, as_bytes confuse:

from_string will encode str with utf-8, but iota.TryteString(str) only accept tryte alphabet, not act like from_string. from_bytes take normal bytes, but not the bytes from trits.

  1. TryteString misused bytes:

In Python, bytes is kind of representation of codepoint, that is, '你好' in UTF-8 will be 'b'\xe4\xbd\xa0\xe5\xa5\xbd', but in TryteString, we use bytes as a limited check for the input in the trytes alphabet range, and make it confuse.

For examle, TryteString('HELLO') == TryteString(b'HELLO').

For TryteString, its bytes should be TryteString convert to bytes, e.g. TryteString('EXAMPLE').as_bytes() to b'\xb4T\xa1\xa0\x01'

How to fixed it

  1. Deprecate the usage of TryteString.

TryteString internal representation as str (trytes), and correct the behavior of the method. This will be a big change since all around the library was using TryteString with bytes.

  • Good
TryteString('EXAMPLE')                                   # input: trytes: EXAMPLE, output "EXAMPLE"
TryteString('OBGCKBWBZBVBOB').as_string() # input: trytes, output "EXAMPLE"
TryteString.from_string('EXAMPLE')             # input: str: EXAMPLE, output "OBGCKBWBZBVBOB"
TryteString.from_string('你好')                      # input: str: 你好, output: "LH9GYEMHCF9G"
TryteString.from_bytes(b'\xb4T\xa1\xa0\x01')  # input: tryte-bytes, output "EXAMPLE"
bytes(TryteString('EXAMPLE'))                      # input: trytes, output b'EXAMPLE'
  • Bad
TryteString('你好')                  # input should be trytes-string (in trytes-alphabet)
TryteString('@@##HELLO')  # input should be trytes-string (in trytes-alphabet)
TryteString(b'EXAMPLE')       # input should not be bytes
TryteString.from_string(b'EXAMPLE')  input should be str type
TryteString.from_bytes(b'EXAMPLE')   # input should be tryte-bytes (trytes-string binary format)
  1. Rename TryteString to TryteBytes, and correct from_bytes, as_bytes behavior.
Contributor

mlouielu commented Oct 30, 2017

Hi all, I'm currently working on Python reading rocksdb data, and it is relative to this topic.

Problem

I think TryteString naming and lack of documentation only confusing Python users.

  1. TryteString should act like str, not bytes:

the name TryteString should just act like the str object in Python, not bytes, otherwise, it should be named as TryteBytes.

  1. TryteString's method from_string, as_string, from_bytes, as_bytes confuse:

from_string will encode str with utf-8, but iota.TryteString(str) only accept tryte alphabet, not act like from_string. from_bytes take normal bytes, but not the bytes from trits.

  1. TryteString misused bytes:

In Python, bytes is kind of representation of codepoint, that is, '你好' in UTF-8 will be 'b'\xe4\xbd\xa0\xe5\xa5\xbd', but in TryteString, we use bytes as a limited check for the input in the trytes alphabet range, and make it confuse.

For examle, TryteString('HELLO') == TryteString(b'HELLO').

For TryteString, its bytes should be TryteString convert to bytes, e.g. TryteString('EXAMPLE').as_bytes() to b'\xb4T\xa1\xa0\x01'

How to fixed it

  1. Deprecate the usage of TryteString.

TryteString internal representation as str (trytes), and correct the behavior of the method. This will be a big change since all around the library was using TryteString with bytes.

  • Good
TryteString('EXAMPLE')                                   # input: trytes: EXAMPLE, output "EXAMPLE"
TryteString('OBGCKBWBZBVBOB').as_string() # input: trytes, output "EXAMPLE"
TryteString.from_string('EXAMPLE')             # input: str: EXAMPLE, output "OBGCKBWBZBVBOB"
TryteString.from_string('你好')                      # input: str: 你好, output: "LH9GYEMHCF9G"
TryteString.from_bytes(b'\xb4T\xa1\xa0\x01')  # input: tryte-bytes, output "EXAMPLE"
bytes(TryteString('EXAMPLE'))                      # input: trytes, output b'EXAMPLE'
  • Bad
TryteString('你好')                  # input should be trytes-string (in trytes-alphabet)
TryteString('@@##HELLO')  # input should be trytes-string (in trytes-alphabet)
TryteString(b'EXAMPLE')       # input should not be bytes
TryteString.from_string(b'EXAMPLE')  input should be str type
TryteString.from_bytes(b'EXAMPLE')   # input should be tryte-bytes (trytes-string binary format)
  1. Rename TryteString to TryteBytes, and correct from_bytes, as_bytes behavior.
@todofixthis

This comment has been minimized.

Show comment
Hide comment
@todofixthis

todofixthis Oct 31, 2017

Collaborator

Hey @mlouielu thanks for the info and the suggestions!

I see what you're getting at with TryteBytes. I don't think we need to go that far, but there is definitely room for improvement around the way that TryteString handles conversion to/from str and bytes values.

Let's figure out the trits <-> bytes conversion first, and then we can move on to trytes <-> characters.

The most important thing we need right now to fix this issue is test cases. Could you help with #62 (comment) ?

Collaborator

todofixthis commented Oct 31, 2017

Hey @mlouielu thanks for the info and the suggestions!

I see what you're getting at with TryteBytes. I don't think we need to go that far, but there is definitely room for improvement around the way that TryteString handles conversion to/from str and bytes values.

Let's figure out the trits <-> bytes conversion first, and then we can move on to trytes <-> characters.

The most important thing we need right now to fix this issue is test cases. Could you help with #62 (comment) ?

mlouielu added a commit to mlouielu/iota.lib.py that referenced this issue Oct 31, 2017

Add TryteString to bytes test cases
As for issue iotaledger#62, this is the test cases for converting
TryteString to bytes, and bytes to TryteString.
@todofixthis

This comment has been minimized.

Show comment
Hide comment
@todofixthis

todofixthis Oct 31, 2017

Collaborator

Thanks @mlouielu for the test cases!! I will look at them this weekend, and I'll let you know if I have any questions.

Collaborator

todofixthis commented Oct 31, 2017

Thanks @mlouielu for the test cases!! I will look at them this weekend, and I'll let you know if I have any questions.

@vbakke

This comment has been minimized.

Show comment
Hide comment
@vbakke

vbakke Dec 27, 2017

Hi @todofixthis

Your latest commit regarding trytes was naming only. Is that correct? Or did you change some of the behaviour as well?

There is a lot of concurrent work on the byte <-> tryte conversion. Both in #62 and #90 by you and @th0br0, and @mlouielu in the rocksdb.

I work on encrypting IOTA seeds (somewhat similar to BIP38), and @dashengSun (with feedback from @paulhand) work on Unicode to trytes in the iota.lib.js and @pRizz has published pRizz/Tryte-UTF8-JSON-Codec, @tuupola has tuupola/trytes (for PHP) and @knarz has knarz/trytes (for Rust).

Maybe we should coordinate a little, share the good ideas, and pin point difficulties before they get published and confuse the rest of the community? Because tryte <-> byte conversion is not trivial.

Square pegs and round holes
The English saying about fitting a square peg in a round hole is very describing. There will be trouble with the function that converts back to the original type.

We probably should not pollute issue #62 with this discussion. Please suggest a better place. If we lack better options, I just now created vbakke/trytes where we can discuss this freely.

vbakke commented Dec 27, 2017

Hi @todofixthis

Your latest commit regarding trytes was naming only. Is that correct? Or did you change some of the behaviour as well?

There is a lot of concurrent work on the byte <-> tryte conversion. Both in #62 and #90 by you and @th0br0, and @mlouielu in the rocksdb.

I work on encrypting IOTA seeds (somewhat similar to BIP38), and @dashengSun (with feedback from @paulhand) work on Unicode to trytes in the iota.lib.js and @pRizz has published pRizz/Tryte-UTF8-JSON-Codec, @tuupola has tuupola/trytes (for PHP) and @knarz has knarz/trytes (for Rust).

Maybe we should coordinate a little, share the good ideas, and pin point difficulties before they get published and confuse the rest of the community? Because tryte <-> byte conversion is not trivial.

Square pegs and round holes
The English saying about fitting a square peg in a round hole is very describing. There will be trouble with the function that converts back to the original type.

We probably should not pollute issue #62 with this discussion. Please suggest a better place. If we lack better options, I just now created vbakke/trytes where we can discuss this freely.

@todofixthis

This comment has been minimized.

Show comment
Hide comment
@todofixthis

todofixthis Jan 1, 2018

Collaborator

Hey @vbakke About 4 months ago, I started work on this issue. I thought we'd have it wrapped up in a couple of weeks, so I made some changes to the codebase to make room for the new codec and mark the existing codec as deprecated.

However, at this point, I'm still not sure that we have a bytes <-> trytes conversion process that can handle all edge cases (e.g., bytes in range [244, 255], certain cases where encode(decode(value)) != value, etc.), so I haven't made much progress on this issue.

Collaborator

todofixthis commented Jan 1, 2018

Hey @vbakke About 4 months ago, I started work on this issue. I thought we'd have it wrapped up in a couple of weeks, so I made some changes to the codebase to make room for the new codec and mark the existing codec as deprecated.

However, at this point, I'm still not sure that we have a bytes <-> trytes conversion process that can handle all edge cases (e.g., bytes in range [244, 255], certain cases where encode(decode(value)) != value, etc.), so I haven't made much progress on this issue.

@vbakke

This comment has been minimized.

Show comment
Hide comment
@vbakke

vbakke Jan 1, 2018

Hi @todofixthis ,

I've been spending most of my spare time during xmas looking at this as well. I just now published my latest thoughts on the issue, as well as working example code (JavaScript, though).

I think it covers all corner cases, and includes Unicode text to tryte strings, including a BOM mark to indicate the encoding, if used.

Feel free to have a look at my latest write up of the Readme.md. I don't think there is a universal two-way byte <->tryte conversion.

vbakke commented Jan 1, 2018

Hi @todofixthis ,

I've been spending most of my spare time during xmas looking at this as well. I just now published my latest thoughts on the issue, as well as working example code (JavaScript, though).

I think it covers all corner cases, and includes Unicode text to tryte strings, including a BOM mark to indicate the encoding, if used.

Feel free to have a look at my latest write up of the Readme.md. I don't think there is a universal two-way byte <->tryte conversion.

@todofixthis

This comment has been minimized.

Show comment
Hide comment
@todofixthis

todofixthis Jan 3, 2018

Collaborator

Thanks @vbakke! I'll have a look at your notes later this week. Quick glance shows that they are very thorough; looking forward to going through them!

Collaborator

todofixthis commented Jan 3, 2018

Thanks @vbakke! I'll have a look at your notes later this week. Quick glance shows that they are very thorough; looking forward to going through them!

@vbakke

This comment has been minimized.

Show comment
Hide comment
@vbakke

vbakke Jan 4, 2018

FYI I just added an online version of encoding native strings, and native trytes at https://vbakke.github.io/trytes/

vbakke commented Jan 4, 2018

FYI I just added an online version of encoding native strings, and native trytes at https://vbakke.github.io/trytes/

@todofixthis todofixthis moved this from In Progress to Scheduled v2.1.x in PyOTA Jan 6, 2018

@todofixthis todofixthis moved this from Scheduled v2.1.x to Blocked / On Hold in PyOTA Jan 7, 2018

redondo-mk pushed a commit to redondo-mk/iota.lib.py that referenced this issue Jul 28, 2018

[iotaledger#62] Renamed trytes codec to trytes_ascii.
The previous name will still work, but it is deprecated and will be removed in PyOTA v2.1.

redondo-mk pushed a commit to redondo-mk/iota.lib.py that referenced this issue Jul 28, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment