TryteString.as_bytes() does not work as expected #62

th0br0 · 2017-08-25T12:05:00Z

Expected result (or similar):

>>> iota.TryteString(b'ABABZZDD99').as_bytes()
(30, [55,178,248,107,12,0])

Actual result:

>>> iota.TryteString(b'ABABZZDD99').as_bytes()
Traceback (most recent call last):
  File "iota.lib.py/iota/codecs.py", line 147, in decode
    + (self.index[second] * len(self.index))
ValueError: byte must be in range(0, 256)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "iota.lib.py/iota/types.py", line 421, in as_bytes
    return decode(self._trytes, 'trytes', errors)
  File "iota.lib.py/iota/codecs.py", line 165, in decode
    'input': input,
iota.codecs.TrytesDecodeError: 'trytes' codec can't decode trytes ZZ at position 4-5: ordinal not in range(255)

The text was updated successfully, but these errors were encountered:

todofixthis · 2017-08-26T01:05:03Z

Hey @th0br0. This is actually expected behaviour — the tryte sequence ZZ represents value -28, which cannot be converted into a single byte.

>>> from iota.types import int_from_trits, TryteString

>>> int_from_trits(TryteString('ZZ').as_trits())
-28

>>> bytearray([-28])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: byte must be in range(0, 256)
byte must be in range(0, 256)

In order to convert this sequence into a byte string, you must instruct PyOTA to handle invalid sequences differently (this is very similar to how string.decode('utf-8') handles invalid bytes):

>>> from iota import TryteString

>>> TryteString(b'ABABZZDD99').as_bytes('strict')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/phx/Documents/pyota/iota/types.py", line 498, in as_bytes
    return decode(self._trytes, 'trytes', errors)
  File "/Users/phx/Documents/pyota/iota/codecs.py", line 165, in decode
    'input': input,
iota.codecs.TrytesDecodeError: 'trytes' codec can't decode trytes ZZ at position 4-5: ordinal not in range(255)
'trytes' codec can't decode trytes ZZ at position 4-5: ordinal not in range(255)

>>> TryteString(b'ABABZZDD99').as_bytes('replace')
b'77?p\x00'

>>> TryteString(b'ABABZZDD99').as_bytes('ignore')
b'77p\x00'

th0br0 · 2017-08-26T14:55:13Z

I beg to differ. I'm not well-versed enough in Python3 codecs to pinpoint exactly where things need to change, but:
as_bytes() should split the trits into chunks of up to 5 trits each and then essentially do:

value = 0
RADIX = 3
chunk = [-1,1,1,1,-1]
for c in chunk:
  value = value * RADIX + c

the resulting value is signed (-69) can then be converted into an unsigned byte via value & 0xFF == 187. If the final chunk has less than 5 elements, expect them to be 0 (but multiply by the RADIX anyway). See https://github.com/iotaledger/iota.lib.java/blob/master/src/main/java/jota/utils/Converter.java#L56 for the Java impl.

Decomposition can happen either via a lookup table computed at library load, e.g. https://github.com/iotaledger/iota.lib.java/blob/master/src/main/java/jota/utils/Converter.java#L37,
or have a look at the code at the bottom. It's equal to what the Java code does.

Note that you need to store the number of encoded trits for decomposition of multiple bytes to work.

This is what both the Java & Rust libraries do, I don't think that the JS library currently implements byte conversion. The current Python library can't even do iota.TryteString(b'Z').as_bytes() ?!

RADIX = 3
MAX_TRIT_VALUE = 1
MIN_TRIT_VALUE = -1

def increment(trits):
    for i in range(5):
        trits[i] += 1
        if trits[i] > MAX_TRIT_VALUE:
            trits[i] = MIN_TRIT_VALUE
        else:
            break
    return trits


if __name__ == '__main__':
    trits = [0] * 5
    for i in range(243):
        value = 0
        # to_bytes
        for t in trits[::-1]:
            value = value * RADIX + t

        print("%s=%d"% (trits, value & 0xFF))

        # to_trits
        # there's 243 different value combinations
        # this is just easier than having to worry about signedness when converting back...
        value += 121

        chunk = []
        for _ in range(5):
            rem = value % RADIX
            chunk.append(int(rem))
            value = (value - rem) / RADIX
        chunk = [x - 1 for x in chunk]

        assert(trits == chunk)

        trits = increment(trits)

todofixthis · 2017-08-27T20:16:52Z

Thanks for the thorough explanation! I see now what I was missing before.

TryteString.as_bytes() was modelled after the JS lib's fromTrytes function. But, it's not a proper "bytes from trytes" implementation, as you've noted.

I like the idea of implementing a proper trytes -> bytes conversion process (and I'm excited to finally see some maths showing how it works!); maybe we can rename TryteString.as_bytes() to as_ascii_encoded or similar, and then as_bytes can do the proper trytes -> bytes conversion.

Given everything that's going on with PyOTA right now, I think this will have to wait until v2.1 (v2.0 has the new hash algorithm and multisig — there's already enough backwards-incompatible changes for one release!).

In the meantime, I think you can get the result you're looking for by first calling TryteString.as_trits() and then using the code you posted to convert to byte values.

todofixthis · 2017-10-14T23:33:38Z

Hey @th0br0 I'm starting to work on this now. Would you be willing to help me come up with some test cases for this functionality?

There's a few scenarios that I'm interested in:

At least two "happy path" scenarios (one long sequence, one short sequence), both directions (trits <-> bytes).
Negative values.
Zero/empty values:
- Should trits [0] be converted to bytes b'' or b'\0'?
- What about [0, 0, 0, ...]?
- What about []?
Edge cases for 5-trits-per-byte (from what I understand, it's not exactly 5 trits to 1 byte — are there any sequences of trits or bytes that we can use to demonstrate this?).
Others?

@alon-e @paulhandy Would love to get your thoughts on this as well. What are some unit tests we can use for trits <-> bytes conversion?

The previous name will still work, but it is deprecated and will be removed in PyOTA v2.1.

mlouielu · 2017-10-30T11:36:57Z

Hi all, I'm currently working on Python reading rocksdb data, and it is relative to this topic.

Problem

I think TryteString naming and lack of documentation only confusing Python users.

TryteString should act like str, not bytes:

the name TryteString should just act like the str object in Python, not bytes, otherwise, it should be named as TryteBytes.

TryteString's method from_string, as_string, from_bytes, as_bytes confuse:

from_string will encode str with utf-8, but iota.TryteString(str) only accept tryte alphabet, not act like from_string. from_bytes take normal bytes, but not the bytes from trits.

TryteString misused bytes:

In Python, bytes is kind of representation of codepoint, that is, '你好' in UTF-8 will be 'b'\xe4\xbd\xa0\xe5\xa5\xbd', but in TryteString, we use bytes as a limited check for the input in the trytes alphabet range, and make it confuse.

For examle, TryteString('HELLO') == TryteString(b'HELLO').

For TryteString, its bytes should be TryteString convert to bytes, e.g. TryteString('EXAMPLE').as_bytes() to b'\xb4T\xa1\xa0\x01'

How to fixed it

Deprecate the usage of TryteString.

TryteString internal representation as str (trytes), and correct the behavior of the method. This will be a big change since all around the library was using TryteString with bytes.

Good

TryteString('EXAMPLE')                                   # input: trytes: EXAMPLE, output "EXAMPLE"
TryteString('OBGCKBWBZBVBOB').as_string() # input: trytes, output "EXAMPLE"
TryteString.from_string('EXAMPLE')             # input: str: EXAMPLE, output "OBGCKBWBZBVBOB"
TryteString.from_string('你好')                      # input: str: 你好, output: "LH9GYEMHCF9G"
TryteString.from_bytes(b'\xb4T\xa1\xa0\x01')  # input: tryte-bytes, output "EXAMPLE"
bytes(TryteString('EXAMPLE'))                      # input: trytes, output b'EXAMPLE'

Bad

TryteString('你好')                  # input should be trytes-string (in trytes-alphabet)
TryteString('@@##HELLO')  # input should be trytes-string (in trytes-alphabet)
TryteString(b'EXAMPLE')       # input should not be bytes
TryteString.from_string(b'EXAMPLE')  input should be str type
TryteString.from_bytes(b'EXAMPLE')   # input should be tryte-bytes (trytes-string binary format)

Rename TryteString to TryteBytes, and correct from_bytes, as_bytes behavior.

todofixthis · 2017-10-31T04:48:41Z

Hey @mlouielu thanks for the info and the suggestions!

I see what you're getting at with TryteBytes. I don't think we need to go that far, but there is definitely room for improvement around the way that TryteString handles conversion to/from str and bytes values.

Let's figure out the trits <-> bytes conversion first, and then we can move on to trytes <-> characters.

The most important thing we need right now to fix this issue is test cases. Could you help with #62 (comment) ?

As for issue iotaledger#62, this is the test cases for converting TryteString to bytes, and bytes to TryteString.

todofixthis · 2017-10-31T20:30:58Z

Thanks @mlouielu for the test cases!! I will look at them this weekend, and I'll let you know if I have any questions.

vbakke · 2017-12-27T21:27:37Z

Hi @todofixthis

Your latest commit regarding trytes was naming only. Is that correct? Or did you change some of the behaviour as well?

There is a lot of concurrent work on the byte <-> tryte conversion. Both in #62 and #90 by you and @th0br0, and @mlouielu in the rocksdb.

I work on encrypting IOTA seeds (somewhat similar to BIP38), and @dashengsun (with feedback from @paulhand) work on Unicode to trytes in the iota.lib.js and @pRizz has published pRizz/Tryte-UTF8-JSON-Codec, @tuupola has tuupola/trytes (for PHP) and @knarz has knarz/trytes (for Rust).

Maybe we should coordinate a little, share the good ideas, and pin point difficulties before they get published and confuse the rest of the community? Because tryte <-> byte conversion is not trivial.

Square pegs and round holes
The English saying about fitting a square peg in a round hole is very describing. There will be trouble with the function that converts back to the original type.

We probably should not pollute issue #62 with this discussion. Please suggest a better place. If we lack better options, I just now created vbakke/trytes where we can discuss this freely.

todofixthis · 2018-01-01T23:00:19Z

Hey @vbakke About 4 months ago, I started work on this issue. I thought we'd have it wrapped up in a couple of weeks, so I made some changes to the codebase to make room for the new codec and mark the existing codec as deprecated.

However, at this point, I'm still not sure that we have a bytes <-> trytes conversion process that can handle all edge cases (e.g., bytes in range [244, 255], certain cases where encode(decode(value)) != value, etc.), so I haven't made much progress on this issue.

vbakke · 2018-01-01T23:57:59Z

Hi @todofixthis ,

I've been spending most of my spare time during xmas looking at this as well. I just now published my latest thoughts on the issue, as well as working example code (JavaScript, though).

I think it covers all corner cases, and includes Unicode text to tryte strings, including a BOM mark to indicate the encoding, if used.

Feel free to have a look at my latest write up of the Readme.md. I don't think there is a universal two-way byte <->tryte conversion.

todofixthis · 2018-01-03T06:47:05Z

Thanks @vbakke! I'll have a look at your notes later this week. Quick glance shows that they are very thorough; looking forward to going through them!

vbakke · 2018-01-04T23:46:26Z

FYI I just added an online version of encoding native strings, and native trytes at https://vbakke.github.io/trytes/

The previous name will still work, but it is deprecated and will be removed in PyOTA v2.1.

todofixthis closed this as completed Aug 26, 2017

todofixthis reopened this Aug 27, 2017

todofixthis self-assigned this Aug 27, 2017

todofixthis added the enhancement label Aug 27, 2017

todofixthis added this to Backlog in PyOTA Aug 27, 2017

todofixthis changed the title ~~TryteString.as_bytes() does not work as intended~~ TryteString.as_bytes() does not work as expected Sep 24, 2017

todofixthis moved this from Backlog to Scheduled v2.1.x in PyOTA Sep 24, 2017

todofixthis moved this from Scheduled v2.1.x to In Progress in PyOTA Oct 14, 2017

todofixthis added a commit that referenced this issue Oct 15, 2017

[#62] Renamed trytes codec to trytes_ascii.

a108db5

The previous name will still work, but it is deprecated and will be removed in PyOTA v2.1.

todofixthis mentioned this issue Oct 15, 2017

Renamed 'trytes' codec to 'trytes_ascii'. #80

Merged

todofixthis added a commit that referenced this issue Oct 15, 2017

[#62] Fixed incorrect documentation.

53b7a9e

todofixthis mentioned this issue Oct 16, 2017

Seed.random().as_string() throws TrytesDecodeError #81

Closed

todofixthis mentioned this issue Oct 28, 2017

Release/2.0.1 #87

Merged

mlouielu added a commit to mlouielu/iota.lib.py that referenced this issue Oct 31, 2017

Add TryteString to bytes test cases

9109aef

As for issue iotaledger#62, this is the test cases for converting TryteString to bytes, and bytes to TryteString.

mlouielu mentioned this issue Oct 31, 2017

Add TryteString to bytes test cases #88

Closed

todofixthis mentioned this issue Oct 31, 2017

TryteString.as_string needs some re-branding #90

Closed

todofixthis mentioned this issue Dec 26, 2017

Release/2.0.3 #119

Merged

todofixthis moved this from In Progress to Scheduled v2.1.x in PyOTA Jan 6, 2018

todofixthis mentioned this issue Jan 6, 2018

Remove deprecated codecs/methods for v2.1 #134

Open

todofixthis moved this from Scheduled v2.1.x to Blocked / On Hold in PyOTA Jan 7, 2018

marko-k0 pushed a commit to marko-k0/iota.lib.py that referenced this issue Jul 28, 2018

[iotaledger#62] Renamed trytes codec to trytes_ascii.

ae00ccc

The previous name will still work, but it is deprecated and will be removed in PyOTA v2.1.

marko-k0 pushed a commit to marko-k0/iota.lib.py that referenced this issue Jul 28, 2018

[iotaledger#62] Fixed incorrect documentation.

18bfe7b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TryteString.as_bytes() does not work as expected #62

TryteString.as_bytes() does not work as expected #62

th0br0 commented Aug 25, 2017

todofixthis commented Aug 26, 2017 •

edited

th0br0 commented Aug 26, 2017

todofixthis commented Aug 27, 2017 •

edited

todofixthis commented Oct 14, 2017 •

edited

mlouielu commented Oct 30, 2017

todofixthis commented Oct 31, 2017 •

edited

todofixthis commented Oct 31, 2017

vbakke commented Dec 27, 2017

todofixthis commented Jan 1, 2018

vbakke commented Jan 1, 2018

todofixthis commented Jan 3, 2018 •

edited

vbakke commented Jan 4, 2018

TryteString.as_bytes() does not work as expected #62

TryteString.as_bytes() does not work as expected #62

Comments

th0br0 commented Aug 25, 2017

todofixthis commented Aug 26, 2017 • edited

th0br0 commented Aug 26, 2017

todofixthis commented Aug 27, 2017 • edited

todofixthis commented Oct 14, 2017 • edited

mlouielu commented Oct 30, 2017

Problem

How to fixed it

todofixthis commented Oct 31, 2017 • edited

todofixthis commented Oct 31, 2017

vbakke commented Dec 27, 2017

todofixthis commented Jan 1, 2018

vbakke commented Jan 1, 2018

todofixthis commented Jan 3, 2018 • edited

vbakke commented Jan 4, 2018

todofixthis commented Aug 26, 2017 •

edited

todofixthis commented Aug 27, 2017 •

edited

todofixthis commented Oct 14, 2017 •

edited

todofixthis commented Oct 31, 2017 •

edited

todofixthis commented Jan 3, 2018 •

edited