Skip to content
This repository has been archived by the owner on Jan 13, 2023. It is now read-only.

TryteString.as_bytes() does not work as expected #62

Open
th0br0 opened this issue Aug 25, 2017 · 12 comments
Open

TryteString.as_bytes() does not work as expected #62

th0br0 opened this issue Aug 25, 2017 · 12 comments
Assignees
Projects

Comments

@th0br0
Copy link

th0br0 commented Aug 25, 2017

Expected result (or similar):

>>> iota.TryteString(b'ABABZZDD99').as_bytes()
(30, [55,178,248,107,12,0])

Actual result:

>>> iota.TryteString(b'ABABZZDD99').as_bytes()
Traceback (most recent call last):
  File "iota.lib.py/iota/codecs.py", line 147, in decode
    + (self.index[second] * len(self.index))
ValueError: byte must be in range(0, 256)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "iota.lib.py/iota/types.py", line 421, in as_bytes
    return decode(self._trytes, 'trytes', errors)
  File "iota.lib.py/iota/codecs.py", line 165, in decode
    'input': input,
iota.codecs.TrytesDecodeError: 'trytes' codec can't decode trytes ZZ at position 4-5: ordinal not in range(255)
@todofixthis
Copy link
Contributor

todofixthis commented Aug 26, 2017

Hey @th0br0. This is actually expected behaviour — the tryte sequence ZZ represents value -28, which cannot be converted into a single byte.

>>> from iota.types import int_from_trits, TryteString

>>> int_from_trits(TryteString('ZZ').as_trits())
-28

>>> bytearray([-28])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: byte must be in range(0, 256)
byte must be in range(0, 256)

In order to convert this sequence into a byte string, you must instruct PyOTA to handle invalid sequences differently (this is very similar to how string.decode('utf-8') handles invalid bytes):

>>> from iota import TryteString

>>> TryteString(b'ABABZZDD99').as_bytes('strict')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/phx/Documents/pyota/iota/types.py", line 498, in as_bytes
    return decode(self._trytes, 'trytes', errors)
  File "/Users/phx/Documents/pyota/iota/codecs.py", line 165, in decode
    'input': input,
iota.codecs.TrytesDecodeError: 'trytes' codec can't decode trytes ZZ at position 4-5: ordinal not in range(255)
'trytes' codec can't decode trytes ZZ at position 4-5: ordinal not in range(255)

>>> TryteString(b'ABABZZDD99').as_bytes('replace')
b'77?p\x00'

>>> TryteString(b'ABABZZDD99').as_bytes('ignore')
b'77p\x00'

@th0br0
Copy link
Author

th0br0 commented Aug 26, 2017

I beg to differ. I'm not well-versed enough in Python3 codecs to pinpoint exactly where things need to change, but:
as_bytes() should split the trits into chunks of up to 5 trits each and then essentially do:

value = 0
RADIX = 3
chunk = [-1,1,1,1,-1]
for c in chunk:
  value = value * RADIX + c

the resulting value is signed (-69) can then be converted into an unsigned byte via value & 0xFF == 187. If the final chunk has less than 5 elements, expect them to be 0 (but multiply by the RADIX anyway). See https://github.com/iotaledger/iota.lib.java/blob/master/src/main/java/jota/utils/Converter.java#L56 for the Java impl.

Decomposition can happen either via a lookup table computed at library load, e.g. https://github.com/iotaledger/iota.lib.java/blob/master/src/main/java/jota/utils/Converter.java#L37,
or have a look at the code at the bottom. It's equal to what the Java code does.

Note that you need to store the number of encoded trits for decomposition of multiple bytes to work.

This is what both the Java & Rust libraries do, I don't think that the JS library currently implements byte conversion. The current Python library can't even do iota.TryteString(b'Z').as_bytes() ?!

RADIX = 3
MAX_TRIT_VALUE = 1
MIN_TRIT_VALUE = -1

def increment(trits):
    for i in range(5):
        trits[i] += 1
        if trits[i] > MAX_TRIT_VALUE:
            trits[i] = MIN_TRIT_VALUE
        else:
            break
    return trits


if __name__ == '__main__':
    trits = [0] * 5
    for i in range(243):
        value = 0
        # to_bytes
        for t in trits[::-1]:
            value = value * RADIX + t

        print("%s=%d"% (trits, value & 0xFF))

        # to_trits
        # there's 243 different value combinations
        # this is just easier than having to worry about signedness when converting back...
        value += 121

        chunk = []
        for _ in range(5):
            rem = value % RADIX
            chunk.append(int(rem))
            value = (value - rem) / RADIX
        chunk = [x - 1 for x in chunk]

        assert(trits == chunk)

        trits = increment(trits)

@todofixthis todofixthis reopened this Aug 27, 2017
@todofixthis
Copy link
Contributor

todofixthis commented Aug 27, 2017

Thanks for the thorough explanation! I see now what I was missing before.

TryteString.as_bytes() was modelled after the JS lib's fromTrytes function. But, it's not a proper "bytes from trytes" implementation, as you've noted.

I like the idea of implementing a proper trytes -> bytes conversion process (and I'm excited to finally see some maths showing how it works!); maybe we can rename TryteString.as_bytes() to as_ascii_encoded or similar, and then as_bytes can do the proper trytes -> bytes conversion.

Given everything that's going on with PyOTA right now, I think this will have to wait until v2.1 (v2.0 has the new hash algorithm and multisig — there's already enough backwards-incompatible changes for one release!).

In the meantime, I think you can get the result you're looking for by first calling TryteString.as_trits() and then using the code you posted to convert to byte values.

@todofixthis todofixthis self-assigned this Aug 27, 2017
@todofixthis todofixthis added this to Backlog in PyOTA Aug 27, 2017
@todofixthis todofixthis changed the title TryteString.as_bytes() does not work as intended TryteString.as_bytes() does not work as expected Sep 24, 2017
@todofixthis todofixthis moved this from Backlog to Scheduled v2.1.x in PyOTA Sep 24, 2017
@todofixthis todofixthis moved this from Scheduled v2.1.x to In Progress in PyOTA Oct 14, 2017
@todofixthis
Copy link
Contributor

todofixthis commented Oct 14, 2017

Hey @th0br0 I'm starting to work on this now. Would you be willing to help me come up with some test cases for this functionality?

There's a few scenarios that I'm interested in:

  • At least two "happy path" scenarios (one long sequence, one short sequence), both directions (trits <-> bytes).
  • Negative values.
  • Zero/empty values:
    • Should trits [0] be converted to bytes b'' or b'\0'?
    • What about [0, 0, 0, ...]?
    • What about []?
  • Edge cases for 5-trits-per-byte (from what I understand, it's not exactly 5 trits to 1 byte — are there any sequences of trits or bytes that we can use to demonstrate this?).
  • Others?

@alon-e @paulhandy Would love to get your thoughts on this as well. What are some unit tests we can use for trits <-> bytes conversion?

todofixthis added a commit that referenced this issue Oct 15, 2017
The previous name will still work, but it is deprecated and will be removed in PyOTA v2.1.
@mlouielu
Copy link
Contributor

Hi all, I'm currently working on Python reading rocksdb data, and it is relative to this topic.

Problem

I think TryteString naming and lack of documentation only confusing Python users.

  1. TryteString should act like str, not bytes:

the name TryteString should just act like the str object in Python, not bytes, otherwise, it should be named as TryteBytes.

  1. TryteString's method from_string, as_string, from_bytes, as_bytes confuse:

from_string will encode str with utf-8, but iota.TryteString(str) only accept tryte alphabet, not act like from_string. from_bytes take normal bytes, but not the bytes from trits.

  1. TryteString misused bytes:

In Python, bytes is kind of representation of codepoint, that is, '你好' in UTF-8 will be 'b'\xe4\xbd\xa0\xe5\xa5\xbd', but in TryteString, we use bytes as a limited check for the input in the trytes alphabet range, and make it confuse.

For examle, TryteString('HELLO') == TryteString(b'HELLO').

For TryteString, its bytes should be TryteString convert to bytes, e.g. TryteString('EXAMPLE').as_bytes() to b'\xb4T\xa1\xa0\x01'

How to fixed it

  1. Deprecate the usage of TryteString.

TryteString internal representation as str (trytes), and correct the behavior of the method. This will be a big change since all around the library was using TryteString with bytes.

  • Good
TryteString('EXAMPLE')                                   # input: trytes: EXAMPLE, output "EXAMPLE"
TryteString('OBGCKBWBZBVBOB').as_string() # input: trytes, output "EXAMPLE"
TryteString.from_string('EXAMPLE')             # input: str: EXAMPLE, output "OBGCKBWBZBVBOB"
TryteString.from_string('你好')                      # input: str: 你好, output: "LH9GYEMHCF9G"
TryteString.from_bytes(b'\xb4T\xa1\xa0\x01')  # input: tryte-bytes, output "EXAMPLE"
bytes(TryteString('EXAMPLE'))                      # input: trytes, output b'EXAMPLE'
  • Bad
TryteString('你好')                  # input should be trytes-string (in trytes-alphabet)
TryteString('@@##HELLO')  # input should be trytes-string (in trytes-alphabet)
TryteString(b'EXAMPLE')       # input should not be bytes
TryteString.from_string(b'EXAMPLE')  input should be str type
TryteString.from_bytes(b'EXAMPLE')   # input should be tryte-bytes (trytes-string binary format)
  1. Rename TryteString to TryteBytes, and correct from_bytes, as_bytes behavior.

@todofixthis
Copy link
Contributor

todofixthis commented Oct 31, 2017

Hey @mlouielu thanks for the info and the suggestions!

I see what you're getting at with TryteBytes. I don't think we need to go that far, but there is definitely room for improvement around the way that TryteString handles conversion to/from str and bytes values.

Let's figure out the trits <-> bytes conversion first, and then we can move on to trytes <-> characters.

The most important thing we need right now to fix this issue is test cases. Could you help with #62 (comment) ?

mlouielu added a commit to mlouielu/iota.lib.py that referenced this issue Oct 31, 2017
As for issue iotaledger#62, this is the test cases for converting
TryteString to bytes, and bytes to TryteString.
@todofixthis
Copy link
Contributor

Thanks @mlouielu for the test cases!! I will look at them this weekend, and I'll let you know if I have any questions.

@vbakke
Copy link

vbakke commented Dec 27, 2017

Hi @todofixthis

Your latest commit regarding trytes was naming only. Is that correct? Or did you change some of the behaviour as well?

There is a lot of concurrent work on the byte <-> tryte conversion. Both in #62 and #90 by you and @th0br0, and @mlouielu in the rocksdb.

I work on encrypting IOTA seeds (somewhat similar to BIP38), and @dashengsun (with feedback from @paulhand) work on Unicode to trytes in the iota.lib.js and @pRizz has published pRizz/Tryte-UTF8-JSON-Codec, @tuupola has tuupola/trytes (for PHP) and @knarz has knarz/trytes (for Rust).

Maybe we should coordinate a little, share the good ideas, and pin point difficulties before they get published and confuse the rest of the community? Because tryte <-> byte conversion is not trivial.

Square pegs and round holes
The English saying about fitting a square peg in a round hole is very describing. There will be trouble with the function that converts back to the original type.

We probably should not pollute issue #62 with this discussion. Please suggest a better place. If we lack better options, I just now created vbakke/trytes where we can discuss this freely.

@todofixthis
Copy link
Contributor

Hey @vbakke About 4 months ago, I started work on this issue. I thought we'd have it wrapped up in a couple of weeks, so I made some changes to the codebase to make room for the new codec and mark the existing codec as deprecated.

However, at this point, I'm still not sure that we have a bytes <-> trytes conversion process that can handle all edge cases (e.g., bytes in range [244, 255], certain cases where encode(decode(value)) != value, etc.), so I haven't made much progress on this issue.

@vbakke
Copy link

vbakke commented Jan 1, 2018

Hi @todofixthis ,

I've been spending most of my spare time during xmas looking at this as well. I just now published my latest thoughts on the issue, as well as working example code (JavaScript, though).

I think it covers all corner cases, and includes Unicode text to tryte strings, including a BOM mark to indicate the encoding, if used.

Feel free to have a look at my latest write up of the Readme.md. I don't think there is a universal two-way byte <->tryte conversion.

@todofixthis
Copy link
Contributor

todofixthis commented Jan 3, 2018

Thanks @vbakke! I'll have a look at your notes later this week. Quick glance shows that they are very thorough; looking forward to going through them!

@vbakke
Copy link

vbakke commented Jan 4, 2018

FYI I just added an online version of encoding native strings, and native trytes at https://vbakke.github.io/trytes/

@todofixthis todofixthis moved this from In Progress to Scheduled v2.1.x in PyOTA Jan 6, 2018
@todofixthis todofixthis moved this from Scheduled v2.1.x to Blocked / On Hold in PyOTA Jan 7, 2018
marko-k0 pushed a commit to marko-k0/iota.lib.py that referenced this issue Jul 28, 2018
The previous name will still work, but it is deprecated and will be removed in PyOTA v2.1.
marko-k0 pushed a commit to marko-k0/iota.lib.py that referenced this issue Jul 28, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
PyOTA
  
Blocked / On Hold
Development

No branches or pull requests

4 participants