
`TryteString.as_string` needs some re-branding #90

Closed
todofixthis opened this Issue Oct 31, 2017 · 5 comments

2 participants
Collaborator

todofixthis commented Oct 31, 2017

The TryteString.as_string method desperately needs to be renamed; a lot of users are confusing it with __str__.

@todofixthis todofixthis added this to Backlog in PyOTA via automation Oct 31, 2017

todofixthis Oct 31, 2017

Collaborator

My recommendation is to call it decode:

  • Users will be familiar with this method because it is built into Python strings (and TryteString is supposed to "feel" like the Tangle version of a Python string).
  • Users who are familiar with how bytes.decode works will also be able to grasp more easily that this is a "trytes -> bytes -> characters" process (especially once #62 is implemented), so they are less likely to confuse it with __str__.
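A minimal sketch of that "trytes -> bytes -> characters" pipeline, in plain Python. It assumes the legacy ASCII tryte codec (two trytes per byte: `byte % 27`, then `byte // 27`, over the alphabet `9A-Z`); `trytes_to_bytes` and `decode_trytes` are hypothetical helper names, not PyOTA API.

```python
# Tryte alphabet: '9' stands for 0, 'A'..'Z' for 1..26.
TRYTE_ALPHABET = '9ABCDEFGHIJKLMNOPQRSTUVWXYZ'


def trytes_to_bytes(trytes: str) -> bytes:
    """Step 1 (hypothetical helper): trytes -> bytes, two trytes per byte."""
    if len(trytes) % 2:
        raise ValueError('ASCII-encoded trytes must have an even length')
    out = bytearray()
    for i in range(0, len(trytes), 2):
        low = TRYTE_ALPHABET.index(trytes[i])       # byte % 27
        high = TRYTE_ALPHABET.index(trytes[i + 1])  # byte // 27
        value = low + high * 27
        if value > 255:
            raise ValueError(f'{trytes[i:i + 2]!r} does not encode a byte')
        out.append(value)
    return bytes(out)


def decode_trytes(trytes: str, encoding: str = 'utf-8') -> str:
    """Steps 1 + 2 (hypothetical): trytes -> bytes -> characters."""
    return trytes_to_bytes(trytes).decode(encoding)


print(decode_trytes('OBGCKBWBZBVBOB'))  # EXAMPLE
```

Keeping the two steps separate is what makes the name `decode` natural: the second step is literally `bytes.decode`.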

@todofixthis todofixthis moved this from Backlog to Scheduled v2.1.x in PyOTA Oct 31, 2017

@todofixthis todofixthis changed the title from `TryteString.as_string` needs a makeover to `TryteString.as_string` needs some re-branding Oct 31, 2017

mlouielu Nov 1, 2017

Contributor

Conclusion

+1 for renaming as_string to decode, and as_bytes to encode.

Here is how I think TryteString should behave:

>>> import iota
>>> ts = iota.codecs.encode(b'EXAMPLE'.decode('ascii'), 'utf-8')  # Returns a TryteString
>>> ts = iota.codecs.encode('EXAMPLE', 'utf-8')                    # Returns a TryteString
>>> ts = iota.TryteString.from_string('EXAMPLE')
>>> ts = iota.TryteString.from_bytes(b'*\x15d\x96\xb5\x121\x8b\x01')
>>> ts = iota.TryteString('OBGCKBWBZBVBOB')
>>> ts
iota.TryteString('OBGCKBWBZBVBOB')
>>> ts.encode()                          # encode "tryte-string" to "tryte-in-bytes"
b'*\x15d\x96\xb5\x121\x8b\x01'
>>> ts.decode('utf-8')                   # decode "tryte-string" to "str"
>>> ts.decode()                          # defaults to utf-8
'EXAMPLE'
>>> str(ts)
'OBGCKBWBZBVBOB'
>>> bytes(ts)                           # Not b'OBGCKBWBZBVBOB'
b'*\x15d\x96\xb5\x121\x8b\x01'

Explanation

Users are confused by 'EXAMPLE', iota.Hash('EXAMPLE'), iota.Hash(b'EXAMPLE'), str(iota.Hash('EXAMPLE')) and bytes(iota.Hash('EXAMPLE')). What is the difference between them?

  • 'EXAMPLE': a string; it may be a tryte string or an ordinary Python string
  • iota.Hash('EXAMPLE'): a TryteString whose value is initialized from 'EXAMPLE'
  • iota.Hash(b'EXAMPLE'): a TryteString whose value is initialized from b'EXAMPLE' (this is the same as 'EXAMPLE')
  • str(iota.Hash('EXAMPLE')): the tryte string as a str, taken from iota.TryteString('EXAMPLE')
  • bytes(iota.Hash('EXAMPLE')): the tryte string as bytes, taken from iota.TryteString('EXAMPLE')

The point is that TryteString.__init__ accepts either str or bytes; here, both str and bytes represent a "tryte string".

No decoding or encoding is involved. So it makes sense that str(iota.Hash('EXAMPLE')) is 'EXAMPLE' and bytes(iota.Hash('EXAMPLE')) is b'*\x15d\x96\xb5\x121\x8b\x01'.


But from_string and as_string do involve encoding/decoding: from_string encodes the input string to UTF-8 and passes the result to from_bytes. Therefore, these are equivalent:

>>> iota.Hash.from_string('妳好') == iota.Hash.from_bytes('妳好'.encode('utf-8'))
True
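The equivalence above follows from `from_string` simply delegating to `from_bytes`. A toy sketch of that relationship, assuming the legacy ASCII tryte codec inside `from_bytes`; `ToyTryteString` is a hypothetical stand-in, not PyOTA's `Hash` or `TryteString`:

```python
TRYTE_ALPHABET = '9ABCDEFGHIJKLMNOPQRSTUVWXYZ'


class ToyTryteString:
    """Hypothetical stand-in for TryteString; stores trytes as a str."""

    def __init__(self, trytes: str) -> None:
        self.trytes = trytes

    def __eq__(self, other: object) -> bool:
        return isinstance(other, ToyTryteString) and self.trytes == other.trytes

    @classmethod
    def from_bytes(cls, data: bytes) -> 'ToyTryteString':
        # Accepts *any* bytes: each byte becomes two trytes (legacy ASCII codec).
        return cls(''.join(
            TRYTE_ALPHABET[b % 27] + TRYTE_ALPHABET[b // 27] for b in data
        ))

    @classmethod
    def from_string(cls, text: str) -> 'ToyTryteString':
        # str -> UTF-8 bytes -> trytes: an encode step is hidden in here.
        return cls.from_bytes(text.encode('utf-8'))


assert ToyTryteString.from_string('妳好') == ToyTryteString.from_bytes('妳好'.encode('utf-8'))
```

Because `from_bytes` accepts arbitrary payload bytes, the hidden `text.encode('utf-8')` is the only thing distinguishing the two constructors.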

The deeper problem comes from from_bytes: it does not take a "tryte string in bytes format", but any bytes at all.


As I see it, we are mixing two different conversions in one type. We want something like str/bytes -> tryte-string -> TryteString -> bytes, and separately tryte-string (in str or bytes) -> TryteString.

# str/bytes -> tryte-string
"This is a message from GitHub" -> "CCWCXCGDEAXCGDEAPCEAADTCGDGDPCVCTCEAUCFDCDADEAQBXCHDRBIDQC"

# tryte-string -> TryteString
"CCWCXCGDEAXCGDEAPCEAADTCGDGDPCVCTCEAUCFDCDADEAQBXCHDRBIDQC" -> TryteString("CCWCXCGDEAXCGDEAPCEAADTCGDGDPCVCTCEAUCFDCDADEAQBXCHDRBIDQC")

# TryteString -> bytes
TryteString("CCWCXCGDEAXCGDEAPCEAADTCGDGDPCVCTCEAUCFDCDADEAQBXCHDRBIDQC") -> b'T\xf4\xe6\xcd\xbc\x0bNf.\xcb\xb7\x0bm\xeb@\xce^\x17L\xeb.R\x08&oT.\xe6\x05\x1at\x94R\xe9\x08'

---------

# str/bytes -> tryte-string
"EXAMPLE" -> "OBGCKBWBZBVBOB"

# tryte-string -> TryteString
"OBGCKBWBZBVBOB" -> TryteString("OBGCKBWBZBVBOB")
# tryte-string -> TryteString
"EXAMPLE" -> TryteString("EXAMPLE")
b"EXAMPLE" -> TryteString("EXAMPLE")
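The first arrow (str/bytes -> tryte-string) in these examples can be reproduced with the legacy ASCII tryte codec, which turns each byte into two trytes (`byte % 27`, then `byte // 27`). A sketch under that assumption; `to_tryte_string` is a hypothetical helper, and str input is assumed to be UTF-8 encoded first:

```python
from typing import Union

TRYTE_ALPHABET = '9ABCDEFGHIJKLMNOPQRSTUVWXYZ'


def to_tryte_string(value: Union[str, bytes]) -> str:
    """str/bytes -> tryte-string (hypothetical helper, legacy ASCII codec)."""
    data = value.encode('utf-8') if isinstance(value, str) else value
    # Two trytes per byte: the low "digit" first, then the high one.
    return ''.join(TRYTE_ALPHABET[b % 27] + TRYTE_ALPHABET[b // 27] for b in data)


print(to_tryte_string('EXAMPLE'))  # OBGCKBWBZBVBOB
print(to_tryte_string(b'This is a message from GitHub'))
```

The output is still just a tryte string (plain characters); wrapping it in a TryteString is the separate second arrow.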

BTW, @todofixthis, you are using Python 2, right? In Python 3, str can only be encoded to bytes, and bytes can only be decoded to str; str has no decode method at all. I think that's why I'm stuck on TryteString.decode(): if TryteString acts like a Python 3 string, it can't decode...
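The Python 3 asymmetry described above is easy to confirm in a plain interpreter; this is standard-library behaviour, not part of any proposal:

```python
# Standard Python 3 behaviour; no PyOTA involved.
text = 'EXAMPLE'
data = text.encode('utf-8')          # str -> bytes

print(type(data).__name__)           # bytes
print(data.decode('utf-8'))          # EXAMPLE (bytes -> str)
print(hasattr(text, 'decode'))       # False: str has no decode() in Python 3
print(hasattr(data, 'encode'))       # False: bytes has no encode() in Python 3
```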


todofixthis Nov 1, 2017

Collaborator

These are great ideas, thanks @mlouielu !


tl;dr version: Overall, I think we're in agreement; I just have a couple of minor changes to request:

  • TryteString.__str__ and TryteString.__bytes__ can stay the way they are.
  • Use built-in codecs.{de,en}code instead of iota.codecs.{de,en}code. Depending on how PyOTA is using these functions internally, this part could get a bit complicated; might want to wait until we tackle #62.
  • Make sure we can support the legacy ASCII codec (this is more applicable to #62 though).

Changes to be made:

  • Rename TryteString.as_bytes to encode.
  • Rename TryteString.as_string to decode.

Everything else can stay the way it is — we'll make additional changes for #62, but for #90, I think we only need to rename a couple of methods.


Let's tackle this one item at a time:

1. __init__

I like the idea of TryteString('FOO') == TryteString(b'FOO'). In fact, this is what PyOTA does currently.
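One way to get `TryteString('FOO') == TryteString(b'FOO')` is to normalize `bytes` input to `str` in `__init__` before validating it. A minimal sketch; `SketchTryteString` is hypothetical and only approximates what PyOTA's constructor does:

```python
from typing import Union

TRYTE_ALPHABET = '9ABCDEFGHIJKLMNOPQRSTUVWXYZ'


class SketchTryteString:
    """Hypothetical: accepts a tryte string as either str or bytes."""

    def __init__(self, trytes: Union[str, bytes]) -> None:
        if isinstance(trytes, bytes):
            # These bytes are the *ASCII characters* of the trytes, not payload data.
            trytes = trytes.decode('ascii')
        bad = set(trytes) - set(TRYTE_ALPHABET)
        if bad:
            raise ValueError(f'invalid tryte characters: {sorted(bad)}')
        self._trytes = trytes

    def __eq__(self, other: object) -> bool:
        return isinstance(other, SketchTryteString) and self._trytes == other._trytes

    def __repr__(self) -> str:
        return f'SketchTryteString({self._trytes!r})'


assert SketchTryteString('FOO') == SketchTryteString(b'FOO')
```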

2. __str__ and __bytes__

To be consistent, __str__ and __bytes__ should either:

Return the ASCII representation of the trytes:

  • str(TryteString('ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM')) == 'ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM'
  • bytes(TryteString('ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM')) == b'ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM'

OR return the binary representation of the trytes:

  • str(TryteString('ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM')) == '你好，世界！'
  • bytes(TryteString('ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM')) == b'\xe4\xbd\xa0\xe5\xa5\xbd\xef\xbc\x8c\xe4\xb8\x96\xe7\x95\x8c\xef\xbc\x81'

I think the former satisfies the Principle of Least Astonishment. Additionally, it conforms to the Zen of Python ("There should be one-- and preferably only one --obvious way to do it.") because we will use encode/decode to get binary representations of TryteStrings anyway.
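A sketch of the preferred option (the ASCII representation), assuming the tryte characters are stored internally as a `str`; `SketchTryteString` is hypothetical. The binary view is kept out of `__str__`/`__bytes__`, leaving it to explicit `encode()`/`decode()` methods:

```python
class SketchTryteString:
    """Hypothetical: str() and bytes() expose the trytes themselves."""

    def __init__(self, trytes: str) -> None:
        self._trytes = trytes

    def __str__(self) -> str:
        # ASCII representation as text: the tryte characters, unchanged.
        return self._trytes

    def __bytes__(self) -> bytes:
        # ASCII representation as bytes: the same characters, ASCII-encoded.
        # Binary payloads are left to explicit encode()/decode() methods.
        return self._trytes.encode('ascii')


ts = SketchTryteString('ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM')
assert str(ts) == 'ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM'
assert bytes(ts) == b'ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM'
```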

3. iota.codecs.encode and iota.codecs.decode

This is not necessary, as we can leverage Python's built-in codecs system.

To decode bytes into trytes:

>>> from codecs import encode, decode

>>> bytes_ = '你好，世界！'.encode('utf-8')
>>> decode(bytes_, 'trytes_binary')
TryteString('ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM')

# Using legacy ASCII codec:
>>> decode(bytes_, 'trytes_ascii')
TryteString('LH9GYEMHCF9GWHZFEELHVFOEOHNEEEWHZFUD')

To encode strings into trytes:

>>> str_ = '你好，世界！'
>>> encode(str_, 'trytes_binary')
TryteString('ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM')

# Using legacy ASCII codec:
>>> encode(str_, 'trytes_ascii')
TryteString('LH9GYEMHCF9GWHZFEELHVFOEOHNEEEWHZFUD')

Note: PyOTA already uses decode and encode internally to convert some values, so we might have to get creative here.
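Python's codec registry does accept third-party codec names, via `codecs.register` and a search function that returns a `codecs.CodecInfo`. The sketch below registers a hypothetical `trytes_ascii_demo` codec that mirrors the proposed `trytes_ascii` behaviour (raw bytes, or text encoded as UTF-8, in; tryte string out). It returns a plain `str` rather than a `TryteString`, and the binary codec is not sketched:

```python
import codecs
from typing import Optional, Tuple

TRYTE_ALPHABET = '9ABCDEFGHIJKLMNOPQRSTUVWXYZ'


def _to_trytes(data: bytes) -> str:
    # Legacy ASCII scheme: two trytes per byte (byte % 27, then byte // 27).
    return ''.join(TRYTE_ALPHABET[b % 27] + TRYTE_ALPHABET[b // 27] for b in data)


def _encode(text: str, errors: str = 'strict') -> Tuple[str, int]:
    # codecs.encode(str_, ...): text -> UTF-8 bytes -> trytes.
    return _to_trytes(text.encode('utf-8', errors)), len(text)


def _decode(data: bytes, errors: str = 'strict') -> Tuple[str, int]:
    # codecs.decode(bytes_, ...): raw bytes -> trytes.
    return _to_trytes(bytes(data)), len(data)


def _search(name: str) -> Optional[codecs.CodecInfo]:
    # Called with the normalized (lower-case) codec name; None means "not mine".
    if name == 'trytes_ascii_demo':
        return codecs.CodecInfo(_encode, _decode, name='trytes_ascii_demo')
    return None


codecs.register(_search)

bytes_ = '你好，世界！'.encode('utf-8')
print(codecs.decode(bytes_, 'trytes_ascii_demo'))  # LH9GYEMHCF9GWHZFEELHVFOEOHNEEEWHZFUD
print(codecs.encode('你好，世界！', 'trytes_ascii_demo'))
```

Registering a search function is global to the interpreter, which is one reason PyOTA's own internal use of these names needs care.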

4. TryteString.from_bytes and TryteString.from_string

I think we're in alignment here; I just need to make one minor tweak, because we also have to support the legacy ASCII codec. See next section.

5. TryteString.encode replaces TryteString.as_bytes

I like the rename, and I think it will resonate with Python users; it is the reverse of decode(bytes_, 'trytes_binary') from the example above:

  • decode(bytes_, 'trytes_binary').encode('trytes_binary') == bytes_
  • decode(bytes_, 'utf-8').encode('utf-8') == bytes_

We will need to support the legacy ASCII codec, so there needs to be an optional argument to that method:

# Using binary codec (default):
>>> TryteString('ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM').encode()
>>> TryteString('ZQEHP9QXNTJHDBNZZCEOBHRBNJHDWM').encode('trytes_binary')
b'\xe4\xbd\xa0\xe5\xa5\xbd\xef\xbc\x8c\xe4\xb8\x96\xe7\x95\x8c\xef\xbc\x81'

# Using legacy ASCII codec:
>>> TryteString('LH9GYEMHCF9GWHZFEELHVFOEOHNEEEWHZFUD').encode('trytes_ascii')
b'\xe4\xbd\xa0\xe5\xa5\xbd\xef\xbc\x8c\xe4\xb8\x96\xe7\x95\x8c\xef\xbc\x81'
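A sketch of an `encode()` that takes an optional codec name, assuming a lookup table of tryte-to-bytes converters. Only the legacy `trytes_ascii` direction is implemented, so it is the default here (unlike the proposal, where `trytes_binary` is the default); the class and helper names are hypothetical:

```python
TRYTE_ALPHABET = '9ABCDEFGHIJKLMNOPQRSTUVWXYZ'


def _ascii_trytes_to_bytes(trytes: str) -> bytes:
    # Inverse of the legacy ASCII codec: each pair (low, high) -> low + 27 * high.
    if len(trytes) % 2:
        raise ValueError('odd number of trytes')
    pairs = (trytes[i:i + 2] for i in range(0, len(trytes), 2))
    values = [TRYTE_ALPHABET.index(lo) + 27 * TRYTE_ALPHABET.index(hi) for lo, hi in pairs]
    return bytes(values)  # raises ValueError if a pair decodes to more than 255


class SketchTryteString:
    """Hypothetical TryteString whose encode() takes a codec name."""

    # Only the legacy codec is sketched; a 'trytes_binary' entry would go here too.
    _CODECS = {'trytes_ascii': _ascii_trytes_to_bytes}

    def __init__(self, trytes: str) -> None:
        self._trytes = trytes

    def encode(self, codec: str = 'trytes_ascii') -> bytes:
        try:
            convert = self._CODECS[codec]
        except KeyError:
            raise LookupError(f'unknown tryte codec: {codec!r}') from None
        return convert(self._trytes)


bytes_ = '你好，世界！'.encode('utf-8')
ts = SketchTryteString('LH9GYEMHCF9GWHZFEELHVFOEOHNEEEWHZFUD')
assert ts.encode('trytes_ascii') == bytes_  # round trip back to the original bytes
```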

6. TryteString.decode replaces TryteString.as_string

The same comments apply as in the previous section.


@todofixthis todofixthis moved this from Scheduled v2.1.x to In Progress in PyOTA Jan 6, 2018

todofixthis added a commit that referenced this issue Jan 6, 2018

[#90] Renamed {en,de}coding methods.
- `TryteString.from_string` is now `from_unicode`.
- `TryteString.as_string` is now `decode`.
- `TryteString.as_bytes` is now `encode`.
- Original methods are still available, but deprecated.
todofixthis Jan 6, 2018

Collaborator

Summary of changes:

  • Rename TryteString.from_string to from_unicode.
  • Rename TryteString.as_bytes to encode.
  • Rename TryteString.as_string to decode.
  • Add deprecated versions of the renamed functions.
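One common way to keep old names working while steering users to the new ones is a thin alias that emits a `DeprecationWarning`. The sketch below is hypothetical (it is not the code from the commit referenced above), assumes the legacy ASCII codec inside `encode()` and an even number of trytes, and covers only `as_bytes`/`as_string`; `from_string` -> `from_unicode` would follow the same pattern:

```python
import warnings

TRYTE_ALPHABET = '9ABCDEFGHIJKLMNOPQRSTUVWXYZ'


class SketchTryteString:
    """Hypothetical: new method names plus deprecated aliases."""

    def __init__(self, trytes: str) -> None:
        self._trytes = trytes

    def encode(self) -> bytes:
        # New name. Legacy ASCII codec assumed: each tryte pair -> one byte.
        t = self._trytes
        return bytes(TRYTE_ALPHABET.index(t[i]) + 27 * TRYTE_ALPHABET.index(t[i + 1])
                     for i in range(0, len(t), 2))

    def decode(self, encoding: str = 'utf-8') -> str:
        # New name: trytes -> bytes -> characters.
        return self.encode().decode(encoding)

    # Deprecated aliases: old call sites keep working, but get a warning.
    def as_bytes(self) -> bytes:
        warnings.warn('as_bytes() is deprecated; use encode()',
                      DeprecationWarning, stacklevel=2)
        return self.encode()

    def as_string(self, encoding: str = 'utf-8') -> str:
        warnings.warn('as_string() is deprecated; use decode()',
                      DeprecationWarning, stacklevel=2)
        return self.decode(encoding)
```

`stacklevel=2` makes the warning point at the caller's line rather than at the alias itself.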

@todofixthis todofixthis moved this from In Progress to Pull Request Submitted in PyOTA Jan 6, 2018

todofixthis Jan 6, 2018

Collaborator

Scheduled for release: 2.0.4


@todofixthis todofixthis closed this Jan 6, 2018

PyOTA automation moved this from Pull Request Submitted to Done 2.0.4 Jan 6, 2018

@todofixthis todofixthis removed this from Done 2.0.5 in PyOTA Feb 18, 2018

redondo-mk pushed a commit to redondo-mk/iota.lib.py that referenced this issue Jul 28, 2018

[iotaledger#90] Renamed {en,de}coding methods.
- `TryteString.from_string` is now `from_unicode`.
- `TryteString.as_string` is now `decode`.
- `TryteString.as_bytes` is now `encode`.
- Original methods are still available, but deprecated.