Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BufferedIncrementalEncoder violates IncrementalEncoder interface #64619

Open
serhiy-storchaka opened this issue Jan 28, 2014 · 4 comments
Open
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@serhiy-storchaka
Copy link
Member

BPO 20420
Nosy @malemburg, @loewis, @doerwalter, @vadmium, @serhiy-storchaka

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2014-01-28.16:34:45.636>
labels = ['type-bug', 'library', '3.9', '3.10', '3.11']
title = 'BufferedIncrementalEncoder violates IncrementalEncoder interface'
updated_at = <Date 2021-12-09.22:10:24.931>
user = 'https://github.com/serhiy-storchaka'

bugs.python.org fields:

activity = <Date 2021-12-09.22:10:24.931>
actor = 'iritkatriel'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2014-01-28.16:34:45.636>
creator = 'serhiy.storchaka'
dependencies = []
files = []
hgrepos = []
issue_num = 20420
keywords = []
message_count = 4.0
messages = ['209563', '209791', '222473', '234164']
nosy_count = 5.0
nosy_names = ['lemburg', 'loewis', 'doerwalter', 'martin.panter', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue20420'
versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

@serhiy-storchaka
Copy link
Member Author

The documentation of IncrementalEncoder.getstate() says:

"""
Return the current state of the encoder which must be an integer. The implementation should make sure that 0 is the most common state. (States that are more complicated than integers can be converted into an integer by marshaling/pickling the state and encoding the bytes of the resulting string into an integer).
"""

But implementation of BufferedIncrementalEncoder.getstate() is

    def getstate(self):
        return self.buffer or 0

self.buffer is "unencoded input that is kept between calls to encode()", e.g. a string.

@serhiy-storchaka serhiy-storchaka added the type-bug An unexpected behavior, bug, or error label Jan 28, 2014
@doerwalter
Copy link
Contributor

I dug up an ancient email about that subject:

>> However, I've discovered that BufferedIncrementalEncoder.getstate()
>> doesn't match the specification (i.e. it returns the buffer, not an
>> int). However this class is unused (and probably useless, because it
>> doesn't make sense to delay encoding the input). The simplest solution
>> would be to simply drop the class.
>
> Sounds like a plan; go right ahead!

Oops, there *is* one codec that uses it: The idna encoder. It buffers
the input until a '.' is encountered (or encode() is called with
final==True) and then encodes this part.

Either the idna encoder encodes the unencoded input as a int, or we drop
the specification that encoder.getstate() must return an int, or we
change it to mirror the decoder specification (i.e. return a
(buffered_input, additional_state_info) tuple.

(A more radical solution would be to completely drop the incremental
codecs for idna).

Maybe we should wait and see how the implementation of writing turns out?

And indeed the incremental encoder for idna behaves strange:

>>> import io
>>> b = io.BytesIO()
>>> s = io.TextIOWrapper(b, 'idna')
>>> s.write('x')
1
>>> s.tell()
0
>>> b.getvalue()
b''
>>> s.write('.')
1
>>> s.tell()
2
>>> b.getvalue()
b'x.'
>>> b = io.BytesIO()
>>> s = io.TextIOWrapper(b, 'idna')
>>> s.write('x')
1
>>> s.seek(s.tell())
0
>>> s.write('.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/walter/.local/lib/python3.3/codecs.py", line 218, in encode
    (result, consumed) = self._buffer_encode(data, self.errors, final)
  File "/Users/walter/.local/lib/python3.3/encodings/idna.py", line 246, in _buffer_encode
    result.extend(ToASCII(label))
  File "/Users/walter/.local/lib/python3.3/encodings/idna.py", line 73, in ToASCII
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long

The cleanest solution might probably by to switch to a (buffered_input, additional_state_info) state.

However I don't know what changes this would require in the seek/tell imlementations.

@serhiy-storchaka
Copy link
Member Author

IncrementalNewlineDecoder requires that decoder state is integer (C implementation requires at most 63-bit unsigned integer). TextIOWrapper requires that decoder state is at most 64-bit unsigned integer (only 63-bit if universal newlines is enabled).

@vadmium
Copy link
Member

vadmium commented Jan 17, 2015

For what it’s worth, both io.TextIOWrapper and _pyio.TextIOWrapper appear to only ever call IncrementalEncoder.setstate(0). And the newline _decoder_ is not relevant because it doesn’t use any _encoder_.

@iritkatriel iritkatriel added stdlib Python modules in the Lib dir 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes labels Dec 9, 2021
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
Development

No branches or pull requests

4 participants