BufferedIncrementalEncoder violates IncrementalEncoder interface #64619

serhiy-storchaka · 2014-01-28T16:34:46Z

BPO	20420
Nosy	@malemburg, @loewis, @doerwalter, @vadmium, @serhiy-storchaka

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2014-01-28.16:34:45.636>
labels = ['type-bug', 'library', '3.9', '3.10', '3.11']
title = 'BufferedIncrementalEncoder violates IncrementalEncoder interface'
updated_at = <Date 2021-12-09.22:10:24.931>
user = 'https://github.com/serhiy-storchaka'

bugs.python.org fields:

activity = <Date 2021-12-09.22:10:24.931>
actor = 'iritkatriel'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2014-01-28.16:34:45.636>
creator = 'serhiy.storchaka'
dependencies = []
files = []
hgrepos = []
issue_num = 20420
keywords = []
message_count = 4.0
messages = ['209563', '209791', '222473', '234164']
nosy_count = 5.0
nosy_names = ['lemburg', 'loewis', 'doerwalter', 'martin.panter', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue20420'
versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

serhiy-storchaka · 2014-01-28T16:34:45Z

The documentation of IncrementalEncoder.getstate() says:

"""
Return the current state of the encoder which must be an integer. The implementation should make sure that 0 is the most common state. (States that are more complicated than integers can be converted into an integer by marshaling/pickling the state and encoding the bytes of the resulting string into an integer).
"""

But implementation of BufferedIncrementalEncoder.getstate() is

    def getstate(self):
        return self.buffer or 0

self.buffer is "unencoded input that is kept between calls to encode()", e.g. a string.

doerwalter · 2014-01-31T14:18:42Z

I dug up an ancient email about that subject:

>> However, I've discovered that BufferedIncrementalEncoder.getstate()
>> doesn't match the specification (i.e. it returns the buffer, not an
>> int). However this class is unused (and probably useless, because it
>> doesn't make sense to delay encoding the input). The simplest solution
>> would be to simply drop the class.
>
> Sounds like a plan; go right ahead!

Oops, there *is* one codec that uses it: The idna encoder. It buffers
the input until a '.' is encountered (or encode() is called with
final==True) and then encodes this part.

Either the idna encoder encodes the unencoded input as a int, or we drop
the specification that encoder.getstate() must return an int, or we
change it to mirror the decoder specification (i.e. return a
(buffered_input, additional_state_info) tuple.

(A more radical solution would be to completely drop the incremental
codecs for idna).

Maybe we should wait and see how the implementation of writing turns out?

And indeed the incremental encoder for idna behaves strange:

>>> import io
>>> b = io.BytesIO()
>>> s = io.TextIOWrapper(b, 'idna')
>>> s.write('x')
1
>>> s.tell()
0
>>> b.getvalue()
b''
>>> s.write('.')
1
>>> s.tell()
2
>>> b.getvalue()
b'x.'
>>> b = io.BytesIO()
>>> s = io.TextIOWrapper(b, 'idna')
>>> s.write('x')
1
>>> s.seek(s.tell())
0
>>> s.write('.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/walter/.local/lib/python3.3/codecs.py", line 218, in encode
    (result, consumed) = self._buffer_encode(data, self.errors, final)
  File "/Users/walter/.local/lib/python3.3/encodings/idna.py", line 246, in _buffer_encode
    result.extend(ToASCII(label))
  File "/Users/walter/.local/lib/python3.3/encodings/idna.py", line 73, in ToASCII
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long

The cleanest solution might probably by to switch to a (buffered_input, additional_state_info) state.

However I don't know what changes this would require in the seek/tell imlementations.

serhiy-storchaka · 2014-07-07T16:47:41Z

IncrementalNewlineDecoder requires that decoder state is integer (C implementation requires at most 63-bit unsigned integer). TextIOWrapper requires that decoder state is at most 64-bit unsigned integer (only 63-bit if universal newlines is enabled).

vadmium · 2015-01-17T11:12:55Z

For what it’s worth, both io.TextIOWrapper and _pyio.TextIOWrapper appear to only ever call IncrementalEncoder.setstate(0). And the newline _decoder_ is not relevant because it doesn’t use any _encoder_.

serhiy-storchaka added the type-bug An unexpected behavior, bug, or error label Jan 28, 2014

iritkatriel added stdlib Python modules in the Lib dir 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes labels Dec 9, 2021

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BufferedIncrementalEncoder violates IncrementalEncoder interface #64619

BufferedIncrementalEncoder violates IncrementalEncoder interface #64619

serhiy-storchaka commented Jan 28, 2014

serhiy-storchaka commented Jan 28, 2014

doerwalter commented Jan 31, 2014

serhiy-storchaka commented Jul 7, 2014

vadmium commented Jan 17, 2015

BufferedIncrementalEncoder violates IncrementalEncoder interface #64619

BufferedIncrementalEncoder violates IncrementalEncoder interface #64619

Comments

serhiy-storchaka commented Jan 28, 2014

serhiy-storchaka commented Jan 28, 2014

doerwalter commented Jan 31, 2014

serhiy-storchaka commented Jul 7, 2014

vadmium commented Jan 17, 2015