`bytearray` treated as a subclass of `bytes` #8505

llchan · 2020-03-07T01:21:45Z

Are you reporting a bug, or opening a feature request? bug
Please insert below the code you are checking with mypy

foo: bytes = bytearray()
assert isinstance(foo, bytes)

What is the actual behavior/output?
No mypy error (and the expected AssertionError at runtime)
What is the behavior/output you expect?
Mypy error about assigning wrong type (and the expected AssertionError at runtime)
What are the versions of mypy and Python you are using?
mypy 0.761 on CPython 3.7.4
What are the mypy flags you are using? (For example --strict-optional)
--strict

Typeshed appears to be fine, with bytes and bytearray both inheriting from ByteString, but no inheritance of bytearray from bytes.

class bytes(ByteString):
    ...

class bytearray(MutableSequence[int], ByteString):
    ...

The text was updated successfully, but these errors were encountered:

llchan · 2020-03-07T01:23:10Z

I see some PRs about --strict-equality and bytes and bytearray, maybe the scope of some changes there leaked outside the intended context?

JelleZijlstra · 2020-03-07T17:17:56Z

Mypy internally promotes bytearray to bytes for convenience, similar to how int is promoted to float. I thought this was specified in PEP 484, but as it turns out, it isn't. The only mention of this behavior that I can find the mypy docs is in https://mypy.readthedocs.io/en/stable/python2.html?highlight=promotion#type-checking-python-2-code, which mentions the int-float promotion (but not the bytearray-bytes promotion) as an aside.

It's definitely intentional behavior, but it should be documented better.

srittau · 2020-03-08T15:21:12Z

It is mentioned in the typing documentation. One of the problems is that the documentation is currently split between several PEPs, the typing, and the mypy documentation.

Edit: The typing documentation implies that this promotion is valid for argument types only.

llchan · 2020-03-09T17:27:53Z

Thanks for the explanations.

I'm not sure I agree with convenience over correctness in a type system, but I can put that aside for now and be pragmatic. I wonder if we can tighten up the implementation so that bytes becomes typing.ByteString only when referring to function arguments? If I understand correctly, the genesis of this promotion was the C API's arg unpacking, so it only makes sense to apply this there. In "pure python" land, I expect for them to remain independent types, and if someone actually wants to accept either type in some other context, they can use a Union[bytes, bytearray] or typing.ByteString.

Btw, my concrete use case where bytes and bytearray are meaningfully different is when interacting with ctypes and getting a void* out of them. I'm seeing branches marked as unreachable because the isinstance differs from runtime.

llchan · 2020-03-09T17:41:30Z

Just thinking out loud here, but could we introduce a new option + CLI flag to make this more strict? I found the dict with the type promotions in semanal_classprop.py, so maybe it wouldn't actually be too hard to disable bytearray and memoryview promotion. This would allow for a more gradual opt-in transition for existing code, and would give us time to update typeshed/etc.

msullivan · 2020-03-17T00:10:51Z

This is a frustrating one. We could consider a flag, though I'd want to investigate the impact first.

One issue is that it would be hard to make it work in a per-module way with the existing implementation. If you want to submit a PR that adds a flag or changes the behavior, take a look at semanal_classprop.py. Then I'll try it out on our internal codebase and see what the damage is like there, at least, and we can try to assess how things stand.

My intuition, though, is that the damage will be moderate and manageable but that maybe the scope of this problem is still not big enough to be worth doing that fixing.

JukkaL · 2020-03-20T15:22:35Z

The main issue I see with changing this (even with an opt-in flag) is that existing stubs assume that you don't need to write Union[bytes, bytearray] for function arguments that accept both types. The number of false positives could still be fairly low, but that would be because bytearray isn't used that much. In a multi-million-line codebase I found only around 150 references to bytearray. There were over 8k references to bytes.

llchan · 2020-03-21T17:47:02Z

I don't expect this to be a trivial change that will happen overnight, but perhaps we can move in that direction. If there are issues with stubs, they can be updated over time to accept to the union. Because these changes are backwards compatible for users running without the opt-in flag, we don't need to resolve everything all at once and can update them as we go.

The 8k references to bytes are somewhat of an upper bound on the changes that would need to be made. I think only function signatures should get the magical type promotion, vs normal variables etc where the type is actually bytes specifically. In theory if we are brave, we could find and replace all bytes with the union, which is essentially what it's doing right now. If this makes us uneasy, it should indicate that the type promotion is maybe doing something unsafe.

It's important to be able to trust a type checker, and imho the long term roadmap should value correctness over convenience.

gschwaer · 2021-12-16T11:40:39Z

I just encountered a related issue: In a project I want to optimize the usage of some bytes to memoryview for performance reasons on slicing (which I think is a relatively common optimization in a growing project when working with binary data). To complete my refactoring I would have to find all cases where an argument is of type memoryview and the parameter is of type bytes. Mypy knows that, but I have no way to tap into that information, so I have to do it by hand! Here is some example code:

def decode_to_str(data: bytes) -> str:
    return data.decode()  # will fail at runtime: AttributeError: 'memoryview' object has no attribute 'decode'

# data = b"a string as bytes"  # slow slicing, refactor to memoryview
data = memoryview(b"a string as bytes")
print(decode_to_str(data[:8]))

$ mypy --version
mypy 0.920
$ mypy --strict test.py
Success: no issues found in 1 source file

Taking up the example of a project with 8k usages of bytes, not having a flag to disable the shorthand bytes for ByteString would probably be a show-stopper for any optimization involving memoryviews.

JelleZijlstra · 2022-04-26T01:41:17Z

See also the new PEP 688, which currently proposes to remove this behavior.

I also posted #12661 to see the impact of removing the promotion.

JukkaL · 2022-04-26T11:06:50Z

Based on #12661, the impact doesn't seem terrible, but some projects will be quite heavily impacted. Some of the errors will probably go away once stubs have been updated.

A few additional things to consider:

Each time there is an error about bytes/bytearray/memoryview mismatch, we could add a note that suggests using the new Buffer type (if PEP 688 is accepted) and/or includes a link to mypy docs explaining the change.
We must make sure that strict equality doesn't complain about comparing bytes against bytearray (and possibly other types).

JukkaL added the needs discussion label Mar 20, 2020

sbourlon mentioned this issue Feb 3, 2024

generic/_utils.py: function create_string_object not working with the bytearray type py-pdf/pypdf#2434

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`bytearray` treated as a subclass of `bytes` #8505

`bytearray` treated as a subclass of `bytes` #8505

llchan commented Mar 7, 2020

llchan commented Mar 7, 2020

JelleZijlstra commented Mar 7, 2020

srittau commented Mar 8, 2020 •

edited

llchan commented Mar 9, 2020

llchan commented Mar 9, 2020

msullivan commented Mar 17, 2020

JukkaL commented Mar 20, 2020

llchan commented Mar 21, 2020

gschwaer commented Dec 16, 2021

JelleZijlstra commented Apr 26, 2022

JukkaL commented Apr 26, 2022

bytearray treated as a subclass of bytes #8505

bytearray treated as a subclass of bytes #8505

Comments

llchan commented Mar 7, 2020

llchan commented Mar 7, 2020

JelleZijlstra commented Mar 7, 2020

srittau commented Mar 8, 2020 • edited

llchan commented Mar 9, 2020

llchan commented Mar 9, 2020

msullivan commented Mar 17, 2020

JukkaL commented Mar 20, 2020

llchan commented Mar 21, 2020

gschwaer commented Dec 16, 2021

JelleZijlstra commented Apr 26, 2022

JukkaL commented Apr 26, 2022

`bytearray` treated as a subclass of `bytes` #8505

`bytearray` treated as a subclass of `bytes` #8505

srittau commented Mar 8, 2020 •

edited