Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bytearray treated as a subclass of bytes #8505

Open
llchan opened this issue Mar 7, 2020 · 11 comments
Open

bytearray treated as a subclass of bytes #8505

llchan opened this issue Mar 7, 2020 · 11 comments

Comments

@llchan
Copy link
Contributor

llchan commented Mar 7, 2020

  • Are you reporting a bug, or opening a feature request? bug
  • Please insert below the code you are checking with mypy
foo: bytes = bytearray()
assert isinstance(foo, bytes)
  • What is the actual behavior/output?
    No mypy error (and the expected AssertionError at runtime)
  • What is the behavior/output you expect?
    Mypy error about assigning wrong type (and the expected AssertionError at runtime)
  • What are the versions of mypy and Python you are using?
    mypy 0.761 on CPython 3.7.4
  • What are the mypy flags you are using? (For example --strict-optional)
    --strict

Typeshed appears to be fine, with bytes and bytearray both inheriting from ByteString, but no inheritance of bytearray from bytes.

class bytes(ByteString):
    ...

class bytearray(MutableSequence[int], ByteString):
    ...
@llchan
Copy link
Contributor Author

llchan commented Mar 7, 2020

I see some PRs about --strict-equality and bytes and bytearray, maybe the scope of some changes there leaked outside the intended context?

@JelleZijlstra
Copy link
Member

Mypy internally promotes bytearray to bytes for convenience, similar to how int is promoted to float. I thought this was specified in PEP 484, but as it turns out, it isn't. The only mention of this behavior that I can find the mypy docs is in https://mypy.readthedocs.io/en/stable/python2.html?highlight=promotion#type-checking-python-2-code, which mentions the int-float promotion (but not the bytearray-bytes promotion) as an aside.

It's definitely intentional behavior, but it should be documented better.

@srittau
Copy link
Contributor

srittau commented Mar 8, 2020

It is mentioned in the typing documentation. One of the problems is that the documentation is currently split between several PEPs, the typing, and the mypy documentation.

Edit: The typing documentation implies that this promotion is valid for argument types only.

@llchan
Copy link
Contributor Author

llchan commented Mar 9, 2020

Thanks for the explanations.

I'm not sure I agree with convenience over correctness in a type system, but I can put that aside for now and be pragmatic. I wonder if we can tighten up the implementation so that bytes becomes typing.ByteString only when referring to function arguments? If I understand correctly, the genesis of this promotion was the C API's arg unpacking, so it only makes sense to apply this there. In "pure python" land, I expect for them to remain independent types, and if someone actually wants to accept either type in some other context, they can use a Union[bytes, bytearray] or typing.ByteString.

Btw, my concrete use case where bytes and bytearray are meaningfully different is when interacting with ctypes and getting a void* out of them. I'm seeing branches marked as unreachable because the isinstance differs from runtime.

@llchan
Copy link
Contributor Author

llchan commented Mar 9, 2020

Just thinking out loud here, but could we introduce a new option + CLI flag to make this more strict? I found the dict with the type promotions in semanal_classprop.py, so maybe it wouldn't actually be too hard to disable bytearray and memoryview promotion. This would allow for a more gradual opt-in transition for existing code, and would give us time to update typeshed/etc.

@msullivan
Copy link
Collaborator

This is a frustrating one. We could consider a flag, though I'd want to investigate the impact first.

One issue is that it would be hard to make it work in a per-module way with the existing implementation. If you want to submit a PR that adds a flag or changes the behavior, take a look at semanal_classprop.py. Then I'll try it out on our internal codebase and see what the damage is like there, at least, and we can try to assess how things stand.

My intuition, though, is that the damage will be moderate and manageable but that maybe the scope of this problem is still not big enough to be worth doing that fixing.

@JukkaL
Copy link
Collaborator

JukkaL commented Mar 20, 2020

The main issue I see with changing this (even with an opt-in flag) is that existing stubs assume that you don't need to write Union[bytes, bytearray] for function arguments that accept both types. The number of false positives could still be fairly low, but that would be because bytearray isn't used that much. In a multi-million-line codebase I found only around 150 references to bytearray. There were over 8k references to bytes.

@llchan
Copy link
Contributor Author

llchan commented Mar 21, 2020

I don't expect this to be a trivial change that will happen overnight, but perhaps we can move in that direction. If there are issues with stubs, they can be updated over time to accept to the union. Because these changes are backwards compatible for users running without the opt-in flag, we don't need to resolve everything all at once and can update them as we go.

The 8k references to bytes are somewhat of an upper bound on the changes that would need to be made. I think only function signatures should get the magical type promotion, vs normal variables etc where the type is actually bytes specifically. In theory if we are brave, we could find and replace all bytes with the union, which is essentially what it's doing right now. If this makes us uneasy, it should indicate that the type promotion is maybe doing something unsafe.

It's important to be able to trust a type checker, and imho the long term roadmap should value correctness over convenience.

@gschwaer
Copy link
Sponsor

I just encountered a related issue: In a project I want to optimize the usage of some bytes to memoryview for performance reasons on slicing (which I think is a relatively common optimization in a growing project when working with binary data). To complete my refactoring I would have to find all cases where an argument is of type memoryview and the parameter is of type bytes. Mypy knows that, but I have no way to tap into that information, so I have to do it by hand! Here is some example code:

def decode_to_str(data: bytes) -> str:
    return data.decode()  # will fail at runtime: AttributeError: 'memoryview' object has no attribute 'decode'

# data = b"a string as bytes"  # slow slicing, refactor to memoryview
data = memoryview(b"a string as bytes")
print(decode_to_str(data[:8]))
$ mypy --version
mypy 0.920
$ mypy --strict test.py
Success: no issues found in 1 source file

Taking up the example of a project with 8k usages of bytes, not having a flag to disable the shorthand bytes for ByteString would probably be a show-stopper for any optimization involving memoryviews.

@JelleZijlstra
Copy link
Member

See also the new PEP 688, which currently proposes to remove this behavior.

I also posted #12661 to see the impact of removing the promotion.

@JukkaL
Copy link
Collaborator

JukkaL commented Apr 26, 2022

Based on #12661, the impact doesn't seem terrible, but some projects will be quite heavily impacted. Some of the errors will probably go away once stubs have been updated.

A few additional things to consider:

  • Each time there is an error about bytes/bytearray/memoryview mismatch, we could add a note that suggests using the new Buffer type (if PEP 688 is accepted) and/or includes a link to mypy docs explaining the change.
  • We must make sure that strict equality doesn't complain about comparing bytes against bytearray (and possibly other types).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants