Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stdlib wrongly uses len() for bytes-like object #88605

Closed
animalize mannequin opened this issue Jun 17, 2021 · 16 comments
Closed

stdlib wrongly uses len() for bytes-like object #88605

animalize mannequin opened this issue Jun 17, 2021 · 16 comments
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@animalize
Copy link
Mannequin

animalize mannequin commented Jun 17, 2021

BPO 44439
Nosy @pitrou, @serhiy-storchaka, @animalize, @miss-islington, @iritkatriel
PRs
  • bpo-44439: BZ2File.write() / LZMAFile.write() handle buffer protocol correctly #26764
  • [3.10] bpo-44439: BZ2File.write() / LZMAFile.write() handle buffer protocol correctly (GH-26764) #26845
  • [3.9] bpo-44439: BZ2File.write()/LZMAFile.write() handle length correctly #26846
  • bpo-44439: _ZipWriteFile.write() handle buffer protocol correctly #29468
  • [3.9] bpo-44439: _ZipWriteFile.write() handle buffer protocol correctly (GH-29468) #31755
  • [3.10] bpo-44439: _ZipWriteFile.write() handle buffer protocol correctly (GH-29468) #31756
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2022-03-19.13:19:59.142>
    created_at = <Date 2021-06-17.05:05:24.162>
    labels = ['type-bug', 'library', '3.9', '3.10', '3.11']
    title = 'stdlib wrongly uses len() for bytes-like object'
    updated_at = <Date 2022-03-19.13:19:59.141>
    user = 'https://github.com/animalize'

    bugs.python.org fields:

    activity = <Date 2022-03-19.13:19:59.141>
    actor = 'malin'
    assignee = 'none'
    closed = True
    closed_date = <Date 2022-03-19.13:19:59.142>
    closer = 'malin'
    components = ['Library (Lib)']
    creation = <Date 2021-06-17.05:05:24.162>
    creator = 'malin'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 44439
    keywords = ['patch']
    message_count = 14.0
    messages = ['395971', '395973', '395976', '396305', '396309', '396334', '396335', '396336', '405948', '414737', '414742', '414743', '415528', '415549']
    nosy_count = 6.0
    nosy_names = ['pitrou', 'nadeem.vawda', 'serhiy.storchaka', 'malin', 'miss-islington', 'iritkatriel']
    pr_nums = ['26764', '26845', '26846', '29468', '31755', '31756']
    priority = 'normal'
    resolution = 'later'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue44439'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @animalize
    Copy link
    Mannequin Author

    animalize mannequin commented Jun 17, 2021

    If run this code, it will raise an exception:

        import pickle
        import lzma
        import pandas as pd
        with lzma.open("test.xz", "wb") as file:
            pickle.dump(pd.DataFrame(range(1_000_000)), file, protocol=5)

    The exception:

        Traceback (most recent call last):
          File "E:\testlen.py", line 7, in <module>
            pickle.dump(pd.DataFrame(range(1_000_000)), file, protocol=5)
          File "D:\Python39\lib\lzma.py", line 234, in write
            self._pos += len(data)
        TypeError: object of type 'pickle.PickleBuffer' has no len()
        
    The exception is raised in lzma.LZMAFile.write() method:
    https://github.com/python/cpython/blob/v3.10.0b2/Lib/lzma.py#L238
            
    PickleBuffer doesn't have .__len__ method, is it intended?

    @serhiy-storchaka
    Copy link
    Member

    Oh, LZMAFile.write() should not use len() directly on input data because it does not always work correctly with memoryview and other objects supporting the buffer protocol. It should use memoryview(data).nbytes or data = memoryview(data).cast('B') if other byte-oriented operations (indexing, slicing) are used. See for example Lib/gzip.py, Lib/_pyio.py, Lib/_compression.py, Lib/ssl.py, Lib/socketserver.py, Lib/wave.py.

    @animalize
    Copy link
    Mannequin Author

    animalize mannequin commented Jun 17, 2021

    Ok, I'm working on a PR.

    @animalize
    Copy link
    Mannequin Author

    animalize mannequin commented Jun 22, 2021

    I am checking all the .py files in Lib folder.
    hmac.py has two len() bugs:
    https://github.com/python/cpython/blob/v3.10.0b3/Lib/hmac.py#L212
    https://github.com/python/cpython/blob/v3.10.0b3/Lib/hmac.py#L214

    I think PR 26764 is prepared, it fixes the len() bugs in bz2.py/lzma.py files.

    @animalize animalize mannequin changed the title PickleBuffer doesn't have __len__ method stdlib wrongly uses len() for bytes-like object Jun 22, 2021
    @animalize animalize mannequin changed the title PickleBuffer doesn't have __len__ method stdlib wrongly uses len() for bytes-like object Jun 22, 2021
    @serhiy-storchaka
    Copy link
    Member

    New changeset bc6c12c by Ma Lin in branch 'main':
    bpo-44439: BZ2File.write() / LZMAFile.write() handle buffer protocol correctly (GH-26764)
    bc6c12c

    @serhiy-storchaka
    Copy link
    Member

    New changeset 8bc26d8 by Ma Lin in branch '3.9':
    bpo-44439: BZ2File.write()/LZMAFile.write() handle length correctly (GH-26846)
    8bc26d8

    @serhiy-storchaka
    Copy link
    Member

    Thank you for your contribution Ma Lin.

    @serhiy-storchaka serhiy-storchaka added 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes stdlib Python modules in the Lib dir labels Jun 22, 2021
    @serhiy-storchaka serhiy-storchaka added type-bug An unexpected behavior, bug, or error 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes stdlib Python modules in the Lib dir labels Jun 22, 2021
    @serhiy-storchaka serhiy-storchaka added the type-bug An unexpected behavior, bug, or error label Jun 22, 2021
    @serhiy-storchaka
    Copy link
    Member

    New changeset 01858fb by Miss Islington (bot) in branch '3.10':
    bpo-44439: BZ2File.write() / LZMAFile.write() handle buffer protocol correctly (GH-26764) (GH-26845)
    01858fb

    @animalize
    Copy link
    Mannequin Author

    animalize mannequin commented Nov 8, 2021

    Serhiy Storchaka:

    Sorry, I found zipfile module also has this bug, fixed in PR29468.

    This bug was reported & fixed by GitHub user marcoffee firstly, so I list him as a co-author, his work:
    https://github.com/animalize/pyzstd/issues/4

    The second commit fixes an omission of bpo-41735, a very simple fix, I fix it in PR29468 by the way.

    @animalize animalize mannequin reopened this Nov 8, 2021
    @animalize animalize mannequin reopened this Nov 8, 2021
    @serhiy-storchaka
    Copy link
    Member

    New changeset 36dd739 by Ma Lin in branch 'main':
    bpo-44439: _ZipWriteFile.write() handle buffer protocol correctly (GH-29468)
    36dd739

    @miss-islington
    Copy link
    Contributor

    New changeset 21c5b3f by Miss Islington (bot) in branch '3.10':
    bpo-44439: _ZipWriteFile.write() handle buffer protocol correctly (GH-29468)
    21c5b3f

    @miss-islington
    Copy link
    Contributor

    New changeset 0663ca1 by Miss Islington (bot) in branch '3.9':
    bpo-44439: _ZipWriteFile.write() handle buffer protocol correctly (GH-29468)
    0663ca1

    @iritkatriel
    Copy link
    Member

    Can this be closed now or is there anything else to do?

    @animalize
    Copy link
    Mannequin Author

    animalize mannequin commented Mar 19, 2022

    _Stream.write method in tarfile.py also has this code:
    https://github.com/python/cpython/blob/v3.11.0a6/Lib/tarfile.py#L434

    But this bug will not be triggered. When calling this method, always pass bytes data.

    _ConnectionBase.send_bytes method in multiprocessing\connection.py can be micro-optimized:
    https://github.com/python/cpython/blob/v3.11.0a6/Lib/multiprocessing/connection.py#L193
    This can be done in another issue.

    So I think this issue can be closed.

    @animalize animalize mannequin closed this as completed Mar 19, 2022
    @animalize animalize mannequin closed this as completed Mar 19, 2022
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @jakirkham
    Copy link

    Would it make sense to backport this change to Python 3.8 or would that not make sense?

    @iritkatriel
    Copy link
    Member

    3.8 is in security fix only mode. Unless this is a security issue it cannot be backported there.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes 3.10 only security fixes 3.11 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants