gzip fails to read a gzipped file (ValueError: readline of closed file) #89638

Closed
minstrelofc mannequin opened this issue Oct 14, 2021 · 5 comments
Labels
3.10 only security fixes 3.11 bug and security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments


minstrelofc mannequin commented Oct 14, 2021

BPO 45475
Nosy @methane, @miss-islington, @tirkarthi
PRs
  • bpo-45475: Revert __iter__ optimization for GzipFile, BZ2File, and LZMAFile. #29016
  • [3.10] bpo-45475: Revert __iter__ optimization for GzipFile, BZ2File, and LZMAFile. (GH-29016) #29050
  • Files
  • UTF-8-test_for_gzip.txt.gz: gzip test file, just in case
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2021-10-19.03:32:27.805>
    created_at = <Date 2021-10-14.21:12:01.569>
    labels = ['type-bug', 'library', '3.10', '3.11']
    title = 'gzip fails to read a gzipped file (ValueError: readline of closed file)'
    updated_at = <Date 2021-10-19.03:32:27.804>
    user = 'https://bugs.python.org/minstrelofc'

    bugs.python.org fields:

    activity = <Date 2021-10-19.03:32:27.804>
    actor = 'methane'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-10-19.03:32:27.805>
    closer = 'methane'
    components = ['Library (Lib)']
    creation = <Date 2021-10-14.21:12:01.569>
    creator = 'minstrelofc'
    dependencies = []
    files = ['50358']
    hgrepos = []
    issue_num = 45475
    keywords = ['patch', '3.10regression']
    message_count = 5.0
    messages = ['403948', '403981', '404061', '404263', '404265']
    nosy_count = 4.0
    nosy_names = ['methane', 'miss-islington', 'xtreak', 'minstrelofc']
    pr_nums = ['29016', '29050']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue45475'
    versions = ['Python 3.10', 'Python 3.11']


    minstrelofc mannequin commented Oct 14, 2021

    Attempting to iterate over an opened gzip file raises a ValueError: readline of closed file

    Behavior in Python 3.9.7:
    Python 3.9.7 (default, Oct 13 2021, 09:08:19) 
    [GCC 8.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import gzip
    >>> ll = [l for l in gzip.GzipFile(filename='data/UTF-8-test_for_gzip.txt.gz')]
    >>> len(ll)
    300
    
    
    Behavior in Python 3.10.0 (3.11.0a1 behaves the same):
    Python 3.10.0 (default, Oct 13 2021, 08:53:15) [GCC 8.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import gzip
    >>> ll = [l for l in gzip.GzipFile(filename='data/UTF-8-test_for_gzip.txt.gz')]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 1, in <listcomp>
    ValueError: readline of closed file
    
    
    This only happens when iterating directly over the GzipFile object. Using a with statement gives the correct behaviour in both 3.10 and 3.11:
    >>> with gzip.GzipFile(filename='UTF-8-test_for_gzip.txt.gz') as input_file:
    ...     len(list(input_file))
    ... 
    300
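A minimal sketch contrasting the two patterns from the report, assuming a throwaway file generated on the fly (the helper make_sample, the file name sample.txt.gz and its 300-line content are hypothetical stand-ins for the attached test file). The with statement binds the GzipFile to a name for the whole loop, so it cannot be finalized mid-iteration, and it closes the file deterministically afterwards.

```python
import gzip
import os
import tempfile


def make_sample(path: str, n_lines: int) -> None:
    # Hypothetical helper: write n_lines short text lines, gzip-compressed.
    with gzip.open(path, "wb") as out:
        out.writelines(f"line {i}\n".encode() for i in range(n_lines))


def count_lines_with(path: str) -> int:
    # The name `f` keeps the GzipFile alive for the whole loop, and the
    # with statement closes it explicitly when the block exits.
    with gzip.open(path, "rb") as f:
        return sum(1 for _ in f)


def count_lines_temporary(path: str) -> int:
    # The pattern from the report: iterate over an unnamed GzipFile.
    # Raises ValueError on 3.10.0 and 3.11.0a1; on versions without the
    # __iter__ shortcut it works, but closing is left to garbage collection.
    return len([line for line in gzip.GzipFile(filename=path)])


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt.gz")  # hypothetical file name
    make_sample(sample, 300)
    print(count_lines_with(sample))       # 300
    print(count_lines_temporary(sample))  # 300 on interpreters with the fix
```

The with-statement version is the one to prefer regardless of interpreter version, since it does not depend on when the GzipFile happens to be finalized.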

    @minstrelofc minstrelofc mannequin added 3.10 only security fixes 3.11 bug and security fixes type-bug An unexpected behavior, bug, or error labels Oct 14, 2021
    @tirkarthi
    Member

    This might be related to the following commit:

    commit d2a8e69
    Author: Inada Naoki <songofacandy@gmail.com>
    Date: Tue Apr 13 13:51:49 2021 +0900

    bpo-43787: Add __iter__ to GzipFile, BZ2File, and LZMAFile (GH-25353)
    
    python -m gzip README.rst
    (myenv) ➜  cpython git:(main) ✗ git checkout d2a8e69c2c605fbaa3656a5f99aa8d295f74c80e~1 Lib/gzip.py
    Updated 1 path from 2ea7c00ab4
    (myenv) ➜  cpython git:(main) ✗ ./python
    Python 3.11.0a1+ (heads/main:160c38df7f, Oct 15 2021, 11:25:16) [GCC 9.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import gzip
    >>> len([None for _ in gzip.GzipFile("README.rst.gz")])
    267
    >>> 
    (myenv) ➜  cpython git:(main) ✗ git checkout d2a8e69c2c605fbaa3656a5f99aa8d295f74c80e Lib/gzip.py 
    Updated 1 path from 1f9874eec6
    (myenv) ➜  cpython git:(main) ✗ ./python
    Python 3.11.0a1+ (heads/main:160c38df7f, Oct 15 2021, 11:25:16) [GCC 9.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import gzip
    >>> len([None for _ in gzip.GzipFile("README.rst.gz")])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 1, in <listcomp>
    ValueError: readline of closed file
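A quick way to tell whether a given interpreter carries the bpo-43787 __iter__ shortcut, sketched under the assumption that a build without it inherits io.IOBase.__iter__, which returns the file object itself, whereas the shortcut hands back a different object (the internal buffered reader). The in-memory payload below replaces README.rst.gz; has_iter_shortcut is a hypothetical helper name.

```python
import gzip
import io


def has_iter_shortcut() -> bool:
    """Return True if iter(GzipFile) returns something other than the GzipFile."""
    payload = gzip.compress(b"alpha\nbeta\n")
    with gzip.GzipFile(fileobj=io.BytesIO(payload)) as f:
        # With the shortcut this is the internal reader, which holds no
        # reference back to `f`; without it, iter() returns `f` itself.
        return iter(f) is not f


print(has_iter_shortcut())  # expected False on builds that include the revert
```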

    @tirkarthi tirkarthi added stdlib Python modules in the Lib dir labels Oct 15, 2021

    methane commented Oct 16, 2021

    >> ll = [l for l in gzip.GzipFile(filename='data/UTF-8-test_for_gzip.txt.gz')]

    This is a bad code pattern because the file is never closed explicitly.
    The error is caused by an optimization in which iter(GzipFile) returns the underlying, faster iterator, which holds no reference to the GzipFile. Nothing else references the GzipFile either, so GzipFile.__del__ closes the file before iteration starts.

    Although this is caused by a bad code pattern, I must admit it is a regression.
    We need to call a slow Python function for each line instead of using the fast C iterator...
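To make the mechanism concrete, here is a small self-contained sketch with two hypothetical classes standing in for GzipFile (neither is the real gzip implementation). ShortcutReader.__iter__ returns the inner buffer's own iterator, like the 3.10.0 optimization; SelfReader returns itself and reads one line per __next__ call, like the reverted code. The demonstration relies on CPython's reference counting, which finalizes the unnamed temporary as soon as iter() returns.

```python
import io


class ShortcutReader:
    """Hypothetical stand-in for GzipFile with the 3.10.0 __iter__ shortcut."""

    def __init__(self, data: bytes) -> None:
        self._buffer = io.BytesIO(data)

    def close(self) -> None:
        self._buffer.close()

    def __del__(self) -> None:
        # Stand-in for io.IOBase finalization, which calls close()
        # when the object is garbage-collected.
        self.close()

    def __iter__(self):
        # Shortcut: return the inner buffer's iterator. It holds no
        # reference back to this wrapper, so nothing keeps `self` alive.
        return iter(self._buffer)


class SelfReader(ShortcutReader):
    """Hypothetical stand-in for the reverted code: the wrapper is its own iterator."""

    def __iter__(self):
        # The loop now holds a reference to the wrapper (and its buffer).
        return self

    def __next__(self) -> bytes:
        # One Python-level call per line: the slower path described above.
        line = self._buffer.readline()
        if not line:
            raise StopIteration
        return line


DATA = b"first\nsecond\nthird\n"


def try_iterate(reader_cls) -> object:
    """Iterate over an unnamed temporary reader, as in the bug report."""
    try:
        return len([line for line in reader_cls(DATA)])
    except ValueError as exc:
        return f"ValueError: {exc}"


print(try_iterate(ShortcutReader))  # a ValueError message on CPython
print(try_iterate(SelfReader))      # 3
```

The trade-off is the one described in the comment: returning self keeps the wrapper alive for the whole loop, at the cost of a Python-level __next__ call per line instead of iterating the C-implemented buffer directly.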


    methane commented Oct 19, 2021

    New changeset 0a4c82d by Inada Naoki in branch 'main':
    bpo-45475: Revert __iter__ optimization for GzipFile, BZ2File, and LZMAFile. (GH-29016)
    0a4c82d

    @miss-islington
    Contributor

    New changeset 97ce855 by Miss Islington (bot) in branch '3.10':
    bpo-45475: Revert __iter__ optimization for GzipFile, BZ2File, and LZMAFile. (GH-29016)
    97ce855
