Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GzipFile.seek makes invalid write if buffer is not flushed in Python 3.12rc1 #108111

Closed
2 tasks done
effigies opened this issue Aug 18, 2023 · 2 comments
Closed
2 tasks done
Labels
type-bug An unexpected behavior, bug, or error

Comments

@effigies
Copy link
Contributor

effigies commented Aug 18, 2023

Bug report

Checklist

  • I am confident this is a bug in CPython, not a bug in a third-party project
  • I have searched the CPython issue tracker,
    and am confident this bug has not been reported before

CPython versions tested on:

3.12

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.12.0rc1 (main, Aug 16 2023, 05:03:59) [GCC 12.2.0]

A clear and concise description of the bug:

I have code that writes out sections of a data file in chunks, and uses seeks to ensure that the position is correct before writing.

In the following example, I write 5 bytes, seek to position 5 and write five more bytes. If I flush the buffer, the result is as expected. If I do not, 5 null bytes are written between the two groups of intended bytes.

#!/usr/bin/env python

import io
import gzip


for flush in (True, False):
    data = io.BytesIO()
    gzip_writer = gzip.GzipFile(fileobj=data, mode='wb')
    gzip_writer.write(b'abcde')

    # If the buffer isn't flushed, seek works from unchanged offset
    if flush and hasattr(gzip_writer, '_buffer'):
        gzip_writer._buffer.flush()

    gzip_writer.seek(5)
    gzip_writer.write(b'fghij')
    gzip_writer.close()
    
    # Recover result
    data.seek(0)
    gzip_reader = gzip.GzipFile(fileobj=data, mode='rb')
    result = gzip_reader.read()

    print(f'{flush=}: {result}')

In the case where I seek but don't tell, I get spurious \x00 bytes:

flush=True: b'abcdefghij'
flush=False: b'abcde\x00\x00\x00\x00\x00fghij'

Here is the output in Python 3.10.10:

flush=True: b'abcdefghij'
flush=False: b'abcdefghij'

Linked PRs

@effigies effigies added the type-bug An unexpected behavior, bug, or error label Aug 18, 2023
@effigies
Copy link
Contributor Author

Looking at #101251, which is where I expect this was introduced, I suspect this is the fix:

     def seek(self, offset, whence=io.SEEK_SET):
         if self.mode == WRITE:
+            self._check_not_closed()
+            self._buffer.flush()
             if whence != io.SEEK_SET:
                 if whence == io.SEEK_CUR:

Here:

cpython/Lib/gzip.py

Lines 402 to 420 in daed54d

def seek(self, offset, whence=io.SEEK_SET):
if self.mode == WRITE:
if whence != io.SEEK_SET:
if whence == io.SEEK_CUR:
offset = self.offset + offset
else:
raise ValueError('Seek from end not supported')
if offset < self.offset:
raise OSError('Negative seek in write mode')
count = offset - self.offset
chunk = b'\0' * self._buffer_size
for i in range(count // self._buffer_size):
self.write(chunk)
self.write(b'\0' * (count % self._buffer_size))
elif self.mode == READ:
self._check_not_closed()
return self._buffer.seek(offset, whence)
return self.offset

@effigies effigies changed the title Seeking to current position in writable gzip stream incorrectly appends zero bytes GzipFile.seek makes invalid write if buffer is not flushed in Python 3.12rc1 Aug 22, 2023
effigies added a commit to effigies/nibabel that referenced this issue Aug 23, 2023
ambv added a commit that referenced this issue Aug 24, 2023
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Aug 24, 2023
…rites (pythonGH-108341)

(cherry picked from commit 2eb60c1)

Co-authored-by: Chris Markiewicz <effigies@gmail.com>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
Yhg1s pushed a commit that referenced this issue Aug 24, 2023
…writes (GH-108341) (#108402)

gh-108111: Flush gzip write buffer before seeking, fixing bad writes (GH-108341)
(cherry picked from commit 2eb60c1)

Co-authored-by: Chris Markiewicz <effigies@gmail.com>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
effigies added a commit to effigies/nibabel that referenced this issue Aug 28, 2023
effigies added a commit to effigies/nibabel that referenced this issue Sep 8, 2023
@hugovk
Copy link
Member

hugovk commented Nov 9, 2023

Thank you!

@hugovk hugovk closed this as completed Nov 9, 2023
clrpackages pushed a commit to clearlinux-pkgs/pypi-nibabel that referenced this issue Dec 14, 2023
…ion 5.2.0

Blake Dewey (2):
      Fix typing in SpatialImage __init__
      Try to fix mypy error based on type change

Christopher J. Markiewicz (116):
      DOC: Link to logo with full URL
      FIX: Catch random bad slice when testing image slicing
      CI: Switch to codecov action
      TEST: Remove potentially unstable argsort from parrec tests
      FIX: Use stable argsort
      TEST: Add test file with bad spacing field
      RF: Re-consolidate nifti error message
      STY: blue
      TEST: Use standard encode/decode
      TEST: Switch to single quotes for expected magic errors
      CI: Avoid Python 3.11.4 for unpacking sdist
      Revert "CI: Avoid Python 3.11.4 for unpacking sdist"
      CI: Add 3.12 pre-release tests
      TEST: Mark file:/// URL test as xfail
      CI: Disable building dependencies from source
      FIX: Hack around 3.12rc1 bug (python/cpython#108111)
      TEST: Use tmp_path and explicit delete to appease Windows tempdir cleanup
      TEST: Use a less finicky method of creating temporary files
      ENH: Add pointset data structures [BIAP9]
      Update nibabel/pointset.py
      MNT: Update pre-commit hooks
      RF: Recast Pointset as a dataclass with associated affines
      TEST: Test Pointset and GridIndices classes
      TEST: Test Grid methods
      Apply suggestions from code review
      RF: Drop ndim/shape attributes, make explicit comment on __array_priority__
      FIX: to_mask() implementation
      RF: Drop triangular meshes for now
      ENH: Expand CIFTI2 constants to use a synonym recoder
      Update nibabel/cifti2/cifti2.py
      ENH: Permit XmlSerializable.to_xml() to pass kwargs to ElementTree.tostring()
      DOC: Improve documentation of XmlSerializable class, update fallback return type
      MNT: Skip coverage of abstract methods
      CI: Add 3.12 to the stable test matrix
      NEP29+1y: Bump minimum numpy
      MNT: Update tox config to what we actually do
      STY: Apply style fixes
      TOX: Add -pre environments
      TOX: Split doc build from doctest
      TOX: Add build/publish environments
      MNT: Pacify build warnings
      MNT: Only error on build-strict
      CI: Update to latest checkout
      TOX: Encode minimum versions in dependencies, add -min and -full flags
      Update minimum pyzstd, add zenodo and pre-release environments
      MNT: Convert dev optional dependencies to tox
      CI: Convert pre-release CI jobs to tox
      CI: Convert stable CI to tox
      CI: Exclude 3.12 on x86, skip full dependencies for 3.12 for now
      CI: Run miscellaneous checks through tox
      MNT: Require wheels for things that cannot be built on CI
      CI: Do not fail fast
      CI: Just run tox directly for miscellaneous checks
      MNT: Push h5py support back a bit
      MNT: Drop tools/ci for tox
      CI: Install doctestplus correctly
      MNT: Use none to explicitly avoid dependencies, add labels
      CI: Show tox config for debugging
      CI: Hack around h5py weirdness
      MNT: Restore doc/test extras
      CI: Fix expr syntax
      MNT: scipy unavailable for some x86 Pythons
      TOX: Update environment list to match CI targets
      TOX: h5py is not unique, handle scipy likewise
      TOX: Fix h5py range, avoid indexed_gzip on 3.12
      CI: Pending wheels are covered by tox.ini
      DOC: Improve tox.ini documentation
      CI: Timeout tox and dump debug information if we go >20 minutes
      TOX: Use h5py wheels for all full/pre-x64 builds
      CI: Consolidate stable and pre-release tests
      CI: Add verbosity to tox
      CI: Remove unnecessary pipx call
      TOX: Pillow is hard to build on CI
      TOX: Match matplotlib conditions to scipy
      CI: Add install to none and full tests
      MNT: Ignore coverage/testing summary outputs
      DOC: Add docs on using tox and pre-commit
      TOX: Add NIPY_EXTRA_TESTS to pass_env
      CI: Quote python versions for consistency
      TOX: Update install_command overrides with x86/x64/pre-specific overrides
      CI: Merge checks into test workflow
      CI: Update action version
      TOX: Add spellcheck environment
      TEST: Unroll hash check, do not run unnecessarily
      FIX: Apply codespell suggestions
      CI: Add spellcheck job
      MNT: Add codespell to pre-commit
      DOC: Add docs for using and applying style/linting tools
      MNT: Add py312-dev-x64 environment
      RF: Remove old as_int() hack
      RF: Remove old int_to_float() hack
      TEST: Add fixture for relaxing digit limits
      MNT: Add doctest and coverage pragma
      MNT: Install indexed_gzip on 3.12, add dev to all full,pre groups
      MNT: Better sort of minimal dependencies
      ENH: Add copy() method to ArrayProxy
      ENH: Copy lock if filehandle is shared, add tests
      TEST: Check IndexedGzipFile ArrayProxys are copied properly
      CI: Add workflow_dispatch trigger to tests
      TEST: Chdir during doctest to avoid polluting the working dir
      CI: Enable colored output with FORCE_COLOR
      CI: Move to trusted publishing for PyPI uploads
      TOX: Make blue/isort fail on diff
      STY: Apply blue
      TOX: Pass color preferences to tools
      TOX: Enable pydicom@master for dev test
      RF: Replace deprecated pydicom.dicomio.read_file with dcmread
      MNT: Deprecate unused pydicom_compat module
      FIX: read_file -> dcmread
      MNT: Update requirements
      MNT: Update and simplify mailmap
      MNT: Update Zenodo ordering
      DOC: Add new contributors, insert old contributor
      MNT: Remove 3.12rc1 workaround for python/cpython#180111
      MNT: Update README
      DOC: 5.2.0 changelog

Dimitri Papadopoulos Orfanos (9):
      DOC: Fix typos found by codespell
      DOC: Fix typos found by codespell
      MNT: Fix typo found by codespell
      MNT: Use raw string to avoid escaping '\'
      MNT: Use tuples instead of list where possible
      MNT: Use list comprehension instead of calling append()
      MNT: `[:]` → `.copy()`
      MNT: do not refer to the optional data packages
      MNT: remove more stuff about optional data package

Eric Larson (26):
      MAINT: Deprecations
      FIX: Decode
      FIX: str
      FIX: Revert
      CI: Test NumPy 2.0
      FIX: Only need legacy if on 2.0
      FIX: Cast
      FIX: Consistency
      FIX: Newbyteorder
      FIX: check
      FIX: check
      FIX: Python types
      FIX: Preserve
      FIX: Simplify
      FIX: Maybe
      FIX: Better
      FIX: Revert
      FIX: ComplexWarning
      FIX: Context
      FIX: One more
      FIX: Explicit
      Apply suggestions from code review
      FIX: Style
      STY: Flake
      FIX: Test val equiv
      FIX: Version

Mathieu Scheltienne (28):
      replace np.sctypes with np.core.sctypes
      rm unused imports
      run blue instead of black
      add author entries
      manually define the mapping between str and scalar types in casting.py
      rm unused imports
      rm unused variable definition
      fix blue
      fix missing import
      try without using the sized aliases
      Revert "try without using the sized aliases"
      try with sized aliases again and np.longdouble instead of np.float128
      use combination of getattr and hasattr, include float96 and complex192 to the list
      rm use of np.maximum_sctype
      rm unused import
      fix quotes for blue style
      fix np.sctypeDict calls
      better var name
      rm unused imports
      fix blue
      fix spelling
      rm unused imports
      try test fix suggested by larsoner
      try simpler
      fix typo
      fix more stuff
      more fix
      fix more stuff

Matthew Brett (4):
      RF: refactor find_private_element
      Update nibabel/nicom/utils.py
      Update nibabel/nicom/utils.py
      Update nibabel/nicom/utils.py

Peter Suter (3):
      ENH: only warn about invalid Minc2 spacing declaration
      Update nibabel/minc2.py
      Update .zenodo.json

Reinder Vos de Wael (9):
      Allow relative and home paths
      Maintain relative path behavior
      Push to trigger CI
      Restore forward slash behavior
      Ensure posix test strings
      Update nibabel/tests/test_filename_parser.py
      Assert equal pathlib.Path instead of str
      Update nibabel/tests/test_filename_parser.py
      Update nibabel/tests/test_filename_parser.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
Status: Done
Development

No branches or pull requests

2 participants