gh-150075: tar.addfile() doesn't set member offsets #150128

Closed
grantlouisherman wants to merge 85 commits into python:main from grantlouisherman:fix-150075

@grantlouisherman grantlouisherman commented May 19, 2026

Original bug report:

Bug report

Bug description:

It appears that tarfile.TarFile.addfile() sets neither offset nor offset_data attributes in the TarInfo added to self.members, when it's done adding the file. Members in a reopened TAR have correct values.

Reproducer

from io import BytesIO
from tarfile import TarFile, TarInfo

with TarFile("test.tar", "w") as tar:
    data = b"data"

    tarinfo = TarInfo("test1.txt")
    tarinfo.size = len(data)
    tar.addfile(tarinfo, BytesIO(data))

    tarinfo = TarInfo("test2.txt")
    tarinfo.size = len(data)
    tar.addfile(tarinfo, BytesIO(data))

    for member in tar.getmembers():
        print(member.name, member.offset, member.offset_data)

with TarFile("test.tar", "r") as tar:
    for member in tar.getmembers():
        print(member.name, member.offset, member.offset_data)

Expected output

test1.txt 0 512
test2.txt 1024 1536
test1.txt 0 512
test2.txt 1024 1536

Actual output

test1.txt 0 0
test2.txt 0 0
test1.txt 0 512
test2.txt 1024 1536
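
The expected offsets follow from the tar layout, in which each header and each data section occupies whole 512-byte blocks. Below is a minimal sketch of that arithmetic, assuming every member has exactly one header block (no pax extended headers or GNU long-name headers). `expected_offsets` is a hypothetical helper for illustration, not part of `tarfile`.

```python
BLOCKSIZE = 512  # tar block size; tarfile.BLOCKSIZE has the same value


def expected_offsets(sizes, header_blocks=1):
    """Return a list of (offset, offset_data) pairs for members of the given sizes.

    Assumption: each member uses `header_blocks` header blocks (1 for simple
    members with short ASCII names and small numeric fields).
    """
    offset = 0
    result = []
    for size in sizes:
        # Data starts right after the member's header block(s).
        offset_data = offset + header_blocks * BLOCKSIZE
        result.append((offset, offset_data))
        # Data is padded up to a whole number of blocks (ceiling division).
        data_blocks = -(-size // BLOCKSIZE)
        offset = offset_data + data_blocks * BLOCKSIZE
    return result


# Two 4-byte members, as in the reproducer.
print(expected_offsets([4, 4]))  # [(0, 512), (1024, 1536)]
```

Each 4-byte payload is padded to one full block, so the second header starts at 512 + 512 = 1024.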

CPython versions tested on:

3.15

Operating systems tested on:

Linux


Fix
I set the offset information on each added member: the member offset is taken from the archive's current offset, and the data offset is set after the header block size is calculated. I also included a test to validate the behavior.
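
A check along the following lines could be used to compare the offsets seen while writing with those seen after reopening. This is only a sketch and not the test included in this PR. `collect_offsets` is a hypothetical helper, and the archive is kept in memory so that no file is written to disk.

```python
import io
import tarfile


def collect_offsets(payloads):
    """Write (name, data) pairs to an in-memory tar archive.

    Returns two lists of (name, offset, offset_data) tuples: one taken from
    the TarFile while it is still open for writing, one after reopening.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in payloads:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
        written = [(m.name, m.offset, m.offset_data) for m in tar.getmembers()]

    # Reopen the same bytes for reading; these values are the reference.
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        reopened = [(m.name, m.offset, m.offset_data) for m in tar.getmembers()]
    return written, reopened


written, reopened = collect_offsets([("test1.txt", b"data"), ("test2.txt", b"data")])
print("while writing:", written)
print("after reopen: ", reopened)
print("match:", written == reopened)
```

On an interpreter with the reported bug, the last line should print `match: False`. With the behaviour this PR aims for, the two lists should be equal.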

hugovk and others added 30 commits May 7, 2026 19:01
Also add the python3.15.abi file as generated by the new job and remove
the 'main branch only' entry from .gitignore.

(adapted from commit 0eb2291)
…nGH-149519)

(cherry picked from commit b142878)

Co-authored-by: Brett Cannon <brett@python.org>
…hon `__next__` (pythonGH-149491) (python#149523)

pythongh-149481: skip `FOR_ITER` inline specialization for Python `__next__` (pythonGH-149491)
(cherry picked from commit 49918f5)

Co-authored-by: Neko Asakura <neko.asakura@outlook.com>
Co-authored-by: Savannah Ostrowski <savannah@python.org>
Co-authored-by: Stan Ulbrych <stan@python.org>
…on to Platforms directory (pythonGH-149543) (python#149545)

pythongh-146445: Update CODEOWNERS for Android and iOS migration to Platforms directory (pythonGH-149543)
(cherry picked from commit 5b58fbc)

Co-authored-by: Malcolm Smith <smith@chaquo.com>
…onGH-149506) (python#149546)

docs: Clarify docs for error case of `PyDict_GetItemRef` (pythonGH-149506)
(cherry picked from commit 3565d31)

Co-authored-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
… Platforms directory (pythonGH-149544) (python#149550)

pythongh-145176: Update CODEOWNERS for Emscripten migration to Platforms directory (pythonGH-149544)
(cherry picked from commit 52a05e8)

Co-authored-by: Malcolm Smith <smith@chaquo.com>
…imizes (pythonGH-149478) (python#149552)

pythongh-149459: Fix segfault when `_LOAD_SPECIAL` guard deoptimizes (pythonGH-149478)
(cherry picked from commit c341e34)

Co-authored-by: Hai Zhu <haiizhu@outlook.com>
…t_robotparser (pythonGH-149569) (pythonGH-149580)

Also, use urllib.request.urlcleanup() in NetworkTestCase.
(cherry picked from commit 57ef219)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
… audit hook and path-like support (pythonGH-149524) (python#149586)

pythongh-149474: use `Py_fopen` in `Binary{Reader,Writer}` for audit hook and path-like support (pythonGH-149524)
(cherry picked from commit 354ef33)

Co-authored-by: Maurycy Pawłowski-Wieroński <maurycy@maurycy.com>
…e` to `sentinel` (pythonGH-149536) (python#149592)

pythongh-149083: Convert `_initial_missing` for pure py `reduce` to `sentinel` (pythonGH-149536)
(cherry picked from commit bc8cf07)

Co-authored-by: sobolevn <mail@sobolevn.me>
pythonGH-149431) (python#149602)

pythongh-149430: Fix edge-cases in `profiling.sampling` outputs (pythonGH-149431)

The line highlights on the heatmap are driven by the URL hash and the
`:target` selector. When clicking a caller/callee link for the line that
was already selected, the hash doesn't change, so the browser keeps the
existing target state and doesn't restart the animation. Due to this the
highlight only works the first time.

With this fix, line navigation goes through JavaScript. If the target
URL already points to the current location, the highlight is replayed by
clearing the animation, forcing style recalculation, and restoring it.

The `baseline_self` variable isn't initialized for structural elided
roots. This variable is accessed later unconditionally and leads to a
crash.

The child process ends up being invoked with `--diff_flamegraph` instead
of the correct argument.
(cherry picked from commit 9587726)

Co-authored-by: László Kiss Kollár <kiss.kollar.laszlo@gmail.com>
…ythonGH-149518) (python#149605)

pythongh-149388: Make asyncio `PipeHandle.close` idempotent (pythonGH-149518)
(cherry picked from commit 7241f27)

Co-authored-by: Max Schmitt <max@schmitt.mx>
Fix minor typos in unicode.rst (pythonGH-149587)
(cherry picked from commit 4e97ff3)

Co-authored-by: Manoj K M <manojkmdev24@gmail.com>
…GH-149520) (python#149622)

pythongh-139871: Fix 3.15 bytearray.take_bytes example (pythonGH-149520)

Currently:
```python
buffer = bytearray(b'abc\ndef')
n = buffer.find(b'\n')
data = bytes(buffer[:n + 1])
del buffer[:n + 1]
assert data == b'abc'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    assert data == b'abc'
           ^^^^^^^^^^^^^^
AssertionError
```

Adding in the `\n` makes the two match:

```python
buffer = bytearray(b'abc\ndef')
n = buffer.find(b'\n')
data = bytes(buffer[:n + 1])
del buffer[:n + 1]
assert data == b'abc\n'
assert buffer == bytearray(b'def')

buffer = bytearray(b'abc\ndef')
n = buffer.find(b'\n')
data = buffer.take_bytes(n + 1)
assert data == b'abc\n'
assert buffer == bytearray(b'def')
```
(cherry picked from commit cc5cf14)

Co-authored-by: Cody Maloney <cmaloney@users.noreply.github.com>
…taHandler (pythonGH-148904) (python#149639)

pythongh-148441: Avoid integer overflow in Expat's CharacterDataHandler (pythonGH-148904)
(cherry picked from commit bc1be4f)

Co-authored-by: ByteFlow <fakeshadow1337@gmail.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
…9641) (python#149652)

pythongh-139489: Add is_valid_text to xml.__all__ (pythonGH-149641)
(cherry picked from commit b45319e)

Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
…ture (pythonGH-149591) (python#149653)

pythongh-149083: use sentinel to fix _functools.reduce() signature (pythonGH-149591)
(cherry picked from commit c6fd7de)

Co-authored-by: Sergey B Kirpichev <skirpichev@gmail.com>
…rget (pythonGH-149487) (pythonGH-149553)

pythongh-149486: tarfile.data_filter: validate written link target (pythonGH-149487)

The data filter rewrote linknames with normpath() but ran the
containment check against the un-normalised value, and computed a
symlink's directory before stripping trailing slashes.  Both let a
crafted archive create links pointing outside the destination.  Also
reject link members that resolve to the destination directory itself,
which could otherwise replace it with a symlink and redirect all
subsequent members.

(cherry picked from commit 5784119)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
…6095) (pythonGH-149667)

(cherry picked from commit 833dae7)

Co-authored-by: Jonathan Dung <jonathandung@yahoo.com>
…python#149672)

pythongh-149663: fix typo in `unittest` docs (pythonGH-149670)

`hastattr` -> `hasattr`
(cherry picked from commit 4956d2b)

Co-authored-by: Árni Már Jónsson <arnimarj@gmail.com>
…GH-149624) (python#149678)

pythongh-144957: Fix lazy imports + module __getattr__ (pythonGH-149624)
(cherry picked from commit 56171da)

Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
…riptors (pythonGH-149577) (python#149656)

pythongh-112821: Fix rlcompleter failures on objects with descriptors (pythonGH-149577)

* pythongh-112821: Fix rlcompleter failures on objects with descriptors

* Confirm no accesses
(cherry picked from commit f23a183)

Co-authored-by: Michael Droettboom <mdboom@gmail.com>
…readed builds (pythongh-145233) (python#149690)

In free-threaded builds, concurrent calls to PyDict_AddWatcher, PyDict_ClearWatcher, PyDict_Watch, and PyDict_Unwatch can race on the shared callback array and the per-dict watcher tags. This change adds a mutex to serialize watcher registration and removal, atomic operations for tag updates, and atomic acquire/release synchronization for callback dispatch in _PyDict_SendEvent.

(cherry picked from commit 8a48959)

Co-authored-by: Alper <alperyoney@fb.com>
…Parser… (python#149693)

[3.15] pythongh-149614 - Restore deepcopiability of argparse.ArgumentParser instances (pythonGH-149617)
(cherry picked from commit fadd9bc)

Co-authored-by: David Ellis <ducksual@gmail.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
…pythonGH-148670) (python#149703)

pythongh-148669: Clarify `__reduce__()` module lookup behavior (pythonGH-148670)
(cherry picked from commit 54a5fd4)

Co-authored-by: Victorien <65306057+Viicos@users.noreply.github.com>
Update mypy to 2.1.0 (pythonGH-149709)
(cherry picked from commit b546cc1)

Co-authored-by: sobolevn <mail@sobolevn.me>
@JelleZijlstra
Member

You did something wrong with git. Let's close this one and you can try again with a clean branch based off of main.

@AlexWaygood AlexWaygood removed their request for review May 19, 2026 23:11
@grantlouisherman
Author

Yes @JelleZijlstra, so sorry, I forgot to switch back to main from the 3.15 branch.
