Incorrect parsing of TarInfo header when GNU long name and type AREGTYPE are combined #141707

@e-nomem

Bug report

Bug description:

When an entry uses GNU long name encoding, the tarfile module reads the data blocks that hold the name and then calls self.fromtarfile() again to get the 'actual' header. This second header is the source of truth for everything except the name, which is just garbage data.
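
To make the layout concrete, here is a minimal sketch that writes one long-named entry with tarfile itself and inspects the raw 512-byte blocks. The helper names (`build_archive`, `header_fields`) and the sample path are made up for illustration. The offsets assume the standard ustar header layout (name in bytes 0-99, type flag at byte 156). In archives written by tarfile, the name field of the second header holds the first 100 bytes of the real name; other writers may put something else there.

```python
import io
import tarfile

BLOCK = 512  # size of one tar block


def build_archive(name, payload):
    """Return the bytes of a GNU-format archive holding one regular file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tf:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def header_fields(block):
    """Extract the name field (bytes 0-99) and the type flag (byte 156)."""
    return block[:100].rstrip(b"\0"), block[156:157]


# Hypothetical path: 128 characters, longer than the 100-byte name field.
long_name = "d/" * 60 + "file.txt"
raw = build_archive(long_name, b"hello")

# Block 0: the GNU long name pseudo-header.
link_name, link_type = header_fields(raw[0:BLOCK])
# Block 1: the NUL-terminated full name, padded to a whole block.
stored_name = raw[BLOCK:2 * BLOCK].rstrip(b"\0").decode()
# Block 2: the 'actual' header, whose name field cannot hold the full name.
real_name, real_type = header_fields(raw[2 * BLOCK:3 * BLOCK])

print("pseudo-header:", link_name, link_type)
print("stored name matches:", stored_name == long_name)
print("actual header:", real_name, real_type)
```

Note that with this sample path the name field of the actual header ends in a slash, which is what makes the type check described next misfire.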

The problem is that fromtarfile() eventually calls frombuf(), where this logic incorrectly uses the garbage name from the second header and overrides the entry type to directory, corrupting the entry.
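
A possible reproducer, as a sketch only. It assumes the logic in question is the old V7 compatibility check in frombuf(), which treats an AREGTYPE entry whose name ends in a slash as a directory. It also assumes a truncated name ending in a slash is enough to trigger it. The helper name and the sample path are hypothetical. What the last line prints depends on whether the interpreter carries the bug, so no output is shown.

```python
import io
import tarfile


def build_aregtype_archive(name, payload):
    """Write one entry with type AREGTYPE (a NUL type flag) in GNU format."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tf:
        info = tarfile.TarInfo(name)
        info.type = tarfile.AREGTYPE
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


# Hypothetical path chosen so that its first 100 bytes end in "/".
long_name = "d/" * 60 + "file.txt"
raw = build_aregtype_archive(long_name, b"hello")

# The third 512-byte block is the 'actual' header that follows the long name.
second_header = raw[1024:1536]
truncated_name = second_header[:100].rstrip(b"\0")
type_flag = second_header[156:157]

with tarfile.open(fileobj=io.BytesIO(raw), mode="r") as tf:
    member = tf.getmembers()[0]

print("name field ends in slash:", truncated_name.endswith(b"/"))
print("type flag is AREGTYPE:", type_flag == tarfile.AREGTYPE)
# On an affected interpreter this is expected to report a directory.
print("isdir:", member.isdir(), "isfile:", member.isfile())
```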

Because the entry is detected as a directory, the offset is not advanced past the entry's data, so the next attempt to read a TarInfo entry will usually raise an exception. However, the exception ends up in this block, where neither of the if conditions is met, so it is silently discarded. tarinfo remains None, and the code eventually decides that there are no more entries in the tar file.
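
The silent truncation can be observed with a two-entry archive. This sketch assumes the block in question is the InvalidHeaderError handler in TarFile.next(), whose two branches check ignore_zeros and whether the offset is 0. Under that assumption, opening with ignore_zeros=True takes the first branch and skips the bad block instead of stopping. That makes it useful for comparison, not as a fix, because the first entry may still be misclassified. The helper names and file names are hypothetical.

```python
import io
import tarfile


def build_two_entry_archive():
    """Write a long-named AREGTYPE entry followed by an ordinary file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tf:
        first = tarfile.TarInfo("d/" * 60 + "file.txt")
        first.type = tarfile.AREGTYPE
        first.size = 5
        tf.addfile(first, io.BytesIO(b"hello"))

        second = tarfile.TarInfo("second.txt")
        second.size = 5
        tf.addfile(second, io.BytesIO(b"world"))
    return buf.getvalue()


def list_names(raw, **open_kwargs):
    """Return the member names tarfile reports for the given archive bytes."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r", **open_kwargs) as tf:
        return tf.getnames()


raw = build_two_entry_archive()
default_names = list_names(raw)
lenient_names = list_names(raw, ignore_zeros=True)

# Two entries were written; fewer in the default listing means the archive
# was cut short without any exception reaching the caller.
print("default listing:", len(default_names), "entries")
print("ignore_zeros listing:", len(lenient_names), "entries")
```

If the default listing comes back shorter than the number of entries written, the second entry was lost in the way described above.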

I initially ran into this issue due to reports of invalid sdists being generated by maturin.
See: PyO3/maturin#2855

CPython versions tested on:

3.9, 3.10, 3.11, 3.12, 3.13, 3.14

Operating systems tested on:

macOS, Linux

Metadata

Assignees

No one assigned

    Labels

    stdlib (Standard Library Python modules in the Lib/ directory), type-bug (An unexpected behavior, bug, or error)
