tarfile.extractall on existing symlink in Ubuntu overwrites target file, not symlink, unlike GNU tar #79664

Closed
michaelbrandlaid-drivingeu mannequin opened this issue Dec 13, 2018 · 3 comments
Labels
3.7 (EOL): end of life · stdlib: Python modules in the Lib dir · type-bug: An unexpected behavior, bug, or error

Comments

@michaelbrandlaid-drivingeu
Mannequin

BPO 35483
Nosy @gustaebel, @vadmium
Superseder
  • bpo-23228: The tarfile module crashes when tarfile contains a symlink and unpack directory contain it too
Files
  • symLinkBugRepro.tar.gz: Zip file containing a bash script and a python script for repro of tarfile symlink issue.

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2019-01-06.09:58:26.950>
    created_at = <Date 2018-12-13.15:22:39.471>
    labels = ['3.7', 'type-bug', 'library']
    title = 'tarfile.extractall on existing symlink in Ubuntu overwrites target file, not symlink, unlinke GNU tar'
    updated_at = <Date 2019-01-06.09:58:26.949>
    user = 'https://bugs.python.org/michaelbrandlaid-drivingeu'

    bugs.python.org fields:

    activity = <Date 2019-01-06.09:58:26.949>
    actor = 'martin.panter'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-01-06.09:58:26.950>
    closer = 'martin.panter'
    components = ['Library (Lib)']
    creation = <Date 2018-12-13.15:22:39.471>
    creator = 'michael.brandl@aid-driving.eu'
    dependencies = []
    files = ['47992']
    hgrepos = []
    issue_num = 35483
    keywords = []
    message_count = 3.0
    messages = ['331762', '331913', '331953']
    nosy_count = 3.0
    nosy_names = ['lars.gustaebel', 'martin.panter', 'michael.brandl@aid-driving.eu']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '23228'
    type = 'behavior'
    url = 'https://bugs.python.org/issue35483'
    versions = ['Python 3.5', 'Python 3.6', 'Python 3.7']

    @michaelbrandlaid-drivingeu
    Mannequin Author

    On Ubuntu 16.04, with Python 3.5 as well as custom-built 3.6 and 3.7.1:

    Take a file foo.txt (content "foo") and a symlink myLink pointing to it, packed into one tar, and a file bar.txt (content "bar") and a symlink myLink pointing to it, packed into another tar.
    Unpacking the two tars into the same folder (first foo.tar, then bar.tar) leads to the following behavior:

    With GNU tar, the directory will contain:
    foo.txt (content "foo")
    bar.txt (content "bar")
    myLink -> bar.txt

    With Python's tarfile, however, calling extractall on the two tars gives:
    foo.txt (content "bar")
    bar.txt (content "bar")
    myLink -> foo.txt

    Repro:

    1. Unpack the attached symLinkBugRepro.tar.gz into a new folder.
    2. Run: bash repoSymlink.bash (does exactly what is described above).
    3. If the last two lines of the output are "bar" and "bar" (instead of "foo" and "bar"), then the content of foo.txt has been overwritten.
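
For readers without the attachment, the scenario above can be approximated in pure Python. This is a minimal sketch, not the attached script: the helper names make_tar and reproduce are hypothetical, the archives are built in code rather than with GNU tar, and the filter argument is passed only when the running Python provides tarfile.data_filter. It assumes a POSIX system where symlinks can be created.

```python
import io
import os
import tarfile


def make_tar(tar_path, file_name, content):
    """Write a tar holding one regular file plus a symlink 'myLink' to it."""
    data = content.encode()
    with tarfile.open(tar_path, "w") as tar:
        info = tarfile.TarInfo(file_name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

        link = tarfile.TarInfo("myLink")
        link.type = tarfile.SYMTYPE
        link.linkname = file_name
        tar.addfile(link)


def reproduce(workdir):
    """Extract foo.tar, then bar.tar, into one folder and report the result."""
    foo_tar = os.path.join(workdir, "foo.tar")
    bar_tar = os.path.join(workdir, "bar.tar")
    make_tar(foo_tar, "foo.txt", "foo")
    make_tar(bar_tar, "bar.txt", "bar")

    dest = os.path.join(workdir, "out")
    os.mkdir(dest)

    # Assumption: pass an extraction filter only where tarfile supports one.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    for archive in (foo_tar, bar_tar):
        with tarfile.open(archive) as tar:
            tar.extractall(dest, **kwargs)

    def read(name):
        with open(os.path.join(dest, name)) as f:
            return f.read()

    return {
        "foo.txt": read("foo.txt"),
        "bar.txt": read("bar.txt"),
        "myLink": os.readlink(os.path.join(dest, "myLink")),
    }
```

Calling reproduce on an empty temporary directory returns the final contents of foo.txt and bar.txt and the target of myLink. On the versions listed in this report, foo.txt is expected to come back as "bar"; other Python versions may behave differently.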

    Note that this is related to issues like
    https://bugs.python.org/issue23228
    https://bugs.python.org/issue1167128
    https://bugs.python.org/issue19974
    https://bugs.python.org/issue10761

    None of these issues addresses the problem described here, however.

    The problem lies in line 2201 of https://github.com/python/cpython/blob/master/Lib/tarfile.py:
    The code assumes that any exception raised there means the OS does not support symlinks. Here, however, the exception is raised because the symlink already exists, and that case should be caught separately. The correct behavior is then NOT to extract the member, but rather to overwrite the symlink (as GNU tar does).
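
Until tarfile itself handles this case, a caller can approximate the GNU tar behavior by removing a colliding symlink before the member is extracted. The following is a minimal sketch under stated assumptions, not a patch to tarfile.py: the function name extractall_replacing_symlinks is hypothetical, only symlink members that collide with an existing symlink are handled, and the archive is assumed to be trusted (member names are joined to the destination without further checks).

```python
import os
import tarfile


def extractall_replacing_symlinks(tar_path, dest):
    """Extract every member, unlinking an existing symlink before recreating it."""
    # Assumption: pass an extraction filter only where tarfile supports one.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            target = os.path.join(dest, member.name)
            # Remove the old link itself so that nothing is written through it
            # into the file it points to.
            if member.issym() and os.path.islink(target):
                os.unlink(target)
            tar.extract(member, dest, **kwargs)
```

Used in place of extractall for both archives from the report, this should leave foo.txt untouched and repoint myLink to bar.txt. Other kinds of collisions (for example a regular file where a symlink should go) are left to tarfile's own handling.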

    @michaelbrandlaid-drivingeu michaelbrandlaid-drivingeu mannequin added 3.7 (EOL) end of life stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Dec 13, 2018
    @vadmium
    Member

    vadmium commented Dec 16, 2018

    The first aspect, incorrectly assuming the OS does not support symlinks, is described at <https://bugs.python.org/issue23228#msg265146>. Lars proposed a fix <https://bugs.python.org/file42780/windowserror.diff> which will let the OS exception escape to the caller. However, I think that patch needs more work.

    The second aspect is replacing existing symlinks and other directory entries. This was implemented in 2.7 in bpo-10761 and bpo-12088 (only when replacing non-subdirectories with symbolic links and hard links), and is discussed more generally in bpo-19974.

    I suggest closing this in favour of resolving bpo-23228 and bpo-19974.

    @michaelbrandlaid-drivingeu
    Mannequin Author

    Sounds good to me.

    @vadmium vadmium closed this as completed Jan 6, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022