tarfile.add() produces hard links instead of normal files #42497

Closed
mpitt mannequin opened this issue Oct 18, 2005 · 7 comments
Labels
stdlib Python modules in the Lib dir

Comments

@mpitt

mpitt mannequin commented Oct 18, 2005

BPO 1330039
Nosy @gustaebel
Files
  • tarfile-bug.py: test case
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2005-10-20.16:29:46.000>
    created_at = <Date 2005-10-18.20:27:54.000>
    labels = ['library']
    title = 'tarfile.add() produces hard links instead of normal files'
    updated_at = <Date 2005-10-20.16:29:46.000>
    user = 'https://bugs.python.org/mpitt'

    bugs.python.org fields:

    activity = <Date 2005-10-20.16:29:46.000>
    actor = 'nnorwitz'
    assignee = 'nnorwitz'
    closed = True
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2005-10-18.20:27:54.000>
    creator = 'mpitt'
    dependencies = []
    files = ['1818']
    hgrepos = []
    issue_num = 1330039
    keywords = []
    message_count = 7.0
    messages = ['26616', '26617', '26618', '26619', '26620', '26621', '26622']
    nosy_count = 3.0
    nosy_names = ['nnorwitz', 'lars.gustaebel', 'mpitt']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue1330039'
    versions = ['Python 2.4']

    @mpitt

    mpitt mannequin commented Oct 18, 2005

    When opening a tarfile for writing and adding several
    files, some files end up being a hardlink to a
    previously added tar member instead of being a proper
    file member.

    I attach a demo that demonstrates the problem. It
    basically does:

    tar = tarfile.open('tarfile-bug.tar', 'w')
    tar.add('tarfile-bug-f1')
    tar.add('tarfile-bug-f2')
    tar.close()

    In the resulting tar, "tarfile-bug-f2" is a hard link
    to "tarfile-bug-f1", although both entries should be
    proper files.

    It works when the tarfile is close()d and opened again
    in append mode between the two add()s, but that slows
    down the process dramatically and is certainly not the
    intended way.
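
A minimal, self-contained sketch of the scenario described above. The attached test case is not reproduced here, so the file contents, the temporary directory, and the helper names `build_archive` and `member_types` are assumptions made for illustration. It creates each file, adds it, and removes it before creating the next one, then reports the type of each archive member.

```python
import os
import tarfile
import tempfile


def build_archive(workdir):
    """Add two short-lived files to one archive, deleting each after add()."""
    archive = os.path.join(workdir, "tarfile-bug.tar")
    with tarfile.open(archive, "w") as tar:
        for name in ("tarfile-bug-f1", "tarfile-bug-f2"):
            path = os.path.join(workdir, name)
            with open(path, "w") as fh:
                fh.write("payload for %s\n" % name)
            tar.add(path, arcname=name)
            # Removing the file frees its inode, so the next file may reuse it.
            os.remove(path)
    return archive


def member_types(archive):
    """Map each member name to 'file', 'hardlink', or 'other'."""
    kinds = {}
    with tarfile.open(archive) as tar:
        for member in tar.getmembers():
            if member.isreg():
                kinds[member.name] = "file"
            elif member.islnk():
                kinds[member.name] = "hardlink"
            else:
                kinds[member.name] = "other"
    return kinds


with tempfile.TemporaryDirectory() as workdir:
    result = member_types(build_archive(workdir))
    print(result)
```

On an affected interpreter the second member may be reported as a hard link, depending on whether the filesystem reuses the inode number; on a Python that takes `st_nlink` into account both members should be regular files.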

    @mpitt mpitt mannequin closed this as completed Oct 18, 2005
    @mpitt mpitt mannequin assigned nnorwitz Oct 18, 2005
    @mpitt mpitt mannequin added the stdlib Python modules in the Lib dir label Oct 18, 2005
    @gustaebel

    gustaebel mannequin commented Oct 19, 2005

    Logged In: YES
    user_id=642936

    This is a feature ;-)
    tarfile.py records the inode and device number (st_ino,
    st_dev) for each added file in a list (TarFile.inodes). When
    a new file is added and its inode and device number are found
    in this list, it will be added as a hardlink member,
    otherwise as a regular file.
    Because your test script adds and immediately removes each
    file, both files are assigned the same inode number. If you
    had another process creating a file in the meantime, the
    problem would not occur, because it would take over the
    inode number before the second file has the chance.

    Your problem shows that the way tarfile.py handles hardlinks
    is too sloppy. It must take the stat.st_nlink field into
    account. I will create a fix for this.
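
The bookkeeping described here can be illustrated with a simplified model. This is not the code from tarfile.py: the function `classify`, the `check_nlink` switch, and the use of a set of `(st_ino, st_dev)` pairs are assumptions chosen for illustration. It compares the lookup-only decision with one that also requires `st_nlink > 1`.

```python
from types import SimpleNamespace


def classify(st, seen, check_nlink):
    """Return 'hardlink' or 'regular' for a stat-like object.

    `seen` is a set of (st_ino, st_dev) pairs for members already added.
    """
    key = (st.st_ino, st.st_dev)
    # Lookup-only mode treats any repeated inode as a hard link; the
    # stricter mode also requires the file to have more than one link.
    is_link = key in seen and (not check_nlink or st.st_nlink > 1)
    seen.add(key)
    return "hardlink" if is_link else "regular"


# Two unrelated files that were given the same inode number one after the
# other, each with a single link (made-up values).
first = SimpleNamespace(st_ino=1234, st_dev=5, st_nlink=1)
second = SimpleNamespace(st_ino=1234, st_dev=5, st_nlink=1)

old_seen, new_seen = set(), set()
old = [classify(s, old_seen, check_nlink=False) for s in (first, second)]
new = [classify(s, new_seen, check_nlink=True) for s in (first, second)]
print(old)  # ['regular', 'hardlink']
print(new)  # ['regular', 'regular']
```

With only the inode lookup, a reused inode number is indistinguishable from a real hard link; the link count is what separates the two cases.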

    As a workaround you have several options:

    • Do not remove the files after adding them, but after the
      TarFile is closed.
    • Set TarFile.dereference to True before adding files, so
      files with several links would always be added as regular
      files (see the Documentation). Disadvantage: symbolic links
      would be added as regular files as well.
    • Tamper with the source code. Edit TarFile.gettarinfo().
      Change the line that says "if inode in self.inodes and not
      self.dereference:" to "if statres.st_nlink > 1 and inode in
      self.inodes and not self.dereference:".
    • Empty the TarFile.inodes list after each file. Ugh!
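
A hedged sketch of the dereference workaround, using the `dereference=True` keyword of `tarfile.open()`, which the tarfile documentation describes as adding the content of link targets instead of the links themselves. The file names are made up, and the example assumes a filesystem where `os.link()` can create hard links. It adds a file and a genuine hard link to it and checks that both end up as regular members.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "data.txt")
    alias = os.path.join(workdir, "alias.txt")
    with open(original, "w") as fh:
        fh.write("same inode, two names\n")
    os.link(original, alias)  # a genuine hard link: st_nlink becomes 2

    archive = os.path.join(workdir, "deref.tar")
    # With dereference=True, links are stored as copies of their targets.
    with tarfile.open(archive, "w", dereference=True) as tar:
        tar.add(original, arcname="data.txt")
        tar.add(alias, arcname="alias.txt")

    with tarfile.open(archive) as tar:
        all_regular = all(m.isreg() for m in tar.getmembers())
        names = tar.getnames()

print(names, all_regular)
```

The trade-off is the one named in the list above: symbolic links are followed too, and real hard links are stored as full copies, so the archive can grow.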

    @gustaebel

    gustaebel mannequin commented Oct 19, 2005

    Logged In: YES
    user_id=642936

    I just submitted patch bpo-1331635 which ought to fix your
    problem. Thank you for your report.

    @nnorwitz

    nnorwitz mannequin commented Oct 20, 2005

    Logged In: YES
    user_id=33168

    Martin, I have checked in Lars' patch. If this does not fix
    your problem, please re-open this bug report.

    Checked in as:

    @mpitt

    mpitt mannequin commented Oct 20, 2005

    Logged In: YES
    user_id=80975

    Thanks for the quick reply!

    Unfortunately, not removing the files after adding them to
    the tarfile is not really an option. I want to create a
    really huge tar file and put compressed files into it. For
    that purpose I create a temporary gzip file, put that into
    the tarfile, and remove the temporary file again. First,
    keeping track of all temp files would be cumbersome, and
    second it could quickly lead to disk space exhaustion.

    I'll try your patch now.

    @mpitt

    mpitt mannequin commented Oct 20, 2005

    Logged In: YES
    user_id=80975

    Confirmed, works perfectly now. Thank you very much! Will
    this also be fixed in a stable point release? Or just in 2.5?

    @nnorwitz

    nnorwitz mannequin commented Oct 20, 2005

    Logged In: YES
    user_id=33168

    It will be fixed in 2.4.3 when released (that's the branch
    tags below, i.e. the second RCS rev number after each file).

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022