'tarfile.StreamError: seeking backwards is not allowed' when extract symlink #57009

Closed
adunand mannequin opened this issue Aug 20, 2011 · 13 comments
Assignees
Labels
3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@adunand
Mannequin

adunand mannequin commented Aug 20, 2011

BPO 12800
Nosy @gustaebel, @catlee, @taleinat, @serhiy-storchaka, @JulienPalard, @miss-islington, @websurfer5
PRs
  • bpo-12800: 'tarfile.StreamError: seeking backwards is not allowed' when extract symlink #13217
  • bpo-12800: 'tarfile.StreamError: seeking backwards is not allowed' when extract symlink #20972
  • bpo-12800: tarfile: Restore fix from 011525ee9 #21409
  • [3.9] bpo-12800: tarfile: Restore fix from 011525ee9 (GH-21409) #23508
  • [3.8] bpo-12800: tarfile: Restore fix from 011525ee9 (GH-21409) #23509
  • Files
  • test.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/gustaebel'
    closed_at = <Date 2020-12-18.18:53:15.392>
    created_at = <Date 2011-08-20.22:57:39.951>
    labels = ['type-bug', '3.8', '3.9', '3.10', '3.7', 'library']
    title = "'tarfile.StreamError: seeking backwards is not allowed' when extract symlink"
    updated_at = <Date 2020-12-18.18:53:15.392>
    user = 'https://bugs.python.org/adunand'

    bugs.python.org fields:

    activity = <Date 2020-12-18.18:53:15.392>
    actor = 'mdk'
    assignee = 'lars.gustaebel'
    closed = True
    closed_date = <Date 2020-12-18.18:53:15.392>
    closer = 'mdk'
    components = ['Library (Lib)']
    creation = <Date 2011-08-20.22:57:39.951>
    creator = 'adunand'
    dependencies = []
    files = ['49168']
    hgrepos = []
    issue_num = 12800
    keywords = ['patch']
    message_count = 13.0
    messages = ['142580', '221577', '221595', '223171', '341969', '369061', '369119', '369120', '373384', '377173', '381803', '381807', '381810']
    nosy_count = 9.0
    nosy_names = ['lars.gustaebel', 'catlee', 'taleinat', 'adunand', 'serhiy.storchaka', 'andrew.garner', 'mdk', 'miss-islington', 'Jeffrey.Kintscher']
    pr_nums = ['13217', '20972', '21409', '23508', '23509']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue12800'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9', 'Python 3.10']

    @adunand
    Mannequin Author

    adunand mannequin commented Aug 20, 2011

    Calling extractall() on a tarball that contains a symlink, opened in stream mode ('r|'), raises an exception:

    Traceback (most recent call last):
      File "./test_extractall_stream_symlink.py", line 26, in <module>
        tar.extractall(path=destdir)
      File "/usr/lib/python3.2/tarfile.py", line 2134, in extractall
        self.extract(tarinfo, path, set_attrs=not tarinfo.isdir())
      File "/usr/lib/python3.2/tarfile.py", line 2173, in extract
        set_attrs=set_attrs)
      File "/usr/lib/python3.2/tarfile.py", line 2249, in _extract_member
        self.makefile(tarinfo, targetpath)
      File "/usr/lib/python3.2/tarfile.py", line 2289, in makefile
        source.seek(tarinfo.offset_data)
      File "/usr/lib/python3.2/tarfile.py", line 553, in seek
        raise StreamError("seeking backwards is not allowed")
    tarfile.StreamError: seeking backwards is not allowed

    You can reproduce the bug with this snippet of code:

    import os
    import shutil
    import tarfile

    TEMPDIR='/tmp/pyton_test'
    os.mkdir(TEMPDIR)
    tempdir = os.path.join(TEMPDIR, "testsymlinks")
    temparchive = os.path.join(TEMPDIR, "testsymlinks.tar")
    destdir = os.path.join(TEMPDIR, "extract")
    os.mkdir(tempdir)
    try:
        source_file = os.path.join(tempdir,'source')
        target_file = os.path.join(tempdir,'symlink')
        with open(source_file,'w') as f:
            f.write('something\n')
        os.symlink('source', target_file)
        # The symlink is added before the file it points to.
        tar = tarfile.open(temparchive,'w')
        tar.add(target_file, arcname=os.path.basename(target_file))
        tar.add(source_file, arcname=os.path.basename(source_file))
        tar.close()
        # Reopen the archive in stream mode and extract everything.
        fo = open(temparchive, 'rb')
        tar = tarfile.open(fileobj=fo, mode='r|')
        try:
            tar.extractall(path=destdir)
        finally:
            tar.close()
    finally:
        os.unlink(temparchive)
        shutil.rmtree(TEMPDIR)

    If source_file is added before target_file, no exception is raised. However, the exception is still raised when the same tarball is created with GNU tar.
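The message in the traceback comes from the stream wrapper that 'r|' mode places around the file object, which can only move forward. Below is a minimal sketch of that restriction on its own, without any symlinks. The helper names build_archive and backward_read_error are made up for illustration, and the archive is built in memory rather than on disk.

```python
import io
import tarfile


def build_archive():
    """Return an in-memory tar archive holding two small regular files."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in ("first", "second"):
            data = b"something\n"
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def backward_read_error():
    """Read the stream to the end, then try to read the first member again.

    Returns the StreamError message, or None if no error was raised.
    """
    with tarfile.open(fileobj=build_archive(), mode="r|") as tar:
        members = list(tar)  # consumes the whole stream, front to back
        try:
            # The data of members[0] lies behind the current stream position.
            tar.extractfile(members[0]).read()
        except tarfile.StreamError as exc:
            return str(exc)
    return None


if __name__ == "__main__":
    print(backward_read_error())
```

Any code path that needs the data of an earlier member after the stream has moved past it should hit the same error, which appears to be what happens in the extraction reported here.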

    @adunand adunand mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Aug 20, 2011
    @gustaebel gustaebel mannequin self-assigned this Sep 12, 2011
    @BreamoreBoy
    Mannequin

    BreamoreBoy mannequin commented Jun 25, 2014

    ping.

    @serhiy-storchaka
    Member

    Everything works for me without an exception in 2.7, 3.3 and 3.4.

    @andrewgarner
    Mannequin

    andrewgarner mannequin commented Jul 16, 2014

    This seems to be similar to bpo-10761, where symlinks are not overwritten by TarFile.extract, but it is only an issue in streaming mode and only in Python 3. To reproduce, extract a symlink from a tarfile opened with 'r|' so that it overwrites an existing file.

    Here's a simple script, adapted from Aurélien's, that demonstrates this behavior.

    #!/usr/bin/python

    import os
    import shutil
    import sys
    import tempfile
    import tarfile
    
    
    def main():
        tmpdir = tempfile.mkdtemp()
        try:
            os.chdir(tmpdir)
            source = 'source'
            link = 'link'
            temparchive = 'issue12800'
            # create source
            with open(source, 'wb'):
                pass
            os.symlink(source, link)
            with tarfile.open(temparchive, 'w') as tar:
                tar.add(source, arcname=os.path.basename(source))
                tar.add(link, arcname=os.path.basename(link))
    
            with open(temparchive, 'rb') as fileobj:
                with tarfile.open(fileobj=fileobj, mode='r|') as tar:
                    tar.extractall(path=tmpdir)
        finally:
            shutil.rmtree(tmpdir)
    
    if __name__ == '__main__':
        sys.exit(main())

    On python 3.3.2 I get the following results:

    $ python3.3 issue12800.py
    Traceback (most recent call last):
      File "issue12800.py", line 32, in <module>
        sys.exit(main())
      File "issue12800.py", line 27, in main
        tar.extractall(path=tmpdir)
      File "/usr/lib64/python3.3/tarfile.py", line 1984, in extractall
        self.extract(tarinfo, path, set_attrs=not tarinfo.isdir())
      File "/usr/lib64/python3.3/tarfile.py", line 2023, in extract
        set_attrs=set_attrs)
      File "/usr/lib64/python3.3/tarfile.py", line 2100, in _extract_member
        self.makelink(tarinfo, targetpath)
      File "/usr/lib64/python3.3/tarfile.py", line 2181, in makelink
        os.symlink(tarinfo.linkname, targetpath)
    FileExistsError: [Errno 17] File exists: '/tmp/tmpt0u1pn/link'

    On python 3.4.1 I get the following results:

    $ python3.4 issue12800.py
    Traceback (most recent call last):
      File "/usr/lib64/python3.4/tarfile.py", line 2176, in makelink
        os.symlink(tarinfo.linkname, targetpath)
    FileExistsError: [Errno 17] File exists: 'source' -> '/tmp/tmp3b96k5f0/link'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "issue12800.py", line 32, in <module>
        sys.exit(main())
      File "issue12800.py", line 27, in main
        tar.extractall(path=tmpdir)
      File "/usr/lib64/python3.4/tarfile.py", line 1979, in extractall
        self.extract(tarinfo, path, set_attrs=not tarinfo.isdir())
      File "/usr/lib64/python3.4/tarfile.py", line 2018, in extract
        set_attrs=set_attrs)
      File "/usr/lib64/python3.4/tarfile.py", line 2095, in _extract_member
        self.makelink(tarinfo, targetpath)
      File "/usr/lib64/python3.4/tarfile.py", line 2187, in makelink
        targetpath)
      File "/usr/lib64/python3.4/tarfile.py", line 2087, in _extract_member
        self.makefile(tarinfo, targetpath)
      File "/usr/lib64/python3.4/tarfile.py", line 2126, in makefile
        source.seek(tarinfo.offset_data)
      File "/usr/lib64/python3.4/tarfile.py", line 518, in seek
        raise StreamError("seeking backwards is not allowed")
    tarfile.StreamError: seeking backwards is not allowed

    @websurfer5
    Mannequin

    websurfer5 mannequin commented May 9, 2019

    The problem is in TarFile.makelink() in Lib/tarfile.py. It calls os.symlink() to create the link, which fails because the link already exists and triggers the exception handler. The exception handler then tries to create the linked file under the assumption (per source code comments) that the link creation failed because the system doesn't support symbolic links. The file creation then fails because it requires seeking backwards in the archive.
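One way to avoid the failing os.symlink() call described above is to remove whatever already sits at the target path first. The following is only a sketch of that idea as a standalone helper, not the patch that was merged into Lib/tarfile.py; the names force_symlink and demo are hypothetical.

```python
import os
import tempfile


def force_symlink(linkname, targetpath):
    """Create a symlink at targetpath, replacing an existing file or link.

    os.path.lexists() is used so that a dangling symlink is detected too.
    Note that os.unlink() does not remove directories; that case is not
    handled in this sketch.
    """
    if os.path.lexists(targetpath):
        os.unlink(targetpath)
    os.symlink(linkname, targetpath)


def demo():
    """Overwrite a regular file with a symlink and report the result."""
    with tempfile.TemporaryDirectory() as tmpdir:
        target = os.path.join(tmpdir, "link")
        with open(target, "w") as f:
            f.write("stale")
        force_symlink("source", target)
        return os.path.islink(target), os.readlink(target)


if __name__ == "__main__":
    print(demo())
```

With the existing entry removed first, the symlink call no longer raises FileExistsError, so the fallback that copies the link target (and needs to seek backwards in the archive) is never entered for this reason.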

    @websurfer5 websurfer5 mannequin added 3.7 (EOL) end of life 3.8 (EOL) end of life labels May 9, 2019
    @catlee
    Mannequin

    catlee mannequin commented May 16, 2020

    Is there anything I can do to help get this landed? The PR in github works for me.

    @JulienPalard
    Member

    Hi Chris, which exception did you get exactly? Was it caused by the r| mode or by a symlink (or file) that already existed?

    @catlee
    Mannequin

    catlee mannequin commented May 17, 2020

    It's caused by the combination of the symlink existing, and having the tarfile opened in r| mode.

    If I run the attached test file in a fresh directory, I get the following exception:

    Traceback (most recent call last):
      File "/home/catlee/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 2227, in makelink
        os.symlink(tarinfo.linkname, targetpath)
    FileExistsError: [Errno 17] File exists: 'message.txt' -> './symlink.txt'

    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "../test.py", line 12, in <module>
        tf.extractall()
      File "/home/catlee/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 2024, in extractall
        self.extract(tarinfo, path, set_attrs=not tarinfo.isdir(),
      File "/home/catlee/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 2065, in extract
        self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
      File "/home/catlee/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 2145, in _extract_member
        self.makelink(tarinfo, targetpath)
      File "/home/catlee/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 2237, in makelink
        self._extract_member(self._find_link_target(tarinfo),
      File "/home/catlee/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 2137, in _extract_member
        self.makefile(tarinfo, targetpath)
      File "/home/catlee/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 2176, in makefile
        source.seek(tarinfo.offset_data)
      File "/home/catlee/.pyenv/versions/3.8.2/lib/python3.8/tarfile.py", line 513, in seek
        raise StreamError("seeking backwards is not allowed")
    tarfile.StreamError: seeking backwards is not allowed

    @JulienPalard
    Member

    Strange fact, this was already fixed in 011525e (which closes bpo-10761, nice spot Andrew) but was lost during a merge in 0d28a61:

    $ git show 0d28a61d23
    commit 0d28a61d233c02c458c8b4a25613be2f4979331e
    Merge: ed3a303548 d7c9d9cdcd
    
    $ git show 0d28a61d23:Lib/tarfile.py | grep unlink  # The merge commit no longer contains the fix
    
    $ git show ed3a303548:Lib/tarfile.py | grep unlink  # The "left" parent does not contain it either
    
    $ git show d7c9d9cdcd:Lib/tarfile.py | grep unlink  # The "right" one does contain it.
                        os.unlink(targetpath)
                            os.unlink(targetpath)

    Stranger fact, the test was not lost during the merge, and still lives today (test_extractall_symlinks).

    It turns out that the current test passes because it is partly erroneous: instead of trying to create a symlink over an existing one, it creates the symlink far, far away:

    (Pdb) p targetpath
    '/home/mdk/clones/python/cpython/@test_648875_tmp-tardir/testsymlinks/home/mdk/clones/python/cpython/@test_648875_tmp-tardir/testsymlinks/symlink'

    Additionally, it passes anyway because tar.errorlevel equals 1, which means the error is logged but not raised.
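The nested target path comes from how member names are stored when a file is added by absolute path without an arcname. Here is a small sketch of that effect, assuming a POSIX-style temporary directory; the helper name stored_names is made up for illustration.

```python
import io
import os
import tarfile
import tempfile


def stored_names(use_arcname):
    """Add one file by absolute path and return the member names stored."""
    with tempfile.TemporaryDirectory() as tmpdir:
        source_file = os.path.join(tmpdir, "source")
        with open(source_file, "w") as f:
            f.write("something\n")
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode="w") as tar:
            if use_arcname:
                # Stored under a short, explicit name.
                tar.add(source_file, arcname="source")
            else:
                # Stored under the full path, minus the leading slash.
                tar.add(source_file)
        buf.seek(0)
        with tarfile.open(fileobj=buf, mode="r") as tar:
            return tar.getnames()


if __name__ == "__main__":
    print(stored_names(use_arcname=False))
    print(stored_names(use_arcname=True))
```

Without arcname, the member keeps the whole directory path, so extracting into tempdir recreates that path underneath tempdir and never touches the existing symlink.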

    With the following small patch:

    --- a/Lib/test/test_tarfile.py
    +++ b/Lib/test/test_tarfile.py
    @@ -1339,10 +1339,10 @@ class WriteTest(WriteTestBase, unittest.TestCase):
                     f.write('something\n')
                 os.symlink(source_file, target_file)
                 with tarfile.open(temparchive, 'w') as tar:
    -                tar.add(source_file)
    -                tar.add(target_file)
    +                tar.add(source_file, arcname="source")
    +                tar.add(target_file, arcname="symlink")
                 # Let's extract it to the location which contains the symlink
    -            with tarfile.open(temparchive) as tar:
    +            with tarfile.open(temparchive, errorlevel=2) as tar:
                     # this should not raise OSError: [Errno 17] File exists
                     try:
                         tar.extractall(path=tempdir)

    the error is raised as expected: FileExistsError: [Errno 17] File exists: '/home/mdk/clones/python/cpython/@test_649794_tmpæ-tardir/testsymlinks/source' -> '/home/mdk/clones/python/cpython/@test_649794_tmpæ-tardir/testsymlinks/symlink'

    I'm opening a PR to restore this as it was intended.

    @taleinat
    Contributor

    See also another duplicate of this issue, bpo-40049.

    @taleinat taleinat added 3.9 only security fixes 3.10 only security fixes labels Sep 19, 2020
    @JulienPalard
    Member

    New changeset 4fedd71 by Julien Palard in branch 'master':
    bpo-12800: tarfile: Restore fix from 011525e (GH-21409)
    4fedd71

    @miss-islington
    Contributor

    New changeset 9d2c2a8 by Miss Islington (bot) in branch '3.9':
    bpo-12800: tarfile: Restore fix from 011525e (GH-21409)
    9d2c2a8

    @miss-islington
    Contributor

    New changeset bda2e68 by Miss Islington (bot) in branch '3.8':
    bpo-12800: tarfile: Restore fix from 011525e (GH-21409)
    bda2e68

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022