Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security] CVE-2007-4559: tarfile: Add absolute_path option to tarfile, disabled by default #73974

Closed
vstinner opened this issue Mar 10, 2017 · 45 comments
Labels
3.11 only security fixes stdlib Python modules in the Lib dir type-security A security issue

Comments

@vstinner
Copy link
Member

BPO 29788
Nosy @gustaebel, @vstinner, @jwilk, @berkerpeksag, @vadmium

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2017-03-10.16:13:44.639>
labels = ['type-security', 'library', '3.11']
title = '[Security] tarfile: Add absolute_path option to tarfile, disabled by default'
updated_at = <Date 2021-05-21.23:25:06.720>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2021-05-21.23:25:06.720>
actor = 'ned.deily'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2017-03-10.16:13:44.639>
creator = 'vstinner'
dependencies = []
files = []
hgrepos = []
issue_num = 29788
keywords = []
message_count = 2.0
messages = ['289388', '289437']
nosy_count = 5.0
nosy_names = ['lars.gustaebel', 'vstinner', 'jwilk', 'berker.peksag', 'martin.panter']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'security'
url = 'https://bugs.python.org/issue29788'
versions = ['Python 3.11']

@vstinner
Copy link
Member Author

I noticed that "python3 -m tarfile -x archive.tar" uses absolute paths by default, whereas the UNIX tar command doesn't by default. The UNIX tar command requires to add explicitly --absolute-paths (-P) option.

I suggest to add a boolean absolute_path option to tarfile, disabled by default.

Example to create such archive. See that tar also removes "/" by default and requires to pass explicitly -P:

$ cd $HOME
# /home/haypo
$ echo TEST > test
$ tar -cf test.tar /home/haypo/test
tar: Removing leading `/' from member names

$ rm -f test.tar
$ tar -P -cf test.tar /home/haypo/test
$ rm -f test

Extracting such archive using tar is safe *by default*:

$ mkdir z
$ cd z
$ tar -xf ~/test.tar
tar: Removing leading `/' from member names
$ find
.
./home
./home/haypo
./home/haypo/test

Extracting such archive using Python is unsafe:

$ python3 -m tarfile -e ~/test.tar
$ cat ~/test
TEST
$ pwd
/home/haypo/z

Python creates files outside the current directory which is unsafe, wheras tar doesn't.

@vstinner vstinner added 3.7 (EOL) end of life type-security A security issue stdlib Python modules in the Lib dir labels Mar 10, 2017
@vstinner vstinner changed the title Add absolute_path option to tarfile, disabled by default tarfile: Add absolute_path option to tarfile, disabled by default Mar 10, 2017
@vadmium
Copy link
Member

vadmium commented Mar 11, 2017

The CLI was added in bpo-13477. I didn’t see any discussion of traversal attacks there, so maybe it was overlooked. Perhaps there should also be a warning, like with the Tarfile.extract and “extractall” methods.

However I did see one of the goals was to keep the CLI simple, which I agree with. I would suggest that the CLI get this proposed behaviour by default (matching the default behaviour of modern “tar” commands), with no option to restore the current less-robust behaviour.

To implement it, I suggest to fix the module internals first: bpo-21109 and/or bpo-17102.

FWIW BSD calls the option “--absolute-paths” (plural paths) <https://www.freebsd.org/cgi/man.cgi?tar%281%29#OPTIONS\>, while Gnu calls it “--absolute-names” <https://www.gnu.org/software/tar/manual/html_chapter/tar_6.html#SEC121\>. Both these options disable other checks, such as for parent directories (..) and external symbolic link targets, so I think the term “absolute” is too specific. But please use at least replace the underscore with a dash or hyphen: “--absolute-path”, not “--absolute_path”.

@vstinner vstinner changed the title tarfile: Add absolute_path option to tarfile, disabled by default [Security] tarfile: Add absolute_path option to tarfile, disabled by default May 29, 2018
@ned-deily ned-deily added 3.11 only security fixes and removed 3.7 (EOL) end of life labels May 21, 2021
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@vstinner vstinner changed the title [Security] tarfile: Add absolute_path option to tarfile, disabled by default [Security] CVE-2007-4559: tarfile: Add absolute_path option to tarfile, disabled by default Sep 23, 2022
@jowagner
Copy link

jowagner commented Sep 23, 2022

Questions:

  1. Can we use https://bugs.python.org/file8339/insecure_pathnames.diff (found in issue tarfile insecure pathname extraction #45385) as a basis for a PR? How do we give credit if @gustaebel is not the contributor but just another user with the same surname?
  2. Can we use the abspath + commonprefix approach from https://gist.github.com/Kasimir123/9bdc38f2eac9a19f83ef6cc52c7ce82e?
  3. Changing behaviour can break existing software, e.g. restoring a backup that uses absolute paths. How much notice should be given in this case? On the one hand, the issue should be fixed soon as it impacts security. On the other hand, what are a few years more after 15 years? What is the process for ensuring the defaults are changed for a specific target version or year?
  4. absolute-path is not a good name if we also test for .. path components.
  5. In the safe mode, should we (a) bail out as soon as an absolute path is encountered or (b) only if the absolute path brings us outside the hierarchy of the target directory (the current working directory by default), or (c) only if the effective target directory is not '/', or (d) only if '/' is not explicitly specified as the target directory (parameter path of the extract functions)? Consider '/etc/passwd' in a tar file. Should this file allowed to be written in safe mode when inside '/etc'? Should it be ok if running from the top-level '/' root folder? Should it only work if path='/' is set when calling an extract function of the tarfile module? The latter is something we could ask developers of restore utilities to do to continue being able to restore backups produced with absolute paths (and this would also be compatible with future backups stripping the leading /).
  6. The patch in issue tarfile insecure pathname extraction #45385 suggests a parameter check_path. While this naming does not make explicit what is being checked, it can cover all future security issues arising from tarfile member names. Or would it be better to have a separate parameter for each check?
  7. Should the idea to allow the user to provide a function with check_path that returns True/False for whether to skip the entry (if an exception is desired the function can raise it - no need to define a return value for that) or even allows to change the path (returning a string) be implemented in the same PR or should this be a separate PR?
  8. Could a specially crafted tar file cause the tarfile module to create a symlink to a directory outside the target tree, e.g. example --> / and then contain a member example/etc/passwd that overwrites /etc/passwd? To protect against this, one could call realpath() but this would also evaluate pre-existing symlinks. Users may expect the old behaviour e.g. when receiving updates via tar files for multiple folders and one of the folders has been moved to a different filesystem on the receiving system. To satisfy the expectation of these users and being safe, we need a version of realpath() that does not consult the actual filesystem but only the folders and symlinks found in the tar file so far. [Edit: Yes, see https://github.com/tarfile: Traversal attack vulnerability #65308#issuecomment-1093650963]

@zhuofeng6
Copy link

Which versions are affected by this CVE?

@nanonyme
Copy link

nanonyme commented Sep 24, 2022

To be honest, whenever path is non-empty, I would expect it to be regarded "safe" to the extent that any final path must be contained inside the path user gives. So essentially passing relative paths would be fine as long as it does not escape the directory given by the user of the API. So eg I give path "/", then tarfile is allowed to unpack anywhere. I give path "", then there is no checking whatsoever (for backwards compat reasons). When I give path "something-else", then trying to unpack outside "something-else" raises an exception.

@nanonyme
Copy link

nanonyme commented Sep 24, 2022

Another possibility would be raising audit event for the path names so you can write an audit hook that prevents unpacking unsafe paths. This implies no API changes.

@nanonyme
Copy link

@alittlesir everything everywhere. Documentation says you must check paths to be safe before unpacking.

@rooterkyberian
Copy link

As a commercial python developer I would greatly appreciate this to become part of CPython.

So while "technically correct" python implementation was deemed "non-security issue" (#45385), the fact that it resoluted in thousands of vulnerable projects seems to indicate that in retrospect, it is an issue. Despite documentation warning. Like great, we have documentation which says to watch out, but here we are. People are not reading it. And all that for "being technically correct" and in support of a feature that hardly every anyone wants or needs.

Alternative non-breaking (and I really wonder which projects would be broken here) would be to have a subclass that does the protecting by default, in similar fashion as tarsafe project (https://github.com/beatsbears/tarsafe , but with different implementation, as the one there seems quite inefficient).

@nanonyme
Copy link

I would prefer secure by default. This is implicitly used by various API's like shutil.unpack_archive which does not check TarInfo items are safe paths. Even standard library is not following tarfile module documentation.

@gustaebel
Copy link
Contributor

gustaebel commented Sep 26, 2022

I wrote a comment back in 2014 on the various security aspects of tarfile: #65308 (comment)
I think it is still worth a read.

@nanonyme
Copy link

@gustaebel yes. I was mostly talking about other higher level tools in standard library that build on top of tarfile not being remotely secure either. I suggested adding audit event for unpacked TarInfo, that would easily allow gluing security on top without dropping all these nice things standard library has gathered. As said, shutil.unpack_archive will automatically call into insecure code in tarfile behind your back after sniffing that it needs tarfile based on file extension. This is not great design.

@jorhett
Copy link

jorhett commented Sep 27, 2022

I'd like to point out that PyPI itself is at risk, given that PEP-458 for signed package metadata has still not been implemented a decade after it was proposed. In short, every PyPI package should be considered "untrusted". While pip does check for filename extraction paths, one could simply add an installation script that runs tarfile directly to achieve the exploit.

@nanonyme
Copy link

nanonyme commented Sep 27, 2022

@jorhett the funny thing there is that this issue is about tarfile when in fact zipfile has the exact same vulnerability class considered by the CVE and PyPI has now converged around wheels that are essentially ZIP packages. Back when the CVE was filed tarfile was still used more.

@jorhett
Copy link

jorhett commented Sep 27, 2022

The not-funny thing is that PyPI packing is not cryptographically verifiable, there are no published instructions for validating PyPI packages, and yet Python maintainers consider it the user's responsibility to inspect the code before using it.

I do this for a living, and Python is the hardest to evaluate. It has the least verifications/validations of origin possible. So we have to build a VM and diff before/after to be certain of what it does. Yeah, all the people using Python plugins in their tools are responsible for taking this kind of effort 🤦

@vstinner
Copy link
Member Author

See also issue #94531 "Document zlib, gzip, bz2 and tarfile known vulnerabilities".

@zhuofeng6
Copy link

it seems to have a big impact. when are you going to fix this cve? @vstinner

@vstinner
Copy link
Member Author

it seems to have a big impact. when are you going to fix this cve? @vstinner

I don't plan to implement my proposed idea soon, but if someone does, I will review the change.

Anyone is free to propose a fix!

@nanonyme
Copy link

@Denelvo tarfile API clearly states you must manually sanitize filenames yourself. So does zipfile. What is more problematic is shutil.unpack_archive does not expose an API to do this. So clearly it must sanitize files for you as a high-level API. It probably did not exist back in 2007.

@gustaebel
Copy link
Contributor

@Denelvo I rather suggest everyone take a look at the actual code in Spyder IDE that is the basis for the exploit (from this blog article from Trellix).

  1. A tar archive is extracted in its entirety to the filesystem, just like that. No checks whatsoever.
  2. Actually only one file from the archive is needed. It is not only completely unnecessary to use TarFile.extractall(), but there would simply be no exploit if TarFile.extractfile() was used.
  3. The file in question turns out to be a pickle dump which is opened and loaded via pickle.loads().

To quote the pickle documentation:

The pickle module is not secure. Only unpickle data you trust.

This is a security vulnerability in Spyder IDE.

@nanonyme
Copy link

Kind of yes but also if you have code that doesn't sanitize unpacking paths, then unrelated code cannot trust any path in entire hard disk anymore for safe unpickling. And you could even simpler attack write on Linux .bashrc file into user home for arbitrary code execution afterwards.

@jowagner
Copy link

Making tarfile safe by default was never expected to solve all software problems such as unsafe use of the pickle module.

Can we move on to discussing what the PR should look like?

How about the following default behaviour:

  • All extract*() functions call a check_path function for each member, using the return value as the final path and skipping the member if the return value is empty or None.
  • If the target folder is empty, use a check_path function that raises an exception if the realpath is outside the target folder. This covers ../, / and symlink attacks.
  • If the target folder is not empty, use a check_path function that raises an exception if the normalised path (without symlink resolution) is or may be leading outside the folder (adding "may be" because ../ does not lead outside the target folder /). This does not protect against symlink attacks as trying to distinguish between pre-existing symlinks (that the user expects to be followed) and new symlinks from the archive would introduce new attack vectors. One could also argue to drop all checks for non-empty target folders as a symlink attack can achieve everything the other two attacks can do.

At minimum, the tarfile module should provide a check_path function that restores the current unsafe behaviour. For convenience, this could be selected with check_path=None.

@nsoranzo
Copy link
Contributor

nsoranzo commented Nov 4, 2022

2. Can we use the abspath + commonprefix approach from https://gist.github.com/Kasimir123/9bdc38f2eac9a19f83ef6cc52c7ce82e?

I don't think that's enough: https://gist.github.com/Kasimir123/9bdc38f2eac9a19f83ef6cc52c7ce82e?permalink_comment_id=4358347#gistcomment-4358347

@jowagner
Copy link

jowagner commented Nov 14, 2022

Digging deeper in the GNU tar behaviour with pre-existing symlinks, it seems that tar follows a pre-existing symlink when the file foo/bar is not proceeded by a tarfile member foo of type directory. If there is such a member, the symlink is replaced by a new folder foo and bar is placed inside it. A symlink foo --> /tmp is extracted without complains. Only when followed by a member using this symlink an error Cannot open: Not a directory is printed. (This error message is strange as tar happily writes foo/bar through a pre-existing symlink if there is no member foo before in the archive. The error message is also printed when there is a symlink in the archive matching the pre-existing symlink.)

How does GNU tar do this? I think it must be storing all folders and symlinks in memory as clearly the tarfile members are not evaluated in isolation and the state of the filesystem is not sufficient to explain behaviour differences. Do we want this level of protection against symlink attacks where the target folder is non-empty (for empty target folders, we can simply check paths with realpath()) at the expense of possibly high memory usage during extraction?

@mandolaerik
Copy link

Note that the symlink trick can be applied also to file symlinks; the tar file created by:

import tarfile
from pathlib import Path
with tarfile.open('foo.tar', 'w') as tf:
    ti = tarfile.TarInfo('foo')
    ti.type = tarfile.SYMTYPE
    ti.linkname = '../../../etc/passwd'
    tf.addfile(ti)
    Path('foo').write_text('garbage')
    tf.add('foo')

will try to write passwd when extracted by extractall. It seems that tar x doesn't have this problem. The following naive fix plugs this hole (only tested manually):

index 169c88d63f..aa4513e510 100755
--- a/Lib/tarfile.py
+++ b/Lib/tarfile.py
@@ -2211,6 +2211,8 @@ def makefile(self, tarinfo, targetpath):
         source = self.fileobj
         source.seek(tarinfo.offset_data)
         bufsize = self.copybufsize
+        if os.path.lexists(targetpath) and not os.path.isdir(targetpath):
+            os.unlink(targetpath)
         with bltn_open(targetpath, "wb") as target:
             if tarinfo.sparse is not None:
                 for offset, size in tarinfo.sparse:

(needs some more work, like a test, plus I suppose makefifo/makedev also need a corresponding fix)

fghaas added a commit to fghaas/tutor-contrib-backup that referenced this issue Jan 9, 2023
The tarfile.extractall() command is vulnerable to path traversal,
which may be exploited by adding a member with an "../" path to the
tarball. In our case, this might open up the possibility of malicious
data injection to someone that doesn't normally have access to the
Open edX cluster, but does have write access to the S3 bucket. In that
case, bad things could happen upon extraction of a thus-crafted
archive, during an automated restore.

This shouldn't have particularly wide-ranging implications since the
only filesystem affected by such an attack would be the restore job's
container, which is by definition short-lived. And an attacker with
access to the S3 bucket could already do far greater damage to the
Open edX installation by simply modifying the MongoDB or MySQL data
contained in the tarball.

Still, it does not hurt to use a safer (if slightly slower) approach
that is provided by the tarsafe module.

References:
python/cpython#73974
https://mail.python.org/pipermail/python-dev/2007-August/074290.html
https://nvd.nist.gov/vuln/detail/CVE-2007-4559
fghaas added a commit to fghaas/tutor-contrib-backup that referenced this issue Jan 9, 2023
The tarfile.extractall() command is vulnerable to path traversal,
which may be exploited by adding a member with an "../" path to the
tarball. In our case, this might open up the possibility of malicious
data injection to someone that doesn't normally have access to the
Open edX cluster, but does have write access to the S3 bucket. In that
case, bad things could happen upon extraction of a thus-crafted
archive, during an automated restore.

This shouldn't have particularly wide-ranging implications since the
only filesystem affected by such an attack would be the restore job's
container, which is by definition short-lived. And an attacker with
access to the S3 bucket could already do far greater damage to the
Open edX installation by simply modifying the MongoDB or MySQL data
contained in the tarball.

Still, it does not hurt to use a safer (if slightly slower) approach
that is provided by the tarsafe module.

References:
python/cpython#73974
https://mail.python.org/pipermail/python-dev/2007-August/074290.html
https://nvd.nist.gov/vuln/detail/CVE-2007-4559
@pombredanne
Copy link
Contributor

FWIW, there is a spam bot PR campaign by @TrellixVulnTeam and @Kasimir123 (they should be banned) that has been going on since September and spams a large number of python projects with junk PRs with incorrect patches.

Examples include:

@JLLeitschuh
Copy link

JLLeitschuh commented Jan 23, 2023

Hi All,

My name is Jonathan Leitschuh, I'm a Senior Software Security Researcher for Project Alpha Omega at the Open Source Security Foundation under the Linux Foundation.

I recently attended a talk by @Kasimir123 at ShmooCon about this vulnerability and this work. They recently opened 61,895 pull requests across the OSS ecosystem to fix this vulnerability as the Python maintainers have taken no action to fix this severe vulnerability after 6 years.

To be clear, vulnerabilities like TAR-slip (aka Zip-slip) can lead to remote code execution by allowing an attacker to write code into executable file locations. In the right contexts, this vulnerability has a CVSSv3.1 score of 10/10 (AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H).


While I believe that fixing the simlink problem is important. I believe that this should be broken down into two issues:

  1. Destination escape by path traversal
  2. Destination escape by simlink

Given the first is far more likely to be exploited in the real world, I propose fixing it first, then focusing on the second problem.

This will ensure that end-users are protected against the most likely risk sooner, while the less likely risk discussion continues.


To provide an incentive: I would also like to note that any developer who resolves this security issue in the python standard library would most likely be eligible for a multi-thousand dollar financial reward under the Secure Open Source Rewards (https://sos.dev) project which is also operated out of the Open Source Security Foundation.

@encukou
Copy link
Member

encukou commented Jan 23, 2023

FWIW, I'm reading up on this issue, and I plan to post a proposal on Discourse this week to unblock things.

@jowagner
Copy link

@encukou We have multiple proposals in the comments here and the linked issues. The problem is that no proposal has received positive feedback from the core cpython team to motivate one of us go to the next step of implementing it and making a PR. Since a previous PR fixing the issue has been rejected, rewards from a new PR (financial or not) are uncertain. If you make yet another proposal please link to it from here.

@encukou
Copy link
Member

encukou commented Jan 25, 2023

Yeah, it would be unfair if I ended up getting money for this, outside my normal salary.
Any solution will necessarily be based on lots of prior work and discussion, by people who seem to have burned out trying to fix the issue.
I won't apply for any bounty, and if I get one anyway I'll donate it to the PSF.

@encukou
Copy link
Member

encukou commented Jan 25, 2023

My proposal: https://discuss.python.org/t/23149

@encukou
Copy link
Member

encukou commented May 3, 2023

PEP-706, which adds filter rather than absolute_path, has been implemented in #102950. See the added docs.

Python 3.12, and security updates to some earlier releases, will allow users to avoid CVE-2007-4559 by changing their code/settings.
Python 3.12 will emit a warning urging people to do that.
Python 3.14 will fix CVE-2007-4559 by making safer behaviour the default.

@encukou encukou closed this as completed May 3, 2023
@nanonyme
Copy link

nanonyme commented May 3, 2023

@encukou have I missed something or do these changes now break shutil.unpack_archive abstraction layer for zipfile vs tarfile?

@encukou
Copy link
Member

encukou commented May 3, 2023

Yes. Sadly, in Python 3.12 and 3.13 you'll get a warning about changing behaviour if you use shutil.unpack_archive with a tarball.
You can ignore the warning, or inspect the archive and pass filter='data' if it's a tarball. Or if you can change settings of the whole Python interpreter, set tarfile.TarFile.extraction_filter = tarfile.data_filter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes stdlib Python modules in the Lib dir type-security A security issue
Projects
Status: Done
Development

No branches or pull requests