Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[16.07] Fix galaxy.util.in_directory()... #4856

Merged

Conversation

jmchilton
Copy link
Member

  • broken because it takes realpath of just one argument. Easy to see on Mac OS X since /tmp isn't really /tmp for instance:
% touch /tmp/moo
% python
>>> import galaxy.util
>>> galaxy.util.in_directory("/tmp/moo", "/tmp/moo")
False

... and this is why the Travis tests are failing under Mac OS X. Broken with 93a8bfc.

…just one argument.

Easy to see on Mac OS X since /tmp isn't really /tmp for instance:

```
% touch /tmp/moo
% python
>>> import galaxy.util
>>> galaxy.util.in_directory("/tmp/moo", "/tmp/moo")
False
```

... and this is why the Travis tests are failing under Mac OS X. Broken with 93a8bfc.
@natefoo
Copy link
Member

natefoo commented Oct 25, 2017

That was the whole point of this change, to prevent symlink attacks.

@jmchilton
Copy link
Member Author

That was the whole point of this change, to prevent symlink attacks.

It is broken right - galaxy.util.in_directory("/tmp/moo", "/tmp/moo") should always be true right? I'd suggest adding a test that fails with this change so we can work toward a mutual understanding of the problem?

@nsoranzo
Copy link
Member

Unit tests would really be helpful here.

@natefoo
Copy link
Member

natefoo commented Oct 25, 2017

It shouldn't be true if /tmp/moo exists and it, or an element in its path, is a symlink. Because then it's not in /tmp/moo.

@jmchilton
Copy link
Member Author

It shouldn't be true if /tmp/moo exists and it, or an element in its path, is a symlink. Because then it's not in /tmp/moo.

  • I really disagree with this sentence just philosophically speaking.
  • More concretely, are you stipulating that in_directory's "directory" argument cannot contain a symbolic link anywhere in its path?
    • If yes, is that required to prevent this attack or is that just a philosophical point we disagree on?

@nsoranzo
Copy link
Member

It shouldn't be true if /tmp/moo exists and it, or an element in its path, is a symlink. Because then it's not in /tmp/moo.

This also doesn't match the method documentation, which says:

def in_directory( file, directory, local_path_module=os.path ):
"""
    Return true, if the common prefix of both is equal to directory
    e.g. /a/b/c/d.rst and directory is /a/b, the common prefix is /a/b

Maybe you are confusing it with unsafe_walk()?

Also, it may be good to fix the following documentation bug in this PR:

diff --git a/lib/galaxy/util/path/__init__.py b/lib/galaxy/util/path/__init__.py
index 48a8355..3cde6fb 100644
--- a/lib/galaxy/util/path/__init__.py
+++ b/lib/galaxy/util/path/__init__.py
@@ -31,7 +31,7 @@ def safe_contains(prefix, path, whitelist=None):
 
     Given any two filesystem paths, ensure that ``path`` is contained in ``prefix``. If ``path`` exists (either as an
     absolute path or relative to ``prefix``), it is canonicalized with :func:`os.path.realpath` to ensure it is not a
-    symbolic link that points outside of ``path``. If it is a symbolic link and ``whitelist`` is set, the symbolic link
+    symbolic link that points outside of ``prefix``. If it is a symbolic link and ``whitelist`` is set, the symbolic link
     may also point inside a ``whitelist`` path.
 
     The ``path`` is checked against ``whitelist`` using either its absolute pathname (if passed in as absolute) or

@natefoo
Copy link
Member

natefoo commented Oct 25, 2017

It has to be that way for any user provided path, or a path that a user or tool could potentially manipulate on the filesystem. The best you could do in allowing symlinks is to allow them in the base directory configured by the administrator for whatever path is being checked. The function would either need a parameter for that path, or be clearly stated that the directory argument must already be trusted.

Also I almost added additional functions or parameters to differentiate between testing path strings and real paths on the file system. Right now it checks real paths if they exist, and just strings if they don't, which is not very intuitive. This is why the safe_relpath function exists.

@jmchilton
Copy link
Member Author

$ grep -r "in_directory" lib
lib/galaxy/jobs/runners/__init__.py:from galaxy.util import in_directory
lib/galaxy/jobs/runners/__init__.py:                    if in_directory(source_file, tool_working_directory):
Binary file lib/galaxy/jobs/runners/__init__.pyc matches
lib/galaxy/model/metadata.py:from galaxy.util import (in_directory, listify, string_as_bool,
lib/galaxy/model/metadata.py:            if compute_tmp_dir and tmp_dir and in_directory(path, tmp_dir):
Binary file lib/galaxy/model/metadata.pyc matches
lib/galaxy/tools/deps/containers.py:from galaxy.util import in_directory
lib/galaxy/tools/deps/containers.py:                    if in_directory(rw_path, path):
Binary file lib/galaxy/tools/deps/containers.pyc matches
lib/galaxy/tools/parameters/output_collect.py:        if not util.in_directory(directory, job_working_directory):
Binary file lib/galaxy/tools/parameters/output_collect.pyc matches
lib/galaxy/util/__init__.py:def in_directory(file, directory, local_path_module=os.path):
Binary file lib/galaxy/util/__init__.pyc matches
lib/galaxy/webapps/galaxy/api/job_files.py:        in_temp_dir = util.in_directory(path, app.config.new_file_path)
lib/galaxy/webapps/galaxy/api/job_files.py:                elif util.in_directory(path, dataset.extra_files_path):
lib/galaxy/webapps/galaxy/api/job_files.py:        return util.in_directory(path, working_directory)
Binary file lib/galaxy/webapps/galaxy/api/job_files.pyc matches

It has to be that way for any user provided path

Perhaps I'm missing something but there doesn't seem to be a single place in the code where it is used that way. Why would we want to check if a user supplied file is in a user supplied directory? At worst, we would want to know if a user supplied file is in a Galaxy supplied directory right?

@natefoo
Copy link
Member

natefoo commented Oct 25, 2017

@nsoranzo: you're right, I changed how the function worked without correcting the documentation, I can fix that based on what we decide to do.

@jmchilton: if it's all trusted paths then why do we need a check at all? It's used to verify that a user provided path, path fragment, or file, is in an admin-configured directory.

This function is used heavily in Pulsar btw. But in that case it's mostly not used on real paths, I believe.

@jmchilton jmchilton force-pushed the release_16.07_fix_in_directory branch from c2707f7 to 4b5dc39 Compare October 25, 2017 14:32
@jmchilton
Copy link
Member Author

@natefoo I feel like we are talking across streams here - please try to speak to me in code. Can you modify the following test so this branch does something bad when directory (the second argument) is "trusted" (provided by the admin or Galaxy itself) andfile (the first argument) is a user supplied, user controller file?

    >>> base_dir = tempfile.mkdtemp()
    >>> safe_dir = os.path.join(base_dir, "user")
    >>> os.mkdir(safe_dir)
    >>> good_file = os.path.join(safe_dir, "1")
    >>> with open(good_file, "w") as f: f.write("hello")
    >>> in_directory(good_file, safe_dir)
    True
    >>> in_directory("/other/file/is/here.txt", safe_dir)
    False
    >>> unsafe_link = os.path.join(safe_dir, "2")
    >>> os.symlink("/other/file/bad.fasta", unsafe_link)
    >>> in_directory(unsafe_link, safe_dir)
    False

@nsoranzo nsoranzo requested a review from natefoo October 25, 2017 14:45
@natefoo
Copy link
Member

natefoo commented Oct 25, 2017

With the addition of the requirement that directory be trusted, I don't think this is exploitable. At the time I changed this code, I didn't audit callers. If they all pass a trusted directory, then no change would have been necessary. If we do accept this change, those callers should be audited.

Perhaps it makes sense to split this in to two functions, one which operates on path strings only, and one that performs the actual filesystem check (with a param to indicate whether the directory is "trusted" or how much of the directory path is trusted?).

@jmchilton
Copy link
Member Author

jmchilton commented Oct 25, 2017

@natefoo Well I showed the grep for in_directory there - in all those cases directory is trusted. What if I reworked this change so that I performed the realpath on directory in in_directory and you can keep your stronger preconditions at the lower level?

@natefoo
Copy link
Member

natefoo commented Oct 25, 2017

I'm not sure how the grep indicates they're trusted if the directory param doesn't come directly from e.g. app.config. We'd need to look at how those are set. Or are you saying you did that?

We'd also need to check all the calls in Pulsar. I actually think I may have done a cursory check of the calls in Galaxy, but when I realized it was also used in Pulsar, I decided to just use the solution I ended up with that didn't require directory to be trusted.

Your solution is okay with me as long as we know that directory is always trusted.

@jmchilton
Copy link
Member Author

I reworked the approach as I described previously.

Your solution is okay with me as long as we know that directory is always trusted.

It is - trust me or re-verify yourself as you feel appropriate. I had already updated the docs to reflect this is a precondition of that function.

@natefoo
Copy link
Member

natefoo commented Oct 25, 2017

Sorry, I'm not following - you verified that it's called correctly?

@jmchilton
Copy link
Member Author

Yes.

@nsoranzo nsoranzo merged commit e3e28dd into galaxyproject:release_16.07 Oct 25, 2017
@galaxybot
Copy link
Contributor

This PR was merged without a milestone attached.

@nsoranzo nsoranzo added this to the 17.09 milestone Oct 25, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants