Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get real user name from username and email (for submission as real user with drmaa) #4096

Merged
merged 23 commits into from Jun 27, 2017

Conversation

Projects
None yet
6 participants
@bernt-matthias
Copy link
Contributor

commented May 22, 2017

The main part of this PR is that the drmaa job submission uses the username or the email that is stored in galaxy for submission as real user. This is in particular useful if the galaxy instance gets its users from LDAP. I do not know if this is already useful for users of other cluster software than drmaa.

A secondary change is to preserve the PATH, LD_LIBRARY_PATH .. variables for the sudo calls for the external script since otherwise these variables are not kept in the environment (sudo resets them). I guess this needs to be discussed since sudo does this reset on purpose. I needed this, since otherwise in my installation python (which is not in a default path) is not found anymore.

Furthermore, the external runner worked for me only if the --assign_all_groups parameter was used. Would be nice if somebody could verify if this is useful, since I did not understand this parameter.

Important note: I'm quite new to galaxy and this is my 1st "serious" PR. I guess this needs proper testing.

- get real user name from username and email
- sudo calls with preserving PATH, LD_LIBRARY_PATH ...

@galaxybot galaxybot added the triage label May 22, 2017

@galaxybot galaxybot added this to the 17.09 milestone May 22, 2017

bernt-matthias added some commits May 23, 2017

@bernt-matthias

This comment has been minimized.

Copy link
Contributor Author

commented May 23, 2017

Just added a small change to external_chown_script.py, that might inrease security (I thought it fits to the PR, since it affects the cluster access).

Our admins here were concerned that with sudo rights on this script it is possible to change the owner and group of every file in the system. They suggested to make a check which allows changes only in subdirectories of the galaxy installation directory. I have made this configurable, i.e. one can configure a list of directories. To be sure the path given on the command line is made absolute, i.e., it should not be possible to place links inside the directory that point somewhere else.

The idea would be that the admins can review the paths and then remove write access to the file when they add the sudo permissions.

@mvdbeek
Copy link
Member

left a comment

The core of the changes looks good, I would just like to confirm that specifically injecting PATH, LD_LIBRARY_PATH and PKG_CONFIG_PATH is really necessary.

cmd = [ '/usr/bin/sudo', '-E', external_chown_script, self.working_directory, username, str( gid ) ]

cmd = [ '/usr/bin/sudo', '-E', ]
for v in ["PATH", "LD_LIBRARY_PATH", "PKG_CONFIG_PATH"]:

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek May 23, 2017

Member

This shouldn't be necessary, as 'sudo -E' does preserve the environment.
This is a small snippet that you can use to verify this:

import os
import subprocess


print('without -E:')
subprocess.call(['sudo', './echo.sh'], shell=False)
print('with -E:')
subprocess.call(['sudo', '-E', './echo.sh'], shell=False)

Write a echo.sh script and echo some environment variable only available to the user.
You'll see that only with '-E' you get the environment variable echoed correctly.

This comment has been minimized.

Copy link
@bernt-matthias

bernt-matthias May 23, 2017

Author Contributor

The security policy of our cluster has the env_reset option enabled (see man 5 sudoers) which in particular resets PATH. This seems to be the default on CentOS 6.8. In addition the LD_LIBRARY_PATH and PKG_CONFIG_PATH are reset. I'm not sure if PKG_CONFIG_PATH is actually needed - I just added everything that was removed.

I think that the policy of sudo to reset the PATH and LIBRARY_PATH (etc) is actually a good idea. Just imagine a script with sudo rights that calls a binary B oder uses a library L. If you have the possibility to modify the PATH or LIBRARY_PATH before sudoing you have essentially the right to do everything.

I think this modification is likely not of interest for many (maybe only me). It could also be that it is completely wrong to solve my problem like this. The question would be how to proceed here?

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek May 23, 2017

Member

Hmm, could you ask your admins to make an exception for the external_chown_script.py ?
If I look at https://unix.stackexchange.com/a/13246 it appear you can specify the env vars to keep for a specific path. In a sense you're already circumventing this policy by injecting the the environment variables anyway ... . If not I guess we can again make the list of env vars configurable in the galaxy.ini file, I think.

This comment has been minimized.

Copy link
@bernt-matthias

bernt-matthias May 23, 2017

Author Contributor

For the PATH we also thought that it is sufficient to hard code the path to the python binary in the script. But then there is still the problem with the python shared object file which is not found. Maybe one of you has an idea? Maybe I could build a special statically linked python binary just for these scripts. Could be a better idea than adding a potential security hole to galaxy.

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek May 23, 2017

Member

Well, I don't think forcing users to build special version of the python interpreter just for this purpose is great. I kind of like injecting PATH (and LD_LIBRARY_PATH -urgs-, if necessary) instead of propagating all variables with sudo -E.

This comment has been minimized.

Copy link
@bernt-matthias

bernt-matthias May 24, 2017

Author Contributor

Would it be an option to call

/path/to/python external_chown_script.py ...

instead of

external_chown_script.py ... (using the shellbang)

Then we could just use the python of galaxy's virtual environment if this path is accessible from the code. But this still leaves the problem of the library path (I never understood why the python binary but not the shared object is copied to the venv).

This comment has been minimized.

Copy link
@natefoo

natefoo May 24, 2017

Member

I'd prefer that this list be configurable, but see also my comment about not defaulting to using sudo at all.

pass

# try to get user from email address/username
for cand in [job.user.email.split('@')[0], job.user.username]:

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek May 23, 2017

Member

I'd prefer this to be configurable via a variable in galaxy.ini (sth. like real_system_username, with options 'user_email' and 'username'), close to the external_chown_script variable.

This comment has been minimized.

Copy link
@bernt-matthias

bernt-matthias May 23, 2017

Author Contributor

Sounds possible.

The question is if this is necessary, since testing both adds only a little computation (I guess its the same as the if-statement that would be necessary when we make this configurable).

For me both possibilities would be fine.

If we leave it as is I could imagine in case that both ways to determine a user are successful a test checking if they yield the same user would be a good idea.

Otherwise: Can somebody point me to examples showing me how to access configuration variables?

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek May 23, 2017

Member

I'm worried about corner cases like users who register an email address whose first part matches another users' username ... who then would get billed for computation time.
And explicit is better then implicit.

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek May 23, 2017

Member

For adding options to galaxy.ini you'd write your option in the galaxy.ini.sample (with a bit of documentation) and then you reference it in the config object here. This will then be available as app.config.your_new_option (in this class it's self.app.config.your_new_option)

This comment has been minimized.

Copy link
@bernt-matthias

bernt-matthias May 23, 2017

Author Contributor

OK. Will do it this way.

subprocess.Popen( [ '/usr/bin/sudo', '-E', kill_script, str( ext_id ), str( self.userid ) ], shell=False )

command = [ '/usr/bin/sudo', '-E', ]
for v in ["PATH", "LD_LIBRARY_PATH", "PKG_CONFIG_PATH"]:

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek May 23, 2017

Member

Again, I don't understand why this is necessary, 'sudo -E' should really preserve the environment.

@@ -1,14 +1,34 @@
#!/usr/bin/env python
import os
import os.path

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek May 23, 2017

Member

This import is not required.

This comment has been minimized.

Copy link
@bernt-matthias

bernt-matthias May 23, 2017

Author Contributor

OK. I removed it.

# if set to None every file can be modified by the script.
# this can increase security in particular if write access to this
# script is removed by the admin.
# ALLOWED_PATHS = [ job_working_directory, new_file_path ]

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek May 23, 2017

Member

Can you quote job_working_directory and new_file_path ?

This comment has been minimized.

Copy link
@bernt-matthias

bernt-matthias May 23, 2017

Author Contributor

Done

bernt-matthias added some commits May 23, 2017

- removed uneccessary import
- added quotes to comment
@bernt-matthias

This comment has been minimized.

Copy link
Contributor Author

commented May 24, 2017

Just made it configurable via galaxi.ini

@@ -1743,7 +1743,14 @@ def _change_ownership( self, username, gid ):
job = self.get_job()
# FIXME: hardcoded path
external_chown_script = self.get_destination_configuration("external_chown_script", None)
cmd = [ '/usr/bin/sudo', '-E', external_chown_script, self.working_directory, username, str( gid ) ]

cmd = [ '/usr/bin/sudo', '-E', ]

This comment has been minimized.

Copy link
@natefoo

natefoo May 24, 2017

Member

Hardcoded sudo is actually a remnant of the original implementation. Ideally, if one needs sudo it should just be part of the value of the corresponding *_script call, at which point you'd have full control over the environment variables passed or not passed.

This comment has been minimized.

Copy link
@bernt-matthias

bernt-matthias May 25, 2017

Author Contributor

I dont get what you are suggesting. We could have a little bash helper/wrapper:

dosudo.sh:

#!/bin/bash
sudo -E AND_WHATEVER_IS_NEEDED "$@"

From python we could then call: dosudo.sh externalhelper.py ...

This comment has been minimized.

Copy link
@bernt-matthias

bernt-matthias May 26, 2017

Author Contributor

I think I got it. I moved the sudo calls to galaxy.ini

self.__user_system_pwent = pwd.getpwnam( job.user.email.split('@')[0] )
except:
pass

This comment has been minimized.

Copy link
@natefoo

natefoo May 24, 2017

Member

The extra whitespace here and below isn't needed (sorry for the nitpick).

This comment has been minimized.

Copy link
@bernt-matthias

bernt-matthias May 25, 2017

Author Contributor

With the helper/wrapper it would be configurable (at a single place)

cmd = [ '/usr/bin/sudo', '-E', external_chown_script, self.working_directory, username, str( gid ) ]

cmd = [ '/usr/bin/sudo', '-E', ]
for v in ["PATH", "LD_LIBRARY_PATH", "PKG_CONFIG_PATH"]:

This comment has been minimized.

Copy link
@natefoo

natefoo May 24, 2017

Member

I'd prefer that this list be configurable, but see also my comment about not defaulting to using sudo at all.

@@ -282,8 +282,30 @@ def create_paramfile( trans, uploaded_datasets ):
"""
def _chown( path ):
try:
pwent = pwd.getpwnam( trans.user.email.split('@')[0] )
cmd = [ '/usr/bin/sudo', '-E', trans.app.config.external_chown_script, path, pwent[0], str( pwent[3] ) ]
# get username from email/username

This comment has been minimized.

Copy link
@natefoo

natefoo May 24, 2017

Member

There is a lot of duplication here that should be moved to a single function or method called by all who need it.

This comment has been minimized.

Copy link
@bernt-matthias

bernt-matthias May 25, 2017

Author Contributor

Indeed. Would the helper script solve this problem already? Otherwise could you be more precise what should be packed in a function (just getting the user or also calling the external scripts). Where would such a function fit? I.e. in which python module?

@bernt-matthias

This comment has been minimized.

Copy link
Contributor Author

commented May 26, 2017

Currently remaining is the request of @natefoo to put the code getting the user in a function. The question is where to put this in the galaxy libraries?

Of course I should also extend the documentation.

@bernt-matthias

This comment has been minimized.

Copy link
Contributor Author

commented Jun 6, 2017

Can anybody tell me how to continue from here? I would like to finish this PR.

@rhpvorderman

This comment has been minimized.

Copy link
Contributor

commented Jun 13, 2017

Hi there,
We would also like to use this function so I have fixed the duplication of code by putting a new function in model/__init__.py
EDIT: Also removed the duplaction of code in the external_chown_script.py

Changes can be found here

@bernt-matthias
You can merge these changes into your branch with the following commands:

# Go to your galaxy repo
git remote add rhpvorderman https://github.com/rhpvorderman/galaxy.git
git fetch rhpvorderman
git merge rhpvorderman/fix_duplication
# commit the changes and push them.

Thanks for the great code! This was exactly what we needed.

@bernt-matthias

This comment has been minimized.

Copy link
Contributor Author

commented Jun 13, 2017

Thanks a lot to @rhpvorderman for the contributions.

external_chown_script = self.get_destination_configuration("external_chown_script", None)
cmd = [ '/usr/bin/sudo', '-E', external_chown_script, self.working_directory, username, str( gid ) ]
cmd = external_chown_script.split()

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Jun 13, 2017

Member

shlex.split(external_chown_script) would be better here.

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Jun 13, 2017

Member

And you also need to check that external_chown_script is not None.

cmd = [ '/usr/bin/sudo', '-E', trans.app.config.external_chown_script, path, pwent[0], str( pwent[3] ) ]
# get username from email/username
pwent = trans.user.system_user_pwent(trans.app.config.real_system_username)
cmd = trans.app.config.external_chown_script.split()

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Jun 13, 2017

Member

same here, shlex.split(trans.app.config.external_chown_script) would be preferable.

@@ -13,6 +13,7 @@
import os
import socket
import time
import pwd

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Jun 13, 2017

Member

this needs to go before import time to satisfy the linter.

This comment has been minimized.

Copy link
@rhpvorderman

rhpvorderman Jun 13, 2017

Contributor

My bad! I think it should go before import socket to satisfy the alphabetic order.

(stdoutdata, stderrdata) = p.communicate()
exitcode = p.returncode
if exitcode != 0:
sys.stderr.write("external_chown_script: could not chown\ncmd was %s\n" % " ".join(cmd))

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Jun 13, 2017

Member

you can simplify this to sys.exit("external_chown_script: could not chown\ncmd was %s\n" % " ".join(cmd)").

@rhpvorderman

This comment has been minimized.

Copy link
Contributor

commented Jun 13, 2017

Thanks for the review @mvdbeek! If you are too busy @bernt-matthias or need help, please let me know and then I will pick it up We are eager to get galaxy running on our cluster and you have saved me lots of time in writing this.

@bernt-matthias

This comment has been minimized.

Copy link
Contributor Author

commented Jun 13, 2017

I can incorporate the changes.

implemented the changes requested by @mvdbeek:
- check for external_chown_script == None
- switched from string split to shlex.split
- changed import order for pwd
- stderr + exit -> just exit
@@ -283,7 +284,7 @@ def _chown( path ):
try:
# get username from email/username
pwent = trans.user.system_user_pwent(trans.app.config.real_system_username)

This comment has been minimized.

Copy link
@bernt-matthias

bernt-matthias Jun 13, 2017

Author Contributor

I was wondering if its possible to use the get_destination_configuration function here. If yes: what would be the access path. Otherwise how can we find out that the variable is not defined in the config?

This comment has been minimized.

Copy link
@rhpvorderman

rhpvorderman Jun 13, 2017

Contributor

This is already done in the system_user_pwent function by the following code:

# Checks if real_system_username is either user_email or username
       else:
           log.warning("invalid configuration of real_system_username")
           system_user_pwent = None

A log warning should be issued when the variable is not defined.

@@ -315,8 +315,9 @@ def stop_job( self, job ):
if kill_script is None:
self.ds.kill( ext_id )
else:
# FIXME: hardcoded path
subprocess.Popen( [ '/usr/bin/sudo', '-E', kill_script, str( ext_id ), str( self.userid ) ], shell=False )
command = kill_script.split()

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Jun 13, 2017

Member

Here it would also be better to use shlex.split(kill_script), I should have mentioned it in the previous review round.

for script_argument in script_parts[1:]:
command.append(script_argument)

command = external_runjob_script.split()

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Jun 13, 2017

Member

shlex.split

@mvdbeek

This comment has been minimized.

Copy link
Member

commented Jun 13, 2017

@galaxybot test this

command.extend( [ str(username), jobtemplate_filename ] )
command.append( "--assign_all_groups" )

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Jun 13, 2017

Member

You can add this within the galaxy.ini (drmaa_external_runjob_script = sudo -E scripts/drmaa_external_runner.py --assign_all_groups), so I would prefer to not have this hardcoded here.

@galaxyproject galaxyproject deleted a comment from mvdbeek Jun 13, 2017


def validate_paramters():

def validate_paramaters():

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Jun 13, 2017

Member

Typo: s/paramaters/parameters/

os.system('chgrp -Rh %s %s' % (gid, path))
path, galaxy_user_name, gid = validate_paramaters()
for cmd in [[ 'chown', '-Rh', galaxy_user_name, path ], [ 'chgrp', '-Rh', gid, path ]]:
p = os.subprocess.Popen(cmd, shell=False, stdout=os.subprocess.PIPE, stderr=os.subprocess.PIPE)

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Jun 13, 2017

Member

You should change os.subprocess to subprocess (3 times).

bernt@informatik.uni-leipzig.de added some commits Jun 13, 2017

bernt@informatik.uni-leipzig.de
bug fixes
- system_user_pwent: undefined return value when KeyError
- external_chown_script:
  * typo
  * subprocess instead of os.subprocess
@bernt-matthias

This comment has been minimized.

Copy link
Contributor Author

commented Jun 13, 2017

I will test this tomorrow on our galaxy instance. But I can only test the behavior for the case where the username is taken directly (not from the mail). I will report possible errors / success. Can somebody test the other case?

@bernt-matthias

This comment has been minimized.

Copy link
Contributor Author

commented Jun 14, 2017

Unfortunately I could not test today because I did not have write access to the external scripts (I hope our admins change this by tomorrow). Therefore I started to document the changes in galaxy-hub. See galaxyproject/galaxy-hub#270 . Maybe you could have a look -- I might have forgotten something.

#drmaa_external_killjob_script = sudo -E scripts/drmaa_external_killer.py
#external_chown_script = sudo -E scripts/external_chown_script.py

# For the job submission as the actual system user galaxy can extract

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Jun 14, 2017

Member

user galaxy -> user, Galaxy


# For the job submission as the actual system user galaxy can extract
# the user name from the email address (actually the local-part before the @)
# or the usename which are both stored in the galaxy data base.

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Jun 14, 2017

Member

usename -> username and galaxy data base -> Galaxy database

@nsoranzo

This comment has been minimized.

Copy link
Member

commented Jun 15, 2017

Looks good to me, but I'm not a user of this feature. May be of interest for @moskalenko.

@bernt-matthias

This comment has been minimized.

Copy link
Contributor Author

commented Jun 15, 2017

I have successfully tested the code for univa gridengine using the username configuration. I cannot test it for the email configuration or other drmaa systems.

@natefoo natefoo merged commit 153b8d4 into galaxyproject:dev Jun 27, 2017

5 checks passed

api test Build finished. 279 tests run, 0 skipped, 0 failed.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
framework test Build finished. 150 tests run, 0 skipped, 0 failed.
Details
integration test Build finished. 34 tests run, 0 skipped, 0 failed.
Details
toolshed test Build finished. 579 tests run, 0 skipped, 0 failed.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.