Use HTCondor python API #1933

Merged
egede merged 19 commits into develop from mesmith75-patch-1
Jul 26, 2021

Conversation

mesmith75
Contributor

@mesmith75 mesmith75 commented Jun 24, 2021

Start on #1916

  • Submit
  • Kill
  • Resubmit
  • Query
  • Peek
  • Hold, release + other extras
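
As an illustration of the "Query" item, here is a minimal sketch of a status lookup written against the Python bindings. The helper names `status_constraint` and `query_status` are hypothetical and are not taken from this PR; the projection list is the one that appears in the traceback further down. The schedd object is passed in so the sketch can be exercised without an HTCondor installation.

```python
# Attributes requested from the schedd (same list as in the PR's traceback).
PROJECTION = ["ClusterId", "ProcId", "JobStatus", "RemoteUserCpu", "AllRemoteHosts"]


def status_constraint(cluster_ids):
    """Build a ClassAd constraint string matching any of the given cluster ids."""
    if not cluster_ids:
        # A constraint that matches nothing, so an empty input returns no ads.
        return "false"
    return " || ".join("ClusterId == %d" % int(c) for c in sorted(cluster_ids))


def query_status(schedd, cluster_ids):
    """Return {(ClusterId, ProcId): JobStatus} for the given clusters.

    `schedd` is expected to behave like htcondor.Schedd(), i.e. to provide
    query(constraint=..., projection=...) returning dict-like job ads.
    """
    ads = schedd.query(constraint=status_constraint(cluster_ids),
                       projection=PROJECTION)
    return {(ad["ClusterId"], ad["ProcId"]): ad["JobStatus"] for ad in ads}
```

In real use one would presumably call `query_status(htcondor.Schedd(), ids)`. Submission and removal would go through the same schedd object, but those calls are left out here because the PR text does not show them.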

@mesmith75 mesmith75 requested a review from egede June 24, 2021 10:26
@mesmith75 mesmith75 changed the title Add htcondor to requirements WIP: Add htcondor to requirements Jun 24, 2021
@mesmith75 mesmith75 changed the title WIP: Add htcondor to requirements Use HTCondor python API Jun 24, 2021
egede
egede previously approved these changes Jun 24, 2021
@mesmith75 mesmith75 changed the title Use HTCondor python API WIP: Use HTCondor python API Jun 24, 2021
@mesmith75 mesmith75 added this to the 8.5.1 milestone Jun 24, 2021
@egede
Member

egede commented Jun 25, 2021

Remember that there is a new directory GangaCore/test/Condor where tests of HTCondor can be placed.

@mesmith75
Contributor Author

The monitoring is currently returning an error:

ERROR    Exception raised executing 'updateMonitoringInformation' in Thread 'Backend Monitor':
Traceback (most recent call last):
  File "/afs/cern.ch/user/m/masmith/cmtuser/GANGA/GANGA_HEAD/install/ganga/ganga/GangaCore/Core/GangaThread/WorkerThreads/WorkerThreadPool.py", line 124, in __worker_thread
    result = item.command_input.function(*these_args, **item.command_input.kwargs)
  File "/afs/cern.ch/user/m/masmith/cmtuser/GANGA/GANGA_HEAD/install/ganga/ganga/GangaCore/Lib/Condor/Condor.py", line 480, in updateMonitoringInformation
    stati = schedd.query(constraint = expr_tree, projection = ["ClusterId", "ProcId", "JobStatus", "RemoteUserCpu","AllRemoteHosts"])
  File "/afs/cern.ch/user/m/masmith/gangatest_python3/lib/python3.6/site-packages/htcondor/_deprecation.py", line 111, in wrapper
    return method(self, *args, **kwargs)
  File "/afs/cern.ch/user/m/masmith/gangatest_python3/lib/python3.6/site-packages/htcondor/_lock.py", line 69, in wrapper
    rv = func(*args, **kwargs)
htcondor.HTCondorIOError: Failed to fetch ads from schedd, errmsg=SECMAN:2007:Failed to end classad message.

I am not sure whether this is a fatal issue or something we can hide.
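
If the failure turns out to be transient, one option is to retry the query a few times and skip the monitoring cycle when it still fails, rather than letting the exception escape the monitoring thread. The sketch below is only that, a sketch: `query_with_retry` is a hypothetical helper, and the default of catching `IOError` assumes that `htcondor.HTCondorIOError` derives from it. Passing `transient=(htcondor.HTCondorIOError,)` explicitly avoids relying on that assumption.

```python
import logging
import time

logger = logging.getLogger(__name__)


def query_with_retry(query, retries=3, delay=1.0, transient=(IOError,),
                     sleep=time.sleep):
    """Call query() and retry on transient errors.

    Returns the query result, or None if every attempt failed, so the
    caller can skip this monitoring cycle instead of crashing.
    """
    for attempt in range(1, retries + 1):
        try:
            return query()
        except transient as err:
            logger.warning("schedd query failed (attempt %d/%d): %s",
                           attempt, retries, err)
            if attempt < retries:
                # Linear back-off between attempts.
                sleep(delay * attempt)
    return None
```

A caller could wrap the existing call as `query_with_retry(lambda: schedd.query(constraint=expr_tree, projection=[...]))` and return early when the result is `None`.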

@mesmith75
Contributor Author

mesmith75 commented Jul 1, 2021

This is giving an error when trying to access a sandbox:

Traceback (most recent call last):
  File "./Ganga_13_Executable", line 74, in getPackedInputSandbox
    with closing(tarfile.open(tarpath, "r:*")) as tf:
  File "/usr/lib64/python3.6/tarfile.py", line 1573, in open
    return func(name, "r", fileobj, **kwargs)
  File "/usr/lib64/python3.6/tarfile.py", line 1638, in gzopen
    fileobj = gzip.GzipFile(name, mode + "b", compresslevel, fileobj)
  File "/usr/lib64/python3.6/gzip.py", line 163, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
PermissionError: [Errno 13] Permission denied: '/afs/cern.ch/work/m/masmith/GangaTest/test/repos/workspace/masmith/LocalXML/13/input/_input_sandbox_13_master.tgz'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./Ganga_13_Executable", line 146, in <module>
    getPackedInputSandbox( inFile )
  File "./Ganga_13_Executable", line 77, in getPackedInputSandbox
    raise Exception("Error opening tar file: %s" % tarpath)
Exception: Error opening tar file: /afs/cern.ch/work/m/masmith/GangaTest/test/repos/workspace/masmith/LocalXML/13/input/_input_sandbox_13_master.tgz

It looks as if these jobs cannot access the AFS file system.
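
The second exception in the traceback replaces the `PermissionError` with a generic message, which makes the cause harder to spot in job logs. A minimal sketch of how a wrapper could keep the root cause attached is shown below. `list_input_sandbox` is a hypothetical helper, not the function in `Ganga_13_Executable`, and the hint about credentials is a guess based on the observation above.

```python
import tarfile


def list_input_sandbox(tarpath):
    """Return the member names of a packed sandbox.

    Failures are re-raised with the original exception chained, so the
    log shows whether the problem was permissions, a missing file or a
    corrupt archive.
    """
    try:
        with tarfile.open(tarpath, "r:*") as tf:
            return tf.getnames()
    except PermissionError as err:
        # Assumption: on a shared file system this usually points at
        # missing credentials on the worker node.
        raise RuntimeError(
            "Permission denied reading %s (are credentials available "
            "on the worker node?)" % tarpath) from err
    except (OSError, tarfile.TarError) as err:
        raise RuntimeError("Error opening tar file: %s" % tarpath) from err
```

With `raise ... from err`, the original exception stays available as `__cause__` and is printed as the direct cause in the traceback.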

@egede
Member

egede commented Jul 12, 2021

Link to CERN helpdesk request

@egede
Member

egede commented Jul 13, 2021

So, it seems that it is possible to preserve the Kerberos tickets using the Python API as well. Just make sure that the code for that is written such that it doesn't break the Condor backend at sites without AFS.
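
One way to keep sites without AFS unaffected is to add the credential-forwarding setting only when a job actually touches AFS paths. The sketch below works under stated assumptions: the helper names are hypothetical, the `/afs` prefix check is a simple heuristic, and the `MY.SendCredential` submit attribute is what I understand CERN's batch service to use for forwarding Kerberos credentials, not something confirmed in this thread.

```python
import os


def uses_afs(paths, afs_root="/afs"):
    """True if any path lives under afs_root (heuristic, not a mount check)."""
    root = afs_root.rstrip("/") + "/"
    return any(os.path.abspath(p).startswith(root) for p in paths)


def with_credential_forwarding(description, paths):
    """Return a copy of a submit description dict, adding credential
    forwarding only when the job touches AFS paths."""
    desc = dict(description)
    if uses_afs(paths):
        # Assumed attribute name; check the site's documentation.
        desc["MY.SendCredential"] = "True"
    return desc
```

The resulting dict could then be turned into a submit object with `htcondor.Submit(desc)`. At sites without AFS the description is passed through unchanged.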

@mesmith75
Contributor Author

@egede I think this is ready: all the existing functionality has been updated (there is no Python binding for condor_tail yet).

I'll put any other extras in another PR, along with the fix to the outputfiles.
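
Until a binding for condor_tail exists, peeking presumably has to keep going through the command-line tool. A minimal sketch of such a fallback follows. The function names are hypothetical, and the flags used (`-no-stdout`, `-stderr`) are the ones I recall from the condor_tail manual, so treat them as an assumption. The runner is injectable so the logic can be checked without HTCondor installed.

```python
import subprocess


def condor_tail_command(cluster_id, proc_id=0, stderr=False):
    """Build the condor_tail argument list for one job."""
    cmd = ["condor_tail"]
    if stderr:
        # Show only the job's stderr instead of its stdout.
        cmd += ["-no-stdout", "-stderr"]
    cmd.append("%d.%d" % (cluster_id, proc_id))
    return cmd


def peek(cluster_id, proc_id=0, stderr=False, run=subprocess.run):
    """Return the tail of a running job's output as text."""
    result = run(condor_tail_command(cluster_id, proc_id, stderr),
                 stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                 universal_newlines=True)
    return result.stdout
```

For example, `peek(1234, 0)` would run `condor_tail 1234.0` and return whatever the tool prints.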

@mesmith75 mesmith75 changed the title WIP: Use HTCondor python API Use HTCondor python API Jul 23, 2021
@egede egede merged commit 4d22512 into develop Jul 26, 2021
@egede egede deleted the mesmith75-patch-1 branch July 26, 2021 22:54