Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add logging during for script integrity exceptions #4720

Merged
merged 5 commits into from Sep 28, 2017

Conversation

Projects
None yet
3 participants
@erasche
Copy link
Member

erasche commented Sep 28, 2017

erasche added some commits Sep 28, 2017

Log more information on command execution
So we can tell *why* it failed on not just that it failed.

@erasche erasche added this to the 18.01 milestone Sep 28, 2017

@erasche erasche requested a review from mvdbeek Sep 28, 2017

@erasche erasche referenced this pull request Sep 28, 2017

Closed

Integrity exceptions #4719

except Exception:
pass
except Exception as e:
log.debug("Error syncing the filesystem %s", unicodify(e))

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Sep 28, 2017

Member

I'd add a colon after filesystem.

except Exception:
pass
except OSError as ose:
log.debug("Script not available yet %s", unicodify(ose))

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Sep 28, 2017

Member

These 2 lines are redundant since the error message is the same as below and OSError is a subclass of Exception.

This comment has been minimized.

Copy link
@erasche

erasche Sep 28, 2017

Author Member

Originally I'd had a different behaviour here. Will collapse.

except OSError as ose:
log.debug("Script not available yet %s", unicodify(ose))
except Exception as exc:
log.debug("Script not available yet %s", unicodify(exc))

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Sep 28, 2017

Member

Colon after yet?

@erasche

This comment has been minimized.

Copy link
Member Author

erasche commented Sep 28, 2017

This will conflict with @mvdbeek's #4718

except Exception:
pass
except Exception as exc:
log.debug("Script not available yet: %s", unicodify(exc))

This comment has been minimized.

Copy link
@mvdbeek

mvdbeek Sep 28, 2017

Member

Can you add a sleep here ? This will just cycle very rapidly through the for loop.

galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners: DEBUG: (170) command is: rm -rf working; mkdir -p working; cd working; /tmp/tmpp_sF2R/tmpw89Dhd/tmphkVELS/database/job_working_directory_XR19i2/000/170/tool_script.sh; return_code=$?; cd '/tmp/tmpp_sF2R/tmpw89Dhd/tmphkVELS/database/job_working_directory_XR19i2/000/170'; 
[ "$GALAXY_VIRTUAL_ENV" = "None" ] && GALAXY_VIRTUAL_ENV="$_GALAXY_VIRTUAL_ENV"; _galaxy_setup_environment True
python "/tmp/tmpp_sF2R/tmpw89Dhd/tmphkVELS/database/job_working_directory_XR19i2/000/170/set_metadata_kPziaV.py" "/tmp/tmpp_sF2R/tmpw89Dhd/tmphkVELS/database/job_working_directory_XR19i2/000/170/registry.xml" "/tmp/tmpp_sF2R/tmpw89Dhd/tmphkVELS/database/job_working_directory_XR19i2/000/170/working/galaxy.json" "/tmp/tmpp_sF2R/tmpw89Dhd/tmphkVELS/database/job_working_directory_XR19i2/000/170/metadata_in_HistoryDatasetAssociation_178_wJ5pP7,/tmp/tmpp_sF2R/tmpw89Dhd/tmphkVELS/database/job_working_directory_XR19i2/000/170/metadata_kwds_HistoryDatasetAssociation_178_61XxNB,/tmp/tmpp_sF2R/tmpw89Dhd/tmphkVELS/database/job_working_directory_XR19i2/000/170/metadata_out_HistoryDatasetAssociation_178_xNFL_j,/tmp/tmpp_sF2R/tmpw89Dhd/tmphkVELS/database/job_working_directory_XR19i2/000/170/metadata_results_HistoryDatasetAssociation_178_6ZpSVA,/tmp/tmpp_sF2R/tmpw89Dhd/tmphkVELS/database/files/000/dataset_178.dat,/tmp/tmpp_sF2R/tmpw89Dhd/tmphkVELS/database/job_working_directory_XR19i2/000/170/metadata_override_HistoryDatasetAssociation_178_fPqz65" 5242880; sh -c "exit $return_code"
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.local: DEBUG: (170) executing job script: /tmp/tmpp_sF2R/tmpw89Dhd/tmphkVELS/database/job_working_directory_XR19i2/000/170/galaxy_170.sh
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs: DEBUG: (170) Persisting job destination (destination id: local:///)
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
requests.packages.urllib3.connectionpool: INFO: Starting new HTTP connection (1): localhost
galaxy.jobs.runners.local: DEBUG: execution finished: /tmp/tmpp_sF2R/tmpw89Dhd/tmphkVELS/database/job_working_directory_XR19i2/000/167/galaxy_167.sh
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
requests.packages.urllib3.connectionpool: DEBUG: "GET /api/histories/76bc6b8e6396dad0?key=c2c1276111cfc9584c170781285a636e HTTP/1.1" 200 None
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.model.metadata: DEBUG: loading metadata from file for: HistoryDatasetAssociation 175
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
requests.packages.urllib3.connectionpool: INFO: Starting new HTTP connection (1): localhost
galaxy.jobs.runners.util.job_script: DEBUG: Script not available yet: [Errno 26] Text file busy
galaxy.jobs.runners: ERROR: (169) Unhandled exception calling queue_job
Traceback (most recent call last):
  File "/galaxy/lib/galaxy/jobs/runners/__init__.py", line 104, in run_next
    method(arg)
  File "/galaxy/lib/galaxy/jobs/runners/local.py", line 87, in queue_job
    command_line, exit_code_path = self.__command_line(job_wrapper)
  File "/galaxy/lib/galaxy/jobs/runners/local.py", line 76, in __command_line
    self.write_executable_script(job_file, job_file_contents)
  File "/galaxy/lib/galaxy/jobs/runners/__init__.py", line 327, in write_executable_script
    write_script(path, contents, self.app.config, mode=mode)
  File "/galaxy/lib/galaxy/jobs/runners/util/job_script/__init__.py", line 112, in write_script
    _handle_script_integrity(path, config)
  File "/galaxy/lib/galaxy/jobs/runners/util/job_script/__init__.py", line 146, in _handle_script_integrity
    raise Exception("Failed to write job script, could not verify job script integrity.")
Exception: Failed to write job script, could not verify job script integrity.

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Sep 28, 2017

Member

I was also going to suggest to move:

time.sleep(sleep_amt)

from line 141 to here. That would basically have the same effect of #4718.

Edit: I mean outside the external try/except.

This comment has been minimized.

Copy link
@erasche

erasche Sep 28, 2017

Author Member

merge @mvdbeek's then, I'll rebase mine accordingly.

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Sep 28, 2017

Member

I think it's better with the nested try/except as here, but I can push to your branch if you want.

@mvdbeek mvdbeek added the kind/bug label Sep 28, 2017

@mvdbeek

This comment has been minimized.

Copy link
Member

mvdbeek commented Sep 28, 2017

I've added the bug label -- I think this is why the job integrity check didn't work on some remote file systems. We should probably backport this.

@mvdbeek mvdbeek merged commit 1f0086d into galaxyproject:dev Sep 28, 2017

6 checks passed

api test Build finished. 292 tests run, 4 skipped, 0 failed.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
framework test Build finished. 161 tests run, 0 skipped, 0 failed.
Details
integration test Build finished. 46 tests run, 0 skipped, 0 failed.
Details
lgtm analysis: JavaScript No alert changes
Details
toolshed test Build finished. 579 tests run, 0 skipped, 0 failed.
Details
@nsoranzo

This comment has been minimized.

Copy link
Member

nsoranzo commented Sep 28, 2017

Great team work, thanks @erasche @mvdbeek!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.