GridMap 0.13.0

@dan-blanchard released this · 11 commits to master since this release

Fixes:

  • Remove ETS-specific path cleaning code (#35)
  • Module paths are now prepended to sys.path instead of appended (#26)
  • Made JobMonitor more resilient to SMTP settings problems (#34)
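Prepending matters because Python searches `sys.path` in order, so a job's own module directory placed at the front shadows any installed copy of the same module. A minimal sketch of the idea; `add_module_path` is a hypothetical helper, not part of GridMap:

```python
import sys


def add_module_path(path, search_path=None):
    """Put ``path`` at the front of the module search path (hypothetical helper).

    Modules found in ``path`` then shadow same-named modules that appear later
    in the list, which is the effect of prepending instead of appending.
    """
    if search_path is None:
        search_path = sys.path
    # Remove any existing entry so the path ends up first, exactly once.
    while path in search_path:
        search_path.remove(path)
    search_path.insert(0, path)
    return search_path


paths = ["/usr/lib/python3/site-packages", "/home/user/project"]
add_module_path("/home/user/project", paths)
print(paths)  # ['/home/user/project', '/usr/lib/python3/site-packages']
```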

Improvements:

  • Improve exception handling when trying to send back job results.
  • Heartbeats now start before input is fetched, which prevents jobs from being falsely detected as crashed.
  • Test against 3.4 instead of 3.3 on Travis
  • Updated copyright notices to say 2014
  • Add INFO-level logging messages about how jobs are running
  • Switch to using importlib instead of using __import__
  • Job name is no longer set using DRMAA native specification and instead uses JobTemplate.jobName.
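One practical reason to prefer `importlib.import_module` over `__import__`: given a dotted name, `__import__` returns the top-level package, while `import_module` returns the module that was actually requested. A small illustration using only the standard library (how GridMap uses the imported module is not shown here):

```python
import importlib

# __import__ with a dotted name hands back the top-level package...
top = __import__("os.path")
# ...whereas importlib.import_module returns the submodule itself.
sub = importlib.import_module("os.path")

print(top.__name__)     # os
print(sub is top.path)  # True
```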

GridMap 0.12.5

@dan-blanchard released this · 31 commits to master since this release

Fix issue where _process_jobs_locally would not work with max_processes > 1
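Running jobs locally with more than one process amounts to mapping the function over the argument lists with a worker pool. A minimal sketch using `multiprocessing.Pool`; `run_locally` is a hypothetical stand-in, not GridMap's actual `_process_jobs_locally`:

```python
from multiprocessing import Pool


def square(x):
    return x * x


def run_locally(func, args_list, max_processes=1):
    """Apply ``func`` to each argument tuple in ``args_list`` (hypothetical helper).

    With max_processes == 1 everything runs serially in this process;
    otherwise a pool of worker processes is used. Results keep input order.
    """
    if max_processes == 1:
        return [func(*args) for args in args_list]
    with Pool(processes=max_processes) as pool:
        return pool.starmap(func, args_list)


if __name__ == "__main__":
    print(run_locally(square, [(1,), (2,), (3,)], max_processes=2))  # [1, 4, 9]
```

`Pool.starmap` unpacks each tuple into positional arguments and returns results in input order, so the serial and parallel paths give the same list.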

Version 0.12.4

@dan-blanchard released this · 34 commits to master since this release

  • Added max_processes argument to grid_map function for consistency.

Version 0.12.3

@dan-blanchard released this · 37 commits to master since this release

Fixes local mode fallback when DRMAA Python isn't available.
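A sketch of the fallback pattern, assuming the exceptions caught below are the ones a missing drmaa package or an unloadable libdrmaa can produce; `DRMAA_PRESENT` and `choose_mode` are illustrative names, not necessarily GridMap's:

```python
import warnings

# Try to load DRMAA Python; if the package or the C library it wraps cannot
# be loaded, remember that so jobs can run locally instead of crashing.
try:
    import drmaa  # noqa: F401
    DRMAA_PRESENT = True
except (ImportError, OSError, RuntimeError):
    DRMAA_PRESENT = False


def choose_mode(local=False):
    """Return 'local' or 'cluster' (hypothetical helper)."""
    if local or not DRMAA_PRESENT:
        if not local:
            warnings.warn("DRMAA is unavailable; running jobs locally.")
        return "local"
    return "cluster"


print(choose_mode(local=True))  # local
```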

Version 0.12.2

@dan-blanchard released this · 42 commits to master since this release

Just fixed a couple of minor issues:

  • The exception is now properly set as the cause of death when a job encounters an exception.
  • Fixed a potential memory leak in the qmaster process caused by not cleaning up job info as recommended in the DRMAA Python documentation.
  • Changed the default session_id in JobMonitor from -1 to None, which is more Pythonic.

Version 0.12.1

@dan-blanchard released this · 56 commits to master since this release

With the previous release, things could still go wrong if a process died at just the wrong moment while we were trying to get its status, so I've added some exception handling to take care of that. I've also:

  • Added a --version option for gridmap_web
  • Fixed an issue where log files weren't being attached to error reports.
  • Changed the wording of some logging messages.

Version 0.12.0

@dan-blanchard released this · 61 commits to master since this release

This release mainly improves the reliability of stalled-job detection, but it also includes some refactoring. Here's the complete list:

  • The CPU load calculations used to determine whether a job is stalled now include all of a process's children. Before, if a parent process was sleeping while its children did all the work, the job would be incorrectly detected as stalled and resubmitted. This was particularly problematic for SKLL.
  • CPU usage and memory histories are now reset when a job is resubmitted. This means error emails will contain more sensible graphs for resubmitted jobs.
  • Now raise a JobException if we give up on a job instead of ending up in a bad state.
  • Renamed SEND_ERROR_MAILS environment variable to SEND_ERROR_MAIL.
  • Removed the deprecated pg_map function, which was replaced by grid_map in 0.9.2.
  • Removed runner module from generated API documentation, because no one should really need to use it directly.
  • Renamed Job.job_id to Job.id
  • Added missing local option to grid_map.
  • Added a bunch more unit tests.

Version 0.11.4

@dan-blanchard released this · 99 commits to master since this release

Fix typo in gridmap.runner.get_memory_usage

Version 0.11.3

@dan-blanchard released this · 106 commits to master since this release

Bug-fix release. Changes are:

  • JobMonitor is now a context manager so that all jobs get killed when an exception occurs.
  • All jobs are now killed if a single job encounters an exception.
  • Jobs no longer pass back exceptions as strings, where they could go completely unnoticed.
  • Cleaned up debug output a bit.
  • Much prettier tracebacks for job exceptions.
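The context-manager pattern behind the first change, sketched with a toy `JobMonitorSketch` class (hypothetical; GridMap's JobMonitor talks to the grid engine and does more cleanup than this):

```python
class JobMonitorSketch:
    """Toy stand-in for a job monitor that kills its jobs on any error."""

    def __init__(self, job_ids):
        self.running = set(job_ids)
        self.killed = []

    def kill_all(self):
        # A real monitor would ask the grid engine to terminate each job.
        for job_id in sorted(self.running):
            self.killed.append(job_id)
        self.running.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is not None:
            self.kill_all()
        return False  # do not swallow the exception


monitor = JobMonitorSketch(["job1", "job2"])
try:
    with monitor:
        raise ValueError("job1 crashed")
except ValueError:
    pass
print(monitor.killed)  # ['job1', 'job2']
```

Returning False from `__exit__` lets the original exception propagate after cleanup, so the caller still sees what went wrong.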

Version 0.11.2

@dan-blanchard released this · 132 commits to master since this release

Just a minor bugfix release. Changes are:

  • Switched to using official version of drmaa-python, because it is now up-to-date on PyPI.
  • Added a .gitattributes file to keep line endings normalized.
  • Fixed issue where jobs that exceed resubmission limit would infinitely send error reports.
  • We now check to see if the DRMAA C library was imported correctly and switch to local mode if it wasn't, instead of just crashing (#19).
  • Switch to using psutil for process CPU and memory stats (#18).

Version 0.11.1

@dan-blanchard released this · 144 commits to master since this release

  • Made web front-end for job monitoring a separate script, gridmap_web, since it can be used to talk to any JobMonitor instance. (Fixes #14)
  • Fixed crash if a stalled job comes back from the dead (#15).
  • Fixed crash if job's hostname is somehow not in white list and the job needs to be resubmitted (#16).
  • Fixed crash from trying to set matplotlib back-end multiple times.
  • Cleaned up some imports and removed some unused variables.

Version 0.11.0

@dan-blanchard released this · 159 commits to master since this release

  • Vastly more reliable job completion information thanks to switching back to 0MQ for communication with worker nodes. No more unpickling exceptions caused by the SGE DRMAA implementation frequently reporting jobs as finished when they were not.
  • Add back web monitor to report basic job status.
  • Switch to using custom fork of drmaa-python until pygridtools/drmaa-python#4, which fixes Python 3 compatibility issues, gets merged.
  • Now creates temporary directory for storing log files if it doesn't exist.
  • Travis-CI SGE installation has been streamlined.
  • Switch to using sphinx and readthedocs for documentation.
  • Added detection of stalled jobs. GridMap will also automatically restart any jobs that appear stuck (up to 3 times by default), and email you a report describing their CPU and memory usage over time.
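The restart policy can be sketched as a bounded retry loop. `run_with_resubmits` and its `attempt` callable are hypothetical, and the local `JobException` class only stands in for gridmap's; the default of 3 resubmissions follows the note above, and raising when giving up mirrors the later 0.12.0 change:

```python
class JobException(Exception):
    """Local stand-in for gridmap's JobException (not the real class)."""


def run_with_resubmits(attempt, max_resubmits=3):
    """Run a job, resubmitting it while it appears stalled.

    ``attempt`` runs the job once and returns True if it finished or False
    if it looked stalled. Returns the number of resubmissions needed.
    """
    for resubmits in range(max_resubmits + 1):
        if attempt():
            return resubmits
    raise JobException(f"Job still stalled after {max_resubmits} resubmissions")


outcomes = iter([False, False, True])  # stalls twice, then finishes
print(run_with_resubmits(lambda: next(outcomes)))  # 2
```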

Version 0.10.3

@dan-blanchard released this · 259 commits to master since this release

  • Fix issue where clean_path wasn't being called on the working directory, which was causing ETS-specific issues.
  • Add a couple of workarounds for issues with setting environment variables in Python 3.
  • Made examples into unit tests and added first attempt at getting Travis setup with SGE.

Version 0.10.2

@dan-blanchard released this · 269 commits to master since this release

  • Working directory is now correctly set for each job.
  • Simplified handling of environment variables, which should now all be passed on properly.

Version 0.10.1

@dan-blanchard released this · 272 commits to master since this release

  • Can now import JobException directly from gridmap package instead of having to import from gridmap.job.

Version 0.10.0

@dan-blanchard released this · 273 commits to master since this release

  • Now raise a JobException instead of an Exception when one of the jobs has crashed.
  • Fixed potential pip installation issue from importing package for version number.

v0.9.9

@dan-blanchard released this · 279 commits to master since this release

  • Changed way job results are retrieved to be a bit more efficient in cases of errors.
  • All job metadata is now retrieved before job output is, which should hopefully alleviate issues where we can't get the metadata because it's been flushed too quickly by the grid engine.

v0.9.8

@dan-blanchard released this · 285 commits to master since this release

  • Fixed a bug where only the first error was still showing because of an extra exception caused by job_output being undefined.
  • Fixed unhandled Exception with error code 24 (since somehow that is not an InvalidJobException, but just an Exception in drmaa-python).

v0.9.7

@dan-blanchard released this · 287 commits to master since this release

  • No longer dies with InvalidJobException when failing to retrieve job metadata from DRMAA service.
  • Now print all exceptions encountered for jobs submitted instead of just exiting after first one.
  • Die via exception instead of sys.exit when there were problems with some of the submitted jobs.

v0.9.6

@dan-blanchard released this · 292 commits to master since this release

  • Fixed bug where jobs were being aborted before they ran.

v0.9.5

@dan-blanchard released this · 293 commits to master since this release

  • Fixed bug where GRID_MAP_USE_MEM_FREE would only be interpreted as true if spelled 'True'.
  • Added documentation describing how to override constants.
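Reading boolean environment variables case-insensitively avoids the 'True'-only problem. A minimal sketch; `env_flag` is a hypothetical helper and the set of accepted spellings is an assumption, not GridMap's documented list:

```python
import os


def env_flag(name, default=False):
    """Read a boolean from the environment, ignoring case and whitespace.

    'true', 'True', 'TRUE', '1', 'yes' and 'on' all count as true here;
    anything else counts as false. Unset variables return ``default``.
    """
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"true", "1", "yes", "on"}


os.environ["GRID_MAP_USE_MEM_FREE"] = "true"
print(env_flag("GRID_MAP_USE_MEM_FREE"))  # True
```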

v0.9.4

@dan-blanchard released this · 297 commits to master since this release

  • Added support for overriding the default queue and other constants via environment variables. For example, to change the default queue, just set the environment variable GRID_MAP_DEFAULT_QUEUE.
  • Substantially more information is given about crashing jobs when we fail to unpickle the results from the Redis database.
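A minimal sketch of the override mechanism; `setting` is a hypothetical helper and `all.q` is a placeholder default queue name, not necessarily GridMap's:

```python
import os

PREFIX = "GRID_MAP_"


def setting(name, default):
    """Return the value of GRID_MAP_<name> from the environment, else ``default``."""
    return os.environ.get(PREFIX + name, default)


os.environ.pop("GRID_MAP_DEFAULT_QUEUE", None)
print(setting("DEFAULT_QUEUE", "all.q"))  # all.q
os.environ["GRID_MAP_DEFAULT_QUEUE"] = "short.q"
print(setting("DEFAULT_QUEUE", "all.q"))  # short.q
```

If the constants are read once at import time, as module-level constants usually are, the variable has to be set before gridmap is imported.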

v0.9.3

@dan-blanchard released this · 302 commits to master since this release

  • Fixed serious bug where gridmap could not be imported in some instances.
  • Refactored things a bit so there is no longer one large module with all of the code in it. (Doesn't change package interface)
