This repository has been archived by the owner on Jul 7, 2023. It is now read-only.

v0.1.14

@hrichardlee hrichardlee released this 22 Jul 20:46
· 330 commits to main since this release

New features:

  • Added the ability to specify more requirements for EC2 instances, e.g. GPUs, GPU memory, or AVX512.
  • Added support for git repo dependencies in pip requirements.txt and poetry project files.
  • Added the ability to open an arbitrary port for a job.
  • The working directory is now automatically set to the remote equivalent of the current working directory, so that relative paths mostly work as expected.
  • Added the ability to request arbitrary apt packages in addition to a pip/poetry/conda file.
  • Added a /meadowrun/machine_cache folder that containers on the same machine can use to share files.

Improvements:

  • SSH connections are much faster as a result of switching from fabric to asyncssh. This is most noticeable in run_map.
  • Changed the behavior when instances can't be created because of a quota. Previously we would just give up; the new behavior is to try more expensive instances if they are available.
  • Stdout from the remote machine shows up on the local machine much more quickly.
  • Containers are deleted when we are done with them.
  • Jobs are deallocated when the client is terminated. Also converted the deallocate_jobs.py cron job to a systemd unit so that it runs more frequently (every 30 seconds for now).
  • Check for spot interruptions and prevent further allocations to interrupted instances.
  • Set the idle timeout for automatically cleaning up machines to 5 minutes. Surviving machines are printed out on manual cleanup.
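
The quota fallback described above can be illustrated with a minimal sketch. This is not Meadowrun's actual implementation: `QuotaExceededError`, `InstanceType`, `launch_cheapest_available`, and the `launch` callback are hypothetical names invented for this example, and the prices are made up.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


class QuotaExceededError(Exception):
    """Hypothetical error: a cloud quota blocked launching this instance type."""


@dataclass(frozen=True)
class InstanceType:
    """Hypothetical description of an instance type and its hourly price."""
    name: str
    price_per_hour: float


def launch_cheapest_available(
    candidates: Iterable[InstanceType],
    launch: Callable[[InstanceType], str],
) -> Optional[str]:
    """Try candidates from cheapest to most expensive, skipping quota failures."""
    for instance_type in sorted(candidates, key=lambda t: t.price_per_hour):
        try:
            return launch(instance_type)
        except QuotaExceededError:
            # The old behavior would give up here; the new behavior
            # moves on to the next (more expensive) candidate.
            continue
    return None  # every candidate was blocked by a quota


def fake_launch(instance_type: InstanceType) -> str:
    """Stand-in for a real cloud API call; 'small' always hits a quota."""
    if instance_type.name == "small":
        raise QuotaExceededError("quota reached for small")
    return f"launched-{instance_type.name}"


candidates = [
    InstanceType("large", 0.40),
    InstanceType("small", 0.10),
    InstanceType("medium", 0.20),
]
result = launch_cheapest_available(candidates, fake_launch)
```

In this sketch the cheapest type hits a quota, so the next cheapest one is launched instead of the request failing outright.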

Bug fixes:

  • Fixed a bug where we did not take interruption probability into account when assigning jobs to existing instances.
  • Fixed a bug where the background deallocate_jobs.py process was not running correctly on Azure.
  • Fixed a bug where mirroring the current pip interpreter failed if pip was out of date.
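
To make the first fix concrete, here is a rough sketch of what taking interruption probability into account can look like when reusing an already-running instance. It is an illustration only, not Meadowrun's code: `ExistingInstance`, `pick_existing_instance`, and all field names are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExistingInstance:
    """Hypothetical record of an already-running instance."""
    name: str
    interruption_probability: float  # percent, e.g. 15.0 means 15%
    free_cpus: int


def pick_existing_instance(
    instances: List[ExistingInstance],
    cpus_needed: int,
    max_interruption_probability: float,
) -> Optional[ExistingInstance]:
    """Return the first instance with enough free CPUs and acceptable risk."""
    for instance in instances:
        if instance.free_cpus < cpus_needed:
            continue  # not enough spare capacity
        if instance.interruption_probability > max_interruption_probability:
            # Skipping this check is the kind of bug described above:
            # a job could land on an instance that is too likely to be interrupted.
            continue
        return instance
    return None  # no suitable instance; the caller would launch a new one


running = [
    ExistingInstance("spot-a", interruption_probability=40.0, free_cpus=8),
    ExistingInstance("on-demand-b", interruption_probability=0.0, free_cpus=2),
]
chosen = pick_existing_instance(running, cpus_needed=2, max_interruption_probability=15.0)
```

Here `spot-a` has plenty of free CPUs but is rejected because its interruption probability exceeds the job's threshold, so the job goes to `on-demand-b`.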

Full Changelog: v0.1.13...v0.1.14