This repository has been archived by the owner on Jul 7, 2023. It is now read-only.
v0.1.14
New features:
- Added the ability to specify more requirements on EC2 instances, e.g. GPUs, GPU memory, AVX512, etc.
- Added support for git repo dependencies in pip requirements.txt and poetry project files.
- Added the ability to open an arbitrary port for a job
- Automatically set the working directory to be the remote equivalent of the current working directory so that relative paths mostly work as expected
- Added the ability to request arbitrary apt packages in addition to a pip/poetry/conda file
- Added a /meadowrun/machine_cache folder for containers on the same machine to share files
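To illustrate the working-directory item above, here is a minimal sketch of how a local working directory could be mapped to its remote equivalent. It is not Meadowrun's actual implementation: the function name `remote_working_directory`, the idea of a single mirrored code root, and the remote path `/meadowrun/code0` are all assumptions made for the example.

```python
from pathlib import PurePosixPath


def remote_working_directory(local_cwd: str, local_root: str, remote_root: str) -> str:
    """Map a local working directory onto a remote copy of the same code tree.

    Hypothetical helper: assumes the tree under local_root is mirrored
    unchanged under remote_root on the remote machine.
    """
    # Raises ValueError if local_cwd is not inside local_root.
    relative = PurePosixPath(local_cwd).relative_to(PurePosixPath(local_root))
    return str(PurePosixPath(remote_root) / relative)


# "/meadowrun/code0" is an assumed remote location, used only for illustration.
remote_cwd = remote_working_directory(
    "/home/me/project/scripts", "/home/me/project", "/meadowrun/code0"
)
print(remote_cwd)
```

With a mapping like this, a relative path such as `data/input.csv` resolves against the same subdirectory on both machines, which is why relative paths "mostly" work: paths that point outside the mirrored tree have no remote equivalent.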
Improvements:
- SSH connections are much faster, most noticeable in run_map, as a result of switching from fabric to asyncssh
- Changed behavior when instances can't be created because of a quota. Previously we would just give up; the new behavior is to try more expensive instances if they are available.
- Stdout from the remote machine shows up on the local machine much more quickly
- Delete containers when we are done with them
- Deallocate jobs when the client is terminated. Also convert the deallocate_jobs.py cron job to a systemd unit so that it runs more frequently (every 30 seconds for now)
- Check for spot interruptions and prevent further allocations
- Set the idle timeout for automatically cleaning up machines to 5 minutes. Print out surviving machines on manual clean up.
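The quota fallback described above can be sketched roughly as follows. This only illustrates the idea of moving on to pricier instance types when a quota blocks a launch, not the code Meadowrun actually uses: `QuotaExceededError`, `InstanceOption`, `launch_cheapest_available`, and the fake launcher are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


class QuotaExceededError(Exception):
    """Hypothetical error: the cloud provider refused a launch because of a quota."""


@dataclass(frozen=True)
class InstanceOption:
    """Hypothetical description of an instance type and its hourly price."""
    name: str
    price_per_hour: float


def launch_cheapest_available(
    options: Iterable[InstanceOption],
    launch: Callable[[InstanceOption], str],
) -> Optional[str]:
    """Try instance types from cheapest to most expensive, skipping quota failures."""
    for option in sorted(options, key=lambda o: o.price_per_hour):
        try:
            return launch(option)
        except QuotaExceededError:
            # Quota reached for this type: move on to the next, pricier one.
            continue
    return None  # every candidate was blocked by a quota


def fake_launch(option: InstanceOption) -> str:
    """Stand-in launcher that simulates a quota on the cheapest type."""
    if option.name == "small":
        raise QuotaExceededError(option.name)
    return f"instance-of-{option.name}"


candidates = [
    InstanceOption("large", 0.40),
    InstanceOption("small", 0.10),
    InstanceOption("medium", 0.20),
]
launched = launch_cheapest_available(candidates, fake_launch)
print(launched)
```

In this sketch only quota errors trigger the fallback; any other launch failure propagates to the caller, so unrelated problems are not hidden by silently paying for a larger instance.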
Bug fixes:
- Fixes a bug where we did not take interruption probability into account when assigning jobs to existing instances
- Fixes a bug where the background deallocate_jobs.py process was not running correctly on Azure
- Fixes a bug where mirroring the current pip interpreter failed if pip was out of date
Full Changelog: v0.1.13...v0.1.14