
v1.13.1

Released by @eapolinario on 29 Jul · b79c7a3

Notes

flytekitplugins-kfpytorch

The distributed PyTorch and distributed elastic PyTorch tasks in flytekitplugins-kfpytorch now increase the shared memory limit by default by mounting an emptyDir volume with medium Memory to /dev/shm. This is almost always required when working with torch multiprocessing (e.g. multi-processed data loader workers or the local worker group in distributed training). To disable this behavior, pass increase_shared_mem=False to task_config=PyTorch(...)/Elastic(...). Elastic tasks now also set a default join timeout of 15 minutes to prevent timeouts when some worker pods require a node scale-up. This setting can be modified via task_config=Elastic(rdzv_configs={...}).
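For example, both new defaults can be overridden through the task config. This is a minimal sketch: the task bodies are placeholders, replica/node settings are omitted for brevity, and the concrete timeout value is illustrative.

```python
from flytekit import task
from flytekitplugins.kfpytorch import Elastic, PyTorch


# Disable the automatic /dev/shm increase for a distributed PyTorch task.
@task(task_config=PyTorch(increase_shared_mem=False))
def train_pytorch():
    ...


# Override the default 15-minute join timeout for an elastic task
# ("join_timeout" is a torch elastic rendezvous config key, in seconds).
@task(task_config=Elastic(rdzv_configs={"join_timeout": 1800}))
def train_elastic():
    ...
```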

What's Changed

Full Changelog: v1.13.0...v1.13.1