I've recently started using StarCluster with the IPython plugin (from the develop branch) and found I wanted to be able to restart the ipcluster on the same EC2 nodes without losing currently running tasks. I modified StarCluster's IPython plugin to support this (linked below) and would love to know whether this is worth integrating into the current StarCluster plugin.
Problem: The current implementation of the IPython plugin, ipcluster.py, does a hard kill of ipengines, which means we potentially lose long-running tasks.
Proposed Solution: Start each ipcluster with the "--cluster-id <...>" parameter. When we call ipcluster.IPClusterStop and ipcluster.IPClusterRestartEngines, give the user the option to select which cluster-ids to keep alive and which to kill. All proposed changes are backwards compatible.
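To make the idea concrete, here is a rough sketch of the selection logic, not the actual plugin code. The helper names `start_command` and `stop_commands` and the `keep` argument are made up for illustration; the only thing taken from IPython itself is that `ipcluster start` and `ipcluster stop` accept `--profile` and `--cluster-id`.

```python
# Hypothetical helpers: they only build command lines, they do not run anything.
# The names start_command/stop_commands and the `keep` argument are assumptions
# made for this sketch and are not part of StarCluster or IPython.

def start_command(cluster_id, n_engines, profile="default"):
    """Build an `ipcluster start` command tagged with a cluster id."""
    return [
        "ipcluster", "start",
        "--profile=%s" % profile,
        "--cluster-id=%s" % cluster_id,
        "-n", str(n_engines),
        "--daemonize",
    ]


def stop_commands(active_ids, keep=(), profile="default"):
    """Build one `ipcluster stop` command per cluster id not listed in `keep`."""
    keep = set(keep)
    return [
        ["ipcluster", "stop",
         "--profile=%s" % profile,
         "--cluster-id=%s" % cid]
        for cid in active_ids
        if cid not in keep
    ]


if __name__ == "__main__":
    # Keep the cluster running long tasks alive, stop the other two.
    for cmd in stop_commands(["jobA", "jobB", "jobC"], keep=["jobB"]):
        print(" ".join(cmd))
```

With no `keep` list, every known cluster-id gets a stop command, which is roughly how I'd expect the existing stop-everything behaviour to map onto this scheme.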
I have a working design of this, linked below. If this sounds like it's worth integrating into StarCluster, I would love some feedback on these current design problems:
I created a backwards-compatible implementation of the solution proposed above, which uses --cluster-id to manage concurrent ipcluster instances. I created a PR for this here:
I'd very much appreciate your feedback, as this could be quite useful for others.
Closing this, as it's linked to above.