
Update ipcluster plugin to allow specification of num_engines #379

Closed
wants to merge 11 commits into from


corydolphin
Contributor

Hello,

I am new to StarCluster, but I have found the package extremely useful for running a parameter-sweep grid search that trains sklearn models. In my particular problem, each job requires a large amount of memory relative to the number of CPUs; memory-optimized instances alone are not sufficient, since each job takes ~40GB of memory when training a model. I therefore needed to limit the number of ipengines on each node in the cluster. I edited the ipcluster plugin to support this as an optional parameter; the default behavior matches that of the original implementation.

I believe that others may find this modification useful, and I would love feedback on whether or not such a change is interesting to the team.

There are two oddities with the implementation that I wish to discuss:

  1. It requires the IPClusterRestartEngines plugin to also specify the number of engines.
  2. The value likely needs to change depending on the instance type. Alternatively, it would be trivial to specify an amount of memory per engine, i.e. start one engine for each 40GB of memory; this, however, may be difficult to explain.
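
As a rough illustration of the memory-per-engine alternative in item 2, the engine count for a node could be derived as below. This is only a sketch and not code from this pull request: the function name `engines_for_node` and its parameters are hypothetical, and it assumes the node's total memory and CPU count are already known.

```python
def engines_for_node(total_memory_gb, cpu_count, memory_per_engine_gb=40.0):
    """Return how many engines to start on one node (hypothetical helper).

    One engine is allowed per `memory_per_engine_gb` of memory, and the
    result is capped at the number of CPUs so that a node never runs more
    engines than it has cores.
    """
    if memory_per_engine_gb <= 0:
        raise ValueError("memory_per_engine_gb must be positive")
    # Whole engines that fit in memory; leftover memory is not used.
    by_memory = int(total_memory_gb // memory_per_engine_gb)
    return max(0, min(cpu_count, by_memory))


# Example: a node with 244 GB of memory and 32 CPUs, 40 GB per engine.
print(engines_for_node(244, 32))
```

With these inputs the node would get 6 engines instead of one per CPU. A node with less than 40GB of memory would get none, which is the kind of behavior that might be hard to explain to users.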

Thanks for sharing this wonderful project, and I hope others find the limit on the number of engines useful.
Cory
