
First version of cluster web service. #1383

Merged
Merged 17 commits into ipython:master on Mar 9, 2012

Conversation

ellisonbg
Member

Exposes the management of ipclusters in the notebook web app.

@minrk
Member

minrk commented Feb 7, 2012

With this code, clusters are enforced as singletons per-profile. Is this intended? The cluster-id configurable was added to allow multiple simultaneous clusters in a given profile.

@ellisonbg
Member Author

Initially I was thinking the notebook would enforce the single instance per profile. This is mainly to keep things simple. But this is definitely something we can think about relaxing. I want to get the initial UI in place first to see how things look.


help="Command line arguments to pass to ipcluster.")
ipcluster_subcommand = Unicode('start')
ipcluster_profile = Unicode('default')
@minrk
Member

Other launchers use profile_dir directly, I think to avoid relying upon environment-variables. Presumably the IPCluster launcher should be consistent (I would not object to other launchers using profile-names instead of profile-dirs, but they should at least match).

Also, the profile and n traits should probably not have the ipcluster_ prefix, as it's entirely redundant.

@ellisonbg
Member Author

There is a subtle point about the profiles. There is always a profile in the self.config object of configurables. But the profile used for ipcluster can be different from that, and I wanted to pick a name that reflected that. But for the ipcluster_n, you are right that there is no reason not to use just a plain "n".

In terms of the profile_dir stuff, I think for this simple version it makes sense to use profile, because that is what the list_profiles_in function returns. But once we move over to the launchers, we can go the profile_dir route. Or do you think we should create a profile->profile_dir function and use the profile_dirs from the start?


@minrk
Member

> There is a subtle point about the profiles. There is always a profile in the self.config object of configurables. But the profile used for ipcluster can be different from that and I wanted to pick a name that reflected that.

The app profile is not exactly relevant, because the IPClusterLauncher will not pick up the IPythonApp.profile configurable because it isn't a subclass. You still have to pass it directly in one of the two following ways:

launcher = IPClusterLauncher(profile='foo')

or in config:

IPClusterLauncher.profile = 'foo'

It is not possible to specify the ipcluster profile without already having the word 'IPCluster' in the statement, thus making the prefix redundant in all circumstances.

On the profile-dir point, I just think it makes sense for Launchers to be internally consistent, and right now they are not - the IPClusterLauncher takes a name, and all others take a dir. We already have a profile->profile_dir function, it's called ProfileDir.find_profile_by_name.
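
A minimal sketch of that profile-name to profile-dir resolution, assuming the IPython APIs of this era; the exact classmethod spelling used below (find_profile_dir_by_name) and the module paths are assumptions, not a quote of the code under review:

    # Hedged sketch: resolve a profile name to its profile_dir so a launcher
    # can be handed the directory, like the other launchers expect.
    from IPython.utils.path import get_ipython_dir
    from IPython.core.profiledir import ProfileDir

    def profile_to_profile_dir(profile='default'):
        pd = ProfileDir.find_profile_dir_by_name(get_ipython_dir(), profile)
        return pd.location  # absolute path to the profile_<name> directory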

@ellisonbg
Member Author

OK I will make these changes.


@fperez
Member

fperez commented Feb 9, 2012

Looking great. As mentioned on email, having feedback on cluster status and a 'stop' button is critical before we can merge, I think.

@ellisonbg
Member Author

@fperez: sorry I wasn't clear - this PR is not even close to being done. The start button does nothing and there is obviously no place to specify the number of engines, stop the cluster, etc. I mainly just wanted some UI feedback before I got too far into things.

@minrk: I have fixed the tab styling to make it cleaner. We could tweak it further, but I think it is semi-respectable now.

More code on its way...

@fperez
Member

fperez commented Feb 9, 2012

No, that's no problem; I knew this was work in progress. I just wanted to highlight which points I thought were true blockers vs others that we may just keep on refining over time. I'm trying with reviews to find a good sweet spot between ensuring that PRs do refine important problems and not holding PR authors to such impossible standards that something takes forever to review. Since some of your work here is UI, it's a good candidate for 'forever in progress', and I was just trying to highlight one key point we do want fixed before merge.

I think this is a great start, and we can use this PR as working ground until we're in a good spot to merge.

@ellisonbg
Member Author

OK, the notebook/parallel integration is now fully working. You can start/stop clusters in the notebook and set the number of engines. The UI is super simple, but it is a good start and seems to work quite well. Needs lots of testing.

@minrk
Member

minrk commented Feb 11, 2012

Were you going to make the changes to the ipcluster launcher traits?

@minrk
Member

minrk commented Feb 12, 2012

In the cluster list, it probably makes sense for the first entry to be the current profile in use, or otherwise stand out.

@minrk
Member

minrk commented Feb 12, 2012

Has the tab area gotten wider? It seems much too wide, now.

@minrk
Member

minrk commented Feb 12, 2012

The # of engines has a default value (ncpus). When unspecified, it should use the default, rather than saying "invalid engine #".

@minrk
Member

minrk commented Feb 12, 2012

IPClusterLauncher should detect when ipcluster exits, so it doesn't still say 'running' even after shutdown or crash. To do this, hook up a callback with Launcher.on_stop.
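
A minimal sketch of that wiring, assuming the IPython.parallel launcher API of this era; the import path, the bookkeeping dict, and the handler names are illustrative, not the PR's actual code:

    # Hedged sketch: register an on_stop callback so the web UI stops reporting
    # 'running' once the ipcluster process has actually exited or crashed.
    from IPython.parallel.apps.launcher import IPClusterLauncher

    clusters = {}  # illustrative bookkeeping: profile -> {'status': ..., 'launcher': ...}

    def stop_handler(profile):
        def callback(stop_data):
            # called by the launcher once the process is gone, however it ended
            clusters[profile]['status'] = 'stopped'
            clusters[profile].pop('launcher', None)
        return callback

    launcher = IPClusterLauncher(profile='default')  # 'ipcluster_profile' before the rename above
    launcher.on_stop(stop_handler('default'))
    clusters['default'] = {'status': 'running', 'launcher': launcher}
    launcher.start()  # exact start() signature varies by launcher; illustrative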

@ellisonbg
Member Author

@minrk: thanks for the feedback.

  • Will change the traits names.
  • Will look at organizing the profile list.
  • Yes, the tab area is now 920px, which is approximately what most web sites are using (matches GitHub exactly). I know that it looks a little wide, but I think we want this to give the items in the lists room to grow additional UI elements.
  • Will fix the ncpus default.
  • Will detect exit using on_stop.

@minrk
Member

minrk commented Feb 13, 2012

  • Will change the traits names.
  • Will look at organizing the profile list.

Thanks! I don't think there's any logical differentiation of the profiles other than 'current' and 'other'. If it doesn't look nice, don't worry about it, but since I have about 30 profiles for testing (most of them empty), it wasn't as obvious at a glance where 'default' was as I thought it should be. I don't imagine normal people (or just those who clean up after themselves) would have this problem.

We might also want to allow for creation of new profiles. ipcluster start --profile=thisisnew will always work, so you can continually create unconfigured clusters. But perhaps exposing the cluster-id option for concurrent clusters on a given profile will be more useful in the long run than this. In either case, certainly not necessary in this first iteration.

  • Yes, the tab area is now 920px, which is approximately what most web sites are using (matches github exactly).

Hm, while I do see that it matches the general body width of GitHub, they actually have zero elements other than control-bars, title bars, etc. that actually come anywhere close to filling that width (biggest I found was 720px), because every view they have is multi-column. Are there any elements we might eventually add in side-columns next to the lists, so they aren't so fat?

  • Will fix the ncpus default.

Make sure the default is that n is not specified, because this can be set in config files. Do not override the existing value with ncpus, essentially clobbering config with a second assignment of the default. This means that the notebook interface will not be able to know how many engines are being started, and must accept that (also true when using the SSHLaunchers). Another disadvantage of using ipcluster as a launcher instead of the individual launchers directly.

  • Will detect exit using on_stop.

Great!

@ellisonbg
Member Author


On the side-columns question: there may be other elements we would add on the project dashboard, but I don't know what they would be or whether a multi-column approach would be appropriate. I was going off (OK, almost blatantly copying) this view on GitHub, which is a single-column, 920px layout:

https://github.com/ipython/ipython


On the ncpus default: yes, that is how I was going to handle it. But you are (again) right that this is another strike against the ipcluster launcher.

Cheers,

Brian


@minrk
Member

minrk commented Feb 13, 2012

I was going off (OK, almost blatantly copying them..) this view in github, which is single column 920px:

Okay, I see that they match, though ours still feels much wider. Perhaps it is the vast spacing between elements, and the fact that we actually have content on both edges, while 60% of their width is a text area, which never comes close to the right edge. Our much higher contrast background/border color scheme may contribute as well.

What about moving the buttons to the far-left, and/or significantly shrinking the spacing between columns, rather than trying to fill the space just because we have it? It looks a bit silly to have all that empty non-functional space.

...this is another strike against the ipcluster launcher.

Yes, the ipcluster launcher is clearly not a long-term solution, but you are still right in starting with it, because it's so simple for the basic 'just give me some engines' case most appropriate for the notebook at this point.

@ellisonbg
Member Author


I agree it feels bigger. So much so that I actually went back and forth between the two when I first implemented it - to the eye they don't look the same at all.


@ellisonbg
Member Author

@minrk: I just pushed a commit that implemented everything using the proper controller and engine set launchers. Everything seems to work, but this will need some careful eyes on everything. Some particulars:

  • I read the standard cluster config files to build the same config object that ipcluster would get. We need to test with various settings to make sure that everything propagates to the launchers properly.
  • If you have some dedicated cluster profiles that use non-Local launchers it would be great if you could test those cases.
  • The default n is now taken from the config file or num_cpus() - it can be left empty in the UI. The proper number is returned to the UI upon starting.
  • For each cluster I keep a dict that stores the launchers. In my on_stop handlers I delete all references to the launchers. This happens right after the stop method of the launchers is called. Do you think I need to wait a little bit to let the launchers finish up before del'ing them?
  • Do you think I should clean up the log files on exit or cluster start or just leave them alone?
  • What to do about pid files? I don't create/delete them. Do you think we need to generalize the logic from ipclusterapp and manage pid files for this case?
  • Should we use a cluster_id of something like "notebook_server" so that ipcluster could still start other cluster instances at the command line?

I think we are getting close to merging though.

Thanks!

@minrk
Member

minrk commented Feb 18, 2012

  • I read the standard cluster config files to build the same config object that ipcluster would get. We need to test with various settings to make sure that everything propagates to the launchers properly.

I made a PR against your branch to use an IPClusterStart subclass to load this config. It's really much too hard to emulate config loading with all the subclasses and inheritance without using the actual classes, and we shouldn't ever do it if we can avoid it.

  • If you have some dedicated cluster profiles that use non-Local launchers it would be great if you could test those cases.

I have MPI locally, and I use SGE on the neuroscience cluster. I'm not in a great place to test SGE this week, but I am using MPI at least, and it seems to be working fine (after my PR above, which was required, because config loading was not working properly).

  • The default n is now taken from the config file or num_cpus() - it can be left empty in the UI. The proper number is returned to the UI upon starting.

Functionality is irrelevant after PR above, but there should be some UI indication. It is not clear that leaving the textarea blank is an acceptable action. I also just don't like the appearance of the textarea approach. Perhaps a slider?

  • For each cluster I keep a dict that stores the launchers. In my on_stop handlers I delete all references to the launchers. This happens right after the stop method of the launchers is called. Do you think I need to wait a little bit to let the launchers finish up before del'ing them?

No, because on_stop means the process isn't running anymore. The only place you might want to keep references around is if you stop the engines in the controller's on_stop, and delete refs right away. You might keep those refs around for a more forceful kill if the first attempt fails.

  • Do you think I should clean up the log files on exit or cluster start or just leave them alone?

Tough call. I currently have 401 files in profile_default/log, which is pretty bad, but you certainly don't want to have removed the logfiles if anything went wrong. Unfortunately, that's pretty tricky to detect, because a crash is not the only reason for a logfile to be interesting.

  • What to do about pid files. I don't create/delete them. Do you think we need to generalize the logic from ipclusterapp and manage pid files for this case?

I really want a way to query a cluster's state, so that we can see what's running and how many engines, etc., even if we aren't the process that started the cluster. Currently, PID files are the only thing we have in this regard, and they track IPCluster, not ipcontroller. So I'm not sure how useful they would be unless you want to add some significant monitoring machinery.

  • Should we use a clusterid of something like "notebook_server" so that ipcluster could still start other cluster instances at the command line?

No, cluster_id is too inconvenient in the Client, and should only be used when absolutely necessary and actively using multiple clusters with one profile.
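
For illustration, a hedged sketch of that inconvenience from the client side; the kwarg names follow IPython.parallel of this era and should be treated as assumptions:

    from IPython.parallel import Client

    # With the default (empty) cluster_id, connecting needs only the profile:
    rc = Client(profile='default')

    # With a cluster_id such as 'notebook_server', every client (and every user)
    # has to know and pass it as well, since connection files are named per id:
    rc = Client(profile='default', cluster_id='notebook_server')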

One further note:

Now the cluster list seems entirely unordered. This is pretty problematic for me, when 'default' is 3/4 of the way down a list of 40 profiles. It should be sorted alphabetically, and the only ones that might get out of that order and moved to the top are:

  • the current profile
  • running clusters
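
A minimal sketch of that ordering (plain Python; the function and argument names are illustrative):

    # Hedged sketch: current profile first, running clusters next,
    # everything else in plain alphabetical order.
    def sort_profiles(profiles, current='default', running=()):
        def key(p):
            return (p != current, p not in running, p.lower())
        return sorted(profiles, key=key)

    # sort_profiles(['zmq', 'default', 'mpi'], running={'mpi'})
    # -> ['default', 'mpi', 'zmq']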

I think we are getting close to merging though.

I agree, it's looking good. I still really want to fiddle with the aesthetics, as right now it looks quite clunky and unreasonably wide.

@ellisonbg
Member Author

What specifically was not working in the config loading? I agree your approach is better, but I thought I had been careful to get everything right.

Should I just load the default n into the UI? I can try a slider, but what range of engine numbers would you use?

I will play with the aesthetics a bit and try some sorting of the profiles.


@minrk
Member

minrk commented Feb 19, 2012

What specifically was not working in the config loading? I agree your approach is better but I thought I had been careful to get everything right.

Subclasses. For instance, I happened to have IPClusterStart.engine_launcher_class = "MPI", because the typical way I do config is open the file, search for the first occurrence of the trait, and make the edit. And Start happened to come before Engines. If you want to emulate real loading, you have to check every descendent with the trait, not just the one that defines it.

Should I just load the default n into the UI? I can try a slider, but what range of engine numbers would you use?

I'm not sure. That would require loading config from every single profile on first load (and wouldn't reflect changes to config files after server start, which I was doing). I think the default should just be 'default' or 0. I've seen several UIs that are sliders, with an option to double-click to enter manually. In which case, I would only let the slider go to 16, or possibly 32, depending on how it feels. I'm not certain this will be better than a textarea. Maybe the real answer is just to style the textarea less starkly, so it doesn't stand out so much. It also probably shouldn't be much more than 3 characters wide.
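
A hedged sketch of the subclass-based loading described above, assuming the IPython.parallel application classes of this era; the dummy subclass and helper names are illustrative, not the exact code from the PR against the branch:

    # Let the real application class do the config loading instead of emulating
    # trait inheritance by hand, so values set on subclasses (e.g.
    # IPClusterStart.engine_launcher_class = "MPI") are resolved exactly as
    # `ipcluster start` would resolve them.
    from IPython.parallel.apps.ipclusterapp import IPClusterStart

    class DummyIPClusterStart(IPClusterStart):
        """Load an ipcluster profile's config without launching anything."""
        def init_signal(self):
            pass  # no signal handlers inside the notebook server
        def reinit_logging(self):
            pass  # don't redirect logging to the profile's log files

    def load_cluster_config(profile='default'):
        app = DummyIPClusterStart()
        app.initialize(['--profile=%s' % profile])  # walks the normal config chain
        return app.config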

This exposes ipclusters over the web. The current implementation uses IPClusterLauncher to run ipcluster in a separate process. Here is the URL scheme we are using:

GET /clusters => list available clusters
GET /cluster/profile => list info for cluster with profile
POST /cluster/profile/start => start a cluster
POST /cluster/profile/stop => stop a cluster
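
For illustration, a hedged example of exercising this scheme from Python against a locally running notebook server; the base URL/port and the use of the requests library are assumptions:

    import requests

    base = 'http://127.0.0.1:8888'
    print(requests.get(base + '/clusters').text)          # list available clusters
    print(requests.get(base + '/cluster/default').text)   # info for the 'default' profile
    requests.post(base + '/cluster/default/start')        # start a cluster for that profile
    requests.post(base + '/cluster/default/stop')         # stop it
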
* Created new base page html/css/js.
* Everything inherits from the page template.
* Universal header border.
* Notebook list borders are set to 1px all around.
* No border around notebook area.
* Border cleanup of toolbar/menubar.
* Lots of code reorg to get ready for further refactoring.
Not functional yet, but the idea is there.
You can start/stop clusters in the notebook with a very simple UI. More to do,
but this is a start.
        if os.path.isdir(full_path):
            profiles.append(profile)
    return profiles

@minrk
Member

These two look fairly straightforward, but do need at least a couple of simple tests.
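
A minimal sketch of the kind of simple test being asked for (not the tests that were later added; list_profiles_in is the name used earlier in this thread, and the import location plus the exact return format are assumptions):

    # from IPython.frontend.html.notebook.clustermanager import list_profiles_in
    # (import path above is a guess -- point it at wherever the PR defines the function)
    import os, shutil, tempfile

    def test_list_profiles_in():
        root = tempfile.mkdtemp()
        try:
            for name in ('profile_foo', 'profile_bar'):
                os.mkdir(os.path.join(root, name))
            # a stray file should not be reported as a profile
            open(os.path.join(root, 'not_a_profile'), 'w').close()
            profiles = list_profiles_in(root)
            assert len(profiles) == 2
        finally:
            shutil.rmtree(root)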

@minrk
Member

minrk commented Mar 8, 2012

Don't forget to merge this for better config loading.

@ellisonbg
Member Author

Thanks for the reminder, got it!


ensures that config loading matches what would happen in ipcluster.

The only change needed in IPClusterStart itself was moving the on_stop registration from init_launchers to start_controller, where it should have been anyway.
@minrk
Member

minrk commented Mar 8, 2012

I've written simple tests for the two profile list functions, but I'm at a coffee shop that apparently blocks ssh (?!), so I can't push.

patch: https://gist.github.com/2003276

@fperez
Member

fperez commented Mar 8, 2012

Hey,


Funny, so had I! I'll incorporate your code into my PR and we'll merge everything, as I also did some other refactoring.

Cheers,

f

@fperez
Member

fperez commented Mar 8, 2012

@minrk : Done, thanks!

Nbparallel - add tests for profile listing.
ellisonbg added a commit that referenced this pull request Mar 9, 2012
IPython clusters can now be managed using the Notebook.
ellisonbg merged commit 0f5060e into ipython:master on Mar 9, 2012
mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014
IPython clusters can now be managed using the Notebook.