
change cull_idle_timeout_minimum to 1 from 300 #2910

Merged (3 commits) on Nov 1, 2017

Conversation
@mheilman (Contributor) commented Oct 9, 2017

I ran into a situation where it'd be nice to be able to set cull_idle_timeout below 5 minutes, but it looks like that's not possible currently.

This is related to #2215

How would people feel about changing the minimum to 1 second? I didn't see any comments suggesting this would be a bad idea in #2215, but I'm probably not aware of all the uses of this feature.

@parente (Member) commented Oct 10, 2017

I think we decided on five minutes so that the culler didn't run too frequently. If the cull_idle_timeout minimum is reduced, cull_interval_default and cull_interval need to be adjusted accordingly. At face value, it seems potentially harmful to allow the cull callback to run every second.

@takluyver (Member) commented Oct 10, 2017

What's the situation where you want a shorter timeout? It sounds like it might be quite different from the use cases we were thinking of.

@mheilman (Contributor, Author) commented Oct 11, 2017

What's the situation where you want a shorter timeout?

Basically, I have a server that multiple people might connect to, and I don't want to have a bunch of kernels sitting around idle taking up resources if people disconnect. 5 minutes seems like a pretty long timeout. I'd like to set it to a minute or so (same with cull interval), but I figured that if I was going to propose changing 300 to 60 or so, then I might as well propose changing the minimum to 1 second.

If the cull_idle_timeout minimum is reduced, cull_interval_default and cull_interval need to be adjusted accordingly.

Hmm... Perhaps I'm misreading the code. Isn't the default timeout infinity (represented by 0)? Why would other defaults need to be changed?

Thanks for taking a look at this.

@takluyver (Member) commented Oct 11, 2017

I think the timeout is based on kernel activity, so if you stop to read a notebook for a minute, the kernel would be killed and you'd lose all your variables. That seems like it would be annoying. If it's really what you want, feel free to decrease the minimum, but let's leave it at some vaguely sensible value like 60; I can't see a 1-second timeout ever making sense.

@mheilman (Contributor, Author) commented Oct 11, 2017

IIUC, kernels only get culled if all of the following conditions are satisfied (code):

  • they've been idle for longer than the timeout
  • they're not currently busy
  • they don't have any active connections

As long as you have your browser open to a notebook, the notebook app will keep the kernel running, even if the kernel is idle for an hour.

I tried the following two commands with my PR version. The first works just like a regular notebook in practice (i.e., the kernel keeps running even if I don't execute code for a while). The second results in kernels getting culled unless they're executing code every few seconds.

jupyter notebook --MappingKernelManager.cull_idle_timeout=5 --MappingKernelManager.cull_interval=5
jupyter notebook --MappingKernelManager.cull_idle_timeout=5 --MappingKernelManager.cull_connected=True --MappingKernelManager.cull_interval=5
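The three conditions above can be sketched as a small predicate. This is a hypothetical simplification for illustration, not the actual MappingKernelManager code: the field names `execution_state`, `connections`, and `last_activity`, and the `cull_busy` flag, are assumptions.

```python
from datetime import datetime, timedelta, timezone

def should_cull(kernel, idle_timeout, cull_connected=False, cull_busy=False, now=None):
    """Sketch of the culling decision: cull only if the kernel has been
    idle longer than idle_timeout seconds, is not busy, and (unless
    cull_connected is set) has no open connections."""
    now = now or datetime.now(timezone.utc)
    if idle_timeout <= 0:
        return False  # 0 (the default) disables culling entirely
    if kernel["execution_state"] == "busy" and not cull_busy:
        return False  # busy kernels are spared
    if kernel["connections"] > 0 and not cull_connected:
        return False  # an open browser connection keeps the kernel alive
    return (now - kernel["last_activity"]) > timedelta(seconds=idle_timeout)
```

Under this sketch, the behavior of the two commands follows directly: without cull_connected=True, a kernel with an open notebook tab is never culled no matter how long it sits idle.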

@takluyver (Member) commented Oct 11, 2017

Ah, OK, I'd missed that it tracked the connection as well. In that case, I think it's fine to have a very low minimum.

@takluyver (Member) commented Oct 11, 2017

Though it might be worth putting a note in the docstring that with a very short timeout, users with connection problems may also find that their kernel gets killed.

@mheilman (Contributor, Author) commented Oct 11, 2017

Ah, good point. How's the note in f8ef573 look?

@takluyver (Member) commented Oct 11, 2017

That looks good, thanks.

@mheilman changed the title from "change cull_idle_timeout_minimum to 1 from 500" to "change cull_idle_timeout_minimum to 1 from 300" Oct 19, 2017

Docstring diff under review:

- lower disable culling. The minimum timeout is 300 seconds (5 minutes). Positive values less than the minimum value
-     will be set to the minimum."""
+ lower disable culling. The minimum timeout is 1 second. Positive values less than the minimum value
+     will be set to the minimum. Very short timeouts may result in kernels being culled for users with poor

@takluyver (Member) commented Oct 20, 2017
Nitpick: as this is an integer option, there are no longer meaningful positive values less than the minimum. I'd drop that sentence.

@mheilman (Contributor, Author) commented Oct 20, 2017

Looking at this again: if the minimum is 1, and since the option is an integer as @takluyver points out, then cull_idle_timeout_minimum isn't actually necessary, because cull_idle_timeout is only used when it's > 0. Am I missing something?
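The redundancy being described can be shown in a few lines. This is a hypothetical sketch of the clamping logic, not the notebook's actual code; the name `effective_timeout` is illustrative.

```python
def effective_timeout(cull_idle_timeout, minimum=1):
    """Sketch: clamp the configured timeout to a minimum of 1 second.
    Because the trait is an integer, every enabled value (>= 1) already
    satisfies the minimum, and non-positive values disable culling, so
    the clamp can never change behavior."""
    if cull_idle_timeout <= 0:
        return 0  # culling disabled
    return max(cull_idle_timeout, minimum)
```

For every integer input the clamp is a no-op: any value >= 1 maps to itself, and any value <= 0 disables culling, which is why the minimum can be dropped entirely.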

@takluyver (Member) commented Oct 20, 2017

I think you're right, but I'd like another pair of eyes on the code. @kevin-bates implemented the culling in #2215, and @Carreau and @minrk asked for the minimum in review. What do you think of getting rid of the minimum on the timeout, as being connected to a kernel now keeps it alive?

@kevin-bates (Member) commented Oct 20, 2017

The minimum was important at the time because the original implementation didn't take connection (or busy) status into account. Now that those distinctions exist, I think the timeout minimum can be lifted.

However, I tend to agree with @parente that an interval minimum is still warranted. If there are enough active kernels that checking all of them for idle timeout takes longer than the culling interval, the server could get into a pathological state where it does nothing but cull (if I'm understanding how tornado works).

What about lifting the timeout minimum but introducing an interval minimum for the sole purpose of saturation prevention? Yes, the interval minimum would then act as a logical floor on the timeout. However, this would also help with the connection-issues scenario, which could be considered another example of a pathological state.

I guess I'm thinking we should impose at least a 10 second interval minimum.
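The interval floor being proposed could look something like this. The 10-second value comes from this comment; the function name and the clamping shape are illustrative assumptions, not the notebook's actual code.

```python
CULL_INTERVAL_MINIMUM = 10  # seconds; the floor proposed in this thread

def effective_interval(cull_interval, minimum=CULL_INTERVAL_MINIMUM):
    """Sketch of saturation prevention: however small the configured
    cull_interval, never sweep kernels more often than every `minimum`
    seconds, so the sweep itself cannot monopolize the event loop."""
    return max(cull_interval, minimum)
```

A side effect of the floor is that, since sweeps never run more often than every 10 seconds, the interval minimum also becomes the smallest meaningful timeout.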

@mheilman (Contributor, Author) commented Oct 20, 2017

If there are enough active kernels such that checking all for idle timeout exceeds the culling interval

Interesting point. Does checking for idleness take that long? I was under the assumption that the idleness check would take milliseconds, such that the overhead wouldn't be a big issue in practice.

@kevin-bates (Member) commented Oct 20, 2017

The typical sweep will be on the order of milliseconds. However, when there are timeouts to deal with, the sweep duration could exceed an unconstrained interval, since a kernel shutdown itself can take seconds (e.g., if the kernel doesn't immediately detect the shutdown request via the ZMQ socket, after which the server begins its kernel signalling sequence). I hadn't actually thought about this case until now; it probably argues for an even longer minimum interval than 10 seconds, although I also lean toward "give 'em enough rope"...

@takluyver added this to the 5.3 milestone Oct 31, 2017

@takluyver (Member) commented Oct 31, 2017

Thanks @kevin-bates. That makes sense, and I think it can be done as a separate PR.

@Carreau @minrk - I'll merge this tomorrow unless you want more time to review it.

@takluyver merged commit bfe012e into jupyter:master Nov 1, 2017

4 checks passed:

  • codecov/patch: Coverage not affected when comparing 12592ef...b31194d
  • codecov/project: 79.04% (-0.05%) compared to 12592ef
  • continuous-integration/appveyor/pr: AppVeyor build succeeded
  • continuous-integration/travis-ci/pr: The Travis CI build passed