Add idle culler #366

Merged
merged 8 commits into from
Jun 14, 2019
1 change: 1 addition & 0 deletions docs/index.rst
@@ -123,6 +123,7 @@ Topic guides provide in-depth explanations of specific topics.
topic/tljh-config
topic/authenticator-configuration
topic/escape-hatch
topic/idle-culler


Troubleshooting
86 changes: 86 additions & 0 deletions docs/topic/idle-culler.rst
@@ -0,0 +1,86 @@
.. _topic/idle-culler:

=============================
Culling idle notebook servers
=============================

The idle culler is a hub-managed service that automatically shuts down idle
single-user notebook servers in order to free up resources. After culling, any
in-memory data will be lost.


Disabling the idle culler
=========================

The idle culling service is enabled by default. To disable it, use the following
command:

.. code-block:: bash

   sudo tljh-config set services.cull.enabled False
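Keys such as `services.cull.enabled` are dotted paths into the nested TLJH configuration, which `tljh/configurer.py` reads as `config['services']['cull']['enabled']`; when that value is false, `update_services` registers no culler service at all. The minimal sketch below only illustrates the dotted-path mapping. `set_dotted` and `parse_value` are hypothetical helpers, not `tljh-config`'s own code, and the sketch assumes the strings `True`/`False` are parsed as booleans and integer strings as integers.

```python
from typing import Any


def parse_value(raw: str) -> Any:
    """Hypothetical parser: booleans and integers, otherwise the raw string."""
    if raw in ("True", "true"):
        return True
    if raw in ("False", "false"):
        return False
    try:
        return int(raw)
    except ValueError:
        return raw


def set_dotted(config: dict, path: str, raw: str) -> dict:
    """Hypothetical helper: set a dotted key such as 'services.cull.enabled'."""
    *parents, leaf = path.split(".")
    node = config
    for key in parents:
        # Create intermediate mappings as needed.
        node = node.setdefault(key, {})
    node[leaf] = parse_value(raw)
    return config


config = set_dotted({}, "services.cull.enabled", "False")
print(config)  # {'services': {'cull': {'enabled': False}}}
```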


Configuring the idle culler
===========================

By **default**, JupyterHub will:

* Run the culling process every minute.
* Cull any user servers that have been inactive for more than 10 minutes.

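The following configuration options are available. After changing any of them, run ``sudo tljh-config reload`` for the change to take effect.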

Idle timeout
------------

The idle timeout (in seconds) can be configured using:

.. code-block:: bash

   sudo tljh-config set services.cull.timeout <max-idle-sec-before-server-is-culled>

*By default services.cull.timeout = 600*

Idle check interval
-------------------

The interval (in seconds) at which the culler checks for idle servers can be configured using:

.. code-block:: bash

   sudo tljh-config set services.cull.every <number-of-sec-this-check-is-done>

*By default services.cull.every = 60*
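The idle timeout and the check interval combine: a server becomes eligible for culling `timeout` seconds after its last recorded activity, but it is only stopped at the next check, which can come up to `every` seconds later. The sketch below computes that rough upper bound. `worst_case_cull_delay` is a hypothetical helper, and it assumes activity timestamps are exact; in practice JupyterHub only refreshes them periodically, so real delays can be longer.

```python
def worst_case_cull_delay(timeout: int, every: int) -> int:
    """Rough upper bound (seconds) from last activity to culling.

    Assumes exact activity timestamps and a check exactly every `every` seconds.
    """
    # Eligible after `timeout` seconds; the next check may be `every` seconds away.
    return timeout + every


print(worst_case_cull_delay(600, 60))  # 660 (the defaults)
print(worst_case_cull_delay(600, 10))  # 610
```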

Maximum age
-----------

Servers can also be culled once they reach a maximum age (in seconds), even if they are still active. A value of ``0`` disables this age limit. The maximum age can be configured using:

.. code-block:: bash

   sudo tljh-config set services.cull.max_age <server-max-age>

*By default services.cull.max_age = 0*
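Together, `timeout` and `max_age` decide whether a given server is culled. The following is a simplified sketch of that decision under the semantics described above, treating a `max_age` of 0 as "no age limit". `should_cull` is a hypothetical function, not the code in `cull_idle_servers.py`.

```python
from datetime import datetime, timedelta, timezone


def should_cull(last_activity: datetime, started: datetime, now: datetime,
                timeout: int = 600, max_age: int = 0) -> bool:
    """Simplified culling decision (hypothetical, not the real culler)."""
    inactive = (now - last_activity).total_seconds()
    if inactive >= timeout:
        return True  # idle for longer than the timeout
    # max_age == 0 means "no age limit", so only check it when positive.
    age = (now - started).total_seconds()
    return max_age > 0 and age >= max_age


now = datetime(2019, 6, 14, 12, 0, tzinfo=timezone.utc)
# Active 30 s ago, started 2 h ago, default max_age=0: kept.
print(should_cull(now - timedelta(seconds=30), now - timedelta(hours=2), now))  # False
# Same server with max_age=3600: culled despite recent activity.
print(should_cull(now - timedelta(seconds=30), now - timedelta(hours=2), now,
                  max_age=3600))  # True
# Idle for 11 minutes with the default 600 s timeout: culled.
print(should_cull(now - timedelta(minutes=11), now - timedelta(minutes=20), now))  # True
```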

User culling
------------

In addition to servers, users themselves can also be culled (deleted from JupyterHub) by running:

.. code-block:: bash

   sudo tljh-config set services.cull.users True

*By default services.cull.users = False*
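Stopping a server and deleting a user correspond to two different JupyterHub REST API calls: `DELETE /users/{name}/server` and `DELETE /users/{name}`. The simplified sketch below builds, without sending, the requests a culler might issue for one user. `cull_requests`, the Hub API URL and the token are illustrative assumptions, and the real culler applies further checks before removing a user.

```python
import urllib.request
from urllib.parse import quote


def cull_requests(hub_api: str, username: str, token: str, cull_users: bool):
    """Build (but do not send) the REST calls for culling one user."""
    headers = {"Authorization": f"token {token}"}
    name = quote(username)
    reqs = [
        # Stop the user's default server.
        urllib.request.Request(f"{hub_api}/users/{name}/server",
                               headers=headers, method="DELETE"),
    ]
    if cull_users:
        # Additionally delete the user from JupyterHub.
        reqs.append(urllib.request.Request(f"{hub_api}/users/{name}",
                                           headers=headers, method="DELETE"))
    return reqs


for req in cull_requests("http://127.0.0.1:8081/hub/api", "alice", "secret", True):
    print(req.get_method(), req.full_url)
```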

Concurrency
-----------

The number of concurrent requests the culler makes to the Hub can be configured using:

.. code-block:: bash

   sudo tljh-config set services.cull.concurrency <number-of-concurrent-hub-requests>

*By default services.cull.concurrency = 5*
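The concurrency setting caps how many requests the culler has in flight against the Hub at once. One common way to express such a cap in Python is a semaphore; the sketch below uses `asyncio.Semaphore` with a fake request coroutine and is not taken from `cull_idle_servers.py`. `run_limited` is a hypothetical helper.

```python
import asyncio


async def run_limited(n_requests: int, concurrency: int = 5) -> int:
    """Issue n fake Hub requests with at most `concurrency` in flight.

    Returns the highest number of requests observed in flight at once.
    """
    semaphore = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def fake_hub_request() -> None:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for network latency
            in_flight -= 1

    await asyncio.gather(*(fake_hub_request() for _ in range(n_requests)))
    return peak


print(asyncio.run(run_limited(20, concurrency=5)))  # 5
```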
98 changes: 97 additions & 1 deletion integration-tests/test_hub.py
@@ -1,6 +1,7 @@
import requests
from hubtraf.user import User
from hubtraf.auth.dummy import login_dummy
from jupyterhub.utils import exponential_backoff
import secrets
import pytest
from functools import partial
@@ -137,4 +138,99 @@ async def test_long_username():
'-u', 'jupyterhub',
'--no-pager'
])
raise


@pytest.mark.asyncio
async def test_idle_server_culled():
    """
    User logs in, starts a server & stays idle for 1 min.
    (the user's server should be culled during this period, since max_age is 60s)
    """
    # This *must* be localhost, not an IP
    # aiohttp throws away cookies if we are connecting to an IP!
    hub_url = 'http://localhost'
    username = secrets.token_hex(8)

    assert 0 == await (await asyncio.create_subprocess_exec(*TLJH_CONFIG_PATH, 'set', 'auth.type', 'dummyauthenticator.DummyAuthenticator')).wait()
    # Check every 10s for idle servers to cull
    assert 0 == await (await asyncio.create_subprocess_exec(*TLJH_CONFIG_PATH, 'set', 'services.cull.every', "10")).wait()
    # Apart from servers, also cull users
    assert 0 == await (await asyncio.create_subprocess_exec(*TLJH_CONFIG_PATH, 'set', 'services.cull.users', "True")).wait()
    # Cull servers and users once they are 60s old, even if they are active
    assert 0 == await (await asyncio.create_subprocess_exec(*TLJH_CONFIG_PATH, 'set', 'services.cull.max_age', "60")).wait()
    assert 0 == await (await asyncio.create_subprocess_exec(*TLJH_CONFIG_PATH, 'reload')).wait()

    async with User(username, hub_url, partial(login_dummy, password='')) as u:
        await u.login()
        # Start user's server
        await u.ensure_server()
        # Assert that the user exists
        assert pwd.getpwnam(f'jupyter-{username}') is not None

        # Check that we can get to the user's server
        r = await u.session.get(u.hub_url / 'hub/api/users' / username,
                                headers={'Referer': str(u.hub_url / 'hub/')})
        assert r.status == 200

        async def _check_culling_done():
            # Check that after 60s, the user and server have been culled and are not reachable anymore
            r = await u.session.get(u.hub_url / 'hub/api/users' / username,
                                    headers={'Referer': str(u.hub_url / 'hub/')})
            print(r.status)
            return r.status == 403

        await exponential_backoff(
            _check_culling_done,
            "Server culling failed!",
            timeout=100,
        )


@pytest.mark.asyncio
async def test_active_server_not_culled():
    """
    User logs in, starts a server & stays idle for 30s
    (the user's server should not be culled during this period).
    """
    # This *must* be localhost, not an IP
    # aiohttp throws away cookies if we are connecting to an IP!
    hub_url = 'http://localhost'
    username = secrets.token_hex(8)

    assert 0 == await (await asyncio.create_subprocess_exec(*TLJH_CONFIG_PATH, 'set', 'auth.type', 'dummyauthenticator.DummyAuthenticator')).wait()
    # Check every 10s for idle servers to cull
    assert 0 == await (await asyncio.create_subprocess_exec(*TLJH_CONFIG_PATH, 'set', 'services.cull.every', "10")).wait()
    # Apart from servers, also cull users
    assert 0 == await (await asyncio.create_subprocess_exec(*TLJH_CONFIG_PATH, 'set', 'services.cull.users', "True")).wait()
    # Cull servers and users once they are 60s old, even if they are active
    assert 0 == await (await asyncio.create_subprocess_exec(*TLJH_CONFIG_PATH, 'set', 'services.cull.max_age', "60")).wait()
    assert 0 == await (await asyncio.create_subprocess_exec(*TLJH_CONFIG_PATH, 'reload')).wait()

    async with User(username, hub_url, partial(login_dummy, password='')) as u:
        await u.login()
        # Start user's server
        await u.ensure_server()
        # Assert that the user exists
        assert pwd.getpwnam(f'jupyter-{username}') is not None

        # Check that we can get to the user's server
        r = await u.session.get(u.hub_url / 'hub/api/users' / username,
                                headers={'Referer': str(u.hub_url / 'hub/')})
        assert r.status == 200

        async def _check_culling_done():
            # Returns True as soon as the user's server is no longer reachable
            r = await u.session.get(u.hub_url / 'hub/api/users' / username,
                                    headers={'Referer': str(u.hub_url / 'hub/')})
            print(r.status)
            return r.status != 200

        try:
            await exponential_backoff(
                _check_culling_done,
                "User's server is still reachable!",
                timeout=30,
            )
        except TimeoutError:
            # During the 30s timeout the user's server wasn't culled, which is what we intended.
            pass
        else:
            # exponential_backoff returned, so the server became unreachable too early.
            pytest.fail("User's server was culled within 30s!")
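Both integration tests rely on `exponential_backoff` from `jupyterhub.utils`, which repeatedly awaits a check until it returns a truthy value and raises `TimeoutError(fail_message)` once `timeout` seconds have passed. `test_idle_server_culled` waits for the check to succeed, while `test_active_server_not_culled` expects the timeout. The poller below, `poll_with_backoff`, is a hypothetical, simplified stand-in that mirrors this contract; jupyterhub's own implementation may differ, for example in how it spaces its retries.

```python
import asyncio
import time


async def poll_with_backoff(pass_func, fail_message, start_wait=0.2,
                            scale_factor=2, max_wait=5, timeout=10):
    """Await pass_func() until truthy; raise TimeoutError after `timeout` s.

    Hypothetical, simplified stand-in: waits grow geometrically without jitter.
    """
    deadline = time.monotonic() + timeout
    wait = start_wait
    while True:
        result = await pass_func()
        if result:
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(fail_message)
        # Never sleep past the deadline.
        await asyncio.sleep(min(wait, remaining))
        wait = min(wait * scale_factor, max_wait)


async def demo():
    checks = 0

    async def culled_on_third_check():
        nonlocal checks
        checks += 1
        return checks >= 3

    # Like test_idle_server_culled: returns once the condition holds.
    await poll_with_backoff(culled_on_third_check, "Server culling failed!",
                            start_wait=0.01, timeout=5)

    async def still_reachable():
        return False

    # Like test_active_server_not_culled: a timeout is the expected outcome.
    try:
        await poll_with_backoff(still_reachable, "still reachable",
                                start_wait=0.01, timeout=0.05)
        timed_out = False
    except TimeoutError:
        timed_out = True
    return checks, timed_out


print(asyncio.run(demo()))  # (3, True)
```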
44 changes: 44 additions & 0 deletions tests/test_configurer.py
@@ -3,6 +3,7 @@
"""

import os
import sys

from tljh import configurer

@@ -187,6 +188,49 @@ def test_set_traefik_api():
    assert c.TraefikTomlProxy.traefik_api_password == '1234'


def test_cull_service_default():
    """
    Test default cull service settings with no overrides
    """
    c = apply_mock_config({})

    cull_cmd = [
        sys.executable, '/srv/src/tljh/cull_idle_servers.py',
        '--timeout=600', '--cull-every=60', '--concurrency=5',
        '--max-age=0'
    ]
    assert c.JupyterHub.services == [{
        'name': 'cull-idle',
        'admin': True,
        'command': cull_cmd,
    }]


def test_set_cull_service():
    """
    Test setting cull service options
    """
    c = apply_mock_config({
        'services': {
            'cull': {
                'every': 10,
                'users': True,
                'max_age': 60
            }
        }
    })
    cull_cmd = [
        sys.executable, '/srv/src/tljh/cull_idle_servers.py',
        '--timeout=600', '--cull-every=10', '--concurrency=5',
        '--max-age=60', '--cull-users'
    ]
    assert c.JupyterHub.services == [{
        'name': 'cull-idle',
        'admin': True,
        'command': cull_cmd,
    }]


def test_load_secrets(tljh_dir):
    """
    Test loading secret files
44 changes: 44 additions & 0 deletions tljh/configurer.py
@@ -9,6 +9,7 @@
"""

import os
import sys

from .config import CONFIG_FILE, STATE_DIR
from .yaml import yaml
@@ -55,6 +56,16 @@
    'user_environment': {
        'default_app': 'classic',
    },
    'services': {
        'cull': {
            'enabled': True,
            'timeout': 600,
            'every': 60,
            'concurrency': 5,
            'users': False,
            'max_age': 0
        }
    }
}

def load_config(config_file=CONFIG_FILE):
@@ -86,6 +97,7 @@ def apply_config(config_overrides, c):
    update_user_environment(c, tljh_config)
    update_user_account_config(c, tljh_config)
    update_traefik_api(c, tljh_config)
    update_services(c, tljh_config)


def set_if_not_none(parent, key, value):
@@ -191,6 +203,38 @@ def update_traefik_api(c, config):
    c.TraefikTomlProxy.traefik_api_password = config['traefik_api']['password']


def set_cull_idle_service(config):
    """
    Build the JupyterHub service definition for the idle culler
    """
    cull_cmd = [
        sys.executable, '/srv/src/tljh/cull_idle_servers.py'
    ]
    cull_config = config['services']['cull']

    cull_cmd += ['--timeout=%d' % cull_config['timeout']]
    cull_cmd += ['--cull-every=%d' % cull_config['every']]
    cull_cmd += ['--concurrency=%d' % cull_config['concurrency']]
    cull_cmd += ['--max-age=%d' % cull_config['max_age']]
    if cull_config['users']:
        cull_cmd += ['--cull-users']

    cull_service = {
        'name': 'cull-idle',
        'admin': True,
        'command': cull_cmd,
    }

    return cull_service


def update_services(c, config):
    """
    Register the hub-managed services enabled in config
    """
    c.JupyterHub.services = []
    if config['services']['cull']['enabled']:
        c.JupyterHub.services.append(set_cull_idle_service(config))


def _merge_dictionaries(a, b, path=None, update=True):
    """
    Merge two dictionaries recursively.
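`test_set_cull_service` overrides only `every`, `users` and `max_age`, yet expects `--timeout=600` and `--concurrency=5` in the command. That relies on the user configuration being merged recursively over the defaults, presumably through `_merge_dictionaries`, whose body is cut off in this diff. `merge_defaults` below is an illustrative stand-in, not TLJH's function.

```python
import copy


def merge_defaults(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay `overrides` on a deep copy of `defaults`."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_defaults(merged[key], value)
        else:
            # Scalars (or new keys) from the overrides win.
            merged[key] = value
    return merged


defaults = {"services": {"cull": {"enabled": True, "timeout": 600, "every": 60,
                                  "concurrency": 5, "users": False, "max_age": 0}}}
overrides = {"services": {"cull": {"every": 10, "users": True, "max_age": 60}}}
cull = merge_defaults(defaults, overrides)["services"]["cull"]
print(cull["timeout"], cull["every"], cull["users"])  # 600 10 True
```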