Collect logs from culling script. #43

Closed
pminkov opened this issue Jul 6, 2017 · 4 comments
@pminkov
Contributor

pminkov commented Jul 6, 2017

I haven't found a way to collect logs from the cull-idle service without copying the hub Docker image files and modifying jupyterhub_config.py. It might be worth adding this option to the config map. Here's how the code looks for enabling logging without changing the config map:

if get_config('cull.enabled', False):
    cull_timeout = get_config('cull.timeout')
    cull_every = get_config('cull.every')
    c.JupyterHub.services = [
        {
            'name': 'cull-idle',
            'admin': True,
            'command': [
                '/usr/bin/python3',
                '/usr/local/bin/cull_idle_servers.py',
                '--timeout=%s' % cull_timeout,
                '--cull_every=%s' % cull_every,
                '--log_file_prefix=/srv/cull-idle-log.txt'   # <---- Added this line
            ]
        }
    ]
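
For context, get_config above is the helper the hub image uses to read values mounted from the Kubernetes config map. As a rough illustration (not the chart's actual implementation, and with an assumed mount path), such a helper might look like this:

# Minimal sketch of a get_config-style helper; the file path and the
# dotted-key lookup scheme are assumptions for illustration only.
import yaml

def get_config(key, default=None):
    """Resolve a dotted key like 'cull.timeout' from the mounted chart values."""
    with open('/etc/jupyterhub/config/values.yaml') as f:  # assumed mount path
        values = yaml.safe_load(f)
    for part in key.split('.'):
        if not isinstance(values, dict) or part not in values:
            return default
        values = values[part]
    return values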
@ryanlovett
Contributor

The cull-idle logging is intermingled with the hub's, so one alternative is kubectl --namespace=... logs hub-... | grep cull-idle (or a more advanced Kubernetes log-collection tool).

@yuvipanda
Collaborator

Indeed, you can get it from the hub's logs! We should add documentation on logging...

Does setting log_file_prefix make cull-idle log to the filesystem? If so, those logs will be lost when the container restarts (after you update the helm config, for example). Logging to stdout, as it does by default, makes the output available to kubectl logs (and to your log collector, if one is configured!)
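
The reason service output shows up in the hub's logs at all is that JupyterHub spawns services as child processes that inherit the hub's output streams. A rough sketch of that mechanism (not JupyterHub's actual code):

# Rough sketch, not JupyterHub's actual implementation: the service is
# spawned without redirecting stdout/stderr, so it inherits the hub
# container's streams, which Docker (and hence kubectl logs) captures.
import subprocess

proc = subprocess.Popen([
    '/usr/bin/python3',
    '/usr/local/bin/cull_idle_servers.py',
    '--timeout=3600',
    '--cull_every=600',
])  # no stdout=/stderr= arguments: output goes to the container's streams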

@pminkov
Contributor Author

pminkov commented Jul 8, 2017

I'm not sure what's going on, but this doesn't seem to work. I modified cull_idle_servers.py to print two logging statements (one at info level and one at error level), and I can see them in the cull-idle-log.txt file but not in the hub's logs.

My guess was that Docker cannot capture the logs from this separate process, but after digging around I'm still not sure what the root cause is.

Here's the output:

$ kl logs hub-deployment-2709721724-bs2c7 | grep cull
[I 2017-07-08 19:09:48.216 JupyterHub service:220] Starting service 'cull-idle': ['/usr/bin/python3', '/usr/local/bin/cull_idle_servers.py', '--timeout=3600', '--cull_every=600', '--log_file_prefix=/srv/cull-idle-log.txt']
[I 2017-07-08 19:09:48.218 JupyterHub service:90] Spawning /usr/bin/python3 /usr/local/bin/cull_idle_servers.py --timeout=3600 --cull_every=600 --log_file_prefix=/srv/cull-idle-log.txt
[I 2017-07-08 19:09:48.388 JupyterHub log:100] 200 GET /hub/api/users (cull-idle@127.0.0.1) 13.48ms
[I 2017-07-08 19:19:48.502 JupyterHub log:100] 200 GET /hub/api/users (cull-idle@127.0.0.1) 14.26ms
[I 2017-07-08 19:29:48.480 JupyterHub log:100] 200 GET /hub/api/users (cull-idle@127.0.0.1) 34.81ms
[I 2017-07-08 19:39:48.483 JupyterHub log:100] 200 GET /hub/api/users (cull-idle@127.0.0.1) 16.28ms
[I 2017-07-08 19:49:48.479 JupyterHub log:100] 200 GET /hub/api/users (cull-idle@127.0.0.1) 12.16ms
[I 2017-07-08 19:59:48.479 JupyterHub log:100] 200 GET /hub/api/users (cull-idle@127.0.0.1) 12.97ms

Here's the code; see the two new `app_log` logging messages:

$ kl exec hub-deployment-2709721724-bs2c7 -- cat /usr/local/bin/cull_idle_servers.py
#!/usr/bin/env python
# Imported from https://github.com/jupyterhub/jupyterhub/blob/140c4f2909327f50a7452814f4ac702abfce11ea/examples/cull-idle/cull_idle_servers.py
"""script to monitor and cull idle single-user servers

Caveats:

last_activity is not updated with high frequency,
so cull timeout should be greater than the sum of:

- single-user websocket ping interval (default: 30s)
- JupyterHub.last_activity_interval (default: 5 minutes)

You can run this as a service managed by JupyterHub with this in your config::


    c.JupyterHub.services = [
        {
            'name': 'cull-idle',
            'admin': True,
            'command': 'python cull_idle_servers.py --timeout=3600'.split(),
        }
    ]

Or run it manually by generating an API token and storing it in `JUPYTERHUB_API_TOKEN`:

    export JUPYTERHUB_API_TOKEN=`jupyterhub token`
    python cull_idle_servers.py [--timeout=900] [--url=http://127.0.0.1:8081/hub/api]
"""

import datetime
import json
import os

from dateutil.parser import parse as parse_date

from tornado.gen import coroutine
from tornado.log import app_log
from tornado.httpclient import AsyncHTTPClient, HTTPRequest
from tornado.ioloop import IOLoop, PeriodicCallback
from tornado.options import define, options, parse_command_line


@coroutine
def cull_idle(url, api_token, timeout):
    """cull idle single-user servers"""
    auth_header = {
        'Authorization': 'token %s' % api_token,
    }
    req = HTTPRequest(
        url=url + '/users',
        headers=auth_header,
    )
    now = datetime.datetime.utcnow()
    cull_limit = now - datetime.timedelta(seconds=timeout)
    client = AsyncHTTPClient()
    resp = yield client.fetch(req)
    users = json.loads(resp.body.decode('utf8', 'replace'))
    futures = []
    for user in users:
        last_activity = parse_date(user['last_activity'])
        if user['server'] and last_activity < cull_limit:
            app_log.info("Culling %s (inactive since %s)", user['name'], last_activity)
            req = HTTPRequest(
                url=url + '/users/%s/server' % user['name'],
                method='DELETE',
                headers=auth_header,
            )
            futures.append((user['name'], client.fetch(req)))
        elif user['server'] and last_activity > cull_limit:
            app_log.debug("Not culling %s (active since %s)", user['name'], last_activity)
    
    for (name, f) in futures:
        yield f
        app_log.debug("Finished culling %s", name)

if __name__ == '__main__':
    define('url', default=os.environ.get('JUPYTERHUB_API_URL'), help="The JupyterHub API URL")
    define('timeout', default=600, help="The idle timeout (in seconds)")
    define('cull_every', default=0, help="The interval (in seconds) for checking for idle servers to cull")
    
    parse_command_line()

    app_log.info('Starting cull-idle')
    app_log.error('hi there cull idle')

    if not options.cull_every:
        options.cull_every = options.timeout // 2
    
    api_token = os.environ['JUPYTERHUB_API_TOKEN']
    
    loop = IOLoop.current()
    cull = lambda : cull_idle(options.url, api_token, options.timeout)
    # run once before scheduling periodic call
    loop.run_sync(cull)
    # schedule periodic cull
    pc = PeriodicCallback(cull, 1e3 * options.cull_every)
    pc.start()
    try:
        loop.start()
    except KeyboardInterrupt:
        pass

And here's what's in the log file:

$ kl exec hub-deployment-2709721724-bs2c7 -- cat /srv/cull-idle-log.txt
[I 170708 19:09:48 cull_idle_servers:81] Starting cull-idle
[E 170708 19:09:48 cull_idle_servers:82] hi there cull idle

@pminkov
Contributor Author

pminkov commented Jul 9, 2017

Actually, I tried removing the --log_file_prefix parameter from the service setup, and now I'm seeing the logs. I think setting this flag changed the logging configuration so that Tornado no longer printed logs to stdout.
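
That matches Tornado's documented behavior: enable_pretty_logging() attaches a rotating file handler when log_file_prefix is set, and only adds a console handler when no other handler was configured. A quick way to see this (a sketch assuming Tornado is installed):

# Demonstrates Tornado's log routing: with --log_file_prefix set and
# --log_to_stderr left unset, output goes only to the file, so Docker
# (and kubectl logs) sees nothing from the process.
from tornado.log import app_log
from tornado.options import parse_command_line

parse_command_line(['prog', '--log_file_prefix=/tmp/demo.log'])
app_log.info('written to /tmp/demo.log, not to the console')

# Per the Tornado docs, passing --log_to_stderr alongside
# --log_file_prefix sends output to both the file and stderr.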

@pminkov pminkov closed this as completed Jul 9, 2017