
Enable users to access the jupyter server logs #684

Open
consideRatio opened this issue Feb 1, 2022 · 12 comments

@consideRatio
Contributor

Problem

A long-standing challenge for JupyterHub admins and users has been that only admins have had access to the user servers' logs. In this Jupyter forum post, @manics suggests that a workaround is to copy the stdout/stderr streams via a custom intervention.

I wonder if this could be solved in a different way that would make it easier to expose Jupyter server logs to users.

Proposed Solution

Could we allow jupyter_server to be configured to emit its logs in some way, for example by writing them continuously to a file or exposing them via a REST API?

Additional context

The key problem I'd like to solve is ensuring that JupyterHub users can have as much access to the logs as a JupyterHub admin does. Currently, a JupyterHub user has no such access. With a feature like this, where jupyter_server could be configured to emit logs to a file or via a REST API, I imagine a JupyterLab extension could be developed to provide easy access to the server logs. This would help a JupyterHub admin help their users, for example by asking them to include these logs when asking for help.

This isn't the first time a need like this has surfaced, but I've never had a good idea of how to go about it. This idea seems somewhat reasonable to me at a glance. The most recent need surfaced here: https://discourse.pangeo.io/t/start-up-errors-on-pangeo-google-cloud-deployments/2101.

  • Could something like this be reasonable to implement in jupyter_server?
  • Are there examples of other software that expose their logs in a similar way to draw experience from?

/cc: @manics, @akhmerov, @sgibson91, @fperez, who I think may be interested in this discussion.

@welcome

welcome bot commented Feb 1, 2022

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@yuvipanda
Contributor

JupyterHub used to allow this with extra_log_file, but that was deprecated a while ago by @minrk in https://github.com/jupyterhub/jupyterhub/blob/36cad38ddf00c3fe92d813fd7bf8715fb876d006/jupyterhub/app.py#L1398. The reasoning given is:

                extra_log_file only redirects logs of the Hub itself,
                and will discard any other output, such as
                that of subprocess spawners or the proxy.
                It is STRONGLY recommended that you redirect process
                output instead, e.g.
                    jupyterhub &>> '{}'

@bollwyvl
Contributor

bollwyvl commented Feb 1, 2022

Streaming logs also came up in the context of jupyter-server-proxy and jupyter-lsp (can't find a link).

As it might need to ship a non-trivial UI, and we've been trying to shed those, perhaps it's worth thinking of this as a jupyter_server_logs package that offers, in addition to a facility for capturing logs and REST/WebSocket handlers to view them, opt-in logging of jupyter-server itself, as well as a standalone/embeddable log viewer application.

Such a log viewer could use xterm.js and/or Lumino's datagrid for structured logs, perhaps.
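
To make the handler side concrete, here is a hypothetical sketch of what such an extension's REST endpoint might look like. jupyter_server_logs does not exist; the handler name, route registration, and LOG_FILE path are all assumptions, not existing jupyter-server API surface:

# Hypothetical sketch of a REST handler a jupyter_server_logs extension
# could register; ServerLogsHandler and LOG_FILE are assumptions.
from jupyter_server.base.handlers import APIHandler
from tornado import web

LOG_FILE = "jupyter_server.log"  # assumed location of the captured log


class ServerLogsHandler(APIHandler):
    @web.authenticated
    def get(self):
        """Return the tail of the captured server log as plain text."""
        try:
            with open(LOG_FILE) as f:
                lines = f.readlines()[-200:]
        except FileNotFoundError:
            raise web.HTTPError(404, "no captured log file found")
        self.set_header("Content-Type", "text/plain")
        self.finish("".join(lines))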

@kevin-bates
Member

This sounds like a matter of configuring the handlers of a LoggingConfigurable class (from which Jupyter Server derives); see ipython/traitlets#688. At least that feels like the right thing to do.

cc: @oliver-sanders

@minrk
Contributor

minrk commented Feb 2, 2022

As I mentioned in the JupyterHub issue, I don't think Python logging is the right level at which to do this. Instead, I think process-level FD capture is where it should happen.

In repo2docker, we do this with an entrypoint, but you can also duplicate stdout/err to a file in-process (at least on non-Windows) with os.dup2, as seen in wurlitzer.
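
For illustration, the FD-level redirection looks roughly like this (a minimal sketch of the os.dup2 approach, not wurlitzer itself; the log path is a placeholder):

import os
import sys

# Redirect the process's stdout/stderr file descriptors to a log file.
# Because this operates on FDs rather than Python-level streams, output
# from C extensions and subprocesses that inherit the FDs is captured too.
log_fd = os.open("server-output.log", os.O_WRONLY | os.O_CREAT | os.O_APPEND)

sys.stdout.flush()
sys.stderr.flush()
os.dup2(log_fd, sys.stdout.fileno())  # FD 1 now writes to the log file
os.dup2(log_fd, sys.stderr.fileno())  # FD 2 now writes to the log file
os.close(log_fd)

print("this line ends up in server-output.log")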

Logging would be the right place for more structural capture of a specific subset of events, though.

@oliver-sanders
Contributor

oliver-sanders commented Feb 2, 2022

    Streaming logs

    capturing logs and REST/WebSocket handlers to view them

    Logging would be the right place for more structural capture of a specific subset of events, though.

I think the logging level would be the most flexible approach.

For my purposes, I would like to be able to configure a persistent rotating log in a standard location, plus additional logging handlers with different filters for specific purposes, to assist with monitoring and debugging. I'm mostly interested in the output of one particular server extension, so I might want to separate its logging from that of other extensions.

So for me, Python's logging config object would be a pretty ideal solution. I think it would probably be fairly straightforward to implement this at the Traitlets level, where we already support a subset of logging configuration (see the issue linked above), but I have been a bit too distracted of late to try it out.
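
To illustrate, this is the kind of standard-library dict config such a feature would enable (a sketch only; "MyExtensionApp" is a hypothetical extension logger name and the filenames are placeholders):

import logging.config

logging.config.dictConfig({
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'rotating_file': {
            'class': 'logging.handlers.RotatingFileHandler',
            'level': 'DEBUG',
            'filename': 'jupyter_server.log',
            'maxBytes': 10 * 1024 * 1024,  # rotate at 10 MB
            'backupCount': 5,              # keep five rotated files
        },
        'extension_file': {
            'class': 'logging.FileHandler',
            'level': 'DEBUG',
            'filename': 'my_extension.log',
        },
    },
    'loggers': {
        # a persistent rotating log for the server itself...
        'ServerApp': {'level': 'DEBUG', 'handlers': ['rotating_file']},
        # ...with one extension's output separated into its own file
        'MyExtensionApp': {'level': 'DEBUG', 'handlers': ['extension_file']},
    },
})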

@minrk
Contributor

minrk commented Feb 2, 2022

Supporting Python logging config in LoggingConfigurable is definitely something we should do and would cover your case. But I wouldn't say that it addresses this issue. For general server output, I think process-level capture is the only robust approach. It's the only one that will reliably capture logged output from kernels and other subprocesses (which may not be Python), including crash messages, for example.

This snippet in a jupyter_server_config.py tees the server's own stdout/err (including subprocesses) to a single file:

import atexit
import os
import sys

from wurlitzer import Wurlitzer


class Tee:
    def __init__(self, stream, file, mode="a"):
        # accept either an already-open file object or a path to open
        if hasattr(file, "write"):
            self.log_file = file
        else:
            self.log_file = open(file, mode=mode)

        self.stream = stream

    def write(self, buf):
        # write to both the log file and the original stream
        for f in (self.stream, self.log_file):
            f.write(buf)
            f.flush()


log_file = open("test.log", "a")

# duplicate the original FDs before redirecting, so we can still write to them
real_stdout = os.fdopen(os.dup(sys.stdout.fileno()), "w")
real_stderr = os.fdopen(os.dup(sys.stderr.fileno()), "w")

w = Wurlitzer(
    stdout=Tee(real_stdout, log_file),
    stderr=Tee(real_stderr, log_file),
)

w.__enter__()
# pass the standard context-manager exit arguments so the call is valid at shutdown
atexit.register(w.__exit__, None, None, None)

and similar logic could be behind a capture_output flag on the application.

I don't know how to do this on Windows, but I know someone does.

@yuvipanda
Contributor

Based on my experience operating clusters over the last few years, I tend to agree with @minrk that capturing stdout/stderr is the way to go - not everything goes into Python logging, and sometimes that is out of our control. This also matches the learnt wisdom of the twelve-factor app methodology for how this should be done.

@minrk
Contributor

minrk commented Feb 3, 2022

Another option is to make this a feature request for Spawners so that JupyterHub could have a logs API that users can access for their own servers. Almost all spawners use an underlying mechanism that captures logs (k8s, docker, systemd).

An advantage of sending logs to a file, though, is that the file can live on a persistent volume and be checked after a crash. Container-based log capture is typically inaccessible after the container stops, unless you take a step up to the cloud provider's log-aggregator API instead of talking to k8s/docker directly, which would be harder to do at the Spawner level.
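
For a Kubernetes-backed spawner, for example, the underlying call is already there; a hypothetical logs API might wrap it like this (sketch only: get_user_server_logs is not an existing Spawner method, and the pod-naming scheme and namespace are assumptions):

# Hypothetical sketch of a Spawner-level logs API for a k8s-backed spawner.
from kubernetes import client, config


def get_user_server_logs(username, namespace="jhub", tail_lines=200):
    """Fetch recent logs for a user's single-user server pod (sketch)."""
    config.load_incluster_config()  # or load_kube_config() outside the cluster
    v1 = client.CoreV1Api()
    pod_name = f"jupyter-{username}"  # assumed KubeSpawner-style pod name
    return v1.read_namespaced_pod_log(
        name=pod_name,
        namespace=namespace,
        tail_lines=tail_lines,
    )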

@oliver-sanders
Contributor

FYI, if anyone is interested or would like to test or review: I have raised a Traitlets PR to handle the Python logging side of things - ipython/traitlets#698 (the stdout/err redirection mentioned above is a whole other thing).

Here's some example configuration for adding a FileHandler to the "base" server and to the JupyterLab server extension application:

# jupyter_config.py
from pathlib import Path
 
# direct Jupyter Server logs to jupyter_server.log
# (preserving the default stderr "console" logging)
c.ServerApp.logging_config = {
    'version': 1,
    'handlers': {
        'file': {
            'class': 'logging.FileHandler',
            'level': 'DEBUG',
            'filename': Path.cwd() / 'jupyter_server.log',
        },
    },
    'loggers': {
        'ServerApp': {  
            'level': 'DEBUG',  
            'handlers': ['console', 'file'],  
        },  
    }
}
 
# direct Jupyter Lab logs to jupyter_lab.log
# (preserving the default stderr "console" logging)
c.LabApp.logging_config = {
    'version': 1,
    'handlers': {
        'file': {
            'class': 'logging.FileHandler',
            'level': 'DEBUG',
            'filename': Path.cwd() / 'jupyter_lab.log',   
        },
    },
    'loggers': {
        'LabApp': {
            'level': 'DEBUG',
            'handlers': ['console', 'file'],
        },
    }
}

Handlers, formatters, and levels can be adjusted to preference.

Note: because server extension applications are separate Traitlets applications from the "base" server, they use different loggers and so must be configured separately.

Example:

$ jupyter lab --config jupyter_config.py &
...
$ head -n 5 jupyter_server.log
Looking for jupyter_config in /var/tmp/...
Loaded config file: /var/tmp/...
Paths used for configuration of jupyter_server_config: 
    /etc/jupyter/jupyter_server_config.json
Paths used for configuration of jupyter_server_config: 
$ head -n 5 jupyter_lab.log
Looking for jupyter_lab_config in /etc/jupyter
Looking for jupyter_lab_config in /usr/local/etc/jupyter
Looking for jupyter_lab_config in ~/<env>/etc/jupyter
Looking for jupyter_lab_config in ~/.local/etc/jupyter
Looking for jupyter_lab_config in ~/.jupyter

@oliver-sanders
Contributor

Traitlets 5.2.0 now provides a logging_config trait which allows additional file handlers to be configured, hope it helps.

This was the proposed solution from the OP and satisfies the use cases outlined there. I have opened a PR to bump Jupyter Server onto Traitlets 5.2.1 (when it's released) and to document usage of logging_config - #844.

This does not solve the trickier stdout/err redirection mentioned in other comments. I think that is a Spawner feature (as we couldn't reliably implement it from within the application itself), so it is out of scope for Jupyter Server itself. I'll leave you to decide what you want to do with this issue.

@athornton

I disagree that process output redirection is the right thing. I want WARN and above to go to stderr, and INFO and below to go to stdout; you can't do that at the process level.
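
For reference, Python logging can do that split with two stream handlers and a level filter (a minimal sketch):

import logging
import sys

logger = logging.getLogger("ServerApp")
logger.setLevel(logging.DEBUG)

# INFO and below go to stdout...
stdout_handler = logging.StreamHandler(sys.stdout)
stdout_handler.addFilter(lambda record: record.levelno < logging.WARNING)

# ...while WARNING and above go to stderr.
stderr_handler = logging.StreamHandler(sys.stderr)
stderr_handler.setLevel(logging.WARNING)

logger.addHandler(stdout_handler)
logger.addHandler(stderr_handler)

logger.info("routine message")      # -> stdout
logger.warning("something is off")  # -> stderr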

Also, the logging_config documentation refers to c.Application.logging_configurable, which is wrong: it's logging_config, not logging_configurable.
