Support proxying to a server process via a Unix socket #337

Merged: consideRatio merged 19 commits into jupyterhub:main on Apr 7, 2023

Conversation

Member

@takluyver takluyver commented Apr 25, 2022

Updated description

Unix sockets are an alternative to TCP sockets for local servers: the server listens (and the client connects) on a filesystem path, rather than a numeric port. The big advantage is that, with the right filesystem permissions, the OS can prevent other users (except root) from connecting to a Unix socket, so it's more secure on a multi-user system. See #321 for more details.
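To make the idea concrete, here is a minimal sketch using only Python's standard library: a tiny echo server listens on a filesystem path and a client connects to that same path. The function name `echo_once_over_unix_socket` and the file name `demo.sock` are made up for illustration and are not part of JSP.

```python
import os
import socket
import tempfile
import threading


def echo_once_over_unix_socket(message: bytes) -> bytes:
    """Send `message` to a one-shot echo server over a Unix socket; return the reply."""
    # Hypothetical location: a fresh temp directory holding the socket file.
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "demo.sock")

    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)  # the address is a filesystem path, not a port number
    server.listen(1)

    def serve():
        conn, _ = server.accept()
        with conn:
            conn.sendall(conn.recv(1024))

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            client.connect(path)  # the client connects to the same path
            client.sendall(message)
            reply = client.recv(1024)
    finally:
        thread.join(timeout=5)
        server.close()
        os.unlink(path)
        os.rmdir(tmpdir)
    return reply
```

Who may connect is then decided by ordinary file permissions on the socket and its parent directory.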

This PR adds support for named proxies (those defined in config) to forward to a Unix socket rather than a TCP socket.

  • If you set unix_socket=True in server process config, JSP will create a temporary directory, and fill the new {unix_socket} command template argument with a path inside there, where the server can create a socket. This is equivalent to choosing a random TCP port, and requires that JSP launches the server process itself.
  • If you set unix_socket='/path/to/some.socket' instead, you are telling JSP that the application will listen at that path. This is equivalent to specifying port=4321, and works whether or not JSP launches the process (with or without command set).
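As a sketch of the two modes, written in the style of entry-point functions that return a server process config dict: the function names, the `my-server` executable and its `--socket` flag are hypothetical placeholders; only the `unix_socket` and `command` keys and the `{unix_socket}` template argument come from the description above.

```python
def setup_managed():
    """JSP launches the process and chooses a socket path in a new temp directory."""
    return {
        # {unix_socket} is replaced with the path JSP picked.
        # 'my-server' and '--socket' are hypothetical.
        "command": ["my-server", "--socket", "{unix_socket}"],
        "unix_socket": True,
    }


def setup_fixed_path():
    """The application is expected to listen at a known path; no command is set."""
    return {
        "unix_socket": "/path/to/some.socket",
    }
```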

This works already for regular HTTP requests. Forwarding websockets over a Unix socket will work once Tornado 6.3 is released. If this is a sticking point, we could switch the client code to aiohttp.

I have a corresponding branch of my hello_jupyter_proxy repo to test this with: https://github.com/takluyver/hello_jupyter_proxy/tree/unix-sock


Original, outdated description

This is fairly rough, but if the config for a server process includes 'unix': True and no port number, j-s-p will create a new temp folder and give the server process a path inside that instead of a TCP port number. It then expects the process to create and bind a Unix socket at the given path, and will connect to that to forward requests.

On a shared host, this means that only the user whose server this is can connect to the socket, whereas anyone with access to the system can connect to a localhost TCP socket. It also avoids the race condition where the parent selects an unused TCP port, but something else binds that port before the child process can.
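The race mentioned above can be seen in a common way of picking a free TCP port, sketched here with the standard library. This is an illustration of the general pattern, not necessarily how JSP selects ports, and `pick_unused_tcp_port` is a made-up name.

```python
import socket


def pick_unused_tcp_port() -> int:
    """Ask the OS for a free localhost port, then release it again."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 means "any free port"
        port = s.getsockname()[1]
    # The socket is closed here, so nothing reserves the port any more:
    # another process can bind it before the child process does.
    return port
```

A socket path inside a directory that only the current user can write to does not have this gap, because no other user can create a file there.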

For now, I've left websockets out of this, because Tornado's websocket client doesn't seem to easily support passing in a resolver object like its HTTPClient does. I think this can be useful even without websocket support, and I'd hope that we could either add to Tornado or find an alternative like aiohttp to fill that in later.

Fixes #321

@welcome

welcome bot commented Apr 25, 2022

Thanks for submitting your first pull request! You are awesome! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@takluyver takluyver mentioned this pull request Apr 29, 2022
@rvalieris

hello there, great idea !

I tried using this with code-server: https://github.com/coder/code-server which already has a --socket option, but it gives me a 500 error on launch:
[W 2022-06-03 09:42:34.820 ServerApp] 500 GET /vscode/ (127.0.0.1): could not start vscode in time

I can verify with socat that code-server is running on the socket but the request is not going through the proxy or there is some timing issue ? this is my j-s-p config:

c.ServerProxy.servers = {
    'vscode': {
        'command': ['code-server', '--auth', 'none', '--socket', '{port}'],
        'unix': True,
        'absolute_url': False,
        'launcher_entry': {
            'title': 'VS Code'
        },
        'new_browser_tab': True
    }
}

I guess it won't be useful without websockets working, but I still expected an initial page to show up. Maybe I am doing something wrong?

@jhgoebbert

This is a fantastic new feature!
For supporting code-server from within JupyterLab this would be great. For security reasons we cannot add code-server to the JupyterLab launchpad through jupyter-server-proxy, yet, on our multi-user login-nodes. It does not support tokens/password through urlparams. But as soon as jupyter-server-proxy can proxy it through a unix socket this would be possible. Cool!

@jhgoebbert

I created a jupyter-codeserver-proxy on GitHub and PyPI, which would benefit greatly from support for unix sockets.

@jhgoebbert

jhgoebbert commented Jun 13, 2022

The branch unixsocket of jupyter-codeserver-proxy HERE shows that this already works partly for code-server. Cool. But it fails with this dialog:
[Image: screenshot of the error dialog]

@takluyver
Member Author

@jhgoebbert could that error be related to connecting a websocket? Sadly, this branch doesn't support websockets yet. I think it would need either a change in Tornado (tornadoweb/tornado#3172 ), or switching the HTTP client machinery for j-s-p to some other library, such as aiohttp.

@rvalieris I'm not sure what went wrong there, though @jhgoebbert seems to have got past that. In general, that 'could not start ... in time' message means that it launched the process and then tried to get an HTTP response from it, for up to a second each try and up to 5 seconds total, but that didn't work.
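The retry pattern described here (a short timeout per try, a larger overall budget) can be sketched with asyncio as below. This is a simplified stand-in, not JSP's code: it only checks that something accepts a connection on the socket path, whereas the real check waits for an HTTP response. The name `wait_until_ready` and its parameters are assumptions for illustration.

```python
import asyncio
import time


async def wait_until_ready(path: str, per_try: float = 1.0, total: float = 5.0) -> bool:
    """Return True once something accepts connections on the Unix socket at `path`."""
    deadline = time.monotonic() + total
    while time.monotonic() < deadline:
        try:
            # Each attempt is limited to `per_try` seconds.
            _, writer = await asyncio.wait_for(
                asyncio.open_unix_connection(path), timeout=per_try
            )
        except (OSError, asyncio.TimeoutError):
            # Socket not there yet, or nobody listening: pause briefly and retry.
            await asyncio.sleep(0.1)
            continue
        writer.close()
        await writer.wait_closed()
        return True
    # The overall budget ran out without a successful connection.
    return False
```

If the server creates its socket somewhere other than the path the proxy checks, every attempt fails and the overall timeout produces an error like the one reported above.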

@takluyver
Member Author

Just to mention, I am still interested in pushing this forwards, if anyone more involved with JSP thinks it's worthwhile.

The biggest open question I see is what to do about websocket support - as it stands, websockets don't work when proxying to a Unix socket. There are 2½ options:

  1. Push on my PR to get this supported in Tornado (which hasn't seen any interest yet) and wait for a new release.
  2. Switch to aiohttp for the client part of the proxy - this may mean somewhat more 'translation' code for requests and responses, because the server side of the proxy would still be using Tornado. But aiohttp is already used to check when the server process is ready, so it's not a new dependency.
  3. Live with no websockets over Unix sockets for now, and improve this later.

@ryanlovett
Collaborator

I think this would be an awesome feature @takluyver ! Supporting websockets is very important for the apps that I'm most interested in, but maybe others would be okay with living without ws for now.

@yuvipanda mentioned wanting to switch to aiohttp. Reading the tea leaves it seems like jupyter server may eventually move off of tornado anyways.

@jhgoebbert

@rvalieris I'm not sure what went wrong there, though @jhgoebbert seems to have got past that.

Keep in mind that I applied the changes mentioned above ( https://github.com/jupyterhub/jupyter-server-proxy/pull/337/files/885243ac9f1f21ca4869876ecfba0286f486328f ) to run the example.

@takluyver
Member Author

Ah, sorry, I missed your replies at the time. My tornado PR has now been merged (though it has yet to be released), so for now I've updated this PR to proxy websockets with tornado. This part will need the next version of Tornado to come out, presumably version 6.3.

@jhgoebbert would you have time to test again, with this branch and tornado installed from master?

I'd prefer any conversion to aiohttp to be a separate PR if possible - I think it's easier to review those changes independently than combined. But I'm happy to tackle it in this PR if you ask me to. I imagine it might depend on how soon we can expect a tornado release, and I've asked the maintainer about that.

@ryanlovett
Collaborator

@takluyver I think a conversion to aiohttp should be in a separate PR as well. Tornado's two most recent minor releases were 7 and 8 months after the previous, so if that trend holds then it will be 5-6 months until the next one.

@takluyver
Member Author

Thanks! Then 🤞 this should be ready to try out. The combination of a websocket connection and a Unix socket backend will need Tornado 6.3, but websockets+TCP and regular HTTP requests + Unix sockets should work with current versions.

Member

@consideRatio consideRatio left a comment

Interesting PR! As a novice with unix sockets I still struggle to review this PR from a technical standpoint and need to learn more to review it fully, but here are the suggested changes from a partial review.

  • Naming of config/variables
    unix is too vague for configuration I think. I suggest the configuration is called unix_socket. Being explicit matters more for those like me who know less and can't as easily guess how things work and what they do. I think for consistency, variables should be named like that as well.
  • Documentation entry
    The documentation in server-process should be updated to list all available configuration options.
  • Basic configuration validity check
    This option should not be used alongside port right, because it would be nonsensical? It would be good to take some action related to this. I'm thinking one could enforce it by erroring, or providing a warning, or documenting it. I think warning + documenting it can make most sense as I think erroring can break things a bit too much.

@takluyver
Member Author

unix is too vague for configuration I think. I suggest the configuration is called unix_socket

That makes sense to me, thanks.

Documentation entry

Good point, will do.

This option should not be used alongside port right, because it would be nonsensical? It would be good to take some action related to this. I'm thinking one could enforce it by erroring, or providing a warning, or documenting it. I think warning + documenting it can make most sense as I think erroring can break things a bit too much.

I've been thinking that it's easy for packages to specify unix_socket=True, and still support older versions of JSP, by binding a TCP socket if they get a numeric port - e.g. this code in my hello_jupyter_proxy project.

I assume in most cases, such projects would already let JSP select a TCP port at random, rather than specifying a fixed port. But I might not complain if both are specified, even though they can't both be used at the same time.
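A hedged sketch of that kind of fallback, for an application that receives a single address argument: if the value is numeric it binds a localhost TCP port, otherwise it treats the value as a Unix socket path. This is not the code from hello_jupyter_proxy; `open_listener` is a hypothetical helper.

```python
import socket


def open_listener(address: str) -> socket.socket:
    """Listen on a TCP port if `address` is numeric, else on a Unix socket path."""
    if address.isdigit():
        # Older JSP versions only hand out a numeric port.
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind(("127.0.0.1", int(address)))
    else:
        # Otherwise assume we were given a filesystem path for a Unix socket.
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.bind(address)
    sock.listen()
    return sock
```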

@consideRatio
Member

consideRatio commented Dec 5, 2022

Thanks @takluyver for working on this!!

Review feedback

  • There is a merge conflict to resolve
  • There is a mix of unix_sock and unix_socket, I figure we should stick with one (my preference is unix_socket)
  • SuperviseAndProxyHandler and the unix_sock argument confusion:
    • I saw that self.unix_sock = False, and it's only used as a boolean value in an if statement. This couples to the configuration option unix_socket, which has a description saying "If *True*, the server will listen on a Unix socket at a filesystem path, instead of a TCP port." To me, this implies that unix_socket is to be a boolean, but the description also says "The server should create a Unix socket bound to this path and listen for HTTP requests on it." - so, it should be a unix socket path really, which "if set" could imply something.

    • I'm also seeing is_unix_sock, which reads the port and doesn't care about the unix_sock value, which makes me even more confused, as it seems like port is to hold the path after all.

    • I suggest that we seek agreement on what configuration API to implement in this PR, following which an implementation is made and reviewed. We could for example either:

      • support port to be a unix socket path
      • add unix_socket_path as a server process configuration option and emit a warning if port is configured (and therefore ignored) for a server process when the server process configuration is read.

      Currently I see the second option as easier to understand intuitively, maintain, document, learn about as an end user, communicate as a new added feature in the changelog, and for users to adopt.

@ryanlovett
Collaborator

Thanks @takluyver and @consideRatio ! I vote in favor of using a separate parameter for unix_socket_path and not overloading port.

@mahendrapaipuri

This is a fantastic new feature! For supporting code-server from within JupyterLab this would be great. For security reasons we cannot add code-server to the JupyterLab launchpad through jupyter-server-proxy, yet, on our multi-user login-nodes. It does not support tokens/password through urlparams. But as soon as jupyter-server-proxy can proxy it through a unix socket this would be possible. Cool!

@jhgoebbert Instead of saving the password to a temp file as you are doing here, you could as well write it to $HOME/.config/code-server/config.yaml file and change the code server command to include config file in CLI arguments. When user launches code server from launchpad, login page will be shown to user and user can go find the password in $HOME/.config/code-server/config.yaml and use it to authenticate. This is more secure than passing password in urlparams as password always stays in user's home directory. Of course passing password through urlparams will avoid one more step for end user.

I agree that with unix sockets, it will be even more secure, but I think it is still doable with jupyter-server-proxy as it is now. Am I missing something here?

@consideRatio consideRatio changed the title Run server processes on a Unix socket Support proxying to a server processe via a Unix socket Dec 28, 2022
@consideRatio
Member

@takluyver if you find time to work on this, I'll make sure to find time to review it quickly going forward!

@consideRatio consideRatio changed the title Support proxying to a server processe via a Unix socket Support proxying to a server process via a Unix socket Dec 29, 2022
@takluyver
Member Author

Do you prefer me to resolve conflicts by rebasing (clean history) or by merging master in (accurate history)?

@consideRatio
Member

Hi @takluyver!! Either would be acceptable but my preference at this point is a rebase!

@manics
Member

manics commented Jan 17, 2023

I quite like the idea of having a single property to handle both tcp ports and unix sockets. This is similar to how you set the docker host (unix:///var/run/docker.sock, tcp://localhost:12345), and also maps well onto the underlying unix networking: they're effectively the same under the hood.

Would you support this if we renamed port to something more generic and kept {port} for backwards compatibility in the command template?
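For illustration only, a single address-style value of this kind could be parsed roughly as follows. This is a sketch of the proposal, not something the PR implements, and `parse_address` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def parse_address(address: str):
    """Split a docker-style address into ('unix', path) or ('tcp', (host, port))."""
    parts = urlsplit(address)
    if parts.scheme == "unix":
        # unix:///var/run/docker.sock -> empty host, path is the socket file
        return ("unix", parts.path)
    if parts.scheme == "tcp":
        return ("tcp", (parts.hostname, parts.port))
    raise ValueError(f"unsupported address scheme: {address!r}")
```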

@takluyver
Member Author

OK, with the new changes:

  • unix_socket=True still allocates a temp directory for a new socket, akin to port=0.
  • unix_socket='/var/run/something.sock' expects the socket at a known path. This can also be used without a command to have a named proxy to something not launched by JSP (as added for TCP ports in Accept an unset command to proxy to an already started process (unmanaged process) #339).
  • The command template uses {unix_socket} to get the path of the Unix socket. This expands to the empty string if we're using a TCP port, while {port} will expand to 0 if we're using a Unix socket.
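The template behaviour in the last bullet can be mimicked with plain `str.format`. This is only a sketch of the described expansion, not JSP's rendering code; `render_command`, the `srv` executable and its flags are hypothetical.

```python
def render_command(template, port=0, unix_socket=""):
    """Fill {port} and {unix_socket} into each argument of a command template."""
    # Defaults mirror the description: port 0 when a Unix socket is used,
    # an empty string for unix_socket when a TCP port is used.
    return [arg.format(port=port, unix_socket=unix_socket) for arg in template]


template = ["srv", "--port", "{port}", "--socket", "{unix_socket}"]
with_unix_socket = render_command(template, unix_socket="/tmp/example/srv.sock")
with_tcp_port = render_command(template, port=4321)
```

A command that receives both arguments can therefore tell which transport was chosen by checking which value is non-empty or non-zero.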

I've rearranged some of the code in handlers.py to facilitate this. There's a new class NamedLocalProxyHandler for when we configure a proxy with a name but no command (so JSP is not managing the process). SuperviseAndProxyHandler is now back to handling only the case where we have a command to launch and supervise.

I quite like the idea of having a single property to handle both tcp ports and unix sockets. This is similar to how you set the docker host (unix:///var/run/docker.sock, tcp://localhost:12345),

ZMQ also uses a similar addressing scheme, although it calls Unix sockets ipc://. But there's no standard that I know of for this, especially if you want to include a way to specify 'Unix socket in new random temp directory'. And it's a fair few extra characters when you just want to specify a TCP port. Up to the JSP maintainers, but I think we're heading in the direction of having separate options.

@consideRatio
Member

consideRatio commented Jan 18, 2023

Would you support this if we renamed port to something more generic and kept {port} for backwards compatibility in the command template?

Maybe, but I don't understand things well enough to form a clear opinion. I trust your judgement though @manics!


I've invested quite a bit of effort into understanding this feature already, but still feel quite lost. @manics could you work with @takluyver towards resolving this PR without me? I have a growing backlog of things I'd like to contribute with in the jupyterhub org, and reviewing something I don't understand well takes quite a bit of time and effort for me.

General review point

  • Update the PR title and description to reflect the state of the PR

@takluyver
Member Author

I've invested quite a bit of effort into understanding this feature already, but still feel quite lost.

Sorry to hear that. If it would help, I'm happy to try writing a brief summary of the goal - as I see it - and how I've tried to implement it. But I'll respect your decision to focus on other things if you prefer.

@consideRatio
Member

I'm happy to try writing a brief summary of the goal - as I see it - and how I've tried to implement it.

@takluyver that would be great to have, I think it would be suitable to put in an updated PR description! I'll probably end up personally benefiting from it as well when reviewing other work related to this in the future.

I think a key piece of the complexity for me has been a lack of understanding of the assumptions of what is to be accommodated. I for example assumed that you would need to specify a unix socket path explicitly rather than allow for a temporary path to be generated.

I think now, thinking about what I don't understand, it's one key piece - that we need a configuration API to support the wish to use unix sockets without specifying it in the server process configuration.

Another key piece is the coupling with port in any way. But I'm starting to understand that it's relevant because when you specify the command, you specify it without knowing if it's linux or windows in case you develop a Python package that ships with a server process config snippet. I think I assumed that you wouldn't be able to provide either port or unix socket path to a command string and have it work in most applications. But, if that is expected in most applications, then having a single variable could be relevant. If taking this path though, one may still get into trouble if not all applications support this, so then maybe one ends up needing to provide two different commands - or maybe one could let command be a callable to render differently based on linux / windows etc.

try writing a brief summary of the goal - as I see it - and how I've tried to implement it.

A big 👍 to providing a summary of the goals to accomplish with the provided implementation, the more I think about it, the more I understand that a common understanding of such goals is crucial to review and think about the implementation.


Oh I see you have now updated the description, it's great!

This works already for regular HTTP requests. Forwarding websockets over a Unix socket will work once Tornado 6.3 is released.

Is this part of the traitlet configuration's help string already? If not, it's probably worth putting there as well.

and fill the new {unix_socket} template argument

A key piece of understanding that I didn't catch onto quickly was that the server process definition's command was given these arguments, so maybe you could inject "command" in the sentence I quoted as well?


Thank you soo much for your thorough work on this @takluyver and helping me understand these details better!

@takluyver
Member Author

Thanks @consideRatio !

I for example assumed that you would need to specify a unix socket path explicitly rather than allow for a temporary path to be generated.

Gotcha. The temporary paths are an easy way to have a unique socket per process, and they help to ensure that only the relevant user can connect to them (this will normally be the case anyway, but it's easier to be sure with a temp folder).
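As a rough illustration of why a temp folder makes this easy to be sure about: Python's `tempfile.mkdtemp()` creates a directory that is readable, writable and searchable only by the creating user, so a socket placed inside it is out of reach for other non-root users. The helper name and the `jsp-demo-` prefix below are made up.

```python
import os
import stat
import tempfile


def make_private_socket_path(name: str = "server.sock") -> str:
    """Return a not-yet-existing socket path inside a fresh, user-only directory."""
    # mkdtemp() gives a unique directory per call, hence a unique socket per process.
    directory = tempfile.mkdtemp(prefix="jsp-demo-")
    return os.path.join(directory, name)


path = make_private_socket_path()
# Permission bits of the parent directory (no group/other access on POSIX).
mode = stat.S_IMODE(os.stat(os.path.dirname(path)).st_mode)
```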

one key piece - that we need a configuration API to support the wish to use unix sockets without specifying it in the server process configuration

Sorry, I don't follow? This does add an option to the server process config, and it will only use Unix sockets if you specify that option. I've focused on supplying the config via entry points, but it should work the same if you use the config system.

Another key piece is the coupling with port in any way. But I'm starting to understand that it's relevant because when you specify the command, you specify it without knowing if it's linux or windows in case you develop a Python package that ships with a server process config snippet.

This is still an open question for me. The config comes from Python code, it could check its platform before setting unix_socket. I haven't yet implemented an automatic fallback to TCP sockets on Windows, but it would be doable if we want.
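A sketch of what such a platform check could look like in the Python code that supplies the config. The function name `setup_hello`, the `hello-server` executable and its flags are hypothetical; the check relies on the standard library only defining `socket.AF_UNIX` where Unix sockets are available.

```python
import socket


def setup_hello():
    """Server process config that asks for a Unix socket only where one can exist."""
    config = {
        # 'hello-server' and its flags are hypothetical placeholders.
        "command": ["hello-server", "--socket", "{unix_socket}", "--port", "{port}"],
    }
    if hasattr(socket, "AF_UNIX"):
        config["unix_socket"] = True
    # Otherwise leave unix_socket unset so that JSP falls back to a TCP port.
    return config
```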

I think I assumed that you wouldn't be able to provide either port or unix socket path to a command string and have it work in most applications.

I don't know what most applications do. 🙂 I've been thinking primarily about writing new (small) applications, where obviously I can make the argument parsing work however I want.

or maybe one could let command be a callable to render differently based on linux / windows etc.

Good news, you can already supply a callable!

@takluyver
Member Author

Is this part of the traitlet configuration's help string already?

Ahem I had missed that help string entirely. I've updated it now.

@jhgoebbert

@jhgoebbert Instead of saving the password to a temp file as you are doing here, you could as well write it to $HOME/.config/code-server/config.yaml file and change the code server command to include config file in CLI arguments. When user launches code server from launchpad, login page will be shown to user and user can go find the password in $HOME/.config/code-server/config.yaml and use it to authenticate. This is more secure than passing password in urlparams as password always stays in user's home directory. Of course passing password through urlparams will avoid one more step for end user.

I agree that with unix sockets, it will be even more secure, but I think it is still doable with jupyter-server-proxy as it is now. Am I missing something here?

In our scenario the user is already authenticated and authorized on a multi-user system as he is logged in through JupyterHub and our Identity Management System. From there he can use all different tools through JupyterLab in combination with jupyter-server-proxy. Of course we do not want to show him any second login page for other web-tools he can start from the launch pad. That might be different for other sites' setups where a second login page is not such an issue - in our case it is a show-stopper.

@takluyver
Member Author

Do let me know if there's anything else you want me to do here. I believe I've addressed all the feedback given so far.

I'm still open to ideas whether unix_socket=True in config means 'definitely use a Unix socket', or 'use a Unix socket if possible' (see previous comments). At present, it means definitely, but there's also no platform check to provide a consistent error message.

Tornado 6.3 is now on the horizon, but it looks like it will still be a little way away.

Member

@consideRatio consideRatio left a comment

@takluyver I'm still struggling with review here, and we are very low on capacity so I'm trying to help out anyhow.

These are the things I'm considering:

  • Why is a new class NamedLocalProxyHandler introduced?
    Is it relevant for this feature, or is it refactoring work you think makes sense but is mostly unrelated? Concretely, if you think it should be part of this PR still, it would be helpful to let the class docstrings clarify what the class purpose is compared to its super- and sub- classes.
    I've historically found it hard to understand the separate responsibilities of _Proxy, SuperviseAndProxyHandler, new in PR: NamedLocalProxyHandler, LocalProxyHandler, and what motivated having many separate classes.
  • The term "named proxy" is unclear to me. If it refers to "a configured server process" or similar, I'd like to stick with the in-repo common terminology of configured server process to proxy to.

@takluyver
Member Author

Right, good point. I had trouble making sense of the terminology and code structure following #339.

The docs refer to 'server processes' ('Processes that are supervised and proxied are called servers.'). The corresponding code is in SuperviseAndProxyHandler. Both the naming and the contents are dealing with processes that JSP starts and supervises. But since #339, you can configure... proxy things without asking JSP to start/supervise them. That's kind of a big conceptual change, even though it looks like a little new feature. I think this created a new type of thing that the code & docs don't really have a name for.

My attempt to represent this in the code was for NamedLocalProxyHandler to represent proxies where configuration maps a name to a port/socket, and the subclass SuperviseAndProxyHandler to represent those where JSP will also launch and supervise the server process. I could have worked in Unix sockets without this refactoring, but it was hard for me to think about the necessary changes without it. I didn't attempt to rework the docs, though I think someone should.

I'm open to other names, e.g. ConfiguredProxyHandler, if you prefer. Referring to 'server process' doesn't make much sense to me, though, because that implies that it deals with the process. (Of course, to be pedantic, anything we proxy must have a server process somewhere, but that's also true with a URL like /proxy/8000)

Member

@consideRatio consideRatio left a comment

Thank you @takluyver for your thorough work on this, I learned a lot about this project reviewing this!

I'm looking into whether I can quickly set up a test for this feature as well before merge.

Member

@consideRatio consideRatio left a comment

This LGTM, I'll go for a merge here at this point!

Thank you @takluyver for your effort into this!!! ❤️ 🎉 🌻

@consideRatio consideRatio merged commit bc49c6e into jupyterhub:main Apr 7, 2023
@welcome

welcome bot commented Apr 7, 2023

Congrats on your first merged pull request in this project! 🎉
Thank you for contributing, we are very proud of you! ❤️

@takluyver
Member Author

Thanks @consideRatio !

I've also just seen that Tornado 6.3 has just been released, which was the missing piece to allow this to work with websockets (it already worked with simple HTTP requests). The pieces are coming together! 🎉

@bollwyvl
Collaborator

bollwyvl commented Apr 18, 2023

Wow, 6.3 looks like it has some massive quality of life improvements:

  • It is now much faster (no longer quadratic) to receive large messages that have been split into many fragments.
  • Tornado submodules are now imported automatically on demand. This means it is now possible to use a single import tornado statement and refer to objects in submodules such as tornado.web.RequestHandler.

My kingdom for HTTP/2, ASGI, and native brotli support, but I'll take what I can get!

Successfully merging this pull request may close these issues.

Proxy unix socket?
8 participants