Skip to content
This repository has been archived by the owner on Aug 2, 2023. It is now read-only.

pydevd: access tokens #1710

Closed
int19h opened this issue Aug 21, 2019 · 8 comments
Closed

pydevd: access tokens #1710

int19h opened this issue Aug 21, 2019 · 8 comments
Assignees
Milestone

Comments

@int19h
Copy link
Contributor

int19h commented Aug 21, 2019

Starting pydevd in server mode has security implications - any process that can connect to the opened port can then execute arbitrary code inside the process that hosts pydevd. In scenarios like #1706, this becomes a potential local escalation of privilege exploit.

The same thing goes for the debug adapter operating in server mode (accepting incoming DAP connections from debug servers, as in #1706) - if the adapter crashes or otherwise exits before the server can connect to it, a lower-privileged process can then take over the port, and intercept the incoming DAP connection to execute arbitrary code.

To prevent it, we need to add some basic authorization via access tokens. Pydevd needs to provide a way to specify a server and a client token for socket connections, as keyword parameters to pydevd.settrace(), e.g.:

pydevd.settrace(..., server_access_token="foo", client_access_token="bar")

When the server token is specified, any incoming connection must also communicate the matching token to pydevd in the "initialize" request:

{
  ...
  "command": "initialize",
  "arguments": {
    ...
    "pydevd": {
       "serverAccessToken": "foo"
    }
  }
}

If the token is not specified, or doesn't match the one that was passed to settrace(), pydevd should send a failure response with the appropriate message, and terminate the connection.

When the client token is specified, pydevd must send this token in "initialize" response:

{
  ...
  "command": "initialize",
  "success": true,
  "body": {
    ...
    "pydevd": {
       "clientAccessToken": "bar",
    }
  }
}

The client will immediately disconnect if the token it receives doesn't match the one it expects.

For the sake of simplicity and consistency, the checks should apply to all connections regardless of their direction (i.e. it should still check "serverAccessToken" even if the connection is server->client, and it should still check "clientAccessToken" even if the connection is client->server).

@int19h int19h changed the title pydevd: token for incoming connections pydevd: access tokens Aug 22, 2019
@fabioz
Copy link
Contributor

fabioz commented Aug 22, 2019

@int19h I was thinking a bit about this... instead of having a serverAccessToken and a clientAccessToken isn't a single accessToken (which would be used for both) enough?

@karthiknadig karthiknadig added this to the Aug 2019.2 milestone Aug 22, 2019
@int19h
Copy link
Contributor Author

int19h commented Aug 22, 2019

The problem with shared token is that the client has to send the token first (in "initialize" request) before it can validate the one that it gets from the server (in "initialize" response). If the server is not real pydevd, but a malicious process, it would simply return the same token in response that it received in the request, making client access unsecured. And we do want to secure it, because of the "runInTerminal" reverse request.

@fabioz
Copy link
Contributor

fabioz commented Aug 22, 2019

Ok

@fabioz
Copy link
Contributor

fabioz commented Aug 26, 2019

@int19h @karthiknadig

I'm currently implementing this (I hope to finish it today) -- now, I've actually named it just accessToken and clientAccessToken as I think it may be a bit more natural for the user to do: pydevd.settrace(access_token='foo', client_access_token='bar') -- but I'd like to check whether you think that's ok too...

i.e.: I think that by using just access_token it's already implicit that it's the access token for the debugger (otherwise maybe I'd call it debugger_access_token and ide_access_token or something else along those lines as I think the user can become confused on who's the client and who's the server).

@karthiknadig
Copy link
Member

pydevd.settrace(access_token='foo', client_access_token='bar') Looks fine to me.

@int19h
Copy link
Contributor Author

int19h commented Aug 26, 2019

We've had some lack of clarity on how different endpoints are referred to for a while. Officially it's "DAP client" and "DAP server", but like you say, that can be confusing from user perspective, because sometimes it's the client that is listening for connections, and the server that is connecting. So for ptvsd, we've been using "IDE" and "debug server" also, but not very consistently.

Recently, I cleaned up the docstrings for our public API, and it's always "IDE" and "debug server" there now:

def wait_for_attach():
"""If an IDE is connected to the debug server in this process,
returns immediately. Otherwise, blocks until an IDE connects.
While this function is waiting, it can be canceled by calling
wait_for_attach.cancel().
"""
from ptvsd.server import api
return api.wait_for_attach()
def enable_attach(address, log_dir=None, multiprocess=True):
"""Starts a DAP (Debug Adapter Protocol) server in this process,
listening for incoming socket connection from the IDE on the
specified address.
address must be a (host, port) tuple, as defined by the standard
socket module for the AF_INET address family.
If specified, log_dir must be a path to some existing directory;
the debugger will then create its log files in that directory.
A separate log file is created for every process, to accommodate
scenarios involving multiple processes. The log file for a process
with process ID <pid> will be named "ptvsd_<pid>.log".
If multiprocess is true, ptvsd will also intercept child processes
spawned by this process, inject a debug server into them, and
configure it to attach to the same IDE before the child process
starts running any user code.
Returns the interface and the port on which the debug server is
actually listening, in the same format as address. This may be
different from address if port was 0 in the latter, in which case
the server will pick some unused ephemeral port to listen on.
This function does't wait for the IDE to connect to the debug server
that it starts. Use wait_for_attach() to block execution until the
IDE connects.
"""
from ptvsd.server import api
return api.enable_attach(address, log_dir)
def attach(address, log_dir=None, multiprocess=True):
"""Starts a DAP (Debug Adapter Protocol) server in this process,
and connects it to the IDE that is listening for an incoming
connection on a socket with the specified address.
address must be a (host, port) tuple, as defined by the standard
socket module for the AF_INET address family.
If specified, log_dir must be a path to some existing directory;
the debugger will then create its log files in that directory.
A separate log file is created for every process, to accommodate
scenarios involving multiple processes. The log file for a process
with process ID <pid> will be named "ptvsd_<pid>.log".
If multiprocess is true, ptvsd will also intercept child processes
spawned by this process, inject a debug server into them, and
configure it to attach to the same IDE before the child process
starts running any user code.
This function doesn't return until connection to the IDE has been
established.
"""
from ptvsd.server import api
return api.attach(address, log_dir)
def is_attached():
"""True if an IDE is connected to the debug server in this process.
"""
from ptvsd.server import api
return api.is_attached()
def break_into_debugger():
"""If the IDE is connected, pauses execution of all threads, and
breaks into the debugger with current thread as active.
"""
from ptvsd.server import api
return api.break_into_debugger()
def debug_this_thread():
"""Tells debugger to start tracing the current thread.
Must be called on any background thread that is started by means
other than the usual Python APIs (i.e. the "threading" module),
for breakpoints to work on that thread.
"""
from ptvsd.server import api
return api.debug_this_thread()

So when we wrap pydevd APIs, that will be the terminology. Of course, since we do wrap it, pydevd doesn't need to be consistent with it - but we can standardize on this across debuggers for even more clarity :)

And I agree - for settrace (and enable_attach), the token that users will be dealing with most of the time is the server token, so we can just call that access_token, and the other one ide_access_token.

In the protocol, I'd still prefer it to have full names tho, mainly to make the message logs clearer - firstly to make it clear which token it is, when it's only specified for the server; and secondly, to make them easier to grep (if one is a substring of the other, that makes it harder). So maybe debugServerAccessToken and ideAccessToken in JSON?

fabioz added a commit to fabioz/ptvsd that referenced this issue Aug 27, 2019
fabioz added a commit to fabioz/ptvsd that referenced this issue Aug 27, 2019
@fabioz
Copy link
Contributor

fabioz commented Aug 27, 2019

Ok, I've just created the pull request for this. I kept debugServerAccessToken and ideAccessToken in json and access-token / ide-access-token on pydevd apis.

@fabioz
Copy link
Contributor

fabioz commented Aug 28, 2019

Done in the refactor branch.

@fabioz fabioz closed this as completed Aug 28, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants