Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gamemode: add new portal for GameMode access #314

Merged
merged 4 commits into from
May 20, 2019

Conversation

gicmo
Copy link
Contributor

@gicmo gicmo commented Apr 29, 2019

This is to resolve flathub/com.valvesoftware.Steam#77 where the main problem is that we need to resolve the process id from within the flatpak so that GameMode can work with the correct one:

Add a new portal so that GameMode service can be accessed from within the sandbox. GameMode expects to get the process id (pid) of the game so it can:

  • track the lifetime of the game
  • modify process parameters (nice level, scheduler policy, ...)
  • resolve the executable name and trigger configured actions

Since processes inside the sandbox run in a separate pid namespace sending the client pid obtained via getpid(2) won't work. Getting the pid via the D-Bus connection will result in getting the pid of the dbus proxy. According to pid_namespaces(7) process ids will be translated by the kernel when they are passed over a unix domain socket. But passing a connected unix socket via dbus messages and then obtaining the credentials for that socket (pair) will also resolve the pid to the dbus proxy. Therefore we nest one level more and send a socket of a socket pair via a connected socket that was passed via dbus. This additional level of indirection then results in the correct credentials.

I wanted to get some feedback on the general approach, before I continue going down that road. I looked at different options to get the pid from within the sandbox but most obvious was the pass socket over socket approach. If someone has a better idea I am all ears. Also, currently I am using the data part of the message, where the nested socket gets passed, to send which gamemode action (Register, Unregister) to execute but that could be split up into individual message if that is deemed to be cleaner.

There is a simple tester app (gamemode-tester) that has the corresponding client code in the portal branch (NB: the use native library option needs to be off).

@matthiasclasen
Copy link
Contributor

While this does sound like a technically working approach, getting the pid for an app seems like something that ideally shouldn't require direct app involvement - maybe we can add an api to the dbus proxy to get it, and make it available in XdpAppInfo for all portals ?

@gicmo
Copy link
Contributor Author

gicmo commented Apr 29, 2019

Yeah, I think adding the API to the proxy would be indeed be a better way. I briefly thought of that yesterday but I wanted to discuss it first because that would be more work. But I am happy to do it, if that is what we agree on!

@alexlarsson
Copy link
Member

I'm not sure what would be a good way to do that though. We could make the proxy intercept some magic name and return the pid, but older versions of the proxy will not do that, allowing the user to catch that and return whatever.

Also, any such method wouldn't work for older versions of flatpak/xdg-dbus-proxy which isn't ideal either.

@gicmo
Copy link
Contributor Author

gicmo commented Apr 29, 2019

Another approach I tried out yesterday was to map the pid from the container to the host via /proc/. One can get the pid namespace (the ino_t) for the container and then search /proc for processes within that and use NSpid in status entry to map the pid. I made a POC (pidmap) and it works as expected. Maybe if it only done once and stored in the XdpAppInfo it would not be that of an overhead?

@alexlarsson
Copy link
Member

@gicmo How would you know pid to look for though? You would need the app to self-identify the pid, which is not great for security/robustness.

@alexlarsson
Copy link
Member

Also, for the snap case, and later when we get the dbus work done we don't use a proxy, so Ideally we should not hardcode requirements on it.

@alexlarsson
Copy link
Member

Honestly, i prefer the pass fd + use peerpid. It is robust, safe and works for all dbus containment methods.

@gicmo
Copy link
Contributor Author

gicmo commented Apr 29, 2019

@alexlarsson true, but that is how GameMode API works (on the host). The game calls RegisterGame (pid_t) and self identifies. Of course in the sandbox this is the namespace'ed one (that needs to be mapped then).

@alexlarsson
Copy link
Member

Well, if the app needs to do something special it might as well do the working thing (socket) rather than the one needing lots of hackery (pid)

@gicmo
Copy link
Contributor Author

gicmo commented Apr 29, 2019

@alexlarsson sounds good to me! I will clean this up and add permission support etcpp then.

@alexlarsson
Copy link
Member

I remember some work on the kernel about opening dirs in /proc to get an fd to a pid.
Maybe that can work? Not sure if its landed in the kernel, and when though.

@gicmo
Copy link
Contributor Author

gicmo commented Apr 29, 2019

Do you mean that: https://lkml.org/lkml/2018/4/4/677 ?

@alexlarsson
Copy link
Member

No. There was some talk on lwn recently about having fds open to the directories in /proc/$pid as somehow referencing the pid.

Probably not even upstream yet, lets go with the socket thing, its lowtech enough to work in linux 1.0.

@smcv
Copy link
Collaborator

smcv commented Apr 29, 2019

GameMode expects to get the process id (pid) of the game

This is an inherently race-prone API, because Linux doesn't yet have anything that holds a reference to a process and prevents reuse (analogous to Windows process handles). In principle, a sufficiently malicious game could pass its process ID (let's say 12345) to the portal, then fork and exit repeatedly until the next process ID that will be used by the kernel is 12345, then do something that will cause a victim process to be started with pid 12345 (maybe D-Bus service activation). Now the process ID that was given to the GameMode service points to the victim process instead, and GameMode will be manipulating the victim process's nice level instead.

(I don't know how much of a threat this really is, because I don't know all the things that GameMode does.)

@alexlarsson
Copy link
Member

What systemd does in these situations is pair the pid with the process start time. You can then only race it if you manage to do so in 1 msec (or whatever the time scale is).

@smcv
Copy link
Collaborator

smcv commented Apr 29, 2019

... but see also CVE-2019-6133.

@gicmo
Copy link
Contributor Author

gicmo commented Apr 30, 2019

This is an inherently race-prone API

Yeah, it is not great. GameMode (the session service) does no verification that the calling process actually has the process id that was specified in the dbus call anyway. Hence, any program that can talk to GameMode service can just plainly request that GameMode should be activated for any process anyway. No need for trying to tricking it into it.

Now in the flatpak sandbox case, where we have separate pid namespace, if we were to use the same API (RegisterGame (pid_t)) for the portal, and then in the portal map the specified pid to the host one, that would all still be true for any programs in the sandbox. But a sandboxed program would have a hard time trying to request GameMode for a any program outside the sandbox.

NB: the suggested API, if we do the socket over socket think, would just be RegisterGame() and the pid of the caller would be inferred via SO_PEERCRED of the socket.

@matthiasclasen
Copy link
Contributor

Can we not ask for a proper api from the gamemode daemon?

@gicmo
Copy link
Contributor Author

gicmo commented Apr 30, 2019

@alexlarsson I realized that a set of new APIs was added to GameMode in master (what will be 1.4) that mimic the Register/Unregister/QueryStatus API but take an additional for_pid argument, so a process can make request on behave for another process (I imagine something like Steam activating GameMode for a game it has spawned). To support these the socket-over-socket would not work but we would have to map the pid via procfs. I wonder if we want to support those too and therefore should always do the pid mapping via procfs?

@alexlarsson
Copy link
Member

Yeah, maybe what we need is to be able to get the pid namespace in use for the peer, and then we can look up the pid in that namespace maybe? That way we can also minimize security issues, as you can't give out pids in other namespaces.

@gicmo
Copy link
Contributor Author

gicmo commented Apr 30, 2019

@alexlarsson my plan for the pid mapping was: in the portal we get the pid of the bwrap process inside the sandbox via /var/run/user/<uid>/.flatpak/<flatpak-instanceid>/bwrapinfo.json (the instance id via XdpAppInfo). For there we get the pid namespace via stat(2) & struct stat.st_ino on /proc/<pid>/ps/pid. We can then filter for processes in that pid namespace id by comparing and then using NSpid in /proc/<x>/status to match the pid within the container. I had a quick chat with Christian Brauner yesterday (the kernel guy working on https://lwn.net/Articles/784831/) and he could also not think of another way to do it.

@alexlarsson
Copy link
Member

bwrapinfo is flatpak >= 1.0.2, but that is probably ok to rely on at this point.
I wonder if we should extend that file with the various namespaces though, to avoid the stat (and corresponding races). We could add it and then use it if its there, falling back to the stat if not.
While doing that we should probably add the process start time too.

Also, the pid filtering should ensure uid matches for extra safety.

We can then filter for processes in that pid namespace id by comparing and then using NSpid in /proc//status to match the pid within the container.

Seems like we first need to filter by comparing stat /proc/$pid-we-listed/ns/pid and comparing with the one extracted from the sandbox. Then we can use the NSpid part of status to see if this is the pid that was meant.

Also, we should think about snappy, @jhenstridge does snap use pid namespaces?

@gicmo gicmo force-pushed the gamemode branch 2 times, most recently from 3536b5e to 2fe6df2 Compare May 6, 2019 12:57
@gicmo
Copy link
Contributor Author

gicmo commented May 6, 2019

@alexlarsson I think we meant the same thing; at least I implemented a version of what I described, so you can see the code. Currently the mapping is not cached, but that could be added without too much work.

@gicmo gicmo force-pushed the gamemode branch 3 times, most recently from df4194c to 006fde9 Compare May 7, 2019 12:18
@gicmo gicmo marked this pull request as ready for review May 7, 2019 12:21
@gicmo
Copy link
Contributor Author

gicmo commented May 7, 2019

Aright, I think that is a first complete version that is good to review. Currently the mapping is not cached yet, but I am not sure that it is worth the overhead it creates and most likely there will be only two calls "register" and "unregister" for the same pid in the same flatpak.

Once this is ok to merge, I am also happy to do the bwrap part to store the namespace ids in the bwrapinfo.json so we can avoid the initial lookup if that information is there.

@gicmo
Copy link
Contributor Author

gicmo commented May 7, 2019

I have updated GameMode Tester to work with the current portal API so that can be used to test.

@matthiasclasen
Copy link
Contributor

Before looking in detail at the code, lets discuss the api.

QueryStatus
RegisterGame
UnregisterGame

Looks like this is mirroring the existing gamemode api, rather than providing a portal-like api. Are RegisterGame/UnregisterGame really StartGameMode/EndGameMode ?

Are any of these calls meant to have user interaction ?

Who is expected to use this api from the application side ?

Do we expect gamemode to grow transparent portal support ?

@gicmo
Copy link
Contributor Author

gicmo commented May 10, 2019

Before discussing the API, let me just quickly re-iterate how the gamemode client side things work, just to be on the same page: The client side of gamemode is divided into two parts: the gamemode_client.h which contains the 3 (or 6 for >= 1.4) gamemode API calls implemented as static inline functions. These functions will at runtime open the actual client implementation library (libgamemode.so) via dlopen. This is convenient for us because we can add flatpak/portal support to already released games (Rise of the Tomb Raider) by either adding portal support to upstream gamemode or having our own version of libgamemode.so in the flatpak that supports the portal. I would indeed add the flatpak support to upstream library as well and I don't think they would be against it, but with the option to ship our own libgamemode.so in flatpaks don't even have to rely on that (and still can support all games).

In any case this portal support would be added in a transparent way to the libraries, i.e. if we are inside the flatpak, use the portal, otherwise talk to GameMode directly (I think that is what you mean, right?). My implementation of the client library already does it that way.

About user interaction, no, unless we want to ask the user if it is ok to activate gamemode, but after the recent discussion with Alex and also from my experience with thunderbolt I don't think that is a good question to present to the user.

Ok, now to the real API discussion: I think I understand what you mean by not very portal like. I initially was thinking of a Session-like API (RequestGameMode (void) -> org.freedesktop.portal.Session, and keeping track of it for QueryStatus.)
But then I decided to intentionally mirrored the API because I assume that games wont use the D-Bus API directly and instead use gamemode_client.h so they work transparently in outside as well as in flatpaks. Also it makes adding portal support to the existing libgamemode.so (and also my version of it) much harder (we don't have a mainloop guarantee, everything has to happen within a function call). And in that respect, even if clients were to start using the API directly it seems convenient to keep the API the same so it is easy to support the portal and non-portal way.

To be honest I also don't see much of a difference in calling RegisterGame/UnregisterGame StartGameMode/EndGameMode either way. But they don't precisely reflect the semantics: gamemode is active as long as there is at least one client registered. Therefore if a game in the flatpak is the second client, calling StartGameMode is not exactly "right" since it is already active ("started"). Analogous if there is another client around after the EndGameMode then gamemode will still be active.

@matthiasclasen
Copy link
Contributor

To be honest I also don't see much of a difference in calling RegisterGame/UnregisterGame StartGameMode/EndGameMode either way

I was just clarifying my understanding of the api, since there is no documentation

@gicmo
Copy link
Contributor Author

gicmo commented May 13, 2019

I was just clarifying my understanding of the api, since there is no documentation

Ah sorry, I misunderstood that as a suggestion. And indeed, GameMode could use a dbus interface description file, totally agree.

@jhenstridge
Copy link
Contributor

Also, we should think about snappy, @jhenstridge does snap use pid namespaces?

Sorry for the delay: no, snapd does not make use of pid namespaces in its sandboxing.

src/gamemode.c Outdated Show resolved Hide resolved
src/gamemode.c Outdated Show resolved Hide resolved
src/gamemode.c Outdated Show resolved Hide resolved
@matthiasclasen
Copy link
Contributor

Also, please rebase the branch

gicmo added 3 commits May 17, 2019 09:48
Don't use it, lose it!
Retrieves the flatpak instance id from the .flatpak_info keyfile.
Containers can use pid_namespaces(7) to encapsulate the processes
running inside the container from seeing the process ids (pids)
of the host. A generic way to translate a process identifier from
within the container to the outside that does not need the
cooperation of the target process is to use /proc which contains
the information about the namespace for each process as well as
the process identifier of the process in each namespace. The
namespace id can be obtained via '/proc/<pid>/ns/pid'; the pids
for all the namespaces via '/proc/<pid>/status' and its 'NSpid'
field.
For Flatpaks, which uses bubble wrap, the bwrapinfo.js file has the
child-pid field, which is the bwrap process inside the sandbox.
Via this pid the pid namespace id can be determined. All process in
/proc are then scanned and filtered for ones that belong to the same
pid namespace. For those their 'status' file is parsed and the pid
within the namespace matched against the ones that need to be
translated.
@pid: Process id of the game to register
@result: Status of the request

Register a game with GameMode and thus request GameMode to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this counting ? What happens if I call it twice for the same pid ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to look that up: Trying to register a client (game) that is already registered will result in an error being return (result 🠒 0). So no per-client counting, only counting the number of clients.

@result: Status of the request

Un-register a game from GameMode; if the call is successful
and @pid was the last registered game, GameMode will be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last registered game? This is unclear - isn't this just about how often Register/Unregister have been called for the same pid ?

Related question: what happens when the caller exits without unregistering ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean was: if the game with pid was the only game left that requested GameMode then GameMode will be globally deactivated; conversely even if Unregister ist successfully done for pid there might be other clients/games registered so GameMode will stay active.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related question: what happens when the caller exits without unregistering ?

The GameMode session daemon periodically checks if pid still exists, and auto-unregisters "expired" clients/games.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rephrased it, maybe it is clearer now? I also added some bits about monitoring to the general introduction.

src/gamemode.c Outdated
GType gamemode_get_type (void) G_GNUC_CONST;
static void gamemode_iface_init (XdpGameModeIface *iface);

G_DEFINE_TYPE_WITH_CODE (GameMode, gamemode, XDP_TYPE_GAME_MODE_SKELETON,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that you have a capital M here, I would expect the prefix to be game_mode

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I changed it to game_mode.

Add a new portal so that GameMode service can be accessed from
within the sandbox. GameMode expects to get the process id (pid)
of the game so it can:
  - track the lifetime of the game
  - modify process parameters (nice level, scheduler policy, ...)
  - resolve the executable name and trigger configured actions

Since processes inside the sandbox run in a separate pid namespace
sending the client pid that was handed to the portal is not what
GameMode is expecting, since it needs the pid outside the sandbox.
Therefore the pid is translated via xdg_app_info_map_pids().
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants