interfaces/builtin/opengl: allow access to Tegra X1 #6623

Merged · 4 commits into snapcore:master · Mar 26, 2019

Conversation

alfonsosanchezbeato
Member

Allow access to fuses and devices created by the Tegra X1 drivers, so CUDA can be used on this chip.
@zyga zyga self-requested a review March 19, 2019 16:40
Collaborator

@zyga zyga left a comment


Looks good.

@zyga zyga requested review from jdstrand and bboozzoo March 19, 2019 16:41
unix (bind,listen) type=seqpacket addr="@cuda-uvmfd-[0-9a-f]*",
/{dev,run}/shm/cuda.* rw,
/dev/nvhost-* rw,
/dev/nvmap rw,

jdstrand

It's annoying how many different devices there are for NVIDIA... Also note that the /dev/nvhost-* and /dev/nvmap devices have been prone to security issues, with papers being written about them (this is only one of many CVEs: https://googleprojectzero.blogspot.com/2015/01/exploiting-nvmap-to-escape-chrome.html). My concerns regarding maintenance and direct access to the devices are expressed here: https://bugs.launchpad.net/ubuntu/+source/apparmor-easyprof-ubuntu/+bug/1197133 (though my main worry is direct access to all graphics devices, not just NVIDIA).

Unfortunately, access to these devices is a requirement for the hosts that have them and while the accesses make me uncomfortable, there isn't anything we can do about it atm with the state of graphics drivers access on Linux.

Regardless of my worry, this is going to need changes to setup_devices_cgroup() in cmd/snap-confine/udev-support.c (and the apparmor profile), no?

Collaborator


@jdstrand agreed about required changes to snap-confine's Nvidia device handling.

Member Author


@zyga @jdstrand I have added the nvmap and nvhost-* devices to the udev plug so they get added to the cgroup of the confined process. Fortunately, in this case nvidia supports access from sysfs, so there is no need to modify udev-support.c.

One interesting thing that might be a bug: originally I was able to access the devices without changing openglConnectedPlugUDev because actually no cgroup was being created for the snap processes. I think that the reason was that the rules:

KERNEL=="renderD[0-9]*", TAG+="snap_matrix-mul_matrixMul"
KERNEL=="vchiq", TAG+="snap_matrix-mul_matrixMul"
SUBSYSTEM=="drm", KERNEL=="card[0-9]*", TAG+="snap_matrix-mul_matrixMul"
TAG=="snap_matrix-mul_matrixMul", RUN+="/usr/lib/snapd/snap-device-helper $env{ACTION} snap_matrix-mul_matrixMul $devpath $major:$minor"

did not trigger the call to snap-device-helper, as none of renderD[0-9]*, vchiq or drm actually exist on the system. Maybe there should always be a rule here to make sure the call really happens (like KERNEL=="null"?)
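For illustration only, a minimal sketch of what that suggestion would amount to in the interface source (hypothetical, not something this PR does; the entry is just the KERNEL=="null" example from the comment above):

// Hypothetical sketch of the suggestion above: add an entry matching a device
// node that exists on every system, so at least one TAG+= rule always matches
// and the snap-device-helper RUN rule always fires, creating the device cgroup.
var openglConnectedPlugUDev = []string{
	// ... existing card/vchiq/renderD entries ...
	`KERNEL=="null"`, // /dev/null is always present
}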


> did not trigger the call to snap-device-helper, as none of renderD[0-9]*, vchiq or drm actually exist on the system. Maybe there should always be a rule here to make sure the call really happens (like KERNEL=="null"?)

We've considered unconditional use of the device cgroup, but that is known to break at least certain software (eg, greengrass) which is why we've recently added the ability for an interface to opt out of setting up the device cgroup.

Member Author



> We've considered unconditional use of the device cgroup, but that is known to break at least certain software (eg, greengrass) which is why we've recently added the ability for an interface to opt out of setting up the device cgroup.

But if connectedPlugUDev is defined, shouldn't a cgroup always be created when the interface is connected? What I am saying is that the cgroup would be created or not depending on whether certain devices happen to exist on the system. For instance, the same snap would get no device cgroup when installed on a Jetson TX1 but would get one on, say, a PC laptop with an NVIDIA GPU (talking about this interface before this PR).



> But if connectedPlugUDev is defined, shouldn't a cgroup always be created when the interface is connected? What I am saying is that the cgroup would be created or not depending on whether certain devices happen to exist on the system. For instance, the same snap would get no device cgroup when installed on a Jetson TX1 but would get one on, say, a PC laptop with an NVIDIA GPU (talking about this interface before this PR).

No. The current design is that a device cgroup is used, and the devices added to it, only if there are tagged devices assigned to the snap, since from a security POV there is no point in using a device cgroup if nothing is tagged for the snap (indeed, as mentioned previously, there are times we explicitly don't want the device cgroup). Your evaluation of the situation is accurate and is the current design. We may change that design in the future, but it is not considered a bug.

When it can be a bug is for hot-pluggable devices such as joysticks, since a snap might be started with no gamepads plugged in and then a gamepad is plugged in after the fact. This was precisely the scenario we had in mind when we considered making the device cgroup mandatory, but we felt at the time it was too risky a behavior change (it might be reconsidered at a future date). Feel free to add /dev/full to the opengl interface if you feel this is an issue for this interface.

@anonymouse64
Member

One reason why I originally advocated putting these NVIDIA device accesses inside a cuda-support interface is that it would help alleviate some of these security worries, as opengl is auto-connected and used for many non-CUDA applications, so we could move some of the less commonly used accesses like these to an interface that isn't auto-connected.

// will be added by snap-confine.
var openglConnectedPlugUDev = []string{
	`SUBSYSTEM=="drm", KERNEL=="card[0-9]*"`,
	`KERNEL=="vchiq"`,
	`KERNEL=="renderD[0-9]*"`,
	`KERNEL=="nvhost-*"`,
	`KERNEL=="nvmap"`,
}
Collaborator


Unless those are GPL and udev sees them, you also need to change snap-confine's udev-support.c.

Member Author


I'm not sure if they are GPL or not, but udev does see them.

Collaborator


Are they represented in sysfs?


@bboozzoo
Collaborator

Unit tests need an update

FAIL: opengl_test.go:92: OpenglInterfaceSuite.TestUDevSpec
opengl_test.go:95:
    c.Assert(spec.Snippets(), HasLen, 4)
... obtained []string = []string{"# opengl\nKERNEL==\"nvhost-*\", TAG+=\"snap_consumer_app\"", "# opengl\nKERNEL==\"nvmap\", TAG+=\"snap_consumer_app\"", "# opengl\nKERNEL==\"renderD[0-9]*\", TAG+=\"snap_consumer_app\"", "# opengl\nKERNEL==\"vchiq\", TAG+=\"snap_consumer_app\"", "# opengl\nSUBSYSTEM==\"drm\", KERNEL==\"card[0-9]*\", TAG+=\"snap_consumer_app\"", "TAG==\"snap_consumer_app\", RUN+=\"/usr/lib/snapd/snap-device-helper $env{ACTION} snap_consumer_app $devpath $major:$minor\""}
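For reference, a rough sketch of the kind of update the failing assertion needs (the exact checks added to opengl_test.go may differ; testutil.Contains is snapd's usual gocheck checker for this): the spec now carries tagging rules for nvhost-* and nvmap as well, for six snippets in total.

// Inside the existing TestUDevSpec, with the same `spec` as in the failure
// above (sketch only; the merged test's assertions may differ):
c.Assert(spec.Snippets(), HasLen, 6)
c.Assert(spec.Snippets(), testutil.Contains, `# opengl
KERNEL=="nvhost-*", TAG+="snap_consumer_app"`)
c.Assert(spec.Snippets(), testutil.Contains, `# opengl
KERNEL=="nvmap", TAG+="snap_consumer_app"`)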

@bboozzoo
Collaborator

Unit tests should pass now. I've also merged current master which has the necessary fixes for spread tests that were causing pain recently.

@alfonsosanchezbeato
Member Author

> One reason why I originally advocated putting these NVIDIA device accesses inside a cuda-support interface is that it would help alleviate some of these security worries, as opengl is auto-connected and used for many non-CUDA applications, so we could move some of the less commonly used accesses like these to an interface that isn't auto-connected.

Right, this is a good idea. There are things that could be added there that should not be in the opengl interface (like being able to call mknod, something the CUDA libraries do on first use). The only issue is that I'm not sure how separable the pure GPU devices are from the CUDA devices. But of course the shared resources could be in both interfaces.

@alfonsosanchezbeato
Member Author

> Unit tests should pass now. I've also merged current master which has the necessary fixes for spread tests that were causing pain recently.

Thanks!

@jdstrand commented Mar 21, 2019

> One reason why I originally advocated putting these NVIDIA device accesses inside a cuda-support interface is that it would help alleviate some of these security worries, as opengl is auto-connected and used for many non-CUDA applications, so we could move some of the less commonly used accesses like these to an interface that isn't auto-connected.

This isn't just about CUDA, though. These accesses were needed, for example, for certain models of Google phones to run with Ubuntu Touch (ie, these accesses are needed to use the GPU at all on those devices; CUDA also needs to use the GPU, but for different reasons, so splitting them out, at least for this case, doesn't make sense).


@jdstrand jdstrand left a comment


Thanks for the changes!

Collaborator

@bboozzoo bboozzoo left a comment


LGTM

@bboozzoo bboozzoo merged commit 4ebed4e into snapcore:master Mar 26, 2019