VCHIQ breaking with namespaced PIDs (e.g. inside containers)

## Introduction

We have encountered an issue with VCHIQ: a simple EGL application fails to operate correctly in a namespaced process hierarchy, which means EGL applications do not function correctly inside a container.

This is easy to reproduce by running the firmware project's [hello triangle](https://github.com/raspberrypi/firmware/tree/master/opt/vc/src/hello_pi/hello_triangle) example inside a container and comparing it with a run directly in the host OS. If you don't want to use containers, @petrosagg's [example repro code](https://github.com/raspberrypi/linux/pull/1279#issuecomment-181282210) simply creates a PID namespace and attempts to invoke the same example.

Typically, the application shows a frame or two and then freezes, with `vcdbg log msg` outputting messages like:

```
025819.942: *** No KHAN handle found for pid 24
025836.615: *** No KHAN handle found for pid 24
025853.285: *** No KHAN handle found for pid 24
```

This message can be traced back to code inside the proprietary `start.elf` binary (and is perhaps coming from the GPU itself).

When run outside of a namespaced process, the application functions entirely correctly.
### Previous reports

We've previously [reported this](https://github.com/raspberrypi/firmware/issues/532) over at the firmware project, along with [a PR](https://github.com/raspberrypi/linux/pull/1279) which works around the problem with a major hack: it associates a _namespaced_ PID with a VCHIQ instance rather than a global PID. This is not a good solution, as it can result in collisions.
## Work-in-progress patch

**[diff vs. rpi-4.1.y](https://github.com/raspberrypi/linux/compare/rpi-4.1.y...lorenzo-stoakes:spelunk)** 

**Important:** This is a spelunky proof-of-concept RFC patch; see below for more details.
### Description

We've spent some time working on a solution to this problem and have a rather rough patch in development which takes a significantly saner approach:
1. When messages are queued by userspace via a VCHIQ_IOC_QUEUE_MESSAGE ioctl (in [vchiq_ioctl](https://github.com/raspberrypi/linux/blob/rpi-4.1.y/drivers/misc/vc04_services/interface/vchiq_arm/vchiq_arm.c#L645)), we manually alter the elements of the message to replace the namespaced PID with a global PID for the GPU to consume.
2. When the kernel calls back into userland (in [make_service_callback](https://github.com/raspberrypi/linux/blob/rpi-4.1.y/drivers/misc/vc04_services/interface/vchiq_arm/vchiq_core.c#L386)), and a PID is included in the header, we manually alter that header to replace the global PID with the namespaced PID.
3. When a bulk request specifies a service client ID (which is equivalent to a PID in the VCHIQ code), we replace the namespaced PID with a global PID.
4. When instance information is dumped from `/dev/vchiq`, the code now does _not_ dump data on instances whose PIDs belong to a different namespace, and it shows the namespaced PID for each instance. This avoids collisions and confusing information for userland.
5. One fly in the ointment, which is an issue even if no code is changed, is that the debugfs entries at `/sys/kernel/debug/vchiq/clients/<pid>` are added with _global_ PIDs rather than namespaced PIDs. If we replace them with namespaced PIDs we can have collisions; if we don't, namespaced userland won't know what to do with them. For the time being we've left this as-is, since that sidesteps collisions. In theory it should be possible to change the layout of this directory for different PID namespaces.
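
To illustrate item 4, here is a minimal userspace sketch in C of filtering a dump by PID namespace. It is not taken from the patch: `struct demo_instance`, `dump_instances`, the integer `ns_id` field and the output format are all hypothetical stand-ins, and a real implementation would presumably compare kernel PID namespace objects rather than integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-instance record; the real driver's fields differ. */
struct demo_instance {
    unsigned int ns_id; /* assumed identifier of the owning PID namespace */
    int ns_pid;         /* PID as seen inside that namespace */
    int global_pid;     /* PID as seen from the initial namespace */
};

/*
 * Write one line per instance that belongs to reader_ns into buf, showing
 * the namespaced PID only. Instances from other namespaces are skipped.
 * Returns the number of instances written.
 */
int dump_instances(const struct demo_instance *inst, size_t n,
                   unsigned int reader_ns, char *buf, size_t len)
{
    size_t used = 0;
    int shown = 0;

    assert(inst != NULL && buf != NULL);
    if (len == 0)
        return 0;
    buf[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        if (inst[i].ns_id != reader_ns)
            continue; /* not visible from the reader's namespace */

        int w = snprintf(buf + used, len - used, "instance pid %d\n",
                         inst[i].ns_pid);
        if (w < 0 || (size_t)w >= len - used) {
            buf[used] = '\0'; /* drop the partial line and stop */
            break;
        }
        used += (size_t)w;
        shown++;
    }
    return shown;
}
```

Two instances in different namespaces can share the same `ns_pid` here without colliding in the output, because a reader only ever sees the entries for its own namespace.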

With these changes, all internal kernel data structures and messages sent to the GPU reference _global_ PIDs, so no collisions are possible and everything on that side works as if there were no namespacing constraint at all. As far as userland is concerned, it receives messages with valid namespaced PIDs.
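
The translation in both directions can be sketched roughly as follows. This is a userspace illustration only, not the patch itself: `struct pid_map_entry`, `struct demo_msg` and the `rewrite_*` functions are hypothetical names, the table stands in for a single PID namespace, and the message layout is invented for the example. In the kernel, the lookup would presumably go through the existing PID helpers (for example `pid_nr()` and `pid_vnr()` on a `struct pid`) rather than a table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping for one process: its PID inside the namespace and
 * its PID in the initial (global) namespace. */
struct pid_map_entry {
    int32_t ns_pid;
    int32_t global_pid;
};

/* Hypothetical message: only the PID-carrying field matters here. */
struct demo_msg {
    uint32_t msgid;
    int32_t pid;
};

/* Userspace -> GPU: replace the namespaced PID with the global PID.
 * Returns 0 on success, -1 (message untouched) if the PID is unknown. */
int rewrite_outbound(const struct pid_map_entry *map, size_t n,
                     struct demo_msg *msg)
{
    assert(map != NULL && msg != NULL);
    for (size_t i = 0; i < n; i++) {
        if (map[i].ns_pid == msg->pid) {
            msg->pid = map[i].global_pid;
            return 0;
        }
    }
    return -1;
}

/* Kernel -> userspace callback: replace the global PID with the
 * namespaced PID. Same return convention as rewrite_outbound(). */
int rewrite_inbound(const struct pid_map_entry *map, size_t n,
                    struct demo_msg *msg)
{
    assert(map != NULL && msg != NULL);
    for (size_t i = 0; i < n; i++) {
        if (map[i].global_pid == msg->pid) {
            msg->pid = map[i].ns_pid;
            return 0;
        }
    }
    return -1;
}
```

The point of the sketch is the symmetry: anything crossing towards the GPU carries global PIDs, anything crossing back to userland carries namespaced ones, and an unknown PID is rejected rather than passed through unchanged.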
### Spelunking

Currently, our proof-of-concept code is dirtily spelunking into `void *` arrays and taking a guess at which values to change, which I should point out is _clearly_ not what we are proposing long-term here. Additionally, there are probably some details in this code which are flaky/wrong; at the moment it's a really rough experiment (ok, enough caveats :)

However, I am sure that by using what we know about the data structures passed as messages, we can correctly identify what needs to be changed and when, without needing to guess anything. Advice on this would be useful too, if you feel the approach isn't completely insane.

We'd like your input on this before we proceed further, in case we are going down the wrong path or you have major objections to the approach. If you feel another route should be taken, do let us know, as we are eager to enable EGL applications (and VCHIQ clients in general) to work correctly inside containers (we = [resin.io](https://resin.io), who rather love containers :)

