VCHIQ breaking with namespaced PIDs (e.g. inside containers) #1382

@lorenzo-stoakes

Introduction

We have encountered an issue with VCHIQ: a simple EGL application fails to operate correctly in a namespaced process hierarchy, so EGL applications do not function correctly inside a container.

This is easy to reproduce: run the firmware project's hello triangle example inside a container and compare it with running the same example directly on the host OS. If you don't want to use containers, @petrosagg's example repro code simply creates a PID namespace and attempts to invoke the same example.

Typically, the application shows a frame or two and then freezes, with vcdbg log msg outputting messages like:

025819.942: *** No KHAN handle found for pid 24
025836.615: *** No KHAN handle found for pid 24
025853.285: *** No KHAN handle found for pid 24

This message can be traced back to code contained inside the proprietary start.elf binary (and is perhaps coming from the GPU itself.)

When run outside of a namespaced process, the application functions entirely correctly.

Previous reports

We've previously reported this over at the firmware project, along with a PR that works around the problem with a major hack: it associates a namespaced PID, rather than a global PID, with a VCHIQ instance. This is not a good solution, as it can result in collisions.
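To illustrate why keying on a namespaced PID can collide, here is a minimal userland sketch. It is not the driver's code: `struct model_client` and both counting helpers are hypothetical names invented for this example, and the PID values are made up. Two containers can each have a process that sees itself as PID 24, so a lookup by namespaced PID may match more than one client, while a lookup by global PID cannot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a VCHIQ client; not the driver's real structure. */
struct model_client {
    int global_pid; /* PID as seen from the initial (host) namespace */
    int ns_pid;     /* PID as seen from inside the client's own namespace */
};

/* Count clients whose namespaced PID matches; more than one hit means
 * the namespaced PID is ambiguous as a key. */
static size_t count_by_ns_pid(const struct model_client *c, size_t n, int ns_pid)
{
    size_t hits = 0;

    assert(c != NULL);
    for (size_t i = 0; i < n; i++)
        if (c[i].ns_pid == ns_pid)
            hits++;
    return hits;
}

/* Count clients whose global PID matches; global PIDs are unique among
 * live processes, so this is at most one. */
static size_t count_by_global_pid(const struct model_client *c, size_t n, int global_pid)
{
    size_t hits = 0;

    assert(c != NULL);
    for (size_t i = 0; i < n; i++)
        if (c[i].global_pid == global_pid)
            hits++;
    return hits;
}
```

With two clients that both see themselves as PID 24 in separate namespaces, `count_by_ns_pid` reports two matches, whereas each global PID matches exactly one client.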

Work-in-progress patch

diff vs. rpi-4.1.y

Important: This is a spelunky proof-of-concept RFC patch; see below for more details.

Description

We've spent some time working on a solution to this problem and have a rather rough patch in development which takes a significantly saner approach:

  1. When messages are queued by userspace via a VCHIQ_IOC_QUEUE_MESSAGE ioctl (in vchiq_ioctl), we manually alter the elements of the message to replace the namespaced PID with a global PID for the GPU to consume.
  2. When the kernel calls back into userland (in make_service_callback), and a PID is included in the header, we manually alter that header to replace the global PID with the namespaced PID.
  3. When a bulk request specifies a service client ID (which is equivalent to a PID in the VCHIQ code), we replace the namespaced PID with a global PID.
  4. When instance information is dumped from /dev/vchiq, the code no longer dumps data on instances whose PIDs belong to a different namespace, and it shows the namespaced PID for each instance. This avoids collisions and confusing information for userland.
  5. One fly in the ointment, which is an issue even if no code is changed, is that the debugfs entries at /sys/kernel/debug/vchiq/clients/<pid> are added with global PIDs rather than namespaced PIDs. If we replace them with namespaced PIDs we can have collisions; if we don't, namespaced userland won't know what to do with them. For the time being we've left this as-is, since that sidesteps collisions. In theory it should be possible to change the layout of this directory for different PID namespaces.
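The translation in steps 1 to 3 can be modelled in userland as below. This is a sketch under stated assumptions, not the patch itself: `struct pid_ns_model`, `struct model_msg` and the `translate_*` helpers are hypothetical, and the message layout is invented (the real patch currently pokes at untyped buffers). In the kernel, the lookups would presumably be done with the existing PID helpers such as `find_vpid()`, `pid_nr()` and `pid_vnr()` rather than a table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one PID namespace: pairs of (namespaced, global). */
struct pid_map_entry {
    int32_t ns_pid;
    int32_t global_pid;
};

struct pid_ns_model {
    const struct pid_map_entry *entries;
    size_t count;
};

/* Hypothetical message carrying a PID in a known, typed field. */
struct model_msg {
    uint32_t msg_id;
    int32_t pid;
};

/* Userspace -> GPU: replace the namespaced PID with the global PID.
 * Returns 0 on success, -1 (message untouched) if the PID is unknown. */
static int translate_outbound(const struct pid_ns_model *ns, struct model_msg *m)
{
    assert(ns != NULL && m != NULL);
    for (size_t i = 0; i < ns->count; i++) {
        if (ns->entries[i].ns_pid == m->pid) {
            m->pid = ns->entries[i].global_pid;
            return 0;
        }
    }
    return -1;
}

/* Kernel -> userspace callback: replace the global PID with the PID the
 * receiving namespace knows. Returns -1 if the PID is not visible there. */
static int translate_inbound(const struct pid_ns_model *ns, struct model_msg *m)
{
    assert(ns != NULL && m != NULL);
    for (size_t i = 0; i < ns->count; i++) {
        if (ns->entries[i].global_pid == m->pid) {
            m->pid = ns->entries[i].ns_pid;
            return 0;
        }
    }
    return -1;
}
```

The point of the design is that translation happens only at the user/kernel boundary: a message that goes out and comes back ends up with the same namespaced PID it started with, while everything in between sees only the global PID.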

With these changes, all internal kernel data structures and all messages sent to the GPU reference global PIDs, so no collisions are possible and everything on that side works as if there were no namespacing constraint at all. As far as userland is concerned, it receives messages with valid namespaced PIDs.
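The instance dump behaviour from step 4 can be sketched the same way. Again this is a userland model with invented names: `struct dump_instance`, the `ns_id` field and the `pid %d` output format are assumptions for illustration and do not reflect the real /dev/vchiq dump format. The kernel side keeps the global PID, but a reader only sees instances from its own namespace, labelled with the PID that namespace knows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of a VCHIQ instance as tracked by the kernel. */
struct dump_instance {
    int global_pid; /* what the kernel and GPU side use */
    int ns_id;      /* stand-in for the instance's PID namespace */
    int ns_pid;     /* PID as seen inside that namespace */
};

/* Write one "pid N" line per instance visible from reader_ns_id into buf.
 * Returns the number of instances written; stops early if buf is full. */
static size_t dump_visible(const struct dump_instance *v, size_t n,
                           int reader_ns_id, char *buf, size_t len)
{
    size_t used = 0, shown = 0;

    assert(v != NULL && buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w;

        if (v[i].ns_id != reader_ns_id)
            continue; /* instance belongs to another namespace: hide it */
        w = snprintf(buf + used, len - used, "pid %d\n", v[i].ns_pid);
        if (w < 0 || (size_t)w >= len - used) {
            buf[used] = '\0'; /* drop the partially written line */
            break;
        }
        used += (size_t)w;
        shown++;
    }
    return shown;
}
```

A reader in one namespace gets only its own instances, so two instances that share a namespaced PID in different namespaces never appear side by side.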

Spelunking

Currently, our proof-of-concept code is dirtily spelunking into void * arrays and taking a guess at which values to change, which I should point out is clearly not what we are proposing long-term here. Additionally, there are probably some details in this code which are flaky or wrong; at the moment it's a really rough experiment (ok, enough caveats :)

However, I am sure that by using what we know about the data structures passed as messages we can correctly identify what needs to be changed and when without needing to guess anything. Advice on this would be useful too if you feel the approach isn't completely insane.

We'd like your input on this before we proceed further, in case we are going down the wrong path or you have major objections to the approach. If you feel another route should be taken, do let us know, as we are eager to enable EGL applications (and VCHIQ clients in general) to work correctly inside containers (we = resin.io, who rather love containers :)
