D-Bus is a system for low-latency, low-overhead, easy-to-use interprocess
communication (IPC).
The focus of this document is an overview of the low-level, native kernel D-Bus
transport called kdbus. Kdbus in the kernel acts similarly to a device driver:
all communication between processes takes place over special device nodes in
/dev/kdbus/.
For the general D-Bus protocol specification, including the payload format,
marshaling, and communication semantics, please refer to:
http://dbus.freedesktop.org/doc/dbus-specification.html
For a kdbus-specific userspace library implementation, please refer to:
http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-bus.h
http://cgit.freedesktop.org/systemd/systemd/tree/src/systemd/sd-memfd.h
===============================================================================
Terminology
===============================================================================
Namespace:
A namespace is a named object containing a number of buses. A system
container that contains its own init system and users usually also runs in
its own kdbus namespace. The /dev/kdbus/ns/<container-name>/ directory shows
up inside the namespace as /dev/kdbus/. Every namespace offers a "control"
device node to create new buses or namespaces.
Namespaces have no connection to each other; they cannot see or talk to each
other. The device nodes inside other namespaces can be seen only from the
initial namespace, and only by processes with the needed access rights.
Bus:
A bus is a named object inside a namespace. Clients exchange messages over a
bus. Buses have no connection to each other; messages are only exchanged
within the same bus. The default entry point to a bus, through which clients
establish their connection, is the "bus" device node
/dev/kdbus/<bus name>/bus.
Common operating system setups create one "system bus" per system and one
"user bus" for every logged-in user. Applications or services can create
their own private named buses if they want to.
Endpoint:
An endpoint provides the device node to talk to a bus. Every bus has a
default endpoint called "bus". A bus can offer additional endpoints with
custom names to provide restricted access to the same bus. Custom endpoints
can carry additional policy, which can be used to give sandboxed processes
only locked-down, limited, filtered access to a bus.
Connection:
A connection to a bus is created by opening an endpoint device node of a
bus and becoming an active client with the HELLO exchange. Every connected
client has a unique identifier on the bus and can address messages to every
other connection on the same bus by using the peer's connection id as the
destination.
Well-known Names:
A connection can, in addition to its implicit unique connection id, request
the ownership of a textual well-known name. Well-known names are noted in
reverse-domain notation, like com.example.service. Connections offering a
service on a bus are usually reached by their well-known name. A connection
id and a well-known name relate to each other like an IP address and a DNS
name associated with that address.
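Well-known names follow the bus name rules of the D-Bus specification referenced above. As a minimal sketch, assuming those rules (at least two dot-separated, non-empty elements; only the characters [A-Za-z0-9_-]; no element starting with a digit; at most 255 characters), a validity check might look like the following. `is_valid_wellknown_name()` is a hypothetical helper, not a kdbus or sd-bus function, and the kernel's own checks may differ.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed limit, taken from the D-Bus specification's maximum name length. */
#define WELLKNOWN_NAME_MAX 255

/* Characters allowed inside one element of a bus name. */
static bool is_element_char(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '_' || c == '-';
}

/* Hypothetical helper: true if name is a syntactically valid well-known name. */
bool is_valid_wellknown_name(const char *name)
{
    size_t len = strlen(name);
    size_t elements = 0;
    bool at_element_start = true;

    /* Unique names such as ":1.22" are not well-known names. */
    if (len == 0 || len > WELLKNOWN_NAME_MAX || name[0] == ':')
        return false;

    for (size_t i = 0; i < len; i++) {
        char c = name[i];

        if (c == '.') {
            /* Leading dot or two dots in a row: empty element. */
            if (at_element_start)
                return false;
            at_element_start = true;
            continue;
        }
        if (!is_element_char(c))
            return false;
        if (at_element_start) {
            /* Elements of well-known names must not start with a digit. */
            if (c >= '0' && c <= '9')
                return false;
            elements++;
            at_element_start = false;
        }
    }

    /* A trailing dot leaves an empty final element. */
    if (at_element_start)
        return false;
    return elements >= 2;
}
```

Under these rules, com.example.service and org.foo.bar are accepted, while com, com..example, com.1example and unique names such as :1.22 are rejected.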
===============================================================================
Device Node Layout
===============================================================================
/sys/bus/kdbus
`-- devices
    |-- kdbus!0-system!bus -> ../../../devices/virtual/kdbus/kdbus!0-system!bus
    |-- kdbus!2702-user!bus -> ../../../devices/virtual/kdbus/kdbus!2702-user!bus
    |-- kdbus!2702-user!ep.app -> ../../../devices/virtual/kdbus/kdbus!2702-user!ep.app
    `-- kdbus!control -> ../../../devices/kdbus!control
/dev/kdbus
|-- control
|-- 0-system
|   |-- bus
|   `-- ep.apache
|-- 1000-user
|   `-- bus
|-- 2702-user
|   |-- bus
|   `-- ep.app
`-- ns
    |-- fedoracontainer
    |   |-- control
    |   |-- 0-system
    |   |   `-- bus
    |   `-- 1000-user
    |       `-- bus
    `-- mydebiancontainer
        |-- control
        `-- 0-system
            `-- bus
Note:
The device node subdirectory layout is arranged so that a future version of
kdbus could be implemented as a filesystem, with a separate instance mounted
for each namespace. Any future change always needs to keep this in mind.
For the same reason, dependencies on udev's userspace hookups and the use of
sysfs attributes should be kept to a minimum.
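The layout above can be expressed as a small path-building helper. This is a minimal sketch assuming the naming seen in the listing: buses named `<uid>-<name>` (as in `0-system` and `2702-user`), container namespaces under `ns/<container-name>/`, and endpoints named either `bus` or `ep.<name>`. The function `kdbus_node_path()` and its parameters are hypothetical, not part of any kdbus interface.

```c
#include <stdio.h>

/*
 * Hypothetical helper: writes the device node path of an endpoint into buf.
 * ns may be NULL for the initial namespace; ep may be NULL for the default
 * "bus" endpoint. Returns 0 on success, -1 if the path does not fit.
 */
int kdbus_node_path(char *buf, size_t size, const char *ns,
                    unsigned int uid, const char *bus, const char *ep)
{
    char ns_part[256] = "";     /* "" or "ns/<container-name>/" */
    char ep_part[256] = "bus";  /* "bus" or "ep.<name>" */
    int n;

    if (ns) {
        n = snprintf(ns_part, sizeof(ns_part), "ns/%s/", ns);
        if (n < 0 || (size_t)n >= sizeof(ns_part))
            return -1;
    }
    if (ep) {
        n = snprintf(ep_part, sizeof(ep_part), "ep.%s", ep);
        if (n < 0 || (size_t)n >= sizeof(ep_part))
            return -1;
    }

    n = snprintf(buf, size, "/dev/kdbus/%s%u-%s/%s", ns_part, uid, bus, ep_part);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For example, (NULL, 2702, "user", "app") yields /dev/kdbus/2702-user/ep.app, and ("fedoracontainer", 1000, "user", NULL) yields /dev/kdbus/ns/fedoracontainer/1000-user/bus. Inside that container, ns/fedoracontainer/ is presented as /dev/kdbus/, so its processes would see the same bus at /dev/kdbus/1000-user/bus.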
===============================================================================
Data Structures
===============================================================================
+-------------------------------------------------------------------------+
| Namespace (Init Namespace)                                              |
| /dev/kdbus/control                                                      |
| +---------------------------------------------------------------------+ |
| | Bus (System Bus)                                                    | |
| | ./0-system/                                                         | |
| | +-------------------------------+ +-------------------------------+ | |
| | | Endpoint                      | | Endpoint                      | | |
| | | ./bus                         | | ./ep.sandbox                  | | |
| | | +------------+ +------------+ | | +------------+ +------------+ | | |
| | | | Connection | | Connection | | | | Connection | | Connection | | | |
| | | | :1.22      | | :1.25      | | | | :1.55      | | :1.81      | | | |
| | | +------------+ +------------+ | | +------------+ +------------+ | | |
| | +-------------------------------+ +-------------------------------+ | |
| +---------------------------------------------------------------------+ |
|                                                                         |
| +---------------------------------------------------------------------+ |
| | Bus (User Bus for UID 2702)                                         | |
| | /dev/kdbus/2702-user/                                               | |
| | +-------------------------------+ +-------------------------------+ | |
| | | Endpoint                      | | Endpoint                      | | |
| | | /dev/kdbus/2702-user/bus      | | /dev/kdbus/2702-user/ep.app   | | |
| | | +------------+ +------------+ | | +------------+ +------------+ | | |
| | | | Connection | | Connection | | | | Connection | | Connection | | | |
| | | | :1.22      | | :1.25      | | | | :1.55      | | :1.81      | | | |
| | | +------------+ +------------+ | | +------------+ +------------+ | | |
| | +-------------------------------+ +-------------------------------+ | |
| +---------------------------------------------------------------------+ |
+-------------------------------------------------------------------------+
| Namespace (Container; inside it, fedoracontainer/ becomes /dev/kdbus/)  |
| /dev/kdbus/ns/fedoracontainer/control                                   |
| +---------------------------------------------------------------------+ |
| | Bus                                                                 | |
| | ./0-system/                                                         | |
| | +---------------------------------+                                 | |
| | | Endpoint                        |                                 | |
| | | ./bus                           |                                 | |
| | | +-------------+ +-------------+ |                                 | |
| | | | Connection  | | Connection  | |                                 | |
| | | | :1.22       | | :1.25       | |                                 | |
| | | +-------------+ +-------------+ |                                 | |
| | +---------------------------------+                                 | |
| +---------------------------------------------------------------------+ |
|                                                                         |
| +---------------------------------------------------------------------+ |
| | Bus                                                                 | |
| | /dev/kdbus/2702-user/                                               | |
| | +---------------------------------+                                 | |
| | | Endpoint                        |                                 | |
| | | /dev/kdbus/2702-user/bus        |                                 | |
| | | +-------------+ +-------------+ |                                 | |
| | | | Connection  | | Connection  | |                                 | |
| | | | :1.22       | | :1.25       | |                                 | |
| | | +-------------+ +-------------+ |                                 | |
| | +---------------------------------+                                 | |
| +---------------------------------------------------------------------+ |
+-------------------------------------------------------------------------+
===============================================================================
Creation of new Namespaces and Buses
===============================================================================
The initial kdbus namespace is unconditionally created by the kernel module. A
namespace contains a "control" device node, which allows the creation of a new
bus or namespace. New namespaces do not have any buses created by default.
Opening the control device node returns a file descriptor that accepts the
ioctls KDBUS_CMD_BUS_MAKE/KDBUS_CMD_NS_MAKE, which specify the name of the new
bus or namespace to create. The control file descriptor needs to be kept open
for the entire lifetime of the created bus or namespace; closing it
immediately cleans up the entire bus or namespace and all of its associated
resources and connections. Every control file descriptor can be used only once
to create a new bus or namespace; from that point on, it is not used for any
further communication until the final close().
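The ownership rules for the control file descriptor can be modelled as a small state machine. The sketch below is a userspace simulation only: it does not issue the real KDBUS_CMD_BUS_MAKE or KDBUS_CMD_NS_MAKE ioctls, and the type `struct control_fd` and the functions `control_make()` and `control_close()` are hypothetical. It captures the two rules stated above: one make per descriptor, and the created object lives exactly as long as the descriptor stays open.

```c
#include <stdio.h>

/* Hypothetical model of an opened control device node. */
enum control_state { CONTROL_FRESH, CONTROL_OWNS_OBJECT, CONTROL_CLOSED };

struct control_fd {
    enum control_state state;
    char object_name[64];   /* name of the bus or namespace created */
};

/* Models a BUS_MAKE/NS_MAKE request: allowed only once per descriptor. */
int control_make(struct control_fd *fd, const char *name)
{
    if (fd->state != CONTROL_FRESH)
        return -1;          /* already used, or already closed */
    snprintf(fd->object_name, sizeof(fd->object_name), "%s", name);
    fd->state = CONTROL_OWNS_OBJECT;
    return 0;
}

/* Models close(): tears down the owned bus or namespace, if any. */
void control_close(struct control_fd *fd)
{
    if (fd->state == CONTROL_OWNS_OBJECT)
        printf("destroying %s and all its connections\n", fd->object_name);
    fd->object_name[0] = '\0';
    fd->state = CONTROL_CLOSED;
}
```

Tying the object's lifetime to the descriptor means a crashed creator cannot leak a bus: the kernel closes the descriptor on process exit, which in this model corresponds to `control_close()`.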
===============================================================================
Connection IDs and Well-Known Connection Names
===============================================================================
Connections are identified by their connection id, internally implemented as a
uint64_t counter. On every newly created bus, ids start at 1, and every new
connection increments the counter by 1. Ids are never reused.
In higher-level tools, the user-visible representation of a connection is
defined by the D-Bus protocol specification as ":1.<id>".
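A minimal sketch of the id scheme described above, assuming a per-bus 64-bit counter: ids start at 1, grow by one per connection, and are never handed out twice. The struct and function names are hypothetical; the `:1.<id>` string is the user-visible form mentioned above.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-bus state: only the id counter is modelled. */
struct bus {
    uint64_t next_id;       /* id handed to the next connection */
};

void bus_init(struct bus *b)
{
    b->next_id = 1;         /* ids on a new bus start at 1 */
}

/* Allocates a connection id; ids are never reused, even after disconnect. */
uint64_t bus_new_connection_id(struct bus *b)
{
    return b->next_id++;
}

/* Formats the user-visible unique name, e.g. ":1.22". */
int format_unique_name(char *buf, size_t size, uint64_t id)
{
    int n = snprintf(buf, size, ":1.%" PRIu64, id);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Under this scheme, :1.22 is simply the 22nd connection made to that bus since it was created.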
Messages with a specific uint64_t destination id are delivered directly to
the connection with the corresponding id. Messages with the special destination
id 0xffffffffffffffff are broadcast messages and are potentially delivered to
all known connections on the bus; however, clients need to subscribe to the
specific broadcast messages they are interested in before any broadcast
message reaches them.
Messages synthesized and sent directly by the kernel carry the special source
id 0.
In addition to the unique uint64_t connection id, established connections can
request the ownership of well-known names, under which they can be found and
addressed by other bus clients. A well-known name is associated with one and
only one connection at a time.
Messages can specify the special destination id 0 and carry a well-known name
in the message data. Such a message is delivered to the destination connection
which owns that well-known name.
+-------------------------------------------------------------------------+
| +---------------+     +---------------------------+                     |
| | Connection    |     | Message                   | -------+            |
| | :1.22         | --> | src: 22                   |        |            |
| |               |     | dst: 25                   |        |            |
| |               |     +---------------------------+        |            |
| |               | <------------------------------------+   |            |
| +---------------+                                      |   |            |
|                                                        |   |            |
| +---------------+     +---------------------------+    |   |            |
| | Connection    |     | Message                   | ---+   |            |
| | :1.25         | --> | src: 25                   | -------|---+        |
| |               |     | dst: 0xffffffffffffffff   | -------|---|---+    |
| |               |     +---------------------------+        |   |   |    |
| |               | <----------------------------------------+   |   |    |
| +---------------+                                              |   |    |
|                                                                |   |    |
| +---------------+                                              |   |    |
| | Connection    | <--------------------------------------------+   |    |
| | :1.55         |     +---------------------------+                |    |
| |               |     | Message                   | -------+       |    |
| |               | --> | src: 55                   |        |       |    |
| |               |     | dst: 0 / org.foo.bar      |        |       |    |
| |               |     +---------------------------+        |       |    |
| +---------------+                                          |       |    |
|                                                            |       |    |
| +---------------+                                          |       |    |
| | Connection    |                                          |       |    |
| | :1.81         |                                          |       |    |
| | org.foo.bar   | <----------------------------------------+       |    |
| |               |                                                  |    |
| |               | <------------------------------------------------+    |
| +---------------+                                                       |
+-------------------------------------------------------------------------+
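The delivery rules above (direct id, broadcast to subscribers, and destination 0 plus a well-known name) can be condensed into one dispatch function. This is a hedged sketch: the connection table, the `subscribed` flag and `route_message()` are hypothetical simplifications, and real subscription matching is finer-grained than a single flag. Consistent with the diagram, the sketch assumes a broadcast is not delivered back to its sender.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DST_BROADCAST UINT64_MAX  /* 0xffffffffffffffff */
#define DST_NAME      0           /* destination given by a well-known name */

/* Hypothetical, simplified view of a connection on one bus. */
struct conn {
    uint64_t id;
    const char *wellknown;  /* owned well-known name, or NULL */
    bool subscribed;        /* interested in broadcasts? */
};

/*
 * Stores in out the ids of the connections a message from src to dst (or
 * to name, when dst == DST_NAME) would be queued on; returns their count.
 */
size_t route_message(const struct conn *conns, size_t n, uint64_t src,
                     uint64_t dst, const char *name, uint64_t *out)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        const struct conn *c = &conns[i];

        if (dst == DST_BROADCAST) {
            /* Only subscribers receive broadcasts; skip the sender. */
            if (c->subscribed && c->id != src)
                out[count++] = c->id;
        } else if (dst == DST_NAME) {
            /* A well-known name has exactly one owner at a time. */
            if (name && c->wellknown && strcmp(c->wellknown, name) == 0) {
                out[0] = c->id;
                return 1;
            }
        } else if (c->id == dst) {
            out[0] = c->id;
            return 1;
        }
    }
    return count;
}
```

With connections 22, 25, 55 and 81 all subscribed and 81 owning org.foo.bar, a broadcast from :1.25 is queued on :1.22, :1.55 and :1.81, and the message from :1.55 to org.foo.bar is queued on :1.81, matching the diagram.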
===============================================================================
Message Format, Content, Exchange
===============================================================================
Messages consist of a fixed-size header followed directly by a list of
variable-sized data records. The overall message size is specified in the
header of the message. The chain of data records can contain well-defined
message metadata fields, raw data, references to data, or file descriptors.
Messages are passed to the kernel with the ioctl KDBUS_CMD_MSG_SEND. Depending
on the destination address of the message, the kernel delivers the message
to the specific destination connection or to all connections on the same bus.
Messages are always queued in the destination connection.
Messages are received by the client with the ioctl KDBUS_CMD_MSG_RECV. The
endpoint device node of the bus supports poll() to wake up the receiving
process when new messages are queued up to be received.
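On the receiving side, this is an ordinary poll() loop: wait for POLLIN on the endpoint file descriptor, then fetch the message with KDBUS_CMD_MSG_RECV. The sketch below shows only the waiting part, using the standard POSIX poll() call. The receive ioctl is left out because its argument layout is not described here. `wait_for_message()` is a hypothetical helper, and any readable descriptor (for example a pipe) can stand in for the endpoint when trying it out.

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper: blocks until fd is readable (a message is queued)
 * or timeout_ms expires. Returns 1 if readable, 0 on timeout, -1 on error.
 * Note: after EINTR the full timeout starts again.
 */
int wait_for_message(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int r;

    do {
        r = poll(&pfd, 1, timeout_ms);
    } while (r < 0 && errno == EINTR);  /* retry if interrupted by a signal */

    if (r <= 0)
        return r;                       /* 0: timeout, -1: error */
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

In a real client, the descriptor would be the one obtained by opening the endpoint device node, and a return value of 1 would be followed by the KDBUS_CMD_MSG_RECV ioctl.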
+-------------------------------------------------------------------------+
| Message                                                                 |
| +---------------------------------------------------------------------+ |
| | Header                                                              | |
| | size: overall message size, including the data records              | |
| | destination: connection id of the receiver                          | |
| | source: connection id of the sender (set by kernel)                 | |
| | payload_type: "DBusVer1" textual identifier stored as uint64_t      | |
| +---------------------------------------------------------------------+ |
| +---------------------------------------------------------------------+ |
| | Data Record                                                         | |
| | size: overall record size (without padding)                         | |
| | type: type of data                                                  | |
| | data: reference to data (address or file descriptor)                | |
| +---------------------------------------------------------------------+ |
| +---------------------------------------------------------------------+ |
| | padding bytes to the next 8 byte alignment                          | |
| +---------------------------------------------------------------------+ |
| +---------------------------------------------------------------------+ |
| | Data Record                                                         | |
| | size: overall record size (without padding)                         | |
| | ...                                                                 | |
| +---------------------------------------------------------------------+ |
| +---------------------------------------------------------------------+ |
| | padding bytes to the next 8 byte alignment                          | |
| +---------------------------------------------------------------------+ |
| +---------------------------------------------------------------------+ |
| | Data Record                                                         | |
| | size: overall record size                                           | |
| | ...                                                                 | |
| +---------------------------------------------------------------------+ |
| +---------------------------------------------------------------------+ |
| | padding bytes to the next 8 byte alignment                          | |
| +---------------------------------------------------------------------+ |
+-------------------------------------------------------------------------+
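The record chain is walked by stepping over each record's size rounded up to the next multiple of 8. The sketch below assumes a simplified, hypothetical layout in which every record starts with a 64-bit `size` and a 64-bit `type`, and `size` excludes the trailing padding as in the diagram; it is not the real kdbus record structure. It also shows one plausible way to store the 8-character "DBusVer1" identifier in a uint64_t, packing the first character into the most significant byte; the byte order kdbus actually uses is an assumption here.

```c
#include <stddef.h>
#include <stdint.h>

/* Round up to the next multiple of 8 (the record alignment). */
#define ALIGN8(x) (((x) + 7) & ~(uint64_t)7)

/* Hypothetical record header; size covers header + data, not padding. */
struct record {
    uint64_t size;
    uint64_t type;
    /* followed by size - sizeof(struct record) bytes of data */
};

/* Packs an 8-character tag, first character in the most significant byte. */
uint64_t tag_from_string(const char s[8])
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | (uint8_t)s[i];
    return v;
}

/*
 * Counts the records in the len bytes following the message header.
 * buf must be 8-byte aligned. Returns -1 if a record is malformed or
 * runs past the end of the buffer.
 */
long count_records(const uint8_t *buf, uint64_t len)
{
    uint64_t off = 0;
    long n = 0;

    while (off < len) {
        const struct record *r = (const struct record *)(buf + off);

        if (len - off < sizeof(*r) || r->size < sizeof(*r) ||
            r->size > len - off)
            return -1;
        n++;
        off += ALIGN8(r->size);  /* skip data and padding */
    }
    return n;
}
```

Because of the rounding, a record of 19 bytes occupies 24 bytes of the chain, and the next record always starts on an 8-byte boundary.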
===============================================================================
Passing of Payload Data
===============================================================================
When connecting to the bus, receivers request a memory pool of a given size,
large enough to hold the entire backlog of data queued for the connection. The
pool is internally backed by a shared memory file, which can be mmap()ed by
the receiver.
KDBUS_MSG_PAYLOAD_VEC:
Message data is copied by the sending process directly into the receiver's
pool. This way, two peers can exchange data with effectively a single copy from
one process to the other; the kernel does not buffer the data anywhere else.
KDBUS_MSG_PAYLOAD_MEMFD:
Messages can reference special kdbus_memfd files that contain the data.
These files have special semantics that allow sealing the content of the
file; sealing prevents all write access to the file content.
Only sealed kdbus_memfd files are accepted as payload data, which makes passing
data reliable: the receiver can assume that neither the sender nor anybody
else can alter the content after the message is sent.
Apart from the sender filling in the content of the kdbus_memfd file, the data
is passed from one process to another with zero copies, read-only, and shared
between the peers.
The seal on a kdbus_memfd can be removed again by the sender or the receiver
as soon as the kdbus_memfd is no longer shared.
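The two payload paths can be contrasted in a small userspace model. This sketch is hypothetical throughout: `struct pool`, `pool_copy_vec()`, `struct memfd_ref` and `accept_memfd()` are illustrative names, not kdbus structures, and the model only demonstrates the rules stated above. VEC data is copied exactly once into the receiver's pool, while a memfd changes hands by reference and is accepted only when sealed. Keeping pool offsets 8-byte aligned is an assumption borrowed from the record alignment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical receiver pool: the region the receiver would mmap(). */
struct pool {
    uint8_t *mem;
    size_t size;
    size_t used;
};

/*
 * KDBUS_MSG_PAYLOAD_VEC model: copies the sender's bytes straight into the
 * receiver's pool (the single copy) and returns the offset at which the
 * receiver finds them, or -1 if the pool is full.
 */
long pool_copy_vec(struct pool *p, const void *data, size_t len)
{
    size_t off = (p->used + 7) & ~(size_t)7;  /* keep 8-byte alignment */

    if (off > p->size || len > p->size - off)
        return -1;
    memcpy(p->mem + off, data, len);
    p->used = off + len;
    return (long)off;
}

/* KDBUS_MSG_PAYLOAD_MEMFD model: only a reference changes hands. */
struct memfd_ref {
    int fd;
    bool sealed;            /* no write access possible any more */
};

/* Unsealed memfds are refused, so the receiver can trust the content. */
bool accept_memfd(const struct memfd_ref *m)
{
    return m->sealed;
}
```

The trade-off follows from the text: VEC payloads cost one copy but need no setup, while memfd payloads avoid copying entirely at the price of preparing and sealing a file first.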