7431 ZFS Channel Programs #198

cwill · 2016-09-28T00:07:42Z

Reviewed by: Matt Ahrens <mahrens@delphix.com> (@ahrens)
Reviewed by: George Wilson <george.wilson@delphix.com> (@grwilson)
Reviewed by: John Kennedy <john.kennedy@delphix.com> (@jwk404)
Reviewed by: Chris Williamson <chris.williamson@delphix.com> (@cwill)

ZFS channel programs (ZCP) add support for performing compound ZFS administrative actions via Lua scripts in a sandboxed environment with time and memory limits.

Upstream bugs: DLPX-39221, DLPX-40120, DLPX-44957, DLPX-46641, DLPX-46247, DLPX-46672, DLPX-47073

zettabot · 2016-09-28T00:09:11Z

Can one of the admins verify this patch?

ikozhukhov · 2016-09-28T00:14:56Z

Question: if I have Lua 5.2 as a userland package, can I use it for builds? Or are you importing all of the Lua 5.2 sources into uts?
I'm not sure that uts/common/fs/zfs is a good place for the Lua sources.

ghost · 2016-09-28T00:18:06Z

I agree with Igor; we should probably do the same as we did with ficl for the loader project: install the binaries with a -sys postfix.

cwill · 2016-09-30T19:16:45Z

It was necessary to slightly modify the base Lua 5.2.4 interpreter for a couple of reasons:

- the need to disable a number of printing, file I/O, and pcall functions
- error handling changes to allow channel programs to return errors rather than panicking the kernel
- limited kernel stack space
- math compatibility functions, since we've changed the number representation from long double to int64_t
- a handful of inconsistencies in expected standard library signatures
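To make the "math compatibility functions" point concrete: C's integer `/` and `%` truncate toward zero, whereas Lua 5.2 defines `a % b` as `a - math.floor(a/b)*b`. Once the number type is `int64_t`, floored semantics have to be rebuilt from integer operations. The following is a minimal sketch under that assumption; the `hyp_` helper names are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers (not from the patch): floored division and modulo
 * over int64_t, matching Lua 5.2's definition a % b == a - floor(a/b)*b.
 * C's native / and % truncate toward zero instead.
 */
static int64_t
hyp_floordiv(int64_t a, int64_t b)
{
	/* Division by zero (and INT64_MIN / -1) is not handled here. */
	assert(b != 0);
	int64_t q = a / b;

	/*
	 * If the division was inexact and the signs differ, truncation
	 * rounded toward zero (i.e. up); step down to the floor.
	 */
	if (a % b != 0 && ((a < 0) != (b < 0)))
		q--;
	return (q);
}

static int64_t
hyp_floormod(int64_t a, int64_t b)
{
	/* A nonzero result takes the sign of the divisor, as in Lua. */
	return (a - hyp_floordiv(a, b) * b);
}
```

For example, `hyp_floormod(-7, 3)` yields 2 (Lua's answer), where plain C `-7 % 3` yields -1.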

From looking at the configuration options for the packaged Lua interpreter, unfortunately I don't think we'd be able to just use binaries from a userland package with these modifications. As such, we've added the full source.

I'm open to suggestions if there's a better home for the interpreter code than uts/common/fs/zfs/lua, though.

For reference, here's the diff between the stock Lua 5.2.4 interpreter and the modified one we've included:
https://gist.github.com/cwill/9b71422008c8c08ff091faefdcc0bc42

ikozhukhov · 2016-09-30T19:48:12Z

I think that, given that the modified Lua is built as part of the kernel modules, it would be better to put the sources in:
uts/common/lang/lua/*, where we can preserve the original source structure for future updates.

-Igor


jeffpc · 2016-10-02T15:05:27Z

Doesn't this sort of violate the illumos-gate rule of (in general) not introducing APIs without consumers? I haven't seen this year's presentation about channel programs, but based on the previously presented info, one of the goals motivating this work was to allow simpler interactions between userspace tools and the kernel. To that end, I think it would make sense to attack some of the (uglier) parts of libzpool/libzfs/libzfs_core and have them make use of this API instead of the current mess of C. This would not only take care of the unused-API aspect, but it would also provide a set of excellent examples of how to use the API. (Yes, I realize that the tests provide minimal examples, but it'd be nice to see real-world ones as well.)

ahrens · 2016-10-02T17:37:02Z

We would like to make libzfs use channel programs where possible. We may be able to do that more with future functionality, like getting and setting properties.

However, this API is not without consumers. The snapshot deletion ioctl uses a channel program. And you can run a channel program from the CLI (with "zfs program"), so in that respect it is similar to other zfs functionality, with a libzfs(_core) API, a CLI subcommand, and tests which exercise the subcommand.

That said, we are concerned with adding any new functionality that may see minimal use for most consumers, which is why we've included copious new tests.

--matt


jclulow · 2016-10-02T19:23:55Z

The manual page suggests that "Channel programs may only be run with root privileges". Would it be better to introduce a specific privileges(5) privilege for this functionality?

How does this mechanism interact with delegated datasets? If we're expecting to re-do existing C-based functionality in terms of channel programs, that seems like an important consideration.

I took a look in usr/src/uts/common/fs/zfs/zcp.c and I couldn't see a design document (e.g., big theory statement comment) that really describes the rationale, or how this subsystem is expected to work or be used. If I've just missed where this is included, then I apologise!

Have you given any thought to the potentially huge new attack surface that a Turing-complete interpreted language adds to the kernel? At least with the traditional ioctl(2)-based interfaces, the arguments being passed to the kernel were essentially declarative -- making it somewhat easier to fuzz that interface and to reason about upper bounds on resource usage.

The manual page suggests that if a program runs longer than its timeout allows, it will be "stopped and an error will be returned". What happens to the actions that the program was able to complete before termination? Is everything rolled back?

If a program runs for 10 seconds, what impact does that have on ZFS I/O activity for the pool in question? Does it mean that, say, all synchronous writes (e.g. fsync(2) or open(2) with O_SYNC) will block until that program completes?

cwill · 2016-10-12T00:47:46Z

> The manual page suggests that "Channel programs may only be run with root privileges". Would it be better to introduce a specific privileges(5) privilege for this functionality?

The channel program ioctl currently uses zfs_secpolicy_config, so it has the same access as e.g. create. Given that currently we only allow a channel program to be run as root in the global zone, we haven't looked at much in the way of privileges for specific operations or datasets. Was there anything in particular you had in mind for how the privileges for a channel program would usefully differ from similar ioctls?

> How does this mechanism interact with delegated datasets? If we're expecting to re-do existing C-based functionality in terms of channel programs, that seems like an important consideration.

Nothing in that area right now, but it's on the slate as a future feature.

> I took a look in usr/src/uts/common/fs/zfs/zcp.c and I couldn't see a design document (e.g., big theory statement comment) that really describes the rationale, or how this subsystem is expected to work or be used. If I've just missed where this is included, then I apologise!

The first half of the zfs-program man page is intended to cover the high level overview of the general rationale and expected usage. More implementation-specific info is somewhat scattered through the comments in the source files, but there's not a summarizing header comment anywhere. Might be something to add.

> The manual page suggests that if a program runs longer than its timeout allows, it will be "stopped and an error will be returned". What happens to the actions that the program was able to complete before termination? Is everything rolled back?

Nope, anything executed before an error in the channel program stays executed. We've taken measures to prevent this causing problems - each effectful library function has a corresponding dry-run check function which can be used to make sure in advance that an operation will succeed. It's not foolproof (since the script could change system state between the checkfunc and syncfunc), but it's generally possible to specify whatever error handling behavior you want within the script itself.
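The check-then-sync pattern described here can be illustrated with a self-contained sketch. Everything in it is hypothetical: an array of booleans stands in for pool state, and `hyp_destroy_check()` / `hyp_destroy_sync()` play the roles of a dry-run check function and its effectful counterpart. The two driver functions show two error-handling policies a script might choose; neither is taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for pool state: which snapshots still exist. */
#define HYP_NSNAPS 4
static bool hyp_snap_exists[HYP_NSNAPS];

/* Dry-run check: would destroying snapshot i succeed right now? */
static int
hyp_destroy_check(int i)
{
	if (i < 0 || i >= HYP_NSNAPS || !hyp_snap_exists[i])
		return (ENOENT);
	return (0);
}

/* Effectful operation: re-validates, then applies the change. */
static int
hyp_destroy_sync(int i)
{
	int err = hyp_destroy_check(i);

	if (err == 0)
		hyp_snap_exists[i] = false;
	return (err);
}

/* Policy A: apply in order, stop at the first error, roll nothing back. */
static int
hyp_destroy_each(const int *idx, size_t n, size_t *done)
{
	assert(done != NULL);
	*done = 0;
	for (size_t k = 0; k < n; k++) {
		int err = hyp_destroy_sync(idx[k]);

		if (err != 0)
			return (err);
		(*done)++;
	}
	return (0);
}

/* Policy B: dry-run every operation first; apply only if all checks pass. */
static int
hyp_destroy_all_or_nothing(const int *idx, size_t n)
{
	for (size_t k = 0; k < n; k++) {
		int err = hyp_destroy_check(idx[k]);

		if (err != 0)
			return (err);
	}
	for (size_t k = 0; k < n; k++) {
		/* Can still fail if an earlier sync changed the state. */
		int err = hyp_destroy_sync(idx[k]);

		if (err != 0)
			return (err);
	}
	return (0);
}
```

Policy B is only as strong as the assumption that nothing changes between the check pass and the sync pass: passing the same snapshot index twice makes the second sync fail after every check succeeded, leaving the first destroy in place.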

> If a program runs for 10 seconds, what impact does that have on ZFS I/O activity for the pool in question? Does it mean that, say, all synchronous writes (e.g. fsync(2) or open(2) with O_SYNC) will block until that program completes?

We block only the transaction group syncing thread during channel program execution. A very long-running channel program executed repeatedly could cause sync writes to get throttled, but this only really happens when e.g. destroying 10,000 snapshots, which has the same effect anyway. (In practice, using channel programs tends to make these bulky operations complete much faster, letting the system move on rather than taking up time in every txg sync.)

ahrens · 2016-10-12T03:37:11Z

> The channel program ioctl currently uses zfs_secpolicy_config,

Which checks for the SYS_CONFIG privilege.

> so it has the same access as e.g. create.

Specifically, zpool create, and pretty much all other zpool subcommands (e.g. zpool add, zpool scrub). (zfs create checks SYS_MOUNT and can also be delegated with zfs allow.)

> Given that currently we only allow a channel program to be run as root in the global zone

More precisely, we only allow a channel program to be run with the SYS_CONFIG privilege in the global zone.

ahrens · 2016-10-14T03:20:37Z

@zettabot go

cwill · 2016-10-21T23:52:12Z

Looks like there was a commit since these tests were written that changed how certain un-set properties behave, causing a few failures.

Wrote up a fix, verifying it now.

ahrens · 2016-10-25T18:01:51Z

@zettabot go

jclulow · 2016-11-07T07:32:52Z

Circling back around to this one. Sorry it's taken so long -- I've been outrageously busy the last month or so.

> More implementation-specific info is somewhat scattered through the comments in the source files, but there's not a summarizing header comment anywhere. Might be something to add.

It'd be really great if we could get a big theory statement comment as part of this integration.

> but it's generally possible to specify whatever error handling behavior you want within the script itself.

It doesn't seem like the script could, itself, gracefully handle the failure mode of running longer than 10 seconds?

I'm still a bit uncertain about how to reason about the correctness of an arbitrary channel program in the face of a wall clock expiry time. What if this channel program is running within a virtual machine, and the virtual CPU upon which the channel program is executing is, itself, not scheduled for 9.99 of those 10 seconds? What if this consistently happens, in a heavily overloaded hypervisor environment?

I don't see anything addressing one of my questions from earlier:

> Have you given any thought to the potentially huge new attack surface that a Turing-complete interpreted language adds to the kernel? At least with the traditional ioctl(2)-based interfaces, the arguments being passed to the kernel were essentially declarative -- making it somewhat easier to fuzz that interface and to reason about upper bounds on resource usage.

I have also thought of a few more questions:

- If we're going to be writing Lua programs as embedded char *-style strings, what kind of static analysis and correctness checking can be part of the build process? Some kind of Lua lint analogue, perhaps? Should the programs be written in actual *.lua files, to then be included within the eventual object via elfwrap(1), rather than embedded in C strings?
- How is it expected that an engineer will debug these channel programs? Several different cases come to mind:
  - A system has panicked due to some fault, or at the discretion of an operator (e.g., via NMI). What tools exist to locate and inspect the Lua interpreter state within the resulting crash dump? Is there something we could include in an mdb/kmdb module that can unpick the state of a channel program either on the live system or post-mortem?
  - An engineer is trying to measure the run-time of particular operations within a channel program on a live system. Are there DTrace probes we can add to the Lua interpreter, or elsewhere, that would make it possible to collect precise timing information? Or to trace return codes from particular Lua functions?

cwill · 2016-11-07T21:19:27Z

> It doesn't seem like the script could, itself, gracefully handle the failure mode of running longer than 10 seconds?

> I'm still a bit uncertain about how to reason about the correctness of an arbitrary channel program in the face of a wall clock expiry time. What if this channel program is running within a virtual machine, and the virtual CPU upon which the channel program is executing is, itself, not scheduled for 9.99 of those 10 seconds? What if this consistently happens, in a heavily overloaded hypervisor environment?

You're right: a channel program can handle errors returned from ZFS but can't specify anything about what happens if there's an error with the script itself. In the case of timeouts, we have two ways of ensuring this won't cripple anything important:

- When invoked from kernel code (i.e. directly calling zcp_eval()), an unlimited time limit can be specified, with the expectation that such a script, being trusted as part of the kernel, will not use more time than it needs. dsl_destroy_snapshots_nvl() currently uses this.
- The maximum time and memory limits are tunables, so in practice, if a user application is running up against a time limit, the limit can be increased.

Hopefully, this should cover any use cases that are either larger scale or otherwise not tolerant of timeout failures.

> Have you given any thought to the potentially huge new attack surface that a Turing-complete interpreted language adds to the kernel? At least with the traditional ioctl(2)-based interfaces, the arguments being passed to the kernel were essentially declarative -- making it somewhat easier to fuzz that interface and to reason about upper bounds on resource usage.

As far as security goes, we're somewhat leaning on the fact that we require root in the global zone to run a channel program script, given that with those privileges anything a malicious script could do could also easily be accomplished with mdb.

With respect to resource usage and testing, this is an area for future work I've been looking into. The limits we have in place so far seem to do a pretty good job of preventing a ZCP script from tanking performance, but I've been considering adding some randomized testing for channel program scripts, possibly to ztest.
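On the memory side, one common way to cap a Lua interpreter's memory is the allocator passed to `lua_newstate()`, whose `lua_Alloc` contract is `void *(*)(void *ud, void *ptr, size_t osize, size_t nsize)`. The sketch below follows that contract without depending on Lua headers; the struct and function names are hypothetical, and whether the patch enforces its limit exactly this way is an assumption. Returning NULL for a growth request makes Lua raise a memory error instead of crashing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-program accounting state, passed as the allocator's ud. */
typedef struct hyp_alloc_state {
	size_t allocated;	/* bytes currently held by the interpreter */
	size_t limit;		/* hard cap for this program */
} hyp_alloc_state_t;

/*
 * Follows the lua_Alloc contract: nsize == 0 frees, otherwise it behaves
 * like realloc. In Lua 5.2, when ptr is NULL, osize encodes an object type
 * rather than a size, so it must not be counted as memory already held.
 */
static void *
hyp_limited_alloc(void *ud, void *ptr, size_t osize, size_t nsize)
{
	hyp_alloc_state_t *st = ud;
	size_t old = (ptr == NULL) ? 0 : osize;

	assert(st != NULL && st->allocated <= st->limit);
	if (nsize == 0) {
		free(ptr);
		st->allocated -= old;
		return (NULL);
	}
	/* Refuse growth past the cap; written to avoid overflow. */
	if (nsize > old && nsize - old > st->limit - st->allocated)
		return (NULL);
	void *p = realloc(ptr, nsize);
	if (p == NULL) {
		/* Lua assumes shrinking never fails: keep the old block. */
		return (nsize <= old ? ptr : NULL);
	}
	st->allocated = st->allocated - old + nsize;
	return (p);
}
```

A state created with such an allocator fails individual oversized allocations with a Lua memory error, which the caller can then turn into an ordinary error return.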

> If we're going to be writing Lua programs as embedded char *-style strings, what kind of static analysis and correctness checking can be part of the build process? Some kind of Lua lint analogue, perhaps? Should the programs be written in actual *.lua files, to then be included within the eventual object via elfwrap(1), rather than embedded in C strings?

Lua linters and checkers exist, but from what I've seen they're pretty limited in usefulness: they can check for undeclared variables, multiply-declared variables, and attempts to change constants, and not much else. It would probably be possible to move the scripts out to their own files, but given the lack of good static analysis, I'm not sure this would give any advantage over keeping each script with its related C code.

> How is it expected that an engineer will debug these channel programs? Several different cases come to mind:

I'm not sure how much additional info could be gathered with the help of an mdb module. As it stands, the Lua interpreter's state structure gives a reasonably good idea of what a running script is doing, so it's definitely possible at present to diagnose a crashed channel program, if finicky.

Function entry/exit probes in the zcp code and Lua interpreter have generally proved sufficient so far for live debugging, but I suspect there may be a number of places where adding static probes could be quite helpful. This would be a good addition, though I think it's minor enough to be added gradually and/or with future changes.

ahrens · 2016-11-17T18:40:25Z

@zettabot go

ahrens · 2016-12-08T17:58:18Z

@zettabot go


dankimmel · 2017-06-13T20:28:19Z

Closing this as it has been updated and reposted as #397.

cwill force-pushed the zcp-upstream branch 3 times, most recently from 615a8d5 to f646606 on September 30, 2016 19:23

lundman mentioned this pull request Oct 6, 2016

ZFS Channel Programs openzfsonosx/zfs#535

Closed

cwill force-pushed the zcp-upstream branch from f646606 to 5099cd5 on October 25, 2016 17:42

ahrens mentioned this pull request Nov 4, 2016

7200 7199 dsl_dataset_rollback_sync should not leave a dirty dataset #215

Closed

cwill force-pushed the zcp-upstream branch from 1966754 to d60ae35 on November 8, 2016 21:03

ahrens added the test pass label Nov 29, 2016

cwill force-pushed the zcp-upstream branch from d60ae35 to b63dd0e on November 30, 2016 01:01

cwill and others added 4 commits January 3, 2017 16:41

7431 ZFS Channel Programs

8f4db7a

DLPX-47966 channel programs don't handle oversized allocations gracefully

a10a17a

DLPX-47893 memory leak in zcp_synctask_promote()

7191656

DLPX-48332 zcp_synctask_wrapper() leaks error list on IO error

3ff8d7e

cwill added 2 commits January 3, 2017 16:41

Add header comment

9f5c4f5

add tee to test commands file

4a69daa

cwill force-pushed the zcp-upstream branch from b63dd0e to 4a69daa on January 4, 2017 00:44

ahrens mentioned this pull request Jan 31, 2017

OpenZFS 7247 - zfs receive of deduplicated stream fails openzfs/zfs#5689

Merged

dankimmel mentioned this pull request Jun 12, 2017

7431 ZFS Channel Programs [updated] #397

Closed

dankimmel closed this Jun 13, 2017

don-brady mentioned this pull request Aug 25, 2017

OpenZFS 7431 - ZFS Channel Programs openzfs/zfs#6558

Closed

