Skip to content
This repository was archived by the owner on Nov 7, 2019. It is now read-only.

Conversation

cwill
Copy link

@cwill cwill commented Sep 28, 2016

Reviewed by: Matt Ahrens mahrens@delphix.com ( @ahrens )
Reviewed by: George Wilson george.wilson@delphix.com ( @grwilson )
Reviewed by: John Kennedy john.kennedy@delphix.com ( @jwk404 )
Reviewed by: Chris Williamson chris.williamson@delphix.com ( @cwill )

ZFS channel programs (ZCP) adds support for performing compound ZFS administrative actions via Lua scripts in a sandboxed environment with time and memory limits.

Upstream bugs: DLPX-39221, DLPX-40120, DLPX-44957, DLPX-46641, DLPX-46247, DLPX-46672, DLPX-47073

@zettabot
Copy link

Can one of the admins verify this patch?

@ikozhukhov
Copy link

ikozhukhov commented Sep 28, 2016

question - if i have lua5.2 as userland package - can i use it for builds? or you import all lua5.2 sources to uts?
i'm not sure that uts with fs/zfs is good place for lua sources

@ghost
Copy link

ghost commented Sep 28, 2016

Agree with Igor, probably we should do the same as we did with ficl for loader project - install the binaries with -sys postifx.

@cwill
Copy link
Author

cwill commented Sep 30, 2016

It was necessary to slightly modify the base lua 5.2.4 interpreter for a couple reasons:

  • the need to disable a number of printing, file io, and pcall functions
  • error handling changes to allow channel programs to return errors rather than panicking the kernel
  • limited kernel stack space
  • math compatibility functions since we've changed the number representation from long double to int64_t
  • a handful of inconsistencies in expected standard library signatures

From looking at the configuration options for the packaged Lua interpreter, unfortunately I don't think we'd be able to just use binaries from a userland package with these modifications. As such we've added the full source.

I'm open to suggestions if there's a better home for the interpreter code than uts/common/fs/zfs/lua, though.

For reference, here's the diff between the stock Lua 5.2.4 interpreter and the modified one we've included:
https://gist.github.com/cwill/9b71422008c8c08ff091faefdcc0bc42

@cwill cwill force-pushed the zcp-upstream branch 3 times, most recently from 615a8d5 to f646606 Compare September 30, 2016 19:23
@ikozhukhov
Copy link

i think, based on info that current modified lua is part of kernel modules builds, will be better put sources to:
uts/common/lang/lua/* - where we can save original sources structure for next updates.

-Igor

On Sep 30, 2016, at 10:16 PM, Chris Williamson notifications@github.com wrote:

It was necessary to slightly modify the base lua 5.2.4 interpreter for a couple reasons:

the need to disable a number of printing, file io, and pcall functions
error handling changes to allow channel programs to return errors rather than panicking the kernel
limited kernel stack space
math compatibility functions since we've changed the number representation from long double to int64_t
a handful of inconsistencies in expected standard library signatures
From looking at the configuration options for the packaged Lua interpreter, unfortunately I don't think we'd be able to just use binaries from a userland package. As such we've added the full source.

I'm open to suggestions if there's a better home for the interpreter code than uts/common/fs/zfs/lua, though.

For reference, here's the diff between the stock Lua 5.2.4 interpreter and the modified one we've included:
https://gist.github.com/cwill/9b71422008c8c08ff091faefdcc0bc42 https://gist.github.com/cwill/9b71422008c8c08ff091faefdcc0bc42

You are receiving this because you commented.
Reply to this email directly, view it on GitHub #198 (comment), or mute the thread https://github.com/notifications/unsubscribe-auth/AA5Gk15JDTKJVautHBZus9cVgrrS-INSks5qvWAfgaJpZM4KISTa.

@jeffpc
Copy link

jeffpc commented Oct 2, 2016

Doesn't this sort of violate the illumos-gate rule of (in general) not introducing APIs without consumers? I haven't seen this year's presentation about channel programs, but based on the previously presented info, one of the goals motivating this work was to allow simpler interactions between userspace tools and kernel. To that end, I think it would make sense to attack some of the (uglier) parts of libzpool/libzfs/libzfs_core and have them make use of this API instead of the current mess of C. This would not only take care of the unused-API aspect, but it would also provide a set of excellent examples how to use the API. (Yes, I realize that the tests provide minimal examples, but it'd be nice to see real world ones as well.)

@ahrens
Copy link
Member

ahrens commented Oct 2, 2016

We would like to make libzfs use channel programs where possible. We may be
able to do that more with future functionality, like getting and setting
properties.

However, this API is not without consumers. The snapshot deletion ioctl
uses a channel program. And, you can run a channel program from the cli
(with "zfs program"), so in that respect it is similar to other zfs
functionality, with a libzfs(_core) API, a cli subcommand, and tests which
exercise the sub command.

That said, we are concerned with adding any new functionality that may see
minimal use for most consumers, which is why we've included copious new
tests.

--matt

On Sunday, October 2, 2016, jeffpc notifications@github.com wrote:

Doesn't this sort of violate the illumos-gate rule of (in general) not
introducing APIs without consumers? I haven't seen this year's presentation
about channel programs, but based on the previously presented info, one of
the goals motivating this work was to allow simpler interactions between
userspace tools and kernel. To that end, I think it would make sense to
attack some of the (uglier) parts of libzpool/libzfs/libzfs_core and have
them make use of this API instead of the current mess of C. This would not
only take care of the unused-API aspect, but it would also provide a set of
excellent examples how to use the API. (Yes, I realize that the tests
provide minimal examples, but it'd be nice to see real world ones as well.)


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#198 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAwxlNzlfJaDwOyOtjv0BiqehPeGAkDIks5qv8g3gaJpZM4KISTa
.

@jclulow
Copy link

jclulow commented Oct 2, 2016

The manual page suggests that "Channel programs may only be run with root privileges". Would it be better to introduce a specific privileges(5) privilege for this functionality?

How does this mechanism interact with delegated datasets? If we're expecting to re-do existing C-based functionality in terms of channel programs, that seems like an important consideration.

I took a look in usr/src/uts/common/fs/zfs/zcp.c and I couldn't see a design document (e.g., big theory statement comment) that really describes the rationale, or how this subsystem is expected to work or be used. If I've just missed where this is included, then I apologise!

Have you given any thought to the potentially huge new attack surface that a turing complete interpreted language adds to the kernel? At least with the traditional ioctl(2)-based interfaces, the arguments being passed to the kernel were essentially declarative -- making it somewhat easier to fuzz that interface and to reason about upper bounds on resource usage.

The manual page suggests that if a program runs longer than its timeout allows, it will be "stopped and an error will be returned". What happens to the actions that the program was able to complete before termination? Is everything rolled back?

If a program runs for 10 seconds, what impact does that have on ZFS I/O activity for the pool in question? Does it mean that, say, all synchronous writes (e.g. fsync(2) or open(2) with O_SYNC) will block until that program completes?

@cwill
Copy link
Author

cwill commented Oct 12, 2016

The manual page suggests that "Channel programs may only be run with root privileges". Would it be better to introduce a specific privileges(5) privilege for this functionality?

The channel program ioctl currently uses zfs_secpolicy_config, so it has the same access as e.g. create. Given that currently we only allow a channel program to be run as root in the global zone, we haven't looked at much in the way of privileges for specific operations or datasets. Was there anything in particular you had in mind for how the privileges for a channel program would usefully differ from similar ioctls?

How does this mechanism interact with delegated datasets? If we're expecting to re-do existing C-based functionality in terms of channel programs, that seems like an important consideration.

Nothing in that area right now, but on the slate as a future feature.

I took a look in usr/src/uts/common/fs/zfs/zcp.c and I couldn't see a design document (e.g., big theory statement comment) that really describes the rationale, or how this subsystem is expected to work or be used. If I've just missed where this is included, then I apologise!

The first half of the zfs-program man page is intended to cover the high level overview of the general rationale and expected usage. More implementation-specific info is somewhat scattered through the comments in the source files, but there's not a summarizing header comment anywhere. Might be something to add.

The manual page suggests that if a program runs longer than its timeout allows, it will be "stopped and an error will be returned". What happens to the actions that the program was able to complete before termination? Is everything rolled back?

Nope, anything executed before an error in the channel program stays executed. We've taken measures to prevent this causing problems - each effectful library function has a corresponding dry-run check function which can be used to make sure in advance that an operation will succeed. It's not foolproof (since the script could change system state between the checkfunc and syncfunc), but it's generally possible to specify whatever error handling behavior you want within the script itself.

If a program runs for 10 seconds, what impact does that have on ZFS I/O activity for the pool in question? Does it mean that, say, all synchronous writes (e.g. fsync(2) or open(2) with O_SYNC) will block until that program completes?

We block only the transaction group syncing thread with the channel program execution. A very long-running channel program executed repeatedly could cause sync writes to get throttled, but this only really happens when e.g. destroying 10,000 snapshots, which has the same effect anyway. (in practice, using channel programs tends to have the effect of making these bulky operations complete much faster and let the system move on rather than taking up time in every txg sync).

@ahrens
Copy link
Member

ahrens commented Oct 12, 2016

The channel program ioctl currently uses zfs_secpolicy_config,

Which checks for the SYS_CONFIG privilege.

so it has the same access as e.g. create.

Specifically, zpool create, and pretty much all other zpool subcommands (e.g. zpool add, zpool scrub). (zfs create checks SYS_MOUNT and can also be delegated with zfs allow.)

Given that currently we only allow a channel program to be run as root in the global zone

More precisely, we only allow a channel program to be run with the SYS_CONFIG privilege in the global zone.

@ahrens
Copy link
Member

ahrens commented Oct 14, 2016

@zettabot go

@cwill
Copy link
Author

cwill commented Oct 21, 2016

Looks like there was a commit since these tests were written that changed how certain un-set properties behave, causing a few failures.

Wrote up a fix, verifying it now.

@ahrens
Copy link
Member

ahrens commented Oct 25, 2016

@zettabot go

@jclulow
Copy link

jclulow commented Nov 7, 2016

Circling back around to this one. Sorry it's taken so long -- I've been outrageously busy the last month or so.

More implementation-specific info is somewhat scattered through the comments in the source files, but there's not a summarizing header comment anywhere. Might be something to add.

It'd be really great if we could get a big theory statement comment as part of this integration.

but it's generally possible to specify whatever error handling behavior you want within the script itself.

It doesn't seem like the script could, itself, gracefully handle the failure mode of running longer than 10 seconds?

I'm still a bit uncertain about how to reason about the correctness of an arbitrary channel program in the face of a wall clock expiry time. What if this channel program is running within a virtual machine, and the virtual CPU upon which the channel program is executing is, itself, not scheduled for 9.99 of those 10 seconds? What if this consistently happens, in a heavily overloaded hypervisor environment?

I don't see anything addressing one of my questions from earlier:

Have you given any thought to the potentially huge new attack surface that a turing complete interpreted language adds to the kernel? At least with the traditional ioctl(2)-based interfaces, the arguments being passed to the kernel were essentially declarative -- making it somewhat easier to fuzz that interface and to reason about upper bounds on resource usage.

I have also thought of a few more questions:

  • If we're going to be writing Lua programs as embedded char *-style strings, what kind of static analysis and correctness checking can be part of the build process? Some kind of Lua lint analogue, perhaps? Should the programs be written in actual *.lua files, to then be included within the eventual object via elfwrap(1), rather than embedded in C strings?
  • How is it expected that an engineer will debug these channel programs? Several different cases come to mind:
    • A system has panicked due to a some fault, or due to the discretion of an operator (e.g., via NMI). What tools exist to locate and inspect the Lua interpreter state within the resulting crash dump? Is there something we could include in an mdb/kmdb module that can unpick the state of a channel program either on the live system or post-mortem?
    • An engineer is trying to measure the run-time of particular operations within a channel program on a live system. Are there DTrace probes we can add to the Lua interpreter, or elsewhere, that would make it possible to collect precise timing information? Or to trace return codes from particular Lua functions?

@cwill
Copy link
Author

cwill commented Nov 7, 2016

It doesn't seem like the script could, itself, gracefully handle the failure mode of running longer than 10 seconds?

I'm still a bit uncertain about how to reason about the correctness of an arbitrary channel program in the face of a wall clock expiry time. What if this channel program is running within a virtual machine, and the virtual CPU upon which the channel program is executing is, itself, not scheduled for 9.99 of those 10 seconds? What if this consistently happens, in a heavily overloaded hypervisor environment?

You're right - a channel program can handle errors returned from ZFS but can't specify anything about what happens if there's an error with the script itself. In the case of timeouts, we have 2 ways in place of ensuring this won't cripple anything important:

  1. When invoked from kernel code (i.e. directly calling zcp_eval()), an unlimited time limit can be specified, with the expectation that such a script is trusted as part of the kernel will not use more time than it needs. dsl_destroy_snapshots_nvl() currently uses this.
  2. The maximum time and memory limits are tunables, so in practice if a user application is running up against a time limit it can be increased.

Hopefully, this should cover any use cases that are either larger scale or otherwise not tolerant of timeout failures.

Have you given any thought to the potentially huge new attack surface that a turing complete interpreted language adds to the kernel? At least with the traditional ioctl(2)-based interfaces, the arguments being passed to the kernel were essentially declarative -- making it somewhat easier to fuzz that interface and to reason about upper bounds on resource usage.

As far as security goes, we're somewhat leaning on the fact that we require root in the global zone to run a channel program script, given that with those privileges anything a malicious script could do could also easily be accomplished with mdb.

With respect to resource usage and testing, this is an area for future work I've been looking into. The limits we have in place so far seem to do a pretty good job of preventing a ZCP script from tanking performance, but I've been considering adding some randomized testing for channel program scripts, possibly to ztest.

If we're going to be writing Lua programs as embedded char *-style strings, what kind of static analysis and correctness checking can be part of the build process? Some kind of Lua lint analogue, perhaps? Should the programs be written in actual *.lua files, to then be included within the eventual object via elfwrap(1), rather than embedded in C strings?

Lua linting and checkers exist, but from what I've seen are pretty limited in usefulness - they're able to check for {undeclared, multiply-declared, attempting to change constant} variables, and not much else. It would probably be possible to move the scripts themselves out to their own files, but I'm not sure this would give any advantage over having the script with the related C code, given the lack of good static analysis.

How is it expected that an engineer will debug these channel programs? Several different cases come to mind:

I'm not sure how much additional info would be able to be gathered with the help of an mdb module. As it stands, the lua interpreter's state structure gives a reasonably good idea of what a running script is doing, so it's definitely possible at present to diagnose a crashed channel program, if finnicky.

Function entry/exit probes in the zcp code and lua interpreter have proved generally sufficient so far for live debugging, but I suspect there may be a number places where adding static probes could be quite helpful. This would be a good addition, though I think it's minor enough to be added gradually and/or with future changes.

@ahrens
Copy link
Member

ahrens commented Nov 17, 2016

@zettabot go

@ahrens
Copy link
Member

ahrens commented Dec 8, 2016

@zettabot go

@dankimmel
Copy link

Closing this as it has been updated and reposted as #397.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants