C coded generators (explodes, readfile, _popenr) #1005

Closed
wants to merge 6 commits

nicowilliams
Contributor

NOTE: NOT DONE YET.

This is just a proof of concept. I'm able to read files and read from commands.

TODO:

  • DONE: Implement @stedolan's comments regarding C-coded generators.
  • Make a nice jq-coded wrapper around _popenr
  • NOT NOW: Add an argument array input version of _popenr (an arg-vector form should use posix_spawn(), or spawn() on Windows, and bypass the shell)
  • Figure out how to handle raw vs. parsed vs. streamed, slurped vs. not.
    • A separate builtin for each case? (that's six cases, so six file reading builtins, six popen reading builtins, ...)
    • Or one that takes an input describing what's desired?
    • Or always read a line at a time, raw, and output strings that include the newline (so that a final line without a newline can be detected)? The caller could slurp and/or parse as desired. But what if the file has no newlines and is huge? Incremental parsing seems like a much better option.
  • Implement privilege management design discussed earlier
    • A command-line option for enabling privileges for specific modules
    • Unprivileged versions of readfile and _popenr that raise an error
  • And, of course, increase test coverage (tests are missing; it's early days)
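The arg-vector variant of _popenr from the list above could look roughly like this on POSIX systems. This is a minimal sketch, not code from this PR. The helper names spawn_read and spawn_close are hypothetical. Only posix_spawnp() is covered (not Windows spawn()), and the program is looked up via PATH. Because no shell parses the arguments, a metacharacter like ; is passed through literally.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <spawn.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run argv[0] (looked up in PATH) with argv as its
 * argument vector. No shell is involved, so shell metacharacters in the
 * arguments are inert. Returns a stream reading the child's stdout. */
static FILE *spawn_read(char *const argv[], pid_t *pidp)
{
    int fds[2], rc;
    posix_spawn_file_actions_t fa;
    FILE *fp;

    assert(argv != NULL && argv[0] != NULL && pidp != NULL);
    if (pipe(fds) == -1)
        return NULL;
    if (posix_spawn_file_actions_init(&fa) != 0) {
        close(fds[0]);
        close(fds[1]);
        return NULL;
    }
    /* In the child: stdout becomes the pipe's write end; close the originals. */
    posix_spawn_file_actions_adddup2(&fa, fds[1], STDOUT_FILENO);
    posix_spawn_file_actions_addclose(&fa, fds[0]);
    if (fds[1] != STDOUT_FILENO)
        posix_spawn_file_actions_addclose(&fa, fds[1]);
    rc = posix_spawnp(pidp, argv[0], &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    close(fds[1]);                      /* parent keeps only the read end */
    if (rc != 0) {
        close(fds[0]);
        return NULL;
    }
    if ((fp = fdopen(fds[0], "r")) == NULL) {
        close(fds[0]);
        waitpid(*pidp, NULL, 0);
    }
    return fp;
}

/* Close the stream and reap the child; returns its exit status, or -1. */
static int spawn_close(FILE *fp, pid_t pid)
{
    int status;

    fclose(fp);
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Reaping the child in spawn_close matters. A generator that is reset early still has to close the stream and wait, or it leaves a zombie process behind.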

More importantly, how do we handle something like writefile(EXP) or popen(EXP)? Even C-coded generators can't call jq expressions, so that means having file handles. We can avoid those for readfile. We've discussed how to do file handles in a way that doesn't leak handles out, doesn't allow callers to slip in an incorrect handle, and doesn't have a dangling reference problem. The approach is to make the C-coded functions that open/close handles and read from/write to them available only to privileged jq-coded functions, which use them in a way that never leaks them:

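def writefile(exp):   # openfile_write, filehandle_write/1, filehandle_close: hypothetical privileged builtins
    openfile_write as $handle
    | (try (exp | filehandle_write($handle)) catch (. as $e | ($handle | filehandle_close | empty), error($e))),
      ($handle | filehandle_close | empty);   # close on success; on error close, then re-raise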
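The "no leaked, forged, or dangling handles" requirement can also be enforced underneath jq, in the C layer. The sketch below is minimal and assumes handles reach jq code as plain integers. Each integer packs a slot index and a generation counter, and every use checks both, so a stale or made-up number is rejected instead of reaching a freed FILE *. The names handle_open, handle_get and handle_close are hypothetical, not existing jq builtins.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_HANDLES 16        /* must stay <= 256: the index uses 8 bits */

/* A slot owns one open stream. Its generation is bumped on close, so an
 * old handle value can never match a later occupant of the same slot. */
struct slot {
    FILE *fp;                 /* NULL when the slot is free */
    uint32_t gen;
};

static struct slot table[MAX_HANDLES];

/* Handle value = generation << 8 | slot index; -1 means "table full". */
static int64_t handle_open(FILE *fp)
{
    assert(fp != NULL);
    for (uint32_t i = 0; i < MAX_HANDLES; i++) {
        if (table[i].fp == NULL) {
            table[i].fp = fp;
            return ((int64_t)table[i].gen << 8) | i;
        }
    }
    return -1;
}

/* Returns the stream only for a live slot with a matching generation, so
 * forged, stale, and already-closed handles are all rejected. */
static FILE *handle_get(int64_t h)
{
    if (h < 0 || (h >> 8) > UINT32_MAX)
        return NULL;
    uint32_t i = (uint32_t)(h & 0xff);
    uint32_t gen = (uint32_t)(h >> 8);
    if (i >= MAX_HANDLES || table[i].fp == NULL || table[i].gen != gen)
        return NULL;
    return table[i].fp;
}

/* Closes the stream and invalidates every outstanding copy of the handle. */
static int handle_close(int64_t h)
{
    FILE *fp = handle_get(h);
    if (fp == NULL)
        return -1;
    table[h & 0xff].fp = NULL;
    table[h & 0xff].gen++;
    return fclose(fp);
}
```

Reusing a slot after close yields a different handle value, so a jq caller holding on to the old one gets an error rather than someone else's file.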

((Food for thought: some day it would be nice to be able to write C-coded functions that take closure arguments. I imagine setting up a trampoline in the VM so that a C-coded function can call a closure argument. To avoid filling the C call stack, the C-coded function would have to return immediately and get called again with the closure's output(s), keeping its state in a structure pointed to by its jq VM frame. Calling closures in separate VMs could also be done, but letting them access their captured variables would be tricky. With such an extension there'd be no need for handles in simple I/O builtins.))

We'll still need handles, and ones that are exposed to jq callers, for co-routines, so we might want to bite this bullet sooner rather than later. I have an old branch with file handle support for the jq VM that I could revive. Still, for simple I/O builtins, we don't need to expose a concept of handles, and that seems appealing.

@nicowilliams
Contributor Author

And fix the Travis CI build, natch.

@nicowilliams
Contributor Author

@svnpenn If you want to kick the tires on this, go for it.

$ TZ= ./jq -nr '"date"|_popenr'
Wed Oct 28 02:13:31 UTC 2015
$ ./jq -nr '"locale"|first(_popenr)'
LANG=en_US.UTF-8
$ 

@nicowilliams
Contributor Author

The privilege model where we assign privileges to particular modules and/or the top-level program is not perfect.

Since jq function argument closures take no arguments, they can't be used to wrap around functions like writefile(EXP). We could add syntax and machinery for passing closures that take N>0 argument closures -- something I've wanted before -- but that seems like a lot of work.

A better privilege model would allow us to have modules that perform authorization based on the name of module(s) in the call stack, a la Java's model.

@nicowilliams
Contributor Author

Then we'd need to distinguish trusted modules from not, so we'd need something just like Java policies, probably a JSON file to load at runtime.

Can we grow the privilege model slowly, so we don't have to develop the whole thing at once?

Here's a thought:

$ # let the main program have privs
$ jq -P main ...

$ # let all modules have privs
$ jq -P all ...

$ # let specific modules have privs
$ jq -P foo/bar.jq -P foo/fooz.jq ...

$ # let all trusted modules have privs granted by policy
$ jq -P trusted ...

And then we can add the first and second usages now, the third a bit later, and the last one later still.
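Here is one way the first three -P forms might be recorded and checked. The -P option is only a proposal at this point, so struct privs, privs_add and privs_allow are hypothetical names. Two further assumptions: -P all is taken to cover the main program as well as every module, and -P trusted is only recorded here, with no policy lookup implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_PRIV_MODULES 32

/* Hypothetical privilege settings collected from repeated -P options. */
struct privs {
    bool main;                               /* -P main */
    bool all;                                /* -P all (assumed to include main) */
    bool trusted;                            /* -P trusted: policy file, not sketched */
    const char *modules[MAX_PRIV_MODULES];   /* -P foo/bar.jq ... */
    int nmodules;
};

/* Apply one -P argument; returns 0 on success, -1 if too many modules. */
static int privs_add(struct privs *p, const char *arg)
{
    assert(p != NULL && arg != NULL);
    if (strcmp(arg, "main") == 0)
        p->main = true;
    else if (strcmp(arg, "all") == 0)
        p->all = true;
    else if (strcmp(arg, "trusted") == 0)
        p->trusted = true;
    else if (p->nmodules < MAX_PRIV_MODULES)
        p->modules[p->nmodules++] = arg;
    else
        return -1;
    return 0;
}

/* May code from `module` (NULL = the main program) use I/O builtins? */
static bool privs_allow(const struct privs *p, const char *module)
{
    if (p->all)
        return true;
    if (module == NULL)
        return p->main;
    for (int i = 0; i < p->nmodules; i++)
        if (strcmp(p->modules[i], module) == 0)
            return true;
    return false;
}
```

Staging the forms this way keeps the default (no -P) fully sandboxed, and later forms only add cases to privs_allow.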

For the last one we'd have a local .json file indicating which modules get what privs, and we'd have a builtin, callers, that outputs an array of the module names in the call stack up to the frame of the caller of callers. A trusted module could then perform authorization based on local policy, again a la Java.
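The check such a trusted module would perform resembles Java's stack inspection: allow the operation only if every module on the call stack is trusted by local policy. The sketch below is minimal and assumes the hypothetical callers builtin yields module names, and that the policy file has already been loaded into an array of trusted names. Java's privileged frames, which cut the stack walk short, are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Is `name` listed in the (already loaded) local policy? */
static bool is_trusted(const char *name, const char *const trusted[], size_t ntrusted)
{
    for (size_t i = 0; i < ntrusted; i++)
        if (strcmp(name, trusted[i]) == 0)
            return true;
    return false;
}

/* Stack inspection: allowed only if every module reported on the call
 * stack is trusted; a single untrusted caller anywhere denies access. */
static bool authorize(const char *const callers[], size_t ncallers,
                      const char *const trusted[], size_t ntrusted)
{
    assert(ncallers == 0 || callers != NULL);
    for (size_t i = 0; i < ncallers; i++)
        if (!is_trusted(callers[i], trusted, ntrusted))
            return false;
    return true;
}
```

This is exactly where the usability problem described below shows up. The check is simple, but it is only as good as the trusted list someone has to write and maintain.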

Now, the Java model is quite good, but it falls down because developers pay too little attention to authorization and local policies are complex to write. So, it's not exactly a great model. In practice we've ended up with people running with all classes as trusted and never downloading code -- the applet model failed. So perhaps we should NOT copy Java, and we should stop at the first two or three usages above.

I'm all ears as to better privilege models.

@nicowilliams
Contributor Author

@svnpenn Let me rephrase for users.

Should users have to "bless" trusted modules manually, or even grant them specific privileges, and if so, should this be done only on the command-line, at module installation time, or either at the choice of the user?

I'm of the opinion that most users won't really be in a position to manage fine-grained code trust. For most users it will be all-or-nothing. I can live with that, but I suppose it'd be nice to be able to limit trust when installing random modules from random repositories.

Think of things like jqplay.org. It should allow you to run jq programs, but not to read/write files or run programs on its Heroku host. For jqplay a big switch is sufficient.

For scripts with a she-bang we may need to be able to request privilege in the she-bang invocation, but the user should still be able to refuse it. If we ever support jq -f foo.jq ... with jq reading options from the she-bang in foo.jq, then we'll need to be careful not to grant the script privilege if the user doesn't own it, and we'll need to make sure that this doesn't break jqplay.org. There are a few things like that to consider.

From the point of view of generality, I like the Java model, but from the point of view of usability, I think it's just best to KISS: one simple switch granting privilege to the entire program and the modules it uses.

@nicowilliams
Contributor Author

@svnpenn

Suppose you download and install a jq program/module from some repository. Now you want to run it, but you don't want it to do entirely arbitrary things, like run "rm -rf /", add your host to a botnet, or other destructive things. Without I/O and shell-out builtins, jq is just a sandboxed filter, and the most harm it can do is consume memory and CPU. That's not nothing, but it's a far less severe problem than being able to run arbitrary commands, don't you think?

Remember, various contributors really want to build a package repository ecosystem for jq. I like the idea myself.

The easiest way to manage privilege then is to let the user decide whether to grant it to the whole jq program, modules and all, in each run (jq -P ...). Anything beyond that seems cumbersome. At the limit we'd have something like the Java model, which requires so much care that it's failed for Java, so it's bound to fail for jq. I'm tempted to go with the all-or-nothing -P option to enable I/O and shell-out globally.

@nicowilliams
Contributor Author

The C-coded generators now consist of three functions: one to set up the state and return the first value, a step function to return the next value, and a reset function to clean up the generator's state. The step function is new.

This saves a few lines of code in src/execute.c. It's cleaner too. But it does cost more in src/builtin.c. In particular, I made the stepper take a void *, not void **, and this feels wrong: it means that the _popenr builtin now has to allocate a state structure, and can't just use a FILE * as its state. I'll probably make the stepper take a void ** to avoid this. EDIT: Done.
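Here is a minimal sketch of the three-function shape described above. The stepper takes void **, so a line-reading popen generator can use the FILE * itself as its whole state. The names (struct gen_ops, popenr_start/step/reset) are hypothetical, and malloc'd C strings stand in for jq's jv values, so this is not the code in src/builtin.c. Each output keeps its trailing newline, so a final line without one stays distinguishable, as suggested in the TODO list.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical generator interface. Values are malloc'd strings standing
 * in for jv; NULL means "no more outputs" (i.e. backtrack). */
struct gen_ops {
    char *(*start)(void **state, const char *input); /* init state, first value */
    char *(*step)(void **state);                     /* next value */
    void (*reset)(void **state);                     /* release state */
};

/* With void **, the FILE * returned by popen() is the entire state. */
static char *popenr_step(void **state)
{
    char *line = NULL;
    size_t cap = 0;

    assert(state != NULL && *state != NULL);
    if (getline(&line, &cap, (FILE *)*state) == -1) {
        free(line);
        return NULL;                 /* EOF: generator is exhausted */
    }
    return line;                     /* keeps the trailing newline, if any */
}

static char *popenr_start(void **state, const char *cmd)
{
    *state = popen(cmd, "r");
    return *state != NULL ? popenr_step(state) : NULL;
}

static void popenr_reset(void **state)
{
    if (*state != NULL)
        pclose((FILE *)*state);
    *state = NULL;
}

static const struct gen_ops popenr_gen = {
    popenr_start, popenr_step, popenr_reset
};
```

With a void * stepper, the same generator would need a heap-allocated wrapper struct just to hold the FILE *, which is the extra allocation the comment above objects to.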

@nicowilliams
Contributor Author

Oy, I screwed up my history. EDIT: Fixed.

@nicowilliams
Contributor Author

A trampoline for C-coding jq functions that take and call closure arguments won't be that hard, and could be quite useful. It will require two new opcodes: one to enter the C-coded function, and one to re-enter it when a closure it "calls" outputs a value. That way the outputs of the closures can be made available to the C-coded function. A start at such a thing can be seen in the c-coded-generics branch of my clone. (Among other things, the C-coded generics could and would take arbitrary numbers of closure arguments.)
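A toy model of the enter/re-enter protocol shows why the C stack doesn't grow. This is a hypothetical sketch, not jq's VM. A closure is modeled as a generator over an array, the two opcodes are modeled by the driver loop in run(), and the example C-coded function running_sum keeps its state in a struct owned by the driver, standing in for the VM frame. The function never calls the closure itself. It returns an action and is re-entered with the closure's next output.

```c
#include <assert.h>
#include <stddef.h>

enum entry { ENTER, REENTER_VALUE, REENTER_EXHAUSTED };  /* how the VM calls in */
enum action { ACT_CALL_CLOSURE, ACT_OUTPUT, ACT_DONE };  /* what the C code asks for */

struct result {
    enum action act;
    int value;                  /* meaningful only for ACT_OUTPUT */
};

/* A closure argument, modeled as a generator over a fixed array. */
struct closure {
    const int *vals;
    size_t n, pos;
};

static int closure_next(struct closure *c, int *out)
{
    if (c->pos == c->n)
        return 0;               /* closure backtracked with no more outputs */
    *out = c->vals[c->pos++];
    return 1;
}

/* Example C-coded function: outputs running totals of its closure's outputs. */
struct sum_state { int total; };

static struct result running_sum(struct sum_state *st, enum entry how, int v)
{
    switch (how) {
    case ENTER:
        st->total = 0;
        return (struct result){ ACT_CALL_CLOSURE, 0 };
    case REENTER_VALUE:
        st->total += v;
        return (struct result){ ACT_OUTPUT, st->total };
    case REENTER_EXHAUSTED:
    default:
        return (struct result){ ACT_DONE, 0 };
    }
}

/* Driver standing in for the two opcodes. It assumes the function wants
 * the closure's next output after every ACT_OUTPUT or ACT_CALL_CLOSURE. */
static size_t run(struct closure *arg, int *outs, size_t max)
{
    struct sum_state st;
    size_t nout = 0;
    int v;

    assert(arg != NULL && outs != NULL);
    struct result r = running_sum(&st, ENTER, 0);       /* "enter" opcode */
    while (r.act != ACT_DONE) {
        if (r.act == ACT_OUTPUT && nout < max)
            outs[nout++] = r.value;
        if (closure_next(arg, &v))                      /* "re-enter" opcode */
            r = running_sum(&st, REENTER_VALUE, v);
        else
            r = running_sum(&st, REENTER_EXHAUSTED, 0);
    }
    return nout;
}
```

For closure outputs 1, 2, 3 the driver collects 1, 3, 6, and running_sum is never more than one C frame deep.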

The idea is to make it possible to implement parse(readfileraw) with the existing C-coded JSON incremental parser. A C-coded parse/1 would invoke its argument closure to get a block of text to parse, update the parser, output any values until more input is needed, and then backtrack to the argument closure to get the next chunk of text to parse.

The same approach would allow for a C-coded writefile(EXP) that writes the outputs of EXP to a file.

Both could be implemented with something like file handles, of course.

Note that we could even implement co-routines with this C-coded generic function scheme, without handles. A C-coded coroutine(EXP; EXP; ..; EXP) would create a VM for each closure argument and would accept outputs from those expressions in a form that lets an expression indicate that a value should be passed as input to another co-routine ({call:"exp0", value:...}).

EDIT: Symbolic names for coroutines ("handles") would be best though -- more user-friendly. Passing a value to a co-routine would be something like cocall($foo_coroutine); creating a co-routine would be something like coroutine(EXP) as $foo_coroutine | ....

@nicowilliams modified the milestones: 1.7 release, 1.6 release (Feb 26, 2017)
@norcalli

How about a variant, popen/1, where the input is sent to the command's stdin like a normal pipe? For example: jq 'strings | popen("xml2json")'.
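A popen/1 like this needs to feed its input to the command's stdin and collect the command's stdout. The sketch below is minimal and POSIX-only. It assumes a single string input and collects all output before returning; a generator version would read incrementally instead. The name popen_with_input is hypothetical. Staging the input in a temporary file is one design choice for avoiding the classic deadlock, where the parent blocks writing one pipe while the child blocks writing the other. A poll()-based loop would be the streaming alternative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <spawn.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Run `cmd` with /bin/sh, feeding `input` on its stdin, and return all of
 * its stdout as a malloc'd string (NULL on error). */
static char *popen_with_input(const char *cmd, const char *input)
{
    FILE *in = tmpfile(), *rd, *mem;
    int out[2], rc, c;
    pid_t pid;
    char *buf = NULL;
    size_t len = 0;
    posix_spawn_file_actions_t fa;
    char *argv[] = { "sh", "-c", (char *)cmd, NULL };

    assert(cmd != NULL && input != NULL);
    if (in == NULL)
        return NULL;
    /* Write the whole input first, then rewind the fd the child will read. */
    if (fputs(input, in) == EOF || fflush(in) != 0 ||
        lseek(fileno(in), 0, SEEK_SET) == -1 || pipe(out) == -1) {
        fclose(in);
        return NULL;
    }
    posix_spawn_file_actions_init(&fa);
    posix_spawn_file_actions_adddup2(&fa, fileno(in), STDIN_FILENO);
    posix_spawn_file_actions_adddup2(&fa, out[1], STDOUT_FILENO);
    posix_spawn_file_actions_addclose(&fa, out[0]);
    if (out[1] != STDOUT_FILENO)
        posix_spawn_file_actions_addclose(&fa, out[1]);
    rc = posix_spawn(&pid, "/bin/sh", &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    fclose(in);
    close(out[1]);
    if (rc != 0) {
        close(out[0]);
        return NULL;
    }
    /* Copy everything the child writes into a growable memory stream. */
    if ((rd = fdopen(out[0], "r")) == NULL) {
        close(out[0]);
    } else {
        if ((mem = open_memstream(&buf, &len)) != NULL) {
            while ((c = getc(rd)) != EOF)
                putc(c, mem);
            fclose(mem);
        }
        fclose(rd);             /* closing first lets a blocked child exit */
    }
    waitpid(pid, NULL, 0);
    return buf;
}
```

For instance, popen_with_input("tr a-z A-Z", "hello\n") should return "HELLO\n", mirroring what "hello" | popen("tr a-z A-Z") would produce under this proposal.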

@Pipeliner

I would really like to see something like this. Is there any ETA?

@nicowilliams
Contributor Author

Closing this in preference to #1843.
