Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import callback should separate relative path conversion from returning file content #239

Open
fare opened this issue Oct 4, 2016 · 6 comments

Comments

@fare
Copy link

fare commented Oct 4, 2016

When you import a same file twice, the evaluation results are not cached.

Worse: the callback API for import does not distinguish (1) a file lookup step, (2) a file read step and (3) a code evaluation step. Therefore, a second import of the same file will have the callback return new file contents, which makes either cache or reevaluation cumbersome.

For compatibility with existing API, a SHA checksum for the contents could serve as the key to cache evaluation results.

In a newer API, the import function would be broken in three steps, and each could be cached separately, so that (1) file lookup returns an absolute file path or URL from the (probably relative) string of the requested import depending on the current file, (2) file read returns the file contents given the path or URL, and (3) code evaluation evaluates the code.

@fare
Copy link
Author

fare commented Oct 4, 2016

Note that with library files that include other library files, etc., you can easily get exponential blow off in re-importing the very same library functions N times:

B imports A
C imports B, A
D imports C, B, A
...
Z imports Y, X, ... B, A

Even with import only as headers, Z may load A factorial 26 times, instead of just once.

@mikedanese
Copy link
Contributor

mikedanese commented Oct 4, 2016

When you import a same file twice, the evaluation results are not cached.

The vm caches file content.

https://github.com/google/jsonnet/blob/master/core/vm.cpp#L465-L467

Are you talking about the parsed AST?

@sparkprime
Copy link
Member

On splitting into 3 parts: So, in another project I used a special string literal syntax to resolve relative paths, see http://www.gritengine.com/grit_book/lua.html#lua_paths

I was thinking in Jsonnet you sometimes want to refer to a file in the same directory as the jsonnet file, which would be properly expanded so that the manifested JSON still makes sense. There is a std.thisFile to help with this but it's uglier than the `` in Grit or the import/importstr which does it implicitly.

With that in mind, it would be nice to have some `` or other string literal syntax that means expand the relative path to an absolute path from the current working directory of the jsonnet executable. In that world, import would not resolve relative paths anymore, and that would simplify its implementation. However that is a breaking change for the import syntax so phasing it in would be tricky.

On cacheing and exponential blow up -- yes this is a concern. It would be good to have a way to optimize the case of the diamond where two files both include the same file and not having to re-execute it. It's not clear to me how much benefit this would bring in real use cases though because typically files contain a big object literal with a few locals above it for file-level constants and imports. So the constant factor would be small but maybe the exponential blowup means it is still a problem.

@fare
Copy link
Author

fare commented Oct 4, 2016

The pattern I'm using in my jsonnet libraries, which reminds me of how they do things in NixOS, is that instead of importing other files and evaluating to an object, like they used to do, my files instead evaluate to a function that takes an environment as argument and return the object as "linked" to the environment. The toplevel library file closes the loop by defining the toplevel environment recursively passed to all the other files.

One advantage is performance by only loading each file once (at most, if used, lazily). Another advantage is that it becomes easy to enable or disable debugging in all files "just" by toggling it in the toplevel environment.

@sparkprime
Copy link
Member

In that case the file is a big function literal :)

There is currently no optimization to hoist local variables out of the function when their expression does not depend on the function parameters. This means it will be recomputed each time you call the function. You can do that optimization manually and if it's a big help you can file a bug to automate it (or even try implementing that yourself).

@sparkprime sparkprime changed the title Cache imports Import callback should separate relative path conversion from returning file content Oct 17, 2016
@sparkprime
Copy link
Member

Renamed issue to capture the remaining part. I do support this change as it would cause more cacheing of imports. However it would not be backwards compatible so I'll have to think about whether the old API has to be kept around for a deprecation cycle.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants