Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Native Julia engine based on QuartoNotebookRunner.jl #8645

Merged
merged 107 commits into from Apr 9, 2024

Conversation

jkrumbiegel
Copy link
Contributor

@jkrumbiegel jkrumbiegel commented Feb 8, 2024

Description

This PR implements a native Julia execution engine as an alternative to the Jupyter workflow. The benefit is that only quarto and Julia have to be installed, without any Python intermediary. That makes it easier for Julia users to install and maintain a working quarto system.

We are executing qmd files via QuartoNotebookRunner.jl, which spins up a control server to communicate with via tcp. The control server itself keeps a number of worker processes which can be used for keepalive scenarios, to keep latency low.

Another benefit vs. the Jupyter workflow is that our control server runs out of a private Julia environment that quarto sets up the first time a Julia render is initiated. This means users do not have to install anything in their own Julia environments to run quarto, which is consistent with best practices not to crowd global environments.

In our internal testing, this system already works quite well, so we are interested in getting maintainer feedback on the implementation, as well as potential paths forward.

The engine currently only executes Julia code blocks, it is conceivable, though, to later add support for specialized R and Python compatibility via RCall.jl and PythonCall.jl or similar packages on top. This is not in scope of this PR, however.

Checklist

I have (if applicable):

  • filed a contributor agreement.
  • referenced the GitHub issue this PR closes
  • updated the appropriate changelog

@cscheid
Copy link
Collaborator

cscheid commented Feb 14, 2024

@jkrumbiegel Just checking in to see if you folks are looking for comments for us here or are still working on the PR.

@jkrumbiegel
Copy link
Contributor Author

@cscheid yes we're looking for comments, to see what your thoughts about the current implementation are, how it fits into the existing framework. I've made the PR a draft because there are no tests yet. I wanted to start adding such things after the general approach has been greenlit.

@cscheid
Copy link
Collaborator

cscheid commented Feb 14, 2024

@jkrumbiegel Ah, sorry for the delay, in that case. I hope to have comments for you by the end of the week.

@jkrumbiegel
Copy link
Contributor Author

Thank you!

@cscheid cscheid self-requested a review February 23, 2024 14:54
src/execute/engine.ts Outdated Show resolved Hide resolved
@cscheid
Copy link
Collaborator

cscheid commented Feb 23, 2024

Hi there, sorry for the delay. We would definitely like to merge this. In the small, everything in the PR looks good. There's going to be one conflict you'll need to solve in the engine declaration where I recently made a small change to how the global list of engines is managed.

You write this:

Another benefit vs. the Jupyter workflow is that our control server runs out of a private Julia environment that quarto sets up the first time a Julia render is initiated. This means users do not have to install anything in their own Julia environments to run quarto, which is consistent with best practices not to crowd global environments.

I really like this, and I think it's pretty consistent with how we'd like Quarto to operate. However, I have a few questions. How do you manage required packages in this environment? How and where are they declared/installed? How are versions declared/maintained?

Currently, Quarto defers to the user to maintain those environments. In a vacuum, I like the idea of Quarto taking over the maintenance of the environment, but I'm not sure all users will like that they will end up with duplicated packages in their system. They probably will run into surprises if upgrading their own environment doesn't mean that the environment we set up will not be upgraded as well.

Finally, I see a couple of failures in the Julia notebook when using multiline YAML arrays. We'd like to fix that before merging.

Please let me know if you need any more help here, but I'm quite excited. This looks like a really nice solution for pure-Julia users.

@MichaelHatherly
Copy link
Contributor

Finally, I see a couple of failures in the Julia notebook when using multiline YAML arrays. We'd like to fix that before merging.

Could you add an example here that fails and we'll get it sorted? Might have already been resolved in the latest version of QNR.jl since I did sort out some YAML parsing issues recently that aren't in the version that's committed here yet.

@cscheid
Copy link
Collaborator

cscheid commented Feb 23, 2024

Could you add an example here that fails and we'll get it sorted?

Sorry, it's in our test suite, here: https://github.com/quarto-dev/quarto-cli/actions/runs/7829788399/job/21362500154?pr=8645

@MichaelHatherly
Copy link
Contributor

YAML parsing for that case is fixed in the most recent version, so @jkrumbiegel upgrading that should help. There's a different issue related to display usage that we'll need to fix in QNR, I will sort that next week.

@jkrumbiegel
Copy link
Contributor Author

I really like this, and I think it's pretty consistent with how we'd like Quarto to operate. However, I have a few questions. How do you manage required packages in this environment? How and where are they declared/installed? How are versions declared/maintained?

Currently, Quarto defers to the user to maintain those environments. In a vacuum, I like the idea of Quarto taking over the maintenance of the environment, but I'm not sure all users will like that they will end up with duplicated packages in their system. They probably will run into surprises if upgrading their own environment doesn't mean that the environment we set up will not be upgraded as well.

The PR defines a Project.toml, a description of dependencies. That file is currently just:

[deps]
QuartoNotebookRunner = "4c0109c6-14e9-4c88-93f0-2b974d3468f4"

[compat]
QuartoNotebookRunner = "0.2.0"

So the server depends just on our package, and we fix the patch version, so quarto can decide to update that version whenever they want, and can be sure it's always compatible. One could depend on the minor but currently it was simpler and safer to fix the patch.

This Project toml is copied from the resources folder to the quarto runtime data directory, and there it's resolved and installed. Note that this doesn't interfere with any other environments on the user's computer. The source code of needed packages will be installed into the central depot, but that creates no problem and is how julia is meant to work. For the user, this is completely frictionless.

Note further that this environment is only how the server is run, the workers that execute the notebooks are separate julia processes, and their environments can be controlled by the users. They will default to whatever their julia's default environment is, but they can also easily choose to activate special environments for each quarto notebook, for better reproducibility. They will never interact with the server environment. So there are no duplicated packages or anything, as the packages for the workers are completely separate from the packages quarto needs to run the server.

I hope that clarifies things a bit, if not, ask away :)

I have a question, too, could you give some guidance how you want tests to be set up?

@cscheid
Copy link
Collaborator

cscheid commented Feb 23, 2024

So the server depends just on our package, and we fix the patch version, so quarto can decide to update that version whenever they want, and can be sure it's always compatible. One could depend on the minor but currently it was simpler and safer to fix the patch.

Sorry, I think I phrased my question poorly. My question was about what happens with dependencies of the code in Quarto documents, not Quarto itself.

Currently, Quarto takes a hands-off approach to those, so if a Quarto project (or file) needs specific libraries installed, then those libraries are the responsibility of the user. You've described the changes here as "[the] control server runs out of a private Julia environment", which I originally took to mean that package management now has become the responsibility of the engine somehow. But now I think this was a bad interpretation on my part, so I just remain confused.

We need that the base Quarto program to not a required dependency on Julia, because of the consequences this would have on our installers. In the long term, we are going to drop the baseline Python dependency as well and implement the Jupyter protocol ourselves. For example, users do not need R to be installed on the system in order to run Quarto. (I'm not claiming that your PR fails here, I just haven't verified that your PR satisfies this requirement)

EDIT: I now see how this all works. It's good, thank you!

@jkrumbiegel
Copy link
Contributor Author

@cscheid with QuartoNotebookRunner 0.7 I have added the HMAC verification now, as well as timeout features on the individual "kernels" (worker processes held by the main service process) and a timeout on the service process itself. Those were the last larger items I wanted to tackle before a review, I'll therefore switch this from draft mode to review mode now.

@jkrumbiegel jkrumbiegel marked this pull request as ready for review March 14, 2024 18:10
@cscheid
Copy link
Collaborator

cscheid commented Mar 20, 2024

Picking this up again. The only thing missing from this is writing the docs for it, and we prefer to have the docs written to merge together with the feature.

I think the most appropriate page to describe this new feature is https://quarto.org/docs/computations/julia.html.

Specifically, I think we could split that page into two main H2 headings, for example "Using the jupyter engine" and "Using the julia engine".

I'd be happy to give you concrete advice in how the docs look, but it might be easier for us to work on a fork of quarto.org that you have commit rights to. If that's ok with you, I'll fork quarto-web on my local repo and add you (and colleagues) as collaborators. We can coordinate here or over email, however you prefer.

@jkrumbiegel
Copy link
Contributor Author

Sounds good, coordinating over github seems most straightforward I think. We can open an issue on the fork for example.

@jkrumbiegel
Copy link
Contributor Author

The docs PR is here: cscheid/quarto-web#1

I've reverted the windows special spawning behavior back in, the tests were passing but I could see from the timings that deno was still waiting for the unref'd process to exit (with a timeout).

@jkrumbiegel
Copy link
Contributor Author

@cscheid what are the next steps, is it ok if I occasionally keep pushing minor changes to this branch until it is merged? I'm not sure if any review activity is in progress, so I don't know if I keep making the PR a moving target by doing this.

@jkrumbiegel
Copy link
Contributor Author

Bumping

@cscheid
Copy link
Collaborator

cscheid commented Apr 9, 2024

Thanks for bumping; I'm so sorry I missed this in the notification flood. It's totally ok to keep adding, but I think we're ready to merge. I'll review the docs, merge there and then here. Thanks again!

@jkrumbiegel
Copy link
Contributor Author

Great, thanks! Can you estimate when this will be available in a release?

@cscheid
Copy link
Collaborator

cscheid commented Apr 9, 2024

This should be available on pre-releases as soon as we merge the docs (ie, hopefully later today). We plan to make 1.5 a stable release around May-June.

@cscheid cscheid merged commit de208d3 into quarto-dev:main Apr 9, 2024
47 checks passed
@cscheid
Copy link
Collaborator

cscheid commented Apr 9, 2024

@jkrumbiegel it's merged 🎉 . I do have a few post-merge things I want to do to improve this. I'll open a separate issue to track.

@cscheid cscheid mentioned this pull request Apr 9, 2024
2 tasks
@storopoli storopoli deleted the jk/quartonotebookrunner branch April 9, 2024 17:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants