feat: Add support for rust scripts (enabling directly integrated ad-hoc robust high performance scripting) #1053
What still needs to be done:
Edit: pick anything you like! Or come up with something else which I have forgotten/overlooked |
Great. There's plenty in that to keep us busy for a little while. Mind if I work on "handling dependencies better", i.e. allowing doc comment Cargo manifests in addition to the single-line dependencies? Also, I think you might not have added your test Rust script to the PR? |
That would be great!
Oh, you are right, they got skipped due to .gitignore rules. Fixed now. |
This NamedList implementation would allow access both by index and by name, under the assumption that the order of the dict entries (of the dict that gets pickled in Python and unpickled in Rust by serde) is stable and corresponds to the index scheme.

    use indexmap::IndexMap;
    use serde::Deserialize;
    use std::ops::Index;

    #[derive(Debug, Deserialize)]
    struct NamedList<V>(pub IndexMap<String, V>);

    // Positional access; panics if the index is out of bounds.
    impl<V> Index<usize> for NamedList<V> {
        type Output = V;

        fn index(&self, index: usize) -> &Self::Output {
            self.0
                .get_index(index)
                .unwrap_or_else(|| panic!("Index out of bounds: {}", index))
                .1
        }
    }

    // Access by name; panics if the key does not exist.
    impl<V> Index<&str> for NamedList<V> {
        type Output = V;

        fn index(&self, index: &str) -> &Self::Output {
            self.0
                .get(index)
                .unwrap_or_else(|| panic!("No such key {}", index))
        }
    }

This will also panic for nonexistent keys/indices. But you could always use the inner indexmap instead. This also requires indexmap with its serde feature enabled, which will only be possible once the "better handling of dependencies" checkbox is ticked ;) |
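The non-panicking alternative mentioned in this comment could look roughly like the sketch below. It is std-only, so it assumes a `Vec<(String, V)>` as a stand-in for the `IndexMap` (the real proposal uses indexmap with serde), and the method names `get_by_index` and `get_by_name` are made up for illustration:

```rust
/// Ordered name/value pairs; a std-only stand-in for `IndexMap<String, V>`
/// (assumption: insertion order equals the positional index scheme).
#[derive(Debug)]
pub struct NamedList<V>(pub Vec<(String, V)>);

impl<V> NamedList<V> {
    /// Positional lookup that returns `None` instead of panicking.
    pub fn get_by_index(&self, index: usize) -> Option<&V> {
        self.0.get(index).map(|(_, value)| value)
    }

    /// Lookup by name (linear scan); `None` if the name is absent.
    pub fn get_by_name(&self, name: &str) -> Option<&V> {
        self.0
            .iter()
            .find(|(key, _)| key.as_str() == name)
            .map(|(_, value)| value)
    }
}
```

A script could then decide for itself how to react to a missing entry, for example with `list.get_by_name("reads").ok_or("no input named reads")?`.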
The NamedList in Snakemake can also return a list of files/items sometimes. This is not properly represented here. |
Ah, you mean it's not always a map? Bummer. Well, we could use a newtype wrapper around [...] Edit: never mind, we can change lists to maps on the Python side (and it's already done, I think). |
Not sure what you mean. A name pointing to a list of files is important to keep of course. This here might be one option:
The other one would be two separate getters |
I think the former way is more idiomatic. And one could add helpers to the enum that provide the same convenient panic behavior. |
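A minimal std-only sketch of what an enum with such helpers might look like. The names `Entry`, `Single`, `Multiple`, `file` and `files` are hypothetical and not taken from the PR:

```rust
/// A named entry that is either one file or a list of files.
#[derive(Debug, Clone, PartialEq)]
pub enum Entry {
    Single(String),
    Multiple(Vec<String>),
}

impl Entry {
    /// Returns the single file; panics if the entry holds a list.
    pub fn file(&self) -> &str {
        match self {
            Entry::Single(path) => path.as_str(),
            Entry::Multiple(paths) => {
                panic!("expected a single file, found {} files", paths.len())
            }
        }
    }

    /// Returns all files as a slice; a single file becomes a one-element slice.
    pub fn files(&self) -> &[String] {
        match self {
            Entry::Single(path) => std::slice::from_ref(path),
            Entry::Multiple(paths) => paths.as_slice(),
        }
    }
}
```

With this shape, `entry.file()` keeps the convenient panic behavior, while `entry.files()` works for both variants without panicking.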
I thought about an enum solution as well, but that would make the serde deserialization much more complicated. As it is now, we can just deserialize the dict items we get from Python / the pickle file into a struct such as the NamedList one I proposed earlier.
? I don't get it. Are you saying: Case a) (if even possible) is already handled by just pretending it's a dict. |
So we can now handle either form of dependency specification (single-line or code block manifest). I added an additional test case so we are testing both forms. The tricky part was just that rust-script requires the dependency specification to come before the preamble, so I had to scrape the dependency comment out of the original script and insert it above the preamble. Another annoying thing is that rust-script only allows inner doc comments.

I have also restructured the scripts docs a little. I basically just created subheadings for each language, as we now support 4 languages 🎉, and thought this would make it easier to navigate. There are still some more docs that need to be added for Rust, but I might wait until we have nailed things down a little more.

Please feel free to go hard with suggesting changes to my additions etc. I always love to learn better ways of doing things, especially from you two. |
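The scraping step described in this comment can be sketched roughly as follows. This is only an illustration written in Rust, not the PR's actual code. It assumes the manifest is given as leading `//!` inner doc comment lines and does not cover the single-line form; `split_manifest` and `assemble` are hypothetical helper names:

```rust
/// Splits a script into its leading `//!` doc comment lines (the manifest)
/// and the remaining source lines.
pub fn split_manifest(script: &str) -> (Vec<&str>, Vec<&str>) {
    let lines: Vec<&str> = script.lines().collect();
    // Leading lines count as manifest while they are inner doc comments.
    let split_at = lines
        .iter()
        .position(|line| !line.trim_start().starts_with("//!"))
        .unwrap_or(lines.len());
    let (manifest, rest) = lines.split_at(split_at);
    (manifest.to_vec(), rest.to_vec())
}

/// Puts the manifest first, then the generated preamble, then the script body.
pub fn assemble(script: &str, preamble: &str) -> String {
    let (manifest, rest) = split_manifest(script);
    let mut parts: Vec<&str> = manifest;
    parts.push(preamble);
    parts.extend(rest);
    parts.join("\n")
}
```

The point of the ordering is simply that the manifest stays the first thing in the generated file, with the injected preamble placed between it and the user's code.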
I'm keen to get stuck into another element of the implementation. What are you working on @tedil and what would you consider highest priority for me to get stuck into?
Would you be able to elaborate a little on these? Also, regarding our dependency on serde (and derive feature), I had the original idea of just using some kind of template to generate the snakemake data structure directly from the python one. This is obviously a lot more work, but would allow us to directly encode the correct types without needing to use |
These might also have been python specific. I used the python script functionality implementation as a rough guide, so perhaps these aren't issues we have to deal with in the rust script case. If the channels used for logging are always stdout and stderr, we probably don't have to change anything; but if they are to be redirected (to a file, some logger somewhere else, etc) 🤷
Okay so: I talked with Johannes on Friday: The [...] However, params etc. can be arbitrary Python objects, which is why I just chose to use serde-pickle (serde_json would probably work as well), so I don't really have to handle that. I did not want to roll my own code if there are already tested crates that cover most of the work.
I personally feel this'd be both very inconvenient to work with (having to parse strings yourself, especially inconvenient for nested lists/dicts!) and wrong in the sense that you have the type information of those values at runtime on the Python side, just to throw it all away and move the burden of knowing those types to the person writing a Rust script. The more I think of it, the better it is to stick with serde for now (we can exchange that with our own code later if needed) but provide convenience traits/wrappers, so that besides

    if let Value::F64(some_param) = params["this_is_a_float"] { do_stuff_with(some_param) }

it's also possible to do

    let some_param = params["this_is_a_float"].float()?;

or

    let some_param: f64 = params.get("this_is_a_float")?; |
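A rough std-only sketch of such a convenience accessor. The `Value` enum here is a simplified stand-in for a serde value type (it is not the real serde-pickle definition), and `float()`, `TypeError` and `doubled` are hypothetical names chosen for illustration:

```rust
use std::collections::HashMap;
use std::fmt;

/// Simplified stand-in for a dynamically typed serde value.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    I64(i64),
    F64(f64),
    String(String),
}

/// Error returned when a value does not hold the requested type.
#[derive(Debug, PartialEq)]
pub struct TypeError {
    expected: &'static str,
    found: Value,
}

impl fmt::Display for TypeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "expected {}, found {:?}", self.expected, self.found)
    }
}

impl std::error::Error for TypeError {}

impl Value {
    /// Returns the inner float, or a `TypeError` for any other variant.
    pub fn float(&self) -> Result<f64, TypeError> {
        match self {
            Value::F64(x) => Ok(*x),
            other => Err(TypeError { expected: "float", found: other.clone() }),
        }
    }
}

/// Example of `?`-based access inside a function that returns `Result`.
/// Note: indexing the map still panics if `key` is missing.
pub fn doubled(params: &HashMap<String, Value>, key: &str) -> Result<f64, TypeError> {
    let x = params[key].float()?;
    Ok(2.0 * x)
}
```

The type information from the Python side is kept in the enum, and the script author only states which type they expect at the point of use.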
The current version now has iterator and index implementations in input/output/wildcards for positional arguments only. Everything else must be accessed by field access. As for redirecting stdout with gag: For scripts without a main fn (i.e. those that are implicitly enclosed in a main fn), we can just add redirects to the preamble, for scripts with an explicit main fn, we'd have to inject that at the start of the main fn. |
good
For the other script types, we do not have automatic redirects, but the script author has to do it. I would think that is reasonable for rust as well (also helps with transparency when just reading). Hence, you do not need to detect these cases. Just a simple helper function that could be used like |
Sorry for the radio silence. I agree that it is probably best to leave it to the user to do what they like with the error stream. I don't know why I was redirecting the rust-script streams to the log file; that seems like a terrible idea in hindsight.
The log should also be included. I've added it in fca4aaf and am testing it in one of the rust test scripts. One thing that is bugging me is a compiler warning
We have an allow attribute on the preceding line so I wonder if this is a rust bug? |
No harm done ;)
Good point, thanks!
Yes, it's indeed a bug, I think I have linked that bug in the source as well, so once it's resolved, we can remove that link. |
Testing fails because of some github actions missing, I guess that is due to merging the changes from main into this branch? (We might also want to modularize script.py into |
What's left to do now? Is it just docs? I'm happy to tackle that in the coming days. Are there any special requests for things to add to the docs? |
I think there's actually not much more to do now; if you could have a look at the docs, that would be great! |
Docs look great to me. The only two things for future would be the rust-script templates and also adding Rust support in jupyter notebooks (as mentioned in #913 (comment) probably best done with Also, CI has been failing due to
|
Description
This draft PR adds support for using rust-script scripts with snakemake's script directive. It's a proof of concept, see also @mbhall88's notes on rust script support here.
For example, the following script reads from a named input file, appends some string to it and writes it to a positional output file. It also shows how to specify additional dependencies:
Caveats / Details
The snakemake object has to be obtained using Snakemake::load(), which deserializes a pickled version of the Snakemake (python) object (well, at the moment not actually the full object but rather a stripped-down version) into a rust Snakemake struct, reading from a (not yet temporary) file. This entails having serde, serde_derive and serde-pickle as default dependencies (which might be a bit much for a small script, but this was the quickest way for me to get started).
[...] snakemake.input["0"].
[...] (cargo install rust-script). There's no conda package for that at the time of writing.
QC
The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).