feat: Add support for rust scripts (enabling directly integrated ad-hoc robust high performance scripting) #1053

Merged
merged 42 commits into from Aug 12, 2021
Changes from 13 commits
Commits
a803c6e
add support for rust scripts
tedil Jun 14, 2021
7df6194
Merge branch 'main' into rust-script
johanneskoester Jun 16, 2021
9e3c997
add rust environment yaml
tedil Jun 16, 2021
d003cd0
Merge branch 'rust-script' of github.com:snakemake/snakemake into rus…
tedil Jun 16, 2021
c69faf8
add missing files
tedil Jun 17, 2021
a5d7604
Merge branch 'main' into rust-script
tedil Jun 17, 2021
1da1e45
some basic docs
tedil Jun 17, 2021
793613b
Merge branch 'main' into rust-script
tedil Jun 22, 2021
9da571c
clarify default dependencies
mbhall88 Jun 24, 2021
6e31df0
add functionality to handle cargo manifest
mbhall88 Jun 25, 2021
cc3100e
Merge branch 'rust-script' of github.com:snakemake/snakemake into rus…
mbhall88 Jun 25, 2021
8591d80
remove redundant continue
mbhall88 Jun 25, 2021
5f83889
add some more rust script docs and restructure scripts docs
mbhall88 Jun 25, 2021
ecd8fe6
Merge branch 'main' into rust-script
tedil Jul 5, 2021
d9a1fac
use NamedList type instead of HashMap
tedil Jul 5, 2021
7c496ca
merge
tedil Jul 5, 2021
697960e
remove additional '--features', add indexmap/serde dependency+feature
tedil Jul 5, 2021
ceaf55c
update test-manifest.rs to use namedlist API aswell
tedil Jul 5, 2021
bf924f4
Merge branch 'main' into rust-script
tedil Jul 5, 2021
0f89579
add outer line doc testing and pin rust-script version
mbhall88 Jul 10, 2021
91533ed
Merge branch 'main' into rust-script
tedil Jul 11, 2021
31a281f
small fixes for rust outer doc test
mbhall88 Jul 13, 2021
078474c
format shell log string for rust-script
mbhall88 Jul 13, 2021
7a8ebf2
fmt
mbhall88 Jul 13, 2021
0b1a756
add missing test file
mbhall88 Jul 13, 2021
8da91f9
add missing rust script
mbhall88 Jul 13, 2021
4e90eca
replace serde-pickle with serde_json + json_typegen
tedil Jul 20, 2021
119af9d
fmt
tedil Jul 20, 2021
7e66fa5
only iter over positional items
tedil Jul 20, 2021
b7cbda3
fmt
tedil Jul 20, 2021
8a3cb97
add code to modify PATH, add functions for redirecting stdout, stderr…
tedil Jul 21, 2021
2da1fed
use fully qualified names instead of use statements
tedil Jul 21, 2021
4c8e031
update docs
tedil Jul 21, 2021
8706e94
remove print and todo
tedil Jul 23, 2021
c0fa475
use ordered list instead
tedil Jul 23, 2021
e25a7c3
remove example TODO
tedil Jul 23, 2021
dfff120
move comment about R snakemake@source() function to the R section
tedil Jul 23, 2021
57e16fb
update src comments
tedil Jul 23, 2021
513e2d6
Merge branch 'main' into rust-script
tedil Jul 26, 2021
fca4aaf
make log impl_iter and dont redirect rust-script stream
mbhall88 Jul 27, 2021
306ea86
minor additions to the docs
mbhall88 Aug 6, 2021
aa73982
Merge branch 'main' into rust-script
tedil Aug 11, 2021
140 changes: 109 additions & 31 deletions docs/snakefiles/rules.rst
@@ -571,6 +571,9 @@ External scripts

A rule can also point to an external script instead of a shell command or inline Python code, e.g.

Python
~~~~~~

.. code-block:: python

rule NAME:
@@ -591,29 +594,24 @@ The script path is always relative to the Snakefile containing the directive (in
It is recommended to put all scripts into a subfolder ``scripts`` as above.
Inside the script, an object ``snakemake`` gives you access to the same objects that are available in the ``run`` and ``shell`` directives (input, output, params, wildcards, log, threads, resources, config). For example, ``snakemake.input[0]`` refers to the first input file of the above rule.

Apart from Python scripts, this mechanism also allows you to integrate R_ and R Markdown_ scripts with Snakemake, e.g.

.. _R: https://www.r-project.org
.. _Markdown: https://rmarkdown.rstudio.com
An example external Python script could look like this:

.. code-block:: python

rule NAME:
input:
"path/to/inputfile",
"path/to/other/inputfile"
output:
"path/to/outputfile",
"path/to/another/outputfile"
script:
"scripts/script.R"
def do_something(data_path, out_path, threads, myparam):
    # python code
    pass

In the R script, an S4 object named ``snakemake`` analogous to the Python case above is available and allows access to input and output files and other parameters. Here the syntax follows that of S4 classes with attributes that are R lists, e.g. we can access the first input file with ``snakemake@input[[1]]`` (note that the first file does not have index ``0`` here, because R starts counting from ``1``). Named input and output files can be accessed in the same way, by just providing the name instead of an index, e.g. ``snakemake@input[["myfile"]]``.
do_something(snakemake.input[0], snakemake.output[0], snakemake.threads, snakemake.config["myparam"])

You can use the Python debugger from within the script if you invoke Snakemake with ``--debug``.

Alternatively, it is possible to integrate Julia_ scripts, e.g.
R and R Markdown
~~~~~~~~~~~~~~~~

.. _Julia: https://julialang.org
Apart from Python scripts, this mechanism also allows you to integrate R_ and R Markdown_ scripts with Snakemake, e.g.

.. _R: https://www.r-project.org
.. _Markdown: https://rmarkdown.rstudio.com

.. code-block:: python

@@ -625,23 +623,11 @@ Alternatively, it is possible to integrate Julia_ scripts, e.g.
"path/to/outputfile",
"path/to/another/outputfile"
script:
"path/to/script.jl"

In the Julia_ script, a ``snakemake`` object is available, which can be accessed similar to the Python case (see above), with the only difference that you have to index from 1 instead of 0.

For technical reasons, scripts are executed in ``.snakemake/scripts``. The original script directory is available as ``scriptdir`` in the ``snakemake`` object. A convenience method, ``snakemake@source()``, acts as a wrapper for the normal R ``source()`` function, and can be used to source files relative to the original script directory.

An example external Python script could look like this:

.. code-block:: python
"scripts/script.R"

def do_something(data_path, out_path, threads, myparam):
# python code
In the R script, an S4 object named ``snakemake`` analogous to the Python case above is available and allows access to input and output files and other parameters. Here the syntax follows that of S4 classes with attributes that are R lists, e.g. we can access the first input file with ``snakemake@input[[1]]`` (note that the first file does not have index ``0`` here, because R starts counting from ``1``). Named input and output files can be accessed in the same way, by just providing the name instead of an index, e.g. ``snakemake@input[["myfile"]]``.

do_something(snakemake.input[0], snakemake.output[0], snakemake.threads, snakemake.config["myparam"])

You can use the Python debugger from within the script if you invoke Snakemake with ``--debug``.
An equivalent script written in R would look like this:
An equivalent script (:ref:`to the Python one above <Python>`) written in R would look like this:

.. code-block:: r

@@ -703,6 +689,98 @@ In the R Markdown file you can insert output from a R command, and access variab
A link to the R Markdown document with the snakemake object can be inserted. To do so, add a variable called ``rmd`` to the ``params`` section in the header of the ``report.Rmd`` file. The generated R Markdown file with the snakemake object will be saved to the file specified in this ``rmd`` variable. This file can be embedded into the HTML document using base64 encoding, and a link can be inserted as shown in the example above.
Other input and output files can be embedded in the same way to make a portable report. Note that the above method with a data URI only works for small files. An experimental alternative for embedding larger files is the JavaScript Blob `object <https://developer.mozilla.org/en-US/docs/Web/API/Blob>`_.

Julia_
~~~~~~

.. _Julia: https://julialang.org

.. code-block:: python

rule NAME:
input:
"path/to/inputfile",
"path/to/other/inputfile"
output:
"path/to/outputfile",
"path/to/another/outputfile"
script:
"path/to/script.jl"

In the Julia_ script, a ``snakemake`` object is available, which can be accessed similarly to the :ref:`Python case <Python>`, the only difference being that indexing starts at 1 instead of 0.

Rust_
~~~~~

.. _Rust: https://www.rust-lang.org/

.. code-block:: python

rule NAME:
input:
"path/to/inputfile",
"path/to/other/inputfile"
output:
"path/to/outputfile",
"path/to/another/outputfile"
params:
seed=4
script:
"path/to/script.rs"

Rust scripts are executed via |rust-script|_. As such, the
script must be a valid ``rust-script`` script.

In the Rust script, an instance of a ``Snakemake`` struct can be obtained via

.. code-block:: rust

let snakemake = Snakemake::load()?;

where the ``Snakemake`` struct is defined as follows:

.. code-block:: rust

pub struct Snakemake {
input: HashMap<String, String>,
output: HashMap<String, String>,
params: HashMap<String, Value>,
wildcards: HashMap<String, Value>,
threads: u64,
log: Value,
resources: Value,
config: HashMap<String, Value>,
rulename: String,
bench_iteration: Option<usize>,
scriptdir: String,
}

where the ``Value`` type is a |serde_pickle_value|_.
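
As a rough illustration of how the ``HashMap``-based fields shown above could be queried, here is a minimal, self-contained sketch. It does not call ``Snakemake::load()``: the struct is a trimmed-down, hand-filled copy of the one above, and the helper ``named_file`` as well as the keys ``reads`` and ``report`` are hypothetical names chosen for this example, not part of the generated API.

```rust
use std::collections::HashMap;

/// Trimmed-down, hypothetical copy of the struct shown above; only three
/// of its fields are reproduced and the values are filled in by hand.
struct Snakemake {
    input: HashMap<String, String>,
    output: HashMap<String, String>,
    threads: u64,
}

/// Illustrative helper (an assumption of this sketch, not Snakemake API):
/// looks up a named file and reports which key was missing.
fn named_file<'a>(files: &'a HashMap<String, String>, name: &str) -> Result<&'a str, String> {
    files
        .get(name)
        .map(String::as_str)
        .ok_or_else(|| format!("no file named '{}'", name))
}

fn main() -> Result<(), String> {
    // Hand-built instance standing in for the result of `Snakemake::load()`.
    let mut input = HashMap::new();
    input.insert("reads".to_string(), "path/to/inputfile".to_string());
    let mut output = HashMap::new();
    output.insert("report".to_string(), "path/to/outputfile".to_string());
    let snakemake = Snakemake { input, output, threads: 1 };

    let reads = named_file(&snakemake.input, "reads")?;
    let report = named_file(&snakemake.output, "report")?;
    println!("{} -> {} using {} thread(s)", reads, report, snakemake.threads);
    Ok(())
}
```

Returning a ``Result`` instead of unwrapping keeps a missing key from panicking and lets the script propagate the error with ``?``, in the same style as the ``Snakemake::load()?`` call.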

For the above example, to get the value of ``params.seed``, falling back to ``0`` if it
is not set, we could write something like:

.. code-block:: rust

let seed = match snakemake.params.get("seed") {
Some(Value::I64(i)) => *i,
_ => 0
};
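
The lookup above can be tried without Snakemake at hand. The following is a minimal, self-contained sketch under stated assumptions: the local ``Value`` enum is a hypothetical stand-in for ``serde_pickle::Value`` that models only two variants, and ``seed_or_default`` is an illustrative helper name, not part of Snakemake or ``serde_pickle``.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `serde_pickle::Value`; only the variants
/// needed for this sketch are modelled.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    I64(i64),
    String(String),
}

/// Illustrative helper (not part of Snakemake): returns the integer stored
/// under `seed`, or 0 if the key is missing or holds a non-integer value.
fn seed_or_default(params: &HashMap<String, Value>) -> i64 {
    match params.get("seed") {
        // `get` yields `Option<&Value>`, so `i` is a reference and must be
        // dereferenced to match the type of the fallback arm.
        Some(Value::I64(i)) => *i,
        _ => 0,
    }
}

fn main() {
    let mut params = HashMap::new();
    params.insert("seed".to_string(), Value::I64(4));
    println!("seed = {}", seed_or_default(&params));

    // A value of the wrong type falls through to the default.
    params.insert("seed".to_string(), Value::String("four".to_string()));
    println!("seed = {}", seed_or_default(&params));
}
```

Note that the catch-all arm silently treats a wrongly typed ``seed`` the same as a missing one; a script that should fail loudly in that case would need a separate arm.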

TODO: describe how to access positional and named args/params when the API is finalised

TODO: discuss what default dependencies and use statements are already used and the two types of manifest

TODO: add an example

.. |rust-script| replace:: ``rust-script``
.. _rust-script: https://rust-script.org/
.. |serde_pickle_value| replace:: ``serde_pickle::Value``
.. _serde_pickle_value: https://docs.rs/serde-pickle/0.6.2/serde_pickle/value/enum.Value.html

----

For technical reasons, scripts are executed in ``.snakemake/scripts``. The original script directory is available as ``scriptdir`` in the ``snakemake`` object. A convenience method, ``snakemake@source()``, acts as a wrapper for the normal R ``source()`` function, and can be used to source files relative to the original script directory.

.. _snakefiles_notebook-integration:

Jupyter notebook integration