-
-
Notifications
You must be signed in to change notification settings - Fork 786
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for R / renv as a language #1799
Conversation
testing/get-renv.sh
Outdated
# Alternatively, we require the renv | ||
# package to be installed already, then we can | ||
# omit that. | ||
Rscript -e 'dir.create(Sys.getenv("R_LIBS_USER"), recursive = TRUE)' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work if a non-root user has never installed any R package (quite unlikely at that stage but...). Maybe catch the installation error and fail (more) informatively?
ca4a52b
to
a72bde1
Compare
pre_commit/languages/renv.py
Outdated
# expose entry to the user for maximal control | ||
return helpers.run_xargs( | ||
hook, hook.cmd, file_args, | ||
color=color, cwd=hook.prefix.prefix_dir, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the current expectation for hook authors is that hooks will be executed from the root of the repository, so cwd=...
won't be workable here
I think that necessarily changes how the environment setup is working now -- the ~normalish way of setting this up is by pointing some set of environment variables at the ENVIRONMENT_DIR (and subdirs)
note also that things must be installed entirely inside the ENVIRONMENT_DIR or they potentially won't be persisted by caching
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the current expectation for hook authors is that hooks will be executed from the root of the repository, so cwd=... won't be workable here.
of course 🤦.
the ~normalish way of setting this up is by pointing some set of environment variables at the ENVIRONMENT_DIR (and subdirs).
That's what I saw. Problem is (and that's a peculiarity of R) the most operations happen in an R session, not the terminal session. renv
can only be activated properly from inside R. For that reason, my current suggestion is to place a renv::activate()
call in a file, then point the R_PROFILE_USER
environment variable to that file so the code will run at R startup. The idea is now to allow two versions, inline code expressions and file in entry:
:
Rscript -e expr # like Rscript -e "1+1"
Rscript path/to/file
An inject the options --no-save
, --no-restore
, --no-site-file
and --no-environ
after Rscript
(to basically have --vanilla
start, but run this one renv::activate()
statement) and expand the path/to/file
similar to language: script
since we have to run the hooks in the working directory of the repo. The files and the hook args are then passed after the above, i.e.
Rscript --no-save --no-restore --no-site-file --no-environ /abs/path/ [files] [hook args]
Rscript --no-save --no-restore --no-site-file --no-environ -e expr [files] [hook args]
note also that things must be installed entirely inside the ENVIRONMENT_DIR or they potentially won't be persisted by caching.
This should be solved now. Just placing the whole env in a subdirectory now.
@@ -0,0 +1,18 @@ | |||
Package: gli.clu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DESCRIPTION
file is basically the equivalent of setup.py
, renv.lock
is the equivalent of the conda environment.yml
. Not sure if one or both should be listed in LOCAL_RESOURCES
as this was introduced here:
https://github.com/pre-commit/pre-commit/pull/1232/files#diff-a52ef0cf627ce783529c259e63d4ea82b77520936a2455cbf4f90ad2fb36d27dR176
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah LOCAL_RESOURCES
is to allow an additional_dependencies
-only repo: local
hook -- it seems like additional_dependencies is supported for R so we should probably make an empty one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks for the explanation. I added a template for renv.lock
in cf88089. However, there is an R version hard-coded in it which will get outdated. As far as I can tell from a quick experiment, the docs and issues in https://github.com/rstudio/renv, there is no indication that the R version recorded in the lock file will be used at renv::restore()
, which we use in renv.install_environment()
. Even removing seems fine (I left it there anyways). But that might still be a bit brittle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, should we add a unit test for a local hook with additional dependencies to check if things work as expected? I don't think that's currently covered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah it would probably be good to have one unit test demoing a local
hook via additional_dependencies
tests/repository_test.py
Outdated
@@ -279,14 +280,123 @@ def test_node_hook_with_npm_userconfig_set(tempdir_factory, store, tmpdir): | |||
test_run_a_node_hook(tempdir_factory, store) | |||
|
|||
|
|||
def _test_renv_parsing( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me know if these are too many tests (because it slows down testing). I think we can get 100% coverage with fewer tests too (but then not cover as many combinations).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks great, really happy you took this on and I'm excited to add this support!
pre_commit/languages/all.py
Outdated
@@ -16,6 +16,7 @@ | |||
from pre_commit.languages import perl | |||
from pre_commit.languages import pygrep | |||
from pre_commit.languages import python | |||
from pre_commit.languages import renv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should maybe just be called r
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought about that also. The thing is that language: conda
can both host R and Python hooks and renv
can manage python dependencies, so I thought I go for a more specific name. On the other hand, language: python
uses venv
(and is still called python
, not venv
). Since working with virtual environments is still not as common in R compared to Python and not every R user knows renv
, using r
as a name might also make it easier for people to find what they are looking for. So I am happy to switch to r
, please let me know what you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I think r
makes the most sense, conda
is kinda the special case right now and I think if people google "pre-commit r" they'll be much more likely to hit this than if it were named renv
we can always add other names if new ways of installing isolated r environments happen in the future and/or shuffle them around (there used to be a python_venv
language which was merged into python
as virtualenv
makes venv
-style environments anyway nowadays)
pre_commit/languages/renv.py
Outdated
with in_env(hook.prefix, hook.language_version): | ||
|
||
return helpers.run_xargs( | ||
hook, cmd_from_hook(hook), file_args, color=color, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually prefer to read a file from the bottom upwards -- also helper functions that live only in this module should be underscore-prefixed to make it more clear what is/isn't part of the api (the api specification is in the languages/all.py
file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can adapt the order. I can also prefix the helper functions not part of the API. But why is in_env
and get_env_patch
not prefixed in other modules although not listed in languages/all.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmmm yeah good point, those should probably be underscore-prefixed everywhere -- probably just legacy and copy paste 😅
pre_commit/languages/renv.py
Outdated
# TODO what if expr is x;2? not captured by shlex.split() | ||
""" | ||
if entry[0] != 'Rscript': | ||
raise SyntaxError('entry must start with `Rscript`.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SyntaxError is for python syntax being incorrect -- these should be ValueError
probably
pre_commit/languages/renv.py
Outdated
' '.join([ | ||
'The only valid syntax is `Rscript -e {expr}`', | ||
'or `Rscript path/to/hook/script`', | ||
]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be written as an implicitly joined string:
raise TypeError(
'first line here '
'second line here'
)
pre_commit/languages/renv.py
Outdated
]), | ||
) | ||
|
||
None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line does nothing and can be removed, note that python implicitly returns None
for "void" functions and doesn't support implicit-return-without-return
-keyword
@@ -0,0 +1,18 @@ | |||
Package: gli.clu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah LOCAL_RESOURCES
is to allow an additional_dependencies
-only repo: local
hook -- it seems like additional_dependencies is supported for R so we should probably make an empty one?
tests/repository_test.py
Outdated
@@ -19,6 +19,7 @@ | |||
from pre_commit.languages import helpers | |||
from pre_commit.languages import node | |||
from pre_commit.languages import python | |||
from pre_commit.languages import renv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the tests specifically for the renv helper functions should live in tests/languages/renv_test.py (or r_test.py
after renaming)
tests/repository_test.py
Outdated
|
||
|
||
def test_renv_parsing_expr_non_Rscirpt(tempdir_factory, store): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(throughout the review, nbd I can fix this myself later -- I don't put blank lines at the beginning or the ending of blocks)
tests/repository_test.py
Outdated
|
||
def test_renv_parsing_expr_non_Rscirpt(tempdir_factory, store): | ||
|
||
with pytest.raises(SyntaxError, match='entry must start with'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like the match=
arg and prefer to test the exact error message, you can follow this pattern:
with pytest.raises(TypeHere) as excinfo:
...
msg, = excinfo.value.args
assert msg == ...
tests/repository_test.py
Outdated
def test_run_a_ruby_hook(tempdir_factory, store): | ||
_test_hook_repo( | ||
tempdir_factory, store, 'ruby_hooks_repo', | ||
'ruby_hook', [os.devnull], b'Hello world from a ruby hook\n', | ||
) | ||
|
||
|
||
@xfailif_windows # pragma: win32 no cover | ||
@ xfailif_windows # pragma: win32 no cover |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decorators shouldn't have spaces after them -- not sure what happened here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also wondered, but one of the pre-commit hook changed it, maybe autopep? I even rebased my PR master and tried to get rid of this. Can revert but then not sure if the pre-commit.ci passes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah my guess is there was a syntax error earlier in the file and then autopep8 freaked -- I've been meaning to fix that in autopep8 but it's tricky (basically sometimes it will try and rewrite a file even if it has a syntax error 😱)
e07dfae
to
809c79e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a stub for this in the docs -- pre-commit/pre-commit.com#473 |
This PR aims to add 2nd level support for R to pre-commit via the virtual environment manager renv, motivated by isolated dependency management and (hopefully) compatibility with pre-commit.ci for 2nd level languages as outlined in lorenzwalthert/precommit#215. The implementation follows the recently contributed support for conda (#1232) for source code and tests as well as the implementation of the python virtualenv language.
Further considerations:
Initial tests only run on Linux. I have to add other platforms as we go if this PR is generally approved.I'd appreciate feedback to the questions in the source code comments, in particular on whether or not to removeAlso, let me know if you want certain comments that explain stuff to be removed. Was not sure how much.in_env()
as it seems not needed or whether it should be kept for consistency with other languages.Before this PR, I was not familiar with the pre-commit code base nor a very avid python user, hence sorry if there are obvious flaws in this PR.