Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression in Python reticulate discovery within a project #11195

Closed
jgutman opened this issue May 12, 2022 · 14 comments
Closed

Regression in Python reticulate discovery within a project #11195

jgutman opened this issue May 12, 2022 · 14 comments
Assignees

Comments

@jgutman
Copy link
Contributor

jgutman commented May 12, 2022

Opening a new issue based on @salim-b 's report on previously closed #9990

I did not follow all the issues/PRs related to all the recent Python discovery changes. But I think I've just stumbled upon a serious regression:

Setup details

  • I use rstudio-2022.06.0-daily-394-amd64 on Ubuntu 22.04
  • I have set the environment variable WORKON_HOME=/home/salim/.by_pkg_mngr/virtualenvwrapper, then created the virtualenv r-reticulate (default name) using reticulate::virtualenv_create() and finally installed the Python package plotly in it using reticulate::virtualenv_install(packages = "plotly").
  • I did neither set RETICULATE_PYTHON nor RETICULATE_PYTHON_FALLBACK.

Problem description

If I open any RStudio project, trying to discover the proper python location fails:

> reticulate::py_discover_config(required_module = "plotly")
python:         /usr/bin/python3
libpython:      /usr/lib/python3.10/config-3.10-x86_64-linux-gnu/libpython3.10.so
pythonhome:     //usr://usr
version:        3.10.4 (main, Apr  2 2022, 09:04:19) [GCC 11.2.0]
numpy:          /usr/lib/python3/dist-packages/numpy
numpy_version:  1.21.5
plotly:         [NOT FOUND]

NOTE: Python version was forced by RETICULATE_PYTHON_FALLBACK

But when I open an RStudio session that is not associated with a project, my virtualenv r-reticulate is properly detected:

> reticulate::py_discover_config(required_module = "plotly")
python:         /home/salim/.by_pkg_mngr/virtualenvwrapper/r-reticulate/bin/python
libpython:      /usr/lib/python3.10/config-3.10-x86_64-linux-gnu/libpython3.10.so
pythonhome:     /home/salim/.by_pkg_mngr/virtualenvwrapper/r-reticulate:/home/salim/.by_pkg_mngr/virtualenvwrapper/r-reticulate
version:        3.10.4 (main, Apr  2 2022, 09:04:19) [GCC 11.2.0]
numpy:          /home/salim/.by_pkg_mngr/virtualenvwrapper/r-reticulate/lib/python3.10/site-packages/numpy
numpy_version:  1.22.3
plotly:         /home/salim/.by_pkg_mngr/virtualenvwrapper/r-reticulate/lib/python3.10/site-packages/plotly

python versions found: 
 /home/salim/.by_pkg_mngr/virtualenvwrapper/r-reticulate/bin/python
 /usr/bin/python3

It's also properly detected by running Rscript -e 'reticulate::py_discover_config(required_module = "plotly")' from a terminal.

This looks like a clear regression to me since the same reticulate setup on stable RStudio 2022.02.2+485 on Ubuntu 20.04 detects the virtualenv in both cases. My guess would be that this regression was introduced with #10518, thus pinging @kevinushey @jgutman

Originally posted by @salim-b in #9990 (comment)

@jgutman jgutman self-assigned this May 12, 2022
@jgutman
Copy link
Contributor Author

jgutman commented May 16, 2022

On additional review, I don't believe this to be a regression; it seems to be expected behavior, and I have provided the reporting user with multiple ways for them to achieve their desired behavior here.

@jgutman
Copy link
Contributor Author

jgutman commented May 16, 2022

Maybe we want to change the behavior of inferred python so we specifically check for virtualenv named r-reticulate/WORKON_HOME, and set RETICULATE_PYTHON_FALLBACK to that before selecting system python

@mikebessuille
Copy link
Contributor

The way we guess / set the version is probably worth a blog post

@t-kalinowski
Copy link
Member

t-kalinowski commented May 19, 2022

I just ran into this surprise, where reticulate binds to a different Python in the IDE vs the terminal (because the IDE sets RETICULATE_PYTHON_FALLBACK).

What is the reasoning behind setting RETICULATE_PYTHON_FALLBACK again? I think a short explanation of the design and motivations here would be great.

I'm eager also to hear if there are changes I can make in reticulate to simplify this, or align what we're doing in the IDE with what happens outside the IDE. Overall the "guess which python" reticulate behavior feels unpredictable, even without the IDE in the mix.

@jgutman
Copy link
Contributor Author

jgutman commented May 19, 2022

@kevinushey probably has more historical context around this situation, but here is my rough understanding.

Firstly, I believe that the RETICULATE_PYTHON_FALLBACK variable only gets set when a user is in an RStudio Project (not when outside of a project) and when the Global option Python > Automatically activate project-local Python environments is checked. Because we are trying to automatically activate an environment, if the user has not explicitly chosen their Python, we infer it.

We used to infer it and set this to RETICULATE_PYTHON, but this was bad because it would clobber the user's own preference, even if they tried to set the env using reticulate::use_virtualenv/reticulate::use_python or something like that. So we updated the behavior to set RETICULATE_PYTHON_FALLBACK instead, and the user can still uncheck the Automatically activate project-local Python environments setting if they desire.

That said I don't fully understand why and how we choose to infer a specific version of Python (I can point to where in the code that happens, but can't necessarily speak to design decisions around it).

I'm interested in more discussion around what would be the ideal user experience, and what you would expect to happen if a user selected "Automatically activate project-local Python environments" and hasn't specifically requested a particular python with reticulate::use_python or setting the project-specific Python interpreter in Project Options

@mikebessuille
Copy link
Contributor

I definitely agree this is worthy of some kind of article...

@kevinushey
Copy link
Contributor

kevinushey commented May 19, 2022

As I recall, the primary goal here was to ensure the version of Python available in an RStudio terminal, and the version of Python that reticulate would bind to (or is already bound to), are the same.

The RETICULATE_PYTHON_FALLBACK environment variable was an attempt to "guess" what the default version of Python to be used would be by reticulate, in the case that Python was not yet initialized.

The extra challenge here is that we wanted this all to succeed without forcing the reticulate package to be loaded, as we don't want the IDE to automatically load the reticulate package into the user's session.

Perhaps we should instead do something like:

  1. Check if reticulate is installed;
  2. If it is, launch a child R process and try to run reticulate::py_discover_config(),
  3. Use that version of Python as the "fallback".

The other slightly-related task is to ensure that the version of Python that RStudio has been configured to use (via global or project preferences) is also communicated to reticulate. That preference should take precedence over most of reticulate's auto-discovery machinery, but RETICULATE_PYTHON should still "win" if that's set (that should still act as an override of the preference).

There's also the added requirement that the version of Python used during the render of an R Markdown document (by reticulate) is the same as configured in the current session, as well as in deployments.

A lot of moving parts here!

@t-kalinowski
Copy link
Member

t-kalinowski commented May 19, 2022

Apologies, I don't fully understand: What exactly is checking the global option "Python > Automatically activate project-local Python environments" supposed to do?

Follow-up question: is that something reticulate should do by default? (…and we remove the option from the IDE? Or maybe help users set RETICULATE_PYTHON in a project local .Rprofile or .Renviron?)

@jgutman
Copy link
Contributor Author

jgutman commented Jun 14, 2022

Or maybe help users set RETICULATE_PYTHON in a project local .Rprofile or .Renviron?
I really don't think we want users to be setting RETICULATE_PYTHON unnecessarily, because then they won't be able to use any of the reticulate::use_python (e.g. reticulate::use_virtual_env) commands, which causes confusion for users.

What exactly is checking the global option "Python > Automatically activate project-local Python environments" supposed to do?
It's meant to activate the Python environment in the user's RStudio Terminal, so that if they're typing pip/python/ etc commands in the terminal, they align with what the user is getting in the RStudio R console, if they activate reticulate say via reticulate::repl_python or something like that. So no, this isn't something that can be handled by the reticulate package, because it's about aligning the behavior of the terminal hook to the behavior of the reticulate package

@t-kalinowski
Copy link
Member

t-kalinowski commented Jun 14, 2022

I really don't think we want users to be setting RETICULATE_PYTHON unnecessarily, because then they won't be able to use any of the reticulate::use_python (e.g. reticulate::use_virtual_env) commands, which causes confusion for users.

An alternative to setting RETICULATE_PYTHON in .Rprofile that still lets people call use_python() is to register a hook in the project .Rprofile, like this:

setHook(packageEvent("reticulate", "onLoad"),
        function(...) {
          try(reticulate::use_virtualenv("project-env", required = TRUE))
        })

It's meant to activate the Python environment in the user's RStudio Terminal

That makes perfect sense! Maybe the description should include the word "Terminal" to make that clearer. Or maybe the option should migrate to the "Terminal" options pane.

@kevinushey
Copy link
Contributor

For reference, we're doing something similar here:

# update default version of Python to be used when reticulate is loaded
.rs.registerPackageLoadHook("reticulate", function(...)
{
python <- .rs.readUiPref("python_path")
.rs.reticulate.usePython(python)
if (packageVersion("reticulate") >= "1.23")
{
.rs.addFunction("reticulate.describeObjectLength", function(object)
{
reticulate::py_len(object, -1L)
})
}
})

@jgutman
Copy link
Contributor Author

jgutman commented Aug 2, 2022

This occurs simply because RStudio tries to infer which version of Python the user wishes to use when the user has not explicitly specified a Python interpreter. The user can easily specify that they wish to use a virtual env Python by running reticulate::use_virtualenv() prior to running reticulate::py_discover_config(required_module = "plotly"), in the user reprex provided above, in which case the correct python interpreter is found. Users can also specify their preferred Python interpreter in either Global Options or Project Options. To resolve this issue without requiring the user to call reticulate::use_virtualenv() or otherwise specify that they want to use Python from their default virtual env, we'd have to assume that every user always wants the Python returned by reticulate:::virtualenv_path() rather than a global python, which does not seem desirable, especially when users can ensure they get the right Python simply by calling reticulate::use_virtualenv() in their project, or setting the interpreter in options

@jgutman jgutman closed this as completed Aug 2, 2022
@t-kalinowski

This comment was marked as resolved.

@t-kalinowski
Copy link
Member

Sorry for the previous message. User error--I had a stray .Renviron I forgot about 😊.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants