Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplifying Python installation scripts and rename install_binder.sh to install_jupyter.sh #494

Merged
merged 16 commits into from
Jul 7, 2022

Conversation

eitsupi
Copy link
Member

@eitsupi eitsupi commented Jun 25, 2022

close #443, close #484, close #495

Remove pyenv and venv related settings and unnecessary lines from Python related scripts and Dockerfiles.

Motivation

As for why install_python.sh uses pyenv, the rocker/ml description says:

Images also configure pipenv with pyenv by default. This makes it very easy to manage projects that require specific versions of Python as well as specific python modules. For instance, a project using the popular greta package for GPU-accelerated Bayesian inference needs Tensorflow 1.x, which requires Python <= 3.7, might do:

However, the greta package recommends setting up a Python environment using conda as follows.

To assist with installing these Python packages, greta provides an installation helper, install_greta_deps(), which installs the exact pythons package versions needed. It also places these inside a “greta-env” conda environment. This isolates these exact python modules from other python installations, so that only greta will see them. This helps avoids installation issues, where previously you might update tensorflow on your computer and overwrite the current version needed by greta. Using this “greta-env” conda environment means installing other python packages should not be impact the Python packages needed by greta.

https://greta-stats.org/articles/get_started.html

I think the current Python configuration is overly complex.
And instead of trying to cover a wide variety of Python setups with defaults (maybe not possible), I think default settings should be kept to a minimum and left to the user to configure.

As far as I know, some users use Docker to escape the complexity of Python configuration (i.e., they don't want to bother with venv and pyenv even in a Docker container).

ToDo

@eitsupi eitsupi requested a review from cboettig June 25, 2022 14:58
@eitsupi eitsupi marked this pull request as ready for review June 25, 2022 14:58
@cboettig
Copy link
Member

Thanks, I agree with the motivation here, but need a bit more time to review and maybe anticipate if there's any particularly adverse consequences.

In general it makes sense to allow other tools (like reticulate) to handle the job of providing a nice user interface for dealing with the python environment, rather than rolling our own solutions. I agree the old scripts did too much -- it would be better to have good documentation showing how to deal with different situations rather than complex pre-configuration.

I'm not sure pyenv was ever a great choice, but as these scripts show it's not all that straight forward to get pyenv+pipenv set up so that a user can easily switch between different versions of python. The route reticulate uses via miniconda installation of python is definitely cleaner and more streamlined. However, I don't fully appreciate the differences between conda and pypi or where the community primarily falls there -- last I looked conda-based installs were a bit more of a 'walled garden' where some python modules or versions might not be available, so I have so far tried to provide a pure pip / virtualenv setup that uses the system python by default. That may have been a poor choice, or something better left to documentation, but may merit more discussion.

Meanwhile, I support the changes here, and I think we can add more documentation and maybe helper scripts where appropriate to deal with the many edge cases (including those annoying BLAS issues).

thanks for the great work here!

@eitsupi
Copy link
Member Author

eitsupi commented Jun 27, 2022

I'm not sure pyenv was ever a great choice, but as these scripts show it's not all that straight forward to get pyenv+pipenv set up so that a user can easily switch between different versions of python.

In my opinion, users who need such complex configurations will have to learn how to configure conda (mamba), pyenv, pipenv or poetry themselves anyway.

I think the configuration would be far simpler if the final user account installed pyenv, etc. as needed, as installing these tools by the root user and share them is definitely complicating things here.

Copy link
Member

@cboettig cboettig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, finished going through this. Much more streamlined, let's merge and we can flush out the documentation (or point to better sources for doing so) down the road.

My one on-the-fence comment is whether we should also just go ahead and update-alternatives to standard BLAS to avoid the reticulate-crash on numpy, or if that should just go in docs?

@eitsupi
Copy link
Member Author

eitsupi commented Jun 28, 2022

Thanks for checking this.
Next we need to update this page. https://www.rocker-project.org/images/versioned/cuda.html

My one on-the-fence comment is whether we should also just go ahead and update-alternatives to standard BLAS to avoid the reticulate-crash on numpy, or if that should just go in docs?

I do not fully understand how numpy crashes can be reproduced.
If this happens frequently, it would be worth changing the default BLAS.

@cboettig
Copy link
Member

Re BLAS, there's a user-reported reprex here: rstudio/reticulate#1133

@eitsupi
Copy link
Member Author

eitsupi commented Jun 30, 2022

Thank you for letting me know that.
I was able to confirm that switching BLAS with the following command will work around this issue.
Perhaps this can be noted in the documentation as a precaution.

export ARCH=$(uname -m)
update-alternatives --set "libblas.so.3-${ARCH}-linux-gnu" "/usr/lib/${ARCH}-linux-gnu/blas/libblas.so.3"

@cboettig
Copy link
Member

Thanks @eitsupi . Do you think it would be too aggressive to set the alternatives by default, with documentation stating why we do this and how to switch it back on?

I think it kinda makes sense that the default behavior "just works" and the docs cover the optional configuration that may improve performance. Particularly given that the segfault in this case gives very little information that would point a user to BLAS (though that might change if reticulate starts to check for this conflict).

(ps I think the notes should also include lapack as well as blas alternatives).

@eitsupi
Copy link
Member Author

eitsupi commented Jul 1, 2022

Since rocker/cuda is an image that has reticulate installed from the beginning, it would make sense to switch the default BLAS.
Do you think it would be a good idea to add lines to install_python.sh in this PR to switch BLAS?

@cboettig
Copy link
Member

cboettig commented Jul 1, 2022

yup, I think it would be good to add that in install_python.sh

I don't think this issue impacts jupyter/ binder users, but yeah anywhere reticulate it used, switching back to default BLAS/LAPACK is probably the best default in my opinion.

(I don't really understand why the numpy libraries can't be pluggable to BLAS in this regard; I've never encountered this issue with R package binaries crashing because they were built on a system with a different BLAS, but that's above my scope! Dirk understands these things much better)

@eitsupi
Copy link
Member Author

eitsupi commented Jul 2, 2022

I added lines for switching BLAS in f4c2347

@cboettig
Copy link
Member

cboettig commented Jul 6, 2022

Awesome, thanks. good to merge I think! 🎉

@cboettig cboettig merged commit 6448ff8 into rocker-org:master Jul 7, 2022
@eitsupi eitsupi deleted the remove-pyenv branch July 7, 2022 22:42
@eitsupi eitsupi added enhancement New feature or request CI rocker scripts Related to rocker scripts pre-built images Related to pre-built images labels Aug 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI enhancement New feature or request pre-built images Related to pre-built images rocker scripts Related to rocker scripts
Projects
None yet
2 participants