# How to check and install scientific Python packages

 * Creator(s): Chris Slocum
 * Affiliation: NOAA Center for Satellite Applications and Research
 * History:
     * 2 July 2024 — initial version

---


## Overview

Physical science notebooks rely on scientifically oriented software packages. Some interactive computational environments (e.g., Google Colab, Amazon Web Services SageMaker) come preconfigured with a set of commonly used packages, but these defaults often lack more specialized ones. This notebook outlines one method for checking for and installing scientific Python packages in an interactive computational environment.

### Prerequisite

To successfully complete this notebook, a basic understanding of the following is helpful, but not necessary.
* Python programming standard library

### Learning Outcome

* Installing packages in multiple interactive computational environments

---

## Options for checking and installing packages

There are several ways to check and install packages in a Jupyter notebook:
1. Leveraging tools specific to the interactive computational environment
2. Building an independent environment (e.g., Jupyter Binder)
3. Using the notebook shell escape character or magic command to run `pip` (i.e., `!pip` or `%pip`)
4. Using lower-level standard library packages and `pip`

The first two options can lock the notebook to a specific interactive computational environment. Some notebook users may also want to export the notebooks to scripts (e.g., Python `.py` files). The third option is disabled in some computational environments, and in others only one of the escape character or the magic command works.

Although it is the most technical, the fourth option appears to work across the interactive computational environments commonly used for deploying Jupyter notebook material. It also allows notebook users to export the notebook to a script, which is useful for anyone who wants to develop an idea further and run it as a script with a scheduler such as cron.
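One way to see why the third option does not carry over to plain Python is to check whether the cell source is valid Python syntax. The sketch below uses the built-in `compile` for this; `is_plain_python` is a hypothetical helper written only for this illustration, and nothing is executed or installed.

```python
def is_plain_python(source: str) -> bool:
    """Return True if `source` compiles as ordinary Python code (nothing is run)."""
    try:
        compile(source, "<cell>", "exec")
    except SyntaxError:
        return False
    return True

# Option 3: IPython shell escape and magic command
print(is_plain_python("!pip install netCDF4"))  # False
print(is_plain_python("%pip install netCDF4"))  # False

# Option 4: standard library call
print(is_plain_python(
    "import subprocess, sys\n"
    "subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'netCDF4'])"
))  # True
```

The `!` and `%` prefixes are interpreted by IPython, so outside an IPython kernel they are simply syntax errors.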

## Using lower-level standard library Python packages
The pip package manager lets users install software that does not ship with the Python standard library. The Python Software Foundation, which oversees development and maintenance of the Python language, recommends pip as the primary Python package manager. Because of this, most Python installations ship with pip, and most interactive computational environments support it. Developers of scientific packages often publish them on the Python Package Index ([pypi.org](https://pypi.org)).
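Because most, but not all, Python installations ship with pip, it can help to confirm that pip is available to the interpreter running the notebook before installing anything. A minimal sketch, assuming only the standard library: `importlib.util.find_spec` locates a module on `sys.path` without importing it and returns `None` if the module is absent.

```python
import importlib.util
import subprocess
import sys

# find_spec returns None when the module cannot be found on sys.path
pip_available = importlib.util.find_spec("pip") is not None
print(f"Interpreter: {sys.executable}")
print(f"pip available: {pip_available}")

if pip_available:
    # Report the pip version that belongs to this specific interpreter
    subprocess.check_call([sys.executable, "-m", "pip", "--version"])
```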

1. First, try to import the package. Some environments ship with [NumPy](https://numpy.org) installed; if the import succeeds, skip the installation.
2. To use pip from a Python script, you need to make a system call. Because you may not know how the system is configured or which Python instance runs the Jupyter notebook, use the interpreter that is running the script itself. Here, we get it from `sys.executable`.
3. To call pip, we again avoid calling a `pip` executable directly, because we do not know which pip is associated with the notebook's interpreter. Instead, we use the `-m <module-name>` command-line flag, which searches that interpreter's `sys.path` for the named module (i.e., pip) and executes its contents as the `__main__` module.
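4. To make the system call, we use `subprocess.check_call`. Passing the command as a list of arguments lets the subprocess module convert it into the form the host operating system expects, without relying on shell-specific quoting. `check_call` waits for the command to finish and raises `subprocess.CalledProcessError` if it exits with a nonzero return code.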
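Steps 2 through 4 can be sketched in isolation as follows. `build_pip_command` is a hypothetical helper for illustration. To avoid installing anything, a child Python process that exits with status 1 stands in for a failed pip install, which shows how `check_call` reports the failure.

```python
import subprocess
import sys
from typing import List

def build_pip_command(package: str) -> List[str]:
    """Build '<this interpreter> -m pip install <package>' as an argument list."""
    return [sys.executable, "-m", "pip", "install", package]

print(build_pip_command("netCDF4"))

# check_call runs the command, waits for it, and raises CalledProcessError
# on a nonzero exit code. This child process exits with status 1 on purpose.
try:
    subprocess.check_call([sys.executable, "-c", "import sys; sys.exit(1)"])
except subprocess.CalledProcessError as err:
    print(f"Command failed with return code {err.returncode}")
```

Catching `CalledProcessError` is optional. Letting it propagate, as the snippet below does, stops the notebook at the failed install, which is usually the safer behavior.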

Below is a snippet that implements the suggestions above.

```python
# sys provides access to system level information (i.e., the executable for the Python installation)
import sys
# subprocess makes system level calls
import subprocess

# PACKAGES is a list of non-standard library, scientific packages to install.
# Note that packages should be pip installable
PACKAGES = ["netCDF4"]

# Loop through each package to either import or install
for package in PACKAGES:
    try:
        # First, attempt to import the package
        __import__(package)
    except ImportError:
        # If package import is unsuccessful, install using pip
        # The command structure is <python executable> -m pip install <package>
        subprocess.check_call([sys.executable, '-m', 'pip', 'install', package])
```
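The loop above assumes each package's import name matches its pip name, which is not always true (e.g., the scikit-learn distribution is imported as `sklearn`). The sketch below is one possible extension, not part of the original method. It defines a hypothetical helper, `ensure_importable`, that accepts an optional pip name and calls `importlib.invalidate_caches()` after installing, so the import system can find a package installed while the interpreter is running.

```python
import importlib
import subprocess
import sys
from types import ModuleType
from typing import Optional

def ensure_importable(import_name: str, pip_name: Optional[str] = None) -> ModuleType:
    """Import `import_name`, installing `pip_name` with pip first if needed.

    `pip_name` defaults to `import_name`; pass it when the two differ.
    """
    try:
        return importlib.import_module(import_name)
    except ImportError:
        # <python executable> -m pip install <pip name>
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", pip_name or import_name]
        )
        # Clear the finders' caches so the newly installed package is noticed
        importlib.invalidate_caches()
        return importlib.import_module(import_name)

# A standard library module imports immediately, so nothing is installed here
json_module = ensure_importable("json")
print(json_module.__name__)  # json
```

In a notebook, the `PACKAGES` list could become pairs such as `("netCDF4", "netCDF4")` and `("sklearn", "scikit-learn")`, each passed to `ensure_importable` in a loop. Depending on the environment, a kernel restart may still be needed before some newly installed packages import cleanly.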

## References
* Python, 2023: The Python Standard Library. Accessed 4 April 2023, [https://docs.python.org/3/library/index.html](https://docs.python.org/3/library/index.html).

## Metadata
* Language / package(s): Python / Markdown
* Domain: general, training
* Application keywords: notebooks
* Geophysical keywords: N/A
* AI keywords: N/A

## Disclaimer
This Jupyter notebook is a scientific product and is not official communication of the National Oceanic and Atmospheric Administration, or the United States Department of Commerce. All NOAA Jupyter notebooks are provided on an 'as is' basis and the user assumes responsibility for its use. Any claims against the Department of Commerce or Department of Commerce bureaus stemming from the use of this Jupyter notebook will be governed by all applicable Federal law. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise does not constitute or imply their endorsement, recommendation or favoring by the Department of Commerce. The Department of Commerce seal and logo, or the seal and logo of a DOC bureau, shall not be used in any manner to imply endorsement of any commercial product or activity by DOC or the United States Government.