# HPC intro

## Using Python

Python is of course a very useful programming language for data processing, analysis and visualization.

There are many tutorials and courses that will teach you Python, so that is not the scope of this tutorial.  Here you will learn how to run Python on our HPC infrastructure, assuming you are already familiar with the language and packages.

### Python scripts

Given that Python scripts are simple text files, you can create or modify them using your favorite editor.  You can do this for instance on the infrastructure using `nano`, or on your own system and transfer the finished script or module to the HPC system.

To build up gradually, you can start with a very simple script that takes a string as a command line argument, and prints a greeting to standard output.  Your script is stored in a file `hello.py` which could look like this.

```
#!/usr/bin/env python

import argparse


arg_parser = argparse.ArgumentParser(description='say hello')
arg_parser.add_argument('name', help='who to say hello to')
options = arg_parser.parse_args()

print('Hello ' + options.name + '!')
```

The only module used in this script, `argparse`, is in Python's standard library, and the script has been written in such a way that it will work with Python 2.7 as well as Python 3.2 and later.  In practice, use f-strings and a recent version of Python.
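
As a minimal sketch of the f-string variant mentioned above, the same greeting could be written as follows.  The function names `greeting` and `main` are illustrative choices that are not part of the original `hello.py`.  The final call passes the argument list explicitly so that the example runs as is; in an actual script you would call `main()` without arguments to read the command line.

```python
#!/usr/bin/env python

import argparse


def greeting(name):
    # f-strings require Python 3.6 or later
    return f'Hello {name}!'


def main(argv=None):
    # argv=None makes argparse read the arguments from the command line
    arg_parser = argparse.ArgumentParser(description='say hello')
    arg_parser.add_argument('name', help='who to say hello to')
    options = arg_parser.parse_args(argv)
    print(greeting(options.name))


# equivalent to running: python hello.py there
main(['there'])
```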

### Running simple scripts

You can run the script by passing its file name as a command line argument to the Python interpreter.

In [3]:
python hello.py there

Hello there!


You can of course easily check which version of Python is used to run your script, as well as where it is installed on the system.

In [4]:
python --version

Python 3.6.8


In [5]:
which python

/usr/bin/python
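
The same information is available from within Python itself, which can be handy to log at the start of a job.  The sketch below uses only the standard library; the function name `interpreter_info` is a hypothetical helper, not something provided by the HPC system.

```python
import platform
import sys


def interpreter_info():
    # version string such as '3.6.8' and the path of the running interpreter
    return {
        'version': platform.python_version(),
        'executable': sys.executable,
    }


info = interpreter_info()
print(f"Python {info['version']}")
print(info['executable'])
```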


However, often the version of the Python interpreter that comes with the operating system is not the one you would like to use, or you want to use Python packages that are not installed, so what can you do?

Here, we will assume that you have a (fairly simple) script, `sin_plot.py`, that computes a function for an array of floating point values, and that writes a line plot of the results to a file.

```
#!/usr/bin/env python

import numpy as np
import matplotlib.pyplot as plt


x = np.linspace(-2*np.pi, 2*np.pi, 501)
y = np.sin(x)

plt.plot(x, y, '-')
plt.savefig('sin.png')
```

This script requires both the numpy and matplotlib packages, so running it with the default Python interpreter fails.

In [7]:
python sin_plot.py

Traceback (most recent call last):
  File "sin_plot.py", line 3, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'
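
Rather than waiting for a traceback, you can check up front whether the packages a script needs can be found by the current interpreter.  The following is a minimal sketch using the standard library's `importlib`; the helper name `missing_packages` is an assumption made for this illustration.

```python
import importlib.util


def missing_packages(names):
    # find_spec returns None when a top-level package cannot be found
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(['numpy', 'matplotlib'])
if missing:
    print('not available: ' + ', '.join(missing))
else:
    print('all required packages are available')
```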



As it happens, there are quite a number of options:

  1. install packages in your home directory's `.local` directory using `pip`;
     * advantages: fairly straightforward
     * disadvantages: sure to create a dependency mess later on, performance is likely to be an issue
     * conclusion: *please don't*
  1. use the module system and Python versions and packages installed by your system administrator;
     * advantages: typically excellent performance
     * disadvantages: since system administrators really can't install any and all Python packages, you
       may have to install some packages yourself anyway
     * conclusion: perfect if you have no requirements beyond the packages that are available
  1. use a package manager such as [miniconda](https://docs.conda.io/en/latest/miniconda.html)
     or [mamba](https://github.com/mamba-org/mamba)
     * advantages: you have full control over the versions of Python and all packages
     * disadvantages: unless you know what you are doing, performance may be an issue
     * conclusion: way to go if you know what you are doing
  1. use [apptainer](https://apptainer.org/) or [podman](https://podman.io/) containers
     * advantages: if you know what you are doing, you can create a reproducible environment that is
       portable across systems
     * disadvantages: more involved than the other approaches, with considerable pitfalls
     * conclusion: not for the faint of heart
    
Given that the first option is not recommended at all, and the fourth goes beyond the scope of this tutorial, you will learn how to

  * use the module system
  * install and use miniconda

### Software module system

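An HPC system is almost by definition a multi-user system, so software such as Python is typically installed in several versions side by side.  You can list the modules that match a name with `module av`.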

In [8]:
module av Python


-------------------- /apps/leuven/icelake/2021a/modules/all --------------------
   Bazaar/2.7.0-GCCcore-10.3.0-Python-2.7.18
   Boost.Python/1.76.0-GCC-10.3.0
   CGAL/4.11.1-foss-2021a-Python-3.9.5
   DOLFIN/2019.1.0.post0-foss-2021a-Python-3.9.5-SuperLU_DIST
   DOLFIN/2019.1.0.post0-foss-2021a-Python-3.9.5              (D)
   FFC/2018.1.0-foss-2021a-Python-3.9.5
   FFC/2019.1.0.post0-foss-2021a-Python-3.9.5                 (D)
   FIAT/2018.1.0-foss-2021a-Python-3.9.5
   FIAT/2019.1.0-foss-2021a-Python-3.9.5                      (D)
   PLY/3.11-foss-2021a-Python-3.9.5
   Python/2.7.18-GCCcore-10.3.0-bare
   Python/3.9.5-GCCcore-10.3.0-bare
   Python/3.9.5-GCCcore-10.3.0
   Python/3.10.8-GCCcore-10.3.0-bare                          (D)
   UFL/2018.1.0-foss-2021a-Python-3.9.5
   UFL/2019.1.0-foss-2021a-Python-3.9.5                       (D)
   dijitso/2019.1.0-foss-2021a-Python-3.9.5
   mshr/2019.1.0-foss-2021a-Python-3.9.5
   pkgconfig/1.5.4-GCCcore-10.3.0-python
   protobuf-python/3.

To use one of these versions, load the corresponding module, then check which interpreter is now active.

In [9]:
module load Python/3.10.8-GCCcore-10.3.0-bare

In [10]:
python --version

Python 3.10.8
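
If a script depends on a minimum Python version, it can check this itself and fail early with a clear message when the wrong module is loaded.  This is a sketch under the assumption that comparing the major and minor version is sufficient; `require_python` is a hypothetical helper name.

```python
import sys


def require_python(minimum, version_info=None):
    # compare (major, minor) tuples; default to the running interpreter
    if version_info is None:
        version_info = sys.version_info
    current = tuple(version_info[:2])
    if current < tuple(minimum):
        raise RuntimeError(
            f'Python {minimum[0]}.{minimum[1]} or later is required, '
            f'but this is Python {current[0]}.{current[1]}'
        )
    return current


# illustrative use: require at least Python 3.6
require_python((3, 6))
```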


In [11]:
which python

/apps/leuven/icelake/2021a/software/Python/3.10.8-GCCcore-10.3.0-bare/bin/python

The interpreter now comes from the module, but the script still fails since this module does not provide numpy.


In [12]:
python sin_plot.py

Traceback (most recent call last):
  File "/vsc-hard-mounts/leuven-data/300/vsc30032/Personas-material/tutorials/basics/sin_plot.py", line 3, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'

You can search for a module that provides the missing package with `module spider`; numpy is part of the SciPy-bundle module.



In [6]:
module spider scipy


----------------------------------------------------------------------------
  SciPy-bundle:
----------------------------------------------------------------------------
    Description:
      Bundle of Python packages for scientific software

     Versions:
        SciPy-bundle/2021.05-foss-2021a
        SciPy-bundle/2021.05-intel-2021a

----------------------------------------------------------------------------
  For detailed information about a specific "SciPy-bundle" package (including how to load the modules) use the module's full name.
  Note that names that have a trailing (E) are extensions provided by other modules.
  For example:

     $ module spider SciPy-bundle/2021.05-intel-2021a
----------------------------------------------------------------------------

 



In [1]:
module load SciPy-bundle/2021.05-foss-2021a

In [2]:
python sin_plot.py

Traceback (most recent call last):
  File "/vsc-hard-mounts/leuven-data/300/vsc30032/Personas-material/tutorials/basics/sin_plot.py", line 4, in <module>
    import matplotlib.pyplot as plt
ModuleNotFoundError: No module named 'matplotlib'

The import of numpy now succeeds, but matplotlib is still missing, so look for a matplotlib module and load it as well.



In [3]:
module av matplotlib


-------------------- /apps/leuven/icelake/2021a/modules/all --------------------
   matplotlib/3.4.2-foss-2021a    matplotlib/3.4.2-intel-2021a (D)

  Where:
   D:  Default Module

If the avail list is too long consider trying:

"module --default avail" or "ml -d av" to just list the default modules.
"module overview" or "ml ov" to display the number of modules for each name.

Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching
any of the "keys".




In [4]:
module load matplotlib/3.4.2-foss-2021a

In [5]:
python sin_plot.py
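
No traceback is shown for this run.  To see which versions of the packages the loaded modules actually provide, you can query the installed distributions.  The sketch below assumes Python 3.8 or later, where `importlib.metadata` is part of the standard library; `installed_version` is an illustrative helper name.

```python
from importlib import metadata


def installed_version(package):
    # return the version string of an installed distribution, or None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for package in ('numpy', 'matplotlib'):
    print(package, installed_version(package))
```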

### Package manager: miniconda

Although using the module system guarantees that you will use a version of Python and packages that give you good performance, this approach may not be flexible enough for you.  You may want to use versions of Python or packages other than those provided through the module system, or even use packages that are not provided at all.

Of course, you can ask the helpdesk to install them for you, but typically this is only done for packages that are used fairly frequently.

Using a package manager such as miniconda can help you with this issue.  Moreover, using conda environments helps you manage your dependencies and keep them sane.  They are a great help for reproducible computations as well, since you can freeze an environment for a particular project and be sure that it will run with the identical software stack at a later stage.
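
One way to record an environment is an environment file that lists its name, channels and packages; conda can create an environment from such a file with `conda env create -f environment.yml`, and write one for an existing environment with `conda env export`.  The sketch below only generates the text of a minimal file.  The helper name `environment_yaml`, the `defaults` channel and the version pins are assumptions chosen for illustration.

```python
def environment_yaml(name, packages, channels=('defaults',)):
    # build the text of a minimal conda environment file
    lines = [f'name: {name}', 'channels:']
    lines += [f'  - {channel}' for channel in channels]
    lines.append('dependencies:')
    lines += [f'  - {package}' for package in packages]
    return '\n'.join(lines) + '\n'


# the version pins are placeholders for illustration only
print(environment_yaml('tutorial', ['python=3.11', 'numpy=1.25', 'matplotlib']))
```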

#### Installing miniconda

The first step is to download the miniconda installer script, and that is easy to do on the cluster itself using `wget`, a command line tool for downloading files from the web (and much more, but that is outside the scope of this tutorial).

In [1]:
 wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

--2023-07-06 12:28:56--  https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
Resolving repo.anaconda.com (repo.anaconda.com)... 2606:4700::6810:8203, 2606:4700::6810:8303, 104.16.130.3, ...
Connecting to repo.anaconda.com (repo.anaconda.com)|2606:4700::6810:8203|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 73134376 (70M) [application/x-sh]
Saving to: ‘Miniconda3-latest-Linux-x86_64.sh’


2023-07-06 12:28:56 (226 MB/s) - ‘Miniconda3-latest-Linux-x86_64.sh’ saved [73134376/73134376]



You can verify that the installer was downloaded; it is a shell script with the `.sh` extension.

In [2]:
ls

001_artefacts			 002_running_python.ipynb	    README.md
001_files_and_directories.ipynb  hello.py			    sin_plot.py
002_artefacts			 Miniconda3-latest-Linux-x86_64.sh  sin.png


It is important to install miniconda in your data directory since this directory will also contain all your environments, and these can easily take up gigabytes of storage after a short while.  This would exceed the quota of your home directory.  You can specify the directory where you want to install using the `-p` option.

In [6]:
bash Miniconda3-latest-Linux-x86_64.sh -b -p $VSC_DATA/miniconda3

PREFIX=/data/leuven/300/vsc30032/miniconda3
Unpacking payload ...
                                                                                
Installing base environment...


Downloading and Extracting Packages


Downloading and Extracting Packages

Preparing transaction: done
Executing transaction: done
installation finished.
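
Since the installation and its environments can grow to gigabytes, it can be useful to check how much space the directory takes.  The following sketch sums file sizes below the installation prefix; `directory_size` is a hypothetical helper, and because conda may hard-link identical files between environments, the result can overestimate the actual disk usage.

```python
import os


def directory_size(path):
    # sum the sizes of all regular files below path, skipping symbolic links
    total = 0
    for root, _dirs, files in os.walk(path):
        for file_name in files:
            file_path = os.path.join(root, file_name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


# VSC_DATA is the environment variable used in the installation command
prefix = os.path.join(os.environ.get('VSC_DATA', '.'), 'miniconda3')
print(f'{directory_size(prefix) / 1024**3:.2f} GiB')
```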


You can make miniconda more convenient to use by adding some configuration information to the files that control your settings.  This is easy using the following command.

In [7]:
$VSC_DATA/miniconda3/bin/conda init

no change     /data/leuven/300/vsc30032/miniconda3/condabin/conda
no change     /data/leuven/300/vsc30032/miniconda3/bin/conda
no change     /data/leuven/300/vsc30032/miniconda3/bin/conda-env
no change     /data/leuven/300/vsc30032/miniconda3/bin/activate
no change     /data/leuven/300/vsc30032/miniconda3/bin/deactivate
no change     /data/leuven/300/vsc30032/miniconda3/etc/profile.d/conda.sh
no change     /data/leuven/300/vsc30032/miniconda3/etc/fish/conf.d/conda.fish
no change     /data/leuven/300/vsc30032/miniconda3/shell/condabin/Conda.psm1
no change     /data/leuven/300/vsc30032/miniconda3/shell/condabin/conda-hook.ps1
no change     /data/leuven/300/vsc30032/miniconda3/lib/python3.10/site-packages/xontrib/conda.xsh
no change     /data/leuven/300/vsc30032/miniconda3/etc/profile.d/conda.csh
no change     /user/leuven/300/vsc30032/.bashrc
No action taken.


To make these new settings active for this notebook, you should reload your `.bashrc` file by sourcing it.

In [8]:
source ~/.bashrc

(base) 



Clearly, you have to do this only once.  Now you are ready to use `conda` conveniently and create your first environment.

#### Create an environment

To create a new environment, you have to specify a name, e.g., `tutorial` for this example, and a list of packages you would like to include, e.g., `numpy`.  Of course, you also want matplotlib, but for the sake of this tutorial, you'll do that later so that you also know how to install new packages in an existing environment.

In [2]:
conda create -y -q -n tutorial numpy

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /data/leuven/300/vsc30032/miniconda3/envs/tutorial

  added / updated specs:
    - numpy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    blas-1.0                   |              mkl           6 KB
    ca-certificates-2023.05.30 |       h06a4308_0         120 KB
    intel-openmp-2023.1.0      |   hdb19cb5_46305        17.1 MB
    libffi-3.4.4               |       h6a678d5_0         142 KB
    mkl-2023.1.0               |   h6d00ec8_46342       171.5 MB
    mkl-service-2.4.0          |  py311h5eee18b_1          54 KB
    mkl_fft-1.3.6              |  py311ha02d727_1         217 KB
    mkl_random-1.2.2           |  py311ha02d727_1         291 KB
    numpy-1.25.0               |  py311h08b1b3b_0          12 KB
    numpy-base-1.25.0  

#### Activating an environment

To use an environment, you have to activate it.  You can do this as follows.

In [3]:
conda activate tutorial

(tutorial) 
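
A script can also check which environment it is running in.  When an environment is activated, conda sets the environment variable `CONDA_DEFAULT_ENV` to its name, and `sys.prefix` points to the directory of the running Python installation.  The helper name `active_conda_env` below is an assumption for this sketch.

```python
import os
import sys


def active_conda_env(environ=None):
    # conda sets CONDA_DEFAULT_ENV when an environment is activated
    environ = os.environ if environ is None else environ
    return environ.get('CONDA_DEFAULT_ENV')


print('active environment:', active_conda_env())
print('interpreter prefix:', sys.prefix)
```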



When you are done, you can deactivate the currently active environment very easily.

In [5]:
conda deactivate

(base) 



#### Installing packages

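You still need to install matplotlib.  By default, `conda install` installs packages in the currently active environment, so make sure that the one you want to install in is active.  Alternatively, you can name the target environment explicitly with the `-n` option.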

In [1]:
conda activate tutorial

(tutorial) 



To install matplotlib, you can use `conda install`.

In [4]:
conda install -y -q matplotlib

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /data/leuven/300/vsc30032/miniconda3/envs/tutorial

  added / updated specs:
    - matplotlib


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    brotli-1.0.9               |       h5eee18b_7          18 KB
    brotli-bin-1.0.9           |       h5eee18b_7          19 KB
    contourpy-1.0.5            |  py311hdb19cb5_0         212 KB
    cycler-0.11.0              |     pyhd3eb1b0_0          12 KB
    dbus-1.13.18               |       hb2f20db_0         504 KB
    expat-2.4.9                |       h6a678d5_0         156 KB
    fontconfig-2.14.1          |       h52c9d5c_1         281 KB
    fonttools-4.25.0           |     pyhd3eb1b0_0         632 KB
    freetype-2.12.1            |       h4a9f257_0         626 KB
    giflib-5.2.1  


Note that you can install multiple packages by simply listing them, e.g., `conda install pandas seaborn`.

Now you can run `sin_plot.py` in your new `tutorial` environment.

In [4]:
python sin_plot.py

(tutorial) 



In [5]:
ls

001_artefacts			 002_running_python.ipynb	    README.md
001_files_and_directories.ipynb  hello.py			    sin_plot.py
002_artefacts			 Miniconda3-latest-Linux-x86_64.sh  sin.png
(tutorial) 
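
The listing only shows that a file named `sin.png` exists.  To check that it really is a PNG image, you can compare its first bytes to the fixed 8-byte PNG signature.  This is a minimal sketch; `is_png` is a hypothetical helper, and the check says nothing about the rest of the file.

```python
import os

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def is_png(path):
    # a PNG file starts with a fixed 8-byte signature
    if not os.path.isfile(path):
        return False
    with open(path, 'rb') as file:
        return file.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE


print('sin.png looks like a PNG file:', is_png('sin.png'))
```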



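As you can see, the script ran without errors in the `tutorial` environment, and `sin.png` is present in the directory.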

## Summary

## Where to go from here?