# HPC intro

## Using Python

Python is of course a very useful programming language for data processing, analysis and visualization.

There are many tutorials and courses that teach Python, so teaching the language is outside the scope of this tutorial.  Here you will learn how to run Python on our HPC infrastructure, assuming you are already familiar with the language and its packages.

### Python scripts

Given that Python scripts are simple text files, you can create or modify them using your favorite editor.  You can do this directly on the infrastructure, for instance using `nano`, or on your own system, and then transfer the finished script or module to the HPC system.

To build up gradually, you can start with a very simple script that takes a string as a command line argument and prints a greeting to standard output.  The script is stored in a file `hello.py`, which could look like this.

```
#!/usr/bin/env python

import argparse


arg_parser = argparse.ArgumentParser(description='say hello')
arg_parser.add_argument('name', help='who to say hello to')
options = arg_parser.parse_args()

print('Hello ' + options.name + '!')
```

The only module used in this script, `argparse`, is part of Python's standard library, and the script has been written in such a way that it works with Python 2.7 as well as Python 3.2 and later.  In practice, you would use a recent version of Python and f-strings rather than string concatenation.
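As a minimal sketch of the f-string variant mentioned above (f-strings require Python 3.6 or later), the greeting could be written as follows.  The function names `parse_arguments` and `greeting` are illustrative and not part of the original script.

```python
#!/usr/bin/env python

import argparse


def parse_arguments(argv=None):
    """Parse the command line; argv=None means use sys.argv[1:]."""
    arg_parser = argparse.ArgumentParser(description='say hello')
    arg_parser.add_argument('name', help='who to say hello to')
    return arg_parser.parse_args(argv)


def greeting(name):
    """Build the greeting using an f-string instead of concatenation."""
    return f'Hello {name}!'


if __name__ == '__main__':
    # An explicit argument list is passed here so the sketch runs as-is;
    # in a real script, call parse_arguments() to read the command line.
    options = parse_arguments(['there'])
    print(greeting(options.name))
```

Splitting parsing and formatting into functions is a design choice of this sketch; it makes both parts easy to test without starting a new interpreter.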

### Running simple scripts

You can run the script by passing its file name as a command line argument to the Python interpreter.

In [3]:
python hello.py there

Hello there!


You can of course easily check which version of Python is used to run your script, as well as where it is installed on the system.

In [4]:
python --version

Python 3.6.8


In [5]:
which python

/usr/bin/python
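The same information is also available from within Python, which can be handy for logging it at the start of a job.  The following is a small sketch using the standard library's `sys` module; the function name `interpreter_info` is hypothetical.

```python
import sys


def interpreter_info():
    """Return the version and location of the running Python interpreter."""
    # sys.version_info holds (major, minor, micro, ...) as integers.
    version = '.'.join(str(part) for part in sys.version_info[:3])
    return {'version': version, 'executable': sys.executable}


if __name__ == '__main__':
    info = interpreter_info()
    print(f"Python {info['version']}")
    print(info['executable'])
```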


However, often the version of the Python interpreter that comes with the operating system is not the one you would like to use, or you want to use Python packages that are not installed, so what can you do?

Here, we will assume that you have a (fairly simple) script, `sin_plot.py`, that computes a function for an array of floating point values and writes a line plot of the results to a file.

```
#!/usr/bin/env python

import numpy as np
import matplotlib.pyplot as plt


x = np.linspace(-2*np.pi, 2*np.pi, 501)
y = np.sin(x)

plt.plot(x, y, '-')
plt.savefig('sin.png')
```

This script requires both the numpy and matplotlib packages, so running it with the default Python interpreter fails.

In [7]:
python sin_plot.py

Traceback (most recent call last):
  File "sin_plot.py", line 3, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'
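Rather than discovering missing packages one traceback at a time, you could check for all of them up front.  The sketch below uses `importlib.util.find_spec` from the standard library; the helper name `missing_packages` is hypothetical, and it is only meant for top-level package names such as `numpy`.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level packages in names that cannot be found."""
    # find_spec returns None when a top-level package is not installed.
    return [name for name in names
            if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    missing = missing_packages(['numpy', 'matplotlib'])
    if missing:
        print('missing packages: ' + ', '.join(missing))
    else:
        print('all required packages are available')
```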



As it happens, there are quite a number of options:

  1. install packages in your home directory's `.local` directory using `pip`;
     * advantages: fairly straightforward
     * disadvantages: sure to create a dependency mess later on, performance is likely to be an issue
     * conclusion: *please don't*
  1. use the module system and Python versions and packages installed by your system administrator;
     * advantages: typically excellent performance
     * disadvantages: since system administrators really can't install any and all Python packages, you
       may have to install some packages yourself anyway
     * conclusion: perfect if you have no requirements beyond the packages that are available
  1. use a package manager such as [miniconda](https://docs.conda.io/en/latest/miniconda.html)
     or [mamba](https://github.com/mamba-org/mamba)
     * advantages: you have full control over the versions of Python and all packages
     * disadvantages: unless you know what you are doing, performance may be an issue
     * conclusion: way to go if you know what you are doing
  1. use [apptainer](https://apptainer.org/) or [podman](https://podman.io/) containers
     * advantages: if you know what you are doing, you can create a reproducible environment that is
       portable across systems
     * disadvantages: more involved than the other approaches, with considerable pitfalls
     * conclusion: not for the faint of heart
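To see where the first option would place packages, you can ask Python for its per-user installation directories.  This is a sketch using the standard library's `site` module; the function name `user_install_locations` is hypothetical.  On Linux the user base defaults to `~/.local` unless the `PYTHONUSERBASE` environment variable is set.

```python
import site


def user_install_locations():
    """Return the directories used for per-user package installations."""
    return {
        # Base directory for per-user installations.
        'user_base': site.getuserbase(),
        # Directory where per-user packages are stored.
        'user_site': site.getusersitepackages(),
        # True if the user site directory is added to sys.path.
        'enabled': site.ENABLE_USER_SITE,
    }


if __name__ == '__main__':
    for key, value in user_install_locations().items():
        print(f'{key}: {value}')
```

Packages in this directory are visible to every interpreter of the same Python version, whichever module you load, which is one way the dependency mess mentioned above can arise.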
    
Given that the first option is not recommended at all, and the fourth goes beyond the scope of this tutorial, you will learn how to

  * use the module system
  * install and use miniconda

### Software module system

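An HPC system is almost by definition a shared, multi-user system, and its users need different versions of compilers, libraries and applications.  A module system makes these available on demand.  You can list the available modules whose names contain `Python` with `module av`.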

In [8]:
module av Python


-------------------- /apps/leuven/icelake/2021a/modules/all --------------------
   Bazaar/2.7.0-GCCcore-10.3.0-Python-2.7.18
   Boost.Python/1.76.0-GCC-10.3.0
   CGAL/4.11.1-foss-2021a-Python-3.9.5
   DOLFIN/2019.1.0.post0-foss-2021a-Python-3.9.5-SuperLU_DIST
   DOLFIN/2019.1.0.post0-foss-2021a-Python-3.9.5              (D)
   FFC/2018.1.0-foss-2021a-Python-3.9.5
   FFC/2019.1.0.post0-foss-2021a-Python-3.9.5                 (D)
   FIAT/2018.1.0-foss-2021a-Python-3.9.5
   FIAT/2019.1.0-foss-2021a-Python-3.9.5                      (D)
   PLY/3.11-foss-2021a-Python-3.9.5
   Python/2.7.18-GCCcore-10.3.0-bare
   Python/3.9.5-GCCcore-10.3.0-bare
   Python/3.9.5-GCCcore-10.3.0
   Python/3.10.8-GCCcore-10.3.0-bare                          (D)
   UFL/2018.1.0-foss-2021a-Python-3.9.5
   UFL/2019.1.0-foss-2021a-Python-3.9.5                       (D)
   dijitso/2019.1.0-foss-2021a-Python-3.9.5
   mshr/2019.1.0-foss-2021a-Python-3.9.5
   pkgconfig/1.5.4-GCCcore-10.3.0-python
   protobuf-python/3.

To use one of these versions, load the corresponding module.

In [9]:
module load Python/3.10.8-GCCcore-10.3.0-bare

In [10]:
python --version

Python 3.10.8


In [11]:
which python

/apps/leuven/icelake/2021a/software/Python/3.10.8-GCCcore-10.3.0-bare/bin/python
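Loading a module changes environment variables such as `PATH`, which is why `python` now resolves to a different location.  The sketch below inspects this from within Python.  It assumes that your module system sets the `LOADEDMODULES` environment variable to a colon-separated list of loaded modules; the function name `loaded_modules` is hypothetical.

```python
import os
import shutil


def loaded_modules(environ=None):
    """Return the loaded module names listed in LOADEDMODULES."""
    if environ is None:
        environ = os.environ
    # Assumption: LOADEDMODULES is a colon-separated list of module names.
    value = environ.get('LOADEDMODULES', '')
    return [name for name in value.split(':') if name]


if __name__ == '__main__':
    for name in loaded_modules():
        print(name)
    # The first python found on PATH, or None if there is none.
    print(shutil.which('python'))
```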


However, this module does not provide numpy either, so the script still fails.

In [12]:
python sin_plot.py

Traceback (most recent call last):
  File "/vsc-hard-mounts/leuven-data/300/vsc30032/Personas-material/tutorials/basics/sin_plot.py", line 3, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'



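The numpy package is provided by a bundle of scientific Python packages, which you can find with `module spider`.

In [6]: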
module spider scipy


----------------------------------------------------------------------------
  SciPy-bundle:
----------------------------------------------------------------------------
    Description:
      Bundle of Python packages for scientific software

     Versions:
        SciPy-bundle/2021.05-foss-2021a
        SciPy-bundle/2021.05-intel-2021a

----------------------------------------------------------------------------
  For detailed information about a specific "SciPy-bundle" package (including how to load the modules) use the module's full name.
  Note that names that have a trailing (E) are extensions provided by other modules.
  For example:

     $ module spider SciPy-bundle/2021.05-intel-2021a
----------------------------------------------------------------------------

 



Load the bundle built with the `foss` toolchain and run the script again.

In [1]:
module load SciPy-bundle/2021.05-foss-2021a

In [2]:
python sin_plot.py

Traceback (most recent call last):
  File "/vsc-hard-mounts/leuven-data/300/vsc30032/Personas-material/tutorials/basics/sin_plot.py", line 4, in <module>
    import matplotlib.pyplot as plt
ModuleNotFoundError: No module named 'matplotlib'



numpy is now found, but matplotlib is not, so look for a matplotlib module.

In [3]:
module av matplotlib


-------------------- /apps/leuven/icelake/2021a/modules/all --------------------
   matplotlib/3.4.2-foss-2021a    matplotlib/3.4.2-intel-2021a (D)

  Where:
   D:  Default Module

If the avail list is too long consider trying:

"module --default avail" or "ml -d av" to just list the default modules.
"module overview" or "ml ov" to display the number of modules for each name.

Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching
any of the "keys".




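Load the matplotlib module built with the same toolchain as the SciPy bundle, `foss-2021a`, and run the script once more.

In [4]: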
module load matplotlib/3.4.2-foss-2021a

In [5]:
python sin_plot.py
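The last command prints nothing to the terminal.  One way to check that a plot was actually written is to verify that `sin.png` exists and starts with the PNG file signature.  This is only a sketch, assuming the script ran to completion in the current directory; the helper name `is_png` is hypothetical.

```python
from pathlib import Path

# Every PNG file starts with these eight bytes.
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def is_png(path):
    """Return True if path is a file that starts with the PNG signature."""
    file_path = Path(path)
    if not file_path.is_file():
        return False
    with file_path.open('rb') as png_file:
        return png_file.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE


if __name__ == '__main__':
    print(is_png('sin.png'))
```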

### Package manager: miniconda

## Summary

## Where to go from here?