# HPC intro

## Using R

R is of course a very useful programming language for data processing, analysis and visualization.

There are many tutorials and courses that will teach you R, so that is not the scope of this tutorial.  Here you will learn how to run R scripts on our HPC infrastructure, assuming you are already familiar with the language and packages.

### R scripts

You may be used to executing R commands line by line in RStudio.  That approach may be fine while you are developing a new R script, but it is inherently interactive, and hence doesn't lend itself well to HPC systems, where most scripts are executed in batch mode, i.e., without any user interaction.  You will need to think in terms of R scripts that are executed completely, from top to bottom.

Given that R scripts are simple text files, you can create or modify them using your favorite editor.  You can do this, for instance, on the infrastructure itself using `nano`, or on your own system, and then transfer the finished script to the HPC system.

To build up gradually, you can start with a very simple script that takes a string as a command line argument and prints a greeting to standard output.  Your script is stored in a file `hello.R` which could look like this.

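```
#!/usr/bin/env Rscript

# collect only the arguments that follow the script name
args <- commandArgs(trailingOnly=TRUE)

# exactly one argument, the name to greet, is expected
if (length(args) != 1) {
    stop('a name is required')
}
cat('Hello ', args[1], '!\n', sep='')
```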

This is of course a very simple R script, but it shows you how to handle command line arguments.  The vector `args` contains string values, so if you want to pass numerical arguments via the command line, you would have to convert these strings to numerical values, e.g.,
```
x <- as.double(args[2])
n <- as.integer(args[3])
```

### Running simple scripts

An HPC system is almost by definition a multi-tenant system.  The users on such a system have specific requirements with respect to the software they want to use.  For instance, some may want to work with a certain version of R, while others prefer a newer one.

To deal with this, most HPC systems use a module system that allows you to easily pick the software, and the specific version of it, that you want to use.  There is just a single command to interact with the software stack: `module`.  It has several subcommands that you will learn about below.

#### Available software

In order to get a list of the software that is available through the module system, you can use the `module available` command.  That will list all the software packages that you can use on the system.

Since this list is huge, you can be more specific by providing (part of) the name of the software package you are looking for.  Note that the name is case-sensitive.  To see only the available versions of R, you can use the `-r` option, which lets you specify a regular expression as a search pattern.  The regular expression `'^R/'` selects software packages whose name starts (`^`) with a capital R, immediately followed by the `/` that separates the package name from its version.

In [8]:
module -r available '^R/'


The output lists the versions of R that are installed on the system, among them `R/4.2.1-GCC-10.3.0`.
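
To get a feel for what the pattern `'^R/'` selects, here is a minimal sketch that applies the same regular expression with `grep` to a small, hand-written list of module names.  Note that `grep` is only a stand-in here, it is not how `module` does its matching, and that the name `Ruby/3.0.1-GCCcore-10.3.0` is hypothetical, added only to show why the `/` in the pattern matters.

```shell
# Illustrative module names; Ruby/3.0.1-GCCcore-10.3.0 is hypothetical.
modules='R/4.2.1-GCC-10.3.0
Ruby/3.0.1-GCCcore-10.3.0
Python/3.9.5-GCCcore-10.3.0
CGAL/4.11.1-foss-2021a-Python-3.9.5'

# Keep only the names that start with "R" immediately followed by "/".
matches=$(printf '%s\n' "$modules" | grep '^R/')
echo "$matches"
```

Only `R/4.2.1-GCC-10.3.0` is selected: the hypothetical `Ruby` name starts with an R but has no `/` right after it, and the other names contain an R but do not start with one.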

To run R, the `R/4.2.1-GCC-10.3.0` module sounds promising.  The name may seem a bit cryptic, but once you understand the pattern, it is easy to interpret.

The name of a module consists of several parts that provide useful information:
  * `R` is the name of the software package;
  * `4.2.1` is the version of that package, i.e., of the R distribution;
  * `GCC-10.3.0` tells you that this R distribution has been compiled using the GCC compiler suite, version 10.3.0.
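
As a small illustration of this naming pattern, the following sketch splits a module name into its parts using shell parameter expansion.  It assumes the layout described above, `name/version-toolchain`, with a version that contains no hyphen; module names that deviate from that layout would need different handling.  The variable names are chosen for this example only.

```shell
module_name='R/4.2.1-GCC-10.3.0'

name=${module_name%%/*}      # everything before the first "/"
rest=${module_name#*/}       # everything after the first "/"
version=${rest%%-*}          # up to the first "-" (assumes no "-" in the version)
toolchain=${rest#*-}         # everything after the first "-"

echo "package:   $name"
echo "version:   $version"
echo "toolchain: $toolchain"
```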

#### Using a software package

To use a software package, you simply load the corresponding module.

In [9]:
module load R/4.2.1-GCC-10.3.0

You can verify that you now use the R distribution you expect by checking the version and the location of the `R` executable.  For the module loaded above, the version reported should be 4.2.1, and the path should point into the installation directory of that module.

In [10]:
R --version


In [11]:
which R



Now you can run your script using this version of R.

In [12]:
Rscript hello.R "wonderful world of HPC"

Hello wonderful world of HPC!

Note that since you want to pass the string "wonderful world of HPC" as a single argument to your script, you have to quote it.
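
The effect of quoting can be checked without R at all.  The sketch below uses a small helper function, `count_args`, which is defined here purely for illustration, to show how many arguments the shell passes along in each case.

```shell
# Hypothetical helper: print the number of arguments it receives.
count_args() {
    echo "$#"
}

quoted=$(count_args "wonderful world of HPC")
unquoted=$(count_args wonderful world of HPC)

echo "quoted:   $quoted argument(s)"
echo "unquoted: $unquoted argument(s)"
```

Without the quotes, the shell splits the text into four separate arguments, and `hello.R` would stop with the error `a name is required`, since it expects exactly one.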

There is a lot more to learn about the module system, and you are likely to need that at some point.  Perhaps it would be a good idea to check out the [tutorial](003_software_modules.ipynb) now.

## R packages

Of course, R by itself is not that useful; you likely want to use multiple R packages from CRAN.  Some may have been preinstalled for you, but likely not all.  If a package you need is missing, no worries, you can (sort of) easily install it yourself.

Click to watch the [**video**](https://youtu.be/jMZIeh3RA5s).

However, life is sometimes a bit more complicated.  Some R packages depend on software packages that should be installed on your system.  For instance, as you will see in the next video, the R package `gsl` relies on the GSL (GNU Scientific Library) package, which itself has some other dependencies.

Click to watch the [**video**](https://youtu.be/NcPeRuqhUS4).

Remember (you did the [tutorial](003_software_modules.ipynb) on the module system, right?) that you can search for GSL.

In [None]:
module spider GSL

This will work for many R packages, but not all.  If you have trouble, please [contact support](https://docs.vscentrum.be/en/latest/user_support.html).

## Summary

In this tutorial you learned how to run R scripts using an R module.
  
For the module system, you learned how to
  * list available modules using `module available`
  * search for modules using `module spider`
  * use the software package using `module load`

You also learned how to install your own R packages.

## Where to go from here?

You can now run an R script on the login node, but that is only useful for very short computations, i.e., scripts that run in a minute or less.  You share the login node with many other users of the HPC system, so if you run computationally intensive work there, it will degrade performance for all other users.

Your real workloads will run on the compute nodes of the HPC system, and these computations are typically run via a job script.  You can learn more about that in
  * [job scripts and the scheduler](020_jobs.ipynb).