# Setting up a parallel notebook with heat, SLURM, and ipyparallel on HAICORE/HoreKa

The original version of this tutorial was inspired by the [CS228 tutorial](https://github.com/kuleshov/cs228-material/blob/master/tutorials/python/cs228-python-tutorial.ipynb) by Volodymyr Kuleshov and Isaac Caswell.




## Introduction
---
<div class="alert alert-block alert-warning">
<b>Note:</b>
This notebook assumes that you are working in the JupyterLab hosted on <a href="https://haicore-jupyter.scc.kit.edu/">HAICORE</a> at the Karlsruhe Institute of Technology.

If you want to run the tutorial on your local machine or on another system, please refer to the <a href="../0_setup/0_setup_local.ipynb">local setup notebook</a> in this repository, or to our <a href="https://heat.readthedocs.io/en/stable/tutorial_notebook_gallery.html">notebook gallery</a> for more examples.
</div>

<div style="float: right; padding-right: 2em; padding-top: 2em;">
    <img src="https://raw.githubusercontent.com/helmholtz-analytics/heat/master/doc/images/logo.png"></img>
</div>


## Setting up the environment

The rest of this tutorial assumes you have started a JupyterLab at [Jupyter for HAICORE](https://haicore-jupyter.scc.kit.edu/) with the following parameters:

| **Resource**    | **Value** |
| ---             | ---       |
| Nodes           | 1         |
| GPUs            | 4         |
| Runtime (hours) | 4         |


### Resources

We will be running the tutorial on the GPU partition of the [HAICORE](https://www.nhr.kit.edu/userdocs/haicore/hardware/) cluster, with the following hardware:

- 2× Intel Xeon Platinum 8368 (38 cores each, 76 cores in total)
- 4× NVIDIA A100 (40 GB)


### Setup environment

The first step is to load (and unload) the right modules in the HAICORE JupyterLab.

In the left sidebar of JupyterLab, open the modules tab, make sure to unload any `jupyter` modules, and then load `mpi/openmpi/4.1` and `devel/cuda/12.4`.

Afterwards, run the cell below.

In [None]:
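%%bash
# Report the currently loaded modules
ml list

# Create and activate a virtual environment
python3.11 -m venv heat-env
source heat-env/bin/activate
# Quote the extras so the shell cannot treat [...] as a glob pattern
pip install "heat[hdf5]" ipyparallel xarray matplotlib scikit-learn "perun[nvidia]"

# Register the environment as a Jupyter kernel named "heat-env"
python -m ipykernel install \
      --user \
      --name heat-env \
      --display-name "heat-env"
deactivate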


Currently Loaded Modules:
  1) dot                       3) numlib/mkl/2022.0.2   5) mpi/openmpi/4.1
  2) compiler/intel/2023.1.0   4) devel/cuda/12.4     (E)

  Where:
   E:  Experimental

Installed kernelspec heat-env in /hkfs/home/haicore/scc/io3047/.local/share/jupyter/kernels/heat-env
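If you want to double-check the module setup from Python rather than with `ml list`, the sketch below reads the `LOADEDMODULES` environment variable, which Lmod exports as a colon-separated list of loaded modules. It assumes the kernel inherits that variable from the module system and that unwanted Jupyter modules have "jupyter" in their name; the helper names are hypothetical and not part of any library.

```python
import os

# Modules this tutorial asks you to load (see the instructions above).
REQUIRED_MODULES = ("mpi/openmpi/4.1", "devel/cuda/12.4")


def loaded_modules(environ=os.environ):
    """Return loaded module names from Lmod's LOADEDMODULES variable.

    Assumes LOADEDMODULES is a colon-separated list; returns [] if unset.
    """
    value = environ.get("LOADEDMODULES", "")
    return [name for name in value.split(":") if name]


def missing_modules(required=REQUIRED_MODULES, environ=os.environ):
    """Return the required modules that are not currently loaded."""
    loaded = set(loaded_modules(environ))
    return [name for name in required if name not in loaded]


def jupyter_modules(environ=os.environ):
    """Return loaded modules whose name mentions 'jupyter' (assumed naming)."""
    return [name for name in loaded_modules(environ) if "jupyter" in name.lower()]


if __name__ == "__main__":
    print("missing:", missing_modules())
    print("jupyter modules still loaded:", jupyter_modules())
```

Both lists should be empty before you continue.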


To be able to run this tutorial interactively for parallel computing, we need to start an [IPython cluster](https://ipyparallel.readthedocs.io/en/latest/tutorial/process.html).


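Open a terminal in JupyterLab, activate the environment created above (the engines must be able to import `heat`), and start the cluster:

```bash
source heat-env/bin/activate
ipcluster start -n 4 --engines=MPI --MPILauncher.mpi_args="--oversubscribe"
```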
In the terminal, you should see output similar to the following:

```text
2024-03-04 16:30:24.740 [IPController] Registering 4 new hearts
2024-03-04 16:30:24.740 [IPController] registration::finished registering engine 0:63ac2343-f1deab70b14c0e14ca4c1630 in 5672ms
2024-03-04 16:30:24.740 [IPController] engine::Engine Connected: 0
2024-03-04 16:30:24.744 [IPController] registration::finished registering engine 3:673ce83c-eb7ccae6c69c52382c8349c1 in 5397ms
2024-03-04 16:30:24.744 [IPController] engine::Engine Connected: 3
2024-03-04 16:30:24.745 [IPController] registration::finished registering engine 1:d7936040-5ab6c117b845850a3103b2e8 in 5627ms
2024-03-04 16:30:24.745 [IPController] engine::Engine Connected: 1
2024-03-04 16:30:24.745 [IPController] registration::finished registering engine 2:ca57a419-2f2c89914a6c17865103c3e7 in 5508ms
2024-03-04 16:30:24.745 [IPController] engine::Engine Connected: 2
```
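When the cluster runs in the background, you may want to confirm from its log that every engine has connected. The following sketch assumes the log lines look like the sample above (`engine::Engine Connected: <id>`); the exact wording may differ between ipyparallel versions, and the function names are illustrative only. Inside the notebook, `rc.wait_for_engines` remains the more reliable check.

```python
import re

# Matches controller lines such as "... [IPController] engine::Engine Connected: 0"
CONNECTED = re.compile(r"engine::Engine Connected: (\d+)")


def connected_engines(log_text):
    """Return the sorted, de-duplicated engine ids reported as connected."""
    return sorted({int(match.group(1)) for match in CONNECTED.finditer(log_text)})


def all_engines_up(log_text, n_engines):
    """True if engines 0..n_engines-1 have all connected."""
    return connected_engines(log_text) == list(range(n_engines))


sample_log = """\
2024-03-04 16:30:24.740 [IPController] engine::Engine Connected: 0
2024-03-04 16:30:24.744 [IPController] engine::Engine Connected: 3
2024-03-04 16:30:24.745 [IPController] engine::Engine Connected: 1
2024-03-04 16:30:24.745 [IPController] engine::Engine Connected: 2
"""
print(connected_engines(sample_log))  # [0, 1, 2, 3]
```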

<div class="alert alert-block alert-info">
<b>Note:</b>
You must now restart the kernel (switching to the `heat-env` kernel installed above) to be able to access the IPython cluster.
</div>


To be able to start working with Heat on an HPC cluster, we first need to check the health of the available processes. We will use `ipyparallel` for this. For a thorough introduction to `ipyparallel` on our supercomputers, check out Jan Meinke's tutorial ["Interactive Parallel Computing with IPython Parallel"](https://gitlab.jsc.fz-juelich.de/sdlbio-courses/hpc-python/-/blob/master/06_LocalParallel.ipynb) or the [ipyparallel docs](https://ipyparallel.readthedocs.io/en/latest/).

In [None]:
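from ipyparallel import Client

# Connect to the cluster started with `ipcluster start` (default profile)
rc = Client(profile="default")
# Block until all 4 engines have registered with the controller
rc.wait_for_engines(4)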
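`rc.wait_for_engines(4)` blocks until four engines have registered. The underlying idea, polling a condition until it holds or a deadline passes, can be sketched in plain Python as follows. This is a generic illustration, not ipyparallel's implementation, and `wait_until` is a hypothetical helper.

```python
import time


def wait_until(condition, timeout=60.0, interval=0.5):
    """Poll condition() until it returns True or `timeout` seconds elapse.

    Returns True on success and False on timeout. Uses a monotonic clock
    so that wall-clock adjustments do not affect the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_until(lambda: len(rc.ids) >= 4, timeout=120)` would give up after two minutes instead of waiting indefinitely.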

Earlier, we started an IPython cluster with 4 processes (engines). We can now check that all of them are available.

In [None]:
rc.ids

[0, 1, 2, 3]

The `px` magic command allows you to execute Python commands or a Jupyter cell on the ipyparallel engines interactively ([%%px documentation](https://ipyparallel.readthedocs.io/en/latest/tutorial/magics.html)).
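One property of `%%px` worth keeping in mind is that each engine keeps its own namespace between cells: after `import heat as ht` runs under `%%px`, every engine has its own `ht`, and a variable such as `x` exists separately on each engine. The toy, single-process analogy below mimics this by executing the same source in one dictionary per "engine". It only illustrates the namespace behavior; real engines are separate MPI processes, and `px_execute` and `rank` are hypothetical names.

```python
def px_execute(source, namespaces):
    """Toy stand-in for %%px: run the same source in every engine's namespace.

    Each "engine" is a plain dict that persists between calls, so a name
    defined in one call (e.g. an import) is visible in later calls on the
    same engine, but never on another one.
    """
    for namespace in namespaces:
        exec(source, namespace)
    return namespaces


# Four simulated engines; "rank" is a hypothetical per-engine variable.
engines = [{"rank": rank} for rank in range(4)]

px_execute("import math", engines)                   # like a first %%px cell
px_execute("value = math.factorial(rank)", engines)  # a later cell reuses `math`

print([ns["value"] for ns in engines])  # [1, 1, 2, 6]
```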

We can now finally import `heat` on our 4-process cluster.

In [None]:
%%px
import heat as ht

%px: 100%|██████████| 4/4 [00:01<00:00,  2.77tasks/s]


In [None]:
%%px
ht.use_device("gpu")

In [None]:
%%px
x = ht.ones((10,10), split=0)
x.larray


Out[3:3]:
tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], device='cuda:1')

Out[0:3]:
tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], device='cuda:0')

Out[2:3]:
tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], device='cuda:0')

Out[1:3]:
tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]], device='cuda:1')
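The local shapes above follow from splitting 10 rows over 4 processes along axis 0: 10 = 4 × 2 + 2, so the first two ranks hold 3 rows and the last two hold 2. The sketch below reproduces that arithmetic in plain Python; it is consistent with the output above but is not Heat's own code, and the function names are hypothetical.

```python
def chunk_sizes(n, n_procs):
    """Rows per process when n rows are split as evenly as possible.

    The first `n % n_procs` processes receive one extra row.
    """
    base, remainder = divmod(n, n_procs)
    return [base + 1 if rank < remainder else base for rank in range(n_procs)]


def chunk_offsets(n, n_procs):
    """Global index of the first row owned by each process."""
    offsets, start = [], 0
    for size in chunk_sizes(n, n_procs):
        offsets.append(start)
        start += size
    return offsets


print(chunk_sizes(10, 4))    # [3, 3, 2, 2]
print(chunk_offsets(10, 4))  # [0, 3, 6, 8]
```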

In [None]:
%%px
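import os

import torch

# Can this engine see a CUDA device, and how many?
print(torch.cuda.is_available())
print(torch.cuda.device_count())
# GPUs exposed to this engine (prints None instead of failing if unset)
print(os.environ.get("CUDA_VISIBLE_DEVICES"))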

[stdout:0] True
2
0,1


[stdout:2] True
2
0,1


[stdout:3] True
2
0,1


[stdout:1] True
2
0,1
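All four engines report only two visible GPUs (`CUDA_VISIBLE_DEVICES` is `0,1`), which matches the placement seen earlier: ranks 0 and 2 on `cuda:0`, ranks 1 and 3 on `cuda:1`. A round-robin rule, rank modulo the number of visible devices, reproduces that mapping. The sketch below assumes this rule rather than quoting Heat's source, and the helper names are hypothetical.

```python
import os


def visible_gpus(environ=os.environ):
    """Parse CUDA_VISIBLE_DEVICES into a list of device identifiers.

    Returns None when the variable is unset (all GPUs visible) and an
    empty list when it is set but empty (no GPUs visible).
    """
    value = environ.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return [device.strip() for device in value.split(",") if device.strip()]


def device_for_rank(rank, n_devices):
    """Round-robin assignment of a process rank to a local CUDA device."""
    if n_devices < 1:
        raise ValueError("no CUDA devices visible")
    return f"cuda:{rank % n_devices}"


gpus = visible_gpus({"CUDA_VISIBLE_DEVICES": "0,1"})
print(gpus)                                               # ['0', '1']
print([device_for_rank(rank, len(gpus)) for rank in range(4)])
# ['cuda:0', 'cuda:1', 'cuda:0', 'cuda:1']
```

With four GPUs visible to every engine, the same rule would give each rank its own device.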
