# Heat Tutorial
---

The original version of this tutorial was inspired by the [CS228 tutorial](https://github.com/kuleshov/cs228-material/blob/master/tutorials/python/cs228-python-tutorial.ipynb) by Volodymyr Kuleshov and Isaac Caswell.

For this interactive HPC adaptation, we have heavily referenced the [HPC Python](https://gitlab.jsc.fz-juelich.de/sdlbio-courses/hpc-python) course and the [jupyter-jsc](https://github.com/FZJ-JSC/jupyter-jsc-notebooks) repository. Many thanks to Jan Meinke, Jens Henrik Goebbert, Tim Kreuzer, and Alice Gorsch at JSC for their help setting this up.

## Introduction
---

**Table of Contents**
1. [Introduction](#Introduction)
2. [Getting Started](#Getting-Started)
3. [Heat Basics](#Heat-Basics)
4. [Heat Arrays](#Heat-Arrays)
5. [Heat Operations](#Heat-Operations)


<div style="float: right; padding-right: 2em; padding-top: 2em;">
    <img src="https://raw.githubusercontent.com/helmholtz-analytics/heat/master/doc/images/logo.png"></img>
</div>


This tutorial is designed to run on [Jupyter Notebook servers at the Jülich Supercomputing Centre](https://jupyter-jsc.fz-juelich.de/).

deRSE24 participants will have [received instructions](https://pad.gwdg.de/s2GbnPwcTWeSK-OFKs4nAw#) on how to request compute resources associated with the training. 


Log in to the [jupyter-jsc](https://jupyter-jsc.fz-juelich.de/) hub and start a new terminal. In the terminal, copy the tutorials from the project directory to your home directory:

```bash
cd
cp -r /p/project/training2404/tutorials* .
```


In general, you can install Heat easily on your laptop via `pip` or `conda`:
```bash
pip install heat
``` 
or 
```bash
conda install -c conda-forge heat
```
with the main dependencies being:

- an MPI distribution and `mpi4py`;
- a `torch` installation suited to your hardware accelerators (if any). 
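
Before running anything on your own machine, it can help to confirm that these dependencies are visible to your Python environment. The snippet below is a minimal sketch and not part of Heat: `check_dependencies` is a hypothetical helper that only tests whether each module can be located, without importing it.

```python
from importlib.util import find_spec


def check_dependencies(modules=("mpi4py", "torch", "heat")):
    """Return a dict mapping each module name to True if it can be located."""
    # find_spec returns None when a top-level module is not installed.
    # It does not import the module, so the check is cheap.
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A module reported as found can still fail at import time, for example if the MPI library that `mpi4py` was built against is not loaded, so treat this as a first check only.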

Installation on an HPC system is also straightforward, but heavily tuned to the available hardware. **In this short tutorial, we will skip the installation part, and use a dedicated kernel that we have created in advance.** 

Before we load our kernel, we need to load the cluster modules needed for the kernel to work. In the terminal, type:

```bash
source /p/project/training2404/heat_derse24.sh
```

When asked to select a Python kernel, choose `heat-1.4.0-dev`.

We will be running the tutorial on the GPU partition of the [JURECA](https://apps.fz-juelich.de/jsc/hps/jureca/configuration.html) cluster, with the following hardware:

- 2× AMD EPYC 7742, 2× 64 cores, 2.25 GHz
- 4× NVIDIA A100 GPU, 4× 40 GB HBM2e

Before we can test if Heat can be imported, we need to initialize the `ipcluster`. In the terminal, type:

```bash
ipcontroller  &
srun -n 4 -c 12 --ntasks-per-node 4 --time 00:90:00 -A training2404 -p dc-gpu ipengine start
```
In your terminal, you should see something like this:

```bash
FILL OUT
```

Reload the kernel. You now have access to 4 MPI processes that can be used by Heat either on CPU, or on the 4 GPUs available on each node.




## Basics: what is Heat for?
---

[**deRSE24 NOTE**:  do attend Fabian Hoppe's talk [TODAY at 16:30](https://events.hifis.net/event/994/contributions/7940/) for more details, benchmarks, and an overview of the parallel Python ecosystem.] 


Straight from our [GitHub repository](https://github.com/helmholtz-analytics/heat):

Heat builds on [PyTorch](https://pytorch.org/) and [mpi4py](https://mpi4py.readthedocs.io) to provide high-performance computing infrastructure for memory-intensive applications within the NumPy/SciPy ecosystem.


With Heat you can:
- port existing NumPy/SciPy code from single-CPU to multi-node clusters with minimal coding effort;
- exploit the entire, cumulative RAM of your many nodes for memory-intensive operations and algorithms;
- run your NumPy/SciPy code on GPUs (CUDA, ROCm, coming up: Apple MPS).
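
To make the second point concrete: a distributed array is split along one axis, so each MPI process holds only its own slice in memory. The sketch below illustrates the idea in plain Python. `chunk_bounds` is a hypothetical helper, and the even split with the remainder given to the lowest ranks is an assumption made for illustration, not necessarily the exact layout Heat uses.

```python
def chunk_bounds(n_items, n_procs, rank):
    """Return the half-open [start, stop) range of items owned by `rank`."""
    base, remainder = divmod(n_items, n_procs)
    # The first `remainder` ranks each take one extra item.
    start = rank * base + min(rank, remainder)
    stop = start + base + (1 if rank < remainder else 0)
    return start, stop


# Example: 10 rows spread over 4 processes.
layout = [chunk_bounds(10, 4, rank) for rank in range(4)]
print(layout)  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

No process ever needs the whole array in memory, which is what allows the combined RAM of several nodes to be used for a single dataset.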


Why?

- significantly better performance than task-parallel frameworks;
- analysis of massive datasets without breaking them up into artificially independent chunks;
- ease of use: script and test on your laptop, then port straight to an HPC cluster;
- PyTorch-based: GPU support beyond the CUDA ecosystem.

To be able to start working with Heat on an HPC cluster, we first need to check the health of the available processes. We will use the `ipyparallel` client for this.

In [None]:
from ipyparallel import Client
rc = Client(profile="default")

We started the `ipcontroller` and 4 `ipengine` processes earlier. We can now check that all 4 engines have registered with the controller.

In [3]:
rc.ids

[0, 1, 2, 3]
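
Engines started through `srun` may take a few seconds to register with the controller, so `rc.ids` can initially be shorter than expected. The following is a minimal sketch of a polling check. `poll_engine_ids` is a hypothetical helper written for illustration (it accepts any callable that returns the current list of ids) and is not part of `ipyparallel`.

```python
import time


def poll_engine_ids(get_ids, expected, timeout=60.0, interval=1.0):
    """Poll `get_ids()` until it reports `expected` engines or `timeout` seconds pass.

    Returns the sorted list of engine ids, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        ids = sorted(get_ids())
        if len(ids) >= expected:
            return ids
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {len(ids)} of {expected} engines registered: {ids}"
            )
        time.sleep(interval)
```

With the client created above, this could be called as `poll_engine_ids(lambda: rc.ids, expected=4)`.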

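The `%px` line magic from `ipyparallel` runs a single statement on all engines at once; the `%%px` cell magic does the same for an entire cell. Both magics become available once a `Client` has been created. Here we use `%px` so that each of the 4 engines imports Heat: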

In [5]:
%px import heat as ht

  from .autonotebook import tqdm as notebook_tqdm

%px: 100%|██████████| 4/4 [00:07<00:00,  1.96s/tasks]
