# HeAT Tutorial
---

Inspired by the [CS228 tutorial](https://github.com/kuleshov/cs228-material/blob/master/tutorials/python/cs228-python-tutorial.ipynb) by Volodymyr Kuleshov and Isaac Caswell.

## Introduction
---

**Table of Contents**

<div style="float: right; padding-right: 2em; padding-top: 2em;">
    <img src="https://raw.githubusercontent.com/helmholtz-analytics/heat/master/doc/images/logo_HeAT.png"></img>
</div>

* [Installation](#Installation)
    * [Dependencies](#Dependencies)
    * [Optional Features](#Optional-Features)
* [HeAT Arrays](#HeAT-Arrays)
    * [Datatypes](#Datatypes)
    * [Operations](#Operations)
    * [Indexing](#Indexing)
* [Parallel Processing](#Parallel-Processing)
    * [GPUs](#GPUs)
    * [Distributed Computing](#Distributed-Computing)
    * [Dos and Don'ts](#Dos-and-Don'ts)

HeAT is a flexible and seamless open-source software package for high-performance data analytics and machine learning. It provides highly optimized algorithms and data structures for multi-dimensional array computations on CPUs, GPUs and distributed cluster systems. The goal of HeAT is to fill the gap between data analytics and machine learning libraries, which focus strongly on single-node performance, and traditional high-performance computing (HPC). HeAT's generic Python-first programming interface integrates seamlessly with the existing data science ecosystem and makes writing scalable scientific and data science applications that exceed the computational and memory limits of your laptop or desktop as effortless as using NumPy.

For this tutorial, we assume that you are reasonably proficient in the Python programming language. It also helps if you have worked with vectorized multi-dimensional array data structures before, as offered by NumPy, MATLAB or R, for example. If you have not, or if you would like to refresh your knowledge, you might find the following resources useful: [CS228 Python and NumPy Tutorial](https://github.com/kuleshov/cs228-material/blob/master/tutorials/python/cs228-python-tutorial.ipynb), [NumPy for MATLAB users](https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html) and [NumPy for R users](http://mathesaurus.sourceforge.net/r-numpy.html).

In this tutorial, we will cover the following topics:

* Installation and setup of HeAT
* Working with HeAT arrays, operations, indexing etc.
* Utilizing HeAT's scalable parallel processing capabilities

## Installation
---

In most cases, the best way to install HeAT on your system is to use the official pre-built package from the Python Package Index (PyPI) as follows.

```bash
python -m pip install heat
```

You might need to use the `--user` flag or a [virtual environment](https://docs.python.org/3/library/venv.html) on systems where you do not have sufficient privileges.

You can also install the latest development version by cloning the HeAT source code repository and installing it manually.

```bash
git clone https://github.com/helmholtz-analytics/heat && cd heat && pip install .
```
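Once either command has finished, it can be worth confirming that the package is visible to the interpreter you plan to use. The following is a minimal sketch using only the standard library (Python 3.8 or newer). It assumes the distribution is registered under the name `heat`, as in the `pip` command above; `installed_version` is a hypothetical helper for illustration, not part of HeAT.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the distribution name matches the one used with pip above.
version = installed_version("heat")
if version is None:
    print("heat does not appear to be installed for this interpreter")
else:
    print("heat version:", version)
```

If the check reports that the package is missing although the installation succeeded, the most likely cause is that `pip` and `python` refer to different environments.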

### Dependencies

HeAT requires an [MPI](https://computing.llnl.gov/tutorials/mpi/) installation on your system in order to enable its parallel processing capabilities. If MPI is not already present on your system (this also applies to laptops, desktops etc.), you can obtain it through your system's package manager (here: OpenMPI), e.g.:

```bash
# Run the line that matches your distribution.
apt-get install libopenmpi-dev  # Ubuntu, Debian
dnf install openmpi-devel       # Fedora
yum install openmpi-devel       # CentOS
```

Installing these dependencies usually requires administrator privileges.
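To see whether an MPI installation is already reachable, one option is to look for a launcher executable on the `PATH`. The sketch below assumes the launcher is called `mpirun` or `mpiexec`, which is the usual convention but is not stated in this tutorial; `find_mpi_launcher` is a hypothetical helper. A missing launcher does not prove that MPI is absent, since some distributions require an environment module to be loaded first.

```python
import shutil


def find_mpi_launcher(candidates=("mpirun", "mpiexec")):
    """Return the path of the first MPI launcher found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


launcher = find_mpi_launcher()
if launcher is None:
    print("No MPI launcher found on PATH")
else:
    print("MPI launcher found at:", launcher)
```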

### Optional Features

HeAT may be installed with several optional features, namely GPU support on top of CUDA as well as (parallel) HDF5 and NetCDF4 I/O. If you would like to use these features, this is how you can enable them:

* GPU support: ensure that CUDA is installed on your system. You may find an installation guide [here](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html).
* HDF5 support: install HDF5 via your system's package manager, preferably with parallel I/O capabilities.

```bash
# Run the line that matches your distribution.
apt-get install libhdf5-openmpi-dev  # Ubuntu, Debian
dnf install hdf5-openmpi-devel       # Fedora
yum install hdf5-openmpi-devel       # CentOS
```

* NetCDF4 support: install NetCDF4 via your system's package manager, preferably with parallel I/O capabilities.

```bash
# Run the line that matches your distribution.
apt-get install libnetcdf-dev     # Ubuntu, Debian
dnf install netcdf-openmpi-devel  # Fedora
yum install netcdf-openmpi-devel  # CentOS
```

When you install HeAT, you need to state explicitly that you also want to install the modules for HDF5 and NetCDF4 support by specifying the corresponding extras, i.e.:

```bash
# Quoting keeps the shell from interpreting the square brackets.
pip install 'heat[hdf5,netcdf]'
```

or, when installing from source:

```bash
git clone https://github.com/helmholtz-analytics/heat && cd heat && pip install -e '.[hdf5,netcdf]'
```
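After installing with extras, you may want to check whether the optional I/O backends can actually be imported. The sketch below relies on the assumption that the `hdf5` and `netcdf` extras pull in the Python modules `h5py` and `netCDF4`; this mapping is not stated in this tutorial, so adjust it if your installation differs. `optional_io_status` is a hypothetical helper.

```python
import importlib.util

# Assumption: these are the modules behind the two extras.
ASSUMED_MODULES = {"hdf5": "h5py", "netcdf": "netCDF4"}


def optional_io_status(modules=ASSUMED_MODULES):
    """Map each extra to True if its backing module can be located."""
    return {
        extra: importlib.util.find_spec(name) is not None
        for extra, name in modules.items()
    }


for extra, available in optional_io_status().items():
    print(f"{extra}: {'available' if available else 'not found'}")
```

Locating a module this way does not import it, so the check stays cheap even when the backends are large.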



## HeAT Arrays
---

### Datatypes

### Operations

### Indexing

## Parallel Processing
---

### GPUs

### Distributed Computing

### Dos and Don'ts