Accelerating Portable HPC Applications with ISO Fortran
===

## Learning Objectives

This full-day hands-on tutorial teaches how to accelerate portable HPC applications on CPUs and GPUs using the parallelism and concurrency features of the Fortran 2008, 2018, and 2023 standards. Attendees will take a canonical PDE solver for the unsteady heat equation from a single-threaded implementation to a multi-CPU/multi-GPU implementation that overlaps computation with communication. Along the way, they will learn about Fortran concurrency features such as `do concurrent` and math intrinsics, and how to integrate these features into hybrid HPC applications using MPI.
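To make `do concurrent` concrete, here is a minimal sketch of the kind of loop the DAXPY lab is built around: `y = a*x + y` written with `do concurrent`. The program name, array size, and values are illustrative assumptions, not code taken from the labs.

```fortran
! Minimal DAXPY sketch (y = a*x + y) using DO CONCURRENT (Fortran 2008).
! Program name, n, and the values of a, x, y are illustrative assumptions.
program daxpy_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000000
  real(dp), allocatable :: x(:), y(:)
  real(dp) :: a
  integer :: i

  allocate(x(n), y(n))
  a = 2.0_dp
  x = 1.0_dp
  y = 2.0_dp

  ! Each iteration is independent of the others, so the compiler is
  ! free to execute them in any order, including in parallel.
  do concurrent (i = 1:n)
    y(i) = a * x(i) + y(i)
  end do

  ! Every element should now equal 2*1 + 2 = 4; stop with an error otherwise.
  if (maxval(abs(y - 4.0_dp)) > 0.0_dp) error stop "daxpy result mismatch"
  print '(a, f0.1)', "y(1) = ", y(1)
end program daxpy_sketch
```

With nvfortran, compiling with `-stdpar=gpu` (or `-stdpar=multicore`) asks the compiler to parallelize `do concurrent` loops on the GPU (or on CPU cores); other compilers such as gfortran accept the same source and may simply run the loop serially.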

### Outline

This tutorial contains three hands-on labs:

- [Lab 1: MATMUL]: Using NVIDIA Tensor Cores from standard Fortran (beginner).
- [Lab 2: DAXPY]: Fundamentals of parallel Fortran (beginner).
- [Lab 3: Heat Equation]: Solving the two-dimensional unsteady heat equation (beginner).

[Lab 1: MATMUL]: lab1_matmul/matmul.ipynb
[Lab 2: DAXPY]: lab2_daxpy/daxpy.ipynb
[Lab 3: Heat Equation]: lab3_heat/heat.ipynb

## Audience, Content Level, Prerequisites, and Duration

This full-day tutorial is relevant for those interested in parallel programming models, the Fortran programming language, performance portability and heterogeneous systems. 

The content is structured into three topics and progresses from beginner to intermediate to advanced.

Beginner-level experience with Fortran 90, OpenMP, and MPI is required.

## Getting started

Run the following cells by selecting them and pressing `CTRL+ENTER`.

Let's start by testing the compiler versions:

```
!gfortran --version
!nvfortran --version
```

```
GNU Fortran (Ubuntu 12.1.0-2ubuntu1~22.04) 12.1.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

nvfortran 23.5-0 64-bit target on x86-64 Linux -tp icelake-server
NVIDIA Compilers and Tools
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
```

Next, check the CUDA driver and the GPU you are running the code on in this lab:

```
!nvidia-smi
```

```
Mon Jun 19 08:48:18 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.50                 Driver Version: 530.50       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA Graphics Device          On | 00000000:6B:00.0 Off |                    0 |
| N/A   26C    P0               62W / 700W|      0MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
```

Finally, check the CPUs on the system:

```
!lscpu
```

```
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 57 bits virtual
  Byte Order:            Little Endian
CPU(s):                  32
  On-line CPU(s) list:   0-31
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz
    CPU family:          6
    Model:               106
    Thread(s) per core:  2
    Core(s) per socket:  16
    Socket(s):           1
    Stepping:            6
    CPU max MHz:         3400.0000
    CPU min MHz:         800.0000
    BogoMIPS:            4800.00
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc
                         a cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss 
                         ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art
                          arch_perfmon pebs bts rep_good nopl xtopology nonstop_
                         tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor 
```

--- 

## Licensing 

This material is provided under the MIT License:

```
SPDX-FileCopyrightText: Copyright (c) 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: MIT

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
```