# Getting Started
Before we begin, run the cell below to display information about the NVIDIA® CUDA® driver and the GPUs on the server using the `nvidia-smi` command. To execute a cell, click on it and press Ctrl+Enter, or press the play button in the toolbar above. The output appears below the grey cell.

In [None]:
!nvidia-smi

Since the code will also be run on multicore CPUs, run the cell below to see the number of cores and the CPU architecture of the system.

In [None]:
!cat /proc/cpuinfo

# A MINI-CFD APPLICATION

In this lab we will accelerate a simple 2D regular-grid CFD simulation, written for teaching GPU programming, using multiple approaches.
The code simulates an incompressible fluid flowing in a cavity using the 2D Navier-Stokes equation. The fluid flow can be either viscous (finite Reynolds number and vortices in the flow) or non-viscous (no Reynolds number specified and no vortices in the flow).

It is deliberately written to be very simple and easy to understand so it can be used as a teaching example.


In this exercise the finite difference approach is used to determine the flow pattern of a fluid in a cavity. For simplicity, the liquid is assumed to have zero viscosity which implies that there can be no vortices (i.e. no whirlpools) in the flow. The cavity is a square box with an inlet on one side and an outlet on another as shown below:

<img src="images/cfd_flow.png" width="50%" height="50%">
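For orientation only, the update rule used in this exercise can be written compactly. This is a sketch under the assumption that $\Psi$ is a stream function satisfying Laplace's equation on a grid with unit spacing, which is the usual formulation for the zero-viscosity case; the sign convention for the velocities is likewise an assumption and may differ from the one in `cfd.cpp`.

$$
\nabla^2 \Psi = \frac{\partial^2 \Psi}{\partial x^2} + \frac{\partial^2 \Psi}{\partial y^2} = 0
$$

$$
\Psi_{i,j} = \frac{1}{4}\left(\Psi_{i-1,j} + \Psi_{i+1,j} + \Psi_{i,j-1} + \Psi_{i,j+1}\right)
$$

$$
u_x = \frac{\partial \Psi}{\partial y}, \qquad u_y = -\frac{\partial \Psi}{\partial x}
$$

The second equation is the "average of the 4 nearest neighbours" step, applied repeatedly to every interior grid point until the values stop changing.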

### The objective of this exercise is not to delve into the mathematics but to use different approaches to GPU programming to parallelize the code and improve its performance.

The general flow of the code is shown in the pseudocode below:

```cpp
set the boundary values for Ψ 
while (convergence == FALSE)  do 
    for each interior grid point do 
        update Ψ by averaging with its 4 nearest neighbours 
    end do 
    
    check for convergence 
end do 

for each interior grid point do 
    calculate 𝑢𝑥 
    calculate 𝑢𝑦 
end do
```
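The pseudocode above could be sketched in C++ roughly as follows. This is a minimal serial illustration, not the code in `cfd.cpp`: the `Grid` type, the function names `jacobi_sweep`, `solve`, and `velocities`, the convergence test on the summed squared change, and the central-difference velocity formulas (unit grid spacing, `i` as the x index, `j` as the y index) are all assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical container (not taken from cfd.cpp): an n x n interior plus a
// one-cell boundary layer, stored row-major in a flat vector.
struct Grid {
    int n;
    std::vector<double> v;

    explicit Grid(int interior)
        : n(interior),
          v(static_cast<std::size_t>(interior + 2) * (interior + 2), 0.0) {}

    std::size_t index(int i, int j) const {
        return static_cast<std::size_t>(i) * (n + 2) + j;
    }
    double& at(int i, int j) { return v[index(i, j)]; }
    double at(int i, int j) const { return v[index(i, j)]; }
};

// One sweep: write the 4-neighbour average of psi into next for every
// interior point and return the sum of squared changes.
double jacobi_sweep(const Grid& psi, Grid& next) {
    double delta2 = 0.0;
    for (int i = 1; i <= psi.n; ++i) {
        for (int j = 1; j <= psi.n; ++j) {
            const double avg = 0.25 * (psi.at(i - 1, j) + psi.at(i + 1, j) +
                                       psi.at(i, j - 1) + psi.at(i, j + 1));
            const double d = avg - psi.at(i, j);
            delta2 += d * d;
            next.at(i, j) = avg;
        }
    }
    return delta2;
}

// Repeat sweeps until the change drops below tol or max_iters is reached.
// Returns the number of sweeps performed. Boundary values in psi must be
// set by the caller beforehand; sweeps never modify them.
int solve(Grid& psi, double tol, int max_iters) {
    assert(tol > 0.0 && max_iters >= 1);
    Grid next = psi;  // copy so that next carries the same boundary values
    for (int it = 1; it <= max_iters; ++it) {
        const double delta2 = jacobi_sweep(psi, next);
        std::swap(psi.v, next.v);  // the new values become the current ones
        if (std::sqrt(delta2) < tol) return it;
    }
    return max_iters;
}

// Central differences on the interior (assumed convention, unit spacing):
// ux = dPsi/dy, uy = -dPsi/dx.
void velocities(const Grid& psi, Grid& ux, Grid& uy) {
    for (int i = 1; i <= psi.n; ++i) {
        for (int j = 1; j <= psi.n; ++j) {
            ux.at(i, j) = 0.5 * (psi.at(i, j + 1) - psi.at(i, j - 1));
            uy.at(i, j) = -0.5 * (psi.at(i + 1, j) - psi.at(i - 1, j));
        }
    }
}
```

A caller would fill the boundary cells of a `Grid`, call `solve(psi, tol, max_iters)`, and then call `velocities(psi, ux, uy)`. The two nested loops in `jacobi_sweep` have no dependencies between grid points within a sweep, which is what makes this pattern a natural candidate for the parallel approaches explored later in the lab.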

## Steps to follow
We will follow the optimization cycle shown below to port the code and improve its performance.

<img src="images/Optimization_Cycle.jpg" width="80%" height="80%">


### Understand and Analyze the code
Analyze the code and the Makefile for how to compile the code:

[cfd code](../source_code/serial/cfd.cpp) 

[Makefile](../source_code/serial/Makefile)

## Compile the code

In [None]:
!cd ../source_code/serial && make clean && make

## Run the CPU code

In [None]:
!cd ../source_code/serial && ./cfd 64 500

## Profiling

For this section, we will use the NVIDIA Nsight Systems profiler. Because the code runs only on the CPU, we will trace NVTX APIs, which are already integrated into the application. NVTX is useful for tracing CPU events and time ranges. For more information on Nsight Systems, please see the __[profiler documentation](https://docs.nvidia.com/nsight-systems/)__.

### Viewing the profiler output
There are two ways to look at profiled code: 

1) Command line: Use `nsys` to collect and view profiling data from the command line. Profiling results are displayed in the console after the profiling data is collected.

2) NVIDIA Nsight Systems GUI: Open the Nsight Systems profiler, click File > Open, and choose the profiler output file `minicfd_profile.nsys-rep`. To view the report on your local machine, the local system must have the same version of the CUDA toolkit installed. Details on where to download the CUDA toolkit can be found in the links in the resources section below.

## Profile the CPU code to find hotspots

In [None]:
!cd ../source_code/serial && nsys profile -t nvtx --stats=true --force-overwrite true -o minicfd_profile ./cfd 64 500

Download and save the report file by holding down <mark>Shift</mark> and <mark>right-clicking</mark> [here](../source_code/serial/minicfd_profile.nsys-rep), then choosing <mark>Save Link As</mark>. Once done, open it in the Nsight Systems GUI.

---

# Start Accelerating the Code

[stdpar](minicfd_stdpar.ipynb)

[OpenACC](minicfd_openacc.ipynb)

[OpenMP](minicfd_openmp.ipynb)

[CUDA C](minicfd_cudac.ipynb)





## Final Results

Modify the table below and add the timings for the code accelerated using the different methods.

| | OpenACC | OpenMP | stdpar | CUDA C |
| --- | --- | --- | --- | --- |
| Multicore |   |  |   |  |
| GPU  |  |  |  |  |
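
If a version of the code does not already report its own run time, one possible way to collect a number for the table is a small wall-clock helper like the sketch below. The names `time_seconds` and `speedup` are hypothetical and do not come from the lab sources. When timing GPU code, keep in mind that kernel launches can be asynchronous, so the timed callable should wait for the device to finish before returning.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper: run `work` once and return the elapsed wall-clock
// time in seconds, measured with a monotonic clock.
template <typename F>
double time_seconds(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Speedup of an accelerated run relative to the serial baseline.
double speedup(double serial_seconds, double accelerated_seconds) {
    assert(accelerated_seconds > 0.0);
    return serial_seconds / accelerated_seconds;
}
```

For example, `time_seconds([&] { run_solver(); })` would time a hypothetical `run_solver()` call, and `speedup(t_serial, t_gpu)` gives the ratio of the serial time to the accelerated time.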



## Licensing 

Copyright © 2022 OpenACC-Standard.org.  This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials may include references to hardware and software developed by other entities; all applicable licensing and copyrights apply.