# Laplace Equation

The Laplace equation is a well-studied linear partial differential equation that governs steady-state heat conduction, irrotational fluid flow, and many other phenomena.

In this lab, we will consider the 2D Laplace equation on a rectangle with Dirichlet boundary conditions on the left and right boundaries and periodic boundary conditions on the top and bottom boundaries. We wish to solve the following equation:

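$\Delta u(x,y) = 0\;\forall\;(x,y)\in\Omega\setminus\partial\Omega$

Here $\Omega$ denotes the rectangle and $\partial\Omega$ its boundary, where the boundary conditions apply.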

# Jacobi Method

The Jacobi method is an iterative algorithm for solving a system of linear equations; it is guaranteed to converge when the system is strictly diagonally dominant. The governing equation is discretized and converted into a matrix form amenable to a Jacobi-based solver.
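
As a concrete illustration of the discretization step, here is a sketch that assumes a uniform grid with equal spacing $h$ in $x$ and $y$ and the standard second-order five-point stencil. The text above does not specify the scheme, so both the stencil and the index notation $u_{i,j} \approx u(x_i, y_j)$ are assumptions. At an interior grid point, the discrete Laplace equation reads:

$$\frac{u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - 4u_{i,j}}{h^2} = 0$$

Solving for $u_{i,j}$ and evaluating the right-hand side with values from the previous iterate $k$ gives the Jacobi update:

$$u_{i,j}^{(k+1)} = \frac{1}{4}\left(u_{i+1,j}^{(k)} + u_{i-1,j}^{(k)} + u_{i,j+1}^{(k)} + u_{i,j-1}^{(k)}\right)$$

Each new value depends only on old values, so all interior points can be updated independently, which is what makes the method a natural fit for a GPU kernel.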

## The Code

Let's understand the single-GPU code first. The source code file is available here: [jacobi.cu](../../source_code/single_gpu/jacobi.cu).

Alternatively, you can open the `File` menu and click the `Open...` option, which opens Jupyter's file explorer in a new tab. Then navigate to the `CFD/English/C/source_code/single_gpu/` directory, where you can view the `jacobi.cu` file.

Similarly, have a look at the [Makefile](../../source_code/single_gpu/Makefile).

Refer to the `single_gpu(...)` function. The important steps at each iteration of the Jacobi solver (that is, the `while` loop) are:
1. The norm is set to 0.
2. The device kernel is called to update the interior points.
3. The norm is copied back to the host, and
4. The boundary conditions are re-applied for the next iteration.

Note that we run the Jacobi solver for 1000 iterations over the grid.
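
The four steps above can be sketched on the host in plain C++. This is a minimal CPU-only illustration, not the code in `jacobi.cu`: the function names (`jacobi_sweep`, `apply_periodic_rows`, `jacobi_solve`), the five-point averaging stencil, the L2 norm of the update, and the choice to leave the Dirichlet columns untouched are all assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CPU stand-in for the device kernel: writes the five-point
// average of `a` into `a_new` for every interior point and accumulates the
// squared change into `sum_sq`.
void jacobi_sweep(const std::vector<double>& a, std::vector<double>& a_new,
                  int nx, int ny, double& sum_sq) {
    for (int iy = 1; iy < ny - 1; ++iy) {
        for (int ix = 1; ix < nx - 1; ++ix) {
            const int c = iy * nx + ix;
            const double v = 0.25 * (a[c - 1] + a[c + 1] + a[c - nx] + a[c + nx]);
            const double r = v - a[c];
            sum_sq += r * r;
            a_new[c] = v;
        }
    }
}

// Periodic top/bottom boundaries (assumed layout): row 0 takes the values of
// the last interior row and row ny-1 those of the first interior row.
// Columns 0 and nx-1 hold the Dirichlet values and are left untouched.
void apply_periodic_rows(std::vector<double>& a, int nx, int ny) {
    for (int ix = 1; ix < nx - 1; ++ix) {
        a[ix] = a[(ny - 2) * nx + ix];
        a[(ny - 1) * nx + ix] = a[nx + ix];
    }
}

// Runs `iter_max` Jacobi iterations in place and returns the norm of the
// last update. `a` must already contain the boundary values.
double jacobi_solve(std::vector<double>& a, int nx, int ny, int iter_max) {
    assert(nx >= 3 && ny >= 3);
    assert(a.size() == static_cast<std::size_t>(nx) * static_cast<std::size_t>(ny));
    std::vector<double> a_new(a);  // the copy carries the Dirichlet columns over
    double norm = 0.0;
    for (int iter = 0; iter < iter_max; ++iter) {
        double sum_sq = 0.0;                     // step 1: reset the norm
        jacobi_sweep(a, a_new, nx, ny, sum_sq);  // step 2: update interior points
        norm = std::sqrt(sum_sq);                // step 3: norm available on the host
        apply_periodic_rows(a_new, nx, ny);      // step 4: re-apply boundary conditions
        std::swap(a, a_new);                     // the new iterate becomes the input
    }
    return norm;
}
```

Using two buffers and swapping them after each iteration keeps the update a true Jacobi step: every new value is computed only from the previous iterate. In the GPU version the sweep runs as a kernel and the norm has to be transferred from device to host, which is why copying it back appears as a separate step.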

## Compilation and Execution

Let's compile the single-GPU code:

In [44]:
!cd ../../source_code/single_gpu && make clean && make

rm -f jacobi jacobi.qdrep
nvcc -DHAVE_CUB -Xcompiler -fopenmp -lineinfo -DUSE_NVTX -lnvToolsExt -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_80,code=compute_80 -std=c++14 jacobi.cu -o jacobi


Now, let us execute the program: 

In [45]:
!cd ../../source_code/single_gpu && ./jacobi

Single GPU jacobi relaxation: 1000 iterations on 16384 x 16384 mesh with norm check every 1 iterations
    0, 31.999022
  100, 0.897983
  200, 0.535684
  300, 0.395651
  400, 0.319039
  500, 0.269961
  600, 0.235509
  700, 0.209829
  800, 0.189854
  900, 0.173818
16384x16384: 1 GPU:   3.3650 s


The output reports the norm value every 100 iterations and the total execution time of the Jacobi solver. We would like to decrease the overall execution time of the program. To quantify the performance gain, we denote the single-GPU execution time as $T_s$ and the multi-GPU execution time for $P$ GPUs as $T_p$. Using these, we obtain two figures of merit: speedup $S = T_s/T_p$ (optimal is $P$) and efficiency $E = S/P$ (optimal is $1$).
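
The two figures of merit are simple to compute once both timings are known. The helper below is a small illustrative sketch; the `ScalingMetrics` struct and `scaling_metrics` function are hypothetical names, not part of the lab's source code.

```cpp
#include <cassert>

// Hypothetical container for the two figures of merit.
struct ScalingMetrics {
    double speedup;     // S = T_s / T_p, optimal value is P
    double efficiency;  // E = S / P, optimal value is 1
};

// t_single: single-GPU time T_s in seconds
// t_multi:  multi-GPU time T_p in seconds, measured on num_gpus GPUs
ScalingMetrics scaling_metrics(double t_single, double t_multi, int num_gpus) {
    assert(t_single > 0.0 && t_multi > 0.0 && num_gpus > 0);
    const double speedup = t_single / t_multi;
    return {speedup, speedup / num_gpus};
}
```

For example, with the measured $T_s = 3.3650$ s from the run above and a purely hypothetical $T_p = 0.5$ s on $P = 8$ GPUs, `scaling_metrics(3.3650, 0.5, 8)` yields $S \approx 6.73$ and $E \approx 0.84$. The multi-GPU time here is an invented number for illustration only, not a measurement.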

In [None]:
!cd ../../source_code/mpi && make clean && make

In [None]:
!cd ../../source_code/mpi && mpirun -np 8 nsys profile --trace=mpi,cuda,nvtx ./jacobi