<a href="https://colab.research.google.com/github/robertopsouto/invmultifis_notebooks/blob/main/english/INVMULTIFIS_CSEM3D_nvhpc.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **INVMULTIFIS Project: Development of multi-physics data inversion software with optimization via artificial intelligence**

The project proposes an innovative inversion technology for characterizing and monitoring deep-water reservoirs for Petrobras (the Brazilian oil company). It is based on CSEM (Controlled-Source Electromagnetic) methods, a robust risk-reduction tool for drilling in oil basins, and uses multiphysics data in the 3D domain. One of the main objectives of the project is to develop, optimize, and parallelize CSEM codes to improve their performance.

## Download and install NVIDIA HPC SDK

In [None]:
%%time
# Download and install the deb packages. This takes about 5 minutes.
! curl https://developer.download.nvidia.com/hpc-sdk/ubuntu/DEB-GPG-KEY-NVIDIA-HPC-SDK | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-hpcsdk-archive-keyring.gpg
! echo 'deb [signed-by=/usr/share/keyrings/nvidia-hpcsdk-archive-keyring.gpg] https://developer.download.nvidia.com/hpc-sdk/ubuntu/amd64 /' | sudo tee /etc/apt/sources.list.d/nvhpc.list
! sudo apt-get update -y
! sudo apt-get install -y nvhpc-22-11

## Install `environment-modules` package to load `nvhpc`

In [None]:
%%bash
# -y answers the confirmation prompt, which cannot be answered from a notebook cell
apt-get install -y environment-modules

## Load `nvhpc` and run the `nvaccelinfo` command

---
The cell below displays information about the GPUs available on the server by running the `nvaccelinfo` command, which ships with the NVIDIA HPC compiler that we will be using. To run it, give the cell focus (click on it with your mouse), then press Ctrl-Enter or the play button in the toolbar above. If all goes well, you should see some output below the grey cell.

In [None]:
%%bash
source /usr/share/modules/init/bash
module use /opt/nvidia/hpc_sdk/modulefiles
module load nvhpc/22.11
nvaccelinfo

The output of the above command will vary according to which GPUs you have in your system. For example, if you are running the lab on a machine with NVIDIA Tesla V100 GPUs, you might see the following:

```
CUDA Driver Version:           10010
NVRM version:                  NVIDIA UNIX x86_64 Kernel Module  418.87.01  Wed Sep 25 06:00:38 UTC 2019

Device Number:                 0
Device Name:                   Tesla V100-SXM2-16GB
Device Revision Number:        7.0
Global Memory Size:            16914055168
Number of Multiprocessors:     80
Concurrent Copy and Execution: Yes
Total Constant Memory:         65536
Total Shared Memory per Block: 49152
Registers per Block:           65536
Warp Size:                     32
Maximum Threads per Block:     1024
Maximum Block Dimensions:      1024, 1024, 64
Maximum Grid Dimensions:       2147483647 x 65535 x 65535
Maximum Memory Pitch:          2147483647B
Texture Alignment:             512B
Clock Rate:                    1530 MHz
Execution Timeout:             No
Integrated Device:             No
Can Map Host Memory:           Yes
Compute Mode:                  default
Concurrent Kernels:            Yes
ECC Enabled:                   Yes
Memory Clock Rate:             877 MHz
Memory Bus Width:              4096 bits
L2 Cache Size:                 6291456 bytes
Max Threads Per SMP:           2048
Async Engines:                 6
Unified Addressing:            Yes
Managed Memory:                Yes
Concurrent Managed Memory:     Yes
Preemption Supported:          Yes
Cooperative Launch:            Yes
  Multi-Device:                Yes
Default Target:            -ta=tesla:cc70
```

This gives us lots of details about the GPU, for instance the device number, the type of device, and at the very bottom the command line argument we should use when targeting this GPU (see *_NVIDIA HPC Compiler Option_*). We will use this command line option a bit later to build for our GPU.
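
To reuse that option in a build script rather than copying it by hand, the last line of the report can be extracted with standard text tools. The following is a minimal sketch, assuming the report labels the line `Default Target:` as in the sample above; the helper name `default_target` is hypothetical and is not part of the NVIDIA HPC SDK.

```bash
# Hypothetical helper: read an nvaccelinfo-style report on stdin and
# print the value of the first "Default Target:" line.
default_target() {
    sed -n 's/^Default Target:[[:space:]]*//p' | head -n 1
}
```

With the `nvhpc` module loaded, `nvaccelinfo | default_target` would print `-ta=tesla:cc70` for the sample report above.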

# Steps to install the CSEM3D program

### Installing GNU Make v4.3 to correctly handle the CSEM3D `makefile`

In [None]:
%%bash
wget https://ftp.gnu.org/gnu/make/make-4.3.tar.gz
tar xfz make-4.3.tar.gz

In [None]:
%%bash
cd make-4.3
./configure
make
make install

In [None]:
%%bash
make -v
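
`make -v` prints the version on its first line (for example `GNU Make 4.3`). A small check, sketched below, can confirm that the `make` found on `PATH` is at least 4.3 before the later build steps rely on it. The function names are hypothetical, and `sort -V` assumes GNU coreutils, which Ubuntu provides.

```bash
# Hypothetical helper: succeed if version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: read `make -v` output on stdin and print the
# version number found on its first line ("GNU Make 4.3" -> "4.3").
make_version() {
    sed -n '1s/^GNU Make \([0-9][0-9.]*\).*/\1/p'
}
```

A possible use is `version_ge "$(make -v | make_version)" 4.3 && echo "make is recent enough"`.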

## Creating a user, since Open MPI does not recommend running MPI as the `root` user



In [None]:
%%bash
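# Create the account non-interactively (no password or contact-detail prompts)
adduser --disabled-password --gecos "" csem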
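
Running `adduser` again reports an error once the account exists. A guard such as the following sketch makes the step repeatable; `user_exists` is a hypothetical helper built on the standard `id` command.

```bash
# Hypothetical helper: succeed if the named account already exists.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}
```

For example, `user_exists csem || adduser --disabled-password --gecos "" csem` creates the account only when it is missing.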

## PETSc v3.18.4

### Download and extract the source code, then run `configure`

In [None]:
%%bash
su csem
source /usr/share/modules/init/bash
module use /opt/nvidia/hpc_sdk/modulefiles
module load nvhpc/22.11
mkdir -p ${HOME}/petsc/nvhpc
cd ${HOME}/petsc/nvhpc
wget https://gitlab.com/petsc/petsc/-/archive/v3.18.4/petsc-v3.18.4.tar.gz
tar zxvf petsc-v3.18.4.tar.gz
cd petsc-v3.18.4
./configure \
 --prefix=${PWD}/installdir \
 --with-fortran \
 --with-fortran-kernels=true \
 --with-cuda \
 --download-fblaslapack \
 --with-scalar-type=complex \
 --with-precision=double \
 --with-debugging=0 \
 --with-x=0 \
 --with-cc=mpicc \
 --with-cxx=mpicxx \
 --with-fc=mpif90 \
 --with-make-exec=make \
 2>&1 | tee ../configure.out

### Run the `make all` phase

If it finishes successfully, this message should appear:

```bash
=========================================
Now to install the libraries do:
make PETSC_DIR=/home/csem/petsc/nvhpc/petsc-v3.18.4 PETSC_ARCH=arch-linux-c-opt install
=========================================
```

In [None]:
%%bash
su csem
source /usr/share/modules/init/bash
module use /opt/nvidia/hpc_sdk/modulefiles
module load nvhpc/22.11
cd /home/csem/petsc/nvhpc/petsc-v3.18.4
make PETSC_DIR=/home/csem/petsc/nvhpc/petsc-v3.18.4 PETSC_ARCH=arch-linux-c-opt all

### Run the `make install` phase

If it finishes successfully, this message should appear:

```bash
====================================
Install complete.
Now to check if the libraries are working do (in current directory):
make PETSC_DIR=/home/csem/petsc/nvhpc/petsc-v3.18.4/installdir PETSC_ARCH="" check
====================================
```

In [None]:
%%bash
su csem
source /usr/share/modules/init/bash
module use /opt/nvidia/hpc_sdk/modulefiles
module load nvhpc/22.11
cd /home/csem/petsc/nvhpc/petsc-v3.18.4
make PETSC_DIR=/home/csem/petsc/nvhpc/petsc-v3.18.4 PETSC_ARCH=arch-linux-c-opt install

### Run the `make check` phase

If it finishes successfully, this message should appear:

```bash
Running check examples to verify correct installation
Using PETSC_DIR=/home/csem/petsc/nvhpc/petsc-v3.18.4/installdir and PETSC_ARCH=
C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
C/C++ example src/snes/tutorials/ex19 run successfully with cuda
Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process
Completed test examples
```

In [None]:
%%bash
su csem
source /usr/share/modules/init/bash
module use /opt/nvidia/hpc_sdk/modulefiles
module load nvhpc/22.11
cd /home/csem/petsc/nvhpc/petsc-v3.18.4
make PETSC_DIR=/home/csem/petsc/nvhpc/petsc-v3.18.4/installdir PETSC_ARCH="" check
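
When the check output is saved to a file, the success criteria listed above can be tested mechanically. This is a sketch under that assumption: the log name `check.out` in the usage note and the helper names are hypothetical, and the matched strings are taken from the expected message above.

```bash
# Hypothetical helpers operating on a saved `make check` log.

# Count the examples that reported success.
count_passed() {
    grep -c 'run successfully' "$1"
}

# Succeed only if the log contains the final "Completed test examples" line.
check_completed() {
    grep -qF 'Completed test examples' "$1"
}
```

For example, if the `make ... check` command is piped through `2>&1 | tee check.out`, then `check_completed check.out && count_passed check.out` would print `4` for the message shown above.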

### Test an example that uses complex numbers (`ex11f.F90`)

In [None]:
%%bash
su csem
source /usr/share/modules/init/bash
module use /opt/nvidia/hpc_sdk/modulefiles
module load nvhpc/22.11
cd /home/csem/petsc/nvhpc/petsc-v3.18.4/src/ksp/ksp/tutorials
make ex11f
mpirun -n 1 ./ex11f -norandom -pc_type none -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always

### Compare with the reference output (`output/ex11f_1.out`)

In [None]:
%%bash
su csem
source /usr/share/modules/init/bash
module use /opt/nvidia/hpc_sdk/modulefiles
module load nvhpc/22.11
cd /home/csem/petsc/nvhpc/petsc-v3.18.4/src/ksp/ksp/tutorials
cat output/ex11f_1.out
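
Comparing the two listings by eye is error-prone. If the run output is redirected to a file, `diff` can do the comparison. The sketch below assumes a hypothetical file name `ex11f.out` for the captured output and a hypothetical helper name `matches_reference`; it requires an exact textual match, which can be stricter than needed if only whitespace differs.

```bash
# Hypothetical helper: succeed if the captured output ($1) is identical
# to the reference output ($2); otherwise print a unified diff and fail.
matches_reference() {
    diff -u "$2" "$1"
}
```

After capturing the run with `> ex11f.out`, a possible use is `matches_reference ex11f.out output/ex11f_1.out && echo "output matches"`.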

### Follow the instructions at https://petsc.org/release/developers/testing/ to run an example that requires CUDA

In [None]:
%%bash
su csem
source /usr/share/modules/init/bash
module use /opt/nvidia/hpc_sdk/modulefiles
module load nvhpc/22.11
cd /home/csem/petsc/nvhpc/petsc-v3.18.4/
make print-test query='suffix' queryval='2_aijcusparse'

In [None]:
%%bash
su csem
source /usr/share/modules/init/bash
module use /opt/nvidia/hpc_sdk/modulefiles
module load nvhpc/22.11
cd /home/csem/petsc/nvhpc/petsc-v3.18.4/
make test search=ksp_ksp_tutorials-ex1_2_aijcusparse

In [None]:
%%bash
su csem
cd /home/csem/petsc/nvhpc/petsc-v3.18.4/
cat arch-linux-c-opt/tests/ksp/ksp/tutorials/runex1_2_aijcusparse/ksp_ksp_tutorials-ex1_2_aijcusparse.sh

In [None]:
%%bash
su csem
source /usr/share/modules/init/bash
module use /opt/nvidia/hpc_sdk/modulefiles
module load nvhpc/22.11
cd /home/csem/petsc/nvhpc/petsc-v3.18.4/
cd src/ksp/ksp/tutorials/
make ex1
mpiexec --oversubscribe  -n 1  ./ex1 \
-petsc_ci \
-pc_type sor \
-pc_sor_symmetric \
-ksp_monitor_short \
-ksp_gmres_cgs_refinement_type refine_always \
-mat_type aijcusparse \
-vec_type cuda \
-use_gpu_aware_mpi 0


If it finishes successfully, this output should appear:
```bash
  0 KSP Residual norm 0.968764 
  1 KSP Residual norm 0.361001 
  2 KSP Residual norm 0.247329 
  3 KSP Residual norm 0.0808915 
  4 KSP Residual norm 0.01289 
  5 KSP Residual norm 0.00375064 
  6 KSP Residual norm 0.000294092 
  7 KSP Residual norm 1.40861e-05 
  8 KSP Residual norm 3.48863e-07 
KSP Object: 1 MPI process
  type: gmres
    restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with one step of iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI process
  type: sor
    type = symmetric, iterations = 1, local iterations = 1, omega = 1.
  linear system matrix = precond matrix:
  Mat Object: 1 MPI process
    type: seqaijcusparse
    rows=10, cols=10
    total: nonzeros=28, allocated nonzeros=28
    total number of mallocs used during MatSetValues calls=0
      not using I-node routines
Norm of error 4.10316e-07, Iterations 8
  0 KSP Residual norm 0.377523 
  1 KSP Residual norm 0.0140399 
  2 KSP Residual norm 0.000364106 
  3 KSP Residual norm 7.83047e-06 
  4 KSP Residual norm 1.33045e-07
```

In [None]:
%%bash
su csem
cd /home/csem/petsc/nvhpc/petsc-v3.18.4/
cd src/ksp/ksp/tutorials/
cat output/ex1_2_aijcusparse.out
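
Summary lines of the form `Norm of error 4.10316e-07, Iterations 8`, as in the output above, are convenient to check in a script. The following sketch extracts the iteration counts from solver output; the helper name `iterations` is hypothetical.

```bash
# Hypothetical helper: print the iteration count from every
# "Norm of error ..., Iterations N" line read on stdin.
iterations() {
    sed -n 's/^Norm of error .*, Iterations \([0-9][0-9]*\)[[:space:]]*$/\1/p'
}
```

For example, `iterations < output/ex1_2_aijcusparse.out` lists one count per solve recorded in the reference file.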

### Profiling with `nvprof`

In [None]:
%%bash
su csem
source /usr/share/modules/init/bash
module use /opt/nvidia/hpc_sdk/modulefiles
module load nvhpc/22.11
cd /home/csem/petsc/nvhpc/petsc-v3.18.4/
cd src/ksp/ksp/tutorials/
make ex1
mpiexec --oversubscribe  -n 1 nvprof -f -o ex1.%q{OMPI_COMM_WORLD_RANK}.nvprof ./ex1 \
-petsc_ci \
-pc_type sor \
-pc_sor_symmetric \
-ksp_monitor_short \
-ksp_gmres_cgs_refinement_type refine_always \
-mat_type aijcusparse \
-vec_type cuda

### Show profiling obtained with `nvprof`

In [None]:
%%bash
su csem
source /usr/share/modules/init/bash
module use /opt/nvidia/hpc_sdk/modulefiles
module load nvhpc/22.11
cd /home/csem/petsc/nvhpc/petsc-v3.18.4/
cd src/ksp/ksp/tutorials/
nvprof -i ex1.0.nvprof

## `CSEM3D` program

### Download the `CSEM3D` source code to the `root` user's area

In [None]:
%%bash
wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=1iNGEZIM8Whd1mRmfA2HekxBII5OXVcZ_' -O csem3d_w-v1.0.3.tar.gz

In [None]:
%%bash
ls -lh csem3d_w-v1.0.3.tar.gz

### Copy the tarball to the `csem` user's home directory and change its owner to the `csem` user

In [None]:
%%bash
cp csem3d_w-v1.0.3.tar.gz /home/csem/
chown csem:csem /home/csem/csem3d_w-v1.0.3.tar.gz

### Unpacking the tarball file

In [None]:
%%bash
su csem
cd /home/csem
tar zxvf csem3d_w-v1.0.3.tar.gz

### Run `make all` to build the `CSEM3D` program

If it finishes successfully, a link command similar to this one should appear (the `-L` path may differ on your system):

```bash
mpif90 -o CSEM3D_W outfields_E_Tx.o spline.o bottom.o allvars.o intpol1.o biprho.o dimped.o in3dmod.o compute_src_wts.o set_resist_vector.o locals.o kinds.o outfields_B_Tx.o B_Tx_B_Rx.o dimens.o set_P.o set_bv_e.o d1imped.o set_src.o grid.o in3drho.o CSEM3D_mod.o CSEM3D_W.o set_bv_h.o bipole2.o set_A.o chk_rx_tx.o set_1d_resist.o txrx.o splint.o set_rhs.o blocks.o E_Tx_E_Rx.o convres.o abs_to_rel.o B_Tx_E_Rx.o E_Tx_B_Rx.o addair.o -L/home/csem/petsc/gnu/petsc-v3.18.4/installdir/lib -lpetsc  
```

In [None]:
%%bash
su csem
source /usr/share/modules/init/bash
module use /opt/nvidia/hpc_sdk/modulefiles
module load nvhpc/22.11
cd /home/csem/csem3d_w-v1.0.3/CSEM3D_W/CSEM3D_W
make PETSC_DIR=${HOME}/petsc/nvhpc/petsc-v3.18.4/installdir -f scripts/makefile_nvhpc clean

In [None]:
%%bash
su csem
source /usr/share/modules/init/bash
module use /opt/nvidia/hpc_sdk/modulefiles
module load nvhpc/22.11
cd /home/csem/csem3d_w-v1.0.3/CSEM3D_W/CSEM3D_W
make PETSC_DIR=${HOME}/petsc/nvhpc/petsc-v3.18.4/installdir -f scripts/makefile_nvhpc all

### Run this bash script to execute the generated `CSEM3D_W` binary

If it finishes successfully, a message similar to the one below should appear:

```bash
 16 KSP preconditioned resid norm 3.863008301354e-18 true resid norm 1.284864871592e-05 ||r(i)||/||b|| 3.405644702963e-01
 17 KSP preconditioned resid norm 2.061402843466e-18 true resid norm 1.054741071689e-05 ||r(i)||/||b|| 2.795681805313e-01
 18 KSP preconditioned resid norm 1.062033155132e-18 true resid norm 3.992776343547e-06 ||r(i)||/||b|| 1.058319664983e-01
 converged reason            2
 total number of relaxations           18
 ========================================

 
 ************************************************
  3D finished
  Total CPU time:    18.7500000      seconds
 ************************************************
 
 total cpu time:    18.7500000      seconds
 CSEM3D_W finished
```



In [None]:
%%bash
su csem
source /usr/share/modules/init/bash
module use /opt/nvidia/hpc_sdk/modulefiles
module load nvhpc/22.11

cd /home/csem/csem3d_w-v1.0.3/CSEM3D_W/CSEM3D_W

PETSC_DIR=${HOME}/petsc/nvhpc/petsc-v3.18.4/installdir

dataset=Sintetico
ntasks=1
nnodes=1

TIMESTART=$(date +%Y%m%d%H%M%S)

if [[ -L ${dataset} ]]
then
    echo "Link already exists for dataset ${dataset}"
else
    ln -s dataset/${dataset}
fi
sed 's/\.\//'${dataset}'\//g' ${dataset}/Parameters.inp | \
sed 's/'${dataset}'\/OutData/OutData/g' > Parameters.inp

outputdir="OutData"
if [[ -d ${outputdir} ]]
then
    echo "OutData already exists."
    rm -fr ${outputdir}
fi
mkdir ${outputdir}


resultsdir=results/${dataset}/NUMNODES-${nnodes}/MPI-${ntasks}/EXECSTART-${TIMESTART}
mkdir -p ${resultsdir}

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${PETSC_DIR}/lib

executable=CSEM3D_W

echo "mpirun -np $ntasks ./${executable}"
mpirun -np $ntasks ./${executable} \
 -A_mat_type mpiaij \
 -P_mat_type mpiaij \
 -em_ksp_monitor_true_residual \
 -em_ksp_type bcgs \
 -em_pc_type bjacobi \
 -em_sub_pc_type ilu \
 -em_sub_pc_factor_levels 3 \
 -em_sub_pc_factor_fill 6 \
 < ./Parameters.inp \
 2>&1 | tee csem3d_w-${TIMESTART}.out

mv $outputdir/ ${resultsdir}/
cp csem3d_w-${TIMESTART}.out ${resultsdir}/
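
The two `sed` commands in the script above rewrite the paths in the dataset's `Parameters.inp`: every `./` prefix is redirected into the dataset directory, and the output directory is then pointed back at the local `OutData`. The sketch below isolates that step so it can be tried on sample text. The function name `rewrite_paths` and the sample file names are hypothetical, since the real contents of `Parameters.inp` are not shown here.

```bash
# Hypothetical helper: rewrite paths read on stdin for dataset "$1".
#   1. "./"                -> "<dataset>/"
#   2. "<dataset>/OutData" -> "OutData"
rewrite_paths() {
    local dataset="$1"
    sed -e "s|\./|${dataset}/|g" \
        -e "s|${dataset}/OutData|OutData|g"
}

# Made-up sample lines, for illustration only.
printf './model.in\n./OutData/fields.out\n' | rewrite_paths Sintetico
# prints:
#   Sintetico/model.in
#   OutData/fields.out
```

Using `|` as the `sed` delimiter avoids escaping every `/` in the patterns; the behavior is otherwise the same as in the script.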


### Run `CSEM3D_W` on the GPU and profile it with `nvprof`

In [None]:
%%bash
su csem
source /usr/share/modules/init/bash
module use /opt/nvidia/hpc_sdk/modulefiles
module load nvhpc/22.11

cd /home/csem/csem3d_w-v1.0.3/CSEM3D_W/CSEM3D_W

PETSC_DIR=${HOME}/petsc/nvhpc/petsc-v3.18.4/installdir


dataset=Sintetico
ntasks=1
nnodes=1

TIMESTART=$(date +%Y%m%d%H%M%S)

if [[ -L ${dataset} ]]
then
    echo "Link already exists for dataset ${dataset}"
else
    ln -s dataset/${dataset}
fi
sed 's/\.\//'${dataset}'\//g' ${dataset}/Parameters.inp | \
sed 's/'${dataset}'\/OutData/OutData/g' > Parameters.inp

outputdir="OutData"
if [[ -d ${outputdir} ]]
then
    echo "OutData already exists."
    rm -fr ${outputdir}
fi
mkdir ${outputdir}


resultsdir=results/${dataset}/NUMNODES-${nnodes}/MPI-${ntasks}/EXECSTART-${TIMESTART}
mkdir -p ${resultsdir}

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${PETSC_DIR}/lib

executable=CSEM3D_W

echo "mpirun -np $ntasks nvprof -f -o ${executable}.%q{OMPI_COMM_WORLD_RANK}.nvprof ./${executable}"
mpirun -np $ntasks nvprof -f -o ${executable}.%q{OMPI_COMM_WORLD_RANK}.nvprof ./${executable} \
 -A_mat_type aijcusparse \
 -P_mat_type aijcusparse \
 -vec_type cuda \
 -use_gpu_aware_mpi 0 \
 -em_ksp_monitor_true_residual \
 -em_ksp_type bcgs \
 -em_pc_type bjacobi \
 -em_sub_pc_type ilu \
 -em_sub_pc_factor_levels 3 \
 -em_sub_pc_factor_fill 6 \
 < ./Parameters.inp \
 2>&1 | tee csem3d_w-${TIMESTART}.out

mv $outputdir/ ${resultsdir}/
cp csem3d_w-${TIMESTART}.out ${resultsdir}/
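
Both run scripts save the solver log to `csem3d_w-${TIMESTART}.out`, which includes a `converged reason` line as in the sample output shown earlier. In PETSc, positive `KSPConvergedReason` values indicate convergence and negative values indicate divergence, so a script can check the sign. A minimal sketch, with hypothetical helper names:

```bash
# Hypothetical helper: print the last "converged reason" value found in log file $1.
converged_reason() {
    sed -n 's/^[[:space:]]*converged reason[[:space:]]*\(-\{0,1\}[0-9][0-9]*\).*$/\1/p' "$1" | tail -n 1
}

# Hypothetical helper: succeed if the solver reported convergence (reason > 0).
solver_converged() {
    local reason
    reason="$(converged_reason "$1")"
    [ -n "$reason" ] && [ "$reason" -gt 0 ]
}
```

A possible use at the end of a run script is `solver_converged "csem3d_w-${TIMESTART}.out" && echo "solver converged"`.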