<a href="https://colab.research.google.com/github/robertopsouto/invmultifis_notebooks/blob/main/english/INVMULTIFIS_CSEM3D_perf.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **INVMULTIFIS Project: Development of multi-physics data inversion software with optimization via artificial intelligence**

The project proposes an innovative inversion technology for characterizing and monitoring deep-water reservoirs for Petrobras (the Brazilian oil company). It uses CSEM (Controlled-Source Electromagnetic) methods, a robust tool for reducing risk when drilling in oil basins, together with multiphysics data in the 3D domain. One of the project's main objectives is to develop, optimize, and parallelize CSEM codes to improve their performance.

In [3]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0


# Steps to install the CSEM3D program

### Installing GNU Make v4.3 to correctly handle the CSEM3D `makefile`

In [4]:
%%bash
wget https://ftp.gnu.org/gnu/make/make-4.3.tar.gz
tar xfz make-4.3.tar.gz

--2023-08-06 23:39:34--  https://ftp.gnu.org/gnu/make/make-4.3.tar.gz
Resolving ftp.gnu.org (ftp.gnu.org)... 209.51.188.20, 2001:470:142:3::b
Connecting to ftp.gnu.org (ftp.gnu.org)|209.51.188.20|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2317073 (2.2M) [application/x-gzip]
Saving to: ‘make-4.3.tar.gz’

     0K .......... .......... .......... .......... ..........  2%  300K 7s
    50K .......... .......... .......... .......... ..........  4%  599K 5s
   100K .......... .......... .......... .......... ..........  6% 81.0M 4s
   150K .......... .......... .......... .......... ..........  8% 1.24M 3s
   200K .......... .......... .......... .......... .......... 11% 1.12M 3s
   250K .......... .......... .......... .......... .......... 13% 42.2M 2s
   300K .......... .......... .......... .......... .......... 15% 74.4M 2s
   350K .......... .......... .......... .......... .......... 17%  112M 2s
   400K .......... .......... .......... .......... ....

In [5]:
%%bash
cd make-4.3
./configure
make
make install

checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... no
checking for mawk... mawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether make supports the include directive... yes (GNU style)
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking dependency style of gcc... gcc3
checking how to run the C preprocessor... gcc -E
checking for grep that handles long lines and -e... /usr/bi

src/main.c: In function ‘main’:
 1938 |       p[-1] = '\0';
      |       ~~~~~~^~~~~~
src/main.c:1935:15: note: destination object of size [0, 9223372036854775807] allocated by ‘quote_for_env’
 1935 |           p = quote_for_env (p, eval_strings->list[i]);
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from src/makeint.h:31,
                 from src/main.c:17:
lib/alloca.h:46:18: note: at offset -1 into destination object of size [0, 9223372036854775807] allocated by ‘__builtin_alloca’
   46 | #  define alloca __builtin_alloca
src/main.c:1930:19: note: in expansion of macro ‘alloca’
 1930 |       p = value = alloca (len);
      |                   ^~~~~~
In file included from src/makeint.h:31,
                 from src/read.c:17:
src/read.c: In function ‘eval_makefile’:
   46 | #  define alloca __builtin_alloca
src/read.c:443:3: note: in expansion of macro ‘alloca’
  443 |   alloca (0);
      |   ^~~~~~
src/read.c: In function ‘eval_buffer’:
   46 | #

In [6]:
%%bash
make -v

GNU Make 4.3
Built for x86_64-pc-linux-gnu
Copyright (C) 1988-2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
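
The version printed above can also be checked programmatically before building anything that depends on it. The following is a minimal sketch, assuming GNU `sort` with the `-V` (version sort) option is available; the helper names `version_ge` and `make_version` are hypothetical and not part of the project.

```bash
# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
    # sort -V orders version strings; the smaller one comes first.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Extract the version number from the first line of `make -v`,
# e.g. "GNU Make 4.3" -> "4.3". Prints nothing if make is missing.
make_version() {
    make -v 2>/dev/null | awk 'NR == 1 { print $3 }'
}

required="4.3"   # the version installed in this notebook
found="$(make_version)"
if [ -n "$found" ] && version_ge "$found" "$required"; then
    echo "make ${found} satisfies the ${required} requirement"
else
    echo "make >= ${required} not available (found: ${found:-none})"
fi
```

Comparing with `sort -V` rather than plain string comparison keeps versions such as 4.10 ordered after 4.3.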


## Creating a user, since Open MPI discourages running MPI as the `root` user



In [7]:
%%bash
adduser csem

Adding user `csem' ...
Adding new group `csem' (1000) ...
Adding new user `csem' (1000) with group `csem' ...
Creating home directory `/home/csem' ...
Copying files from `/etc/skel' ...
Try again? [y/N] Changing the user information for csem
Enter the new value, or press ENTER for the default
	Full Name []: 	Room Number []: 	Work Phone []: 	Home Phone []: 	Other []: Is the information correct? [Y/n] 

New password: Password change has been aborted.
passwd: Authentication token manipulation error
passwd: password unchanged
Use of uninitialized value $answer in chop at /usr/sbin/adduser line 595.
Use of uninitialized value $answer in pattern match (m//) at /usr/sbin/adduser line 596.
Use of uninitialized value $answer in chop at /usr/sbin/adduser line 625.
Use of uninitialized value $answer in pattern match (m//) at /usr/sbin/adduser line 626.
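
The output above shows `adduser` asking for a password and personal details and then aborting the password step, because a notebook cell has no terminal to answer the prompts. The user is still created, but the prompts can be avoided. A minimal sketch, assuming the Debian/Ubuntu `adduser` (which accepts `--disabled-password` and `--gecos`) and root privileges; `ensure_user` and `DRY_RUN` are hypothetical names introduced for illustration.

```bash
# Create a user without interactive prompts, unless it already exists.
# With DRY_RUN=1 the command is only printed, not executed.
ensure_user() {
    local user="$1"
    if id -u "$user" >/dev/null 2>&1; then
        echo "user ${user} already exists"
        return 0
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "adduser --disabled-password --gecos '' ${user}"
    else
        adduser --disabled-password --gecos '' "$user"
    fi
}

# Show what would be executed for the user created in this notebook.
DRY_RUN=1 ensure_user csem
```

Checking with `id -u` first makes the cell safe to re-run after the account has been created.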


## PETSc v3.18.4

### Download and extract the source code, then run `configure`

In [8]:
%%bash
su csem
mkdir -p ${HOME}/petsc/gnu
cd ${HOME}/petsc/gnu
wget https://gitlab.com/petsc/petsc/-/archive/v3.18.4/petsc-v3.18.4.tar.gz
tar zxvf petsc-v3.18.4.tar.gz
cd petsc-v3.18.4
./configure \
 --prefix=${PWD}/installdir \
 --with-fortran \
 --with-fortran-kernels=true \
 --with-cuda \
 --download-fblaslapack \
 --with-scalar-type=complex \
 --with-precision=double \
 --with-debugging=0 \
 --with-x=0 \
 --with-gnu-compilers=1 \
 --with-cc=mpicc \
 --with-cxx=mpicxx \
 --with-fc=mpif90 \
 --with-make-exec=make \
 2>&1 | tee ../configure.out

petsc-v3.18.4/
petsc-v3.18.4/.clang-format
petsc-v3.18.4/.dir-locals.el
petsc-v3.18.4/.gitignore
petsc-v3.18.4/.gitlab-alcf-ci.yml
petsc-v3.18.4/.gitlab-ci.yml
petsc-v3.18.4/.gitlab/
petsc-v3.18.4/.gitlab/CODEOWNERS
petsc-v3.18.4/.gitmessage
petsc-v3.18.4/.mailmap
petsc-v3.18.4/CODE_OF_CONDUCT.md
petsc-v3.18.4/CONTRIBUTING
petsc-v3.18.4/GNUmakefile
petsc-v3.18.4/LICENSE
petsc-v3.18.4/README.md
petsc-v3.18.4/config/
petsc-v3.18.4/config/BuildSystem/
petsc-v3.18.4/config/BuildSystem/.hgignore
petsc-v3.18.4/config/BuildSystem/.hgtags
petsc-v3.18.4/config/BuildSystem/RDict.py
petsc-v3.18.4/config/BuildSystem/__init__.py
petsc-v3.18.4/config/BuildSystem/args.py
petsc-v3.18.4/config/BuildSystem/config/
petsc-v3.18.4/config/BuildSystem/config/__init__.py
petsc-v3.18.4/config/BuildSystem/config/atomics.py
petsc-v3.18.4/config/BuildSystem/config/base.py
petsc-v3.18.4/config/BuildSystem/config/compile/
petsc-v3.18.4/config/BuildSystem/config/compile/C.py
petsc-v3.18.4/config/BuildSystem/config/c

--2023-08-06 23:39:53--  https://gitlab.com/petsc/petsc/-/archive/v3.18.4/petsc-v3.18.4.tar.gz
Resolving gitlab.com (gitlab.com)... 172.65.251.78, 2606:4700:90:0:f22e:fbec:5bed:a9b9
Connecting to gitlab.com (gitlab.com)|172.65.251.78|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/octet-stream]
Saving to: ‘petsc-v3.18.4.tar.gz’

     0K .......... .......... .......... .......... ..........  789K
    50K .......... .......... .......... .......... ..........  883K
   100K .......... .......... .......... .......... .......... 1.57M
   150K .......... .......... .......... .......... .......... 1.31M
   200K .......... .......... .......... .......... .......... 1.92M
   250K .......... .......... .......... .......... .......... 1.09M
   300K .......... .......... .......... .......... .......... 4.29M
   350K .......... .......... .......... .......... .......... 5.21M
   400K .......... .......... .......... .......... .......... 4.2

### Run the `make all` phase

If it finishes successfully, the following message should appear:

```bash
=========================================
Now to install the libraries do:
make PETSC_DIR=/home/csem/petsc/gnu/petsc-v3.18.4 PETSC_ARCH=arch-linux-c-opt install
=========================================
```

In [9]:
%%bash
su csem
cd /home/csem/petsc/gnu/petsc-v3.18.4
make PETSC_DIR=/home/csem/petsc/gnu/petsc-v3.18.4 PETSC_ARCH=arch-linux-c-opt all

/usr/bin/python3 ./config/gmakegen.py --petsc-arch=arch-linux-c-opt
 
See documentation/faq.html and documentation/bugreporting.html
for help with installation problems.  Please send EVERYTHING
printed out below when reporting problems.  Please check the
mailing list archives and consider subscribing.
 
  https://petsc.org/release/community/mailing/
 
Starting make run on 879e90c994b2 at Sun, 06 Aug 2023 23:41:58 +0000
Machine characteristics: Linux 879e90c994b2 5.15.109+ #1 SMP Fri Jun 9 10:57:30 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
-----------------------------------------
Using PETSc directory: /home/csem/petsc/gnu/petsc-v3.18.4
Using PETSc arch: arch-linux-c-opt
-----------------------------------------
PETSC_VERSION_RELEASE    1
PETSC_VERSION_MAJOR      3
PETSC_VERSION_MINOR      18
PETSC_VERSION_SUBMINOR   4
PETSC_VERSION_DATE       "unknown"
PETSC_VERSION_GIT        "unknown"
PETSC_VERSION_DATE_GIT   "unknown"
PETSC_VERSION_EQ(MAJOR,MINOR,SUBMINOR) \
PETSC_VERSION_ PETSC_VERS

### Run the `make install` phase

If it finishes successfully, the following message should appear:

```bash
====================================
Install complete.
Now to check if the libraries are working do (in current directory):
make PETSC_DIR=/home/csem/petsc/gnu/petsc-v3.18.4/installdir PETSC_ARCH="" check
====================================
```

In [10]:
%%bash
su csem
cd /home/csem/petsc/gnu/petsc-v3.18.4
make PETSC_DIR=/home/csem/petsc/gnu/petsc-v3.18.4 PETSC_ARCH=arch-linux-c-opt install

/usr/bin/python3 /home/csem/petsc/gnu/petsc-v3.18.4/config/gmakegentest.py --petsc-dir=/home/csem/petsc/gnu/petsc-v3.18.4 --petsc-arch=arch-linux-c-opt --testdir=./arch-linux-c-opt/tests
*** Using PETSC_DIR=/home/csem/petsc/gnu/petsc-v3.18.4 PETSC_ARCH=arch-linux-c-opt ***
*** Installing PETSc at prefix location: /home/csem/petsc/gnu/petsc-v3.18.4/installdir  ***
Install complete.
Now to check if the libraries are working do (in current directory):
make PETSC_DIR=/home/csem/petsc/gnu/petsc-v3.18.4/installdir PETSC_ARCH="" check
/usr/local/bin/make  --no-print-directory -f makefile PETSC_ARCH=arch-linux-c-opt PETSC_DIR=/home/csem/petsc/gnu/petsc-v3.18.4 petsc4py-install libmesh-install mfem-install slepc-install hpddm-install amrex-install bamg-install
make[2]: Nothing to be done for 'petsc4py-install'.
make[2]: Nothing to be done for 'libmesh-install'.
make[2]: Nothing to be done for 'mfem-install'.
make[2]: Nothing to be done for 'slepc-install'.
make[2]: Nothing to be done for 'hpddm

### Run the `make check` phase

If it finishes successfully, the following message should appear:

```bash
Running check examples to verify correct installation
Using PETSC_DIR=/home/csem/petsc/gnu/petsc-v3.18.4/installdir and PETSC_ARCH=
C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
C/C++ example src/snes/tutorials/ex19 run successfully with cuda
Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process
Completed test examples
```

In [11]:
%%bash
su csem
cd /home/csem/petsc/gnu/petsc-v3.18.4
make PETSC_DIR=/home/csem/petsc/gnu/petsc-v3.18.4/installdir PETSC_ARCH="" check

Running check examples to verify correct installation
Using PETSC_DIR=/home/csem/petsc/gnu/petsc-v3.18.4/installdir and PETSC_ARCH=
C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
C/C++ example src/snes/tutorials/ex19 run successfully with cuda
Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process
Completed test examples
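
When the check output is saved to a file, the comparison with the expected message can be automated. The sketch below counts the `run successfully` lines and confirms that the closing `Completed test examples` line is present; `check_log_ok` is a hypothetical helper, and the sample log simply repeats the lines shown above.

```bash
# Summarize a saved `make check` log: count the successful examples and
# require the final "Completed test examples" line.
check_log_ok() {
    local log="$1"
    local passed
    passed="$(grep -c 'run successfully' "$log")"
    echo "examples run successfully: ${passed}"
    grep -q '^Completed test examples' "$log" && [ "$passed" -gt 0 ]
}

# Demonstration on a temporary copy of the output shown above.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
Running check examples to verify correct installation
C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
C/C++ example src/snes/tutorials/ex19 run successfully with cuda
Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process
Completed test examples
EOF
check_log_ok "$sample" && echo "check passed"
rm -f "$sample"
```

In the notebook, the log could be produced by appending `2>&1 | tee check.out` to the `make ... check` command, where `check.out` is an arbitrary file name.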


### Test an example that uses complex numbers (`ex11f.F90`)

In [12]:
%%bash
su csem
cd /home/csem/petsc/gnu/petsc-v3.18.4/src/ksp/ksp/tutorials
make ex11f
mpirun -n 1 ./ex11f -norandom -pc_type none -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always

mpif90 -fPIC -Wall -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -Wno-unused-dummy-argument -g -O   -I/home/csem/petsc/gnu/petsc-v3.18.4/include -I/home/csem/petsc/gnu/petsc-v3.18.4/arch-linux-c-opt/include -I/usr/local/cuda/include     ex11f.F90  -Wl,-rpath,/home/csem/petsc/gnu/petsc-v3.18.4/arch-linux-c-opt/lib -L/home/csem/petsc/gnu/petsc-v3.18.4/arch-linux-c-opt/lib -Wl,-rpath,/home/csem/petsc/gnu/petsc-v3.18.4/installdir/lib -L/home/csem/petsc/gnu/petsc-v3.18.4/installdir/lib -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64/stubs -Wl,-rpath,/usr/lib/x86_64-linux-gnu/openmpi/lib/fortran/gfortran -L/usr/lib/x86_64-linux-gnu/openmpi/lib/fortran/gfortran -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 -L/usr/lib/gcc/x86_64-linux-gnu/11 -Wl,-rpath,/usr/local/cuda/lib64/stubs -lpetsc -lflapack -lfblas -lm -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lstdc++ -ldl -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lm

### Compare with the reference output (`output/ex11f_1.out`)

In [13]:
%%bash
su csem
cd /home/csem/petsc/gnu/petsc-v3.18.4/src/ksp/ksp/tutorials
cat output/ex11f_1.out

  0 KSP Residual norm 4.62271 
  1 KSP Residual norm 1.58711 
  2 KSP Residual norm 0.767563 
  3 KSP Residual norm 0.472102 
  4 KSP Residual norm 0.435655 
  5 KSP Residual norm 0.154866 
  6 KSP Residual norm < 1.e-11
Norm of error < 1.e-12,iterations     6
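
Instead of comparing by eye, the run can be redirected to a file and compared with the reference. A minimal sketch, assuming a `diff` that supports `-b`; the helper `matches_reference` and the file name `ex11f.tmp` are hypothetical. The `-b` option is used because the reference lines end with a trailing space.

```bash
# Compare a solver log with a reference file, ignoring differences in the
# amount of whitespace (including trailing spaces at the end of a line).
matches_reference() {
    diff -b "$1" "$2" >/dev/null
}

# Intended use inside src/ksp/ksp/tutorials (not executed here):
#   mpirun -n 1 ./ex11f -norandom -pc_type none -ksp_monitor_short \
#       -ksp_gmres_cgs_refinement_type refine_always > ex11f.tmp
#   matches_reference ex11f.tmp output/ex11f_1.out && echo "ex11f matches"

# Self-contained demonstration with two temporary files.
actual="$(mktemp)"
reference="$(mktemp)"
printf '  0 KSP Residual norm 4.62271\n' > "$actual"
printf '  0 KSP Residual norm 4.62271 \n' > "$reference"
if matches_reference "$actual" "$reference"; then
    echo "output matches the reference"
else
    echo "output differs from the reference"
fi
rm -f "$actual" "$reference"
```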


### Run an example that requires CUDA, following the instructions at https://petsc.org/release/developers/testing/

In [14]:
%%bash
su csem
cd /home/csem/petsc/gnu/petsc-v3.18.4/
make print-test query='suffix' queryval='2_aijcusparse'

ksp_ksp_tutorials-ex1_2_aijcusparse


In [15]:
%%bash
su csem
cd /home/csem/petsc/gnu/petsc-v3.18.4/
make test search=ksp_ksp_tutorials-ex1_2_aijcusparse

Using MAKEFLAGS: -- search=ksp_ksp_tutorials-ex1_2_aijcusparse
          CC arch-linux-c-opt/tests/ksp/ksp/tutorials/ex1.o
     CLINKER arch-linux-c-opt/tests/ksp/ksp/tutorials/ex1
        TEST arch-linux-c-opt/tests/counts/ksp_ksp_tutorials-ex1_2_aijcusparse.counts
 ok ksp_ksp_tutorials-ex1_2_aijcusparse
 ok diff-ksp_ksp_tutorials-ex1_2_aijcusparse


In [16]:
%%bash
su csem
cd /home/csem/petsc/gnu/petsc-v3.18.4/
cat arch-linux-c-opt/tests/ksp/ksp/tutorials/runex1_2_aijcusparse/ksp_ksp_tutorials-ex1_2_aijcusparse.sh

/usr/bin/mpiexec --oversubscribe  -n 1 ../ex1 -petsc_ci -pc_type sor -pc_sor_symmetric -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always -mat_type aijcusparse -vec_type cuda  > ex1_2_aijcusparse.tmp 2> runex1_2_aijcusparse.err


In [17]:
%%bash
su csem
cd /home/csem/petsc/gnu/petsc-v3.18.4/
cd src/ksp/ksp/tutorials/
make ex1
/usr/bin/mpiexec --oversubscribe  -n 1  ./ex1 \
-petsc_ci \
-pc_type sor \
-pc_sor_symmetric \
-ksp_monitor_short \
-ksp_gmres_cgs_refinement_type refine_always \
-mat_type aijcusparse \
-vec_type cuda \
-use_gpu_aware_mpi 0


mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -Wno-stringop-overflow -fstack-protector -fvisibility=hidden -g -O  -I/home/csem/petsc/gnu/petsc-v3.18.4/include -I/home/csem/petsc/gnu/petsc-v3.18.4/arch-linux-c-opt/include -I/usr/local/cuda/include     -export-dynamic ex1.c  -Wl,-rpath,/home/csem/petsc/gnu/petsc-v3.18.4/arch-linux-c-opt/lib -L/home/csem/petsc/gnu/petsc-v3.18.4/arch-linux-c-opt/lib -Wl,-rpath,/home/csem/petsc/gnu/petsc-v3.18.4/installdir/lib -L/home/csem/petsc/gnu/petsc-v3.18.4/installdir/lib -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64/stubs -Wl,-rpath,/usr/lib/x86_64-linux-gnu/openmpi/lib/fortran/gfortran -L/usr/lib/x86_64-linux-gnu/openmpi/lib/fortran/gfortran -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/11 -L/usr/lib/gcc/x86_64-linux-gnu/11 -Wl,-rpath,/usr/local/cuda/lib64/stubs -lpetsc -lflapack -lfblas -lm -lcudart -lnvToolsExt -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda -lstdc++ -ldl -lmpi_u

If it finishes successfully, the following output should appear:
```bash
  0 KSP Residual norm 0.968764
  1 KSP Residual norm 0.361001
  2 KSP Residual norm 0.247329
  3 KSP Residual norm 0.0808915
  4 KSP Residual norm 0.01289
  5 KSP Residual norm 0.00375064
  6 KSP Residual norm 0.000294092
  7 KSP Residual norm 1.40861e-05
  8 KSP Residual norm 3.48863e-07
KSP Object: 1 MPI process
  type: gmres
    restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with one step of iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI process
  type: sor
    type = symmetric, iterations = 1, local iterations = 1, omega = 1.
  linear system matrix = precond matrix:
  Mat Object: 1 MPI process
    type: seqaijcusparse
    rows=10, cols=10
    total: nonzeros=28, allocated nonzeros=28
    total number of mallocs used during MatSetValues calls=0
      not using I-node routines
Norm of error 4.10316e-07, Iterations 8
  0 KSP Residual norm 0.377523
  1 KSP Residual norm 0.0140399
  2 KSP Residual norm 0.000364106
  3 KSP Residual norm 7.83047e-06
  4 KSP Residual norm 1.33045e-07

  ```
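
The expected output contains two consecutive solves, each restarting its residual history at iteration 0. A short sketch that condenses such a log into one line per solve may make comparisons easier; `summarize_solves` is a hypothetical helper, and it assumes every solve begins with an iteration-0 line in the `-ksp_monitor_short` format.

```bash
# Print, for each solve in a -ksp_monitor_short log, the last iteration
# number and the last residual norm that were reported.
summarize_solves() {
    awk '
        /KSP Residual norm/ {
            if ($1 == 0) solve++          # iteration 0 starts a new solve
            iters[solve] = $1
            norm[solve] = $NF
        }
        END {
            for (s = 1; s <= solve; s++)
                printf "solve %d: %d iterations, final residual %s\n", s, iters[s], norm[s]
        }
    ' "$1"
}

# Demonstration on a shortened, two-solve sample log.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
  0 KSP Residual norm 0.968764
  1 KSP Residual norm 0.361001
  0 KSP Residual norm 0.377523
  1 KSP Residual norm 0.0140399
  2 KSP Residual norm 0.000364106
EOF
summarize_solves "$sample"
rm -f "$sample"
```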

In [18]:
%%bash
su csem
cd /home/csem/petsc/gnu/petsc-v3.18.4/
cd src/ksp/ksp/tutorials/
cat output/ex1_2_aijcusparse.out

  0 KSP Residual norm 0.968764 
  1 KSP Residual norm 0.361001 
  2 KSP Residual norm 0.247329 
  3 KSP Residual norm 0.0808915 
  4 KSP Residual norm 0.01289 
  5 KSP Residual norm 0.00375064 
  6 KSP Residual norm 0.000294092 
  7 KSP Residual norm 1.40861e-05 
  8 KSP Residual norm 3.48863e-07 
KSP Object: 1 MPI process
  type: gmres
    restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with one step of iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI process
  type: sor
    type = symmetric, iterations = 1, local iterations = 1, omega = 1.
  linear system matrix = precond matrix:
  Mat Object: 1 MPI process
    type: seqaijcusparse
    rows=10, cols=10
    total: nonzeros=28, allocated nonzeros=50
    total number of mallocs used during MatSetValu

### Profiling with `nvprof`

In [19]:
%%bash
su csem
cd /home/csem/petsc/gnu/petsc-v3.18.4/
cd src/ksp/ksp/tutorials/
make ex1
export PATH=/usr/local/cuda/bin:$PATH
/usr/bin/mpiexec --oversubscribe  -n 1 nvprof -f -o ex1.%q{OMPI_COMM_WORLD_RANK}.nvprof ./ex1 \
-petsc_ci \
-pc_type sor \
-pc_sor_symmetric \
-ksp_monitor_short \
-ksp_gmres_cgs_refinement_type refine_always \
-mat_type aijcusparse \
-vec_type cuda \
-use_gpu_aware_mpi 0

make: 'ex1' is up to date.
  0 KSP Residual norm 0.968764 
  1 KSP Residual norm 0.361001 
  2 KSP Residual norm 0.247329 
  3 KSP Residual norm 0.0808915 
  4 KSP Residual norm 0.01289 
  5 KSP Residual norm 0.00375064 
  6 KSP Residual norm 0.000294092 
  7 KSP Residual norm 1.40861e-05 
  8 KSP Residual norm 3.48863e-07 
KSP Object: 1 MPI process
  type: gmres
    restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with one step of iterative refinement
    happy breakdown tolerance 1e-30
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI process
  type: sor
    type = symmetric, iterations = 1, local iterations = 1, omega = 1.
  linear system matrix = precond matrix:
  Mat Object: 1 MPI process
    type: seqaijcusparse
    rows=10, cols=10
    total: nonzeros=28, allocated nonzeros=50
    total number of mal

==36711== NVPROF is profiling process 36711, command: ./ex1 -petsc_ci -pc_type sor -pc_sor_symmetric -ksp_monitor_short -ksp_gmres_cgs_refinement_type refine_always -mat_type aijcusparse -vec_type cuda -use_gpu_aware_mpi 0
==36711== Generated result file: /home/csem/petsc/gnu/petsc-v3.18.4/src/ksp/ksp/tutorials/ex1.0.nvprof
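
In the `-o` argument, `nvprof` replaces `%q{OMPI_COMM_WORLD_RANK}` with the value of that environment variable, which Open MPI sets to the rank of each process. With one rank this yields `ex1.0.nvprof`, as reported above. The sketch below only lists the file names to expect for a given number of ranks; `expected_profiles` is a hypothetical helper and does not call `nvprof`.

```bash
# List the profile files expected from a run with N MPI ranks, mirroring
# how %q{OMPI_COMM_WORLD_RANK} expands to 0, 1, ..., N-1.
expected_profiles() {
    local prefix="$1" ntasks="$2" rank
    for ((rank = 0; rank < ntasks; rank++)); do
        echo "${prefix}.${rank}.nvprof"
    done
}

# Example: a run of ./ex1 with two ranks would write two files.
expected_profiles ex1 2
```

Writing one file per rank avoids several processes overwriting the same output file when `-n` is greater than 1.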


### Show the profile collected with `nvprof`

In [20]:
%%bash
su csem
cd /home/csem/petsc/gnu/petsc-v3.18.4/
cd src/ksp/ksp/tutorials/
export PATH=/usr/local/cuda/bin:$PATH
nvprof -i ex1.0.nvprof

            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   18.21%  351.01us       107  3.2800us  2.8160us  4.3200us  void axpy_kernel_val<double2, double2>(cublasAxpyParamsVal<double2, double2, double2>)
                   17.87%  344.54us        92  3.7450us  3.5840us  10.176us  void dot_kernel<double2, int=64, int=1, cublasDotParams<cublasGemvTensor<double2 const >, cublasGemvTensorStridedBatched<double2>>>(double2 const )
                   17.17%  330.91us        92  3.5960us  3.5200us  5.3120us  void reduce_1Block_kernel<double2, int=64, int=6, cublasGemvTensorStridedBatched<double2>, cublasGemvTensorStridedBatched<double2>, cublasGemvTensorStridedBatched<double2>>(double2 const *, double2, double2, int, double2 const *, double2, cublasGemvTensorStridedBatched<double2>, cublasGemvTensorStridedBatched<double2>, cublasPointerMode_t, cublasLtEpilogue_t, cublasGemvTensorStridedBatched<biasType<cublasGemvTensorStridedBatched<double2::value_type

## `CSEM3D` program

### Download the `CSEM3D` source tarball into the `root` user's working directory

In [21]:
%%bash
wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=10WfuzFuv9bfr9MTeyphTyRM7i9rjGlxf' -O csem3d_w-v1.0.2.tar.gz

--2023-08-06 23:47:35--  https://docs.google.com/uc?export=download&id=10WfuzFuv9bfr9MTeyphTyRM7i9rjGlxf
Resolving docs.google.com (docs.google.com)... 108.177.127.100, 108.177.127.102, 108.177.127.139, ...
Connecting to docs.google.com (docs.google.com)|108.177.127.100|:443... connected.
HTTP request sent, awaiting response... 303 See Other
Location: https://doc-0s-38-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/612rvk27negknh1av5h3jr1aldgnjh6h/1691365650000/11313332932477869617/*/10WfuzFuv9bfr9MTeyphTyRM7i9rjGlxf?e=download&uuid=bbc4ff08-d3ea-4481-b073-4e388bbc41a6 [following]
--2023-08-06 23:47:39--  https://doc-0s-38-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/612rvk27negknh1av5h3jr1aldgnjh6h/1691365650000/11313332932477869617/*/10WfuzFuv9bfr9MTeyphTyRM7i9rjGlxf?e=download&uuid=bbc4ff08-d3ea-4481-b073-4e388bbc41a6
Resolving doc-0s-38-docs.googleusercontent.com (doc-0s-38-docs.googleusercontent.com)... 108.177.126.132, 2a00:

### Copy the tarball to the `csem` user's home directory and change its owner to the `csem` user

In [22]:
%%bash
cp csem3d_w-v1.0.2.tar.gz /home/csem/
chown csem:csem /home/csem/csem3d_w-v1.0.2.tar.gz
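
Before unpacking, it can be worth confirming that the downloaded file really is a gzip-compressed tarball, since a download from a shared link may sometimes deliver an HTML page instead of the archive. A minimal sketch, assuming a `tar` with gzip support; `is_gzip_tarball` is a hypothetical helper, and the demonstration uses throwaway files rather than the real archive.

```bash
# Succeeds only if the file can be listed as a gzip-compressed tar archive.
is_gzip_tarball() {
    tar -tzf "$1" >/dev/null 2>&1
}

# Demonstration: one real archive and one text file with a misleading name.
workdir="$(mktemp -d)"
echo "demo" > "${workdir}/file.txt"
tar -czf "${workdir}/good.tar.gz" -C "$workdir" file.txt
echo "<html>not an archive</html>" > "${workdir}/bad.tar.gz"
for name in good bad; do
    if is_gzip_tarball "${workdir}/${name}.tar.gz"; then
        echo "${name}.tar.gz: valid archive"
    else
        echo "${name}.tar.gz: not a gzip tarball"
    fi
done
rm -rf "$workdir"
```

In the notebook, the same check would be applied to `/home/csem/csem3d_w-v1.0.2.tar.gz`.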

### Unpack the tarball

In [23]:
%%bash
su csem
cd /home/csem
tar zxvf csem3d_w-v1.0.2.tar.gz

csem3d_w-v1.0.2/
csem3d_w-v1.0.2/CSEM3D_W/
csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W.sln
csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W/
csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W/.vscode/
csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W/.vscode/launch.json
csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W/B_Tx_B_Rx.F
csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W/B_Tx_E_Rx.F
csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W/CSEM3D_W.F
csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W/CSEM3D_W.srm
csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W/CSEM3D_W.u2d
csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W/CSEM3D_W.vfproj
csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W/CSEM3D_mod.F
csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W/E_Tx_B_Rx.F
csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W/E_Tx_E_Rx.F
csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W/abs_to_rel.F
csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W/addair.F
csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W/allvars.F
csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W/bipole2.F
csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W/biprho.F
csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W/blocks.F
csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W/bottom.F
csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W/chk_rx_tx.F
csem3d_w-v1.0.

### Run `make all` to build the `CSEM3D` program

If the build finishes successfully, the following link command should appear at the end:

```bash
mpif90 -o CSEM3D_W outfields_E_Tx.o spline.o bottom.o allvars.o intpol1.o biprho.o dimped.o in3dmod.o compute_src_wts.o set_resist_vector.o locals.o kinds.o outfields_B_Tx.o B_Tx_B_Rx.o dimens.o set_P.o set_bv_e.o d1imped.o set_src.o grid.o in3drho.o CSEM3D_mod.o CSEM3D_W.o set_bv_h.o bipole2.o set_A.o chk_rx_tx.o set_1d_resist.o txrx.o splint.o set_rhs.o blocks.o E_Tx_E_Rx.o convres.o abs_to_rel.o B_Tx_E_Rx.o E_Tx_B_Rx.o addair.o -L/home/csem/petsc/gnu/petsc-v3.18.4/installdir/lib -lpetsc  
```

In [24]:
%%bash
su csem
cd /home/csem/csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W
make PETSC_DIR=${HOME}/petsc/gnu/petsc-v3.18.4/installdir -f scripts/makefile_gnu clean

rm -rf *.o CSEM3D_W *.mod


In [25]:
%%bash
su csem
cd /home/csem/csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W
make PETSC_DIR=${HOME}/petsc/gnu/petsc-v3.18.4/installdir -f scripts/makefile_gnu all

mpif90 -ffixed-line-length-none -g -O2 -ffpe-trap=zero,invalid,overflow -Wcompare-reals -Wconversion -I/home/csem/petsc/gnu/petsc-v3.18.4/installdir/include   -c -o kinds.o kinds.F
mpif90 -ffixed-line-length-none -g -O2 -ffpe-trap=zero,invalid,overflow -Wcompare-reals -Wconversion -I/home/csem/petsc/gnu/petsc-v3.18.4/installdir/include   -c -o CSEM3D_mod.o CSEM3D_mod.F
mpif90 -ffixed-line-length-none -g -O2 -ffpe-trap=zero,invalid,overflow -Wcompare-reals -Wconversion -I/home/csem/petsc/gnu/petsc-v3.18.4/installdir/include   -c -o locals.o locals.F
mpif90 -ffixed-line-length-none -g -O2 -ffpe-trap=zero,invalid,overflow -Wcompare-reals -Wconversion -I/home/csem/petsc/gnu/petsc-v3.18.4/installdir/include   -c -o txrx.o txrx.F
mpif90 -ffixed-line-length-none -g -O2 -ffpe-trap=zero,invalid,overflow -Wcompare-reals -Wconversion -I/home/csem/petsc/gnu/petsc-v3.18.4/installdir/include   -c -o abs_to_rel.o abs_to_rel.F
mpif90 -ffixed-line-length-none -g -O2 -ffpe-trap=zero,invalid,overflow -Wc

### Run this bash script to execute the generated `CSEM3D_W` binary

If the run finishes successfully, output similar to the following should appear:

```bash
 16 KSP preconditioned resid norm 3.863008301354e-18 true resid norm 1.284864871592e-05 ||r(i)||/||b|| 3.405644702963e-01
 17 KSP preconditioned resid norm 2.061402843466e-18 true resid norm 1.054741071689e-05 ||r(i)||/||b|| 2.795681805313e-01
 18 KSP preconditioned resid norm 1.062033155132e-18 true resid norm 3.992776343547e-06 ||r(i)||/||b|| 1.058319664983e-01
 converged reason            2
 total number of relaxations           18
 ========================================


 ************************************************
  3D finished
  Total CPU time:    18.7500000      seconds
 ************************************************

 total cpu time:    18.7500000      seconds
 CSEM3D_W finished
```
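
A saved run log can be checked for convergence automatically. The sketch below assumes that the number printed after `converged reason` is PETSc's `KSPConvergedReason`, where positive values mean the solver converged (2 corresponds to `KSP_CONVERGED_RTOL`) and negative values mean it diverged. `solver_converged` is a hypothetical helper.

```bash
# Print the last "converged reason" value found in a log and succeed
# only when that value is a positive integer.
solver_converged() {
    local reason
    reason="$(awk '/converged reason/ { r = $NF } END { print r }' "$1")"
    echo "converged reason: ${reason:-none}"
    [ -n "$reason" ] && [ "$reason" -gt 0 ]
}

# Demonstration on two lines taken from the expected output above.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
 converged reason            2
 total number of relaxations           18
EOF
if solver_converged "$sample"; then
    echo "solver converged"
else
    echo "solver did not converge"
fi
rm -f "$sample"
```

With the run script, the file to inspect would be the `csem3d_w-${TIMESTART}.out` log written by `tee`.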



In [26]:
%%bash
su csem
cd /home/csem/csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W

PETSC_DIR=${HOME}/petsc/gnu/petsc-v3.18.4/installdir

dataset=Sintetico
ntasks=1
nnodes=1

TIMESTART=$(date +%Y%m%d%H%M%S)

if [[ -L ${dataset} ]]
then
    echo "Link already exists for dataset ${dataset}"
else
    ln -s dataset/${dataset}
fi
sed 's/\.\//'${dataset}'\//g' ${dataset}/Parameters.inp | \
sed 's/'${dataset}'\/OutData/OutData/g' > Parameters.inp

outputdir="OutData"
if [[ -d ${outputdir} ]]
then
    echo "OutData already exists."
    rm -fr ${outputdir}
fi
mkdir ${outputdir}


resultsdir=results/${dataset}/NUMNODES-${nnodes}/MPI-${ntasks}/EXECSTART-${TIMESTART}
mkdir -p ${resultsdir}

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${PETSC_DIR}/lib

executable=CSEM3D_W

echo "mpirun -np $ntasks ./${executable}"
mpirun -np $ntasks ./${executable} \
 -A_mat_type mpiaij \
 -P_mat_type mpiaij \
 -em_ksp_monitor_true_residual \
 -em_ksp_type bcgs \
 -em_pc_type bjacobi \
 -em_sub_pc_type ilu \
 -em_sub_pc_factor_levels 3 \
 -em_sub_pc_factor_fill 6 \
 < ./Parameters.inp \
 2>&1 | tee csem3d_w-${TIMESTART}.out

mv $outputdir/ ${resultsdir}/
cp csem3d_w-${TIMESTART}.out ${resultsdir}/


mpirun -np 1 ./CSEM3D_W



 enter 3D model rhoxx file name

 enter 3D model rhoyy file name

 enter 3D model rhozz file name
 enter error level for stopping? (>= 1e-6 suggested)
   (this is the value rnorm/bnorm where rnorm is the
    L2 norm of the residual and bnorm is the L2 norm
    of the right-hand side. Smaller values result in
    more accurate solutions, but at the
    expense of more relaxation iterations.
 enter max number of relaxations (50-100 suggested)
 use 1D boundary values (y/n)
 [IF YOU TYPE N, THEN ZERO BOUNDARY VALUES WILL BE USED]
 do a deep water caclulation? (y/n)
 [THIS WILL REPLACE AIR LAYERS WITH SEA LAYERS]
 enter directory name for input Tx files
 enter directory name for output 3D files
 force receivers to seafloor (y/n)?
 Compute results for electric or magnetic dipole sources (e/m)?
 how many Tx files to compute?
 enter name for Tx data file           1

 Reading model dimensions from file Sintetico/Model/ResUni1D_RhoH.out                                
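
The two `sed` commands in the run script rewrite the paths in `Parameters.inp`: every `./` becomes `Sintetico/`, and the prefix is then removed again for the output directory so that results land in the local `OutData`. The sketch below applies the same two substitutions to sample lines. The sample input is hypothetical, since the real contents of `Parameters.inp` are not shown here, and `rewrite_paths` is a name introduced for illustration.

```bash
dataset=Sintetico

# Same two substitutions as in the run script: first replace every "./"
# with the dataset name and a slash, then undo that prefix for OutData.
rewrite_paths() {
    sed 's/\.\//'"${dataset}"'\//g' | sed 's/'"${dataset}"'\/OutData/OutData/g'
}

# Hypothetical input lines: a model file path and the output directory.
printf '%s\n' './Model/ResUni1D_RhoH.out' './OutData' | rewrite_paths
```

The first sample line becomes `Sintetico/Model/ResUni1D_RhoH.out`, the path the program reports reading from in the output above. The approach relies on the dataset name containing no `/` or characters that are special in a `sed` pattern.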

In [27]:
%%bash
su csem
cd /home/csem/csem3d_w-v1.0.2/CSEM3D_W/CSEM3D_W

PETSC_DIR=${HOME}/petsc/gnu/petsc-v3.18.4/installdir


export PATH=/usr/local/cuda/bin:$PATH

dataset=Sintetico
ntasks=1
nnodes=1

TIMESTART=$(date +%Y%m%d%H%M%S)

if [[ -L ${dataset} ]]
then
    echo "Link already exists for dataset ${dataset}"
else
    ln -s dataset/${dataset}
fi
sed 's/\.\//'${dataset}'\//g' ${dataset}/Parameters.inp | \
sed 's/'${dataset}'\/OutData/OutData/g' > Parameters.inp

outputdir="OutData"
if [[ -d ${outputdir} ]]
then
    echo "OutData already exists."
    rm -fr ${outputdir}
fi
mkdir ${outputdir}


resultsdir=results/${dataset}/NUMNODES-${nnodes}/MPI-${ntasks}/EXECSTART-${TIMESTART}
mkdir -p ${resultsdir}

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${PETSC_DIR}/lib

executable=CSEM3D_W

echo "mpirun -np $ntasks nvprof -f -o ${executable}.%q{OMPI_COMM_WORLD_RANK}.nvprof ./${executable}"
mpirun -np $ntasks nvprof -f -o ${executable}.%q{OMPI_COMM_WORLD_RANK}.nvprof ./${executable} \
 -A_mat_type aijcusparse \
 -P_mat_type aijcusparse \
 -vec_type cuda \
 -use_gpu_aware_mpi 0 \
 -em_ksp_monitor_true_residual \
 -em_ksp_type bcgs \
 -em_pc_type bjacobi \
 -em_sub_pc_type ilu \
 -em_sub_pc_factor_levels 3 \
 -em_sub_pc_factor_fill 6 \
 < ./Parameters.inp \
 2>&1 | tee csem3d_w-${TIMESTART}.out

mv $outputdir/ ${resultsdir}/
cp csem3d_w-${TIMESTART}.out ${resultsdir}/

Link already exists for dataset Sintetico
mpirun -np 1 nvprof -f -o CSEM3D_W.%q{OMPI_COMM_WORLD_RANK}.nvprof ./CSEM3D_W
==37155== NVPROF is profiling process 37155, command: ./CSEM3D_W -A_mat_type aijcusparse -P_mat_type aijcusparse -vec_type cuda -use_gpu_aware_mpi 0 -em_ksp_monitor_true_residual -em_ksp_type bcgs -em_pc_type bjacobi -em_sub_pc_type ilu -em_sub_pc_factor_levels 3 -em_sub_pc_factor_fill 6



 enter 3D model rhoxx file name

 enter 3D model rhoyy file name

 enter 3D model rhozz file name
 enter error level for stopping? (>= 1e-6 suggested)
   (this is the value rnorm/bnorm where rnorm is the
    L2 norm of the residual and bnorm is the L2 norm
    of the right-hand side. Smaller values result in
    more accurate solutions, but at the
    expense of more relaxation iterations.
 enter max number of relaxations (50-100 suggested)
 use 1D boundary values (y/n)
 [IF YOU TYPE N, THEN ZERO BOUNDARY VALUES WILL BE USED]
 do a deep water caclulation? (y/n)
 [THIS WILL REPLACE AI
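
The CPU and GPU run scripts differ only in the `nvprof` wrapper and in the matrix and vector type options passed to `CSEM3D_W`. A small sketch that keeps both option sets in one place could reduce the duplication between the two cells; `petsc_backend_opts` is a hypothetical helper, and the option strings are copied from the two scripts above.

```bash
# Print the PETSc type options used in this notebook for a given backend.
petsc_backend_opts() {
    case "$1" in
        cpu) echo "-A_mat_type mpiaij -P_mat_type mpiaij" ;;
        gpu) echo "-A_mat_type aijcusparse -P_mat_type aijcusparse -vec_type cuda -use_gpu_aware_mpi 0" ;;
        *)   echo "unknown backend: $1" >&2; return 1 ;;
    esac
}

# Show the options that would be passed for each backend.
for backend in cpu gpu; do
    echo "${backend}: $(petsc_backend_opts "$backend")"
done
```

In a run script, the result could be expanded unquoted, for example `mpirun -np $ntasks ./${executable} $(petsc_backend_opts gpu) ...`, followed by the solver options shared by both cells.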