Add CUDA support to dense direct solver options #950

victorapm · 2023-07-20T01:48:56Z

This PR adds CUDA support to the dense direct solver options (98, 99, 198, and 199) of BoomerAMG and MGR:

- Options 98 and 99 compute the LU factorization with pivoting.
- Options 198 and 199 compute the dense inverse matrix explicitly.
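To make the distinction between the two option families concrete, here is a minimal host-only sketch in plain C. It is not hypre's implementation (the PR delegates this work to cuSOLVER or MAGMA on the device); every `demo_*` name is hypothetical, and the matrix is assumed to be small, square, and stored row-major.

```c
/* Illustrative only: all demo_* names are hypothetical, not hypre routines. */

double demo_abs(double x) { return (x < 0.0) ? -x : x; }

/* In-place LU factorization with partial pivoting of a row-major n x n matrix.
 * piv[k] records the row that was swapped with row k.
 * Returns 0 on success, -1 if a zero pivot is found. */
int demo_lu_factor(double *A, int *piv, int n)
{
   for (int k = 0; k < n; k++)
   {
      /* Pick the row with the largest magnitude entry in column k */
      int p = k;
      for (int i = k + 1; i < n; i++)
      {
         if (demo_abs(A[i * n + k]) > demo_abs(A[p * n + k])) { p = i; }
      }
      if (A[p * n + k] == 0.0) { return -1; }
      piv[k] = p;
      if (p != k)
      {
         for (int j = 0; j < n; j++)
         {
            double t = A[k * n + j];
            A[k * n + j] = A[p * n + j];
            A[p * n + j] = t;
         }
      }
      /* Eliminate below the pivot, storing the multipliers in place */
      for (int i = k + 1; i < n; i++)
      {
         A[i * n + k] /= A[k * n + k];
         for (int j = k + 1; j < n; j++)
         {
            A[i * n + j] -= A[i * n + k] * A[k * n + j];
         }
      }
   }
   return 0;
}

/* Solve A x = b with the factors from demo_lu_factor; b is overwritten by x. */
void demo_lu_solve(const double *LU, const int *piv, int n, double *b)
{
   /* Apply the recorded row swaps to the right-hand side */
   for (int k = 0; k < n; k++)
   {
      double t = b[k];
      b[k] = b[piv[k]];
      b[piv[k]] = t;
   }
   /* Forward substitution with the unit lower triangular factor */
   for (int i = 1; i < n; i++)
   {
      for (int j = 0; j < i; j++) { b[i] -= LU[i * n + j] * b[j]; }
   }
   /* Back substitution with the upper triangular factor */
   for (int i = n - 1; i >= 0; i--)
   {
      for (int j = i + 1; j < n; j++) { b[i] -= LU[i * n + j] * b[j]; }
      b[i] /= LU[i * n + i];
   }
}

/* Form the explicit inverse column by column by solving A x = e_j.
 * work is a caller-provided scratch vector of length n. */
void demo_lu_inverse(const double *LU, const int *piv, int n,
                     double *Ainv, double *work)
{
   for (int j = 0; j < n; j++)
   {
      for (int i = 0; i < n; i++) { work[i] = (i == j) ? 1.0 : 0.0; }
      demo_lu_solve(LU, piv, n, work);
      for (int i = 0; i < n; i++) { Ainv[i * n + j] = work[i]; }
   }
}
```

With the LU factors, each solve costs two triangular substitutions; with the explicit inverse, each solve reduces to a dense matrix-vector product.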

The difference between 98 and 99, or between 198 and 199, lies in the strategy used to compute the dense local matrix from the distributed ParCSR matrix. In the first case (98 and 198), hypre's internal DataExchange is used; in the second case (99 and 199), MPI collectives defined on a sub-communicator are used.
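Whichever communication strategy is used, the end result is a dense copy of the coarse matrix assembled from row blocks owned by different ranks. The sketch below mimics only that assembly step on a single process, without MPI. The `demo_csr_block` type and both functions are hypothetical and do not reflect hypre's ParCSR data structures or its actual gathering code.

```c
#include <stddef.h>

/* Hypothetical CSR row block: the rows one rank might own (not a hypre type). */
typedef struct
{
   int           first_row;  /* global index of the first local row */
   int           num_rows;   /* number of local rows */
   const int    *row_ptr;    /* row offsets, size num_rows + 1 */
   const int    *col_idx;    /* global column indices of the nonzeros */
   const double *values;     /* nonzero coefficients */
} demo_csr_block;

/* Scatter one row block into a row-major n x n dense matrix.
 * The dense buffer is assumed to be zero-initialized by the caller. */
void demo_block_to_dense(const demo_csr_block *blk, int n, double *dense)
{
   for (int i = 0; i < blk->num_rows; i++)
   {
      int row = blk->first_row + i;
      for (int k = blk->row_ptr[i]; k < blk->row_ptr[i + 1]; k++)
      {
         dense[(size_t) row * n + blk->col_idx[k]] = blk->values[k];
      }
   }
}

/* Assemble the full dense matrix from every block. This stands in for the
 * outcome of a gather: afterwards the whole n x n matrix is available locally. */
void demo_gather_dense(const demo_csr_block *blocks, int num_blocks,
                       int n, double *dense)
{
   for (size_t i = 0; i < (size_t) n * n; i++) { dense[i] = 0.0; }
   for (int b = 0; b < num_blocks; b++)
   {
      demo_block_to_dense(&blocks[b], n, dense);
   }
}
```

In a real distributed run, the blocks would arrive through communication rather than sit in one array, but the scatter into the dense buffer would look much the same.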

GPU support is achieved via the vendor math libraries (cuSOLVER in this case) or via MAGMA, when hypre is configured with MAGMA support. In the latter scenario, MAGMA takes precedence over cuSOLVER.
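The precedence rule can be sketched as a small selection function. This is only an illustration of the stated behavior: the enum, the function names, and the `DEMO_USING_MAGMA` / `DEMO_USING_CUSOLVER` macros are hypothetical placeholders for whatever the build system actually defines.

```c
/* Illustrative only: names and macros below are assumptions, not hypre's. */
typedef enum
{
   DEMO_BACKEND_HOST = 0,   /* no device library available */
   DEMO_BACKEND_CUSOLVER,   /* vendor math library */
   DEMO_BACKEND_MAGMA       /* MAGMA */
} demo_backend;

/* Select the dense solver backend: MAGMA wins whenever it is available,
 * otherwise the vendor library is used, otherwise the host path. */
demo_backend demo_select_backend(int has_magma, int has_cusolver)
{
   if (has_magma)    { return DEMO_BACKEND_MAGMA; }
   if (has_cusolver) { return DEMO_BACKEND_CUSOLVER; }
   return DEMO_BACKEND_HOST;
}

/* Backend implied by the (hypothetical) compile-time configuration macros. */
demo_backend demo_configured_backend(void)
{
#if defined(DEMO_USING_MAGMA)
   int has_magma = 1;
#else
   int has_magma = 0;
#endif
#if defined(DEMO_USING_CUSOLVER)
   int has_cusolver = 1;
#else
   int has_cusolver = 0;
#endif
   return demo_select_backend(has_magma, has_cusolver);
}
```

Keeping the decision in one function would also give a natural place to hook in a user-selectable override later.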

In addition to these changes, the par_gauss_elim.c file was substantially refactored.

TODOs for following PRs:

- Add regression tests for the new capabilities.
- Create a separate data structure for accessing dense direct solvers.
- Add GPU-aware MPI to the matrix/vector gathering phases.
- Add HIP and SYCL support.
- Add a runtime option to switch between solver implementations (vendor or MAGMA).

rfalgout

Looks good to me @victorapm! Thanks!

ulrikeyang

this looks fine, as far as I can tell.

oseikuffuor1

Thanks again Victor. Nice work.

liruipeng · 2023-08-24T21:45:19Z

src/utilities/device_utils.h

@@ -360,7 +360,6 @@ using hypre_DeviceItem = sycl::nd_item<3>;
   if (cudaSuccess != err) {                                                                 \
      printf("CUDA ERROR (code = %d, %s) at %s:%d\n", err, cudaGetErrorString(err),          \
                   __FILE__, __LINE__);                                                      \
-      hypre_assert(0); exit(1);                                                              \


Why remove? In the case of CUDA ERROR, do we want to exit right away?

@rfalgout and I talked about this at some point. My understanding was that we shouldn't exit from hypre and that the application calling hypre should decide how to handle the error. If confirmed, this behavior needs to be replicated in the other macros (in a separate PR).

OK. Then, at least a return hypre_error should be called? IMHO, there's no reason to continue when a CUDA error occurs.
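The trade-off discussed in this thread can be sketched without any CUDA dependency. The macro below reports a failed call, records it in an error flag, and returns to the caller instead of calling `exit()`. Everything here is a hypothetical stand-in: `demo_status`, `DEMO_CALL`, and `demo_error_flag` are not hypre's or CUDA's actual names, and the sketch assumes the macro is only used inside functions that return `int`.

```c
#include <stdio.h>

/* Stand-ins for a device runtime's status type (assumptions, not CUDA's API). */
typedef int demo_status;
#define DEMO_SUCCESS 0

/* Global error flag, playing the role of a library-wide error code. */
int demo_error_flag = 0;

/* Counts how many steps of the routine below ran to completion. */
int demo_steps_completed = 0;

/* Check a call: report the failure, record it, and hand control back to the
 * caller rather than terminating the process. */
#define DEMO_CALL(call)                                                \
   do {                                                                \
      demo_status err_ = (call);                                       \
      if (err_ != DEMO_SUCCESS) {                                      \
         fprintf(stderr, "DEMO ERROR (code = %d) at %s:%d\n",          \
                 err_, __FILE__, __LINE__);                            \
         demo_error_flag |= 1;                                         \
         return demo_error_flag;                                       \
      }                                                                \
   } while (0)

/* Hypothetical device operation that fails on request. */
demo_status demo_device_op(int should_fail)
{
   return should_fail ? 700 : DEMO_SUCCESS;
}

/* Library routine: stops at the first failing call and lets the
 * application decide what to do with the returned error code. */
int demo_library_routine(int fail_first)
{
   DEMO_CALL(demo_device_op(fail_first));
   demo_steps_completed++;
   DEMO_CALL(demo_device_op(0));
   demo_steps_completed++;
   return demo_error_flag;
}
```

A caller can then inspect the return value and choose to abort, retry, or fall back, which is the kind of decision the thread suggests leaving to the application.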

liruipeng · 2023-08-24T22:08:29Z

Looks good to me. I would like to merge #927 before this. Thanks!

victorapm · 2023-08-24T22:09:24Z

OK! I can wait for you

* Improve ILU documentation (#939)
  Add warnings for Euclid and PILUT redirecting users to hypre-ILU.
  Rewrite hypre-ILU overview section.
  Add new sections to hypre-ILU documentation: "User-level functions", "ILU as smoother for BoomerAMG", and "GPU support".
  Include info about new iterative ILU options.
  Update BoomerAMG complex smoothers section.
  Change name "hypre-ILU" to "ILU"

* Add FSAI support with CUDA and HIP (#739)
  This PR adds CUDA and HIP support to FSAI according to a static pattern generation algorithm. The resulting method can also be used as a preconditioner for BoomerAMG. A detailed list of changes is given below:
  * Add par_fsai_device.c
  * Add hypre_FSAIApply
  * Add function to dump local linear systems in dense format
  * Implement static FSAI pattern computation via powers of A
  * Improve filtering of candidate pattern
  * Improve local linear systems extraction
  * Add option for a 125pt matrix (27pt squared)
  * Add options to control sizes of the memory pools with umpire
  * Add hypre_GpuProfiling calls
  * Improve candidate pattern truncation times
  * Add max_nnz_row member and its private and public functions to FSAI
  * Use max_nnz_row in FSAISetupDevice
  * Add num_levels member and its private and public functions to FSAI
  * Add threshold member and its public/private functions to FSAI
  * Expose FSAI algorithm type to BoomerAMG
  * Expose options to control FSAI setup
  * Add cuSOLVER variables and calls
  * Add batched dense linear solver calls to FSAI
  * Improve execution time for generating random numbers
  * Show FSAI parameters when amg_print_level >= 1
  * Improve output of FSAIPrintStats
  * Implement warp calls
  * Add hypre_mask type and hypre_ballot_sync wrapper function
  * Add hypre_popc and hypre_ffs wrapper functions
  * Implement warp_allreduce_max calls
  * Change: hypreDevice -> hypre_*Device
  * Add rocSOLVER calls
  * Apply astyle
  * Remove redundant line

* Add MAGMA option to FSAI (#940)
  Allow the use of MAGMA as local linear solver for FSAI.
  Add `HYPRE_FSAISetLocalSolveType` for choosing the local linear solve type used in FSAI and add `HYPRE_BoomerAMGSetFSAILocalSolveType` for the case when FSAI is used as a smoother to BoomerAMG.

* Fix Copyright message (#951)
  Fix year in Copyright message of a few source files.

* Change sh to bash (#900)
  Change shell scripts from `#!/bin/sh` to `#!/bin/bash`

* Add Binary I/O functions for IJ matrices and vectors (#826)
  This PR adds new Print and Read functions for matrices and vectors to be stored/read in binary format. A detailed list of changes is given below:
  * Add IJMatrix/ParCSRMatrix routines for binary I/O
  * Add IJVector/ParVector routines for binary I/O
  * Add typedefs for unsigned integer types and single-precision floating-point
  * Change char sizes to HYPRE_MAX_FILE_NAME_LEN
  * Add options to IJ driver for reading binary matrices/vectors
  * Add regression tests for IJ input/output

* Keep smooth_num_levels in sync with amg_data (#954)
  This solves an out-of-bounds memory error during `hypre_BoomerAMGSetup` when called multiple times without a call to `hypre_BoomerAMGDestroy` interleaved. This pull request makes sure that `smooth_num_levels` is reset to `hypre_ParAMGDataSmoothNumLevels(amg_data)` before the smoothers variable is allocated.

* L1 HSGS (#927)
  This PR provides a convergent l1-hybrid symmetric Gauss-Seidel (HSGS) method.

* Apply Debian patches (#966)
  1. Fix make checkpar
  2. Add missing `finalizeAllTimings`
  3. Add error code support to checktest.sh
  ---------
  Co-authored-by: Drew Parsons <dparsons@debian.org>

* Add HYPRE_GetExecutionPolicyName (#969)
  * Add HYPRE_GetExecutionPolicyName
  * Add doc entries to memory/execution routines

* Doc updates (#974)
  * Updated documentation for clarity and to clean up a few typos.
  * Add warning messages to FEI, ParaSails, PILUT, Euclid.
  * Improved and updated GPU information
  * Added CMake build information

* Add HYPRE_MGRSetLevelPMaxElmts (#975)
  Also adds its private interface, and accompanying code (#975)

* Fix regression tests (#979)
  Initialize `P_max_elmts` if not set by user.

* Add CUDA support to dense direct solver options (#950)
  This PR adds CUDA support to dense direct solver options (98, 99, 198, and 199) of BoomerAMG and MGR:
  - Options 98 and 99 compute the LU factorization with pivoting.
  - Options 198 and 199 compute the dense inverse matrix explicitly.
  Detailed list of changes below:
  * Add hypre_ParCSRMatrixToCSRMatrixAll_v2
  * Add hypre_SeqVectorMigrate
  * Add hypre_ParVectorToVectorAll_v2
  * Refactor implementation of BoomerAMG's Gaussian Elimination
  * Add hypre_GaussElimAllSetup and hypre_GaussElimAllSolve
  * Add device support via MAGMA and cuSOLVER to BoomerAMG's LU coarsest linear solver (options 98, 99)
  * Add device support via MAGMA and cuSOLVER to BoomerAMG's exact inverse solver (options 198, 199)
  * Add wrappers to MAGMA's getrf and getrs
  * Add MAGMA info on AMG stats + code formatting
  * Add wrappers to cuSOLVER and cuBLAS functions
  * Add wrapper hypre_magma_getri_nb
  * Add header file for collecting hypre functors
  * Add memory location to Gaussian elimination data structure
  * Improve description of coarsest level solver options
  * Update GE data structure in MGR
  * Change Ainv to Awork

* Fix MSVC build (#978)
  * Allocate buffer on heap memory
  * Fix Pragma definition for MSVC
  * Fix uninitialized variable
  * Loop counter cannot be non-negative for MSVC

* SYCL triangular solves, Chebyshev relaxation, etc. (#972)
  Adding more sycl functionality including chebyshev relaxation and triangular solves, which in turn enables Gauss-Seidel, ILU, etc.

* Add MGR statistics (#897)
  This PR improves statistics reporting for MGR. A list with detailed changes is given below:
  * Add MatrixStats and MatrixStatsArray
  * Add hypre_squared utility
  * Fix divisor line location for Rectangular matrices
  * Minor fix on hypre_squared definition
  * Move nonzero variable definitions up
  * Initialize global number of nonzeros at matrix creation
  * Add hypre_ParCSRMatrixStatsArray and helper functions
  * Add par_csr_matstats_device
  * Add par_mgr_stats
  * Print F-relax data only once
  * Fix clang-13 build
  * IJ driver now passes in print_level option to MGR
  * GPU runs always require A_FF in MGR
  * Add HYPRE_PRINT_SHIFTED_PARAM macro
  * Add hypre_IntArraySetInterleavedValues (host/device implementations)
  * Fix F-relaxation reporting + refactoring
  * Update global number of nonzeros of the matrix
  * Move new BoomerAMG functions to par_stats
  * Apply astyle

* Improve MGR data printing (#976)
  This enhances what print_level can achieve in MGR. Particularly, now we can dump linear system info to files according to the print_level code. We also have the ability now of printing a sequence of linear systems to file (useful when hypre is used in time-stepping application). A detailed list of changes is given below:
  * Add utilities for creating/checking directories
  * Add print_level codes to MGR and new info_path member
  * Add hypre_MGRDataPrint
  * Add call to hypre_MGRDataPrint and logic to update the print_level variable
  * Update MGRSolve with new print_level logic
  * Remove hypre_MGRWriteSolverParams
  * Update documentation for HYPRE_MGRSetPrintLevel
  * Implement new logic for HYPRE_MGR_PRINT_MODE_ASCII

* Fix regressions (#988)
  * CMake uses C99 by default
  * HYPRE_PRINT_INDENT works without a loop

* Fix compilation on Windows (#990)
  * Do not use dirent.h in windows

---------
Co-authored-by: Victor A. P. Magri <50467563+victorapm@users.noreply.github.com>
Co-authored-by: tisaac <toby.isaac@gmail.com>
Co-authored-by: Rui Peng Li <li50@llnl.gov>
Co-authored-by: Drew Parsons <dparsons@debian.org>
Co-authored-by: Wayne Mitchell <mitchell82@llnl.gov>

This PR adds CUDA support to dense direct solver options (98, 99, 198, and 199) of BoomerAMG and MGR:

- Options 98 and 99 compute the LU factorization with pivoting.
- Options 198 and 199 compute the dense inverse matrix explicitly.

Detailed list of changes below:

* Add hypre_ParCSRMatrixToCSRMatrixAll_v2
* Add hypre_SeqVectorMigrate
* Add hypre_ParVectorToVectorAll_v2
* Refactor implementation of BoomerAMG's Gaussian Elimination
* Add hypre_GaussElimAllSetup and hypre_GaussElimAllSolve
* Add device support via MAGMA and cuSOLVER to BoomerAMG's LU coarsest linear solver (options 98, 99)
* Add device support via MAGMA and cuSOLVER to BoomerAMG's exact inverse solver (options 198, 199)
* Add wrappers to MAGMA's getrf and getrs
* Add MAGMA info on AMG stats + code formatting
* Add wrappers to cuSOLVER and cuBLAS functions
* Add wrapper hypre_magma_getri_nb
* Add header file for collecting hypre functors
* Add memory location to Gaussian elimination data structure
* Improve description of coarsest level solver options
* Update GE data structure in MGR
* Change Ainv to Awork

victorapm added 30 commits March 20, 2023 20:54

Add MAGMA support to autotools build

78b544d

Merge branch 'master' into add-magma

8094fc6

Add MAGMA interface files

b1ec3bc

Call MAGMA init/finalize interfaces

427288f

Merge branch 'master' into add-magma

1a6cdda

Solve MAGMA's fortran name mangling issue

5383b64

Add MAGMA support to CMake build

8a9264e

Remove extra flag

ac931ea

Minor fixes

c285879

Merge branch 'master' into add-magma

0c45eb1

Add memory_location to hypre_ParCSRMatrixToCSRMatrixAll

5ec85d9

Add hypre_SeqVectorMigrate

d63831b

Add memory_location as input argument to hypre_ParVectorToVectorAll

026164d

Coarse level solver based on LU (option 98) works on device runs (but runs on CPU)

e0b9391

Add device support via MAGMA to LU coarsest linear solver of BoomerAMG (option 98)

85a17bf

Add hypre_GaussElimAllSetup and hypre_GaussElimAllSolve

7ce5bbc

Remove old implementation of gaussian elimination via option 19 and replace with new one (with device support)

2736caf

Remove old code that was not being compiled

b455efc

Code style changes + refactoring

e09851a

Add wrappers to MAGMA's getrf and getrs

034fe2b

Add MAGMA info on AMG stats + code formatting

a08091f

Merge branch 'master' into magma-lu

bc965d1

Merge branch 'master' into magma-lu

eebd6c6

Add sanity check and remove unnecessary include

8382d6d

Merge branch 'master' into magma-lu

4c26ca7

Merge branch 'master' into magma-lu

e549229

Refactoring on par_gauss_elim

434a0cb

Fix host compilation

eddd33f

More refactoring

dd7c393

Update wrapper to all dense direct solvers

f9de8c7

victorapm added 2 commits July 19, 2023 18:49

Change Ainv to Awork

7b0dcb0

Apiv is on the host memory when using MAGMA

67df6a0

victorapm requested review from liruipeng, ulrikeyang, rfalgout and oseikuffuor1 July 20, 2023 01:49

rfalgout approved these changes Jul 20, 2023

victorapm added 2 commits July 20, 2023 13:09

Remove old temporary code

300892b

Merge branch 'master' into magma-lu

351db10

ulrikeyang approved these changes Aug 17, 2023

victorapm added 2 commits August 24, 2023 11:16

Remove old info

ad8d537

Merge branch 'master' into magma-lu

2b3a094

oseikuffuor1 approved these changes Aug 24, 2023

liruipeng reviewed Aug 24, 2023

victorapm added 10 commits September 18, 2023 19:51

Add check for empty ranks

0a476b8

Add hypre_ParVectorToVectorAll_v2

28c5de2

Merge branch 'master' into magma-lu

186389a

Merge branch 'magma-lu' of github.com:hypre-space/hypre into magma-lu

c71ee1f

Add missing free

ea4dcba

Apply astyle

4464990

Fix memory leak

c58ce28

Add CSRMatrixAll_v2 function

9ae94ae

Fix check-double test

374e205

Fixes for lassen

de7dda5

victorapm merged commit fc49a5e into master Oct 8, 2023

victorapm deleted the magma-lu branch October 8, 2023 15:39
