Add CUDA support to dense direct solver options #950

Merged: victorapm merged 68 commits into master from magma-lu on Oct 8, 2023

@victorapm victorapm commented Jul 20, 2023

This PR adds CUDA support to the dense direct solver options (98, 99, 198, and 199) of BoomerAMG and MGR:

  • Options 98 and 99 compute the LU factorization with pivoting.
  • Options 198 and 199 compute the dense inverse matrix explicitly.
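To make the two families concrete, here is a minimal host-side sketch in plain C of what "LU with pivoting" and "explicit dense inverse" mean for a small matrix. It is not the code from this PR: the names `lu_factor`, `lu_solve`, `lu_invert`, and the fixed size `N` are hypothetical, and the GPU path relies on library routines instead of hand-written loops.

```c
#include <assert.h>

#define N 3

static double absd(double v) { return v < 0.0 ? -v : v; }

/* In-place LU factorization with partial (row) pivoting of a row-major
 * N x N matrix. On exit, A holds L (unit diagonal, strictly below) and U
 * (on and above the diagonal); piv[k] is the row swapped with row k at
 * step k. Returns 0 on success, -1 if a zero pivot is found. */
static int lu_factor(double A[N][N], int piv[N])
{
   for (int k = 0; k < N; k++)
   {
      int p = k;
      for (int i = k + 1; i < N; i++)
         if (absd(A[i][k]) > absd(A[p][k])) p = i;
      if (A[p][k] == 0.0) return -1;
      piv[k] = p;
      if (p != k)
         for (int j = 0; j < N; j++)
         {
            double t = A[k][j]; A[k][j] = A[p][j]; A[p][j] = t;
         }
      for (int i = k + 1; i < N; i++)
      {
         A[i][k] /= A[k][k];
         for (int j = k + 1; j < N; j++)
            A[i][j] -= A[i][k] * A[k][j];
      }
   }
   return 0;
}

/* Solve A x = b using the factors from lu_factor; b is overwritten by x. */
static void lu_solve(double LU[N][N], const int piv[N], double b[N])
{
   for (int k = 0; k < N; k++) /* replay the row swaps on b */
   {
      assert(piv[k] >= k && piv[k] < N);
      double t = b[k]; b[k] = b[piv[k]]; b[piv[k]] = t;
   }
   for (int i = 1; i < N; i++) /* forward substitution with L */
      for (int j = 0; j < i; j++) b[i] -= LU[i][j] * b[j];
   for (int i = N - 1; i >= 0; i--) /* backward substitution with U */
   {
      for (int j = i + 1; j < N; j++) b[i] -= LU[i][j] * b[j];
      b[i] /= LU[i][i];
   }
}

/* Build the explicit inverse column by column by solving A x = e_c. */
static void lu_invert(double LU[N][N], const int piv[N], double Ainv[N][N])
{
   for (int c = 0; c < N; c++)
   {
      double e[N] = {0.0};
      e[c] = 1.0;
      lu_solve(LU, piv, e);
      for (int r = 0; r < N; r++) Ainv[r][c] = e[r];
   }
}
```

With the factors stored, each solve costs two triangular substitutions; with the explicit inverse stored, each solve is a single dense matrix-vector product, at the price of a more expensive setup.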

The difference between 98 and 99 (or between 198 and 199) lies in the strategy used to build the dense local matrix from the distributed ParCSR matrix. In the first case (98 and 198), hypre's internal DataExchange is used; in the second (99 and 199), MPI collectives defined on a sub-communicator are used.
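Whichever communication strategy is used, the end result is one dense array holding the whole coarsest-level matrix. The sketch below is a serial stand-in for that final assembly step only: `RowBlock` and `assemble_dense` are hypothetical names, each block plays the role of the CSR rows owned by one MPI rank, and the actual exchange (DataExchange or MPI collectives) is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the rows owned by one MPI rank, stored in CSR
 * form with global column indices. */
typedef struct
{
   int           first_row; /* global index of the first owned row */
   int           num_rows;  /* number of owned rows */
   const int    *row_ptr;   /* size num_rows + 1 */
   const int    *col_idx;   /* global column indices */
   const double *values;    /* nonzero coefficients */
} RowBlock;

/* Scatter every row block into a row-major n x n dense array. */
static void assemble_dense(const RowBlock *blocks, int num_blocks,
                           int n, double *dense)
{
   assert(blocks != NULL && dense != NULL && n > 0);
   for (int i = 0; i < n * n; i++) dense[i] = 0.0;

   for (int b = 0; b < num_blocks; b++)
   {
      const RowBlock *blk = &blocks[b];
      for (int r = 0; r < blk->num_rows; r++)
      {
         int grow = blk->first_row + r; /* global row index */
         assert(grow < n);
         for (int k = blk->row_ptr[r]; k < blk->row_ptr[r + 1]; k++)
         {
            assert(blk->col_idx[k] < n);
            dense[grow * n + blk->col_idx[k]] = blk->values[k];
         }
      }
   }
}
```

The coarsest-level matrix is expected to be small, which is why replicating it in dense form is affordable.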

GPU support is achieved via the vendor math libraries (cuSOLVER in this case) or via MAGMA, when hypre is configured with MAGMA support. In the latter scenario, MAGMA takes precedence over cuSOLVER.
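The precedence rule can be pictured as a compile-time dispatch. This is only a sketch of the idea: `DEMO_USING_MAGMA`, `DEMO_USING_CUSOLVER`, and the `DenseBackend` type are hypothetical names invented for the example, not hypre's configuration macros.

```c
#include <assert.h>

/* Hypothetical configure-time macros. Both are defined here to mimic a
 * build where MAGMA and cuSOLVER are available at the same time. */
#define DEMO_USING_MAGMA    1
#define DEMO_USING_CUSOLVER 1

typedef enum { DENSE_HOST, DENSE_CUSOLVER, DENSE_MAGMA } DenseBackend;

/* MAGMA is tested first, so it wins whenever both macros are defined. */
static DenseBackend dense_backend(void)
{
#if defined(DEMO_USING_MAGMA)
   return DENSE_MAGMA;
#elif defined(DEMO_USING_CUSOLVER)
   return DENSE_CUSOLVER;
#else
   return DENSE_HOST;
#endif
}

static const char *dense_backend_name(DenseBackend b)
{
   switch (b)
   {
      case DENSE_MAGMA:    return "MAGMA";
      case DENSE_CUSOLVER: return "cuSOLVER";
      case DENSE_HOST:     return "host";
   }
   assert(0 && "unknown backend");
   return "";
}
```

Because the choice is fixed when the library is built, switching implementations means reconfiguring; the runtime switch listed in the TODOs would turn this into a solver parameter.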

In addition to these changes, the par_gauss_elim.c file was substantially refactored.

TODOs for following PRs:

  • Add regression tests for the new capabilities.
  • Create a separate data structure for accessing dense direct solvers.
  • Add GPU-aware MPI to the matrix/vector gathering phases.
  • Add HIP and SYCL support.
  • Add a runtime option to switch between solver implementations (vendor or MAGMA).

@rfalgout rfalgout left a comment

Looks good to me @victorapm! Thanks!

@ulrikeyang ulrikeyang left a comment

This looks fine, as far as I can tell.

@oseikuffuor1 oseikuffuor1 left a comment

Thanks again Victor. Nice work.

@@ -360,7 +360,6 @@ using hypre_DeviceItem = sycl::nd_item<3>;
if (cudaSuccess != err) { \
printf("CUDA ERROR (code = %d, %s) at %s:%d\n", err, cudaGetErrorString(err), \
__FILE__, __LINE__); \
hypre_assert(0); exit(1); \
Contributor

Why remove? In the case of CUDA ERROR, do we want to exit right away?

@victorapm victorapm Aug 24, 2023

@rfalgout and I talked about this at some point. My understanding was that we shouldn't exit from hypre; the application that calls hypre should decide how to handle the error. If confirmed, this behavior needs to be replicated in the other macros (in a separate PR).

Contributor

OK. Then, at least a return hypre_error should be called? IMHO, there's no reason to continue when a CUDA error occurs.

Contributor Author

Agreed
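A rough sketch of the behavior agreed on in this thread, where a failed device call is reported and recorded and control returns to the caller instead of terminating the process. Everything here is a hypothetical stand-in (`DEMO_CALL`, `demo_status_t`, `demo_error_flag`, `fake_device_op`); it does not reproduce hypre's actual macros or error API.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for a device status type and a global error flag. */
typedef int demo_status_t;
#define DEMO_SUCCESS 0

static int demo_error_flag = 0;

/* Report the failure, record it, and return to the caller: no exit().
 * Usable only inside functions that return int. */
#define DEMO_CALL(call)                                      \
   do {                                                      \
      demo_status_t err_ = (call);                           \
      if (err_ != DEMO_SUCCESS)                              \
      {                                                      \
         fprintf(stderr, "ERROR (code = %d) at %s:%d\n",     \
                 err_, __FILE__, __LINE__);                  \
         demo_error_flag |= 1;                               \
         return demo_error_flag;                             \
      }                                                      \
   } while (0)

/* Pretend device operation that fails on request. */
static demo_status_t fake_device_op(int fail)
{
   return fail ? 700 : DEMO_SUCCESS;
}

/* Runs two operations; stops at the first failure and reports it upward. */
static int run_two_ops(int fail_first, int *ops_done)
{
   assert(ops_done != NULL);
   *ops_done = 0;
   DEMO_CALL(fake_device_op(fail_first));
   (*ops_done)++;
   DEMO_CALL(fake_device_op(0));
   (*ops_done)++;
   return demo_error_flag;
}
```

The calling application can then inspect the returned code and decide whether to abort, retry, or fall back to another solver.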

@liruipeng

Looks good to me. I would like to merge #927 before this. Thanks!

@victorapm

OK! I can wait for you

@victorapm victorapm merged commit fc49a5e into master Oct 8, 2023
@victorapm victorapm deleted the magma-lu branch October 8, 2023 15:39
ulrikeyang added a commit that referenced this pull request Oct 20, 2023
* Improve ILU documentation (#939)

Add warnings for Euclid and PILUT redirecting users to hypre-ILU.
Rewrite hypre-ILU overview section.
Add new sections to hypre-ILU documentation: "User-level functions", "ILU as smoother for BoomerAMG", and "GPU support".
Include info about new iterative ILU options.
Update BoomerAMG complex smoothers section.
Change name "hypre-ILU" to "ILU"

* Add FSAI support with CUDA and HIP (#739)

This PR adds CUDA and HIP support to FSAI according to a static pattern generation algorithm. The resulting method can also be used as a preconditioner for BoomerAMG. A detailed list of changes is given below:

* Add par_fsai_device.c 
* Add hypre_FSAIApply
* Add function to dump local linear systems in dense format
* Implement static FSAI pattern computation via powers of A
* Improve filtering of candidate pattern
* Improve local linear systems extraction
* Add option for a 125pt matrix (27pt squared)
* Add options to control sizes of the memory pools with umpire
* Add hypre_GpuProfiling calls
* Improve candidate pattern truncation times
* Add max_nnz_row member and its private and public functions to FSAI
* Use max_nnz_row in FSAISetupDevice
* Add num_levels member and its private and public functions to FSAI
* Add threshold member and its public/private functions to FSAI
* Expose FSAI algorithm type to BoomerAMG
* Expose options to control FSAI setup
* Add cuSOLVER variables and calls
* Add batched dense linear solver calls to FSAI
* Improve execution time for generating random numbers
* Show FSAI parameters when amg_print_level >= 1
* Improve output of FSAIPrintStats 
* Implement warp calls
* Add hypre_mask type and hypre_ballot_sync wrapper function
* Add hypre_popc and hypre_ffs wrapper functions
* Implement warp_allreduce_max calls
* Change: hypreDevice -> hypre_*Device
* Add rocSOLVER calls
* Apply astyle
* Remove redundant line

* Add MAGMA option to FSAI (#940)

Allow the use of MAGMA as local linear solver for FSAI.
Add `HYPRE_FSAISetLocalSolveType` for choosing the local linear solve type used in FSAI and add `HYPRE_BoomerAMGSetFSAILocalSolveType` for the case when FSAI is used as a smoother to BoomerAMG.

* Fix Copyright message (#951)

Fix year in Copyright message of a few source files.

* Change sh to bash (#900)

Change shell scripts from `#!/bin/sh` to `#!/bin/bash`

* Add Binary I/O functions for IJ matrices and vectors (#826)

This PR adds new Print and Read functions for matrices and vectors to be stored/read in binary format. A detailed list of changes is given below:

* Add IJMatrix/ParCSRMatrix routines for binary I/O
* Add IJVector/ParVector routines for binary I/O
* Add typedefs for unsigned integer types and single-precision floating-point
* Change char sizes to HYPRE_MAX_FILE_NAME_LEN
* Add options to IJ driver for reading binary matrices/vectors
* Add regression tests for IJ input/output

* Keep smooth_num_levels in sync with amg_data (#954)

This solves an out-of-bounds memory error during `hypre_BoomerAMGSetup` when called multiple times without a call to `hypre_BoomerAMGDestroy` interleaved. This pull request makes sure that `smooth_num_levels` is reset to `hypre_ParAMGDataSmoothNumLevels(amg_data)` before the smoothers variable is allocated.

* L1 HSGS (#927)

This PR provides a convergent l1-hybrid symmetric Gauss-Seidel (HSGS) method.

* Apply Debian patches (#966)

1. Fix make checkpar
2. Add missing `finalizeAllTimings`
3. Add error code support to checktest.sh

---------

Co-authored-by: Drew Parsons <dparsons@debian.org>

* Add HYPRE_GetExecutionPolicyName (#969)

* Add HYPRE_GetExecutionPolicyName
* Add doc entries to memory/execution routines

* Doc updates (#974)

* Updated documentation for clarity and to clean up a few typos.
* Add warning messages to FEI, ParaSails, PILUT, Euclid.
* Improved and updated GPU information
* Added CMake build information

* Add HYPRE_MGRSetLevelPMaxElmts (#975)

Also adds its private interface, and accompanying code (#975)

* Fix regression tests (#979)

Initialize `P_max_elmts` if not set by user.

* Add CUDA support to dense direct solver options (#950)

This PR adds CUDA support to dense direct solver options (98, 99, 198, and 199) of BoomerAMG and MGR:
  - Options 98 and 99 compute the LU factorization with pivoting.
  - Options 198 and 199 compute the dense inverse matrix explicitly.

Detailed list of changes below:

* Add hypre_ParCSRMatrixToCSRMatrixAll_v2
* Add hypre_SeqVectorMigrate
* Add hypre_ParVectorToVectorAll_v2
* Refactor implementation of BoomerAMG's Gaussian Elimination
* Add hypre_GaussElimAllSetup and hypre_GaussElimAllSolve
* Add device support via MAGMA and cuSOLVER to BoomerAMG's LU coarsest linear solver (options 98, 99)
* Add device support via MAGMA and cuSOLVER to BoomerAMG's exact inverse solver (options 198, 199)
* Add wrappers to MAGMA's getrf and getrs
* Add MAGMA info on AMG stats + code formatting
* Add wrappers to cuSOLVER and cuBLAS functions
* Add wrapper hypre_magma_getri_nb
* Add header file for collecting hypre functors
* Add memory location to Gaussian elimination data structure
* Improve description of coarsest level solver options
* Update GE data structure in MGR
* Change Ainv to Awork

* Fix MSVC build (#978)

* Allocate buffer on heap memory
* Fix Pragma definition for MSVC
* Fix uninitialized variable
* Loop counter cannot be non-negative for MSVC

* SYCL triangular solves, Chebyshev relaxation, etc. (#972)

Add more SYCL functionality, including Chebyshev relaxation and triangular solves,
which in turn enables Gauss-Seidel, ILU, etc.

* Add MGR statistics (#897)

This PR improves statistics reporting for MGR.

A list with detailed changes is given below:
* Add MatrixStats and MatrixStatsArray
* Add hypre_squared utility
* Fix divisor line location for Rectangular matrices
* Minor fix on hypre_squared definition
* Move nonzero variable definitions up
* Initialize global number of nonzeros at matrix creation
* Add hypre_ParCSRMatrixStatsArray and helper functions
* Add par_csr_matstats_device
* Add par_mgr_stats
* Print F-relax data only once
* Fix clang-13 build
* IJ driver now passes in print_level option to MGR
* GPU runs always require A_FF in MGR
* Add HYPRE_PRINT_SHIFTED_PARAM macro
* Add hypre_IntArraySetInterleavedValues (host/device implementations)
* Fix F-relaxation reporting + refactoring
* Update global number of nonzeros of the matrix
* Move new BoomerAMG functions to par_stats
* Apply astyle

* Improve MGR data printing (#976)

This enhances what print_level can achieve in MGR. In particular, we can now dump linear system info to files according to the print_level code. We can also print a sequence of linear systems to file (useful when hypre is used in time-stepping applications).

A detailed list of changes is given below:

* Add utilities for creating/checking directories
* Add print_level codes to MGR and new info_path member
* Add hypre_MGRDataPrint
* Add call to hypre_MGRDataPrint and logic to update the print_level variable
* Update MGRSolve with new print_level logic
* Remove hypre_MGRWriteSolverParams
* Update documentation for HYPRE_MGRSetPrintLevel
* Implement new logic for HYPRE_MGR_PRINT_MODE_ASCII

* Fix regressions (#988)

* CMake uses C99 by default
* HYPRE_PRINT_INDENT works without a loop

* Fix compilation on Windows (#990)

* Do not use dirent.h in windows

---------

Co-authored-by: Victor A. P. Magri <50467563+victorapm@users.noreply.github.com>
Co-authored-by: tisaac <toby.isaac@gmail.com>
Co-authored-by: Rui Peng Li <li50@llnl.gov>
Co-authored-by: Drew Parsons <dparsons@debian.org>
Co-authored-by: Wayne Mitchell <mitchell82@llnl.gov>
geraldc-unm pushed a commit that referenced this pull request Mar 27, 2024
5 participants