Fix solver kernel documentation and add memory estimates #691

Merged: upsj merged 16 commits into develop from fix_cgs_documentation on Mar 2, 2021

hartwiganzt
Collaborator

@hartwiganzt hartwiganzt commented Jan 7, 2021

The CGS documentation for step_2 was on the wrong line.

EDIT by Tobias: We decided to repurpose this PR to include memory estimates based on the solver kernel implementations. If you have some time, please check our computations; GMRES and IDR are especially challenging.

  • BiCG
  • BiCGSTAB
  • CG
  • CGS
  • FCG
  • GMRES
  • IDR

@ginkgo-bot ginkgo-bot added mod:core This is related to the core module. type:solver This is related to the solvers labels Jan 7, 2021
@upsj
Member

upsj commented Jan 7, 2021

This should be the same for all the other steps as well, right? The comments are below the kernels right now.

@upsj upsj force-pushed the fix_cgs_documentation branch 2 times, most recently from 2be1cd6 to 4edc7e1 Compare January 7, 2021 17:43
@upsj upsj changed the title fix cgs documentation Fix solver kernel documentation and add memory estimates Jan 7, 2021
@upsj upsj added reg:documentation This is related to documentation. 1:ST:ready-for-review This PR is ready for review labels Jan 7, 2021
@upsj upsj mentioned this pull request Jan 7, 2021
@codecov

codecov bot commented Jan 8, 2021

Codecov Report

Merging #691 (d11ec4a) into develop (cab4358) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop     #691      +/-   ##
===========================================
+ Coverage    92.44%   92.45%   +0.01%     
===========================================
  Files          362      362              
  Lines        26920    26916       -4     
===========================================
- Hits         24887    24886       -1     
+ Misses        2033     2030       -3     
Impacted Files Coverage Δ
core/solver/cgs.cpp 98.61% <ø> (-0.04%) ⬇️
omp/solver/idr_kernels.cpp 86.79% <ø> (ø)
reference/solver/idr_kernels.cpp 100.00% <ø> (ø)
core/solver/bicg.cpp 88.09% <100.00%> (ø)
core/solver/bicgstab.cpp 97.64% <100.00%> (-0.06%) ⬇️
core/solver/cg.cpp 98.36% <100.00%> (ø)
core/solver/fcg.cpp 98.43% <100.00%> (ø)
core/solver/gmres.cpp 100.00% <100.00%> (ø)
core/solver/idr.cpp 98.18% <100.00%> (ø)
reference/solver/bicgstab_kernels.cpp 100.00% <0.00%> (+4.91%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cab4358...d11ec4a. Read the comment docs.

Contributor

@Slaedr Slaedr left a comment

I get different numbers for two solvers - primarily IDR, but also GMRES. I hope I have not made mistakes. I think there are two sources of differences: (1) computing averages / sums for the restarts, and (2) ignoring load/store of arrays of small sizes independent of n. In the current comments, I think point 2 is assumed in some places but not in others - I always assume this.

core/solver/gmres.cpp Outdated Show resolved Hide resolved
core/solver/gmres.cpp Outdated Show resolved Hide resolved
core/solver/gmres.cpp Outdated Show resolved Hide resolved
core/solver/idr.cpp Outdated Show resolved Hide resolved
core/solver/idr.cpp Outdated Show resolved Hide resolved
core/solver/idr.cpp Outdated Show resolved Hide resolved
core/solver/idr.cpp Outdated Show resolved Hide resolved
core/solver/idr.cpp Outdated Show resolved Hide resolved
core/solver/idr.cpp Outdated Show resolved Hide resolved
core/solver/idr.cpp Outdated Show resolved Hide resolved
@upsj
Member

upsj commented Jan 10, 2021

@Slaedr I incorporated your suggestions. There are a few places where I get slightly different results due to 0-based vs. 1-based indexing. Can you take a second look at the updated numbers? (Also, I haven't incorporated the updated iteration count for IDR yet, so we would still need to divide everything by (s+1).)

core/solver/bicg.cpp Outdated Show resolved Hide resolved
Comment on lines 141 to 149
* 29n * values + 2 * matrix/preconditioner storage
* 2x SpMV: 4n * values + 2 * storage
* 2x Preconditioner: 4n * values + 2 * storage
* 3x dot 6n
* 1x norm2 n
* 1x step 1 (fused axpys) 4n
* 1x step 2 (axpy) 3n
* 1x step 3 (fused axpys) 7n
Member

We need to separate the loop: the first half and the second half are different.

Member

I will average this over even and odd iterations, so a separation should not be necessary.

Collaborator Author

@upsj are you done with this?

Member

If we are okay with leaving it averaged over even and odd iterations, then yes

Collaborator Author

I think we are.
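
The per-iteration tally quoted in this thread can be checked with a small sketch. It only adds up the counts from the quoted comment lines (29n values plus two passes over the matrix and preconditioner storage). The function name and the byte-size parameters are hypothetical, and reading "2 * storage" as matrix storage for the SpMVs and preconditioner storage for the preconditioner applications is an assumption.

```cpp
#include <cstdint>

// Bytes moved per iteration according to the quoted tally.
// n:             vector length
// value_bytes:   sizeof(ValueType)
// matrix_bytes:  storage of the system matrix (hypothetical input)
// precond_bytes: storage of the preconditioner (hypothetical input)
inline std::int64_t quoted_iteration_bytes(std::int64_t n,
                                           std::int64_t value_bytes,
                                           std::int64_t matrix_bytes,
                                           std::int64_t precond_bytes)
{
    // Vector traffic in units of n values, one term per quoted line.
    const std::int64_t vector_sweeps = 4    // 2x SpMV
                                       + 4  // 2x preconditioner
                                       + 6  // 3x dot
                                       + 1  // 1x norm2
                                       + 4  // step 1 (fused axpys)
                                       + 3  // step 2 (axpy)
                                       + 7; // step 3 (fused axpys)
    // The terms sum to 29, matching the "29n * values" header line.
    return vector_sweeps * n * value_bytes + 2 * matrix_bytes +
           2 * precond_bytes;
}
```

For example, `quoted_iteration_bytes(1000, 8, 0, 0)` gives the pure vector traffic for double-precision values with n = 1000, ignoring operator storage.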

core/solver/gmres.cpp Outdated Show resolved Hide resolved
core/solver/cgs.cpp Outdated Show resolved Hide resolved
core/solver/gmres.cpp Outdated Show resolved Hide resolved
core/solver/gmres.cpp Outdated Show resolved Hide resolved
core/solver/idr.cpp Outdated Show resolved Hide resolved
Contributor

@Slaedr Slaedr left a comment

I went over the new calculations. I think there are some small errors, but otherwise it looks good.

core/solver/idr.cpp Outdated Show resolved Hide resolved
core/solver/gmres.cpp Outdated Show resolved Hide resolved
core/solver/idr.cpp Outdated Show resolved Hide resolved
@Slaedr
Contributor

Slaedr commented Jan 12, 2021

Sorry, the review got split into two parts because I had two windows open.

core/solver/gmres.cpp Outdated Show resolved Hide resolved
core/solver/gmres.cpp Outdated Show resolved Hide resolved
@yhmtsai
Member

yhmtsai commented Jan 15, 2021

My loop counts for GMRES:
loops_k = floor(loops/k)
loops_r = floor(loops/k) * (k - 1) * k / 2 + (r - 1) * r / 2

Read:
loops_k * ((k^2/2 + 3k/2 + nk + 8n + 4) * ValueType + precond_storage + matrix_storage)
+ loops * ((8n + 5) * ValueType + 8 + precond_storage + matrix_storage)
+ loops_r * (4 + 4n) * ValueType
Write:
loops_k * ((6n + k + 2) * ValueType + 8) 
+ loops * ((4n + 8) * ValueType + 8)
+ loops_r * (2 + n) * ValueType

For restarting, I count (14n + kn) for every k (d) iterations.
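
As a rough aid for checking these GMRES numbers, the read and write totals above can be transcribed into code. This is only a sketch of the formulas as written in this comment, not the estimate that ended up in the merged documentation. The struct and function names are hypothetical, `r` is assumed to be `loops % k` (the comment does not define it), and the bare `+ 8` terms are treated as bytes.

```cpp
#include <cstdint>

struct GmresTraffic {
    std::int64_t loops_k;      // number of completed restart cycles
    std::int64_t loops_r;      // accumulated inner-loop count
    std::int64_t read_bytes;   // estimated bytes read
    std::int64_t write_bytes;  // estimated bytes written
};

// loops: total iterations, k: restart length (Krylov dimension),
// n: vector length, value_bytes: sizeof(ValueType).
inline GmresTraffic gmres_traffic_estimate(std::int64_t loops, std::int64_t k,
                                           std::int64_t n,
                                           std::int64_t value_bytes,
                                           std::int64_t matrix_bytes,
                                           std::int64_t precond_bytes)
{
    const std::int64_t r = loops % k;  // assumption: iterations after the last restart
    const std::int64_t loops_k = loops / k;
    const std::int64_t loops_r = loops_k * ((k - 1) * k / 2) + (r - 1) * r / 2;
    const std::int64_t storage = matrix_bytes + precond_bytes;

    // k^2/2 + 3k/2 is written as k * (k + 3) / 2, which is always an integer.
    const std::int64_t read =
        loops_k * ((k * (k + 3) / 2 + n * k + 8 * n + 4) * value_bytes + storage) +
        loops * ((8 * n + 5) * value_bytes + 8 + storage) +
        loops_r * (4 + 4 * n) * value_bytes;
    const std::int64_t write =
        loops_k * ((6 * n + k + 2) * value_bytes + 8) +
        loops * ((4 * n + 8) * value_bytes + 8) +
        loops_r * (2 + n) * value_bytes;
    return {loops_k, loops_r, read, write};
}
```

With `loops = 10` and `k = 4`, this gives `loops_k = 2` and `loops_r = 13`, which makes it easy to compare individual terms against a hand calculation.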

IDR, with loops_sk = (s - 1) * s / 2:

Read: 
loops * (
  (s * n + n) * ValueType
  + s * ((s^2/2 + 13s/2 + 8n + 2ns) * ValueType + precond_storage + matrix_storage)
  + loops_sk * (3n - 5) * ValueType
  + (11n + 6) * ValueType + matrix_storage + precond_storage
)
Write:
loops * (
  s * ValueType
  + s * (6n + 3s) * ValueType
  + loops_sk * (3n - 1) * ValueType
  + (5n + 5) * ValueType
 )
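
The IDR totals above can be transcribed the same way. Again this is a sketch of the formulas exactly as posted here, with hypothetical names; `s` is the subspace dimension, `loops_sk = (s - 1) * s / 2`, and the operator storage sizes are passed in as plain byte counts.

```cpp
#include <cstdint>

struct IdrTraffic {
    std::int64_t read_bytes;   // estimated bytes read
    std::int64_t write_bytes;  // estimated bytes written
};

// loops: outer iterations, s: subspace dimension, n: vector length,
// value_bytes: sizeof(ValueType). Assumes n >= 2 so that (3n - 5) is positive.
inline IdrTraffic idr_traffic_estimate(std::int64_t loops, std::int64_t s,
                                       std::int64_t n, std::int64_t value_bytes,
                                       std::int64_t matrix_bytes,
                                       std::int64_t precond_bytes)
{
    const std::int64_t loops_sk = (s - 1) * s / 2;
    const std::int64_t storage = matrix_bytes + precond_bytes;

    // s^2/2 + 13s/2 is written as s * (s + 13) / 2, which is always an integer.
    const std::int64_t read =
        loops * ((s * n + n) * value_bytes +
                 s * ((s * (s + 13) / 2 + 8 * n + 2 * n * s) * value_bytes + storage) +
                 loops_sk * (3 * n - 5) * value_bytes +
                 (11 * n + 6) * value_bytes + storage);
    const std::int64_t write =
        loops * (s * value_bytes +
                 s * (6 * n + 3 * s) * value_bytes +
                 loops_sk * (3 * n - 1) * value_bytes +
                 (5 * n + 5) * value_bytes);
    return {read, write};
}
```

Because both totals scale linearly with `loops`, evaluating the function for `loops = 1` gives the per-outer-iteration cost directly.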

More detail in compute_memory_rebased

Member

@upsj upsj left a comment

First GMRES:

core/solver/gmres.cpp Outdated Show resolved Hide resolved
core/solver/gmres.cpp Outdated Show resolved Hide resolved
core/solver/gmres.cpp Outdated Show resolved Hide resolved
core/solver/gmres.cpp Outdated Show resolved Hide resolved
@sonarcloud

sonarcloud bot commented Jan 20, 2021

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 4 Code Smells

100.0% 100.0% Coverage
0.0% 0.0% Duplication

@hartwiganzt hartwiganzt requested a review from upsj January 21, 2021 13:29
Member

@yhmtsai yhmtsai left a comment

LGTM.
Could you add the per-iteration memory estimate for GMRES/IDR, so that Terry can use the formula directly?
Also, please check my comments on MGS.

core/solver/idr.cpp Outdated Show resolved Hide resolved
core/solver/idr.cpp Outdated Show resolved Hide resolved
core/solver/gmres.cpp Show resolved Hide resolved
@upsj upsj added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels Jan 27, 2021
@upsj upsj self-assigned this Feb 2, 2021
upsj and others added 16 commits March 1, 2021 19:36
Co-authored-by: Hartwig Anzt <hanzt@icl.utk.edu>
Co-authored-by: Hartwig Anzt <hanzt@icl.utk.edu>
Co-authored-by: Aditya Kashi <aditya.kashi@kit.edu>
Co-authored-by: Yuhsiang Tsai <yhmtsai@gmail.com>
Co-authored-by: Aditya Kashi <aditya.kashi@kit.edu>
Co-authored-by: Aditya Kashi <aditya.kashi@kit.edu>
Co-authored-by: Yuhsiang Tsai <yhmtsai@gmail.com>
Co-authored-by: Yuhsiang Tsai <yhmtsai@gmail.com>
but still keep the half-iteration stop check in place
@sonarcloud

sonarcloud bot commented Mar 2, 2021

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

100.0% 100.0% Coverage
0.0% 0.0% Duplication

@upsj upsj merged commit 118d031 into develop Mar 2, 2021
@upsj upsj deleted the fix_cgs_documentation branch March 2, 2021 13:17
tcojean added a commit that referenced this pull request Aug 20, 2021
Ginkgo release 1.4.0

The Ginkgo team is proud to announce the new Ginkgo minor release 1.4.0. This
release brings most of the Ginkgo functionality to the Intel DPC++ ecosystem
which enables Intel-GPU and CPU execution. The only Ginkgo features which have
not been ported yet are some preconditioners.

Ginkgo's mixed-precision support is greatly enhanced thanks to:
1. The new Accessor concept, which allows writing kernels featuring on-the-fly
memory compression, among other features. The accessor can be used as
header-only, see the [accessor BLAS benchmarks repository](https://github.com/ginkgo-project/accessor-BLAS/tree/develop) as a usage example.
2. All LinOps now transparently support mixed-precision execution. By default,
this is done through a temporary copy which may have a performance impact but
already allows mixed-precision research.

Native mixed-precision ELL kernels are implemented which do not see this cost.
The accessor is also leveraged in a new CB-GMRES solver which allows for
performance improvements by compressing the Krylov basis vectors. Many other
features have been added to Ginkgo, such as reordering support, a new IDR
solver, Incomplete Cholesky preconditioner, matrix assembly support (only CPU
for now), machine topology information, and more!

Supported systems and requirements:
+ For all platforms, cmake 3.13+
+ C++14 compliant compiler
+ Linux and MacOS
  + gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + clang: 3.9+
  + Intel compiler: 2018+
  + Apple LLVM: 8.0+
  + CUDA module: CUDA 9.0+
  + HIP module: ROCm 3.5+
  + DPC++ module: Intel OneAPI 2021.3. Set the CXX compiler to `dpcpp`.
+ Windows
  + MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + Microsoft Visual Studio: VS 2019
  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.


Algorithm and important feature additions:
+ Add a new DPC++ Executor for SYCL execution and other base utilities
  [#648](#648), [#661](#661), [#757](#757), [#832](#832)
+ Port matrix formats, solvers and related kernels to DPC++. For some kernels,
  also make use of a shared kernel implementation for all executors (except
  Reference). [#710](#710), [#799](#799), [#779](#779), [#733](#733), [#844](#844), [#843](#843), [#789](#789), [#845](#845), [#849](#849), [#855](#855), [#856](#856)
+ Add accessors which allow multi-precision kernels, among other things.
  [#643](#643), [#708](#708)
+ Add support for mixed precision operations through apply in all LinOps. [#677](#677)
+ Add incomplete Cholesky factorizations and preconditioners as well as some
  improvements to ILU. [#672](#672), [#837](#837), [#846](#846)
+ Add an AMGX implementation and kernels on all devices but DPC++.
  [#528](#528), [#695](#695), [#860](#860)
+ Add a new mixed-precision capability solver, Compressed Basis GMRES
  (CB-GMRES). [#693](#693), [#763](#763)
+ Add the IDR(s) solver. [#620](#620)
+ Add a new fixed-size block CSR matrix format (for the Reference executor).
  [#671](#671), [#730](#730)
+ Add native mixed-precision support to the ELL format. [#717](#717), [#780](#780)
+ Add Reverse Cuthill-McKee reordering [#500](#500), [#649](#649)
+ Add matrix assembly support on CPUs. [#644](#644)
+ Extend ISAI from triangular to general and SPD matrices. [#690](#690)

Other additions:
+ Add the possibility to apply real matrices to complex vectors.
  [#655](#655), [#658](#658)
+ Add functions to compute the absolute of a matrix format. [#636](#636)
+ Add symmetric permutation and improve existing permutations.
  [#684](#684), [#657](#657), [#663](#663)
+ Add a MachineTopology class with HWLOC support [#554](#554), [#697](#697)
+ Add an implicit residual norm criterion. [#702](#702), [#818](#818), [#850](#850)
+ Row-major accessor is generalized to more than 2 dimensions and a new
  "block column-major" accessor has been added. [#707](#707)
+ Add a heat equation example. [#698](#698), [#706](#706)
+ Add ccache support in CMake and CI. [#725](#725), [#739](#739)
+ Allow tuning and benchmarking variables non-intrusively. [#692](#692)
+ Add triangular solver benchmark [#664](#664)
+ Add benchmarks for BLAS operations [#772](#772), [#829](#829)
+ Add support for different precisions and consistent index types in benchmarks.
  [#675](#675), [#828](#828)
+ Add a GitHub bot system to facilitate development and PR management.
  [#667](#667), [#674](#674), [#689](#689), [#853](#853)
+ Add Intel (DPC++) CI support and enable CI on HPC systems. [#736](#736), [#751](#751), [#781](#781)
+ Add SSH debugging for GitHub Actions CI. [#749](#749)
+ Add pipeline segmentation for better CI speed. [#737](#737)


Changes:
+ Add a Scalar Jacobi specialization and kernels. [#808](#808), [#834](#834), [#854](#854)
+ Add implicit residual log for solvers and benchmarks. [#714](#714)
+ Change handling of the conjugate in the dense dot product. [#755](#755)
+ Improved Dense stride handling. [#774](#774)
+ Multiple improvements to the OpenMP kernels performance, including COO,
an exclusive prefix sum, and more. [#703](#703), [#765](#765), [#740](#740)
+ Allow specialization of submatrix and other dense creation functions in solvers. [#718](#718)
+ Improved Identity constructor and treatment of rectangular matrices. [#646](#646)
+ Allow CUDA/HIP executors to select allocation mode. [#758](#758)
+ Check if executors share the same memory. [#670](#670)
+ Improve test install and smoke testing support. [#721](#721)
+ Update the JOSS paper citation and add publications in the documentation.
  [#629](#629), [#724](#724)
+ Improve the version output. [#806](#806)
+ Add some utilities for dim and span. [#821](#821)
+ Improved solver and preconditioner benchmarks. [#660](#660)
+ Improve benchmark timing and output. [#669](#669), [#791](#791), [#801](#801), [#812](#812)


Fixes:
+ Sorting fix for the Jacobi preconditioner. [#659](#659)
+ Also log the first residual norm in CGS [#735](#735)
+ Fix BiCG and HIP CSR to work with complex matrices. [#651](#651)
+ Fix Coo SpMV on strided vectors. [#807](#807)
+ Fix segfault of extract_diagonal, add short-and-fat test. [#769](#769)
+ Fix device_reset issue by moving counter/mutex to device. [#810](#810)
+ Fix `EnableLogging` superclass. [#841](#841)
+ Support ROCm 4.1.x and breaking HIP_PLATFORM changes. [#726](#726)
+ Decreased test size for a few device tests. [#742](#742)
+ Fix multiple issues with our CMake HIP and RPATH setup.
  [#712](#712), [#745](#745), [#709](#709)
+ Clean up our CMake installation step. [#713](#713)
+ Various simplification and fixes to the Windows CMake setup. [#720](#720), [#785](#785)
+ Simplify third-party integration. [#786](#786)
+ Improve Ginkgo device arch flags management. [#696](#696)
+ Other fixes and improvements to the CMake setup.
  [#685](#685), [#792](#792), [#705](#705), [#836](#836)
+ Clarification of dense norm documentation [#784](#784)
+ Various development tools fixes and improvements [#738](#738), [#830](#830), [#840](#840)
+ Make multiple operators/constructors explicit. [#650](#650), [#761](#761)
+ Fix some issues, memory leaks and warnings found by MSVC.
  [#666](#666), [#731](#731)
+ Improved solver memory estimates and consistent iteration counts [#691](#691)
+ Various logger improvements and fixes [#728](#728), [#743](#743), [#754](#754)
+ Fix for ForwardIterator requirements in iterator_factory. [#665](#665)
+ Various benchmark fixes. [#647](#647), [#673](#673), [#722](#722)
+ Various CI fixes and improvements. [#642](#642), [#641](#641), [#795](#795), [#783](#783), [#793](#793), [#852](#852)


Related PR: #857
tcojean added a commit that referenced this pull request Aug 23, 2021
Release 1.4.0 to master

Related PR: #866
Labels
1:ST:ready-to-merge This PR is ready to merge. mod:core This is related to the core module. reg:documentation This is related to documentation. type:solver This is related to the solvers
5 participants