Fix solver kernel documentation and add memory estimates #691

hartwiganzt · 2021-01-07T11:00:43Z

The CGS documentation for step_2 was on the wrong line.

EDIT by Tobias: We decided to repurpose this PR to include memory estimates based on the solver kernel implementations. If you have some time, please check our computations; GMRES and IDR are especially challenging.

upsj · 2021-01-07T13:27:30Z

This should be the same for all the other steps as well, right? The comments are below the kernels right now.

codecov · 2021-01-08T00:34:01Z

Codecov Report

Merging #691 (d11ec4a) into develop (cab4358) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop     #691      +/-   ##
===========================================
+ Coverage    92.44%   92.45%   +0.01%     
===========================================
  Files          362      362              
  Lines        26920    26916       -4     
===========================================
- Hits         24887    24886       -1     
+ Misses        2033     2030       -3

Impacted Files	Coverage Δ
core/solver/cgs.cpp	`98.61% <ø> (-0.04%)`	⬇️
omp/solver/idr_kernels.cpp	`86.79% <ø> (ø)`
reference/solver/idr_kernels.cpp	`100.00% <ø> (ø)`
core/solver/bicg.cpp	`88.09% <100.00%> (ø)`
core/solver/bicgstab.cpp	`97.64% <100.00%> (-0.06%)`	⬇️
core/solver/cg.cpp	`98.36% <100.00%> (ø)`
core/solver/fcg.cpp	`98.43% <100.00%> (ø)`
core/solver/gmres.cpp	`100.00% <100.00%> (ø)`
core/solver/idr.cpp	`98.18% <100.00%> (ø)`
reference/solver/bicgstab_kernels.cpp	`100.00% <0.00%> (+4.91%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cab4358...d11ec4a.

Slaedr

I get different numbers for two solvers - primarily IDR, but also GMRES. I hope I have not made mistakes. I think there are two sources of differences: (1) computing averages / sums for the restarts, and (2) ignoring load/store of arrays of small sizes independent of n. In the current comments, I think point 2 is assumed in some places but not in others - I always assume this.

core/solver/gmres.cpp

core/solver/idr.cpp

upsj · 2021-01-10T19:48:53Z

@Slaedr I incorporated your suggestions. There are a few places where I get slightly different results due to 0-based vs. 1-based indexing. Can you take a second look at the updated numbers? (Also, I didn't incorporate the updated iteration count for IDR yet, so we would need to divide everything by (s+1).)

core/solver/bicg.cpp

yhmtsai · 2021-01-11T04:14:27Z

core/solver/bicgstab.cpp

+     * 29n * values + 2 * matrix/preconditioner storage
+     * 2x SpMV:                4n * values + 2 * storage
+     * 2x Preconditioner:      4n * values + 2 * storage
+     * 3x dot                  6n
+     * 1x norm2                 n
+     * 1x step 1 (fused axpys) 4n
+     * 1x step 2 (axpy)        3n
+     * 1x step 3 (fused axpys) 7n


yhmtsai, Jan 11, 2021: We need to separate the loop. The first half and the second half are different.

upsj, Jan 11, 2021: I will average this over even and odd iterations, so a separation should not be necessary.

hartwiganzt, Jan 20, 2021: @upsj are you done with this?

upsj, Jan 20, 2021: If we are okay with leaving it averaged over even and odd iterations, then yes.

hartwiganzt, Jan 20, 2021: I think we are.
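
The BiCGSTAB breakdown quoted above can be tallied with a few lines of C++. This is only a sketch of the arithmetic, not code from the PR: the function names, the byte-based units, and the split of "storage" into separate matrix and preconditioner arguments are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Vector traffic per (averaged) BiCGSTAB iteration in units of n values,
// taken term by term from the quoted comment:
// 2x SpMV (4n), 2x preconditioner (4n), 3x dot (6n), 1x norm2 (n),
// step 1 (4n), step 2 (3n), step 3 (7n).
constexpr std::size_t bicgstab_vector_terms()
{
    return 4 + 4 + 6 + 1 + 4 + 3 + 7;
}

// Hypothetical helper: bytes moved per iteration for vectors of length n.
// matrix_storage and precond_storage are byte counts supplied by the caller;
// each is touched twice per iteration according to the quoted comment.
inline std::size_t bicgstab_bytes_per_iteration(std::size_t n,
                                                std::size_t value_size,
                                                std::size_t matrix_storage,
                                                std::size_t precond_storage)
{
    assert(value_size > 0);
    return bicgstab_vector_terms() * n * value_size + 2 * matrix_storage +
           2 * precond_storage;
}
```

The listed terms do add up to the 29n stated in the first line of the comment. Because the estimate is averaged over even and odd iterations, it describes a typical iteration rather than either half-iteration on its own.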

core/solver/gmres.cpp

core/solver/cgs.cpp

core/solver/gmres.cpp

core/solver/idr.cpp

Slaedr

I went over the new calculations. I think there are some small errors, but otherwise it looks good.

core/solver/idr.cpp

core/solver/gmres.cpp

core/solver/idr.cpp

Slaedr · 2021-01-12T15:26:20Z

Sorry, the review got split into two parts because I had two windows open.

core/solver/gmres.cpp

yhmtsai · 2021-01-15T06:31:55Z

My loop counts for GMRES:
loops_k = floor(loops/k)
loops_r = floor(loops/k) * (k - 1) * k / 2 + (r - 1) * r / 2

Read:
loops_k * ((k^2/2 + 3k/2 + nk + 8n + 4) * ValueType + precond_storage + matrix_storage)
+ loops * ((8n + 5) * ValueType + 8 + precond_storage + matrix_storage)
+ loops_r * (4 + 4n) * ValueType
Write:
loops_k * ((6n + k + 2) * ValueType + 8) 
+ loops * ((4n + 8) * ValueType + 8)
+ loops_r * (2 + n) * ValueType

For restarting, I count (14n + kn) for each k (d) iterations.
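
As a sanity check, the GMRES read/write formulas above can be evaluated directly. The sketch below makes several assumptions that the comment does not state: `r` is taken to be the number of iterations left after the last full restart cycle (`loops % k`), `ValueType` is replaced by a byte size passed in as `value_size`, the bare `+ 8` terms are treated as bytes, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct gmres_traffic {
    std::uint64_t loops_k;      // number of completed restart cycles
    std::uint64_t loops_r;      // accumulated inner-loop pair count
    std::uint64_t read_bytes;   // estimated bytes read
    std::uint64_t write_bytes;  // estimated bytes written
};

// m * (m - 1) / 2 without unsigned wrap-around for m == 0.
inline std::uint64_t gmres_pair_count(std::uint64_t m)
{
    return m == 0 ? 0 : m * (m - 1) / 2;
}

inline gmres_traffic estimate_gmres_traffic(
    std::uint64_t loops, std::uint64_t k, std::uint64_t n,
    std::uint64_t value_size, std::uint64_t matrix_storage,
    std::uint64_t precond_storage)
{
    assert(k > 0);
    const std::uint64_t r = loops % k;  // assumption: leftover iterations
    const std::uint64_t loops_k = loops / k;
    const std::uint64_t loops_r =
        loops_k * gmres_pair_count(k) + gmres_pair_count(r);
    const std::uint64_t storage = matrix_storage + precond_storage;
    // k^2/2 + 3k/2 equals k * (k + 3) / 2, which is always an integer.
    const std::uint64_t read =
        loops_k * ((k * (k + 3) / 2 + n * k + 8 * n + 4) * value_size +
                   storage) +
        loops * ((8 * n + 5) * value_size + 8 + storage) +
        loops_r * (4 + 4 * n) * value_size;
    const std::uint64_t write =
        loops_k * ((6 * n + k + 2) * value_size + 8) +
        loops * ((4 * n + 8) * value_size + 8) +
        loops_r * (2 + n) * value_size;
    return {loops_k, loops_r, read, write};
}
```

For example, `estimate_gmres_traffic(25, 10, 100, 8, 1000, 500)` describes 25 iterations with restart length 10 on vectors of length 100 with 8-byte values. The `loops_r` term grows quadratically in `k` because each iteration within a cycle touches all previously built Krylov vectors.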

IDR: loops_sk = (s - 1) * s / 2

Read: 
loops * (
  (s * n + n) * ValueType
  + s * ((s^2/2 + 13s/2 + 8n + 2ns) * ValueType + precond_storage + matrix_storage)
  + loops_sk * (3n - 5) * ValueType
  + (11n + 6) * ValueType + matrix_storage + precond_storage
)
Write:
loops * (
  s * ValueType
  + s * (6n + 3s) * ValueType
  + loops_sk * (3n - 1) * ValueType
  + (5n + 5) * ValueType
 )

More detail in compute_memory_rebased
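
The IDR formulas above can be evaluated the same way. Again this is a sketch under assumptions: `ValueType` is represented by a byte size `value_size`, `s` is the subspace dimension, and the struct and function names are hypothetical rather than part of the PR.

```cpp
#include <cassert>
#include <cstdint>

struct idr_traffic {
    std::uint64_t loops_sk;     // (s - 1) * s / 2
    std::uint64_t read_bytes;   // estimated bytes read
    std::uint64_t write_bytes;  // estimated bytes written
};

inline idr_traffic estimate_idr_traffic(std::uint64_t loops, std::uint64_t s,
                                        std::uint64_t n,
                                        std::uint64_t value_size,
                                        std::uint64_t matrix_storage,
                                        std::uint64_t precond_storage)
{
    assert(n >= 2);  // keeps (3n - 5) non-negative in unsigned arithmetic
    const std::uint64_t loops_sk = s == 0 ? 0 : (s - 1) * s / 2;
    const std::uint64_t storage = matrix_storage + precond_storage;
    // s^2/2 + 13s/2 equals s * (s + 13) / 2, which is always an integer.
    const std::uint64_t read_per_loop =
        (s * n + n) * value_size +
        s * ((s * (s + 13) / 2 + 8 * n + 2 * n * s) * value_size + storage) +
        loops_sk * (3 * n - 5) * value_size +
        (11 * n + 6) * value_size + storage;
    const std::uint64_t write_per_loop =
        s * value_size + s * (6 * n + 3 * s) * value_size +
        loops_sk * (3 * n - 1) * value_size + (5 * n + 5) * value_size;
    return {loops_sk, loops * read_per_loop, loops * write_per_loop};
}
```

A call such as `estimate_idr_traffic(10, 4, 100, 8, 1000, 500)` covers 10 outer loops with subspace dimension 4. Whether the result still has to be divided by (s + 1), as mentioned earlier in the thread, depends on which iteration count convention is used.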

upsj

First GMRES:

core/solver/gmres.cpp

upsj · 2021-01-20T16:17:43Z

core/solver/gmres.cpp

sonarcloud · 2021-01-20T21:24:45Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
4 Code Smells

100.0% Coverage
0.0% Duplication

yhmtsai

LGTM.
Could you add the per-iteration memory estimate for each solver on GMRES/IDR, so that Terry can use the formula directly?
Also, please check my comments on MGS.

core/solver/idr.cpp

core/solver/gmres.cpp

sonarcloud · 2021-03-02T02:24:18Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

100.0% Coverage
0.0% Duplication

core/solver/bicgstab.cpp

Ginkgo release 1.4.0

The Ginkgo team is proud to announce the new Ginkgo minor release 1.4.0. This release brings most of the Ginkgo functionality to the Intel DPC++ ecosystem which enables Intel-GPU and CPU execution. The only Ginkgo features which have not been ported yet are some preconditioners.

Ginkgo's mixed-precision support is greatly enhanced thanks to:
1. The new Accessor concept, which allows writing kernels featuring on-the-fly memory compression, among other features. The accessor can be used as header-only, see the [accessor BLAS benchmarks repository](https://github.com/ginkgo-project/accessor-BLAS/tree/develop) as a usage example.
2. All LinOps now transparently support mixed-precision execution. By default, this is done through a temporary copy which may have a performance impact but already allows mixed-precision research.

Native mixed-precision ELL kernels are implemented which do not see this cost. The accessor is also leveraged in a new CB-GMRES solver which allows for performance improvements by compressing the Krylov basis vectors.

Many other features have been added to Ginkgo, such as reordering support, a new IDR solver, Incomplete Cholesky preconditioner, matrix assembly support (only CPU for now), machine topology information, and more!

Supported systems and requirements:
+ For all platforms, cmake 3.13+
+ C++14 compliant compiler
+ Linux and MacOS
  + gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + clang: 3.9+
  + Intel compiler: 2018+
  + Apple LLVM: 8.0+
  + CUDA module: CUDA 9.0+
  + HIP module: ROCm 3.5+
  + DPC++ module: Intel OneAPI 2021.3. Set the CXX compiler to `dpcpp`.
+ Windows
  + MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + Microsoft Visual Studio: VS 2019
  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.

Algorithm and important feature additions:
+ Add a new DPC++ Executor for SYCL execution and other base utilities [#648](#648), [#661](#661), [#757](#757), [#832](#832)
+ Port matrix formats, solvers and related kernels to DPC++. For some kernels, also make use of a shared kernel implementation for all executors (except Reference). [#710](#710), [#799](#799), [#779](#779), [#733](#733), [#844](#844), [#843](#843), [#789](#789), [#845](#845), [#849](#849), [#855](#855), [#856](#856)
+ Add accessors which allow multi-precision kernels, among other things. [#643](#643), [#708](#708)
+ Add support for mixed precision operations through apply in all LinOps. [#677](#677)
+ Add incomplete Cholesky factorizations and preconditioners as well as some improvements to ILU. [#672](#672), [#837](#837), [#846](#846)
+ Add an AMGX implementation and kernels on all devices but DPC++. [#528](#528), [#695](#695), [#860](#860)
+ Add a new mixed-precision capability solver, Compressed Basis GMRES (CB-GMRES). [#693](#693), [#763](#763)
+ Add the IDR(s) solver. [#620](#620)
+ Add a new fixed-size block CSR matrix format (for the Reference executor). [#671](#671), [#730](#730)
+ Add native mixed-precision support to the ELL format. [#717](#717), [#780](#780)
+ Add Reverse Cuthill-McKee reordering [#500](#500), [#649](#649)
+ Add matrix assembly support on CPUs. [#644](#644)
+ Extends ISAI from triangular to general and spd matrices. [#690](#690)

Other additions:
+ Add the possibility to apply real matrices to complex vectors. [#655](#655), [#658](#658)
+ Add functions to compute the absolute of a matrix format. [#636](#636)
+ Add symmetric permutation and improve existing permutations. [#684](#684), [#657](#657), [#663](#663)
+ Add a MachineTopology class with HWLOC support [#554](#554), [#697](#697)
+ Add an implicit residual norm criterion. [#702](#702), [#818](#818), [#850](#850)
+ Row-major accessor is generalized to more than 2 dimensions and a new "block column-major" accessor has been added. [#707](#707)
+ Add an heat equation example. [#698](#698), [#706](#706)
+ Add ccache support in CMake and CI. [#725](#725), [#739](#739)
+ Allow tuning and benchmarking variables non intrusively. [#692](#692)
+ Add triangular solver benchmark [#664](#664)
+ Add benchmarks for BLAS operations [#772](#772), [#829](#829)
+ Add support for different precisions and consistent index types in benchmarks. [#675](#675), [#828](#828)
+ Add a Github bot system to facilitate development and PR management. [#667](#667), [#674](#674), [#689](#689), [#853](#853)
+ Add Intel (DPC++) CI support and enable CI on HPC systems. [#736](#736), [#751](#751), [#781](#781)
+ Add ssh debugging for Github Actions CI. [#749](#749)
+ Add pipeline segmentation for better CI speed. [#737](#737)

Changes:
+ Add a Scalar Jacobi specialization and kernels. [#808](#808), [#834](#834), [#854](#854)
+ Add implicit residual log for solvers and benchmarks. [#714](#714)
+ Change handling of the conjugate in the dense dot product. [#755](#755)
+ Improved Dense stride handling. [#774](#774)
+ Multiple improvements to the OpenMP kernels performance, including COO, an exclusive prefix sum, and more. [#703](#703), [#765](#765), [#740](#740)
+ Allow specialization of submatrix and other dense creation functions in solvers. [#718](#718)
+ Improved Identity constructor and treatment of rectangular matrices. [#646](#646)
+ Allow CUDA/HIP executors to select allocation mode. [#758](#758)
+ Check if executors share the same memory. [#670](#670)
+ Improve test install and smoke testing support. [#721](#721)
+ Update the JOSS paper citation and add publications in the documentation. [#629](#629), [#724](#724)
+ Improve the version output. [#806](#806)
+ Add some utilities for dim and span. [#821](#821)
+ Improved solver and preconditioner benchmarks. [#660](#660)
+ Improve benchmark timing and output. [#669](#669), [#791](#791), [#801](#801), [#812](#812)

Fixes:
+ Sorting fix for the Jacobi preconditioner. [#659](#659)
+ Also log the first residual norm in CGS [#735](#735)
+ Fix BiCG and HIP CSR to work with complex matrices. [#651](#651)
+ Fix Coo SpMV on strided vectors. [#807](#807)
+ Fix segfault of extract_diagonal, add short-and-fat test. [#769](#769)
+ Fix device_reset issue by moving counter/mutex to device. [#810](#810)
+ Fix `EnableLogging` superclass. [#841](#841)
+ Support ROCm 4.1.x and breaking HIP_PLATFORM changes. [#726](#726)
+ Decreased test size for a few device tests. [#742](#742)
+ Fix multiple issues with our CMake HIP and RPATH setup. [#712](#712), [#745](#745), [#709](#709)
+ Cleanup our CMake installation step. [#713](#713)
+ Various simplification and fixes to the Windows CMake setup. [#720](#720), [#785](#785)
+ Simplify third-party integration. [#786](#786)
+ Improve Ginkgo device arch flags management. [#696](#696)
+ Other fixes and improvements to the CMake setup. [#685](#685), [#792](#792), [#705](#705), [#836](#836)
+ Clarification of dense norm documentation [#784](#784)
+ Various development tools fixes and improvements [#738](#738), [#830](#830), [#840](#840)
+ Make multiple operators/constructors explicit. [#650](#650), [#761](#761)
+ Fix some issues, memory leaks and warnings found by MSVC. [#666](#666), [#731](#731)
+ Improved solver memory estimates and consistent iteration counts [#691](#691)
+ Various logger improvements and fixes [#728](#728), [#743](#743), [#754](#754)
+ Fix for ForwardIterator requirements in iterator_factory. [#665](#665)
+ Various benchmark fixes. [#647](#647), [#673](#673), [#722](#722)
+ Various CI fixes and improvements. [#642](#642), [#641](#641), [#795](#795), [#783](#783), [#793](#793), [#852](#852)

Related PR: #857

Release 1.4.0 to master: same release notes as the Ginkgo release 1.4.0 entry above. Related PR: #866

ginkgo-bot added the mod:core and type:solver labels Jan 7, 2021

upsj force-pushed the fix_cgs_documentation branch 2 times, most recently from 2be1cd6 to 4edc7e1, January 7, 2021 17:43

upsj changed the title ~~fix cgs documentation~~ Fix solver kernel documentation and add memory estimates Jan 7, 2021

upsj requested review from fritzgoebel, yhmtsai, pratikvn, Slaedr, tcojean and thoasm January 7, 2021 17:51

upsj added the reg:documentation and 1:ST:ready-for-review labels Jan 7, 2021

upsj mentioned this pull request Jan 7, 2021

Bump copyright year to 2021 #687

Merged

Slaedr reviewed Jan 8, 2021

yhmtsai reviewed Jan 11, 2021

upsj force-pushed the fix_cgs_documentation branch from 76edbed to 70c9966, January 11, 2021 16:30

Slaedr reviewed Jan 12, 2021

core/solver/gmres.cpp (outdated, resolved)

core/solver/gmres.cpp (outdated, resolved)

core/solver/idr.cpp (outdated, resolved)

Slaedr approved these changes Jan 12, 2021

core/solver/idr.cpp (outdated, resolved)

core/solver/gmres.cpp (outdated, resolved)

core/solver/idr.cpp (outdated, resolved)

yhmtsai reviewed Jan 14, 2021

core/solver/gmres.cpp (outdated, resolved)

yhmtsai reviewed Jan 14, 2021

core/solver/gmres.cpp (outdated, resolved)

upsj reviewed Jan 20, 2021

upsj force-pushed the fix_cgs_documentation branch from 70c9966 to d2d26d9, January 20, 2021 16:44

hartwiganzt requested a review from upsj January 21, 2021 13:29

upsj force-pushed the fix_cgs_documentation branch from 0faee57 to 44aab4b, January 22, 2021 10:49

yhmtsai approved these changes Jan 22, 2021

core/solver/idr.cpp (outdated, resolved)

core/solver/idr.cpp (outdated, resolved)

core/solver/gmres.cpp (resolved)

upsj added the 1:ST:ready-to-merge label and removed the 1:ST:ready-for-review label Jan 27, 2021

tcojean mentioned this pull request Jan 28, 2021

Add Compressed Basis GMRES (CB-GMRES) #693

Merged

upsj self-assigned this Feb 2, 2021

upsj force-pushed the fix_cgs_documentation branch from 40352e2 to 19f9867, February 5, 2021 12:27

upsj and others added 16 commits March 1, 2021 19:36

move solver kernel comments before calls

8093a6f

Co-authored-by: Hartwig Anzt <hanzt@icl.utk.edu>

use C++ headers for time.h

e43843d

add memory movement estimates to solvers

707b305

Co-authored-by: Hartwig Anzt <hanzt@icl.utk.edu>

update GMRES estimates to include precise counts

4ef2645

Co-authored-by: Aditya Kashi <aditya.kashi@kit.edu>

count SpMV operations as iterations in IDR

6597f0e

update IDR counts

99933f2

Review updates

5108f86

Co-authored-by: Yuhsiang Tsai <yhmtsai@gmail.com> Co-authored-by: Aditya Kashi <aditya.kashi@kit.edu>

exact gmres restart estimate

927c201

update GMRES estimates

b7be195

Co-authored-by: Aditya Kashi <aditya.kashi@kit.edu> Co-authored-by: Yuhsiang Tsai <yhmtsai@gmail.com>

update IDR estimates

df86217

Co-authored-by: Yuhsiang Tsai <yhmtsai@gmail.com>

Revert "count SpMV operations as iterations in IDR"

bb8bb2e

This reverts commit b59befe.

add stopping criterion norm2 call

b88b9fb

fix bicg iteration count in estimate

95a2f06

make cgs iterations match stopping checks

05a0ee6

fix BiCGSTAB half iteration log

61693d2

replace BiCGSTAB half-iteration by full iteration

d11ec4a

but still keep the half-iteration stop check in place

upsj force-pushed the fix_cgs_documentation branch from d5683f4 to d11ec4a, March 1, 2021 18:44

upsj reviewed Mar 2, 2021

core/solver/bicgstab.cpp (resolved)

upsj merged commit 118d031 into develop Mar 2, 2021

upsj deleted the fix_cgs_documentation branch March 2, 2021 13:17
