Check if executors share the same memory #670

tcojean · 2020-11-24T10:00:54Z

This PR contains only the first commit of #652.

This implements a new executor function, memory_accessible, which verifies whether two executors share the same memory. Currently, this functionality is used in the temporary_clone function to save copies.
This currently means that:

Any OpenMP executor has the same memory as another OpenMP executor.
Same for Reference executors.
Same for CUDA executors if they have the same device id, and same for HIP executors.
OpenMP and Reference also share the same memory, and the same holds for a host-side DPC++ executor.
HIP and CUDA have the same memory if they have the same device id and HIP actually uses the CUDA backend.
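
As a reading aid, the rules listed above can be written down as a single predicate. This is only a sketch under stated assumptions: `Kind`, `ExecInfo`, and `shares_memory` are hypothetical names and not Ginkgo's API, and combinations the description does not mention are conservatively reported as not accessible. It follows the description as written, so it predates the later commit in this PR (018244f) that isolates Reference.

```cpp
// Hypothetical, simplified model of the rules above; not Ginkgo's API.
enum class Kind { reference, omp, cuda, hip, dpcpp };

struct ExecInfo {
    Kind kind;
    int device_id;       // only meaningful for CUDA and HIP in this sketch
    bool hip_uses_cuda;  // assumption: set when HIP runs on the CUDA backend
    bool dpcpp_is_host;  // assumption: set for a host-side DPC++ executor
};

// Reference, OpenMP and host-side DPC++ all live in host memory.
inline bool is_host(const ExecInfo& e)
{
    return e.kind == Kind::reference || e.kind == Kind::omp ||
           (e.kind == Kind::dpcpp && e.dpcpp_is_host);
}

// CUDA memory is used by CUDA executors and by HIP on the CUDA backend.
inline bool uses_cuda_memory(const ExecInfo& e)
{
    return e.kind == Kind::cuda || (e.kind == Kind::hip && e.hip_uses_cuda);
}

inline bool shares_memory(const ExecInfo& a, const ExecInfo& b)
{
    if (is_host(a) && is_host(b)) {
        return true;
    }
    const bool both_cuda_memory = uses_cuda_memory(a) && uses_cuda_memory(b);
    const bool both_hip = a.kind == Kind::hip && b.kind == Kind::hip;
    if (both_cuda_memory || both_hip) {
        return a.device_id == b.device_id;
    }
    return false;  // everything else is not covered by the description
}
```

Keeping the rules in one function makes it easy to see that the relation is symmetric here; the PR itself implements the check through overloads on the executor types rather than a table like this.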

yhmtsai

How about we still isolate the Reference executor?
If we optimize the OpenMP executor with some additional information, like CSR in CUDA/HIP,
treating OpenMP and Reference as the same memory will make the conversion fail.

dpcpp/base/executor.dp.cpp

include/ginkgo/core/base/executor.hpp

dpcpp/base/executor.dp.cpp

upsj · 2020-11-25T11:58:18Z

Awesome idea, this might also help later on with the MPI executor. Two comments before I review it in more detail:

This slightly changes the semantics of make_temporary_clone since the returned object no longer necessarily exists on the executor we passed in. Do we use this other executor somewhere?
I would prefer if we used a member function memory_accessible or something like that instead of overloading operator==, since that might prevent us from implementing full executor equality checks later.

tcojean · 2020-11-26T09:38:59Z

> If we optimize the OpenMP executor with some additional information, like CSR in CUDA/HIP,
> treating OpenMP and Reference as the same memory will make the conversion fail.

I don't think this is such a big issue. If the algorithm needs to clone anyway, gko::clone will still work. If it needs to transform the memory without copies, it can do so as long as it doesn't break anything, or as long as the data is put back into its previous state. You could always call an equivalent of make_srow() in that situation to update the strategy-related data. Also, it doesn't seem like we have such a case for now.

> This slightly changes the semantics of make_temporary_clone since the returned object no longer necessarily exists on the executor we passed in. Do we use this other executor somewhere?

Indeed, it does, in two ways: 1) after this PR, a temporary clone is only made when there is no direct access to the memory, and 2) we no longer ensure that LinOps have exactly the same executor when calling a function like apply(), or however else we combine them. So far the tests seem to work, but I'm indeed not sure that our testing of all combinations is extensive enough to prove that with 100% certainty. I believe that's where integration tests would be useful.

> I would prefer if we used a member function memory_accessible or something like that instead of overloading operator==, since that might prevent us from implementing full executor equality checks later.

Sure. I don't see how else we would use operator== than for this, but a member function does make it more future-proof, so I made this change.

core/test/base/array.cpp

pratikvn

LGTM! But I have one question: this PR changes the behaviour of make_temporary_clone, and make_temporary_clone seems to be part of the public interface.

Doesn't that mean we could be breaking existing code?

dpcpp/base/executor.dp.cpp

include/ginkgo/core/base/executor.hpp

pratikvn · 2020-11-27T10:00:36Z

include/ginkgo/core/base/executor.hpp

+        std::for_each(device_type_.begin(), device_type_.end(),
+                      [](char &c) { c = std::tolower(c); });


tcojean · 2020-11-27T10:36:42Z

I don't think changing the temporary clone behaviour is interface breaking, since the point of this function is to make a copy when you need one. This still happens now; all that changes is that you get fewer copies when the data already sits in memory that you can access.
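
To make the behaviour change concrete, here is a minimal sketch of a temporary clone that skips the copy when the memory is already accessible. `Exec`, `Buffer`, and `TemporaryClone` are hypothetical stand-ins, not Ginkgo's types, and the sketch leaves out anything the real helper may do beyond this, such as writing results back to the original.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-ins, not Ginkgo types.
struct Exec {
    int memory_id;  // assumption: executors with equal ids share memory
    bool memory_accessible(const Exec& other) const
    {
        return memory_id == other.memory_id;
    }
};

struct Buffer {
    const Exec* exec;
    std::vector<double> data;
};

class TemporaryClone {
public:
    TemporaryClone(const Exec* target, Buffer* original) : ptr_{original}
    {
        // Copy only when the target cannot reach the original's memory.
        if (!target->memory_accessible(*original->exec)) {
            owned_ = std::make_unique<Buffer>(Buffer{target, original->data});
            ptr_ = owned_.get();
        }
    }

    Buffer* get() const { return ptr_; }
    bool is_copy() const { return owned_ != nullptr; }

private:
    std::unique_ptr<Buffer> owned_;  // set only when a copy was made
    Buffer* ptr_;                    // the original or the owned copy
};
```

Note that in the no-copy case `get()->exec` is still the original's executor rather than `target`, which is the semantic change raised earlier in the review.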

Slaedr

Wow, this is an intricate piece of work.

My understanding of why we need a two-level dispatch:

Because we do not want to template Executor on the concrete executor type, we have an ExecutorBase template with CRTP, and this is a friend of Executor for all CRTP values. The verify_memory_to functions, which do the actual work, depend on being overloaded on the concrete executor types. This means calling one of them requires knowledge of the concrete type. Therefore, Executor cannot directly call verify_memory_to, so it calls verify_memory_from. This second function is not overloaded but is implemented in ExecutorBase and thus has access to its concrete type, and can call verify_memory_to of its argument, passing its concrete self as the argument in turn.

Is that correct? Does this get affected when actual memory space classes are introduced, or has the plan for that changed?

I have a minor reservation which is a point that @pratikvn raised, about consistency in using pointers and references for verify_memory_to and verify_memory_from.
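
For readers following along, the two-level dispatch described in this comment can be sketched in a few lines. This is a simplified illustration, not the code of this PR: the function names verify_memory_from and verify_memory_to are taken from the discussion, while `Exec`, `ExecBase`, `OmpExec`, and `CudaExec` are hypothetical stand-ins with only two executor types.

```cpp
class OmpExec;
class CudaExec;
template <typename Concrete>
class ExecBase;

class Exec {
public:
    virtual ~Exec() = default;

    // Public entry point: can `this` access memory owned by `other`?
    bool memory_accessible(const Exec* other) const
    {
        return this->verify_memory_from(other);
    }

protected:
    // First dispatch: resolved virtually on the concrete type of `this`.
    virtual bool verify_memory_from(const Exec* src) const = 0;

    // Second dispatch: overloaded on the concrete destination type.
    // Anything a concrete executor does not override is "not accessible".
    virtual bool verify_memory_to(const OmpExec*) const { return false; }
    virtual bool verify_memory_to(const CudaExec*) const { return false; }

    // Lets the CRTP layer call verify_memory_to through an Exec pointer.
    template <typename Concrete>
    friend class ExecBase;
};

template <typename Concrete>
class ExecBase : public Exec {
protected:
    bool verify_memory_from(const Exec* src) const override
    {
        // The cast recovers the concrete type, so overload resolution
        // selects the matching verify_memory_to on the source executor.
        return src->verify_memory_to(static_cast<const Concrete*>(this));
    }
};

class OmpExec : public ExecBase<OmpExec> {
protected:
    // All OpenMP executors share host memory.
    bool verify_memory_to(const OmpExec*) const override { return true; }
};

class CudaExec : public ExecBase<CudaExec> {
public:
    explicit CudaExec(int device_id) : device_id_{device_id} {}
    int device_id() const { return device_id_; }

protected:
    // CUDA executors share memory only on the same device.
    bool verify_memory_to(const CudaExec* dest) const override
    {
        return dest->device_id() == device_id_;
    }

private:
    int device_id_;
};
```

The base class never needs to know the list of pairings: adding a new pairing means overriding one more `verify_memory_to` overload in the executor that owns the memory.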

tcojean · 2020-11-30T16:22:59Z

@Slaedr yes, you are pretty much exactly correct about that design; this is how it works. The idea of this double dispatch, with the CRTP-enabled class in the middle, is to resolve the actual executor type so that we can call the proper verify_memory_to for the proper target executor.

About the memory space classes: either this will stay, with memory spaces instead of executors, or, more likely, we build the memory spaces properly so that they are shared when they should be, in which case we won't need any of this anymore.

About using pointers: yes, I will change to that. This is a leftover of the previous design, which I started changing in some recent commits but had not yet changed at the deepest levels.

Slaedr · 2020-11-30T19:26:47Z

> About the memory space classes, either this will stay with memory spaces instead of executors, or, more likely, we build the memory spaces properly so that they are shared when they should be, and anyway we don't need all of this anymore.

So all this is only a temporary measure until the full memory spaces are implemented? How much of this will be retained?

tcojean · 2020-11-30T19:53:02Z

> > About the memory space classes, either this will stay with memory spaces instead of executors, or, more likely, we build the memory spaces properly so that they are shared when they should be, and anyway we don't need all of this anymore.
>
> So all this is only a temporary measure until the full memory spaces are implemented? How much of this will be retained?

That is hard to say. First of all, memory spaces are definitely interface breaking, so they could still be a while away, whereas this is not. Removing this and replacing it with memory spaces will be interface breaking as well, but not much more so, since memory spaces are interface breaking to begin with. Second, as I said, it depends on how the memory space implementation is done. You could keep this together with memory spaces: if we somehow want to keep a CudaMemory and a HipMemory, for example, you would still need to say CUDA == HIP when HIP is run on CUDA and both have the same device id. If it's done another way, where the HIP executor gets a CudaMemory space when using a CUDA backend, then you don't need the verify memory overload.
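
The second alternative can be illustrated with a rough sketch, assuming executors hold a shared handle to their memory space. Only the name CudaMemory comes from the discussion; `MemorySpace`, `HostMemory`, `Exec`, and `shares_memory` are hypothetical, and no such design exists in this PR.

```cpp
#include <memory>

// Hypothetical memory space types; only the name CudaMemory comes from the
// discussion, everything else is an assumption made for illustration.
struct MemorySpace {
    virtual ~MemorySpace() = default;
};

struct HostMemory : MemorySpace {};

struct CudaMemory : MemorySpace {
    explicit CudaMemory(int id) : device_id{id} {}
    int device_id;
};

// An executor only holds a handle to the memory space it allocates from.
struct Exec {
    std::shared_ptr<const MemorySpace> memory;
};

// With shared memory spaces, the check reduces to an identity comparison,
// so no per-executor overloads are needed.
inline bool shares_memory(const Exec& a, const Exec& b)
{
    return a.memory == b.memory;
}
```

In this setup a HIP executor running on the CUDA backend would simply be constructed with the same `CudaMemory` handle as the CUDA executor for that device.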

github-actions · 2020-12-01T16:37:14Z

Error: The following files need to be formatted:

core/device_hooks/dpcpp_hooks.cpp
cuda/test/base/array.cu
include/ginkgo/core/base/dim.hpp
include/ginkgo/core/base/exception_helpers.hpp
include/ginkgo/core/base/executor.hpp
include/ginkgo/core/base/temporary_clone.hpp
include/ginkgo/core/base/utils.hpp
include/ginkgo/core/base/utils_helper.hpp
include/ginkgo/core/log/logger.hpp
include/ginkgo/ginkgo.hpp

You can find a formatting patch under Artifacts here or run format! if you have write access to Ginkgo

tcojean · 2020-12-02T11:46:57Z

rebase!

github-actions · 2020-12-02T11:47:28Z

Error: The same files as in the report of 2020-12-01 need to be formatted.

tcojean · 2020-12-02T11:47:42Z

format!

codecov · 2020-12-02T20:01:23Z

Codecov Report

Merging #670 (018244f) into develop (b8a705c) will decrease coverage by 0.10%.
The diff coverage is 74.66%.

@@             Coverage Diff             @@
##           develop     #670      +/-   ##
===========================================
- Coverage    93.00%   92.89%   -0.11%     
===========================================
  Files          332      333       +1     
  Lines        24187    24265      +78     
===========================================
+ Hits         22495    22541      +46     
- Misses        1692     1724      +32

| Impacted Files | Coverage Δ | |
|---|---|---|
| core/device_hooks/dpcpp_hooks.cpp | `29.03% <0.00%> (-5.76%)` | ⬇️ |
| include/ginkgo/core/base/dim.hpp | `88.88% <ø> (ø)` | |
| include/ginkgo/core/base/exception_helpers.hpp | `90.90% <ø> (ø)` | |
| include/ginkgo/core/log/logger.hpp | `92.10% <ø> (ø)` | |
| core/test/base/executor.cpp | `91.58% <47.05%> (-8.42%)` | ⬇️ |
| include/ginkgo/core/base/executor.hpp | `83.67% <73.07%> (-2.28%)` | ⬇️ |
| include/ginkgo/core/base/utils_helper.hpp | `87.17% <87.17%> (ø)` | |
| core/devices/cuda/executor.cpp | `75.00% <100.00%> (+25.00%)` | ⬆️ |
| core/devices/hip/executor.cpp | `71.42% <100.00%> (+38.09%)` | ⬆️ |
| core/test/base/array.cpp | `100.00% <100.00%> (ø)` | |

... and 8 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b8a705c...018244f.

include/ginkgo/core/base/executor.hpp

Slaedr

LGTM! Just one minor question.

Co-authored-with: Terry Cojean <terry.cojean@kit.edu>

+ Do not use `operator==`, but a function `memory_accessible` instead.
+ Make DPC++ host and CPU be memory compatible.
+ Use pointers for the interface instead of references.
+ Ensure DPC++ tests always work.
+ Fix some typos.

Co-authored-by: Yuhsiang M. Tsai <yhmtsai@gmail.com>
Co-authored-by: Tobias Ribizel <ribizel@kit.edu>
Co-authored-by: Pratik Nayak <pratikvn@protonmail.com>
Co-authored-by: Aditya Kashi <aditya.kashi@kit.edu>

Co-authored-by: tcojean <tcojean@users.noreply.github.com>

thoasm

LGTM, some minor comments.

include/ginkgo/core/base/executor.hpp

dpcpp/base/executor.dp.cpp

core/test/base/executor.cpp

tcojean

Thanks for the comments, I will fix the issues pointed out.

core/test/base/executor.cpp

dpcpp/base/executor.dp.cpp

Some code style issues. Co-authored-by: Thomas Grützmacher <thomas.gruetzmacher@kit.edu>

yhmtsai

LGTM. Thanks for handling this issue and adding a lot of tests.
I have one question on the test:
we will make the Reference executor isolated, right?

cuda/test/base/lin_op.cu

Slaedr · 2020-12-09T10:01:49Z

Could you remind me why we need the reference executor to have a different memory space from the omp executor? Do we document this somewhere?

yhmtsai · 2020-12-09T13:56:28Z

We have some executor-specific data, like the CSR srow.
If we consider OpenMP and Reference to be in the same memory, then when OpenMP is applied to Reference data, the data will not be converted to the OpenMP one. Thus, the OpenMP operation may access unallocated memory or wrong data.
Note: for now we only have this in CUDA/HIP, not in OpenMP/Reference.

Another point comes from testing or design.
We use the Reference executor and its operations as the correct result against which we compare the other executors.
Would it make sense to isolate the Reference executor to avoid any unexpected operation from another executor?

upsj

LGTM! Just a minor question

include/ginkgo/core/base/executor.hpp

sonarcloud · 2020-12-09T23:06:59Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
11 Code Smells

32.7% Coverage
0.0% Duplication

Ginkgo release 1.4.0

The Ginkgo team is proud to announce the new Ginkgo minor release 1.4.0. This release brings most of the Ginkgo functionality to the Intel DPC++ ecosystem which enables Intel-GPU and CPU execution. The only Ginkgo features which have not been ported yet are some preconditioners.

Ginkgo's mixed-precision support is greatly enhanced thanks to:
1. The new Accessor concept, which allows writing kernels featuring on-the-fly memory compression, among other features. The accessor can be used as header-only, see the [accessor BLAS benchmarks repository](https://github.com/ginkgo-project/accessor-BLAS/tree/develop) as a usage example.
2. All LinOps now transparently support mixed-precision execution. By default, this is done through a temporary copy which may have a performance impact but already allows mixed-precision research. Native mixed-precision ELL kernels are implemented which do not see this cost.

The accessor is also leveraged in a new CB-GMRES solver which allows for performance improvements by compressing the Krylov basis vectors. Many other features have been added to Ginkgo, such as reordering support, a new IDR solver, Incomplete Cholesky preconditioner, matrix assembly support (only CPU for now), machine topology information, and more!

Supported systems and requirements:
+ For all platforms, cmake 3.13+
+ C++14 compliant compiler
+ Linux and MacOS
  + gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + clang: 3.9+
  + Intel compiler: 2018+
  + Apple LLVM: 8.0+
  + CUDA module: CUDA 9.0+
  + HIP module: ROCm 3.5+
  + DPC++ module: Intel OneAPI 2021.3. Set the CXX compiler to `dpcpp`.
+ Windows
  + MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + Microsoft Visual Studio: VS 2019
  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.

Algorithm and important feature additions:
+ Add a new DPC++ Executor for SYCL execution and other base utilities [#648](#648), [#661](#661), [#757](#757), [#832](#832)
+ Port matrix formats, solvers and related kernels to DPC++. For some kernels, also make use of a shared kernel implementation for all executors (except Reference). [#710](#710), [#799](#799), [#779](#779), [#733](#733), [#844](#844), [#843](#843), [#789](#789), [#845](#845), [#849](#849), [#855](#855), [#856](#856)
+ Add accessors which allow multi-precision kernels, among other things. [#643](#643), [#708](#708)
+ Add support for mixed precision operations through apply in all LinOps. [#677](#677)
+ Add incomplete Cholesky factorizations and preconditioners as well as some improvements to ILU. [#672](#672), [#837](#837), [#846](#846)
+ Add an AMGX implementation and kernels on all devices but DPC++. [#528](#528), [#695](#695), [#860](#860)
+ Add a new mixed-precision capability solver, Compressed Basis GMRES (CB-GMRES). [#693](#693), [#763](#763)
+ Add the IDR(s) solver. [#620](#620)
+ Add a new fixed-size block CSR matrix format (for the Reference executor). [#671](#671), [#730](#730)
+ Add native mixed-precision support to the ELL format. [#717](#717), [#780](#780)
+ Add Reverse Cuthill-McKee reordering [#500](#500), [#649](#649)
+ Add matrix assembly support on CPUs. [#644](#644)
+ Extends ISAI from triangular to general and spd matrices. [#690](#690)

Other additions:
+ Add the possibility to apply real matrices to complex vectors. [#655](#655), [#658](#658)
+ Add functions to compute the absolute of a matrix format. [#636](#636)
+ Add symmetric permutation and improve existing permutations. [#684](#684), [#657](#657), [#663](#663)
+ Add a MachineTopology class with HWLOC support [#554](#554), [#697](#697)
+ Add an implicit residual norm criterion. [#702](#702), [#818](#818), [#850](#850)
+ Row-major accessor is generalized to more than 2 dimensions and a new "block column-major" accessor has been added. [#707](#707)
+ Add an heat equation example. [#698](#698), [#706](#706)
+ Add ccache support in CMake and CI. [#725](#725), [#739](#739)
+ Allow tuning and benchmarking variables non intrusively. [#692](#692)
+ Add triangular solver benchmark [#664](#664)
+ Add benchmarks for BLAS operations [#772](#772), [#829](#829)
+ Add support for different precisions and consistent index types in benchmarks. [#675](#675), [#828](#828)
+ Add a Github bot system to facilitate development and PR management. [#667](#667), [#674](#674), [#689](#689), [#853](#853)
+ Add Intel (DPC++) CI support and enable CI on HPC systems. [#736](#736), [#751](#751), [#781](#781)
+ Add ssh debugging for Github Actions CI. [#749](#749)
+ Add pipeline segmentation for better CI speed. [#737](#737)

Changes:
+ Add a Scalar Jacobi specialization and kernels. [#808](#808), [#834](#834), [#854](#854)
+ Add implicit residual log for solvers and benchmarks. [#714](#714)
+ Change handling of the conjugate in the dense dot product. [#755](#755)
+ Improved Dense stride handling. [#774](#774)
+ Multiple improvements to the OpenMP kernels performance, including COO, an exclusive prefix sum, and more. [#703](#703), [#765](#765), [#740](#740)
+ Allow specialization of submatrix and other dense creation functions in solvers. [#718](#718)
+ Improved Identity constructor and treatment of rectangular matrices. [#646](#646)
+ Allow CUDA/HIP executors to select allocation mode. [#758](#758)
+ Check if executors share the same memory. [#670](#670)
+ Improve test install and smoke testing support. [#721](#721)
+ Update the JOSS paper citation and add publications in the documentation. [#629](#629), [#724](#724)
+ Improve the version output. [#806](#806)
+ Add some utilities for dim and span. [#821](#821)
+ Improved solver and preconditioner benchmarks. [#660](#660)
+ Improve benchmark timing and output. [#669](#669), [#791](#791), [#801](#801), [#812](#812)

Fixes:
+ Sorting fix for the Jacobi preconditioner. [#659](#659)
+ Also log the first residual norm in CGS [#735](#735)
+ Fix BiCG and HIP CSR to work with complex matrices. [#651](#651)
+ Fix Coo SpMV on strided vectors. [#807](#807)
+ Fix segfault of extract_diagonal, add short-and-fat test. [#769](#769)
+ Fix device_reset issue by moving counter/mutex to device. [#810](#810)
+ Fix `EnableLogging` superclass. [#841](#841)
+ Support ROCm 4.1.x and breaking HIP_PLATFORM changes. [#726](#726)
+ Decreased test size for a few device tests. [#742](#742)
+ Fix multiple issues with our CMake HIP and RPATH setup. [#712](#712), [#745](#745), [#709](#709)
+ Cleanup our CMake installation step. [#713](#713)
+ Various simplification and fixes to the Windows CMake setup. [#720](#720), [#785](#785)
+ Simplify third-party integration. [#786](#786)
+ Improve Ginkgo device arch flags management. [#696](#696)
+ Other fixes and improvements to the CMake setup. [#685](#685), [#792](#792), [#705](#705), [#836](#836)
+ Clarification of dense norm documentation [#784](#784)
+ Various development tools fixes and improvements [#738](#738), [#830](#830), [#840](#840)
+ Make multiple operators/constructors explicit. [#650](#650), [#761](#761)
+ Fix some issues, memory leaks and warnings found by MSVC. [#666](#666), [#731](#731)
+ Improved solver memory estimates and consistent iteration counts [#691](#691)
+ Various logger improvements and fixes [#728](#728), [#743](#743), [#754](#754)
+ Fix for ForwardIterator requirements in iterator_factory. [#665](#665)
+ Various benchmark fixes. [#647](#647), [#673](#673), [#722](#722)
+ Various CI fixes and improvements. [#642](#642), [#641](#641), [#795](#795), [#783](#783), [#793](#793), [#852](#852)

Related PR: #857

Release 1.4.0 to master: the release notes are identical to the Ginkgo release 1.4.0 notes above. Related PR: #866

tcojean added is:enhancement An improvement of an existing feature. mod:core This is related to the core module. 1:ST:ready-for-review This PR is ready for review labels Nov 24, 2020

tcojean requested review from upsj, pratikvn, Slaedr, thoasm and yhmtsai November 24, 2020 10:00

tcojean self-assigned this Nov 24, 2020

tcojean mentioned this pull request Nov 24, 2020

Add global constant and different executors on the same device do not need memory movement #652

Closed

2 tasks

tcojean changed the title ~~Implement executor equal_to (==)~~ Check if executors share the same memory Nov 24, 2020

yhmtsai requested changes Nov 24, 2020

Slaedr reviewed Nov 27, 2020

tcojean force-pushed the exec_equal_to branch 2 times, most recently from 50361e6 to 6acf8eb Compare November 27, 2020 08:58

pratikvn approved these changes Nov 27, 2020

Slaedr reviewed Nov 27, 2020

tcojean force-pushed the exec_equal_to branch 2 times, most recently from 78f8a62 to c9d73fd Compare December 1, 2020 16:18

tcojean force-pushed the exec_equal_to branch from c9d73fd to 4f0ec9c Compare December 2, 2020 11:46

Slaedr reviewed Dec 3, 2020

Slaedr approved these changes Dec 3, 2020

yhmtsai and others added 3 commits December 4, 2020 16:19

Add executor operator== for memory compatibility

33c576a

Co-authored-with: Terry Cojean <terry.cojean@kit.edu>

Format files

bd4936f

Co-authored-by: tcojean <tcojean@users.noreply.github.com>

tcojean force-pushed the exec_equal_to branch from 727584b to bd4936f Compare December 4, 2020 15:24

thoasm approved these changes Dec 8, 2020

tcojean commented Dec 8, 2020

Review updates.

8608400

Some code style issues. Co-authored-by: Thomas Grützmacher <thomas.gruetzmacher@kit.edu>

yhmtsai approved these changes Dec 8, 2020

tcojean force-pushed the exec_equal_to branch 2 times, most recently from 9dd05db to 80297f3 Compare December 9, 2020 10:46

upsj approved these changes Dec 9, 2020

Isolate Reference from other executors in memory.

018244f

tcojean force-pushed the exec_equal_to branch from 80297f3 to 018244f Compare December 9, 2020 17:36

tcojean added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels Dec 10, 2020

tcojean merged commit 2a951ac into develop Dec 11, 2020

tcojean deleted the exec_equal_to branch December 11, 2020 08:49
