sci-libs/rocBLAS: various fixes #23450

Closed
wants to merge 4 commits into from
@@ -0,0 +1,22 @@
These tests fail when comparing rocBLAS against OpenBLAS because the test program is so strict that it cannot tolerate numerical differences that are actually acceptable.

https://github.com/ROCmSoftwarePlatform/rocBLAS/issues/1202
--- orig/clients/gtest/known_bugs.yaml
+++ rocBLAS-rocm-4.3.0/clients/gtest/known_bugs.yaml
@@ -7,6 +7,16 @@ Known bugs:
- { function: gemm_ex, a_type: i8_r, b_type: i8_r, c_type: i32_r, d_type: i32_r, compute_type: i32_r, flags: 0, known_bug_platforms: "gfx900,gfx906,gfx1010,gfx1011,gfx1012,gfx1030" }
- { function: gemm_batched_ex, a_type: i8_r, b_type: i8_r, c_type: i32_r, d_type: i32_r, compute_type: i32_r, flags: 0, known_bug_platforms: "gfx900,gfx906,gfx90a,gfx1010,gfx1011,gfx1012,gfx1030" }
- { function: gemm_strided_batched_ex, a_type: i8_r, b_type: i8_r, c_type: i32_r, d_type: i32_r, compute_type: i32_r, flags: 0, known_bug_platforms: "gfx900,gfx906,gfx1010,gfx1011,gfx1012,gfx1030" }
+# gemv openblas reference differences due to summation order dependent roundoff accumulation with large M float complex
+# 8th significant digit difference vs CPU on single precision float math, leads to expected equality test failure
+# code needs to be changed to a tolerance test or reduce M for float complex type if using equality vs. CPU reference
+- { function: gemv, a_type: f32_c, transA: T, M: 131071 }
+- { function: gemv, a_type: f32_c, transA: C, M: 131071 }
+- { function: gemv_batched, a_type: f32_c, transA: T, M: 131071 }
+- { function: gemv_batched, a_type: f32_c, transA: C, M: 131071 }
+- { function: gemv_strided_batched, a_type: f32_c, transA: T, M: 131071 }
+- { function: gemv_strided_batched, a_type: f32_c, transA: C, M: 131071 }
+

#- { function: gemm_ex, a_type: bf16_r, b_type: bf16_r, c_type: bf16_r, d_type: bf16_r, compute_type: f32_r, transA: C, transB: N, M: 512, N: 512, K: 512, lda: 512, ldb: 512, ldc: 512, ldd: 512, alpha: 5.0, alphai: 0.0, beta: 0.0, betai: 0.0, known_bug_platforms: gfx908 }
#- { function: gemm_ex, a_type: bf16_r, b_type: bf16_r, c_type: bf16_r, d_type: bf16_r, compute_type: f32_r, transA: C, transB: N, M: 512, N: 512, K: 512, lda: 512, ldb: 512, ldc: 512, ldd: 512, alpha: 0.0, alphai: 0.0, beta: 3.0, betai: 0.0, known_bug_platforms: gfx908 }
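
The comments added to known_bugs.yaml attribute the gemv mismatches to summation-order-dependent roundoff in single precision. Each output element of a transposed gemv is a dot product over M terms, so two libraries that accumulate in different orders can legitimately disagree in the last digits. The following is a minimal sketch of that effect, not rocBLAS or OpenBLAS test code: `f32`, `sum_f32`, and the tolerance of `len(terms) * EPS_F32` are illustrative assumptions, and single precision is emulated with the standard `struct` module.

```python
import math
import struct

def f32(x):
    """Round a Python float (a double) to the nearest IEEE-754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_f32(values):
    """Accumulate left to right, rounding to single precision after every add."""
    acc = 0.0
    for v in values:
        acc = f32(acc + f32(v))
    return acc

EPS_F32 = 2.0 ** -23  # gap between 1.0 and the next single-precision value

# One large term followed by four terms of half that gap.
terms = [1.0] + [2.0 ** -24] * 4

forward = sum_f32(terms)             # each tiny term is rounded away against 1.0
backward = sum_f32(reversed(terms))  # tiny terms combine first, then meet 1.0

# Exact equality, the kind of check the patch description calls too strict.
print(forward == backward)  # False

# Relative-tolerance check (the bound here is an assumed, illustrative choice).
print(math.isclose(forward, backward, rel_tol=len(terms) * EPS_F32))  # True
```

The two sums differ by 2^-22, roughly the seventh to eighth significant digit, which is the scale of difference the YAML comment describes.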
11 changes: 6 additions & 5 deletions sci-libs/rocBLAS/rocBLAS-4.3.0.ebuild
@@ -1,18 +1,18 @@
- # Copyright 1999-2021 Gentoo Authors
+ # Copyright 1999-2022 Gentoo Authors
# Distributed under the terms of the GNU General Public License v2

EAPI=7

PYTHON_COMPAT=( python3_{6..9} )

- inherit cmake prefix python-any-r1
+ inherit cmake multiprocessing prefix python-any-r1

DESCRIPTION="AMD's library for BLAS on ROCm."
HOMEPAGE="https://github.com/ROCmSoftwarePlatform/rocBLAS"
SRC_URI="https://github.com/ROCmSoftwarePlatform/rocBLAS/archive/rocm-${PV}.tar.gz -> rocm-${P}.tar.gz
https://github.com/ROCmSoftwarePlatform/Tensile/archive/rocm-${PV}.tar.gz -> rocm-Tensile-${PV}.tar.gz"

- LICENSE="MIT"
+ LICENSE="BSD"
KEYWORDS="~amd64"
IUSE="benchmark test"
SLOT="0/$(ver_cut 1-2)"
@@ -46,14 +46,16 @@ S="${WORKDIR}"/${PN}-rocm-${PV}

PATCHES=("${FILESDIR}"/${PN}-4.3.0-fix-glibc-2.32-and-above.patch
"${FILESDIR}"/${PN}-4.3.0-change-default-Tensile-library-dir.patch
- "${FILESDIR}"/${PN}-4.3.0-link-system-blas.patch )
+ "${FILESDIR}"/${PN}-4.3.0-link-system-blas.patch
+ "${FILESDIR}"/${PN}-4.3.0-remove-problematic-test-suites.patch )

src_prepare() {
eapply_user

pushd "${WORKDIR}"/Tensile-rocm-${PV} || die
eapply "${FILESDIR}/Tensile-${PV}-hsaco-compile-specified-arch.patch" # backported from upstream, should remove after 4.3.0
eapply "${FILESDIR}/Tensile-4.3.0-output-commands.patch"
+ sed -e "/Number of parallel jobs to launch/s:default=-1:default=$(makeopts_jobs):" -i Tensile/TensileCreateLibrary.py || die
popd || die

# Fit for Gentoo FHS rule
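
The `sed` call in this hunk uses an address (`/Number of parallel jobs to launch/`) so that the `default=-1` substitution only touches the line describing the job count. `makeopts_jobs` comes from the multiprocessing eclass, which is why the `inherit` line gains `multiprocessing`. Below is a rough Python model of what that substitution does. The `SAMPLE` text is a hypothetical stand-in, since the real line in `Tensile/TensileCreateLibrary.py` is not shown in this diff, and `8` stands in for whatever `makeopts_jobs` would return.

```python
def pin_default_jobs(source, jobs):
    # Model of: sed -e "/Number of parallel jobs to launch/s:default=-1:default=<jobs>:"
    patched = []
    for line in source.splitlines(keepends=True):
        # The sed address restricts the edit to lines containing this text.
        if "Number of parallel jobs to launch" in line:
            # Without the g flag, s replaces only the first match on the line.
            line = line.replace("default=-1", f"default={jobs}", 1)
        patched.append(line)
    return "".join(patched)

# Hypothetical input, not copied from Tensile.
SAMPLE = (
    'parser.add_argument("--jobs", type=int, default=-1, help="Number of parallel jobs to launch")\n'
    'parser.add_argument("--other", type=int, default=-1, help="Unrelated option")\n'
)

print(pin_default_jobs(SAMPLE, 8))
```

Only the first sample line changes; the unrelated option keeps `default=-1`, which is the point of addressing the substitution by its help text.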
@@ -97,7 +99,6 @@ src_configure() {
-DBUILD_CLIENTS_TESTS=$(usex test ON OFF)
-DBUILD_CLIENTS_BENCHMARKS=$(usex benchmark ON OFF)
${AMDGPU_TARGETS+-DAMDGPU_TARGETS="${AMDGPU_TARGETS}"}
- -D__skip_rocmclang="ON" ## fix cmake-3.21 configuration issue caused by officialy support programming language "HIP"
Member

Depend on a newer CMake version?

Contributor Author

Only cmake-3.21.1 and cmake-3.21.2 need this flag. Earlier and later versions neither need nor recognize it, and they emit a warning (as described in https://bugs.gentoo.org/829326). Neither of those two versions exists in the Portage tree now.
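
For illustration only, the version dependence described in this reply could be expressed as a gate rather than an unconditional flag. The PR takes the simpler route of dropping the flag, since the affected releases are no longer packaged. The sketch below is not ebuild code: `needs_skip_rocmclang` and `configure_args` are hypothetical helper names, and the version string is assumed to be supplied by the caller.

```python
# The two CMake releases that the reply says require the workaround.
AFFECTED_CMAKE_VERSIONS = ("3.21.1", "3.21.2")

def needs_skip_rocmclang(cmake_version):
    """Return True only for the CMake releases named in the reply."""
    return cmake_version in AFFECTED_CMAKE_VERSIONS

def configure_args(cmake_version):
    """Build an argument list, adding the workaround only where it applies."""
    args = ["-DBUILD_CLIENTS_TESTS=OFF"]
    if needs_skip_rocmclang(cmake_version):
        args.append("-D__skip_rocmclang=ON")
    return args

for version in ("3.20.5", "3.21.2", "3.22.1"):
    print(version, configure_args(version))
```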

)

CXX="hipcc" cmake_src_configure