
BLAS API compatibility#159

Merged
simonpintarelli merged 6 commits into master from blas-api-compatibility
Mar 2, 2026


@simonpintarelli simonpintarelli commented Jan 14, 2026

Ref #158

Accept an uninitialized matrix C when beta == 0: C is filled with zeros on entry to the multiply call.

This needs to wait for a new release of COSTA that includes a routine to fill a matrix with a given value.

@simonpintarelli simonpintarelli marked this pull request as draft January 14, 2026 13:44
@simonpintarelli simonpintarelli marked this pull request as ready for review February 27, 2026 14:24

@mtaillefumier mtaillefumier left a comment


LGTM

@simonpintarelli simonpintarelli merged commit e1c61d9 into master Mar 2, 2026
1 check passed
exopoiesis pushed a commit to exopoiesis/COSMA that referenced this pull request Apr 7, 2026
…al_multiply_cpu

When beta=0, the BLAS specification states that matrix C may be
uninitialized and its contents must not be read. COSMA violated this
in two places:

1. multiply.cpp split_k path: iterations K>0 set beta=1 to accumulate
   partial results, but the C buffer from the memory pool was never
   initialized. With NaN/Inf in uninitialized memory, this produces
   garbage results (0*NaN=NaN per IEEE 754, and 1*NaN=NaN for K>0).

2. local_multiply_cpu: `Cvalue *= beta` reads C even when beta=0.
   Per IEEE 754, 0.0 * NaN = NaN, so this corrupts the output.

Fix: (1) zero-fill C before the split_k loop when beta=0 and
divisor>1, (2) use conditional assignment in local_multiply_cpu.
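The conditional assignment in fix (2) can be illustrated with a minimal sketch. This is not COSMA's `local_multiply_cpu`; `gemm_reference`, `has_nan`, and `nan_filled` are hypothetical names, and column-major storage is assumed. The point is only that the old value of C is never read when beta is zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical reference kernel (illustration only, not COSMA code).
// Computes C = alpha * A * B + beta * C for column-major matrices:
// A is m x k, B is k x n, C is m x n.
template <typename T>
void gemm_reference(std::size_t m, std::size_t n, std::size_t k,
                    T alpha, const std::vector<T>& A,
                    const std::vector<T>& B, T beta, std::vector<T>& C) {
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            T acc = T(0);
            for (std::size_t p = 0; p < k; ++p) {
                acc += A[i + p * m] * B[p + j * k];
            }
            T& c = C[i + j * m];
            // beta == 0 means "overwrite": the old value of C is not read,
            // so NaN or Inf left in the buffer cannot leak into the result.
            // Writing `c = alpha * acc + beta * c` unconditionally would
            // evaluate 0 * NaN, which is NaN under IEEE 754.
            c = (beta == T(0)) ? alpha * acc : alpha * acc + beta * c;
        }
    }
}

// Returns true if any entry of v is NaN.
template <typename T>
bool has_nan(const std::vector<T>& v) {
    for (T x : v) {
        if (std::isnan(x)) return true;
    }
    return false;
}

// Builds a buffer of the given size filled with quiet NaN, standing in
// for uninitialized memory.
template <typename T>
std::vector<T> nan_filled(std::size_t size) {
    return std::vector<T>(size, std::numeric_limits<T>::quiet_NaN());
}
```

Calling `gemm_reference` with `beta = 0` on a buffer from `nan_filled` should leave no NaN in C, whereas an unconditional `beta * c` would propagate it to every entry.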

Add regression tests (BetaZero/FloatUninitialized, DoubleUninitialized)
that fill C with NaN and verify beta=0 produces correct, NaN-free results.
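A regression check of this kind can be sketched as follows. This is an illustration under stated assumptions, not the BetaZero tests from the commit: `accumulate_chunk`, `multiply_split_k`, and `beta_zero_ignores_nan_input` are hypothetical names, storage is column-major, and the sketch prepares C once up front for every divisor, whereas the commit zero-fills only when beta=0 and divisor>1.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for one partial product over the k-range [k0, k1).
// Column-major; A is m x k, B is k x n, C is m x n.
// Always accumulates into C, i.e. behaves like a call with beta = 1.
void accumulate_chunk(std::size_t m, std::size_t n, std::size_t k,
                      std::size_t k0, std::size_t k1,
                      const std::vector<double>& A,
                      const std::vector<double>& B,
                      std::vector<double>& C) {
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            double acc = 0.0;
            for (std::size_t p = k0; p < k1; ++p) {
                acc += A[i + p * m] * B[p + j * k];
            }
            C[i + j * m] += acc;
        }
    }
}

// Splits the k dimension into `divisor` chunks and accumulates them.
// C is prepared once up front: zero-filled when beta == 0 (old contents
// are never read), scaled by beta otherwise.
void multiply_split_k(std::size_t m, std::size_t n, std::size_t k,
                      std::size_t divisor,
                      const std::vector<double>& A,
                      const std::vector<double>& B,
                      double beta, std::vector<double>& C) {
    assert(divisor >= 1 && divisor <= k);
    if (beta == 0.0) {
        std::fill(C.begin(), C.end(), 0.0);
    } else {
        for (double& c : C) c *= beta;
    }
    for (std::size_t d = 0; d < divisor; ++d) {
        const std::size_t k0 = d * k / divisor;
        const std::size_t k1 = (d + 1) * k / divisor;
        accumulate_chunk(m, n, k, k0, k1, A, B, C);
    }
}

// Fills C with NaN, multiplies with beta = 0, and checks that every
// entry equals the expected value (and is therefore NaN-free).
bool beta_zero_ignores_nan_input(std::size_t divisor) {
    const std::size_t m = 2, n = 2, k = 4;
    std::vector<double> A(m * k, 1.0);
    std::vector<double> B(k * n, 2.0);
    std::vector<double> C(m * n, std::numeric_limits<double>::quiet_NaN());
    multiply_split_k(m, n, k, divisor, A, B, 0.0, C);
    for (double c : C) {
        // Each entry is the sum of four products 1.0 * 2.0.
        if (std::isnan(c) || c != 8.0) return false;
    }
    return true;
}
```

Removing the `std::fill` call makes the check fail for every divisor, because the first accumulation then adds a partial sum to NaN.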

Found while debugging sporadic segfaults in CP2K AIMD with GPU-accelerated
COSMA (parallel_gemm called with beta=0 from make_basis_lowdin).
PR eth-cscs#159 (v2.8.0) partially addressed this in multiply_using_layout but
did not cover the internal split_k accumulation or local_multiply_cpu.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2 participants