-
Notifications
You must be signed in to change notification settings - Fork 367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LAPACK test segfault on zen/zen2/zen3 at bli_sgemmsup_rd_haswell_asm_1x16n #821
Comments
Would it be possible to extract the specific |
Just a remark -- I deleted my last two comments as the test code in them was incorrect. Better code to come! |
Here is a test code with some assertions included.
Here is a successful run:
Here is unsuccessful run:
|
@fgvanzee since you're the most familiar with this code, could you take a look? |
@devinamatthews Sure, I'll see what I can figure out. |
@j-bm I used your Fortran driver (thanks for providing that!), but was unable to reproduce your issue. :-\
|
This is running flame/blis version 1.0.0 .tar.gz file. Built on OpenBSD-current (as of a few weeks ago) using
I rebuilt on another cpu (Intel i3-series instead of Ryzen) but had the same issues at the same routine. Both are Windows10/11 running VMware so all my tests are on a VM guest not actual hardware. Some further notes and trials:
|
Here is a debugging run with a modified test program which prints the address of the allocated arrays. code fragment:
debugger output:
This shows that $rax is pointing at the last element of array A.
The dereference fails:
Which suggests a bug, accessing beyond the end of array A. |
@j-bm Thank you for those additional details, they were quite helpful! I think you helped us narrow it down to the last phase of the edge case handling code in the offending s1x16n kernel. In Please try deleting this line and let us know if it fixes the bug. label(.SLOOPKLEFT1) // EDGE LOOP (scalar)
// NOTE: We must use ymm registers here bc
// using the xmm registers would zero out the
// high bits of the destination registers,
// which would destory intermediate results.
vmovss(mem(rax ), xmm0)
vmovss(mem(rax, r8, 1), xmm1) // ***TRY DELETING THIS LINE
add(imm(1*4), rax) // a += 1*cs_a = 1*4;
vmovss(mem(rbx ), xmm3)
vfmadd231ps(ymm0, ymm3, ymm4)
vmovss(mem(rbx, r11, 1), xmm3)
vfmadd231ps(ymm0, ymm3, ymm7)
vmovss(mem(rbx, r11, 2), xmm3)
vfmadd231ps(ymm0, ymm3, ymm10)
vmovss(mem(rbx, r13, 1), xmm3)
add(imm(1*4), rbx) // b += 1*rs_b = 1*4;
vfmadd231ps(ymm0, ymm3, ymm13)
dec(rsi) // i -= 1;
jne(.SLOOPKLEFT1) // iterate again if i != 0. |
Yes, that fixes the issue. Did that extra instruction do anything important (other that segfaulting)? |
Great news. Thanks for your help!
No, it was 100% a copy-paste bug. I probably started with the 2x16 case and deleted instructions until it became a 1x16, but then forgot to delete that instruction (which would have loaded the second of the two elements of A). I'll open a PR with the fix and credit you. I really appreciate your feedback! |
@j-bm Sorry for getting the line numbers a little wrong btw. I had forgotten that I had inserted |
Details: - Fixed a bug in the bli_sgemmsup_rd_haswell_asm_1x16n() millikernel. The kernel was erroneously performing an out-of-bounds read whenever the singleton edge case loop executed (that is, whenever the k dimension of the millikernel problem was not a multiple of 8). This OOB error was the result of a copy-paste bug; when developing the s1x16n function, I started from a copy of the s2x16n function, but then failed to delete the instruction that reads the second element of A in the code that handles the PR loop's edge case. Thanks to @j-bm for reporting this bug in Issue #821 and helping narrow down the cause to the rax register.
Thanks for the quick fix! |
I'm going to close this issue now. If you encounter any further problems or concerns, please let us know. |
Hi @fgvanzee , |
I completely agree. Thanks for that reminder, @BhaskarNallani! |
Details: - Fixed a bug in the bli_sgemmsup_rd_haswell_asm_1x16n() millikernel. The kernel was erroneously performing an out-of-bounds read whenever the singleton edge case loop executed (that is, whenever the k dimension of the millikernel problem was not a multiple of 8). This OOB error was the result of a copy-paste bug; when developing the s1x16n function, I started from a copy of the s2x16n function, but then failed to delete the instruction that reads the second element of A in the code that handles the PR loop's edge case. Thanks to @j-bm for reporting this bug in Issue #821 and helping narrow down the cause to the rax register. - CREDITS file update.
Details: - Previously, if the user enabled CBLAS via 'configure --enable-cblas' and then ran 'make', the flattened blis.h header file would be created immediately, but the flattened cblas.h header file would not be created until 'make install' was run. This was happening because nothing in the BLIS build process (except installation) depended on the flattened cblas.h (whereas *everything* depends on the flattened blis.h, and therefore it was being created first). This behavior can be confusing to application developers who could reasonably expect that the flattened cblas.h header would be available (to inspect or use) prior to running 'make install'. - This commit fixes the aforementioned issue by (1) adding cblas.h (if CBLAS is enabled) as a dependency to all of the build rules for core framework object files, and (2) making the flattened blis.h a prerequisite for flattening cblas.h. The upshot is that (1) ensures that the flattened cblas.h is created around the the same time that the flattened blis.h is created, and (2) ensures that the two headers are flattened sequentially (first blis.h and then cblas.h) even when using 'make -j[n]', which ensures that the output of the two processes do not comingle. - Thanks to Jeff Diamond for reporting this issue. - (cherry picked from commit 8d9be87) Fixed out-of-bounds read bug in sup haswell ukr. (#824) Details: - Fixed a bug in the bli_sgemmsup_rd_haswell_asm_1x16n() millikernel. The kernel was erroneously performing an out-of-bounds read whenever the singleton edge case loop executed (that is, whenever the k dimension of the millikernel problem was not a multiple of 8). This OOB error was the result of a copy-paste bug; when developing the s1x16n function, I started from a copy of the s2x16n function, but then failed to delete the instruction that reads the second element of A in the code that handles the PR loop's edge case. Thanks to @j-bm for reporting this bug in Issue #821 and helping narrow down the cause to the rax register. - CREDITS file update. - (cherry picked from commit a822cb2) Fixed typo in 4158930; variable renames. (#815) Details: - Fixed a typo in the "./configure --help" output for the ScaLAPACK compatibility option implemented in 4158930. - Trivial variable renames. - (cherry picked from commit 8820f8f) Fix a bug in the piledriver microkernels. (#814) Details: - At some point, the piledriver (and bulldozer and excavator) microkernel tests via SDE had been removed from Travis CI testing. This PR re-enables them. - A bug in the piledriver complex gemm microkernels has also been fixed. The `beta*C` product was not being correctly added to the `A*B` product before writing back out to memory. - Fixes #811. - (cherry picked from commit 31ecf82) Add ScaLAPACK compatibility mode. (#813) Details: - Add configure options '--enable-scalapack-compat' and '--disabled-scalapack-compat' (default disabled). - Add a macro BLIS_{ENABLE,DISABLE}_SCALAPACK_COMPAT to bli_config.h. - This option and macro control any changes to the API necessary to maintain compatibility with ScaLAPACK. Currently, this only means disabling the complex versions of syr, syr2, and symv. In the future, other changes could be controlled by the same flag. - Complex syr2 wasn't enabled at the same time that complex syr and symv were. This is now corrected. - (cherry picked from commit 4158930) Update CREDITS - (cherry picked from commit 5cbec65) Fix SyntaxWarning messages from python 3.12 (#809) Details: - When using regexes in Python, certain characters need backslash escaping, e.g.: regex = re.compile( '^[\s]*#include (["<])([\w\.\-/]*)([">])' ) However, technically escape sequences like `\s` are not valid and should actually be double-escaped: `\\s`. Python 3.12 now warns about such escape sequences, and in a later version these warning will be promoted to errors. See also: https://docs.python.org/dev/whatsnew/3.12.html#other-language-changes The fix here is to use Python's "raw strings" to avoid double-escaping. This issue can be checked for all files in the current directory with the command: python -m compileall -d . -f -q . Thanks to @AngryLoki for the fix. - (cherry picked from commit 729c57c) Updates to README.md section on downloading. Details: - Updated the text in README.md in the "How to Download BLIS" section. The new text no longer recommends that the reader use the 'master' branch over official releases, as the previous text did. The text was tweaked since (a) the 'master' branch is now akin to a development branch, and (b) the reader will no longer forgo bugfixes by sticking to official releases since we will (going forward) publish bugfix releases for the most recent version. - (cherry picked from 6d0ab74) Updated RELEASING file; fixes to ReleaseNotes.md. Details: - Updated RELEASING file to reflect new release protocols, given the more sophisticated policy of maintaining release candidate branches separate from 'master' (which is now more akin to a development branch). Further refinements to this file will likely follow. - Fixed typos in ReleaseNotes.md. Thanks to Robert van de Geijn for reporting these. - (cherry picked from 01e151a) ReleaseNotes.md update. - (cherry picked from 06dddf1) CHANGELOG update (1.0) - (cherry picked from a876918) Version file update (1.0) - (cherry picked from c2af113) Added a script to help create new rc branches. Details: - Added a new script, build/start-new-rc.sh, which: 1. Updates the version file with a new version string. 2. Commits (locally) the version string update. 3. Updates the CHANGELOG file with the output of 'git log'. 4. Commits (locally) the CHANGLOG file update. 5. Creates a new branch whose name is equal to "<vers>-rc0" where <vers> is the new version string. 6. Reminds the user to execute some final steps if everything looks good. This new script will help in the future when it's time to start a new release candidate branch/lineage off of 'master'. Note that this script is based on build/bump-version.sh (which itself may change in the future due to changes in the way versions/releases will be handled going forward). - (cherry picked from 5ab286f)
Details: - Fixed a bug in the bli_sgemmsup_rd_haswell_asm_1x16n() millikernel. The kernel was erroneously performing an out-of-bounds read whenever the singleton edge case loop executed (that is, whenever the k dimension of the millikernel problem was not a multiple of 8). This OOB error was the result of a copy-paste bug; when developing the s1x16n function, I started from a copy of the s2x16n function, but then failed to delete the instruction that reads the second element of A in the code that handles the PR loop's edge case. Thanks to @j-bm for reporting this bug in Issue #821 and helping narrow down the cause to the rax register. - CREDITS file update.
Building blis on OpenBSD (-current, that is to say most recent development version).
Built LAPACK version 3.8.0, run the test code:
Experimenting with the $ export BLIS_ARCH_TYPE= yields the conclusion zen/zen2/zen3 fails exactly as above. BLIS_ARCH_TYPE=4 (sandybridge) succeeds, as does Penryn.
It seems to be the SQZ and STQ tests that fail.
The text was updated successfully, but these errors were encountered: