vulkan: fix clang-cl debug build #7426
4b5ec67 to 3d24ed9
CMakeLists.txt
# Workaround to the iterator debug bug https://stackoverflow.com/questions/74748276/visual-studio-no-displays-the-correct-length-of-stdvector
if (MSVC)
    if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
        add_compile_definitions(_ITERATOR_DEBUG_LEVEL=0)
    else()
        add_compile_options(/Zc:nrvo-)
    endif()
endif()
Why only do this when Vulkan is enabled? This seems like it should apply to all C++ code compiled by MSVC or clang-cl.
I don't see this problem in other backends (e.g. CPU), and this workaround causes a loss of debug information, so I don't want to fix something that isn't broken.
I am surprised that /Zc:nrvo- even has the intended effect, because in all of the linked StackOverflow examples the problematic vector (or object) is being returned (hence Named Return Value Optimization applies), and the debugger's view of it is incorrect between the declaration and the time it is returned (because the storage is actually elsewhere).
The Kompute backend definitely returns vectors and is affected by this, but I don't see where the Vulkan backend does.
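To make the NRVO point concrete, here is a minimal sketch. It is not code from llama.cpp, and `make_sequence`, `built_in_caller_storage`, and `g_local_addr` are hypothetical names. When the compiler applies NRVO, the named local `result` is constructed directly in the caller's storage, which would explain why a debugger inspecting the local's nominal slot can show a stale or empty vector until the function returns. NRVO is optional, so whether the two addresses match depends on the compiler and on flags such as the /Zc:nrvo- mentioned in this thread.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical global: remembers where the named local lived inside the callee.
static std::uintptr_t g_local_addr = 0;

// Returns a named local by value, which makes it an NRVO candidate.
std::vector<int> make_sequence(int n) {
    std::vector<int> result;  // may be built directly in the caller's object
    g_local_addr = reinterpret_cast<std::uintptr_t>(&result);
    for (int i = 0; i < n; ++i) {
        result.push_back(i);
    }
    return result;            // with NRVO: no copy or move happens here
}

// True when the callee's local and the caller's object shared one address,
// i.e. the compiler elided the copy/move. Call it right after make_sequence.
bool built_in_caller_storage(const std::vector<int>& v) {
    return g_local_addr == reinterpret_cast<std::uintptr_t>(&v);
}
```

Calling `std::vector<int> v = make_sequence(3);` and then `built_in_caller_storage(v)` reports whether elision took place in that particular build. The contents of `v` are the same either way.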
I agree that disabling NRVO doesn't make much sense here. @Adriankhl how can I reproduce this issue? I ran a test with a debug build with cl 19.29.30154 and I didn't get any errors.
@slaren I did some MSVC updates in the past few days and the error is now gone, and I failed to pinpoint the problematic version. Now I can only reproduce the issue with clang-cl:
cmake .. -GNinja -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DCMAKE_EXPORT_COMPILE_COMMANDS=1 -DLLAMA_NATIVE=OFF -DLLAMA_VULKAN=ON -DCMAKE_BUILD_TYPE=Debug
ninja -j6
.\bin\embedding.exe -m ..\..\..\models\mxbai-embed-large-v1-f16.gguf -p "Antibiotics are a type of medication used to treat bacterial infections. They work by either killing the bacteria or preventing them from reproducing, allowing the body's immune system to fight off the infection. Antibiotics are usually taken orally in the form of pills, capsules, or liquid solutions, or sometimes administered intravenously. They are not effective against viral infections, and using them inappropriately can lead to antibiotic resistance.`nI love cat"
So I have updated my PR to only deal with clang-cl.
Thanks, I was able to reproduce this with clang-cl.
3d24ed9 to 6e4865f
As discussed in #7130.
With clang-cl 18.1.5, MSVC 19.40.33808, and Vulkan SDK 1.3.283.0, an MSVC runtime error shows up when I run the debug build of embedding with the Vulkan backend: "Expression: can't dereference invalidated vector iterator". I think I have also seen it when running main/llava, but I didn't manage to reproduce it.
The problem happens here:
llama.cpp/ggml-vulkan.cpp
Lines 625 to 646 in e23b974
ctx->seqs.size() shows 1, but MSVC thinks it is of size 0 when I step through the code using lldb. I believe it comes from this issue. The StackOverflow answer suggested adding a /Zc:nrvo- flag, but that doesn't work for clang-cl, so I added another definition, _ITERATOR_DEBUG_LEVEL=0, for clang-cl.
It seems the problem in MSVC is gone; clang-cl still needs _ITERATOR_DEBUG_LEVEL=0 to work.
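One way to check that the definition actually reaches a given translation unit is to report the macro's value at run time. This is a minimal sketch, assuming the MSVC STL, which defines _ITERATOR_DEBUG_LEVEL as 0, 1, or 2 once a standard library header has been included. `iterator_debug_level` is a hypothetical helper, not part of llama.cpp, and it returns -1 when built against a standard library that does not define the macro.

```cpp
#include <vector>  // any standard library header makes the MSVC STL define the macro

// Hypothetical helper: reports the iterator debug level this translation unit
// was compiled with, or -1 when the macro is not defined at all.
int iterator_debug_level() {
#ifdef _ITERATOR_DEBUG_LEVEL
    return _ITERATOR_DEBUG_LEVEL;  // 0 = checks off, 1 = checked, 2 = full debug checks
#else
    return -1;                     // not the MSVC STL (e.g. libstdc++ or libc++)
#endif
}
```

In a clang-cl debug build configured with the workaround, this would be expected to return 0. A default MSVC debug build normally uses level 2.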