Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vulkan: fix clang-cl debug build #7426

Merged
merged 1 commit into from
May 22, 2024

Conversation

Adriankhl
Copy link
Contributor

@Adriankhl Adriankhl commented May 21, 2024

As discussed here #7130

With clang-cl 18.1.5, MSVC 19.40.33808, vulkan SDK 1.3.283.0

An MSVC runtime error shows up: "Expression: can't dereference invalidated vector iterator" when I run the debug build of embedding with vulkan backend. I think I have also seen it somewhere when I run main/llava but I didn't manage to reproduce it.

The problem happens here

llama.cpp/ggml-vulkan.cpp

Lines 625 to 646 in e23b974

static void ggml_vk_submit(vk_context * ctx, vk::Fence fence) {
#ifdef GGML_VULKAN_DEBUG
std::cerr << "ggml_vk_submit(" << ctx->seqs.size() << ", " << fence << ")" << std::endl;
#endif
if (ctx->seqs.empty()) {
return;
}
std::vector<std::vector<uint64_t>> tl_wait_vals;
std::vector<std::vector<uint64_t>> tl_signal_vals;
std::vector<std::vector<vk::Semaphore>> tl_wait_semaphores;
std::vector<std::vector<vk::Semaphore>> tl_signal_semaphores;
std::vector<vk::TimelineSemaphoreSubmitInfo> tl_submit_infos;
std::vector<vk::SubmitInfo> submit_infos;
int idx = -1;
std::vector<std::vector<vk::PipelineStageFlags>> stage_flags;
size_t reserve = 0;
for (const auto& sequence : ctx->seqs) {
reserve += sequence.size();
}

ctx->seqs.size() shows 1 but MSVC thinks it is of size 0 when I step through the code using lldb. I believe it comes from this issue. The stackoverflow answer suggested adding a /Zc:nrvo- flag, and it doesn't work for clang-cl so I added another _ITERATOR_DEBUG_LEVEL=0 definition for clang-cl.

Seems like the problem in MSVC is gone , clang-cl still needs _ITERATOR_DEBUG_LEVEL=0 to work

@github-actions github-actions bot added the build Compilation issues label May 21, 2024
@mofosyne mofosyne added Vulkan Issues specific to the Vulkan backend Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix labels May 21, 2024
CMakeLists.txt Outdated
Comment on lines 509 to 512
# Workaround to the iterator debug bug https://stackoverflow.com/questions/74748276/visual-studio-no-displays-the-correct-length-of-stdvector
if (MSVC)
if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
add_compile_definitions(_ITERATOR_DEBUG_LEVEL=0)
else()
add_compile_options(/Zc:nrvo-)
endif()
endif()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only do this when Vulkan is enabled? This seems like it should apply to all C++ code compiled by MSVC or clang-cl.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this problem in other backends (i.e. cpu), and this is a workaround that causes a lost of debug information, so I don't want to fix something thats not break

Copy link
Collaborator

@cebtenzzre cebtenzzre May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprised that /Zc:nrvo- even has the intended effect because in all of the linked StackOverflow examples, the problematic vector (or object) is being returned (hence, Named Return Value Optimization applies) and the debugging is incorrect between the declaration and the time it is returned (because the storage is actually elsewhere).

The Kompute backend definitely returns vectors and is affected by this, but I don't see where the Vulkan backend does.

Copy link
Collaborator

@slaren slaren May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that disabling NRVO doesn't make much sense here. @Adriankhl how can I reproduce this issue? I ran a test with a debug build with cl 19.29.30154 and I didn't get any errors.

Copy link
Contributor Author

@Adriankhl Adriankhl May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@slaren I did some MSVC updates in the past few days and the error is now gone, and I failed to pinpoint the problematic version. Now I can only reproduce the issue by clang-cl:

cmake .. -GNinja -DCMAKE_C_COMPILER=clang-cl -DCMAKE_CXX_COMPILER=clang-cl -DCMAKE_EXPORT_COMPILE_COMMANDS=1 -DLLAMA_NATIVE=OFF -DLLAMA_VULKAN=ON -DCMAKE_BUILD_TYPE=Debug

ninja -j6

.\bin\embedding.exe -m ..\..\..\models\mxbai-embed-large-v1-f16.gguf -p "Antibiotics are a type of medication used to treat bacterial infections. They work by either killing the bacteria or preventing them from reproducing, allowing the body's immune system to fight off the infection. Antibiotics are usually taken orally in the form of pills, capsules, or liquid solutions, or sometimes administered intravenously. They are not effective against viral infections, and using them inappropriately can lead to antibiotic resistance.`nI love cat"

So I have updated my PR to only deal with clang-cl

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I was able to reproduce this with clang-cl.

@Adriankhl Adriankhl changed the title vulkan: fix MSVC debug build vulkan: fix clang-cl debug build May 22, 2024
@slaren slaren merged commit fcda112 into ggerganov:master May 22, 2024
59 of 70 checks passed
teleprint-me pushed a commit to teleprint-me/llama.cpp that referenced this pull request May 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Compilation issues Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix Vulkan Issues specific to the Vulkan backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants