
Bug: gpu hang after bde7cd3cd949c1a85d3a199498ac98e78039d46f #7730

Closed
rhjdvsgsgks opened this issue Jun 4, 2024 · 4 comments · Fixed by #7806
Labels
bug-unconfirmed, medium severity (used to report medium severity bugs in llama.cpp, e.g. malfunctioning features that are still usable)

Comments

@rhjdvsgsgks
Contributor

rhjdvsgsgks commented Jun 4, 2024

What happened?

After bde7cd3, inferring with any llama3 Q6 model causes a GPU hang. The previous version (a5735e4) is not affected.

Name and Version

bde7cd3
using the Vulkan backend

What operating system are you seeing the problem on?

Linux

Relevant log output

radv/amdgpu: The CS has been cancelled because the context is lost. This context is innocent.
terminate called after throwing an instance of 'vk::DeviceLostError'
  what():  vk::Queue::submit: ErrorDeviceLost

dmesg

[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring comp_1.1.0 timeout, signaled seq=898, emitted seq=899
rhjdvsgsgks added the bug-unconfirmed and medium severity labels on Jun 4, 2024
@rhjdvsgsgks
Contributor Author

I found that VRAM does not increase significantly while inferring, so maybe something else caused the issue.

rhjdvsgsgks closed this as not planned on Jun 6, 2024
rhjdvsgsgks changed the title from "Bug: memory usage increased after update from d7e852c1b to 3b38d4860" to "Bug: gpu hang after bde7cd3cd949c1a85d3a199498ac98e78039d46f" on Jun 6, 2024
@rhjdvsgsgks
Contributor Author

Bisected and found the commit that caused the issue, so keeping it open.

rhjdvsgsgks reopened this on Jun 6, 2024
@rhjdvsgsgks
Contributor Author

See also #7769 and #7640.

@slaren
Collaborator

slaren commented Jun 6, 2024

@0cc4m this is probably my bad, I made some changes to the way views are initialized in ggml-backend that may have created this issue. Views are now initialized in the buffer of their parent tensor, instead of on the compute buffer. The reason I made this change is because I came to the conclusion that allocating views on the compute buffer cannot work reliably because the compute buffer is not always of the same type as the buffer used to allocate the tensor originally, and backends should be able to use the same extra as their parent anyway. I thought it was safe to make this change because the CUDA backend no longer needs extras for normal buffers, but I didn't realize that the vulkan backend still does.

Looking at the ggml_tensor_extra_gpu of the vulkan backend, I think it should be possible to do this; the only change is that you would have to calculate the offset as t->extra->offset + t->view_offs. Essentially, add the offset of the view to the offset of the extra. Does that sound right?
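A minimal sketch of what that could look like on the Vulkan side, assuming the extra keeps an offset field for the parent tensor's position in its buffer (the helper name and exact field names here are hypothetical, not the actual ggml-vulkan code):

// Hypothetical helper: computes where a tensor's data starts inside the
// device buffer that backs it, following the suggestion above.
static size_t ggml_vk_tensor_offset(const struct ggml_tensor * t) {
    // extra is assumed to be the backend's ggml_tensor_extra_gpu for the
    // buffer the tensor (or its parent, for views) was allocated in.
    const ggml_tensor_extra_gpu * extra = (const ggml_tensor_extra_gpu *) t->extra;
    // Views now share their parent's extra, so the view's own displacement
    // (t->view_offs) must be added to the parent's offset; for non-view
    // tensors view_offs is 0 and this reduces to extra->offset.
    return extra->offset + t->view_offs;
}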
