
Experiencing 2-3 GB GPU memory use increase compared to llama.cpp version a few weeks ago #6909

Closed
zsogitbe opened this issue Apr 25, 2024 · 3 comments

zsogitbe commented Apr 25, 2024

I am wondering what has happened and whether we can do something about it. Is this some kind of memory pool that has grown in size? Can we reduce this size if we want to? I noticed the issue with a model that used to fit into my GPU: it now reports out of memory when I offload all layers to the GPU.
@slaren, is it possible that this has something to do with the work you have done recently on managing GPU memory?

Will selecting the LLAMA_CUDA_F16 option at compile time decrease GPU memory use during inference?
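For reference, this is roughly how that option can be enabled at build time (a sketch; the exact option names are an assumption and depend on the build-system version in the tree):

```sh
# Make build (option names as of early 2024; assumed still present)
make clean
make LLAMA_CUDA=1 LLAMA_CUDA_F16=1

# CMake build, equivalent flags (assumption)
cmake -B build -DLLAMA_CUDA=ON -DLLAMA_CUDA_F16=ON
cmake --build build --config Release
```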

slaren (Collaborator) commented Apr 25, 2024

I don't remember doing anything recently with managing GPU memory, and I am not aware of any changes that could cause VRAM usage to increase. Try running a bisect to find the commit that introduced the issue.
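A bisect for this kind of regression could look roughly like the following (a sketch; the known-good commit and the VRAM check are assumptions that depend on the model and hardware):

```sh
# Assumes a known-good older commit hash (placeholder below) and a way to
# rebuild llama.cpp and measure VRAM usage at each step.
git bisect start
git bisect bad HEAD                  # current tree uses >13 GB
git bisect good <known-good-commit>  # e.g. the commit from the 1/3/2024 build
# For each commit git checks out: rebuild, run the model, then mark it with
#   git bisect good   (VRAM back at ~11 GB)
#   git bisect bad    (VRAM still >13 GB)
git bisect reset                     # return to the original branch when done
```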

zsogitbe (Author) commented Apr 25, 2024

I was just doing that. I have an "old" version on my computer from 1/3/2024 12:39 (I don't know the commit number); that version uses 11 GB with my model, while the latest version uses >13 GB.

What I meant above is your work in #6170 (March 20), which falls between 1/3/2024 and today.
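Since the exact commit of the 1/3/2024 build is not known, a starting point for the bisect could be recovered from the build date (a sketch; assumes the default master branch of the llama.cpp repository):

```sh
# Find the last commit on master before the 1/3/2024 12:39 build (assumed branch name)
git rev-list -1 --before="2024-03-01 12:39" origin/master
```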

github-actions bot (Contributor) commented Jun 9, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
