Conversation

@wizzard0 (Contributor) commented:

see #32 (comments)

@ggerganov merged commit b9bd1d0 into ggml-org:master on Mar 12, 2023

44670 pushed a commit to 44670/llama.cpp that referenced this pull request on Aug 2, 2023, with the following message:
* RAM usage reduction and calculations
Removed the -b batch-size limit of 1024 (tested up to -b 8192)
Fixed an integer overflow in the ggml matmul (occurred at around n_batch 3000; see the sketch after this message)
Added a dynamic calculation of batched scratch memory consumption (also sketched below)
Overall reduced RAM buffer sizes by orders of magnitude for normal settings
RAM usage scales with the product of context size and batch size, i.e. quadratically when both grow together
Using a small batch size (or the default of 1) results in a very small memory footprint even with thousands of tokens processed
Tested up to a 13,000-token prompt with an 8k batch size
Still needs more testing on various platforms

* removed debug

* minor

---------
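The commit message does not show the actual overflow fix, so the following is a minimal sketch of the bug class it describes, assuming the byte size of a mat-mul intermediate was accumulated in 32-bit int arithmetic and the fix is to widen before multiplying. All dimension values and names here are illustrative, not llama.cpp's actual code.

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Illustrative tensor dimensions (assumed): a per-head attention
     * score matrix of n_batch x n_ctx floats. */
    const int n_batch = 3000;   /* batch size where the report saw the overflow */
    const int n_ctx   = 13000;  /* context length from the test above */
    const int n_head  = 32;     /* head count (assumed) */

    /* 3000 * 13000 * 32 = 1,248,000,000 elements still fits in int32,
     * but multiplying by sizeof(float) exceeds INT32_MAX: signed
     * overflow, which is undefined behavior in C and wraps on
     * typical targets. */
    int    nbytes_bad  = n_batch * n_ctx * n_head * (int) sizeof(float);

    /* Widening the first operand makes the whole product 64-bit. */
    size_t nbytes_good = (size_t) n_batch * n_ctx * n_head * sizeof(float);

    printf("32-bit size: %d bytes (wrapped)\n", nbytes_bad);
    printf("64-bit size: %zu bytes\n", nbytes_good);
    return 0;
}
```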
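The "dynamic calculation of batched scratch memory" is likewise not shown in this thread. The sketch below illustrates the general idea only: size the scratch buffer from the runtime n_ctx and n_batch instead of a fixed worst-case constant. The function name, the terms, and the constants are assumptions for illustration, not the PR's actual formula.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical scratch-size estimate. The dominant term is the attention
 * score matrix (n_batch x n_ctx floats per head), so the result grows with
 * the product n_ctx * n_batch; a fixed buffer sized for the worst case
 * would be orders of magnitude too large at small batch sizes. */
static size_t scratch_size_est(int n_ctx, int n_batch, int n_head, int n_embd) {
    size_t attn     = (size_t) n_batch * n_ctx * n_head * sizeof(float); /* attention scores */
    size_t ffn      = (size_t) n_batch * n_embd * 4 * sizeof(float);     /* feed-forward activations (factor assumed) */
    size_t overhead = (size_t) 1 << 20;                                  /* metadata/alignment slack (assumed) */
    return attn + ffn + overhead;
}

int main(void) {
    /* Roughly 7B-model settings (assumed): 32 heads, 4096 embedding dim. */
    printf("n_ctx=2048,  n_batch=1:    %zu bytes\n", scratch_size_est(2048, 1, 32, 4096));
    printf("n_ctx=13000, n_batch=8192: %zu bytes\n", scratch_size_est(13000, 8192, 32, 4096));
    return 0;
}
```

Computing this at evaluation time, when n_batch is known, is what lets the default small-batch case avoid paying for the 8k-batch worst case.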
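To put numbers on the scaling claim, using the assumed formula above: the attention term at n_ctx = 2048, n_batch = 8, n_head = 32 is 2048 · 8 · 32 · 4 B ≈ 2 MB; at n_ctx = 13,000 with n_batch = 8192 it grows to about 13.6 GB; and dropping back to n_batch = 1 at the same context needs only about 1.7 MB. Context and batch multiply, so growing both together gives the quadratic-like blow-up the commit describes, while a small batch keeps the footprint tiny even at long contexts.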