
Conversation

@guschmue
Contributor

No description provided.

@fs-eire fs-eire merged commit 3788b18 into main May 17, 2024
@fs-eire fs-eire deleted the gs/chat branch May 17, 2024 00:33
@bekatan

bekatan commented Jun 25, 2024

Error: previous buffer is not registered

The example chatbot can retain context from previous messages in the chat if the new message is sent with "Ctrl + Enter". To my understanding, this way the LLM receives a bigger input_ids with tokens that represent the previous messages as well as the new message. When I try "Ctrl + Enter" for the second message, after the first response token is calculated and shown, I get Error: previous buffer is not registered. Also, during inference I noticed that the 3D graph in Task Manager/Performance/GPU shows a steep rise up to 100%, at which point the mentioned error is thrown. Dedicated GPU memory usage in the meantime stays around 50-60%.

[screenshot: Task Manager GPU utilization graph during inference]

I am guessing this is related to GPU buffer management. Are there any tricks to make it more memory efficient?

What is peculiar is that the error is not consistent with the size of the input_ids. In the image above, the first bump is caused by a 500+ token input without continuation, and it ran fine. But the third bump is a 90-token input with continuation, and it throws Error: previous buffer is not registered.

What is causing this? How can it be fixed?
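For reference, the continuation path described above boils down to re-encoding the whole conversation into one long input_ids. A minimal sketch, assuming a hypothetical tokenize helper (not the chat example's actual code):

```ts
// Minimal sketch of the "Ctrl + Enter" continuation path described above.
// `tokenize` and `messages` are illustrative placeholders, not the chat
// example's actual code; the point is only that the whole conversation is
// re-encoded, so input_ids grows with every turn.
import * as ort from "onnxruntime-web/webgpu";

declare function tokenize(text: string): number[]; // hypothetical tokenizer helper

function buildFeeds(messages: string[]): Record<string, ort.Tensor> {
  // Re-tokenize the full history plus the new message into one long sequence.
  const ids = messages.flatMap((m) => tokenize(m));
  const seqLen = ids.length;
  const inputIds = new ort.Tensor("int64", BigInt64Array.from(ids.map(BigInt)), [1, seqLen]);
  const positionIds = new ort.Tensor(
    "int64",
    BigInt64Array.from({ length: seqLen }, (_, i) => BigInt(i)),
    [1, seqLen],
  );
  // Without kv-cache reuse, past_sequence_length is 0 and the mask covers
  // only the freshly encoded tokens.
  const attentionMask = new ort.Tensor("int64", new BigInt64Array(seqLen).fill(1n), [1, seqLen]);
  return { input_ids: inputIds, position_ids: positionIds, attention_mask: attentionMask };
}
```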

OS: Windows 11
GPU: RTX 4060 (8 GB VRAM)
Browser: Chrome 126.0.6478.63 (Official Build) (64-bit)

@guschmue
Contributor Author

Continuation of the dialog is not really handled yet.
In theory we could use the kv_cache to avoid processing the full prompt again, but I ran into some issues with the model (at least I think the issue is with the model itself).
I need to find some time to look into that.
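For illustration, a rough sketch of that kv_cache path, assuming the usual past_key_values.i.key / present.i.key input/output naming and the 32-layer, 32-head, head-size-96 shapes quoted in the next comment (the helper and names are illustrative, not the example's actual code):

```ts
// Rough sketch of the kv_cache idea: after the first run, keep the present.*
// outputs and feed them back as past_key_values.*, so only the new tokens go
// through input_ids. Layer count and shapes below are assumptions matching
// the dims quoted in the next comment; actual names depend on the model export.
import * as ort from "onnxruntime-web/webgpu";

const NUM_LAYERS = 32; // illustrative; must match the actual model

function buildContinuationFeeds(
  newTokenIds: bigint[],
  pastSeqLen: number,
  pastKeyValues: Record<string, ort.Tensor>, // present.* outputs of the previous run
): Record<string, ort.Tensor> {
  const seqLen = newTokenIds.length;
  const feeds: Record<string, ort.Tensor> = {
    // Only the new tokens; the prompt processed earlier lives in the cache.
    input_ids: new ort.Tensor("int64", BigInt64Array.from(newTokenIds), [1, seqLen]),
    // Positions continue where the previous run stopped.
    position_ids: new ort.Tensor(
      "int64",
      BigInt64Array.from({ length: seqLen }, (_, i) => BigInt(pastSeqLen + i)),
      [1, seqLen],
    ),
    // The mask must cover past + new tokens, i.e. [1, seq_length + past_sequence_length].
    attention_mask: new ort.Tensor(
      "int64",
      new BigInt64Array(pastSeqLen + seqLen).fill(1n),
      [1, pastSeqLen + seqLen],
    ),
  };
  for (let i = 0; i < NUM_LAYERS; i++) {
    feeds[`past_key_values.${i}.key`] = pastKeyValues[`present.${i}.key`];     // [1, 32, past, 96]
    feeds[`past_key_values.${i}.value`] = pastKeyValues[`present.${i}.value`]; // [1, 32, past, 96]
  }
  return feeds;
}
```

The attention_mask spanning past + new tokens is exactly the [1, seq_length + past_sequence_length] shape listed in the error quoted below.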

@bekatan

bekatan commented Jun 26, 2024

Are you referring to Error: [WebGPU] Kernel "[Expand] /model/attn_mask_reformat/input_ids_subgraph/Expand" failed. Error: Expand requires shape to be broadcastable to input, with shapes in the feed like

input_ids dims: [1, seq_length]
position_ids dims: [1, seq_length]
attention_mask dims: [1, seq_length + past_sequence_length]
past_key_values.i.key dims: [1, 32, past_sequence_length, 96]
past_key_values.i.value dims: [1, 32, past_sequence_length, 96]

mentioned here?

@guschmue
Contributor Author

yes, that is the one
