
Conversation

@guschmue
Contributor

No description provided.

@fs-eire fs-eire merged commit 3788b18 into main May 17, 2024
@fs-eire fs-eire deleted the gs/chat branch May 17, 2024 00:33
@bekatan

bekatan commented Jun 25, 2024

Error: previous buffer is not registered

The example chatbot can retain context from previous messages in the chat if the new message is sent with "Ctrl + Enter". To my understanding, this way the LLM receives a bigger input_ids with tokens that represent the previous messages as well as the new message. When I try "Ctrl + Enter" for the second message, after the first response token is calculated and shown, I get Error: previous buffer is not registered. Also, during inference I noticed that the 3D graph in Task Manager/Performance/GPU shows a steep rise up to 100%, at which point the mentioned error is thrown. Dedicated GPU memory usage in the meantime stays around 50-60%.

[screenshot: Task Manager GPU utilization graph during inference]

I am guessing this is related to GPU buffer management. Are there any tricks to make it more memory efficient?

What is peculiar is that the error is not consistent with the size of the input_ids. In the image above, the first bump is caused by a 500+ token input without continuation, and it ran fine. But the third bump is a 90-token input with continuation, and it throws Error: previous buffer is not registered.

What is causing this? How can it be fixed?
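For reference, the continuation path described above boils down to re-encoding the whole conversation into one long input_ids. A minimal sketch, assuming a hypothetical tokenize helper (not the chat example's actual code):

```ts
// Minimal sketch of the "Ctrl + Enter" continuation path described above.
// `tokenize` and `messages` are illustrative placeholders, not the chat
// example's actual code; the point is only that the whole conversation is
// re-encoded, so input_ids grows with every turn.
import * as ort from "onnxruntime-web/webgpu";

declare function tokenize(text: string): number[]; // hypothetical tokenizer helper

function buildFeeds(messages: string[]): Record<string, ort.Tensor> {
  // Re-tokenize the full history plus the new message into one long sequence.
  const ids = messages.flatMap((m) => tokenize(m));
  const seqLen = ids.length;
  const inputIds = new ort.Tensor("int64", BigInt64Array.from(ids.map(BigInt)), [1, seqLen]);
  const positionIds = new ort.Tensor(
    "int64",
    BigInt64Array.from({ length: seqLen }, (_, i) => BigInt(i)),
    [1, seqLen],
  );
  // Without kv-cache reuse, past_sequence_length is 0 and the mask covers
  // only the freshly encoded tokens.
  const attentionMask = new ort.Tensor("int64", new BigInt64Array(seqLen).fill(1n), [1, seqLen]);
  return { input_ids: inputIds, position_ids: positionIds, attention_mask: attentionMask };
}
```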

OS: Windows 11
GPU: RTX 4060 (8 GB VRAM)
Browser: Chrome 126.0.6478.63 (Official Build) (64-bit)

@guschmue
Contributor Author

Continuation of the dialog is not really handled yet.
In theory we could use the kv_cache to avoid processing the full prompt again, but I ran into some issues with the model (at least I think the issue is with the model itself).
I need to find some time to look into that.
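For illustration, a rough sketch of that kv_cache path, assuming the usual past_key_values.i.key / present.i.key input/output naming and the 32-layer, 32-head, head-size-96 shapes quoted in the next comment (the helper and names are illustrative, not the example's actual code):

```ts
// Rough sketch of the kv_cache idea: after the first run, keep the present.*
// outputs and feed them back as past_key_values.*, so only the new tokens go
// through input_ids. Layer count and shapes below are assumptions matching
// the dims quoted in the next comment; actual names depend on the model export.
import * as ort from "onnxruntime-web/webgpu";

const NUM_LAYERS = 32; // illustrative; must match the actual model

function buildContinuationFeeds(
  newTokenIds: bigint[],
  pastSeqLen: number,
  pastKeyValues: Record<string, ort.Tensor>, // present.* outputs of the previous run
): Record<string, ort.Tensor> {
  const seqLen = newTokenIds.length;
  const feeds: Record<string, ort.Tensor> = {
    // Only the new tokens; the prompt processed earlier lives in the cache.
    input_ids: new ort.Tensor("int64", BigInt64Array.from(newTokenIds), [1, seqLen]),
    // Positions continue where the previous run stopped.
    position_ids: new ort.Tensor(
      "int64",
      BigInt64Array.from({ length: seqLen }, (_, i) => BigInt(pastSeqLen + i)),
      [1, seqLen],
    ),
    // The mask must cover past + new tokens, i.e. [1, seq_length + past_sequence_length].
    attention_mask: new ort.Tensor(
      "int64",
      new BigInt64Array(pastSeqLen + seqLen).fill(1n),
      [1, pastSeqLen + seqLen],
    ),
  };
  for (let i = 0; i < NUM_LAYERS; i++) {
    feeds[`past_key_values.${i}.key`] = pastKeyValues[`present.${i}.key`];     // [1, 32, past, 96]
    feeds[`past_key_values.${i}.value`] = pastKeyValues[`present.${i}.value`]; // [1, 32, past, 96]
  }
  return feeds;
}
```

The attention_mask spanning past + new tokens is exactly the [1, seq_length + past_sequence_length] shape listed in the error quoted below.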

@bekatan

bekatan commented Jun 26, 2024

Are you referring to Error: [WebGPU] Kernel "[Expand] /model/attn_mask_reformat/input_ids_subgraph/Expand" failed. Error: Expand requires shape to be broadcastable to input, with shapes in the feed like

input_ids dims: [1, seq_length]
position_ids dims: [1, seq_length]
attention_mask dims: [1, seq_length + past_sequence_length]
past_key_values.i.key dims: [1, 32, past_sequence_length, 96]
past_key_values.i.value dims: [1, 32, past_sequence_length, 96]

mentioned here?

@guschmue
Contributor Author

yes, that is the one
