kompute : make partial tensor copies faster by syncing less data #15

cebtenzzre · 2024-02-13T19:13:51Z

This reduces the time to run save-load-state with orca-mini-3b-gguf2-q4_0.gguf fully offloaded on my P40 from 110.82s to 23.21s (roughly 4.8x faster).

Also, fix a bug where we were aligning tensors in ggml_vk_get_tensor without telling the caller.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
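To illustrate the two ideas in the description (syncing only the bytes a partial copy needs, and reporting any alignment adjustment back to the caller), here is a minimal sketch. It is not the code from this PR: `AlignedRange` and `align_partial_range` are hypothetical names, and the power-of-two alignment requirement is an assumption made for the example.

```cpp
#include <cstdint>

// Hypothetical illustration only; these names are not ggml or Kompute API.
struct AlignedRange {
    std::uint64_t aligned_offset; // start of the synced region (a multiple of alignment)
    std::uint64_t delta;          // bytes between aligned_offset and the requested offset
    std::uint64_t size;           // bytes to sync: delta + requested size
};

// Round `offset` down to a multiple of `alignment` (assumed to be a non-zero
// power of two) and report how far it moved, so the caller can locate its
// data inside the synced region instead of assuming it starts at byte 0.
inline AlignedRange align_partial_range(std::uint64_t offset,
                                        std::uint64_t size,
                                        std::uint64_t alignment) {
    const std::uint64_t aligned = offset & ~(alignment - 1);
    const std::uint64_t delta = offset - aligned;
    return AlignedRange{aligned, delta, delta + size};
}
```

With this shape, a caller copying 100 bytes at offset 1000 under a 256-byte alignment would sync 332 bytes starting at 768 and read its data 232 bytes into that region, rather than syncing the whole tensor. Dropping `delta` is the kind of mistake the "without telling the caller" bug describes.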

cebtenzzre added 2 commits February 13, 2024 13:57

llama : don't attempt to serialize empty KV cache

abde521

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

kompute : sync partial tensors for partial copies

0937224

Also, fix a bug where we were aligning tensors in ggml_vk_get_tensor without telling the caller.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

cebtenzzre requested a review from manyoso February 13, 2024 19:13

fix offset bug

4fea629

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

cebtenzzre changed the title ~~Improvements for llama_copy_state_data with Kompute~~ kompute : make partial tensor copies faster by syncing less data Feb 13, 2024

manyoso approved these changes Feb 13, 2024

cebtenzzre merged commit 96df17b into master Feb 13, 2024
50 of 54 checks passed

cebtenzzre added a commit that referenced this pull request Feb 21, 2024

kompute : make partial tensor copies faster by syncing less data (#15)

fa654d0

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

cebtenzzre added a commit that referenced this pull request Mar 13, 2024

kompute : make partial tensor copies faster by syncing less data (#15)

4a06b01

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

cebtenzzre added a commit that referenced this pull request May 7, 2024

kompute : make partial tensor copies faster by syncing less data (#15)

725e98d

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

cebtenzzre added a commit that referenced this pull request May 8, 2024

kompute : make partial tensor copies faster by syncing less data (#15)

506c0ad

Signed-off-by: Jared Van Bortel <jared@nomic.ai>

cebtenzzre added a commit that referenced this pull request May 9, 2024

kompute : make partial tensor copies faster by syncing less data (#15)

5a648da

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
