kompute : make partial tensor copies faster by syncing less data #15

Merged (3 commits) on Feb 13, 2024


cebtenzzre (Member) commented on Feb 13, 2024

This brings the time to run save-load-state with orca-mini-3b-gguf2-q4_0.gguf fully offloaded on my P40 from 110.82s to 23.21s.
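As a rough illustration of why syncing less data helps partial copies, here is a minimal sketch that compares syncing only the requested byte range against syncing the whole buffer. Everything in it (`FakeDeviceBuffer`, `read_partial`, `read_full_sync`, the `bytes_synced` counter) is hypothetical and only simulates device memory with host vectors; it is not the Kompute API or the code in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a device buffer with a host-visible staging mirror.
// Both sides are plain host memory here; a real backend would issue a transfer.
struct FakeDeviceBuffer {
    std::vector<uint8_t> device;   // pretend this lives in VRAM
    std::vector<uint8_t> staging;  // host-visible copy
    size_t bytes_synced = 0;       // total bytes moved device -> host

    explicit FakeDeviceBuffer(size_t n) : device(n), staging(n) {}

    // Copy the range [offset, offset + size) from "device" to staging.
    void sync_to_host(size_t offset, size_t size) {
        assert(offset <= device.size() && size <= device.size() - offset);
        std::memcpy(staging.data() + offset, device.data() + offset, size);
        bytes_synced += size;
    }
};

// Partial read that syncs only the bytes the caller asked for.
void read_partial(FakeDeviceBuffer & buf, void * dst, size_t offset, size_t size) {
    buf.sync_to_host(offset, size);
    std::memcpy(dst, buf.staging.data() + offset, size);
}

// Baseline for comparison: sync the entire buffer on every read.
void read_full_sync(FakeDeviceBuffer & buf, void * dst, size_t offset, size_t size) {
    buf.sync_to_host(0, buf.device.size());
    std::memcpy(dst, buf.staging.data() + offset, size);
}
```

In this toy model, reading 16 bytes from a 1024-byte buffer moves 16 bytes with `read_partial` and 1024 bytes with `read_full_sync`. When state is saved through many small reads, that difference is multiplied by the number of reads.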

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Also, fix a bug where we were aligning tensors in ggml_vk_get_tensor
without telling the caller.

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
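The alignment fix mentioned in the commit message above can be pictured with a small sketch: when a lookup rounds a tensor's offset down to an alignment boundary, it should also report how far the real data sits from that boundary, so the caller can compensate. The names `AlignedView` and `make_aligned_view` are hypothetical and are not the signature of `ggml_vk_get_tensor`; the power-of-two alignment is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of a tensor's placement inside a larger buffer.
struct AlignedView {
    size_t aligned_offset; // offset rounded down to the alignment boundary
    size_t align_delta;    // bytes between aligned_offset and the real data start
    size_t span;           // bytes to cover, starting at aligned_offset
};

// Round `offset` down to a multiple of `alignment` and report the difference,
// instead of silently handing back a view that starts before the tensor data.
// Assumption: `alignment` is a nonzero power of two.
inline AlignedView make_aligned_view(size_t offset, size_t size, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const size_t aligned = offset & ~(alignment - 1);
    const size_t delta   = offset - aligned;
    // The view must grow by `delta` so it still reaches the end of the tensor.
    return AlignedView{aligned, delta, size + delta};
}
```

For example, `make_aligned_view(100, 16, 64)` gives an aligned offset of 64, a delta of 36, and a span of 52. A caller that ignored the delta would read from byte 64 rather than byte 100, which is the kind of mismatch that can occur when the alignment is not reported.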
cebtenzzre changed the title from "Improvements for llama_copy_state_data with Kompute" to "kompute : make partial tensor copies faster by syncing less data" on Feb 13, 2024
cebtenzzre merged commit 96df17b into master on Feb 13, 2024
50 of 54 checks passed
cebtenzzre added a commit that referenced this pull request Feb 21, 2024
cebtenzzre added a commit that referenced this pull request Mar 13, 2024
cebtenzzre added a commit that referenced this pull request May 7, 2024
cebtenzzre added a commit that referenced this pull request May 8, 2024
cebtenzzre added a commit that referenced this pull request May 9, 2024