sarkar/cache_optimization #1028

ssarkar2 · 2024-05-31T18:55:33Z

What does this PR do?

Brings PR #942 up to date with main, with some fixes.

Test using:

python run_generation.py --model_name_or_path /mnt/weka/data/llama_inference/Llama-2-7b-hf --use_hpu_graphs --use_kv_cache --max_input_tokens 1024 --max_new_tokens 3072 --batch_size 39 --attn_softmax_bf16 --trim_logits --bf16 --warmup 2 --n_iterations 2 --limit_hpu_graphs --bucket_internal --bucket_size 128

Note that the command above does not pass --reuse_cache.

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

ssarkar2

Needs a README change for Llama to mention that --reuse_cache isn't needed.

HuggingFaceDocBuilderDev · 2024-05-31T19:02:45Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

Initial commit

0f26ba2

ssarkar2 requested review from bhargaveede, vivekgoe, mandy-li, libinta, dvarshney-habana and regisss as code owners May 31, 2024 18:55

ssarkar2 commented May 31, 2024

style

25d3130

add missing comment

ad5959b

ssarkar2 added the synapse1.16 label May 31, 2024

indent fix

a3f301f

hsubramony added a commit that referenced this pull request May 31, 2024

sarkar/cache_optimization #1028

1084e85

Remove prints

693fbf1

regisss approved these changes Jun 6, 2024

regisss merged commit f32d485 into main Jun 6, 2024
7 of 9 checks passed

regisss deleted the sarkar/pr942_cache_optimization branch June 6, 2024 22:20

ssarkar2 mentioned this pull request Jun 10, 2024

Prefill kvcache upstream #942

Closed

3 tasks

This was referenced Jun 11, 2024

Use KV cache till input seq len for prefill phase HabanaAI/optimum-habana-fork#154

Merged

Sampling search UseKV cache till input seq len for prefill phase HabanaAI/optimum-habana-fork#161

Merged

ssarkar2 mentioned this pull request Jun 11, 2024

Sasarkar/qwen optimization #1067

Closed

3 tasks

imangohari1 pushed a commit to imangohari1/optimum-habana that referenced this pull request Jun 13, 2024

Cache optimization (huggingface#1028)

b52f7ad

Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>

ssarkar2 mentioned this pull request Jul 18, 2024

Support bucket_internal for MPT #1137

Merged

3 tasks
