
Remove non-HF ExLlamaV2 loader #5431

Merged · 2 commits merged into dev from remove-exllamav2 on Feb 4, 2024
Conversation

oobabooga (Owner)

Since PR #4814, the speed difference between ExLlamav2 and ExLlamav2_HF is zero. So I see no point in keeping the non-HF version, which is redundant and which samples in a way not guaranteed to be consistent with HF transformers sampling.
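For context, the non-HF path samples with exllamav2's own built-in generator rather than through transformers. A minimal sketch of that path, assuming the exllamav2 0.0.x generator API and a hypothetical model path:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/my-exl2-model"  # hypothetical path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

# exllamav2's own sampler: implemented inside the library, so its
# behavior is not guaranteed to match transformers' logits processors.
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7  # illustrative values
settings.top_p = 0.9

print(generator.generate_simple("Hello", settings, num_tokens=64))
```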

@oobabooga oobabooga merged commit cde000d into dev Feb 4, 2024
@oobabooga oobabooga deleted the remove-exllamav2 branch February 4, 2024 04:16
@sgsdxzy (Contributor) commented Feb 4, 2024

Won't this cause problems for #5375?

@aikitoria

It's not true that there is zero speed difference: the non-HF loader is around 10% faster for goliath-120b.

@aikitoria commented Feb 6, 2024

Quick bench using ooba from before this commit and the exllamav2 master branch from 5 minutes ago, on a RunPod A100 80GB. Using the new exllamav2 version here because it reverts the performance regression introduced in 0.0.12.

HF:

Output generated in 8.76 seconds (14.50 tokens/s, 127 tokens, context 1728, seed 735928511)
Output generated in 9.22 seconds (13.77 tokens/s, 127 tokens, context 1728, seed 83286885)
Output generated in 8.99 seconds (14.13 tokens/s, 127 tokens, context 1728, seed 128023280)
Output generated in 8.78 seconds (14.47 tokens/s, 127 tokens, context 1728, seed 1418661767)

Non-HF:

Output generated in 8.17 seconds (15.67 tokens/s, 128 tokens, context 1728, seed 745431605)
Output generated in 8.18 seconds (15.65 tokens/s, 128 tokens, context 1728, seed 762707583)
Output generated in 8.18 seconds (15.64 tokens/s, 128 tokens, context 1728, seed 996129951)
Output generated in 8.18 seconds (15.64 tokens/s, 128 tokens, context 1728, seed 700382800)
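Averaging the four runs: HF ≈ (14.50 + 13.77 + 14.13 + 14.47) / 4 ≈ 14.22 tokens/s versus ≈ 15.65 tokens/s for non-HF, i.e. roughly a 10% gap, matching the figure quoted above.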

@aikitoria

> not guaranteed to be consistent with HF transformers sampling

Why is this important, if the built-in sampling in exllamav2 works fine?
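For reference, "HF transformers sampling" refers to the logits-processor pipeline that transformers applies during generation; the HF loader exposes ExLlamaV2's raw logits through a transformers-compatible forward() so that this pipeline does the sampling, instead of exllamav2's internal sampler. A minimal sketch of the transformers side (parameter values are illustrative, not taken from the thread):

```python
import torch
from transformers import LogitsProcessorList, TemperatureLogitsWarper, TopPLogitsWarper

# The same processor chain runs for every HF-wrapped loader, so
# identical settings and seeds yield identical sampling math.
processors = LogitsProcessorList([
    TemperatureLogitsWarper(0.7),  # illustrative values
    TopPLogitsWarper(0.9),
])

input_ids = torch.tensor([[1, 2, 3]])   # dummy token ids
logits = torch.randn(1, 32000)          # dummy vocabulary logits
warped = processors(input_ids, logits)
next_token = torch.multinomial(torch.softmax(warped, dim=-1), num_samples=1)
```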

oobabooga added a commit that referenced this pull request Feb 6, 2024
@aikitoria

Thanks for restoring it!

PoetOnTheRun pushed a commit to PoetOnTheRun/text-generation-webui that referenced this pull request Feb 22, 2024