
Using EAGLE will slow down inference #73

Closed
zkqq opened this issue May 20, 2024 · 7 comments
zkqq commented May 20, 2024

Thank you very much for your work on EAGLE; it has been extremely helpful to me.

I have a question: why does downloading yuhuili/EAGLE-Vicuna-7B-v1.3 from Hugging Face and using it directly to accelerate lmsys/vicuna-7b-v1.3 actually slow inference down? Using my own trained EAGLE head, however, does produce a speedup. Could you please tell me where I went wrong?

Below is a screenshot of my operation.

I would greatly appreciate any assistance you can provide in resolving this issue. Thank you very much.

[screenshots of the commands and inference output omitted]

cdliang11 commented
Maybe try temperature=0.


zkqq commented May 21, 2024

> Maybe try temperature=0.

Thank you very much for the advice. However, I get the same result regardless of the temperature.

[screenshots omitted]

Liyuhui-12 (Collaborator) commented May 26, 2024

@zkqq The correct drafts will be displayed in yellow. I noticed that there are almost no yellow words in your image. You may not have correctly matched the draft model with the base model, or you did not set the --model-type parameter. Its default value is llama-2-chat, and it must be changed to vicuna.
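As a concrete check, the invocation below shows how such a matched Vicuna pair and the --model-type override would look. The entry point and flag names are taken from the EAGLE README as we recall it, so treat the exact module path and flag spellings as assumptions to verify against your checkout.

```shell
# Hypothetical invocation (verify the module path and flag names against
# your EAGLE checkout): the draft head and the base model must be a
# matched pair, and --model-type must be changed from its
# llama-2-chat default to vicuna.
python -m eagle.application.webui \
    --ea-model-path yuhuili/EAGLE-Vicuna-7B-v1.3 \
    --base-model-path lmsys/vicuna-7b-v1.3 \
    --model-type vicuna
```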


zkqq commented May 26, 2024

> yuhuili/EAGLE-Vicuna-7B-v1.3

Thank you very much for your reply. You are correct; the issue likely stems from a mismatch between the EAGLE head and the original model. However, I believe I have configured all the necessary parameters, including the model type.

I trained an EAGLE head myself, ran webui.py and the evaluation, and observed good acceleration. But when I switch back to the EAGLE head from yuhuili/EAGLE-Vicuna-7B-v1.3, inference slows down. Both config.json files are identical; the only difference is the pytorch_model.bin file.

Liyuhui-12 (Collaborator) commented
No issues were encountered when using yuhuili/EAGLE-Vicuna-7B-v1.3, but there are issues with the weights you trained yourself?


zkqq commented May 28, 2024

> yuhuili/EAGLE-Vicuna-7B-v1.3

It is the opposite: the weights I trained myself work fine, while the yuhuili/EAGLE-Vicuna-7B-v1.3 weights cause a slowdown.

Liyuhui-12 (Collaborator) commented

The likely reason is that your base model's chat template or weights differ from those we used when training the draft model.
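One way to test this hypothesis is to diff your local base-model checkpoint against the official lmsys/vicuna-7b-v1.3 release, parameter by parameter. The sketch below uses plain Python lists in place of tensors for brevity; with real checkpoints you would load each pytorch_model.bin with torch.load and compare tensors the same way. The helper name `max_weight_diff` is ours, not part of EAGLE.

```python
# Simplified sketch: compare two "state dicts" (here, plain dicts mapping
# parameter names to lists of floats) and report the largest elementwise
# difference per parameter. A large difference on shared weights would
# explain why the official draft head rarely matches the base model.

def max_weight_diff(sd_a, sd_b):
    """Return {param_name: max abs elementwise difference}."""
    if sd_a.keys() != sd_b.keys():
        raise ValueError("checkpoints have different parameter sets")
    diffs = {}
    for name in sd_a:
        a, b = sd_a[name], sd_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for {name}")
        diffs[name] = max(abs(x - y) for x, y in zip(a, b))
    return diffs

# Example: identical embeddings, drifted lm_head.
ours     = {"embed": [0.1, 0.2], "lm_head": [1.0, 2.0]}
official = {"embed": [0.1, 0.2], "lm_head": [1.0, 2.5]}
print(max_weight_diff(ours, official))
# → {'embed': 0.0, 'lm_head': 0.5}
```

If every parameter diff is (near) zero, the weights match and the chat template is the more likely culprit.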
