
Using EAGLE will slow down inference #73

Closed
zkqq opened this issue May 20, 2024 · 7 comments
zkqq commented May 20, 2024

Thank you very much for your work on EAGLE; it has been extremely helpful to me.

I have a question: why does downloading yuhuili/EAGLE-Vicuna-7B-v1.3 from Hugging Face and using it directly to accelerate lmsys/vicuna-7b-v1.3 actually slow inference down? Using my own trained EAGLE head, however, does produce a speedup. Could you please tell me where I went wrong?

Below is a screenshot of my operation.

I would greatly appreciate any assistance you can provide in resolving this issue. Thank you very much.

[screenshots of the commands and inference output omitted]

cdliang11 commented
Maybe try temperature=0.


zkqq commented May 21, 2024

> Maybe try temperature=0.

Thank you very much for the advice. However, I get the same result regardless of the temperature.

[screenshots omitted]

Liyuhui-12 (Collaborator) commented May 26, 2024

@zkqq The correct drafts will be displayed in yellow. I noticed that there are almost no yellow words in your image. You may not have correctly matched the draft model with the base model, or you did not set the --model-type parameter. Its default value is llama-2-chat, and it must be changed to vicuna.
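As a concrete check, the invocation below shows how such a matched Vicuna pair and the --model-type override would look. The entry point and flag names are taken from the EAGLE README as we recall it, so treat the exact module path and flag spellings as assumptions to verify against your checkout.

```shell
# Hypothetical invocation (verify the module path and flag names against
# your EAGLE checkout): the draft head and the base model must be a
# matched pair, and --model-type must be changed from its
# llama-2-chat default to vicuna.
python -m eagle.application.webui \
    --ea-model-path yuhuili/EAGLE-Vicuna-7B-v1.3 \
    --base-model-path lmsys/vicuna-7b-v1.3 \
    --model-type vicuna
```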


zkqq commented May 26, 2024

> yuhuili/EAGLE-Vicuna-7B-v1.3

Thank you very much for your reply. You are correct; the issue likely stems from a mismatch between the EAGLE head and the original model. However, I believe I have configured all the necessary parameters, including the model type.

I trained an EAGLE head myself, ran webui.py and the evaluation, and observed good acceleration. But when I switch back to the EAGLE head from yuhuili/EAGLE-Vicuna-7B-v1.3, inference slows down. Both config.json files are identical; the only difference is the pytorch_model.bin file.

Liyuhui-12 (Collaborator) commented
No issues were encountered when using yuhuili/EAGLE-Vicuna-7B-v1.3, but there are issues with the weights you trained yourself?


zkqq commented May 28, 2024

> yuhuili/EAGLE-Vicuna-7B-v1.3

It is the opposite: the weights I trained myself work fine, while the yuhuili/EAGLE-Vicuna-7B-v1.3 weights cause a slowdown.

Liyuhui-12 (Collaborator) commented

The likely reason is that your base model's chat template or weights differ from those we used when training the draft model.
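One way to test this hypothesis is to diff your local base-model checkpoint against the official lmsys/vicuna-7b-v1.3 release, parameter by parameter. The sketch below uses plain Python lists in place of tensors for brevity; with real checkpoints you would load each pytorch_model.bin with torch.load and compare tensors the same way. The helper name `max_weight_diff` is ours, not part of EAGLE.

```python
# Simplified sketch: compare two "state dicts" (here, plain dicts mapping
# parameter names to lists of floats) and report the largest elementwise
# difference per parameter. A large difference on shared weights would
# explain why the official draft head rarely matches the base model.

def max_weight_diff(sd_a, sd_b):
    """Return {param_name: max abs elementwise difference}."""
    if sd_a.keys() != sd_b.keys():
        raise ValueError("checkpoints have different parameter sets")
    diffs = {}
    for name in sd_a:
        a, b = sd_a[name], sd_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for {name}")
        diffs[name] = max(abs(x - y) for x, y in zip(a, b))
    return diffs

# Example: identical embeddings, drifted lm_head.
ours     = {"embed": [0.1, 0.2], "lm_head": [1.0, 2.0]}
official = {"embed": [0.1, 0.2], "lm_head": [1.0, 2.5]}
print(max_weight_diff(ours, official))
# → {'embed': 0.0, 'lm_head': 0.5}
```

If every parameter diff is (near) zero, the weights match and the chat template is the more likely culprit.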
