use error #27

Open
zky001 opened this issue Feb 27, 2024 · 1 comment

zky001 commented Feb 27, 2024

[screenshot] The model's max seq len (4096) is larger than the maximum number of tokens that can be stored in KV cache (1792). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
sherdencooper (Owner) commented
Hi, thanks for running our code. It looks like you are encountering an issue with vLLM. You could refer to vllm-project/vllm#2418 and try the solution mentioned there. Since vLLM's behavior may depend on your CUDA and torch versions, I cannot determine the exact fix for your case. If you still encounter issues with vLLM, you may switch to Hugging Face inference instead.
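
For reference, a minimal sketch of the two adjustments the error message itself suggests, assuming the model is loaded through vLLM's `LLM` class (the model name and the specific values here are placeholders, not part of this repository):

```python
from vllm import LLM

# Sketch only: "your-model-name" is a placeholder; tune the values to your GPU.
llm = LLM(
    model="your-model-name",
    gpu_memory_utilization=0.95,  # default is 0.9; raising it leaves more memory for the KV cache
    max_model_len=1792,           # or cap the context length at what the KV cache can hold
)
```

Either change alone may be enough: raising `gpu_memory_utilization` grows the KV cache, while lowering `max_model_len` shrinks the sequence length the cache must accommodate.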
