
Conversation

@Leoyzen (Contributor) commented Jan 20, 2025

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Automatic Prefix Caching is very useful for model serving.
This PR adds an argument (`--enable_prefix_caching`) for both the infer and deploy modes. Prefix caching is disabled by default.

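The off-by-default behavior described above can be sketched with a minimal stdlib `argparse` example. This is illustrative only, not the actual swift code; the flag name is taken from the PR, everything else is hypothetical:

```python
import argparse

# Illustrative parser for vLLM engine options; only the flag name
# (--enable_prefix_caching) comes from the PR itself.
parser = argparse.ArgumentParser(description="vLLM engine options (sketch)")
parser.add_argument(
    "--enable_prefix_caching",
    action="store_true",
    default=False,  # matches the PR: prefix caching is disabled by default
    help="Enable vLLM Automatic Prefix Caching",
)

# Without the flag, caching stays off; passing the flag turns it on.
default_args = parser.parse_args([])
enabled_args = parser.parse_args(["--enable_prefix_caching"])
print(default_args.enable_prefix_caching, enabled_args.enable_prefix_caching)
```

Downstream, the parsed value would typically be forwarded to the engine constructor (vLLM's `LLM` class accepts an `enable_prefix_caching` keyword in recent versions).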
Experiment results

@Leoyzen Leoyzen force-pushed the feature/vllm-prefix-caching branch from 180f9a2 to 0b19a64 Compare January 22, 2025 14:34
@Leoyzen Leoyzen changed the title from "[WIP]add "enable_prefix_caching" args for vllm engine." to "add "enable_prefix_caching" args for vllm engine." Jan 22, 2025
@Jintao-Huang (Collaborator) commented:

thanks!

@Jintao-Huang Jintao-Huang merged commit e02ebfd into modelscope:main Jan 23, 2025
2 checks passed
tastelikefeet added a commit to tastelikefeet/swift that referenced this pull request Jan 23, 2025
…-qwen-prm

* commit '6524bcc5caf7b63307f458fe45356ad18bf8f3b1': (21 commits)
  Fix vllm docs link & fix web-ui (modelscope#2970)
  add "enable_prefix_caching" args for vllm engine. (modelscope#2939)
  fix install_all.sh
  ppo compat transformers>=4.47.* (modelscope#2964)
  fix seq_cls patcher (modelscope#2963)
  fix max_length error print (modelscope#2960)
  update quant_mllm shell (modelscope#2959)
  update web-ui images (modelscope#2958)
  update requirements (modelscope#2957)
  fix bugs (modelscope#2954)
  fix citest (modelscope#2953)
  fix infer_stream (modelscope#2952)
  fix demo_hf (modelscope#2951)
  support deepseek_r1_distill (modelscope#2946)
  Fix mllm seq cls (modelscope#2945)
  Support minimax (modelscope#2943)
  Fix quant template (modelscope#2942)
  support deepseek-ai/DeepSeek-R1 (modelscope#2940)
  fix bugs (modelscope#2938)
  Support mllm seq_cls/rm (modelscope#2934)
  ...
2 participants