Commit
Merge changes from master to release-0.13 branch (#3698)
* upgrade vllm/transformers version (#3671)
  upgrade vllm version
  Signed-off-by: Johnu George <johnugeorge109@gmail.com>

* Add openai models endpoint (#3666)
  Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>

* feat: Support customizable deployment strategy for RawDeployment mode. Fixes #3452 (#3603)
  * feat: Support customizable deployment strategy for RawDeployment mode
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * regen
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * lint
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * Correctly apply rollingupdate
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * address comments
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * Add validation
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  ---------
  Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

* Enable dtype support for huggingface server (#3613)
  * Enable dtype for huggingface server
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Set float16 as default. Fixup linter
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Add small comment to make the changes understandable
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Fixup linter
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Adapt to new huggingfacemodel
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Fixup merge :)
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Explicitly mention the behaviour of dtype flag on auto.
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Default to FP32 for encoder models
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Selectively add --dtype to parser. Use FP16 for GPU and FP32 for CPU
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Fixup linter
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Update poetry
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  * Use torch.float32 for tests explicitly
    Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
  ---------
  Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>

* Add method for checking model health/readiness (#3673)
  Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>

* fix for extract zip from gcs (#3510)
  * fix for extract zip from gcs
    Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
  * initial commit for gcs model download unittests
    Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
  * unittests for model download from gcs
    Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
  * black format fix
    Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
  * code verification
    Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
  ---------
  Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>

* Update Dockerfile and Readme (#3676)
  Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

* Update huggingface readme (#3678)
  * update wording for huggingface README
    small update to make readme easier to understand
    Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
  * Update README.md
    Signed-off-by: Alexa Griffith agriffith50@bloomberg.net
  * Update python/huggingfaceserver/README.md
    Co-authored-by: Filippe Spolti <filippespolti@gmail.com>
    Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
  * update vllm
    Signed-off-by: alexagriffith <agriffith50@bloomberg.net>
  * Update README.md
  ---------
  Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
  Signed-off-by: Alexa Griffith agriffith50@bloomberg.net
  Signed-off-by: alexagriffith <agriffith50@bloomberg.net>
  Signed-off-by: Dan Sun <dsun20@bloomberg.net>
  Co-authored-by: Filippe Spolti <filippespolti@gmail.com>
  Co-authored-by: Dan Sun <dsun20@bloomberg.net>

* fix: HPA equality check should include annotations (#3650)
  * fix: HPA equality check should include annotations
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * Only watch related autoscalerclass annotation
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * simplify
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * Add missing delete action
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  * fix logic
    Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
  ---------
  Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>

* Fix: huggingface runtime in helm chart (#3679)
  fix huggingface runtime in chart
  Signed-off-by: Dan Sun <dsun20@bloomberg.net>

* Fix: model id and model dir check order (#3680)
  * fix huggingface runtime in chart
    Signed-off-by: Dan Sun <dsun20@bloomberg.net>
  * Allow model_dir to be specified on template
    Signed-off-by: Dan Sun <dsun20@bloomberg.net>
  * Default model_dir to /mnt/models for HF
    Signed-off-by: Dan Sun <dsun20@bloomberg.net>
  * Lint format
    Signed-off-by: Dan Sun <dsun20@bloomberg.net>
  ---------
  Signed-off-by: Dan Sun <dsun20@bloomberg.net>

* Fix:vLLM Model Supported check throwing circular dependency (#3688)
  * Fix:vLLM Model Supported check throwing circular dependency
    Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
  * remove unwanted comments
    Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
  * remove unwanted comments
    Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
  * fix return case
    Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
  * fix to check all arch in model config for vllm support
    Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
  * fixlint
    Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
  ---------
  Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

* Fix: Allow null in Finish reason streaming response in vLLM (#3684)
  Fix: allow null in Finish reason
  Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>

---------
Signed-off-by: Johnu George <johnugeorge109@gmail.com>
Signed-off-by: Curtis Maddalozzo <cmaddalozzo@bloomberg.net>
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: Dattu Sharma <venkatadattasainimmaturi@gmail.com>
Signed-off-by: Andrews Arokiam <andrews.arokiam@ideas2it.com>
Signed-off-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
Signed-off-by: Alexa Griffith agriffith50@bloomberg.net
Signed-off-by: alexagriffith <agriffith50@bloomberg.net>
Signed-off-by: Dan Sun <dsun20@bloomberg.net>
Co-authored-by: Curtis Maddalozzo <cmaddalozzo@users.noreply.github.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Datta Nimmaturi <39181234+Datta0@users.noreply.github.com>
Co-authored-by: Andrews Arokiam <87992092+andyi2it@users.noreply.github.com>
Co-authored-by: Gavrish Prabhu <gavrish.prabhu@nutanix.com>
Co-authored-by: Alexa Griffith <agriffith50@bloomberg.net>
Co-authored-by: Filippe Spolti <filippespolti@gmail.com>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>