Add Ascend NPU support #2422

Merged
merged 1 commit into from
Sep 18, 2023

Contributor

@zhangsibo1129 zhangsibo1129 commented Sep 14, 2023

Why are these changes needed?

This PR integrates Ascend NPU support into FastChat, enabling users to run training, inference, and serving of LLMs on NPUs.

Ascend NPU is an AI processor that supports AI frameworks such as PyTorch and TensorFlow. It has already been integrated by Hugging Face, DeepSpeed, and other popular LLM-related software and tools.

Related issue number (if applicable)

N/A

Checks

  • I've run format.sh to lint the changes in this PR.
  • I've included any doc changes needed.
  • I've made sure the relevant tests are passing (if applicable).

Use Cases

Inference test with the CLI on an Ascend device

$ python -m fastchat.serve.cli --model-path ~/vicuna-7b-v1.5-16k --device npu 

USER: hello
ASSISTANT: Hello! How can I assist you today?
USER: Provide three most famous deep learning frameworks
ASSISTANT: Sure, here are three of the most famous deep learning frameworks:

1. TensorFlow: Developed by Google, TensorFlow is an open-source deep learning framework that is widely used for building and deploying machine learning models. It is known for its flexibility and scalability, and is used by researchers and developers alike.
2. PyTorch: Developed by Facebook, PyTorch is another popular deep learning framework that is known for its ease of use and flexibility. It is often used for research purposes, as well as for building and deploying machine learning models.
3. Keras: Keras is a high-level deep learning framework that is built on top of TensorFlow. It is designed to be user-friendly and easy to use, making it a popular choice for beginners and researchers alike. Keras is also known for its ability to quickly prototype and test new ideas.
USER: 
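The `--device npu` flag above relies on the Ascend PyTorch adapter being installed. As a rough sketch (not FastChat's actual implementation), a caller could check for the `torch_npu` package before requesting the NPU and fall back to the CPU otherwise. The helper name `resolve_device` is hypothetical and used only for illustration.

```python
import importlib.util


def resolve_device(requested: str) -> str:
    """Return the requested device name, falling back to "cpu" when
    "npu" is requested but the torch_npu adapter cannot be found.

    Hypothetical helper: this only checks that the package is installed,
    not that an Ascend device is actually present or healthy.
    """
    if requested == "npu" and importlib.util.find_spec("torch_npu") is None:
        return "cpu"
    return requested


# Prints "npu" on a machine with torch_npu installed, "cpu" otherwise.
print(resolve_device("npu"))
```

The check uses `find_spec` so that nothing is imported as a side effect; a real integration would likely also confirm that a device is visible at runtime.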

Fine-tuning test with multiple Ascend devices

torchrun --nproc_per_node=4 --master_port=20001 fastchat/train/train.py \
    --model_name_or_path ~/vicuna-7b-v1.5-16k  \
    --data_path data/dummy_conversation.json \
    --fp16 True \
    --output_dir output_vicuna \
    --num_train_epochs 3 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 200 \
    --learning_rate 1e-5 \
    --weight_decay 0. \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 False \
    --model_max_length 512 \
    --gradient_checkpointing True \
    --lazy_preprocess True | tee train.log
[2023-09-13 10:20:36,763] torch.distributed.run: [WARNING] 
[2023-09-13 10:20:36,763] torch.distributed.run: [WARNING] *****************************************
[2023-09-13 10:20:36,763] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
[2023-09-13 10:20:36,763] torch.distributed.run: [WARNING] *****************************************
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:12<00:00,  6.04s/it]
Loading checkpoint shards:  50%|███████████████████████████████████████████                                           | 1/2 [00:13<00:13, 13.34s/it][W LegacyTypeDispatch.h:74] Warning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (function operator())
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:17<00:00,  8.99s/it]
[W LegacyTypeDispatch.h:74] Warning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (function operator())
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:18<00:00,  9.24s/it]
[W LegacyTypeDispatch.h:74] Warning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (function operator())
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:49<00:00, 24.53s/it]
Loading data...
Formatting inputs...Skip in lazy mode
[W LegacyTypeDispatch.h:74] Warning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (function operator())
  0%|                                                                                                                        | 0/87 [00:00<?, ?it/s][W OpCommand.cpp:117] Warning: [Check][offset] Check input storage_offset[%ld] = 0 failed, result is untrustworthy16777216 (function operator())
[W OpCommand.cpp:117] Warning: [Check][offset] Check input storage_offset[%ld] = 0 failed, result is untrustworthy16777216 (function operator())
[W OpCommand.cpp:117] Warning: [Check][offset] Check input storage_offset[%ld] = 0 failed, result is untrustworthy16777216 (function operator())
[W OpCommand.cpp:117] Warning: [Check][offset] Check input storage_offset[%ld] = 0 failed, result is untrustworthy16777216 (function operator())
[W VariableFallbackKernel.cpp:40] Warning: The operator 'aten::linalg_vector_norm.out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function operator())
[W VariableFallbackKernel.cpp:40] Warning: The operator 'aten::linalg_vector_norm.out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function operator())
[W VariableFallbackKernel.cpp:40] Warning: The operator 'aten::linalg_vector_norm.out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function operator())
[W VariableFallbackKernel.cpp:40] Warning: The operator 'aten::linalg_vector_norm.out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function operator())
  1%|█▎                                                                                                           | 1/87 [02:38<3:47:11, 158.51s/it]{'loss': 0.2701, 'learning_rate': 9.996740476948386e-06, 'epoch': 0.03}                                                                             
{'loss': 0.9416, 'learning_rate': 9.986966157589751e-06, 'epoch': 0.07}                                                                             
{'loss': 0.2061, 'learning_rate': 9.970689785771798e-06, 'epoch': 0.1}                                                                              
{'loss': 0.3403, 'learning_rate': 9.947932582778188e-06, 'epoch': 0.14}                                                                             

// ... ...
                                                                          
{'loss': 0.1071, 'learning_rate': 5.206741722181385e-08, 'epoch': 2.86}                                                                             
{'loss': 0.1074, 'learning_rate': 2.9310214228202016e-08, 'epoch': 2.9}                                                                             
{'loss': 0.1088, 'learning_rate': 1.3033842410251074e-08, 'epoch': 2.93}                                                                            
{'loss': 0.1061, 'learning_rate': 3.2595230516152543e-09, 'epoch': 2.97}                                                                            
{'loss': 0.1076, 'learning_rate': 0.0, 'epoch': 3.0}                                                                                                
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 87/87 [2:19:10<00:00, 93.97s/it]{'train_runtime': 8350.1159, 'train_samples_per_second': 0.327, 'train_steps_per_second': 0.01, 'train_loss': 0.14017466147398128, 'epoch': 3.0}    
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 87/87 [2:19:16<00:00, 96.05s/it]
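The logged learning rates are consistent with a plain half-cosine decay over the 87 optimizer steps shown in the progress bar, starting from `--learning_rate 1e-5`. The sketch below assumes zero warmup (the command sets none) and the usual half-cosine form; `cosine_lr` is an illustrative function, not code from FastChat or Transformers.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float) -> float:
    """Half-cosine decay from base_lr at step 0 to 0 at total_steps.

    Assumes no warmup phase.
    """
    progress = step / total_steps
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 87  # taken from the progress bar in the log
BASE_LR = 1e-5    # taken from --learning_rate

# Compare these values with the 'learning_rate' entries in the log.
for step in (1, 2, 86, 87):
    print(step, cosine_lr(step, TOTAL_STEPS, BASE_LR))
```

Matching the schedule against the log is a quick sanity check that the scheduler arguments were honored on the NPU backend.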


@Yikun Yikun left a comment


LGTM, thanks. I also ran an E2E test of Vicuna model inference in my environment with an NPU device, and it works.

@merrymercy Would you mind taking a look?


@statelesshz statelesshz left a comment


Just a comment about the training argument tf32. Otherwise all looks good!

README.md Outdated
--logging_steps 1 \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
--tf32 True \

Ascend NPU does not currently support tf32, regardless of whether the flag is set to true or false. I would prefer to remove line 340: --tf32 True

Contributor Author


Removed
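For readers adapting an existing GPU launch script, the same change can be made programmatically. The following is a minimal sketch, assuming the arguments are held as a list of tokens; `drop_flag` is a hypothetical helper and not part of FastChat.

```python
from typing import List


def drop_flag(argv: List[str], flag: str) -> List[str]:
    """Return a copy of argv without `flag` and its value, if any.

    Handles "--flag value", "--flag=value", and a bare "--flag".
    """
    result: List[str] = []
    skip_value = False
    for token in argv:
        if skip_value:
            skip_value = False
            # Only swallow the next token if it is a value, not another flag.
            if not token.startswith("--"):
                continue
        if token == flag:
            skip_value = True
            continue
        if token.startswith(flag + "="):
            continue
        result.append(token)
    return result


gpu_args = ["--fp16", "True", "--tf32", "True", "--model_max_length", "512"]
npu_args = drop_flag(gpu_args, "--tf32")
print(npu_args)
```

Dropping the flag entirely, rather than setting it to `False`, mirrors the reviewer's suggestion to remove the line.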

@merrymercy merrymercy merged commit c685951 into lm-sys:main Sep 18, 2023
1 check passed
@merrymercy
Member

@zhangsibo1129 Thanks! It is merged.

renning22 added a commit to shaleprotocol/Shale-Serve-API that referenced this pull request Sep 23, 2023
* Remove hardcode flash-attn disable setting (lm-sys#2342)

* Document turning off proxy_buffering when api is streaming (lm-sys#2337)

* Simplify huggingface api example (lm-sys#2355)

* Update sponsor logos (lm-sys#2367)

* if LOGDIR is empty, then don't try output log to local file (lm-sys#2357)

Signed-off-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>

* add best_of and use_beam_search for completions interface (lm-sys#2348)

Signed-off-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>

* Extract upvote/downvote from log files (lm-sys#2369)

* Revert "add best_of and use_beam_search for completions interface" (lm-sys#2370)

* Improve doc (lm-sys#2371)

* add best_of and use_beam_search for completions interface (lm-sys#2372)

Signed-off-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>

* update monkey patch for llama2 (lm-sys#2379)

* Make E5 adapter more restrict to reduce mismatch (lm-sys#2381)

* Update UI and sponsers (lm-sys#2387)

* Use fsdp api for save save (lm-sys#2390)

* Release v0.2.27

* Spicyboros + airoboros 2.2 template update. (lm-sys#2392)

Co-authored-by: Jon Durbin <jon.durbin@onna.com>

* bugfix of openai_api_server for fastchat.serve.vllm_worker (lm-sys#2398)

Co-authored-by: wuyongyu <wuyongyu@atomecho.xyz>

* Revert "bugfix of openai_api_server for fastchat.serve.vllm_worker" (lm-sys#2400)

* Revert "add best_of and use_beam_search for completions interface" (lm-sys#2401)

* Release a v0.2.28 with bug fixes and more test cases

* Fix model_worker error (lm-sys#2404)

* Added google/flan models and fixed AutoModelForSeq2SeqLM when loading T5 compression model (lm-sys#2402)

* Rename twitter to X (lm-sys#2406)

* Update huggingface_api.py (lm-sys#2409)

* Add support for baichuan2 models (lm-sys#2408)

* Fixed character overlap issue when api streaming output (lm-sys#2431)

* Support custom conversation template in multi_model_worker (lm-sys#2434)

* Add Ascend NPU support (lm-sys#2422)

* Add raw conversation template (lm-sys#2417) (lm-sys#2418)

* Improve docs & UI (lm-sys#2436)

* Fix Salesforce xgen inference (lm-sys#2350)

* Add support for Phind-CodeLlama models (lm-sys#2415) (lm-sys#2416)

Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>

* Add falcon 180B chat conversation template (lm-sys#2384)

* Improve docs (lm-sys#2438)

* add dtype and seed (lm-sys#2430)

* Data cleaning scripts for dataset release (lm-sys#2440)

* merge google/flan based adapters: T5Adapter, CodeT5pAdapter, FlanAdapter (lm-sys#2411)

* Fix docs

* Update UI (lm-sys#2446)

* Add Optional SSL Support to controller.py (lm-sys#2448)

* Format & Improve docs

* Release v0.2.29 (lm-sys#2450)

* Show terms of use as an JS alert (lm-sys#2461)

* vllm worker awq quantization update (lm-sys#2463)

Co-authored-by: 董晓龙 <dongxiaolong@shiyanjia.com>

* Fix falcon chat template (lm-sys#2464)

---------

Signed-off-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Trangle <kw_w@foxmail.com>
Co-authored-by: Nathan Stitt <nathan@stitt.org>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: leiwen83 <leiwen83@users.noreply.github.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Jon Durbin <jon@jondurbin.com>
Co-authored-by: Jon Durbin <jon.durbin@onna.com>
Co-authored-by: Rayrtfr <2384172887@qq.com>
Co-authored-by: wuyongyu <wuyongyu@atomecho.xyz>
Co-authored-by: wangxiyuan <wangxiyuan@huawei.com>
Co-authored-by: Jeff (Zhen) Wang <wangzhen263@gmail.com>
Co-authored-by: karshPrime <94996251+karshPrime@users.noreply.github.com>
Co-authored-by: obitolyz <obitoquilt@qq.com>
Co-authored-by: Shangwei Chen <109785802+Somezak1@users.noreply.github.com>
Co-authored-by: HyungJin Ahn <crushed7@o.cnu.ac.kr>
Co-authored-by: zhangsibo1129 <134488188+zhangsibo1129@users.noreply.github.com>
Co-authored-by: Tobias Birchler <tobias@birchlerfamily.ch>
Co-authored-by: Jae-Won Chung <jwnchung@umich.edu>
Co-authored-by: Mingdao Liu <joshua@btlmd.com>
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: Brandon Biggs <brandonsbiggs@gmail.com>
Co-authored-by: dongxiaolong <774848421@qq.com>
Co-authored-by: 董晓龙 <dongxiaolong@shiyanjia.com>
renning22 added a commit to shaleprotocol/Shale-Serve-API that referenced this pull request Nov 27, 2023
* Fix chunk handling when partial chunks are returned (lm-sys#2485)

* Update openai_api_server.py to add an SSL option (lm-sys#2484)

* Update vllm_worker.py (lm-sys#2482)

* fix typo quantization (lm-sys#2469)

* fix vllm quanziation args

* Update README.md (lm-sys#2492)

* Huggingface api worker (lm-sys#2456)

* Update links to lmsys-chat-1m (lm-sys#2497)

* Update train code to support the new tokenizer (lm-sys#2498)

* Third Party UI Example (lm-sys#2499)

* Add metharme (pygmalion) conversation template (lm-sys#2500)

* Optimize for proper flash attn causal handling (lm-sys#2503)

* Add Mistral AI instruction template (lm-sys#2483)

* Update monitor & plots (lm-sys#2506)

* Release v0.2.30 (lm-sys#2507)

* Fix for single turn dataset (lm-sys#2509)

* replace os.getenv with os.path.expanduser because the first one doesn… (lm-sys#2515)

Co-authored-by: khalil <k.hennara@work-with-nerds.ca>

* Fix arena (lm-sys#2522)

* Update Dockerfile (lm-sys#2524)

* add Llama2ChangAdapter (lm-sys#2510)

* Add ExllamaV2 Inference Framework Support. (lm-sys#2455)

* Improve docs (lm-sys#2534)

* Fix warnings for new gradio versions (lm-sys#2538)

* revert the gradio change; now works for 3.40

* Improve chat templates (lm-sys#2539)

* Add Zephyr 7B Alpha (lm-sys#2535)

* Improve Support for Mistral-Instruct (lm-sys#2547)

* correct max_tokens by context_length instead of raise exception (lm-sys#2544)

* Revert "Improve Support for Mistral-Instruct" (lm-sys#2552)

* Fix Mistral template (lm-sys#2529)

* Add additional Informations from the vllm worker (lm-sys#2550)

* Make FastChat work with LMSYS-Chat-1M Code (lm-sys#2551)

* Create `tags` attribute to fix `MarkupError` in rich CLI (lm-sys#2553)

* move BaseModelWorker outside serve.model_worker to make it independent (lm-sys#2531)

* Misc style and bug fixes (lm-sys#2559)

* Fix README.md (lm-sys#2561)

* release v0.2.31 (lm-sys#2563)

* resolves lm-sys#2542 modify dockerfile to upgrade cuda to 12.2.0 and pydantic 1.10.13 (lm-sys#2565)

* Add airoboros_v3 chat template (llama-2 format) (lm-sys#2564)

* Add Xwin-LM V0.1, V0.2 support (lm-sys#2566)

* Fixed model_worker generate_gate may blocked main thread (lm-sys#2540) (lm-sys#2562)

* feat: add claude-v2 (lm-sys#2571)

* Update vigogne template (lm-sys#2580)

* Fix issue lm-sys#2568: --device mps led to TypeError: forward() got an unexpected keyword argument 'padding_mask'. (lm-sys#2579)

* Add Mistral-7B-OpenOrca conversation_temmplate (lm-sys#2585)

* docs: bit misspell comments model adapter default template name conversation (lm-sys#2594)

* Update Mistral template (lm-sys#2581)

* Fix <s> in mistral template

* Update README.md  (vicuna-v1.3 -> vicuna-1.5) (lm-sys#2592)

* Update README.md to highlight chatbot arena (lm-sys#2596)

* Add Lemur model (lm-sys#2584)

Co-authored-by: Roberto Ugolotti <Roberto.UGOLOTTI@ec.europa.eu>

* add trust_remote_code=True in BaseModelAdapter (lm-sys#2583)

* Openai interface add use beam search and best of 2 (lm-sys#2442)

Signed-off-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>

* Update qwen and add pygmalion (lm-sys#2607)

* feat: Support model AquilaChat2 (lm-sys#2616)

* Added settings vllm (lm-sys#2599)

Co-authored-by: bodza <bodza@qnovi.de>
Co-authored-by: bodza <sebastian.bodza@qnovi.de>

* [Logprobs] Support logprobs=1 (lm-sys#2612)

* release v0.2.32

* fix: Fix for OpenOrcaAdapter to return correct conversation template (lm-sys#2613)

* Make fastchat.serve.model_worker to take debug argument (lm-sys#2628)

Co-authored-by: hi-jin <crushed7@o.cnu.ac.kr>

* openchat 3.5 model support (lm-sys#2638)

* xFastTransformer framework support (lm-sys#2615)

* feat: support custom models vllm serving (lm-sys#2635)

* kill only fastchat process (lm-sys#2641)

* Update server_arch.png

* Use conv.update_last_message api in mt-bench answer generation (lm-sys#2647)

* Improve Azure OpenAI interface (lm-sys#2651)

* Add required_temp support in jsonl format to support flexible temperature setting for gen_api_answer (lm-sys#2653)

* Pin openai version < 1 (lm-sys#2658)

* Remove exclude_unset parameter (lm-sys#2654)

* Revert "Remove exclude_unset parameter" (lm-sys#2666)

* added support for CodeGeex(2) (lm-sys#2645)

* add chatglm3 conv template support in conversation.py (lm-sys#2622)

* UI and model change (lm-sys#2672)

Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>

* train_flant5: fix typo (lm-sys#2673)

* Fix gpt template (lm-sys#2674)

* Update README.md (lm-sys#2679)

* feat: support template's stop_str as list (lm-sys#2678)

* Update exllama_v2.md (lm-sys#2680)

* save model under deepspeed (lm-sys#2689)

* Adding SSL support for model workers and huggingface worker (lm-sys#2687)

* Check the max_new_tokens <= 0 in openai api server (lm-sys#2688)

* Add Microsoft/Orca-2-7b and update model support docs (lm-sys#2714)

* fix tokenizer of chatglm2 (lm-sys#2711)

* Template for using Deepseek code models (lm-sys#2705)

* add support for Chinese-LLaMA-Alpaca (lm-sys#2700)

* Make --load-8bit flag work with weights in safetensors format (lm-sys#2698)

* Format code and minor bug fix (lm-sys#2716)

* Bump version to v0.2.33 (lm-sys#2717)

* fix tokenizer.pad_token attribute error (lm-sys#2710)

* support stable-vicuna model (lm-sys#2696)

* Exllama cache 8bit (lm-sys#2719)

* Add Yi support (lm-sys#2723)

* Add Hermes 2.5 [fixed] (lm-sys#2725)

* Fix Hermes2Adapter (lm-sys#2727)

* Fix YiAdapter (lm-sys#2730)

* add trust_remote_code argument (lm-sys#2715)

* Add revision arg to MT Bench answer generation (lm-sys#2728)

* Fix MPS backend 'index out of range' error (lm-sys#2737)

* add starling support (lm-sys#2738)

---------

Signed-off-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Trangle <kw_w@foxmail.com>
Co-authored-by: Nathan Stitt <nathan@stitt.org>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: leiwen83 <leiwen83@users.noreply.github.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Jon Durbin <jon@jondurbin.com>
Co-authored-by: Jon Durbin <jon.durbin@onna.com>
Co-authored-by: Rayrtfr <2384172887@qq.com>
Co-authored-by: wuyongyu <wuyongyu@atomecho.xyz>
Co-authored-by: wangxiyuan <wangxiyuan@huawei.com>
Co-authored-by: Jeff (Zhen) Wang <wangzhen263@gmail.com>
Co-authored-by: karshPrime <94996251+karshPrime@users.noreply.github.com>
Co-authored-by: obitolyz <obitoquilt@qq.com>
Co-authored-by: Shangwei Chen <109785802+Somezak1@users.noreply.github.com>
Co-authored-by: HyungJin Ahn <crushed7@o.cnu.ac.kr>
Co-authored-by: zhangsibo1129 <134488188+zhangsibo1129@users.noreply.github.com>
Co-authored-by: Tobias Birchler <tobias@birchlerfamily.ch>
Co-authored-by: Jae-Won Chung <jwnchung@umich.edu>
Co-authored-by: Mingdao Liu <joshua@btlmd.com>
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: Brandon Biggs <brandonsbiggs@gmail.com>
Co-authored-by: dongxiaolong <774848421@qq.com>
Co-authored-by: 董晓龙 <dongxiaolong@shiyanjia.com>
Co-authored-by: Siddartha Naidu <siddartha@abacus.ai>
Co-authored-by: shuishu <990941859@qq.com>
Co-authored-by: Andrew Aikawa <asai@berkeley.edu>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Co-authored-by: enochlev <47466848+enochlev@users.noreply.github.com>
Co-authored-by: AlpinDale <52078762+AlpinDale@users.noreply.github.com>
Co-authored-by: Lé <lerela@users.noreply.github.com>
Co-authored-by: Toshiki Kataoka <tos.lunar@gmail.com>
Co-authored-by: khalil <90086758+khalil-Hennara@users.noreply.github.com>
Co-authored-by: khalil <k.hennara@work-with-nerds.ca>
Co-authored-by: dubaoquan404 <87166864@qq.com>
Co-authored-by: Chang W. Lee <changlee99@gmail.com>
Co-authored-by: theScotchGame <36061851+leonxia1018@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: Stephen Horvath <s.horvath@outlook.com.au>
Co-authored-by: liunux4odoo <41217877+liunux4odoo@users.noreply.github.com>
Co-authored-by: Norman Mu <normster@users.noreply.github.com>
Co-authored-by: Sebastian Bodza <66752172+SebastianBodza@users.noreply.github.com>
Co-authored-by: Tianle (Tim) Li <67527391+CodingWithTim@users.noreply.github.com>
Co-authored-by: Wei-Lin Chiang <weichiang@berkeley.edu>
Co-authored-by: Alex <alexander.s.delapaz@gmail.com>
Co-authored-by: Jingcheng Hu <67776176+REIGN12@users.noreply.github.com>
Co-authored-by: lvxuan <3645933+lvxuan263@users.noreply.github.com>
Co-authored-by: cOng <erdongerzong@qq.com>
Co-authored-by: bofeng huang <bofenghuang7@gmail.com>
Co-authored-by: Phil-U-U <phil.h.cui@gmail.com>
Co-authored-by: Wayne Spangenberg <waynespa@gmail.com>
Co-authored-by: Guspan Tanadi <36249910+guspan-tanadi@users.noreply.github.com>
Co-authored-by: Rohan Gupta <63547845+Gk-rohan@users.noreply.github.com>
Co-authored-by: ugolotti <96428459+ugolotti@users.noreply.github.com>
Co-authored-by: Roberto Ugolotti <Roberto.UGOLOTTI@ec.europa.eu>
Co-authored-by: edisonwd <2388100489@qq.com>
Co-authored-by: FangYin Cheng <staneyffer@gmail.com>
Co-authored-by: bodza <bodza@qnovi.de>
Co-authored-by: bodza <sebastian.bodza@qnovi.de>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Srinath Janakiraman <me@vjsrinath.com>
Co-authored-by: Jaeheon Jeong <tizm423@gmail.com>
Co-authored-by: One <imoneoi@users.noreply.github.com>
Co-authored-by: sheng.gui@intel.com <guisheng315@sina.com>
Co-authored-by: David <scenaristeur@gmail.com>
Co-authored-by: Witold Wasiczko <snapshotpl@users.noreply.github.com>
Co-authored-by: Peter Willemsen <peter@codebuffet.co>
Co-authored-by: ZeyuTeng96 <96521059+ZeyuTeng96@users.noreply.github.com>
Co-authored-by: Forceless <72636351+Force1ess@users.noreply.github.com>
Co-authored-by: Jeff <122586668+jm23jeffmorgan@users.noreply.github.com>
Co-authored-by: MrZhengXin <34998703+MrZhengXin@users.noreply.github.com>
Co-authored-by: Long Nguyen <long.nguyen11288@gmail.com>
Co-authored-by: Elsa Granger <zeyugao@outlook.com>
Co-authored-by: Christopher Chou <49086305+BabyChouSr@users.noreply.github.com>
Co-authored-by: wangshuai09 <391746016@qq.com>
Co-authored-by: amaleshvemula <vemulaamalesh1997@gmail.com>
Co-authored-by: Zollty Tsou <zollty@163.com>
Co-authored-by: xuguodong1999 <bugxu@outlook.com>
Co-authored-by: Michael J Kaye <1014467+mjkaye@users.noreply.github.com>
Co-authored-by: 152334H <54623771+152334H@users.noreply.github.com>
Co-authored-by: Jingsong-Yan <75230787+Jingsong-Yan@users.noreply.github.com>
Co-authored-by: Siyuan (Ryans) Zhuang <suquark@gmail.com>
renning22 added a commit to shaleprotocol/Shale-Serve-API that referenced this pull request Feb 24, 2024

* move BaseModelWorker outside serve.model_worker to make it independent (lm-sys#2531)

* Misc style and bug fixes (lm-sys#2559)

* Fix README.md (lm-sys#2561)

* release v0.2.31 (lm-sys#2563)

* resolves lm-sys#2542 modify dockerfile to upgrade cuda to 12.2.0 and pydantic 1.10.13 (lm-sys#2565)

* Add airoboros_v3 chat template (llama-2 format) (lm-sys#2564)

* Add Xwin-LM V0.1, V0.2 support (lm-sys#2566)

* Fixed model_worker generate_gate may blocked main thread (lm-sys#2540) (lm-sys#2562)

* feat: add claude-v2 (lm-sys#2571)

* Update vigogne template (lm-sys#2580)

* Fix issue lm-sys#2568: --device mps led to TypeError: forward() got an unexpected keyword argument 'padding_mask'. (lm-sys#2579)

* Add Mistral-7B-OpenOrca conversation_temmplate (lm-sys#2585)

* docs: bit misspell comments model adapter default template name conversation (lm-sys#2594)

* Update Mistral template (lm-sys#2581)

* Fix <s> in mistral template

* Update README.md  (vicuna-v1.3 -> vicuna-1.5) (lm-sys#2592)

* Update README.md to highlight chatbot arena (lm-sys#2596)

* Add Lemur model (lm-sys#2584)

Co-authored-by: Roberto Ugolotti <Roberto.UGOLOTTI@ec.europa.eu>

* add trust_remote_code=True in BaseModelAdapter (lm-sys#2583)

* Openai interface add use beam search and best of 2 (lm-sys#2442)

Signed-off-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>

* Update qwen and add pygmalion (lm-sys#2607)

* feat: Support model AquilaChat2 (lm-sys#2616)

* Added settings vllm (lm-sys#2599)

Co-authored-by: bodza <bodza@qnovi.de>
Co-authored-by: bodza <sebastian.bodza@qnovi.de>

* [Logprobs] Support logprobs=1 (lm-sys#2612)

* release v0.2.32

* fix: Fix for OpenOrcaAdapter to return correct conversation template (lm-sys#2613)

* Make fastchat.serve.model_worker to take debug argument (lm-sys#2628)

Co-authored-by: hi-jin <crushed7@o.cnu.ac.kr>

* openchat 3.5 model support (lm-sys#2638)

* xFastTransformer framework support (lm-sys#2615)

* feat: support custom models vllm serving (lm-sys#2635)

* kill only fastchat process (lm-sys#2641)

* Update server_arch.png

* Use conv.update_last_message api in mt-bench answer generation (lm-sys#2647)

* Improve Azure OpenAI interface (lm-sys#2651)

* Add required_temp support in jsonl format to support flexible temperature setting for gen_api_answer (lm-sys#2653)

* Pin openai version < 1 (lm-sys#2658)

* Remove exclude_unset parameter (lm-sys#2654)

* Revert "Remove exclude_unset parameter" (lm-sys#2666)

* added support for CodeGeex(2) (lm-sys#2645)

* add chatglm3 conv template support in conversation.py (lm-sys#2622)

* UI and model change (lm-sys#2672)

Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>

* train_flant5: fix typo (lm-sys#2673)

* Fix gpt template (lm-sys#2674)

* Update README.md (lm-sys#2679)

* feat: support template's stop_str as list (lm-sys#2678)

* Update exllama_v2.md (lm-sys#2680)

* save model under deepspeed (lm-sys#2689)

* Adding SSL support for model workers and huggingface worker (lm-sys#2687)

* Check the max_new_tokens <= 0 in openai api server (lm-sys#2688)

* Add Microsoft/Orca-2-7b and update model support docs (lm-sys#2714)

* fix tokenizer of chatglm2 (lm-sys#2711)

* Template for using Deepseek code models (lm-sys#2705)

* add support for Chinese-LLaMA-Alpaca (lm-sys#2700)

* Make --load-8bit flag work with weights in safetensors format (lm-sys#2698)

* Format code and minor bug fix (lm-sys#2716)

* Bump version to v0.2.33 (lm-sys#2717)

* fix tokenizer.pad_token attribute error (lm-sys#2710)

* support stable-vicuna model (lm-sys#2696)

* Exllama cache 8bit (lm-sys#2719)

* Add Yi support (lm-sys#2723)

* Add Hermes 2.5 [fixed] (lm-sys#2725)

* Fix Hermes2Adapter (lm-sys#2727)

* Fix YiAdapter (lm-sys#2730)

* add trust_remote_code argument (lm-sys#2715)

* Add revision arg to MT Bench answer generation (lm-sys#2728)

* Fix MPS backend 'index out of range' error (lm-sys#2737)

* add starling support (lm-sys#2738)

* Add deepseek chat (lm-sys#2760)

* a convenient script for spinning up the API with Model Workers (lm-sys#2790)

* Prevent returning partial stop string in vllm worker (lm-sys#2780)

* Update UI and new models (lm-sys#2762)

* Support MetaMath (lm-sys#2748)

* Use common logging code in the OpenAI API server (lm-sys#2758)

Co-authored-by: Warren Francis <warren@kududyn.com>

* Show how to turn on experiment tracking for fine-tuning (lm-sys#2742)

Co-authored-by: Morgan McGuire <morganmcguire@Morgans-MacBook-Pro.local>

* Support xDAN-L1-Chat Model  (lm-sys#2732)

* Format code

* Update the version to 0.2.34 (lm-sys#2793)

* add dolphin (lm-sys#2794)

* Fix tiny typo (lm-sys#2805)

* Add instructions for evaluating on MT bench using vLLM (lm-sys#2770)

* Update README.md

* Add SOLAR-10.7b Instruct Model (lm-sys#2826)

* Update README.md (lm-sys#2852)

* fix: 'compeletion' typo (lm-sys#2847)

* Add Tunnelmole as an open source alternative to ngrok and include usage instructions (lm-sys#2846)

* update readme

* update mt-bench readme

* Add support for CatPPT (lm-sys#2840)

* Add functionality to ping AI2 InferD endpoints for tulu 2 (lm-sys#2832)

Co-authored-by: Sam Skjonsberg <sams@allenai.org>

* add download models from www.modelscope.cn (lm-sys#2830)

Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>

* Fix conv_template of chinese alpaca 2 (lm-sys#2812)

* add bagel model adapter (lm-sys#2814)

* add root_path argument to gradio web server. (lm-sys#2807)

Co-authored-by: bertls <s.bertl@iaea.org>

* Import `accelerate` locally to avoid it as a strong dependency (lm-sys#2820)

* Replace dict merge with unpacking for compatibility of 3.8 in vLLM worker (lm-sys#2824)

Signed-off-by: rudeigerc <rudeigerc@gmail.com>

* Format code (lm-sys#2854)

* Openai API migrate (lm-sys#2765)

* fix openai api server docs

* Add a16z as a sponser

* Add new models (Perplexity, gemini) & Separate GPT versions (lm-sys#2856)

Co-authored-by: Wei-Lin Chiang <infwinston@gmail.com>

* Clean error messages (lm-sys#2857)

* Update docs (lm-sys#2858)

* Modify doc description (lm-sys#2859)

* Fix the problem of not using the decoding method corresponding to the base model in peft mode (lm-sys#2865)

* update a new sota model on MT-Bench which touch an 8.8 scores. (lm-sys#2864)

* NPU needs to be initialized when starting a new process (lm-sys#2843)

* Fix the problem with "vllm + chatglm3" (lm-sys#2845) (lm-sys#2876)

Co-authored-by: 姚峰 <yaofeng@chinaums.com>

* Update token spacing for mistral conversation.py (lm-sys#2872)

* check if hm in models before deleting to avoid errors (lm-sys#2870)

Co-authored-by: Your Name <you@example.com>

* Add TinyLlama (lm-sys#2889)

* Fix bug that model doesn't automatically switch peft adapter (lm-sys#2884)

* Update web server commands (lm-sys#2869)

* fix the tokenize process and prompt template of chatglm3 (lm-sys#2883)

Co-authored-by: 章焕锭 <zhanghuanding@zj.chinamobile.com>

* Add `Notus` support (lm-sys#2813)

Co-authored-by: alvarobartt <alvaro@argilla.io>

* feat: support anthropic api with api_dict (lm-sys#2879)

* Update model_adapter.py (lm-sys#2895)

* leaderboard code update (lm-sys#2867)

* fix: change order of SEQUENCE_LENGTH_KEYS (lm-sys#2925)

* fix baichuan:apply_prompt_template call args error (lm-sys#2921)

Co-authored-by: Zheng Hao <forcelss@ForcelessMacBook-Pro.local>

* Fix a typo in openai_api_server.py (lm-sys#2905)

* feat: use variables OPENAI_MODEL_LIST (lm-sys#2907)

* Add TenyxChat-7B-v1 model (lm-sys#2901)

Co-authored-by: sarath@L3 <[omitted]>

* add support for iei yuan2.0 (https://huggingface.co/IEITYuan) (lm-sys#2919)

* nous-hermes-2-mixtral-dpo (lm-sys#2922)

* Bump the version to 0.2.35 (lm-sys#2927)

* fix specify local path issue use model from www.modelscope.cn (lm-sys#2934)

Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>

* support openai embedding for topic clustering (lm-sys#2729)

* Remove duplicate API endpoint (lm-sys#2949)

* Update Hermes Mixtral (lm-sys#2938)

* Enablement of REST API Usage within Google Colab Free Tier (lm-sys#2940)

* Create a new worker implementation for Apple MLX (lm-sys#2937)

* feat: support Model Yuan2.0, a new generation Fundamental Large Language Model developed by IEIT System (lm-sys#2936)

* Fix the pooling method of BGE embedding model (lm-sys#2926)

* format code

* SGLang Worker (lm-sys#2928)

* Fix sglang worker (lm-sys#2953)

* Update mlx_worker to be async (lm-sys#2958)

* Integrate LightLLM into serve worker (lm-sys#2888)

* Copy button (lm-sys#2963)

* feat: train with template (lm-sys#2951)

* fix content maybe a str (lm-sys#2968)

* Adding download folder information in README (lm-sys#2972)

* use cl100k_base as the default tiktoken encoding (lm-sys#2974)

Signed-off-by: bjwswang <bjwswang@gmail.com>

* Update README.md (lm-sys#2975)

* Fix tokenizer for vllm worker (lm-sys#2984)

* update yuan2.0 generation (lm-sys#2989)

* fix: tokenization mismatch when training with different templates (lm-sys#2996)

* fix: inconsistent tokenization by llama tokenizer (lm-sys#3006)

* Fix type hint for play_a_match_single (lm-sys#3008)

* code update (lm-sys#2997)

* Update model_support.md (lm-sys#3016)

* Update lightllm_integration.md (lm-sys#3014)

* Upgrade gradio to 4.17 (lm-sys#3027)

* Update MLX integration to use new generate_step function signature (lm-sys#3021)

* Update readme (lm-sys#3028)

* Update gradio version in `pyproject.toml` and fix a bug (lm-sys#3029)

* Update gradio demo and API model providers (lm-sys#3030)

* Gradio Web Server for Multimodal Models (lm-sys#2960)

Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>

* Migrate the gradio server to openai v1 (lm-sys#3032)

* Update version to 0.2.36 (lm-sys#3033)

Co-authored-by: Wei-Lin Chiang <infwinston@gmail.com>

* Add llava 34b template (lm-sys#3034)

* Update model support  (lm-sys#3040)

* Add psutil to pyproject.toml dependencies (lm-sys#3039)

* Fix SGLang worker (lm-sys#3045)

* Random VQA Sample button for VLM direct chat (lm-sys#3041)

* Update arena.md to fix link (lm-sys#3051)

* multi inference

---------

Signed-off-by: Lei Wen <wenlei03@qiyi.com>
Signed-off-by: rudeigerc <rudeigerc@gmail.com>
Signed-off-by: bjwswang <bjwswang@gmail.com>
Co-authored-by: Trangle <kw_w@foxmail.com>
Co-authored-by: Nathan Stitt <nathan@stitt.org>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: leiwen83 <leiwen83@users.noreply.github.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Jon Durbin <jon@jondurbin.com>
Co-authored-by: Jon Durbin <jon.durbin@onna.com>
Co-authored-by: Rayrtfr <2384172887@qq.com>
Co-authored-by: wuyongyu <wuyongyu@atomecho.xyz>
Co-authored-by: wangxiyuan <wangxiyuan@huawei.com>
Co-authored-by: Jeff (Zhen) Wang <wangzhen263@gmail.com>
Co-authored-by: karshPrime <94996251+karshPrime@users.noreply.github.com>
Co-authored-by: obitolyz <obitoquilt@qq.com>
Co-authored-by: Shangwei Chen <109785802+Somezak1@users.noreply.github.com>
Co-authored-by: HyungJin Ahn <crushed7@o.cnu.ac.kr>
Co-authored-by: zhangsibo1129 <134488188+zhangsibo1129@users.noreply.github.com>
Co-authored-by: Tobias Birchler <tobias@birchlerfamily.ch>
Co-authored-by: Jae-Won Chung <jwnchung@umich.edu>
Co-authored-by: Mingdao Liu <joshua@btlmd.com>
Co-authored-by: Ying Sheng <sqy1415@gmail.com>
Co-authored-by: Brandon Biggs <brandonsbiggs@gmail.com>
Co-authored-by: dongxiaolong <774848421@qq.com>
Co-authored-by: 董晓龙 <dongxiaolong@shiyanjia.com>
Co-authored-by: Siddartha Naidu <siddartha@abacus.ai>
Co-authored-by: shuishu <990941859@qq.com>
Co-authored-by: Andrew Aikawa <asai@berkeley.edu>
Co-authored-by: Liangsheng Yin <hnyls2002@gmail.com>
Co-authored-by: enochlev <47466848+enochlev@users.noreply.github.com>
Co-authored-by: AlpinDale <52078762+AlpinDale@users.noreply.github.com>
Co-authored-by: Lé <lerela@users.noreply.github.com>
Co-authored-by: Toshiki Kataoka <tos.lunar@gmail.com>
Co-authored-by: khalil <90086758+khalil-Hennara@users.noreply.github.com>
Co-authored-by: khalil <k.hennara@work-with-nerds.ca>
Co-authored-by: dubaoquan404 <87166864@qq.com>
Co-authored-by: Chang W. Lee <changlee99@gmail.com>
Co-authored-by: theScotchGame <36061851+leonxia1018@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: Stephen Horvath <s.horvath@outlook.com.au>
Co-authored-by: liunux4odoo <41217877+liunux4odoo@users.noreply.github.com>
Co-authored-by: Norman Mu <normster@users.noreply.github.com>
Co-authored-by: Sebastian Bodza <66752172+SebastianBodza@users.noreply.github.com>
Co-authored-by: Tianle (Tim) Li <67527391+CodingWithTim@users.noreply.github.com>
Co-authored-by: Wei-Lin Chiang <weichiang@berkeley.edu>
Co-authored-by: Alex <alexander.s.delapaz@gmail.com>
Co-authored-by: Jingcheng Hu <67776176+REIGN12@users.noreply.github.com>
Co-authored-by: lvxuan <3645933+lvxuan263@users.noreply.github.com>
Co-authored-by: cOng <erdongerzong@qq.com>
Co-authored-by: bofeng huang <bofenghuang7@gmail.com>
Co-authored-by: Phil-U-U <phil.h.cui@gmail.com>
Co-authored-by: Wayne Spangenberg <waynespa@gmail.com>
Co-authored-by: Guspan Tanadi <36249910+guspan-tanadi@users.noreply.github.com>
Co-authored-by: Rohan Gupta <63547845+Gk-rohan@users.noreply.github.com>
Co-authored-by: ugolotti <96428459+ugolotti@users.noreply.github.com>
Co-authored-by: Roberto Ugolotti <Roberto.UGOLOTTI@ec.europa.eu>
Co-authored-by: edisonwd <2388100489@qq.com>
Co-authored-by: FangYin Cheng <staneyffer@gmail.com>
Co-authored-by: bodza <bodza@qnovi.de>
Co-authored-by: bodza <sebastian.bodza@qnovi.de>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Srinath Janakiraman <me@vjsrinath.com>
Co-authored-by: Jaeheon Jeong <tizm423@gmail.com>
Co-authored-by: One <imoneoi@users.noreply.github.com>
Co-authored-by: sheng.gui@intel.com <guisheng315@sina.com>
Co-authored-by: David <scenaristeur@gmail.com>
Co-authored-by: Witold Wasiczko <snapshotpl@users.noreply.github.com>
Co-authored-by: Peter Willemsen <peter@codebuffet.co>
Co-authored-by: ZeyuTeng96 <96521059+ZeyuTeng96@users.noreply.github.com>
Co-authored-by: Forceless <72636351+Force1ess@users.noreply.github.com>
Co-authored-by: Jeff <122586668+jm23jeffmorgan@users.noreply.github.com>
Co-authored-by: MrZhengXin <34998703+MrZhengXin@users.noreply.github.com>
Co-authored-by: Long Nguyen <long.nguyen11288@gmail.com>
Co-authored-by: Elsa Granger <zeyugao@outlook.com>
Co-authored-by: Christopher Chou <49086305+BabyChouSr@users.noreply.github.com>
Co-authored-by: wangshuai09 <391746016@qq.com>
Co-authored-by: amaleshvemula <vemulaamalesh1997@gmail.com>
Co-authored-by: Zollty Tsou <zollty@163.com>
Co-authored-by: xuguodong1999 <bugxu@outlook.com>
Co-authored-by: Michael J Kaye <1014467+mjkaye@users.noreply.github.com>
Co-authored-by: 152334H <54623771+152334H@users.noreply.github.com>
Co-authored-by: Jingsong-Yan <75230787+Jingsong-Yan@users.noreply.github.com>
Co-authored-by: Siyuan (Ryans) Zhuang <suquark@gmail.com>
Co-authored-by: Chris Kerwell Gresla <80501101+ckgresla@users.noreply.github.com>
Co-authored-by: pandada8 <pandada8@gmail.com>
Co-authored-by: Isaac Ong <isaacong.jw@gmail.com>
Co-authored-by: Warren Francis <geekoftheweek@users.noreply.github.com>
Co-authored-by: Warren Francis <warren@kududyn.com>
Co-authored-by: Morgan McGuire <morganmcg1@users.noreply.github.com>
Co-authored-by: Morgan McGuire <morganmcguire@Morgans-MacBook-Pro.local>
Co-authored-by: xDAN-AI <128944251+xiechengmude@users.noreply.github.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: Robbie <robbie-cahill@proton.me>
Co-authored-by: Rishiraj Acharya <44090649+rishiraj@users.noreply.github.com>
Co-authored-by: Nathan Lambert <nathanl@allenai.org>
Co-authored-by: Sam Skjonsberg <sams@allenai.org>
Co-authored-by: liuyhwangyh <liuyhwangyh@163.com>
Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>
Co-authored-by: stephanbertl <stephan@bweb.at>
Co-authored-by: bertls <s.bertl@iaea.org>
Co-authored-by: Chirag Jain <jain.chirag925@gmail.com>
Co-authored-by: Yuchen Cheng <rudeigerc@gmail.com>
Co-authored-by: Shuo Yang <73746844+andy-yang-1@users.noreply.github.com>
Co-authored-by: Wei-Lin Chiang <infwinston@gmail.com>
Co-authored-by: JQ <460494839@qq.com>
Co-authored-by: yaofeng <yf_reg@outlook.com>
Co-authored-by: 姚峰 <yaofeng@chinaums.com>
Co-authored-by: Michael <67104840+thavens@users.noreply.github.com>
Co-authored-by: Josh NE <renjunyao@gmail.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: WHDY <38045789+WHDY@users.noreply.github.com>
Co-authored-by: 章焕锭 <zhanghuanding@zj.chinamobile.com>
Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>
Co-authored-by: alvarobartt <alvaro@argilla.io>
Co-authored-by: Zheng Hao <forcelss@ForcelessMacBook-Pro.local>
Co-authored-by: Ren Xuancheng <jklj077@users.noreply.github.com>
Co-authored-by: Sarath Shekkizhar <137322432+sarath-shekkizhar@users.noreply.github.com>
Co-authored-by: wangpengfei1013 <155146149+wangpengfei1013@users.noreply.github.com>
Co-authored-by: Alexandre Strube <a.strube@fz-juelich.de>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Co-authored-by: Cristian Gutiérrez <57730982+ggcr@users.noreply.github.com>
Co-authored-by: ali asaria <aliasaria@users.noreply.github.com>
Co-authored-by: wulixuan <cauwulixuan@163.com>
Co-authored-by: staoxiao <2906698981@qq.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Co-authored-by: dheeraj-326 <dheeraj.326@gmail.com>
Co-authored-by: bjwswang <30621793+bjwswang@users.noreply.github.com>
Co-authored-by: Zhanghao Wu <zhanghao.wu@outlook.com>
Co-authored-by: Ted Li <tl2493@columbia.edu>
Co-authored-by: Shukant Pal <SukantK2002@outlook.com>
Co-authored-by: Lisa Dunlap <lisabdunlap@gmail.com>
Co-authored-by: Logan Kilpatrick <23kilpatrick23@gmail.com>