Skipping refactor #234

Closed
wants to merge 123 commits into from

Commits on May 6, 2024

  1. 7b11874
  2. 28590fc
  3. [ROCm][Hardware][AMD][Doc] Documentation update for ROCm (vllm-project#4376)
    
    Co-authored-by: WoosukKwon <woosuk.kwon@berkeley.edu>
    2 people authored and robertgshaw2-neuralmagic committed May 6, 2024
    7873343
  4. 5f32d89
  5. c20ff92
  6. [CI] Disable non-lazy string operation on logging (vllm-project#4326)

    Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
    2 people authored and robertgshaw2-neuralmagic committed May 6, 2024
    ec4050a
  7. ee654c9
  8. 4f5d020
  9. [Misc] add RFC issue template (vllm-project#4401)

    Co-authored-by: Simon Mo <simon.mo@hey.com>
    2 people authored and robertgshaw2-neuralmagic committed May 6, 2024
    dc47676
  10. 192c704
  11. [Kernel] Optimize FP8 support for MoE kernel / Mixtral via static scales (vllm-project#4343)
    
    Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
    2 people authored and robertgshaw2-neuralmagic committed May 6, 2024
    1e88172
  12. b9e05fa
  13. 5395fa3
  14. cc7a791
  15. [Kernel] Full Tensor Parallelism for LoRA Layers (vllm-project#3524)

    Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
    2 people authored and robertgshaw2-neuralmagic committed May 6, 2024
    77c1eb1
  16. 287d987
  17. b3759af
  18. 6a44e8e
  19. [BugFix] Fix min_tokens when eos_token_id is None (vllm-project#4389)
    
    Co-authored-by: DefTruth <31974251+deftruth@users.noreply.github.com>
    2 people authored and robertgshaw2-neuralmagic committed May 6, 2024
    821a91a
  20. [Core] Support offline use of local cache for models (vllm-project#4374)

    Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
    Co-authored-by: Travis Johnson <tjohnson31415@gmail.com>
    2 people authored and robertgshaw2-neuralmagic committed May 6, 2024
    5a4c41b
  21. 593db14
  22. 1f87fe1
  23. b24aae6
  24. Add more Prometheus metrics (vllm-project#2764)

    Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
    Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
    3 people committed May 6, 2024
    6a8a97b
  25. 8ab0de8
  26. 7f5a450
  27. [Kernel] Marlin Expansion: Support AutoGPTQ Models with Marlin (vllm-project#3922)
    
    Co-authored-by: alexm <alexm@neuralmagic.com>
    Co-authored-by: mgoin <michael@neuralmagic.com>
    3 people committed May 6, 2024
    1e75df8
  28. 19187df
  29. 43add77
  30. 768facf
  31. 10b984a
  32. 42929fe
  33. [BugFix] fix num_lookahead_slots missing in async executor (vllm-project#4165)
    
    Co-authored-by: Lei Wen <wenlei03@qiyi.com>
    2 people authored and robertgshaw2-neuralmagic committed May 6, 2024
    da4215e
  34. [Doc] add visualization for multi-stage dockerfile (vllm-project#4456)

    Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
    Co-authored-by: Roger Wang <ywang@roblox.com>
    2 people authored and robertgshaw2-neuralmagic committed May 6, 2024
    40b286f
  35. [Kernel] Support Fp8 Checkpoints (Dynamic + Static) (vllm-project#4332)

    Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
    Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
    Co-authored-by: mgoin <michael@neuralmagic.com>
    Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
    Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
    6 people committed May 6, 2024
    faed3eb
  36. [Frontend] Support complex message content for chat completions endpoint (vllm-project#3467)
    
    Co-authored-by: Lily Liu <lilyliupku@gmail.com>
    Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
    3 people authored and robertgshaw2-neuralmagic committed May 6, 2024
    8b9d685
  37. 9ad9b65
  38. 195439e
  39. 7cff2a5
  40. Unable to find Punica extension issue during source code installation (vllm-project#4494)
    
    Co-authored-by: Simon Mo <simon.mo@hey.com>
    2 people authored and robertgshaw2-neuralmagic committed May 6, 2024
    666ccdb
  41. e1fc3da
  42. 2ef0a89
  43. bd7f454
  44. 66d2c00
  45. dc2970e
  46. b496ac2
  47. [Bugfix] Fix the fp8 kv_cache check error that occurs when failing to obtain the CUDA version. (vllm-project#4173)
    
    Signed-off-by: AnyISalIn <anyisalin@gmail.com>
    AnyISalIn authored and robertgshaw2-neuralmagic committed May 6, 2024
    c1e7a79
  48. d05b702
  49. 75c6ebf
  50. 21bc3bf
  51. [CI/Build][Bugfix] VLLM_USE_PRECOMPILED should skip compilation (vllm-project#4534)
    
    Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
    tjohnson31415 authored and robertgshaw2-neuralmagic committed May 6, 2024
    752043f
  52. [Speculative decoding] Add ngram prompt lookup decoding (vllm-project#4237)
    
    Co-authored-by: Lei Wen <wenlei03@qiyi.com>
    2 people authored and robertgshaw2-neuralmagic committed May 6, 2024
    862330a
  53. [Core] Enable prefix caching with block manager v2 enabled (vllm-project#4142)
    
    Co-authored-by: Lei Wen <wenlei03@qiyi.com>
    Co-authored-by: Sage Moore <sagemoore@utexas.edu>
    3 people authored and robertgshaw2-neuralmagic committed May 6, 2024
    3d32972
  54. 56d2002
  55. [Kernel] Update fused_moe tuning script for FP8 (vllm-project#4457)

    This PR updates the tuning script for the fused_moe kernel to support FP8 and adds configurations for TP4. Note that I removed num_warps and num_stages from the configurations for small batch sizes: doing so improved performance and brought the benchmarks in that regime back on par with the earlier numbers, so that this PR is a strict improvement over the status quo.
    
    All the numbers below are for mistralai/Mixtral-8x7B-Instruct-v0.1, 1000 input and 50 output tokens.
    
    Before this PR (with static activation scaling):
    
    qps = 1: 9.8 ms ITL, 0.49s e2e latency
    qps = 2: 9.7 ms ITL, 0.49s e2e latency 
    qps = 4: 10.1 ms ITL, 0.52s e2e latency
    qps = 6: 11.9 ms ITL, 0.59s e2e latency
    qps = 8: 14.0 ms ITL, 0.70s e2e latency
    qps = 10: 15.7 ms ITL, 0.79s e2e latency
    
    After this PR (with static activation scaling):
    
    qps = 1: 9.8 ms ITL, 0.49s e2e latency
    qps = 2: 9.7 ms ITL, 0.49s e2e latency
    qps = 4: 10.2 ms ITL, 0.53s e2e latency
    qps = 6: 11.9 ms ITL, 0.59s e2e latency
    qps = 8: 11.9 ms ITL, 0.59s e2e latency
    qps = 10: 12.1 ms ITL, 0.61s e2e latency
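
To make the comparison easier to read, the following sketch computes the relative change in inter-token latency (ITL) at each request rate from the figures listed above. The dictionary and function names are illustrative assumptions and are not part of the tuning script.

```python
# ITL in milliseconds keyed by qps, copied from the figures above.
itl_before_ms = {1: 9.8, 2: 9.7, 4: 10.1, 6: 11.9, 8: 14.0, 10: 15.7}
itl_after_ms = {1: 9.8, 2: 9.7, 4: 10.2, 6: 11.9, 8: 11.9, 10: 12.1}


def itl_change_percent(qps):
    """Relative ITL change in percent; negative means lower latency after the PR."""
    before = itl_before_ms[qps]
    after = itl_after_ms[qps]
    return round((after - before) / before * 100, 1)


# Hypothetical summary table: one entry per measured request rate.
changes = {qps: itl_change_percent(qps) for qps in itl_before_ms}
```

By this calculation ITL stays within about 1% of the earlier numbers up to qps = 6 and drops by about 15% at qps = 8 and about 23% at qps = 10.
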
    pcmoritz authored and robertgshaw2-neuralmagic committed May 6, 2024
    7c04a00
  56. 0533a6b
  57. 224ecd7
  58. 5b174c4
  59. [Misc] Remove Mixtral device="cuda" declarations (vllm-project#4543)

    Remove the device="cuda" declarations in mixtral as promised in vllm-project#4343
    pcmoritz authored and robertgshaw2-neuralmagic committed May 6, 2024
    4be23dd
  60. de3262f
  61. [MISC] Rework logger to enable pythonic custom logging configuration to be provided (vllm-project#4273)

    Danny Guinther authored and robertgshaw2-neuralmagic committed May 6, 2024
    b85188d
  62. [Bug fix][Core] assert num_new_tokens == 1 fails when SamplingParams.n is not 1 and max_tokens is large & Add tests for preemption (vllm-project#4451)

    rkooo567 authored and robertgshaw2-neuralmagic committed May 6, 2024
    b259286
  63. 91f8b48
  64. [mypy][6/N] Fix all the core subdirectory typing (vllm-project#4450)

    Co-authored-by: Cade Daniel <edacih@gmail.com>
    2 people authored and robertgshaw2-neuralmagic committed May 6, 2024
    2017aaf
  65. [Core][Distributed] enable multiple tp group (vllm-project#4512)

    Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
    2 people authored and robertgshaw2-neuralmagic committed May 6, 2024
    27f0c2b
  66. 2078207
  67. ed6d376
  68. 87d793d
  69. 4dc269d
  70. f7d8e46
  71. [kernel] fix sliding window in prefix prefill Triton kernel (vllm-project#4405)
    
    Co-authored-by: SangBin Cho <rkooo567@gmail.com>
    2 people authored and robertgshaw2-neuralmagic committed May 6, 2024
    673e4eb
  72. [CI/Build] AMD CI pipeline with extended set of tests. (vllm-project#4267)
    
    Co-authored-by: simon-mo <simon.mo@hey.com>
    2 people authored and robertgshaw2-neuralmagic committed May 6, 2024
    2ff2756
  73. 3d453d0
  74. 2a0fb55
  75. 82bbb3d
  76. 44f6086
  77. f62ba17
  78. fc4f08f
  79. f10844f
  80. e132240
  81. [Kernel] Use flashinfer for decoding (vllm-project#4353)

    Co-authored-by: LiuXiaoxuanPKU <llilyliupku@gmail.com>
    2 people authored and robertgshaw2-neuralmagic committed May 6, 2024
    4b0f703
  82. 6dd96ce
  83. 19ae179
  84. 12c155b
  85. 5d65e2f
  86. [Kernel] Support MoE Fp8 Checkpoints for Mixtral (Static Weights with Dynamic/Static Activations) (vllm-project#4527)
    
    Follow-up to vllm-project#4332 that enables FP8 checkpoint loading for Mixtral; supersedes vllm-project#4436.
    
    This PR enables the following checkpoint loading features for Mixtral:
    
    - Loading FP8 checkpoints for Mixtral, such as the "nm-testing/Mixtral-8x7B-Instruct-v0.1-FP8" test model
    - Static or dynamic activation quantization with static weight quantization (all per tensor)
    - Different scales for each expert weight
    - FP8 in the QKV layer
    
    Notes:
    
    - The expert gate/router always runs at half or full precision for now.
    - If the weight scales differ within the QKV layer (for separate Q, K, and V weights), the weights are re-quantized using layer.weight_scale.max() so that a single GEMM can be used for performance.
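
The re-quantization note can be illustrated with a minimal sketch. It assumes per-tensor scales where the real value equals the quantized value times the scale, and it uses plain Python floats as stand-ins for FP8 codes, so the rounding that a real FP8 cast would apply is omitted. The function and variable names are hypothetical and are not vLLM's API.

```python
def requantize_to_shared_scale(shards, scales):
    """Re-express per-shard quantized values under one shared scale.

    shards: list of lists of quantized values (float stand-ins for FP8 codes)
    scales: one per-tensor scale per shard, so real_value = q * scale
    """
    # The largest scale can represent the range of every shard.
    shared_scale = max(scales)
    requantized = []
    for q_values, scale in zip(shards, scales):
        # Dequantize with the shard's own scale, then quantize with the shared one.
        requantized.append([q * scale / shared_scale for q in q_values])
    return requantized, shared_scale


# Hypothetical Q, K, V shards, each with its own per-tensor scale.
qkv_shards = [[100.0, -50.0], [20.0, 40.0], [-8.0, 16.0]]
qkv_scales = [0.5, 0.25, 0.125]
new_shards, shared_scale = requantize_to_shared_scale(qkv_shards, qkv_scales)
```

With a single scale for all three shards, the concatenated QKV weight can go through one matrix multiply instead of three. Choosing the maximum scale avoids overflow at the cost of some resolution for the shards that had smaller scales.
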
    mgoin authored and robertgshaw2-neuralmagic committed May 6, 2024
    55dd119
  87. c152bd7
  88. f8fb8c1
  89. 2d96b61
  90. 9f817f0
  91. f57a219
  92. 6b2c4c1
  93. 18a6e93
  94. lint

    robertgshaw2-neuralmagic committed May 6, 2024
    bcf686d
  95. 8423620

Commits on May 7, 2024

  1. 50c1029

Commits on May 8, 2024

  1. updated test

    Robert Shaw committed May 8, 2024
    a55fb2b
  2. b091999
  3. format

    robertgshaw2-neuralmagic committed May 8, 2024
    4c04122

Commits on May 9, 2024

  1. fixed torch reinit

    Robert Shaw committed May 9, 2024
    0300194

Commits on May 11, 2024

  1. 5dc0afe
  2. 94878e5
  3. format

    robertgshaw2-neuralmagic committed May 11, 2024
    81f5e29
  4. e04b743
  5. format

    robertgshaw2-neuralmagic committed May 11, 2024
    774df9d
  6. 7888aa7

Commits on May 12, 2024

  1. 450f145
  2. 017f02f
  3. e4bbefc
  4. ./format.sh

    robertgshaw2-neuralmagic committed May 12, 2024
    9d18c67
  5. 7c0da92
  6. c380918

Commits on May 13, 2024

  1. 4f1fb80
  2. 086eda5
  3. format

    robertgshaw2-neuralmagic committed May 13, 2024
    e68144d
  4. format

    robertgshaw2-neuralmagic committed May 13, 2024
    77b3c4e
  5. 845f224
  6. 6870ea2
  7. 9eda994
  8. 181b0b8
  9. 5f4e8bc
  10. 642e7f4
  11. 5bcf1af