
[Inference] Clean duplicated vector utils #5715

Open · wants to merge 182 commits into base: main

Commits on Jan 11, 2024

  1. [Inference] First PR for rebuild colossal-infer (hpcaitech#5143)

    * add engine and scheduler
    
    * add dirs
    
    ---------
    
    Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
    CjhHa1 authored and FrankLeeeee committed Jan 11, 2024
    Commit: 4cf4682
  2. [Inference] Add readme (roadmap) and fulfill request handler (hpcaitech#5147)
    
    * request handler
    
    * add readme
    
    ---------
    
    Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
    CjhHa1 authored and FrankLeeeee committed Jan 11, 2024
    Commit: 56e75ee
  3. [Inference/NFC] Clean outdated inference tests and deprecated kernels (hpcaitech#5159)
    
    * [inference/nfc] remove outdated inference tests
    
    * remove outdated kernel tests
    
    * remove deprecated triton kernels
    
    * remove imports from deprecated kernels
    yuanheng-zhao authored and FrankLeeeee committed Jan 11, 2024
    Commit: 2bb9224
  4. [Inference]Add BatchInferState, Sequence and InferConfig (hpcaitech#5149)
    
    * add infer_struct and infer_config
    
    * update codes
    
    * change InferConfig
    
    * Add hf_model_config to the engine
    
    * rm _get_hf_model_config
    
    * update codes
    
    * made adjustments according to the feedback from the reviewer.
    
    * update codes
    
    * add ci test for config and struct
    yuehuayingxueluo authored and FrankLeeeee committed Jan 11, 2024
    Commit: fab9b93
  5. [Inference] Add CacheBlock and KV-Cache Manager (hpcaitech#5156)

    * [Inference] Add KVCache Manager
    
    * function refactored
    
    * add test for KVCache Manager
    
    * add attr beam width
    
    * Revise alloc func in CacheManager
    
    * Fix docs and pytests
    
    * add tp slicing for head number
    
    * optimize shapes of tensors used as physical cache
    
    * Apply using InferenceConfig on KVCacheManager
    
    * rm duplicate config file
    
    * Optimize cache allocation: use contiguous cache
    
    * Fix config in pytest (and config)
    yuanheng-zhao authored and FrankLeeeee committed Jan 11, 2024
    Commit: 3de2e62
  6. [Inference]Update inference config and fix test (hpcaitech#5178)

    * unify the config setting
    
    * fix test
    
    * fix import
    
    * fix test
    
    * fix
    
    * fix
    
    * add logger
    
    * revise log info
    
    ---------
    
    Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
    CjhHa1 authored and FrankLeeeee committed Jan 11, 2024
    Commit: 93aeacc
  7. [Inference] Add the logic of the inference engine (hpcaitech#5173)

    * add infer_struct and infer_config
    
    * update codes
    
    * change InferConfig
    
    * Add hf_model_config to the engine
    
    * rm _get_hf_model_config
    
    * update codes
    
    * made adjustments according to the feedback from the reviewer.
    
    * update codes
    
    * add ci test for config and struct
    
    * Add the logic of the inference engine
    
    * update engine and test
    
    * Recover cache_manager.py
    
    * add logger
    
    * fix conflict
    
    * update codes
    
    * update codes
    
    * update model and tokenizer
    
    * fix add the logic about shardformer
    
    * change kvcache_manager docstring
    
    * add policy
    
    * fix ci bug in test_kvcache_manager.py
    
    * remove codes related to tokenizer and move model_policy
    
    * fix code style
    
    * add ordered_set to requirements-infer.txt
    
    * Delete extra empty lines
    
    * add ordered_set to requirements-test.txt
    yuehuayingxueluo authored and FrankLeeeee committed Jan 11, 2024
    Commit: 8daee26
  8. [Inference] add logit processor and request handler (hpcaitech#5166)

    * add logit processor and request handler
    
    * add
    
    * add
    
    * add
    
    * fix
    
    * add search tokens and update func
    
    * finish request handler
    
    * add running list test
    
    * fix test
    
    * fix some bug
    
    * add
    
    * add
    
    * fix bugs
    
    * fix some bugs
    
    * fix bug
    
    * fix
    
    * fix
    
    * add copy fun
    
    * del useless attn
    
    * fix request status
    
    ---------
    
    Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
    CjhHa1 authored and FrankLeeeee committed Jan 11, 2024
    Commit: 0e61646
  9. Add padding llama model

    yuehuayingxueluo authored and FrankLeeeee committed Jan 11, 2024
    Commit: 86853a3
  10. Commit: 62fd08e
  11. fix bugs in request_handler

    yuehuayingxueluo authored and FrankLeeeee committed Jan 11, 2024
    Commit: 6296858
  12. precision alignment

    yuehuayingxueluo authored and FrankLeeeee committed Jan 11, 2024
    Commit: 9489dc6
  13. Fixed a writing error

    yuehuayingxueluo authored and FrankLeeeee committed Jan 11, 2024
    Commit: 4df8876
  14. [kernel] Add triton kernel for context attention (FAv2) without padding (hpcaitech#5192)
    
    * add context attn unpadded triton kernel
    
    * test compatibility
    
    * kv cache copy (testing)
    
    * fix k/v cache copy
    
    * fix kv cache copy and test
    
    * fix boundary of block ptrs
    
    * add support for GQA/MQA and testing
    
    * fix import statement
    
    ---------
    
    Co-authored-by: Round Heng <yuanhengzhao@Rounds-MacBook-Pro.local>
    2 people authored and FrankLeeeee committed Jan 11, 2024
    Commit: 07b5283
  15. Commit: 02c1bf8
  16. fix bugs in sampler

    yuehuayingxueluo authored and FrankLeeeee committed Jan 11, 2024
    Commit: bbfebfb
  17. Fixed a typo

    yuehuayingxueluo authored and FrankLeeeee committed Jan 11, 2024
    Commit: b2eb9cd
  18. fix beam_width

    yuehuayingxueluo authored and FrankLeeeee committed Jan 11, 2024
    Commit: 3ad1f3b
  19. [Inference] Pytorch Attention func, pad&nopad input support (hpcaitech#5219)
    
    * add attn
    
    * add attention test
    
    * fix attn forward
    
    * fix decoding
    CjhHa1 authored and FrankLeeeee committed Jan 11, 2024
    Commit: bfd9b1b
  20. Commit: 47e53ea
  21. Commit: fa4fbdb
  22. [Hotfix] Fix accuracy and align attention method api with Triton kernel (hpcaitech#5229)
    
    * fix accuracy
    
    * alignment in attention
    
    * fix attention
    
    * fix
    
    * fix bugs
    
    * fix bugs
    
    * fix bugs
    CjhHa1 authored and FrankLeeeee committed Jan 11, 2024
    Commit: e545a87
  23. Commit: 2a73e82
  24. fix CI bugs

    yuehuayingxueluo authored and FrankLeeeee committed Jan 11, 2024
    Commit: fab294c
  25. rm torch.cuda.synchronize

    yuehuayingxueluo authored and FrankLeeeee committed Jan 11, 2024
    Commit: 10e3c9f
  26. Commit: d40eb26
  27. [Inference] Kernel: no pad rotary embedding (hpcaitech#5252)

    * fix bugs
    
    * comment
    
    * use more accurate atol
    
    * fix
    CjhHa1 authored and FrankLeeeee committed Jan 11, 2024
    Commit: fded91d
  28. [kernel] Add flash decoding triton kernel for blocked kv cache (hpcaitech#5249)
    
    * add flash decoding unpad triton kernel
    
    * rename flash decoding kernel
    
    * add kernel testing (draft)
    
    * revise pytest
    
    * support kv group (GQA)
    
    * (trivial) fix api and pytest
    
    * (trivial) func renaming
    
    * (trivial) func/file renaming
    
    * refactor pytest for attention
    
    * (trivial) format and consistent vars of context/decode attn
    
    * (trivial) remove test redundancy
    yuanheng-zhao authored and FrankLeeeee committed Jan 11, 2024
    Commit: 1513f20
  29. [git] fixed rebased files

    FrankLeeeee committed Jan 11, 2024
    Commit: 1ded7e8

Commits on Jan 15, 2024

  1. [kernel] Add KV cache copy kernel during decoding (hpcaitech#5261)

    * add kv copy triton kernel during decoding stage
    
    * add pytest and fix kernel
    
    * fix test utilities
    
    * revise kernel config
    
    * add benchmark for kvcache copy
    yuanheng-zhao committed Jan 15, 2024
    Commit: fa85e02
  2. Commit: c597678
  3. [Inference] Fix request handler and add recycle logic (hpcaitech#5260)

    * fix request handler
    
    * fix comment
    CjhHa1 committed Jan 15, 2024
    Commit: d8db500

Commits on Jan 16, 2024

  1. [kernel] Revise KVCache copy triton kernel API (hpcaitech#5273)

    * [kernel/fix] revise kvcache copy kernel api
    
    * fix benchmark
    yuanheng-zhao committed Jan 16, 2024
    Commit: 0f2b46a

Commits on Jan 17, 2024

  1. [Inference]Adapted to the triton attn kernels (hpcaitech#5264)

    * adapted to the triton attn kernels
    
    * fix pad input
    
    * adapted to copy_kv_to_blocked_cache
    
    * fix ci test
    
    * update kv memcpy
    
    * remove print
    yuehuayingxueluo committed Jan 17, 2024
    Commit: 86b63f7

Commits on Jan 18, 2024

  1. [kernel] Add RMSLayerNorm triton kernel (hpcaitech#5262)

    * add layerrmsnorm triton kernel
    
    * add layerrmsnorm kernel
    
    * modify the atol and rtol in test file
    
    * Remove the logic of mean computations, and update the names of the kernel functions and files
    
    * add benchmark of rms norm
    nkfyz committed Jan 18, 2024
    Commit: 5ae9099
  2. [Hotfix] Fix bugs in testing continuous batching (hpcaitech#5270)

    * fix bug
    
    * fix bugs
    
    * fix bugs
    
    * fix bugs and add padding
    
    * add funcs and fix bugs
    
    * fix typos
    
    * fix bugs
    
    * add func
    CjhHa1 committed Jan 18, 2024
    Commit: 9e2342b

Commits on Jan 19, 2024

  1. [kernel/fix] Performance Optimization for Decoding Kernel and Benchmarking (hpcaitech#5274)
    
    * prevent re-creating intermediate tensors
    
    * add singleton class holding intermediate values
    
    * fix triton kernel api
    
    * add benchmark in pytest
    
    * fix kernel api and add benchmark
    
    * revise flash decoding triton kernel in/out shapes
    
    * fix calling of triton kernel in modeling
    
    * fix pytest: extract to util functions
    yuanheng-zhao committed Jan 19, 2024
    Commit: 6e487e7

Commits on Jan 22, 2024

  1. [inference] Adapted to Rotary Embedding and RMS Norm (hpcaitech#5283)

    * adapted to rotary_embedding
    
    * adapted to nopad rms norm
    
    * fix bugs in benchmark
    
    * fix flash_decoding.py
    yuehuayingxueluo committed Jan 22, 2024
    Commit: bfff925
  2. add utils.py

    yuehuayingxueluo committed Jan 22, 2024
    Commit: cea9c86
  3. Merge pull request hpcaitech#5297 from yuehuayingxueluo/fix_rotary_embedding
    
    [Inference/fix]Add utils.py for Rotary Embedding
    yuehuayingxueluo committed Jan 22, 2024
    Commit: b785319

Commits on Jan 23, 2024

  1. [Inference] Benchmarking rotary embedding and add a fetch function (hpcaitech#5277)
    
    * fix bugs and add a cos/sin cache fetch func
    
    * add docstring
    
    * fix bug
    
    * fix
    CjhHa1 committed Jan 23, 2024
    Commit: 8e606ec
  2. [Kernel/Fix] Revise flash attention triton kernel API and add benchmark (hpcaitech#5301)
    
    * fix decoding kernel pytest
    
    * revise and add triton context attn benchmark
    yuanheng-zhao committed Jan 23, 2024
    Commit: 3da9993

Commits on Jan 24, 2024

  1. [Inference]Add fused rotary kernel and get cos cache kernel (hpcaitech#5302)
    
    * add fused rotary and get cos cache func
    
    * staged
    
    * fix bugs
    
    * fix bugs
    CjhHa1 committed Jan 24, 2024
    Commit: c647e00

Commits on Jan 25, 2024

  1. Commit: af8359c

Commits on Jan 26, 2024

  1. [inference]Optimize the usage of the mid tensors space in flash attn (hpcaitech#5304)
    
    * opt flash attn
    
    * opt tmp tensor
    
    * fix benchmark_llama
    
    * fix code style
    
    * fix None logic for output tensor
    
    * fix adapted to get_xine_cache
    
    * add comment
    
    * fix ci bugs
    
    * fix some codes
    
    * rm duplicated codes
    
    * rm duplicated codes
    
    * fix code style
    
    * add _get_dtype in config.py
    yuehuayingxueluo committed Jan 26, 2024
    Commit: 4f28cb4
  2. fix (hpcaitech#5311)

    CjhHa1 committed Jan 26, 2024
    Commit: 7ddd8b3

Commits on Jan 29, 2024

  1. [Inference] Update rms norm kernel, benchmark with vLLM (hpcaitech#5315)

    * add
    
    * xi
    
    * del
    
    * del
    
    * fix
    CjhHa1 committed Jan 29, 2024
    Commit: 1f8a75d
  2. [DOC] Update inference readme (hpcaitech#5280)

    * add readme
    
    * add readme
    
    * 1
    
    * update engine
    
    * finish readme
    
    * add readme
    CjhHa1 committed Jan 29, 2024
    Commit: c7c104c

Commits on Jan 30, 2024

  1. [Inference]Add Nopadding Llama Modeling (hpcaitech#5327)

    * add nopadding llama modeling
    
    * add nopadding_llama.py
    
    * rm unused codes
    
    * fix bugs in test_xine_copy.py
    
    * fix code style
    yuehuayingxueluo committed Jan 30, 2024
    Commit: e8f0642
  2. [Infer] Optimize Blocked KVCache And Kernels Using It (hpcaitech#5325)

    * revise shape of kvcache (context attn kernel)
    
    * revise shape of kvcache (flash decoding kernel)
    
    * revise shape of kvcache (kvcache copy) and attn func
    
    * init of kvcache in kvcache manager
    
    * revise llama modeling
    
    * revise block size retrieval
    
    * use torch for rms_norm benchmarking
    
    * revise block size retrieval
    yuanheng-zhao committed Jan 30, 2024
    Commit: 5f98a9d

Commits on Jan 31, 2024

  1. merge commit

    FrankLeeeee committed Jan 31, 2024
    Commit: c565519
  2. Commit: 1336838
  3. [Inference] Kernel Fusion, fused copy kv cache into rotary embedding (hpcaitech#5336)
    
    * revise rotary embedding
    
    * remove useless print
    
    * adapt
    CjhHa1 committed Jan 31, 2024
    Commit: df0aa49

Commits on Feb 1, 2024

  1. [inference] simplified config verification (hpcaitech#5346)

    * [inference] simplified config verification
    
    * polish
    
    * polish
    FrankLeeeee committed Feb 1, 2024
    Commit: f8e456d
  2. [Inference]Replace Attention layer and MLP layer by shardformer to optimize the weight transpose operation, add fused_qkv and fused linear_add (hpcaitech#5340)
    
    * add fused qkv
    
    * replace attn and mlp by shardformer
    
    * fix bugs in mlp
    
    * add docstrings
    
    * fix test_inference_engine.py
    
    * add optimize unbind
    
    * add fused_addmm
    
    * rm squeeze(1)
    
    * refactor codes
    
    * fix ci bugs
    
    * rename ShardFormerLlamaMLP and ShardFormerLlamaAttention
    
    * Removed the dependency on LlamaFlashAttention2
    
    * rollback test_inference_engine.py
    yuehuayingxueluo committed Feb 1, 2024
    Commit: 249644c

Commits on Feb 2, 2024

  1. Commit: db1a763
  2. Commit: e76acbb
  3. Commit: 027aa10
  4. [Inference/opt]Optimize the mid tensor of RMS Norm (hpcaitech#5350)

    * opt rms_norm
    
    * fix bugs in rms_layernorm
    yuehuayingxueluo committed Feb 2, 2024
    Commit: 21ad4a2
  5. [Inference]Optimize generation process of inference engine (hpcaitech#5356)
    
    * opt inference engine
    
    * fix run_benchmark.sh
    
    * fix generate in engine.py
    
    * rollback test_inference_engine.py
    yuehuayingxueluo committed Feb 2, 2024
    Commit: 631862f

Commits on Feb 6, 2024

  1. [Fix/Infer] Remove unused deps and revise requirements (hpcaitech#5341)

    * remove flash-attn dep
    
    * rm padding llama
    
    * revise infer requirements
    
    * move requirements out of module
    yuanheng-zhao committed Feb 6, 2024
    Commit: 1dedb57
  2. [Inference]Fused the gate and up proj in mlp, and optimized the autograd process. (hpcaitech#5365)
    
    * fused the gate and up proj in mlp
    
    * fix code styles
    
    * opt auto_grad
    
    * rollback test_inference_engine.py
    
    * modifications based on the review feedback.
    
    * fix bugs in flash attn
    
    * Change reshape to view
    
    * fix test_rmsnorm_triton.py
    yuehuayingxueluo committed Feb 6, 2024
    Commit: 35382a7

Commits on Feb 7, 2024

  1. [Inference] Adapt to Fused rotary (hpcaitech#5348)

    * revise rotary embedding
    
    * remove useless print
    
    * adapt
    
    * fix
    
    * add
    
    * fix
    
    * modeling
    
    * fix
    
    * fix
    
    * fix
    CjhHa1 committed Feb 7, 2024
    Commit: 9f4ab2e
  2. Commit: 8106ede
  3. Commit: 58740b5
  4. [Inference/opt] Fused KVCahce Memcopy (hpcaitech#5374)

    * fused kv memcopy
    
    * add TODO in test_kvcache_copy.py
    yuehuayingxueluo committed Feb 7, 2024
    Commit: 6fb4bcb
  5. [Inference] User Experience: update the logic of default tokenizer and generation config. (hpcaitech#5337)
    
    * add
    
    * fix
    
    * fix
    
    * pause
    
    * fix
    
    * fix pytest
    
    * align
    
    * fix
    
    * license
    
    * fix
    
    * fix
    
    * fix readme
    
    * fix some bugs
    
    * remove tokenizer config
    CjhHa1 committed Feb 7, 2024
    Commit: 1f8c7e7

Commits on Feb 8, 2024

  1. Commit: 9afa520
  2. [Inference]Support vllm testing in benchmark scripts (hpcaitech#5379)

    * add vllm benchmark scripts
    
    * fix code style
    
    * update run_benchmark.sh
    
    * fix code style
    yuehuayingxueluo committed Feb 8, 2024
    Commit: 8c69deb

Commits on Feb 19, 2024

  1. [Inference] Optimize and Refactor Inference Batching/Scheduling (hpcaitech#5367)
    
    * add kvcache manager funcs for batching
    
    * add batch bucket for batching
    
    * revise RunningList struct in handler
    
    * add kvcache/batch funcs for compatibility
    
    * use new batching methods
    
    * fix indexing bugs
    
    * revise abort logic
    
    * use cpu seq lengths/block tables
    
    * rm unused attr in Sequence
    
    * fix type conversion/default arg
    
    * add and revise pytests
    
    * revise pytests, rm unused tests
    
    * rm unused statements
    
    * fix pop finished indexing issue
    
    * fix: use index in batch when retrieving inputs/update seqs
    
    * use dict instead of odict in batch struct
    
    * arg type hinting
    
    * fix make compress
    
    * refine comments
    
    * fix: pop_n_seqs to pop the first n seqs
    
    * add check in request handler
    
    * remove redundant conversion
    
    * fix test for request handler
    
    * fix pop method in batch bucket
    
    * fix prefill adding
    yuanheng-zhao committed Feb 19, 2024
    Commit: b21aac5

Commits on Feb 21, 2024

  1. [Inference]Fused kv copy into rotary calculation (hpcaitech#5383)

    * revise rotary embedding
    
    * remove useless print
    
    * adapt
    
    * fix
    
    * add
    
    * fix
    
    * modeling
    
    * fix
    
    * fix
    
    * fix
    
    * fused kv copy
    
    * fused copy
    
    * colossalai/kernel/triton/no_pad_rotary_embedding.py
    
    * del padding llama
    
    * del
    CjhHa1 committed Feb 21, 2024
    Commit: 7301038
  2. Optimized the execution interval time between cuda kernels caused by view and memcopy (hpcaitech#5390)
    
    * opt_view_and_memcopy
    
    * fix bugs in ci
    
    * fix ci bugs
    
    * update benchmark scripts
    
    * fix ci bugs
    yuehuayingxueluo committed Feb 21, 2024
    Commit: 2a718c8

Commits on Feb 23, 2024

  1. [Fix/Inference] Fix format of input prompts and input model in inference engine (hpcaitech#5395)
    
    * Fix bugs in inference_engine
    
    * fix bugs in engine.py
    
    * rm  CUDA_VISIBLE_DEVICES
    
    * add request_ids in generate
    
    * fix bug in engine.py
    
    * add logger.debug for BatchBucket
    yuehuayingxueluo committed Feb 23, 2024
    Commit: bc1da87

Commits on Feb 26, 2024

  1. Commit: 1906118

Commits on Feb 28, 2024

  1. [Inference]Add CUDA KVCache Kernel (hpcaitech#5406)

    * add cuda KVCache kernel
    
    * annotation benchmark_kvcache_copy
    
    * add use cuda
    
    * fix import path
    
    * move benchmark scripts to example/
    
    * rm benchmark codes in test_kv_cache_memcpy.py
    
    * rm redundancy codes
    
    * rm redundancy codes
    
    * pr was modified according to the review
    yuehuayingxueluo committed Feb 28, 2024
    Commit: 600881a
  2. [Inference]Move benchmark-related code to the example directory. (hpcaitech#5408)
    
    * move benchmark-related code to the example directory.
    
    * fix bugs in test_fused_rotary_embedding.py
    yuehuayingxueluo committed Feb 28, 2024
    Commit: 0aa27f1

Commits on Mar 4, 2024

  1. Commit: 0310b76
  2. Commit: 593a72e

Commits on Mar 7, 2024

  1. Commit: 95c2149

Commits on Mar 8, 2024

  1. Commit: cefaeb5
  2. Merge pull request hpcaitech#5433 from Courtesy-Xs/add_silu_and_mul

    【Inference】Add silu_and_mul for infer
    Courtesy-Xs committed Mar 8, 2024
    Commit: 2b28b54
  3. Commit: a46598a
  4. Commit: 01d289d
  5. refactor code

    Courtesy-Xs committed Mar 8, 2024
    Commit: 5eb5ff1
  6. Commit: f7aecc0

Commits on Mar 11, 2024

  1. Commit: b2c0d9f
  2. Commit: 9dec66f
  3. [doc] add doc

    LRY89757 committed Mar 11, 2024
    Commit: 633e95b
  4. Merge pull request hpcaitech#5435 from Courtesy-Xs/add_gpu_launch_config

    Add query and other components
    Courtesy-Xs committed Mar 11, 2024
    Commit: 21e1e36
  5. refactor code

    Courtesy-Xs committed Mar 11, 2024 (commit 095c070)

Commits on Mar 12, 2024

  1. Merge pull request hpcaitech#5445 from Courtesy-Xs/refactor_infer_compilation

    Refactor colossal-infer code arch
    Courtesy-Xs committed Mar 12, 2024 (commit 368a2aa)
  2. Commit b699f54

Commits on Mar 13, 2024

  1. fix include path

    Courtesy-Xs committed Mar 13, 2024 (commit c1c45e9)
  2. Commit 6fd355a
  3. fix rmsnorm template function invocation problem (template function partial specialization is not allowed in C++) and luckily pass e2e precision test (hpcaitech#5454)

    SunflowerAries committed Mar 13, 2024 (commit ed431de)
  4. [Inference/kernel]Add Fused Rotary Embedding and KVCache Memcopy CUDA Kernel (hpcaitech#5418)
    
    * add rotary embedding kernel
    
    * add rotary_embedding_kernel
    
    * add fused rotary_emb and kvcache memcopy
    
    * add fused_rotary_emb_and_cache_kernel.cu
    
    * add fused_rotary_emb_and_memcopy
    
    * fix bugs in fused_rotary_emb_and_cache_kernel.cu
    
    * fix ci bugs
    
    * use vec memcopy and opt the global memory access
    
    * fix code style
    
    * fix test_rotary_embdding_unpad.py
    
    * codes revised based on the review comments
    
    * fix bugs about include path
    
    * rm inline
    yuehuayingxueluo committed Mar 13, 2024 (commit f366a5e)
  5. Commit 1821a6d

Commits on Mar 14, 2024

  1. diverse tests

    LRY89757 committed Mar 14, 2024 (commit ae24b4f)
  2. Commit d02e257
  3. Commit 388e043
  4. [fix] tmp for test

    LRY89757 committed Mar 14, 2024 (commit 6e30248)

Commits on Mar 15, 2024

  1. add some comments

    Courtesy-Xs committed Mar 15, 2024 (commit 5724b9e)
  2. Merge pull request hpcaitech#5457 from Courtesy-Xs/ly_add_implementation_for_launch_config

    add implementation for GetGPULaunchConfig1D
    Courtesy-Xs committed Mar 15, 2024 (commit b6e9785)
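GetGPULaunchConfig1D presumably maps an element count to a 1-D (grid, block) launch pair via ceiling division. A hedged Python sketch of that standard logic (the function name, default block size, and grid clamp are assumptions, not the actual C++ helper's interface):

```python
def get_gpu_launch_config_1d(numel, threads_per_block=256, max_grid_size=65535):
    """Pick (grid_size, block_size) so grid * block covers `numel` elements.

    Classic ceiling division; the grid is clamped so oversized inputs are
    expected to fall back to a grid-stride loop inside the kernel.
    """
    assert numel > 0 and threads_per_block > 0
    grid = (numel + threads_per_block - 1) // threads_per_block
    return min(grid, max_grid_size), threads_per_block

grid, block = get_gpu_launch_config_1d(1000)  # -> (4, 256)
```

The clamp matters because CUDA limits grid dimensions; kernels written with a grid-stride loop stay correct regardless of how the config is capped.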

Commits on Mar 19, 2024

  1. refactor vector utils

    Courtesy-Xs committed Mar 19, 2024 (commit 48c4f29)
  2. Commit aabc9fb
  3. Merge pull request hpcaitech#5469 from Courtesy-Xs/add_vec_traits

    Refactor vector utils
    Courtesy-Xs committed Mar 19, 2024 (commit b96557b)
  4. Commit 7ff42cc

Commits on Mar 21, 2024

  1. [fix] unused option

    LRY89757 committed Mar 21, 2024 (commit 4eafe0c)
  2. Commit 606603b
  3. [fix]

    LRY89757 committed Mar 21, 2024 (commit 5b017d6)

Commits on Mar 25, 2024

  1. [fix]

    LRY89757 committed Mar 25, 2024 (commit 9fe61b4)
  2. [fix] remove unused comment

    LRY89757 committed Mar 25, 2024 (commit ff4998c)
  3. [Inference]Support FP16/BF16 Flash Attention 2 And Add high_precision Flag To Rotary Embedding (hpcaitech#5461)
    
    * Support FP16/BF16 Flash Attention 2
    
    * fix bugs in test_kv_cache_memcpy.py
    
    * add context_kv_cache_memcpy_kernel.cu
    
    * rm typename MT
    
    * add tail process
    
    * add high_precision
    
    * add high_precision to config.py
    
    * rm unused code
    
    * change the comment for the high_precision parameter
    
    * update test_rotary_embdding_unpad.py
    
    * fix vector_copy_utils.h
    
    * add comment for self.high_precision when using float32
    yuehuayingxueluo committed Mar 25, 2024 (commit 87079cf)
  4. [fix] merge conflicts

    LRY89757 committed Mar 25, 2024 (commit 68e9396)
  5. Merge pull request hpcaitech#5434 from LRY89757/colossal-infer-cuda-graph

    [feat] cuda graph support and refactor non-functional api
    LRY89757 committed Mar 25, 2024 (commit 1d62623)
  6. [fix] PR hpcaitech#5354 (hpcaitech#5501)

    * [fix]
    
    * [fix]
    
    * Update config.py docstring
    
    * [fix] docstring align
    
    * [fix] docstring align
    
    * [fix] docstring align
    LRY89757 committed Mar 25, 2024 (commit 6251d68)

Commits on Mar 26, 2024

  1. [Inference] Optimize request handler of llama (hpcaitech#5512)

    * optimize request_handler
    
    * fix ways of writing
    Courtesy-Xs committed Mar 26, 2024 (commit e6496dd)

Commits on Mar 28, 2024

  1. Commit 934e31a

Commits on Apr 1, 2024

  1. [Inference/Kernel]Add get_cos_and_sin Kernel (hpcaitech#5528)

    * Add get_cos_and_sin kernel
    
    * fix code comments
    
    * fix code typos
    
    * merge common codes of get_cos_and_sin kernel.
    
    * Fixed a typo
    
    * Changed 'asset allclose' to 'assert equal'.
    yuehuayingxueluo committed Apr 1, 2024 (commit 04aca9e)
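A get_cos_and_sin kernel typically gathers per-token cos/sin rotary values for a batch of positions out of a precomputed RoPE cache. A hedged NumPy sketch of that standard scheme (function names and shapes are illustrative assumptions, not the kernel's actual interface):

```python
import numpy as np

def build_rope_cache(max_pos, head_dim, base=10000.0):
    # Standard RoPE frequencies: theta_i = base^(-2i / head_dim)
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)
    angles = np.outer(np.arange(max_pos), inv_freq)  # (max_pos, head_dim/2)
    return np.cos(angles), np.sin(angles)

def get_cos_and_sin(cos_cache, sin_cache, positions):
    # Gather the cached rows for each token's position in the batch.
    return cos_cache[positions], sin_cache[positions]

cos_c, sin_c = build_rope_cache(max_pos=16, head_dim=8)
cos, sin = get_cos_and_sin(cos_c, sin_c, np.array([0, 3, 7]))
```

Doing the gather in one fused kernel avoids launching a separate index-select per sequence in the batch.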
  2. [Inference] Add Reduce Utils (hpcaitech#5537)

    * add reduce utils
    
    * add using to delete namespace prefix
    Courtesy-Xs committed Apr 1, 2024 (commit a2878e3)

Commits on Apr 2, 2024

  1. [Fix/Inference] Remove unused and non-functional functions (hpcaitech#5543)
    
    * [fix] remove unused func
    
    * rm non-functional partial
    yuanheng-zhao committed Apr 2, 2024 (commit 4bb5d89)

Commits on Apr 8, 2024

  1. Commit 7ebdf48
  2. Commit ed5ebd1
  3. Commit ce9401a
  4. Commit d788175
  5. Commit 7ca1d1c

Commits on Apr 9, 2024

  1. Sync main to feature/colossal-infer

    [Sync] Merge feature/colossal-infer with main
    yuanheng-zhao committed Apr 9, 2024 (commit d56c963)

Commits on Apr 10, 2024

  1. [Infer] Revise and Adapt Triton Kernels for Spec-Dec (hpcaitech#5401)

    * [Infer/Fix] Fix Dependency in test - RMSNorm kernel (hpcaitech#5399)
    
    fix dependency in pytest
    
    * resolve conflicts for revising flash-attn
    
    * adapt kv cache copy kernel for spec-dec
    
    * fix seqlen-n kvcache copy kernel/tests
    
    * test kvcache copy - use torch.equal
    
    * add assertions
    
    * (trivial) comment out
    yuanheng-zhao committed Apr 10, 2024 (commit d63c469)
  2. [Inference/SpecDec] Add Basic Drafter Model Container (hpcaitech#5405)

    * [Infer/Fix] Fix Dependency in test - RMSNorm kernel (hpcaitech#5399)
    
    fix dependency in pytest
    
    * add drafter model container (basic ver)
    yuanheng-zhao committed Apr 10, 2024 (commit 5a9b05f)
  3. [Inference/SpecDec] Add Speculative Decoding Implementation (hpcaitech#5423)
    
    * fix flash decoding mask during verification
    
    * add spec-dec
    
    * add test for spec-dec
    
    * revise drafter init
    
    * remove drafter sampling
    
    * retire past kv in drafter
    
    * (trivial) rename attrs
    
    * (trivial) rename arg
    
    * revise how we enable/disable spec-dec
    yuanheng-zhao committed Apr 10, 2024 (commit a37f826)
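Speculative decoding drafts a few tokens with a cheap model and lets the target model verify them in a single pass, accepting the longest matching prefix. A toy greedy-verification sketch (the real implementation samples from distributions; the callables here are stand-ins for model forward passes, and all names are illustrative):

```python
def speculative_step(draft_next, target_next, prefix, k=4):
    """One spec-dec step with greedy acceptance.

    draft_next / target_next: callables mapping a token sequence to the
    next token id. Returns the tokens accepted this step; at least one
    target token is always emitted, so decoding always makes progress.
    """
    # 1) Draft k tokens autoregressively with the cheap model.
    drafted, seq = [], list(prefix)
    for _ in range(k):
        t = draft_next(seq)
        drafted.append(t)
        seq.append(t)

    # 2) Verify: accept drafted tokens while the target agrees.
    #    (A real engine scores all k positions in one batched pass.)
    accepted, seq = [], list(prefix)
    for t in drafted:
        if target_next(seq) == t:
            accepted.append(t)
            seq.append(t)
        else:
            break

    # 3) On mismatch (or full acceptance) emit one token from the target.
    accepted.append(target_next(seq))
    return accepted

# Toy models: target emits (last + 1) mod 5; draft agrees except after token 2.
target = lambda s: (s[-1] + 1) % 5
draft = lambda s: (s[-1] + 1) % 5 if s[-1] != 2 else 0
out = speculative_step(draft, target, [0], k=4)  # -> [1, 2, 3]
```

Here the draft diverges at the third token, so two drafted tokens are accepted and the target supplies the third, which is exactly the progress a single unbatched target step would have made, amortized over one verification pass.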
  4. [SpecDec] Fix inputs for speculation and revise past KV trimming (hpcaitech#5449)
    
    * fix drafter pastkv and usage of batch bucket
    yuanheng-zhao committed Apr 10, 2024 (commit 912e24b)
  5. [Inference/SpecDec] Support GLIDE Drafter Model (hpcaitech#5455)

    * add glide-llama policy and modeling
    
    * update glide modeling, compatible with transformers 4.36.2
    
    * revise glide llama modeling/usage
    
    * fix issues of glimpsing large kv
    
    * revise the way re-loading params for glide drafter
    
    * fix drafter and engine tests
    
    * enable convert to glide strict=False
    
    * revise glide llama modeling
    
    * revise vicuna prompt template
    
    * revise drafter and tests
    
    * apply usage of glide model in engine
    yuanheng-zhao committed Apr 10, 2024 (commit d85d914)
  6. [doc] Add inference/speculative-decoding README (hpcaitech#5552)

    * add README for spec-dec
    
    * update roadmap
    yuanheng-zhao committed Apr 10, 2024 (commit e1acb58)
  7. [Fix] resolve conflicts of rebasing feat/speculative-decoding (hpcaitech#5557)
    
    - resolve conflicts of rebasing feat/speculative-decoding
    yuanheng-zhao committed Apr 10, 2024 (commit e60d430)
  8. [Fix] Llama Modeling Control with Spec-Dec (hpcaitech#5580)

    - fix ref before asgmt
    - fall back to use triton kernels when using spec-dec
    yuanheng-zhao committed Apr 10, 2024 (commit f8598e3)
  9. [Inference/Spec-Dec] Merge pull request hpcaitech#5565 from hpcaitech/feat/speculative-decoding
    
    Add Speculative Decoding and GLIDE Spec-Dec
    yuanheng-zhao committed Apr 10, 2024 (commit 25928d8)

Commits on Apr 11, 2024

  1. Commit a219123

Commits on Apr 15, 2024

  1. [Inference/Refactor] Delete Duplicated code and refactor vec_copy utils and reduce utils (hpcaitech#5593)
    
    * delete duplicated code and refactor vec_copy utils and reduce utils
    
    * delete unused header file
    Courtesy-Xs committed Apr 15, 2024 (commit d4cb023)
  2. [inference/model]Adapted to the baichuan2-7B model (hpcaitech#5591)

    * Adapted to the baichuan2-7B model
    
    * modified according to the review comments.
    
    * Modified the method of obtaining random weights.
    
    * modified according to the review comments.
    
    * change mlp layer 'NOTE'
    yuehuayingxueluo committed Apr 15, 2024 (commit 56b222e)

Commits on Apr 18, 2024

  1. [Inference/Kernel] Add Paged Decoding kernel, sequence split within the same thread block (hpcaitech#5531)
    
    * feat flash decoding for paged attention
    
    * refactor flashdecodingattention
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    SunflowerAries and pre-commit-ci[bot] committed Apr 18, 2024 (commit be396ad)
  2. [Feat]Tensor Model Parallel Support For Inference (hpcaitech#5563)

    * tensor parallel support naive source
    
    * [fix]precision, model load and refactor the framework
    
    * add tp unit test
    
    * docstring
    
    * fix do_sample
    LRY89757 committed Apr 18, 2024 (commit e37ee2f)

Commits on Apr 19, 2024

  1. Commit ccf7279

Commits on Apr 23, 2024

  1. [Fix/Inference] Fix GQA Triton and Support Llama3 (hpcaitech#5624)

    * [fix] GQA calling of flash decoding triton
    
    * fix kv cache alloc shape
    
    * fix rotary triton - GQA
    
    * fix sequence max length assigning
    
    * Sequence max length logic
    
    * fix scheduling and spec-dec
    
    * skip without import error
    
    * fix pytest - skip without ImportError
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    yuanheng-zhao and pre-commit-ci[bot] committed Apr 23, 2024 (commit 5d4c1fe)
  2. [Fix/Inference]Fix CUDA Rotary Embedding GQA (hpcaitech#5623)

    * fix rotary embedding GQA
    
    * change test_rotary_embdding_unpad.py KH
    yuehuayingxueluo committed Apr 23, 2024 (commit 12f10d5)
  3. [example] Update Llama Inference example (hpcaitech#5629)

    * [example] add inference benchmark llama3
    
    * revise inference config - arg
    
    * remove unused args
    
    * add llama generation demo script
    
    * fix init rope in llama policy
    
    * add benchmark-llama3 - cleanup
    yuanheng-zhao committed Apr 23, 2024 (commit 04863a9)

Commits on Apr 24, 2024

  1. [Inference/Refactor] Refactor compilation mechanism and unified multi hw (hpcaitech#5613)
    
    * refactor compilation mechanism and unified multi hw
    
    * fix file path bug
    
    * add init.py to make pybind a module to avoid relative path error caused by softlink
    
    * delete duplicated macros
    
    * fix macros bug in gcc
    Courtesy-Xs committed Apr 24, 2024 (commit 279300d)
  2. [Fix/Inference]Fix vllm benchmark (hpcaitech#5630)

    * Fix bugs about OOM when running vllm-0.4.0
    
    * rm used params
    
    * change generation_config
    
    * change benchmark log file name
    yuehuayingxueluo committed Apr 24, 2024 (commit 90cd522)

Commits on Apr 25, 2024

  1. [Inference/Kernel] Optimize paged attention: Refactor key cache layout (hpcaitech#5643)
    
    * optimize flashdecodingattention: refactor code with different key cache layout(from [num_blocks, num_kv_heads, block_size, head_size] to [num_blocks, num_kv_heads, head_size/x, block_size, x])
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    SunflowerAries and pre-commit-ci[bot] committed Apr 25, 2024 (commit a8fd3b0)
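The commit message states the key cache moves from [num_blocks, num_kv_heads, block_size, head_size] to [num_blocks, num_kv_heads, head_size/x, block_size, x], the "x-split" layout that keeps the x values a thread loads contiguous in memory. A NumPy sketch of the permutation and the resulting index mapping (the dimension sizes below are arbitrary example values):

```python
import numpy as np

num_blocks, num_kv_heads, block_size, head_size, x = 2, 4, 16, 64, 8

# Old layout: [num_blocks, num_kv_heads, block_size, head_size]
old = np.arange(num_blocks * num_kv_heads * block_size * head_size,
                dtype=np.float32).reshape(num_blocks, num_kv_heads,
                                          block_size, head_size)

# New layout: [num_blocks, num_kv_heads, head_size // x, block_size, x]
new = (old.reshape(num_blocks, num_kv_heads, block_size, head_size // x, x)
          .transpose(0, 1, 3, 2, 4)
          .copy())

# Same element, addressed two ways: head index d maps to (d // x, d % x).
b, h, s, d = 1, 2, 5, 19
assert old[b, h, s, d] == new[b, h, d // x, s, d % x]
```

With this layout, consecutive tokens of a block sit next to each other for a fixed x-chunk of the head dimension, so the per-thread vectorized loads in the decoding kernel are coalesced.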
  2. Commit f342a93
  3. [Inference]Adapt to baichuan2 13B (hpcaitech#5614)

    * adapt to baichuan2 13B
    
    * adapt to baichuan2 13B
    
    * change BAICHUAN_MODEL_NAME_OR_PATH
    
    * fix test_decoding_attn.py
    
    * Modifications based on review comments.
    
    * change BAICHUAN_MODEL_NAME_OR_PATH
    
    * mv attn mask processes to test flash decoding
    
    * mv get_alibi_slopes baichuan modeling
    
    * fix bugs in test_baichuan.py
    yuehuayingxueluo committed Apr 25, 2024 (commit 3c91e3f)

Commits on Apr 26, 2024

  1. [kernel] Support new KCache Layout - Context Attention Triton Kernel (hpcaitech#5658)
    
    * add context attn triton kernel - new kcache layout
    
    * add benchmark triton
    
    * tiny revise
    
    * trivial - code style, comment
    yuanheng-zhao committed Apr 26, 2024 (commit 5be590b)
  2. Commit 8ccb671

Commits on Apr 30, 2024

  1. Commit 808ee6e
  2. [Inference] Adapt Baichuan2-13B TP (hpcaitech#5659)

    * adapt to baichuan2 13B
    
    * add baichuan2 13B TP
    
    * update baichuan tp logic
    
    * rm unused code
    
    * Fix TP logic
    
    * fix alibi slopes tp logic
    
    * rm nn.Module
    
    * Polished the code.
    
    * change BAICHUAN_MODEL_NAME_OR_PATH
    
    * Modified the logic for loading Baichuan weights.
    
    * fix typos
    yuehuayingxueluo committed Apr 30, 2024 (commit 5f00002)
  3. [Inference/Kernel] refactor kvcache manager and rotary_embedding and kvcache_memcpy operator (hpcaitech#5663)
    
    * refactor kvcache manager and rotary_embedding and kvcache_memcpy operator
    
    * refactor decode_kv_cache_memcpy
    
    * enable alibi in pagedattention
    SunflowerAries committed Apr 30, 2024 (commit 5cd75ce)
  4. Commit ef8e4ff
  5. [inference]Add alibi to flash attn function (hpcaitech#5678)

    * add alibi to flash attn function
    
    * rm redundant modifications
    yuehuayingxueluo committed Apr 30, 2024 (commit f799631)
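ALiBi biases attention scores by slope × distance, with per-head slopes forming a geometric sequence. For a power-of-two head count the usual formula is slope_i = 2^(-8(i+1)/n). A hedged sketch of that standard formula (power-of-two head counts only; the real get_alibi_slopes helper in the Baichuan modeling code also handles other counts, and this name/signature is only illustrative):

```python
def get_alibi_slopes(num_heads: int):
    """ALiBi slopes for a power-of-two head count: a geometric series
    starting at 2^(-8/n), each subsequent head multiplied by the same ratio."""
    assert num_heads & (num_heads - 1) == 0, "sketch assumes power of two"
    start = 2.0 ** (-8.0 / num_heads)
    return [start ** (i + 1) for i in range(num_heads)]

slopes = get_alibi_slopes(8)  # 0.5, 0.25, ..., 2^-8
```

Because the bias is a fixed linear function of token distance, it can be folded into flash-attention kernels as an additive term on the score matrix rather than materialized as a mask.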
  6. Commit 9df016f

Commits on May 3, 2024

  1. [kernel] Support New KCache Layout - Triton Kernel (hpcaitech#5677)

    * kvmemcpy triton for new kcache layout
    
    * revise tests for new kcache layout
    
    * naive triton flash decoding - new kcache layout
    
    * rotary triton kernel - new kcache layout
    
    * remove redundancy - triton decoding
    
    * remove redundancy - triton kvcache copy
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    yuanheng-zhao and pre-commit-ci[bot] committed May 3, 2024 (commit 537a3cb)

Commits on May 5, 2024

  1. Commit 56ed09a
  2. Commit 8754aba

Commits on May 6, 2024

  1. Commit 725fbd2
  2. [Sync] Update from main to feature/colossal-infer (Merge pull request hpcaitech#5685)
    
    [Sync] Update from main to feature/colossal-infer
    
    - Merge pull request hpcaitech#5685 from yuanheng-zhao/inference/merge/main
    yuanheng-zhao committed May 6, 2024 (commit db7b305)
  3. Commit 1ace106

Commits on May 7, 2024

  1. [hotfix] Fix KV Heads Number Assignment in KVCacheManager (hpcaitech#5695)
    
    - Fix key value number assignment in KVCacheManager, as well as method of accessing
    yuanheng-zhao committed May 7, 2024 (commit f9afe0a)

Commits on May 8, 2024

  1. [Fix] Fix Inference Example, Tests, and Requirements (hpcaitech#5688)

    * clean requirements
    
    * modify example inference struct
    
    * add test ci scripts
    
    * mark test_infer as submodule
    
    * rm deprecated cls & deps
    
    * import of HAS_FLASH_ATTN
    
    * prune inference tests to be run
    
    * prune triton kernel tests
    
    * increment pytest timeout mins
    
    * revert import path in openmoe
    yuanheng-zhao committed May 8, 2024 (commit 55cc7f3)
  2. Commit 12e7c28
  3. [Inference]Adapt temperature processing logic (hpcaitech#5689)

    * Adapt temperature processing logic
    
    * add ValueError for top_p and top_k
    
    * add GQA Test
    
    * fix except_msg
    yuehuayingxueluo committed May 8, 2024 (commit 9c2fe79)
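Temperature processing scales logits before softmax, and top-k keeps only the k largest logits (the commit also adds ValueError checks for out-of-range sampling parameters). A hedged NumPy sketch of this standard logit pipeline (the function name and exact validation rules are assumptions, not the engine's actual API):

```python
import numpy as np

def process_logits(logits, temperature=1.0, top_k=0):
    """Scale by temperature, then mask everything below the top-k logits."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = logits / temperature
    if top_k > 0:
        kth = np.sort(scaled)[-top_k]  # k-th largest value
        # Values tied with the k-th largest also survive the mask.
        scaled = np.where(scaled < kth, -np.inf, scaled)
    return scaled

out = process_logits(np.array([2.0, 1.0, 0.5, -1.0]), temperature=0.5, top_k=2)
# only the two largest scaled logits remain finite
```

Masking with -inf (rather than deleting entries) keeps the vocabulary axis intact, so a subsequent softmax assigns exactly zero probability to the filtered tokens.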
  4. [Inference] Support the logic related to ignoring EOS token (hpcaitech#5693)
    
    * Adapt temperature processing logic
    
    * add ValueError for top_p and top_k
    
    * add GQA Test
    
    * fix except_msg
    
    * support ignore EOS token
    
    * change variable's name
    
    * fix annotation
    yuehuayingxueluo committed May 8, 2024 (commit d482922)
  5. [Inference] ADD async and sync Api server using FastAPI (hpcaitech#5396)

    * add api server
    
    * fix
    
    * add
    
    * add completion service and fix bug
    
    * add generation config
    
    * revise shardformer
    
    * fix bugs
    
    * add docstrings and fix some bugs
    
    * fix bugs and add choices for prompt template
    CjhHa1 committed May 8, 2024 (commit 69cd7e0)
  6. [Inference] Finish Online Serving Test, add streaming output api, continuous batching test and example (hpcaitech#5432)
    
    * finish online test and add examples
    
    * fix test_contionus_batching
    
    * fix some bugs
    
    * fix bash
    
    * fix
    
    * fix inference
    
    * finish revision
    
    * fix typos
    
    * revision
    CjhHa1 committed May 8, 2024 (commit de378cd)
  7. [Online Server] Chat Api for streaming and not streaming response (hpcaitech#5470)
    
    * fix bugs
    
    * fix bugs
    
    * fix api server
    
    * fix api server
    
    * add chat api and test
    
    * del request.n
    CjhHa1 committed May 8, 2024 (commit c064032)
  8. [Inference] resolve rebase conflicts

    fix
    CjhHa1 committed May 8, 2024 (commit 7bbb28e)
  9. [Inference] Fix bugs and docs for feat/online-server (hpcaitech#5598)

    * fix test bugs
    
    * add do sample test
    
    * del useless lines
    
    * fix comments
    
    * fix tests
    
    * delete version tag
    
    * delete version tag
    
    * add
    
    * del test sever
    
    * fix test
    
    * fix
    
    * Revert "add"
    
    This reverts commit b9305fb.
    CjhHa1 committed May 8, 2024 (commit 61a1b2e)
  10. Commit bc9063a

Commits on May 9, 2024

  1. Commit 5d9a494
  2. Merge pull request hpcaitech#5588 from hpcaitech/feat/online-serving

    [Feature]Online Serving
    CjhHa1 committed May 9, 2024 (commit 492520d)
  3. [Inference/Feat] Add quant kvcache interface (hpcaitech#5700)

    * add quant kvcache interface
    
    * delete unused output
    
    * complete args comments
    Courtesy-Xs committed May 9, 2024 (commit bfad393)

Commits on May 10, 2024

  1. [Inference/Feat] Add convert_fp8 op for fp8 test in the future (hpcaitech#5706)
    
    * add convert_fp8 op for fp8 test in the future
    
    * rerun ci
    Courtesy-Xs committed May 10, 2024 (commit 50104ab)

Commits on May 11, 2024

  1. [Inference]Adapt repetition_penalty and no_repeat_ngram_size (hpcaitech#5708)
    
    * Adapt repetition_penalty and no_repeat_ngram_size
    
    * fix no_repeat_ngram_size_logit_process
    
    * remove batch_updated
    
    * fix annotation
    
    * modified codes based on the review feedback.
    
    * rm get_batch_token_ids
    yuehuayingxueluo committed May 11, 2024 (commit de4bf3d)
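The repetition_penalty being adapted here is the Hugging Face-style rule: for every token already generated, divide its logit by the penalty if positive, multiply if negative, so seen tokens always become less likely. A hedged sketch (helper name and calling convention are assumptions; the no_repeat_ngram_size processor from the same commit is not shown):

```python
import numpy as np

def apply_repetition_penalty(logits, prev_token_ids, penalty=1.2):
    """HF-style repetition penalty: previously seen tokens get their logit
    divided by `penalty` when positive, multiplied by it when negative."""
    out = logits.copy()
    for tok in set(prev_token_ids):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

logits = np.array([2.0, -1.0, 0.5])
out = apply_repetition_penalty(logits, [0, 1], penalty=2.0)
# token 0: 2.0 -> 1.0 ; token 1: -1.0 -> -2.0 ; token 2 untouched
```

The divide/multiply asymmetry is what makes the rule sign-safe: a plain division would make negative logits *more* probable.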

Commits on May 14, 2024

  1. [Feat]Inference RPC Server Support (hpcaitech#5705)

    * rpc support source
    * kv cache logical/physical disaggregation
    * sampler refactor
    * colossalai launch built in
    * Unitest
    * Rpyc support
    
    ---------
    
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    LRY89757 and pre-commit-ci[bot] committed May 14, 2024 (commit 18d67d0)
  2. delete copy_vector

    Courtesy-Xs committed May 14, 2024 (commit 30ea54f)