Commit 6535071

Commit message: 5 papers
1 parent f7e4e53, commit 6535071

31 files changed: +5760 −1590 lines

README.md: 116 additions & 111 deletions (large diff not rendered)

cls_author.md: 126 additions & 0 deletions (large diff not rendered)

cls_institution.md: 76 additions & 0 deletions
@@ -372,6 +372,7 @@
 |:-----|:------|:------|:--------|:-----|:-----|
 | [VENOM](./meta/2023/VENOM.prototxt) | [VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores](http://arxiv.org/abs/2310.02065v1) | ![cover](./notes/2023/VENOM/vnm.png) | ![Publish](https://img.shields.io/badge/2023-SC-CD5C5C) | [![GitHub Repo stars](https://img.shields.io/github/stars/UDC-GAC/venom)](https://github.com/UDC-GAC/venom) | [note](./notes/2023/VENOM/note.md) |
 | [SliceGPT](./meta/2024/SliceGPT.prototxt) | [SliceGPT: Compress Large Language Models by Deleting Rows and Columns](http://arxiv.org/abs/2401.15024v2) | ![cover](./notes/2024/SliceGPT/sliceGPT.jpg) | ![Publish](https://img.shields.io/badge/2024-ICLR-FF6B6B) | [![GitHub Repo stars](https://img.shields.io/github/stars/microsoft/TransformerCompression)](https://github.com/microsoft/TransformerCompression) | [note](./notes/2024/SliceGPT/note.md) |
+| [EvolKV](./meta/2025/EvolKV.prototxt) | [EvolKV: Evolutionary KV Cache Compression for LLM Inference](http://arxiv.org/abs/2509.08315v1) | ![cover](./notes/2025/EvolKV/fig1.png) | ![Publish](https://img.shields.io/badge/2025-arXiv-1E88E5) | | [note](./notes/2025/EvolKV/note.md) |
 </p>
 </details>
 <details open><summary><b>Eindhoven University of Technology</b></summary>
@@ -721,6 +722,15 @@
 | [SEAP](./meta/2025/SEAP.prototxt) | [SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models](http://arxiv.org/abs/2503.07605v1) | ![cover](./notes/2025/SEAP/fig2.png) | ![Publish](https://img.shields.io/badge/2025-arXiv-1E88E5) | [![GitHub Repo stars](https://img.shields.io/github/stars/IAAR-Shanghai/SEAP)](https://github.com/IAAR-Shanghai/SEAP) | [note](./notes/2025/SEAP/note.md) |
 </p>
 </details>
+<details open><summary><b>Institute of Automation</b></summary>
+<p>
+
+
+| Meta | Title | Cover | Publish | Code | Note |
+|:-----|:------|:------|:--------|:-----|:-----|
+| [EvolKV](./meta/2025/EvolKV.prototxt) | [EvolKV: Evolutionary KV Cache Compression for LLM Inference](http://arxiv.org/abs/2509.08315v1) | ![cover](./notes/2025/EvolKV/fig1.png) | ![Publish](https://img.shields.io/badge/2025-arXiv-1E88E5) | | [note](./notes/2025/EvolKV/note.md) |
+</p>
+</details>
 <details open><summary><b>Institute of Automation, Chinese Academy of Sciences</b></summary>
 <p>

@@ -1145,6 +1155,15 @@
 | [SPP](./meta/2024/SPP.prototxt) | [SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models](http://arxiv.org/abs/2405.16057v1) | ![cover](./notes/2024/SPP/spp.png) | ![Publish](https://img.shields.io/badge/2024-ICML-FF8C00) | [![GitHub Repo stars](https://img.shields.io/github/stars/Lucky-Lance/SPP)](https://github.com/Lucky-Lance/SPP) | [note](./notes/2024/SPP/note.md) |
 </p>
 </details>
+<details open><summary><b>Murdoch University</b></summary>
+<p>
+
+
+| Meta | Title | Cover | Publish | Code | Note |
+|:-----|:------|:------|:--------|:-----|:-----|
+| [TOA](./meta/2025/TOA.prototxt) | [Tree of Agents: Improving Long-Context Capabilities of Large Language Models through Multi-Perspective Reasoning](http://arxiv.org/abs/2509.06436v1) | ![cover](./notes/2025/TOA/fig1.png) | ![Publish](https://img.shields.io/badge/2025-EMNLP_Findings-green) | [![GitHub Repo stars](https://img.shields.io/github/stars/Aireduce952/Tree-of-Agents)](https://github.com/Aireduce952/Tree-of-Agents) | [note](./notes/2025/TOA/note.md) |
+</p>
+</details>
 <details open><summary><b>NAVER Cloud</b></summary>
 <p>

@@ -1205,6 +1224,7 @@
 | [STA](./meta/2022/44KWQAWO.prototxt) | [An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse Transformers](https://arxiv.org/abs/2208.06118) | | ![Publish](https://img.shields.io/badge/2022-VLSI-8B0000) | | |
 | [DeepSeekMoE](./meta/2024/DeepSeekMoE.prototxt) | [DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models](http://arxiv.org/abs/2401.06066v1) | ![cover](./notes/2024/DeepSeekMoE/fig2.png) | ![Publish](https://img.shields.io/badge/2024-arXiv-1E88E5) | [![GitHub Repo stars](https://img.shields.io/github/stars/deepseek-ai/DeepSeek-MoE)](https://github.com/deepseek-ai/DeepSeek-MoE) | [note](./notes/2024/DeepSeekMoE/note.md) |
 | [RaaS](./meta/2025/RaaS.prototxt) | [Efficient Long-Decoding Inference with Reasoning-Aware Attention Sparsity](http://arxiv.org/abs/2502.11147v1) | ![cover](./notes/2025/RaaS/fig5.png) | ![Publish](https://img.shields.io/badge/2025-arXiv-1E88E5) | | [note](./notes/2025/RaaS/note.md) |
+| [LAVa](./meta/2025/LAVa.prototxt) | [LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation](http://arxiv.org/abs/2509.09754v1) | | ![Publish](https://img.shields.io/badge/2025-arXiv-1E88E5) | [![GitHub Repo stars](https://img.shields.io/github/stars/MGDDestiny/Lava)](https://github.com/MGDDestiny/Lava) | [note](./notes/2025/LAVa/note.md) |
 </p>
 </details>
 <details open><summary><b>Nanyang Technological University</b></summary>
@@ -1455,6 +1475,24 @@
 | [Qwen3](./meta/2025/Qwen3.prototxt) | [Qwen3 Technical Report](http://arxiv.org/abs/2505.09388v1) | ![cover](./notes/2025/Qwen3/fig1.png) | ![Publish](https://img.shields.io/badge/2025-arXiv-1E88E5) | [![GitHub Repo stars](https://img.shields.io/github/stars/QwenLM/Qwen3)](https://github.com/QwenLM/Qwen3) | [note](./notes/2025/Qwen3/note.md) |
 </p>
 </details>
+<details open><summary><b>RMIT University</b></summary>
+<p>
+
+
+| Meta | Title | Cover | Publish | Code | Note |
+|:-----|:------|:------|:--------|:-----|:-----|
+| [TOA](./meta/2025/TOA.prototxt) | [Tree of Agents: Improving Long-Context Capabilities of Large Language Models through Multi-Perspective Reasoning](http://arxiv.org/abs/2509.06436v1) | ![cover](./notes/2025/TOA/fig1.png) | ![Publish](https://img.shields.io/badge/2025-EMNLP_Findings-green) | [![GitHub Repo stars](https://img.shields.io/github/stars/Aireduce952/Tree-of-Agents)](https://github.com/Aireduce952/Tree-of-Agents) | [note](./notes/2025/TOA/note.md) |
+</p>
+</details>
+<details open><summary><b>RWTH Aachen University</b></summary>
+<p>
+
+
+| Meta | Title | Cover | Publish | Code | Note |
+|:-----|:------|:------|:--------|:-----|:-----|
+| [FasterVGGT](./meta/2025/FasterVGGT.prototxt) | [Faster VGGT with Block-Sparse Global Attention](http://arxiv.org/abs/2509.07120v1) | | ![Publish](https://img.shields.io/badge/2025-arXiv-1E88E5) | | [note](./notes/2025/FasterVGGT/note.md) |
+</p>
+</details>
 <details open><summary><b>Renmin University of China</b></summary>
 <p>

@@ -1703,6 +1741,15 @@
 | [Awesome-Efficient-Arch](./meta/2025/Awesome-Efficient-Arch.prototxt) | [Speed Always Wins: A Survey on Efficient Architectures for Large Language Models](http://arxiv.org/abs/2508.09834v1) | ![cover](./notes/2025/Awesome-Efficient-Arch/fig1.png) | ![Publish](https://img.shields.io/badge/2025-arXiv-1E88E5) | [![GitHub Repo stars](https://img.shields.io/github/stars/weigao266/Awesome-Efficient-Arch)](https://github.com/weigao266/Awesome-Efficient-Arch) | [note](./notes/2025/Awesome-Efficient-Arch/note.md) |
 </p>
 </details>
+<details open><summary><b>Southwest University</b></summary>
+<p>
+
+
+| Meta | Title | Cover | Publish | Code | Note |
+|:-----|:------|:------|:--------|:-----|:-----|
+| [TOA](./meta/2025/TOA.prototxt) | [Tree of Agents: Improving Long-Context Capabilities of Large Language Models through Multi-Perspective Reasoning](http://arxiv.org/abs/2509.06436v1) | ![cover](./notes/2025/TOA/fig1.png) | ![Publish](https://img.shields.io/badge/2025-EMNLP_Findings-green) | [![GitHub Repo stars](https://img.shields.io/github/stars/Aireduce952/Tree-of-Agents)](https://github.com/Aireduce952/Tree-of-Agents) | [note](./notes/2025/TOA/note.md) |
+</p>
+</details>
 <details open><summary><b>Stanford</b></summary>
 <p>

@@ -1743,6 +1790,15 @@
 | [Step-3](./meta/2025/Step-3.prototxt) | [Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding](http://arxiv.org/abs/2507.19427v1) | | ![Publish](https://img.shields.io/badge/2025-arXiv-1E88E5) | | [note](./notes/2025/Step-3/note.md) |
 </p>
 </details>
+<details open><summary><b>Stepfun</b></summary>
+<p>
+
+
+| Meta | Title | Cover | Publish | Code | Note |
+|:-----|:------|:------|:--------|:-----|:-----|
+| [LAVa](./meta/2025/LAVa.prototxt) | [LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation](http://arxiv.org/abs/2509.09754v1) | | ![Publish](https://img.shields.io/badge/2025-arXiv-1E88E5) | [![GitHub Repo stars](https://img.shields.io/github/stars/MGDDestiny/Lava)](https://github.com/MGDDestiny/Lava) | [note](./notes/2025/LAVa/note.md) |
+</p>
+</details>
 <details open><summary><b>Stevens Institute of Technology</b></summary>
 <p>

@@ -1979,6 +2035,7 @@
 | [NanoFlow](./meta/2025/NanoFlow.prototxt) | [NanoFlow: Towards Optimal Large Language Model Serving Throughput](http://arxiv.org/abs/2408.12757v2) | ![cover](./notes/2025/NanoFlow/pipeline.gif) | ![Publish](https://img.shields.io/badge/2025-OSDI-green) | [![GitHub Repo stars](https://img.shields.io/github/stars/efeslab/Nanoflow)](https://github.com/efeslab/Nanoflow) | [note](./notes/2025/NanoFlow/note.md) |
 | [LinearPatch](./meta/2025/LinearPatch.prototxt) | [A Simple Linear Patch Revives Layer-Pruned Large Language Models](http://arxiv.org/abs/2505.24680v1) | ![cover](./notes/2025/LinearPatch/fig3.png) | ![Publish](https://img.shields.io/badge/2025-arXiv-1E88E5) | | [note](./notes/2025/LinearPatch/note.md) |
 | [DReSS](./meta/2025/DReSS.prototxt) | [DReSS: Data-driven Regularized Structured Streamlining for Large Language Models](http://arxiv.org/abs/2501.17905v3) | ![cover](./notes/2025/DReSS/fig1.png) | ![Publish](https://img.shields.io/badge/2025-arXiv-1E88E5) | | [note](./notes/2025/DReSS/note.md) |
+| [GLM-4.5](./meta/2025/GLM-4.5.prototxt) | [GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models](http://arxiv.org/abs/2508.06471v1) | | ![Publish](https://img.shields.io/badge/2025-arXiv-1E88E5) | [![GitHub Repo stars](https://img.shields.io/github/stars/zai-org/GLM-4.5)](https://github.com/zai-org/GLM-4.5) | [note](./notes/2025/GLM-4.5/note.md) |
 | [KeepKV](./meta/2025/KeepKV.prototxt) | [KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference](http://arxiv.org/abs/2504.09936v1) | ![cover](./notes/2025/KeepKV/fig1.png) | ![Publish](https://img.shields.io/badge/2025-arXiv-1E88E5) | | [note](./notes/2025/KeepKV/note.md) |
 | [LeanK](./meta/2025/LeanK.prototxt) | [LeanK: Learnable K Cache Channel Pruning for Efficient Decoding](http://arxiv.org/abs/2508.02215v1) | ![cover](./notes/2025/LeanK/fig2.png) | ![Publish](https://img.shields.io/badge/2025-arXiv-1E88E5) | [![GitHub Repo stars](https://img.shields.io/github/stars/microsoft/MInference)](https://github.com/microsoft/MInference) | [note](./notes/2025/LeanK/note.md) |
 | [MoBA](./meta/2025/MoBA.prototxt) | [MoBA: Mixture of Block Attention for Long-Context LLMs](http://arxiv.org/abs/2502.13189v1) | ![cover](./notes/2025/MoBA/fig1.png) | ![Publish](https://img.shields.io/badge/2025-arXiv-1E88E5) | [![GitHub Repo stars](https://img.shields.io/github/stars/MoonshotAI/MoBA)](https://github.com/MoonshotAI/MoBA) | [note](./notes/2025/MoBA/note.md) |
@@ -2129,6 +2186,7 @@
 |:-----|:------|:------|:--------|:-----|:-----|
 | [Q-Sparse](./meta/2024/Q-Sparse.prototxt) | [Q-Sparse: All Large Language Models can be Fully Sparsely-Activated](http://arxiv.org/abs/2407.10969v1) | ![cover](./notes/2024/Q-Sparse/q-sparse.png) | ![Publish](https://img.shields.io/badge/2024-arXiv-1E88E5) | | [note](./notes/2024/Q-Sparse/note.md) |
 | [COMET](./meta/2025/COMET.prototxt) | [COMET: Towards Practical W4A4KV4 LLMs Serving](http://arxiv.org/abs/2410.12168v1) | ![cover](./notes/2025/COMET/fig5.png) | ![Publish](https://img.shields.io/badge/2025-ASPLOS-9370DB) | | [note](./notes/2025/COMET/note.md) |
+| [EvolKV](./meta/2025/EvolKV.prototxt) | [EvolKV: Evolutionary KV Cache Compression for LLM Inference](http://arxiv.org/abs/2509.08315v1) | ![cover](./notes/2025/EvolKV/fig1.png) | ![Publish](https://img.shields.io/badge/2025-arXiv-1E88E5) | | [note](./notes/2025/EvolKV/note.md) |
 </p>
 </details>
 <details open><summary><b>University of Connecticut</b></summary>
@@ -2291,6 +2349,15 @@
 | [Selective Context](./meta/2023/selective_context.prototxt) | [Unlocking Context Constraints of LLMs: Enhancing Context Efficiency of LLMs with Self-Information-Based Content Filtering](https://arxiv.org/abs/2304.12102) | ![cover](./notes/2023/selective_context/selective_context.jpg) | ![Publish](https://img.shields.io/badge/2023-arXiv-1E88E5) | [![GitHub Repo stars](https://img.shields.io/github/stars/liyucheng09/Selective_Context)](https://github.com/liyucheng09/Selective_Context) | |
 </p>
 </details>
+<details open><summary><b>University of Technology Sydney</b></summary>
+<p>
+
+
+| Meta | Title | Cover | Publish | Code | Note |
+|:-----|:------|:------|:--------|:-----|:-----|
+| [TOA](./meta/2025/TOA.prototxt) | [Tree of Agents: Improving Long-Context Capabilities of Large Language Models through Multi-Perspective Reasoning](http://arxiv.org/abs/2509.06436v1) | ![cover](./notes/2025/TOA/fig1.png) | ![Publish](https://img.shields.io/badge/2025-EMNLP_Findings-green) | [![GitHub Repo stars](https://img.shields.io/github/stars/Aireduce952/Tree-of-Agents)](https://github.com/Aireduce952/Tree-of-Agents) | [note](./notes/2025/TOA/note.md) |
+</p>
+</details>
 <details open><summary><b>University of Texas at Austin</b></summary>
 <p>

@@ -2482,6 +2549,15 @@
 | [MoBA](./meta/2025/MoBA.prototxt) | [MoBA: Mixture of Block Attention for Long-Context LLMs](http://arxiv.org/abs/2502.13189v1) | ![cover](./notes/2025/MoBA/fig1.png) | ![Publish](https://img.shields.io/badge/2025-arXiv-1E88E5) | [![GitHub Repo stars](https://img.shields.io/github/stars/MoonshotAI/MoBA)](https://github.com/MoonshotAI/MoBA) | [note](./notes/2025/MoBA/note.md) |
 </p>
 </details>
+<details open><summary><b>Zhipu AI</b></summary>
+<p>
+
+
+| Meta | Title | Cover | Publish | Code | Note |
+|:-----|:------|:------|:--------|:-----|:-----|
+| [GLM-4.5](./meta/2025/GLM-4.5.prototxt) | [GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models](http://arxiv.org/abs/2508.06471v1) | | ![Publish](https://img.shields.io/badge/2025-arXiv-1E88E5) | [![GitHub Repo stars](https://img.shields.io/github/stars/zai-org/GLM-4.5)](https://github.com/zai-org/GLM-4.5) | [note](./notes/2025/GLM-4.5/note.md) |
+</p>
+</details>
 <details open><summary><b>Zhipu.AI</b></summary>
 <p>
