|
372 | 372 | |:-----|:------|:------|:--------|:-----|:-----|
|
373 | 373 | | [VENOM](./meta/2023/VENOM.prototxt) | [VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores](http://arxiv.org/abs/2310.02065v1) |  |  | [](https://github.com/UDC-GAC/venom) | [note](./notes/2023/VENOM/note.md) |
|
374 | 374 | | [SliceGPT](./meta/2024/SliceGPT.prototxt) | [SliceGPT: Compress Large Language Models by Deleting Rows and Columns](http://arxiv.org/abs/2401.15024v2) |  |  | [](https://github.com/microsoft/TransformerCompression) | [note](./notes/2024/SliceGPT/note.md) |
|
| 375 | +| [EvolKV](./meta/2025/EvolKV.prototxt) | [EvolKV: Evolutionary KV Cache Compression for LLM Inference](http://arxiv.org/abs/2509.08315v1) |  |  | | [note](./notes/2025/EvolKV/note.md) |
375 | 376 | </p>
|
376 | 377 | </details>
|
377 | 378 | <details open><summary><b>Eindhoven University of Technology</b></summary>
|
|
721 | 722 | | [SEAP](./meta/2025/SEAP.prototxt) | [SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models](http://arxiv.org/abs/2503.07605v1) |  |  | [](https://github.com/IAAR-Shanghai/SEAP) | [note](./notes/2025/SEAP/note.md) |
|
722 | 723 | </p>
|
723 | 724 | </details>
|
| 725 | +<details open><summary><b>Institute of Automation</b></summary>
| 726 | +<p>
| 727 | +
| 728 | +
| 729 | +| Meta | Title | Cover | Publish | Code | Note |
| 730 | +|:-----|:------|:------|:--------|:-----|:-----|
| 731 | +| [EvolKV](./meta/2025/EvolKV.prototxt) | [EvolKV: Evolutionary KV Cache Compression for LLM Inference](http://arxiv.org/abs/2509.08315v1) |  |  | | [note](./notes/2025/EvolKV/note.md) |
| 732 | +</p>
| 733 | +</details>
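
The EvolKV entry above names its idea in the title: search for per-layer KV cache budgets with an evolutionary loop instead of fixing one uniform budget. As a rough orientation only — the fitness proxy, mutation step, and all names below are hypothetical stand-ins, not taken from the paper — a minimal sketch of that pattern:

```python
# Minimal sketch of evolutionary search over per-layer KV cache budgets.
# Everything here (fitness proxy, mutation step, population sizes) is a
# hypothetical stand-in, NOT the EvolKV paper's actual algorithm.
import random

NUM_LAYERS = 32       # assumed transformer depth
TOTAL_BUDGET = 4096   # assumed total KV entries to distribute across layers

def fitness(budgets):
    # Toy proxy: use as much of the budget as allowed, mildly preferring
    # smooth layer-to-layer allocations. A real system would instead score
    # a downstream task with the cache compressed to these budgets.
    if sum(budgets) > TOTAL_BUDGET:
        return float("-inf")
    smoothness = -sum(abs(a - b) for a, b in zip(budgets, budgets[1:]))
    return sum(budgets) + 0.1 * smoothness

def mutate(budgets, step=32):
    child = budgets[:]
    i = random.randrange(NUM_LAYERS)
    child[i] = max(0, child[i] + random.choice((-step, step)))
    return child

def evolve(pop_size=20, generations=50):
    base = TOTAL_BUDGET // NUM_LAYERS
    population = [[base] * NUM_LAYERS for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]          # keep the best half
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in survivors]    # refill with mutants
    return max(population, key=fitness)

print("per-layer KV budgets:", evolve())
```

The point is only the shape of the loop: score candidate budget vectors, keep the best, perturb them, repeat.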
724 | 734 | <details open><summary><b>Institute of Automation, Chinese Academy of Sciences</b></summary>
|
725 | 735 | <p>
|
726 | 736 |
|
|
1145 | 1155 | | [SPP](./meta/2024/SPP.prototxt) | [SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models](http://arxiv.org/abs/2405.16057v1) |  |  | [](https://github.com/Lucky-Lance/SPP) | [note](./notes/2024/SPP/note.md) |
|
1146 | 1156 | </p>
|
1147 | 1157 | </details>
|
| 1158 | +<details open><summary><b>Murdoch University</b></summary>
| 1159 | +<p>
| 1160 | +
| 1161 | +
| 1162 | +| Meta | Title | Cover | Publish | Code | Note |
| 1163 | +|:-----|:------|:------|:--------|:-----|:-----|
| 1164 | +| [TOA](./meta/2025/TOA.prototxt) | [Tree of Agents: Improving Long-Context Capabilities of Large Language Models through Multi-Perspective Reasoning](http://arxiv.org/abs/2509.06436v1) |  |  | [](https://github.com/Aireduce952/Tree-of-Agents) | [note](./notes/2025/TOA/note.md) |
| 1165 | +</p>
| 1166 | +</details>
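
TOA's title suggests agents that each reason over part of a long context, with partial answers combined up a tree. A generic sketch of that chunk-then-tree-merge pattern follows; `agent` and `merge` are hypothetical placeholders for LLM calls, and the binary-tree reduction is an illustrative assumption, not the paper's actual protocol:

```python
# Generic "one agent per context chunk, merge partial answers up a tree"
# pattern. `agent` and `merge` stand in for LLM calls; the chunking and
# binary-tree reduction are illustrative assumptions, not TOA's method.
from typing import Callable, List

def chunk(text: str, size: int = 2000) -> List[str]:
    return [text[i:i + size] for i in range(0, len(text), size)] or [text]

def tree_reduce(parts: List[str], merge: Callable[[str, str], str]) -> str:
    # Pairwise-merge partial answers until a single answer remains.
    while len(parts) > 1:
        parts = [merge(parts[i], parts[i + 1]) if i + 1 < len(parts) else parts[i]
                 for i in range(0, len(parts), 2)]
    return parts[0]

def answer(question: str, context: str,
           agent: Callable[[str, str], str],
           merge: Callable[[str, str], str]) -> str:
    partials = [agent(question, c) for c in chunk(context)]  # one agent per chunk
    return tree_reduce(partials, merge)

# Toy run with string-level stand-ins for the two LLM roles:
print(answer("q", "x" * 5000, agent=lambda q, c: c[:3], merge=lambda a, b: a + b))
```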
1148 | 1167 | <details open><summary><b>NAVER Cloud</b></summary>
|
1149 | 1168 | <p>
|
1150 | 1169 |
|
|
1205 | 1224 | | [STA](./meta/2022/44KWQAWO.prototxt) | [An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse Transformers](https://arxiv.org/abs/2208.06118) | |  | | |
|
1206 | 1225 | | [DeepSeekMoE](./meta/2024/DeepSeekMoE.prototxt) | [DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models](http://arxiv.org/abs/2401.06066v1) |  |  | [](https://github.com/deepseek-ai/DeepSeek-MoE) | [note](./notes/2024/DeepSeekMoE/note.md) |
|
1207 | 1226 | | [RaaS](./meta/2025/RaaS.prototxt) | [Efficient Long-Decoding Inference with Reasoning-Aware Attention Sparsity](http://arxiv.org/abs/2502.11147v1) |  |  | | [note](./notes/2025/RaaS/note.md) |
|
| 1227 | +| [LAVa](./meta/2025/LAVa.prototxt) | [LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation](http://arxiv.org/abs/2509.09754v1) | |  | [](https://github.com/MGDDestiny/Lava) | [note](./notes/2025/LAVa/note.md) |
1208 | 1228 | </p>
|
1209 | 1229 | </details>
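
The LAVa row added above combines two ideas visible in its title: eviction decided per layer, and a single global budget split dynamically across layers rather than fixed per layer. A self-contained sketch of that general shape — the proportional allocation rule and the scores are hypothetical, not the paper's scoring function:

```python
# Layer-wise KV eviction under one global budget, split across layers in
# proportion to each layer's total entry importance. The allocation rule
# and scores are illustrative assumptions, not LAVa's actual method.
def allocate_budgets(layer_scores, global_budget):
    totals = [sum(s) for s in layer_scores]
    grand = sum(totals) or 1.0
    return [int(global_budget * t / grand) for t in totals]

def evict(layer_scores, global_budget):
    budgets = allocate_budgets(layer_scores, global_budget)
    kept = []
    for scores, b in zip(layer_scores, budgets):
        ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
        kept.append(sorted(ranked[:b]))  # indices of KV entries this layer keeps
    return kept

# Three layers, per-entry importance scores (e.g. attention-derived):
scores = [[0.9, 0.1, 0.4], [0.2, 0.2, 0.2], [0.7, 0.6, 0.5]]
print(evict(scores, global_budget=5))  # layers with more mass keep more entries
```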
|
1210 | 1230 | <details open><summary><b>Nanyang Technological University</b></summary>
|
|
1455 | 1475 | | [Qwen3](./meta/2025/Qwen3.prototxt) | [Qwen3 Technical Report](http://arxiv.org/abs/2505.09388v1) |  |  | [](https://github.com/QwenLM/Qwen3) | [note](./notes/2025/Qwen3/note.md) |
|
1456 | 1476 | </p>
|
1457 | 1477 | </details>
|
| 1478 | +<details open><summary><b>RMIT University</b></summary>
| 1479 | +<p>
| 1480 | +
| 1481 | +
| 1482 | +| Meta | Title | Cover | Publish | Code | Note |
| 1483 | +|:-----|:------|:------|:--------|:-----|:-----|
| 1484 | +| [TOA](./meta/2025/TOA.prototxt) | [Tree of Agents: Improving Long-Context Capabilities of Large Language Models through Multi-Perspective Reasoning](http://arxiv.org/abs/2509.06436v1) |  |  | [](https://github.com/Aireduce952/Tree-of-Agents) | [note](./notes/2025/TOA/note.md) |
| 1485 | +</p>
| 1486 | +</details>
| 1487 | +<details open><summary><b>RWTH Aachen University</b></summary>
| 1488 | +<p>
| 1489 | +
| 1490 | +
| 1491 | +| Meta | Title | Cover | Publish | Code | Note |
| 1492 | +|:-----|:------|:------|:--------|:-----|:-----|
| 1493 | +| [FasterVGGT](./meta/2025/FasterVGGT.prototxt) | [Faster VGGT with Block-Sparse Global Attention](http://arxiv.org/abs/2509.07120v1) | |  | | [note](./notes/2025/FasterVGGT/note.md) |
| 1494 | +</p>
| 1495 | +</details>
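
For the FasterVGGT entry, the mechanism named in the title — block-sparse global attention — is easy to picture: attention is evaluated only inside a chosen subset of (query-block, key-block) pairs instead of over the full n×n matrix. The sketch below uses a fixed diagonal-plus-first-block pattern purely as a placeholder; how blocks are actually selected is the paper's contribution and is not reproduced here:

```python
# Block-sparse attention: softmax over only the selected key blocks per
# query block. The static pattern {own block, block 0} is a hypothetical
# placeholder; FasterVGGT's real block selection is not reproduced here.
import numpy as np

def block_sparse_attention(q, k, v, block=64):
    n, d = q.shape                      # assumes n is a multiple of `block`
    out = np.zeros_like(v)
    for i in range(n // block):
        rows = slice(i * block, (i + 1) * block)
        cols = sorted({0, i})           # attend to own block and block 0
        keys = np.concatenate([k[j * block:(j + 1) * block] for j in cols])
        vals = np.concatenate([v[j * block:(j + 1) * block] for j in cols])
        logits = q[rows] @ keys.T / np.sqrt(d)
        w = np.exp(logits - logits.max(axis=-1, keepdims=True))
        out[rows] = (w / w.sum(axis=-1, keepdims=True)) @ vals
    return out

q = k = v = np.random.randn(256, 32)
print(block_sparse_attention(q, k, v).shape)  # (256, 32); cost scales with kept blocks
```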
1458 | 1496 | <details open><summary><b>Renmin University of China</b></summary>
|
1459 | 1497 | <p>
|
1460 | 1498 |
|
|
1703 | 1741 | | [Awesome-Efficient-Arch](./meta/2025/Awesome-Efficient-Arch.prototxt) | [Speed Always Wins: A Survey on Efficient Architectures for Large Language Models](http://arxiv.org/abs/2508.09834v1) |  |  | [](https://github.com/weigao266/Awesome-Efficient-Arch) | [note](./notes/2025/Awesome-Efficient-Arch/note.md) |
|
1704 | 1742 | </p>
|
1705 | 1743 | </details>
|
| 1744 | +<details open><summary><b>Southwest University</b></summary>
| 1745 | +<p>
| 1746 | +
| 1747 | +
| 1748 | +| Meta | Title | Cover | Publish | Code | Note |
| 1749 | +|:-----|:------|:------|:--------|:-----|:-----|
| 1750 | +| [TOA](./meta/2025/TOA.prototxt) | [Tree of Agents: Improving Long-Context Capabilities of Large Language Models through Multi-Perspective Reasoning](http://arxiv.org/abs/2509.06436v1) |  |  | [](https://github.com/Aireduce952/Tree-of-Agents) | [note](./notes/2025/TOA/note.md) |
| 1751 | +</p>
| 1752 | +</details>
1706 | 1753 | <details open><summary><b>Stanford</b></summary>
|
1707 | 1754 | <p>
|
1708 | 1755 |
|
|
1743 | 1790 | | [Step-3](./meta/2025/Step-3.prototxt) | [Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding](http://arxiv.org/abs/2507.19427v1) | |  | | [note](./notes/2025/Step-3/note.md) |
|
1744 | 1791 | </p>
|
1745 | 1792 | </details>
|
| 1793 | +<details open><summary><b>Stepfun</b></summary>
| 1794 | +<p>
| 1795 | +
| 1796 | +
| 1797 | +| Meta | Title | Cover | Publish | Code | Note |
| 1798 | +|:-----|:------|:------|:--------|:-----|:-----|
| 1799 | +| [LAVa](./meta/2025/LAVa.prototxt) | [LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation](http://arxiv.org/abs/2509.09754v1) | |  | [](https://github.com/MGDDestiny/Lava) | [note](./notes/2025/LAVa/note.md) |
| 1800 | +</p>
| 1801 | +</details>
1746 | 1802 | <details open><summary><b>Stevens Institute of Technology</b></summary>
|
1747 | 1803 | <p>
|
1748 | 1804 |
|
|
1979 | 2035 | | [NanoFlow](./meta/2025/NanoFlow.prototxt) | [NanoFlow: Towards Optimal Large Language Model Serving Throughput](http://arxiv.org/abs/2408.12757v2) |  |  | [](https://github.com/efeslab/Nanoflow) | [note](./notes/2025/NanoFlow/note.md) |
|
1980 | 2036 | | [LinearPatch](./meta/2025/LinearPatch.prototxt) | [A Simple Linear Patch Revives Layer-Pruned Large Language Models](http://arxiv.org/abs/2505.24680v1) |  |  | | [note](./notes/2025/LinearPatch/note.md) |
|
1981 | 2037 | | [DReSS](./meta/2025/DReSS.prototxt) | [DReSS: Data-driven Regularized Structured Streamlining for Large Language Models](http://arxiv.org/abs/2501.17905v3) |  |  | | [note](./notes/2025/DReSS/note.md) |
|
| 2038 | +| [GLM-4.5](./meta/2025/GLM-4.5.prototxt) | [GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models](http://arxiv.org/abs/2508.06471v1) | |  | [](https://github.com/zai-org/GLM-4.5) | [note](./notes/2025/GLM-4.5/note.md) |
1982 | 2039 | | [KeepKV](./meta/2025/KeepKV.prototxt) | [KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference](http://arxiv.org/abs/2504.09936v1) |  |  | | [note](./notes/2025/KeepKV/note.md) |
|
1983 | 2040 | | [LeanK](./meta/2025/LeanK.prototxt) | [LeanK: Learnable K Cache Channel Pruning for Efficient Decoding](http://arxiv.org/abs/2508.02215v1) |  |  | [](https://github.com/microsoft/MInference) | [note](./notes/2025/LeanK/note.md) |
|
1984 | 2041 | | [MoBA](./meta/2025/MoBA.prototxt) | [MoBA: Mixture of Block Attention for Long-Context LLMs](http://arxiv.org/abs/2502.13189v1) |  |  | [](https://github.com/MoonshotAI/MoBA) | [note](./notes/2025/MoBA/note.md) |
|
|
2129 | 2186 | |:-----|:------|:------|:--------|:-----|:-----|
|
2130 | 2187 | | [Q-Sparse](./meta/2024/Q-Sparse.prototxt) | [Q-Sparse: All Large Language Models can be Fully Sparsely-Activated](http://arxiv.org/abs/2407.10969v1) |  |  | | [note](./notes/2024/Q-Sparse/note.md) |
|
2131 | 2188 | | [COMET](./meta/2025/COMET.prototxt) | [COMET: Towards Practical W4A4KV4 LLMs Serving](http://arxiv.org/abs/2410.12168v1) |  |  | | [note](./notes/2025/COMET/note.md) |
|
| 2189 | +| [EvolKV](./meta/2025/EvolKV.prototxt) | [EvolKV: Evolutionary KV Cache Compression for LLM Inference](http://arxiv.org/abs/2509.08315v1) |  |  | | [note](./notes/2025/EvolKV/note.md) |
2132 | 2190 | </p>
|
2133 | 2191 | </details>
|
2134 | 2192 | <details open><summary><b>University of Connecticut</b></summary>
|
|
2291 | 2349 | | [Selective Context](./meta/2023/selective_context.prototxt) | [Unlocking Context Constraints of LLMs: Enhancing Context Efficiency of LLMs with Self-Information-Based Content Filtering](https://arxiv.org/abs/2304.12102) |  |  | [](https://github.com/liyucheng09/Selective_Context) | |
|
2292 | 2350 | </p>
|
2293 | 2351 | </details>
|
| 2352 | +<details open><summary><b>University of Technology Sydney</b></summary>
| 2353 | +<p>
| 2354 | +
| 2355 | +
| 2356 | +| Meta | Title | Cover | Publish | Code | Note |
| 2357 | +|:-----|:------|:------|:--------|:-----|:-----|
| 2358 | +| [TOA](./meta/2025/TOA.prototxt) | [Tree of Agents: Improving Long-Context Capabilities of Large Language Models through Multi-Perspective Reasoning](http://arxiv.org/abs/2509.06436v1) |  |  | [](https://github.com/Aireduce952/Tree-of-Agents) | [note](./notes/2025/TOA/note.md) |
| 2359 | +</p>
| 2360 | +</details>
2294 | 2361 | <details open><summary><b>University of Texas at Austin</b></summary>
|
2295 | 2362 | <p>
|
2296 | 2363 |
|
|
2482 | 2549 | | [MoBA](./meta/2025/MoBA.prototxt) | [MoBA: Mixture of Block Attention for Long-Context LLMs](http://arxiv.org/abs/2502.13189v1) |  |  | [](https://github.com/MoonshotAI/MoBA) | [note](./notes/2025/MoBA/note.md) |
|
2483 | 2550 | </p>
|
2484 | 2551 | </details>
|
| 2552 | +<details open><summary><b>Zhipu AI</b></summary>
| 2553 | +<p>
| 2554 | +
| 2555 | +
| 2556 | +| Meta | Title | Cover | Publish | Code | Note |
| 2557 | +|:-----|:------|:------|:--------|:-----|:-----|
| 2558 | +| [GLM-4.5](./meta/2025/GLM-4.5.prototxt) | [GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models](http://arxiv.org/abs/2508.06471v1) | |  | [](https://github.com/zai-org/GLM-4.5) | [note](./notes/2025/GLM-4.5/note.md) |
| 2559 | +</p>
| 2560 | +</details>
2485 | 2561 | <details open><summary><b>Zhipu.AI</b></summary>
|
2486 | 2562 | <p>
|
2487 | 2563 |
|
|