| [AdaSkip](./meta/2025/AdaSkip.prototxt) | [AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference](http://arxiv.org/abs/2501.02336v1) |  |  | [](https://github.com/ASISys/AdaSkip) | [note](./notes/2025/AdaSkip/note.md) |
| [SlimLLM](./meta/2025/SlimLLM.prototxt) | [SlimLLM: Accurate Structured Pruning for Large Language Models](http://arxiv.org/abs/2505.22689v1) |  |  | | [note](./notes/2025/SlimLLM/note.md) |
| [SpecEE](./meta/2025/SpecEE.prototxt) | [SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting](http://arxiv.org/abs/2504.08850v1) |  |  | [](https://github.com/infinigence/SpecEE) | [note](./notes/2025/SpecEE/note.md) |
| [Týr-the-Pruner](./meta/2025/Týr-the-Pruner.prototxt) | [Týr-the-Pruner: Unlocking Accurate 50% Structural Pruning for LLMs via Global Sparsity Distribution Optimization](http://arxiv.org/abs/2503.09657v2) |  |  | | [note](./notes/2025/Týr-the-Pruner/note.md) |
| [LinearPatch](./meta/2025/LinearPatch.prototxt) | [A Simple Linear Patch Revives Layer-Pruned Large Language Models](http://arxiv.org/abs/2505.24680v1) |  |  | | [note](./notes/2025/LinearPatch/note.md) |
| [FlexiDepth](./meta/2025/FlexiDepth.prototxt) | [Adaptive Layer-skipping in Pre-trained LLMs](http://arxiv.org/abs/2503.23798v1) |  |  | | [note](./notes/2025/FlexiDepth/note.md) |
| [DReSS](./meta/2025/DReSS.prototxt) | [DReSS: Data-driven Regularized Structured Streamlining for Large Language Models](http://arxiv.org/abs/2501.17905v3) |  |  | | [note](./notes/2025/DReSS/note.md) |
| [Mosaic](./meta/2025/Mosaic.prototxt) | [Mosaic: Composite Projection Pruning for Resource-efficient LLMs](http://arxiv.org/abs/2504.06323v1) |  |  | | [note](./notes/2025/Mosaic/note.md) |
| [Cus-Prun](./meta/2025/Cus-Prun.prototxt) | [Pruning General Large Language Models into Customized Expert Models](http://arxiv.org/abs/2506.02561v1) |  |  | [](https://github.com/zhaoyiran924/Custom-Prune) | [note](./notes/2025/Cus-Prun/note.md) |
| [SEAP](./meta/2025/SEAP.prototxt) | [SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models](http://arxiv.org/abs/2503.07605v1) |  |  | [](https://github.com/IAAR-Shanghai/SEAP) | [note](./notes/2025/SEAP/note.md) |
</p>

</details>

<details open><summary><b>05-Sparse/Pruning</b></summary>

| [XAttention](./meta/2025/XAttention.prototxt) | [XAttention: Block Sparse Attention with Antidiagonal Scoring](http://arxiv.org/abs/2503.16428v1) |  |  | [](https://github.com/mit-han-lab/x-attention) | [note](./notes/2025/XAttention/note.md) |
| [SpecEE](./meta/2025/SpecEE.prototxt) | [SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting](http://arxiv.org/abs/2504.08850v1) |  |  | [](https://github.com/infinigence/SpecEE) | [note](./notes/2025/SpecEE/note.md) |
| [0VRXJQ3F](./meta/2025/0VRXJQ3F.prototxt) | [Rethinking Key-Value Cache Compression Techniques for Large Language Model Serving](http://arxiv.org/abs/2503.24000v1) |  |  | [](https://github.com/LLMkvsys/rethink-kv-compression) | [note](./notes/2025/0VRXJQ3F/note.md) |
| [Týr-the-Pruner](./meta/2025/Týr-the-Pruner.prototxt) | [Týr-the-Pruner: Unlocking Accurate 50% Structural Pruning for LLMs via Global Sparsity Distribution Optimization](http://arxiv.org/abs/2503.09657v2) |  |  | | [note](./notes/2025/Týr-the-Pruner/note.md) |
| [Acc-SpMM](./meta/2025/Acc-SpMM.prototxt) | [Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores](http://arxiv.org/abs/2501.09251v1) |  |  | | [note](./notes/2025/Acc-SpMM/note.md) |
| [LinearPatch](./meta/2025/LinearPatch.prototxt) | [A Simple Linear Patch Revives Layer-Pruned Large Language Models](http://arxiv.org/abs/2505.24680v1) |  |  | | [note](./notes/2025/LinearPatch/note.md) |
| [07NWF4VE](./meta/2025/07NWF4VE.prototxt) | [Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching](http://arxiv.org/abs/2504.06319v1) |  |  | | [note](./notes/2025/07NWF4VE/note.md) |

| [Awesome-Efficient-Arch](./meta/2025/Awesome-Efficient-Arch.prototxt) | [Speed Always Wins: A Survey on Efficient Architectures for Large Language Models](http://arxiv.org/abs/2508.09834v1) |  |  | [](https://github.com/weigao266/Awesome-Efficient-Arch) | [note](./notes/2025/Awesome-Efficient-Arch/note.md) |
| [SpindleKV](./meta/2025/SpindleKV.prototxt) | [SpindleKV: A Novel KV Cache Reduction Method Balancing Both Shallow and Deep Layers](http://arxiv.org/abs/2507.06517v1) |  |  | [](https://github.com/tyxqc/SpindleKV) | [note](./notes/2025/SpindleKV/note.md) |
| [Task-KV](./meta/2025/Task-KV.prototxt) | [Task-KV: Task-aware KV Cache Optimization via Semantic Differentiation of Attention Heads](http://arxiv.org/abs/2501.15113v1) |  |  | | [note](./notes/2025/Task-KV/note.md) |
| [Super-Experts-Profilling](./meta/2025/Super-Experts-Profilling.prototxt) | [Unveiling Super Experts in Mixture-of-Experts Large Language Models](http://arxiv.org/abs/2507.23279v1) |  |  | [](https://github.com/ZunhaiSu/Super-Experts-Profilling) | [note](./notes/2025/Super-Experts-Profilling/note.md) |
| [attention-gym](./meta/2025/attention-gym.prototxt) | [Attention-Gym: Triton-Based Sparse and Quantization Attention](https://github.com/RiseAI-Sys/attention-gym) |  |  | [](https://github.com/RiseAI-Sys/attention-gym) | [note](./notes/2025/attention-gym/note.md) |
</p>