Add FlashInfer-Bench blog post #5
Summary of Changes
Hello @xslingcn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a new blog post that unveils FlashInfer-Bench, a novel benchmark and infrastructure designed to enable AI agents to systematically optimize GPU kernels for Large Language Models (LLMs). The initiative aims to create a 'virtuous cycle' where AI improves its own underlying systems by providing standardized workload descriptions, real-world performance benchmarks, and a seamless mechanism for integrating AI-generated kernel optimizations directly into production environments.
Actionable comments posted: 3
🧹 Nitpick comments (5)
_posts/2025-10-20-flashinfer-bench.md (5)
17-17: Consider a more formal word choice for "amazing".
Line 17: Replace "amazing" with a more formal alternative such as "significant" or "remarkable" to better fit the professional tone.
- We have seen amazing progress recently that shows LLMs can generate reasonable GPU kernels
+ We have seen significant progress recently that shows LLMs can generate reasonable GPU kernels
27-27: Reduce wordiness: replace "a large number of" with "many".
Line 27: Use "many" instead of "a large number of" for conciseness.
- they involve a large number of sophisticated kernels with different API designs and input signatures
+ they involve many sophisticated kernels with different API designs and input signatures
109-109: Vary sentence structure to avoid repetition.
Lines 109–110: Three successive sentences begin with "We". Reword one or more to improve readability.
- We selected the most popular models, including Llama 3, DeepSeek V3, and Qwen 3, and recorded their main kernels, including attention, GEMM, MoE, normalization, sampling, and more. We also strive to ensure realism in the LLM engine configuration.
+ We selected the most popular models, including Llama 3, DeepSeek V3, and Qwen 3, and recorded their main kernels, including attention, GEMM, MoE, normalization, sampling, and more. To ensure realism in the LLM engine configuration, we...
143-143: Reduce wordiness: replace "a large number of" with "many".
Line 143: Use "many" instead of "a large number of" for conciseness.
- By using `apply()`, we can replace a large number of operators in FlashInfer.
+ By using `apply()`, we can replace many operators in FlashInfer.
155-156: Improve sentence variety and clarity.
Lines 155–156: Vary sentence structure to reduce repetition. Three successive sentences begin with "We".
- *We also found a small number of AI-generated kernels with a speedup greater than 1. We deeply understand the importance of ensuring the correctness of kernel implementations, so we are manually verifying the correctness of each kernel. We will soon release kernels with a speedup ratio greater than 1 after review.*
+ *A small number of AI-generated kernels show speedups greater than 1. Given the importance of correctness, we are manually verifying each kernel. These kernels will be released after review.*
Alternatively, specify the exact number if available rather than "a small number".
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (15)
assets/imgs/flashinfer-bench/image1.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image10.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image11.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image12.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image13.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image14.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image15.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image2.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image3.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image4.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image5.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image6.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image7.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image8.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image9.png is excluded by !**/*.png
📒 Files selected for processing (1)
_posts/2025-10-20-flashinfer-bench.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
_posts/2025-10-20-flashinfer-bench.md
[style] ~17-~17: Consider using a more formal and expressive alternative to ‘amazing’.
Context: ... operations for AI models. We have seen amazing progress recently that shows LLMs can g...
(AWESOME)
[grammar] ~19-~19: Use a hyphen to join words.
Context: ...nch, a benchmark of real-world AI-system driven GPU workloads, and more important...
(QB_NEW_EN_HYPHEN)
[style] ~27-~27: To reduce wordiness, try specifying a number or using “many” or “numerous” instead.
Context: ...nments are highly complex: they involve a large number of sophisticated kernels with different AP...
(LARGE_NUMBER_OF)
[style] ~109-~109: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...MoE, normalization, sampling, and more. We also strive to ensure realism in the LL...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~143-~143: To reduce wordiness, try specifying a number or using “many” or “numerous” instead.
Context: ...k. By using apply(), we can replace a large number of operators in FlashInfer. If you want to...
(LARGE_NUMBER_OF)
[style] ~155-~155: Specify a number, remove phrase, use “a few”, or use “some”
Context: ...e kernel’s performance. *We also found a small number of AI-generated kernels with a speedup gre...
(SMALL_NUMBER_OF)
[style] ~155-~155: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...rifying the correctness of each kernel. We will soon release kernels with a speedu...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~156-~156: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...up ratio greater than 1 after review.* We also provide a separate leaderboard for...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 markdownlint-cli2 (0.18.1)
_posts/2025-10-20-flashinfer-bench.md
17-17: Bare URL used
(MD034, no-bare-urls)
145-145: No space after hash on atx style heading
(MD018, no-missing-space-atx)
177-177: Bare URL used
(MD034, no-bare-urls)
178-178: Bare URL used
(MD034, no-bare-urls)
179-179: Bare URL used
(MD034, no-bare-urls)
180-180: Bare URL used
(MD034, no-bare-urls)
🔇 Additional comments (1)
_posts/2025-10-20-flashinfer-bench.md (1)
6-6: Verify Jekyll front matter key name.
The key `usematjax` may be a typo. Verify if this is the correct YAML key for your Jekyll configuration; it might need to be `useMathjax`, `use_mathjax`, or another variant depending on your theme and plugins.
> *—And it is something we’re building: **FlashInfer-Bench**—a benchmark and infrastructure that **opens the pathway for AI to accelerate real-world AI deployment**.*

AI agents have become remarkably powerful, capable of writing complex code and even building sophisticated systems. Such powerful capabilities naturally lead us to wonder: can AI agents optimize the very production systems they run on? At the heart of these AI systems, the most intensive parts are GPU kernels—the low-level programs that perform the core operations for AI models. We have seen amazing progress recently that shows LLMs can generate reasonable GPU kernels (https://scalingintelligence.stanford.edu/blogs/kernelbench/).
Wrap bare URLs in markdown link syntax.
Lines 17, 177–180 contain bare URLs. For consistency and accessibility, wrap them in markdown link syntax: [text](url).
Example fixes:
- We have seen amazing progress recently that shows LLMs can generate reasonable GPU kernels (https://scalingintelligence.stanford.edu/blogs/kernelbench/).
+ We have seen amazing progress recently that shows LLMs can generate reasonable GPU kernels ([KernelBench](https://scalingintelligence.stanford.edu/blogs/kernelbench/)).
- * Leaderboard (https://bench.flashinfer.ai/)
- * FlashInfer-Bench Dataset (https://huggingface.co/datasets/flashinfer-ai/flashinfer-trace)
- * FlashInfer-Bench Python Package (https://github.com/flashinfer-ai/flashinfer-bench/)
- * Documentation (https://bench.flashinfer.ai/docs)
+ * [Leaderboard](https://bench.flashinfer.ai/)
+ * [FlashInfer-Bench Dataset](https://huggingface.co/datasets/flashinfer-ai/flashinfer-trace)
+ * [FlashInfer-Bench Python Package](https://github.com/flashinfer-ai/flashinfer-bench/)
+ * [Documentation](https://bench.flashinfer.ai/docs)
Also applies to: 177-180
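The list-item rewrite suggested above can also be scripted rather than done by hand. The following is a minimal Python sketch, assuming each affected line has the shape `* Label (URL)`; the function name `wrap_bare_list_link` and the regular expression are illustrative only and are not part of this pull request or of markdownlint. It handles the list items on lines 177–180 but not the inline URL on line 17, which needs hand-written link text.

```python
import re

# Hypothetical helper: matches a list item of the form "* Label (https://...)".
# The label may not contain brackets or parentheses, so lines that are
# already written as "[Label](url)" do not match and are left untouched.
BARE_LIST_LINK = re.compile(
    r"^(\s*[*-]\s+)([^()\[\]]+?)\s+\((https?://[^\s()]+)\)\s*$"
)


def wrap_bare_list_link(line: str) -> str:
    """Rewrite "* Label (url)" as "* [Label](url)"; return other lines as-is."""
    match = BARE_LIST_LINK.match(line)
    if match is None:
        return line
    bullet, label, url = match.groups()
    return f"{bullet}[{label}]({url})"


if __name__ == "__main__":
    sample = [
        "* Leaderboard (https://bench.flashinfer.ai/)",
        "* Documentation (https://bench.flashinfer.ai/docs)",
    ]
    for item in sample:
        print(wrap_bare_list_link(item))
```

Pass lines without their trailing newline (for example from `str.splitlines()`), since the pattern consumes trailing whitespace when it matches.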
🧰 Tools
🪛 LanguageTool
[style] ~17-~17: Consider using a more formal and expressive alternative to ‘amazing’.
Context: ... operations for AI models. We have seen amazing progress recently that shows LLMs can g...
(AWESOME)
🪛 markdownlint-cli2 (0.18.1)
17-17: Bare URL used
(MD034, no-bare-urls)
🤖 Prompt for AI Agents
In _posts/2025-10-20-flashinfer-bench.md around lines 17 and 177-180, several
bare URLs are present; replace each bare URL with markdown link syntax such as
[descriptive text](https://example.com) — for line 17 wrap the kernelbench URL
with an appropriate link text (e.g., [KernelBench
blog](https://scalingintelligence.stanford.edu/blogs/kernelbench/)), and for
lines 177–180 identify each bare URL and convert them to `[text](url)` links
using concise, descriptive link text that matches the surrounding copy.
Code Review
This pull request adds a new blog post about FlashInfer-Bench. The post is well-written and informative. I've made a few suggestions to improve clarity, consistency, and fix some minor formatting issues to ensure it renders correctly. These include correcting a typo in the front matter, improving word choice for clarity, standardizing the use of dashes, fixing a markdown heading format, and clarifying the acknowledgements section.
* Shanli Xing* (UW, CMU): Core components and web development
* Yiyan Zhai* (CMU): FlashInfer-Trace dataset, workload tracing system
* Alexander Jiang* (CMU): Benchmark system, agent design
* Yixin Dong* (CMU): Core idea, overall architecture design
The asterisks (*) after some contributors' names (Shanli Xing, Yiyan Zhai, Alexander Jiang, Yixin Dong) are a bit ambiguous. In academic papers, this often denotes equal contribution or lead authors. It would be clearer for readers if you added a note explaining the meaning of the asterisk, for example: (* denotes primary contributors).
Actionable comments posted: 0
♻️ Duplicate comments (8)
_posts/2025-10-20-flashinfer-bench.md (8)
145-145: Add missing space after heading marker.
Line 145 is missing a space after `##`, breaking markdown heading rendering. The markdown parser expects `## ` (with space) for level-2 headings.
-##FlashInfer-Bench Leaderboard — The Arena for LLM Kernel Optimization
+## FlashInfer-Bench Leaderboard — The Arena for LLM Kernel Optimization
17-17: Wrap bare URL in markdown link syntax and reconsider word choice.
Line 17 contains a bare URL. Additionally, consider using a more formal alternative to "amazing" for professional tone.
-We have seen amazing progress recently that shows LLMs can generate reasonable GPU kernels (https://scalingintelligence.stanford.edu/blogs/kernelbench/).
+We have seen remarkable progress recently that shows LLMs can generate reasonable GPU kernels ([KernelBench](https://scalingintelligence.stanford.edu/blogs/kernelbench/)).
19-19: Fix two language issues: word choice and hyphenation.
Line 19 has "systemically" (should be "systematically") and "AI-system driven" (should be "AI-system-driven" for proper compound modifier hyphenation).
-That prompts us to ask the next natural question: how can we systemically get AI agents to improve the very AI system they depend on? We know there can still be hurdles towards this ultimate dream, but it is time to get ready to build a clear pathway for the future. We build FlashInfer-Bench, a benchmark of real-world AI-system driven GPU workloads, and more importantly, an infrastructure and workflow to 0-day ship AI-generated kernels into production.
+That prompts us to ask the next natural question: how can we systematically get AI agents to improve the very AI system they depend on? We know there can still be hurdles towards this ultimate dream, but it is time to get ready to build a clear pathway for the future. We build FlashInfer-Bench, a benchmark of real-world AI-system-driven GPU workloads, and more importantly, an infrastructure and workflow to 0-day ship AI-generated kernels into production.
177-180: Wrap bare URLs in markdown link syntax.
Lines 177–180 contain bare URLs. Wrap each in markdown link syntax for consistency and accessibility.
 For more information, please visit the following links:
-* Leaderboard (https://bench.flashinfer.ai/)
-* FlashInfer-Bench Dataset (https://huggingface.co/datasets/flashinfer-ai/flashinfer-trace)
-* FlashInfer-Bench Python Package (https://github.com/flashinfer-ai/flashinfer-bench/)
-* Documentation (https://bench.flashinfer.ai/docs)
+* [Leaderboard](https://bench.flashinfer.ai/)
+* [FlashInfer-Bench Dataset](https://huggingface.co/datasets/flashinfer-ai/flashinfer-trace)
+* [FlashInfer-Bench Python Package](https://github.com/flashinfer-ai/flashinfer-bench/)
+* [Documentation](https://bench.flashinfer.ai/docs)
184-184: Fix capitalization of project name.
Line 184 has "Flashinfer-Bench" but should be "FlashInfer-Bench" (with capital I) for consistency with the rest of the document.
-Flashinfer-Bench is a research effort initiated in collaboration with CMU Catalyst, NVIDIA, and Bosch.
+FlashInfer-Bench is a research effort initiated in collaboration with CMU Catalyst, NVIDIA, and Bosch.
43-43: Use consistent em-dashes throughout the list.
Line 43 uses a mix of en-dashes (–) and em-dashes (—). For consistency with other list items, use em-dashes throughout.
-- **Establish a 0-day production path – First-class FlashInfer Integration.** We build first-class integration with FlashInfer – an open LLM kernel library widely used in major LLM inference engines.
+- **Establish a 0-day production path — First-class FlashInfer Integration.** We build first-class integration with FlashInfer — an open LLM kernel library widely used in major LLM inference engines.
188-191: Clarify meaning of asterisks after contributor names.
Lines 188–191 include asterisks after some names (Shanli Xing, Yiyan Zhai, Alexander Jiang, Yixin Dong) but their meaning is unclear. Add a note explaining what the asterisks denote (e.g., primary contributors, equal contribution, lead authors).
After the acknowledgements heading or before the contributor list, add clarification:
+*Equal contribution or lead contributors
+
 * Shanli Xing* (UW, CMU): Core components and web development
 * Yiyan Zhai* (CMU): FlashInfer-Trace dataset, workload tracing system
 * Alexander Jiang* (CMU): Benchmark system, agent design
6-6: Fix MathJax front matter typo.
Line 6 has `usematjax` which should be `usemathjax` to enable MathJax rendering for mathematical formulas.
-usematjax: true
+usemathjax: true
🧹 Nitpick comments (1)
_posts/2025-10-20-flashinfer-bench.md (1)
27-27: Consider tightening wordiness and reducing sentence-opening repetition.
Several passages could be more concise:
- Line 27: "a large number of sophisticated kernels" → "many sophisticated kernels" or "numerous kernels"
- Line 109: Three successive sentences begin with "We"; consider rewording for variety
- Line 143: "a large number of operators" → "many operators"
- Lines 155–156: "a small number of AI-generated kernels" could be "a few AI-generated kernels"; also, multiple consecutive sentences start with "We"
These are stylistic refinements that can improve readability without changing meaning.
Also applies to: 109-109, 143-143, 155-156
Actionable comments posted: 0
♻️ Duplicate comments (6)
_posts/2025-10-21-flashinfer-bench.md (6)
184-184: Fix inconsistent capitalization of project name.
Line 184 uses `Flashinfer-Bench`, but the correct project name throughout the post is `FlashInfer-Bench` (capital `I`). Maintain consistency.
-Flashinfer-Bench is a research effort initiated in collaboration with CMU Catalyst, NVIDIA, and Bosch.
+FlashInfer-Bench is a research effort initiated in collaboration with CMU Catalyst, NVIDIA, and Bosch.
19-19: Fix typo and hyphenation in compound modifier.
Line 19 has two issues: (1) `systemically` should be `systematically`, and (2) `AI-system driven` should be `AI-system-driven` (compound modifiers must be hyphenated).
-That prompts us to ask the next natural question: how can we systemically get AI agents to improve the very AI system they depend on? We know there can still be hurdles towards this ultimate dream, but it is time to get ready to build a clear pathway for the future. We build FlashInfer-Bench, a benchmark of real-world AI-system driven GPU workloads, and more importantly, an infrastructure and workflow to 0-day ship AI-generated kernels into production.
+That prompts us to ask the next natural question: how can we systematically get AI agents to improve the very AI system they depend on? We know there can still be hurdles towards this ultimate dream, but it is time to get ready to build a clear pathway for the future. We build FlashInfer-Bench, a benchmark of real-world AI-system-driven GPU workloads, and more importantly, an infrastructure and workflow to 0-day ship AI-generated kernels into production.
17-17: Wrap bare URL in markdown link syntax.
Line 17 contains a bare URL. For consistency and accessibility, wrap it in markdown link syntax: `[text](url)`.
-We have seen amazing progress recently that shows LLMs can generate reasonable GPU kernels (https://scalingintelligence.stanford.edu/blogs/kernelbench/).
+We have seen amazing progress recently that shows LLMs can generate reasonable GPU kernels ([KernelBench blog](https://scalingintelligence.stanford.edu/blogs/kernelbench/)).
177-180: Wrap bare URLs in markdown link syntax.
Lines 177–180 contain bare URLs. For consistency and accessibility, wrap each in markdown link syntax using descriptive link text.
-* Leaderboard (https://bench.flashinfer.ai/)
-* FlashInfer-Bench Dataset (https://huggingface.co/datasets/flashinfer-ai/flashinfer-trace)
-* FlashInfer-Bench Python Package (https://github.com/flashinfer-ai/flashinfer-bench/)
-* Documentation (https://bench.flashinfer.ai/docs)
+* [Leaderboard](https://bench.flashinfer.ai/)
+* [FlashInfer-Bench Dataset](https://huggingface.co/datasets/flashinfer-ai/flashinfer-trace)
+* [FlashInfer-Bench Python Package](https://github.com/flashinfer-ai/flashinfer-bench/)
+* [Documentation](https://bench.flashinfer.ai/docs)
145-145: Add missing space after markdown heading hash.
Line 145 is missing the required space after `##`, which breaks markdown rendering. Markdown requires `## ` (with space) for level-2 headings.
-##FlashInfer-Bench Leaderboard — The Arena for LLM Kernel Optimization
+## FlashInfer-Bench Leaderboard — The Arena for LLM Kernel Optimization
6-6: Fix typo in YAML frontmatter to enable MathJax.
The attribute `usematjax` should be `usemathjax` to correctly enable MathJax rendering for mathematical formulas in the post.
-usematjax: true
+usemathjax: true
🧹 Nitpick comments (2)
_posts/2025-10-21-flashinfer-bench.md (2)
43-43: Use em-dash for consistency with other list items.
Line 43 uses an en-dash (`–`), but other list items use em-dashes (`—`). Standardize on em-dash for visual and typographic consistency.
-- **Establish a 0-day production path – First-class FlashInfer Integration.** We build first-class integration with FlashInfer – an open LLM kernel library widely used in major LLM inference engines. Our solution can dynamically replace the FlashInfer Kernels with the best-performing kernels, as evaluated using the FlashInfer Trace and the FlashInfer-Bench dataset. This enables activating the best kernels in LLM engines and testing end-to-end performance with minimal effort.
+- **Establish a 0-day production path — First-class FlashInfer Integration.** We build first-class integration with FlashInfer — an open LLM kernel library widely used in major LLM inference engines. Our solution can dynamically replace the FlashInfer Kernels with the best-performing kernels, as evaluated using the FlashInfer Trace and the FlashInfer-Bench dataset. This enables activating the best kernels in LLM engines and testing end-to-end performance with minimal effort.
188-191: Clarify asterisk notation for contributors.
Lines 188–191 mark several contributors with asterisks, but the meaning is not explicitly stated. In academic contexts, asterisks often denote equal contribution or lead authors. Consider adding a brief note explaining their significance.
For example, add a line before the contributor list:
 We thank the entire FlashInfer-Bench team for their contributions to the project:
+(\* denotes primary/lead contributors)
+
 * Shanli Xing* (UW, CMU): Core components and web development
Addressed Gemini's comments. @xslingcn, could you check these changes and, if they look OK, commit them? Thanks.
Co-authored-by: Yixin Dong <ubospica@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>