Add FlashInfer-Bench blog post #5

xslingcn · 2025-10-20T19:35:10Z

Summary by CodeRabbit

Documentation
- Added a comprehensive blog post introducing FlashInfer-Bench: benchmark, dataset, architecture, workflow, evaluation, and roadmap.
New Features
- Introduced FlashInfer Trace for end-to-end workload/solution exchange and a public Leaderboard with the new fast_p performance metric.
- Described Day‑Zero integration enabling environment-driven operator substitution and remapping for production rollouts.
Datasets & Models
- Documented curated real-world kernels, workloads, and model coverage examples.

gemini-code-assist · 2025-10-20T19:35:21Z

Summary of Changes

Hello @xslingcn, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new blog post that unveils FlashInfer-Bench, a novel benchmark and infrastructure designed to enable AI agents to systematically optimize GPU kernels for Large Language Models (LLMs). The initiative aims to create a 'virtuous cycle' where AI improves its own underlying systems by providing standardized workload descriptions, real-world performance benchmarks, and a seamless mechanism for integrating AI-generated kernel optimizations directly into production environments.

Highlights

- New Blog Post: A new blog post titled 'FlashInfer-Bench: Building the Virtuous Cycle for AI-driven LLM Systems' has been added, introducing a comprehensive benchmark and infrastructure for optimizing LLM GPU kernels.
- FlashInfer Trace: The post details FlashInfer Trace, an open JSON schema for standardizing the definition, workload, solution, and evaluation of GPU kernels in LLM inference scenarios.
- FlashInfer-Bench Dataset: It introduces the FlashInfer-Bench Dataset, a curated collection of real-world kernel workloads from popular LLMs (Llama 3, DeepSeek V3, Qwen 3) to ensure realistic benchmarking.
- Day-Zero Production Integration: The blog highlights the 'day-zero' production path through first-class integration with FlashInfer, allowing dynamic replacement of kernels with best-performing, AI-generated solutions using an apply() decorator.
- FlashInfer-Bench Leaderboard: The post also announces the FlashInfer-Bench Leaderboard, which uses the fast_p metric to compare kernel performance and visualize results, fostering competition and improvement in LLM kernel optimization.
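
The thread does not reproduce the post's definition of fast_p. As a rough sketch only, assuming it follows the KernelBench-style definition (the fraction of workloads where a solution is both correct and achieves a speedup strictly greater than a threshold p over the baseline), the metric could be computed like this. The `Result` record and its field names are hypothetical and are not part of the FlashInfer-Bench schema.

```python
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical per-workload record; not the FlashInfer-Bench schema.
    correct: bool
    baseline_ms: float
    solution_ms: float


def fast_p(results: list[Result], p: float) -> float:
    """Fraction of workloads that are correct AND have speedup > p.

    Assumes every solution_ms is positive.
    """
    if not results:
        return 0.0
    hits = sum(
        1
        for r in results
        if r.correct and (r.baseline_ms / r.solution_ms) > p
    )
    return hits / len(results)


results = [
    Result(correct=True, baseline_ms=2.0, solution_ms=1.0),   # speedup 2.0
    Result(correct=True, baseline_ms=1.0, solution_ms=1.25),  # speedup 0.8
    Result(correct=False, baseline_ms=3.0, solution_ms=1.0),  # incorrect, never counted
    Result(correct=True, baseline_ms=1.5, solution_ms=1.0),   # speedup 1.5
]
print(fast_p(results, 1.0))  # two of four workloads qualify
print(fast_p(results, 0.5))  # three of four workloads qualify
```

Under this assumed definition an incorrect kernel never counts, no matter how fast it runs, which matches the post's emphasis on verifying correctness before publishing speedups.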


¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

coderabbitai · 2025-10-20T19:35:31Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

A new Markdown blog post _posts/2025-10-21-flashinfer-bench.md adds documentation for FlashInfer-Bench, describing FlashInfer Trace, dataset curation, architecture and workflow, a Python package with dynamic operator substitution, a leaderboard with the fast_p metric, and integration/production guidance.
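
How `flashinfer_bench.apply` works internally is not shown in this thread. The general idea of environment-driven operator substitution can still be sketched with a plain decorator. Everything below (the `substitutable` decorator, the `register` helper, the `KERNEL_SUBSTITUTION` variable, and the operator name) is hypothetical and is not the package's API.

```python
import functools
import os

# Hypothetical registry mapping operator names to replacement callables.
_REGISTRY = {}


def register(name, fn):
    """Record a candidate replacement for the operator called `name`."""
    _REGISTRY[name] = fn


def substitutable(name):
    """Dispatch to a registered replacement when the (hypothetical)
    KERNEL_SUBSTITUTION environment variable is set to "1"."""
    def decorator(fallback):
        @functools.wraps(fallback)
        def wrapper(*args, **kwargs):
            enabled = os.environ.get("KERNEL_SUBSTITUTION") == "1"
            impl = _REGISTRY.get(name) if enabled else None
            return (impl or fallback)(*args, **kwargs)
        return wrapper
    return decorator


@substitutable("scale")
def scale(xs, w):
    # Reference implementation, used when no substitution is active.
    return [x * w for x in xs]


calls = []


def fast_scale(xs, w):
    # Stand-in for a tuned kernel; logs the call so the switch is observable.
    calls.append("fast_scale")
    return [x * w for x in xs]


register("scale", fast_scale)

os.environ["KERNEL_SUBSTITUTION"] = "0"
out_default = scale([1, 2], 3)   # runs the reference implementation
os.environ["KERNEL_SUBSTITUTION"] = "1"
out_swapped = scale([1, 2], 3)   # runs fast_scale instead
print(out_default, out_swapped, calls)
```

Reading the environment variable at call time rather than at import time means the substitution can be toggled without touching the calling code, which is the property the post describes for production rollouts.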

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Blog Post `_posts/2025-10-21-flashinfer-bench.md` | Added a comprehensive post introducing FlashInfer-Bench: benchmark concepts (FlashInfer Trace), dataset and kernel definitions, architecture and workflow (definition → workload → solution → evaluation), Python package and `flashinfer_bench.apply` operator substitution, leaderboard and fast_p metric, figures, examples, and acknowledgements. |

Sequence Diagram(s)

No sequence diagrams included.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 I hopped through traces, datasets in tow,
Tuning kernels where fast breezes blow.
A bench to measure, pick, and play,
Day-zero paths to guide the way.
🥕✨ — from a bunny's joyful hop.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The pull request title "Add FlashInfer-Bench blog post" directly and accurately describes the main change in the changeset, which is the addition of a new Markdown blog post file at `_posts/2025-10-21-flashinfer-bench.md`. The title is concise (4 words), specific in mentioning "FlashInfer-Bench" rather than being generic, and clear enough that a teammate scanning the repository history would immediately understand that this PR adds content about the FlashInfer-Bench benchmark and infrastructure. The title avoids vague language and unnecessary noise, making it an appropriate summary of the primary change. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 65ba35f and e849edc.

📒 Files selected for processing (1)

_posts/2025-10-21-flashinfer-bench.md (1 hunks)

🧰 Additional context used

🪛 LanguageTool

_posts/2025-10-21-flashinfer-bench.md

[style] ~17-~17: Consider using a more formal and expressive alternative to ‘amazing’.
Context: ... operations for AI models. We have seen amazing progress recently that shows LLMs can g...

(AWESOME)

[grammar] ~19-~19: Use a hyphen to join words.
Context: ...nch, a benchmark of real-world AI-system driven GPU workloads, and more important...

(QB_NEW_EN_HYPHEN)

[style] ~27-~27: To reduce wordiness, try specifying a number or using “many” or “numerous” instead.
Context: ...nments are highly complex: they involve a large number of sophisticated kernels with different AP...

(LARGE_NUMBER_OF)

[style] ~109-~109: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...MoE, normalization, sampling, and more. We also strive to ensure realism in the LL...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

[style] ~143-~143: To reduce wordiness, try specifying a number or using “many” or “numerous” instead.
Context: ...k. By using apply(), we can replace a large number of operators in FlashInfer. If you want to...

(LARGE_NUMBER_OF)

[style] ~155-~155: Specify a number, remove phrase, use “a few”, or use “some”
Context: ...e kernel’s performance. *We also found a small number of AI-generated kernels with a speedup gre...

(SMALL_NUMBER_OF)

[style] ~155-~155: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...rifying the correctness of each kernel. We will soon release kernels with a speedu...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

[style] ~156-~156: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...up ratio greater than 1 after review.* We also provide a separate leaderboard for...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 markdownlint-cli2 (0.18.1)

_posts/2025-10-21-flashinfer-bench.md

17-17: Bare URL used

(MD034, no-bare-urls)

177-177: Bare URL used

(MD034, no-bare-urls)

178-178: Bare URL used

(MD034, no-bare-urls)

179-179: Bare URL used

(MD034, no-bare-urls)

180-180: Bare URL used

(MD034, no-bare-urls)

🔇 Additional comments (1)

_posts/2025-10-21-flashinfer-bench.md (1)
184-184: Maintain consistent capitalization of project name.

Line 184: Change Flashinfer-Bench to FlashInfer-Bench to match the project's name formatting used throughout the post.
- FlashInfer-Bench is a research effort initiated in collaboration with CMU Catalyst, NVIDIA, and Bosch. We are bringing up an open community as part of the FlashInfer community and welcome contributions from the ML systems community.
+ FlashInfer-Bench is a research effort initiated in collaboration with CMU Catalyst, NVIDIA, and Bosch. We are bringing up an open community as part of the FlashInfer community and welcome contributions from the ML systems community.
Likely an incorrect or invalid review comment.


coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (5)

_posts/2025-10-20-flashinfer-bench.md (5)
17-17: Consider a more formal word choice for "amazing".

Line 17: Replace "amazing" with a more formal alternative such as "significant" or "remarkable" to better fit the professional tone.
- We have seen amazing progress recently that shows LLMs can generate reasonable GPU kernels
+ We have seen significant progress recently that shows LLMs can generate reasonable GPU kernels
27-27: Reduce wordiness: replace "a large number of" with "many".

Line 27: Use "many" instead of "a large number of" for conciseness.
- they involve a large number of sophisticated kernels with different API designs and input signatures
+ they involve many sophisticated kernels with different API designs and input signatures
109-109: Vary sentence structure to avoid repetition.

Lines 109–110: Three successive sentences begin with "We". Reword one or more to improve readability.
- We selected the most popular models, including Llama 3, DeepSeek V3, and Qwen 3, and recorded their main kernels, including attention, GEMM, MoE, normalization, sampling, and more. We also strive to ensure realism in the LLM engine configuration.
+ We selected the most popular models, including Llama 3, DeepSeek V3, and Qwen 3, and recorded their main kernels, including attention, GEMM, MoE, normalization, sampling, and more. To ensure realism in the LLM engine configuration, we...
143-143: Reduce wordiness: replace "a large number of" with "many".

Line 143: Use "many" instead of "a large number of" for conciseness.
- By using `apply()`, we can replace a large number of operators in FlashInfer.
+ By using `apply()`, we can replace many operators in FlashInfer.
155-156: Improve sentence variety and clarity.

Lines 155–156: Vary sentence structure to reduce repetition. Three successive sentences begin with "We".
- *We also found a small number of AI-generated kernels with a speedup greater than 1. We deeply understand the importance of ensuring the correctness of kernel implementations, so we are manually verifying the correctness of each kernel. We will soon release kernels with a speedup ratio greater than 1 after review.*
+ *A small number of AI-generated kernels show speedups greater than 1. Given the importance of correctness, we are manually verifying each kernel. These kernels will be released after review.*
Alternatively, specify the exact number if available rather than "a small number".

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b6043bc and 367580b.

⛔ Files ignored due to path filters (15)

assets/imgs/flashinfer-bench/image1.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image10.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image11.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image12.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image13.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image14.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image15.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image2.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image3.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image4.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image5.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image6.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image7.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image8.png is excluded by !**/*.png
assets/imgs/flashinfer-bench/image9.png is excluded by !**/*.png

📒 Files selected for processing (1)

_posts/2025-10-20-flashinfer-bench.md (1 hunks)

🧰 Additional context used

🪛 LanguageTool

_posts/2025-10-20-flashinfer-bench.md

[style] ~17-~17: Consider using a more formal and expressive alternative to ‘amazing’.
Context: ... operations for AI models. We have seen amazing progress recently that shows LLMs can g...

(AWESOME)

[grammar] ~19-~19: Use a hyphen to join words.
Context: ...nch, a benchmark of real-world AI-system driven GPU workloads, and more important...

(QB_NEW_EN_HYPHEN)

[style] ~27-~27: To reduce wordiness, try specifying a number or using “many” or “numerous” instead.
Context: ...nments are highly complex: they involve a large number of sophisticated kernels with different AP...

(LARGE_NUMBER_OF)

[style] ~109-~109: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...MoE, normalization, sampling, and more. We also strive to ensure realism in the LL...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

[style] ~143-~143: To reduce wordiness, try specifying a number or using “many” or “numerous” instead.
Context: ...k. By using apply(), we can replace a large number of operators in FlashInfer. If you want to...

(LARGE_NUMBER_OF)

[style] ~155-~155: Specify a number, remove phrase, use “a few”, or use “some”
Context: ...e kernel’s performance. *We also found a small number of AI-generated kernels with a speedup gre...

(SMALL_NUMBER_OF)

[style] ~155-~155: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...rifying the correctness of each kernel. We will soon release kernels with a speedu...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

[style] ~156-~156: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...up ratio greater than 1 after review.* We also provide a separate leaderboard for...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 markdownlint-cli2 (0.18.1)

_posts/2025-10-20-flashinfer-bench.md

17-17: Bare URL used

(MD034, no-bare-urls)

145-145: No space after hash on atx style heading

(MD018, no-missing-space-atx)

177-177: Bare URL used

(MD034, no-bare-urls)

178-178: Bare URL used

(MD034, no-bare-urls)

179-179: Bare URL used

(MD034, no-bare-urls)

180-180: Bare URL used

(MD034, no-bare-urls)

🔇 Additional comments (1)

_posts/2025-10-20-flashinfer-bench.md (1)

6-6: Verify Jekyll front matter key name.

The key usematjax may be a typo. Verify if this is the correct YAML key for your Jekyll configuration—it might need to be useMathjax, use_mathjax, or another variant depending on your theme and plugins.
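
One way to catch this kind of typo is to compare the front matter keys against the keys the theme is known to read. The sketch below uses only the standard library. The `KNOWN_KEYS` list is an assumption for illustration and should be replaced with the keys your Jekyll theme actually uses; the parser is deliberately naive and handles only flat `key: value` lines.

```python
import difflib

# Assumed keys the theme reads; adjust for your own Jekyll setup.
KNOWN_KEYS = ["author", "comments", "date", "layout", "title", "usemathjax"]


def front_matter_keys(text: str) -> list[str]:
    """Return top-level keys from a leading '---' delimited block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Skip nested values and comments; keep only flat 'key: value' lines.
        if ":" in line and not line.startswith((" ", "\t", "#")):
            keys.append(line.split(":", 1)[0].strip())
    return keys


def suspicious_keys(text: str) -> dict[str, str]:
    """Map each unknown key to its closest known key, if one is close."""
    out = {}
    for key in front_matter_keys(text):
        if key in KNOWN_KEYS:
            continue
        match = difflib.get_close_matches(key, KNOWN_KEYS, n=1, cutoff=0.8)
        if match:
            out[key] = match[0]
    return out


post = "---\nlayout: post\ntitle: Example\nusematjax: true\n---\nBody text\n"
print(suspicious_keys(post))
```

A key that is unknown and has no close match is left unreported here, since it is more likely a custom field than a typo.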

coderabbitai · 2025-10-20T19:36:50Z

_posts/2025-10-21-flashinfer-bench.md

+>
+> *—And it is something we’re building: **FlashInfer-Bench**—a benchmark and infrastructure that **opens the pathway for AI to accelerate real-world AI deployment**.*
+
+AI agents have become remarkably powerful, capable of writing complex code and even building sophisticated systems. Such powerful capabilities naturally lead us to wonder: can AI agents optimize the very production systems they run on? At the heart of these AI systems, the most intensive parts are GPU kernels—the low-level programs that perform the core operations for AI models. We have seen amazing progress recently that shows LLMs can generate reasonable GPU kernels (https://scalingintelligence.stanford.edu/blogs/kernelbench/).


⚠️ Potential issue | 🟡 Minor

Wrap bare URLs in markdown link syntax.

Lines 17, 177–180 contain bare URLs. For consistency and accessibility, wrap them in markdown link syntax: [text](url).

Example fixes:

- We have seen amazing progress recently that shows LLMs can generate reasonable GPU kernels (https://scalingintelligence.stanford.edu/blogs/kernelbench/).
+ We have seen amazing progress recently that shows LLMs can generate reasonable GPU kernels ([KernelBench](https://scalingintelligence.stanford.edu/blogs/kernelbench/)).

- * Leaderboard (https://bench.flashinfer.ai/)
- * FlashInfer-Bench Dataset (https://huggingface.co/datasets/flashinfer-ai/flashinfer-trace)
- * FlashInfer-Bench Python Package (https://github.com/flashinfer-ai/flashinfer-bench/)
- * Documentation (https://bench.flashinfer.ai/docs)
+ * [Leaderboard](https://bench.flashinfer.ai/)
+ * [FlashInfer-Bench Dataset](https://huggingface.co/datasets/flashinfer-ai/flashinfer-trace)
+ * [FlashInfer-Bench Python Package](https://github.com/flashinfer-ai/flashinfer-bench/)
+ * [Documentation](https://bench.flashinfer.ai/docs)

Also applies to: 177-180
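
For posts with many such list items, the conversion can be scripted. The following is a minimal sketch that handles only the `* Label (https://...)` pattern from the suggestion above. It assumes every affected list item has this shape, and it drops trailing whitespace on the line. Prose sentences such as line 17 still need hand-picked link text.

```python
import re

# Matches a list item of the form "* Label (https://example.com)".
BARE_ITEM = re.compile(r"^(\s*[*-]\s+)(.+?)\s+\((https?://[^\s)]+)\)\s*$")


def linkify_item(line: str) -> str:
    """Rewrite '* Label (url)' as '* [Label](url)'; leave other lines alone."""
    m = BARE_ITEM.match(line)
    if not m:
        return line
    bullet, label, url = m.groups()
    return f"{bullet}[{label}]({url})"


lines = [
    "* Leaderboard (https://bench.flashinfer.ai/)",
    "* Documentation (https://bench.flashinfer.ai/docs)",
    "* [Leaderboard](https://bench.flashinfer.ai/)",  # already a link, unchanged
    "Plain sentence without a list marker.",          # not a list item, unchanged
]
for line in lines:
    print(linkify_item(line))
```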

🧰 Tools

🪛 LanguageTool

[style] ~17-~17: Consider using a more formal and expressive alternative to ‘amazing’.
Context: ... operations for AI models. We have seen amazing progress recently that shows LLMs can g...

(AWESOME)

🪛 markdownlint-cli2 (0.18.1)

17-17: Bare URL used

(MD034, no-bare-urls)

🤖 Prompt for AI Agents

In _posts/2025-10-20-flashinfer-bench.md around lines 17 and 177-180, several bare URLs are present; replace each bare URL with markdown link syntax such as [descriptive text](https://example.com) — for line 17 wrap the kernelbench URL with an appropriate link text (e.g., [KernelBench blog](https://scalingintelligence.stanford.edu/blogs/kernelbench/)), and for lines 177–180 identify each bare URL and convert them to `[text](url)` links using concise, descriptive link text that matches the surrounding copy.

_posts/2025-10-21-flashinfer-bench.md

gemini-code-assist

Code Review

This pull request adds a new blog post about FlashInfer-Bench. The post is well-written and informative. I've made a few suggestions to improve clarity, consistency, and fix some minor formatting issues to ensure it renders correctly. These include correcting a typo in the front matter, improving word choice for clarity, standardizing the use of dashes, fixing a markdown heading format, and clarifying the acknowledgements section.

_posts/2025-10-21-flashinfer-bench.md

gemini-code-assist · 2025-10-20T19:37:22Z

_posts/2025-10-21-flashinfer-bench.md

+* Shanli Xing* (UW, CMU): Core components and web development  
+* Yiyan Zhai* (CMU): FlashInfer-Trace dataset, workload tracing system  
+* Alexander Jiang* (CMU): Benchmark system, agent design  
+* Yixin Dong* (CMU): Core idea, overall architecture design  


The asterisks (*) after some contributors' names (Shanli Xing, Yiyan Zhai, Alexander Jiang, Yixin Dong) are a bit ambiguous. In academic papers, this often denotes equal contribution or lead authors. It would be clearer for readers if you added a note explaining the meaning of the asterisk, for example: (* denotes primary contributors).

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (8)

_posts/2025-10-20-flashinfer-bench.md (8)
145-145: Add missing space after heading marker.

Line 145 is missing a space after ##, breaking markdown heading rendering. The markdown parser expects ## (with space) for level-2 headings.
-##FlashInfer-Bench Leaderboard — The Arena for LLM Kernel Optimization
+## FlashInfer-Bench Leaderboard — The Arena for LLM Kernel Optimization
17-17: Wrap bare URL in markdown link syntax and reconsider word choice.

Line 17 contains a bare URL. Additionally, consider using a more formal alternative to "amazing" for professional tone.
-We have seen amazing progress recently that shows LLMs can generate reasonable GPU kernels (https://scalingintelligence.stanford.edu/blogs/kernelbench/).
+We have seen remarkable progress recently that shows LLMs can generate reasonable GPU kernels ([KernelBench](https://scalingintelligence.stanford.edu/blogs/kernelbench/)).
19-19: Fix two language issues: word choice and hyphenation.

Line 19 has "systemically" (should be "systematically") and "AI-system driven" (should be "AI-system-driven" for proper compound modifier hyphenation).
-That prompts us to ask the next natural question: how can we systemically get AI agents to improve the very AI system they depend on? We know there can still be hurdles towards this ultimate dream, but it is time to get ready to build a clear pathway for the future. We build FlashInfer-Bench, a benchmark of real-world AI-system driven GPU workloads, and more importantly, an infrastructure and workflow to 0-day ship AI-generated kernels into production.
+That prompts us to ask the next natural question: how can we systematically get AI agents to improve the very AI system they depend on? We know there can still be hurdles towards this ultimate dream, but it is time to get ready to build a clear pathway for the future. We build FlashInfer-Bench, a benchmark of real-world AI-system-driven GPU workloads, and more importantly, an infrastructure and workflow to 0-day ship AI-generated kernels into production.
177-180: Wrap bare URLs in markdown link syntax.

Lines 177–180 contain bare URLs. Wrap each in markdown link syntax for consistency and accessibility.
 For more information, please visit the following links:
 
-* Leaderboard (https://bench.flashinfer.ai/)  
-* FlashInfer-Bench Dataset (https://huggingface.co/datasets/flashinfer-ai/flashinfer-trace)
-* FlashInfer-Bench Python Package (https://github.com/flashinfer-ai/flashinfer-bench/)
-* Documentation (https://bench.flashinfer.ai/docs)
+* [Leaderboard](https://bench.flashinfer.ai/)  
+* [FlashInfer-Bench Dataset](https://huggingface.co/datasets/flashinfer-ai/flashinfer-trace)
+* [FlashInfer-Bench Python Package](https://github.com/flashinfer-ai/flashinfer-bench/)
+* [Documentation](https://bench.flashinfer.ai/docs)
184-184: Fix capitalization of project name.

Line 184 has "Flashinfer-Bench" but should be "FlashInfer-Bench" (with capital I) for consistency with the rest of the document.
-Flashinfer-Bench is a research effort initiated in collaboration with CMU Catalyst, NVIDIA, and Bosch.
+FlashInfer-Bench is a research effort initiated in collaboration with CMU Catalyst, NVIDIA, and Bosch.
43-43: Use consistent em-dashes throughout the list.

Line 43 uses a mix of en-dashes (–) and em-dashes (—). For consistency with other list items, use em-dashes throughout.
-- **Establish a 0-day production path – First-class FlashInfer Integration.** We build first-class integration with FlashInfer – an open LLM kernel library widely used in major LLM inference engines.
+- **Establish a 0-day production path — First-class FlashInfer Integration.** We build first-class integration with FlashInfer — an open LLM kernel library widely used in major LLM inference engines.
188-191: Clarify meaning of asterisks after contributor names.

Lines 188–191 include asterisks after some names (Shanli Xing, Yiyan Zhai, Alexander Jiang, Yixin Dong) but their meaning is unclear. Add a note explaining what the asterisks denote (e.g., primary contributors, equal contribution, lead authors).

After the acknowledgements heading or before the contributor list, add clarification:
+*Equal contribution or lead contributors
+
 * Shanli Xing* (UW, CMU): Core components and web development  
 * Yiyan Zhai* (CMU): FlashInfer-Trace dataset, workload tracing system  
 * Alexander Jiang* (CMU): Benchmark system, agent design  
6-6: Fix MathJax front matter typo.

Line 6 has usematjax which should be usemathjax to enable MathJax rendering for mathematical formulas.
-usematjax: true
+usemathjax: true
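
Of the items above, the heading fix (MD018) is mechanical enough to script. This is a minimal sketch, not a replacement for markdownlint: it does not track fenced code blocks or front matter, so a line such as `#include` inside a fence, or a line of prose starting with a hashtag, would also be altered.

```python
import re

# One to six '#' at line start, followed directly by a non-space, non-'#' character.
MISSING_SPACE = re.compile(r"^(#{1,6})([^#\s])")


def fix_atx_heading(line: str) -> str:
    """Insert the space that ATX headings require after the hash run."""
    return MISSING_SPACE.sub(r"\1 \2", line)


print(fix_atx_heading("##FlashInfer-Bench Leaderboard"))
print(fix_atx_heading("## Already correct"))
```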

🧹 Nitpick comments (1)

_posts/2025-10-20-flashinfer-bench.md (1)

27-27: Consider tightening wordiness and reducing sentence-opening repetition.

Several passages could be more concise:

Line 27: "a large number of sophisticated kernels" → "many sophisticated kernels" or "numerous kernels"

Line 109: Three successive sentences begin with "We"; consider rewording for variety

Line 143: "a large number of operators" → "many operators"

Lines 155–156: "a small number of AI-generated kernels" could be "a few AI-generated kernels"; also, multiple consecutive sentences start with "We"

These are stylistic refinements that can improve readability without changing meaning.

Also applies to: 109-109, 143-143, 155-156

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 367580b and 7674a25.

📒 Files selected for processing (1)

_posts/2025-10-20-flashinfer-bench.md (1 hunks)


coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (6)

_posts/2025-10-21-flashinfer-bench.md (6)
184-184: Fix inconsistent capitalization of project name.

Line 184 uses Flashinfer-Bench, but the correct project name throughout the post is FlashInfer-Bench (capital I). Maintain consistency.
-Flashinfer-Bench is a research effort initiated in collaboration with CMU Catalyst, NVIDIA, and Bosch.
+FlashInfer-Bench is a research effort initiated in collaboration with CMU Catalyst, NVIDIA, and Bosch.
19-19: Fix typo and hyphenation in compound modifier.

Line 19 has two issues: (1) systemically should be systematically, and (2) AI-system driven should be AI-system-driven (compound modifiers must be hyphenated).
-That prompts us to ask the next natural question: how can we systemically get AI agents to improve the very AI system they depend on? We know there can still be hurdles towards this ultimate dream, but it is time to get ready to build a clear pathway for the future. We build FlashInfer-Bench, a benchmark of real-world AI-system driven GPU workloads, and more importantly, an infrastructure and workflow to 0-day ship AI-generated kernels into production.
+That prompts us to ask the next natural question: how can we systematically get AI agents to improve the very AI system they depend on? We know there can still be hurdles towards this ultimate dream, but it is time to get ready to build a clear pathway for the future. We build FlashInfer-Bench, a benchmark of real-world AI-system-driven GPU workloads, and more importantly, an infrastructure and workflow to 0-day ship AI-generated kernels into production.
17-17: Wrap bare URL in markdown link syntax.

Line 17 contains a bare URL. For consistency and accessibility, wrap it in markdown link syntax: [text](url).
-We have seen amazing progress recently that shows LLMs can generate reasonable GPU kernels (https://scalingintelligence.stanford.edu/blogs/kernelbench/).
+We have seen amazing progress recently that shows LLMs can generate reasonable GPU kernels ([KernelBench blog](https://scalingintelligence.stanford.edu/blogs/kernelbench/)).
177-180: Wrap bare URLs in markdown link syntax.

Lines 177–180 contain bare URLs. For consistency and accessibility, wrap each in markdown link syntax using descriptive link text.
-* Leaderboard (https://bench.flashinfer.ai/)  
-* FlashInfer-Bench Dataset (https://huggingface.co/datasets/flashinfer-ai/flashinfer-trace)
-* FlashInfer-Bench Python Package (https://github.com/flashinfer-ai/flashinfer-bench/)
-* Documentation (https://bench.flashinfer.ai/docs)
+* [Leaderboard](https://bench.flashinfer.ai/)  
+* [FlashInfer-Bench Dataset](https://huggingface.co/datasets/flashinfer-ai/flashinfer-trace)
+* [FlashInfer-Bench Python Package](https://github.com/flashinfer-ai/flashinfer-bench/)
+* [Documentation](https://bench.flashinfer.ai/docs)
145-145: Add missing space after markdown heading hash.

Line 145 is missing the required space after ##, which breaks markdown rendering. Markdown requires ## (with space) for level-2 headings.
-##FlashInfer-Bench Leaderboard — The Arena for LLM Kernel Optimization
+## FlashInfer-Bench Leaderboard — The Arena for LLM Kernel Optimization
6-6: Fix typo in YAML frontmatter to enable MathJax.

The attribute usematjax should be usemathjax to correctly enable MathJax rendering for mathematical formulas in the post.
-usematjax: true
+usemathjax: true

🧹 Nitpick comments (2)

_posts/2025-10-21-flashinfer-bench.md (2)
43-43: Use em-dash for consistency with other list items.

Line 43 uses an en-dash (–), but other list items use em-dashes (—). Standardize on em-dash for visual and typographic consistency.
-- **Establish a 0-day production path – First-class FlashInfer Integration.** We build first-class integration with FlashInfer – an open LLM kernel library widely used in major LLM inference engines. Our solution can dynamically replace the FlashInfer Kernels with the best-performing kernels, as evaluated using the FlashInfer Trace and the FlashInfer-Bench dataset. This enables activating the best kernels in LLM engines and testing end-to-end performance with minimal effort.
+- **Establish a 0-day production path — First-class FlashInfer Integration.** We build first-class integration with FlashInfer — an open LLM kernel library widely used in major LLM inference engines. Our solution can dynamically replace the FlashInfer Kernels with the best-performing kernels, as evaluated using the FlashInfer Trace and the FlashInfer-Bench dataset. This enables activating the best kernels in LLM engines and testing end-to-end performance with minimal effort.
188-191: Clarify asterisk notation for contributors.

Lines 188–191 mark several contributors with asterisks, but the meaning is not explicitly stated. In academic contexts, asterisks often denote equal contribution or lead authors. Consider adding a brief note explaining their significance.

For example, add a line before the contributor list:
 We thank the entire FlashInfer-Bench team for their contributions to the project:
 
+(\* denotes primary/lead contributors)
+
 * Shanli Xing* (UW, CMU): Core components and web development  

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 7674a25 and 65ba35f.

📒 Files selected for processing (1)

_posts/2025-10-21-flashinfer-bench.md (1 hunks)

🧰 Additional context used

🪛 LanguageTool

_posts/2025-10-21-flashinfer-bench.md

[style] ~17-~17: Consider using a more formal and expressive alternative to ‘amazing’.
Context: ... operations for AI models. We have seen amazing progress recently that shows LLMs can g...

(AWESOME)

[grammar] ~19-~19: Use a hyphen to join words.
Context: ...nch, a benchmark of real-world AI-system driven GPU workloads, and more important...

(QB_NEW_EN_HYPHEN)

[style] ~27-~27: To reduce wordiness, try specifying a number or using “many” or “numerous” instead.
Context: ...nments are highly complex: they involve a large number of sophisticated kernels with different AP...

(LARGE_NUMBER_OF)

[style] ~109-~109: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...MoE, normalization, sampling, and more. We also strive to ensure realism in the LL...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

[style] ~143-~143: To reduce wordiness, try specifying a number or using “many” or “numerous” instead.
Context: ...k. By using apply(), we can replace a large number of operators in FlashInfer. If you want to...

(LARGE_NUMBER_OF)

[style] ~155-~155: Specify a number, remove phrase, use “a few”, or use “some”
Context: ...e kernel’s performance. *We also found a small number of AI-generated kernels with a speedup gre...

(SMALL_NUMBER_OF)

[style] ~155-~155: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...rifying the correctness of each kernel. We will soon release kernels with a speedu...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

[style] ~156-~156: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...up ratio greater than 1 after review.* We also provide a separate leaderboard for...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 markdownlint-cli2 (0.18.1)

_posts/2025-10-21-flashinfer-bench.md

17-17: Bare URL used

(MD034, no-bare-urls)

145-145: No space after hash on atx style heading

(MD018, no-missing-space-atx)

177-177: Bare URL used

(MD034, no-bare-urls)

178-178: Bare URL used

(MD034, no-bare-urls)

179-179: Bare URL used

(MD034, no-bare-urls)

180-180: Bare URL used

(MD034, no-bare-urls)

Ubospica

Address gemini's comments. @xslingcn, could you check these changes and, if they look OK, commit them? Thanks.

_posts/2025-10-21-flashinfer-bench.md

Co-authored-by: Yixin Dong <ubospica@gmail.com>

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

upd

367580b

coderabbitai bot reviewed Oct 20, 2025

gemini-code-assist bot reviewed Oct 20, 2025

upd

7674a25

coderabbitai bot reviewed Oct 20, 2025

Ubospica approved these changes Oct 20, 2025

upd date

65ba35f

coderabbitai bot reviewed Oct 21, 2025

Ubospica reviewed Oct 21, 2025

_posts/2025-10-21-flashinfer-bench.md

xslingcn and others added 5 commits October 21, 2025 04:28

Update _posts/2025-10-21-flashinfer-bench.md

d91f894

Co-authored-by: Yixin Dong <ubospica@gmail.com>

Update _posts/2025-10-21-flashinfer-bench.md

efb4a1e

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update _posts/2025-10-21-flashinfer-bench.md

8c1464e

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update _posts/2025-10-21-flashinfer-bench.md

a7f9b09

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Update _posts/2025-10-21-flashinfer-bench.md

e849edc

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

yzh119 approved these changes Oct 21, 2025

yzh119 merged commit fdad3b2 into flashinfer-ai:main Oct 21, 2025
1 check passed

This was referenced Oct 21, 2025

Update flashinfer-bench doc link #6

Merged

docs: update author affiliation #7

Merged

Fix FlashInfer Bench post formatting #8

Merged
