Conversation

@kaiming-cheng (Contributor) commented Nov 7, 2025

Summary

This PR introduces the integration between KernelAgent and BackendBench, enabling systematic evaluation of KernelAgent-generated Triton kernels.

Implementation Details

The integration follows a two-phase workflow (a condensed outline is sketched after the phase descriptions):

Phase 1: Kernel Generation

  • KernelAgent generates Triton kernels for specified PyTorch operators
  • Generated kernels are saved to generated_kernels/<operator_name>/

Phase 2: Evaluation

  • BackendBench's DirectoryBackend loads generated kernels
  • Executes correctness and performance tests
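
The outline below condenses that flow into a small, self-contained Python script. The two stub functions are placeholders standing in for the real KernelAgent and BackendBench calls; the actual logic lives in benchmark/BackendBench/eval.py and is quoted later in this review.

# Condensed, self-contained outline of the generate-then-evaluate flow.
# The two stubs below only stand in for the real KernelAgent / BackendBench calls.
from pathlib import Path


def generate_kernels(suite_name: str, num_operators: int) -> str:
    """Phase 1 stand-in: KernelAgent writes Triton kernels under generated_kernels/<op>/."""
    out_dir = Path("generated_kernels")
    out_dir.mkdir(exist_ok=True)
    return str(out_dir)


def evaluate_kernels(kernels_dir: str, suite_name: str) -> int:
    """Phase 2 stand-in: BackendBench's DirectoryBackend loads and scores the kernels."""
    print(f"Would evaluate kernels in {kernels_dir} against the '{suite_name}' suite")
    return 0


if __name__ == "__main__":
    kernels_dir = generate_kernels(suite_name="torchbench", num_operators=3)
    raise SystemExit(evaluate_kernels(kernels_dir=kernels_dir, suite_name="torchbench"))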

Key Components

  1. benchmark/BackendBench/eval.py - Main evaluation script that drives kernel generation (Phase 1) and runs the BackendBench evaluation (Phase 2)

  2. benchmark/BackendBench/setup.py - Directory structure generator (a minimal sketch follows this list):

    • Creates operator directories from BackendBench's op_map
    • Sets up the file structure expected by DirectoryBackend
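
A minimal sketch of that scaffolding, with placeholder operator names (the real script derives the list from BackendBench's op_map; any per-operator files DirectoryBackend may expect beyond the bare directories are omitted here):

# Create one directory per operator under the base directory.
# Placeholder operator names; benchmark/BackendBench/setup.py reads the real
# list from BackendBench's op_map.
from pathlib import Path


def create_operator_dirs(base_dir: str, op_names: list[str]) -> None:
    base = Path(base_dir)
    base.mkdir(parents=True, exist_ok=True)
    for op_name in op_names:
        (base / op_name).mkdir(exist_ok=True)  # one directory per operator


if __name__ == "__main__":
    create_operator_dirs("generated_kernels", ["_adaptive_avg_pool2d", "relu"])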

Testing Instructions

1. Install BackendBench dependency

pip install -e ".[backendbench]"

2. Generate operator directory structure

cd benchmark/BackendBench 
python setup.py --base-dir generated_kernels

3. Run evaluation

Option A: Generate and evaluate in one step

python eval.py --suite torchbench --num-operators 3

Option B: Generate only

python eval.py --suite torchbench --num-operators 3 --generate-only

Option C: Evaluate existing kernels

python eval.py --suite torchbench --evaluate-only --kernels-dir generated_kernels

4. View results

Results are saved to timestamped directories (a quick-inspection sketch follows this list):

  • agent_logs/run_<timestamp>/ - KernelAgent generation logs
  • generated_kernels/ - Operator directories holding the KernelAgent-generated kernels
  • log_BackendBench/run_<timestamp>/ - Evaluation results
    • OVERALL_SUMMARY.md - Human-readable summary
    • operator_summary.csv - Per-operator metrics
    • full_results.json - Complete test results
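
For a quick look at a finished run, something along these lines can load the files listed above (stdlib only; the exact column and field names come from BackendBench and are not documented in this PR, so the snippet just prints whatever it finds):

# Inspect a BackendBench evaluation run directory without assuming field names.
import csv
import json
from pathlib import Path

run_dir = Path("log_BackendBench/run_20251107_001753")  # example run from this PR

with open(run_dir / "operator_summary.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(row)  # one dict of per-operator metrics per row

with open(run_dir / "full_results.json") as f:
    results = json.load(f)
print(f"full_results.json top-level type: {type(results).__name__}")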

5. TODO

  • Support Fuser evaluation logic
  • Add single op vs multiple op support
  • Enable FP16/BF16 filtering

Eval Result

TritonKernelAgent - INFO - Starting kernel generation
TritonKernelAgent - INFO - Problem:
Implement a high-performance Triton kernel for the PyTorch operation: _adaptive_avg_pool2d.default
...
TritonKernelAgent - INFO - Using provided test code
INFO - - log_BackendBench/run_20251107_001753/OVERALL_SUMMARY.md
INFO - - log_BackendBench/run_20251107_001753/operator_summary.csv
INFO - - log_BackendBench/run_20251107_001753/full_results.json

DirectoryBackend loaded 1 kernels from generated_kernels/
correctness score (mean pass rate over all operators): 1.00
performance score (geomean speedup over all operators): 0.14
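
For reference, the two headline numbers combine per-operator results in a straightforward way: correctness is the arithmetic mean of per-operator pass rates, and performance is the geometric mean of per-operator speedups. A minimal sketch using the single operator from this run:

# Combine per-operator metrics into the two headline scores.
import math

pass_rates = [1.0]   # fraction of correctness tests passed, per operator
speedups = [0.14]    # speedup of the generated kernel vs. the baseline, per operator

correctness_score = sum(pass_rates) / len(pass_rates)  # mean pass rate
performance_score = math.exp(sum(math.log(s) for s in speedups) / len(speedups))  # geomean

print(f"correctness score: {correctness_score:.2f}")  # 1.00
print(f"performance score: {performance_score:.2f}")  # 0.14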

@meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) Nov 7, 2025
kaiming-cheng and others added 2 commits November 6, 2025 22:51
- Fix ruff linting errors in eval.py (remove unused import, fix f-strings)
- Fix ruff linting errors in setup.py (remove unnecessary f-string prefixes)
- Add Python version marker (>=3.10) to backendbench dependency in pyproject.toml
- This allows core KernelAgent to support Python 3.8+ while BackendBench integration requires 3.10+
- Update CI workflow to use venv activation instead of 'uv run' to avoid dependency resolution issues
@kaiming-cheng marked this pull request as ready for review November 7, 2025 07:07
# Show message if using default directory
if base_dir == "generated_kernels":
    print(f"ℹ️ Using default directory: {abs_base_dir}")
    print(" (Specify --base-dir to use a different location)\n")

Contributor:

nit: For the sake of keeping it minimal

Suggested change (drop the second print):

    print(" (Specify --base-dir to use a different location)\n")

@Jack-Khuu (Contributor) left a comment:

Looks like a nice start, but I'm not sure how this goes towards testing the main offerings of KernelAgent if we're using different prompts

@@ -0,0 +1,130 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.

Contributor:

Can we just make the setup a simple pass thru?

E.g. scripts/setup_backendbench.sh

python -m BackendBench.scripts.setup_operator_directories "$@"
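
If the wrapper stays in Python rather than shell, the same pass-through could look like the sketch below (the module path is taken from the comment above; whether its flags line up with the current setup.py is an assumption):

# Thin pass-through to BackendBench's own setup script, forwarding all CLI args
# unchanged; the Python equivalent of the suggested scripts/setup_backendbench.sh.
import subprocess
import sys


def main() -> int:
    cmd = [
        sys.executable,
        "-m",
        "BackendBench.scripts.setup_operator_directories",
        *sys.argv[1:],
    ]
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    raise SystemExit(main())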

@@ -0,0 +1,547 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.

Contributor:

Naming nit since it's one file: benchmark/backend_bench.py after moving setup (see comment)

"python-dotenv",
"gradio",
"requests",

Contributor:

Lel

logger.error("--evaluate-only requires --kernels-dir")
return 1

if args.generate_only and args.evaluate_only:
Comment on lines +500 to +533
kernels_dir = args.kernels_dir

# Phase 1: Generate kernels
if not args.evaluate_only:
    kernels_dir = generate_kernels(
        suite_name=args.suite,
        num_operators=args.num_operators,
        num_workers=args.num_workers,
        max_rounds=args.max_rounds,
        workflow=args.workflows,
        verbose=args.verbose,
    )

if args.generate_only:
    logger.info(f"Generation complete. Kernels saved to: {kernels_dir}")
    return 0

# Phase 2: Evaluate kernels
if not args.generate_only:
    exit_code = evaluate_kernels(
        kernels_dir=kernels_dir,
        suite_name=args.suite,
        verbose=args.verbose,
    )

    if exit_code == 0:
        logger.info("Evaluation complete!")
        logger.info(f"Results saved to: {kernels_dir}")
    else:
        logger.error(f"Evaluation failed with exit code {exit_code}")

    return exit_code

return 0

Contributor:

Minor suggestion/removing unnecessary conditionals

Suggested change (replacing the block quoted above):

kernels_dir = (
    args.kernels_dir
    if args.evaluate_only
    else generate_kernels(
        suite_name=args.suite,
        num_operators=args.num_operators,
        num_workers=args.num_workers,
        max_rounds=args.max_rounds,
        workflow=args.workflows,
        verbose=args.verbose,
    )
)

if args.generate_only:
    logger.info(f"Generation complete. Kernels saved to: {kernels_dir}")
    return 0

# Phase 2: Evaluate kernels
exit_code = evaluate_kernels(
    kernels_dir=kernels_dir,
    suite_name=args.suite,
    verbose=args.verbose,
)

if exit_code == 0:
    logger.info("Evaluation complete!")
    logger.info(f"Results saved to: {kernels_dir}")
else:
    logger.error(f"Evaluation failed with exit code {exit_code}")

return exit_code

    return result.returncode


def _create_problem_description_from_op(op, op_name: str) -> str:

Contributor:

Custom?

raise ValueError(f"Unknown suite: {suite_name}")

# Get operators to generate
operators = list(test_suite)

Contributor:

Might make sense for this to be a static artifact we refresh if backendbench isn't making huge updates
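
One hedged way to realize that idea: snapshot the operator names into a small checked-in file and read it by default, regenerating only on demand (the file name and the refresh hook below are placeholders, not part of this PR):

# Sketch of caching the suite's operator list as a static artifact.
import json
from pathlib import Path

ARTIFACT = Path("operator_list.json")  # placeholder location for the checked-in snapshot


def fetch_operator_names() -> list[str]:
    # Placeholder: the real code builds the test suite and iterates list(test_suite).
    return ["_adaptive_avg_pool2d"]


def load_operator_names(refresh: bool = False) -> list[str]:
    if refresh or not ARTIFACT.exists():
        op_names = fetch_operator_names()
        ARTIFACT.write_text(json.dumps(op_names, indent=2))
        return op_names
    return json.loads(ARTIFACT.read_text())


if __name__ == "__main__":
    print(load_operator_names())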

Comment on lines +199 to +211
cmd = [
    sys.executable,
    "-m",
    "BackendBench.scripts.main",
    "--backend",
    "directory",
    "--suite",
    suite_name,
    "--ops-directory",
    kernels_dir,
    "--log-dir",
    log_dir,  # Save evaluation results to separate log directory
]

Contributor:

@jiannanWang Is there an API we can call instead of a CLI?

Reply:

No, currently there isn’t an API for the main entry point. You can check our expected usage of BackendBench here: https://github.com/meta-pytorch/BackendBench/tree/main?tab=readme-ov-file#llm-kernel-development-workflow.

Actually if you think having an API would be helpful, please open an issue in BackendBench. I can implement it into BackendBench.

    return problem_description


def _create_test_code_from_backendbench(op, op_name: str, test_cases, logger) -> str:

Contributor:

Another thing we can probably make static

# Import the serialization utility
from BackendBench.utils import serialize_args

test_code = f'''import torch

Contributor:

Source? Or custom?

