
Conversation

@LoserCheems
Collaborator

Summary
Adds benchmark-style test coverage for the BLAS level‑1 dot kernel so Python, PyTorch, Triton, and (optionally) CuTe implementations can be compared under a consistent harness. This closes the testing gap tracked in issue #19 and updates the README to show that dot now has an associated test.

Design
Follows the existing benchmarking pattern used elsewhere in the project. A factory creates structured input pairs (x, y) on a given device/dtype, and kernel_course.testing is used to collect implementations, run timed benchmarks, and compute FLOPs. Rather than asserting correctness directly here, the focus is on performance comparison across backends using the same shapes, devices, and dtypes.
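
A minimal sketch of the input factory, assuming a plain callable that returns the (x, y) pair for a requested size, device, and dtype (the exact factory interface expected by kernel_course.testing is an assumption here):

```python
import torch


def make_dot_inputs(n: int, device: str, dtype: torch.dtype):
    """Hypothetical factory: build a matched (x, y) pair for the dot kernel.

    Both vectors are created on the same device/dtype so every backend is
    timed against identical inputs.
    """
    x = torch.linspace(0.0, 1.0, n, device=device, dtype=dtype)
    y = torch.linspace(1.0, 0.0, n, device=device, dtype=dtype)
    return x, y
```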

Changes

  • Added test_dot.py which:
    • imports the Python, PyTorch, Triton, and CuTe dot helpers, each guarded with a feature flag (HAS_PYTORCH, HAS_TRITON, HAS_CUTE) so the suite degrades gracefully when a backend is unavailable (a sketch of this pattern follows the list),
    • defines a factory that builds 1D tensors via torch.linspace on the requested device and dtype,
    • uses testing.get_impls and testing.run_benchmarks to time each available backend implementation.
  • Updated the BLAS table in README.md to mark the Test column for dot as ✅ and link to test_dot.py.
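
The guarded-import pattern from the first bullet might look roughly like the following; the module path kernel_course.blas.dot and the helper names are illustrative assumptions, not the actual layout of the repository:

```python
# Each optional backend is imported behind a flag so a missing dependency
# (e.g. Triton on a CPU-only machine) disables that column of the benchmark
# instead of failing test collection. HAS_PYTORCH follows the same pattern.
try:
    from kernel_course.blas.dot import dot_triton  # hypothetical module path
    HAS_TRITON = True
except ImportError:
    HAS_TRITON = False

try:
    from kernel_course.blas.dot import dot_cute  # hypothetical module path
    HAS_CUTE = True
except ImportError:
    HAS_CUTE = False
```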

Implementation notes

  • Benchmarks are parameterized over device (cuda and mps, with skip markers if unavailable), dtype (float32, float16, bfloat16), and problem size (2^4, 2^8, 2^16) to cover small and moderately large vectors; the combined setup is sketched after this list.
  • FLOP count is set to 2 * numel to match the dot product’s multiply‑add cost model.
  • BenchmarkConfig(warmup=3, repeat=1_000) balances stability and runtime, providing enough iterations to smooth out noise while keeping the test run manageable.
  • Results are printed via testing.show_benchmarks(results), which is consistent with existing benchmark‑style tests in the repo.
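
Putting these notes together, the benchmark test might be shaped like the sketch below. BenchmarkConfig(warmup=3, repeat=1_000), get_impls, run_benchmarks, and show_benchmarks are named in this PR, but their exact signatures, the keyword arguments, and the pytest parameter names are assumptions:

```python
import pytest
import torch

from kernel_course import testing            # project helper module named in this PR
from kernel_course.testing import BenchmarkConfig  # exact signature assumed


@pytest.mark.parametrize("device", ["cuda", "mps"])
@pytest.mark.parametrize("dtype", [torch.float32, torch.float16, torch.bfloat16])
@pytest.mark.parametrize("n", [2**4, 2**8, 2**16])
def test_dot_benchmark(device, dtype, n):
    # (x, y) come from the factory sketched in the Design section above.
    x, y = make_dot_inputs(n, device, dtype)

    # Collect whichever backend implementations imported successfully.
    impls = testing.get_impls()  # exact call assumed

    # 2 * n FLOPs: one multiply and one add per element of the dot product.
    results = testing.run_benchmarks(
        impls,
        args=(x, y),
        flops=2 * n,
        config=BenchmarkConfig(warmup=3, repeat=1_000),
    )
    testing.show_benchmarks(results)
```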

Tests

  • test_dot_benchmark runs as part of pytest tests/test_dot.py (or pytest tests/) and will automatically skip device configurations that are not available on the current machine (one possible skip guard is sketched after this list).
  • Backends are pulled in only if their corresponding modules import successfully, avoiding hard failures when, for example, Triton or CuTe is not installed.
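
The device skips described above can be implemented with standard PyTorch availability checks; this sketch shows one common way to do it and is not necessarily what the merged test uses:

```python
import pytest
import torch


def require_device(device: str):
    """Skip a benchmark case when the requested accelerator is absent."""
    if device == "cuda" and not torch.cuda.is_available():
        pytest.skip("CUDA is not available on this machine")
    if device == "mps" and not torch.backends.mps.is_available():
        pytest.skip("MPS is not available on this machine")
```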

Documentation

  • README has been updated so the dot row now shows a ✅ in the Test column and links to test_dot.py, keeping the operator matrix aligned with actual coverage.
  • The existing dot.md “Testing” section that references test_dot.py is now accurate with this file in place.

Checklist

  • Provides a parameterized benchmark over CUDA and MPS devices with multiple dtypes and sizes to compare the Python, PyTorch, Triton, and CuTe implementations of the dot kernel.
  • Updates the kernel summary so dot now shows an available unit test, keeping the documentation aligned with actual test coverage.
@LoserCheems merged commit dedcc48 into main on Dec 1, 2025
2 checks passed