[PERFORMANCE OPTIMIZATION] add dot pytorch kernel #47
Conversation
Provides a reusable tensor dot routine to standardize future vector operations.
Updates the BLAS kernel matrix so the PyTorch column reflects the completed dot implementation, keeping the support table accurate for users.
Pull request overview
This PR adds a PyTorch implementation of the BLAS level-1 dot kernel to compute dot products via a consistent backend API. The implementation fills the PyTorch column in the BLAS operations table, allowing users to compute z = x^T y using kernel_course.pytorch_ops.dot.dot(x, y).
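As a quick illustration of that call path (the module path comes from the PR description; the example tensors are hypothetical):

```python
import torch

from kernel_course.pytorch_ops.dot import dot

x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([4.0, 5.0, 6.0])

z = dot(x, y)  # z = x^T y -> tensor(32.)  (1*4 + 2*5 + 3*6)
```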
Key changes:
- Added a `dot(x, y)` function using `torch.sum(torch.mul(x, y))` operations
- Updated README.md to mark PyTorch `dot` as complete (✅)
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| kernel_course/pytorch_ops/dot.py | Implements dot product computation using PyTorch elementwise multiply and sum operations |
| README.md | Updates BLAS table to mark PyTorch dot implementation as complete |
```python
    Returns:
        torch.Tensor: The dot product of `x` and `y`.
    """
```
Copilot AI · Dec 1, 2025
The PyTorch implementation does not flatten the input tensors before computing the dot product, unlike the Python reference implementation. The Python reference (lines 19-20) uses x.reshape(-1) and y.reshape(-1) to ensure inputs are 1D vectors.
Without flattening, this implementation will:
- Produce incorrect results for multi-dimensional tensors
- Not be numerically equivalent to the Python reference
- Potentially fail with broadcasting errors for certain input shapes
Recommendation: Add tensor flattening before the multiplication:
```python
x = x.reshape(-1)
y = y.reshape(-1)
z = torch.sum(torch.mul(x, y))
```
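To illustrate the reviewer's point, a small hypothetical example (shapes and values are made up) of how skipping the flatten diverges from the 1D reference once broadcasting kicks in:

```python
import torch

x = torch.tensor([[1.0], [2.0], [3.0]])  # shape (3, 1)
y = torch.tensor([4.0, 5.0, 6.0])        # shape (3,)

# Without flattening, x * y broadcasts to shape (3, 3), so the sum is 90.
no_flatten = torch.sum(torch.mul(x, y))

# With flattening, the result matches the 1D dot product: 1*4 + 2*5 + 3*6 = 32.
flattened = torch.sum(torch.mul(x.reshape(-1), y.reshape(-1)))
reference = torch.dot(x.reshape(-1), y.reshape(-1))
```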
```diff
 | [scal](./docs/scal.md) | scale vector | $y = \alpha y$ | $n$ | $2n$ | [✅](./kernel_course/python_ops/scal.py) | [✅](./kernel_course/pytorch_ops/scal.py) | [✅](./kernel_course/triton_ops/scal.py) | ❌ | [✅](./tests/test_scal.py) |
 | [axpby](./docs/axpby.md) | update vector | $y = \alpha x + \beta y$ | $3n$ | $3n$ | [✅](./kernel_course/python_ops/axpby.py) | [✅](./kernel_course/pytorch_ops/axpby.py) | [✅](./kernel_course/triton_ops/axpby.py) | ❌ | [✅](./tests/test_axpby.py) |
-| [dot](./docs/dot.md) | dot product | $z = x^\top y$ | $2n$ | $2n$ | [✅](./kernel_course/python_ops/dot.py) | ❌ | ❌ | ❌ | ❌ |
+| [dot](./docs/dot.md) | dot product | $z = x^\top y$ | $2n$ | $2n$ | [✅](./kernel_course/python_ops/dot.py) | [✅](./kernel_course/pytorch_ops/dot.py) | ❌ | ❌ | ❌ |
```
Copilot AI · Dec 1, 2025
This PR marks the PyTorch dot implementation as complete (✅) but does not include a corresponding test file. All other complete operations (copy, swap, scal, axpby) have test files (test_copy.py, test_swap.py, test_scal.py, test_axpby.py) that validate implementations across backends.
The PR description acknowledges this: "A follow‑up tests/test_dot.py will treat the Python reference implementation as ground truth..." However, marking the implementation as complete without tests is inconsistent with the existing convention.
Recommendation: Either:
- Include `tests/test_dot.py` in this PR before marking as ✅, or
- Keep the Test column as ❌ until the test file is added
```python
    y: torch.Tensor,
) -> torch.Tensor:
    """
    Computes the dot product of two tensors using PyTorch operations.
```
Copilot AI · Dec 1, 2025
[nitpick] The docstring description lacks detail compared to similar operations. Other PyTorch ops include more specific descriptions of what the operation does with the inputs.
For example:
- `copy.py`: "Copies the contents of tensor `x` into tensor `y` using PyTorch operations."
- `scal.py`: "Scales the contents of tensor `y` by a scalar `alpha` using PyTorch operations."
- `swap.py`: "Swaps the contents of tensor `x` with tensor `y` using PyTorch operations."
Recommendation: Update the description to be more specific:

```diff
-    Computes the dot product of two tensors using PyTorch operations.
+    Computes the dot product of two tensors by multiplying corresponding elements
+    and summing the results using PyTorch operations.
```

This clarifies what "dot product" means computationally and maintains consistency with the existing documentation style.
Summary
Adds a PyTorch implementation of the BLAS level-1 `dot` kernel so users can compute dot products via a consistent backend API instead of calling raw PyTorch ops directly. This fills the PyTorch column for `dot` in the BLAS table and aligns with the Python reference added in earlier work.

Design
Implements `kernel_course.pytorch_ops.dot.dot(x, y)` using idiomatic PyTorch operations: elementwise multiplication followed by a reduction. The function accepts two tensors, multiplies them elementwise, and sums the result into a scalar `torch.Tensor`. The interface mirrors the Python `dot` helper and other PyTorch kernels (e.g., `copy`, `swap`) so switching backends is straightforward.
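A minimal sketch of what such a kernel looks like, reconstructed from the signature, docstring fragments, and `torch.sum(torch.mul(...))` call quoted elsewhere in this PR; the flattening lines come from the review suggestion rather than the merged code:

```python
import torch


def dot(
    x: torch.Tensor,
    y: torch.Tensor,
) -> torch.Tensor:
    """
    Computes the dot product of two tensors by multiplying corresponding
    elements and summing the results using PyTorch operations.

    Returns:
        torch.Tensor: The dot product of `x` and `y`.
    """
    # Flattening (as suggested in review) keeps the result equivalent to the
    # 1D Python reference for any input shape.
    x = x.reshape(-1)
    y = y.reshape(-1)
    return torch.sum(torch.mul(x, y))
```

Changes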
- Adds a `dot(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor` helper that computes `torch.sum(torch.mul(x, y))`.
- Updates README.md to mark the PyTorch `dot` implementation as ✅ and link directly to dot.py.

Implementation notes
- `x` and `y` may be any shapes that broadcast to a common shape before the elementwise multiply.
- The reduction uses `torch.sum`, producing a scalar tensor on the same device and with the same dtype as the inputs, which makes it suitable as a reference for later Triton/CuTe implementations.

Tests
- Verified manually against `torch.dot(x.flatten(), y.flatten())` and `torch.sum(x * y)` on small tensors.
- A follow-up `tests/test_dot.py` will treat the Python reference implementation as ground truth and validate the PyTorch backend against it across devices/dtypes, and then be wired into `pytest tests/` (a sketch of such a test is shown below).
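As a rough illustration of that plan only: the import paths follow the README links, but the reference function name and the parametrization are assumptions, not the actual test file.

```python
import pytest
import torch

# Paths follow the README links; the reference function name `dot` is assumed.
from kernel_course.python_ops.dot import dot as python_dot
from kernel_course.pytorch_ops.dot import dot as pytorch_dot


@pytest.mark.parametrize("n", [1, 16, 1024])
@pytest.mark.parametrize("dtype", [torch.float32, torch.float64])
def test_dot_matches_python_reference(n, dtype):
    torch.manual_seed(0)
    x = torch.randn(n, dtype=dtype)
    y = torch.randn(n, dtype=dtype)

    expected = python_dot(x, y)   # Python reference treated as ground truth
    actual = pytorch_dot(x, y)

    torch.testing.assert_close(actual, torch.as_tensor(expected, dtype=actual.dtype))
```

Documentation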
The README table now shows which `dot` kernels are available; dot.md already describes the shared interface and how to run the dot tests once they are added.

Checklist
- Linked issue: `dot` PyTorch kernel implementation #16
- Tests: deferred to a follow-up `test_dot.py` PR