
Conversation

@LoserCheems
Collaborator

Summary
Adds a PyTorch implementation of the BLAS level‑1 dot kernel so users can compute dot products via a consistent backend API instead of calling raw PyTorch ops directly. This fills the PyTorch column for dot in the BLAS table and aligns with the Python reference added in earlier work.

Design
Implements kernel_course.pytorch_ops.dot.dot(x, y) using idiomatic PyTorch operations: elementwise multiplication followed by a reduction. The function accepts two tensors, multiplies them elementwise, and sums the result into a scalar torch.Tensor. The interface mirrors the Python dot helper and other PyTorch kernels (e.g., copy, swap) so switching backends is straightforward.
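
For reference, a minimal sketch of what the helper looks like based on the description above (illustrative; the actual dot.py may differ in docstring details):

```python
import torch


def dot(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """
    Computes the dot product of two tensors by multiplying corresponding
    elements and summing the results using PyTorch operations.

    Args:
        x (torch.Tensor): The first input tensor.
        y (torch.Tensor): The second input tensor.

    Returns:
        torch.Tensor: The dot product of `x` and `y`.
    """
    return torch.sum(torch.mul(x, y))
```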

Changes

  • Added dot.py with a dot(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor helper that computes torch.sum(torch.mul(x, y)).
  • Documented the function’s purpose, arguments, and return value via a docstring in line with existing PyTorch ops.
  • Updated the BLAS table in README.md to mark the PyTorch dot implementation as ✅ and link directly to dot.py.

Implementation notes

  • Relies on PyTorch broadcasting semantics: x and y may be any shapes that broadcast to a common shape before the elementwise multiply (see the usage sketch after this list).
  • The reduction is performed via torch.sum, producing a scalar tensor on the same device and with the same dtype as the inputs, which makes it suitable as a reference for later Triton/CuTe implementations.
  • The implementation is intentionally minimal, prioritizing clarity and alignment with the mathematical definition over micro‑optimizations.
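
For illustration, a small usage sketch of the behavior these notes describe (assumes the helper sketched under Design):

```python
import torch

from kernel_course.pytorch_ops.dot import dot

# Matching 1D vectors: z = x^T y.
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([4.0, 5.0, 6.0])
z = dot(x, y)
print(z, z.dtype)  # tensor(32.) torch.float32 -- scalar, same dtype as inputs

# Broadcasting: a (2, 3) tensor against a (3,) tensor multiplies
# elementwise along the trailing dimension, then sums everything.
a = torch.ones(2, 3)
b = torch.arange(3, dtype=torch.float32)  # [0., 1., 2.]
print(dot(a, b))  # tensor(6.) -- two broadcast rows of [0., 1., 2.] summed
```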

Tests

  • No dedicated tests are included in this PR. Basic checks were done manually by comparing the helper’s output to torch.dot(x.flatten(), y.flatten()) and torch.sum(x * y) on small tensors.
  • A follow‑up tests/test_dot.py will treat the Python reference implementation as ground truth, validate the PyTorch backend against it across devices and dtypes, and be wired into pytest tests/; a rough sketch of such a test follows this list.
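
A rough sketch of what such a test might look like (hypothetical: module paths follow the repository layout named in the README table, but the eventual tests/test_dot.py may be structured differently):

```python
import torch

from kernel_course.python_ops.dot import dot as dot_python
from kernel_course.pytorch_ops.dot import dot as dot_pytorch


def test_dot_matches_python_reference():
    # The Python reference implementation is treated as ground truth.
    torch.manual_seed(0)
    x = torch.randn(128)
    y = torch.randn(128)
    # as_tensor guards against the reference returning a Python scalar.
    expected = torch.as_tensor(dot_python(x, y), dtype=x.dtype)
    torch.testing.assert_close(dot_pytorch(x, y), expected)


def test_dot_matches_torch_dot():
    # Sanity check against torch.dot on flattened inputs, mirroring
    # the manual checks described above.
    x = torch.randn(4, 8)
    y = torch.randn(4, 8)
    torch.testing.assert_close(
        dot_pytorch(x, y), torch.dot(x.flatten(), y.flatten())
    )
```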

Documentation

  • The new helper is documented via its docstring.
  • The README BLAS table now accurately reflects that the Python and PyTorch dot kernels are available; dot.md already describes the shared interface and how to run the dot tests once they are added.

Checklist

  • Provides a reusable tensor dot routine to standardize future vector operations.
  • Updates the BLAS kernel matrix so the PyTorch column reflects the completed dot implementation, keeping the support table accurate for users.
Copilot AI review requested due to automatic review settings December 1, 2025 03:24
Copilot finished reviewing on behalf of LoserCheems December 1, 2025 03:27
Contributor

Copilot AI left a comment


Pull request overview

This PR adds a PyTorch implementation of the BLAS level-1 dot kernel to compute dot products via a consistent backend API. The implementation fills the PyTorch column in the BLAS operations table, allowing users to compute $z = x^\top y$ using kernel_course.pytorch_ops.dot.dot(x, y).

Key changes:

  • Added dot(x, y) function using torch.sum(torch.mul(x, y)) operations
  • Updated README.md to mark PyTorch dot as complete (✅)

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| kernel_course/pytorch_ops/dot.py | Implements dot product computation using PyTorch elementwise multiply and sum operations |
| README.md | Updates BLAS table to mark PyTorch dot implementation as complete |


```python
    Returns:
        torch.Tensor: The dot product of `x` and `y`.
    """
```


Copilot AI Dec 1, 2025


The PyTorch implementation does not flatten the input tensors before computing the dot product, unlike the Python reference implementation. The Python reference (lines 19-20) uses x.reshape(-1) and y.reshape(-1) to ensure inputs are 1D vectors.

Without flattening, this implementation will:

  1. Produce incorrect results for multi-dimensional tensors
  2. Not be numerically equivalent to the Python reference
  3. Potentially fail with broadcasting errors for certain input shapes

Recommendation: Add tensor flattening before the multiplication:

```python
x = x.reshape(-1)
y = y.reshape(-1)
z = torch.sum(torch.mul(x, y))
```

Suggested change:

```diff
+x = x.reshape(-1)
+y = y.reshape(-1)
```

```diff
 | [scal](./docs/scal.md) | scale vector | $y = \alpha y$ | $n$ | $2n$ | [✅](./kernel_course/python_ops/scal.py) | [✅](./kernel_course/pytorch_ops/scal.py) | [✅](./kernel_course/triton_ops/scal.py) | | [✅](./tests/test_scal.py) |
 | [axpby](./docs/axpby.md) | update vector | $y = \alpha x + \beta y$ | $3n$ | $3n$ | [✅](./kernel_course/python_ops/axpby.py) | [✅](./kernel_course/pytorch_ops/axpby.py) | [✅](./kernel_course/triton_ops/axpby.py) | | [✅](./tests/test_axpby.py) |
-| [dot](./docs/dot.md) | dot product | $z = x^\top y$ | $2n$ | $2n$ | [✅](./kernel_course/python_ops/dot.py) | | | | |
+| [dot](./docs/dot.md) | dot product | $z = x^\top y$ | $2n$ | $2n$ | [✅](./kernel_course/python_ops/dot.py) | [✅](./kernel_course/pytorch_ops/dot.py) | | | |
```

Copilot AI Dec 1, 2025


This PR marks the PyTorch dot implementation as complete (✅) but does not include a corresponding test file. All other complete operations (copy, swap, scal, axpby) have test files (test_copy.py, test_swap.py, test_scal.py, test_axpby.py) that validate implementations across backends.

The PR description acknowledges this: "A follow‑up tests/test_dot.py will treat the Python reference implementation as ground truth..." However, marking the implementation as complete without tests is inconsistent with the existing convention.

Recommendation: Either:

  1. Include tests/test_dot.py in this PR before marking as ✅, or
  2. Keep the Test column as ❌ until the test file is added

```python
    y: torch.Tensor,
) -> torch.Tensor:
    """
    Computes the dot product of two tensors using PyTorch operations.
```

Copilot AI Dec 1, 2025


[nitpick] The docstring description lacks detail compared to similar operations. Other PyTorch ops include more specific descriptions of what the operation does with the inputs.

For example:

  • copy.py: "Copies the contents of tensor x into tensor y using PyTorch operations."
  • scal.py: "Scales the contents of tensor y by a scalar alpha using PyTorch operations."
  • swap.py: "Swaps the contents of tensor x with tensor y using PyTorch operations."

Recommendation: Update the description to be more specific:

```python
"""
Computes the dot product of two tensors by multiplying corresponding elements
and summing the results using PyTorch operations.
"""
```

This clarifies what "dot product" means computationally and maintains consistency with the existing documentation style.

Suggested change:

```diff
-Computes the dot product of two tensors using PyTorch operations.
+Computes the dot product of two tensors by multiplying corresponding elements
+and summing the results using PyTorch operations.
```

@LoserCheems LoserCheems merged commit 3ac79a2 into main Dec 1, 2025
8 checks passed