Merged
2 changes: 1 addition & 1 deletion README.md
@@ -21,7 +21,7 @@ The following common BLAS kernels have been implemented in multiple frameworks.
| [swap](./docs/swap.md) | swap vectors | $x \leftrightarrow y$ | $0$ | $4n$ | [✅](./kernel_course/python_ops/swap.py) | [✅](./kernel_course/pytorch_ops/swap.py) | [✅](./kernel_course/triton_ops/swap.py) | ❌ | [✅](./tests/test_swap.py) |
| [scal](./docs/scal.md) | scale vector | $y = \alpha y$ | $n$ | $2n$ | [✅](./kernel_course/python_ops/scal.py) | [✅](./kernel_course/pytorch_ops/scal.py) | [✅](./kernel_course/triton_ops/scal.py) | ❌ | [✅](./tests/test_scal.py) |
| [axpby](./docs/axpby.md) | update vector | $y = \alpha x + \beta y$ | $3n$ | $3n$ | [✅](./kernel_course/python_ops/axpby.py) | [✅](./kernel_course/pytorch_ops/axpby.py) | [✅](./kernel_course/triton_ops/axpby.py) | ❌ | [✅](./tests/test_axpby.py) |
| [dot](./docs/dot.md) | dot product | $z = x^\top y$ | $2n$ | $2n$ | [✅](./kernel_course/python_ops/dot.py) | | ❌ | ❌ | ❌ |
| [dot](./docs/dot.md) | dot product | $z = x^\top y$ | $2n$ | $2n$ | [✅](./kernel_course/python_ops/dot.py) | [✅](./kernel_course/pytorch_ops/dot.py) | ❌ | ❌ | ❌ |
Copilot AI Dec 1, 2025

This PR marks the PyTorch dot implementation as complete (✅) but does not include a corresponding test file. All other complete operations (copy, swap, scal, axpby) have test files (test_copy.py, test_swap.py, test_scal.py, test_axpby.py) that validate implementations across backends.

The PR description acknowledges this: "A follow‑up tests/test_dot.py will treat the Python reference implementation as ground truth..." However, marking the implementation as complete without tests is inconsistent with the existing convention.

Recommendation: Either:

  1. Include tests/test_dot.py in this PR before marking as ✅, or
  2. Keep the Test column as ❌ until the test file is added
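
If the first option is taken, the follow-up test could look roughly like the sketch below. This is a hypothetical `tests/test_dot.py`, not code from the PR: the `dot_python` and `dot_pytorch` functions here are stand-ins written from the behavior described in this review (both flatten with `reshape(-1)`); a real test would import them from `kernel_course.python_ops.dot` and `kernel_course.pytorch_ops.dot` instead.

```python
import torch


# Stand-in for the Python reference implementation (assumed behavior:
# flatten inputs, accumulate elementwise products in plain Python).
def dot_python(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    z = 0.0
    for xi, yi in zip(x.reshape(-1).tolist(), y.reshape(-1).tolist()):
        z += xi * yi
    return torch.tensor(z, dtype=x.dtype)


# Stand-in for the PyTorch implementation under review, with the
# reviewer's suggested flattening applied.
def dot_pytorch(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return torch.sum(torch.mul(x.reshape(-1), y.reshape(-1)))


def test_dot_matches_python_reference() -> None:
    torch.manual_seed(0)
    x = torch.randn(64)
    y = torch.randn(64)
    # Python reference treated as ground truth, per the PR description.
    torch.testing.assert_close(dot_pytorch(x, y), dot_python(x, y))
```

Mirroring the existing `test_copy.py`/`test_swap.py` convention, one such comparison per backend would close the gap the ✅ currently implies.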

| gemv | general matrix-vector multiply | $y = \alpha A x + \beta y$ | $2mn$ | $mn + n + 2m$ | ❌ | ❌ | ❌ | ❌ | ❌ |
| geru | general rank-1 update | $A = A + \alpha x y^\top$ | $2mn$ | $2mn + m + n$ | ❌ | ❌ | ❌ | ❌ | ❌ |
| gemm | general matrix-matrix multiply | $C = \alpha A B + \beta C$ | $2mnk$ | $mk + nk + 2mn$ | ❌ | ❌ | ❌ | ❌ | ❌ |
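
For the still-unimplemented rows, the table's formulas map directly onto PyTorch primitives. As an illustration only (not part of this PR, and not the course's intended hand-written kernel), a plain-PyTorch `gemv` matching the table's definition $y = \alpha A x + \beta y$ could be sketched as:

```python
import torch


def gemv(alpha: float, A: torch.Tensor, x: torch.Tensor,
         beta: float, y: torch.Tensor) -> torch.Tensor:
    # y_new = alpha * A @ x + beta * y, the table's gemv definition.
    # The A @ x product accounts for the ~2mn FLOPs listed in the table
    # (m*n multiplies plus m*n additions for an m-by-n matrix A).
    return alpha * (A @ x) + beta * y
```

This agrees with `torch.addmv(y, A, x, beta=beta, alpha=alpha)`; a kernel-course implementation would instead spell out the loops or tiles explicitly.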
21 changes: 21 additions & 0 deletions kernel_course/pytorch_ops/dot.py
@@ -0,0 +1,21 @@
import torch


def dot(
    x: torch.Tensor,
    y: torch.Tensor,
) -> torch.Tensor:
    """
    Computes the dot product of two tensors using PyTorch operations.
Copilot AI Dec 1, 2025

[nitpick] The docstring description lacks detail compared to similar operations. Other PyTorch ops include more specific descriptions of what the operation does with the inputs.

For example:

  • copy.py: "Copies the contents of tensor x into tensor y using PyTorch operations."
  • scal.py: "Scales the contents of tensor y by a scalar alpha using PyTorch operations."
  • swap.py: "Swaps the contents of tensor x with tensor y using PyTorch operations."

Recommendation: Update the description to be more specific:

"""
Computes the dot product of two tensors by multiplying corresponding elements and summing the results using PyTorch operations.

This clarifies what "dot product" means computationally and maintains consistency with the existing documentation style.

Suggested change
-    Computes the dot product of two tensors using PyTorch operations.
+    Computes the dot product of two tensors by multiplying corresponding elements
+    and summing the results using PyTorch operations.

    Args:
        x (torch.Tensor): First tensor.
        y (torch.Tensor): Second tensor.

    Returns:
        torch.Tensor: The dot product of `x` and `y`.
    """

Copilot AI Dec 1, 2025

The PyTorch implementation does not flatten the input tensors before computing the dot product, unlike the Python reference implementation. The Python reference (lines 19-20) uses x.reshape(-1) and y.reshape(-1) to ensure inputs are 1D vectors.

Without flattening, this implementation will:

  1. Produce incorrect results for multi-dimensional tensors
  2. Not be numerically equivalent to the Python reference
  3. Potentially fail with broadcasting errors for certain input shapes

Recommendation: Add tensor flattening before the multiplication:

x = x.reshape(-1)
y = y.reshape(-1)
z = torch.sum(torch.mul(x, y))
Suggested change
+    x = x.reshape(-1)
+    y = y.reshape(-1)

    z = torch.sum(torch.mul(x, y))

    return z
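
The broadcasting hazard the review describes is easy to reproduce. The following check (assuming only that `torch` is importable) shows why the suggested `reshape(-1)` calls matter:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])        # shape (3,)
y = torch.tensor([[1.0], [2.0], [3.0]])  # same three values, shape (3, 1)

# Without flattening, broadcasting turns the elementwise mul into a
# (3, 3) outer product, and the sum is no longer a dot product.
wrong = torch.sum(torch.mul(x, y))                          # -> 36.0

# With both operands flattened to 1-D, the result is the true dot product.
right = torch.sum(torch.mul(x.reshape(-1), y.reshape(-1)))  # -> 14.0
```

For same-shape inputs the two versions agree (the elementwise product already lines up), so the bug only surfaces once shapes differ, which is exactly why a `tests/test_dot.py` exercising multi-dimensional inputs would be valuable.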