# Tensors in PyTorch: A Formal Comparison with NumPy

This subsection formalizes the concept of **tensors** as used in PyTorch and contrasts them rigorously with NumPy arrays. The goal is to connect the *from-scratch* implementation of neural networks to the framework-based implementation without introducing conceptual gaps.

## Mathematical Definition of a Tensor

In mathematics, a **tensor** is a multilinear object that generalizes scalars, vectors, and matrices.

- A scalar is a tensor of order 0
- A vector is a tensor of order 1
- A matrix is a tensor of order 2
- Higher-order tensors extend this idea to more dimensions

Formally, a tensor $\mathbf{T}$ of order $k$ is an element of the tensor product of $k$ vector spaces $V_1, \dots, V_k$:

$$
\mathbf{T} \in V_1 \otimes V_2 \otimes \cdots \otimes V_k
$$

In numerical computing, the term is used more loosely: a tensor is a multi-dimensional array of components with a fixed shape and data type, and its order corresponds to the number of axes.
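To make the correspondence between order and number of axes concrete, the following minimal sketch builds tensors of order 0 through 3 and inspects their `ndim`, `shape`, and `dtype`. The variable names and sizes are chosen here purely for illustration.

```python
import torch

scalar = torch.tensor(3.0)               # order 0: no axes
vector = torch.tensor([1.0, 2.0, 3.0])   # order 1: one axis of length 3
matrix = torch.zeros(2, 3)               # order 2: two axes
batch = torch.zeros(4, 2, 3)             # order 3: e.g. a stack of four 2x3 matrices

# ndim is the order; shape lists the length of each axis
for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("batch", batch)]:
    print(name, t.ndim, tuple(t.shape), t.dtype)
```

NumPy arrays expose the same `ndim`, `shape`, and `dtype` attributes, which is why the two libraries look so similar at this level.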

## NumPy Arrays vs PyTorch Tensors: Conceptual Overview

Although NumPy arrays and PyTorch tensors appear similar, they differ fundamentally in purpose and capability.

| Feature | NumPy Array | PyTorch Tensor |
|--------|-------------|----------------|
| Multi-dimensional data | Yes | Yes |
| Automatic differentiation | No | Yes (autograd) |
| GPU acceleration | No (CPU only) | Yes |
| Computation graph | No | Yes (recorded when `requires_grad=True`) |
| Deep learning primitives | Limited | Native |

## Side-by-Side Example: Forward Computation

Consider a linear transformation:

$$
\mathbf{y} = \mathbf{X}\mathbf{W} + \mathbf{b}
$$

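where $\mathbf{X} \in \mathbb{R}^{N \times d}$ holds $N$ input rows, $\mathbf{W} \in \mathbb{R}^{d \times m}$ is the weight matrix, and $\mathbf{b} \in \mathbb{R}^{m}$ is a bias that is broadcast across the $N$ rows, so $\mathbf{y} \in \mathbb{R}^{N \times m}$. The cells below use $N = 4$, $d = 3$, $m = 2$.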

In [None]:
# NumPy implementation
import numpy as np

X = np.random.randn(4, 3)
W = np.random.randn(3, 2)
b = np.random.randn(2)

y_numpy = X @ W + b
y_numpy

In [None]:
# PyTorch implementation
import torch

X_t = torch.randn(4, 3)
W_t = torch.randn(3, 2)
b_t = torch.randn(2)

y_torch = X_t @ W_t + b_t
y_torch

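Given identical inputs, the two implementations compute the same result up to floating-point precision. The cells above sample their inputs independently, so the printed values differ; note also that NumPy defaults to `float64` while PyTorch defaults to `float32`. The distinction between the two libraries becomes critical when gradients are required.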
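To check the equivalence directly, one option is to build the PyTorch tensors from the same NumPy arrays. The sketch below assumes a fixed seed of 0 and uses `torch.from_numpy`, which keeps the `float64` dtype and shares memory with the source array. The variable names mirror the cells above but are redefined here so the block runs on its own.

```python
import numpy as np
import torch

rng = np.random.default_rng(0)        # fixed seed, chosen for reproducibility
X = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 2))
b = rng.standard_normal(2)

y_numpy = X @ W + b

# Wrap the same buffers as tensors (no copy, dtype stays float64)
X_t, W_t, b_t = (torch.from_numpy(a) for a in (X, W, b))
y_torch = X_t @ W_t + b_t

# Compare the two results element-wise within floating-point tolerance
print(np.allclose(y_numpy, y_torch.numpy()))
```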

## Backpropagation: Manual vs Automatic

In the from-scratch implementation, gradients were derived manually:

$$
\frac{\partial L}{\partial \mathbf{W}} = \mathbf{X}^T \frac{\partial L}{\partial \mathbf{y}}
$$
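
One way to see where this expression comes from is to write the linear map component-wise and apply the chain rule. The index names $i$, $j$, $k$ are introduced here for illustration:

$$
y_{ij} = \sum_{k} X_{ik} W_{kj} + b_j
\quad\Longrightarrow\quad
\frac{\partial L}{\partial W_{kj}}
= \sum_{i} \frac{\partial L}{\partial y_{ij}} \frac{\partial y_{ij}}{\partial W_{kj}}
= \sum_{i} X_{ik} \frac{\partial L}{\partial y_{ij}}
= \left( \mathbf{X}^T \frac{\partial L}{\partial \mathbf{y}} \right)_{kj}
$$

The same argument gives the bias gradient as a sum over rows:

$$
\frac{\partial L}{\partial b_j} = \sum_{i} \frac{\partial L}{\partial y_{ij}}
$$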

Using NumPy, this must be explicitly coded.

In [None]:
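# NumPy: manual gradient computation
# Upstream gradient dL/dy; all ones corresponds to the loss L = y.sum()
grad_y = np.ones_like(y_numpy)
# Chain rule for y = X @ W + b: dL/dW = X^T @ dL/dy
grad_W = X.T @ grad_y
grad_W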

When a tensor is created with `requires_grad=True`, PyTorch records the operations applied to it in a computation graph and applies the chain rule internally when `backward()` is called.

In [None]:
# PyTorch: automatic differentiation
X_t = torch.randn(4, 3, requires_grad=True)
W_t = torch.randn(3, 2, requires_grad=True)
b_t = torch.randn(2, requires_grad=True)

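y = X_t @ W_t + b_t   # forward pass; the graph is recorded here
loss = y.sum()        # reduce to a scalar so backward() needs no arguments
loss.backward()       # populates .grad on X_t, W_t, and b_t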

W_t.grad
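
As a sanity check, the gradients stored by autograd can be compared against the manual formulas. The following is a minimal sketch, assuming a fixed seed and `loss = y.sum()` so that $\partial L / \partial \mathbf{y}$ is a matrix of ones. The names `manual_grad_W`, `manual_grad_X`, and `manual_grad_b` are introduced here for illustration.

```python
import torch

torch.manual_seed(0)  # fixed seed, chosen for reproducibility
X_t = torch.randn(4, 3, requires_grad=True)
W_t = torch.randn(3, 2, requires_grad=True)
b_t = torch.randn(2, requires_grad=True)

y = X_t @ W_t + b_t
loss = y.sum()
loss.backward()

# For loss = y.sum(), the upstream gradient dL/dy is all ones
grad_y = torch.ones_like(y)

# Manual chain rule, using detached copies so no graph is recorded
manual_grad_W = X_t.detach().T @ grad_y      # dL/dW = X^T @ dL/dy
manual_grad_X = grad_y @ W_t.detach().T      # dL/dX = dL/dy @ W^T
manual_grad_b = grad_y.sum(dim=0)            # dL/db = column sums of dL/dy

print(torch.allclose(W_t.grad, manual_grad_W, atol=1e-6))
print(torch.allclose(X_t.grad, manual_grad_X, atol=1e-6))
print(torch.allclose(b_t.grad, manual_grad_b, atol=1e-6))
```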

## Why Automatic Differentiation Matters

For deep networks, the loss may depend on millions of parameters through thousands of operations, so deriving and coding every gradient by hand becomes error-prone and eventually infeasible. With autograd, only the forward computation is written and the backward pass is derived from it: PyTorch tensors turn gradient computation from an *implementation problem* into a *declarative problem*.
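
Even a two-layer model shows how quickly the manual bookkeeping grows. The sketch below is an illustrative example, not part of the from-scratch implementation: it assumes a hypothetical network `y = tanh(X @ W1) @ W2` with a mean-squared output as the loss, and compares hand-derived gradients with the ones autograd produces from the forward pass alone.

```python
import torch

torch.manual_seed(0)  # fixed seed, chosen for reproducibility
X = torch.randn(4, 3)
W1 = torch.randn(3, 5, requires_grad=True)
W2 = torch.randn(5, 2, requires_grad=True)

# Forward pass: this is all autograd needs
h = torch.tanh(X @ W1)
y = h @ W2
loss = (y ** 2).mean()
loss.backward()

# Manual backward pass: one chain-rule step per forward operation
with torch.no_grad():
    grad_y = 2 * y / y.numel()                 # d/dy of mean(y^2)
    grad_W2 = h.T @ grad_y                     # through y = h @ W2
    grad_h = grad_y @ W2.T
    grad_W1 = X.T @ (grad_h * (1 - h ** 2))    # tanh'(z) = 1 - tanh(z)^2

print(torch.allclose(W1.grad, grad_W1, atol=1e-5))
print(torch.allclose(W2.grad, grad_W2, atol=1e-5))
```

Every additional layer or change of activation would require editing the manual block, while the autograd version only changes in the forward pass.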

## Tensor Data Types and Semantics

Each tensor has an associated data type (`dtype`) that determines numerical precision and valid operations.

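- `torch.float32`: the default floating-point type, typically used for model parameters and activations
- `torch.int64`: the integer type used for class labels, as expected for class-index targets by losses such as `torch.nn.functional.cross_entropy`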

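PyTorch is stricter than NumPy about mixing dtypes in some operations. For example, matrix multiplication between a `float32` tensor and a `float64` tensor raises an error instead of silently upcasting, which surfaces dtype bugs early. Element-wise operations, by contrast, follow type-promotion rules much like NumPy's.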
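
The difference can be seen with a small experiment. This is a sketch under the assumption of a recent PyTorch release, where `@` on tensors of different floating-point dtypes raises a `RuntimeError`; the array and flag names are made up for illustration.

```python
import numpy as np
import torch

# Default floating-point dtypes differ between the two libraries
print(np.random.randn(2).dtype, torch.randn(2).dtype)

# NumPy promotes mixed precision silently
A64 = np.ones((2, 2), dtype=np.float64)
B32 = np.ones((2, 2), dtype=np.float32)
numpy_result_dtype = (A64 @ B32).dtype

# PyTorch refuses to multiply matrices of different dtypes
A_t = torch.ones(2, 2, dtype=torch.float64)
B_t = torch.ones(2, 2, dtype=torch.float32)
try:
    A_t @ B_t
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("RuntimeError:", err)

# The fix is an explicit cast, which makes the precision choice visible
C_t = A_t @ B_t.to(torch.float64)
print(numpy_result_dtype, C_t.dtype)
```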

## Hardware Acceleration and Devices

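Every tensor is stored on exactly one device (CPU or GPU), and an operation requires all of its operands to be on the same device. Device placement can be viewed as a mapping: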

$$
\text{tensor} \rightarrow \{\text{CPU}, \text{GPU}\}
$$


In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
X_device = X_t.to(device)
X_device.device
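
A slightly fuller device-agnostic pattern is sketched below. It assumes nothing about the available hardware and falls back to the CPU when CUDA is not present. The tensor names are illustrative. Mixing a CPU tensor with a GPU tensor in one operation raises a `RuntimeError`, so both operands are placed on `device` before the product.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

X = torch.randn(4, 3, device=device)   # allocated directly on the target device
W = torch.randn(3, 2).to(device)       # created on the CPU, then moved

y = X @ W                              # runs on `device`; the result stays there

# NumPy only sees host memory, so move the result back to the CPU first
y_np = y.cpu().numpy()
print(y.device, y_np.shape)
```

A tensor that has `requires_grad=True` additionally needs `.detach()` before `.numpy()`.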

## Summary

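- NumPy arrays store numerical values with a shape and a dtype
- PyTorch tensors store the same data **and** a device plus, when `requires_grad=True`, differentiation metadata
- The underlying mathematics is unchanged
- Autograd and device placement let the same forward code scale to models that are impractical to differentiate by hand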

**Understanding tensors formalizes and extends the concepts developed in the from-scratch implementation.**