# **Vector Operations:**

## **1. Vector Addition:**

Vector addition is the element-wise addition of two or more tensors of the same shape, where corresponding elements are added together.

For two vectors **a** = $[a₁, a₂, ..., aₙ]$ and **b** = $[b₁, b₂, ..., bₙ]$:

> $a + b = [a₁ + b₁, a₂ + b₂, ..., aₙ + bₙ]$

**Properties:**
   - Commutative: **$a + b = b + a$**
   - Associative: **$(a + b) + c = a + (b + c)$**
   - Identity: **$a + 0 = a$**
   - Inverse: **$a + (-a) = 0$**
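
The four properties can be checked numerically. The sketch below uses plain Python lists rather than tensors so that the rule itself is visible; the helper name `add` is illustrative and not part of PyTorch. Integers are used because with floating-point values associativity holds only up to rounding error.

```python
def add(u, v):
    """Element-wise sum of two equal-length lists."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return [x + y for x, y in zip(u, v)]

a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
c = [1, 1, 1, 1]
zero = [0, 0, 0, 0]
neg_a = [-x for x in a]

assert add(a, b) == add(b, a)                  # commutative
assert add(add(a, b), c) == add(a, add(b, c))  # associative
assert add(a, zero) == a                       # identity
assert add(a, neg_a) == zero                   # inverse

print(add(a, b))  # [6, 8, 10, 12]
```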

In [18]:
import torch

# Method 1: Using + operator
a = torch.tensor([1, 2, 3, 4])
b = torch.tensor([5, 6, 7, 8])
result1 = a + b
print(f"a + b = {result1}")  # [6, 8, 10, 12]

a + b = tensor([ 6,  8, 10, 12])


In [2]:
# Method 2: Using torch.add()
result2 = torch.add(a, b)
print(f"torch.add(a, b) = {result2}")  # [6, 8, 10, 12]

torch.add(a, b) = tensor([ 6,  8, 10, 12])


In [3]:
# Method 3: In-place addition with +=
a_copy = a.clone()
a_copy += b
print(f"a += b = {a_copy}")  # [6, 8, 10, 12]

a += b = tensor([ 6,  8, 10, 12])


In [4]:
# Method 4: In-place addition with add_()
a_copy2 = a.clone()
a_copy2.add_(b)
print(f"a.add_(b) = {a_copy2}")  # [6, 8, 10, 12]

a.add_(b) = tensor([ 6,  8, 10, 12])


In [5]:
# Method 5: Adding scalar to vector
scalar = 10
result3 = a + scalar
print(f"a + scalar = {result3}")  # [11, 12, 13, 14]

a + scalar = tensor([11, 12, 13, 14])


In [6]:
# Method 6: Adding multiple vectors
c = torch.tensor([1, 1, 1, 1])
result4 = a + b + c
print(f"a + b + c = {result4}")  # [7, 9, 11, 13]

a + b + c = tensor([ 7,  9, 11, 13])


In [7]:
# Method 7: Broadcasting addition (different shapes)
a_2d = torch.tensor([[1, 2], [3, 4]])
b_1d = torch.tensor([10, 20])
result5 = a_2d + b_1d  # Broadcasting
print(f"2D + 1D broadcasting:\n{result5}")
# [[11, 22]
#  [13, 24]]

2D + 1D broadcasting:
tensor([[11, 22],
        [13, 24]])


In [None]:
# Method 8: Scaled addition with torch.add(..., alpha=...)
alpha = 5
result6 = torch.add(a, b, alpha=alpha)  # computes a + alpha * b
print(f"a + {alpha}*b = {result6}")

a + 5*b = tensor([26, 32, 38, 44])


In [11]:
# Method 9: Adding with different dtypes (automatic type promotion)
a_float = torch.tensor([1.0, 2.0, 3.0])
b_int = torch.tensor([1, 2, 3])
result7 = a_float + b_int
print(f"float + int = {result7}")  # [2.0, 4.0, 6.0]
print(f"Result dtype: {result7.dtype}")  # torch.float32

float + int = tensor([2., 4., 6.])
Result dtype: torch.float32


In [14]:
# Method 10: GPU vector addition (if CUDA available)
if torch.cuda.is_available():
    a_gpu = torch.tensor([1, 2, 3, 4]).cuda()
    b_gpu = torch.tensor([5, 6, 7, 8]).cuda()
    result_gpu = a_gpu + b_gpu
    print(f"GPU addition: {result_gpu}")

In [19]:
# Method 11: Batch vector addition

batch_a = torch.randn(32, 128)  # 32 vectors of dimension 128
batch_b = torch.randn(32, 128)
batch_result = batch_a + batch_b
print(f"Shape of batch_a: {batch_a.shape}")
print(f"Shape of batch_b: {batch_b.shape}")
print(f"Batch addition shape: {batch_result.shape}")  # torch.Size([32, 128])

Shape of batch_a: torch.Size([32, 128])
Shape of batch_b: torch.Size([32, 128])
Batch addition shape: torch.Size([32, 128])


In [22]:
# Method 12: Element-wise addition with specific output tensor

output = torch.empty_like(a)
print(f"Print a: {a}")
print(f"Print b: {b}")
torch.add(a, b, out=output)
print(f"Print output: {output}")
print(f"shape of output: {output.shape}")
print(f"Addition with output tensor: {output}")  # [6, 8, 10, 12]

Print a: tensor([1, 2, 3, 4])
Print b: tensor([5, 6, 7, 8])
Print output: tensor([ 6,  8, 10, 12])
shape of output: torch.Size([4])
Addition with output tensor: tensor([ 6,  8, 10, 12])


**Key Points:**

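   1. **Broadcasting**: PyTorch automatically expands tensors of different shapes when the shapes are compatible

   2. **In-place operations**: Methods ending with `_` (such as `add_`) and the `+=` operator modify the existing tensor instead of allocating a new one

   3. **Type promotion**: When operand dtypes differ, the result is promoted to a common dtype (for example, `int64 + float32` gives `float32`); in-place operations cannot change the dtype of the tensor they modify

   4. **GPU support**: The same operations work on CUDA tensors, provided all operands are on the same device

   5. **Batch operations**: Addition applies element-wise across batched data without explicit loops

   6. **Memory efficiency**: Passing `out=` writes the result into an existing tensor instead of allocating a new one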

**Common Use Cases in Deep Learning:**
   - Adding bias terms to linear layers

   - Residual connections in neural networks

   - Combining feature representations

   - Gradient accumulation during backpropagation

---

## **2. Broadcasting:**

> ![](https://s3.amazonaws.com/cloudxlab/static/images/course/numpy_pandas_for_ml/Broacasting_2_rule.png)

Broadcasting refers to the automatic expansion of arrays or tensors with different shapes to make them compatible for element-wise operations. It's a computational technique that allows operations between arrays of different dimensions without explicitly reshaping them.

#### **Rules of Broadcasting:**

**Broadcasting follows these fundamental rules:**

1. **`Rule 1`: Dimension Alignment**
Shapes are compared from the rightmost (trailing) dimension. If the tensors have different numbers of dimensions, the shape of the one with fewer dimensions is conceptually padded with dimensions of size 1 on the left.

2. **`Rule 2`: Dimension Compatibility**
Two dimensions are compatible if:
- They are equal in size
- One of them is 1
- One of them is missing (treated as 1)

3. **`Rule 3`: Result Shape**
The resulting array has the maximum size along each dimension from the input arrays.

4. **`Rule 4`: Singleton Expansion**
Dimensions of size 1 are "stretched" or "copied" to match the corresponding dimension of the other array.
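
The four rules can be condensed into a short shape calculation. The sketch below is plain Python and works only on shape tuples, not on tensor data; the function name `broadcast_shape` is illustrative and is not PyTorch's own implementation.

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Rule 1: align from the rightmost dimension; a missing dimension counts as 1.
    for dim_a, dim_b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        # Rule 2: sizes must be equal, or one of them must be 1.
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            raise ValueError(f"shapes {shape_a} and {shape_b} are not broadcastable")
        # Rules 3 and 4: a size-1 dimension is stretched to the other side's size.
        result.append(dim_b if dim_a == 1 else dim_a)
    return tuple(reversed(result))

print(broadcast_shape((2, 3), (3,)))             # (2, 3)
print(broadcast_shape((1, 1, 2), (2, 1)))        # (1, 2, 2)
print(broadcast_shape((8, 1, 6, 1), (7, 1, 5)))  # (8, 7, 6, 5)
```

A pair such as `(2, 3)` and `(2,)` raises `ValueError`, because the trailing sizes 3 and 2 differ and neither is 1. PyTorch provides `torch.broadcast_shapes` for the same calculation on real shapes.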

#### **Broadcasting in PyTorch:**

**1. Implicit Broadcasting:**

PyTorch automatically applies broadcasting rules during element-wise operations:

> ![](https://miro.medium.com/v2/resize:fit:1057/0*_MANoFD5glrde0eh.png)

In [None]:
import torch

# Example 1: Vector + Scalar
a = torch.tensor([1, 2, 3])  # vector, shape: (3,)
b = torch.tensor(5)          # scalar (0-dim tensor), shape: ()
result = a + b               # shape: (3,) -> [6, 7, 8]
result

tensor([6, 7, 8])

In [13]:
# Example 2: Matrix + Vector
matrix = torch.tensor([[1, 2, 3],
                       [4, 5, 6]])  # shape: (2, 3)

vector = torch.tensor([10, 20, 30])  # shape: (3,)
print(f"Matrix:\n {matrix}")
print()
print(f"Vector: {vector}")
print()
result = matrix + vector     # shape: (2, 3) -> [[11, 22, 33], [14, 25, 36]]
print(f"Result:\n {result}")

Matrix:
 tensor([[1, 2, 3],
        [4, 5, 6]])

Vector: tensor([10, 20, 30])

Result:
 tensor([[11, 22, 33],
        [14, 25, 36]])


In [15]:
# Example 3: Different dimensional arrays
a = torch.tensor([[[1, 2]]])     # shape: (1, 1, 2)
b = torch.tensor([[3], [4]])     # shape: (2, 1)
result = a + b                   # shape: (1, 2, 2) -> [[[4, 5], [5, 6]]]
print(f"A: {a}")
print()
print(f"B: {b}")
print()
print(f"Result:\n {result}")

A: tensor([[[1, 2]]])

B: tensor([[3],
        [4]])

Result:
 tensor([[[4, 5],
         [5, 6]]])


**2. Explicit Broadcasting:**

We can manually control broadcasting using specific PyTorch functions:

In [20]:
# Using expand() - creates a view without copying data
a = torch.tensor([[1, 2, 3]])    # shape: (1, 3)
b = a.expand(4, 3)               # shape: (4, 3) - same data, different view
print(f"A: {a}")
print(f"B:\n {b}")

A: tensor([[1, 2, 3]])
B:
 tensor([[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]])


In [21]:
# Using repeat() - actually copies data
c = a.repeat(4, 1)               # shape: (4, 3) - data is copied
print(f'C:\n {c}')

C:
 tensor([[1, 2, 3],
        [1, 2, 3],
        [1, 2, 3],
        [1, 2, 3]])


In [23]:
# Using unsqueeze() to add dimensions
vector = torch.tensor([1, 2, 3])         # shape: (3,)
column = vector.unsqueeze(1)             # shape: (3, 1)
row = vector.unsqueeze(0)                # shape: (1, 3)
print(f"Vector: {vector}")
print(f"Column:\n {column}")
print(f"Row: {row}")

Vector: tensor([1, 2, 3])
Column:
 tensor([[1],
        [2],
        [3]])
Row: tensor([[1, 2, 3]])


In [26]:
# Using view() to reshape
d = torch.tensor([1, 2, 3, 4])
reshaped = d.view(2, 2)                  # shape: (2, 2)
print(f"D: {d}")
print(f"\nReshaped:\n {reshaped}")

D: tensor([1, 2, 3, 4])

Reshaped:
 tensor([[1, 2],
        [3, 4]])


**Example 1: Matrix-Vector Operations:**

In [29]:
# Broadcasting a vector across matrix rows
matrix = torch.randn(3, 4)      # shape: (3, 4)
vector = torch.randn(4)         # shape: (4,)
print(f"Matrix:\n {matrix}")
print(f"\nVector: {vector}")

Matrix:
 tensor([[-0.3928, -1.3567,  0.5808, -0.4315],
        [ 0.0587, -1.4496,  0.8147, -0.7282],
        [ 1.2816,  0.5309, -1.1299, -0.0172]])

Vector: tensor([ 0.6906, -0.5814,  0.4244, -0.6922])


In [34]:
# Implicit broadcasting
result1 = matrix + vector       # vector broadcasts to (3, 4)
print(f"Result-1:\n {result1}")
# Explicit broadcasting
vector_expanded = vector.expand(3, 4)
print(f"\nVector Expanded:\n {vector_expanded}")
result2 = matrix + vector_expanded
print(f"\nResult-2:\n {result2}")

Result-1:
 tensor([[ 0.2977, -1.9382,  1.0052, -1.1237],
        [ 0.7493, -2.0310,  1.2391, -1.4204],
        [ 1.9722, -0.0505, -0.7055, -0.7094]])

Vector Expanded:
 tensor([[ 0.6906, -0.5814,  0.4244, -0.6922],
        [ 0.6906, -0.5814,  0.4244, -0.6922],
        [ 0.6906, -0.5814,  0.4244, -0.6922]])

Result-2:
 tensor([[ 0.2977, -1.9382,  1.0052, -1.1237],
        [ 0.7493, -2.0310,  1.2391, -1.4204],
        [ 1.9722, -0.0505, -0.7055, -0.7094]])


**Example 2: Batch Operations:**

In [37]:
# Broadcasting in batch processing
batch_data = torch.randn(32, 10, 5)    # 32 samples, 10 features, 5 dimensions
mean = torch.randn(5)                  # stand-in for a per-dimension mean (random values here)
print(f"Batch data shape: {batch_data.shape}")
print(f"Mean shape: {mean.shape}")

Batch data shape: torch.Size([32, 10, 5])
Mean shape: torch.Size([5])

In [39]:
# Subtract mean from each sample (implicit broadcasting)
normalized = batch_data - mean         # mean broadcasts to (32, 10, 5)
print(f"Normalized shape: {normalized.shape}")

Normalized shape: torch.Size([32, 10, 5])

In [40]:
# Explicit approach: expand mean to the full shape first
mean_expanded = mean.expand(32, 10, 5)
normalized_explicit = batch_data - mean_expanded
print(f"Same result as implicit broadcasting: {torch.equal(normalized, normalized_explicit)}")

Same result as implicit broadcasting: True

**Example 3: Complex Broadcasting:**

In [41]:
# Multi-dimensional broadcasting
a = torch.randn(8, 1, 6, 1)      # shape: (8, 1, 6, 1)
b = torch.randn(7, 1, 5)         # shape: (7, 1, 5)

# Broadcasting rules applied:
# a: (8, 1, 6, 1) -> (8, 7, 6, 5)
# b: (   7, 1, 5) -> (8, 7, 6, 5)
result = a + b                    # final shape: (8, 7, 6, 5)
result.shape

torch.Size([8, 7, 6, 5])

> ![](https://av-eks-blogoptimized.s3.amazonaws.com/15357brd_fig_3.png)

#### **Practical Benefits:**

**Broadcasting provides several advantages:**

1. **`Memory Efficiency`**: The smaller operand is not physically copied to the larger shape; its values are reused during the operation.

2. **`Code Simplicity`**: Eliminates the need for explicit loops or manual reshaping in many cases.

3. **`Performance`**: Optimized implementations can perform broadcasted operations faster than equivalent manual operations.

4. **`Flexibility`**: Allows natural mathematical operations between arrays of different but compatible shapes.

The key to mastering broadcasting is understanding how PyTorch aligns dimensions and applies the compatibility rules, which enables writing more concise and efficient tensor operations.

----

## **Outer Product:**

The outer product is a fundamental operation in linear algebra that takes two vectors and produces a matrix. 

Given two vectors **$u$** and **$v$**, their outer product is a matrix whose entries are the products of every element of **$u$** with every element of **$v$**. Unlike element-wise multiplication, the two vectors do not need to have the same length.

For vectors **$u$** $∈ ℝᵐ$ and **$v$** $∈ ℝⁿ$, the outer product **$u$** ⊗ **$v$** (or **$uv^T$**) produces an $m×n$ matrix **$M$** where:

> **$M$**[i,j] = **$u$**[i] × **$v$**[j]

This is equivalent to multiplying a column vector by a row vector: **$u$** $(m×1)$ × **$v^T$** $(1×n)$ = **$M$** $(m×n)$.

> ![](https://media.geeksforgeeks.org/wp-content/uploads/20190413155438/outerProduct.png)

**Geometric Interpretation:**

The outer product captures the interaction between every pair of components from the two vectors. Row $i$ of the resulting matrix is the second vector **$v$** scaled by the $i$-th element of the first vector **$u$**; equivalently, column $j$ is **$u$** scaled by the $j$-th element of **$v$**.
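
The definition M[i][j] = u[i] × v[j] can be spelled out in a few lines of plain Python. This is only a sketch of the definition using nested lists; the helper name `outer` is illustrative. It also makes the row and column structure of the result easy to verify.

```python
def outer(u, v):
    """Outer product of two lists: M[i][j] = u[i] * v[j]."""
    return [[u_i * v_j for v_j in v] for u_i in u]

u = [1.0, 2.0, 3.0]
v = [4.0, 5.0, 6.0, 7.0]
M = outer(u, v)

for row in M:
    print(row)
# [4.0, 5.0, 6.0, 7.0]
# [8.0, 10.0, 12.0, 14.0]
# [12.0, 15.0, 18.0, 21.0]

# Row i is v scaled by u[i]; column j is u scaled by v[j].
assert M[1] == [u[1] * x for x in v]
assert [row[2] for row in M] == [v[2] * x for x in u]
```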

PyTorch provides several ways to compute outer products:

**Method 1: Using `torch.outer()`:**

In [None]:
import torch

# Define two vectors
u = torch.tensor([1., 2., 3.])      # shape: (3,)
v = torch.tensor([4., 5., 6., 7.])  # shape: (4,)
print(f"U: {u}")
print(f"V: {v}")

# Compute outer product
outer_product = torch.outer(u, v)   # shape: (3, 4)
print(f"\nOuter Product:\n {outer_product}")

U: tensor([1., 2., 3.])
V: tensor([4., 5., 6., 7.])

Outer Product:
 tensor([[ 4.,  5.,  6.,  7.],
        [ 8., 10., 12., 14.],
        [12., 15., 18., 21.]])


**Method 2: Using Matrix Multiplication:**

In [48]:
# Reshape vectors for matrix multiplication
u_col = u.unsqueeze(1)    # shape: (3, 1) - column vector
v_row = v.unsqueeze(0)    # shape: (1, 4) - row vector
print(f"U col:\n {u_col}\n")
print(f"V row:\n {v_row}\n")
outer_product = u_col @ v_row  # or torch.mm(u_col, v_row)
print(f"Outer Product:\n {outer_product}")
# Same result as above

U col:
 tensor([[1.],
        [2.],
        [3.]])

V row:
 tensor([[4., 5., 6., 7.]])

Outer Product:
 tensor([[ 4.,  5.,  6.,  7.],
        [ 8., 10., 12., 14.],
        [12., 15., 18., 21.]])


**Method 3: Using Broadcasting:**

In [49]:
# Leverage broadcasting for outer product
u_expanded = u.unsqueeze(1)  # shape: (3, 1)
v_expanded = v.unsqueeze(0)  # shape: (1, 4)

outer_product = u_expanded * v_expanded  # Broadcasting creates (3, 4)
print(outer_product)
# Same result as above

tensor([[ 4.,  5.,  6.,  7.],
        [ 8., 10., 12., 14.],
        [12., 15., 18., 21.]])


**Method 4: Using `torch.einsum()`:**

In [50]:
# Einstein summation notation
outer_product = torch.einsum('i,j->ij', u, v)
print(outer_product)
# Same result as above

tensor([[ 4.,  5.,  6.,  7.],
        [ 8., 10., 12., 14.],
        [12., 15., 18., 21.]])


### **Practical Examples:**

**Example 1: Feature Interaction Matrix:**

In [51]:
# Creating interaction features between two feature vectors
features_a = torch.tensor([0.5, 1.2, 0.8])     # 3 features
features_b = torch.tensor([2.0, 1.5, 0.9, 1.1]) # 4 features

# Outer product creates all pairwise interactions
interaction_matrix = torch.outer(features_a, features_b)
print(f"Interaction matrix shape: {interaction_matrix.shape}")
# Shape: (3, 4) - captures all 12 possible feature interactions

Interaction matrix shape: torch.Size([3, 4])


**Example 2: Batch Outer Products:**

In [52]:
# Computing outer products for batches of vectors
batch_u = torch.randn(32, 5)  # 32 samples, 5-dimensional vectors
batch_v = torch.randn(32, 3)  # 32 samples, 3-dimensional vectors

# Method 1: Using torch.bmm (batch matrix multiplication)
u_batch = batch_u.unsqueeze(2)  # shape: (32, 5, 1)
v_batch = batch_v.unsqueeze(1)  # shape: (32, 1, 3)
batch_outer = torch.bmm(u_batch, v_batch)  # shape: (32, 5, 3)

# Method 2: Using einsum for batch outer products
batch_outer_einsum = torch.einsum('bi,bj->bij', batch_u, batch_v)
print(f"Batch outer product shape: {batch_outer_einsum.shape}")
# Shape: (32, 5, 3)

Batch outer product shape: torch.Size([32, 5, 3])


**Example 3: Rank-1 Matrix Decomposition:**

In [53]:
# Creating a rank-1 matrix (outer product creates matrices of rank 1)
u = torch.tensor([1., 2., 3.])
v = torch.tensor([4., 5.])

rank_1_matrix = torch.outer(u, v)
print(f"Matrix rank: {torch.linalg.matrix_rank(rank_1_matrix)}")
# Output: Matrix rank: 1

# Any rank-1 matrix can be decomposed as an outer product

Matrix rank: 1


**Example 4: Gradient Computation in Neural Networks:**

In [54]:
# Outer products appear in gradient computations.
# For a linear layer y = x @ W with W of shape (10, 8), the gradient of the loss
# with respect to W is the outer product of the input x and the upstream gradient dL/dy.

input_vector = torch.randn(10)   # x
grad_output = torch.randn(8)     # stand-in for dL/dy (random values here)

weight_gradient = torch.outer(input_vector, grad_output)
print(f"Weight gradient shape: {weight_gradient.shape}")
# Shape: (10, 8), the same shape as W

Weight gradient shape: torch.Size([10, 8])


**Memory Usage:**

In [55]:
# Outer products can create large matrices
u = torch.randn(1000)
v = torch.randn(1000)

# This creates a 1000×1000 matrix: 4,000,000 bytes (about 3.81 MiB) for float32
large_outer = torch.outer(u, v)
print(f"Memory usage: {large_outer.numel() * 4 / 1024**2:.2f} MB")

Memory usage: 3.81 MB


**Computational Complexity:**
The outer product has O(mn) complexity where m and n are the lengths of the input vectors, as it computes m×n products.

#### **Common Use Cases:**

1. **Machine Learning**: Feature interactions, attention mechanisms, and bilinear transformations.

2. **Computer Vision**: Constructing covariance matrices and kernel matrices.

3. **Signal Processing**: Creating correlation matrices and spectrograms.

4. **Graph Theory**: Adjacency matrix construction from node features.

The outer product is a powerful operation that bridges vector operations with matrix computations, making it essential for many advanced linear algebra applications in deep learning and scientific computing.

----

## **Inner Product:**

The `inner product` (also called `dot product` or `scalar product`) is a fundamental operation in linear algebra that takes two vectors and produces a scalar value. 

It measures the `similarity` between vectors and captures their geometric relationship.

For two vectors **$u$** = $[u₁, u₂, ..., uₙ]$ and **$v$** = $[v₁, v₂, ..., vₙ]$ in ℝⁿ, the inner product is:

> **$u$** · **$v$** = $u₁v₁ + u₂v₂ + ... + uₙvₙ = Σᵢ uᵢvᵢ$

Geometrically, it equals: **$u$** · **$v$** = ||**$u$**|| ||**$v$**|| $cos(θ)$, where $θ$ is the angle between the vectors.

#### **Properties and Interpretation:**

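1. **`Similarity Measure`**: For vectors of fixed length, a larger inner product means the vectors point in more similar directions; a negative value means they point in roughly opposite directions.

2. **`Orthogonality`**: An inner product of 0 means the vectors are perpendicular.

3. **`Magnitude Relationship`**: The inner product of a vector with itself is its squared length: **$u$** · **$u$** = ||**$u$**||².

4. **`Projection`**: **$u$** · **$v$** / ||**$v$**|| gives the scalar projection of **$u$** onto **$v$**, i.e. the signed length of the component of **$u$** along **$v$**.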
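
These relationships can be worked through on a small example. The sketch below uses plain Python lists and the `math` module; the helper names `dot` and `norm` are illustrative. The vectors are chosen so that the lengths come out as whole numbers.

```python
import math

def dot(u, v):
    """Inner product of two equal-length lists."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    """Euclidean length, computed as sqrt(v . v)."""
    return math.sqrt(dot(v, v))

u = [3.0, 4.0]
v = [4.0, 0.0]

cos_theta = dot(u, v) / (norm(u) * norm(v))            # cosine of the angle between u and v
scalar_proj = dot(u, v) / norm(v)                      # signed length of u along v
vector_proj = [dot(u, v) / dot(v, v) * x for x in v]   # component of u along v, as a vector

print(dot(u, v))     # 12.0
print(cos_theta)     # 0.6
print(scalar_proj)   # 3.0
print(vector_proj)   # [3.0, 0.0]
print(dot([1.0, 0.0], [0.0, 1.0]))  # 0.0 (perpendicular vectors)
```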

#### **Inner Product in PyTorch**

**Method 1: Using `torch.dot()`:**

In [56]:
import torch

# For 1D vectors only
u = torch.tensor([1., 2., 3.])
v = torch.tensor([4., 5., 6.])

inner_product = torch.dot(u, v)
print(inner_product)  # Output: tensor(32.) = 1*4 + 2*5 + 3*6

tensor(32.)


**Method 2: Using `torch.matmul()` or `@`:**

In [57]:
# Works for various tensor shapes
u = torch.tensor([1., 2., 3.])
v = torch.tensor([4., 5., 6.])

inner_product = torch.matmul(u, v)  # or u @ v
print(inner_product)  # Output: tensor(32.)

tensor(32.)


**Method 3: Using Element-wise Multiplication and Sum:**

In [58]:
# Manual computation
u = torch.tensor([1., 2., 3.])
v = torch.tensor([4., 5., 6.])

inner_product = torch.sum(u * v)
print(inner_product)  # Output: tensor(32.)

tensor(32.)


**Method 4: Using `torch.einsum()`:**

In [59]:
# Einstein summation notation
inner_product = torch.einsum('i,i->', u, v)
print(inner_product)  # Output: tensor(32.)

tensor(32.)


#### **Batch Inner Products:**


### Computing Multiple Inner Products


```python
# Batch of vectors
batch_u = torch.randn(32, 128)  # 32 vectors of dimension 128
batch_v = torch.randn(32, 128)  # 32 vectors of dimension 128

# Row-wise inner products: multiply element-wise, then sum over the feature dimension
batch_inner = torch.sum(batch_u * batch_v, dim=1)  # shape: (32,)

# Or using einsum
batch_inner_einsum = torch.einsum('bi,bi->b', batch_u, batch_v)
print(f"Batch inner products shape: {batch_inner.shape}")
```



### Matrix-Vector Inner Products


```python
# Inner product between each row of matrix and a vector
matrix = torch.randn(10, 5)  # 10 vectors of dimension 5
vector = torch.randn(5)      # single vector

# Each row's inner product with vector
inner_products = matrix @ vector  # shape: (10,)
print(f"Inner products shape: {inner_products.shape}")
```



## Uses in Vector Databases



### 1. Similarity Search


```python
# Vector database similarity search simulation
def cosine_similarity(query_vector, database_vectors):
    """Compute cosine similarity using inner products"""
    # Normalize vectors
    query_norm = query_vector / torch.norm(query_vector)
    db_norm = database_vectors / torch.norm(database_vectors, dim=1, keepdim=True)
    
    # Cosine similarity = normalized inner product
    similarities = db_norm @ query_norm
    return similarities

# Example usage
query = torch.randn(512)        # Query embedding
database = torch.randn(1000, 512)  # Database embeddings

similarities = cosine_similarity(query, database)
top_k_indices = torch.topk(similarities, k=5).indices
print(f"Top 5 similar vectors: {top_k_indices}")
```



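### 2. Nearest Neighbor Search

The example below scores every database vector against the query, so it is an exact (brute-force) search. Approximate nearest neighbor (ANN) indexes exist to avoid this full scan on very large collections.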


```python
# Efficient similarity computation for large databases
def efficient_search(query_embedding, database_embeddings, top_k=10):
    """Efficient vector similarity search"""
    # Batch inner product computation
    similarities = database_embeddings @ query_embedding
    
    # Get top-k most similar
    top_similarities, top_indices = torch.topk(similarities, k=top_k)
    
    return top_indices, top_similarities

# Example with large database
large_db = torch.randn(100000, 768)  # 100k vectors, 768 dimensions
query = torch.randn(768)

indices, scores = efficient_search(query, large_db)
print(f"Found {len(indices)} similar vectors")
```



### 3. Semantic Search


```python
# Semantic search using pre-computed embeddings
class SemanticSearch:
    def __init__(self, document_embeddings):
        self.embeddings = document_embeddings
        # Normalize for cosine similarity
        self.normalized_embeddings = self.embeddings / torch.norm(
            self.embeddings, dim=1, keepdim=True
        )
    
    def search(self, query_embedding, top_k=5):
        # Normalize query
        query_norm = query_embedding / torch.norm(query_embedding)
        
        # Compute similarities using inner product
        similarities = self.normalized_embeddings @ query_norm
        
        # Return top results
        return torch.topk(similarities, k=top_k)

# Usage example
doc_embeddings = torch.randn(10000, 384)  # Document embeddings
search_engine = SemanticSearch(doc_embeddings)

query_emb = torch.randn(384)
results = search_engine.search(query_emb, top_k=3)
```



## Uses in Deep Learning

### 1. Attention Mechanisms


```python
# Scaled dot-product attention
def scaled_dot_product_attention(query, key, value, scale=None):
    """Attention mechanism using inner products"""
    d_k = query.size(-1)
    if scale is None:
        scale = d_k ** -0.5  # default scaling: 1 / sqrt(d_k)
    
    # Inner product between queries and keys
    attention_scores = torch.matmul(query, key.transpose(-2, -1)) * scale
    attention_weights = torch.softmax(attention_scores, dim=-1)
    
    # Weighted sum using inner products
    output = torch.matmul(attention_weights, value)
    return output

# Example usage
seq_len, d_model = 10, 64
query = torch.randn(1, seq_len, d_model)
key = torch.randn(1, seq_len, d_model)
value = torch.randn(1, seq_len, d_model)

attention_output = scaled_dot_product_attention(query, key, value)
print(f"Attention output shape: {attention_output.shape}")
```



### 2. Loss Functions


```python
# Cosine similarity loss
def cosine_similarity_loss(embeddings1, embeddings2, targets):
    """Cosine similarity loss using inner products"""
    # Normalize embeddings
    emb1_norm = embeddings1 / torch.norm(embeddings1, dim=1, keepdim=True)
    emb2_norm = embeddings2 / torch.norm(embeddings2, dim=1, keepdim=True)
    
    # Cosine similarity via inner product
    cosine_sim = torch.sum(emb1_norm * emb2_norm, dim=1)
    
    # Convert to loss
    loss = torch.mean((cosine_sim - targets) ** 2)
    return loss

# Example
emb1 = torch.randn(32, 128)  # Batch of embeddings
emb2 = torch.randn(32, 128)  # Batch of embeddings
targets = torch.ones(32)     # Target similarities

loss = cosine_similarity_loss(emb1, emb2, targets)
```



### 3. Neural Network Layers


```python
# Linear layer implementation using inner products
class CustomLinear(torch.nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(out_features, in_features))
        self.bias = torch.nn.Parameter(torch.randn(out_features))
    
    def forward(self, x):
        # Each output is inner product of input with weight row
        return torch.matmul(x, self.weight.t()) + self.bias

# Usage
layer = CustomLinear(10, 5)
input_tensor = torch.randn(32, 10)
output = layer(input_tensor)
print(f"Output shape: {output.shape}")  # (32, 5)
```



### 4. Similarity Learning


```python
# Triplet loss using inner products
def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss using inner-product similarities (higher means more similar)"""
    # Compute similarities using inner products
    pos_sim = torch.sum(anchor * positive, dim=1)
    neg_sim = torch.sum(anchor * negative, dim=1)
    
    # Triplet loss
    loss = torch.clamp(neg_sim - pos_sim + margin, min=0.0)
    return torch.mean(loss)

# Example
anchor = torch.randn(32, 128)
positive = torch.randn(32, 128)
negative = torch.randn(32, 128)

loss = triplet_loss(anchor, positive, negative)
```



## Performance Optimization

### GPU Acceleration


```python
# Move to GPU for faster computation
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

large_matrix = torch.randn(10000, 1024).to(device)
query_vector = torch.randn(1024).to(device)

# Fast inner product computation on GPU
similarities = large_matrix @ query_vector
```



### **Memory-Efficient Batch Processing:**

In [61]:
def chunked_inner_product(matrix, vector, chunk_size=1000):
    """Compute inner products in chunks to save memory"""
    results = []
    for i in range(0, matrix.size(0), chunk_size):
        chunk = matrix[i:i+chunk_size]
        chunk_result = chunk @ vector
        results.append(chunk_result)
    return torch.cat(results)

# Usage for very large matrices
huge_matrix = torch.randn(1000000, 512)
vector = torch.randn(512)
results = chunked_inner_product(huge_matrix, vector)
results

tensor([28.1641, -4.2277,  0.1349,  ..., -7.6924,  6.2356, 23.3391])

In [62]:
results.shape

torch.Size([1000000])

The inner product is fundamental to modern AI systems, serving as the core operation in `similarity search`, `attention mechanisms`, and `neural network computations`. Its efficiency and mathematical properties make it indispensable for both `vector databases` and `deep learning applications`.


-----

## **Vector Magnitude (Norm):**

A vector norm is a function that assigns a non-negative real number to each vector in a vector space, representing the `"size"` or `"length"` of that vector. It's denoted as ||$v$|| for a vector $v$.

**Properties of Vector Norms:**

Vector norms must satisfy four fundamental properties:

1. **`Non-negativity`**: $||v|| ≥ 0$ for all vectors $v$

2. **`Homogeneity`**: $||αv|| = |α| ||v||$ for any scalar $α$ and vector $v$

3. **`Triangle inequality`**: $||u + v|| ≤ ||u|| + ||v||$ for any vectors $u$ and $v$

4. **`Definiteness`**: $||v|| = 0$ if and only if $v = 0$ (the zero vector)
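
As a quick numerical illustration, the properties can be checked for the Euclidean norm on a few sample vectors. This is a plain-Python sketch, not a proof; the helper name `l2_norm` and the sample values are illustrative.

```python
import math

def l2_norm(v):
    """Euclidean norm of a list of numbers."""
    return math.sqrt(sum(x * x for x in v))

u = [3.0, 4.0]
v = [-1.0, 2.0]
alpha = -2.5
zero = [0.0, 0.0]

# 1. Non-negativity
assert l2_norm(u) >= 0 and l2_norm(v) >= 0

# 2. Homogeneity: ||alpha * v|| equals |alpha| * ||v|| (up to rounding)
scaled = [alpha * x for x in v]
assert math.isclose(l2_norm(scaled), abs(alpha) * l2_norm(v))

# 3. Triangle inequality: ||u + v|| <= ||u|| + ||v||
summed = [a + b for a, b in zip(u, v)]
assert l2_norm(summed) <= l2_norm(u) + l2_norm(v)

# 4. Definiteness: the zero vector has norm 0, a non-zero vector does not
assert l2_norm(zero) == 0.0 and l2_norm(u) > 0

print(l2_norm(u))  # 5.0
```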

**Intuitive Meaning:**  
Think of a vector norm as measuring the "distance" from the origin to the point represented by the vector. In 2D or 3D space, this corresponds to the geometric length of the vector. The norm gives you a single number that captures how "big" or "far" the vector is, regardless of its direction.

For example, if you have a vector representing velocity, its norm tells you the speed (magnitude) without caring about the direction of movement.

**Common Types of Norms:**

  - **L1 norm (Manhattan norm)**: $||v||₁ = Σ|vᵢ|$ - sum of absolute values
  - **L2 norm (Euclidean norm)**: $||v||₂ = √(Σvᵢ²)$ - square root of sum of squares
  - **L∞ norm (Maximum norm)**: $||v||∞ = max|vᵢ|$ - maximum absolute value


---

- **L0 norm**: $||v||₀$ = number of non-zero elements (not technically a norm, but useful for measuring sparsity)

- **L1 norm (Manhattan/Taxicab norm)**: $||v||₁ = Σ|vᵢ|$ - sum of absolute values of components

- **L2 norm (Euclidean norm)**: $||v||₂ = √(Σvᵢ²)$ - square root of sum of squared components

- **Lp norm**: $||v||_p = (Σ|vᵢ|ᵖ)^{1/p}$ - generalized p-norm for any $p ≥ 1$

- **L∞ norm (Maximum/Chebyshev norm)**: $||v||_∞ = max|vᵢ|$ - maximum absolute value among components

- **Frobenius norm**: $||A||_F = √(Σᵢⱼ|aᵢⱼ|²)$ - extension of L2 norm for matrices

- **Spectral norm**: $||A||₂$ = largest singular value of matrix A

- **Nuclear norm**: $||A||_*$ = sum of singular values of matrix A

- **Weighted norm**: $||v||_W = √(vᵀWv)$ - norm with positive definite weight matrix W

- **Hamming norm (Hamming weight)**: number of non-zero entries of a vector; the related Hamming distance counts the positions where two vectors differ (mainly used for binary vectors)

- **Minkowski norm**: $||v||_p = (Σ|vᵢ|ᵖ)^{1/p}$ - another name for the Lp norm

- **Mahalanobis norm**: $||v||_M = √(vᵀM⁻¹v)$ - norm accounting for covariance structure, where M is a covariance matrix
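Several of the norms listed above are not covered by the basic `torch.norm()` calls, so here is a minimal sketch of how they could be computed with `torch.linalg`. The vector `v`, the matrix `A`, the weight matrix `W`, and the covariance matrix `M` are made-up values chosen only for illustration.

```python
import torch

v = torch.tensor([3.0, 0.0, -4.0])
A = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])

# L0 "norm": count of non-zero entries
l0 = torch.count_nonzero(v)

# General Lp norm, here with p = 3
l3 = torch.linalg.vector_norm(v, ord=3)

# Matrix norms
frobenius = torch.linalg.matrix_norm(A, ord="fro")
spectral = torch.linalg.matrix_norm(A, ord=2)      # largest singular value
nuclear = torch.linalg.matrix_norm(A, ord="nuc")   # sum of singular values
singular_values = torch.linalg.svdvals(A)

# Weighted norm sqrt(v^T W v) with a positive definite (here diagonal) W
W = torch.diag(torch.tensor([2.0, 1.0, 0.5]))
weighted = torch.sqrt(v @ W @ v)

# Mahalanobis norm sqrt(v^T M^{-1} v); solve M x = v instead of inverting M
M = torch.diag(torch.tensor([4.0, 1.0, 1.0]))
mahalanobis = torch.sqrt(v @ torch.linalg.solve(M, v))

print(f"L0: {l0.item()}, L3: {l3.item():.4f}")
print(f"Frobenius: {frobenius.item():.4f}")
print(f"Spectral: {spectral.item():.4f}, Nuclear: {nuclear.item():.4f}")
print(f"Weighted: {weighted.item():.4f}, Mahalanobis: {mahalanobis.item():.4f}")
```

Using `torch.linalg.solve` rather than an explicit inverse is a common choice for numerical stability; for a diagonal matrix like the one here, either approach gives the same result.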

----

### **Applications in Deep Learning:**

Vector norms are crucial in deep learning for several reasons:

1. **`Gradient clipping`**: Prevents exploding gradients by scaling them when their norm exceeds a threshold, ensuring stable training

2. **`Regularization`**: L1 and L2 regularization add norm penalties to the loss function, reducing overfitting by encouraging sparse weights (L1) or small weights (L2)

3. **`Normalization techniques`**: Batch normalization and layer normalization standardize activations using their mean and variance (the variance is a scaled squared L2 norm of the centered activations), improving training stability and convergence

4. **`Weight initialization`**: Proper initialization often involves controlling the norm of initial weights to maintain appropriate signal propagation

5. **`Optimization`**: Many optimization algorithms use norms to determine step sizes and convergence criteria
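The first two items can be sketched in a single training step. The example below assumes a hypothetical one-layer model, random data, and made-up values for `l2_lambda` and `max_norm`; it adds an explicit L2 penalty to the loss and then clips the gradients with `torch.nn.utils.clip_grad_norm_()`, which returns the total gradient norm measured before clipping.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)            # hypothetical tiny model
x = torch.randn(8, 4)
y = torch.randn(8, 1) * 100        # large targets so the gradients are large

# L2 regularization: add lambda * ||w||^2 summed over all parameters
l2_lambda = 0.01                   # hypothetical regularization strength
mse = nn.functional.mse_loss(model(x), y)
l2_penalty = sum(p.pow(2).sum() for p in model.parameters())
loss = mse + l2_lambda * l2_penalty
loss.backward()

# Gradient clipping: rescale all gradients if their combined L2 norm exceeds max_norm
max_norm = 1.0                     # hypothetical threshold
norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)

# Recompute the combined gradient norm after clipping
norm_after = torch.linalg.vector_norm(
    torch.cat([p.grad.flatten() for p in model.parameters()])
)

print(f"Gradient norm before clipping: {norm_before.item():.4f}")
print(f"Gradient norm after clipping:  {norm_after.item():.4f}")
```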

### **Applications in Vector Databases:**

Vector databases rely heavily on norms for similarity search and retrieval:

1. **`Distance metrics`**: Norms define distance functions (like `Euclidean distance`) used to measure similarity between vectors representing documents, images, or other data

2. **`Indexing efficiency`**: Many indexing structures (like LSH or HNSW) use norm-based distances to organize and search through high-dimensional vector spaces

3. **`Normalization`**: Vectors are often normalized (divided by their norm) to unit length, making cosine similarity equivalent to dot product and improving search performance

4. **`Query optimization`**: Norms help prune search spaces by eliminating vectors that are too far from query vectors based on triangle inequality properties
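The normalization point can be checked directly. The sketch below uses randomly generated stand-ins for a database of embeddings and a query (not real data) and shows that, after L2 normalization, the dot product matches cosine similarity. It also checks the identity $||a - b||² = 2 - 2(a · b)$ for unit vectors, which means that ranking by largest dot product and ranking by smallest Euclidean distance agree.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
database = torch.randn(1000, 64)   # hypothetical stored embeddings
query = torch.randn(64)            # hypothetical query embedding

# Normalize every stored vector and the query to unit L2 length
db_unit = F.normalize(database, p=2, dim=1)
q_unit = F.normalize(query, p=2, dim=0)

# On unit vectors, a plain dot product is the cosine similarity
dot_scores = db_unit @ q_unit
cos_scores = F.cosine_similarity(database, query.unsqueeze(0), dim=1)

# Euclidean distance between unit vectors is a function of the dot product
dists = torch.linalg.vector_norm(db_unit - q_unit, dim=1)

top_by_dot = torch.topk(dot_scores, k=5).indices
top_by_dist = torch.topk(dists, k=5, largest=False).indices

print("Dot product matches cosine similarity:",
      torch.allclose(dot_scores, cos_scores, atol=1e-5))
print("Top-5 by dot product:      ", top_by_dot.tolist())
print("Top-5 by smallest distance:", top_by_dist.tolist())
```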

In [1]:
import torch
import torch.nn.functional as F

# Create example vectors
v1 = torch.tensor([3.0, 4.0, 5.0])
v2 = torch.tensor([[1.0, 2.0, 3.0], 
                   [4.0, 5.0, 6.0]])

print("Vector v1:", v1)
print("Matrix v2:", v2)

Vector v1: tensor([3., 4., 5.])
Matrix v2: tensor([[1., 2., 3.],
        [4., 5., 6.]])


In [2]:
# L2 norm (Euclidean norm) - default
l2_norm_v1 = torch.norm(v1)
print(f"L2 norm of v1: {l2_norm_v1}")

L2 norm of v1: 7.071067810058594


In [3]:
# L2 norm with explicit parameter
l2_norm_explicit = torch.norm(v1, p=2)
print(f"L2 norm (explicit): {l2_norm_explicit}")

L2 norm (explicit): 7.071067810058594


In [4]:
# L1 norm (Manhattan norm)
l1_norm = torch.norm(v1, p=1)
print(f"L1 norm of v1: {l1_norm}")

L1 norm of v1: 12.0


In [5]:
# L-infinity norm (Maximum norm)
linf_norm = torch.norm(v1, p=float('inf'))
print(f"L-infinity norm of v1: {linf_norm}")

L-infinity norm of v1: 5.0


In [6]:
# Frobenius norm for matrices (equivalent to L2 for flattened matrix)
frobenius_norm = torch.norm(v2)
print(f"Frobenius norm of v2: {frobenius_norm}")

Frobenius norm of v2: 9.539392471313477


In [7]:
print("\n--- Norms along specific dimensions ---")

# L2 norm along dim=1: reduces across the columns, giving one norm per row
l2_dim1 = torch.norm(v2, dim=1)
print(f"L2 norm along dim=1: {l2_dim1}")

# L2 norm along dim=0: reduces across the rows, giving one norm per column
l2_dim0 = torch.norm(v2, dim=0)
print(f"L2 norm along dim=0: {l2_dim0}")


--- Norms along specific dimensions ---
L2 norm along dim=1: tensor([3.7417, 8.7750])
L2 norm along dim=0: tensor([4.1231, 5.3852, 6.7082])


In [8]:
print("\n--- Normalization (unit vectors) ---")

# Normalize vector to unit length (L2 norm = 1)
v1_normalized = F.normalize(v1, p=2, dim=0)
print(f"Normalized v1: {v1_normalized}")
print(f"Norm of normalized v1: {torch.norm(v1_normalized)}")

# Normalize matrix rows to unit length
v2_normalized = F.normalize(v2, p=2, dim=1)
print(f"Row-normalized v2:\n{v2_normalized}")
print(f"Row norms after normalization: {torch.norm(v2_normalized, dim=1)}")


--- Normalization (unit vectors) ---
Normalized v1: tensor([0.4243, 0.5657, 0.7071])
Norm of normalized v1: 1.0
Row-normalized v2:
tensor([[0.2673, 0.5345, 0.8018],
        [0.4558, 0.5698, 0.6838]])
Row norms after normalization: tensor([1.0000, 1.0000])


In [11]:
print("\n--- Practical examples ---")

# Gradient clipping example
gradients = torch.tensor([10.0, 20.0, 30.0])
max_norm = 5.0
grad_norm = torch.norm(gradients)
print(f"Original gradient norm: {grad_norm}")

if grad_norm > max_norm:
    clipped_gradients = gradients * (max_norm / grad_norm)
    print(f"Clipped gradients: {clipped_gradients}")
    print(f"Clipped gradient norm: {torch.norm(clipped_gradients)}")

# L2 regularization term
weights = torch.tensor([1.5, -2.0, 0.5])
l2_reg = 0.01 * torch.norm(weights, p=2) ** 2
print(f"L2 regularization term: {l2_reg}")

# Distance calculation between vectors
point1 = torch.tensor([1.0, 2.0, 3.0])
point2 = torch.tensor([4.0, 5.0, 6.0])
euclidean_distance = torch.norm(point1 - point2)
print(f"Euclidean distance: {euclidean_distance}")

# Cosine similarity using normalized vectors
cos_sim = torch.dot(F.normalize(point1, dim=0), F.normalize(point2, dim=0))
print(f"Cosine similarity: {cos_sim}")


--- Practical examples ---
Original gradient norm: 37.41657257080078
Clipped gradients: tensor([1.3363, 2.6726, 4.0089])
Clipped gradient norm: 5.0
L2 regularization term: 0.06499999761581421
Euclidean distance: 5.196152210235596
Cosine similarity: 0.9746317863464355


**PyTorch provides several convenient functions for calculating norms:**

- **`torch.norm()`**: Main function for computing various norms, with parameters for norm type (`p`) and dimension (`dim`)

- **`torch.nn.functional.normalize()`**: Normalizes vectors to unit length, useful for creating unit vectors

- **`torch.nn.utils.clip_grad_norm_()`**: Clips gradients in place based on their total norm (gradient clipping)
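The PyTorch documentation marks `torch.norm()` as deprecated and recommends `torch.linalg.vector_norm()` for vectors and `torch.linalg.matrix_norm()` for matrices. As a sketch, the earlier `v1` and `v2` results could be reproduced with the newer functions as follows; note that the norm type is passed as `ord` rather than `p`.

```python
import torch

v1 = torch.tensor([3.0, 4.0, 5.0])
v2 = torch.tensor([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

l2 = torch.linalg.vector_norm(v1)                       # L2 is the default
l1 = torch.linalg.vector_norm(v1, ord=1)
linf = torch.linalg.vector_norm(v1, ord=float("inf"))
row_norms = torch.linalg.vector_norm(v2, dim=1)         # one L2 norm per row
frobenius = torch.linalg.matrix_norm(v2)                # Frobenius is the default

print(f"L2: {l2.item():.4f}, L1: {l1.item():.4f}, L-inf: {linf.item():.4f}")
print(f"Row norms: {row_norms}")
print(f"Frobenius: {frobenius.item():.4f}")
```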

----
----
-----
----
-----
----
-----

## **Unit Vector (Normalization):**

-----
-----
-----
-----
-----
------

## **Distance Between Vectors:**

------
----
-----
------
-----
----

## **Vector Projection:**

----
----
-----
-----
-----
----

## **Scalar Projection:**

-----
----
-----
----
-----

## **Angle Between Vectors:**

----
-----
----
----
-----

## **Linear Combination of Vectors:**


----
----
-----
-----
----
---

## **Hadamard product:**

-----
-----
-----
----
----
---

## **Matrix-vector multiplication:**

------
------
----
-----
-----
----

## **Vector-matrix multiplication:**

----
----
-----
----

## **Transpose of Vectors:**