### Matrix Vector multiplication

#### Set-1

In [2]:
'''
SET–1 : Matrix × Vector Multiplication (Foundations & Intuition)

Q1. What does matrix × vector multiplication mean in simple words?
Ans. It means a matrix takes an input vector and transforms it into a new vector.
'''
# Example
# input vector x → matrix A → output vector y


'''
Q2. What is the shape rule for matrix × vector multiplication?
Ans. The number of columns in the matrix must match the size of the vector.
'''
# Example
# Matrix A shape = (m, n)
# Vector x shape = (n, 1)
# Result shape   = (m, 1)


'''
Q3. Why must the inner dimensions match?
Ans. Because each row of the matrix has n entries, and each entry must pair with exactly one entry of the vector.
'''
# Example
# (2, 2) × (2, 1) → valid
# (2, 3) × (2, 1) → ❌ invalid


'''
Q4. How is matrix × vector multiplication computed?
Ans. Each row of the matrix takes a dot product with the vector.
'''
# Example
A = [
    [1, 2],
    [3, 4]
]
x = [
    [5],
    [6]
]
# Row 1 · x = 1*5 + 2*6
# Row 2 · x = 3*5 + 4*6


'''
Q5. Why does the output have one value per row of the matrix?
Ans. Because each row produces exactly one dot product result.
'''
# Example
# 2 rows in A → output vector has 2 values


q='''
Q6. What is the data / AI intuition behind matrix × vector multiplication?
Ans. The vector represents input features, and the matrix represents weights
that compute new features.
'''
# Example
# x = [height, weight]
# A = weights
# A × x → transformed features
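
The shape rule and the row-by-row dot products from Set-1 can be checked directly in NumPy. This is a minimal sketch; the names `A_ok`, `A_bad`, `row_dots`, and `mismatch_raised` are illustrative and not part of the original notes.

```python
import numpy as np

A_ok = np.array([[1, 2],
                 [3, 4]])        # shape (2, 2)
A_bad = np.array([[1, 2, 3],
                  [4, 5, 6]])    # shape (2, 3)
x = np.array([[5],
              [6]])              # shape (2, 1)

# Valid: the 2 columns of A_ok match the 2 rows of x.
y = A_ok @ x
print(y.shape)                   # (2, 1)

# Each output entry is one row of A_ok dotted with x.
row_dots = [int(row @ x[:, 0]) for row in A_ok]
print(row_dots)                  # [17, 39]

# Invalid: A_bad has 3 columns but x has only 2 entries.
try:
    A_bad @ x
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)           # True
```

NumPy rejects the `(2, 3) × (2, 1)` product with a `ValueError`, which is the same inner-dimension rule stated in Q2 and Q3.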


#### Set-2

In [3]:
'''
SET–2 : Matrix × Vector Multiplication (AI Usage & Deeper Meaning)

Q1. Why is matrix × vector multiplication fundamental in neural networks?
Ans. Because each neural network layer transforms input features into new features using this operation.
'''
# Example
# Input vector x = (features)
# Weight matrix W = (neurons × features)
# Output y = W × x  → one value per neuron (before any bias or nonlinearity)


'''
Q2. How does one row of a weight matrix relate to one neuron?
Ans. Each row represents one neuron computing a weighted sum of inputs.
'''
# Example
weights_row = [0.2, 0.8]
inputs      = [1.0, 0.5]
# Neuron output = 0.2*1.0 + 0.8*0.5


'''
Q3. Why is matrix × vector multiplication considered many dot products?
Ans. Because each row performs its own dot product with the same input vector.
'''
# Example
# A × x =
# [row₁ · x]
# [row₂ · x]
# [row₃ · x]


'''
Q4. How does this operation help create multiple output features at once?
Ans. Each row generates one output feature, so many features are created together.
'''
# Example
# Matrix shape (4, 10) × vector shape (10, 1)
# Output shape = (4, 1) → 4 new features


'''
Q5. How is matrix × vector multiplication used in Transformers?
Ans. It projects embeddings into query, key, and value spaces.
'''
# Example
# embedding vector x
# Wq × x → query
# Wk × x → key
# Wv × x → value


'''
Q6. Why is this operation efficient on GPUs?
Ans. Because all dot products are independent and can be computed in parallel.
'''
# Example
# Thousands of rows × one vector
# Many rows can be processed at the same time on a GPU


q='''
Q7. What is the key mental model to remember?
Ans. A matrix asks the same vector many questions, one per row.
'''
# Example
# Each row asks: “How aligned am I with this input?”
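
The ideas in Set-2 can be sketched in a few lines of NumPy. Everything below other than `weights_row` and `inputs` is an assumption made for illustration: the random weights, the sizes `d_model` and `d_head`, and the names `W`, `Wq`, `Wk`, and `Wv` do not come from any real model.

```python
import numpy as np

# One neuron = one row of weights dotted with the inputs (Q2).
weights_row = np.array([0.2, 0.8])
inputs = np.array([1.0, 0.5])
neuron_output = weights_row @ inputs     # 0.2*1.0 + 0.8*0.5, about 0.6

# A layer = many such rows stacked into one matrix (Q4).
rng = np.random.default_rng(0)           # fixed seed so the sketch is repeatable
W = rng.normal(size=(4, 10))             # hypothetical weights: 4 neurons, 10 features
x = rng.normal(size=10)                  # hypothetical input features
layer_output = W @ x                     # shape (4,): one value per neuron

# Query/key/value projections of a single embedding (Q5).
d_model, d_head = 8, 4                   # illustrative sizes
embedding = rng.normal(size=d_model)
Wq = rng.normal(size=(d_head, d_model))
Wk = rng.normal(size=(d_head, d_model))
Wv = rng.normal(size=(d_head, d_model))
query, key, value = Wq @ embedding, Wk @ embedding, Wv @ embedding

print(neuron_output)
print(layer_output.shape, query.shape, key.shape, value.shape)
```

The three projections reuse the same embedding with different weight matrices, so each one is an independent matrix × vector product.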


### Matrix × Vector Multiplication

In [4]:
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

x = np.array([[5],
              [6]])

y = A @ x
print(y)
# [[17]
#  [39]]

# Input vector → transformed output vector
# In other words, the matrix A transforms the input vector x into a new output vector y.


[[17]
 [39]]
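
The cell above uses a 2-D column vector of shape `(2, 1)`, while later cells use a plain 1-D array of shape `(2,)`. The sketch below shows that the numbers are the same and only the output shape differs; the names `x_col` and `x_flat` are illustrative.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

x_col = np.array([[5],
                  [6]])      # 2-D column vector, shape (2, 1)
x_flat = np.array([5, 6])    # 1-D array, shape (2,)

y_col = A @ x_col            # shape (2, 1): [[17], [39]]
y_flat = A @ x_flat          # shape (2,):   [17, 39]

print(y_col.shape, y_flat.shape)   # (2, 1) (2,)
```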


### Each row performs ONE Dot Product

In [5]:
row1 = np.array([1, 2])
row2 = np.array([3, 4])
x_vec = np.array([5, 6])

print(row1 @ x_vec)  # 17
print(row2 @ x_vec)  # 39

# What is happening here?
# -----------------------
# 1. `@` is the matrix-multiplication operator; for two 1-D arrays it computes the dot product.
# 2. Dot product = multiply matching elements and add them.
#
# row1 · x_vec = (1*5) + (2*6) = 17
# row2 · x_vec = (3*5) + (4*6) = 39
#
# Think of `x_vec` as a "weight vector".
# Each row is a data sample (one input).
#
# When a matrix has multiple rows:
# - The SAME vector `x_vec` is applied to EVERY row.
# - Dot product is done row-by-row.
#
# This turns each row into ONE number.
#
# If we stack row1 and row2 into a matrix:
#
#   X = [[1, 2],
#        [3, 4]]   shape = (2, 2)
#
# Then:
#   X @ x_vec  →  [17, 39]
#
# This is how a linear layer with a single output works (bias omitted):
# - Each row = one sample
# - Each column = one feature
# - Weight vector = how important each feature is
#
# Final idea:
# Matrix × Vector = dot product of EACH ROW with the vector
# Output size = number of rows


17
39
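
The stacking step described in the comments can be checked with a short sketch. It assumes the same `row1`, `row2`, and `x_vec` as the cell above; `stacked` and `one_by_one` are illustrative names.

```python
import numpy as np

row1 = np.array([1, 2])
row2 = np.array([3, 4])
x_vec = np.array([5, 6])

# Stack the rows into a (2, 2) matrix: one sample per row.
X = np.vstack([row1, row2])

stacked = X @ x_vec                                    # all rows at once
one_by_one = np.array([row1 @ x_vec, row2 @ x_vec])    # row by row

print(stacked)                                         # [17 39]
print(np.array_equal(stacked, one_by_one))             # True
```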


### Why does the output have one value per row?

In [6]:
# One row → one dot product → one output value.

### Important intuition

In [7]:
# Matrix × vector = many dot products producing many outputs at once.
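
To make "many dot products at once" concrete, here is a hedged sketch that spells out the loops `@` performs conceptually. The helper `matvec_by_rows` is a hypothetical name written for this note; it is meant for understanding, not as a replacement for NumPy's optimized routine.

```python
import numpy as np

def matvec_by_rows(A, x):
    """Multiply matrix A (m, n) by vector x (n,) using one dot product per row."""
    m, n = A.shape
    if x.shape != (n,):
        raise ValueError(f"expected a vector of length {n}, got shape {x.shape}")
    y = np.zeros(m, dtype=np.result_type(A, x))
    for i in range(m):
        # Dot product: multiply matching entries and add them up.
        y[i] = sum(A[i, j] * x[j] for j in range(n))
    return y

A = np.array([[1, 2],
              [3, 4]])
x = np.array([5, 6])

looped = matvec_by_rows(A, x)
print(looped)                          # [17 39]
print(np.array_equal(looped, A @ x))   # True
```

The outer loop runs once per row, which is why the output always has as many entries as the matrix has rows.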