# 01 - Vector and Matrix Basics

This notebook introduces the basic operations on vectors and matrices using **NumPy**. These operations are foundational for understanding computations in neural networks and large language models (LLMs).

You'll find **tasks** throughout—try to implement each one before checking the solution!

## 🧮 Vector and Matrix Initialization

Create 1D vectors and 2D matrices using NumPy. Understanding how to define and inspect them is the first step.

### Task:
- Create a 3-element vector `v = [1, 2, 3]` (a 1D NumPy array)
- Create a 2x3 matrix `M = [[1, 2, 3], [4, 5, 6]]`

In [None]:
import numpy as np

# TODO: Define a 3-element vector (a 1D array)
v = np.array([1, 2, 3])

# TODO: Define a 2x3 matrix
M = np.array([[1, 2, 3],
              [4, 5, 6]])

print('Vector v:', v)
print('Matrix M:\n', M)
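
To go with the "inspect" part of this section, here is a minimal sketch of the attributes most often checked on a NumPy array. The `M_float` name and the choice of `float32` are illustrative assumptions, not part of the original task.

```python
import numpy as np

v = np.array([1, 2, 3])
M = np.array([[1, 2, 3],
              [4, 5, 6]])

# ndim is the number of axes; shape is the length along each axis.
print('v.ndim:', v.ndim, '| v.shape:', v.shape)   # 1 axis, 3 elements
print('M.ndim:', M.ndim, '| M.shape:', M.shape)   # 2 axes: 2 rows, 3 columns

# dtype is inferred from the input values (a platform-dependent integer type here).
print('M.dtype:', M.dtype)

# Passing dtype explicitly gives floats, a common choice for model weights.
# M_float is a hypothetical name used only for this illustration.
M_float = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
print('M_float.dtype:', M_float.dtype)
```

Note that `v` has shape `(3,)`, not `(3, 1)` or `(1, 3)`: a 1D array has no row or column orientation until you reshape it.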

## 🔗 Dot Product and Cosine Similarity

The **dot product** is used in attention mechanisms (`Q·Kᵀ`), similarity checks, and projections.

### Task:
- Implement dot product of two vectors
- Compute cosine similarity

Recall: $\cos(\theta) = \frac{A \cdot B}{\|A\| \cdot \|B\|}$

In [None]:
# Define vectors
A = np.array([1, 2])
B = np.array([3, 4])

# TODO: Implement dot product
dot = np.dot(A, B)

# TODO: Compute cosine similarity
cos_sim = dot / (np.linalg.norm(A) * np.linalg.norm(B))

print('Dot product:', dot)
print('Cosine similarity:', cos_sim)
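
As a cross-check on the solution above, the sketch below computes the dot product by hand and wraps cosine similarity in a reusable helper. The function name `cosine_similarity` and the `eps` guard against zero-length vectors are assumptions added for illustration.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-12):
    """Cosine of the angle between 1D vectors a and b.

    eps is a hypothetical safeguard: it keeps the denominator non-zero
    when either vector has zero length.
    """
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return np.dot(a, b) / max(denom, eps)

A = np.array([1, 2])
B = np.array([3, 4])

# The dot product is the sum of element-wise products.
manual_dot = np.sum(A * B)          # 1*3 + 2*4 = 11
print('Manual dot:', manual_dot)
print('Cosine similarity:', cosine_similarity(A, B))

# Orthogonal vectors give 0; vectors pointing the same way give (approximately) 1.
print(cosine_similarity(np.array([1, 0]), np.array([0, 1])))
print(cosine_similarity(np.array([1, 2]), np.array([2, 4])))
```

Cosine similarity ignores vector length: `[1, 2]` and `[2, 4]` score about 1 even though their dot product with other vectors would differ.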

## 🔁 Matrix Multiplication

Matrix multiplication is used throughout neural networks: feedforward layers (`W·x`), attention scores, and transformer blocks.

### Task:
- Multiply a 2x3 matrix with a 3x1 vector.
- Explain the shape flow: (2x3) · (3x1) → (2x1)

In [None]:
# Define matrix and vector
mat = np.array([[1, 2, 3], [4, 5, 6]])  # shape: (2, 3)
vec = np.array([[1], [0], [-1]])       # shape: (3, 1)

# TODO: Matrix-vector multiplication
output = np.dot(mat, vec)

print('Output shape:', output.shape)
print('Output:\n', output)
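
The task also asks for an explanation of the shape flow. One way to see it, sketched here with the same `mat` and `vec`, is to use the `@` operator, recompute each output entry by hand, and trigger a shape mismatch on purpose.

```python
import numpy as np

mat = np.array([[1, 2, 3], [4, 5, 6]])   # shape: (2, 3)
vec = np.array([[1], [0], [-1]])         # shape: (3, 1)

# (2, 3) @ (3, 1): the inner dimensions (3 and 3) match and are summed over,
# so the result keeps the outer dimensions: (2, 1).
output = mat @ vec                       # same result as np.dot(mat, vec) for 2D arrays
print('Output shape:', output.shape)

# Each output entry is the dot product of one row of mat with vec.
row0 = 1*1 + 2*0 + 3*(-1)                # -2
row1 = 4*1 + 5*0 + 6*(-1)                # -2
print(output.ravel(), [row0, row1])

# Mismatched inner dimensions raise a ValueError.
try:
    mat @ mat                            # (2, 3) @ (2, 3): inner dims 3 and 2 differ
except ValueError as err:
    print('Shape mismatch:', err)
```

The rule of thumb is `(m, k) @ (k, n) -> (m, n)`: the two `k`s must agree, and they disappear from the result.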

## ⚙️ Element-wise Operations & Broadcasting

Element-wise operations and broadcasting appear in activation functions, residual connections, and bias addition.

### Task:
- Add a vector to each row of a matrix (broadcasting).
- Square all elements in a matrix.

In [None]:
matrix = np.array([[1, 2, 3], [4, 5, 6]])
vector = np.array([1, 0, -1])

# TODO: Broadcasted addition
broadcasted_sum = matrix + vector

# TODO: Element-wise square
squared = matrix ** 2

print('Broadcasted sum:\n', broadcasted_sum)
print('Squared matrix:\n', squared)
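
Broadcasting depends on how the shapes line up. This sketch contrasts adding a row vector with adding a column vector; the names `row_vec` and `col_vec` and their values are made up for the illustration.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # shape: (2, 3)
row_vec = np.array([1, 0, -1])              # shape: (3,)
col_vec = np.array([[10], [20]])            # shape: (2, 1)

# Shapes are compared from the right: (2, 3) vs (3,) -> the vector is
# treated as (1, 3) and repeated along axis 0, so it is added to every row.
per_row = matrix + row_vec

# (2, 3) vs (2, 1): the size-1 axis is stretched, so each row gets its own offset.
per_col = matrix + col_vec

# np.broadcast_to makes the implicit expansion visible without changing the result.
explicit = matrix + np.broadcast_to(row_vec, matrix.shape)

print(per_row)
print(per_col)
print(np.array_equal(per_row, explicit))
```

Two axes are compatible when their sizes are equal or one of them is 1. A shape-`(2,)` vector would fail against `(2, 3)`, which is why the column offset needs shape `(2, 1)`.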

## 📏 Vector Norms and Unit Vectors

Norms are used in normalization layers (e.g., LayerNorm) and in cosine similarity.

### Task:
- Compute L2 norm of a vector.
- Normalize it to a unit vector.

In [None]:
vec = np.array([3, 4])

# TODO: Compute L2 norm
norm = np.linalg.norm(vec)

# TODO: Normalize the vector
unit_vector = vec / norm

print('L2 norm:', norm)
print('Unit vector:', unit_vector)
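
The sketch below spells out what `np.linalg.norm` computes for a 1D vector, then extends the same idea to normalizing each row of a small batch. The `batch` values are hypothetical.

```python
import numpy as np

vec = np.array([3, 4])

# The L2 norm is the square root of the sum of squares: sqrt(3^2 + 4^2) = 5.
manual_norm = np.sqrt(np.sum(vec ** 2))
unit_vector = vec / manual_norm
print(manual_norm, unit_vector, np.linalg.norm(unit_vector))

# Row-wise normalization of a batch: axis=1 takes one norm per row,
# keepdims=True keeps shape (2, 1) so the division broadcasts across columns.
batch = np.array([[3.0, 4.0],
                  [6.0, 8.0]])
row_norms = np.linalg.norm(batch, axis=1, keepdims=True)
unit_rows = batch / row_norms
print(unit_rows)
```

Both rows of `batch` point in the same direction, so they normalize to the same unit vector even though their lengths differ.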

## 🔄 Matrix Transpose

The **transpose** of a matrix swaps its rows and columns: the entry at position (i, j) moves to (j, i). It is used in backpropagation and attention.

### Task:
- Transpose a 2x3 matrix to a 3x2 matrix.

In [None]:
M = np.array([[1, 2, 3], [4, 5, 6]])

# TODO: Transpose the matrix
M_T = M.T

print('Transposed matrix:\n', M_T)
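
To connect the transpose to the `Q·Kᵀ` pattern mentioned earlier, here is a small sketch. `Q` and `K` are toy matrices invented for this example, with one vector per row.

```python
import numpy as np

M = np.array([[1, 2, 3], [4, 5, 6]])     # shape: (2, 3)
M_T = M.T                                # shape: (3, 2)

# Entry (i, j) of M becomes entry (j, i) of M.T.
print(M[0, 2], M_T[2, 0])                # both are 3

# Transposing twice returns the original matrix.
print(np.array_equal(M.T.T, M))

# Q·Kᵀ pattern with hypothetical toy values: each row is one vector.
# Transposing K aligns the inner dimensions: (2, 3) @ (3, 2) -> (2, 2).
Q = np.array([[1, 0, 1], [0, 1, 0]])
K = np.array([[1, 2, 3], [4, 5, 6]])
scores = Q @ K.T                         # scores[i, j] = dot(Q[i], K[j])
print(scores)
```

Without the transpose, `Q @ K` would try to multiply `(2, 3)` by `(2, 3)` and fail, which is why the transpose shows up wherever rows of two matrices are compared pairwise.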

## 🟰 Check for Square Matrix

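A square matrix has the same number of rows and columns. Some operations, such as the determinant and the matrix inverse, are defined only for square matrices, and the score matrix `Q·Kᵀ` in self-attention is square (sequence length × sequence length).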

### Task:
- Write a function to check if a matrix is square.

In [None]:
def is_square_matrix(M):
    # A square matrix must be 2D with equal row and column counts.
    return M.ndim == 2 and M.shape[0] == M.shape[1]

# Test
print(is_square_matrix(np.array([[1,2],[3,4]])))  # True
print(is_square_matrix(np.array([[1,2,3],[4,5,6]])))  # False

## 🧠 Summary: Why This Matters for LLMs

These operations form the low-level math behind:
- Attention mechanism (`Q·Kᵀ`, softmax)
- Feedforward layers (`Wx + b`)
- Token embeddings and projection layers
- LayerNorm and residual connections
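
As a rough illustration of how these pieces combine, the sketch below evaluates a single feedforward step `Wx + b`. The values of `W`, `b`, and `x` are arbitrary toy numbers, not taken from any real model.

```python
import numpy as np

# Hypothetical toy layer: 3 inputs -> 2 outputs.
W = np.array([[1.0, 0.0, -1.0],
              [0.5, 0.5,  0.5]])   # shape: (2, 3)
b = np.array([0.1, -0.1])          # shape: (2,)
x = np.array([1.0, 2.0, 3.0])      # shape: (3,)

# Matrix-vector product followed by an element-wise bias addition.
y = W @ x + b                      # shape: (2,)
print(y.shape, y)
```

Here `W @ x` is `[-2.0, 3.0]` (one dot product per row of `W`), and adding `b` shifts each output independently.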

Make sure you're comfortable with these — they'll show up in every component of the LLM!

## 🧠 Final Summary: Why These Basics Matter for LLMs

Congratulations! You've practiced the core vector and matrix operations that are the mathematical backbone of neural networks and large language models (LLMs).

**Key Takeaways:**
- **Vector and matrix initialization**: All data, weights, and activations in LLMs are represented as vectors or matrices.
- **Dot product & cosine similarity**: Used in attention mechanisms to measure similarity between queries and keys.
- **Matrix multiplication**: Fundamental for feedforward layers, attention score computation, and transforming embeddings.
- **Element-wise operations & broadcasting**: Power activation functions, residual connections, and normalization steps.
- **Norms and normalization**: Essential for techniques like LayerNorm and for stable training.
- **Transpose and shape checks**: Required for aligning data and weights in multi-head attention and backpropagation.

**Next Steps:**
- These operations will appear in every component of an LLM, from token embedding to attention to output projection.
- As you move forward, try to visualize the data flow and shapes at each step—this will help you debug and design your own models.

Ready? In the next notebook, you'll dive into derivatives and the chain rule, which are essential for training neural networks from scratch!
