# Day 9: Vector Spaces, Subspaces, Basis, and Rank

Welcome to Day 9. Today we're going to touch on some of the more theoretical, yet highly practical, concepts of linear algebra. Understanding ideas like linear independence and matrix rank is crucial for diagnosing issues in machine learning models and for comprehending the 'true' dimensionality of your data.

## Objectives for Today:
- Understand the concepts of linear combination, span, and linear independence.
- Learn what a basis of a vector space is.
- Define the rank of a matrix and its connection to linear independence.
- Learn how to check for linear independence and calculate rank using NumPy.
- Connect these concepts to data redundancy and feature selection in machine learning.

In [1]:
# Import necessary libraries
import numpy as np

## 1. Core Concepts (A Quick Tour)

**Linear Combination:** A linear combination of vectors `v1, v2, ..., vk` is any vector of the form `c1*v1 + c2*v2 + ... + ck*vk`, where `c1, c2, ...` are scalars.

**Span:** The span of a set of vectors is the set of *all possible* linear combinations of those vectors. For example, the span of two non-collinear 2D vectors is the entire 2D plane (R²).

**Linear Independence:** This is a critical concept. A set of vectors is **linearly independent** if no vector in the set can be written as a linear combination of the others. In other words, none of the vectors are redundant. Equivalently, the only way to get the zero vector from `c1*v1 + c2*v2 + ... + ck*vk` is to set every scalar to zero. If a set of vectors is *not* linearly independent, it is **linearly dependent**.

**Basis and Dimension:** A **basis** for a vector space is a set of linearly independent vectors that span the entire space. The **dimension** of a space is the number of vectors in any basis (every basis of a given space has the same number of vectors). For example, the standard basis for R² is `{[1, 0], [0, 1]}`, so R² has dimension 2.
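To make the idea of a basis concrete, here is a minimal sketch that writes a vector as a linear combination of two basis vectors. The vectors `b1`, `b2`, and `target` are illustrative values chosen for this example, not part of the lesson's data. Because the basis vectors are linearly independent, `np.linalg.solve` finds exactly one set of coefficients.

```python
import numpy as np

# Two linearly independent vectors, so they form a basis for R^2
# (illustrative values, assumed for this sketch)
b1 = np.array([1.0, 1.0])
b2 = np.array([1.0, -1.0])
target = np.array([3.0, 1.0])

# Stack the basis vectors as columns and solve B @ coeffs = target
B = np.column_stack([b1, b2])
coeffs = np.linalg.solve(B, target)

# Rebuild the target as the linear combination c1*b1 + c2*b2
rebuilt = coeffs[0] * b1 + coeffs[1] * b2

print("Coefficients:", coeffs)
print("Rebuilt vector:", rebuilt)
```

If `b1` and `b2` were linearly dependent, `B` would be singular and there would be no unique set of coefficients to recover.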

## 2. Rank of a Matrix

The **rank** of a matrix `A` is the dimension of the vector space spanned by its columns (or rows). More simply, it is the **maximum number of linearly independent columns (or rows) in the matrix**.

- A matrix is **full rank** if its rank is the maximum possible for its dimensions (i.e., `rank = min(m, n)` for an `m x n` matrix).
- If a square `n x n` matrix is not full rank (`rank < n`), it is called **singular** or **degenerate**. A singular matrix does not have an inverse, and its determinant is zero.

In Python, we use `np.linalg.matrix_rank()` to find the rank of a matrix.

In [2]:
# A full rank 2x2 matrix
A = np.array([[1, 2],
              [3, 4]])
print("Matrix A (full rank):\n", A)
print("Rank of A:", np.linalg.matrix_rank(A))

print("\n---\n")

# A singular 2x2 matrix (row 2 is 2 * row 1)
B = np.array([[1, 2],
              [2, 4]])
print("Matrix B (singular/rank-deficient):\n", B)
print("Rank of B:", np.linalg.matrix_rank(B))

print("\n---\n")

# A rectangular matrix
C = np.array([[1, 2, 3],
              [4, 5, 6]])
print("Matrix C (rectangular):\n", C)
print("Rank of C:", np.linalg.matrix_rank(C))

Matrix A (full rank):
 [[1 2]
 [3 4]]
Rank of A: 2

---

Matrix B (singular/rank-deficient):
 [[1 2]
 [2 4]]
Rank of B: 1

---

Matrix C (rectangular):
 [[1 2 3]
 [4 5 6]]
Rank of C: 2
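The claim that a singular matrix has a zero determinant and no inverse can be checked directly. The sketch below rebuilds the singular matrix `B` from the cell above as floats. The `try`/`except` is a cautious assumption: NumPy is expected to raise `np.linalg.LinAlgError` for an exactly singular matrix like this one, but a *nearly* singular matrix may instead return an inverse with very large entries.

```python
import numpy as np

# Same singular matrix as above: row 2 is 2 * row 1
B = np.array([[1.0, 2.0],
              [2.0, 4.0]])

# The determinant is a float, so compare against zero with a tolerance
det_B = np.linalg.det(B)
print("det(B) is numerically zero:", np.isclose(det_B, 0.0))

# Attempt the inversion and report what happens
try:
    B_inv = np.linalg.inv(B)
    print("Inverse returned (treat with suspicion):\n", B_inv)
except np.linalg.LinAlgError as err:
    print("Inversion failed:", err)
```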


### Checking for Linear Independence

Computing the rank is a convenient and general way to check whether a set of vectors is linearly independent. If you have `k` vectors, stack them as the columns of a matrix: the vectors are linearly independent exactly when the rank of that matrix is `k`.

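For a **square** matrix, an alternative is to check the determinant (`np.linalg.det`): a non-zero determinant means the columns are linearly independent. Because the result is a floating-point number, compare it against a small tolerance (for example with `np.isclose`) rather than testing for exact equality with zero.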
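The rank check also works when the matrix is not square, where the determinant is not defined. Here is a small sketch with vectors in R⁴. The helper name `is_linearly_independent` and the vectors `u`, `v`, `w` are hypothetical, introduced only for this illustration.

```python
import numpy as np

def is_linearly_independent(vectors):
    """Return True if the given 1-D vectors are linearly independent.

    Hypothetical helper: stacks the vectors as columns and compares the
    numerical rank of that matrix with the number of vectors.
    """
    matrix = np.column_stack(vectors)
    return bool(np.linalg.matrix_rank(matrix) == len(vectors))

u = np.array([1, 0, 2, 0])
v = np.array([0, 1, 1, 0])
w = u + 2 * v  # deliberately built as a combination of u and v

independent_pair = is_linearly_independent([u, v])
independent_triple = is_linearly_independent([u, v, w])

print("u, v independent?   ", independent_pair)
print("u, v, w independent?", independent_triple)
```

Note that `np.linalg.matrix_rank` computes a numerical rank using a tolerance on the singular values, so vectors that are almost dependent may be reported as dependent.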

### **Exercise 1: Do these vectors form a basis for R²?**

A set of vectors forms a basis for R² if there are exactly 2 of them and they are linearly independent.

1.  Consider `Set 1`: `v1 = [2, 1]` and `v2 = [4, 2]`.
2.  Consider `Set 2`: `v3 = [1, 2]` and `v4 = [3, 1]`.

For each set, create a matrix where the vectors are the columns. Calculate the rank of the matrix. Based on the rank, determine if that set of vectors forms a basis for R².

In [6]:
# Your code for Exercise 1 here
v1 = np.array([2, 1])
v2 = np.array([4, 2])
matrix1 = np.column_stack([v1, v2])
rank1 = np.linalg.matrix_rank(matrix1)

print("--- Set 1 ---")
print("Matrix for Set 1:\n", matrix1)
print("Rank:", rank1)
print("Conclusion: Set 1 is linearly DEPENDENT and does NOT form a basis for R².")

v3 = np.array([1, 2])
v4 = np.array([3, 1])
matrix2 = np.column_stack([v3, v4])
rank2 = np.linalg.matrix_rank(matrix2)
print("--- Set 2 ---")
print("Matrix for Set 2:\n", matrix2)
print("Rank:", rank2)
print("Conclusion: Set 2 is linearly INDEPENDENT and DOES form a basis for R².")

--- Set 1 ---
Matrix for Set 1:
 [[2 4]
 [1 2]]
Rank: 1
Conclusion: Set 1 is linearly DEPENDENT and does NOT form a basis for R².
--- Set 2 ---
Matrix for Set 2:
 [[1 3]
 [2 1]]
Rank: 2
Conclusion: Set 2 is linearly INDEPENDENT and DOES form a basis for R².


In [7]:
# Solution

# Set 1
v1 = np.array([2, 1])
v2 = np.array([4, 2])
matrix1 = np.column_stack([v1, v2])
rank1 = np.linalg.matrix_rank(matrix1)

print("--- Set 1 ---")
print("Matrix for Set 1:\n", matrix1)
print("Rank:", rank1)
print("Conclusion: Set 1 is linearly DEPENDENT and does NOT form a basis for R².")

# Set 2
v3 = np.array([1, 2])
v4 = np.array([3, 1])
matrix2 = np.column_stack([v3, v4])
rank2 = np.linalg.matrix_rank(matrix2)

print("--- Set 2 ---")
print("Matrix for Set 2:\n", matrix2)
print("Rank:", rank2)
print("Conclusion: Set 2 is linearly INDEPENDENT and DOES form a basis for R².")


--- Set 1 ---
Matrix for Set 1:
 [[2 4]
 [1 2]]
Rank: 1
Conclusion: Set 1 is linearly DEPENDENT and does NOT form a basis for R².
--- Set 2 ---
Matrix for Set 2:
 [[1 3]
 [2 1]]
Rank: 2
Conclusion: Set 2 is linearly INDEPENDENT and DOES form a basis for R².


### **Exercise 2: Determine the Rank**

Calculate the rank of the following matrices and think about what it means for each one.

1. `M1 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])` (Identity Matrix)
2. `M2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])` (Hint: `row3 = 2*row2 - row1`)
3. `M3 = np.array([[5, 1, 4, 3], [2, 8, 1, 0]])` (A rectangular matrix)

In [8]:
# Your code for Exercise 2 here
M1 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
M2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
M3 = np.array([[5, 1, 4, 3], [2, 8, 1, 0]])

print("Rank of M1 (Identity):", np.linalg.matrix_rank(M1))
print("Rank of M2 (Singular):", np.linalg.matrix_rank(M2))
print("Rank of M3 (Rectangular):", np.linalg.matrix_rank(M3))

Rank of M1 (Identity): 3
Rank of M2 (Singular): 2
Rank of M3 (Rectangular): 2


In [None]:
# Solution
M1 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
M2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
M3 = np.array([[5, 1, 4, 3], [2, 8, 1, 0]])

print("Rank of M1 (Identity):", np.linalg.matrix_rank(M1))
print("Rank of M2 (Singular):", np.linalg.matrix_rank(M2))
print("Rank of M3 (Rectangular):", np.linalg.matrix_rank(M3))

### **Exercise 3: Conceptual Question**

Briefly explain in your own words the relationship between the **rank of a matrix** and the number of **linearly independent rows/columns** in that matrix.

*Write your answer in the markdown cell below.*

---

*(Your answer for Exercise 3 here)*

The rank of a matrix is exactly the maximum number of linearly independent rows or columns in that matrix. If a `5 x 4` matrix has a rank of 2, it means that even though it has 4 columns and 5 rows, the 'true' dimensionality of the data it represents is only 2. You can choose 2 of its columns (or rows) that are linearly independent, and all other columns (or rows) can be constructed as linear combinations of those 2.

---


**Solution:**

The rank of a matrix is *exactly* the maximum number of linearly independent rows in that matrix, which always equals the maximum number of linearly independent columns. If a `5 x 4` matrix has a rank of 3, it means that even though it has 4 columns and 5 rows, the 'true' dimensionality of the data it represents is only 3. You can choose 3 of its columns (or rows) that are linearly independent, and all other columns (or rows) can be constructed as linear combinations of those 3.
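As a quick check of this answer, the sketch below builds a `5 x 4` matrix whose fourth column is a linear combination of the first three. The column values are made up for illustration. The rank computed from the columns and the rank computed from the rows (via the transpose) should agree.

```python
import numpy as np

# Three linearly independent columns in R^5 (illustrative values)
c1 = np.array([1.0, 0.0, 0.0, 2.0, 1.0])
c2 = np.array([0.0, 1.0, 0.0, 1.0, 3.0])
c3 = np.array([0.0, 0.0, 1.0, 4.0, 2.0])

# A fourth column that adds no new direction
c4 = 2 * c1 - c2 + 3 * c3

M = np.column_stack([c1, c2, c3, c4])  # shape (5, 4)

rank_cols = np.linalg.matrix_rank(M)    # rank of M
rank_rows = np.linalg.matrix_rank(M.T)  # rank of the transpose

print("Shape:", M.shape)
print("Rank of M:", rank_cols)
print("Rank of M transposed:", rank_rows)
```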

## 3. ML Connection: Data Dimensionality and Redundancy

Rank is a fundamental concept for understanding your data in machine learning.

-   **Intrinsic Dimensionality:** Imagine you have a dataset with 50 features, but the rank of the feature matrix is only 10. This tells you that the 'intrinsic' or 'true' dimensionality of your data is much lower. The data points, despite living in a 50-dimensional space, actually lie on a 10-dimensional subspace.

-   **Data Redundancy:** A rank-deficient feature matrix means you have redundant features. For example, if you have features for 'height in feet' and 'height in meters', one is just a scalar multiple of the other. They are linearly dependent, and the second feature adds no new information to the model. This is called **collinearity** (or multicollinearity when several features are involved).

-   **Model Stability:** In models like Linear Regression, a feature matrix `X` that is not full column rank (i.e., has collinear features) makes `XᵀX` singular, so the least-squares coefficients are no longer unique. In practice this can make the model unstable and the results unreliable. Techniques like PCA (which we've seen) address this by projecting the data onto a lower-dimensional subspace in which the new features are no longer linearly dependent.
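The height example above can be sketched with synthetic data. Everything here is assumed for illustration: the feature names, the random values, and the sample size of 20. The point is only that a duplicated feature lowers the rank below the number of columns and makes `X.T @ X` badly conditioned.

```python
import numpy as np

# Synthetic data, generated only for this illustration
rng = np.random.default_rng(0)
height_m = rng.uniform(1.5, 2.0, size=20)
height_ft = height_m * 3.28084  # same information in different units
weight_kg = rng.uniform(50.0, 100.0, size=20)

# Feature matrix with a redundant column
X = np.column_stack([height_m, height_ft, weight_kg])
rank_X = np.linalg.matrix_rank(X)
cond_gram = np.linalg.cond(X.T @ X)

# Dropping the redundant column restores full column rank
X_reduced = np.column_stack([height_m, weight_kg])
rank_reduced = np.linalg.matrix_rank(X_reduced)

print("Columns in X:", X.shape[1], "| rank:", rank_X)
print(f"Condition number of X.T @ X: {cond_gram:.2e}")
print("Columns in X_reduced:", X_reduced.shape[1], "| rank:", rank_reduced)
```

A very large (or infinite) condition number is a warning sign that solving the normal equations directly would amplify rounding errors.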

## Day 9 Summary and Key Takeaways

You've made it through some of the most abstract concepts in the course. Well done!

Here's what we covered:
-   A set of vectors is **linearly independent** if none are redundant.
-   A **basis** is a set of linearly independent vectors that spans a space.
-   The **rank** of a matrix is the maximum number of linearly independent columns (or rows) it contains.
-   We use **`np.linalg.matrix_rank()`** to easily find the rank and check for linear independence.
-   In Machine Learning, rank helps us understand the **true dimensionality** of our data and identify **redundant features** (collinearity), which can be crucial for building robust models.

Tomorrow, we'll put all these concepts together in a final day of applications and review, including a from-scratch implementation of Linear Regression.