## The objective of this notebook is to
* explain basic linear algebra concepts such as:
    - vector space
    - null space
    - rank
    - linearly dependent vectors
    - linearly independent vectors

* see examples of their use in data science, e.g. correlation analysis

## Vector Space

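A vector space is a mathematical structure that consists of a set of vectors on which two operations, vector addition and scalar multiplication, are defined. These operations must satisfy a set of axioms, including associativity and commutativity of addition, the existence of a zero vector and of additive inverses, and distributivity of scalar multiplication over addition. A vector space is also closed under both operations: adding two vectors or multiplying a vector by a scalar always produces a vector that belongs to the same space.

Here's an example of the two operations applied to vectors in R^3, using Python: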

In [None]:
# In this code, we define vector addition using the np.add() function from the NumPy library
# and scalar multiplication using the np.multiply() function.
# We then demonstrate the usage of these operations with two example vectors.

import numpy as np

# Define vector addition
def vector_addition(v1, v2):
    return np.add(v1, v2)

# Define scalar multiplication
def scalar_multiplication(scalar, vector):
    return np.multiply(scalar, vector)

# Example usage
v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])

# Vector addition
result_addition = vector_addition(v1, v2)
print("Vector addition:", result_addition)

# Scalar multiplication
scalar = 2
result_multiplication = scalar_multiplication(scalar, v1)
print("Scalar multiplication:", result_multiplication)
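
The axioms mentioned above can also be spot-checked numerically. The sketch below is only an illustration for two sample vectors in R^3 (the names `u`, `v`, `a` and the chosen values are assumptions made for this example); checking a few vectors does not prove the axioms, it only shows what they state.

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
a = 2.0

# Closure: sums and scalar multiples are still 3-component vectors
closed_under_addition = (u + v).shape == u.shape
closed_under_scaling = (a * u).shape == u.shape

# Commutativity of addition: u + v equals v + u
commutative = np.allclose(u + v, v + u)

# Distributivity: a * (u + v) equals a * u + a * v
distributive = np.allclose(a * (u + v), a * u + a * v)

# Additive identity (zero vector) and additive inverse
zero = np.zeros_like(u)
has_identity = np.allclose(u + zero, u)
has_inverse = np.allclose(u + (-1.0) * u, zero)

print("Closure:", closed_under_addition and closed_under_scaling)
print("Commutative:", commutative)
print("Distributive:", distributive)
print("Identity and inverse:", has_identity and has_inverse)
```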


### Null Space
The null space of a matrix A, denoted as Null(A), is the set of all vectors x such that Ax = 0, where 0 represents the zero vector. In other words, it is the set of vectors that get mapped to zero by the linear transformation defined by the matrix A.
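
As a small hand-checkable illustration of this definition, the sketch below uses a 2x2 matrix chosen for this example (it is not taken from the rest of the notebook). The vector `x = [1, -1]` is mapped to zero, every scalar multiple of it is too, and `y = [1, 1]` is not.

```python
import numpy as np

# Both rows are multiples of [1, 1], so any x with x[0] + x[1] == 0 satisfies Ax = 0
A = np.array([[1.0, 1.0],
              [2.0, 2.0]])

x = np.array([1.0, -1.0])
y = np.array([1.0, 1.0])

x_in_null_space = bool(np.allclose(A @ x, 0))               # A @ x = [0, 0]
scaled_x_in_null_space = bool(np.allclose(A @ (5.0 * x), 0))  # multiples stay in the null space
y_in_null_space = bool(np.allclose(A @ y, 0))               # A @ y = [2, 4], not zero

print("x in Null(A):", x_in_null_space)
print("5x in Null(A):", scaled_x_in_null_space)
print("y in Null(A):", y_in_null_space)
```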

### Rank
The rank of a matrix A is the maximum number of linearly independent columns (or, equivalently, rows) in the matrix. It equals the dimension of the vector space spanned by the columns (or rows) of the matrix.
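
The sketch below illustrates two standard facts about rank on a small example matrix: the column rank equals the row rank, and the rank plus the dimension of the null space (the nullity) equals the number of columns. The variable names are chosen for this illustration.

```python
import numpy as np

# The third row equals 2 * (second row) - (first row), so only two rows are independent
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

column_rank = int(np.linalg.matrix_rank(A))
row_rank = int(np.linalg.matrix_rank(A.T))

# Rank-nullity theorem: rank + nullity = number of columns
nullity = A.shape[1] - column_rank

print("Column rank:", column_rank)
print("Row rank:", row_rank)
print("Nullity:", nullity)
```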

### Linearly Dependent
A set of vectors is said to be linearly dependent if at least one of the vectors in the set can be expressed as a linear combination of the other vectors. Equivalently, there exist scalars, not all zero, such that the corresponding linear combination of the vectors equals the zero vector.

### Linearly Independent
A set of vectors is said to be linearly independent if no vector in the set can be expressed as a linear combination of the other vectors. Equivalently, the only linear combination of the vectors that equals the zero vector is the one in which every scalar is zero.

Here's an example that demonstrates these concepts using NumPy:

In [None]:
import numpy as np

# Example matrix
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Null space: the right singular vectors whose singular values are (numerically) zero
U, s, Vt = np.linalg.svd(A)
null_space = Vt.T[:, np.isclose(s, 0)]
print("Null space:", null_space)

# Rank
rank = np.linalg.matrix_rank(A)
print("Rank:", rank)

# Linearly dependent vectors
v1 = np.array([1, 2, 3])
v2 = np.array([2, 4, 6])
v3 = np.array([3, 6, 9])
vectors_dependent = [v1, v2, v3]
dependent_check = np.linalg.det(np.array(vectors_dependent))
# Compare with a tolerance: a floating-point determinant is rarely exactly zero
print("Linearly dependent check:", np.isclose(dependent_check, 0))

# Linearly independent vectors
v4 = np.array([1, 0, 0])
v5 = np.array([0, 1, 0])
v6 = np.array([0, 0, 1])
vectors_independent = [v4, v5, v6]
independent_check = np.linalg.det(np.array(vectors_independent))
# A determinant that is not (numerically) zero means the vectors are independent
print("Linearly independent check:", not np.isclose(independent_check, 0))
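
The determinant is only defined for square matrices, so the check above applies only when the number of vectors equals their dimension. A rank-based test works for any number of vectors. The sketch below is one possible way to write it; `are_linearly_independent` is a hypothetical helper introduced for this illustration, not a NumPy function.

```python
import numpy as np

def are_linearly_independent(vectors):
    """Hypothetical helper: True if the vectors are linearly independent."""
    # Stack the vectors as columns; they are independent exactly when
    # the rank equals the number of columns
    M = np.column_stack(vectors)
    return bool(np.linalg.matrix_rank(M) == M.shape[1])

# Two vectors in R^3 that are not multiples of each other
two_independent = are_linearly_independent(
    [np.array([1, 0, 2]), np.array([0, 1, 1])]
)

# The second vector is twice the first
two_dependent = are_linearly_independent(
    [np.array([1, 2, 3]), np.array([2, 4, 6])]
)

# Four vectors in R^3 can never be independent
four_in_r3 = are_linearly_independent(
    [np.array([1, 0, 0]), np.array([0, 1, 0]),
     np.array([0, 0, 1]), np.array([1, 1, 1])]
)

print(two_independent, two_dependent, four_in_r3)
```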


#### Example
Rank and linear dependence come up in a common data science problem: feature selection and dimensionality reduction. For example:
* Rank and linear independence can help identify redundant or highly correlated features for removal.
* High-dimensional datasets with more features than samples are prone to overfitting, and their feature columns are necessarily linearly dependent because the rank cannot exceed the number of samples. Linearly dependent features introduce multicollinearity, which can affect model performance and interpretability. Identifying and removing linearly dependent features helps mitigate these issues.
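
To make the two points above concrete, the sketch below builds a small synthetic feature matrix (the data, seed, and names such as `X_base` and `X_aug` are assumptions made for this illustration). A fourth feature is the exact sum of the first two, so the matrix loses rank, even though no single pair of features is strongly correlated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three independent random features, 1000 samples
X_base = rng.normal(size=(1000, 3))

# A fourth feature that is an exact linear combination of the first two
combo = X_base[:, 0] + X_base[:, 1]
X_aug = np.column_stack([X_base, combo])

# The rank is lower than the number of columns, which reveals the dependence
rank_aug = int(np.linalg.matrix_rank(X_aug))

# Largest absolute pairwise correlation, ignoring the diagonal of ones
corr = np.corrcoef(X_aug, rowvar=False)
max_offdiag = float(np.max(np.abs(corr - np.eye(X_aug.shape[1]))))

print("Columns:", X_aug.shape[1], "Rank:", rank_aug)
print("Largest pairwise |correlation|:", round(max_offdiag, 3))
```

The takeaway is that exact linear dependence among three or more features can go unnoticed by a pairwise correlation threshold, while the rank of the feature matrix picks it up.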

Consider a dataset with multiple features in which we want to identify the most important and independent features for a given task. We can use rank and linear dependence to determine which features are redundant or highly correlated.

Here's example code that flags highly correlated features using the correlation matrix and its rank. Note that an absolute correlation above 0.8 is a heuristic for near-dependence between two features, not proof of exact linear dependence:

In [None]:
import numpy as np
from sklearn.datasets import make_regression

# Generate a synthetic dataset
X, y = make_regression(n_samples=100, n_features=5, random_state=42)

# Compute the correlation matrix
corr_matrix = np.corrcoef(X, rowvar=False)

# Compute the rank of the correlation matrix
rank = np.linalg.matrix_rank(corr_matrix)
print("Rank:", rank)

# Flag highly correlated feature pairs. This is a heuristic for near-linear
# dependence: |correlation| > 0.8 does not prove exact linear dependence.
threshold = 0.8
dependent_features = []  # pairs (i, j) whose absolute correlation exceeds the threshold

n_features = X.shape[1]
for i in range(n_features):
    for j in range(i + 1, n_features):  # start at i + 1 so each pair is checked once
        if np.abs(corr_matrix[i, j]) > threshold:
            dependent_features.append((i, j))

if dependent_features:
    print("Highly correlated feature pairs:")
    for i, j in dependent_features:
        print("Feature", i, "and Feature", j)
else:
    print("No feature pair exceeds the correlation threshold.")

# Features that appear in no flagged pair are kept as (approximately) independent
independent_features = [
    i for i in range(n_features)
    if all(i not in pair for pair in dependent_features)
]
print("Features not flagged as correlated:", independent_features)
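
The correlation threshold only compares features two at a time. An alternative is to select columns by rank directly. The sketch below is one simple greedy approach, not the only one: it keeps a column only if adding it increases the rank of the columns kept so far. `select_independent_columns` and `X_demo` are hypothetical names introduced for this illustration, and the data is synthetic.

```python
import numpy as np

def select_independent_columns(X):
    """Hypothetical helper: greedily keep columns that increase the rank."""
    kept = []
    current_rank = 0
    for j in range(X.shape[1]):
        # Rank of the columns kept so far plus the candidate column j
        candidate_rank = np.linalg.matrix_rank(X[:, kept + [j]])
        if candidate_rank > current_rank:
            kept.append(j)
            current_rank = candidate_rank
    return kept

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 3))

# Append a column that is a linear combination of columns 0 and 2
X_demo = np.column_stack([X_demo, 2.0 * X_demo[:, 0] - X_demo[:, 2]])

kept = select_independent_columns(X_demo)
print("Columns kept:", kept)
```

This approach only removes columns that are linearly dependent up to the numerical tolerance used by `np.linalg.matrix_rank`. Features that are strongly but not perfectly correlated are all kept, so in practice it complements the correlation threshold rather than replacing it.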
