# Lesson 5: Eigenvalues, Eigenvectors, and Advanced Linear Algebra Concepts

In [23]:
import numpy as np

### Why are eigenvalues important in ML?
* Dimensionality Reduction: Eigenvalues help in reducing the number of features (dimensions) in the dataset (e.g., Principal Component Analysis or PCA), which simplifies the model and speeds up training.

* Understanding Data Structure: Eigenvalues help to understand the spread and orientation of data, aiding in better data preprocessing and visualization.

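* Stability of Models: In some models, eigenvalues indicate how stable the model's behavior is. For example, when the same matrix is applied repeatedly (as in a linear dynamical system or a recurrent network), eigenvalues with magnitude above 1 make values grow without bound, and a very large gap between the largest and smallest eigenvalue of a covariance matrix signals an ill-conditioned problem.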



In [24]:
# Let's define a square matrix
A = np.array([[4,2],[1,3]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print("EigenValues : ",eigenvalues)
print("EigenVectors : ",eigenvectors)
# np.linalg.eig() is a simple way to calculate eigenvalues and eigenvectors.
# Each column of `eigenvectors` is the eigenvector for the eigenvalue at the same index.

EigenValues :  [5. 2.]
EigenVectors :  [[ 0.89442719 -0.70710678]
 [ 0.4472136   0.70710678]]
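As a quick sanity check, the defining property `A @ v = lambda * v` can be verified for each pair returned by `np.linalg.eig`. This is a minimal sketch that reuses the matrix from the cell above; the loop variable names are only illustrative.

```python
import numpy as np

A = np.array([[4, 2], [1, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)

# Each COLUMN of `eigenvectors` pairs with the eigenvalue at the same index.
for i, lam in enumerate(eigenvalues):
    v = eigenvectors[:, i]
    # A @ v and lam * v should agree up to floating-point error.
    print("lambda =", lam, "| A @ v == lambda * v:", np.allclose(A @ v, lam * v))
```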


In [25]:
# Revisiting singular value decomposition (SVD)
# Let's define another matrix
B = np.array([[1,2,3],[4,5,6]])
U,sigma,VT = np.linalg.svd(B)
print("U : ",U)
print("Sigma : ",sigma)
print("VT : ",VT)

U :  [[-0.3863177  -0.92236578]
 [-0.92236578  0.3863177 ]]
Sigma :  [9.508032   0.77286964]
VT :  [[-0.42866713 -0.56630692 -0.7039467 ]
 [ 0.80596391  0.11238241 -0.58119908]
 [ 0.40824829 -0.81649658  0.40824829]]
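Note that `np.linalg.svd` returns the singular values as a 1D array, not as a matrix. The sketch below, which assumes the same `B` as above, places them on the diagonal of a 2×3 matrix (called `Sigma` here for illustration) and checks that the three factors multiply back to `B`.

```python
import numpy as np

B = np.array([[1, 2, 3], [4, 5, 6]])
U, sigma, VT = np.linalg.svd(B)

# Build the 2x3 "diagonal" matrix that sits between U (2x2) and VT (3x3).
Sigma = np.zeros(B.shape)
Sigma[:len(sigma), :len(sigma)] = np.diag(sigma)

B_rebuilt = U @ Sigma @ VT
print("Reconstruction matches B:", np.allclose(B, B_rebuilt))
```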


In [26]:
# QR decomposition also works for non-square matrices.
# Here C is an m×n matrix with more rows than columns (m > n).
# QR decomposition factors a matrix A into a product:
# A = Q·R
# Since we are coding it, we skip the full derivation. By hand, Q is found by applying
# Gram-Schmidt orthogonalization to the columns of A, which takes a long time with pen and paper.
C = np.array([[1, 2],
              [3, 4],
              [5, 6]])
Q ,R = np.linalg.qr(C)
print("Q : ",Q)
print("R : ",R)
# where Q: matrix with orthonormal columns (Q.T @ Q = I); here it is 3×2.
# R: upper triangular matrix.

Q :  [[-0.16903085  0.89708523]
 [-0.50709255  0.27602622]
 [-0.84515425 -0.34503278]]
R :  [[-5.91607978 -7.43735744]
 [ 0.          0.82807867]]
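To make the Gram-Schmidt idea mentioned in the comments concrete, here is a minimal sketch of classical Gram-Schmidt. The function name `gram_schmidt_qr` is hypothetical, and the sketch assumes the columns are linearly independent. It is meant for illustration only: NumPy's own routine is more robust numerically and may return columns with opposite signs.

```python
import numpy as np

def gram_schmidt_qr(M):
    """Classical Gram-Schmidt QR; assumes M has linearly independent columns."""
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = M[:, j].copy()
        for i in range(j):
            # Component of column j along the already-built direction Q[:, i].
            R[i, j] = Q[:, i] @ M[:, j]
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]  # normalize what is left
    return Q, R

C = np.array([[1, 2], [3, 4], [5, 6]])
Q_gs, R_gs = gram_schmidt_qr(C)
print("Q @ R == C:", np.allclose(Q_gs @ R_gs, C))
print("Columns orthonormal:", np.allclose(Q_gs.T @ Q_gs, np.eye(2)))
```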


In [27]:
# Cholesky Decomposition
# It needs a symmetric positive-definite matrix, so let's define both terms:
# A matrix A is symmetric if it is equal to its transpose (A = A.T).
# A matrix A is positive-definite if x.T @ A @ x > 0 for all nonzero vectors x.
# The result is a lower triangular matrix L such that A = L @ L.T.


D = np.array([[4, 2],
              [2, 3]])
L = np.linalg.cholesky(D)

print("L:\n", L)
# Cholesky decomposition speeds up matrix inversion and log-determinant calculations, which are common in probabilistic models and regression.
# Its numerical stability makes it a reliable choice in algorithms requiring precision.


L:
 [[2.         0.        ]
 [1.         1.41421356]]
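The log-determinant use case mentioned above can be sketched as follows. Because `D = L @ L.T`, the determinant of `D` is the squared product of the diagonal of `L`, so the log-determinant is twice the sum of the logs of that diagonal. The variable name `log_det` is only illustrative.

```python
import numpy as np

D = np.array([[4, 2], [2, 3]])
L = np.linalg.cholesky(D)

# The factor multiplies back to the original matrix.
print("L @ L.T == D:", np.allclose(L @ L.T, D))

# log(det(D)) = 2 * sum(log(diag(L)))
log_det = 2 * np.sum(np.log(np.diag(L)))
print("log-det via Cholesky:", log_det)
print("log-det via np.linalg.det:", np.log(np.linalg.det(D)))
```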


In [28]:
E = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
# Let's compute rank for this matrix
rank = np.linalg.matrix_rank(E)
print("Rank:", rank)

Rank: 2
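The rank is 2 rather than 3 because the rows of `E` are not independent. A small check, assuming the same matrix as above, shows that the third row is a combination of the first two:

```python
import numpy as np

E = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Row 3 equals 2 * row 2 - row 1, so it adds no new information.
combo = 2 * E[1] - E[0]
print("2*row2 - row1:", combo)
print("Equals row 3:", np.array_equal(combo, E[2]))
```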


In [29]:
F = np.array([[1, 2],
              [3, 4]])
# Note: the determinant and the trace are defined only for square matrices; the trace is the sum of the diagonal elements.
det = np.linalg.det(F)
trace = np.trace(F)
print("Determinant:", det)
print("Trace:", trace)
# In ML, calculating the determinant is useful for spotting issues in the model or in the data.
# Non-zero det: the matrix is invertible. A det of zero (or very close to zero) can indicate a singular or nearly singular matrix, e.g. redundant features.
# Negative det: for a symmetric matrix such as a covariance matrix, it means the matrix is not positive-definite, so methods like Cholesky will fail.

Determinant: -2.0000000000000004
Trace: 5
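Both quantities tie back to the eigenvalues from the start of this lesson: the determinant equals the product of the eigenvalues and the trace equals their sum. A short check on the same `F` (the name `eigvals` is illustrative):

```python
import numpy as np

F = np.array([[1, 2], [3, 4]])
eigvals = np.linalg.eigvals(F)

print("Product of eigenvalues:", np.prod(eigvals), "| det:", np.linalg.det(F))
print("Sum of eigenvalues:", np.sum(eigvals), "| trace:", np.trace(F))
```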


In [30]:
# Define a matrix G (the one for which we want to find the dominant eigenvector)
G = np.array([[2, 1],
              [1, 3]])

# Initial guess for the eigenvector (can be any vector)
v = np.array([1, 1])

# Repeat the process of multiplying by the matrix and normalizing the vector
for _ in range(10):  # Repeat 10 times to get a more accurate result
    v = np.dot(G, v)  # Multiply matrix G with the vector v
    v = v / np.linalg.norm(v)  # Normalize the resulting vector to unit length

# Print the final dominant eigenvector
print("Dominant Eigenvector:", v)


Dominant Eigenvector: [0.52574439 0.8506426 ]
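The loop above is known as power iteration. Once the vector has converged, the matching eigenvalue can be estimated with the Rayleigh quotient `v @ G @ v` (valid here because `v` has unit length). The sketch below uses 50 iterations instead of 10, an arbitrary choice to get a tighter estimate, and compares the result with `np.linalg.eigvals`.

```python
import numpy as np

G = np.array([[2, 1], [1, 3]])
v = np.array([1.0, 1.0])

for _ in range(50):
    v = G @ v
    v = v / np.linalg.norm(v)

# Rayleigh quotient: estimate of the dominant eigenvalue for a unit vector v.
dominant_eigenvalue = v @ G @ v
print("Power iteration estimate:", dominant_eigenvalue)
print("Largest eigenvalue from NumPy:", np.max(np.linalg.eigvals(G)))
```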


In [31]:

from scipy.linalg import expm

# Define a matrix (example: a 2x2 matrix)
A = np.array([[1, 2],
              [3, 4]])

# Calculate the matrix exponential of A
A_exp = expm(A)

# Print the result
print("Matrix Exponential of A:\n", A_exp)

# Goal: compute the matrix exponential of a square matrix.
# This generalizes the scalar exponential function to matrices.
# For a scalar x, e^x is a well-known function.
# For a matrix A, the matrix exponential is defined by the same power series: e^A = I + A + A^2/2! + A^3/3! + ...
#
# Why matrix exponentials are useful in machine learning:
# They are used in many advanced machine learning techniques, such as:
# - Solving linear differential equations that describe how a system evolves over time.
# - Stochastic processes, like continuous-time Markov chains and their state transitions.
# - Continuous-time models, such as those used in some neural networks or in reinforcement learning.

Matrix Exponential of A:
 [[ 51.9689562   74.73656457]
 [112.10484685 164.07380305]]
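To illustrate the power-series definition, the sketch below sums the first terms of `I + A + A^2/2! + ...` with plain NumPy and compares the result with the values printed above. The helper name `expm_series` and the choice of 40 terms are assumptions made for this example; summing the series directly is fine for a small matrix like this one but is not how a production routine should be written.

```python
import numpy as np

def expm_series(M, terms=40):
    """Truncated power series for the matrix exponential (illustration only)."""
    M = np.asarray(M, dtype=float)
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k  # term now holds M^k / k!
        result = result + term
    return result

A = np.array([[1, 2], [3, 4]])
approx = expm_series(A)

# Values copied from the expm(A) output printed above.
expected = np.array([[51.9689562, 74.73656457],
                     [112.10484685, 164.07380305]])
print("Series matches expm output:", np.allclose(approx, expected))
```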


In [32]:
# Importing necessary libraries
import numpy as np
from scipy.linalg import null_space

# Null Space: the set of all vectors x for which A @ x is the zero vector.
# Why is it useful? Its dimension tells us how many degrees of freedom remain when solving a system of equations.
# If the null space contains non-zero vectors, a consistent system A @ x = b has infinitely many solutions.

# Define a matrix (example: a 2x2 matrix)
A = np.array([[2, 4],
              [1, 2]])

# Calculate the null space of A
null_space_A = null_space(A)

# In ML, a non-trivial null space of the feature matrix means some features are linear combinations of others.
# Dropping such redundant features reduces model complexity and can make the model more efficient and less prone to overfitting.

# Print the result
print("Null Space of A:\n", null_space_A)


Null Space of A:
 [[-0.89442719]
 [ 0.4472136 ]]
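The same basis can be obtained with NumPy alone by using the SVD: the rows of `VT` that belong to (numerically) zero singular values span the null space. This is a sketch under the assumption that a tolerance of `1e-10` is a reasonable cutoff for "zero"; the sign of the resulting vector may differ from the output above.

```python
import numpy as np

A = np.array([[2, 4], [1, 2]])
U, s, VT = np.linalg.svd(A)

tol = 1e-10                    # assumed cutoff for treating a singular value as zero
rank = int(np.sum(s > tol))
null_basis = VT[rank:].T       # remaining right-singular vectors, as columns

print("Null space basis:\n", null_basis)
print("A @ basis is zero:", np.allclose(A @ null_basis, 0))
```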


In [33]:
# Condition Number in Machine Learning

# 1. Import required libraries
import numpy as np

# 2. Define a matrix to work with
A = np.array([[1, 2],
              [3, 4]])

# 3. Calculate the condition number of the matrix using np.linalg.cond()
condition_number = np.linalg.cond(A)

# 4. Display the condition number
print("Condition Number of A:", condition_number)

# Explanation of the condition number
#
# - The condition number of a matrix tells us how sensitive the solution of a system of linear equations is to small changes in the input.
# - If the condition number is high, the matrix is considered ill-conditioned, and small changes in the input can lead to large changes in the output.
# - In machine learning, high condition numbers in models can cause instability, especially when dealing with noisy data.
# - To deal with ill-conditioned problems, we can use regularization. Ridge regression adds a penalty that directly improves the conditioning, and Lasso can help by dropping redundant features.



Condition Number of A: 14.933034373659268
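By default `np.linalg.cond` returns the ratio of the largest to the smallest singular value. The sketch below checks that, and then shows how a Ridge-style term affects conditioning: adding `ridge_strength * I` (a hypothetical value of 1.0 is used here) to `A.T @ A` lowers its condition number.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

# Condition number as the ratio of extreme singular values.
s = np.linalg.svd(A, compute_uv=False)
print("sigma_max / sigma_min:", s[0] / s[-1])
print("np.linalg.cond(A):    ", np.linalg.cond(A))

# Ridge-style regularization: add a multiple of the identity to A.T @ A.
ridge_strength = 1.0  # hypothetical value, chosen only for illustration
gram = A.T @ A
cond_plain = np.linalg.cond(gram)
cond_ridge = np.linalg.cond(gram + ridge_strength * np.eye(2))
print("cond(A.T @ A):", cond_plain)
print("cond(A.T @ A + I):", cond_ridge)
```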


**Normalizing Data (Min-Max Scaling using NumPy)**

Min-Max Scaling transforms data into a fixed range, typically [0, 1]. It works by subtracting the minimum value of the feature and dividing by the range (difference between the maximum and minimum).

Formula: $X_{scaled} = \dfrac{X - X_{min}}{X_{max} - X_{min}}$

*In ML:*

* Consistent Range: It ensures that all features (input variables) are on the same scale, usually between 0 and 1. This makes it easier for ML algorithms to process the data effectively.

* Improves Convergence: Many algorithms, especially gradient-based methods (like gradient descent in linear regression, neural networks, etc.), perform better when the data is normalized because it speeds up convergence.

* Prevents Bias: If features have different ranges, algorithms may give more importance to the ones with larger values. Normalization ensures that no feature is biased due to its scale.


In [34]:
import numpy as np

# Sample data (could be a column in a dataset)
data = np.array([10, 20, 30, 40, 50])

# Step 1: Find the minimum and maximum values of the data
min_value = np.min(data)
max_value = np.max(data)

# Step 2: Apply Min-Max Scaling
scaled_data = (data - min_value) / (max_value - min_value)

# Display results
print("Original Data:", data)
print("Min Value:", min_value)
print("Max Value:", max_value)
print("Scaled Data:", scaled_data)


Original Data: [10 20 30 40 50]
Min Value: 10
Max Value: 50
Scaled Data: [0.   0.25 0.5  0.75 1.  ]
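Real datasets usually have several features, and each one should be scaled with its own minimum and maximum. A minimal sketch of a column-wise version follows; the function name `min_max_scale` and its `feature_range` parameter are hypothetical, and constant columns are guarded against division by zero.

```python
import numpy as np

def min_max_scale(X, feature_range=(0.0, 1.0)):
    """Scale each column of X to feature_range (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    low, high = feature_range
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    span = col_max - col_min
    span = np.where(span == 0, 1.0, span)  # avoid dividing by zero for constant columns
    return low + (X - col_min) / span * (high - low)

# Two features on very different scales.
X = np.array([[10, 200],
              [20, 400],
              [30, 600]])
scaled = min_max_scale(X)
print(scaled)
```

Passing `feature_range=(-1.0, 1.0)` would map each column to [-1, 1] instead.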


***Standardizing Data (Z-score Standardization)***

Standardization rescales data to have a mean of 0 and a standard deviation of 1. It's often used for algorithms like Logistic Regression, KNN, and SVM.

In [35]:
# Standardizing data (Z-score standardization)
data_standardized = (data - np.mean(data)) / np.std(data)

print("Original Data:", data)
print("Standardized Data:", data_standardized)


Original Data: [10 20 30 40 50]
Standardized Data: [-1.41421356 -0.70710678  0.          0.70710678  1.41421356]
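With several features, the mean and standard deviation are computed per column. It is also common practice to compute them on the training data only and reuse them for new data. The sketch below assumes small made-up arrays named `train` and `new_rows`; note that `np.std` uses the population formula (`ddof=0`) by default, as in the cell above.

```python
import numpy as np

train = np.array([[10.0, 200.0],
                  [20.0, 400.0],
                  [30.0, 600.0]])
new_rows = np.array([[20.0, 800.0]])  # hypothetical unseen data

# Statistics come from the training data only.
mean = train.mean(axis=0)
std = train.std(axis=0)

train_std = (train - mean) / std
new_std = (new_rows - mean) / std     # reuse the training statistics

print("Standardized training data:\n", train_std)
print("Standardized new data:\n", new_std)
```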


***One-hot Encoding for Categorical Data***

One-hot encoding converts categorical variables into a numeric form that machine learning algorithms can work with.

* Categories: We have a list of fruit categories that are not numeric.

* Unique Categories: We find all the unique fruit types (like 'apple', 'banana', etc.).

* One-hot Encoding: We create a new array where each fruit category is represented as a vector with 1 in the position corresponding to that category and 0 elsewhere.

In [36]:
# Sample categorical data (e.g., fruit categories)
categories = ['apple', 'banana', 'apple', 'orange', 'banana']

# Step 1: Create a unique set of categories (features)
# This step finds all the unique categories in the data.
unique_categories = np.unique(categories)

# Step 2: Initialize an empty array for one-hot encoding
# This creates a 2D array filled with zeros, with one row for each sample and one column for each unique category.
one_hot_encoded = np.zeros((len(categories), len(unique_categories)))

# Step 3: Fill in the one-hot encoding
# This loop assigns a 1 in the corresponding column for each category in the original data.
for i, category in enumerate(categories):
    # np.where(unique_categories == category) finds the index of the category in the unique_categories array
    one_hot_encoded[i, np.where(unique_categories == category)[0]] = 1

# Print the results
print("Original Categorical Data:", categories)
print("One-hot Encoded Data:\n", one_hot_encoded)



Original Categorical Data: ['apple', 'banana', 'apple', 'orange', 'banana']
One-hot Encoded Data:
 [[1. 0. 0.]
 [0. 1. 0.]
 [1. 0. 0.]
 [0. 0. 1.]
 [0. 1. 0.]]
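The loop can also be replaced by a vectorized version. `np.unique` with `return_inverse=True` returns, for every sample, the index of its category, and indexing an identity matrix with those indices produces the one-hot rows. The names `codes` and `one_hot` are illustrative.

```python
import numpy as np

categories = ['apple', 'banana', 'apple', 'orange', 'banana']

# codes[i] is the position of categories[i] inside unique_categories.
unique_categories, codes = np.unique(categories, return_inverse=True)

# Row k of the identity matrix is the one-hot vector for category k.
one_hot = np.eye(len(unique_categories))[codes]

print("Category order:", unique_categories)
print("One-hot Encoded Data:\n", one_hot)
```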


NaN values often show up when preparing data for training a model.
In machine learning, missing data is a common issue. In NumPy, a missing value is usually represented as np.nan (Not a Number).



In [37]:
# Sample data with missing values (np.nan)
data_with_missing = np.array([1, 2, np.nan, 4, 5])

# Print the data; the missing entry shows up as nan
print("Data with Missing Values:", data_with_missing)


Data with Missing Values: [ 1.  2. nan  4.  5.]


***Filling Missing Values Using np.nan_to_num()***

After identifying missing values, one way to handle them is by replacing np.nan with a specified value, such as 0 or the mean of the dataset.



In [38]:
# Fill missing values (np.nan) with 0
data_filled = np.nan_to_num(data_with_missing, nan=0)

print("Data with Missing Values (filled):", data_filled)


Data with Missing Values (filled): [1. 2. 0. 4. 5.]
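Filling with the mean, as mentioned above, takes one extra step, because the ordinary mean of an array containing `np.nan` is itself `nan`. A minimal sketch using `np.isnan` to locate the gaps and `np.nanmean` to average only the observed values:

```python
import numpy as np

data_with_missing = np.array([1, 2, np.nan, 4, 5])

# Boolean mask: True where a value is missing.
missing_mask = np.isnan(data_with_missing)
print("Missing mask:", missing_mask)

# Mean of the observed values only (nan entries are ignored).
mean_value = np.nanmean(data_with_missing)

# Replace missing entries with that mean.
data_mean_filled = np.where(missing_mask, mean_value, data_with_missing)
print("Data filled with mean:", data_mean_filled)
```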


## Reshaping Data for Machine Learning


### Reshaping Data for Models (2D Arrays for Supervised Learning, 3D Arrays for CNNs)

* Machine learning algorithms, especially supervised learning models, typically require input data as a 2D array of shape (samples × features), with rows as data points and columns as features.

* For Convolutional Neural Networks (CNNs), each image is usually a 3D array representing height, width, and color channels; a batch of images adds a fourth dimension for the samples.

In [39]:
# for Reshaping to 2D for Supervised Learning
# Sample 1D data (representing individual features of multiple data points)
data_points = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Reshape to 2D array (5 samples with 2 features each)
reshaped_data = data_points.reshape(5, 2)

print("Original Data Shape:", data_points.shape)
print("Reshaped Data Shape:", reshaped_data.shape)


Original Data Shape: (10,)
Reshaped Data Shape: (5, 2)
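When only one dimension is known in advance, `-1` lets NumPy infer the other. This is handy for turning a single feature into the (samples × features) layout. A small sketch with the same ten values:

```python
import numpy as np

data_points = np.arange(1, 11)  # same values as above: 1..10

# One feature per sample: NumPy infers 10 rows.
single_feature = data_points.reshape(-1, 1)

# Two features per sample: NumPy infers 5 rows.
two_features = data_points.reshape(-1, 2)

print("Single feature shape:", single_feature.shape)
print("Two features shape:", two_features.shape)
```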


### Flattening 3D Arrays for Feeding into ML Models

In machine learning, especially with CNNs, data is often in 3D arrays (e.g., for image data). These 3D arrays are then flattened into 1D arrays to feed into dense layers of a neural network.

In [40]:
# Sample 3D data (representing an image with height=2, width=3, channels=2)
image_data = np.array([[[1, 2], [3, 4], [5, 6]],
                       [[7, 8], [9, 10], [11, 12]]])

# Flatten the 3D data to 1D for ML model input
flattened_data = image_data.flatten()

print("Original Data Shape:", image_data.shape)
print("Flattened Data Shape:", flattened_data.shape)


Original Data Shape: (2, 3, 2)
Flattened Data Shape: (12,)
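In practice a model usually receives a batch of images rather than a single one, so the sample axis has to be kept while everything else is flattened. A sketch assuming a made-up batch of 4 images with the same 2×3×2 layout:

```python
import numpy as np

# Hypothetical batch: 4 images, each with height=2, width=3, channels=2.
batch = np.arange(4 * 2 * 3 * 2).reshape(4, 2, 3, 2)

# Keep the first axis (samples) and flatten the rest into one feature axis.
flat_batch = batch.reshape(batch.shape[0], -1)

print("Batch shape:", batch.shape)
print("Flattened batch shape:", flat_batch.shape)
```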


***Binomial Distribution (np.random.binomial())***

The binomial distribution models the number of successes in a fixed number of independent trials, each with the same probability of success. Its single-trial case (the Bernoulli distribution) underlies binary classification problems (like predicting if an email is spam or not).

In [41]:
import numpy as np

# Binomial distribution: 10 trials, 0.5 probability of success, 1000 samples
binomial_data = np.random.binomial(n=10, p=0.5, size=1000)

# First 10 results
print("First 10 Binomial Distribution samples:", binomial_data[:10])


First 10 Binomial Distribution samples: [4 5 3 9 5 7 5 3 4 8]


This code simulates 1000 samples of a binomial distribution, where each sample is the number of successes in 10 trials with a 50% chance of success on each trial.
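A binomial distribution with `n` trials and success probability `p` has mean `n * p` and variance `n * p * (1 - p)`. The sketch below checks this empirically. It uses a seeded `np.random.default_rng` generator and a larger sample size than the cell above, both choices made here only so that the result is reproducible and close to the theoretical values.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded for reproducibility
n, p = 10, 0.5
samples = rng.binomial(n=n, p=p, size=100_000)

print("Sample mean:", samples.mean(), "| theoretical:", n * p)
print("Sample variance:", samples.var(), "| theoretical:", n * p * (1 - p))
```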

***Poisson Distribution (np.random.poisson())***

The Poisson distribution models the number of events happening in a fixed interval of time or space, where the events occur with a known constant mean rate and independently of each other. It is used in ML for problems like predicting the number of arrivals at a service station.



In [42]:
# Poisson distribution: mean = 3 events per hour, 1000 samples
poisson_data = np.random.poisson(lam=3, size=1000)

# First 10 results
print("First 10 Poisson Distribution samples:", poisson_data[:10])


First 10 Poisson Distribution samples: [2 4 4 4 1 2 4 2 4 2]


This code simulates 1000 samples from a Poisson distribution, where the expected number of events in each interval is 3.
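A distinctive property of the Poisson distribution is that its mean and variance are both equal to the rate `lam`, and the probability of seeing zero events is `exp(-lam)`. A hedged empirical check, again with a seeded generator and a larger sample size chosen for this example:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded for reproducibility
lam = 3
samples = rng.poisson(lam=lam, size=100_000)

print("Sample mean:", samples.mean(), "| theoretical:", lam)
print("Sample variance:", samples.var(), "| theoretical:", lam)

# Fraction of intervals with no events vs. the theoretical exp(-lam).
zero_fraction = np.mean(samples == 0)
print("P(0 events):", zero_fraction, "| theoretical:", np.exp(-lam))
```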


If you haven't seen my previous Colab notebooks, take a look [here](https://github.com/pranathi000/ML_libraries/tree/main/Numpy).