# Calculation of Eigenvalues of a Matrix

In this notebook, we explore eigenvalues and eigenvectors, which are central to linear algebra and underpin applications such as dimensionality reduction with Principal Component Analysis (PCA).

## Key Concepts

1. **Covariance Matrix:** This matrix records how each pair of variables varies together, which helps identify strongly related variables in a dataset with many variables. If two variables are highly correlated, they carry largely redundant information, so keeping both adds little.

2. **Eigenvalues and Eigenvectors:** An eigenvector of a square matrix A is a nonzero vector v that the matrix only rescales, so that A v = λ v; the scalar λ is the corresponding eigenvalue. In Python, they can be calculated using the `np.linalg.eig()` function.

3. **Dimensionality Reduction:** Techniques like Principal Component Analysis (PCA) use eigenvalues and eigenvectors to reduce the dimensionality of data, which can be useful when working with high-dimensional datasets.
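The defining relation of an eigenpair, A v = λ v, can be checked numerically. The following is a minimal sketch on a small symmetric matrix chosen purely for illustration (it is not derived from the Iris data); the names `A`, `vals`, and `vecs` are assumptions of this example.

```python
import numpy as np

# Small symmetric matrix, chosen only for illustration
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Column i of `vecs` is the eigenvector paired with vals[i]
vals, vecs = np.linalg.eig(A)

# Check A @ v == lambda * v for every eigenpair
for i in range(len(vals)):
    v = vecs[:, i]
    assert np.allclose(A @ v, vals[i] * v)

print(np.sort(vals))
```

Note that the eigenvectors are returned as the columns of the second array, not its rows.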

Let's dive into the practical application of these concepts.

In [None]:
# Importing necessary libraries
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# Load the iris dataset
iris = load_iris()
df = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                  columns=iris['feature_names'] + ['target'])

# Display the first few rows of the dataframe
df.head()

We have loaded the Iris dataset, which is a multivariate dataset introduced by the British statistician and biologist Ronald Fisher. This dataset is commonly used in pattern recognition literature. It contains three iris species with 50 samples each, along with four measurements per flower. One flower species is linearly separable from the other two, but the other two are not linearly separable from each other.

The columns in this DataFrame are:

1. Sepal Length (cm)
2. Sepal Width (cm)
3. Petal Length (cm)
4. Petal Width (cm)
5. Target (Species)

Next, we will calculate the covariance matrix of the features in this dataset.

In [None]:
# Drop the target column as we only want to calculate covariance of the features
df_features = df.drop(columns=['target'])

# Calculate the covariance matrix
cov_matrix = df_features.cov()
cov_matrix

We have calculated the covariance matrix of the features in the Iris dataset. The covariance matrix measures how each pair of features varies together around their means. For p features it is a symmetric p x p matrix: the diagonal entries are the variances of the individual features, and each off-diagonal entry is the covariance between two different features. Here p = 4.
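To make the definition concrete, the covariance matrix can be built by hand: center each feature, then divide the product of the centered data with itself by n - 1. The sketch below uses small random stand-in data rather than the Iris features (the array `X` and the seed are assumptions of this example) and compares the result with `np.cov`. By default, pandas' `DataFrame.cov()` uses the same n - 1 normalization.

```python
import numpy as np

# Hypothetical stand-in data: 6 samples (rows) x 3 features (columns)
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))

# Center each feature, then form the unbiased sample covariance
X_centered = X - X.mean(axis=0)
n_samples = X.shape[0]
cov_manual = X_centered.T @ X_centered / (n_samples - 1)

# np.cov treats rows as variables by default, so pass rowvar=False
cov_numpy = np.cov(X, rowvar=False)

print(np.allclose(cov_manual, cov_numpy))
```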

Next, we will calculate the eigenvalues and eigenvectors of this covariance matrix.

In [None]:
# Calculate the eigenvalues and eigenvectors of the covariance matrix
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)

# Print the eigenvalues
print('Eigenvalues:\n', eigenvalues)

# Print the eigenvectors (each column is one eigenvector)
print('\nEigenvectors:\n', eigenvectors)

Eigenvalues:
 [4.22824171 0.24267075 0.0782095  0.02383509]

Eigenvectors:
 [[ 0.36138659 -0.65658877 -0.58202985  0.31548719]
 [-0.08452251 -0.73016143  0.59791083 -0.3197231 ]
 [ 0.85667061  0.17337266  0.07623608 -0.47983899]
 [ 0.3582892   0.07548102  0.54583143  0.75365743]]


We have calculated the eigenvalues and eigenvectors of the covariance matrix. Each eigenvalue is the variance of the data along the direction given by its eigenvector, and the eigenvectors (the columns of the array above) define the axes of the new coordinate system into which we transform the original four dimensions.

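Here the eigenvalues happen to appear in descending order, with the largest first, but `np.linalg.eig()` does not guarantee any particular order, so in general they should be sorted explicitly. The largest eigenvalue corresponds to the direction along which the data varies the most.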
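Because `np.linalg.eig()` makes no promise about the order of its output, it can be safer to sort the eigenpairs explicitly. The sketch below assumes a small symmetric matrix `C` as a hypothetical stand-in for a covariance matrix, and it swaps in `np.linalg.eigh()`, which is intended for symmetric matrices and returns real eigenvalues in ascending order.

```python
import numpy as np

# Hypothetical symmetric matrix standing in for a covariance matrix
C = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 3.0]])

# eigh is meant for symmetric matrices; eigenvalues come back ascending
vals, vecs = np.linalg.eigh(C)

# Reorder to descending, keeping each eigenvector column with its eigenvalue
order = np.argsort(vals)[::-1]
vals_sorted = vals[order]
vecs_sorted = vecs[:, order]

print(vals_sorted)
```

The same `order` index must be applied to the columns of the eigenvector array; sorting the eigenvalues alone would break the pairing.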

In the context of Principal Component Analysis (PCA), the eigenvectors of the covariance matrix define the 'principal components', and each eigenvalue gives the amount of variance accounted for by its principal component.
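The share of variance carried by each component follows directly from the eigenvalues: divide each one by their sum. A minimal sketch, using the eigenvalues printed above copied in by hand (the names `explained_ratio` and `cumulative_ratio` are assumptions of this example):

```python
import numpy as np

# Eigenvalues copied from the output printed earlier in this notebook
eigenvalues = np.array([4.22824171, 0.24267075, 0.0782095, 0.02383509])

# Fraction of total variance along each principal component
explained_ratio = eigenvalues / eigenvalues.sum()

# Running total, useful for choosing how many components to keep
cumulative_ratio = np.cumsum(explained_ratio)

print(explained_ratio.round(4))
print(cumulative_ratio.round(4))
```

For these values, the first component alone accounts for roughly 92% of the total variance.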

Next, we will project the original data onto these eigenvectors.

In [None]:
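# Center the features (PCA works on mean-centered data), then project them
# onto the eigenvectors, which serve as the new axes
centered = df_features - df_features.mean()
transformed_data = centered.dot(eigenvectors)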

# Display the first few rows of the transformed data
transformed_data.head()

We have transformed the original data into the new coordinate system defined by the eigenvectors. Each column of the transformed data holds the coordinates along one eigenvector, and the first column lies along the direction of greatest variance.

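This transformation is the essence of Principal Component Analysis (PCA). Note that all four new dimensions are kept here, so the dimensionality has not been reduced yet. The reduction comes from keeping only the columns associated with the largest eigenvalues, which retains as much of the variance as possible for a given number of dimensions.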
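The reduction step itself can be sketched as follows: sort the eigenpairs, keep the eigenvectors of the k largest eigenvalues, and project the centered data onto them. This example uses random stand-in data instead of the Iris features; the array `X`, the seed, the per-feature scales, and the choice `k = 2` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical stand-in data: 100 samples x 4 features with unequal spread
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4)) * np.array([3.0, 1.0, 0.5, 0.1])

# Center, then eigendecompose the (symmetric) covariance matrix
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)
vals, vecs = np.linalg.eigh(cov)

# Sort eigenpairs by descending eigenvalue and keep the top k
order = np.argsort(vals)[::-1]
k = 2
W = vecs[:, order[:k]]          # shape (4, k)

# Project onto the k leading principal directions
X_reduced = X_centered @ W      # shape (100, k)

# The variance of each projected column equals the matching eigenvalue
print(np.allclose(X_reduced.var(axis=0, ddof=1), vals[order[:k]]))
```

The final check illustrates why the eigenvalues can be read as variances: projecting onto an eigenvector yields a column whose sample variance is that eigenvector's eigenvalue.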

In this notebook, we have explored the concept of eigenvalues and eigenvectors and their application in dimensionality reduction techniques like PCA. We have also seen how to calculate the covariance matrix, eigenvalues, and eigenvectors in Python and how to use them to transform a dataset.