## Q(1)

Min-Max scaling, often simply called normalization, is a data preprocessing technique that rescales numeric features to a fixed range, usually 0 to 1. Its purpose is to put all features on the same scale, so that features with larger magnitudes do not dominate the others.
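
In symbols (the notation is an assumption, since the text gives none), a value $x$ of a feature whose minimum and maximum are $x_{\min}$ and $x_{\max}$ maps to:

$$
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
$$

The smallest value becomes 0 and the largest becomes 1. The formula is undefined for a constant feature, where $x_{\max} = x_{\min}$.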

In [1]:
import numpy as np

In [2]:
data = np.array([2,5,10,15,20])

In [3]:
data

array([ 2,  5, 10, 15, 20])

In [5]:
min_value = np.min(data)

In [6]:
max_value = np.max(data)

In [8]:
scaled_data = (data - min_value)/(max_value - min_value)

In [9]:
print("original data:",data)

original data: [ 2  5 10 15 20]


In [10]:
print("min-max scaled data:", scaled_data)

min-max scaled data: [0.         0.16666667 0.44444444 0.72222222 1.        ]


## Q(2)

The Unit Vector technique in feature scaling is also known as vector normalization or L2 normalization. It involves scaling individual samples to have a norm (length or magnitude) of 1. The purpose is to ensure that all samples have the same scale while preserving the direction of the data. 

The key difference between Min-Max scaling and Unit Vector scaling lies in the transformation applied. Min-Max scaling scales the data to a specific range (e.g., between 0 and 1), while Unit Vector scaling normalizes the data such that each sample becomes a vector with a length of 1.

In [12]:
import numpy as np

In [13]:
data = np.array([2,5,10,15,20])

In [14]:
vector_norm = np.linalg.norm(data)

In [16]:
normalized_data = data/vector_norm

In [17]:
print("Original data:", data)
print("Unit Vector scaled data:", normalized_data)


Original data: [ 2  5 10 15 20]
Unit Vector scaled data: [0.0728357  0.18208926 0.36417852 0.54626778 0.72835704]
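
The cell above treats the whole array as a single vector. For a dataset with several samples, Unit Vector scaling is usually applied to each row, while Min-Max scaling works on each column. The following is a minimal sketch of that contrast; the array `X` and the names `row_norms`, `X_unit`, and `X_minmax` are made up for illustration.

```python
import numpy as np

# Hypothetical dataset: each row is one sample, each column one feature.
X = np.array([[3.0, 4.0],
              [1.0, 0.0],
              [6.0, 8.0]])

# Unit Vector scaling: divide each row by its own L2 norm.
# keepdims=True keeps the shape (3, 1) so the division broadcasts per row.
row_norms = np.linalg.norm(X, axis=1, keepdims=True)
X_unit = X / row_norms

# Min-Max scaling: rescale each column to the range [0, 1].
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_minmax = (X - col_min) / (col_max - col_min)

print("Unit Vector scaled rows:\n", X_unit)
print("Row lengths after scaling:", np.linalg.norm(X_unit, axis=1))
print("Min-Max scaled columns:\n", X_minmax)
```

Rows 0 and 2 point in the same direction, so Unit Vector scaling maps both to the same vector, whereas Min-Max scaling keeps them apart.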


## Q(3)

Principal Component Analysis (PCA) is a dimensionality reduction technique commonly used in the field of machine learning and statistics. Its primary goal is to transform high-dimensional data into a lower-dimensional representation, capturing the most significant variance in the data. This is achieved by identifying the principal components, which are linear combinations of the original features.

Here's a step-by-step explanation of how PCA works and an example to illustrate its application:

### How PCA Works:
1. **Standardization:**
   - If the features have different scales, it's essential to standardize them (subtract the mean and divide by the standard deviation) to ensure that each feature contributes equally to the analysis.

2. **Covariance Matrix:**
   - Compute the covariance matrix of the standardized data. The covariance matrix provides information about the relationships between different features.

3. **Eigenvalue and Eigenvector Decomposition:**
   - Find the eigenvalues and eigenvectors of the covariance matrix. The eigenvectors represent the principal components, and the corresponding eigenvalues indicate the amount of variance explained by each principal component.

4. **Sort Eigenvectors by Eigenvalues:**
   - Arrange the eigenvectors in descending order based on their corresponding eigenvalues. The eigenvector with the highest eigenvalue is the first principal component, the second-highest is the second principal component, and so on.

5. **Select Principal Components:**
   - Choose the top k eigenvectors to form a new matrix (projection matrix) where k is the desired dimensionality of the reduced data.

6. **Transform the Data:**
   - Multiply the original standardized data by the projection matrix to obtain the lower-dimensional representation of the data.
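
The six steps above can be sketched directly in NumPy. This is an illustrative implementation under stated assumptions, not a reference one: the function name `pca_reduce` and the synthetic data are hypothetical, and the sample standard deviation (`ddof=1`) is used so that it matches the default of `np.cov`.

```python
import numpy as np

def pca_reduce(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    # 1. Standardize: zero mean and unit sample variance per feature.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2. Covariance matrix of the standardized data (features are columns).
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigen-decomposition; eigh is intended for symmetric matrices.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # 4. Sort eigenvalues (and matching eigenvector columns) in descending order.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # 5. Projection matrix: the first k eigenvectors as columns.
    W = eigenvectors[:, :k]
    # 6. Transform the standardized data.
    return X_std @ W, eigenvalues

# Synthetic data: two strongly related features plus one independent feature.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 1))
X = np.hstack([base,
               2 * base + 0.1 * rng.normal(size=(100, 1)),
               rng.normal(size=(100, 1))])

X_reduced, eigenvalues = pca_reduce(X, k=2)
print("Reduced shape:", X_reduced.shape)
print("Explained variance ratio:", eigenvalues / eigenvalues.sum())
```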

### Example:

Let's consider a dataset with two features: height (in inches) and weight (in pounds) of a group of individuals. We want to reduce the dimensionality to one dimension using PCA.

1. **Standardize the Data:**
   - Subtract the mean and divide by the standard deviation for both height and weight.

2. **Covariance Matrix:**
   - Compute the covariance matrix based on the standardized data.

3. **Eigenvalue and Eigenvector Decomposition:**
   - Find the eigenvalues and eigenvectors of the covariance matrix.

4. **Sort Eigenvectors by Eigenvalues:**
   - Sort the eigenvectors in descending order.

5. **Select Principal Components:**
   - Choose the top eigenvector as the principal component.

6. **Transform the Data:**
   - Multiply the original standardized data by the selected eigenvector to obtain the one-dimensional representation.
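
As a rough illustration of this example, the sketch below runs the same steps with scikit-learn. The five height and weight values are invented purely for demonstration, and `StandardScaler` standardizes with the population standard deviation, which is one of several reasonable choices.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Made-up heights (inches) and weights (pounds), for illustration only.
X = np.array([
    [62.0, 120.0],
    [65.0, 140.0],
    [68.0, 155.0],
    [70.0, 170.0],
    [73.0, 190.0],
])

# Step 1: standardize both features.
X_std = StandardScaler().fit_transform(X)

# Steps 2-6: PCA handles the decomposition, sorting, selection, and projection.
pca = PCA(n_components=1)
X_1d = pca.fit_transform(X_std)

print("First principal component (direction):", pca.components_[0])
print("Share of variance explained:", pca.explained_variance_ratio_[0])
print("One-dimensional representation:", X_1d.ravel())
```

Because height and weight rise together in this toy data, a single component should carry nearly all of the variance.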



## Q(4)

PCA (Principal Component Analysis) is closely related to feature extraction, and in many cases, PCA is used explicitly for feature extraction. Feature extraction involves transforming the original features of a dataset into a new set of features, usually with the goal of reducing dimensionality, capturing relevant information, or enhancing the performance of machine learning algorithms.

### Relationship Between PCA and Feature Extraction:
1. **Dimensionality Reduction:**
   - PCA aims to reduce the dimensionality of the data by transforming it into a new set of features (principal components) that capture the most significant variance in the original data.

2. **Decorrelation of Features:**
   - PCA also decorrelates the features: the principal component directions are orthogonal, so the transformed features are uncorrelated. This can be advantageous when dealing with multicollinearity in the original feature space.

3. **Variance Retention:**
   - The principal components are ordered by the amount of variance they explain. By selecting a subset of the top principal components, you can retain most of the important information in the data while discarding less important, noisy, or redundant components.

### Example:

Consider a dataset that describes each item by three features: length, width, and height. PCA can extract two new features from it as follows:

1. **Standardize the Data:**
   - Standardize the length, width, and height to ensure they have the same scale.

2. **Apply PCA:**
   - Use PCA to find the principal components of the standardized data. Let's say the first principal component captures 80% of the variance, and the second principal component captures 15% of the variance, so together they retain 95%.

3. **Select Principal Components:**
   - Choose the top two principal components as the new features for the dataset. These components are linear combinations of the original length, width, and height.

4. **Transform the Data:**
   - Multiply the standardized data by the selected principal components to obtain the two-dimensional representation of the dataset.
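
A small sketch of this length, width, and height example follows. The measurements are synthetic (generated from a hypothetical shared "size" factor), so the explained variance ratios it prints will not match the 80% and 15% figures used above for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic measurements driven by one shared "overall size" factor plus noise.
rng = np.random.default_rng(42)
size = rng.normal(loc=10.0, scale=2.0, size=200)
X = np.column_stack([
    size + rng.normal(scale=0.5, size=200),        # length
    0.6 * size + rng.normal(scale=0.5, size=200),  # width
    0.3 * size + rng.normal(scale=0.5, size=200),  # height
])

# Standardize, then extract two new features with PCA.
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_new = pca.fit_transform(X_std)

# Each row of components_ holds the weights of length, width, and height
# in one extracted feature.
print("Component weights:\n", pca.components_)
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Correlation between new features:", np.corrcoef(X_new, rowvar=False)[0, 1])
```

The printed correlation should be zero up to floating-point error, which is the decorrelation property described earlier.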


## Q(5)

Min-Max scaling is a data preprocessing technique used to scale numerical features to a specific range, typically between 0 and 1. This normalization method is particularly useful when the features in a dataset have different scales, and it helps ensure that all features contribute equally to the analysis.

In [2]:
from sklearn.preprocessing import MinMaxScaler
import pandas as pd


In [4]:
data = {
    'price': [10.0, 25.0, 15.0, 30.0],
    'rating': [4.2, 3.5, 4.8, 4.0],
    'delivery_time': [20, 40, 30, 25]
}

In [6]:
df = pd.DataFrame(data)

In [7]:
features_to_scale = ['price', 'rating', 'delivery_time']

In [8]:
scaler = MinMaxScaler()

In [9]:
df[features_to_scale] = scaler.fit_transform(df[features_to_scale])


In [10]:
print("Scaled Data:")
print(df)

Scaled Data:
   price    rating  delivery_time
0   0.00  0.538462           0.00
1   0.75  0.000000           1.00
2   0.25  1.000000           0.50
3   1.00  0.384615           0.25


## Q(6)

When dealing with a dataset with many features, such as company financial data and market trends for predicting stock prices, Principal Component Analysis (PCA) can be a valuable tool for reducing dimensionality. The primary objective is to transform the original features into a smaller set of uncorrelated variables (principal components) that retain most of the essential information in the data. Here's how you could use PCA for dimensionality reduction in the context of predicting stock prices:

### Steps to Use PCA for Dimensionality Reduction:

1. **Data Preprocessing:**
   - Ensure that your dataset is prepared for analysis. Handle missing values, normalize or standardize the data if necessary, and address any other preprocessing steps.

2. **Feature Standardization:**
   - Standardize the features to ensure that they all have the same scale. This step is crucial for PCA since it is sensitive to the scale of the features. Standardization involves subtracting the mean and dividing by the standard deviation for each feature.

3. **Apply PCA:**
   - Use PCA to identify the principal components of the standardized data. The principal components are linear combinations of the original features, sorted by the amount of variance they explain.

4. **Determine the Number of Components:**
   - Examine the explained variance ratio to decide on the number of principal components to retain. The explained variance ratio indicates the proportion of the total variance in the data that is explained by each principal component. You can set a threshold (e.g., 95% of variance explained) to determine the number of components to keep.

5. **Project Data onto Principal Components:**
   - Transform the original dataset by projecting it onto the selected principal components. This results in a reduced-dimensional representation of the data.

6. **Model Training:**
   - Train your stock price prediction model on the dataset with reduced dimensionality. You can use various machine learning algorithms for regression or time series forecasting, depending on the nature of your prediction task.
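
One possible way to wire these steps together is a scikit-learn pipeline. The sketch below is only an outline under several assumptions: the "financial" features are random synthetic numbers rather than real market data, linear regression stands in for whatever model is actually chosen, and the 80/20 chronological split is arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for many correlated features driven by a few hidden factors.
rng = np.random.default_rng(0)
n_samples, n_features, n_factors = 300, 20, 3
factors = rng.normal(size=(n_samples, n_factors))
mixing = rng.normal(size=(n_factors, n_features))
X = factors @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))
y = factors @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n_samples)

# Chronological split (no shuffling), as is common for time-ordered data.
split = int(0.8 * n_samples)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Standardize, keep enough components for 95% of the variance, then regress.
# A float n_components is a variance threshold; svd_solver="full" is set explicitly.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),
    LinearRegression(),
)
model.fit(X_train, y_train)

n_kept = model.named_steps["pca"].n_components_
print("Components kept:", n_kept, "of", n_features)
print("R^2 on held-out data:", model.score(X_test, y_test))
```

Placing the scaler and PCA inside the pipeline means both are fitted on the training rows only, so no information from the held-out period leaks into the transformation.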

## Q(7)

In [3]:
import numpy as np

In [4]:
original_data = np.array([1, 5, 10, 15, 20])

In [5]:
new_min = -1
new_max = 1


In [6]:
min_val = np.min(original_data)
max_val = np.max(original_data)
scaled_data = ((original_data - min_val) / (max_val - min_val)) * (new_max - new_min) + new_min

In [7]:
print("Original Data:", original_data)
print("Min-Max Scaled Data:", scaled_data)

Original Data: [ 1  5 10 15 20]
Min-Max Scaled Data: [-1.         -0.57894737 -0.05263158  0.47368421  1.        ]
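
The manual calculation above can be cross-checked with scikit-learn's `MinMaxScaler`, whose `feature_range` parameter sets the target interval. This is a minimal sketch that assumes scikit-learn is installed; the names `scaler`, `scaled`, and `restored` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

original_data = np.array([1, 5, 10, 15, 20], dtype=float)

# MinMaxScaler expects a 2D array (samples x features), hence the reshape.
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(original_data.reshape(-1, 1)).ravel()
print("Scaled with MinMaxScaler:", scaled)

# inverse_transform maps the scaled values back to the original range.
restored = scaler.inverse_transform(scaled.reshape(-1, 1)).ravel()
print("Restored:", restored)
```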


## Q(8)


The decision on how many principal components to retain in a PCA-based feature extraction process depends on the amount of variance you want to preserve in the dataset. Typically, you aim to retain a sufficiently high percentage of the total variance to capture the essential information in the data. A common approach is to set a threshold, such as retaining 95% or 99% of the total variance.

Here's a general guide on how to determine the number of principal components to retain:

1. **Compute the Explained Variance Ratio:**
   - After applying PCA to your dataset, the `explained_variance_ratio_` attribute of the fitted PCA object (as in scikit-learn) gives the proportion of the dataset's variance explained by each principal component.

2. **Cumulative Explained Variance:**
   - Calculate the cumulative explained variance by summing the explained variance ratios as you go through the principal components.

3. **Choose the Number of Components:**
   - Retain the smallest number of principal components whose cumulative explained variance reaches the chosen threshold.
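
The selection rule can be sketched as follows. The dataset here is synthetic (ten features generated from four hypothetical latent factors), and the 95% threshold is just one of the common choices mentioned above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: 10 observed features driven by 4 latent factors plus noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 4))
X = latent @ rng.normal(size=(4, 10)) + 0.2 * rng.normal(size=(500, 10))

# Fit PCA with all components so the full variance profile is available.
X_std = StandardScaler().fit_transform(X)
pca = PCA().fit(X_std)

# Running total of the per-component explained variance ratios.
cumulative = np.cumsum(pca.explained_variance_ratio_)

# argmax returns the first index where the condition is True;
# add 1 to turn that index into a component count.
threshold = 0.95
n_components = int(np.argmax(cumulative >= threshold)) + 1

print("Cumulative explained variance:", np.round(cumulative, 3))
print("Components needed for", threshold, "of the variance:", n_components)
```

scikit-learn can apply the same rule internally when a float between 0 and 1 is passed as `n_components`, for example `PCA(n_components=0.95, svd_solver="full")`.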