![image.png](attachment:image.png)

**ANS 1**

1. Min-Max Scaling: A technique that linearly rescales each feature to a fixed range, usually [0, 1], using the feature's minimum and maximum values.

2. Application: For example, when scaling a feature *x* from the range [a, b] to [c, d], the Min-Max scaling formula is:

   $$x' = c + \frac{(x - a)(d - c)}{b - a}$$
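
The mapping of a value from a source range [a, b] to a target range [c, d] can be sketched in NumPy as follows. This is a minimal illustration, not part of the original answer; the helper name `rescale` and the sample values are assumptions chosen for the example.

```python
import numpy as np

def rescale(x, a, b, c, d):
    """Linearly map values from the source range [a, b] onto the target range [c, d]."""
    x = np.asarray(x, dtype=float)
    if b == a:
        raise ValueError("source range must have non-zero width")
    return c + (x - a) * (d - c) / (b - a)

# Hypothetical sample values lying in the range [10, 20]
values = np.array([10.0, 15.0, 20.0])
scaled = rescale(values, a=10, b=20, c=0, d=1)
print(scaled)  # the endpoints map to 0 and 1, the midpoint to 0.5
```

Setting `c=0, d=1` gives the usual [0, 1] normalization; other targets such as [-1, 1] only change `c` and `d`.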


![image.png](attachment:image.png)

**ANS 2**

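1. Unit Vector Technique: It scales a feature vector by dividing each component by the vector's magnitude (norm), so that the scaled vector has length 1. With the Euclidean (L2) norm, each component then lies between -1 and 1, or between 0 and 1 when all values are non-negative.

2. Difference: Min-Max scaling maps each feature onto a fixed range using that feature's minimum and maximum. The unit vector technique instead fixes the length of the vector: it preserves the vector's direction but does not stretch the values to the ends of a chosen range.

3. Example: If a feature vector is [3, 4], its magnitude is 5, so the unit vector scaling would be [0.6, 0.8].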
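
The [3, 4] example can be checked with a short NumPy sketch. The helper name `to_unit_vector` is an assumption made for illustration, and the Euclidean (L2) norm is assumed as the magnitude.

```python
import numpy as np

def to_unit_vector(v):
    """Divide a vector by its Euclidean (L2) norm so the result has length 1."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0:
        raise ValueError("the zero vector has no direction to normalize")
    return v / norm

unit = to_unit_vector([3, 4])        # the norm is sqrt(3**2 + 4**2) = 5
print(unit)                          # components 0.6 and 0.8
print(np.linalg.norm(unit))          # the scaled vector has length 1

negative = to_unit_vector([-3, 4])   # components can be negative: -0.6 and 0.8
print(negative)
```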


![image.png](attachment:image.png)

**ANS 3** 


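PCA (Principal Component Analysis): It's a technique used to reduce the dimensionality of a dataset by transforming the original features into a new set of orthogonal, uncorrelated components (principal components). The components are ordered so that the first captures the most variance and each subsequent one captures the most remaining variance; keeping only the first few reduces the number of dimensions.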

Application: For instance, in a dataset with multiple correlated features, PCA can create new uncorrelated features capturing most of the data variance.
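
To illustrate how correlated features become uncorrelated components, here is a minimal NumPy sketch that computes the principal components from the eigendecomposition of the covariance matrix. The two-feature dataset is synthetic and hypothetical, generated only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two strongly correlated features
x1 = rng.normal(size=500)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=500)
X = np.column_stack((x1, x2))

# Center the data, then eigendecompose its covariance matrix
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order; sort them in descending order
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Project the centered data onto the principal components
scores = X_centered @ eigenvectors

original_corr = np.corrcoef(X, rowvar=False)[0, 1]
component_cov = np.cov(scores, rowvar=False)
explained_ratio = eigenvalues / eigenvalues.sum()

print(f"Correlation between original features: {original_corr:.3f}")
print(f"Covariance between components: {component_cov[0, 1]:.2e}")
print(f"Explained variance ratio: {explained_ratio}")
```

The off-diagonal covariance of the projected data is zero up to floating-point error, and the first component carries almost all of the variance, so the second could be dropped with little loss.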


**ANS 4**

1. Relationship: PCA is a method of feature extraction. Unlike feature selection, which keeps a subset of the original features, it transforms the existing features into a reduced set of new features (principal components) while retaining the most critical information.

2. Example: In a dataset with various correlated financial indicators, PCA can combine these features into principal components representing the most significant variations in the data.

**ANS 5**

Application: Normalize features like price, rating and delivery time to a common range (e.g. [0, 1]) using Min-Max scaling to ensure all features are on a similar scale.
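
A minimal sketch of this step using scikit-learn's `MinMaxScaler` is shown below. The feature values are made up for illustration, and the column order (price, rating, delivery time) is an assumption.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical rows: [price, rating, delivery time in minutes]
features = np.array([
    [250.0, 4.5, 30.0],
    [120.0, 3.8, 45.0],
    [400.0, 4.9, 20.0],
    [80.0, 4.1, 60.0],
])

# Fit the per-column minimum and maximum, then map each column to [0, 1]
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(features)

print(scaled.min(axis=0))  # every column now has minimum 0
print(scaled.max(axis=0))  # every column now has maximum 1

# New items are scaled with the minimum and maximum learned above
new_item = np.array([[200.0, 4.0, 35.0]])
new_scaled = scaler.transform(new_item)
print(new_scaled)
```

Reusing the fitted scaler for new items keeps them on the same scale as the data the model was built on.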

**ANS 6**


Application: Apply PCA to reduce the dimensionality of features like financial data and market trends, retaining principal components that capture the most variance while reducing computational complexity.
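
As a rough sketch of how this could look with scikit-learn, the example below standardizes the features and keeps the fewest components that explain at least 90% of the variance. The data is synthetic (ten hypothetical indicators driven by three hidden factors), and the 90% threshold is an assumption for illustration, not a recommendation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: 10 observed indicators driven by 3 hidden factors plus noise
n_samples, n_factors, n_features = 500, 3, 10
factors = rng.normal(size=(n_samples, n_factors))
loadings = rng.normal(size=(n_factors, n_features))
X = factors @ loadings + 0.1 * rng.normal(size=(n_samples, n_features))

# Standardize, then keep the fewest components explaining at least 90% of the variance.
# A float n_components in (0, 1) is interpreted as a variance threshold.
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.90, svd_solver="full"),
)
X_reduced = pipeline.fit_transform(X)

pca = pipeline.named_steps["pca"]
retained = pca.explained_variance_ratio_.sum()
print(X.shape, "->", X_reduced.shape)
print(f"Variance retained: {retained:.3f}")
```

The reduced matrix `X_reduced` can then be passed to a downstream model in place of the original features.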

**ANS 7**

In [1]:
import numpy as np

# Given dataset
data = np.array([1, 5, 10, 15, 20])

# Min-Max scaling to range [-1, 1]
min_val = data.min()
max_val = data.max()

# Apply Min-Max scaling
scaled_data = ((data - min_val) / (max_val - min_val)) * 2 - 1

print(scaled_data)



[-1.         -0.57894737 -0.05263158  0.47368421  1.        ]


**ANS 8**


The selection of the number of principal components to retain in PCA involves balancing the trade-off between reducing dimensionality and retaining enough information from the original features. There's no fixed rule for the exact number of components to retain, but it depends on the specific dataset, the variance explained by each component, and the requirements of the analysis or model.

To determine the number of principal components to retain:

1. Explained Variance Ratio: Compute the explained variance ratio for each principal component. This ratio indicates the proportion of variance explained by each component relative to the total variance in the dataset.

2. Cumulative Explained Variance: Calculate the cumulative explained variance as you consider more principal components. This cumulative variance helps understand how much information is retained when including additional components.

3. Elbow Method or Scree Plot: Plot the explained variance of each component (a scree plot), or the cumulative explained variance, against the number of components. Look for the elbow, the point where adding more components provides diminishing returns in terms of explained variance.

4. Threshold for Retention: Choose a threshold for the cumulative explained variance (e.g., retaining components explaining 95% or 99% of the variance) to strike a balance between dimensionality reduction and retaining information.



In [3]:
import numpy as np

# Generating synthetic data for demonstration purposes
np.random.seed(42)  # For reproducibility

# Creating synthetic dataset with features: [height, weight, age, gender, blood pressure]
num_samples = 1000
height = np.random.normal(170, 10, num_samples)
weight = np.random.normal(65, 12, num_samples)
age = np.random.randint(20, 60, num_samples)
gender = np.random.choice([0, 1], size=num_samples)  # Binary gender representation (0 or 1)
blood_pressure = np.random.normal(120, 10, num_samples)

# Creating the dataset
data = np.column_stack((height, weight, age, gender, blood_pressure))

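# Note: the features are left unscaled here, so the components are dominated by
# the features with the largest variance. PCA is variance-based, so in practice
# standardize the features (e.g. with sklearn's StandardScaler) before fitting.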

# Perform PCA for feature extraction
from sklearn.decomposition import PCA

pca = PCA()
pca.fit(data)

# Explained variance ratio of each principal component
explained_variance = pca.explained_variance_ratio_

# Cumulative explained variance
cumulative_variance = np.cumsum(explained_variance)

# Determine the number of components to retain (e.g., 95% variance threshold)
n_components = np.argmax(cumulative_variance >= 0.95) + 1  # Selecting components explaining 95% variance

print(f"Number of components to retain: {n_components}")


Number of components to retain: 4
