Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its
application.
Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling?
Provide an example to illustrate its application.
Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an
example to illustrate its application.
Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature
Extraction? Provide an example to illustrate this concept.
Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset
contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to
preprocess the data.
Q6. You are working on a project to build a model to predict stock prices. The dataset contains many
features, such as company financial data and market trends. Explain how you would use PCA to reduce the
dimensionality of the dataset.
Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the
values to a range of -1 to 1.
Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform
Feature Extraction using PCA. How many principal components would you choose to retain, and why?

Q1. Min-Max scaling is a data preprocessing technique that rescales a numeric feature to a fixed range, typically 0 to 1. It subtracts the feature's minimum from each observation and divides by the range (maximum minus minimum): x_scaled = (x - x_min) / (x_max - x_min). This puts all features on the same scale, which matters for algorithms that are sensitive to feature magnitudes, such as neural networks, support vector machines, and other distance- or gradient-based methods.

Example:
Suppose you have a dataset with a feature "age" ranging from 20 to 60 years. After Min-Max scaling, age 20 maps to 0, age 60 maps to 1, and age 40 maps to 0.5, so the values lie between 0 and 1 and the relative spacing between them is preserved.
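
The age example above can be checked by hand with NumPy. This is a minimal sketch; the five ages are made-up values chosen only to span the 20 to 60 range.

```python
import numpy as np

# Hypothetical ages spanning the 20-60 range from the example (illustrative values).
ages = np.array([20, 30, 40, 50, 60], dtype=float)

# Min-Max formula: (x - min) / (max - min)
age_min, age_max = ages.min(), ages.max()
scaled_ages = (ages - age_min) / (age_max - age_min)

print(scaled_ages)  # values 0, 0.25, 0.5, 0.75, 1
```

The spacing between the scaled values is proportional to the spacing between the original ages, which is what "preserving relative differences" means here.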

Q2. The Unit Vector technique, also known as normalization, rescales each sample (each row, treated as a vector of feature values) so that it has unit norm (length 1). Unlike Min-Max scaling, which works per feature (column) and maps values to a specific range, normalization works per sample and keeps only the direction of each data point in feature space, discarding its magnitude.

Example:
Consider a dataset with two features, "height" and "weight." After applying Unit Vector scaling, each data point's feature vector will be scaled such that its Euclidean length (norm) is 1, while the direction of the vector remains unchanged.
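
A small sketch of the height/weight example, using plain NumPy. The three rows are hypothetical values, and the L2 (Euclidean) norm is assumed as the norm of choice.

```python
import numpy as np

# Hypothetical samples: each row is one person, columns are [height_cm, weight_kg].
samples = np.array([
    [170.0, 65.0],
    [180.0, 80.0],
    [160.0, 55.0],
])

# Euclidean (L2) norm of each row, kept as a column so the division broadcasts per row.
row_norms = np.linalg.norm(samples, axis=1, keepdims=True)
unit_vectors = samples / row_norms

print(unit_vectors)
print(np.linalg.norm(unit_vectors, axis=1))  # every row now has length 1 (up to rounding)
```

scikit-learn's `Normalizer` (with its default `norm="l2"`) applies the same row-wise operation. Note that the height-to-weight ratio within each row is unchanged, which is the sense in which direction is preserved.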

Q3. PCA (Principal Component Analysis) is a dimensionality reduction technique used to identify patterns in high-dimensional data and express them in terms of new, orthogonal variables called principal components. It does this by transforming the original variables into a new set of variables, which are linear combinations of the original variables, ordered by the amount of variance they explain.

Example:
In a dataset with multiple correlated features such as height, weight, and age, PCA can identify the principal components that capture the most significant variations in the data, allowing for dimensionality reduction while preserving most of the variability.
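
To make the mechanics concrete, here is a minimal sketch of PCA done by hand with an SVD. The height, weight, and age values are the first three columns of the sample dataset used in the code cell for Q8; computing PCA through `np.linalg.svd` is one common approach, shown here as an illustration rather than the only way to do it.

```python
import numpy as np

# Columns: [height, weight, age], taken from the Q8 sample dataset.
X = np.array([
    [170, 65, 30],
    [175, 70, 35],
    [160, 55, 25],
    [180, 80, 40],
    [165, 60, 28],
], dtype=float)

# Standardize so that no feature dominates because of its units.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# SVD of the centered data: the rows of Vt are the principal axes,
# ordered from most to least variance explained.
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)
explained_variance = S**2 / (X_std.shape[0] - 1)
explained_variance_ratio = explained_variance / explained_variance.sum()

# Project every sample onto the first principal component (3 features -> 1).
first_pc_scores = X_std @ Vt[0]

print(explained_variance_ratio)
print(first_pc_scores)
```

Because these three features are strongly correlated in this sample, the first ratio is close to 1, so a single component keeps most of the variability.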

Q4. PCA can be used for feature extraction by transforming the original features into a smaller set of principal components that retain most of the variability in the data. These principal components can then be used as features in machine learning models.

Example:
Suppose you have a dataset with multiple features representing different aspects of a car (e.g., horsepower, engine displacement, weight). By applying PCA, you can extract principal components that are weighted combinations of these features. Depending on the weights, a component might be interpreted as something like overall performance or efficiency, and the resulting compact, uncorrelated features can be more useful for predicting car prices than the raw, correlated ones.
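
The car example could be sketched as follows. The six cars and their values are invented for illustration, and keeping two components is an arbitrary choice for the demo.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical cars: columns are [horsepower, engine_displacement_l, weight_kg].
cars = np.array([
    [110.0, 1.6, 1200.0],
    [150.0, 2.0, 1400.0],
    [200.0, 3.0, 1600.0],
    [300.0, 4.0, 1800.0],
    [ 90.0, 1.2, 1000.0],
    [250.0, 3.5, 1750.0],
])

# Standardize first, since PCA is sensitive to feature scale.
cars_std = StandardScaler().fit_transform(cars)

# Extract two new features (principal components) from the three originals.
pca = PCA(n_components=2)
extracted = pca.fit_transform(cars_std)

# Each row of components_ holds the weights of the original features in one component.
print(pca.components_)
print(extracted.shape)  # (6, 2)
```

Inspecting `pca.components_` is how you would decide whether a component has a meaningful interpretation; the `extracted` array is what you would pass to a downstream price model.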

Q5. In the food delivery service recommendation system project, Min-Max scaling can be used to preprocess features like price, rating, and delivery time, which are measured in different units and ranges. Fit the scaler on the training data to learn each feature's minimum and maximum, then apply the same transformation to any new data. Scaling every feature to the range 0 to 1 puts them on the same scale, so that a feature with larger raw values (such as price) does not dominate the others in the modeling process.
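
A minimal sketch of this preprocessing step, using `MinMaxScaler` from scikit-learn. The restaurant rows, units, and the `new_restaurant` record are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical restaurants: columns are [price, rating (1-5), delivery_time_min].
restaurants = np.array([
    [250.0, 4.5, 30.0],
    [100.0, 3.8, 45.0],
    [400.0, 4.9, 20.0],
    [150.0, 4.0, 60.0],
])

# Default feature_range is (0, 1); each column is scaled independently.
scaler = MinMaxScaler()
scaled = scaler.fit_transform(restaurants)

# A new record is transformed with the min/max learned from the fitted data.
new_restaurant = np.array([[200.0, 4.2, 35.0]])
scaled_new = scaler.transform(new_restaurant)

print(scaled)
print(scaled_new)
```

Reusing the fitted `scaler` for new records keeps their values comparable with the data the model was built on.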

Q6. In the stock price prediction project, PCA can be used to reduce the dimensionality of the dataset containing various features like company financial data and market trends. Because PCA is sensitive to feature scale, the features should be standardized first. PCA then identifies the principal components that capture the most variance, and keeping only the leading components (for example, enough to explain about 95% of the variance) gives a more compact representation that retains most of the variability and can make the predictive model faster to train and less prone to overfitting.
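
The workflow could look like the sketch below. No real stock data is used: the feature matrix is synthetic, generated from three hidden factors plus noise purely so the example runs, and the 95% variance threshold is an assumed choice.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for financial/market features (NOT real data):
# 200 observations of 20 correlated features driven by 3 hidden factors.
rng = np.random.default_rng(0)
hidden_factors = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 20))
features = hidden_factors @ mixing + 0.1 * rng.normal(size=(200, 20))

# Standardize, then keep as many components as needed to explain 95% of the variance.
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),
)
reduced = pipeline.fit_transform(features)

pca_step = pipeline.named_steps["pca"]
print("Components kept:", pca_step.n_components_)
print("Variance explained:", pca_step.explained_variance_ratio_.sum())
print("Reduced shape:", reduced.shape)
```

With real time-series data, the pipeline should be fitted on the training period only and then applied to later periods with `pipeline.transform`, so that no information from the future leaks into the components.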

In [1]:
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA

# Q7: Perform Min-Max scaling
data = np.array([1, 5, 10, 15, 20]).reshape(-1, 1)
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled_data = scaler.fit_transform(data)
print("Scaled Values:", scaled_data.flatten())

# Q8: Perform Feature Extraction using PCA
# Assuming you have a dataset with features height, weight, age, gender, blood pressure
# Create a sample dataset
dataset = np.array([
    [170, 65, 30, 1, 120],
    [175, 70, 35, 0, 130],
    [160, 55, 25, 1, 110],
    [180, 80, 40, 1, 140],
    [165, 60, 28, 0, 125]
])

# Standardize the data
mean = np.mean(dataset, axis=0)
std_dev = np.std(dataset, axis=0)
standardized_data = (dataset - mean) / std_dev

# Perform PCA
pca = PCA()
pca.fit(standardized_data)

# Determine the number of principal components to retain
variance_ratio = pca.explained_variance_ratio_
cumulative_variance_ratio = np.cumsum(variance_ratio)
num_components = np.argmax(cumulative_variance_ratio >= 0.95) + 1  # Retain components explaining 95% variance

# Perform PCA with chosen number of components
pca = PCA(n_components=num_components)
principal_components = pca.fit_transform(standardized_data)

print("Number of Principal Components:", num_components)
print("Principal Components:")
print(principal_components)

Scaled Values: [-1.         -0.57894737 -0.05263158  0.47368421  1.        ]
Number of Principal Components: 2
Principal Components:
[[-0.46777446 -0.85704616]
 [ 1.17551445  1.17162415]
 [-2.72712685 -0.80082314]
 [ 3.04878143 -0.84880437]
 [-1.02939457  1.33504952]]
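
As a cross-check on the Q7 result, the same scaling can be computed directly from the general Min-Max formula for a target range [a, b], x_scaled = a + (x - min) * (b - a) / (max - min). The variable names `low` and `high` are chosen here for illustration.

```python
import numpy as np

values = np.array([1, 5, 10, 15, 20], dtype=float)
low, high = -1.0, 1.0  # target range for Q7

# General Min-Max formula for an arbitrary target range [low, high].
scaled = low + (values - values.min()) * (high - low) / (values.max() - values.min())

print(scaled)
```

For instance, the value 5 maps to -1 + (5 - 1) * 2 / 19, which is about -0.5789 and matches the second entry of the "Scaled Values" output above.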
