## Q1: What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its application.

Answer:

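Min-Max scaling linearly rescales each feature to a fixed range, usually [0, 1] or [-1, 1]. For the [0, 1] case, each value x is mapped to (x - min) / (max - min), where min and max are taken over that feature. This preserves the relative ordering and spacing of the original values and changes only the scale. In preprocessing, it is typically applied before algorithms that are sensitive to feature magnitudes, such as distance-based or gradient-based methods.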

In [3]:
from sklearn.preprocessing import MinMaxScaler
import numpy as np

data = np.array([[10], [20], [30], [40], [50]])
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(data)
print(scaled_data)


[[0.  ]
 [0.25]
 [0.5 ]
 [0.75]
 [1.  ]]
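
As a rough sketch of how this is used in a preprocessing pipeline, the scaler can be fitted on training data and then reused on unseen data. The `test` array below is a hypothetical addition, not part of the original example; the manual formula is included only to check it against `MinMaxScaler`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])
test = np.array([[25.0], [60.0]])  # hypothetical unseen values

# Manual Min-Max formula for the [0, 1] range, computed per column
col_min = train.min(axis=0)
col_max = train.max(axis=0)
manual = (train - col_min) / (col_max - col_min)

# Fit on the training data only, then reuse the learned min and max
scaler = MinMaxScaler().fit(train)
assert np.allclose(manual, scaler.transform(train))

# Values outside the training range fall outside [0, 1]
test_scaled = scaler.transform(test)
print(test_scaled.ravel())
```

Here 25 maps to 0.375 and 60 maps to 1.25, because the scaler keeps using the training minimum (10) and maximum (50).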


## Q2: What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? Provide an example.

Answer:

The Unit Vector technique (normalization) rescales each data point (row vector) so that it has a length (norm) of 1, by dividing the vector by its norm. This is useful when the direction of the vector matters more than its magnitude.

Difference from Min-Max:

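Min-Max: Rescales each feature (column) to a fixed range, using that feature's minimum and maximum.

Unit Vector: Rescales each sample (row) to unit norm (length = 1), independently of the other samples.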

In [6]:
from sklearn.preprocessing import Normalizer

data = [[4, 3, 2]]
normalizer = Normalizer()
normalized = normalizer.fit_transform(data)
print(normalized)


[[0.74278135 0.55708601 0.37139068]]
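
To make the row-versus-column difference concrete, here is a small sketch that applies both techniques to the same matrix. The second row is a hypothetical addition so that Min-Max scaling has more than one value per column to work with.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, Normalizer

# First row is from the example above; second row is hypothetical
X = np.array([[4.0, 3.0, 2.0],
              [1.0, 2.0, 6.0]])

# Unit vector scaling: divide each ROW by its own L2 norm
manual = X / np.linalg.norm(X, axis=1, keepdims=True)
unit = Normalizer(norm="l2").fit_transform(X)
row_norms = np.linalg.norm(unit, axis=1)

# Min-Max scaling: rescale each COLUMN to [0, 1]
minmax = MinMaxScaler().fit_transform(X)

print("Unit vector rows:\n", unit)
print("Row norms:", row_norms)
print("Min-Max columns:\n", minmax)
```

After normalization every row has norm 1, while after Min-Max scaling every column spans 0 to 1.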


## Q3: What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an example.

Answer:

PCA is a statistical method that reduces the dimensionality of data while retaining as much of the variance as possible. It transforms correlated features into a set of linearly uncorrelated components, ordered by the amount of variance each one explains.

Use cases:

Reduce computation time

Remove multicollinearity

Visualize high-dimensional data

In [9]:
from sklearn.decomposition import PCA
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9]])
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)
print(X_reduced)


[[ 0.81517689]
 [-1.79187826]
 [ 0.97670137]]
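
The reduction above can be unpacked a little further. The sketch below reuses the same `X` and checks that the projection is just the centered data multiplied by the component direction; it also looks at how much variance the single component keeps and maps the reduced data back to the original space.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9]])
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)

# PCA centers the data, then projects it onto the principal direction(s)
X_centered = X - X.mean(axis=0)
manual = X_centered @ pca.components_.T
assert np.allclose(manual, X_reduced)

# Fraction of the total variance kept by the first component
ratio = pca.explained_variance_ratio_[0]
print("Explained variance ratio:", ratio)

# Approximate reconstruction in the original 2-D space
X_restored = pca.inverse_transform(X_reduced)
print(X_restored)
```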


## Q4: What is the relationship between PCA and Feature Extraction?

Answer:

PCA is a feature extraction method because it creates new features (principal components) that are combinations of the original features, capturing the most significant patterns in the data.

Example: Applying PCA to [height, weight, age] may yield a component that weights height and weight heavily, which could be interpreted as a BodySizeScore.
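
One way to see the "combinations of the original features" is to inspect the component weights (loadings). The sketch below uses made-up values for height, weight, and age, so the numbers themselves mean nothing; the column names and the `PC1`/`PC2` labels are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data, for illustration only
df = pd.DataFrame({
    "height": [170, 160, 180, 175, 165],
    "weight": [70, 58, 85, 77, 62],
    "age": [30, 45, 28, 52, 36],
})

# Standardize so that no feature dominates because of its units
X_scaled = StandardScaler().fit_transform(df)
pca = PCA(n_components=2).fit(X_scaled)

# Each row shows how one component is built from the original features
loadings = pd.DataFrame(pca.components_, columns=df.columns, index=["PC1", "PC2"])
print(loadings)
```

Whether a component deserves a name like BodySizeScore depends on which features carry the largest weights in its row.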

## Q5: How to use Min-Max scaling for a food delivery dataset (price, rating, delivery time)?

In [13]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    'price': [200, 150, 250],
    'rating': [4.5, 3.8, 4.2],
    'delivery_time': [30, 45, 20]
})

scaler = MinMaxScaler()
scaled = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled, columns=df.columns)
print(scaled_df)


   price    rating  delivery_time
0    0.5  1.000000            0.4
1    0.0  0.000000            1.0
2    1.0  0.571429            0.0


## Q6: How to use PCA for a stock price prediction dataset?

In [19]:
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assume `X` is your financial feature matrix (rows = observations, columns = numeric features); it is not defined here
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

pca = PCA(n_components=0.95)  # retain 95% variance
X_reduced = pca.fit_transform(X_scaled)
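
Since `X` is left undefined above, here is a runnable sketch of the same steps on synthetic data. The random matrix is purely a stand-in (an assumption, not real stock data): two hidden factors drive six correlated features, so a small number of components should be enough to reach 95% variance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a financial feature matrix (hypothetical data)
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
derived = factors @ rng.normal(size=(2, 4)) + 0.05 * rng.normal(size=(200, 4))
X = np.hstack([factors, derived])  # 200 rows, 6 correlated features

# Standardize, then keep enough components for 95% of the variance
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)

print("Components kept:", pca.n_components_)
print("Variance retained:", pca.explained_variance_ratio_.sum())
```

For an actual prediction task, the scaler and PCA would normally be fitted on the training period only and then applied to later data with `transform`, to avoid leaking future information.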


## Q7: Min-Max scale [1, 5, 10, 15, 20] to range [-1, 1]

In [21]:
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[1], [5], [10], [15], [20]])
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled_data = scaler.fit_transform(data)
print(scaled_data)


[[-1.        ]
 [-0.57894737]
 [-0.05263158]
 [ 0.47368421]
 [ 1.        ]]
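
For a general target range [a, b], the formula becomes x' = a + (x - min) * (b - a) / (max - min). The short check below applies it by hand with a = -1 and b = 1; the variable names are illustrative.

```python
import numpy as np

x = np.array([1.0, 5.0, 10.0, 15.0, 20.0])
a, b = -1.0, 1.0  # target range

# Shift to start at 0, stretch to width (b - a), then shift to start at a
scaled = a + (x - x.min()) * (b - a) / (x.max() - x.min())
print(scaled)
```

For example, 5 maps to -1 + (5 - 1) * 2 / 19, which is about -0.5789, matching the scikit-learn output above.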


## Q8: PCA on [height, weight, age, gender, blood pressure]

Answer:

Convert the categorical feature (gender) to a numerical one, for example 0/1.

Standardize the data.

Apply PCA.

Choose the smallest number of components whose cumulative explained variance is ≥ 95%.

In [24]:
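import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Dummy data (numeric only); columns: height, weight, age, gender, blood pressure
data = [[170, 70, 30, 1, 120], [160, 60, 25, 0, 115], [180, 80, 35, 1, 130]]
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)

pca = PCA()
pca.fit(data_scaled)
print("Explained Variance:", pca.explained_variance_ratio_)

# Select the smallest number of components whose cumulative variance is ≥ 95%
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_components = int(np.argmax(cumulative >= 0.95)) + 1  # 2 for this data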


Explained Variance: [9.45613542e-01 5.43864583e-02 4.85764838e-33]
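
The dummy data above already has gender as 0/1, so the first step is not shown. A minimal sketch of the full sequence, assuming gender arrives as the strings "M" and "F" (the labels and the 0/1 mapping are assumptions), might look like this:

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same dummy values as above, with gender as text (hypothetical labels)
df = pd.DataFrame({
    "height": [170, 160, 180],
    "weight": [70, 60, 80],
    "age": [30, 25, 35],
    "gender": ["M", "F", "M"],
    "blood_pressure": [120, 115, 130],
})

# Step 1: encode the categorical column as numbers
df_encoded = df.copy()
df_encoded["gender"] = df_encoded["gender"].map({"F": 0, "M": 1})

# Step 2: standardize all features
data_scaled = StandardScaler().fit_transform(df_encoded)

# Steps 3 and 4: a float n_components keeps enough components for ≥ 95% variance
pca = PCA(n_components=0.95)
reduced = pca.fit_transform(data_scaled)
print("Components kept:", pca.n_components_)
```

With this data the first component explains about 94.6% of the variance, just under the threshold, so two components are kept.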
