# Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its application.

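Min-Max scaling is a normalization technique that linearly rescales each feature to a fixed range, typically [0, 1] or [-1, 1]. Putting features on a common scale keeps those with large numeric ranges from dominating distance-based models such as k-nearest neighbors, and it can speed up the convergence of gradient-based algorithms. Because the range is set by the minimum and maximum, the method is sensitive to outliers.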

Example:
If we have a dataset with values [10, 20, 30, 40, 50] and we want to scale it to the range [0, 1], the formula is:
X_scaled = (X - X_min) / (X_max - X_min)
With X_min = 10 and X_max = 50, the scaled values are [0, 0.25, 0.5, 0.75, 1].
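
The calculation above can be sketched in NumPy. The helper name `min_max_scale` is made up for this illustration, and the guard for a constant feature (where X_max equals X_min) is an added assumption rather than part of the formula itself.

```python
import numpy as np


def min_max_scale(values):
    """Scale a 1-D array to [0, 1] using (X - X_min) / (X_max - X_min)."""
    values = np.asarray(values, dtype=float)
    value_range = values.max() - values.min()
    if value_range == 0:
        # Assumption: a constant feature has no spread, so map it to zeros
        # instead of dividing by zero.
        return np.zeros_like(values)
    return (values - values.min()) / value_range


scaled = min_max_scale([10, 20, 30, 40, 50])
print(scaled)  # values: 0, 0.25, 0.5, 0.75, 1
```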


# Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? Provide an example to illustrate its application.

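The Unit Vector technique rescales each sample (row) by dividing it by its norm, so the resulting vector has a length of 1. Its direction is preserved and only its magnitude changes. Min-Max scaling, by contrast, works on each feature (column) independently and maps it to a fixed range such as [0, 1].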

Example:
If we have a vector [1, 2, 3], the unit vector is calculated as:
X_scaled = X / ||X||
where ||X|| is the Euclidean norm of X. Here ||X|| = sqrt(1^2 + 2^2 + 3^2) = sqrt(14) ≈ 3.742, so X_scaled ≈ [0.267, 0.535, 0.802].
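
A minimal NumPy sketch of the same calculation follows. The function name `to_unit_vector` is hypothetical, and returning the zero vector unchanged is an assumption made here to avoid dividing by zero.

```python
import numpy as np


def to_unit_vector(vector):
    """Divide a vector by its Euclidean (L2) norm so the result has length 1."""
    vector = np.asarray(vector, dtype=float)
    norm = np.linalg.norm(vector)
    if norm == 0:
        # Assumption: the zero vector has no direction, so return it unchanged.
        return vector
    return vector / norm


unit = to_unit_vector([1, 2, 3])
print(unit)                  # approximately [0.267, 0.535, 0.802]
print(np.linalg.norm(unit))  # 1.0 up to floating-point rounding
```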


# Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an example to illustrate its application.

PCA is a linear technique that finds a new set of orthogonal axes, called principal components, ordered by how much of the data's variance each one captures. For dimensionality reduction, the data is projected onto the first k components and the rest are discarded, which keeps as much variance as possible in k dimensions.

Example:
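Consider a dataset with features [x1, x2, x3]. PCA will find new features [PC1, PC2, PC3], each a linear combination of the originals, such that PC1 captures the most variance, PC2 the second most, and so on. Keeping only PC1 and PC2 reduces the data from three dimensions to two.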
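
To make this concrete, here is a sketch of PCA computed with NumPy's SVD on three synthetic features. The data is invented for illustration (x2 is built to follow x1 closely), and the SVD route is one common way to compute PCA, not the only one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): x2 closely follows x1,
# while x3 is independent noise with a smaller spread.
x1 = rng.normal(size=200)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(scale=0.5, size=200)
X = np.column_stack([x1, x2, x3])

# PCA works on mean-centered data.
X_centered = X - X.mean(axis=0)

# The rows of Vt are the principal directions, ordered by decreasing variance.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
explained_variance = S**2 / (len(X) - 1)
explained_variance_ratio = explained_variance / explained_variance.sum()

# Keep PC1 and PC2 only: three features become two.
X_reduced = X_centered @ Vt[:2].T

print(explained_variance_ratio)  # non-increasing, sums to 1
print(X_reduced.shape)           # (200, 2)
```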


# Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature Extraction? Provide an example to illustrate this concept.

PCA is a form of feature extraction: rather than selecting a subset of the original features, it builds new ones as linear combinations of all of them. Keeping only the leading components yields a smaller feature set that retains most of the variance in the data.

Example:
If we have a dataset with 100 features, many of them correlated, PCA might reduce this to around 10 principal components that still capture the majority of the variance. The actual number depends on the data.
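
The sketch below illustrates the idea on synthetic data, under the assumption that the 100 observed features are noisy mixtures of only 10 hidden factors. It extracts 10 new features with an SVD-based PCA, then maps them back to 100 dimensions to measure how much information was lost. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_features, n_latent = 500, 100, 10

# Synthetic data (an assumption for illustration): 100 observed features that
# are noisy linear mixtures of only 10 hidden factors.
latent = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_features))
X = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))

mean = X.mean(axis=0)
X_centered = X - mean
_, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

k = 10
components = Vt[:k]                      # shape (10, 100): one weight vector per new feature
X_extracted = X_centered @ components.T  # shape (500, 10): the extracted features

# Share of the total variance kept by the first k components.
variance_kept = (S[:k] ** 2).sum() / (S**2).sum()

# Map the 10 extracted features back to 100 dimensions to see what was lost.
X_reconstructed = X_extracted @ components + mean
relative_error = np.linalg.norm(X - X_reconstructed) / np.linalg.norm(X_centered)

print(X_extracted.shape)
print(variance_kept, relative_error)
```

On real data the features are rarely this cleanly low-rank, so the share of variance kept by 10 components would need to be checked rather than assumed.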


# Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to preprocess the data.

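To preprocess the data using Min-Max scaling, we would scale each feature to the range [0, 1] independently, using that feature's own minimum and maximum. Price, rating, and delivery time are measured in different units and ranges, so this puts them on a comparable scale and keeps a wide-ranging feature such as price from dominating similarity calculations.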

Example:
price_scaled = (price - price_min) / (price_max - price_min)
rating_scaled = (rating - rating_min) / (rating_max - rating_min)
delivery_time_scaled = (delivery_time - delivery_time_min) / (delivery_time_max - delivery_time_min)
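
The three formulas above can be applied in one step with NumPy broadcasting. The sample rows below are invented purely for illustration, and the column order is an assumption.

```python
import numpy as np

# Hypothetical sample rows: [price, rating, delivery_time_minutes].
# These values are invented purely for illustration.
orders = np.array([
    [120.0, 4.5, 30.0],
    [250.0, 3.8, 45.0],
    [80.0, 4.9, 20.0],
    [400.0, 4.1, 60.0],
])

# axis=0 computes the minimum and maximum of each column (feature) separately.
col_min = orders.min(axis=0)
col_max = orders.max(axis=0)

# Broadcasting applies the per-feature formula to every row at once.
orders_scaled = (orders - col_min) / (col_max - col_min)

print(orders_scaled.min(axis=0))  # each feature now starts at 0
print(orders_scaled.max(axis=0))  # and ends at 1
```

In practice, `col_min` and `col_max` would typically be computed on the training data and reused when scaling new orders, so that all data is scaled consistently.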


# Q6. You are working on a project to build a model to predict stock prices. The dataset contains many features, such as company financial data and market trends. Explain how you would use PCA to reduce the dimensionality of the dataset.

To reduce dimensionality using PCA, we would first standardize the features so that each has zero mean and unit variance, since PCA is sensitive to feature scale. We would then apply PCA and keep the smallest number of principal components that capture a chosen share of the variance, for example 95%.

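Example:
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
X_std = StandardScaler().fit_transform(X)  # standardize first; X is the numeric feature matrix, assumed already loaded
X_reduced = PCA(n_components=0.95).fit_transform(X_std)
Passing a float between 0 and 1 as n_components keeps the smallest number of components whose cumulative explained variance reaches that share, here 95%.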


# Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the values to a range of -1 to 1.



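To map values to an arbitrary range [a, b], the Min-Max formula generalizes to:
X_scaled = a + (X - X_min) * (b - a) / (X_max - X_min)
With a = -1 and b = 1, this becomes X_scaled = 2 * (X - X_min) / (X_max - X_min) - 1.
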
import numpy as np

data = np.array([1, 5, 10, 15, 20])
data_min = np.min(data)
data_max = np.max(data)

# Scale to [0, 2] first, then shift down by 1 to reach [-1, 1].
data_scaled = 2 * (data - data_min) / (data_max - data_min) - 1
print(data_scaled)


[-1.         -0.57894737 -0.05263158  0.47368421  1.        ]


# Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform Feature Extraction using PCA. How many principal components would you choose to retain, and why?

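The right number cannot be fixed in advance, because it depends on how strongly the features are correlated. We would first encode gender numerically and standardize all five features, since they are measured in different units. We would then fit PCA, look at the cumulative explained variance ratio, and retain the smallest number of components that reaches a chosen threshold, commonly 95%.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: X is a numeric array of shape (n_samples, 5) with gender already encoded
pca = PCA().fit(StandardScaler().fit_transform(X))
explained_variance = np.cumsum(pca.explained_variance_ratio_)
n_components = np.argmax(explained_variance >= 0.95) + 1  # first component count reaching 95%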
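
The selection rule described above can be tried end to end with NumPy alone. The data below is a synthetic stand-in for the five features (the distributions and the 0/1 gender encoding are assumptions, not real measurements), so the resulting component count says nothing about a real dataset.

```python
import numpy as np

rng = np.random.default_rng(7)
n_samples = 300

# Synthetic stand-ins (assumptions, not real measurements) for
# [height, weight, age, gender, blood pressure]; gender is encoded as 0/1.
height = rng.normal(170, 10, n_samples)
weight = 0.9 * (height - 170) + rng.normal(70, 8, n_samples)
age = rng.uniform(20, 70, n_samples)
gender = rng.integers(0, 2, n_samples).astype(float)
blood_pressure = 100 + 0.5 * age + rng.normal(0, 8, n_samples)
X = np.column_stack([height, weight, age, gender, blood_pressure])

# Standardize so that no feature dominates just because of its units.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigenvalues of the covariance matrix are the variances along each component;
# eigvalsh returns them in ascending order, so reverse to get largest first.
eigenvalues = np.linalg.eigvalsh(np.cov(X_std, rowvar=False))[::-1]
cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()

# Smallest number of components whose cumulative share reaches 95%.
n_components = int(np.searchsorted(cumulative, 0.95) + 1)

print(cumulative)
print(n_components)
```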

