
### Q1. What is Min-Max Scaling, and How is it Used in Data Preprocessing? Provide an Example to Illustrate Its Application.

**Min-Max Scaling**:
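- **Definition**: Min-Max scaling linearly rescales each feature to a fixed range, typically [0, 1], so that the smallest observed value maps to 0 and the largest to 1.
- **Usage**: It is applied before models that are sensitive to feature magnitude, such as k-nearest neighbours or models trained by gradient descent. Because it depends on the minimum and maximum, it is sensitive to outliers.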
- **Formula**: 
  \[
  X_{\text{scaled}} = \frac{X - X_{\text{min}}}{X_{\text{max}} - X_{\text{min}}}
  \]
  where \( X_{\text{min}} \) and \( X_{\text{max}} \) are the minimum and maximum values of \( X \), respectively.

**Example**:
- **Dataset**: Suppose we have a dataset with feature \( \text{price} \) ranging from 50 to 500.
- **Min-Max Scaling**: Scale the feature \( \text{price} \) to the range [0, 1].
  ```python
  from sklearn.preprocessing import MinMaxScaler
  import numpy as np
  
  # Example dataset
  data = np.array([50, 150, 250, 350, 500]).reshape(-1, 1)
  
  # Min-Max scaling
  scaler = MinMaxScaler(feature_range=(0, 1))
  data_scaled = scaler.fit_transform(data)
  
  print(data_scaled)
  ```
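As a quick cross-check, the same numbers can be obtained by applying the formula directly with NumPy. This is only a sketch of the arithmetic, not a replacement for `MinMaxScaler`; the variable names are illustrative.

```python
import numpy as np

prices = np.array([50, 150, 250, 350, 500], dtype=float)

# Apply the formula directly: (x - min) / (max - min)
x_min, x_max = prices.min(), prices.max()
scaled = (prices - x_min) / (x_max - x_min)
print(np.round(scaled, 4))  # approximately 0, 0.2222, 0.4444, 0.6667, 1

# The transform is invertible: multiply by the range and add the minimum back
restored = scaled * (x_max - x_min) + x_min
print(restored)
```

The denominator here is 500 - 50 = 450, so a price of 150 maps to 100 / 450, or roughly 0.22.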

### Q2. What is the Unit Vector Technique in Feature Scaling, and How Does It Differ from Min-Max Scaling? Provide an Example to Illustrate Its Application.

**Unit Vector Technique**:
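- **Definition**: Unit vector scaling (normalization) rescales each sample, i.e. each row of the data matrix, so that it has unit norm (length 1).
- **Formula**: Each sample vector \( X \) is divided by its Euclidean norm: \( X_{\text{scaled}} = X / \|X\|_2 \).
- **Difference**: Min-Max scaling works per feature (column) and maps each feature to a fixed range using its minimum and maximum. Unit vector scaling works per sample (row): it preserves the direction of each vector and discards its magnitude.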

**Example**:
- **Dataset**: Suppose we have a dataset with two samples, [3, 4] and [1, 2], each described by two features.
- **Unit Vector Scaling**: Scale each sample to have unit norm.
  ```python
  from sklearn.preprocessing import Normalizer
  import numpy as np
  
  # Example dataset
  data = np.array([[3, 4], [1, 2]])
  
  # Unit vector scaling: Normalizer rescales each row (sample) to L2 norm 1
  scaler = Normalizer(norm='l2')
  data_scaled = scaler.fit_transform(data)
  
  print(data_scaled)
  ```
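The result can be verified by hand. The sketch below reproduces the row-wise L2 normalization with plain NumPy; the variable names are illustrative.

```python
import numpy as np

data = np.array([[3.0, 4.0], [1.0, 2.0]])

# L2 norm of each row: sqrt(3^2 + 4^2) = 5 and sqrt(1^2 + 2^2) = sqrt(5)
row_norms = np.linalg.norm(data, axis=1, keepdims=True)

# Divide every row by its own norm
unit_rows = data / row_norms

print(unit_rows)                          # the first row becomes [0.6, 0.8]
print(np.linalg.norm(unit_rows, axis=1))  # every row now has length 1
```

Note that the norms are taken along `axis=1`, i.e. per row, whereas Min-Max scaling computes its statistics per column.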

### Q3. What is PCA (Principal Component Analysis), and How is it Used in Dimensionality Reduction? Provide an Example to Illustrate Its Application.

**PCA (Principal Component Analysis)**:
- **Definition**: PCA is a dimensionality reduction technique that transforms possibly correlated variables into a set of linearly uncorrelated components, called principal components, ordered by the amount of variance they explain.
- **Usage**: It reduces the number of dimensions (features) in a dataset while retaining as much variance as possible.

**Example**:
- **Dataset**: Suppose we have a dataset with multiple correlated features.
- **PCA Application**: Apply PCA to reduce the dimensionality.
  ```python
  from sklearn.decomposition import PCA
  import numpy as np
  
  # Example dataset (random placeholder values; real data would typically contain correlated features)
  np.random.seed(0)
  data = np.random.rand(5, 3)  # 5 samples, 3 features
  
  # PCA dimensionality reduction
  pca = PCA(n_components=2)
  data_reduced = pca.fit_transform(data)
  
  print(data_reduced)
  ```
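To make the mechanics concrete, here is a minimal from-scratch sketch of PCA using an eigen-decomposition of the covariance matrix. The synthetic data and its correlation structure are assumptions chosen for illustration; this is not how scikit-learn implements PCA internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): 200 samples, 3 features, where the second
# feature is nearly a multiple of the first, so the two are strongly correlated
x1 = rng.normal(size=200)
x2 = 2.0 * x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
data = np.column_stack([x1, x2, x3])

# 1. Center each feature
centered = data - data.mean(axis=0)

# 2. Eigen-decompose the covariance matrix
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# 3. Sort components by decreasing variance
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Project onto the first two principal components
scores = centered @ eigvecs[:, :2]

retained = eigvals[:2].sum() / eigvals.sum()
print(f"Variance retained by 2 components: {retained:.3f}")

# The new components are uncorrelated: the off-diagonal covariance is ~0
print(np.round(np.cov(scores, rowvar=False), 6))
```

Because two of the three features carry almost the same information, two components are enough to retain nearly all of the variance in this example.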

### Q4. What is the Relationship Between PCA and Feature Extraction, and How Can PCA be Used for Feature Extraction? Provide an Example to Illustrate This Concept.

**PCA for Feature Extraction**:
- **Relationship**: PCA is a feature extraction method: it constructs new features (principal components) that are linear combinations of the original features.
- **Usage**: Rather than selecting a subset of the original features, it summarizes them in a few components that capture the maximum variance in the data.

**Example**:
- **Dataset**: Suppose we have a dataset with features [height, weight, age].
- **PCA for Feature Extraction**: Extract principal components.
  ```python
  from sklearn.decomposition import PCA
  import numpy as np
  
  # Placeholder data standing in for the [height, weight, age] columns
  np.random.seed(0)
  data = np.random.rand(5, 3)  # 5 samples, 3 features
  
  # PCA for feature extraction
  pca = PCA(n_components=2)
  data_transformed = pca.fit_transform(data)
  
  print(data_transformed)
  ```
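The "linear combination" idea can be made visible by printing the weights (loadings) of each component. The sketch below uses synthetic height, weight, and age values, which are assumptions for illustration, and computes PCA through an SVD of the standardized data.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_names = ["height", "weight", "age"]

# Synthetic data (assumption): weight loosely depends on height, age is independent
height = rng.normal(170, 10, size=100)
weight = 0.9 * (height - 170) + 70 + rng.normal(0, 5, size=100)
age = rng.normal(40, 12, size=100)
data = np.column_stack([height, weight, age])

# Standardize so that units (cm, kg, years) do not decide the result
z = (data - data.mean(axis=0)) / data.std(axis=0)

# Rows of vt are the principal directions (loadings)
_, _, vt = np.linalg.svd(z, full_matrices=False)
loadings = vt[:2]

# Each extracted feature is a weighted sum of the standardized originals
extracted = z @ loadings.T

for i, row in enumerate(loadings, start=1):
    terms = " + ".join(f"({w:+.2f} * {name})" for w, name in zip(row, feature_names))
    print(f"PC{i} = {terms}")
```

Each printed line is one extracted feature written out as a weighted sum of the original, standardized features. The signs of the weights are arbitrary and may flip between runs or libraries.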

### Q5. You Are Working on a Project to Build a Recommendation System for a Food Delivery Service. Explain How You Would Use Min-Max Scaling to Preprocess the Data.

**Using Min-Max Scaling for Food Delivery Recommendation System**:
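- **Dataset**: Numeric features such as price, rating, and delivery time, which are measured on very different scales.
- **Min-Max Scaling**: Rescale each feature to [0, 1] so that features with large numeric ranges (for example, delivery time in minutes) do not dominate similarity or distance computations over features with small ranges (for example, a 1 to 5 rating).
- **Implementation**: Compute each feature's minimum and maximum on the training data, apply the same transformation to validation data and to new items, and then feed the scaled features into the recommendation model.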
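A rough sketch of this preprocessing step is shown below. All values and column meanings are hypothetical, and clipping out-of-range values is just one possible policy, not the only reasonable one.

```python
import numpy as np

# Hypothetical rows: [price, rating (1-5), delivery time in minutes]
train = np.array([
    [8.0, 4.5, 30.0],
    [15.0, 3.8, 45.0],
    [22.0, 4.9, 20.0],
    [12.0, 4.1, 60.0],
])
new_items = np.array([
    [10.0, 4.0, 25.0],
    [30.0, 4.6, 75.0],  # price and delivery time exceed the training range
])

# Learn the per-column minimum and maximum from the training data only
col_min = train.min(axis=0)
col_max = train.max(axis=0)


def min_max(x):
    """Scale columns using the training minimum and maximum."""
    return (x - col_min) / (col_max - col_min)


train_scaled = min_max(train)
new_scaled = min_max(new_items)

# Unseen values can land outside [0, 1]; clipping is one simple option
new_clipped = np.clip(new_scaled, 0.0, 1.0)

print(train_scaled)
print(new_scaled)
print(new_clipped)
```

Reusing the training minimum and maximum keeps new items on the same scale as the data the model was fitted on, at the cost of occasionally producing values outside [0, 1].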

### Q6. You Are Working on a Project to Build a Model to Predict Stock Prices. Explain How You Would Use PCA to Reduce the Dimensionality of the Dataset.

**Using PCA for Stock Price Prediction**:
- **Dataset**: Contains many features related to company financial data and market trends, many of which are correlated with one another.
- **PCA Usage**: Apply PCA to replace the correlated features with a smaller set of uncorrelated components that capture most of the variation.
- **Implementation**: Standardize the features, fit PCA on the training period only so that no future information leaks into the model, retain the components that explain most of the variance, and use them as input features for the stock price prediction model.
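The workflow could look roughly like the sketch below. The data are synthetic stand-ins (random "indicators" driven by a few hidden factors), and the 90% threshold and the 240/60 chronological split are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in (assumption): 300 trading days, 12 indicators
# driven by 3 hidden factors plus a little noise
factors = rng.normal(size=(300, 3))
mixing = rng.normal(size=(3, 12))
features = factors @ mixing + 0.1 * rng.normal(size=(300, 12))

# Chronological split: fit on the past, apply to the future
train, test = features[:240], features[240:]

# Standardize with training statistics only
mean, std = train.mean(axis=0), train.std(axis=0)
train_z = (train - mean) / std
test_z = (test - mean) / std

# PCA via SVD of the standardized training data
_, singular_values, vt = np.linalg.svd(train_z, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)

# Smallest number of components reaching 90% cumulative variance
k = int(np.searchsorted(np.cumsum(explained), 0.90) + 1)

# Project both periods onto the components learned from the training period
train_reduced = train_z @ vt[:k].T
test_reduced = test_z @ vt[:k].T

print(f"Kept {k} of {features.shape[1]} components")
print(train_reduced.shape, test_reduced.shape)
```

The test period is transformed with the mean, standard deviation, and components learned from the training period, which mirrors how the model would see genuinely new data.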

### Q7. For a Dataset Containing the Values: [1, 5, 10, 15, 20], Perform Min-Max Scaling to Transform the Values to a Range of -1 to 1.

**Min-Max Scaling Example**:
- **Dataset**: [1, 5, 10, 15, 20]
- **Min-Max Scaling**: Scale values to range [-1, 1].
  ```python
  import numpy as np
  
  data = np.array([1, 5, 10, 15, 20]).reshape(-1, 1)
  min_value = -1
  max_value = 1
  
  scaled_data = min_value + (data - data.min()) * (max_value - min_value) / (data.max() - data.min())
  
  print(scaled_data)
  ```
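The same calculation can be wrapped in a small reusable function and checked against the arithmetic done by hand. The function name `min_max_scale` is illustrative and not part of any library.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly map values so min(values) -> new_min and max(values) -> new_max."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        raise ValueError("Min-Max scaling is undefined when all values are equal")
    span = (new_max - new_min) / (old_max - old_min)
    return [new_min + (v - old_min) * span for v in values]


scaled = min_max_scale([1, 5, 10, 15, 20], new_min=-1, new_max=1)
print([round(v, 4) for v in scaled])
# Worked by hand: -1 + (x - 1) * 2 / 19 gives
# -1, -11/19, -1/19, 9/19, 1  (about -1, -0.5789, -0.0526, 0.4737, 1)
```

The explicit check for a zero range avoids a division by zero when every value in the input is identical.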

### Q8. For a Dataset Containing the Features: [height, weight, age, gender, blood pressure], Perform Feature Extraction Using PCA. How Many Principal Components Would You Choose to Retain, and Why?

**PCA for Feature Extraction**:
- **Preprocessing**: Encode gender numerically (for example, as 0/1) and standardize all features first, because they are measured in different units and PCA is sensitive to scale.
- **Number of Principal Components**: Choose the smallest number of components whose cumulative explained variance ratio reaches a chosen threshold; 95% is a common rule of thumb.
- **Example**: If the first two principal components explain 95% of the variance, retain those two and drop the remaining three.

```python
from sklearn.decomposition import PCA
import numpy as np

# Example dataset (random placeholder values standing in for the five features)
np.random.seed(0)
data = np.random.rand(10, 5)  # 10 samples, 5 features

# PCA for feature extraction
pca = PCA()
pca.fit(data)

# Determine number of components to retain:
# np.argmax returns the first index where the cumulative ratio reaches 0.95
explained_variance_ratio = pca.explained_variance_ratio_
cumulative_explained_variance = np.cumsum(explained_variance_ratio)
n_components = np.argmax(cumulative_explained_variance >= 0.95) + 1

print(f"Number of principal components to retain: {n_components}")
```

In practice, the number of principal components to retain is a trade-off between reducing dimensionality and preserving enough of the data's variance for the downstream task.
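Because these five features are measured in different units, the answer can depend on whether the data are standardized before PCA. The sketch below compares both cases on synthetic data; the distributions, the 0/1 encoding of gender, and the helper `components_for` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic columns (assumption): height (cm), weight (kg), age (years),
# gender encoded as 0/1, systolic blood pressure (mmHg)
height = rng.normal(170, 10, n)
weight = 0.9 * (height - 170) + 70 + rng.normal(0, 8, n)
age = rng.uniform(20, 70, n)
gender = rng.integers(0, 2, n).astype(float)
blood_pressure = 100 + 0.5 * age + rng.normal(0, 8, n)
data = np.column_stack([height, weight, age, gender, blood_pressure])


def components_for(x, threshold=0.95):
    """Return (k, cumulative ratios) for the smallest k reaching the threshold."""
    centered = x - x.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cumulative, threshold) + 1), cumulative


k_raw, cum_raw = components_for(data)
k_std, cum_std = components_for(data / data.std(axis=0))

print("raw data:     k =", k_raw, np.round(cum_raw, 3))
print("standardized: k =", k_std, np.round(cum_std, 3))
```

Without standardization, the 0/1 gender column contributes almost nothing to the total variance simply because its numeric range is small, so its information is effectively ignored. After standardization every feature starts with equal weight, and the number of components needed to reach the threshold may change.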