## Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its application.

Min-Max scaling, also known as Min-Max normalization, is a data preprocessing technique used to scale numerical features within a specific range, typically between 0 and 1 or -1 and 1. This is done to improve the performance of machine learning algorithms that are sensitive to the scale of the input data.

Imagine you have a dataset with two features: age (ranging from 20 to 60 years) and annual income (ranging from $20,000 to $150,000). If you use these features directly in a machine learning model, the model might be biased towards the income feature due to its larger scale.

By applying Min-Max scaling:

Scaled age: (Age - 20) / (60 - 20) => Values will range from 0 to 1.

Scaled income: (Income - 20,000) / (150,000 - 20,000) => Values will range from 0 to 1.

Now, both features have the same scale (between 0 and 1), and the model won't be biased towards any specific feature based on its original scale.
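
Both expressions above are instances of one general formula. As a sketch, writing $x$ for a raw value and $x_{\min}$, $x_{\max}$ for the smallest and largest observed values of that feature (symbols introduced here for illustration only):

$$
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
$$

For instance, an age of 30 would map to (30 - 20) / (60 - 20) = 0.25, and an income of 85,000 would map to (85,000 - 20,000) / (150,000 - 20,000) = 0.5.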

Limitations to consider:

Sensitive to outliers: Outliers can significantly affect the minimum and maximum values, compressing the remaining values into a narrow part of the range. Consider handling outliers before applying Min-Max scaling.

Loss of original units: Scaled values no longer carry their original units, which can make results harder to interpret unless the transformation is inverted. The shape of the distribution itself is preserved, because Min-Max scaling is a linear transformation.

## Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? Provide an example to illustrate its application.

The Unit Vector technique rescales each data point (row) so that its length, measured by the L2 norm, equals 1. Every data point is thereby mapped onto the unit hypersphere.

Differences from Min-Max scaling:

Range: Unit Vector scales to unit length, not a specific range like 0-1 in Min-Max.

Focus: Unit Vector normalizes the entire data point, while Min-Max scales individual features.

Example:

Consider a 2D point (3, 4). Its L2 norm is sqrt(3^2 + 4^2) = 5, so its unit vector is (3/5, 4/5) = (0.6, 0.8). Both points represent the same direction but on different scales.
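
The (3, 4) example can be checked with a few lines of Python. This is a minimal sketch; the function name `to_unit_vector` is made up for illustration and is not from any library.

```python
import math

def to_unit_vector(point):
    """Scale a point so that its L2 norm (Euclidean length) is 1."""
    norm = math.sqrt(sum(x * x for x in point))
    if norm == 0:
        # The zero vector has no direction, so it cannot be normalized.
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in point]

unit = to_unit_vector([3, 4])
print(unit)  # [0.6, 0.8]
```

Note that the division uses the norm of the whole data point, not the minimum or maximum of any single feature.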

## Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an example to illustrate its application.

PCA (Principal Component Analysis) is a dimensionality reduction technique that identifies a new set of uncorrelated features, called principal components (PCs), that capture the most variance in the data. These PCs are ordered by their explained variance, allowing you to select the most informative ones while discarding less important features.

How it works:

Center the data: Subtract the mean from each feature.

Calculate the covariance matrix: This captures the linear relationships between features.

Eigenvalue decomposition: Decomposes the covariance matrix into eigenvectors (representing directions of greatest variance) and eigenvalues (indicating the variance explained by each direction).

Select PCs: Choose the top k eigenvectors corresponding to the highest eigenvalues, where k is the desired number of dimensions.

Project data: Transform the data onto the chosen principal components to obtain the reduced-dimensionality representation.
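
The five steps above can be sketched directly in NumPy. This is an illustrative implementation, not a replacement for a library routine. The function name `pca` and the random toy data are assumptions made for the example.

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    # 1. Center the data: subtract each feature's mean.
    X_centered = X - X.mean(axis=0)
    # 2. Covariance matrix of the features (columns).
    cov = np.cov(X_centered, rowvar=False)
    # 3. Eigendecomposition. eigh is meant for symmetric matrices and
    #    returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # 4. Keep the k eigenvectors with the largest eigenvalues.
    order = np.argsort(eigenvalues)[::-1][:k]
    components = eigenvectors[:, order]
    # 5. Project the centered data onto those directions.
    return X_centered @ components, eigenvalues[order]

# Toy data invented for illustration: 100 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X_reduced, variances = pca(X, k=2)
print(X_reduced.shape)  # (100, 2)
```

Each returned eigenvalue equals the sample variance of the corresponding column of `X_reduced`, which is what "variance explained by each direction" means in practice.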

Example:

Imagine you have a dataset with features like height, weight, and arm span. PCA might reveal that the first principal component captures most of the variance, representing a combined effect of height and arm span. The second component might capture weight-related information. By selecting these top PCs, you can effectively reduce dimensionality while retaining the most important information.

## Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature Extraction? Provide an example to illustrate this concept.

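PCA is itself a feature extraction technique: rather than selecting a subset of the original features, it constructs new features as linear combinations of them. It does so by:

Identifying uncorrelated features: These features, called principal components (PCs), capture the most variance in the data, representing the most informative directions.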

Discarding less important information: By selecting only the top PCs, you can effectively reduce the dimensionality of the data while retaining the most important features for capturing the underlying structure and relationships.

Example:

Imagine you have a dataset with image features like pixel intensities. Applying PCA might reveal that the first few PCs capture most of the image's visual information (e.g., edges, shapes). By selecting these top PCs, you can extract the most relevant features for tasks like image recognition, discarding redundant or less informative pixel data.
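
The sketch below illustrates the two points above on synthetic data: the extracted features are uncorrelated, and a few of them can account for most of the variance. The "images" are random placeholders built from 5 hidden patterns plus noise, not real pixel data, and `k = 5` is chosen only because of how that placeholder data is constructed. The sketch uses an SVD of the centered data, which yields the same principal directions as an eigendecomposition of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: 200 synthetic "images" of 64 pixels each,
# generated from 5 hidden patterns plus a little noise.
patterns = rng.normal(size=(5, 64))
weights = rng.normal(size=(200, 5))
images = weights @ patterns + 0.1 * rng.normal(size=(200, 64))

# Center each pixel, then take the SVD; the rows of Vt are the principal directions.
centered = images - images.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

k = 5
features = centered @ Vt[:k].T  # extracted features, shape (200, 5)

# Fraction of the total variance captured by the first k components.
explained = S**2 / np.sum(S**2)
print(features.shape, explained[:k].sum())

# The extracted features are uncorrelated: off-diagonal covariances are ~0.
feature_cov = np.cov(features, rowvar=False)
off_diagonal = feature_cov - np.diag(np.diag(feature_cov))
print(np.abs(off_diagonal).max())
```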

## Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to preprocess the data.

Identify the minimum and maximum values for each feature (price, rating, delivery time) in your dataset.

For each data point (representing a food item):

Scale the price:

Subtract the minimum price from the original price.

Divide the result by the difference between the maximum and minimum prices.

Scale the rating:

Follow the same steps as for price, using the minimum and maximum rating values.

Scale the delivery time:

Follow the same steps as for price, using the minimum and maximum delivery times.
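
As a minimal sketch of the steps above, the following applies the same scaling function to each feature column. The helper name `min_max_scale` and all item values are invented for illustration; a real dataset would supply its own columns.

```python
def min_max_scale(values):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread; map it to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical food items (one entry per item in each column).
items = {
    "price": [5.0, 12.5, 20.0],
    "rating": [3.5, 4.0, 4.5],
    "delivery_time": [20, 30, 60],
}

# Each feature is scaled independently, using its own minimum and maximum.
scaled = {name: min_max_scale(column) for name, column in items.items()}
print(scaled["price"])          # [0.0, 0.5, 1.0]
print(scaled["delivery_time"])  # [0.0, 0.25, 1.0]
```

After scaling, price, rating, and delivery time all lie in [0, 1], so none of them dominates a similarity or distance computation because of its units.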

## Q6. You are working on a project to build a model to predict stock prices. The dataset contains many features, such as company financial data and market trends. Explain how you would use PCA to reduce the dimensionality of the dataset.

Preprocess the data: Handle missing values and outliers, and make sure the features are on a similar scale. Standardization is usually needed here, because financial features are measured in very different units.

Center the data: Subtract the mean from each feature so that every feature has zero mean.

Calculate the covariance matrix: This captures the linear relationships between features.

Perform eigenvalue decomposition: Decompose the covariance matrix to obtain eigenvectors (representing directions of greatest variance) and eigenvalues (indicating the variance explained by each direction).

Select the top k principal components (PCs): Choose the PCs corresponding to the highest eigenvalues, representing the most significant directions of variance in the data. The number of PCs (k) can be determined based on a desired explained variance threshold or using techniques like the scree plot.

Project the data: Transform the original data points onto the chosen PCs to obtain the reduced-dimensionality representation. This new representation captures the most important information from the original features while discarding redundant or less informative dimensions.
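
One way to pick k from an explained-variance threshold is sketched below. The function name `choose_num_components`, the 95% threshold, and the synthetic data (10 features driven by 2 hidden factors) are all assumptions for illustration; they are not real stock data, and a different threshold may suit a given project better.

```python
import numpy as np

def choose_num_components(X, threshold=0.95):
    """Return the smallest k whose top-k components reach `threshold` of the total variance."""
    # Standardize so that features in different units contribute comparably.
    # Assumes no feature is constant (a zero standard deviation would divide by zero).
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    cov = np.cov(X_std, rowvar=False)
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigenvalues = np.linalg.eigvalsh(cov)[::-1]
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
    # First position at which the cumulative explained variance reaches the threshold.
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigenvalues))
    return k, cumulative

# Synthetic placeholder data: 500 samples, 10 features, 2 hidden factors plus noise.
rng = np.random.default_rng(42)
factors = rng.normal(size=(500, 2))
loadings = rng.normal(size=(2, 10))
X = factors @ loadings + 0.05 * rng.normal(size=(500, 10))

k, cumulative = choose_num_components(X, threshold=0.95)
print(k, cumulative[:k])
```

Plotting `cumulative` against the component index gives the cumulative version of the scree plot mentioned above.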

## Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the values to a range of -1 to 1.
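
One way to work this out, as a sketch: scaling to an arbitrary range [a, b] uses x' = a + (x - min) * (b - a) / (max - min). Here min = 1, max = 20, a = -1, and b = 1. The function name `min_max_scale_to_range` and its parameter names are made up for illustration.

```python
def min_max_scale_to_range(values, new_min=-1.0, new_max=1.0):
    """Linearly map values so that min(values) -> new_min and max(values) -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; the range cannot be scaled")
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

scaled = min_max_scale_to_range([1, 5, 10, 15, 20])
print([round(v, 4) for v in scaled])  # [-1.0, -0.5789, -0.0526, 0.4737, 1.0]
```

In exact fractions, the scaled values are -1, -11/19, -1/19, 9/19, and 1.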

## Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform Feature Extraction using PCA. How many principal components would you choose to retain, and why?