Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its 
application.
Ans:-Min-Max scaling, also known as min-max normalization, is a feature scaling technique used in data preprocessing to transform numeric features to a specific range, typically between 0 and 1. Each value x of a feature is mapped to (x - min) / (max - min), where min and max are the smallest and largest values of that feature. The purpose of Min-Max scaling is to ensure that all features contribute equally to the analysis and to prevent features with larger scales from dominating the learning process in machine learning algorithms.
Example:

Suppose you have a dataset with a feature, "House Area," representing the size of houses in square feet. The "House Area" feature has values ranging from 800 to 2000 square feet. You want to apply Min-Max scaling to bring these values into a standardized range.
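
The scaling for this example can be sketched as follows. The individual house areas are hypothetical values chosen for illustration (the text only gives the range 800 to 2000), and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical house areas in square feet; only the 800-2000 range comes from the example
house_area = np.array([800.0, 1200.0, 1500.0, 2000.0])

# Min-Max scaling: (x - min) / (max - min)
area_min, area_max = house_area.min(), house_area.max()
scaled_area = (house_area - area_min) / (area_max - area_min)

print(scaled_area)  # approximately [0, 0.3333, 0.5833, 1]
```

The smallest house maps to 0, the largest to 1, and every other value falls proportionally in between.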

Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? 
Provide an example to illustrate its application.
Ans:-The Unit Vector technique, also known as vector normalization, is a feature scaling method that divides each data point (feature vector) by its Euclidean norm (magnitude). This transforms each data point into a unit vector, meaning that its magnitude becomes 1. Unlike Min-Max scaling, which rescales each feature separately into a fixed range using that feature's minimum and maximum, Unit Vector scaling rescales each data point as a whole and preserves the ratios between its components. It is particularly useful when the direction of the feature vector is more important than its magnitude.
Example:

Suppose you have a dataset with two features, "House Area" and "Number of Bedrooms." The feature vector for one data point is [800, 2] for "House Area" and "Number of Bedrooms," respectively.
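
A minimal sketch of unit vector scaling for this data point, assuming NumPy is available (the variable names are illustrative):

```python
import numpy as np

# Feature vector from the example: [House Area, Number of Bedrooms]
point = np.array([800.0, 2.0])

# Euclidean norm: sqrt(800^2 + 2^2)
norm = np.linalg.norm(point)

# Divide by the norm so the resulting vector has magnitude 1
unit_vector = point / norm

print(norm)                         # approximately 800.0025
print(unit_vector)                  # approximately [0.999997, 0.0025]
print(np.linalg.norm(unit_vector))  # approximately 1.0
```

Because "House Area" dominates the norm, the bedroom component becomes very small. This is why unit vector scaling is often combined with a per-feature scaling step when the features have very different units.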

Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an 
example to illustrate its application.
Ans:-Principal Component Analysis (PCA) is a dimensionality reduction technique commonly used in machine learning and statistics. The primary goal of PCA is to transform high-dimensional data into a new coordinate system (principal components) where the data's variability is maximized along the axes. This allows for the reduction of the number of features (dimensions) while retaining as much of the original data's information as possible.

The steps involved in PCA are as follows:

Standardize the Data:

Standardize the features to have zero mean and unit variance. This step ensures that all features contribute equally to the analysis.
Calculate the Covariance Matrix:

Compute the covariance matrix for the standardized data. The covariance matrix represents the relationships between different features.
Calculate Eigenvectors and Eigenvalues:

Find the eigenvectors and eigenvalues of the covariance matrix. Eigenvectors represent the directions of maximum variance, while eigenvalues indicate the magnitude of variance in each direction.
Sort Eigenvectors by Eigenvalues:

Sort the eigenvectors based on their corresponding eigenvalues in descending order. The eigenvector with the highest eigenvalue represents the principal component with the highest variance.
Select Princpal Components:oose the top 
�
k eigenvectors to form a new matrix called the projection matrix. This matrix is used to transform the original data into a lower-dimensional space.
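
The steps above can be sketched with NumPy as follows. The data values and the choice k = 2 are hypothetical and only serve to illustrate the procedure. In practice a library implementation such as scikit-learn's PCA would normally be used instead.

```python
import numpy as np

# Hypothetical data: 6 samples, 3 features (values made up for illustration)
X = np.array([
    [2.5, 2.4, 1.2],
    [0.5, 0.7, 0.3],
    [2.2, 2.9, 1.1],
    [1.9, 2.2, 0.9],
    [3.1, 3.0, 1.6],
    [2.3, 2.7, 1.0],
])
k = 2  # number of principal components to keep (assumed)

# 1. Standardize each feature to zero mean and unit variance
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized data (features are columns)
cov = np.cov(X_std, rowvar=False)

# 3. Eigen-decomposition; eigh is suited to symmetric matrices
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 4. Sort eigenvectors by eigenvalue in descending order
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# 5. Keep the top k eigenvectors as the projection matrix and project the data
projection_matrix = eigenvectors[:, :k]
X_reduced = X_std @ projection_matrix

print(X_reduced.shape)  # (6, 2)
```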

Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature 
Extraction? Provide an example to illustrate this concept.
Ans:-PCA (Principal Component Analysis) and feature extraction are closely related concepts. PCA is a specific technique for feature extraction, which involves transforming the original features of a dataset into a new set of features (principal components) that capture the most significant information in the data. Feature extraction, in general, refers to methods that transform raw data into a reduced and more informative representation.

The relationship between PCA and feature extraction can be summarized as follows:

Dimensionality Reduction:

Both PCA and feature extraction in general aim to reduce the dimensionality of the data. Rather than selecting a subset of the original features (which is feature selection), they create new features, such as principal components, that represent the data in a lower-dimensional space while preserving as much of the original information as possible.
Information Retention:

PCA is designed to maximize the variance captured by the principal components. In the context of feature extraction, the emphasis is on retaining the most informative aspects of the data, discarding less critical information to simplify the representation.
Orthogonal Transformation:

PCA performs an orthogonal transformation to convert the original correlated features into a set of linearly uncorrelated principal components. This transformation facilitates the removal of redundant information and helps identify patterns in the data.

In [None]:
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Sample data
data = [[160, 55, 25],
        [165, 60, 30],
        [170, 65, 35],
        [175, 70, 40],
        [180, 75, 45]]

# Standardize the data
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)

# Apply PCA
pca = PCA(n_components=2)  # Choose the number of components
principal_components = pca.fit_transform(scaled_data)

# Display the transformed data
print("Transformed Data (Principal Components):")
print(principal_components)
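
To see how much information the extracted features retain, one option is to inspect the explained variance ratio and the component loadings. The sketch below repeats the same sample data so that it runs on its own. Since the three columns of this sample increase in lockstep, the first component is expected to carry essentially all of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same sample data as above
data = [[160, 55, 25],
        [165, 60, 30],
        [170, 65, 35],
        [175, 70, 40],
        [180, 75, 45]]

scaled_data = StandardScaler().fit_transform(data)

pca = PCA(n_components=2)
principal_components = pca.fit_transform(scaled_data)

# Share of the total variance captured by each extracted feature
print("Explained variance ratio:", pca.explained_variance_ratio_)

# Each row gives the weights of the original features in one component
print("Component loadings:")
print(pca.components_)
```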


Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset 
contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to 
preprocess the data.
Ans:-In the context of building a recommendation system for a food delivery service, Min-Max scaling can be used to preprocess the data, ensuring that the features are on a similar scale. This is particularly important when features have different units or ranges, as it helps prevent certain features from dominating the recommendation process based on their numerical scale. Here's how you can use Min-Max scaling for preprocessing:

Identify Features:

Identify the relevant features in your dataset that you want to include in the recommendation system. These could be features such as price, rating, delivery time, and any other attributes that may influence recommendations.
Understand Feature Ranges:

Examine the ranges of each feature. For example, prices may range from low to high values, ratings may range from 1 to 5, and delivery time may be measured in minutes.

In [None]:
Original Dataset:
Price | Rating | Delivery Time
------|--------|---------------
10    | 4.5    | 30
20    | 3.0    | 45
15    | 4.8    | 25

Scaled Dataset (each column scaled using its own minimum and maximum):
Price | Rating | Delivery Time
------|--------|---------------
0.0   | 0.83   | 0.25
1.0   | 0.0    | 1.0
0.5   | 1.0    | 0.0
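
A minimal sketch of applying the scaling to the original dataset above, assuming scikit-learn's MinMaxScaler with its default range of 0 to 1. Each column is scaled independently using its own minimum and maximum.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Columns: price, rating, delivery time (rows from the original dataset above)
food_data = np.array([
    [10.0, 4.5, 30.0],
    [20.0, 3.0, 45.0],
    [15.0, 4.8, 25.0],
])

# Default feature_range is (0, 1); scaling is done per column
scaler = MinMaxScaler()
scaled_food_data = scaler.fit_transform(food_data)

print(np.round(scaled_food_data, 2))
```

When new restaurants or orders arrive later, calling `scaler.transform` on them reuses the minimum and maximum learned from the fitted data, so that all records stay on the same scale.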


Q6. You are working on a project to build a model to predict stock prices. The dataset contains many 
features, such as company financial data and market trends. Explain how you would use PCA to reduce the 
dimensionality of the dataset.
Ans:-When dealing with a large number of features in a dataset for predicting stock prices, Principal Component Analysis (PCA) can be a valuable tool for reducing dimensionality while preserving the most important information. Here's a step-by-step explanation of how you would use PCA for dimensionality reduction in the context of predicting stock prices:

Understand the Dataset:

Start by understanding the structure of your dataset. Identify the features related to company financial data, market trends, and any other relevant information that may impact stock prices.
Standardize the Data:

Standardize the dataset by ensuring that all features have zero mean and unit variance. This step is crucial for PCA since it is sensitive to the scale of the features.
Apply PCA:

Use PCA to calculate the principal components of the standardized dataset. PCA will transform the original features into a set of linearly uncorrelated principal components, ordered by the amount of variance they capture.
Determine the Number of Components:

Evaluate the explained variance ratio for each principal component. The explained variance ratio indicates the proportion of the total variance in the data explained by each component. Decide on the number of components to retain based on how much variance you want to preserve. You can plot the cumulative explained variance to help make this decision.

In [None]:
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import numpy as np  # needed for np.cumsum below

# Assume X is your standardized dataset
pca = PCA()
pca.fit(X)

# Plot cumulative explained variance
plt.plot(range(1, len(pca.explained_variance_ratio_) + 1), 
         np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('Number of Components')
plt.ylabel('Cumulative Explained Variance')
plt.show()
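
As an alternative to reading the number of components off the plot, scikit-learn's PCA accepts a fraction between 0 and 1 as `n_components` and keeps just enough components to exceed that share of the variance. The sketch below uses synthetic data as a stand-in for the stock dataset. The 95% threshold, the data shape, and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the stock features: 200 samples, 10 features
# driven by 3 hidden factors plus a little noise (hypothetical data)
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
X_raw = latent @ mixing + 0.1 * rng.normal(size=(200, 10))

# Standardize before PCA, as described in the steps above
X = StandardScaler().fit_transform(X_raw)

# Keep the smallest number of components explaining more than 95% of the variance
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X)

print("Components kept:", pca.n_components_)
print("Variance retained:", pca.explained_variance_ratio_.sum())
```

The reduced matrix `X_reduced` can then be used as the input to the prediction model in place of the original features.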


Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the 
values to a range of -1 to 1.
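
One way to work this out, as a sketch: for a target range [a, b], Min-Max scaling maps each value x to a + (x - min) * (b - a) / (max - min). Here min = 1, max = 20, a = -1 and b = 1. The variable names below are illustrative.

```python
import numpy as np

values = np.array([1.0, 5.0, 10.0, 15.0, 20.0])

# Target range for the scaled values
a, b = -1.0, 1.0

# Generalized Min-Max scaling to the range [a, b]
v_min, v_max = values.min(), values.max()
scaled_values = a + (values - v_min) * (b - a) / (v_max - v_min)

print(np.round(scaled_values, 4))  # approximately -1, -0.5789, -0.0526, 0.4737, 1
```

scikit-learn's `MinMaxScaler(feature_range=(-1, 1))` applies the same formula to each column of a 2D array.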