Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its 
application.

Ans :-
Min-Max scaling, a form of feature scaling often called normalization, is a data preprocessing technique that transforms a numerical feature to a fixed range, typically 0 to 1. Each value has the feature's minimum subtracted from it and is then divided by the range (the difference between the maximum and minimum values). The purpose of Min-Max scaling is to put all features on the same scale, which matters for machine learning algorithms that are sensitive to the magnitude of features.

The formula for Min-Max scaling is as follows:

Xnormalized = (X - Xmin) / (Xmax - Xmin)

Where:

-->X is the original value of the feature.

-->Xmin is the minimum value of the feature in the dataset.

-->Xmax is the maximum value of the feature in the dataset.

-->Xnormalized is the normalized/scaled value of the feature.

Example:

Let's say you have a dataset of ages with values ranging from 20 to 60. You want to apply Min-Max scaling to this dataset. Here's how you would do it:

-->Find the minimum and maximum values of the age feature :
   
   Xmin = 20
   
   Xmax = 60
   
-->Apply the Min-Max scaling formula to each age value in the dataset:

   For an age value of 30:
   
   Xnormalized = (30 - 20) / (60 - 20) = 10/40 = 0.25
   
   For an age value of 50:
   
   Xnormalized = (50 - 20) / (60 - 20) = 30/40 = 0.75

By applying Min-Max scaling, you've transformed the original age values into a normalized range between 0 and 1, making them suitable for use in machine learning algorithms. Min-Max scaling helps algorithms that are sensitive to the scale of features (e.g., gradient descent-based algorithms and distance-based methods such as k-nearest neighbors), but it is not necessary for every algorithm; tree-based models, for example, are unaffected by it. Outliers also matter: a single extreme value stretches the range and compresses the remaining values into a narrow band, so check the distribution of your data before applying this technique.
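The arithmetic above can be checked with a short sketch. The function name `min_max_scale` and the list of ages are assumptions made for illustration; only the minimum of 20, the maximum of 60, and the values 30 and 50 come from the example.

```python
def min_max_scale(values):
    """Scale a list of numbers to the range [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has zero range; map every value to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical ages; 20 and 60 are the minimum and maximum from the example.
ages = [20, 30, 50, 60]
scaled_ages = min_max_scale(ages)
print(scaled_ages)  # [0.0, 0.25, 0.75, 1.0]
```

The handling of a constant feature (returning 0.0) is one possible choice, not the only one.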


Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? 
Provide an example to illustrate its application.

Ans :- 
The Unit Vector technique, also known as vector normalization or unit normalization, is a feature scaling method that transforms the feature vectors in a dataset so that they have a length of 1 (i.e., they become unit vectors). This technique is particularly useful when the direction of the data points is more important than their magnitude. It's commonly used in scenarios where you want to measure the similarity or distance between data points without being influenced by their original magnitudes.

The formula for unit vector normalization is as follows:

Unit Vector = X / ||X||

Where:

-->X is the original feature vector.

-->||X|| is the Euclidean norm (magnitude) of the feature vector.

This technique scales each feature vector by dividing it by its magnitude, resulting in a unit vector that points in the same direction as the original vector but has a length of 1.

Difference from Min-Max Scaling:
The key difference is what gets rescaled. Min-Max scaling works on each feature (column) separately, mapping its values to a specific range (typically between 0 and 1) using that feature's minimum and maximum. Unit Vector normalization works on each data point (row) separately, dividing the whole vector by its own norm so that its length becomes 1 while its direction stays intact.

Example:

Let's consider a dataset with two-dimensional data points representing coordinates on a Cartesian plane. We want to apply Unit Vector normalization to these data points.

Original data points:

-->Data point A: (3, 4)
-->Data point B: (1, 2)

1.Calculate the Euclidean norm (magnitude) for each data point:

For data point A:

||A|| = √(3**2 + 4**2) = 5

For data point B:

||B|| = √(1**2 + 2**2) = √5 ≈ 2.236

2.Apply the Unit Vector normalization formula to each data point:

For data point A:

Unit Vector(A) = (3,4)/5 = (3/5, 4/5)

For data point B:

Unit Vector(B) = (1,2)/√5 ≈ (1/2.236, 2/2.236) ≈ (0.447, 0.894)

After applying Unit Vector normalization, both data points have been transformed to unit vectors while preserving their directions. Each vector now has a length of exactly 1 (the decimal values shown for B are rounded), making them suitable for direction-based comparisons such as cosine similarity, or as inputs for machine learning algorithms that rely on direction rather than magnitude.
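The two steps above can be reproduced with a minimal sketch in plain Python. The function name `to_unit_vector` is an assumption made for illustration; the data points A and B are the ones from the example.

```python
import math


def to_unit_vector(vector):
    """Divide a vector by its Euclidean (L2) norm so that its length becomes 1."""
    norm = math.sqrt(sum(component ** 2 for component in vector))
    if norm == 0:
        raise ValueError("the zero vector has no direction and cannot be normalized")
    return [component / norm for component in vector]


unit_a = to_unit_vector([3, 4])  # data point A
unit_b = to_unit_vector([1, 2])  # data point B

print(unit_a)                         # [0.6, 0.8]
print([round(c, 3) for c in unit_b])  # [0.447, 0.894]
```

For whole datasets, scikit-learn's `Normalizer` applies the same row-wise L2 normalization.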


Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an 
example to illustrate its application.

Ans :-
Principal Component Analysis (PCA) is a dimensionality reduction technique commonly used in data analysis and machine learning. It's used to transform high-dimensional data into a new coordinate system, where the new dimensions (called principal components) are orthogonal and sorted in decreasing order of variance. PCA aims to capture the most important patterns or variations in the data by projecting it onto a lower-dimensional space while minimizing information loss.

Here's how PCA works:

1.Standardize the Data: Before applying PCA, it's important to standardize the data by subtracting the mean of each feature and dividing by its standard deviation. This step ensures that features with larger scales don't dominate the PCA process.

2.Compute Covariance Matrix: Calculate the covariance matrix of the standardized data. The covariance matrix shows how different features vary together.

3.Compute Eigenvectors and Eigenvalues: Find the eigenvectors and eigenvalues of the covariance matrix. The eigenvectors represent the directions (principal components) of maximum variance, and the eigenvalues represent the amount of variance captured along each eigenvector.

4.Sort Eigenvalues: Sort the eigenvalues in decreasing order. The eigenvectors corresponding to the largest eigenvalues are the most important principal components.

5.Select Principal Components: Choose the top k eigenvectors based on how much variance you want to retain in the reduced data. k is the desired number of dimensions in the reduced space.

6.Project Data: Project the standardized data onto the selected k principal components to obtain the lower-dimensional representation.

Example:

Let's say you have a dataset of two features: "Age" and "Income." You want to apply PCA to reduce the dimensionality to one dimension for visualization purposes.

Original data:

Data point 1: Age = 30, Income = 50000
Data point 2: Age = 25, Income = 60000
Data point 3: Age = 35, Income = 55000

1.Standardize the Data: Subtract the mean and divide by the standard deviation for both Age and Income.

2.Compute Covariance Matrix: Calculate the covariance matrix based on the standardized data.

3.Compute Eigenvectors and Eigenvalues: Calculate the eigenvectors and eigenvalues of the covariance matrix.

4.Sort Eigenvalues: Assume, purely for illustration, that the sorted eigenvalues are as follows:

First eigenvalue: 0.08
Second eigenvalue: 0.02

5.Select Principal Component: Choose the first eigenvector since it corresponds to the largest eigenvalue.

6.Project Data: Project the standardized data onto the first principal component.

Projected data (illustrative values, not computed from the three points above):

Data point 1: Projected value = 0.6
Data point 2: Projected value = 0.8
Data point 3: Projected value = 0.7

In this example, PCA has reduced the dimensionality from two dimensions (Age and Income) to one dimension (the first principal component), capturing the most important patterns in the data. The projected values can be plotted on a one-dimensional axis for visualization.
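The six steps can be sketched with NumPy on the three Age/Income data points. This is a minimal sketch, not a definitive implementation: the use of the population standard deviation for standardization and of `np.linalg.eigh` for the decomposition are assumptions, and the variable names are chosen for illustration.

```python
import numpy as np

# The three (Age, Income) data points from the example.
X = np.array([[30, 50000], [25, 60000], [35, 55000]], dtype=float)

# 1. Standardize each feature (population standard deviation; an assumption).
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized data (features are columns).
cov = np.cov(Z, rowvar=False)

# 3. Eigen-decomposition; eigh suits symmetric matrices and returns ascending eigenvalues.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 4. Sort eigenvalues (and their eigenvectors) in decreasing order.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# 5. Keep the first principal component.
first_component = eigenvectors[:, 0]

# 6. Project the standardized data onto it.
projected = Z @ first_component

print("eigenvalues:", eigenvalues)
print("explained variance ratio:", eigenvalues / eigenvalues.sum())
print("projected values:", projected)
```

With these three points the eigenvalues come out to 2.25 and 0.75, so the first component carries 75% of the variance. The sign of the projected values depends on the orientation of the eigenvector, which is arbitrary.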

Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature 
Extraction? Provide an example to illustrate this concept.

Ans :-
PCA and feature extraction are closely related concepts in the context of dimensionality reduction. Feature extraction is a broader term that encompasses various techniques for transforming or representing original features in a way that captures the most important information while reducing dimensionality. PCA is one specific method for feature extraction.

PCA can be used for feature extraction by transforming the original features into a new set of features (principal components) that are linear combinations of the original features. These principal components are selected in such a way that they capture the maximum variance present in the data. By focusing on the components with the most variance, PCA helps in retaining the most significant information while reducing the number of dimensions.

Example:

Consider a dataset with four features: "Height," "Weight," "Age," and "Income." You want to use PCA for feature extraction to reduce the dimensionality of the data while retaining as much variance as possible.

Original data (sample):

Data point 1: Height = 170 cm, Weight = 65 kg, Age = 30 years, Income = $50,000
Data point 2: Height = 160 cm, Weight = 55 kg, Age = 25 years, Income = $60,000
Data point 3: Height = 180 cm, Weight = 75 kg, Age = 35 years, Income = $55,000

1.Standardize the Data: As in the previous explanation, standardize the data by subtracting the mean and dividing by the standard deviation for each feature.

2.Compute Covariance Matrix: Calculate the covariance matrix based on the standardized data.

3.Compute Eigenvectors and Eigenvalues: Calculate the eigenvectors and eigenvalues of the covariance matrix.

4.Sort Eigenvalues: Assume, purely for illustration, that the sorted eigenvalues are as follows:

First eigenvalue: 0.08
Second eigenvalue: 0.05
Third eigenvalue: 0.02
Fourth eigenvalue: 0.01

5.Select Principal Components: Choose the first two eigenvectors (principal components) since they correspond to the largest eigenvalues and capture the most variance.

6.Project Data: Project the standardized data onto the selected principal components to obtain the lower-dimensional representation.

Projected data (illustrative values, not computed from the three points above):

Data point 1: Projected values = (0.6, 0.4)
Data point 2: Projected values = (-0.2, -0.1)
Data point 3: Projected values = (0.8, -0.3)

In this example, PCA has transformed the original four-dimensional data into a two-dimensional representation using the first two principal components. These new features, obtained through PCA, can be used for downstream analysis, visualization, or machine learning tasks. The projected values capture the most significant patterns in the data while reducing the dimensionality, making the data more manageable and potentially improving the performance of machine learning algorithms.
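To make the "linear combinations of the original features" concrete, here is a hedged sketch that extracts two new features from the three sample rows using a singular value decomposition, which is an equivalent route to the eigen-decomposition described above. The variable names and the choice of SVD are assumptions made for illustration.

```python
import numpy as np

feature_names = ["Height", "Weight", "Age", "Income"]
X = np.array([
    [170, 65, 30, 50000],
    [160, 55, 25, 60000],
    [180, 75, 35, 55000],
], dtype=float)

# Standardize each feature (population standard deviation; an assumption).
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# SVD of the standardized data: the rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)

k = 2
components = Vt[:k]           # shape (2, 4): weights on the original features
extracted = Z @ components.T  # shape (3, 2): the new, extracted features

# Squared singular values are proportional to the variance along each direction.
explained_ratio = S ** 2 / np.sum(S ** 2)

for i, weights in enumerate(components, start=1):
    terms = ", ".join(f"{w:+.2f} x {name}" for w, name in zip(weights, feature_names))
    print(f"PC{i} weights: {terms}")
print("explained variance ratio:", explained_ratio[:k])
print(extracted)
```

Because this toy dataset has only three rows, the centered data has rank at most two, so the first two components retain all of its variance. A realistic dataset would need far more rows for the result to be meaningful.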

Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset 
contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to 
preprocess the data.

Ans :-
To preprocess the features for building a recommendation system for a food delivery service, you can use Min-Max scaling to ensure that all the features are on a similar scale, which can improve the performance of recommendation algorithms. Here's how you would use Min-Max scaling for each feature:

1.Price Feature:
Let's say the price of items in your dataset ranges from $5 to $50. Apply Min-Max scaling to transform the price feature to a range between 0 and 1:

Scaled Price = (Price - Min Price) / (Max Price - Min Price)

Where:

-->Price is the original price of the item.
-->Min Price is the minimum price in the dataset (e.g., $5).
-->Max Price is the maximum price in the dataset (e.g., $50).

2.Rating Feature:
If the rating of items in your dataset ranges from 1 to 5, you can apply Min-Max scaling to transform the rating feature to a range between 0 and 1:

Scaled Rating = (Rating - Min Rating) / (Max Rating - Min Rating)

Where:

-->Rating is the original rating of the item.
-->Min Rating is the minimum rating in the dataset (e.g., 1).
-->Max Rating is the maximum rating in the dataset (e.g., 5).

3.Delivery Time Feature:
If the delivery time of items in your dataset ranges from 20 to 60 minutes, you can apply Min-Max scaling to transform the delivery time feature to a range between 0 and 1:

Scaled Delivery Time = (Delivery Time - Min Delivery Time) / (Max Delivery Time - Min Delivery Time)

Where:

-->Delivery Time is the original delivery time of the item.
-->Min Delivery Time is the minimum delivery time in the dataset (e.g., 20 minutes).
-->Max Delivery Time is the maximum delivery time in the dataset (e.g., 60 minutes).

After applying Min-Max scaling to all the features (price, rating, and delivery time), your data will be transformed so that each feature is within the range of 0 to 1. This ensures that features with different scales don't disproportionately affect the recommendation algorithm. Once the data is scaled, you can proceed with building your recommendation system using techniques like collaborative filtering, content-based filtering, or hybrid approaches, depending on your specific project requirements.
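As a minimal sketch of the three steps, the snippet below scales each feature column independently. The three items are hypothetical and were chosen only so that their minimum and maximum values match the ranges quoted above ($5 to $50, 1 to 5, 20 to 60 minutes); the function and variable names are likewise assumptions.

```python
def min_max_scale_column(values):
    """Scale one feature column to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # constant column: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical menu items, for illustration only.
items = [
    {"price": 5.0, "rating": 1.0, "delivery_time": 20},
    {"price": 27.5, "rating": 4.0, "delivery_time": 30},
    {"price": 50.0, "rating": 5.0, "delivery_time": 60},
]

features = ["price", "rating", "delivery_time"]

# Scale each feature separately, then reassemble one dictionary per item.
scaled_columns = {
    name: min_max_scale_column([item[name] for item in items]) for name in features
}
scaled_items = [
    {name: scaled_columns[name][row] for name in features}
    for row in range(len(items))
]

for item in scaled_items:
    print(item)
# {'price': 0.0, 'rating': 0.0, 'delivery_time': 0.0}
# {'price': 0.5, 'rating': 0.75, 'delivery_time': 0.25}
# {'price': 1.0, 'rating': 1.0, 'delivery_time': 1.0}
```

In a deployed system you would normally compute each minimum and maximum on the training data and reuse them for new items, rather than recomputing them per batch.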

Q6. You are working on a project to build a model to predict stock prices. The dataset contains many 
features, such as company financial data and market trends. Explain how you would use PCA to reduce the 
dimensionality of the dataset.

Ans :-
Using PCA to reduce the dimensionality of a dataset for predicting stock prices can help improve model efficiency, reduce noise, and prevent overfitting. Here's how you can apply PCA to the dataset with many features:

1.Data Preparation:

Organize your dataset with various features such as company financial data and market trends. Make sure the data is properly cleaned (for example, missing values handled) before it is scaled in the next step.

2.Standardize the Data:

Standardize the features by subtracting the mean and dividing by the standard deviation. This step ensures that features with larger scales do not dominate the PCA process.

3.Compute Covariance Matrix:

Calculate the covariance matrix of the standardized data. The covariance matrix shows how different features vary together.

4.Compute Eigenvectors and Eigenvalues:

Calculate the eigenvectors and eigenvalues of the covariance matrix. Eigenvectors represent the directions of maximum variance, and eigenvalues represent the amount of variance captured along each eigenvector.

5.Sort Eigenvalues:

Sort the eigenvalues in decreasing order. This allows you to identify which principal components capture the most variance.

6.Select Principal Components:

Choose the top k eigenvectors based on how much variance you want to retain in the reduced data. You can use methods like explained variance ratio to determine the number of components to retain.

7.Project Data onto Principal Components:

Project the original data onto the selected k principal components to obtain the lower-dimensional representation. This is done by taking the dot product of the standardized data with the selected eigenvectors.

8.Model Building and Evaluation:

Use the reduced-dimensional data obtained from PCA as input to build your stock price prediction model. Common algorithms like linear regression, support vector machines, or neural networks can be used. Because stock data is a time series, split it chronologically into training and testing subsets, and fit the standardization and PCA on the training subset only so that no information from the test period leaks into the model.

It's important to note that while PCA can help reduce dimensionality and enhance model performance, it also results in losing interpretability as the new features are combinations of the original ones. Additionally, when using PCA for stock price prediction, it's crucial to remember that stock prices are influenced by a wide range of factors, including economic indicators, geopolitical events, and market sentiment, which may not be fully captured by financial data and market trends. Therefore, while PCA can be a useful tool, stock price prediction remains a challenging task that requires a comprehensive understanding of financial markets and careful consideration of the data and modeling approaches used.
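The workflow above can be sketched end to end with NumPy. Everything about the data here is an assumption: the feature table is synthetic (random numbers generated from three hidden factors), the 80/20 chronological split and the 95% variance threshold are illustrative choices, and no real market data is involved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a feature table: 200 rows, 10 correlated features
# driven by 3 hidden factors plus a little noise. Not real market data.
factors = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
X = factors @ mixing + 0.1 * rng.normal(size=(200, 10))

# Chronological split: the first 80% of rows train, the rest test (no shuffling).
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]

# Standardize with statistics from the training rows only.
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
Z_train = (X_train - mean) / std
Z_test = (X_test - mean) / std

# Covariance matrix, eigen-decomposition, and sorting by decreasing eigenvalue.
cov = np.cov(Z_train, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Smallest k whose cumulative explained variance ratio reaches 95%.
cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
k = int(np.searchsorted(cumulative, 0.95) + 1)

# Project both subsets onto the k components learned from the training rows.
W = eigenvectors[:, :k]
train_reduced = Z_train @ W
test_reduced = Z_test @ W

print("components kept:", k)
print("shapes:", train_reduced.shape, test_reduced.shape)
```

The reduced arrays would then be passed to whichever prediction model you choose. In practice, scikit-learn's `StandardScaler` and `PCA` wrap the same steps.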


Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the 
values to a range of -1 to 1

In [5]:
from sklearn.preprocessing import MinMaxScaler

l1 = [[1], [5], [10], [15], [20]]

scaler = MinMaxScaler(feature_range=(-1, 1))  # the question asks for the range -1 to 1
scaler.fit(l1)



In [6]:
scaler.transform(l1)

array([[-1.        ],
       [-0.57894737],
       [-0.05263158],
       [ 0.47368421],
       [ 1.        ]])

In [7]:
scaler.fit_transform(l1)

array([[-1.        ],
       [-0.57894737],
       [-0.05263158],
       [ 0.47368421],
       [ 1.        ]])
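For a target range [a, b] other than [0, 1], the general form of the formula is Xscaled = a + (X - Xmin) * (b - a) / (Xmax - Xmin). The sketch below applies it by hand to the values from the question with a = -1 and b = 1; the function name `min_max_scale_to_range` is an assumption made for illustration. In scikit-learn, the same target range is requested through the `feature_range` parameter of `MinMaxScaler`.

```python
def min_max_scale_to_range(values, new_min, new_max):
    """Scale values linearly so that min(values) -> new_min and max(values) -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant feature: max equals min")
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


values = [1, 5, 10, 15, 20]
scaled = min_max_scale_to_range(values, -1, 1)
print([round(s, 4) for s in scaled])  # [-1.0, -0.5789, -0.0526, 0.4737, 1.0]
```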

Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform 
Feature Extraction using PCA. How many principal components would you choose to retain, and why?

Ans :-
The decision of how many principal components to retain in PCA depends on the amount of variance you want to capture and the specific goals of your analysis. Typically, you aim to retain enough principal components to explain a high percentage of the total variance in the data while reducing dimensionality.

Here's a general process for deciding the number of principal components to retain:

1.Compute Explained Variance Ratio:

Calculate the explained variance ratio for each principal component. The explained variance ratio for a component i is the proportion of the total variance that is explained by that component.

2.Cumulative Explained Variance:

Compute the cumulative explained variance by summing up the explained variance ratios from the first component to the ith component. This will give you an idea of how much variance is explained by including i components.

3.Decide on Retention Threshold:

Set a retention threshold for the cumulative explained variance. This threshold indicates the proportion of total variance you want to retain. Common choices are 90% or 95%.

4.Choose Number of Components:

Determine the number of principal components to retain based on the retention threshold you set. It's usually the number of components needed to exceed or get close to the threshold.

5.Visualization and Interpretation:

Visualize the cumulative explained variance graph and observe the point where it reaches your chosen threshold. Additionally, consider the interpretability of the principal components – you might choose to retain fewer components if they are easier to interpret.

Given that your dataset contains features like height, weight, age, gender, and blood pressure, you can perform PCA to see how much variance is explained by each principal component. You can then decide on the number of components to retain based on your goals and the explained variance.

Note that the "gender" feature is categorical, so you would need to encode it using techniques like one-hot encoding before performing PCA. The "blood pressure" feature might require careful handling as well, depending on its format (e.g., systolic and diastolic readings).

In practice, you might start by retaining a higher number of components and then analyze the cumulative explained variance to determine how many components are sufficient for your specific use case.
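The threshold rule from steps 2 to 4 can be written as a small helper. This is a sketch under stated assumptions: the function name `components_for_threshold` is hypothetical, and the five ratios are made-up numbers standing in for what a fitted PCA would report, not results computed from any dataset.

```python
from itertools import accumulate


def components_for_threshold(explained_variance_ratios, threshold=0.90):
    """Return the smallest number of leading components whose cumulative ratio meets the threshold."""
    running_totals = accumulate(explained_variance_ratios)
    for count, cumulative in enumerate(running_totals, start=1):
        # The small tolerance guards against floating-point rounding in the running sum.
        if cumulative >= threshold - 1e-12:
            return count
    return len(explained_variance_ratios)


# Hypothetical ratios for five components, sorted in decreasing order.
ratios = [0.55, 0.25, 0.12, 0.05, 0.03]
print(components_for_threshold(ratios, 0.90))  # 3
print(components_for_threshold(ratios, 0.95))  # 4
```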

In [30]:
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = [
    [165, 60, 30, 'Male', 120],
    [175, 70, 25, 'Female', 110],
    [180, 80, 35, 'Male', 130],
    [160, 50, 28, 'Female', 115],
    # ... more data points ...
]

#Separate categorical and numerical features
categorical_features = [3] #Index of the gender feature
numerical_features = [0, 1, 2, 4] #Index of height, weight, age and blood pressure features

# Extract numerical data (every numerical column, not only the first one)
numerical_data = np.array(
    [[row[i] for i in numerical_features] for row in data], dtype=float
)
# Standardize numerical data
scaler = StandardScaler()
standardized_data = scaler.fit_transform(numerical_data)

# Perform PCA
pca = PCA()
principal_components = pca.fit_transform(standardized_data)

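# Choose number of principal components
# explained_variance_ratio_ holds the share of variance explained by each component;
# the cumulative sum shows how many components are needed to reach a chosen threshold
explained_variance = pca.explained_variance_ratio_
cumulative_variance = np.cumsum(explained_variance)
print(cumulative_variance)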