Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its
application.


Min-Max scaling is a feature-scaling technique that maps the minimum of each feature to 0 and the maximum to 1. MinMaxScaler compresses each feature into a given range, usually 0 to 1. Because the transformation is linear, it changes the scale of the values without changing the shape of the original distribution. In preprocessing, it puts features measured in different units on a common scale, which matters for distance-based and gradient-based algorithms.
Min-Max scaling is computed as:

x_std = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

x_scaled = x_std * (max - min) + min

where min and max are the bounds of the target range (0 and 1 by default).


In [6]:
# import module
from sklearn.preprocessing import MinMaxScaler
 
# create data
data = [[11, 2], [3, 7], [0, 10], [11, 8]]
 
# scale features
scaler = MinMaxScaler()
# fit() learns the per-column min and max; transform() applies the scaling
scaled_data = scaler.fit_transform(data)
 
# print scaled features
print(scaled_data)

[[1.         0.        ]
 [0.27272727 0.625     ]
 [0.         1.        ]
 [1.         0.75      ]]
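
As a cross-check, the two formulas can also be applied by hand with NumPy. This is a minimal sketch, not how scikit-learn is implemented; the helper name `min_max_scale` is hypothetical, and it assumes every column has max > min.

```python
import numpy as np

# Same toy data as in the cell above
data = np.array([[11, 2], [3, 7], [0, 10], [11, 8]], dtype=float)


def min_max_scale(x, feature_range=(0.0, 1.0)):
    """Column-wise Min-Max scaling (hypothetical helper, assumes max > min)."""
    lo, hi = feature_range
    col_min = x.min(axis=0)
    col_max = x.max(axis=0)
    x_std = (x - col_min) / (col_max - col_min)  # each column now spans 0..1
    return x_std * (hi - lo) + lo                # stretch/shift to [lo, hi]


scaled = min_max_scale(data)
print(scaled)
```

The printed array should match the MinMaxScaler output shown above.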


Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling?
Provide an example to illustrate its application.


In unit-vector scaling, each feature vector (one data point) is rescaled to unit length. This usually means dividing each component by the Euclidean length of the vector (the L2 norm). In some applications (e.g., histogram features), it can be more practical to use the L1 norm of the feature vector.
Like Min-Max scaling, the Unit Vector technique produces values in the range [0, 1] when all components are non-negative; with negative components the range is [-1, 1]. This is quite useful when dealing with features with hard boundaries. For example, in image data the color values can range only from 0 to 255.

The Unit Vector technique rescales each data point (each row of the dataset) to have unit norm (i.e., a length or magnitude of 1). This removes differences in overall magnitude between data points while preserving the direction of each vector, that is, the relative proportions of its features. It is useful when the direction of the data matters more than its magnitude, for example when comparing samples by cosine similarity.

Min-Max scaling, on the other hand, rescales each feature (each column) to a fixed range, usually between 0 and 1. To apply it, you first subtract the minimum value of the feature from each value, then divide the result by the range (i.e., the difference between the maximum and minimum values). This technique is useful when you want all features to share the same scale. The key difference is that Min-Max scaling works per feature, whereas the Unit Vector technique works per data point.

Suppose we have a dataset with two features: age and income. Age ranges from 18 to 65, while income ranges from 20,000 to 200,000. We want to rescale each data point to unit length.

First, we need to calculate the norm for each data point. For a data point with age = 25 and income = 50,000, the norm would be:

norm = sqrt(age^2 + income^2) = sqrt(25^2 + 50000^2) ≈ 50,000.006

Next, we need to divide each feature by the norm to get the unit vector. For this data point, the unit vector would be:

age_unit = age / norm = 25 / 50,000.006 ≈ 0.0005
income_unit = income / norm = 50,000 / 50,000.006 ≈ 0.9999999

The unit vector represents the direction of the data point, while its length is always 1. Note that the Unit Vector technique does not by itself give the features equal weight: income still dominates here because its raw values are far larger than those of age. If the features should contribute comparably, scale each feature first (for example with Min-Max scaling) and then normalize each data point.
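
The row-wise normalization can be sketched in NumPy as follows. The three data points are hypothetical (only the first one comes from the example above), and scikit-learn's `sklearn.preprocessing.normalize` offers the same row-wise L2 normalization if you prefer a library call.

```python
import numpy as np

# Hypothetical data points: [age, income]; the first row is the worked example
X = np.array([[25.0, 50_000.0],
              [40.0, 120_000.0],
              [65.0, 20_000.0]])

# L2 norm of each row (each data point), kept as a column for broadcasting
norms = np.linalg.norm(X, axis=1, keepdims=True)

# Divide every row by its own norm to get unit vectors
X_unit = X / norms

print(norms.ravel())                   # one norm per data point
print(X_unit)                          # income component stays close to 1
print(np.linalg.norm(X_unit, axis=1))  # every row now has length 1
```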

Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an
example to illustrate its application.


Principal component analysis, or PCA, is a statistical procedure that allows you to summarize the information content in large data tables by means of a smaller set of “summary indices” that can be more easily visualized and analyzed.

Analysts looking for patterns and trends in stock prices have an extensive dataset containing numerous stocks along with dozens of variables, including closing price, trading volume, earnings per share, market liquidity and volatility, GDP, inflation, company earnings and revenue, dividend yield, international conditions, supply and demand factors, competition, and so on.

Principal components can take the multitude of variables and reduce them to the most important indices. This method finds a smaller set of values that explain most of the variation in stock prices. Importantly, PCA ranks the components by importance, helping you know which ones to focus on.
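
A minimal sketch of this idea with scikit-learn is shown below. The data is synthetic (random numbers driven by two hidden factors), standing in for a real stock dataset, so the variable names and sizes are assumptions made only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for a stock dataset: 200 observations of 6 variables,
# all driven by 2 hidden factors plus a little noise
factors = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 6))
X = factors @ loadings + 0.1 * rng.normal(size=(200, 6))

# Standardize so that no variable dominates because of its units
X_std = StandardScaler().fit_transform(X)

# Reduce 6 variables to 2 principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_std)

print(X_reduced.shape)                # (200, 2)
print(pca.explained_variance_ratio_)  # share of variance per component, largest first
```

The `explained_variance_ratio_` attribute is what ranks the components by importance.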

Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature
Extraction? Provide an example to illustrate this concept.


PCA is a dimensionality-reduction method that transforms a large set of variables into a smaller one that still contains most of the information in the large set. This is a form of feature extraction: the new variables (the principal components) are not a subset of the original features but new features constructed as linear combinations of them.

Reducing the number of variables of a data set naturally comes at the expense of accuracy, but the trick in dimensionality reduction is to trade a little accuracy for simplicity: smaller data sets are easier to explore and visualize, and machine learning algorithms can analyze them faster without extraneous variables to process.

So, to sum up, the idea of PCA is simple: reduce the number of variables of a data set while preserving as much information as possible.

For example, after applying PCA to the returns of the stocks in a portfolio, you may find that the first principal component is strongly correlated with the overall market trend. This means that the performance of the portfolio is heavily influenced by general market conditions. The second principal component may be related to the performance of a specific sector, such as technology stocks. These two components can then replace the original variables as extracted features.

Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset
contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to
preprocess the data.


For a food delivery system, there are 3 features:
price: prices will typically be in the range of 100 to 5000 (in INR)
rating: on a scale of 1 to 5 or 1 to 10
delivery time: in the range of 10 to 60 (minutes)
Because price has by far the largest numeric range, it would dominate any distance or similarity computation in the recommendation system if left unscaled. All 3 features can therefore be scaled to the range 0 to 1 using Min-Max scaling.
Formula: (value - min) / (max - min)
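
A short sketch of this preprocessing step, using made-up restaurants (all values below are hypothetical and chosen to cover the ranges listed above):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical restaurants: [price (INR), rating (1-5), delivery time (min)]
restaurants = np.array([
    [100.0, 3.5, 60.0],
    [450.0, 4.2, 35.0],
    [1200.0, 4.8, 25.0],
    [5000.0, 5.0, 10.0],
    [300.0, 1.0, 45.0],
])

scaler = MinMaxScaler()  # default feature_range=(0, 1)
scaled = scaler.fit_transform(restaurants)

print(scaler.data_min_, scaler.data_max_)  # per-feature min and max learned by fit
print(scaled.round(3))

# A new restaurant is scaled with the min/max learned above, without refitting
new_item = np.array([[800.0, 4.0, 30.0]])
print(scaler.transform(new_item).round(3))
```

Fitting the scaler once and reusing it for new items keeps every restaurant on the same scale.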

Q6. You are working on a project to build a model to predict stock prices. The dataset contains many
features, such as company financial data and market trends. Explain how you would use PCA to reduce the
dimensionality of the dataset.


PCA is generally applied to data points whose features are independent of time. For stock data, each time-series realization (for example, the percent change in value per day) is treated as another feature of the stock, i.e., we treat all realizations of the percent change in stock value as independent of each other. This may not be a good modeling assumption, since there may be temporal influence.

When building a model to predict stock prices, we might have a dataset with many features, such as company financial data and market trends. However, using all of these features in a machine learning model can lead to the "curse of dimensionality," where the model becomes overly complex and overfits the training data. To address this issue, we can use Principal Component Analysis (PCA) to reduce the dimensionality of the dataset.

Here's how we would use PCA to reduce the dimensionality of the dataset:

Standardize the data: We first standardize the data by scaling each feature to have a mean of 0 and a standard deviation of 1. This ensures that all features are on the same scale and have equal importance in the analysis.

Calculate the covariance matrix: We then calculate the covariance matrix of the standardized data, which measures the relationships between the different features.

Perform eigendecomposition: We perform eigendecomposition on the covariance matrix to calculate the principal components of the data. Each principal component is a linear combination of the original features and represents a different axis in the data. The first principal component explains the most variance in the data, the second explains the second-most variance, and so on.

Choose the number of principal components: We can use the scree plot or cumulative variance plot to decide how many principal components to keep. For example, we might decide to keep the first 10 principal components, which explain 80% of the variance in the data.

Transform the data: Finally, we project the standardized data onto the selected principal components. The resulting component scores are new features, each a linear combination of the original ones, and can be used as input for machine learning algorithms.

By using PCA to reduce the dimensionality of the dataset, we let the machine learning algorithm work with a small number of uncorrelated components that capture most of the variance in the data. This can reduce overfitting and training time, which may help us build a more accurate and efficient model to predict stock prices.
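
The five steps can be sketched directly in NumPy. This is an illustrative sketch on synthetic data (the array sizes and the 80% threshold are assumptions taken from the example in the text), not a production pipeline; in practice `sklearn.decomposition.PCA` does the same work.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for the stock dataset: 300 rows, 8 correlated features
hidden = rng.normal(size=(300, 3))
X = hidden @ rng.normal(size=(3, 8)) + 0.2 * rng.normal(size=(300, 8))

# 1. Standardize: mean 0 and standard deviation 1 per feature
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix (features are in columns)
cov = np.cov(X_std, rowvar=False)

# 3. Eigendecomposition; eigh is meant for symmetric matrices and returns
#    eigenvalues in ascending order, so sort them in descending order
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Choose k: the smallest number of components reaching 80% of the variance
explained = eigvals / eigvals.sum()
cumulative = np.cumsum(explained)
k = int(np.searchsorted(cumulative, 0.80) + 1)

# 5. Transform: project the standardized data onto the first k components
X_pca = X_std @ eigvecs[:, :k]

print(cumulative.round(3))
print(k, X_pca.shape)
```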

Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the
values to a range of -1 to 1.


In [9]:
from sklearn.preprocessing import MinMaxScaler
import pandas as pd

In [14]:
min_max=MinMaxScaler(feature_range=(-1,1))

In [11]:
values=[1, 5, 10, 15, 20]

In [12]:
df=pd.DataFrame(data=values)
df

    0
0   1
1   5
2  10
3  15
4  20


In [16]:
df_scale = pd.DataFrame(min_max.fit_transform(df[[0]]))
df_scale

          0
0 -1.000000
1 -0.578947
2 -0.052632
3  0.473684
4  1.000000


Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform
Feature Extraction using PCA. How many principal components would you choose to retain, and why?

Method 1: If your sole intention of doing PCA is for data visualization, you should select 2 or 3 principal components.
PCA is extremely useful for data visualization. Visualization of high-dimensional data can be achieved through PCA.
Since we are only familiar with 2D and 3D plots, we should convert high-dimensional data into 2 or 3-dimensional data to visualize them on 2D or 3D plots. This can be achieved through PCA.

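Method 2: If you want a specific share of the variance to be kept in the data after applying PCA, pass a float between 0 and 1 to the hyperparameter n_components.

Method 3: Plot the explained variance percentage of individual components and the cumulative percentage of variance captured as components are added.
This shows where the cumulative explained variance levels off, so the number of components can be chosen from the data rather than fixed in advance.

For this dataset, gender is categorical and must be encoded numerically before PCA, and all five features should be standardized. With only five features, a reasonable choice is the smallest number of components that reaches a target share of the variance (for example 90%), which has to be read off the explained-variance output for the actual data.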
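
Methods 2 and 3 can be sketched as follows. The data is entirely synthetic: the relationships between height, weight, age, gender, and blood pressure below are invented for illustration, so the number of components it produces says nothing about real patient data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 500

# Hypothetical, synthetic data; every coefficient below is an assumption
gender = rng.integers(0, 2, size=n)                  # encoded as 0/1
age = rng.uniform(18, 80, size=n)                    # years
height = 160 + 12 * gender + rng.normal(0, 6, n)     # cm
weight = 0.9 * (height - 100) + rng.normal(0, 8, n)  # kg
blood_pressure = 100 + 0.5 * age + 0.2 * weight + rng.normal(0, 8, n)

X = np.column_stack([height, weight, age, gender, blood_pressure])
X_std = StandardScaler().fit_transform(X)

# Method 3: inspect the cumulative explained variance of all 5 components
full = PCA().fit(X_std)
cumulative = np.cumsum(full.explained_variance_ratio_)
print(cumulative.round(3))

# Method 2: keep the fewest components that explain at least 90% of the variance
pca = PCA(n_components=0.90, svd_solver="full")
X_reduced = pca.fit_transform(X_std)
print(pca.n_components_, X_reduced.shape)
```

The fitted attribute `n_components_` reports how many components were kept for the requested variance share.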