# Assignment

### Ans1)

Min-Max scaling is a data preprocessing technique used to normalize features to a specified range, typically [0, 1]. The formula for Min-Max scaling is:

x_scaled = (x - x_min) / (x_max - x_min)


Here's an example of Min-Max scaling in Python:

```python
import numpy as np

# Original feature values
x = np.array([2, 5, 10, 15, 20])

# Compute the minimum and maximum values
x_min = np.min(x)
x_max = np.max(x)

# Scale the feature values to [0, 1]
x_scaled = (x - x_min) / (x_max - x_min)

print(x_scaled)
```

Output:

```
[0.         0.16666667 0.44444444 0.72222222 1.        ]
```


### Ans2)

The unit vector technique rescales a vector so that its magnitude (Euclidean length) is 1. The direction of the vector is preserved, but its length is normalized. In scikit-learn's `Normalizer`, the scaling is applied to each sample (row) rather than to each feature (column), which is useful when only the direction of a sample matters, for example when comparing samples by cosine similarity.

Min-Max scaling maps each feature to a fixed range (usually [0, 1]) using that feature's minimum and maximum. Unit vector scaling instead divides each vector by its own norm, so the result always has length 1 and every component lies in [-1, 1], but the minimum and maximum are not pinned to the ends of a range.

Here's an example of how to use unit vector scaling in Python using the scikit-learn library:

```python
from sklearn.preprocessing import Normalizer
import numpy as np

# create a sample dataset
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# apply unit vector scaling
transformer = Normalizer().fit(X)
X_normalized = transformer.transform(X)

print(X_normalized)
```

Output:

```
[[0.26726124 0.53452248 0.80178373]
 [0.45584231 0.56980288 0.68376346]
 [0.50257071 0.57436653 0.64616234]]
```
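As a cross-check, the same row-wise normalization can be sketched with plain NumPy. This assumes the default L2 norm of `Normalizer`; the variable names `row_norms` and `X_unit` are illustrative only.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

# Euclidean (L2) length of each row, kept as a column for broadcasting
row_norms = np.linalg.norm(X, axis=1, keepdims=True)

# Divide each sample by its own length so every row becomes a unit vector
X_unit = X / row_norms

# Length of each row after scaling
print(np.linalg.norm(X_unit, axis=1))
```

Each row of `X_unit` should have length 1, which is the defining property of unit vector scaling.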


### Ans3)

Principal Component Analysis (PCA) is a statistical technique used for dimensionality reduction. It helps identify patterns in data by reducing the number of variables while retaining most of the information in the original dataset.

In PCA, the data are transformed into a new coordinate system whose axes, the principal components, are uncorrelated linear combinations of the original variables. The first principal component is the direction of highest variance in the data, the second is the direction orthogonal to the first with the next highest variance, and so on. There are at most as many principal components as original variables (and no more than the number of samples); the reduction in dimensionality comes from keeping only the first few.

An example of its application is in facial recognition, where a dataset of images of faces can be represented in a lower dimensional space using PCA. Each image is first flattened into a vector of pixel values. PCA is then applied to this dataset to identify the principal components that explain the most variance in the data. The images can then be reconstructed using only a subset of these principal components, resulting in a lower-dimensional representation of the original images. This can be used for efficient storage and retrieval of large datasets, as well as for identifying patterns in the data, such as common facial features or expressions.
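The project-then-reconstruct idea can be sketched with NumPy's SVD. This is a minimal illustration, not a face-recognition pipeline: the data are synthetic stand-ins for flattened images, and the sizes (200 samples, 50 "pixels", 5 hidden factors) are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for flattened images: 200 samples, 50 "pixels",
# generated from 5 hidden factors plus a little noise
latent = rng.normal(size=(200, 5))
mixing = rng.normal(size=(5, 50))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 50))

# 1. Center each feature
mean = X.mean(axis=0)
X_centered = X - mean

# 2. SVD of the centered data: rows of Vt are the principal directions
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# 3. Fraction of total variance explained by each component
explained_ratio = S**2 / np.sum(S**2)

# 4. Project onto the first k components, then reconstruct
k = 5
Z = X_centered @ Vt[:k].T            # lower-dimensional representation
X_reconstructed = Z @ Vt[:k] + mean  # back in the original 50-dim space

relative_error = np.linalg.norm(X - X_reconstructed) / np.linalg.norm(X)
print(Z.shape, explained_ratio[:k].sum(), relative_error)
```

Because the synthetic data really have only 5 underlying factors, the first 5 components should capture nearly all of the variance and the reconstruction error should be small. Real images rarely compress this cleanly.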

### Ans4)

PCA can be used for feature extraction, which is the process of identifying a smaller set of features or variables that best represent the underlying structure of a dataset. Feature extraction is often used in machine learning tasks, such as classification or clustering, to reduce the dimensionality of the input data and improve the accuracy and efficiency of the algorithms.

In PCA, the principal components are linear combinations of the original features, and the leading components can be used as a new, smaller feature set. This is a form of feature extraction: the directions that carry the most variance are retained, while the low-variance directions are discarded. The resulting reduced feature set can be used for further analysis or modeling.

For example, consider a dataset of images of handwritten digits, where each image is represented as a vector of pixel values. The dimensionality of the dataset can be reduced using PCA, by identifying the principal components that capture the most variation in the images. The principal components can be interpreted as features that represent the most important patterns or structures in the images, such as the orientation and thickness of the strokes. These reduced features can then be used as input to a classification algorithm, to identify the digit represented in each image.
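A minimal sketch of this workflow, using scikit-learn's bundled 8x8 digits dataset. The choice of 16 components and of a k-nearest-neighbors classifier are assumptions made for illustration, not recommendations.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# 8x8 handwritten digit images, already flattened to 64 pixel features
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# PCA extracts 16 features; the classifier only ever sees those 16
model = make_pipeline(PCA(n_components=16), KNeighborsClassifier())
model.fit(X_train, y_train)

X_test_reduced = model.named_steps["pca"].transform(X_test)
accuracy = model.score(X_test, y_test)
print(X_test.shape, "->", X_test_reduced.shape, "accuracy:", accuracy)
```

Putting PCA inside the pipeline means the components are learned from the training split only, so the test images do not leak into the feature extraction step.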

### Ans5)

In order to build an effective recommendation system for a food delivery service, it is important to preprocess the data before applying any machine learning algorithms. One common preprocessing step is Min-Max scaling, which is a technique used to transform the data to a common scale.

Min-Max scaling can be applied to the features such as price, rating, and delivery time. For example, if the price range is between 0 and 100, and the rating range is between 1 and 5, the Min-Max scaling formula can be applied as follows:

scaled_price = (price - min_price) / (max_price - min_price)

scaled_rating = (rating - min_rating) / (max_rating - min_rating)

where min_price and max_price are the minimum and maximum values of the price feature, respectively, and min_rating and max_rating are the minimum and maximum values of the rating feature, respectively.

By applying Min-Max scaling, the features are put on a common [0, 1] scale, so a feature with a large numeric range (such as price) does not dominate one with a small range (such as rating) when a model computes distances or similarities. This makes the features comparable and helps the recommendation model weigh them more evenly.
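The per-feature formulas above can be applied to all columns at once. The sketch below uses made-up rows of `[price, rating, delivery_time_minutes]`; the values and column order are hypothetical.

```python
import numpy as np

# Hypothetical rows: [price, rating, delivery_time_minutes]
data = np.array([
    [12.0, 4.5, 30.0],
    [25.0, 3.8, 45.0],
    [8.0, 4.9, 20.0],
    [40.0, 4.1, 60.0],
])

# Column-wise minimum and maximum, one pair per feature
col_min = data.min(axis=0)
col_max = data.max(axis=0)

# Apply the Min-Max formula to each column independently
scaled = (data - col_min) / (col_max - col_min)

print(scaled)
```

In practice the minimum and maximum would be computed on the training data and reused for new data, and a column with a single constant value would need special handling to avoid division by zero.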

### Ans6)

When building a model to predict stock prices, it is common to have a large number of features in the dataset, such as company financial data and market trends. However, having a high number of features can lead to overfitting and decreased model performance. Therefore, it is important to reduce the dimensionality of the dataset to improve the accuracy and efficiency of the model. Principal Component Analysis (PCA) is a technique that can be used to reduce the dimensionality of the dataset while retaining most of the important information.

To use PCA for dimensionality reduction in the stock price prediction dataset, the following steps can be taken:

1) Standardize the data: Before applying PCA, the data should be standardized to ensure that each feature is on the same scale. This is important because PCA is sensitive to the scale of the data.

2) Determine the number of principal components: The number of principal components to retain depends on the amount of variance explained by each component. A common rule of thumb is to select the number of components that explain at least 80% of the total variance in the data.

3) Apply PCA: Once the number of principal components has been determined, PCA can be applied to the standardized data to identify the principal components that capture the most variation in the dataset.

4) Interpret the results: The principal components can be interpreted as linear combinations of the original features, and can be used as a reduced set of features in the model. The features with the highest weights in each principal component can provide insights into the most important variables in the dataset.
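The four steps above can be sketched with scikit-learn. The data here are synthetic (300 rows, 12 features driven by 3 hidden factors) and stand in for a real stock dataset, which is not available in this document; the 0.80 threshold follows the rule of thumb in step 2.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical stand-in for a stock dataset: 300 days, 12 features on very
# different scales, driven by 3 hidden factors plus noise
factors = rng.normal(size=(300, 3))
loadings = rng.normal(size=(3, 12))
scales = np.logspace(0, 4, 12)
X = (factors @ loadings + 0.3 * rng.normal(size=(300, 12))) * scales

# Step 1: standardize to zero mean and unit variance per feature
X_std = StandardScaler().fit_transform(X)

# Steps 2-3: a float n_components asks PCA to keep just enough components
# to explain at least that fraction of the variance
pca = PCA(n_components=0.80, svd_solver="full")
X_reduced = pca.fit_transform(X_std)

# Step 4: each row of components_ holds the feature weights of one component
top_feature = int(np.argmax(np.abs(pca.components_[0])))
print("components kept:", pca.n_components_)
print("variance explained:", pca.explained_variance_ratio_.sum())
print("feature with largest weight in PC1:", top_feature)
```

For time-ordered data such as stock prices, the scaler and PCA would normally be fit on the training period only and then applied to later periods.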

Reducing the dimensionality of the stock price dataset with PCA can lower the risk of overfitting and the cost of training, although it does not by itself guarantee better accuracy. PCA is also a linear technique and may not be suitable when the relationships between features are non-linear. In such cases, non-linear techniques such as kernel PCA may be more appropriate.

### Ans7)

To perform Min-Max scaling on the given dataset and transform the values to a range of -1 to 1, we can use the following formula:

scaled_value = 2 * (value - min_value) / (max_value - min_value) - 1

where value is the original value, min_value is the minimum value in the dataset, and max_value is the maximum value in the dataset.

In this case, the minimum value is 1 and the maximum value is 20. Applying the formula for each value in the dataset, we get:

scaled_1 = 2 * (1 - 1) / (20 - 1) - 1 = -1.0000

scaled_5 = 2 * (5 - 1) / (20 - 1) - 1 = -0.5789

scaled_10 = 2 * (10 - 1) / (20 - 1) - 1 = -0.0526

scaled_15 = 2 * (15 - 1) / (20 - 1) - 1 = 0.4737

scaled_20 = 2 * (20 - 1) / (20 - 1) - 1 = 1.0000

Therefore, the Min-Max scaled values for the dataset [1, 5, 10, 15, 20] in the range of -1 to 1 are: [-1.0000, -0.5789, -0.0526, 0.4737, 1.0000].
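The formula can be checked numerically with a short NumPy sketch. The helper name `min_max_scale` and its `new_min`/`new_max` parameters are illustrative, not part of any library.

```python
import numpy as np

def min_max_scale(values, new_min=-1.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    values = np.asarray(values, dtype=float)
    v_min, v_max = values.min(), values.max()
    if v_max == v_min:
        raise ValueError("cannot scale a constant array")
    unit = (values - v_min) / (v_max - v_min)    # first map to [0, 1]
    return unit * (new_max - new_min) + new_min  # then stretch and shift

scaled = min_max_scale([1, 5, 10, 15, 20])
print(np.round(scaled, 4))
```

With the defaults, `unit * 2 - 1` is the same expression as the formula above; the smallest input always maps to -1 and the largest to 1.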

### Ans8)

To perform feature extraction using PCA on the given dataset, we can take the following steps:

1) Standardize the data: Before applying PCA, we need to standardize the data to ensure that each feature is on the same scale.

2) Apply PCA: We can then apply PCA to the standardized data to identify the principal components that capture the most variation in the dataset.

3) Determine the number of principal components to retain: We need to determine the number of principal components to retain based on the amount of variance explained by each component. A common rule of thumb is to select the number of components that explain at least 80% of the total variance in the data.

However, it is difficult to say how many principal components to retain without knowing the specifics of the dataset and the problem being solved. In general, we want to retain enough principal components to capture most of the important information in the dataset while minimizing the amount of noise and redundancy in the data.
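One way to make the choice concrete is to compute the cumulative explained variance and pick the smallest number of components that reaches the threshold. The sketch below uses a synthetic 5-feature dataset, since the actual data are not given here; the 0.80 threshold is the rule of thumb mentioned above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dataset: 100 samples, 5 features, two of them strongly correlated
X = rng.normal(size=(100, 5))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=100)

# Step 1: standardize each feature
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: eigenvalues of the covariance matrix are the component variances
cov = np.cov(X_std, rowvar=False)
eigenvalues = np.linalg.eigvalsh(cov)[::-1]  # sorted largest first

# Step 3: smallest k whose cumulative explained variance reaches the threshold
threshold = 0.80
cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
k = int(np.searchsorted(cumulative, threshold) + 1)

print(np.round(cumulative, 3), "-> keep", k, "components")
```

The resulting `k` depends entirely on the data: strongly correlated features concentrate variance in fewer components, while independent features spread it out and push `k` up.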