# Q1. Min-Max scaling, also known as normalization, is a data preprocessing technique used to transform numeric data to a specific range, typically between 0 and 1. It rescales the original values by subtracting the minimum value and dividing by the range (maximum value minus minimum value).

# The formula for Min-Max scaling is:
# scaled_value = (x - min) / (max - min)

# For example, consider a dataset of housing prices where the minimum price is $50,000 and the maximum price is $500,000. To apply Min-Max scaling, subtract the minimum value ($50,000) from each price and divide the result by the range ($500,000 - $50,000 = $450,000). This transformation brings the prices into the range 0 to 1 while preserving their order and relative distances.
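# A minimal plain-Python sketch of the formula above. The function name `min_max_scale` and the price list are hypothetical, chosen only so that the $50,000 minimum and $500,000 maximum match the example. Mapping a zero range (all values equal) to 0.0 is an assumption of this sketch.

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: the range is zero, so map everything to 0.0
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]

# Hypothetical housing prices spanning the $50,000 to $500,000 range of the example
prices = [50_000, 140_000, 275_000, 500_000]
scaled = min_max_scale(prices)
print(scaled)  # [0.0, 0.2, 0.5, 1.0]
```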

# Q2. The Unit Vector technique, also known as normalization, rescales each data point (the vector of its feature values) to have unit norm (length 1). Unlike Min-Max scaling, which brings values into a specific range, Unit Vector scaling keeps only the direction of each data point, so points are compared by their orientation rather than their magnitude.

# The formula for Unit Vector scaling is:
# scaled_vector = vector / ||vector||

# For example, consider a dataset with two features: height and weight. To apply Unit Vector scaling, calculate the Euclidean norm of each data point (the square root of the sum of the squares of its feature values) and divide each of that point's feature values by this norm. The resulting vectors all have length 1, so they are compared by their orientation rather than their absolute magnitude.
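# A short sketch of Unit Vector scaling applied one data point at a time, assuming the Euclidean (L2) norm. The helper `to_unit_vector` and the (height, weight) values are hypothetical. The first two points differ in magnitude but point in the same direction, so both map to the same unit vector.

```python
import math

def to_unit_vector(point):
    """Divide each feature value of one data point by that point's Euclidean (L2) norm."""
    norm = math.sqrt(sum(v * v for v in point))
    if norm == 0:
        # The zero vector has no direction; return it unchanged
        return list(point)
    return [v / norm for v in point]

# Hypothetical (height, weight) pairs; the units are illustrative only
points = [(3.0, 4.0), (6.0, 8.0), (170.0, 65.0)]
unit = [to_unit_vector(p) for p in points]
print(unit[0], unit[1])  # [0.6, 0.8] [0.6, 0.8]
```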

# Q3. Principal Component Analysis (PCA) is a dimensionality reduction technique used to transform a high-dimensional dataset into a lower-dimensional space while preserving as much of its variance as possible. PCA identifies the principal components: new, mutually orthogonal variables that are linear combinations of the original features. The first component captures the maximum variance in the data, and each subsequent component captures the maximum remaining variance.

# The steps for performing PCA are as follows:
# 1. Standardize the data by subtracting the mean and scaling to unit variance.
# 2. Compute the covariance matrix or correlation matrix of the standardized data.
# 3. Calculate the eigenvectors and eigenvalues of the covariance matrix.
# 4. Sort the eigenvalues (together with their eigenvectors) in descending order and select the desired number of principal components.
# 5. Project the standardized data onto the eigenvectors of the selected principal components.
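# The five steps above can be sketched with NumPy as follows. This is an illustration under stated assumptions (synthetic data, and the sample standard deviation with ddof=1 so that the covariance matrix equals the correlation matrix), not a reference implementation; `pca` is a hypothetical helper name. `np.linalg.eigh` is used because the covariance matrix is symmetric.

```python
import numpy as np

def pca(X, n_components):
    """Hypothetical helper that follows the five PCA steps with NumPy."""
    # 1. Standardize each feature (column) to zero mean and unit sample variance
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2. Covariance matrix of the standardized data (equal to the correlation matrix)
    cov = np.cov(Z, rowvar=False)
    # 3. Eigendecomposition; eigh is meant for symmetric matrices and returns ascending eigenvalues
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Reorder eigenvalues and eigenvector columns in descending order, keep the top n_components
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]
    # 5. Project the standardized data onto the selected components
    return Z @ components, eigvals

# Synthetic data: 200 samples, 4 features; feature 1 is a noisy copy of feature 0
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
X = np.column_stack([
    base[:, 0],
    base[:, 0] + 0.1 * rng.normal(size=200),
    base[:, 1],
    base[:, 2],
])
scores, eigvals = pca(X, n_components=2)
print(scores.shape)  # (200, 2)
```

# Because the data is standardized, the eigenvalues sum to the number of features (4 here), and the variance of each column of `scores` equals the corresponding eigenvalue.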

# PCA is used in dimensionality reduction to reduce the number of features while retaining most of the variability in the dataset. It helps reduce redundancy among correlated features, lower computational complexity, and visualize high-dimensional data (for example, by plotting the first two or three components).

# Q4. PCA can be used for feature extraction by selecting a subset of the principal components as new features. Instead of using the original features, the transformed dataset consists of the new components, which are linear combinations of the original features. These new components capture the most significant patterns or variations in the data.

# For example, let's consider a dataset with several features such as height, weight, age, income, and education level. By applying PCA, you can identify the principal components that explain the most variance in the dataset. Suppose you find that the first three principal components capture 95% of the total variance. In this case, you can use these three components as the new features, effectively reducing the dimensionality of the dataset from five to three.
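# A sketch of PCA-based feature extraction that reduces five columns to three. It swaps in a singular value decomposition (SVD) of the standardized data, which yields the same principal axes as the covariance eigendecomposition. The data is synthetic (two hidden factors plus a little noise, so most of the variance is expected to fall in the first components), `extract_features` is a hypothetical name, and no particular variance figure such as 95% is implied for this data.

```python
import numpy as np

def extract_features(X, k):
    """Replace the original columns of X with scores on the top-k principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes; squared singular values are
    # proportional to the variance each axis explains (returned in descending order)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Z @ Vt[:k].T, explained

# Synthetic stand-ins for five numeric features (e.g. height, weight, age, income, education);
# the values are random and do not come from any real dataset
rng = np.random.default_rng(42)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 5))

new_features, explained = extract_features(X, k=3)
print(new_features.shape)             # (500, 3)
print(round(explained[:3].sum(), 3))  # share of total variance kept by three components
```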

# Q5. In the context of building a recommendation system for a food delivery service, Min-Max scaling can be used to preprocess the data as follows:

# 1. Identify the relevant features from the dataset, such as price, rating, and delivery time.
# 2. Calculate the minimum and maximum values for each feature.
# 3. Apply Min-Max scaling to each feature using the formula: scaled_value = (x - min) / (max - min).
# 4. The scaled values will now fall within the range of 0 to 1, making them comparable and suitable for further analysis.
# 5. Use the preprocessed data to build the recommendation system, considering the scaled features' relative relationships to make accurate recommendations.
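# The Q5 steps applied column by column with NumPy, assuming three hypothetical restaurants described by price, rating, and delivery-time columns. Each column gets its own minimum and range; leaving a constant column at 0 (by dividing by 1 instead of 0) is an assumption of this sketch, and `min_max_columns` is a hypothetical helper.

```python
import numpy as np

# Hypothetical restaurant listings; columns: price ($), rating (stars), delivery time (min)
features = np.array([
    [10.0, 3.5, 20.0],
    [25.0, 4.0, 45.0],
    [40.0, 5.0, 30.0],
])

def min_max_columns(X):
    """Apply (x - min) / (max - min) to each column independently."""
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Guard against constant columns, whose range is zero
    col_range[col_range == 0] = 1.0
    return (X - col_min) / col_range

scaled = min_max_columns(features)
print(scaled.round(3))
```

# In practice the minimum and maximum would be computed on training data and reused for new listings, which is what scikit-learn's `MinMaxScaler` does through `fit` and `transform`.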

# Q6. In the context of predicting stock prices using a dataset with multiple features, PCA can be used to reduce the dimensionality of the dataset:

# 1. Identify the relevant features from the dataset, such as company financial data and market trends.
# 2. Standardize the dataset by subtracting the mean and scaling to unit variance, ensuring all features have comparable scales.
# 3. Apply PCA to the standardized dataset to identify the principal components.
# 4. Analyze the explained variance ratio of the principal components to understand how much information they capture.
# 5. Select the desired number of principal components based on the explained variance ratio or a specific threshold.
# 6. Project the standardized dataset onto the selected principal components to obtain the reduced-dimensional representation.
# 7. Use the reduced-dimensional dataset as input for building the stock price prediction model, reducing computational complexity and potentially improving model performance.
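# A sketch of steps 2 to 7 on synthetic data, framed as principal component regression (ordinary least squares on the PCA scores). The eight "indicators", the target, the 0.95 variance threshold, and the helper `pca_reduce` are all hypothetical. Standardization statistics and principal axes are fitted on the earlier (training) rows only and then applied to the later rows, so no information from the test period leaks into preprocessing.

```python
import numpy as np

def pca_reduce(X_train, X_test, threshold=0.95):
    """Standardize with training statistics, then keep the fewest components reaching `threshold`."""
    mean, std = X_train.mean(axis=0), X_train.std(axis=0, ddof=1)
    Z_train, Z_test = (X_train - mean) / std, (X_test - mean) / std
    _, s, Vt = np.linalg.svd(Z_train, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    # First index where the cumulative explained variance reaches the threshold
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(s))
    W = Vt[:k].T
    return Z_train @ W, Z_test @ W, k

# Synthetic stand-in for financial/market features: 300 days x 8 correlated indicators
rng = np.random.default_rng(7)
factors = rng.normal(size=(300, 3))
X = factors @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(300, 8))
y = factors[:, 0] - 0.5 * factors[:, 1] + 0.1 * rng.normal(size=300)  # synthetic target

# Chronological split: earlier rows train, later rows test (no shuffling for time series)
X_train, X_test, y_train, y_test = X[:240], X[240:], y[:240], y[240:]
P_train, P_test, k = pca_reduce(X_train, X_test)

# Ordinary least squares with an intercept on the reduced features
A_train = np.column_stack([np.ones(len(P_train)), P_train])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
A_test = np.column_stack([np.ones(len(P_test)), P_test])
pred = A_test @ coef
rmse = np.sqrt(np.mean((pred - y_test) ** 2))
print(k, P_train.shape, round(rmse, 3))
```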

# Q7. To perform Min-Max scaling on the dataset [1, 5, 10, 15, 20] and transform the values to a range of -1 to 1, you can follow these steps:

# 1. Find the minimum and maximum values in the dataset. In this case, the minimum is 1 and the maximum is 20.
# 2. Apply the Min-Max scaling formula: scaled_value = (x - min) / (max - min).
# 3. Subtract the minimum value (1) from each data point, resulting in [0, 4, 9, 14, 19].
# 4. Divide each data point by the range (max - min) = (20 - 1) = 19, resulting in [0/19, 4/19, 9/19, 14/19, 19/19].
# 5. Simplify the fractions: [0, 4/19, 9/19, 14/19, 1].
# 6. To transform the values to a range of -1 to 1, multiply each value by 2 and subtract 1: [0 * 2 - 1, (4/19) * 2 - 1, (9/19) * 2 - 1, (14/19) * 2 - 1, 1 * 2 - 1].
# 7. Simplify the expressions: [-1, (8/19) - 1, (18/19) - 1, (28/19) - 1, 2 - 1].
# 8. Final transformed values: [-1, -11/19, -1/19, 9/19, 1], or approximately [-1, -0.579, -0.053, 0.474, 1].
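# The arithmetic can be checked exactly with Python's standard `fractions` module. `min_max_to_range` is a hypothetical helper implementing the general form a + (x - min) * (b - a) / (max - min), which reduces to 2 * (x - min) / (max - min) - 1 for a = -1 and b = 1.

```python
from fractions import Fraction

def min_max_to_range(values, a=-1, b=1):
    """Map values linearly onto [a, b]: a + (x - min) * (b - a) / (max - min)."""
    lo, hi = min(values), max(values)
    # Fraction keeps every intermediate result exact (no floating-point rounding)
    return [a + Fraction(x - lo, hi - lo) * (b - a) for x in values]

result = min_max_to_range([1, 5, 10, 15, 20])
print([str(v) for v in result])  # ['-1', '-11/19', '-1/19', '9/19', '1']
```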

# Q8. To perform feature extraction using PCA on a dataset with the features [height, weight, age, gender, blood pressure], the number of principal components to retain depends on the desired level of dimensionality reduction and on the explained variance ratio. Because gender is categorical, it must first be encoded numerically (for example, as 0/1) before PCA can be applied.

# Here's a general approach to determine the number of principal components to retain:

# 1. Standardize the dataset by subtracting the mean and scaling to unit variance.
# 2. Apply PCA to the standardized dataset and obtain the eigenvalues and eigenvectors.
# 3. Sort the eigenvalues and their corresponding eigenvectors in descending order.
# 4. Calculate the explained variance ratio of each principal component: its eigenvalue divided by the sum of all eigenvalues.
# 5. Calculate the cumulative explained variance ratio by summing the explained variance ratios from the first principal component up to the nth component.
# 6. Choose the number of principal components that capture a significant amount of the variance, typically aiming for a cumulative explained variance ratio of 80-95%.
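# Steps 3 to 6 can be sketched as a small function that picks the smallest number of components whose cumulative explained variance ratio reaches a threshold. The eigenvalues are hypothetical (they sum to 5, as expected for five standardized features), the helper `n_components_for` is an illustrative name, and the sketch assumes gender has already been encoded numerically.

```python
import numpy as np

def n_components_for(eigenvalues, threshold):
    """Smallest k whose cumulative explained variance ratio reaches `threshold`."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # descending order
    ratios = ev / ev.sum()                                     # explained variance ratio
    cumulative = np.cumsum(ratios)
    hits = cumulative >= threshold
    # If rounding keeps every cumulative value below the threshold, keep all components
    k = int(np.argmax(hits)) + 1 if hits.any() else len(ev)
    return k, cumulative

# Hypothetical eigenvalues for [height, weight, age, gender (encoded), blood pressure]
eigenvalues = [2.5, 1.2, 0.8, 0.3, 0.2]
for t in (0.80, 0.95):
    k, cumulative = n_components_for(eigenvalues, t)
    print(t, k)  # prints 0.8 3, then 0.95 4
```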

# The choice of the number of principal components depends on the trade-off between dimensionality reduction and preserving sufficient information for the task at hand. It's important to strike a balance between reducing complexity and maintaining the predictive power of the model.
