## Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its application.

Min-Max scaling, also known as normalization, is a data preprocessing technique used to rescale the values of a feature or variable to a specific range. It transforms the data so that it falls within a predetermined range, typically between 0 and 1.

The formula for Min-Max scaling is as follows:

scaled_value = (x - min_value) / (max_value - min_value)

where 'x' is the original value, and 'min_value' and 'max_value' are the minimum and maximum values of that feature in the dataset. The smallest value of the feature maps to 0 and the largest maps to 1.
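As a quick check of the formula, the following minimal sketch applies it by hand with NumPy. The variable names (`prices`, `scaled`) and the sample values are illustrative assumptions, and the sketch assumes the feature is not constant (otherwise the denominator would be zero).

```python
import numpy as np

# Illustrative prices (assumed sample values for this sketch)
prices = np.array([100000, 200000, 300000, 400000, 500000, 100000], dtype=float)

# scaled_value = (x - min_value) / (max_value - min_value)
min_value = prices.min()
max_value = prices.max()
scaled = (prices - min_value) / (max_value - min_value)

print(scaled)
```

Every occurrence of the minimum price becomes 0, the maximum becomes 1, and the remaining values fall proportionally in between.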

In [15]:
Price = [100000, 200000, 300000, 400000, 500000, 100000]

In [16]:
import pandas as pd

In [17]:
df = pd.DataFrame(Price, columns = ['price'])

In [18]:
df

| | price |
|---|---|
| 0 | 100000 |
| 1 | 200000 |
| 2 | 300000 |
| 3 | 400000 |
| 4 | 500000 |
| 5 | 100000 |


In [19]:
from sklearn.preprocessing import MinMaxScaler

In [20]:
min_max = MinMaxScaler()

In [24]:
pd.DataFrame(min_max.fit_transform(df[['price']]), columns=['price'])

| | price |
|---|---|
| 0 | 0.00 |
| 1 | 0.25 |
| 2 | 0.50 |
| 3 | 0.75 |
| 4 | 1.00 |
| 5 | 0.00 |


## Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? Provide an example to illustrate its application.

The Unit Vector technique, also known as vector normalization, is a data preprocessing method that rescales each sample's feature vector to unit norm, i.e., a length or magnitude of 1. It preserves the direction of the vector while making its length uniform.



The formula for unit vector scaling is as follows:


scaled_value = x / ||x||


where 'x' is the original feature vector and ||x|| is its Euclidean norm (magnitude).

Unlike Min-Max scaling, which rescales each feature to a specific range (e.g., 0 to 1) using that feature's minimum and maximum, unit vector scaling rescales each sample so that its length is 1 while its direction is unchanged. This technique is especially useful when the direction of the feature vectors matters more than their magnitude, as in cosine-similarity or other angle-based comparisons.


Here's an example to illustrate the application of the Unit Vector technique:


Let's consider a dataset of two features, height and weight, for a group of individuals. We want to scale the features using unit vector scaling.


Original data:
Height: [160, 170, 180, 190]
Weight: [60, 70, 80, 90]


To apply unit vector scaling, we calculate the Euclidean norm of each individual's [height, weight] vector and then divide both values in that vector by the norm.


First, calculate the Euclidean norm of each sample:
||[160, 60]|| = sqrt(160^2 + 60^2) = 170.880

||[170, 70]|| = sqrt(170^2 + 70^2) = 183.848

||[180, 80]|| = sqrt(180^2 + 80^2) = 196.977

||[190, 90]|| = sqrt(190^2 + 90^2) = 210.238


Next, divide each value by its corresponding norm to obtain the scaled values:

Scaled data:
Height: [160/170.880, 170/183.848, 180/196.977, 190/210.238]

Weight: [60/170.880, 70/183.848, 80/196.977, 90/210.238]

Scaled Height: [0.936, 0.925, 0.914, 0.904]

Scaled Weight: [0.351, 0.381, 0.406, 0.428]

Each individual's [height, weight] vector now has a magnitude of 1. The direction of each vector is preserved, but its magnitude has been standardized.


It's important to note that unit vector scaling does not map each feature to the range 0 to 1 the way Min-Max scaling does. Instead, it ensures that the magnitude of each sample vector is 1, so every component lies between -1 and 1 (between 0 and 1 when all values are non-negative, as here). This technique is particularly useful when only the direction of the vectors matters, such as when comparing samples by cosine similarity.
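The link to cosine similarity can be illustrated with a small sketch: once two vectors have unit length, their plain dot product equals their cosine similarity. The two sample vectors and the variable names below are illustrative assumptions.

```python
import numpy as np

# Two illustrative (height, weight) samples
a = np.array([160.0, 60.0])
b = np.array([190.0, 90.0])

# Cosine similarity computed directly from the raw vectors
cos_raw = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Scale each vector to unit length, then take a plain dot product
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
cos_unit = a_unit @ b_unit

print(cos_raw, cos_unit)
```

Both printed values agree up to floating-point rounding, because dividing by the norms is exactly the normalization step in the cosine-similarity formula.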

In [31]:
from sklearn.preprocessing import normalize
Height=[160, 170, 180, 190]
Weight=[60, 70, 80, 90]
df = pd.DataFrame( {'Height': Height, 'Weight': Weight} )

In [32]:
normalize(df[['Height','Weight']])

array([[0.93632918, 0.35112344],
       [0.9246781 , 0.38074981],
       [0.91381155, 0.40613847],
       [0.90373784, 0.42808634]])

In [33]:
df = pd.DataFrame( normalize(df[['Height','Weight']]), columns = ['Height','Weight'])

In [34]:
df

| | Height | Weight |
|---|---|---|
| 0 | 0.936329 | 0.351123 |
| 1 | 0.924678 | 0.380750 |
| 2 | 0.913812 | 0.406138 |
| 3 | 0.903738 | 0.428086 |
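To make the row-wise behaviour explicit, here is a minimal NumPy sketch of the same computation done by hand. It assumes the default settings of `sklearn.preprocessing.normalize` (L2 norm, applied per row); the names `X`, `row_norms`, and `X_unit` are illustrative.

```python
import numpy as np

# Each row is one individual: [height, weight]
X = np.array([[160, 60],
              [170, 70],
              [180, 80],
              [190, 90]], dtype=float)

# Euclidean (L2) norm of each row, kept as a column so it broadcasts
row_norms = np.linalg.norm(X, axis=1, keepdims=True)

# Divide every row by its own norm
X_unit = X / row_norms

print(row_norms.ravel())               # norm of each original row
print(X_unit)                          # scaled rows
print(np.linalg.norm(X_unit, axis=1))  # norm of each scaled row
```

The last line confirms that every scaled row has length 1, which is the defining property of unit vector scaling.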


## Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an example to illustrate its application.

## Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to preprocess the data.

Min-Max scaling brings these features to a common scale, making them comparable and preventing features with larger values (such as price) from dominating the model.

1. Determine the range: identify the minimum and maximum values of each feature in the dataset.

2. Apply the Min-Max scaling formula to every value of each feature:

scaled_value = (x - min_value) / (max_value - min_value)

3. Check the result: after scaling, the values of each feature lie between 0 and 1, with the feature's minimum mapped to 0 and its maximum mapped to 1.

In [44]:
import numpy as np
# Original dataset
data = np.array([
    [5, 3, 15],
    [10, 4, 30],
    [20, 5, 45],
    [50, 2, 60]
])

In [36]:
price = data[:, 0]
rating = data[:, 1]
delivery_time = data[:, 2]

In [38]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

In [52]:
price = price.reshape(-1, 1)
rating = rating.reshape(-1, 1)
delivery_time = delivery_time.reshape(-1, 1)

In [51]:
price

array([[ 5],
       [10],
       [20],
       [50]])

In [53]:
scaled_price = scaler.fit_transform(price)
scaled_rating = scaler.fit_transform(rating)
scaled_delivery_time = scaler.fit_transform(delivery_time)

In [46]:
scaled_price

array([[0.        ],
       [0.11111111],
       [0.33333333],
       [1.        ]])

In [43]:
print("Scaled Price:", scaled_price.flatten())
print("Scaled Rating:", scaled_rating.flatten())
print("Scaled Delivery Time:", scaled_delivery_time.flatten())


Scaled Price: [0.         0.11111111 0.33333333 1.        ]
Scaled Rating: [0.33333333 0.66666667 1.         0.        ]
Scaled Delivery Time: [0.         0.33333333 0.66666667 1.        ]
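The three columns can also be scaled in one vectorized step. The sketch below does this by hand in NumPy, taking the minimum and maximum along `axis=0` so that each feature uses its own range; the names `col_min`, `col_max`, and `scaled` are illustrative. scikit-learn's `MinMaxScaler` also scales each column independently, so passing the full `data` array to a single `fit_transform` call should give the same matrix.

```python
import numpy as np

# Columns: price, rating, delivery time
data = np.array([
    [5, 3, 15],
    [10, 4, 30],
    [20, 5, 45],
    [50, 2, 60]
], dtype=float)

# Per-feature minimum and maximum (one value per column)
col_min = data.min(axis=0)
col_max = data.max(axis=0)

# Broadcasting applies the Min-Max formula column by column
scaled = (data - col_min) / (col_max - col_min)

print(scaled)
```

Each column of `scaled` corresponds to one of the scaled features printed above.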


## Q6. You are working on a project to build a model to predict stock prices. The dataset contains many features, such as company financial data and market trends. Explain how you would use PCA to reduce the dimensionality of the dataset.

## Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the values to a range of -1 to 1.

To perform Min-Max scaling on the given dataset to transform the values to a range of -1 to 1, you can use the following steps:

Determine the minimum and maximum values in the dataset. In this case, the minimum is 1, and the maximum is 20.

Apply the Min-Max scaling formula to each value in the dataset using the desired range of -1 to 1. The formula is as follows:

scaled_value = ((x - min_value) / (max_value - min_value)) * 2 - 1

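where 'x' represents the original value, 'min_value' is the minimum value of the dataset, and 'max_value' is the maximum value of the dataset. Multiplying by 2 stretches the 0-to-1 range to a width of 2, and subtracting 1 shifts it so that it runs from -1 to 1.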
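The formula above is a special case of scaling to an arbitrary range. The sketch below applies it by hand with NumPy. The helper `min_max_scale` and its parameters `new_min` and `new_max` are hypothetical names introduced for illustration, not part of any library.

```python
import numpy as np

def min_max_scale(values, new_min=-1.0, new_max=1.0):
    """Hypothetical helper: rescale values linearly to [new_min, new_max]."""
    x = np.asarray(values, dtype=float)
    unit = (x - x.min()) / (x.max() - x.min())   # first map to [0, 1]
    return unit * (new_max - new_min) + new_min  # then stretch and shift

scaled = min_max_scale([1, 5, 10, 15, 20])
print(scaled)
```

With `new_min=-1` and `new_max=1` the stretch factor is 2 and the shift is -1, which reproduces the formula given above.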

In [78]:
from sklearn.preprocessing import MinMaxScaler

# Original dataset
dataset = [1, 5, 10, 15, 20]


# Create an instance of MinMaxScaler with the desired feature_range (-1, 1)
scaler = MinMaxScaler(feature_range=(-1, 1))

# Reshape the dataset to match the expected input shape of MinMaxScaler
dataset = [[value] for value in dataset]

# Fit the scaler on the original data and transform it
scaled_dataset = scaler.fit_transform(dataset)

# Flatten the scaled dataset
scaled_dataset = [value[0] for value in scaled_dataset]

# Print the scaled dataset
print("Scaled dataset:", scaled_dataset)


Scaled dataset: [-0.9999999999999999, -0.5789473684210525, -0.05263157894736836, 0.47368421052631593, 1.0]

