In [None]:
Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its
application.

In [None]:
Min-Max scaling, also known as normalization, is a data preprocessing technique that rescales each numeric feature to a fixed range,
typically [0, 1]. It puts features measured in different units on the same scale, so that a feature with a large magnitude (for example, income in
dollars next to age in years) does not dominate distance-based or gradient-based learning algorithms.

The formula for Min-Max scaling is as follows:

scaled_value = (x - min) / (max - min)

where:

x is the original value of the feature.
min is the minimum value of the feature in the dataset.
max is the maximum value of the feature in the dataset.
The result of the scaling formula is that the minimum value of the feature is transformed to 0, the maximum value is transformed to 1, and 
all other values are linearly scaled between these two extremes.

In [1]:
import numpy as np

# Sample data
data = np.array([10, 20, 30, 15, 25])

# Min-Max scaling
min_value = np.min(data)
max_value = np.max(data)
scaled_data = (data - min_value) / (max_value - min_value)
print("Original data:", data)
print("Scaled data:", scaled_data)

Original data: [10 20 30 15 25]
Scaled data: [0.   0.5  1.   0.25 0.75]
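The example above scales a single feature. In a real dataset each column is scaled with its own minimum and maximum. The sketch below is a minimal NumPy version written for illustration (the function name `min_max_scale` is hypothetical, not a library API). It expects a 2-D array, applies the formula column by column, supports a target range other than [0, 1], and maps constant columns (where max equals min) to the lower bound instead of dividing by zero.

```python
import numpy as np

def min_max_scale(X, feature_range=(0.0, 1.0)):
    """Hypothetical helper: scale each column of a 2-D array X independently to feature_range."""
    X = np.asarray(X, dtype=float)
    lo, hi = feature_range
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Constant columns would cause a division by zero; give them a range of 1
    # so that they map to the lower bound of the target range
    col_range[col_range == 0] = 1.0
    X_std = (X - col_min) / col_range        # each column now lies in [0, 1]
    return X_std * (hi - lo) + lo            # stretch and shift to [lo, hi]

X = np.array([[10, 200.0],
              [20, 400.0],
              [30, 300.0]])
print(min_max_scale(X))           # column 0 -> 0, 0.5, 1; column 1 -> 0, 1, 0.5
print(min_max_scale(X, (-1, 1)))  # same data mapped to [-1, 1]
```

Libraries such as scikit-learn's `MinMaxScaler` implement the same per-column idea; the hand-written version only makes the arithmetic visible.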


In [None]:
Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling?
Provide an example to illustrate its application.

In [None]:
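The Unit Vector technique, also called vector normalization or L2 normalization, is a data preprocessing technique that rescales each data vector,
usually each sample (row), so that its Euclidean length (magnitude) is 1. It differs from Min-Max scaling in what it operates on and what it preserves:
Min-Max scaling rescales each feature (column) independently to a fixed range such as [0, 1] using that feature's minimum and maximum, whereas Unit
Vector scaling divides a whole vector by its norm. This keeps the vector's direction (the ratios between its components) and discards its magnitude;
the resulting components lie in [-1, 1] and keep their signs.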

The formula for Unit Vector scaling is as follows:

scaled_vector = x / ||x||

where:

x is the original vector.
||x|| represents the Euclidean norm or magnitude of the vector.

In [4]:
## Python code
import numpy as np

data = np.array([3, 2, 1])
magnitude = np.linalg.norm(data)   # Euclidean (L2) norm: sqrt(3^2 + 2^2 + 1^2)
scaled_data = data / magnitude     # divide every component by the norm
print("Original data is: ", data)
print("scaled_data is : ", scaled_data)

Original data is:  [3 2 1]
scaled_data is :  [0.80178373 0.53452248 0.26726124]
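The cell above normalizes a single vector. In a dataset, Unit Vector scaling is usually applied to each sample (row) separately, which is what distinguishes it from Min-Max scaling's per-feature treatment. The sketch below uses made-up data, divides each row by its own L2 norm, and compares the result with scikit-learn's `sklearn.preprocessing.normalize`, which performs row-wise L2 normalization by default.

```python
import numpy as np
from sklearn.preprocessing import normalize

# Made-up data: each row is one sample
X = np.array([[3.0, 2.0, 1.0],
              [1.0, 2.0, 3.0],
              [10.0, 20.0, 30.0],
              [1.0, 0.0, -1.0]])

norms = np.linalg.norm(X, axis=1, keepdims=True)  # L2 norm of each row, shape (4, 1)
X_unit = X / norms                                # broadcast division, row by row

print(X_unit)
print(np.linalg.norm(X_unit, axis=1))             # every row now has length 1

# normalize() applies the same row-wise L2 scaling by default
print(np.allclose(X_unit, normalize(X, norm="l2")))  # True
```

Rows that point in the same direction, such as [1, 2, 3] and [10, 20, 30], become identical after scaling because only direction is kept, and the negative entry in the last row stays negative. The manual version assumes no row is all zeros, since that would divide by zero.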


In [None]:
Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an
example to illustrate its application.

In [None]:
PCA, which stands for Principal Component Analysis, is a statistical technique used for dimensionality reduction and feature extraction.
It projects a high-dimensional dataset onto a lower-dimensional space while retaining as much of the variance in the data as possible.

The main idea behind PCA is to find a new set of orthogonal variables, called principal components, along which the data varies the most.
These principal components are linear combinations of the original features and are ordered by the variance they capture: the first principal
component explains the largest share of the variance, the second explains the largest share of the remaining variance, and so on.

In [10]:
import numpy as np
from sklearn.decomposition import PCA
data = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components = 1)
reduced_data = pca.fit_transform(data)

In [11]:
data

array([[-1, -1],
       [-2, -1],
       [-3, -2],
       [ 1,  1],
       [ 2,  1],
       [ 3,  2]])

In [17]:
print('Original data dimensions are :', data.shape)
print('Reduced data dimensions are : ', reduced_data.shape)

Original data dimensions are : (6, 2)
Reduced data dimensions are :  (6, 1)
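To make the idea concrete, the sketch below computes the same one-component reduction by hand: center the data, compute the feature covariance matrix, take its eigenvectors (the principal axes) sorted by eigenvalue (the variance along each axis), and project onto the top one. It then checks the result against scikit-learn's `PCA`. This eigendecomposition route is one standard way to compute PCA; scikit-learn typically uses a singular value decomposition instead, which gives the same components up to sign.

```python
import numpy as np
from sklearn.decomposition import PCA

data = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)

# 1. Center each feature at zero mean
centered = data - data.mean(axis=0)

# 2. Covariance matrix of the features (rows are samples)
cov = np.cov(centered, rowvar=False)

# 3. Eigendecomposition of the symmetric covariance matrix,
#    sorted so the largest-variance direction comes first
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Project the centered data onto the first k principal axes
k = 1
manual = centered @ eigvecs[:, :k]
ratio = eigvals / eigvals.sum()   # explained variance ratio per component

sk = PCA(n_components=k).fit(data)
sk_reduced = sk.transform(data)

# Principal axes are only defined up to sign, so compare absolute values
print(np.allclose(np.abs(manual), np.abs(sk_reduced)))       # True
print(np.allclose(ratio[:k], sk.explained_variance_ratio_))  # True
```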


In [None]:
Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature
Extraction? Provide an example to illustrate this concept.

In [None]:
PCA (Principal Component Analysis) can be used as a feature extraction technique. Feature extraction refers to the process of transforming the 
original features of a dataset into a new set of features that captures the most important information. PCA achieves feature extraction by creating 
new variables called principal components, which are linear combinations of the original features.

The principal components are ranked in order of their importance, with the first principal component capturing the maximum variance in the data, the
second principal component capturing the second maximum variance, and so on. By selecting a subset of the principal components, we can effectively
reduce the dimensionality of the dataset while preserving the most significant information.

In [18]:
import numpy as np
from sklearn.decomposition import PCA

# Sample data
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# Apply PCA for feature extraction
pca = PCA(n_components=2)  # Set the desired number of components
extracted_features = pca.fit_transform(data)

print("Original data shape:", data.shape)
print("Extracted features shape:", extracted_features.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)

Original data shape: (4, 3)
Extracted features shape: (4, 2)
Explained variance ratio: [1. 0.]
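In the example above every row equals the previous one plus 3, so all points lie on a single line and the first component captures all of the variance, which is why the ratio is [1. 0.]. On less degenerate data the number of extracted features is usually chosen from the explained variance. The sketch below uses synthetic data (5 features generated from 2 hidden factors plus a little noise, purely hypothetical) and passes a float to `n_components`, which tells scikit-learn's `PCA` to keep the smallest number of components whose cumulative explained variance exceeds that fraction. `components_` then holds the weights that define each extracted feature as a linear combination of the original features.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical data: 100 samples, 5 features. Features 3-5 are noisy
# linear combinations of features 1-2, so the data is essentially 2-dimensional.
base = rng.normal(size=(100, 2))
mix = np.array([[1.0, 0.5, 2.0],
                [0.3, -1.0, 1.0]])
X = np.hstack([base, base @ mix + 0.05 * rng.normal(size=(100, 3))])

# A float n_components keeps enough components to explain more than 95% of the variance
pca = PCA(n_components=0.95, svd_solver="full")
features = pca.fit_transform(X)

print("Components kept:", pca.n_components_)
print("Cumulative explained variance:", np.cumsum(pca.explained_variance_ratio_))

# Each row of components_ gives the weights of one extracted feature
# over the five original features
print("Loadings shape:", pca.components_.shape)
```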


In [None]:
Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset
contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to
preprocess the data.

In [None]:
To use Min-Max scaling for preprocessing the data in a recommendation system for a food delivery service, you would apply this technique to the 
numerical features such as price, rating, and delivery time. The purpose of Min-Max scaling is to rescale the features to a specific range,
typically between 0 and 1, ensuring that all features have the same scale. Here's how you could apply Min-Max scaling to preprocess the data:

Identify the numerical features: In your dataset, identify the features that are numerical and require scaling. In this case, it would be price, 
rating, and delivery time.

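Determine the minimum and maximum values: Calculate the minimum and maximum values for each of the numerical features. The minimum value represents
the smallest value observed for that feature, while the maximum value represents the largest value observed.

Apply the scaling formula: Transform every value of each feature with scaled_value = (x - min) / (max - min), using that feature's own minimum and
maximum. When new data arrives, such as newly listed restaurants, reuse the minimum and maximum learned from the training data instead of recomputing them.

After applying Min-Max scaling, the price, rating, and delivery time values in the training data lie within the range 0 to 1. This puts all
features on a comparable scale and prevents any one feature from dominating the recommendation system because of its larger magnitude.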

In [25]:
from sklearn.preprocessing import MinMaxScaler
data = [[1, 2,5], [0.5, 6,12], [0, 10,15], [1, 18,20]]
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data)


In [27]:
print("Original data is :", data)
print()
print("scaled data is :", scaled_data)

Original data is : [[1, 2, 5], [0.5, 6, 12], [0, 10, 15], [1, 18, 20]]

scaled data is : [[1.         0.         0.        ]
 [0.5        0.25       0.46666667]
 [0.         0.5        0.66666667]
 [1.         1.         1.        ]]
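The cell above works on an unlabeled array. The sketch below applies the same steps to hypothetical restaurant data with named `price`, `rating`, and `delivery_time` columns (all values are invented for illustration). The scaler learns each column's minimum and maximum from the training rows only and reuses them for new rows, so old and new data share one scale.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

cols = ["price", "rating", "delivery_time"]

# Hypothetical training listings (delivery_time in minutes)
train = pd.DataFrame({
    "price": [12.0, 25.0, 8.0, 40.0],
    "rating": [4.5, 3.8, 4.9, 4.1],
    "delivery_time": [30, 45, 20, 60],
})
# Hypothetical new listings arriving later
new_items = pd.DataFrame({
    "price": [18.0, 50.0],
    "rating": [4.0, 4.6],
    "delivery_time": [25, 35],
})

scaler = MinMaxScaler()
# Learn each column's min and max from the training data only
train_scaled = pd.DataFrame(scaler.fit_transform(train[cols]), columns=cols)
# Reuse the same min and max for new data
new_scaled = pd.DataFrame(scaler.transform(new_items[cols]), columns=cols)

print(train_scaled)
print(new_scaled)  # price 50.0 exceeds the training max, so it scales above 1
```

Because the scaler is not refitted, a new price of 50.0 lies above the training maximum of 40.0 and scales to (50 - 8) / (40 - 8) = 1.3125. `MinMaxScaler` also accepts `clip=True`, which caps such values at the range bounds if that behaviour is preferred.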


In [None]:
Q6. You are working on a project to build a model to predict stock prices. The dataset contains many
features, such as company financial data and market trends. Explain how you would use PCA to reduce the
dimensionality of the dataset.

In [None]:
In the context of building a model to predict stock prices with a dataset containing many features, such as company financial data and market trends, 
PCA (Principal Component Analysis) can be used to reduce the dimensionality of the dataset. Here's an explanation of how you could apply PCA for 
dimensionality reduction:

Identify the features: Start by identifying the features in your dataset that are relevant to your prediction task. These features could include 
financial indicators, market trends, historical prices, or any other factors that may impact stock prices.

Standardize the data: Before applying PCA, standardize each feature to zero mean and unit variance using Z-score standardization (for example,
scikit-learn's StandardScaler). PCA ranks directions by variance, so without this step features with larger magnitudes would dominate the principal
components. Min-Max scaling is an alternative way to put features on a common scale, but it does not produce zero mean and unit variance.

Apply PCA: Once the data is standardized, you can apply PCA to the dataset. PCA will transform the original high-dimensional feature space into a 
lower-dimensional space by creating new orthogonal variables called principal components. These principal components capture the maximum variance in 
the data.

Determine the number of components: Decide on the number of principal components to retain in the reduced dimensionality. This decision can be based 
on the explained variance ratio, which indicates the proportion of variance explained by each principal component. You can examine the cumulative 
explained variance ratio and select the number of components that explain a significant portion of the total variance, typically above a certain 
threshold (e.g., 80% or 90%).

Transform the data: Finally, project the dataset onto the retained principal components using the fitted PCA model's transform method. Fit the
scaler and PCA on the training period only and reuse them for later data, so that no information from the future leaks into the features.

Model training and evaluation: With the reduced-dimensional dataset, you can proceed to train and evaluate your stock price prediction model using 
the transformed features. The reduced dimensionality obtained through PCA can help mitigate the curse of dimensionality, improve computational 
efficiency, and potentially enhance the model's performance by focusing on the most informative components.
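The steps above map naturally onto a scikit-learn `Pipeline`. The sketch below is a minimal illustration on synthetic data: 40 hypothetical features generated from 5 hidden factors plus noise stand in for financial and market data, and a plain `LinearRegression` stands in for whatever prediction model is actually used. Both are assumptions made for a runnable example, not a recommended stock-prediction model. `PCA(n_components=0.90)` keeps enough components to explain 90% of the variance, and `shuffle=False` keeps the most recent rows as the test set.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical stand-in for market data: 500 days x 40 features driven by
# 5 latent factors plus noise (not real financial data)
n_days, n_features, n_factors = 500, 40, 5
factors = rng.normal(size=(n_days, n_factors))
loadings = rng.normal(size=(n_factors, n_features))
X = factors @ loadings + 0.1 * rng.normal(size=(n_days, n_features))
y = factors @ rng.normal(size=n_factors) + 0.1 * rng.normal(size=n_days)

# Keep time order: train on the earlier 80%, test on the latest 20%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

model = Pipeline([
    ("scale", StandardScaler()),                          # zero mean, unit variance
    ("pca", PCA(n_components=0.90, svd_solver="full")),   # keep 90% of the variance
    ("reg", LinearRegression()),
])
model.fit(X_train, y_train)  # scaler and PCA are fitted on training data only

print("Components kept:", model.named_steps["pca"].n_components_)
print("Test R^2:", model.score(X_test, y_test))
```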

In [None]:
Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the
values to a range of -1 to 1.

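In [30]:
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# MinMaxScaler scales each column (feature) independently, so the five values
# must form one column of shape (5, 1), not one row of shape (1, 5)
data = np.array([1, 5, 10, 15, 20]).reshape(-1, 1)
scaler = MinMaxScaler(feature_range=(-1, 1))  # target range [-1, 1] instead of the default [0, 1]
scaled_data = scaler.fit_transform(data)

In [31]:
print("Original data is :", data.ravel())
print()
print("Scaled data is :", scaled_data.ravel())

Original data is : [ 1  5 10 15 20]

Scaled data is : [-1.         -0.57894737 -0.05263158  0.47368421  1.        ]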
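Min-Max scaling to an arbitrary range [a, b] uses the general form scaled_value = a + (x - min) * (b - a) / (max - min). With a = -1, b = 1, min = 1 and max = 20, the sketch below computes the result directly with NumPy so the arithmetic can be checked by hand; for example, x = 10 gives -1 + 9 * 2 / 19 = -1/19, about -0.0526.

```python
import numpy as np

x = np.array([1, 5, 10, 15, 20], dtype=float)
a, b = -1.0, 1.0  # target range

# General Min-Max formula for a target range [a, b]
x_scaled = a + (x - x.min()) * (b - a) / (x.max() - x.min())

print(x_scaled)  # values: -1, -11/19, -1/19, 9/19, 1
```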


In [None]:
Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform
Feature Extraction using PCA. How many principal components would you choose to retain, and why?

In [33]:
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Toy dataset; gender is encoded as 0/1
data = pd.DataFrame({'height': [170, 175, 160, 180],
                     'weight': [65, 68, 55, 70],
                     'age': [30, 35, 28, 40],
                     'gender': [1, 0, 1, 1],
                     'blood_pressure': [120, 130, 110, 125]})
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)
pca = PCA()
pca.fit(scaled_data)

# Compute explained variance ratio
explained_variance_ratio = pca.explained_variance_ratio_

# Compute cumulative explained variance
cumulative_variance_ratio = np.cumsum(explained_variance_ratio)

print("Explained variance ratio:", explained_variance_ratio)
print("Cumulative explained variance:", cumulative_variance_ratio)



Explained variance ratio: [7.75323857e-01 1.92470848e-01 3.22052946e-02 4.09017810e-33]
Cumulative explained variance: [0.77532386 0.96779471 1.         1.        ]


In [None]:
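The output shows the explained variance ratio of each principal component and the cumulative explained variance. The first component explains about
77.5% of the variance and the first two together about 96.8%, while the fourth explains essentially nothing (with only 4 samples, at most 3 components
can carry variance after centering). Retaining two principal components is therefore a reasonable choice: it reduces five features to two while keeping
roughly 97% of the variance, above common thresholds such as 90% or 95%. With only four samples this choice is illustrative; a real dataset would need
many more observations for the variance estimates to be reliable.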
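The choice can also be made programmatically. The sketch below copies the cumulative explained variance printed above and returns the smallest number of components that reaches a given threshold; `n_components_for` is a hypothetical helper written for this example. `np.searchsorted` works here because cumulative variance never decreases.

```python
import numpy as np

# Cumulative explained variance copied from the output above
cumulative = np.array([0.77532386, 0.96779471, 1.0, 1.0])

def n_components_for(cumulative, threshold):
    """Hypothetical helper: smallest k with cumulative[k - 1] >= threshold."""
    # searchsorted returns the first index whose value is >= threshold
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative))  # cap at the total number of components

for t in (0.80, 0.90, 0.95, 0.99):
    print(f"{t:.0%} threshold -> {n_components_for(cumulative, t)} component(s)")
```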
