# Data Science Masters Assignment - Feature Engineering 3

## Q1. What is Min-Max scaling, and how is it used in data preprocessing?

Min-Max scaling is a normalization technique that linearly rescales each feature to a fixed range, typically [0, 1] or [-1, 1].
This is useful when features are measured on very different scales, because features with large magnitudes would otherwise dominate distance-based and gradient-based models.

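Given a feature's observed minimum X_min and maximum X_max, and a target range [min_range, max_range], the formula for Min-Max scaling is: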

\[
X' = \frac{X - X_{min}}{X_{max} - X_{min}} \times (max_{range} - min_{range}) + min_{range}
\]

### Example
Let’s scale a dataset `[1, 5, 10, 15, 20]` to the range [-1, 1].

In [6]:
from sklearn.preprocessing import MinMaxScaler
import numpy as np

# Data
data = np.array([1, 5, 10, 15, 20]).reshape(-1, 1)
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled_data = scaler.fit_transform(data)
print('Scaled Data:', scaled_data.ravel())

Scaled Data: [-1.         -0.57894737 -0.05263158  0.47368421  1.        ]
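The formula above can also be applied by hand as a check on the library result. The following is a minimal NumPy sketch; the helper name `min_max_scale` and its parameters are illustrative and not part of scikit-learn.

```python
import numpy as np

def min_max_scale(x, new_min=0.0, new_max=1.0):
    """Rescale a 1-D array to [new_min, new_max] using the Min-Max formula."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    # Assumes x_max > x_min; a constant feature would cause a division by zero.
    return (x - x_min) / (x_max - x_min) * (new_max - new_min) + new_min

manual_scaled = min_max_scale([1, 5, 10, 15, 20], new_min=-1, new_max=1)
print('Manually Scaled Data:', manual_scaled)
```

For example, the value 5 maps to (5 - 1) / (20 - 1) × 2 - 1 ≈ -0.5789, which agrees with the second entry printed by `MinMaxScaler`.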


## Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling?

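The Unit Vector technique (or normalization to unit norm) rescales each data point (sample vector) so that its norm (magnitude) is 1.
It differs from Min-Max scaling in what it acts on: Min-Max scaling rescales each feature (column) independently into a fixed range using that feature's minimum and maximum, whereas unit vector scaling rescales each sample (row) by its own norm.
This technique is better suited to applications where the direction of data points matters more than their absolute values, such as text analysis.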


### Example
Let’s apply Unit Vector scaling to the dataset `[1, 5, 10, 15, 20]`.


In [11]:
from sklearn.preprocessing import normalize

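# Normalizing the data using the unit vector technique.
# normalize() rescales each row, so the column vector is reshaped into a
# single row to treat the five values as one vector.
unit_vector_scaled_data = normalize(data.reshape(1, -1), norm='l2')
print('Unit Vector Scaled Data:', np.round(unit_vector_scaled_data.ravel(), 4))

Unit Vector Scaled Data: [0.0365 0.1825 0.3649 0.5474 0.7298]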
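By definition, unit-norm scaling divides a vector by its L2 norm. The following is a minimal NumPy sketch that treats the five values as a single vector; the variable names are illustrative. Note that scikit-learn's `normalize` works row by row, so an array of shape `(5, 1)` is treated as five one-element samples, each of which normalizes to 1.

```python
import numpy as np

values = np.array([1, 5, 10, 15, 20], dtype=float)

# L2 norm of the whole vector: sqrt(1^2 + 5^2 + 10^2 + 15^2 + 20^2) = sqrt(751)
l2_norm = np.linalg.norm(values)

# Dividing by the norm keeps the direction and sets the length to 1
unit_vector = values / l2_norm

print('L2 norm:', l2_norm)
print('Unit vector:', np.round(unit_vector, 4))
print('Norm after scaling:', np.linalg.norm(unit_vector))
```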


## Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction?

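PCA is a dimensionality reduction technique that projects the data onto a new set of orthogonal axes called principal components. The components are ordered by the amount of variance they capture, so keeping only the first few reduces the number of dimensions while retaining most of the variance in the data.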

### Example
We will reduce a 3D dataset to 2D using PCA.

In [14]:
from sklearn.decomposition import PCA

# Creating a 3D dataset
data_3d = np.random.randn(100, 3)

# Applying PCA
pca = PCA(n_components=2)
data_2d = pca.fit_transform(data_3d)
print('Transformed Data Shape:', data_2d.shape)

Transformed Data Shape: (100, 2)
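To show what happens inside the reduction, here is a minimal NumPy sketch of PCA via the singular value decomposition of the centered data. The fixed seed and the variable names are assumptions made for repeatability and illustration; the signs of the components may differ from those scikit-learn returns.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed (an assumption) for repeatability
X = rng.standard_normal((100, 3))     # stand-in for the 3D dataset

# 1. Center each feature so the components describe variance around the mean
X_centered = X - X.mean(axis=0)

# 2. SVD of the centered data; the rows of Vt are the principal directions
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# 3. Keep the first two directions and project the data onto them
components = Vt[:2]
X_2d = X_centered @ components.T

# Variance captured along each direction, in decreasing order
explained_variance = S**2 / (X.shape[0] - 1)

print('Transformed Data Shape:', X_2d.shape)
print('Explained Variance:', explained_variance)
```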


## Q4. What is the relationship between PCA and Feature Extraction?

PCA can be used for Feature Extraction by transforming the original features into principal components. 
These components represent linear combinations of the original features that capture the most variance in the data.
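The linear-combination view can be checked directly. In the sketch below (simulated data and a fixed seed are assumptions), each row of `components_` holds the weights that one principal component places on the original features, and projecting the centered data onto those weights reproduces what `transform` returns.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)        # fixed seed (an assumption) for repeatability
X = rng.standard_normal((100, 3))     # simulated data with 3 original features

pca = PCA(n_components=2).fit(X)

# Each extracted feature is a weighted sum of the centered original features
manual_features = (X - pca.mean_) @ pca.components_.T

print('Weights shape (components x features):', pca.components_.shape)
print('Matches pca.transform:', np.allclose(manual_features, pca.transform(X)))
```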

## Q5. Min-Max scaling for a recommendation system

In a food delivery recommendation system, Min-Max scaling can normalize features such as price, rating, and delivery time to a common range.

### Example
We will scale features for three orders.


In [23]:
import pandas as pd

# Sample data
recommendation_data = pd.DataFrame({
    'Price': [10, 20, 30],
    'Rating': [4.5, 3.8, 4.9],
    'Delivery_Time': [30, 25, 40]
})

# Applying Min-Max scaling (reuses `scaler` from Q1, so each column is mapped to [-1, 1])
scaled_recommendation_data = scaler.fit_transform(recommendation_data)
print(pd.DataFrame(scaled_recommendation_data, columns=recommendation_data.columns))

   Price    Rating  Delivery_Time
0   -1.0  0.272727      -0.333333
1    0.0 -1.000000      -1.000000
2    1.0  1.000000       1.000000
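If the more common [0, 1] range is preferred, the same formula can be applied column by column with pandas alone. This is a sketch using the three orders from the example above; the names `orders` and `scaled_01` are illustrative.

```python
import pandas as pd

orders = pd.DataFrame({
    'Price': [10, 20, 30],
    'Rating': [4.5, 3.8, 4.9],
    'Delivery_Time': [30, 25, 40],
})

# min() and max() are computed per column, so each feature is scaled independently
scaled_01 = (orders - orders.min()) / (orders.max() - orders.min())
print(scaled_01)
```

After scaling, every column has a minimum of 0 and a maximum of 1, so no single feature dominates purely because of its units.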


## Q6. PCA for stock price prediction

For stock price prediction, financial datasets often contain many correlated indicators. PCA can reduce their dimensionality by keeping only the first few principal components, which capture most of the variance.

In [27]:
# Simulated stock data with 10 features
stock_data = np.random.randn(100, 10)

# Applying PCA to reduce to 2 components
pca_stock = PCA(n_components=2)
stock_data_reduced = pca_stock.fit_transform(stock_data)
print('Reduced Data Shape:', stock_data_reduced.shape)

Reduced Data Shape: (100, 2)
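Because PCA is driven by variance, features with large numeric scales can dominate the components unless the data is standardized first. The sketch below chains `StandardScaler` and `PCA` in a pipeline; the simulated features and their scales are hypothetical stand-ins, not real stock data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)                 # fixed seed (an assumption)

# Hypothetical stand-in: 10 features whose scales range from 1 to 10,000
scales = np.logspace(0, 4, 10)
stock_like = rng.standard_normal((100, 10)) * scales

# Standardize each feature to zero mean and unit variance, then reduce to 2 components
pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
reduced = pipeline.fit_transform(stock_like)

retained = pipeline.named_steps['pca'].explained_variance_ratio_.sum()
print('Reduced Data Shape:', reduced.shape)
print('Variance retained by 2 components:', retained)
```

Checking the retained variance shows how much information the two components keep, which helps decide whether two components are enough.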


## Q7. Min-Max scaling example

Min-Max scaling of `[1, 5, 10, 15, 20]` to the range [-1, 1] was already carried out in Q1, giving approximately `[-1, -0.5789, -0.0526, 0.4737, 1]`.

## Q8. PCA for feature extraction on a dataset

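Let’s apply PCA to a dataset with the features `[height, weight, age, gender, blood pressure]`. No real measurements are given, so the code below simulates five standard-normal features as a stand-in and reports how much variance each principal component explains.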


In [32]:

# Simulating a dataset with 5 features
feature_data = np.random.randn(100, 5)

# Applying PCA
pca_features = PCA()
pca_features.fit(feature_data)
explained_variance_ratio = pca_features.explained_variance_ratio_
print('Explained Variance Ratio:', explained_variance_ratio)


Explained Variance Ratio: [0.26406698 0.22989685 0.1871897  0.18230549 0.13654098]
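One common way to decide how many components to keep is to accumulate the explained variance ratios until a chosen threshold is reached. The sketch below uses the ratios printed above; the 80% threshold is an assumption, not a fixed rule.

```python
import numpy as np

# Explained variance ratios taken from the output above
explained_variance_ratio = np.array(
    [0.26406698, 0.22989685, 0.1871897, 0.18230549, 0.13654098]
)

threshold = 0.80  # assumed target for retained variance

# Running total of variance explained by the first k components
cumulative = np.cumsum(explained_variance_ratio)

# Index of the first component at which the threshold is reached, plus one
n_components = int(np.argmax(cumulative >= threshold)) + 1

print('Cumulative Explained Variance:', np.round(cumulative, 4))
print('Components to retain:', n_components)
```

With these values, four of the five components are needed to reach 80%. That is expected here: the simulated features are independent random noise, so the variance is spread fairly evenly and PCA can compress very little. Real, correlated measurements would typically concentrate more variance in the first components.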
