# Feature transformation

Feature transformation in machine learning refers to the process of modifying
or converting input features in a dataset to improve the performance of a
machine learning model. It involves applying mathematical or statistical
operations to the features to make them more suitable for the
learning algorithm.

Feature transformation techniques include scaling, normalization, binarization, polynomial expansion, logarithmic transformation, and more.


These transformations can help address issues such as different scales or
distributions of features, nonlinearity, and outliers, which can affect the
performance of the model. By transforming features, the goal is to create a
new set of features that better capture the underlying patterns and
relationships in the data, ultimately enhancing the model's ability to make
accurate predictions or classifications.


# Calculating Distance for ML Algorithm

Types of Distance Metrics in Machine Learning

1. Euclidean Distance

2. Manhattan Distance

3. Minkowski distance

4. Hamming Distance

5. Cosine Distance

# Euclidean distance

Euclidean distance is a widely used distance metric. It is based on the Pythagorean theorem and gives the straight-line (shortest) distance between two points.

![image1_11zon_fa4497e473.jpg](attachment:eb2ee7d6-2840-4099-b446-5d8d815cee37.jpg)
![word-image-26834-1.png](attachment:6d97f052-bd4e-4371-8813-e5e9488b89a1.png)

Let’s write the code for Euclidean distance in Python. We first import the SciPy library, which provides ready-made implementations of most common distance functions:

![1_9RAGfavMuw6FSG6Wc_axkg.jpg](attachment:036c301b-2c49-4a48-8542-a2888b194a2a.jpg)
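As a minimal text sketch of this computation, the snippet below calls SciPy's `distance.euclidean` and compares it with the Pythagorean formula written out by hand. The two points are illustrative values chosen for this example, not data from the notebook.

```python
import math
from scipy.spatial import distance

# Two illustrative 3-D points (assumed values, not from the original notebook)
point_a = (1, 2, 3)
point_b = (4, 6, 3)

# SciPy's built-in Euclidean distance
euclidean_scipy = distance.euclidean(point_a, point_b)

# The same quantity from the formula: sqrt(sum((a_i - b_i)^2))
euclidean_manual = math.sqrt(sum((a - b) ** 2 for a, b in zip(point_a, point_b)))

print(euclidean_scipy, euclidean_manual)
```

The coordinate differences here are 3, 4 and 0, so both values should equal 5, the familiar 3-4-5 right triangle.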


# Manhattan distance

The Manhattan distance or the cityblock distance is used to calculate the distance between two coordinates in a grid-like path.

The Manhattan distance is the sum of the absolute differences between two points across all dimensions. The name comes from measuring distance along a grid of city blocks rather than in a straight line. Let’s consider two points, A = () and B = (). Manhattan distance can be expressed graphically as follows:

![image5_11zon_7723f44a19.jpg](attachment:eb64add6-4e00-4364-a34d-27616e579b92.jpg)

Because this illustration is two-dimensional, we sum the absolute distances along the x and y axes to calculate the Manhattan distance.

In a two-dimensional space, the Manhattan distance is given as:

![word-image-26834-4.png](attachment:83d434d6-25d5-4f9e-8127-e0a7afb94330.png)

![image6_11zon_c5ff6ffa5a.jpg](attachment:7bc1d1bf-4328-41d0-9faa-56ede362e5d9.jpg)

where

n = number of dimensions

pi, qi = the i-th coordinates of the two data points p and q

Now, we will calculate the Manhattan Distance between the two points. SciPy has a function called cityblock that returns the Manhattan Distance between two points.

![1_8eZ5qOin-EaAXu5mRUzkug.jpg](attachment:bd3e2015-430b-41a6-8ad0-7da5235c4368.jpg)
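A short text sketch of the same idea follows: it calls SciPy's `distance.cityblock` and checks it against the sum of absolute coordinate differences. The points are illustrative values assumed for this example.

```python
from scipy.spatial import distance

# Two illustrative 3-D points (assumed values, not from the original notebook)
point_a = (1, 2, 3)
point_b = (4, 6, 3)

# SciPy's Manhattan (city block) distance
manhattan_scipy = distance.cityblock(point_a, point_b)

# The same quantity from the formula: sum(|a_i - b_i|)
manhattan_manual = sum(abs(a - b) for a, b in zip(point_a, point_b))

print(manhattan_scipy, manhattan_manual)
```

With differences of 3, 4 and 0, both values should equal 7.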


# Minkowski Distance in Machine Learning

Minkowski Distance calculates the distance between two points.

It is a generalization of the Euclidean and Manhattan distance measures and adds a parameter, called the “order” or “p”, that allows different distance measures to be calculated.

The Minkowski distance measure is calculated as follows:

![image4_11zon_1_d5961a7f6e.jpg](attachment:d7e805f6-cb1f-4591-ab94-dd58dbb30d9c.jpg)

Where “p” is the order parameter.

When p is set to 1, the calculation is the same as the Manhattan distance. When p is set to 2, it is the same as the Euclidean distance.

▪ p = 1: Manhattan distance

▪ p = 2: Euclidean distance

It is common to use Minkowski distance when implementing a machine learning algorithm that relies on distance measures, because it gives control over the type of distance used for real-valued vectors through a tunable hyperparameter “p”.

The following computes the Minkowski distance of order 3:

![1_Hw8Ips2WaFn4Ys-2ZlszWg.jpg](attachment:63b499a1-380f-4ca6-8562-3afd5122fafd.jpg)
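As a text sketch of an order-3 computation, the snippet below uses SciPy's `distance.minkowski` with `p=3` and compares it with the formula written out directly. The points are illustrative values assumed for this example.

```python
from scipy.spatial import distance

# Two illustrative 3-D points (assumed values, not from the original notebook)
point_a = (1, 2, 3)
point_b = (4, 6, 3)

# Minkowski distance of order p = 3 via SciPy
minkowski_scipy = distance.minkowski(point_a, point_b, p=3)

# The same quantity from the formula: (sum(|a_i - b_i|^p))^(1/p)
p = 3
minkowski_manual = sum(abs(a - b) ** p for a, b in zip(point_a, point_b)) ** (1 / p)

print(minkowski_scipy, minkowski_manual)
```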

When the order (p) is 1, the formula above gives the Manhattan distance, and when the order is 2, it gives the Euclidean distance. Let’s verify that in Python:

![1_7R1riJtuE_G8Fs3GdM_lsQ.jpg](attachment:824a02c4-76e9-4cc7-913b-c9db75775202.jpg)

Here, we can see that when the order is 1, both Minkowski and Manhattan Distance are the same.

Let’s verify the Euclidean Distance as well:

![1_rjkb1NhI4RJ9vWRtHZwqUg.jpg](attachment:29a7b0b0-0d2c-4314-af2c-2e9059a8a682.jpg)

When the order is 2, we can see that Minkowski and Euclidean distances are the same.
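Both checks can be sketched in a few lines of text code. The example below assumes the same illustrative points as before (not data from the notebook) and compares `distance.minkowski` at p = 1 and p = 2 with `distance.cityblock` and `distance.euclidean`.

```python
import math
from scipy.spatial import distance

# Two illustrative 3-D points (assumed values, not from the original notebook)
point_a = (1, 2, 3)
point_b = (4, 6, 3)

# Order 1 should match the Manhattan (city block) distance
minkowski_p1 = distance.minkowski(point_a, point_b, p=1)
manhattan = distance.cityblock(point_a, point_b)

# Order 2 should match the Euclidean distance
minkowski_p2 = distance.minkowski(point_a, point_b, p=2)
euclidean = distance.euclidean(point_a, point_b)

print(math.isclose(minkowski_p1, manhattan))
print(math.isclose(minkowski_p2, euclidean))
```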

# Techniques to perform Feature Transformation:

▪ Normalization

▪ Standardization

▪ Log Transformation

▪ Robust Scaler

▪ Max Absolute Scaler


# Normalization

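Normalization is a data preparation technique that is frequently used in machine learning. It rescales numeric features to a common range so that no feature dominates simply because of its units or magnitude.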

# Min-Max Normalization

Subtract each column’s minimum value from every value in that column and divide by the range (maximum minus minimum). Each rescaled column then has a minimum value of 0 and a maximum value of 1.

![2023-05-22 23_44_19-Class 05.pdf and 2 more pages - Profile 1 - Microsoft​ Edge.png](attachment:b60d412c-fe9e-4526-b837-4c451b780e3e.png)

Min-max normalization is the simplest and most straightforward approach. It scales values to the range [0, 1] or [-1, 1]. Because the output range is bounded, the scaled features end up with smaller standard deviations, which can reduce the apparent significance of outliers.

To rescale values to an arbitrary target range, use the following formula:

![2023-05-23 20_14_32-🔥🔥 Min-Max Normalization. _ Kaggle.png](attachment:77b2d4ce-357b-4545-bca6-b647b1db31c2.png)

Here a and b are the minimum and maximum of the target range.
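As a rough sketch of rescaling to a target range [a, b], the snippet below applies the usual min-max formula by hand and compares it with scikit-learn's `MinMaxScaler(feature_range=(a, b))`. The input values and the chosen range are assumptions made for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative single-column data (assumed values, not from supershops.csv)
values = np.array([[10.0], [20.0], [25.0], [50.0]])

# Target range [a, b], chosen arbitrarily for this example
a, b = -1.0, 1.0

# Manual rescaling: a + (x - min) * (b - a) / (max - min), computed per column
col_min = values.min(axis=0)
col_max = values.max(axis=0)
manual_scaled = a + (values - col_min) * (b - a) / (col_max - col_min)

# The same operation with scikit-learn
sklearn_scaled = MinMaxScaler(feature_range=(a, b)).fit_transform(values)

print(manual_scaled.ravel())
print(sklearn_scaled.ravel())
```

The smallest input maps to a and the largest to b, with the remaining values spaced proportionally in between.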

In this tutorial, we use scikit-learn's MinMaxScaler class to carry out min-max normalization on our data. It scales each feature independently, bringing them all into the same range.



In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
df=pd.read_csv('../input/supershops/supershops.csv')
df.head()

Unnamed: 0,Marketing Spend,Administration,Transport,Area,Profit
0,114523.61,136897.8,471784.1,Dhaka,192261.83
1,162597.7,151377.59,443898.53,Ctg,191792.06
2,153441.51,101145.55,407934.54,Rangpur,191050.39
3,144372.41,118671.85,383199.62,Dhaka,182901.99
4,142107.34,91391.77,366168.42,Rangpur,166187.94


In [3]:
df.dtypes

Marketing Spend    float64
Administration     float64
Transport          float64
Area                object
Profit             float64
dtype: object

In [4]:
df2 = df.copy()
df3 = df.copy()
df4 = df.copy()
df5 = df.copy()
df6 = df.copy()
df7 = df.copy()

In [5]:
#dataframe.iloc[:,start_col:end_col]
x=df2.iloc[:,0:4]
y=df2.iloc[:,4:5]
x.head()

Unnamed: 0,Marketing Spend,Administration,Transport,Area
0,114523.61,136897.8,471784.1,Dhaka
1,162597.7,151377.59,443898.53,Ctg
2,153441.51,101145.55,407934.54,Rangpur
3,144372.41,118671.85,383199.62,Dhaka
4,142107.34,91391.77,366168.42,Rangpur


In [6]:
y.head()

Unnamed: 0,Profit
0,192261.83
1,191792.06
2,191050.39
3,182901.99
4,166187.94


In [7]:
from sklearn.preprocessing import MinMaxScaler
m = MinMaxScaler(feature_range=(0, 1))

# Fit the scaler on the numeric columns of df2 and overwrite them with the scaled values
num_cols = ['Marketing Spend', 'Administration', 'Transport']
df2[num_cols] = m.fit_transform(df2[num_cols])
df2.head()



Unnamed: 0,Marketing Spend,Administration,Transport,Area,Profit
0,0.692617,0.651744,1.0,Dhaka,192261.83
1,0.983359,0.761972,0.940893,Ctg,191792.06
2,0.927985,0.379579,0.864664,Rangpur,191050.39
3,0.873136,0.512998,0.812235,Dhaka,182901.99
4,0.859438,0.305328,0.776136,Rangpur,166187.94


# Standardization

Standardizing a dataset involves rescaling the distribution of values so that the mean of observed values is 0 and the standard deviation is 1. Like normalization, standardization can be useful, and even required in some machine learning algorithms when data has input values with differing scales.

A value is standardized as follows:

y = (x - mean) / standard_deviation

Where the mean is calculated as:

mean = sum(x) / count(x)

And the standard_deviation is calculated as:

standard_deviation = sqrt( sum( (x - mean)^2 ) / count(x) )

As a worked example, suppose a feature has an estimated mean of 10.0 and a standard deviation of about 5.0. Using these values, we can standardize a value of 20.7 as follows:

y = (x - mean) / standard_deviation

y = (20.7 - 10) / 5

y = (10.7) / 5

y = 2.14
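The arithmetic above can be checked with a short sketch. The second half uses a small made-up sample (an assumption for illustration) to show that the formula with the population standard deviation, which divides by count(x), matches what scikit-learn's `StandardScaler` computes.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Worked example from the text: mean 10.0, standard deviation 5.0, value 20.7
mean, standard_deviation = 10.0, 5.0
y = (20.7 - mean) / standard_deviation  # approximately 2.14

# Illustrative sample (assumed values) with mean 5 and population std 2
sample = np.array([[2.0], [4.0], [4.0], [4.0], [5.0], [5.0], [7.0], [9.0]])

# Manual standardization; np.std defaults to ddof=0, i.e. it divides by count(x)
manual_scaled = (sample - sample.mean(axis=0)) / sample.std(axis=0)

# The same operation with scikit-learn
sklearn_scaled = StandardScaler().fit_transform(sample)

print(round(y, 2))
print(manual_scaled.ravel())
```

After standardization the column has a mean of 0 and a standard deviation of 1, up to floating-point rounding.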

![2023-05-23 23_11_28-Class 05.pdf and 2 more pages - Profile 1 - Microsoft​ Edge.png](attachment:25116c58-8a99-4bb8-a804-7c314a2580ae.png)


![2023-05-23 23_10_02-Class 05.pdf and 2 more pages - Profile 1 - Microsoft​ Edge.png](attachment:b23c75a6-c0f1-48da-aeac-6b5f9b4e7b3d.png)

In the presence of outliers, StandardScaler does not guarantee balanced feature scales, because outliers influence the empirical mean and standard deviation. This shrinks the range of the remaining feature values.
RobustScaler addresses this by centering on the median and scaling by the interquartile range, statistics that are far less sensitive to outliers. It does not remove the outliers themselves; they remain in the transformed data.

In [8]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# transform data
df3[['Marketing Spend','Administration','Transport']]=scaler.fit_transform(df3[['Marketing Spend','Administration','Transport']])
df3.head()

Unnamed: 0,Marketing Spend,Administration,Transport,Area,Profit
0,0.897913,0.560753,2.165287,Dhaka,192261.83
1,1.95586,1.082807,1.929843,Ctg,191792.06
2,1.754364,-0.728257,1.626191,Rangpur,191050.39
3,1.554784,-0.096365,1.417348,Dhaka,182901.99
4,1.504937,-1.079919,1.27355,Rangpur,166187.94


# Log Transformation

The natural logarithm transformation can help make right-skewed data more normally distributed and stabilize its variance.

The log transform can be applied as follows:
1. Check if the feature has any zero or negative values. If so, consider using a modified version of the log transform (e.g., adding a constant value or using the logarithm of the absolute values).
2. Add a small constant value (e.g., 1) to the feature before applying the logarithm. This avoids taking the logarithm of zero or near-zero values, which would produce undefined or very large negative results.
3. Apply the natural logarithm function (base e) to each value of the feature.
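The steps above can be sketched with NumPy alone. The feature values are made up for illustration, and the added constant is assumed to be 1, which is what `np.log1p` uses.

```python
import numpy as np

# Illustrative right-skewed feature containing a zero (assumed values)
feature = np.array([0.0, 9.0, 99.0, 999.0])

# Step 1: check for negative values, which log(1 + x) cannot handle below -1
if (feature < 0).any():
    raise ValueError("Feature contains negative values; shift or transform it first.")

# Steps 2 and 3: add the constant 1 and take the natural logarithm in one call
log_feature = np.log1p(feature)

# Equivalent explicit form of the same two steps
log_feature_explicit = np.log(feature + 1.0)

print(log_feature)
```

The zero entry maps to 0 rather than negative infinity, and `np.expm1` inverts the transformation if the original scale is needed later.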

In [9]:
from sklearn.preprocessing import FunctionTransformer
# np.log1p computes log(1 + x), so a value of 0 maps to 0 instead of -inf
ft = FunctionTransformer(np.log1p)

In [10]:
df4[['Marketing Spend','Administration','Transport','Profit']]=ft.fit_transform(df4[['Marketing Spend','Administration','Transport','Profit']])
df4.head()

Unnamed: 0,Marketing Spend,Administration,Transport,Area,Profit
0,11.648545,11.826997,13.064279,Dhaka,12.166619
1,11.99904,11.927539,13.003354,Ctg,12.164172
2,11.941081,11.524326,12.918864,Rangpur,12.160298
3,11.880158,11.684126,12.856314,Dhaka,12.116711
4,11.864345,11.422922,12.810851,Rangpur,12.020881


# Max Absolute Scaler

Scale each feature by its maximum absolute value.

In simplest terms, the Max Absolute Scaler finds the maximum absolute value of each column and divides every value in that column by it. The scaled values therefore fall in the range [-1, 1].

![2023-05-25 20_23_20-Class 05.pdf and 4 more pages - Profile 1 - Microsoft​ Edge.png](attachment:ed3578d8-c6df-41f6-9270-c2c0244c7649.png)
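As a minimal sketch of this rule, the snippet below divides each column by its maximum absolute value by hand and compares the result with scikit-learn's `MaxAbsScaler`. The input matrix is made up for illustration and includes negative values to show that signs are preserved.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Illustrative two-column data with mixed signs (assumed values)
values = np.array([[-4.0, 10.0],
                   [2.0, 50.0],
                   [1.0, -100.0]])

# Maximum absolute value of each column
column_max_abs = np.abs(values).max(axis=0)

# Manual scaling: divide every value by its column's maximum absolute value
manual_scaled = values / column_max_abs

# The same operation with scikit-learn
sklearn_scaled = MaxAbsScaler().fit_transform(values)

print(manual_scaled)
```

Unlike min-max scaling, this does not shift the data, so zeros stay at zero.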

In [11]:
#df['column_name'].dtype != 'object'

from sklearn.preprocessing import MaxAbsScaler
# Named max_abs_scaler rather than max to avoid shadowing Python's built-in max()
max_abs_scaler = MaxAbsScaler()
df5[['Marketing Spend','Administration','Transport']]=max_abs_scaler.fit_transform(df5[['Marketing Spend','Administration','Transport']])
df5.head()

Unnamed: 0,Marketing Spend,Administration,Transport,Area,Profit
0,0.692617,0.749527,1.0,Dhaka,192261.83
1,0.983359,0.828805,0.940893,Ctg,191792.06
2,0.927985,0.553781,0.864664,Rangpur,191050.39
3,0.873136,0.649738,0.812235,Dhaka,182901.99
4,0.859438,0.500378,0.776136,Rangpur,166187.94


In [12]:
# Scale every non-object column, including Profit.
# The scaler expects 2D input, hence df5[[col]] rather than df5[col].
for col in df5.columns:
    if df5[col].dtype == 'object':
        continue
    df5[col] = max_abs_scaler.fit_transform(df5[[col]])
df5.head()

Unnamed: 0,Marketing Spend,Administration,Transport,Area,Profit
0,0.692617,0.749527,1.0,Dhaka,1.0
1,0.983359,0.828805,0.940893,Ctg,0.997557
2,0.927985,0.553781,0.864664,Rangpur,0.993699
3,0.873136,0.649738,0.812235,Dhaka,0.951317
4,0.859438,0.500378,0.776136,Rangpur,0.864383


# Robust Scaler

Both standard and robust scalers transform inputs to comparable scales. The difference lies in how they scale raw input values.
Standard scaling uses mean and standard deviation. Robust scaling uses median and interquartile range (IQR) instead.
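
To make the contrast concrete, here is a small sketch on made-up data containing one outlier. It computes the median-and-IQR scaling by hand, assuming the 25th and 75th percentiles (scikit-learn's default quantile range), and compares it with `RobustScaler` and `StandardScaler`.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# Illustrative single-column data with one large outlier (assumed values)
values = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Median and interquartile range (75th minus 25th percentile) per column
median = np.median(values, axis=0)
q1, q3 = np.percentile(values, [25, 75], axis=0)
iqr = q3 - q1

# Manual robust scaling: distance from the median in units of the IQR
manual_robust = (values - median) / iqr

# scikit-learn equivalents for comparison
robust_scaled = RobustScaler().fit_transform(values)
standard_scaled = StandardScaler().fit_transform(values)

print(manual_robust.ravel())
print(standard_scaled.ravel())
```

The outlier is still present after robust scaling, but the four typical values keep a usable spread, whereas standard scaling squeezes them close together because the outlier inflates the mean and standard deviation.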

Robust scaling answers a simple question: how far is each data point from the input’s median? More precisely, it measures this distance in units of the IQR, using the formula below:
