<a href="https://colab.research.google.com/github/mostafizur1997/Machine-Learning-Project/blob/main/Feature_transformation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Feature transformation
Feature transformation modifies existing features, or derives new ones from them, so that machine learning models train more easily and perform better. This notebook covers one family of transformations: feature scaling.

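Advantages of feature scaling:
* Makes training faster, because gradient-based optimizers converge in fewer steps
* Can reduce the risk of the optimizer getting stuck in a poor local optimum
* Gives the error surface a more regular shape, so no single feature dominates the gradient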

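Feature scaling matters for K-Means, K-Nearest Neighbours, Principal Component Analysis, and models trained with gradient descent, because they depend on distances, variances, or gradient magnitudes.

Naive Bayes, decision trees, random forests, and other tree-based models are largely unaffected by feature scaling.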
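
To make the point about distance-based methods concrete, here is a minimal sketch in plain Python. The three shops and their values are hypothetical, and the helper names `euclidean` and `min_max_columns` are made up for this illustration; they are not part of the dataset or of scikit-learn.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def min_max_columns(points):
    """Rescale every column of a list of points to the range [0, 1]."""
    columns = list(zip(*points))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((x - lo) / (hi - lo) for x, lo, hi in zip(p, lows, highs))
        for p in points
    ]

# Hypothetical shops: (marketing spend, number of branches).
a, b, c = (100000.0, 2.0), (150000.0, 10.0), (101000.0, 10.0)

# Raw values: the spend column is so large that it decides the distance alone.
raw_ac, raw_bc = euclidean(a, c), euclidean(b, c)

# Scaled values: both columns contribute on the same footing.
sa, sb, sc = min_max_columns([a, b, c])
scaled_ac, scaled_bc = euclidean(sa, sc), euclidean(sb, sc)

print(raw_ac < raw_bc)        # True: on raw values, c's nearest neighbour is a
print(scaled_bc < scaled_ac)  # True: after scaling, c's nearest neighbour is b
```

On the raw values the number of branches is effectively ignored, so a nearest-neighbour search would pair `c` with `a`. After scaling, the shared branch count counts as much as the spend, and the nearest neighbour changes.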


In [1]:
#import library
import pandas as pd

In [2]:
#read the data
df= pd.read_csv('/content/supershops.csv')

In [3]:
#check the data
df.head()

Unnamed: 0,Marketing Spend,Administration,Transport,Area,Profit
0,114523.61,136897.8,471784.1,Dhaka,192261.83
1,162597.7,151377.59,443898.53,Ctg,191792.06
2,153441.51,101145.55,407934.54,Rangpur,191050.39
3,144372.41,118671.85,383199.62,Dhaka,182901.99
4,142107.34,91391.77,366168.42,Rangpur,166187.94


In [4]:
df.shape

(50, 5)

In [5]:
#copy the data
df1= df.copy()
df2= df.copy()
df3= df.copy()
df4= df.copy()
df5= df.copy()


# Normalization
Normalization (min-max scaling) rescales each feature to a fixed range, usually [0, 1], so that features measured on very different scales become comparable. It is widely used in statistics, data analysis, and machine learning.

Formula: $x' = \dfrac{x - x_{min}}{x_{max} - x_{min}}$
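
As a quick check of the formula, the sketch below applies it by hand in plain Python. The function name `min_max_scale` and the sample values are assumptions made for illustration; the `low` and `high` arguments play the role of `feature_range` in `MinMaxScaler`.

```python
def min_max_scale(values, low=0.0, high=1.0):
    """Map values linearly so that min(values) -> low and max(values) -> high."""
    x_min, x_max = min(values), max(values)
    return [low + (x - x_min) * (high - low) / (x_max - x_min) for x in values]

spend = [20.0, 35.0, 50.0, 80.0]  # hypothetical values, not taken from the dataset
scaled = min_max_scale(spend)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

The smallest value always maps to 0 and the largest to 1, and everything in between keeps its relative position.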

In [6]:
#import MinMaxScaler library
from sklearn.preprocessing import MinMaxScaler
ms = MinMaxScaler(feature_range =(0,1))


In [7]:
#check data head
df.head()

Unnamed: 0,Marketing Spend,Administration,Transport,Area,Profit
0,114523.61,136897.8,471784.1,Dhaka,192261.83
1,162597.7,151377.59,443898.53,Ctg,191792.06
2,153441.51,101145.55,407934.54,Rangpur,191050.39
3,144372.41,118671.85,383199.62,Dhaka,182901.99
4,142107.34,91391.77,366168.42,Rangpur,166187.94


In [8]:
# Apply transformation
df['Marketing Spend'] = ms.fit_transform(df[['Marketing Spend']])
df['Administration']=ms.fit_transform(df[['Administration']])
df['Transport']=ms.fit_transform(df[['Transport']])

In [9]:
# after fit_transform
df.head()

Unnamed: 0,Marketing Spend,Administration,Transport,Area,Profit
0,0.692617,0.651744,1.0,Dhaka,192261.83
1,0.983359,0.761972,0.940893,Ctg,191792.06
2,0.927985,0.379579,0.864664,Rangpur,191050.39
3,0.873136,0.512998,0.812235,Dhaka,182901.99
4,0.859438,0.305328,0.776136,Rangpur,166187.94
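
The cell above refits `ms` once per column, so after the last call the scaler only remembers the minimum and maximum of `Transport`. A possible alternative, sketched here on a small hypothetical DataFrame (`demo`, `cols`, and `ms_all` are names invented for this example), is to fit one scaler on all numeric columns at once. scikit-learn scalers work column by column, so the scaled values are the same, and the fitted object keeps the statistics of every column.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the numeric columns of the dataset.
demo = pd.DataFrame({
    "Marketing Spend": [10.0, 20.0, 30.0],
    "Administration": [5.0, 15.0, 10.0],
    "Transport": [0.0, 50.0, 100.0],
})

cols = ["Marketing Spend", "Administration", "Transport"]
ms_all = MinMaxScaler(feature_range=(0, 1))

# One fit_transform call scales each column independently.
demo[cols] = ms_all.fit_transform(demo[cols])

print(demo)
print(ms_all.data_min_)  # per-column minimum seen during fit
print(ms_all.data_max_)  # per-column maximum seen during fit
```

Keeping a single fitted scaler is convenient when the same transformation has to be applied later to new rows with `ms_all.transform(...)` or undone with `ms_all.inverse_transform(...)`.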


# Standardization
Standardization (z-score scaling) rescales a feature so that it has a mean of 0 and a standard deviation of 1. This makes features with different units and spreads directly comparable.

Formula:

$$z_i = \frac{x_i - \mu}{\sigma}$$

where $\mu$ is the mean of the feature and $\sigma$ is its standard deviation:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$
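
The two formulas can be checked by hand with Python's standard `statistics` module. This is only a sketch on a hypothetical sample; `statistics.pstdev` is used because it divides by N, like the formula above.

```python
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample

mean = statistics.fmean(values)   # arithmetic mean
std = statistics.pstdev(values)   # population standard deviation (divides by N)

z = [(x - mean) / std for x in values]

print(mean, std)  # 5.0 2.0
print(z)          # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

`StandardScaler` also uses the population standard deviation, whereas pandas `Series.std()` defaults to the sample version (`ddof=1`), so the two give slightly different numbers on a small dataset unless `ddof=0` is passed.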



In [10]:
#check data 
df1.head()

Unnamed: 0,Marketing Spend,Administration,Transport,Area,Profit
0,114523.61,136897.8,471784.1,Dhaka,192261.83
1,162597.7,151377.59,443898.53,Ctg,191792.06
2,153441.51,101145.55,407934.54,Rangpur,191050.39
3,144372.41,118671.85,383199.62,Dhaka,182901.99
4,142107.34,91391.77,366168.42,Rangpur,166187.94


In [11]:
#import Standard Scaler 
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler() #scaler object
df1['Marketing Spend']=scaler.fit_transform(df1[['Marketing Spend']])


In [12]:
df1.head()

Unnamed: 0,Marketing Spend,Administration,Transport,Area,Profit
0,0.897913,136897.8,471784.1,Dhaka,192261.83
1,1.95586,151377.59,443898.53,Ctg,191792.06
2,1.754364,101145.55,407934.54,Rangpur,191050.39
3,1.554784,118671.85,383199.62,Dhaka,182901.99
4,1.504937,91391.77,366168.42,Rangpur,166187.94


In [13]:
#fit_transform with StandardScaler on the remaining columns
df1['Administration']=scaler.fit_transform(df1[['Administration']])
df1['Transport']=scaler.fit_transform(df1[['Transport']])

In [14]:
#check head
df1.head()

Unnamed: 0,Marketing Spend,Administration,Transport,Area,Profit
0,0.897913,0.560753,2.165287,Dhaka,192261.83
1,1.95586,1.082807,1.929843,Ctg,191792.06
2,1.754364,-0.728257,1.626191,Rangpur,191050.39
3,1.554784,-0.096365,1.417348,Dhaka,182901.99
4,1.504937,-1.079919,1.27355,Rangpur,166187.94


# Max Absolute Scaler
MaxAbsScaler divides each value by the largest absolute value of its feature: $x'_i = x_i / \max_j |x_j|$. The result lies between -1 and 1; if the data has no negative values, it lies between 0 and 1. Because the scale is set by the single most extreme value, this scaler is sensitive to outliers.
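
A minimal hand-written version of the formula, on hypothetical values, shows both the behaviour with negative numbers and the sensitivity to a single extreme value. The helper name `max_abs_scale` is an assumption for this sketch.

```python
def max_abs_scale(values):
    """Divide every value by the largest absolute value in the list."""
    max_abs = max(abs(x) for x in values)
    return [x / max_abs for x in values]

# Negative values: the result stays within [-1, 1] and zero stays zero.
print(max_abs_scale([-50.0, -10.0, 0.0, 25.0]))  # [-1.0, -0.2, 0.0, 0.5]

# One extreme value squeezes all the other values close to zero.
print(max_abs_scale([1.0, 2.0, 3.0, 100.0]))     # [0.01, 0.02, 0.03, 1.0]
```

Since the data is only divided and never shifted, zeros remain zeros, which is why this scaler is often chosen for sparse data.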


In [15]:
#import library
from sklearn.preprocessing import MaxAbsScaler

In [16]:
#create the MaxAbsScaler object
MAbs = MaxAbsScaler()


In [17]:
# fit_transform with MaxAbsScaler (the MAbs object created above)
df2['Marketing Spend']=MAbs.fit_transform(df2[['Marketing Spend']])
df2['Administration']=MAbs.fit_transform(df2[['Administration']])
df2['Transport']=MAbs.fit_transform(df2[['Transport']])

In [18]:
#check head data: each scaled column now has a maximum absolute value of 1
df2.head()


# Robust Scaler
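RobustScaler computes $x'_i = (x_i - x_{median}) / IQR$, where IQR is the interquartile range $Q_3 - Q_1$, the span that contains the middle half of the data points. Because the median and the IQR barely move when a few extreme values are present, outliers have little influence on the scaling, although the outliers themselves remain in the scaled data.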
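
The sketch below compares the robust formula with min-max scaling on a hypothetical list that ends in one outlier. It uses `statistics.quantiles` with `method="inclusive"` from the standard library to get the quartiles; the sample values are made up for illustration.

```python
import statistics

# Hypothetical values; the last one is an outlier.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 1000.0]

# Quartiles Q1, median, Q3 and the interquartile range.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

robust = [(x - median) / iqr for x in values]

# Min-max scaling of the same values, for comparison.
x_min, x_max = min(values), max(values)
min_max = [(x - x_min) / (x_max - x_min) for x in values]

print(robust[:8])            # [-1.0, -0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75]
print(robust[-1])            # 248.75
print(round(min_max[7], 4))  # 0.007
```

With the robust formula the eight ordinary values keep a usable spread, while min-max scaling compresses them into less than 1% of the range. The outlier is not removed in either case; it is simply prevented from setting the scale.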

In [19]:
#import RobustScaler library
from sklearn.preprocessing import RobustScaler
rs=RobustScaler()


In [20]:
df3['Marketing Spend']=rs.fit_transform(df3[['Marketing Spend']])
df3['Administration']=rs.fit_transform(df3[['Administration']])
df3['Transport']=rs.fit_transform(df3[['Transport']])

In [21]:
df3.head()

Unnamed: 0,Marketing Spend,Administration,Transport,Area,Profit
0,0.67253,0.345355,1.552016,Dhaka,192261.83
1,1.452113,0.697565,1.383714,Ctg,191792.06
2,1.303634,-0.52429,1.166654,Rangpur,191050.39
3,1.156567,-0.097977,1.017368,Dhaka,182901.99
4,1.119836,-0.761543,0.914576,Rangpur,166187.94
