# What is FunctionTransformer in machine learning?

FunctionTransformer is a tool in scikit-learn, a popular Python library for machine learning, that lets you apply a function of your choice to the input data. It is useful for performing custom transformations of input data in a machine learning pipeline.
FunctionTransformer takes a single callable, passed as the func argument, and calls it on the input data as a whole (the full array or DataFrame), not once per sample. The callable can be any Python function that accepts the data as its first argument, such as a NumPy function, a lambda function, or a user-defined function, and it should return the transformed data.
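
A quick way to see what the wrapped function actually receives is to record the shape of its argument. The sketch below is illustrative only; record_shape and seen_shapes are hypothetical names introduced here and are not part of scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

seen_shapes = []  # hypothetical helper list, used only for this demonstration

def record_shape(X):
    # Remember the shape of whatever FunctionTransformer hands to the function
    seen_shapes.append(np.shape(X))
    return X

X = np.array([[1, 2], [3, 4]])
FunctionTransformer(record_shape).transform(X)
print(seen_shapes)
```

Every recorded shape is that of the full 2x2 array rather than of a single row, which is why vectorized NumPy functions are a natural fit for FunctionTransformer.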

In [3]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

In [5]:
X=np.array([[1,2],[3,4]])
# Wrap np.log1p, which computes log(1 + x) element-wise, in a transformer
log_transform=FunctionTransformer(np.log1p)
# Apply the transformation to the data
X_transformed=log_transform.transform(X)
# View the transformed data
print(X_transformed)

[[0.69314718 1.09861229]
 [1.38629436 1.60943791]]


# Types of function transformers in machine learning

scikit-learn offers two transformers that are commonly used for this kind of custom preprocessing:
FunctionTransformer (in sklearn.preprocessing) - This transformer wraps a single function that is applied to the entire input data matrix. It is useful for stateless feature scaling or feature extraction.

ColumnTransformer (in sklearn.compose) - This transformer is not a variant of FunctionTransformer. It applies a different transformer to each column or subset of columns in the input data matrix, and any of those transformers can be a FunctionTransformer. It is useful for applying different transformations to different features in a dataset.

Both of these transformers are part of the scikit-learn library in Python and can be used in a machine learning pipeline to preprocess data before training a model.
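
As a minimal sketch of how the two can be combined, the example below wraps np.log1p in a FunctionTransformer and uses ColumnTransformer to apply it to the first column only. The tiny array and the step name "log" are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer

X = np.array([[1.0, 2.0], [3.0, 4.0]])

# Apply log1p to column 0 only; column 1 is passed through unchanged
ct = ColumnTransformer(
    transformers=[("log", FunctionTransformer(np.log1p), [0])],
    remainder="passthrough",
)
X_mixed = ct.fit_transform(X)
print(X_mixed)
```

In the result, the transformed column comes first and the passed-through column follows it.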


# When should I use FunctionTransformer in machine learning?

You might consider using a FunctionTransformer in a machine learning pipeline in the following situations:
Custom feature engineering: If you want to engineer new features using a custom function, you can use a FunctionTransformer to apply the function to the input data matrix and create new features based on the output.

Scaling and normalization: If you want to scale or normalize the input data matrix in a custom way, you can use a FunctionTransformer to apply a custom scaling or normalization function.

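Data cleaning: If you want to clean the input data matrix by imputing missing values, clipping outliers, or replacing certain values, you can use a FunctionTransformer to apply a custom cleaning function. The function should keep the number of rows unchanged, because inside a pipeline the target values are not passed to it and would no longer line up with the data if rows were dropped.

Dimensionality reduction: If you want to reduce the dimensionality of the input data matrix by selecting a subset of features, you can use a FunctionTransformer to apply the selection function. Because FunctionTransformer is stateless and learns nothing during fit, techniques that must be fitted on training data, such as PCA, are better used through their own estimators (for example sklearn.decomposition.PCA).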

In general, a FunctionTransformer can be useful for any situation in which you want to apply a custom function to the input data matrix before training a machine learning model.
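
To make the "before training a model" part concrete, here is a minimal sketch of a FunctionTransformer used as the first step of a Pipeline. The data is synthetic and made up for illustration, and LinearRegression is only an assumed choice of model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Synthetic data (made up for illustration): y depends linearly on log(1 + x)
X = np.array([[1.0], [3.0], [7.0], [15.0], [31.0]])
y = 2.0 * np.log1p(X[:, 0]) + 1.0

pipe = Pipeline([
    ("log", FunctionTransformer(np.log1p)),  # custom preprocessing step
    ("model", LinearRegression()),           # model trained on the transformed data
])
pipe.fit(X, y)
predictions = pipe.predict(X)
print(predictions)
```

Because the transformation lives inside the pipeline, the same log1p step is applied automatically at prediction time.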

In [14]:
# Practical use cases
# 1. Custom feature engineering

from sklearn.preprocessing import FunctionTransformer
import numpy as np
X=np.array([[1,2],[3,4]])

# Define a custom feature engineering function that appends squared features
def my_feature_engineering(X):
    return np.hstack((X,X**2))
# Create a FunctionTransformer to apply the custom function
custom_transformer=FunctionTransformer(my_feature_engineering)
# Apply the transformer to the input data
X_transformed=custom_transformer.transform(X)
print(X_transformed)

[[ 1  2  1  4]
 [ 3  4  9 16]]


In [17]:
#2. Scaling and Normalization
from sklearn.preprocessing import FunctionTransformer
import numpy as np
X=np.array([[1,2],[3,4]])

# Define a custom scaling function that divides by the global maximum
def my_scaling(X):
    return X / np.max(X)
# Create a FunctionTransformer to apply the scaling function
custom_transformer=FunctionTransformer(my_scaling)
# Apply the transformer to the input data
X_transformed=custom_transformer.transform(X)
print(X_transformed)

[[0.25 0.5 ]
 [0.75 1.  ]]


In [18]:
#3. Data Cleaning
from sklearn.preprocessing import FunctionTransformer
import numpy as np
X=np.array([[1,2],[3,np.nan]])

# Define a custom cleaning function that replaces NaN with 0
def my_cleaning(X):
    # np.where returns a new array, so the caller's data is not modified in place
    return np.where(np.isnan(X), 0, X)
# Create a FunctionTransformer to apply the custom function
custom_transformer=FunctionTransformer(my_cleaning)
# Apply the transformer to the input data
X_transformed=custom_transformer.transform(X)
print(X_transformed)

[[1. 2.]
 [3. 0.]]
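
The dimensionality reduction case listed earlier can be sketched in the same style by selecting a subset of columns. This is a minimal illustration, not a definitive recipe: select_columns is a hypothetical helper, and the column indices are arbitrary. The kw_args parameter of FunctionTransformer forwards extra keyword arguments to the wrapped function.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

X = np.array([[1, 2, 3], [4, 5, 6]])

# Hypothetical helper: keep only the requested columns
def select_columns(X, columns):
    return X[:, columns]

# kw_args forwards extra keyword arguments to the wrapped function
selector = FunctionTransformer(select_columns, kw_args={"columns": [0, 2]})
X_reduced = selector.transform(X)
print(X_reduced)
```

The result keeps the first and third columns and drops the second.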


# FunctionTransformer on covid_toy

In [20]:
df=pd.read_csv("C:\\Users\\user\\Datasets\\covid_toy.csv")

In [22]:
df.head()

Unnamed: 0,age,gender,fever,cough,city,has_covid
0,60,Male,103.0,Mild,Kolkata,No
1,27,Male,100.0,Mild,Delhi,Yes
2,42,Male,101.0,Mild,Delhi,No
3,31,Female,98.0,Mild,Kolkata,No
4,65,Female,101.0,Mild,Mumbai,No


In [23]:
from sklearn.preprocessing import LabelEncoder

In [25]:
lb=LabelEncoder()

In [26]:
df['gender']=lb.fit_transform(df['gender'])
df['cough']=lb.fit_transform(df['cough'])
df['city']=lb.fit_transform(df['city'])
df['has_covid']=lb.fit_transform(df['has_covid'])

In [27]:
df.head()

Unnamed: 0,age,gender,fever,cough,city,has_covid
0,60,1,103.0,0,2,0
1,27,1,100.0,0,1,1
2,42,1,101.0,0,1,0
3,31,0,98.0,0,2,0
4,65,0,101.0,0,3,0


In [29]:
from sklearn.preprocessing import FunctionTransformer
trf=FunctionTransformer(func=np.log1p)

In [31]:
new_df=trf.fit_transform(df)

In [32]:
new_df.head()

Unnamed: 0,age,gender,fever,cough,city,has_covid
0,4.110874,0.693147,4.644391,0.0,1.098612,0.0
1,3.332205,0.693147,4.615121,0.0,0.693147,0.693147
2,3.7612,0.693147,4.624973,0.0,0.693147,0.0
3,3.465736,0.0,4.59512,0.0,1.098612,0.0
4,4.189655,0.0,4.624973,0.0,1.386294,0.0
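
The cell above applies log1p to every column, including the label-encoded categories and the has_covid target, where a logarithm has no natural meaning. One possible alternative, sketched here under the assumption that only age and fever should be transformed, is to pass just those columns to the transformer. The small DataFrame is a stand-in built from the five encoded rows shown earlier, not the full covid_toy file.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

# Stand-in for covid_toy.csv: the five label-encoded rows shown by df.head()
df_small = pd.DataFrame({
    "age": [60, 27, 42, 31, 65],
    "gender": [1, 1, 1, 0, 0],
    "fever": [103.0, 100.0, 101.0, 98.0, 101.0],
    "cough": [0, 0, 0, 0, 0],
    "city": [2, 1, 1, 2, 3],
    "has_covid": [0, 1, 0, 0, 0],
})

numeric_cols = ["age", "fever"]  # assumption: only these are truly numeric
trf = FunctionTransformer(func=np.log1p)

# Transform the numeric columns and leave the encoded columns untouched
df_logged = df_small.copy()
df_logged[numeric_cols] = trf.fit_transform(df_small[numeric_cols])
print(df_logged.head())
```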


# FunctionTransformer on tips

In [35]:
df=pd.read_csv("C:\\Users\\user\\Datasets\\tips.csv")
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


In [34]:
lb=LabelEncoder()

In [37]:
df['sex']=lb.fit_transform(df['sex'])
df['smoker']=lb.fit_transform(df['smoker'])
df['day']=lb.fit_transform(df['day'])
df['time']=lb.fit_transform(df['time'])

In [38]:
from sklearn.preprocessing import FunctionTransformer
trf=FunctionTransformer(func=np.log1p) # log1p computes log(1 + x)

In [39]:
new_df=trf.fit_transform(df)
new_df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,2.889816,0.698135,0.0,0.0,1.098612,0.0,1.098612
1,2.428336,0.978326,0.693147,0.0,1.098612,0.0,1.386294
2,3.091497,1.504077,0.693147,0.0,1.098612,0.0,1.386294
3,3.205993,1.460938,0.693147,0.0,1.098612,0.0,1.098612
4,3.242202,1.528228,0.0,0.0,1.098612,0.0,1.609438
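
A log transform like this one is reversible, which matters when values need to be reported on their original scale. FunctionTransformer accepts an inverse_func argument for that purpose. The sketch below uses the five total_bill values shown in the first df.head() output of this section; pairing np.log1p with np.expm1 is a choice made here for illustration.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# total_bill values from the first five rows of the tips data shown earlier
total_bill = np.array([[16.99], [10.34], [21.01], [23.68], [24.59]])

# np.expm1 computes exp(x) - 1, the inverse of np.log1p
trf = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)

logged = trf.fit_transform(total_bill)
restored = trf.inverse_transform(logged)
print(restored)
```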


# Assignment: FunctionTransformer on click

In [42]:
df=pd.read_csv("C:\\Users\\user\\Datasets\\click.csv")
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 10 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Daily Time Spent on Site  10000 non-null  float64
 1   Age                       10000 non-null  float64
 2   Area Income               10000 non-null  float64
 3   Daily Internet Usage      10000 non-null  float64
 4   Ad Topic Line             10000 non-null  object 
 5   City                      10000 non-null  object 
 6   Gender                    10000 non-null  object 
 7   Country                   10000 non-null  object 
 8   Timestamp                 10000 non-null  object 
 9   Clicked on Ad             10000 non-null  int64  
dtypes: float64(4), int64(1), object(5)
memory usage: 781.4+ KB


In [43]:
df.head()

Unnamed: 0,Daily Time Spent on Site,Age,Area Income,Daily Internet Usage,Ad Topic Line,City,Gender,Country,Timestamp,Clicked on Ad
0,62.26,32.0,69481.85,172.83,Decentralized real-time circuit,Lisafort,Male,Svalbard & Jan Mayen Islands,2016-06-09 21:43:05,0
1,41.73,31.0,61840.26,207.17,Optional full-range projection,West Angelabury,Male,Singapore,2016-01-16 17:56:05,0
2,44.4,30.0,57877.15,172.83,Total 5thgeneration standardization,Reyesfurt,Female,Guadeloupe,2016-06-29 10:50:45,0
3,59.88,28.0,56180.93,207.17,Balanced empowering success,New Michael,Female,Zambia,2016-06-21 14:32:32,0
4,49.21,30.0,54324.73,201.58,Total 5thgeneration standardization,West Richard,Female,Qatar,2016-07-21 10:54:35,1


In [45]:
df=df.drop(columns=['Timestamp'])

In [46]:
df.head()

Unnamed: 0,Daily Time Spent on Site,Age,Area Income,Daily Internet Usage,Ad Topic Line,City,Gender,Country,Clicked on Ad
0,62.26,32.0,69481.85,172.83,Decentralized real-time circuit,Lisafort,Male,Svalbard & Jan Mayen Islands,0
1,41.73,31.0,61840.26,207.17,Optional full-range projection,West Angelabury,Male,Singapore,0
2,44.4,30.0,57877.15,172.83,Total 5thgeneration standardization,Reyesfurt,Female,Guadeloupe,0
3,59.88,28.0,56180.93,207.17,Balanced empowering success,New Michael,Female,Zambia,0
4,49.21,30.0,54324.73,201.58,Total 5thgeneration standardization,West Richard,Female,Qatar,1


In [55]:
df['Ad Topic Line']=lb.fit_transform(df['Ad Topic Line'])
df['City']=lb.fit_transform(df['City'])
df['Gender']=lb.fit_transform(df['Gender'])
df['Country']=lb.fit_transform(df['Country'])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 10 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Daily Time Spent on Site  10000 non-null  float64
 1   Age                       10000 non-null  float64
 2   Area Income               10000 non-null  float64
 3   Daily Internet Usage      10000 non-null  float64
 4   Ad Topic Line             10000 non-null  int32  
 5   City                      10000 non-null  int64  
 6   Gender                    10000 non-null  int64  
 7   Country                   10000 non-null  int64  
 8   Clicked on Ad             10000 non-null  int64  
 9   Ad Tpoic Line             10000 non-null  int32  
dtypes: float64(4), int32(2), int64(4)
memory usage: 703.3 KB


In [56]:
from sklearn.preprocessing import FunctionTransformer
trf=FunctionTransformer(func=np.log1p) # log1p computes log(1 + x)
new_df=trf.fit_transform(df)
new_df.head()

Unnamed: 0,Daily Time Spent on Site,Age,Area Income,Daily Internet Usage,Ad Topic Line,City,Gender,Country,Clicked on Ad,Ad Tpoic Line
0,4.147253,3.496508,11.148835,5.158078,4.574711,5.459586,0.693147,5.164786,0.0,4.574711
1,3.754901,3.465736,11.032326,5.338355,5.710427,6.133398,0.693147,5.117994,0.0,5.710427
2,3.815512,3.433987,10.966095,5.158078,6.184149,5.940171,0.0,4.276666,0.0,6.184149
3,4.108905,3.367296,10.93635,5.338355,3.218876,5.598422,0.0,5.327876,0.0,3.218876
4,3.916214,3.433987,10.902753,5.311135,6.184149,6.206576,0.0,5.010635,0.693147,6.184149


# Real-life use cases of FunctionTransformer
There are many real-life use cases where FunctionTransformer can be useful in machine learning pipelines. Here are a few examples:

Image processing: In computer vision applications, FunctionTransformer can be used to apply custom functions to preprocess image data. For example, a custom function can be used to resize images, change the color balance, or apply filters to improve image quality.

Natural language processing: In NLP applications, FunctionTransformer can be used to preprocess text data by applying custom functions to perform tasks such as tokenization, stemming, or removing stop words.

Financial modeling: In finance, FunctionTransformer can be used to preprocess financial data by applying custom functions to transform the data, such as scaling stock prices, normalizing financial ratios, or imputing missing values.

Audio signal processing: In speech recognition or music analysis applications, FunctionTransformer can be used to preprocess audio data by applying custom functions to perform tasks such as filtering noise, extracting features such as MFCCs (Mel-frequency cepstral coefficients), or resampling the audio signal.

Sensor data processing: In Internet of Things (IoT) applications, FunctionTransformer can be used to preprocess sensor data by applying custom functions to remove outliers, impute missing values, or rescale sensor readings.
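
As one small illustration of the text preprocessing case, the sketch below wraps a simple normalization function that lowercases each document and collapses repeated whitespace. normalize_text and the sample strings are hypothetical and made up for this example; real NLP pipelines usually need more than this.

```python
from sklearn.preprocessing import FunctionTransformer

# Hypothetical helper: lowercase each document and collapse extra whitespace
def normalize_text(docs):
    return [" ".join(doc.lower().split()) for doc in docs]

text_cleaner = FunctionTransformer(normalize_text)

docs = ["  Hello   World ", "FunctionTransformer IS  Stateless"]
cleaned = text_cleaner.transform(docs)
print(cleaned)
```

The cleaned documents can then be passed to a later pipeline step such as a vectorizer.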