<a href="https://colab.research.google.com/github/mukeshyadav4747/ML/blob/main/One_Hot_Encoding_and_column_Transformer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Column Transformer


In [None]:


import numpy as np
import pandas as pd

from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import OrdinalEncoder

In [None]:
df = pd.read_csv("/content/covid_toy.csv")
df

Unnamed: 0,age,gender,fever,cough,city,has_covid
0,60,Male,103.0,Mild,Kolkata,No
1,27,Male,100.0,Mild,Delhi,Yes
2,42,Male,101.0,Mild,Delhi,No
3,31,Female,98.0,Mild,Kolkata,No
4,65,Female,101.0,Mild,Mumbai,No
...,...,...,...,...,...,...
95,12,Female,104.0,Mild,Bangalore,No
96,51,Female,101.0,Strong,Kolkata,Yes
97,20,Female,101.0,Mild,Bangalore,No
98,5,Female,98.0,Strong,Mumbai,No


In [None]:
df.isnull().sum()

Unnamed: 0,0
age,0
gender,0
fever,10
cough,0
city,0
has_covid,0


In [None]:
from sklearn.model_selection import train_test_split

In [None]:
x_train, x_test, y_train, y_test = train_test_split(df.drop(columns = ['has_covid']),df['has_covid'],test_size=0.2)

In [None]:
x_train

Unnamed: 0,age,gender,fever,cough,city
73,34,Male,98.0,Strong,Kolkata
3,31,Female,98.0,Mild,Kolkata
63,10,Male,100.0,Mild,Bangalore
92,82,Female,102.0,Strong,Kolkata
24,13,Female,100.0,Strong,Kolkata
...,...,...,...,...,...
25,23,Male,,Mild,Mumbai
97,20,Female,101.0,Mild,Bangalore
5,84,Female,,Mild,Bangalore
77,8,Female,101.0,Mild,Kolkata


In [None]:
# Manual approach: transform each column separately

# impute missing values in the fever column with the training mean
si = SimpleImputer(strategy='mean')
x_train_fever = si.fit_transform(x_train[['fever']])

# apply the already-fitted imputer to the test data (transform only, do not refit)
x_test_fever = si.transform(x_test[['fever']])

x_train_fever.shape

(80, 1)
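A minimal sketch of how an imputer fitted on training data behaves on test data. The arrays are invented for illustration and are not taken from covid_toy.csv. The fill value is learned once from the training set and reused for the test set.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical fever readings (not from the real dataset)
train = np.array([[98.0], [100.0], [np.nan], [102.0]])
test = np.array([[np.nan], [104.0]])

si_demo = SimpleImputer(strategy="mean")
si_demo.fit(train)                 # learns the mean of the non-missing training values
print(si_demo.statistics_)         # mean learned from train only

# transform() reuses the training mean; it does not look at the test mean
test_filled = si_demo.transform(test)
print(test_filled)
```

Refitting on the test set would instead fill the gap with a statistic computed from the test data, so train and test would be preprocessed inconsistently.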

In [None]:
# Ordinal encoding ---> cough

oe = OrdinalEncoder(categories=[['Mild','Strong']])
x_train_cough = oe.fit_transform(x_train[['cough']])

# apply the fitted encoder to the test data (transform only)
x_test_cough = oe.transform(x_test[['cough']])
x_train_cough.shape

(80, 1)

In [None]:
# One-hot encoding ---> gender, city

ohe = OneHotEncoder(drop='first', sparse_output=False)
x_train_gender_city = ohe.fit_transform(x_train[['gender','city']])

# apply the fitted encoder to the test data (transform only),
# so train and test share the same output columns
x_test_gender_city = ohe.transform(x_test[['gender','city']])

x_train_gender_city.shape

(80, 4)

In [None]:
# Extracting Age

x_train_age = x_train.drop(columns= ['gender','fever','cough','city']).values

# also the test data
x_test_age = x_test.drop(columns = ['gender','fever','cough', 'city']).values

In [None]:
x_train_age.shape

(80, 1)

In [None]:
x_train_transformed = np.concatenate((x_train_age, x_train_fever, x_train_gender_city,x_train_cough),axis = 1)

In [None]:
x_train_transformed.shape

(80, 7)

In [None]:
# Using ColumnTransformer

from sklearn.compose import ColumnTransformer

transformer = ColumnTransformer(transformers=[
    # fill missing fever values (SimpleImputer uses the mean by default)
    ('tnf1', SimpleImputer(), ['fever']),
    # encode cough as ordered integers: Mild -> 0, Strong -> 1
    ('tnf2', OrdinalEncoder(categories=[['Mild','Strong']]), ['cough']),
    # one-hot encode gender and city, dropping the first category of each
    ('tnf3', OneHotEncoder(sparse_output=False, drop='first'), ['gender','city'])],
    # remainder='passthrough' keeps all other columns (here, age) unchanged
    remainder='passthrough')


In [None]:
transformer.fit_transform(x_train).shape

(80, 7)

In [None]:
transformer.transform(x_test).shape

(20, 7)

## FUNCTION TRANSFORMER

FunctionTransformer is a scikit-learn class that wraps an ordinary Python callable so that it can be used as a transformer. It is useful for adding custom, stateless transformations to a machine learning pipeline.

FunctionTransformer takes a single function through its `func` argument and calls it once on the whole input array X, not once per sample. The function can be any Python callable that accepts the data as its first argument, such as a NumPy function, a lambda, or a user-defined function, and it should return the transformed data.

In [None]:
from sklearn.preprocessing import FunctionTransformer
import numpy as np

# create a dataset
X = np.array([[1,2],[3,4]])

# define the transformation function
log_transform = FunctionTransformer(np.log1p)

# apply the transformation to the dataset
x_transformed= log_transform.transform(X)

# view the transformed data
print(x_transformed)

[[0.69314718 1.09861229]
 [1.38629436 1.60943791]]
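FunctionTransformer also accepts an `inverse_func`, which makes it possible to map transformed values back to the original scale. The sketch below pairs `np.log1p` with `np.expm1` on the same small array; it is an illustration, not part of the original notebook.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

X_demo = np.array([[1, 2], [3, 4]], dtype=float)

# func is applied by transform(), inverse_func by inverse_transform()
log_tf = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)

X_log = log_tf.fit_transform(X_demo)
X_back = log_tf.inverse_transform(X_log)

# the round trip recovers the original values up to floating-point error
print(np.allclose(X_demo, X_back))
```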


# FunctionTransformer vs. ColumnTransformer

scikit-learn has a single FunctionTransformer class; ColumnTransformer is a separate tool that is often used alongside it:
1. FunctionTransformer: applies one function to the entire input data matrix. It is useful for simple feature scaling or feature extraction.

2. ColumnTransformer: applies a different transformer to each column or subset of columns in the input data matrix. It is useful when different features in a dataset need different preprocessing.
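The two tools can be combined: a ColumnTransformer can route each column to its own FunctionTransformer. The following is a sketch under assumptions; the column names `income` and `age` and the helper `to_decades` are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer

# Hypothetical data (not from the notebook's CSV files)
data = pd.DataFrame({"income": [0.0, 9.0, 99.0], "age": [20.0, 30.0, 40.0]})

def to_decades(X):
    # hypothetical helper: express age in decades
    return X / 10

ct_funcs = ColumnTransformer(
    transformers=[
        ("log_income", FunctionTransformer(np.log1p), ["income"]),
        ("age_decades", FunctionTransformer(to_decades), ["age"]),
    ]
)

result = ct_funcs.fit_transform(data)
print(result)  # column 0: log1p(income), column 1: age / 10
```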


# When should you use FunctionTransformer in machine learning?

Consider using a Function Transformer in a machine learning pipeline in the following situations:

Custom feature engineering: If you want to engineer new features using a custom function, you can use a Function Transformer to apply the function to the input data matrix and create new features based on the output.

Scaling and normalization: If you want to scale or normalize the input data matrix in a custom way, you can use a Function Transformer to apply a custom scaling or normalization function.

Data cleaning: If you want to clean the input data matrix by removing outliers, imputing missing values, or replacing certain values, you can use a Function Transformer to apply a custom cleaning function.

Dimensionality reduction: If you want to reduce the dimensionality of the input data matrix by selecting a subset of features, you can use a Function Transformer to apply the custom selection function. Techniques that learn parameters from the data, such as PCA, are better served by a dedicated estimator (for example scikit-learn's PCA), because a Function Transformer does not store anything learned during fitting.

In general, a Function Transformer can be useful for any situation in which you want to apply a custom function to the input data matrix before training a machine learning model.
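As a rough illustration of applying a custom function before training, a FunctionTransformer can be placed as a step in a `Pipeline` ahead of a model. The data below is synthetic and the choice of `LogisticRegression` is an assumption made only for this sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Synthetic, skewed, non-negative features (invented for illustration)
rng = np.random.default_rng(0)
X_syn = rng.exponential(scale=10.0, size=(200, 2))
y_syn = (X_syn[:, 0] > np.median(X_syn[:, 0])).astype(int)

pipe = Pipeline([
    ("log", FunctionTransformer(np.log1p)),  # custom preprocessing step
    ("clf", LogisticRegression()),           # model trained on the transformed data
])

pipe.fit(X_syn, y_syn)
preds = pipe.predict(X_syn)  # the log step is applied automatically before predicting
print(preds.shape)
```

Keeping the function inside the pipeline means the same transformation is applied at both fit and predict time without any extra bookkeeping.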


In [None]:
# Practical use cases

# 1. Custom feature engineering

from sklearn.preprocessing import FunctionTransformer
import numpy as np

# create a dataset
X = np.array([[1,2],[3,4]])

# define a custom feature engineering function
def custom_engineering(X):
  return np.hstack((X,X**2))

# create a function Transformer to apply the custom function
custom_transformer = FunctionTransformer(custom_engineering)

# apply the transformer to the input data
X_transformed = custom_transformer.transform(X)

# view the transformed data
print(X_transformed)


[[ 1  2  1  4]
 [ 3  4  9 16]]


# Scaling and Normalization

In [None]:
from sklearn.preprocessing import FunctionTransformer
import numpy as np

# create a dataset
X = np.array([[1,2],[3,4]])

# define a custom scaling function
def my_scaling(X):
  return X/np.max(X)

# create a function transformer to apply the custom function
custom_transformer = FunctionTransformer(my_scaling)

# apply the transformer to the input data
X_transformed = custom_transformer.transform(X)

# view the transformed data
print(X_transformed)


[[0.25 0.5 ]
 [0.75 1.  ]]
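One caveat worth illustrating: a FunctionTransformer is stateless, so a function like `my_scaling` recomputes the maximum from whatever array it is given. The sketch below contrasts this with `MaxAbsScaler`, which stores per-column maxima learned during `fit`. The train and test arrays are invented for illustration, and note that `MaxAbsScaler` scales each column separately, whereas `my_scaling` divides by a single global maximum.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer, MaxAbsScaler

X_tr = np.array([[1.0, 2.0], [3.0, 4.0]])
X_te = np.array([[2.0, 8.0]])

def my_scaling(X):
    return X / np.max(X)

# Stateless: fit() learns nothing, so the test data is divided by its own max (8)
stateless = FunctionTransformer(my_scaling).fit(X_tr)
scaled_stateless = stateless.transform(X_te)
print(scaled_stateless)   # [[0.25 1.  ]]

# Stateful: per-column maxima (3 and 4) are learned from the training data
stateful = MaxAbsScaler().fit(X_tr)
scaled_stateful = stateful.transform(X_te)
print(scaled_stateful)    # approximately [[0.667 2.   ]]
```

When the scaling parameters should come from the training set, a fitted scaler is usually the safer choice.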


# Data Cleaning


In [None]:
from sklearn.preprocessing import FunctionTransformer
import numpy as np

# create a dataset with missing values
X = np.array([[1,2],[3,np.nan]])

# define a custom cleaning function
def my_cleaning(X):
  # work on a copy so the caller's array is not modified in place
  X = X.copy()
  X[np.isnan(X)] = 0
  return X

custom_transformer = FunctionTransformer(my_cleaning)

X_transformed = custom_transformer.transform(X)

print(X_transformed)

[[1. 2.]
 [3. 0.]]
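For this particular cleaning step (replacing missing values with a constant), scikit-learn's `SimpleImputer` offers a built-in alternative. The sketch below uses the same small array and is meant as a comparison, not as a replacement for the custom function above.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X_missing = np.array([[1, 2], [3, np.nan]])

# strategy="constant" fills every missing value with fill_value
imputer = SimpleImputer(strategy="constant", fill_value=0)
X_filled = imputer.fit_transform(X_missing)

print(X_filled)
# the input array is left untouched because SimpleImputer copies by default
print(np.isnan(X_missing[1, 1]))
```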


# Real Life Use-Case of Function Transformer

There are many real-life use cases where Function Transformer can be useful in machine learning pipelines. Here are a few examples:

1. Image processing: In computer vision applications, Function Transformer can be used to apply custom functions to preprocess image data. For example, a custom function can be used to resize images, change the color balance, or apply filters to improve image quality.

2. Natural language processing: In NLP applications, Function Transformer can be used to preprocess text data by applying custom functions to perform tasks such as tokenization, stemming, or removing stop words.

3. Financial modeling: In finance, Function Transformer can be used to preprocess financial data by applying custom functions to transform the data, such as scaling stock prices, normalizing financial ratios, or imputing missing values.

4. Audio signal processing: In speech recognition or music analysis applications, Function Transformer can be used to preprocess audio data by applying custom functions to perform tasks such as filtering noise, extracting features such as MFCCs (Mel-frequency cepstral coefficients), or resampling the audio signal.

5. Sensor data processing: In Internet of Things (IoT) applications, Function Transformer can be used to preprocess sensor data by applying custom functions to remove outliers, impute missing values, or rescale sensor readings.

In [None]:
import numpy as np
import pandas as pd


In [None]:
df = pd.read_csv("/content/placement.csv")
df.head(3)

Unnamed: 0,cgpa,placement_exam_marks,placed
0,7.19,26.0,1
1,7.46,38.0,1
2,7.54,40.0,1


In [None]:
x= df.drop(columns = ['placed'])
y = df['placed']

In [None]:
from sklearn.preprocessing import FunctionTransformer

In [None]:
log_transform = FunctionTransformer(np.log1p)

# apply the transformation to the dataset
x_transformed = log_transform.transform(x)
x_transformed

Unnamed: 0,cgpa,placement_exam_marks
0,2.102914,3.295837
1,2.135349,3.663562
2,2.144761,3.713572
3,2.004179,2.197225
4,2.107786,2.890372
...,...,...
995,2.289500,3.806662
996,2.314514,4.189655
997,1.773256,3.555348
998,2.263844,3.850148
